Principal Component Analysis (PCA) is a powerful multivariate tool that projects data onto low-dimensional representations. Nevertheless, distances between data points on these low-dimensional projections are challenging to interpret. Here, we propose a computationally simple heuristic to transform a map based on standard PCA (when the variables are asymptotically Gaussian) into an entropy-based map where distances are based on mutual information (MI). Moreover, we show that in certain instances our proposed scaled PCA can improve cluster identification. Rescaling principal component-based distances using MI yields a representation of relative statistical associations when, as in genetics, it is applied to bit measurements of the mutual information between individuals’ genomes. This entropy-rescaled PCA, while preserving order relationships (along a dimension), expresses relative distances in information units, such as “bits”. We illustrate the effect of this rescaling using genomics data derived from world populations and describe how the interpretation of results is impacted.
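To give a feel for why an information-based scale behaves differently from a correlation-based one, here is a minimal sketch. It uses only the textbook identity for two jointly Gaussian variables with correlation ρ, I = −½ log₂(1 − ρ²) bits. It is not the paper's rescaling procedure, and the function name `gaussian_mi_bits` is a hypothetical one chosen for illustration.

```python
import math


def gaussian_mi_bits(rho: float) -> float:
    """Mutual information, in bits, between two jointly Gaussian
    variables with correlation rho (standard closed form).

    Illustrative helper only; not taken from the paper.
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return -0.5 * math.log2(1.0 - rho * rho)


# Order is preserved (higher |rho| gives higher MI), but the spacing is not:
# equal steps in correlation are far from equal steps in bits.
for rho in (0.1, 0.5, 0.9, 0.99):
    print(f"rho={rho:.2f}  MI={gaussian_mi_bits(rho):.3f} bits")
```

The mapping is monotone in |ρ|, which matches the abstract's remark that order relationships are preserved, while relative distances change because MI grows without bound as |ρ| approaches 1.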
Writing the foreword to a textbook on Antifragility by @mathoncbro et al: ANTIFRAGILITY is not a rediscovery of hormesis, but a mathematical framework uniting classes of phenomena and, centrally, transferring from the dose-response to the probabilistic domains.
Statistical and applied probabilistic knowledge is the core of knowledge; statistics is what tells you if something is true, false, or merely anecdotal; it is the “logic of science”; it is the instrument of risk-taking; it is the applied tools of epistemology; you can’t be a modern intellectual and not think probabilistically—but… let’s not be suckers. The problem is much more complicated than it seems to the casual, mechanistic user who picked it up in graduate school. Statistics can fool you. In fact it is fooling your government right now. It can even bankrupt the system (let’s face it: use of probabilistic methods for the estimation of risks did just blow up the banking system).
Lebanon’s rich history as a cultural crossroads spanning millennia has significantly impacted the genetic composition of its population through successive waves of migration and conquest from surrounding regions. Within modern-day Lebanon, the Koura district stands out with its unique cultural foundations, primarily characterized by a notably high concentration of Greek Orthodox Christians compared to the rest of the country. This study investigates whether the prevalence of Greek Orthodoxy in Koura can be attributed to modern Greek heritage or to continuous blending resulting from the ongoing influx of refugees and trade interactions with Greece and Anatolia. Using our own and published data, we analyzed both ancient and modern DNA from various populations in the region that could have played a role in shaping the current population of Koura. Our findings indicate that the genetic influence stemming directly from modern Greek immigration into the area appears to be limited. While the historical presence of Greek colonies has left its mark on the region’s past, the distinctive character of Koura seems to have been primarily shaped by cultural and political factors, displaying a stronger genetic connection mostly with Anatolia, with affinity to ancient but not modern Greeks.
(First Draft of the Foreword to Pierre Zalloua’s forthcoming book. For comments.)
Some people believe that the Levant is the end of the East and a portal to the West; others describe it as the end of the West and a portal to the East. Those in the first group tend to belong to the main branches of the Islamic faith, while those in the second belong to various Christian Levantine churches. Now, one might think that the two descriptions are equivalent: an intersection, after all, is an intersection. However, by the same mechanism that generates the so-called ‘narcissism of small differences,’ not only are these two statements not equivalent, but they are, in practice, contradictory. It even took a civil war for the Lebanese to understand this fallacy.
A new version of the paper using entropy-based Principal Components maps for genetic distance (vs Gaussian correlation-based methods). Applied to the PCA of the entire world population, relative distances are markedly different!