  • Save mnarayan/9df615e559d9dbeb094b8c83c323f32a to your computer and use it in GitHub Desktop.

A conversation on genetic ancestry, race and evolution

User:

It is easy to isolate ancestry using Y haplotypes and mitochondrial DNA. How is this different from ancestral admixture and population stratification as studied by geneticists and in GWAS? And how do genetic ancestry and such lineages differ from the social conventions used to define race?

What makes external features of ethnicity identifiable? Why is the evolution of such traits similar to or different from the evolution of traits that define human cognition? Is it fair to say that rare de novo variants might become entrenched within a genetically isolated population within the last 2000-3000 years, whereas common traits that confer abilities common to human cognitive development are far older, evolved under selection pressures over hundreds of thousands of years, and are relatively stable? Agree or disagree?

Write an essay. It should be rigorous and thorough, and it should engage with the misconceptions that PhD-level readers without the expertise of Broad Institute / EMBL-level geneticists might hold.


Gemini 2.5 Pro

Untangling Ancestry: From Simple Lines to Complex Webs and Human Categories

The surge in direct-to-consumer genetic testing has popularized the idea of tracing one's "ancestry." Often, this journey begins with Y-chromosome (Y-DNA) and mitochondrial DNA (mtDNA) results, which seem to offer clear, direct lines back through paternal or maternal ancestors. These findings provide satisfying narratives, linking individuals to specific haplogroups potentially shared with historical populations. Yet, this neat picture stands in contrast to the intricate, probabilistic world of ancestral admixture and population stratification—concepts central to population genetics and crucial for interpreting Genome-Wide Association Studies (GWAS). Understanding how these different ways of knowing ancestry relate to each other, and how they intersect with the conventional ways humans categorize group differences, is essential for navigating the complex landscape of genetics and identity. It also illuminates fascinating questions about the different evolutionary paths taken by various human traits.

Y-DNA and mtDNA derive their perceived clarity from their unique inheritance patterns. Y-DNA is passed from father to son. mtDNA passes from a mother to all her children, but only daughters transmit it further. Crucially, neither mtDNA nor the male-specific region of the Y chromosome undergoes recombination, the process that shuffles genetic material from both parents across the rest of our chromosomes (autosomes). This lack of shuffling allows them to act like stable markers, tracing single, unbroken lines of descent deep into the past. Identifying a specific Y or mtDNA haplogroup provides a verifiable link to one specific ancestor in each generation going back millennia. However, the precision of this lineage tracing comes with profound limitations: these two molecules represent a very small fraction of a person's total genetic inheritance. Ten generations back, an individual has up to 1024 genealogical ancestors (fewer distinct people when the pedigree loops back on itself). Y-DNA and mtDNA track only two of these ancestral paths, and only one for women, who carry no Y chromosome. They offer no information about the genetic contributions of the remaining ancestors.
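The arithmetic behind that limitation can be sketched in a few lines of Python. This is a minimal illustration that assumes an idealized pedigree with no pedigree collapse, meaning no ancestor appears twice. The names `ancestor_slots` and `uniparental_share` are hypothetical helpers, not part of any genetics library.

```python
def ancestor_slots(generations: int) -> int:
    """Genealogical ancestor positions `generations` back, assuming no
    pedigree collapse (every position filled by a different person)."""
    return 2 ** generations


def uniparental_share(generations: int, has_y: bool) -> float:
    """Fraction of those positions followed by uniparental markers:
    the strict maternal line (mtDNA) plus, for someone who carries a
    Y chromosome, the strict paternal line."""
    lines = 2 if has_y else 1
    return lines / ancestor_slots(generations)


for g in (5, 10, 20):
    print(g, ancestor_slots(g), f"{uniparental_share(g, has_y=True):.6f}")
```

At ten generations this gives 1024 positions, and Y-DNA plus mtDNA trace 2 of them (about 0.2%). By around 30 generations, 2^30 exceeds a billion, which is more than the world population of that era. Real pedigrees therefore necessarily collapse, and the idealized count is only an upper bound.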

This contrasts sharply with the study of ancestral admixture and population stratification using the autosomal genome. Our autosomes do recombine with each generation, creating a complex mosaic reflecting contributions from all our recent ancestors. Admixture analysis uses statistical methods to estimate the proportions of an individual's genome that best match reference panels representing populations from different geographic regions (e.g., assigning percentages to ancestral components associated with Europe, Africa, Asia, etc.). It provides a probabilistic overview of one's recent ancestral origins, reflecting historical migrations and mixing between populations. Population stratification refers to the non-random distribution of genetic variants across different subgroups, resulting from historical factors like geographic separation, mating patterns, random genetic drift, and differing selective pressures. Understanding and mathematically correcting for stratification is vital in GWAS, which aim to link specific genetic variants to traits or diseases. Without accounting for population structure, researchers risk finding spurious associations—mistaking correlation due to shared ancestry for a causal link between a variant and a trait. Thus, while Y/mtDNA provide deep but narrow lineage information, autosomal analysis offers a broader, more recent, statistically inferred picture of overall ancestry, critical for understanding population history and conducting rigorous genetic research.
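A toy simulation makes the GWAS concern about stratification concrete. The sketch is illustrative only. It uses two hypothetical subpopulations with made-up allele frequencies (0.2 and 0.6) and an environmentally caused trait difference, and the variant has no causal effect at all. Real GWAS pipelines typically adjust for principal components of the genotype data or use mixed models. Here the "correction" is the simplest stand-in: a known population label entered as a covariate.

```python
import random
import statistics


def ols_slope(x, y):
    """Least-squares slope from regressing y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate(n_per_pop=5000, seed=0):
    """Two hypothetical subpopulations. The allele is more common in pop 1,
    and pop 1 also has a higher trait mean for purely environmental reasons.
    The variant itself has no effect on the trait."""
    rng = random.Random(seed)
    geno, trait, pop = [], [], []
    for label, allele_freq, env_mean in ((0, 0.2, 0.0), (1, 0.6, 1.0)):
        for _ in range(n_per_pop):
            # genotype = number of copies of the allele (0, 1 or 2), under HWE
            geno.append((rng.random() < allele_freq) + (rng.random() < allele_freq))
            trait.append(env_mean + rng.gauss(0.0, 1.0))
            pop.append(label)
    return geno, trait, pop


def center_within_groups(values, groups):
    """Subtract each group's mean. Regressing group-centered trait on
    group-centered genotype gives the same slope as a regression that
    includes the population label as a covariate (Frisch-Waugh-Lovell)."""
    means = {
        k: statistics.fmean(v for v, gr in zip(values, groups) if gr == k)
        for k in set(groups)
    }
    return [v - means[gr] for v, gr in zip(values, groups)]


geno, trait, pop = simulate()
naive = ols_slope(geno, trait)
adjusted = ols_slope(center_within_groups(geno, pop), center_within_groups(trait, pop))
print(f"naive slope: {naive:.3f}, ancestry-adjusted slope: {adjusted:.3f}")
```

With these settings, the naive slope is expected to be about 0.36, even though the variant does nothing: shared ancestry links the genotype to the environment. Once population membership is accounted for, the slope hovers near zero.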

How does this genetically inferred ancestry relate to the common ways societies group people, often referred to using terms like "race" or "ethnicity"? Genetic data clearly shows that human populations exhibit patterns of variation that correlate with geography. Individuals whose ancestors lived predominantly in one region for many generations tend to share more genetic similarities with each other than with individuals whose ancestors lived further away. This reflects shared demographic history, migration patterns, and adaptation. These genetic patterns often align, to some extent, with the observable physical differences (phenotypes) that humans have historically used to form group categories.

However, equating these genetic patterns directly with conventional categories like race is an oversimplification of both the genetic data and the nature of these categories. Firstly, human genetic variation is predominantly continuous and clinal. Allele frequencies shift gradually across geographic space, lacking the sharp, discrete boundaries implied by rigid categorical systems. There are no known variants present in every member of one major continental group and absent from all others, and most variants found in only one group are rare even within it. The vast majority of human genetic diversity exists within any local population or conventionally defined group, rather than between such groups. Secondly, the categories themselves often rely on a small subset of highly visible traits (like skin pigmentation or hair texture) that do not reflect the totality of genetic variation. Thirdly, admixture is a pervasive feature of human history; many individuals and populations possess mixed ancestry that defies simple classification into predefined boxes. While genetic ancestry data provides statistical inferences about likely geographic origins based on genomic patterns, conventional racial or ethnic categories represent classification systems developed within societies based on observable traits, shared history, culture, and lineage, which map imperfectly onto the underlying genetic continuities and complexities. They function as heuristic categories for social interaction and identification, but their boundaries and definitions often lack precise correspondence with the granular structure revealed by genomics.
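The within-versus-between apportionment is usually summarized with F_ST. The sketch below makes three simplifying assumptions:

- a single biallelic locus;
- equally weighted subpopulations;
- the simple heterozygosity-based definition.

Estimators used on real data, such as those of Weir and Cockerham or Hudson, add sample-size corrections. The frequencies below are illustrative.

```python
from typing import Sequence


def heterozygosity(p: float) -> float:
    """Expected heterozygosity 2p(1-p) at a biallelic locus under random mating."""
    return 2 * p * (1 - p)


def fst(freqs: Sequence[float]) -> float:
    """F_ST = (H_T - H_S) / H_T for equally weighted subpopulations:
    the share of total expected heterozygosity due to differences between
    groups. 1 - F_ST is the share found within groups."""
    h_s = sum(heterozygosity(p) for p in freqs) / len(freqs)
    h_t = heterozygosity(sum(freqs) / len(freqs))
    return (h_t - h_s) / h_t


for pair in ([0.4, 0.6], [0.1, 0.9]):
    f = fst(pair)
    print(pair, f"F_ST = {f:.2f}, within-group share = {1 - f:.0%}")
```

Frequencies of 0.4 versus 0.6 give F_ST = 0.04, so 96% of the heterozygosity at that locus lies within each group. Even a very large shift, 0.1 versus 0.9, still leaves 36% within groups.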

This leads to considering how different types of traits evolve. What makes external features often associated with geographic origin or ethnicity (skin color, hair form, certain facial features) identifiable? These traits are often influenced by a relatively small number of genes, sometimes with large individual effects. They appear to have been subject to significant selective pressures, often related to climate, diet, or pathogen exposure in specific geographic regions (e.g., UV radiation and skin pigmentation), or potentially shaped by mate choice preferences. Because the genetic basis can be simpler and the local selective pressures strong, these traits can diverge noticeably between populations over timescales of tens of thousands of years, contributing to the visible variation observed today.
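How quickly strong local selection can move a single large-effect allele can be sketched with the textbook deterministic model. The model assumes:

- an effectively infinite population (no drift);
- constant genic selection with selection coefficient s;
- a start at 1% and an end at 99%.

The s values and the 28-year generation time are illustrative assumptions, not estimates for any real pigmentation locus.

```python
def generations_to_sweep(s: float, p_start: float = 0.01, p_end: float = 0.99) -> int:
    """Generations for a favored allele to rise from p_start to p_end under
    deterministic genic selection (relative fitness 1 + s versus 1, no drift).
    Each generation multiplies the allele's odds p / (1 - p) by (1 + s)."""
    p, gens = p_start, 0
    while p < p_end:
        p = p * (1 + s) / (1 + s * p)
        gens += 1
    return gens


YEARS_PER_GENERATION = 28  # assumed, for converting to calendar time
for s in (0.01, 0.02, 0.05):
    g = generations_to_sweep(s)
    print(f"s = {s}: {g} generations, about {g * YEARS_PER_GENERATION:,} years")
```

Under these assumptions, s = 0.02 takes about 465 generations, roughly 13,000 years, and halving s roughly doubles the time. Weaker selection therefore lands in the "tens of thousands of years" range described above.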

The evolutionary history of human cognitive traits appears substantially different. The fundamental capacities for complex language, abstract reasoning, long-term planning, and intricate social learning form the bedrock of human uniqueness. These traits are understood to be highly polygenic, meaning they are influenced by the combined small effects of thousands of genes. The core genetic architecture enabling these abilities is likely the product of intense selection acting over hundreds of thousands, possibly millions, of years, long predating the relatively recent human dispersals that established modern patterns of population structure. While genetic variation certainly exists within populations influencing individual differences in cognitive functions, the basic toolkit enabling complex cognition is thought to be deeply conserved and shared across the entire human species. Modifying this fundamental, complex architecture would likely require coordinated changes across a vast network of genes, representing a much slower evolutionary process compared to adjustments in traits like pigmentation.

Therefore, the proposition that rare, de novo variants can become entrenched in isolated populations over recent millennia (e.g., 2000-3000 years) seems plausible. Genetic drift and founder effects can indeed cause new or initially rare mutations to increase in frequency within specific groups relatively quickly, contributing to local genetic distinctiveness and sometimes influencing health profiles. This is a well-understood mechanism of microevolutionary change.
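Founder effects and drift can be illustrated with a small neutral Wright-Fisher simulation. The scenario is hypothetical:

- an isolate founded by 100 people (200 gene copies) that stays at that size;
- no selection, migration, or new mutation;
- a variant carried by 4 of the founding copies (2%);
- a span of 100 generations, roughly 2500-3000 years.

All parameter values are illustrative, not estimates for any real population.

```python
import random


def wright_fisher(p0: float, two_n: int, generations: int, rng: random.Random) -> float:
    """Neutral Wright-Fisher drift at constant size: each generation's 2N
    gene copies are sampled with replacement from the previous generation's
    allele frequency."""
    p = p0
    for _ in range(generations):
        if p in (0.0, 1.0):
            break  # lost or fixed: drift alone cannot change it any more
        copies = sum(1 for _ in range(two_n) if rng.random() < p)
        p = copies / two_n
    return p


rng = random.Random(42)
# 500 independent replicate histories of the same hypothetical isolate
finals = [wright_fisher(4 / 200, 200, 100, rng) for _ in range(500)]
lost = sum(f == 0.0 for f in finals) / len(finals)
common = sum(f >= 0.10 for f in finals) / len(finals)
print(f"lost: {lost:.2f}, reached >=10%: {common:.2f}, mean: {sum(finals) / len(finals):.3f}")
```

In runs like this, most replicates lose the allele, but a minority carry it to 10% or more within the 100 generations. The average frequency stays near its starting value of 2%, because pure drift has no direction. Chance, not selection, decides which rare variants become locally common.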

Conversely, the common genetic variants underlying the fundamental aspects of human cognitive development are almost certainly much older, reflecting the long-term evolutionary trajectory of our species. These core abilities were likely established and stabilized under selection pressures operating over vast timescales, long before the divergences that created recent population structures. While human populations have visibly adapted to diverse environments relatively recently, the fundamental cognitive architecture enabling our species' global success appears remarkably stable and universal, reflecting a shared, ancient evolutionary heritage. Altering this deeply rooted, complex polygenic system represents a far greater evolutionary undertaking than modifying traits governed by simpler genetic architectures responding to local environmental pressures.

In summary, appreciating human ancestry requires differentiating between the deep, narrow lineage tracing of Y/mtDNA and the broad, probabilistic tapestry revealed by autosomal analysis and population genetics. These genetic insights provide valuable information about demographic history and relatedness, which correlate imperfectly with the conventional categories humans use to classify group differences. Similarly, understanding human evolution requires recognizing the distinct timescales and genetic architectures involved. While visible traits associated with geographic origin can change over relatively recent evolutionary time, the complex, shared foundation of human cognition appears to be a product of a much deeper, conserved evolutionary history. Navigating these distinctions allows for a clearer, more accurate understanding of human variation and our common biological heritage.

User:

Okay, but there is a natural question that arises for some. If genetics enables some probabilistic differentiation between races that have been defined largely in the context of social and cultural norms, why is this not partly biological as well?

Offer an example, or a recipe if you will, of how social hierarchies can create statistically detectable differences between socially defined races. What would happen if, say, social hierarchies developed around a minority population with green eyes and darker skin over 2000 years? Why don't such differences support the idea that races are biological? What is the implicit definition of "biological" here, and what makes it essentialist or not?


Gemini 2.5 Pro

That's a natural and perceptive question. It gets right to the heart of why discussions about race and genetics are so complex and often fraught. If genetic analysis can find statistical differences between groups that align, however imperfectly, with socially defined races, why isn't race at least partly biological?

The confusion arises from what we mean by "biological" and how social forces can actively shape biological patterns over time. Let's break it down:

1. Yes, Statistical Genetic Differences Exist and Correlate with Social Labels:

It's undeniable that modern population genetics can take DNA samples from anonymous individuals and, with high statistical probability, infer their likely recent ancestral origins (e.g., predominantly European, West African, East Asian). Because socially constructed racial categories in many societies (like the US) were historically based, at least in part, on perceived geographic origin and associated phenotypes, there is an observed correlation between genetic ancestry clusters and these social labels. If you sample people who self-identify as "White" in Chicago and people who self-identify as "Black" or "African American" in Chicago, you will find statistically significant differences in the frequencies of many genetic variants between these two groups, reflecting different predominant ancestral origins (Europe vs. West Africa). This is "biological" in the sense that DNA and allele frequencies are biological phenomena.
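Inferring likely recent ancestral origins from DNA is, at its core, a likelihood calculation. The sketch below estimates the fraction q of an individual's ancestry that comes from source A, using grid-search maximum likelihood. It assumes:

- two source populations whose allele frequencies are known exactly (simulated here, not real data);
- unlinked loci;
- Hardy-Weinberg proportions.

Programs such as ADMIXTURE fit ancestry proportions and allele frequencies jointly across many individuals and sources. This single-individual grid search is a simplified stand-in, not their algorithm.

```python
import math
import random


def log_likelihood(q, genotypes, freqs_a, freqs_b):
    """Log-likelihood of diploid genotypes (0/1/2 copies of the counted allele)
    if a fraction q of ancestry comes from source A and 1 - q from source B.
    Assumes unlinked loci in Hardy-Weinberg proportions; the binomial
    coefficient is dropped because it does not depend on q."""
    ll = 0.0
    for g, pa, pb in zip(genotypes, freqs_a, freqs_b):
        p = q * pa + (1 - q) * pb
        ll += g * math.log(p) + (2 - g) * math.log(1 - p)
    return ll


def estimate_q(genotypes, freqs_a, freqs_b, steps=200):
    """Maximum-likelihood ancestry proportion by simple grid search on [0, 1]."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda q: log_likelihood(q, genotypes, freqs_a, freqs_b))


rng = random.Random(7)
n_loci, true_q = 2000, 0.8
# Simulated source frequencies, kept away from 0 and 1 so the logs stay finite.
freqs_a = [rng.uniform(0.05, 0.95) for _ in range(n_loci)]
freqs_b = [rng.uniform(0.05, 0.95) for _ in range(n_loci)]
genotypes = []
for pa, pb in zip(freqs_a, freqs_b):
    p = true_q * pa + (1 - true_q) * pb
    genotypes.append((rng.random() < p) + (rng.random() < p))

q_hat = estimate_q(genotypes, freqs_a, freqs_b)
print(f"true q = {true_q}, estimated q = {q_hat:.3f}")
```

With a couple of thousand informative loci, the estimate lands close to the simulated proportion. This is why ancestry inference can be confident even though each individual locus says very little.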

2. How Social Hierarchies Create Detectable Genetic Differences (The Recipe):

Here's how social structures, particularly hierarchies, can actively generate or reinforce measurable genetic distinctions between socially defined groups, even if the initial group distinction wasn't purely genetic:

  • Ingredient 1: Pre-existing Variation: Start with a human population that already has internal genetic diversity, as all human populations do.
  • Ingredient 2: Social Stratification: Society imposes a division or hierarchy based on some criteria. This could be ancestry, religion, class, perceived physical traits (like skin color, hair texture), or even something initially arbitrary. Crucially, this division becomes socially meaningful and influences behavior.
  • Ingredient 3: Enforced Endogamy (The Key Mechanism): The most powerful social tool for creating genetic differentiation is restricting mate choice. If strong social rules, laws, or intense social pressure (stigma, violence, economic disadvantage) discourage or prevent marriage and reproduction between the defined groups, you limit gene flow. People overwhelmingly reproduce within their assigned social category.
  • Ingredient 4: Time: Allow this system of socially enforced reproductive isolation (even if imperfect) to persist for many generations.
  • The Process & Outcome:
    • Reduced Gene Flow: The social barriers act like geographic barriers, slowing the mixing of genes between the groups.
    • Genetic Drift: In each relatively isolated group, random chance causes some alleles to become more or less common purely by accident (genetic drift). This happens independently in each group, leading them to diverge genetically over time. Drift is stronger in smaller groups (often minorities or historically bottlenecked populations).
    • Differential Selection (Possible): If the social hierarchy also leads to significantly different environments (e.g., diet, disease exposure, access to healthcare, stress levels), there might be subtle differences in natural selection pressures acting on the groups, further contributing to genetic divergence (though drift is often a more potent force over shorter timescales for many traits).
    • Founder Effects (If Applicable): If one group was founded by a small number of individuals, their particular set of gene variants (including potentially rare ones) will be overrepresented in subsequent generations as long as endogamy persists.

Outcome: After enough time (even just centuries or a couple of millennia), these processes will inevitably lead to statistically significant differences in allele frequencies across many parts of the genome between the socially defined groups. A geneticist could then statistically distinguish members of these groups based purely on their DNA.
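"After enough time" can be made roughly quantitative with the classic idealized drift result. The model assumes:

- each group keeps a constant size;
- mating is random within the group;
- there is no mutation or selection;
- a fraction m of gene copies per generation comes from outside the group (the standard island-model approximation).

The group sizes and the 80-generation span are hypothetical.

```python
def drift_divergence(n_diploid: int, generations: int, migration: float = 0.0) -> float:
    """Expected F for one group in an idealized Wright-Fisher model:
    constant size n_diploid, random mating within the group, no mutation or
    selection, and a fraction `migration` of gene copies per generation
    coming from outside the group (island-model approximation).
    With migration = 0 this equals 1 - (1 - 1/(2N))**t, the fraction of
    ancestral heterozygosity lost to drift."""
    f = 0.0
    for _ in range(generations):
        f = (1 / (2 * n_diploid) + (1 - 1 / (2 * n_diploid)) * f) * (1 - migration) ** 2
    return f


for label, n in (("minority", 500), ("majority", 10_000)):
    for m in (0.0, 0.01):
        print(f"{label:8s} N={n:>6}, m={m}: F after 80 generations = {drift_divergence(n, 80, m):.4f}")
```

Under these assumptions, a group of 500 breeding individuals loses about 8% of its ancestral heterozygosity in 80 generations. A group of 10,000 loses under 0.5%, which is why drift is stronger in smaller groups. Even 1% gene flow per generation caps the long-run value near 1/(1 + 4Nm), about 0.05 for N = 500. Endogamy that is imperfect still limits divergence.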

3. The Hypothetical Example: Green Eyes and Dark Skin Minority

Let's apply the recipe:

  • Setup: Imagine a diverse ancestral population settles an island. Within this population, a combination of alleles leading to green eyes (relatively uncommon globally) and alleles leading to darker skin pigmentation exists in some individuals. For complex social/historical reasons (perhaps they were early arrivals, practiced a different craft, followed a different leader), this group becomes a distinct social minority.
  • Social Hierarchy & Endogamy: A rigid social hierarchy develops over 2000 years (roughly 65-80 generations, assuming 25-30 years per generation). The "Green-Dark" group faces prejudice and is strongly discouraged (or forbidden) from marrying/reproducing with the majority population. They primarily marry within their own group.
  • Genetic Consequences after 2000 years:
    • Target Traits: The alleles for green eyes and darker skin would become highly concentrated (high frequency) within the minority group due to both the initial social definition and subsequent endogamy.
    • Genome-Wide Drift: More importantly, due to 2000 years of reproductive isolation and genetic drift operating independently, thousands of other unrelated genetic variants across the genome would also show significant frequency differences between the Green-Dark minority and the majority population purely by chance. If the minority group remained small, drift would be particularly effective in changing their allele frequencies compared to the larger majority group. Some rare alleles present in the founders might become relatively common within the group.
    • Detectability: Genetic analysis would easily find statistically significant differences between the two groups, sufficient to predict group membership with reasonable accuracy based on DNA alone.
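How many small frequency differences add up to reliable prediction of group membership can be sketched with a likelihood-ratio (naive Bayes) classifier. The setup is hypothetical:

- two groups whose allele frequencies drifted apart under the Balding-Nichols model, with F_ST = 0.05;
- frequencies known exactly;
- unlinked loci in Hardy-Weinberg proportions.

Real ancestry-inference tools are more elaborate; this is only an illustration of the principle.

```python
import math
import random


def simulate_freqs(n_loci, fst, rng):
    """Allele frequencies for two groups that drifted apart from a common
    ancestor (Balding-Nichols model): each group's frequency is
    Beta-distributed with mean p and variance p(1-p)*fst."""
    k = (1 - fst) / fst
    freqs_a, freqs_b = [], []
    for _ in range(n_loci):
        p = rng.uniform(0.1, 0.9)  # hypothetical ancestral frequency
        for out in (freqs_a, freqs_b):
            f = rng.betavariate(p * k, (1 - p) * k)
            out.append(min(max(f, 0.001), 0.999))  # keep logs finite
    return freqs_a, freqs_b


def genotype(freqs, rng):
    """Draw a diploid genotype (0/1/2 copies per locus) under HWE."""
    return [(rng.random() < f) + (rng.random() < f) for f in freqs]


def log_lik(geno, freqs):
    """Log-likelihood of a genotype vector given group allele frequencies."""
    return sum(g * math.log(f) + (2 - g) * math.log(1 - f) for g, f in zip(geno, freqs))


def accuracy(n_loci, fst=0.05, n_people=200, seed=1):
    """Fraction of simulated people assigned to their true group."""
    rng = random.Random(seed)
    fa, fb = simulate_freqs(n_loci, fst, rng)
    correct = 0
    for true_group, freqs in (("A", fa), ("B", fb)):
        for _ in range(n_people):
            g = genotype(freqs, rng)
            guess = "A" if log_lik(g, fa) > log_lik(g, fb) else "B"
            correct += guess == true_group
    return correct / (2 * n_people)


for n in (10, 100, 1000):
    print(n, accuracy(n))
```

A handful of loci classifies people only modestly better than chance. With about a thousand loci, accuracy approaches 100%, even though no allele in the model is exclusive to either group.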

4. Why Doesn't This Support Race as "Biological"? The Implicit Definition

This is the crux. The reason the scenario above doesn't support the idea that these socially constructed groups are fundamental "biological races" hinges on the implicit definition of biological race that is usually being debated.

  • The Folk/Essentialist Definition of "Biological Race": When most people argue about whether race is "biological," they often implicitly invoke an essentialist view. This view assumes:

    • Humanity is naturally divided into a small number of distinct, discrete categories (races).
    • These categories are based on deep, intrinsic, and relatively fixed biological differences (an "essence").
    • This essence determines a whole suite of traits, often including not just appearance but also capabilities, temperament, health predispositions, etc.
    • These categories are ancient and fundamental divisions of the species.
  • Why the Socially Driven Differences Don't Fit:

    • Origin: The genetic differences in our example were caused primarily by social factors (rules about marriage) acting on random chance (drift), not by deep, ancient biological divergence creating the groups in the first place. The biology reflects the social history.
    • Nature of Differences: The differences are statistical frequencies of alleles shared across both groups, not the presence/absence of unique "race genes." Variation within each group remains vast, and the genetic overlap between groups is still enormous. They are not discrete categories but rather populations with shifted frequency distributions due to restricted gene flow.
    • Lack of Deep Structure: These differences, shaped over only 2000 years, are shallow compared to the divergence between human populations separated for tens of thousands of years (like Africans vs. non-Africans initially) or compared to divergences between species. They don't represent fundamental branches of the human tree.
    • Malleability: If the social hierarchy were dismantled and intermarriage became common, the genetic distinctions would begin to blur relatively quickly (over generations) as gene flow resumed. Deep biological divisions (like species) don't merge back together easily.
    • No Biological "Package": While the Green-Dark group would have high frequencies of alleles for those specific traits and other randomly drifted alleles, there's no biological reason to assume these drifted alleles create a functionally coherent "package" determining other complex traits like intelligence or behavior. The linkage is historical (shared ancestry due to endogamy) and stochastic (drift), not functionally deterministic.
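The malleability point can be sketched with the simplest gene-flow model. It assumes that, once the barrier falls, each group draws a fraction m of its gene copies from the other every generation, and it ignores drift and selection. The m values are hypothetical.

```python
import math


def frequency_gap(d0: float, m: float, generations: int) -> float:
    """Allele-frequency difference between two groups after `generations` of
    symmetric gene flow: each group draws a fraction m of its gene copies from
    the other, so p_A' = (1-m)p_A + m p_B (and vice versa) and the gap
    shrinks by a factor (1 - 2m) per generation. Ignores drift and selection."""
    return d0 * (1 - 2 * m) ** generations


def half_life(m: float) -> float:
    """Generations for any such gap to halve; assumes 0 < m < 0.5."""
    return math.log(0.5) / math.log(1 - 2 * m)


for m in (0.01, 0.05, 0.10):
    print(f"m = {m:.2f}: gap halves every {half_life(m):.1f} generations")
```

If a tenth of each group's gene copies come from the other group each generation, every allele-frequency gap halves in about three generations. At 1% it takes roughly 34 generations: slow on a human timescale, but fast compared with the divergence that separates long-isolated populations.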

Conclusion:

Genetic differences between human populations, including those that correlate with socially defined races, are biologically real in the sense that allele frequencies are biological data. However, the patterns of these differences among groups that identify with common racial labels are often profoundly shaped by relatively recent social history, particularly mating patterns enforced by social structures and hierarchies, acting on top of older patterns of migration and drift.

Saying race is not "biological" is a rejection of the essentialist definition. It's not a denial that populations have different genetic histories or that statistical differences exist. It is an assertion that these differences don't form discrete, fundamental, essential categories that determine suites of traits, but rather reflect a complex interplay of ancient ancestry, random drift, adaptation to local environments (like skin color), and, crucially, the powerful impact of social behaviors and structures on gene flow over time. The correlation we see is often a case of social categories mapping onto patterns that social forces themselves helped to create or maintain.
