Main contributor: Diahan Southard
Scientists in the processing of DNA sequences.

A reference population is a selected set of members of a group whose characteristics or data are used to represent the larger group.[1] Its members are chosen because they show the defining qualities of the larger population being studied, and other individuals are then compared against them. Companies offering at-home DNA tests, such as MyHeritage, use reference populations when analyzing DNA test results. MyHeritage has been able to recognize genetic associations with 42 different ethnic regions of the world because of its Founder Population Project.[2] Participants in this project helped MyHeritage define the typical genetic signature of each of these populations.

Role of reference populations in the MyHeritage DNA analysis

For the Founder Population Project, more than 5,000 people were chosen as a “reference population” from among millions of MyHeritage users. Potential participants’ family trees were screened, and invitations went to those whose trees showed every branch of the family with heritage from the same region or ethnicity for at least six generations, and sometimes up to fifteen.
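The screening rule described above can be pictured as a simple filter over family-tree records. The sketch below is illustrative only: the `Tree` record, its fields and the `qualifies` function are hypothetical names, not part of any MyHeritage system, and a real screen would also have to judge how complete and reliable each tree is.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Tree:
    """Hypothetical family-tree summary for one user."""
    user_id: str
    # regions_by_generation[0] holds the regions recorded for the parents,
    # regions_by_generation[1] for the grandparents, and so on.
    regions_by_generation: List[List[str]]


def qualifies(tree: Tree, min_generations: int = 6) -> bool:
    """True if the first `min_generations` generations are all recorded
    and every recorded ancestor in them comes from one single region."""
    if len(tree.regions_by_generation) < min_generations:
        return False  # tree is not deep enough to judge
    checked = tree.regions_by_generation[:min_generations]
    if any(len(generation) == 0 for generation in checked):
        return False  # a generation with no recorded ancestors
    regions = {region for generation in checked for region in generation}
    return len(regions) == 1


# Example with made-up data: six full generations, all from one region.
candidate = Tree("user-1", [["Region X"] * (2 ** g) for g in range(1, 7)])
print(qualifies(candidate))
```

The six-generation default mirrors the minimum stated in the text; passing `min_generations=15` would model the stricter end of the range.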

The idea that a few sample people can genetically represent a single, small place comes from the “founder effect” concept in population genetics.[3] The founder effect describes the distinctive DNA that appears in those who descend from small, isolated communities founded by a small number of people. Their descendants have reduced genetic diversity; over time, they develop a unique, shared genetic signature.
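A toy simulation can make the loss of diversity concrete. The sketch below is a deliberately simplified model (one genetic locus, random resampling each generation, made-up numbers) and is not drawn from the cited source; the function names are hypothetical.

```python
import random


def found_colony(source_alleles, n_founders, rng):
    """Draw the founders' allele copies at random from the source gene pool."""
    return [rng.choice(source_alleles) for _ in range(n_founders)]


def next_generation(alleles, size, rng):
    """Each copy in the new generation is drawn at random from the parents."""
    return [rng.choice(alleles) for _ in range(size)]


rng = random.Random(0)

# Source population: 10,000 allele copies spread over 50 variants of one locus.
source = [f"variant_{i % 50}" for i in range(10_000)]

# A colony started by only 10 founder copies, then left isolated for
# 20 generations while it grows to 200 copies.
colony = found_colony(source, n_founders=10, rng=rng)
for _ in range(20):
    colony = next_generation(colony, size=200, rng=rng)

print("variants in source:", len(set(source)))
print("variants in colony:", len(set(colony)))
```

However the random draws fall, the colony can never carry more variants than its ten founder copies did, so its descendants share a much narrower set of variants than the source population.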

When defining the reference population, the MyHeritage Science Team avoided including individuals with admixed ancestry (ancestors from several different places or groups). That way, participants’ genetic characteristics clearly reflected one ethnicity or location, without signals from other populations. That wasn’t always possible, because groups in many parts of the world have intermarried over time, especially in regions that experienced migrations, colonization or invasions, or that lay along trade routes, all of which brought in individuals of varying genetic backgrounds.

All the participants took DNA tests, and MyHeritage compared their results to each other using a statistical method called Principal Component Analysis.[4] This kind of analysis condenses the testers’ results into a smaller, more manageable dataset in which they can be compared to each other. The MyHeritage Science Team looked for participants who shared known roots and who also shared common genetic signatures. Their DNA commonalities were identified and assigned to represent one of 42 specific ancestries, such as Japanese, Italian, Ashkenazi Jewish and others. Those whose family trees and genetics didn’t seem to point to the same origins, based on larger patterns in the data, were excluded.
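To show what Principal Component Analysis does to this kind of data, here is a minimal sketch in plain Python. It is an illustration, not MyHeritage's pipeline: the genotype values are made up, the function name is hypothetical, and only the first principal component is computed (by power iteration on the covariance matrix) to keep the code short.

```python
import math
import random


def first_principal_component(rows, iterations=200, seed=0):
    """Return (direction, scores) of the first principal component."""
    n, d = len(rows), len(rows[0])
    # Center each marker (column) on its mean.
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in rows]
    # Sample covariance matrix between markers (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeatedly multiplying by the covariance matrix
    # pulls a random start vector toward the direction of largest variance.
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # no variance at all in the data
        v = [x / norm for x in w]
    # Each person's score is their centered genotype projected on v.
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores


# Made-up genotypes: copies (0, 1 or 2) of a variant at five markers.
genotypes = {
    "tester_A1": [2, 2, 0, 0, 1],
    "tester_A2": [2, 1, 0, 0, 1],
    "tester_A3": [2, 2, 0, 1, 1],
    "tester_B1": [0, 0, 2, 2, 1],
    "tester_B2": [0, 1, 2, 2, 1],
    "tester_B3": [0, 0, 2, 1, 1],
}
direction, scores = first_principal_component(list(genotypes.values()))
for name, score in zip(genotypes, scores):
    print(f"{name}: {score:+.2f}")
```

Five markers per person are reduced to a single score, and testers with similar genotypes land close together on that axis. A participant whose score sat far from the others sharing the same documented roots would be the kind of mismatch the text says was excluded.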

Ethnicity Estimate at MyHeritage.

MyHeritage’s reference population is specific to the company, which selected its own group of participants and ran the analysis itself. When someone takes a MyHeritage DNA test, their DNA is compared against the genetic signatures identified with those 42 ethnicities. Most people’s ancestry is admixed, so most Ethnicity Estimates at MyHeritage consist of percentages that assign the tester’s origins to the various populations matching their own genetic signature.
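The text does not say how the comparison is computed. One simple way to picture how percentages can fall out of such a comparison is to search for the blend of reference signatures that best reproduces a tester's data. The sketch below does this with a coarse grid search; the population labels, the frequency values and the `estimate_mixture` function are all hypothetical, and real methods are considerably more sophisticated.

```python
from itertools import product


def estimate_mixture(sample, references, step=5):
    """Find the percentages (multiples of `step`, summing to 100) whose
    blend of reference signatures is closest to the sample."""
    names = list(references)
    best, best_error = None, float("inf")
    levels = range(0, 101, step)
    # Choose percentages for all but the last population; the last one
    # receives whatever remains, so the total is always 100.
    for combo in product(levels, repeat=len(names) - 1):
        if sum(combo) > 100:
            continue
        weights = list(combo) + [100 - sum(combo)]
        error = 0.0
        for j, observed in enumerate(sample):
            expected = sum(w / 100 * references[name][j]
                           for w, name in zip(weights, names))
            error += (observed - expected) ** 2
        if error < best_error:
            best, best_error = dict(zip(names, weights)), error
    return best


# Made-up variant frequencies at four markers for three reference groups.
references = {
    "population_A": [0.9, 0.1, 0.5, 0.2],
    "population_B": [0.2, 0.8, 0.4, 0.7],
    "population_C": [0.5, 0.5, 0.9, 0.1],
}
# A made-up tester whose values were built as 60% A plus 40% B.
tester = [0.62, 0.38, 0.46, 0.40]
print(estimate_mixture(tester, references))
```

The grid search is chosen here only because it is easy to read; it scales poorly as the number of populations grows, which is one reason it should not be taken as how an estimate over 42 ethnicities is actually produced.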

The Ethnicity Estimate shown above assigns the largest portion of the tester’s ancestry to England, with significant North and West European, East European and Iberian heritage; the additional region shown on the map in gray is explained below.

Reference populations and Genetic Groups

In addition to the Ethnicity Estimates received by every MyHeritage DNA testee, many people are also assigned to Genetic Groups. MyHeritage has identified more than 2,000 Genetic Groups that represent even more recent and specific places and cultural groups. These Genetic Groups are not based on reference populations, but on DNA matching. To create new Genetic Groups, which is an ongoing process, the MyHeritage Science Team clustered testers into progressively smaller groups who share genetic similarities. Then the team reviewed the family trees of those who are part of the same cluster and looked for patterns in their ancestral locations.
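The two steps described above, clustering by DNA matching and then reading the clusters' family trees, can be sketched as follows. This is a simplified illustration under stated assumptions: the tester IDs, shared-DNA amounts (in centimorgans) and locations are made up, the function names are hypothetical, and grouping by connected components at a rising threshold stands in for whatever clustering method is actually used.

```python
from collections import Counter, defaultdict


def clusters_at(threshold, shared_cm, people):
    """Group people linked, directly or through others, by matches of at
    least `threshold` centimorgans."""
    neighbors = defaultdict(set)
    for (a, b), cm in shared_cm.items():
        if cm >= threshold:
            neighbors[a].add(b)
            neighbors[b].add(a)
    seen, clusters = set(), []
    for start in people:
        if start in seen:
            continue
        # Walk outward from `start` to collect everyone reachable from it.
        stack, group = [start], set()
        while stack:
            person = stack.pop()
            if person in group:
                continue
            group.add(person)
            stack.extend(neighbors[person] - group)
        seen |= group
        clusters.append(group)
    return clusters


def common_location(group, tree_locations):
    """Most frequent ancestral location recorded in the group's trees."""
    counts = Counter(tree_locations[p] for p in group if p in tree_locations)
    return counts.most_common(1)[0][0] if counts else None


people = ["p1", "p2", "p3", "p4", "p5", "p6"]
shared_cm = {("p1", "p2"): 60, ("p2", "p3"): 55, ("p3", "p4"): 15,
             ("p4", "p5"): 70, ("p5", "p6"): 12}
tree_locations = {"p1": "Odessa", "p2": "Odessa", "p3": "Volgograd",
                  "p4": "Lisbon", "p5": "Lisbon"}

# Raising the threshold splits one broad cluster into smaller ones.
for threshold in (10, 50):
    for group in clusters_at(threshold, shared_cm, people):
        print(threshold, sorted(group), common_location(group, tree_locations))
```

Because every cluster found at a higher threshold sits inside one found at a lower threshold, the groups are nested, which matches the idea of progressively smaller groups.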

This combination of genetics and genealogy has allowed MyHeritage to identify even more specific ancestral associations, such as “Black-Sea Germans in Ukraine (Odessa) and in Russia (Volgograd).” That is just one Genetic Group assigned to the testee whose results are shown below (Genetic Groups are highlighted in yellow and represented on the map in gray).

Ethnicity Estimate at MyHeritage showing Genetic Groups.

That location confirms a connection the tester had already made to Odessa and German-speaking Russians through traditional genealogy research. Had that research not been done, this Genetic Group assignment alone would have offered the tester a strong clue.

Using ethnicity information, which relies on those reference populations, together with Genetic Groups allows MyHeritage to provide a multifaceted look into the biogeographical origins of each tester.

References

  1. Eynard, Sonia E.; Croiseau, Pascal; Laloë, Denis; Fritz, Sebastien; Calus, Mario P. L.; Restoux, Gwendal. Which Individuals To Choose To Update the Reference Population? Minimizing the Loss of Genetic Diversity in Animal Genomic Selection Programs. G3 (Bethesda). 2018.
  2. MyHeritage Launches New Comprehensive DNA Ethnicity Analysis. Business Wire.
  3. Founder Effect. National Human Genome Research Institute.
  4. What Is Principal Component Analysis? Built In.


Contributors

Main contributor: Diahan Southard
Additional contributor: Sunny Morton