Anthropology Humans: Ancestry


Thursday, 28 March 2013

Refined IBD in Beagle 4

The Beagle page doesn't show version 4 yet, but I'm sure it will eventually turn up there since this paper has just been published.
Genetics doi: 10.1534/genetics.113.150029

Improving the Accuracy and Efficiency of Identity by Descent Detection in Population Data
Brian L. Browning and Sharon R. Browning
Segments of identity by descent (IBD) detected from high-density genetic data are useful for many applications, including long-range phase determination, phasing family data, imputation, IBD mapping and heritability analysis in founder populations. We present Refined IBD, a new method for IBD segment detection. Refined IBD achieves both computational efficiency and highly accurate IBD segment reporting by searching for IBD in two steps. The first step (identification) uses the GERMLINE algorithm to find shared haplotypes exceeding a length threshold. The second step (refinement), evaluates candidate segments with a probabilistic approach to assess the evidence for IBD. Like GERMLINE, Refined IBD allows for IBD reporting on a haplotype level, which facilitates determination of multi-individual IBD and allows for haplotype-based downstream analyses. To investigate the properties of Refined IBD, we simulate SNP data from a model with recent super-exponential population growth that is designed to match UK data. The simulation results show that Refined IBD achieves a better power/accuracy profile than fastIBD or GERMLINE. We find that a single run of Refined IBD achieves greater power than 10 runs of fastIBD. We also apply Refined IBD to SNP data for samples from the UK and from Northern Finland, and describe the IBD sharing in these data sets. Refined IBD is powerful, highly accurate, easy to use, and is implemented in Beagle version 4.
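To make the two-step idea concrete, here is a toy Python sketch. It is not Beagle 4's implementation. The identification step is reduced to exact runs of matching alleles (GERMLINE itself hashes short haplotype windows and extends the seed matches), and the refinement step treats markers as independent, whereas Refined IBD scores candidates with Beagle's haplotype-frequency model, which accounts for LD. The function names, the 2 cM length threshold, the LOD thresholds and the error rate are hypothetical choices for illustration only.

```python
import math
from typing import List, Sequence, Tuple

def find_candidate_segments(h1: Sequence[int], h2: Sequence[int],
                            pos_cm: Sequence[float], min_cm: float) -> List[Tuple[int, int]]:
    """Identification (simplified): maximal runs of identical alleles on the
    two haplotypes that span at least min_cm centimorgans."""
    segments = []
    start = None
    for i, (a, b) in enumerate(zip(h1, h2)):
        if a == b:
            if start is None:
                start = i
        else:
            if start is not None and pos_cm[i - 1] - pos_cm[start] >= min_cm:
                segments.append((start, i - 1))
            start = None
    if start is not None and pos_cm[len(h1) - 1] - pos_cm[start] >= min_cm:
        segments.append((start, len(h1) - 1))
    return segments

def lod_score(h1, h2, freq1, start, end, err=0.001):
    """Refinement (simplified): log10 likelihood ratio, IBD vs. not IBD, over
    markers start..end, assuming independent markers. freq1[i] is the
    population frequency of allele 1 at marker i."""
    lod = 0.0
    for i in range(start, end + 1):
        f = freq1[i] if h2[i] == 1 else 1.0 - freq1[i]  # P(h2 allele) if not IBD
        p_ibd = (1.0 - err) if h1[i] == h2[i] else err   # P(h2 allele | h1) if IBD
        lod += math.log10(p_ibd / f)
    return lod

def refined_ibd(h1, h2, pos_cm, freq1, min_cm=2.0, min_lod=3.0, err=0.001):
    """Two-step search: length-threshold candidates, then keep LOD >= min_lod."""
    kept = []
    for s, e in find_candidate_segments(h1, h2, pos_cm, min_cm):
        lod = lod_score(h1, h2, freq1, s, e, err)
        if lod >= min_lod:
            kept.append((s, e, lod))
    return kept

# Toy example: ten markers 0.5 cM apart, every allele at frequency 0.5.
h1 = [0, 1, 1, 0, 1, 1, 0, 0, 1, 0]
h2 = [1, 1, 1, 0, 1, 1, 0, 0, 1, 1]
pos = [0.5 * i for i in range(10)]
freq = [0.5] * 10
candidates = find_candidate_segments(h1, h2, pos, min_cm=2.0)  # one 3.5 cM run
strict = refined_ibd(h1, h2, pos, freq, min_lod=3.0)           # rejected: LOD about 2.4
lenient = refined_ibd(h1, h2, pos, freq, min_lod=2.0)          # accepted
```

The point of the split is the same as in the abstract: the cheap length filter prunes the search, and only the surviving candidates pay for a likelihood calculation that can reject long but uninformative matches.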

Wednesday, 27 March 2013

Population structure in the Netherlands

The three PCs are color-coded in panels b, c, and d.
European Journal of Human Genetics (27 March 2013) | doi:10.1038/ejhg.2013.48
Population structure, migration, and diversifying selection in the Netherlands
Abdel Abdellaoui et al.
Genetic variation in a population can be summarized through principal component analysis (PCA) on genome-wide data. PCs derived from such analyses are valuable for genetic association studies, where they can correct for population stratification. We investigated how to capture the genetic population structure in a well-characterized sample from the Netherlands and in a worldwide data set and examined whether (1) removing long-range linkage disequilibrium (LD) regions and LD-based SNP pruning significantly improves correlations between PCs and geography and (2) whether genetic differentiation may have been influenced by migration and/or selection. In the Netherlands, three PCs showed significant correlations with geography, distinguishing between: (1) North and South; (2) East and West; and (3) the middle-band and the rest of the country. The third PC only emerged with minimized LD, which also significantly increased correlations with geography for the other two PCs. In addition to geography, the Dutch North–South PC showed correlations with genome-wide homozygosity (r=0.245), which may reflect a serial-founder effect due to northwards migration, and also with height (♂: r=0.142, ♀: r=0.153). The divergence between subpopulations identified by PCs is partly driven by selection pressures. The first three PCs showed significant signals for diversifying selection (545 SNPs - the majority within 184 genes). The strongest signal was observed between North and South for the functional SNP in HERC2 that determines human blue/brown eye color. Thus, this study demonstrates how to increase ancestry signals in a relatively homogeneous population and how those signals can reveal evolutionary history.
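A minimal sketch of the basic pipeline (standardize genotypes, take PCs via SVD, correlate each PC with a geographic coordinate), run on simulated data for two hypothetical subpopulations that differ in allele frequencies and latitude. It assumes the genotype matrix has already been LD-pruned and long-range LD regions removed; that filtering, which the paper found necessary for the third PC to emerge, is not shown, and this is not the authors' actual code.

```python
import numpy as np

def genotype_pcs(G, n_pcs=3):
    """PC scores from an individuals x SNPs matrix of 0/1/2 allele counts.
    Assumes LD pruning has already been done."""
    G = np.asarray(G, dtype=float)
    p = G.mean(axis=0) / 2.0                         # sample allele frequency per SNP
    keep = (p > 0) & (p < 1)                         # drop monomorphic SNPs
    Z = (G[:, keep] - 2 * p[keep]) / np.sqrt(2 * p[keep] * (1 - p[keep]))
    U, S, _ = np.linalg.svd(Z, full_matrices=False)  # PCA via SVD of standardized data
    return U[:, :n_pcs] * S[:n_pcs]

def pc_geography_correlation(pcs, coord):
    """Pearson r between each PC and a geographic coordinate (e.g. latitude)."""
    return [float(np.corrcoef(pcs[:, k], coord)[0, 1]) for k in range(pcs.shape[1])]

# Simulated data: a hypothetical "north" and "south" subpopulation, 50 people each.
rng = np.random.default_rng(0)
n_per, n_snps = 50, 200
base = rng.uniform(0.2, 0.8, n_snps)
G = np.vstack([rng.binomial(2, base + 0.15, size=(n_per, n_snps)),
               rng.binomial(2, base - 0.15, size=(n_per, n_snps))])
latitude = np.concatenate([53 + rng.normal(0, 0.3, n_per),
                           51 + rng.normal(0, 0.3, n_per)])
pcs = genotype_pcs(G)
r = pc_geography_correlation(pcs, latitude)  # |r[0]| is high; a PC's sign is arbitrary
```

Because the sign of a principal component is arbitrary, it is the magnitude of the correlation with geography that matters when comparing analyses.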


Friday, 15 March 2013

Admixture in Southern Africa (Petersen et al. 2013)

PLoS Genet 9(3): e1003309. doi:10.1371/journal.pgen.1003309
Complex Patterns of Genomic Admixture within Southern Africa
Desiree C. Petersen et al.
Within-population genetic diversity is greatest within Africa, while between-population genetic diversity is directly proportional to geographic distance. The most divergent contemporary human populations include the click-speaking forager peoples of southern Africa, broadly defined as Khoesan. Both intra- (Bantu expansion) and inter-continental migration (European-driven colonization) have resulted in complex patterns of admixture between ancient geographically isolated Khoesan and more recently diverged populations. Using gender-specific analysis and almost 1 million autosomal markers, we determine the significance of estimated ancestral contributions that have shaped five contemporary southern African populations in a cohort of 103 individuals. Limited by lack of available data for homogenous Khoesan representation, we identify the Ju/'hoan (n = 19) as a distinct early diverging human lineage with little to no significant non-Khoesan contribution. In contrast to the Ju/'hoan, we identify ancient signatures of Khoesan and Bantu unions resulting in significant Khoesan- and Bantu-derived contributions to the Southern Bantu amaXhosa (n = 15) and Khoesan !Xun (n = 14), respectively. Our data further suggests that contemporary !Xun represent distinct Khoesan prehistories. Khoesan assimilation with European settlement at the most southern tip of Africa resulted in significant ancestral Khoesan contributions to the Coloured (n = 25) and Baster (n = 30) populations. The latter populations were further impacted by 170 years of East Indian slave trade and intra-continental migrations resulting in a complex pattern of genetic variation (admixture). The populations of southern Africa provide a unique opportunity to investigate the genomic variability from some of the oldest human lineages to the implications of complex admixture patterns including ancient and recently diverged human lineages.

Friday, 1 March 2013

Genomewide diversity in the Levant (Haber et al. 2013)

Razib points me to a new paper (and its associated data, consisting of Christian, Druze, and Muslim Lebanese).
Genome-Wide Diversity in the Levant Reveals Recent Structuring by Culture
Marc Haber et al.
The Levant is a region in the Near East with an impressive record of continuous human existence and major cultural developments since the Paleolithic period. Genetic and archeological studies present solid evidence placing the Middle East and the Arabian Peninsula as the first stepping-stone outside Africa. There is, however, little understanding of demographic changes in the Middle East, particularly the Levant, after the first Out-of-Africa expansion and how the Levantine peoples relate genetically to each other and to their neighbors. In this study we analyze more than 500,000 genome-wide SNPs in 1,341 new samples from the Levant and compare them to samples from 48 populations worldwide. Our results show recent genetic stratifications in the Levant are driven by the religious affiliations of the populations within the region. Cultural changes within the last two millennia appear to have facilitated/maintained admixture between culturally similar populations from the Levant, Arabian Peninsula, and Africa. The same cultural changes seem to have resulted in genetic isolation of other groups by limiting admixture with culturally different neighboring populations. Consequently, Levant populations today fall into two main groups: one sharing more genetic characteristics with modern-day Europeans and Central Asians, and the other with closer genetic affinities to other Middle Easterners and Africans. Finally, we identify a putative Levantine ancestral component that diverged from other Middle Easterners ~23,700–15,500 years ago during the last glacial period, and diverged from Europeans ~15,900–9,100 years ago between the last glacial warming and the start of the Neolithic.

Saturday, 26 January 2013

Ancestry Composition to be fixed

From the explanation at the relevant thread:

Ancestry Composition (AC) works by learning (training) a set of useful features from reference individuals with known ancestry (the training set) and then using these features to predict the ancestry of our customers.

Our set of reference individuals consists in part of customers who reported their 4 grandparents were born in the same country. Remember that we also remove the outliers, or people whose genetic ancestry doesn't match their survey answers. From this set, AC learns to associate certain haplotypes with their geographical origin. AC is then able to recognize similar haplotypes and thus to predict the ancestry of other customers.

However, when predicting the ancestry of reference individuals, AC suffers from overfitting, a problem common to many supervised learning methods. As a consequence, AC predicts the ancestry of most reference individuals as being 100% from their grandparents’ birthplace.

We addressed this issue using a method inspired from cross-validation. We divided the training set into 5 folds, each containing 20% of the reference individuals. We then trained 5 AC models in which each fold in turn is excluded from the set of reference individuals. So each of these models is learned using 80% of the reference individuals. Additionally, we retain the model that was trained using all the reference individuals. From this process, we end up with 6 different models from which we can predict the ancestry of our customers.

Now, when predicting the ancestry of a customer, we start by figuring out if he/she is a reference individual. If yes, we identify the fold in which the customer belongs, and we use the corresponding model for prediction. If not, we use the fold containing all of the reference data. This way, we ensure that AC was never trained using the haplotypes of the individual it tries to predict.
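The fold-routing scheme described in the quote can be sketched as follows. This is a hypothetical illustration, not 23andMe's code: `train_fn` stands in for whatever model AC actually fits, and here it simply remembers which reference IDs it saw, so we can check that no reference individual is ever scored by a model trained on their own haplotypes. Individuals outside the reference set are scored with the model trained on all reference individuals.

```python
import random
from typing import Callable, Dict, Iterable

def assign_folds(reference_ids: Iterable[str], k: int = 5, seed: int = 0) -> Dict[str, int]:
    """Randomly split reference individuals into k folds of (nearly) equal size."""
    ids = sorted(reference_ids)
    random.Random(seed).shuffle(ids)
    return {rid: i % k for i, rid in enumerate(ids)}

def train_models(fold_of: Dict[str, int], train_fn: Callable, k: int = 5):
    """k fold models (each trained without one fold) plus one model on everyone."""
    refs = list(fold_of)
    fold_models = {f: train_fn([r for r in refs if fold_of[r] != f]) for f in range(k)}
    full_model = train_fn(refs)
    return fold_models, full_model

def model_for(person_id, fold_of, fold_models, full_model):
    """Reference individuals get the model that never saw their fold;
    everyone else gets the model trained on all reference individuals."""
    fold = fold_of.get(person_id)
    return full_model if fold is None else fold_models[fold]

# Hypothetical stand-in "model": it only records who it was trained on.
def train_fn(ids):
    return frozenset(ids)

reference = [f"ref{i:03d}" for i in range(100)]
fold_of = assign_folds(reference)
fold_models, full_model = train_models(fold_of, train_fn)
leaks = [r for r in reference if r in model_for(r, fold_of, fold_models, full_model)]
customer_model = model_for("customer42", fold_of, fold_models, full_model)
```

With 5 folds, each fold model sees 80% of the reference set and the list `leaks` stays empty, which is exactly the property the quoted fix is meant to guarantee.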

I had proposed basically the same solution about a month ago, and it's great that the issue is being addressed so soon after it first appeared. If any of the people who had written to me/commented on the topic get their new updated results and want to comment, feel free to do so in this post.

I am not sure how 23andMe plans to handle their Ancestry Composition feature in the future, but I would suggest that they periodically update it as they get more samples. According to a recent estimate, there are over 180,000 people in their database at the moment, a fraction of whom meet two requirements: (i) having 4 grandparents from the same country, and (ii) not being an outlier. As this number increases over time, it might be a good idea to occasionally repartition the sample and recalculate participants' ancestry composition results.

The fact that they are ready to roll out updated results so soon after the initial ones tells me that they do have the computing power for this, so it might be worth updating Ancestry Composition periodically, say on a quarterly basis or whenever the training set grows by a certain amount (say, 10%). Eventually the admixture estimates may stabilize, in which case the way forward may involve rethinking the choice of ancestral populations currently in use.
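The update rule suggested above (roughly quarterly, or whenever the training set has grown by about 10%) could be expressed as a simple check. The function name, the 91-day "quarter" and the sample sizes are illustrative assumptions; integer arithmetic is used so that growth of exactly 10% is not missed through floating-point rounding.

```python
from datetime import date

def should_retrain(last_run: date, today: date, n_at_last_run: int, n_now: int,
                   max_days: int = 91, growth_pct: int = 10) -> bool:
    """True if about a quarter has passed since the last run, or the reference
    (training) set has grown by at least growth_pct percent."""
    quarter_elapsed = (today - last_run).days >= max_days
    # Integer comparison: 100 * growth >= growth_pct * old size.
    grew_enough = 100 * (n_now - n_at_last_run) >= growth_pct * n_at_last_run
    return quarter_elapsed or grew_enough

# Hypothetical training-set sizes.
print(should_retrain(date(2013, 1, 1), date(2013, 2, 1), 1000, 1050))   # False
print(should_retrain(date(2013, 1, 1), date(2013, 2, 1), 1000, 1100))   # True
print(should_retrain(date(2013, 1, 1), date(2013, 4, 15), 1000, 1000))  # True
```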

Monday, 14 January 2013

Gene flow between Indian populations and Australasia ~4,000 years ago

Only the press release is available so far; I will add the paper abstract when I see it on the PNAS website:

Researcher Irina Pugach and colleagues now analysed genetic variation from across the genome from aboriginal Australians, New Guineans, island Southeast Asians, and Indians. Their findings suggest substantial gene flow from India to Australia 4,230 years ago. i.e. during the Holocene and well before European contact. “Interestingly,” says Pugach, “this date also coincides with many changes in the archaeological record of Australia, which include a sudden change in plant processing and stone tool technologies, with microliths appearing for the first time, and the first appearance of the dingo in the fossil record. Since we detect inflow of genes from India into Australia at around the same time, it is likely that these changes were related to this migration.”

Their analyses also reveal a common origin for populations from Australia, New Guinea and the Mamanwa – a Negrito group from the Philippines – and they estimated that these groups split from each other about 36,000 years ago. Mark Stoneking says: “This finding supports the view that these populations represent the descendants of an early ‘southern route’ migration out of Africa, while other populations in the region arrived later by a separate dispersal.” This also indicates that Australians and New Guineans diverged early in the history of Sahul, and not when the lands were separated by rising sea waters around 8,000 years ago.

A relationship between Indian and Australasian populations has long been suspected on various grounds (e.g., HGDP Papuans often show membership in a "South Asian" ancestral component at low levels of resolution). It will be interesting to see the model proposed in the new paper about the admixture event leading to modern Australasians.

UPDATE: Ed Yong covers the story in Nature News:

Some aboriginal Australians can trace as much as 11% of their genomes to migrants who reached the island around 4,000 years ago from India, a study suggests. Along with their genes, the migrants brought different tool-making techniques and the ancestors of the dingo, researchers say.

From World News Australia:

The study suggests that in addition to an earlier northern route of migration out of Africa, into Asia, and then South East Asia about 60,000 to 70,000 years ago, the second wave occurred much later, arriving during the Holocene period about 4,230 years ago.
...
“About that point in the archaeological record, there were significant changes in the use of stone tools, in hunting techniques and significantly, the introduction of the dingo,” Professor Cooper said.
...
There are other theories that may support the evidence of a more recent influx of migrants from India, including that they brought with them a disease of epidemic proportions that wiped out earlier Aboriginal populations.

UPDATE II: I added the abstract.

PNAS doi: 10.1073/pnas.1211927110

Genome-wide data substantiate Holocene gene flow from India to Australia

Irina Pugach et al.

The Australian continent holds some of the earliest archaeological evidence for the expansion of modern humans out of Africa, with initial occupation at least 40,000 y ago. It is commonly assumed that Australia remained largely isolated following initial colonization, but the genetic history of Australians has not been explored in detail to address this issue. Here, we analyze large-scale genotyping data from aboriginal Australians, New Guineans, island Southeast Asians and Indians. We find an ancient association between Australia, New Guinea, and the Mamanwa (a Negrito group from the Philippines), with divergence times for these groups estimated at 36,000 y ago, and supporting the view that these populations represent the descendants of an early “southern route” migration out of Africa, whereas other populations in the region arrived later by a separate dispersal. We also detect a signal indicative of substantial gene flow between the Indian populations and Australia well before European contact, contrary to the prevailing view that there was no contact between Australia and the rest of the world. We estimate this gene flow to have occurred during the Holocene, 4,230 y ago. This is also approximately when changes in tool technology, food processing, and the dingo appear in the Australian archaeological record, suggesting that these may be related to the migration from India.


Thursday, 21 June 2012

Ethiopian origins (Pagani et al. 2012)

The study attempts to answer four questions:

Our current study is motivated by four questions. First, where do the Ethiopians stand in the African genetic landscape? Second, what is the extent of recent gene flow from outside Africa into Ethiopia, when did it occur, and is there evidence of selection effects? Third, do genomic data support a route for out-of-Africa migration of modern humans across the mouth of the Red Sea? Fourth, assuming temporal stability of current populations, what are the estimated ages of Ethiopian populations relative to other African groups?

Link to press release. Link to the supplemental data.

The authors reiterate that modern humans left Africa 50–70 kya, a hypothesis that seems to me pretty much dead in light of recent archaeological evidence.

The lack of antiquity in the Ethiopian population, even in its African component alone, argues against that population being ancestral to modern humans. Note that if the Out-of-East Africa hypothesis is correct, then skulls like Omo I represent ancestral modern humans, and they are followed much later by modern humans anywhere else. So, while anatomical modernity may have emerged in East Africa (or maybe not: let's not forget that we have early modern skulls from the region partly because of the excellent preservation conditions and an excess of scholarly interest), there is no evidence that modern humans spread from there.

I have little doubt that my own theory about substantial back-migration of Eurasians into Africa will eventually win the day. Of course, I am not referring to the recent (in the last 3,000 years) admixture with West Eurasians that the Ethiopian population has undergone, but rather to the more ancient migration that was probably associated with Y-haplogroup DE-YAP.

The fact that the African component of diverse African populations is more closely related to West than to East Eurasians is one piece of evidence among many for that scenario. Hopefully, it can be tested soon using whole-genome data, which may have enough density to detect much older admixture events.

UPDATE I: Since the dates in the paper are based on ROLLOFF, a piece of software that is not publicly available more than a year after its announcement, and which contradicts other software released by the same authors, I will take the Queen of Sheba stories circulated in the media with a huge grain of salt.
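For readers unfamiliar with LD-based admixture dating: the general principle behind tools like ROLLOFF is that admixture LD between two loci decays roughly as A*exp(-n*d), where d is the genetic distance in Morgans and n is the number of generations since a single admixture pulse. The sketch below fits that curve by linear regression on log(LD) using noiseless simulated values. It is not ROLLOFF's actual statistic or fitting procedure, and both the 100-generation pulse and the 29-year generation time are assumptions made for illustration.

```python
import numpy as np

def fit_admixture_generations(d_morgans, ld):
    """Fit ld ~ A * exp(-n * d) by least squares on log(ld).
    Returns (n, A): generations since a single admixture pulse, and amplitude."""
    slope, intercept = np.polyfit(np.asarray(d_morgans, float), np.log(ld), 1)
    return float(-slope), float(np.exp(intercept))

# Noiseless simulated decay curve for a hypothetical pulse 100 generations ago.
d = np.linspace(0.005, 0.3, 60)      # genetic distance in Morgans
ld = 0.02 * np.exp(-100 * d)
n_gen, amplitude = fit_admixture_generations(d, ld)
GENERATION_YEARS = 29                # assumed generation time
years = n_gen * GENERATION_YEARS     # about 2,900 years
```

Note that the conversion to calendar years scales directly with the assumed generation time, so a date quoted in years inherits that assumption as well as any error in the fitted decay rate.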

The American Journal of Human Genetics, 21 June 2012 doi:10.1016/j.ajhg.2012.05.015

Ethiopian Genetic Diversity Reveals Linguistic Stratification and Complex Influences on the Ethiopian Gene Pool

Luca Pagani et al.

Humans and their ancestors have traversed the Ethiopian landscape for millions of years, and present-day Ethiopians show great cultural, linguistic, and historical diversity, which makes them essential for understanding African variability and human origins. We genotyped 235 individuals from ten Ethiopian and two neighboring (South Sudanese and Somali) populations on an Illumina Omni 1M chip. Genotypes were compared with published data from several African and non-African populations. Principal-component and STRUCTURE-like analyses confirmed substantial genetic diversity both within and between populations, and revealed a match between genetic data and linguistic affiliation. Using comparisons with African and non-African reference samples in 40-SNP genomic windows, we identified “African” and “non-African” haplotypic components for each Ethiopian individual. The non-African component, which includes the SLC24A5 allele associated with light skin pigmentation in Europeans, may represent gene flow into Africa, which we estimate to have occurred ∼3 thousand years ago (kya). The African component was found to be more similar to populations inhabiting the Levant rather than the Arabian Peninsula, but the principal route for the expansion out of Africa ∼60 kya remains unresolved. Linkage-disequilibrium decay with genomic distance was less rapid in both the whole genome and the African component than in southern African samples, suggesting a less ancient history for Ethiopian populations.


Saturday, 28 August 2004

EURO-DNA test

AncestryByDNA has released a EURO-DNA test, which reports percentages of "Northern European," "Southeastern European," "Middle-Eastern," and "South Asian" admixture based on 320 ancestry-informative markers (AIMs).

The ad hoc choice of the four ancestral groups and the rather confusing commentary and/or anomalous results (Iberians on average ~16% "South Asian"?) may discourage many from taking the test, especially at a price tag of $399. Still, EURO-DNA is a step towards personalized genetic archaeology, even though the theoretical assumptions and methodology leave much to be desired at this stage.

Update:

If you start with the a priori breakdown into 4 groups, then each individual will have 4 numbers that add up to 100%. One could just as easily have used a "Southwestern European", "Northeastern European", "Middle Eastern" and "South Asian" breakdown, and again each individual would have 4 numbers adding up to 100%.
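The point about a priori groups can be illustrated with the standard EM update for one individual's admixture proportions given fixed allele frequencies in K chosen source groups (the supervised form of the model used by programs such as ADMIXTURE and frappe). Whatever K groups are plugged in, the update keeps the proportions non-negative and summing to 1, so every panel yields a tidy 100% breakdown. This is not AncestryByDNA's method, which is not described here; the two four-group panels below are random, hypothetical allele frequencies.

```python
import numpy as np

def admixture_proportions(g, P, n_iter=500):
    """EM estimate of one person's ancestry proportions q over K source groups.
    g: allele counts (0/1/2) at M markers; P: M x K allele frequencies per group."""
    g = np.asarray(g, dtype=float)
    P = np.asarray(P, dtype=float)
    M, K = P.shape
    q = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        a = P @ q            # P(an allele copy is the counted allele)
        b = (1.0 - P) @ q    # P(it is the other allele)
        # Expected share of the 2M allele copies attributable to each group.
        q = (g @ (P * q / a[:, None]) + (2.0 - g) @ ((1.0 - P) * q / b[:, None])) / (2.0 * M)
    return q

# The same person analysed against two arbitrary (hypothetical) four-group panels.
rng = np.random.default_rng(1)
M = 300
panel_a = rng.uniform(0.05, 0.95, size=(M, 4))
panel_b = rng.uniform(0.05, 0.95, size=(M, 4))
g = rng.binomial(2, panel_a @ np.array([0.4, 0.3, 0.2, 0.1]))
q_a = admixture_proportions(g, panel_a)
q_b = admixture_proportions(g, panel_b)
print(q_a.round(3), q_a.sum())  # sums to 1 (up to rounding)
print(q_b.round(3), q_b.sum())  # also sums to 1, despite the unrelated panel
```

Summing to 100% is thus a property of the model, not evidence that the chosen groups are the real components of the person's ancestry.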

The trick is to start with a collection of individuals, remove identifying tags, and cluster them, thus identifying the real genetic components in the population, if any such components can be detected. This was the procedure followed by Rosenberg et al. [1]. In that analysis, wholly different clusterings emerged; for example, the distinctiveness of the Iberian Basques, who were allocated their own cluster, was discovered.

By contrast, an Iberian Basque taking the EURO-DNA test would perhaps get a high score in NOR/MED, which, however, obscures the real genetic structure of the Basque population. That structure is highly specific, as the Basques are an ancient ethnolinguistic isolate of the Iberian peninsula rather than the product of "admixture".
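As a minimal illustration of clustering individuals without identifying tags, here is plain k-means on simulated genotypes from three hypothetical populations. This is a swapped-in technique: Rosenberg et al. used the model-based program STRUCTURE, not k-means, and the deterministic farthest-point initialization is a simplification chosen so that the toy example is reproducible.

```python
import numpy as np

def kmeans(X, k, n_iter=100):
    """Plain k-means (Lloyd's algorithm) with a deterministic farthest-point start."""
    X = np.asarray(X, dtype=float)
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[int(np.argmax(d))])  # point farthest from chosen centers
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

# Unlabeled genotypes from three simulated (hypothetical) populations, 40 people each.
rng = np.random.default_rng(2)
n_per, n_snps = 40, 300
freqs = rng.uniform(0.05, 0.95, size=(3, n_snps))
X = np.vstack([rng.binomial(2, f, size=(n_per, n_snps)) for f in freqs])
labels = kmeans(X, k=3)
groups = [labels[i * n_per:(i + 1) * n_per] for i in range(3)]
```

The algorithm never sees population names; the groups it recovers come only from the genotypes, which is the contrast the post draws with a fixed a priori breakdown.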

AncestryByDNA must show why its chosen four-group breakdown is used in lieu of other potential choices.

[1] Rosenberg et al. (2002).

Update #2: Check out the comments for some additional information from Dr. Tony Frudakis of DNAPrint, who is involved in the creation of the EURO-DNA 1.0 and AncestryByDNA tests.

Saturday, 14 August 2004

Celtic Origins on the Atlantic Facade of Europe

A new study re-evaluates mtDNA evidence, in conjunction with Y-chromosomal and autosomal DNA, to trace the origins of Celtic-speaking populations of Western Europe. The main conclusion is that while some Central European origin can't be ruled out using genetic data, current and former Celtic speakers seem to share a common ancestry with other non-Celtic populations of the Atlantic zone of Europe.

Am. J. Hum. Genet., 75:000, 2004

The Longue Durée of Genetic Ancestry: Multiple Genetic Marker Systems and Celtic Origins on the Atlantic Facade of Europe

Brian McEvoy et al.

Celtic languages are now spoken only on the Atlantic facade of Europe, mainly in Britain and Ireland, but were spoken more widely in western and central Europe until the collapse of the Roman Empire in the first millennium A.D. It has been common to couple archaeological evidence for the expansion of Iron Age elites in central Europe with the dispersal of these languages and of Celtic ethnicity and to posit a central European "homeland" for the Celtic peoples. More recently, however, archaeologists have questioned this "migrationist" view of Celtic ethnogenesis. The proposition of a central European ancestry should be testable by examining the distribution of genetic markers; however, although Y-chromosome patterns in Atlantic Europe show little evidence of central European influence, there has hitherto been insufficient data to confirm this by use of mitochondrial DNA (mtDNA). Here, we present both new mtDNA data from Ireland and a novel analysis of a greatly enlarged European mtDNA database. We show that mtDNA lineages, when analyzed in sufficiently large numbers, display patterns significantly similar to a large fraction of both Y-chromosome and autosomal variation. These multiple genetic marker systems indicate a shared ancestry throughout the Atlantic zone, from northern Iberia to western Scandinavia, that dates back to the end of the last Ice Age.


Thursday, 12 August 2004

Samaritan mtDNA and Y chromosomes

Hum Mutat. 2004 Sep;24(3):248-60.

Reconstruction of patrilineages and matrilineages of Samaritans and other Israeli populations from Y-Chromosome and mitochondrial DNA sequence Variation.
P. Shen et al.

The Samaritan community, which numbered more than a million in late Roman times and only 146 in 1917, numbers today about 640 people representing four large families. They are culturally different from both Jewish and non-Jewish populations in the Middle East and their origin remains a question of great interest. Genetic differences between the Samaritans and neighboring Jewish and non-Jewish populations are corroborated in the present study of 7,280 bp of nonrecombining Y-chromosome and 5,622 bp of coding and hypervariable segment I (HVS-I) mitochondrial DNA (mtDNA) sequences. Comparative sequence analysis was carried out on 12 Samaritan Y-chromosome, and mtDNA samples from nine male and seven female Samaritans separated by at least two generations. In addition, 18-20 male individuals were analyzed, each representing Ethiopian, Ashkenazi, Iraqi, Libyan, Moroccan, and Yemenite Jews, as well as Druze and Palestinians, all currently living in Israel. The four Samaritan families clustered to four distinct Y-chromosome haplogroups according to their patrilineal identity. Of the 16 Samaritan mtDNA samples, 14 carry either of two mitochondrial haplotypes that are rare or absent among other worldwide ethnic groups. Principal component analysis suggests a common ancestry of Samaritan and Jewish patrilineages. Most of the former may be traced back to a common ancestor in the paternally-inherited Jewish high priesthood (Cohanim) at the time of the Assyrian conquest of the kingdom of Israel.