Friday, 4 January 2013

Deep whole-genome sequencing of 100 Malays

The 1000 Genomes Project is the largest collection of full human genomes currently available, but most of its 2.5k samples have been sequenced at low coverage. One downside of this is that infrequent variants are often missed. If an individual is polymorphic at some site, then the chance of detecting this polymorphism increases with the number of reads covering that site. If a number of individuals are sampled, then polymorphisms that are common in the population will probably be detected in a few individuals even if a low number of reads is used for each of them; but, if they are infrequent, then they are more likely to be missed. Hence, low-coverage sequencing of population samples will tend to find common variants and will tend to miss less common variants relative to high-coverage sequencing.
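The coverage argument above can be made concrete with a back-of-the-envelope model. The sketch below is only an illustration under stated assumptions, none of which come from the paper: read depth at a site is Poisson-distributed around the mean coverage, a heterozygote shows the variant on half of its reads, a call requires at least two variant reads in a single person, and genotypes follow Hardy-Weinberg proportions. Real low-coverage pipelines pool information across samples, so the numbers should be read qualitatively.

```python
import math

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam)."""
    return sum(math.exp(-lam) * lam**j / math.factorial(j) for j in range(k + 1))

def p_detect_carrier(mean_cov, alt_fraction=0.5, min_alt_reads=2):
    """Chance of seeing at least min_alt_reads variant reads in one carrier.

    Assumption: variant-read count ~ Poisson(mean_cov * alt_fraction).
    """
    lam = mean_cov * alt_fraction
    return 1.0 - poisson_cdf(min_alt_reads - 1, lam)

def p_detect_in_sample(freq, n_people, mean_cov, min_alt_reads=2):
    """Chance that a variant of allele frequency freq is called in at least
    one of n_people, each sequenced independently at mean_cov."""
    het = 2 * freq * (1 - freq)   # Hardy-Weinberg heterozygote frequency
    hom = freq ** 2               # Hardy-Weinberg homozygote frequency
    per_person = (het * p_detect_carrier(mean_cov, 0.5, min_alt_reads)
                  + hom * p_detect_carrier(mean_cov, 1.0, min_alt_reads))
    return 1.0 - (1.0 - per_person) ** n_people

if __name__ == "__main__":
    # Hypothetical illustration: 100 people, a common and a rare variant.
    for freq in (0.2, 0.005):
        for cov in (4, 30):
            p = p_detect_in_sample(freq, 100, cov)
            print(f"freq={freq:<6} coverage={cov:>2}x  P(detected)={p:.3f}")
```

In this toy model the common variant is found almost regardless of depth, while the rare one gains noticeably from deeper coverage, which is the pattern described above.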

This idea is intuitively correct, but the question of the added power of high-coverage sequencing to detect variants can only be addressed by giving the same individuals both low- and high-coverage sequencing. This is the topic of a new paper in AJHG which creates a useful comparison benchmark for the performance of the two types of sequencing methods. High-coverage sequencing may be needed for things like disease studies (because deleterious alleles tend to be low-frequency), or the study of recent human demography (because recent population growth has resulted in an abundance of low-frequency SNPs that have not had enough time to reach a high population frequency yet).

AJHG dx.doi.org/10.1016/j.ajhg.2012.12.005

Deep Whole-Genome Sequencing of 100 Southeast Asian Malays

Lai-Ping Wong et al.


Whole-genome sequencing across multiple samples in a population provides an unprecedented opportunity for comprehensively characterizing the polymorphic variants in the population. Although the 1000 Genomes Project (1KGP) has offered brief insights into the value of population-level sequencing, the low coverage has compromised the ability to confidently detect rare and low-frequency variants. In addition, the composition of populations in the 1KGP is not complete, despite the fact that the study design has been extended to more than 2,500 samples from more than 20 population groups. The Malays are one of the Austronesian groups predominantly present in Southeast Asia and Oceania, and the Singapore Sequencing Malay Project (SSMP) aims to perform deep whole-genome sequencing of 100 healthy Malays. By sequencing at a minimum of 30× coverage, we have illustrated the higher sensitivity at detecting low-frequency and rare variants and the ability to investigate the presence of hotspots of functional mutations. Compared to the low-pass sequencing in the 1KGP, the deeper coverage allows more functional variants to be identified for each person. A comparison of the fidelity of genotype imputation of Malays indicated that a population-specific reference panel, such as the SSMP, outperforms a cosmopolitan panel with larger number of individuals for common SNPs. For lower-frequency (less than 5%) markers, a larger number of individuals might have to be whole-genome sequenced so that the accuracy currently afforded by the 1KGP can be achieved. The SSMP data are expected to be the benchmark for evaluating the value of deep population-level sequencing versus low-pass sequencing, especially in populations that are poorly represented in population-genetics studies.


Thursday, 3 January 2013

Body form variation of prehistoric Jomon (Fukase et al. 2012)

Am J Phys Anthropol DOI: 10.1002/ajpa.22112

Geographic variation in body form of prehistoric Jomon males in the Japanese archipelago: Its ecogeographic implications 

Hitoshi Fukase et al.

Diversity of human body size and shape is often biogeographically interpreted in association with climatic conditions. According to Bergmann's and Allen's rules, populations in regions with a cold climate are expected to display an overall larger body and smaller/shorter extremities than those in warm/hot environments. In the present study, the skeletal limb size and proportions of prehistoric Jomon hunter-gatherers, who extensively inhabited subarctic to subtropical areas in the ancient Japanese archipelago, were examined to evaluate whether or not the inter-regional differences follow such ecogeographic patterns. Results showed that the Jomon intralimb proportions including relative distal limb lengths did not differ significantly among five regions from northern Hokkaido to the southern Okinawa Islands. This suggests a limited co-variability of the intralimb proportions with climate, particularly within genealogically close populations. In contrast, femoral head breadth (associated with body mass) and skeletal limb lengths were found to be significantly and positively correlated with latitude, suggesting a north-south geographical cline in the body size. This gradient therefore comprehensively conforms to Bergmann's rule, and may stem from multiple potential factors such as phylogenetic constraints, microevolutionary adaptation to climatic/geographic conditions during the Jomon period, and nutritional and physiological response during ontogeny. Specifically, the remarkably small-bodied Jomon in the Okinawa Islands can also be explained as an adjustment to subtropical and insular environments. Thus, the findings obtained in this study indicate that Jomon people, while maintaining fundamental intralimb proportions, displayed body size variation in concert with ambient surroundings.



Tuesday, 1 January 2013

Y-chromosome and mtDNA of Henri IV

A recent paper had determined the Y-chromosome haplotype of Louis XVI of France from a handkerchief preserving his blood after his execution. A new study looks at the mummified head of Henri IV, the first Bourbon King of France. Even though only a limited number of Y-STRs were successfully typed, they match those of Louis XVI, who belonged to the not-so-frequent-anymore haplogroup G2a. So, while we cannot be entirely sure that the two Y-chromosomes were related in a genealogical time frame, the evidence is consistent with their known genealogical relationship and with the attribution of the two samples (mummified head/blood) to the respective kings.

Also of interest, Henri IV's mtDNA haplotype:
The majority of the clones generated show an U5b* mtDNA haplotype defined by three nucleotide changes at positions 16239T 16270T 16311C (see Supplementary material). The three HVR1 diagnostic positions were confirmed in two different amplifications of the L16185-H16378 HVR1 fragment, proving that the results are reproducible. This mtDNA haplotype is present so far in one single individual from France (originally published in [10]) in an in-house database of 22,807 published European sequences, and it is absent in all people involved in the laboratory analysis.
If I followed the trail of ancestry correctly, this matrilineage leads all the way to a Tochter von Egisheim in the 11th century.

Forensic Science International Available online 30 December 2012

Genetic comparison of the head of Henri IV and the presumptive blood from Louis XVI (both Kings of France)

Philippe Charlier et al.

A mummified head was identified in 2010 as belonging to Henri IV, King of France. A putative blood sample from the King Louis XVI preserved into a pyrographically decorated gourd was analyzed in 2011. Both kings are in a direct male-line descent, separated by seven generations. We have retrieved the hypervariable region 1 of the mitochondrial DNA as well as a partial Y-chromosome profile from Henri IV. Five STR loci match the alleles found in Louis XVI, while another locus shows an allele that is just one mutation step apart. Taking into consideration that the partial Y-chromosome profile is extremely rare in modern human databases, we concluded that both males could be paternally related. The likelihood ratio of the two samples belonging to males separated by seven generations (as opposed to unrelated males) was estimated as 246.3, with a 95% confidence interval between 44.2 and 9729. Historically speaking, this forensic DNA data would confirm the identity of the previous Louis XVI sample, and give another positive argument for the authenticity of the head of Henri IV.
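To get a feel for what a likelihood ratio of this size means, it can be combined with a prior through Bayes' rule in odds form (posterior odds = likelihood ratio × prior odds). The sketch below is only arithmetic on the figures quoted in the abstract; the 50% prior is an arbitrary assumption chosen for illustration, not something the authors state.

```python
def posterior_probability(likelihood_ratio, prior_probability):
    """Convert a likelihood ratio and a prior probability into a posterior
    probability using Bayes' rule in odds form."""
    prior_odds = prior_probability / (1.0 - prior_probability)
    posterior_odds = likelihood_ratio * prior_odds
    return posterior_odds / (1.0 + posterior_odds)

if __name__ == "__main__":
    assumed_prior = 0.5  # assumption for illustration only
    # Point estimate and 95% interval bounds as reported in the abstract.
    for label, lr in (("lower bound", 44.2),
                      ("point estimate", 246.3),
                      ("upper bound", 9729.0)):
        p = posterior_probability(lr, assumed_prior)
        print(f"{label:>14}: LR={lr:<7} posterior={p:.4f}")
```

A more sceptical prior lowers the posterior accordingly, so the likelihood ratio is best read as the strength of the genetic evidence rather than as a probability of relatedness.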


Mating between Modern Humans, Neanderthals and other Archaics (Waddell & Tan 2012)

arXiv:1212.6820 [q-bio.GN]

New g%AIC, g%AICc, g%BIC, and Power Divergence Fit Statistics Expose Mating between Modern Humans, Neanderthals and other Archaics

Peter J. Waddell, Xi Tan

The purpose of this article is to look at how information criteria, such as AIC and BIC, relate to the g%SD fit criterion derived in Waddell et al. (2007, 2010a). The g%SD criterion measures the fit of data to model based on a normalized weighted root mean square percentage deviation between the observed data and model estimates of the data, with g%SD = 0 being a perfectly fitting model. However, this criterion may not be adjusting for the number of parameters in the model comprehensively. Thus, its relationship to more traditional measures for maximizing useful information in a model, including AIC and BIC, are examined. This results in an extended set of fit criteria including g%AIC and g%BIC. Further, a broader range of asymptotically most powerful fit criteria of the power divergence family, which includes maximum likelihood (or minimum G^2) and minimum X^2 modeling as special cases, are used to replace the sum of squares fit criterion within the g%SD criterion. Results are illustrated with a set of genetic distances looking particularly at a range of Jewish populations, plus a genomic data set that looks at how Neanderthals and Denisovans are related to each other and modern humans. Evidence that Homo erectus may have left a significant fraction of its genome within the Denisovan is shown to persist with the new modeling criteria.
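For readers unfamiliar with the ingredients named in the abstract, the sketch below writes out the textbook definitions of AIC, AICc and BIC, plus the Cressie-Read power divergence statistic, which reduces to Pearson's X^2 at lambda = 1 and to the likelihood-ratio G^2 as lambda approaches 0. It does not reproduce the g%SD, g%AIC or g%BIC statistics themselves, whose exact form is defined in the Waddell papers; the counts in the example are made up.

```python
import math

def aic(log_lik, k):
    """Akaike information criterion for k free parameters."""
    return 2 * k - 2 * log_lik

def aicc(log_lik, k, n):
    """Small-sample corrected AIC for n observations (requires n > k + 1)."""
    return aic(log_lik, k) + (2 * k * (k + 1)) / (n - k - 1)

def bic(log_lik, k, n):
    """Bayesian (Schwarz) information criterion."""
    return k * math.log(n) - 2 * log_lik

def power_divergence(observed, expected, lam):
    """Cressie-Read power divergence statistic.

    lam = 1 gives Pearson's X^2 and lam = 0 gives G^2 (taken as a limit),
    provided observed and expected have the same total.
    """
    if lam == -1:
        raise ValueError("lam = -1 needs a separate limiting formula")
    if abs(lam) < 1e-12:
        return 2.0 * sum(o * math.log(o / e)
                         for o, e in zip(observed, expected) if o > 0)
    total = sum(o * ((o / e) ** lam - 1.0) for o, e in zip(observed, expected))
    return 2.0 / (lam * (lam + 1.0)) * total

if __name__ == "__main__":
    obs = [18, 22, 30, 30]   # made-up counts
    exp = [25, 25, 25, 25]   # expected counts under a hypothetical model
    for lam in (1.0, 2.0 / 3.0, 0.0):
        print(f"lambda={lam:.3f}  statistic={power_divergence(obs, exp, lam):.4f}")
```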


Tuesday, 4 December 2012

Disentangling the histories of mtDNA haplogroups M1 and U6

mtDNA haplogroups M1 and U6 are often mentioned in terms of Eurasian back-migration in Africa. The former is the only clade of the Asian haplogroup M which occurs in Africa at all; the latter is the only clade of the West Eurasian haplogroup U that does the same. These haplogroups also tend to co-exist in North and East Africa, although they are largely absent in sub-Saharan Africa. Different ideas have been offered for their occurrence, including a "Paleolithic" spread or a more recent one associated with the spread of Afroasiatic languages.

The new paper offers useful new data on this debate. The most important conclusion is that despite their oft-mentioned association, these two haplogroups appear to have distinct histories. One argument for this is their separate geographic distribution:


M1 (on panel A) is much more common in Northeast Africa and the Near East (including the Caucasus), whereas U6 (panel B) is more confined in Africa, and has its stronger peak in NW Africa, being rare in NE Africa.

An interesting aside is that all the mysterious M1 from the Caucasus belongs to subclade M1a, while the smaller M1b clade tends to co-occur with M1a in other parts of Africa and the Near East. This indicates a founder effect for the origin of Caucasian M1a, but leaves open the issue of the immediate origins of M1. Hopefully it will become possible to place this haplogroup within the broader M phylogeny in the future.

The Bayesian skyline plots also contrast M1 and U6 in terms of their demographic histories:



The authors argue that these histories are inconsistent with both a very early dispersal associated with the Dabban industry and a more recent spread with Afroasiatic. From the paper:
The transition from the Middle Palaeolithic to Upper Palaeolithic in North Africa is characterised by the appearance of the “Dabban”, an industry that is restricted to Cyrenaica in northeast Libya and represented at the caves of Hagfet ed Dabba and Haua Fteah [19]. Whilst a techno-typological shift occurred within the Dabban ~33 KYA [19], starker changes in the archaeological record occurred throughout North Africa and Southwest Asia ~23-20 KYA, represented by the widespread appearance of backed bladelet technologies. The appearance of these backed bladelet industries more or less coincides with the timing of the Last Glacial Maximum (LGM) (~23-18 KYA), including: ~21 KYA in Upper Egypt [20]; ~20 KYA at Haua Fteah with the Oranian [21]; the Iberomaurusian expansion in the Jebel Gharbi ~20 KYA [22]; and the first Iberomaurusian at Tamar Hat in Algeria ~20 KYA [23]. The earliest Iberomaurusian sites in Morocco appear to be only slightly younger ~18 KYA [24].
A disassociation of these haplogroups from the UP in North Africa might be consistent with my idea that the UP was in part a cultural revolution that spread not only with people, but often with ideas across a species that already had the "biological machinery" for behavioral modernity and was already established in both Africa and the Near East.

As for the connection to Afroasiatic, the authors detect a linguistic correlation with M1a, which, however, appears too old to have been involved directly in the spread of this language family:
Concerning haplogroup M1 individually, a significant correlation with languages was observed. Furthermore, within M1, it appears that the correlation is mostly due to M1a. However, given the small sample size of M1b, any potential signal correlating with language might not be detectable. Interestingly, M1a has a likely East African origin, but its coalescent age of ~21 KYA still largely predates that of the proto-AA. Maybe a sub-clade of M1a would still give a similar correlation, but there are not sufficient samples to allow splitting M1a into its various sub-clades, and to test for a correlation. Although we found a correlation, limited sample sizes do not allow drawing unambiguous connection between genes and languages. Furthermore, it is also possible that this putative sub-clade of M1 does not testify for the expansion of AA speaking people, but was already present among the people who inhabited the area before the spread of the AA languages.
Personally, I am in favor of an East African origin of Afroasiatic, as this makes sense of various lines of evidence, one of which is the African shift of the "Southwest_Asian" component that is modal in Semitic populations. I envision that M1 was geographically circumscribed in a NE African population after its much earlier arrival from Asia and piggy-backed onto the expansion of Afroasiatic speakers, thus explaining the observed correlation. A good analogy would be with the expansion of, say, haplogroup H in the Americas which piggybacked on the European colonization, even though the coalescence age of H predates the arrival of Europeans in the New World by many millennia.

BMC Evolutionary Biology 2012, 12:234 doi:10.1186/1471-2148-12-234


Divorcing the Late Upper Palaeolithic demographic histories of mtDNA haplogroups M1 and U6 in Africa

Erwan Pennarun et al.

Abstract (provisional)
Background
A Southwest Asian origin and dispersal to North Africa in the Early Upper Palaeolithic era has been inferred in previous studies for mtDNA haplogroups M1 and U6. Both haplogroups have been proposed to show similar geographic patterns and shared demographic histories.

Results
We report here 24 M1 and 33 U6 new complete mtDNA sequences that allow us to refine the existing phylogeny of these haplogroups. The resulting phylogenetic information was used to genotype a further 131 M1 and 91 U6 samples to determine the geographic spread of their sub-clades. No southwest Asian specific clades for M1 or U6 were discovered. U6 and M1 frequencies in North Africa, the Middle East and Europe do not follow similar patterns, and their sub-clade divisions do not appear to be compatible with their shared history reaching back to the Early Upper Palaeolithic. The Bayesian Skyline Plots testify to non-overlapping phases of expansion, and the haplogroups' phylogenies suggest that there are U6 sub-clades that expanded earlier than those in M1. Some M1 and U6 sub-clades could be linked with certain events. For example, U6a1 and M1b, with their coalescent ages of ~20,000-22,000 years ago and earliest inferred expansion in northwest Africa, could coincide with the flourishing of the Iberomaurusian industry, whilst U6b and M1b1 appeared at the time of the Capsian culture.

Conclusions
Our high-resolution phylogenetic dissection of both haplogroups and coalescent time assessments suggest that the extant main branching pattern of both haplogroups arose and diversified in the mid-later Upper Palaeolithic, with some sub-clades concomitantly with the expansion of the Iberomaurusian industry. Carriers of these maternal lineages have been later absorbed into and diversified further during the spread of Afro-Asiatic languages in North and East Africa.


Thursday, 16 August 2012

Neandertal STAT2 haplotype in Eurasians

Two recent papers have argued that African population structure or late Middle Paleolithic/Upper Paleolithic Neandertal admixture have contributed to the finding that Non-Africans appear to be a few percent more similar to Neandertals than Africans are across the genome. I would add that modern human admixture in the Vindija individual remains a distinct possibility.

What percentage of the ~3% Eurasian excess can be accounted for by each of these three processes? The jury is out, and we won't find out until someone decides to tackle the problem comprehensively and/or new ancient DNA samples become available to inform the discussion. African population structure cannot be discounted, and intriguing new evidence may appear thanks to ancient DNA analysis.

But, there is a different approach to detecting Neandertal admixture that zeroes in on specific genomic locations and dissects them in great detail. This single-region approach provides evidence for admixture, without necessarily arguing about how extensive it was.

The single-region dissection was previously used in the Hammer lab to identify the first very convincing evidence for archaic admixture in Africans and Melanesians. In a new paper, Mendez et al. identify a small region in chromosome 12 that shows evidence for archaic introgression from Neandertals, or a species closely related to them.

But, it is worthwhile to begin with a list of other Neandertal introgression candidates from the literature:

Thus far, only a handful of loci have been hypothesized to have entered the human gene pool through archaic admixture and positive selection, including MAPT (MIM 157140),5 MCPH1 (MIM 607117),3 and particular alleles at the HLA locus (MIM 142800, 142830, 142840).6 However, analysis of the Neanderthal genome failed to provide evidence of introgressive alleles at the former two loci.1 Because of its role in fighting pathogens, HLA presents an instance where it is relatively easy to conceive of an a priori reason that acquisition of an archaic Eurasian HLA allele would benefit human ancestors, especially as they expanded into new habitats.7 However, the fact that HLA haplotypes are known to exhibit transspecific polymorphism and show evidence of strong balancing selection 8,9 increases the probability that similarities between modern and archaic haplotypes are due to ancestral shared polymorphism (i.e., as opposed to archaic admixture). In addition, the SNPs tagging the main HLA haplotype that was said to have introgressed were not observed in the Denisova or Neanderthal draft genomes. 
So, what lines of evidence support the notion that the new STAT2 haplotype is the "real deal"?
First, N matches the Neanderthal sequence at all 18 sites that fall within the resequenced 8.6 kb STAT2 region and have Neanderthal sequence coverage (Table 1). Second, N lineages are broadly distributed at relatively low frequencies in Eurasian populations (Figure 3) and are not observed in sub-Saharan African populations (Table S6). Third, the N haplotype extends for ~130 kb in West Eurasians and up to ~260 kb in some East Asians and Melanesians, producing much stronger LD than that observed in sub-Saharan Africans.

...

Given that the N lineage and the reference sequence diverged ~600 kya, these results suggest that population structure has influenced the recent evolution of this locus. Balancing selection alone is not expected to maintain this extent of LD and consequently is not sufficient to explain these patterns. Moreover, although a strong bottleneck could generate extended LD similar to the levels we observe near STAT2 in non-Africans, it would not explain why the N lineage went extinct in Africa (i.e., why the SNPs associated with the N lineage in non-Africans were not observed in sub-Saharan Africans that are part of our WGS or public SNP panels).

...

We point out that although a recent common ancestry between a human lineage and Neanderthal sequences might indicate gene flow between Neanderthals and modern humans, this information alone does not inform us about the direction of gene flow. With the additional evidence of the observed extent of LD in modern human sequences, it is possible to infer that the N lineage introgressed into modern humans (either from Neanderthals or another archaic source that contributed to both Neanderthals and AMH).
Actually, the N haplotype is observed in North Africa, but this might be due to relatively recent back-migration. One might also argue that a recent bottleneck in a Eurasian population generated the high degree of LD, and the N haplotype was lost in a back-to-Africa migration, or North-to-Sub-Saharan Africa migration. But, that would not seem to explain how the deeply divergent lineage persisted in the North African population of proto-modern humans for such a long time; the evidence for recent common ancestry of N with the Neandertal haplotype would argue against incomplete lineage sorting (=inheritance of related forms of the haplotype from before the modern-Neandertal divergence).
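The LD argument can be illustrated with a rough calculation. Under a simple neutral model, a haplotype that has been recombining with other lineages in the same population for t generations survives in intact tracts of expected length about 1/(r × t), where r is the recombination rate per base pair per generation. The sketch below uses this approximation with assumed values (r = 1e-8 and 25 years per generation) that do not come from the paper; it ignores selection, local rate variation and the admixture proportion, so it only shows orders of magnitude.

```python
def expected_tract_length_bp(years_recombining,
                             recomb_rate_per_bp=1e-8,   # assumed genome-wide average
                             years_per_generation=25):  # assumed generation time
    """Approximate expected length (bp) of an intact haplotype tract after it
    has been exposed to recombination for the given number of years."""
    generations = years_recombining / years_per_generation
    return 1.0 / (recomb_rate_per_bp * generations)

if __name__ == "__main__":
    # Hypothetical scenarios: exposure since the ~600 kya divergence quoted
    # above versus much more recent exposure following introgression.
    for years in (600_000, 80_000, 50_000):
        kb = expected_tract_length_bp(years) / 1000
        print(f"{years:>7} years of recombination -> ~{kb:.1f} kb expected tract")
```

Under these assumptions a lineage that had been mixing freely with other modern human lineages since the ~600 kya divergence would be expected to retain only a few kilobases of intact sequence, so tracts on the order of 100 kb or more point to a much shorter period of exposure.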

All in all, this probably represents the best evidence for Neandertal-to-modern introgression to date. As full genomes of different human groups become available, it will be possible to automate this analysis and pick off other such strong signals. This may not indicate the level of admixture, but it might provide strong evidence against the idea of reproductive isolation between modern humans and Neandertals.

It is also noteworthy that this is barely consistent with the coastal migration theory with respect to the origin of Australo-Melanesians, because humans trekking along the coast would not have the opportunity to admix with Neandertals who are completely unattested there in either their physical, or archaeological (Mousterian) form.

But, it is consistent with my Out-of-Arabia theory. Australo-Melanesian Y chromosomes belong to the CF clade of the phylogeny. I have speculated that the post-70ka climate crisis in Arabia spurred some human groups to escape north (CF), and others to remain south (DE). The latter eventually gave rise to the major African lineage, heading west (E), as well as a relic Asian lineage heading east (D) that was later inundated by the descendants of CF. If Australo-Melanesians are descended from the CF folk who went north out of Arabia, then they too would have had the opportunity to admix with Neandertals in the Near East.

The American Journal of Human Genetics, Volume 91, Issue 2, 265-274, 10 August 2012

A Haplotype at STAT2 Introgressed from Neanderthals and Serves as a Candidate of Positive Selection in Papua New Guinea

Fernando L. Mendez, Joseph C. Watkins and Michael F. Hammer

Signals of archaic admixture have been identified through comparisons of the draft Neanderthal and Denisova genomes with those of living humans. Studies of individual loci contributing to these genome-wide average signals are required for characterization of the introgression process and investigation of whether archaic variants conferred an adaptive advantage to the ancestors of contemporary human populations. However, no definitive case of adaptive introgression has yet been described. Here we provide a DNA sequence analysis of the innate immune gene STAT2 and show that a haplotype carried by many Eurasians (but not sub-Saharan Africans) has a sequence that closely matches that of the Neanderthal STAT2. This haplotype, referred to as N, was discovered through a resequencing survey of the entire coding region of STAT2 in a global sample of 90 individuals. Analyses of publicly available complete genome sequence data show that haplotype N shares a recent common ancestor with the Neanderthal sequence (∼80 thousand years ago) and is found throughout Eurasia at an average frequency of ∼5%. Interestingly, N is found in Melanesian populations at ∼10-fold higher frequency (∼54%) than in Eurasian populations. A neutrality test that controls for demography rejects the hypothesis that a variant of N rose to high frequency in Melanesia by genetic drift alone. Although we are not able to pinpoint the precise target of positive selection, we identify nonsynonymous mutations in ERBB3, ESYT1, and STAT2—all of which are part of the same 250 kb introgressive haplotype—as good candidates.


Friday, 20 July 2012

Redating of the Early Upper Paleolithic site of Riparo Mochi (Italy)

There are two possibilities for how the early Aurignacian entered Europe. According to one hypothesis, its bearers followed the Danube, which formed a natural corridor into the heartland of the continent which was, at the time, thickly forested. A different hypothesis is that the early Aurignacian entered Europe via the Mediterranean. Distinguishing between the two hypotheses depends on obtaining reliable chronological estimates for the Mediterranean and Central European Aurignacian.

A recent dating of a site in the Swabian Jura suggested that the Aurignacian was attested earlier in Central Europe. But another paper in the Journal of Human Evolution meticulously examines the sequence in the Mochi rockshelter and finds that it is just as early.
Comparisons with dates for other Upper Palaeolithic contexts outside Italy suggest that the date of the Protoaurignacian of Mochi compares closely. In Fig. 9a the start boundaries for the earliest Aurignacian evidence at the sites of Geissenklösterle (Germany), Abri Pataud and Isturitz (France) are compared to the start boundary for unit G in Mochi. The first two sites were dated recently in Oxford with reliable methodologies (Higham et al., 2011; Higham et al., in press) while for Isturitz only a small number of dates exist for the earliest Upper Palaeolithic (Szmidt et al., 2010). This comparison reveals that the lowermost Aurignacian levels at Geissenklösterle (AHIII) and Isturitz (C4d) date to the same period as Mochi G, at around 42.7-41.5 ka cal BP (68.2%). The earliest Aurignacian of Abri Pataud dates slightly later to around 41-40 ka cal BP (68.2%), but the assemblage there has always been considered more evolved, so this is not surprising. No Mousterian dates are included in any of these calculations, therefore the start boundaries in the Bayesian models are not well constrained at their earliest end. What is interesting is that there appears to be a close similarity between the dates for the Protoaurignacian and Early Aurignacian sites in Germany on the Danube and on the Mediterranean coast. This might suggest a rapid dispersal of both variants of the Aurignacian across Europe at c. 44-42 ka cal BP.
It does appear that the Aurignacian was a continent-wide punctuational event in Europe which occurred in the middle to late 40 thousands ka cal BP.

Either there were two streams into Europe (Danubian and Mediterranean), or one stream that quickly inundated much of the continent. Given that the argument for the Danubian Corridor is partly related to the ease of access it provided, it is difficult to imagine how the people who followed it would quickly stray far from it all the way to Italy. Overall, it does appear that there were multiple streams into Europe, and perhaps new research in the Balkans, Eastern Europe, and West Asia may help us trace the earlier predecessor of these streams before they followed their separate ways into Europe.

Journal of Human Evolution DOI:10.1016/j.jhevol.2011.11.009

A new chronostratigraphic framework for the Upper Palaeolithic of Riparo Mochi (Italy)


Katerina Douka et al.


The rockshelter of Mochi, on the Ligurian coast of Italy, is often used as a reference point in the formation of hypotheses concerning the arrival of the Aurignacian in Mediterranean Europe. Yet, the site is poorly known. Here, we describe the stratigraphic sequence based on new field observations and present 15 radiocarbon determinations from the Middle Palaeolithic (late Mousterian) and Early Upper Palaeolithic (Aurignacian and Gravettian) levels. The majority of dates were produced on humanly modified material, specifically marine shell beads, which comprise some of the oldest directly-dated personal ornaments in Europe. The radiocarbon results are incorporated into a Bayesian statistical model to build a new chronological framework for this key Palaeolithic site. A tentative correlation of the stratigraphy to palaeoclimatic records is also attempted.
