DNA Testing & Genealogy

DNA Testing: Overview

DNA testing today means, at root, reading out portions of the unique genetic code which each of us inherits from our parents. Reading out the whole is usually called, misleadingly, “mapping the genome”: misleading because the end product is nothing but a set of meaningless letters. The meaning comes in a bit at a time from painstakingly correlating tiny patches of a small portion of this code (the 5% or so that constitute the genes) with the development and manifestation of interesting traits. Thus, in time, the genetic part of the genome might truly be mapped in a general way onto the observed characteristics of species, or even at the detailed level of particular organisms. Those are the goals of classical DNA testing.

In parallel with the grand goal of mapping the genome, other more limited, but also more focused, kinds of DNA testing have arisen. For example, where the testing is thorough enough to identify an individual uniquely, or at least to identify him and his closest blood relatives in a way which distinguishes them from all others on the planet, it can have important forensic applications, such as conclusively establishing paternity. And a set of tests performed on an individual DNA sample can, if extensive enough, establish a unique genetic fingerprint, and be used as such to place a suspected perpetrator at the scene of a crime, as with the O.J. Simpson evidence.

A different, though overlapping, kind of DNA testing aims to determine whether a set of tested males probably have a common ancestor within the purview of genealogical research.

Genealogy is concerned with working out the ancestries of people alive today, or, more broadly, with reconstructing the tree of descent from a common ancestor of a set of lineage cousins. The tree metaphor might, if we please, be extended to the whole human race, because it can be shown that all males descend from a single Adam, though the existing evidence instructs us that this forefather of us all was very far from being the first male of the Homo sapiens species. Or, since man is a social animal, distinguished from the other animals perhaps more than in any other way by the elaborate transmissible body of knowledge and values he shares with his tribe (in a word, “culture”), the descent of the human race may also be conceived in ethnographic terms. But ethnographic family trees are of a scale that far exceeds the scope of any genealogical project.

The kind of DNA testing we are primarily concerned with here as would-be “genetic genealogists” has a far narrower focus: estimating the time back to a common paternal ancestor of two males. And its practical scope is confined to a period that one might call “genealogical time”—the period in each culture since written records began to be kept that document the lives of ordinary men, by name. Thus, genealogical time is roughly coincident, at least in Western cultures, with the time period since hereditary surnames came into general use—usually thought to compass the period 1350-1540 in England, for example. To understand how and why DNA testing can be used to predict that two men have a common ancestor who lived not too many centuries ago, we need to review some of the basics of human DNA.

The Basics of Human DNA

In order to explain DNA testing for genealogical purposes, it is first necessary to review some of the basics of human DNA, and its replication to produce offspring. Some of the DNA-related terms found in the following sections, and throughout this website, are defined in the glossary in the left column of this page.

Each of us has developed from a single cell containing our unique DNA blueprint. This DNA is organized into 23 paired “chromosomes”, one chromosome of each pair coming directly from the father, and one from the mother. Every cell in our body contains an exact copy of this complete genetic blueprint, except that when we produce germ cells for replication (sperm for men, eggs for women), our separate parental parts mix and recombine in a new and unique way for each sex cell we produce.

22 of these 23 chromosomal pairs are called “autosomal”, and each consists of matching paternal and maternal parts, perfectly aligned. DNA is a blueprint for producing the proteins of life, and two matched, but differing, versions of each gene set up a genetic competition for determining the offspring’s characteristics, which results in some wins for the father, some for the mother, and a large proportion of compromises.

The remaining chromosomal pair, called the “sex chromosomes”, works quite differently. Instead of a matched pair, we find, at least in males, an odd couple: an X chromosome inherited from the mother, and a runty Y chromosome from the father (I prefer to style these “xChromosome” and “yChromosome”, just as I refer to ySTR, rather than the more conventional “Y-DNA” or “Y DNA”). The yChromosome contains fewer than 100 genes, only 9 of which match to those of the female xChromosome, and most of the remaining genes and the rest of the yChromosome are concerned with the regulation of the developmental process that produces the male variation from the standard (default) female genotype.

The XX female, like the XY male, inherits one xChromosome from her mother, and the other from her father’s mother, so there is plenty of genetic competition between her two sex chromosome genes. However, since the male yChromosome fails to match up to most of the xChromosome, the male inherits most of his mother’s xChromosome genes as is. This can cause problems where the mother transmits a recessive genetic abnormality from one of her xChromosomes to a son; one such abnormality is hemophilia, which rarely occurs in women, but for those who are carriers, crops up in half of their sons. And of course those other genes on the yChromosome that have no xChromosome counterpart also operate to make men more exceptional. Interestingly, even with the autosomal chromosomes, women’s DNA recombines in a much more homogenized way than men’s, keeping females much closer to the norms of the species, while males, more prone to extreme differentiation, may be considered nature’s experimental sex.
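The "half of their sons" figure for a carrier mother can be checked by simply enumerating the possible pairings. The following is a minimal sketch, assuming a single recessive variant on one of the mother's xChromosomes and an unaffected father; the labels "X", "x" and "Y" and all variable names are illustrative only.

```python
from itertools import product

# "X" = normal copy, "x" = copy carrying the recessive variant, "Y" = yChromosome.
carrier_mother = ("X", "x")
unaffected_father = ("X", "Y")

# Each child receives one chromosome from each parent; the four pairings
# are assumed to be equally likely.
children = list(product(carrier_mother, unaffected_father))

sons = [c for c in children if c[1] == "Y"]
daughters = [c for c in children if c[1] == "X"]

# A son has no second xChromosome to mask the variant.
affected_sons = [c for c in sons if c[0] == "x"]
# A daughter who inherits it is masked by her father's normal copy.
carrier_daughters = [c for c in daughters if c[0] == "x"]

share_of_sons_affected = len(affected_sons) / len(sons)
```

Under these assumptions one of the two possible kinds of son is affected, while the corresponding daughter is only a carrier.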

The Premier Benefits of Testing Male yChromosome DNA (yDNA)

The two kinds of yDNA tests that are most valuable for genealogical purposes are: (1) testing of the ySTR areas of the male-only yChromosome; and (2) extensive sampling of the yChromosome for ySNP point mutations.

The first tests ever offered commercially for genealogical purposes were the ySTR tests purveyed by Family Tree DNA beginning in the year 2000, and FTDNA’s ySTR tests are still by far the most cost-effective way to put one’s genealogical research on a sound footing.

What matters most for our present purposes about DNA transmission from one generation to the next is that the yChromosome replicates virtually unchanged down the male paternal line. Current models of population genetics hypothesize that all men descend either from a single Adam, or at least a very small set of original progenitors, and women too have their Eve(s). But if all men descended from the same man, and if the yChromosome never changed at all, then all men would have identical yChromosomes and there would be nothing to be learned from testing. Fortunately for genetic genealogists, mutations creep into the germ cells, or occur during the replication process, and it is these mutations that produce the variations that yDNA testing measures.

As it happens, in Western societies, and in many other cultures as well, surname runs with the paternal line, and since tracking surnames is the main preoccupation of genealogists, ySTR testing fits perfectly into their epistemological paradigms. If we test the ySTR of two males with the same surname and find that they are very closely matched, we have strong positive confirmation that they descend from a common ancestor of their patrilineage, while otherwise we may say that although they share a surname, they are probably no more likely to have a common ancestor than if one of them was surnamed Jones, and the other, Smith.

But ySTR can tell us more than just that two males do, or do not, have a common ancestor within the genealogical research horizon (the period since records of individuals began to be kept). Starting from the premises that ySTR is highly stable from generation to generation, but subject to change over very long stretches of time, and at a statistically predictable rate, the differential number of mutations that have accumulated between two tested male yChromosomes (the genetic distance) can serve as a kind of generational clock, measuring the time (in generations) back to their most recent common male ancestor, quite analogous to the archaeologist’s tool, radiocarbon dating. This estimate is called “TMRCA” (Time to Most Recent Common Ancestor).
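The clock idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the repeat counts are made up, genetic distance is counted as the sum of absolute repeat differences per marker (one common convention; testing companies may count some markers differently), and the mutation rate of 1/300 per marker per generation is an assumed round figure rather than a measured one. The function names are hypothetical.

```python
def genetic_distance(hap_a, hap_b):
    """Sum of absolute repeat-count differences over markers both men tested."""
    shared = hap_a.keys() & hap_b.keys()
    return sum(abs(hap_a[m] - hap_b[m]) for m in shared)


def rough_tmrca(distance, n_markers, rate_per_marker):
    """Crude point estimate of generations back to the common ancestor.

    Mutations accumulate independently down both lines of descent, so the
    expected distance after g generations is about 2 * g * n_markers * rate.
    Solving that for g gives the estimate returned here.
    """
    return distance / (2 * n_markers * rate_per_marker)


# Made-up four-marker haplotypes (real panels have 37 or more markers).
man_1 = {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11}
man_2 = {"DYS393": 13, "DYS390": 25, "DYS19": 14, "DYS391": 10}

toy_distance = genetic_distance(man_1, man_2)

# Suppose the same distance of 2 had been observed across a 37-marker panel,
# with an ASSUMED average rate of 1 mutation per 300 generations per marker.
generations = rough_tmrca(2, n_markers=37, rate_per_marker=1 / 300)
```

With these assumed inputs the estimate comes out at roughly 8 generations, but a single point estimate of this kind carries a wide margin of error, since mutations arrive at random rather than on schedule.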

The sensitivity of the clock depends on the average mutation rate at tested marker sites, and across the genome, these occur at a rate that ranges from 1 per billions of generations, to 1 per several hundred generations. The fastest mutations occur in stretches of DNA called microsatellites, and the most rapidly mutating microsatellites are those on the yChromosome, which are known as ySTRs (yChromosome Short Tandem Repeats). However, even with ySTRs that mutate at a rate of once every several hundred generations, it is still obviously necessary to test many of them in order to generate standard sets of markers that change within the narrow time span of genealogical time (roughly the last 400-1000 years). These standard sets of markers are called haplotypes.

It follows from this that the more markers tested, the more mutations are likely to show up, and thus the more finely calibrated the resulting TMRCA-measuring generational clock would be. However, it’s a bit more complicated than that because not all ySTR markers are created equal. In recent years, widely varying mutation rates have been observed across these markers, with some of them running at a mutational rate 10 times that of the stodgiest ones. Thus, the average mutation rate across the various ySTR marker panels offered by the half-dozen or so ySTR testing companies is at least as important as the number of markers tested. At present, the best “bang for the buck” test panel, and the one most useful for identifying patrilineage relationships, is the FTDNA 37-marker panel.
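Why the mix of markers can matter as much as their number is easy to see with a little arithmetic. The sketch below compares two hypothetical panels; the rates (1 per 1000 generations for slow markers, 1 per 100 for fast ones, a tenfold spread) and the panel compositions are assumptions chosen for illustration, not figures for any real test.

```python
def expected_mutations(rates, generations, lineages=2):
    """Expected number of mutations across a panel.

    rates: per-marker mutation probabilities per generation.
    generations: generations back to the common ancestor.
    lineages: lines of descent being compared (2 for a pair of men).
    """
    return lineages * generations * sum(rates)


# Hypothetical panels; every rate here is an assumed round number.
slow_panel = [1 / 1000] * 30                     # 30 slow markers
mixed_panel = [1 / 1000] * 15 + [1 / 100] * 5    # 20 markers, 5 of them fast

slow = expected_mutations(slow_panel, generations=15)
mixed = expected_mutations(mixed_panel, generations=15)
```

Under these assumptions the 20-marker mixed panel is expected to show about twice as many mutations over 15 generations as the 30-marker slow panel (roughly 1.95 against 0.9), even though it tests fewer markers.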

Incidentally, the marker sites sampled for the ySTR tests do not involve genes, per se. If they did, they might be subject to natural selection bias that would reduce the predictability of their mutation rates. Only about 5% of the genome actually codes for the genes that define our unique traits. The purpose and function of the rest of the genome, often called "junk DNA", is largely unknown at present.

The Benefits of yDNA Testing: Identifying, or Confirming, Your Patrilineage

The term “patrilineage” generally means the set of (male) patrilineal descendants of a common patriarch. However, it can reasonably be appropriated for purposes of ySTR DNA-based genealogy to mean the set of patrilineal descendants of a patriarch who lived within the period of genealogical time—usually the time since the patrilineage first came to be identified in the records by a particular hereditary surname, or family name.

However, I would argue that the term “genealogical time” should be loosely construed to mean the mere possibility of identifying a remote patrilineal patriarch, either by name, or by fixing him in time and place. Thus, the DNA-identified Uí Néill group, the descendancy of the semi-legendary 4th century Irish patriarch, Niall Noigiallach (“Niall of the Nine Hostages”), might be considered a patrilineage in the DNA-based genealogical context, even though there seems little or no prospect of ever tracing a particular genealogical line back to him.

I advocate for this loose construction of “genealogical time” and “patrilineage” because knowing something about the ultimate DNA-based patriarch of a patrilineage, or about the hereditary surname that he bore, can provide clues or leads to records-based genealogical investigation. Thus, if one’s haplotype falls within the broadly construed patrilineage of the Uí Néill clan, the chances are very high that an ancestor who lived during the period when surname-based records were kept can be traced back to the small set of counties in northern Ireland or southwestern Scotland where that clan once prevailed, and where as many as 15% of the population falls into this broad patrilineage.

I have deliberately chosen an extreme example to illustrate the benefits of broadly construing patrilineage, and in general the concept should be restricted to cover just the period when hereditary surnames, or at least unique bynames, began to come into general use, raising at least the possibility of tracing one’s particular ancestor back through records. And because the advent of hereditary surnames was so disjointed, and so variable from country to country, it is almost the usual case that patrilineages thus broadly construed will comprise multiple surnames. Indeed, patrilineal descendants of the Uí Néill clan bear scores of different surnames, though in the more typical case, where classical NPEs (Non-Paternity Events) have occurred since the adoption of permanent hereditary surnames, one can expect at least several different surnames but not as many as 10.

In classifying haplotypes by patrilineage I therefore give “genealogical time” a broad construction to bring to one’s specific, records-based, genealogical research as much of the historical context as may be useful. Thus, for England, where permanent hereditary surnames began to appear in the 12th century, became moderately common in the 13th, and were the norm by about 1400, I allow patrilineages to run back as far as the 12th century or so, even though the 2009 King & Jobling study, “Founders, Drift, and Infidelity: The Relationship between Y Chromosome Diversity and Patrilineal Surnames”, has given us reason to suppose that many English surname lines are shallowly rooted, going back only a few hundred years. Even in such cases, given the remarkable tendency of English families to remain rooted in the same local area over many centuries, there are excellent prospects of uncovering local records that document the occurrence of NPEs, as when a man inherited landed property through his wife and changed his surname accordingly. By the same token, where a deeply rooted (broadly construed) genealogical patrilineage bundles together several surnames into the same DNA-patriline, the less-common surnames that may be known to have originated in certain areas may usefully focus one’s research on the more common (and more recent) surnames of the same patrilineage.

In any patrilineage project based on ySTR haplotypes it’s important to recognize that the immediate focus of the project must be, not on the ultimate patriarch who first assumed a particular hereditary surname but on the actual MRCA (Most Recent Common Ancestor) of the particular set of tested haplotypes. However, over time, as new and more far-flung cousins (perhaps of different surnames) are tested and brought into the fold, the project MRCA may be pushed farther back into the shadowy realm where records begin to fail. My argument is that this should be regarded, not as a calamity, but as an opportunity that may open up fruitful new research territory.

The Benefits of yDNA Testing: Disconfirming Your Patrilineage

ySTR testing can be expected occasionally to disconfirm membership in an expected patrilineage group. Although a disruption of one’s established assumptions or conclusions may seem to be a negative outcome of ySTR testing (and it is certainly likely to be at least mildly disturbing), the possibility of such unexpected results actually represents one of the chief benefits of ySTR testing. Where a fair amount of research has been done on a lineage, and testing is merely confirmatory of membership in an expected patrilineage, it may be reassuring, but it adds little beyond that to what one already knows. On the other hand, if one suddenly finds oneself barking up the wrong ultimate surname tree, and especially if FTDNA reports patrilineage cousin matches to people of other surnames, the unexpected disconfirmation is likely to set one off on new and productive research tracks in the direction of one’s true patrilineal ancestry.

What needs to be done in such cases is to carefully scrutinize each ancestral link working backwards, not excepting even the most recent links. With less than comprehensive and exhaustive research, it’s all too easy to misidentify a well-documented ancestor with the right name as one’s own, overlooking a more obscure namesake who has never been thoroughly researched, and whose scant records have yet to be captured in published indexed abstract collections, or which may no longer exist.

And even where the records are more or less complete, and the research has been impeccable, one’s ancestral line may have been interrupted by an NPE (Non-Paternity Event). An NPE may be due to an adoption, an out-of-wedlock birth, or perhaps just an elective name change, none of which are likely to have left a paper trail, whether due to inadvertence, or simply to a disinclination to publicize an embarrassing family or personal event.

As Shakespeare observed (in The Merchant of Venice, 2.2.85) “It is a wise father that knows his own child”, and one might update this to “It is a wise child that knows his own father, without DNA testing”. The silver lining here, though, is that with yDNA testing and dedicated research it is possible to know one’s own father, or at least to reduce the scope of one’s paternity or ancestry to a narrow compass: not even a perfect match between two haplotypes can conclusively rule out the possibility that an NPE has occurred within a narrow patrilineal family circle. For example, a child might be fathered by a man everyone thought was his uncle, who would almost certainly have ySTR identical to the man everyone thought (mistakenly) was his real father. I’ve recently heard of such a case that turned up in one of the current DNA surname projects.

Discovering that there must have been an NPE somewhere up the patrilineal ancestral chain doesn’t, of course, mean that one has lost a surname: rather one has gained another to research. And at the point where the NPE occurred, it shows that there was probably a close connection and association between two different surname lineages; thus the family story opens up to a larger scope.

Disconfirmation of one’s expected patrilineage may not be indicative of an NPE at all. It may indicate instead that, although your surname line runs true back at least as far as anybody has researched it, your own research may have gone off on a tangent at some generational juncture: that you made a faulty attribution of one of your JONES surnamed ancestors to a JONES who belonged to an entirely different patrilineage. This too, can be somewhat disturbing, at least for those who are irrevocably wedded to their established opinions, but such people can only by courtesy be called genealogists. Genealogy, family history, like all forms of history, is inherently an open-ended inquiry into the shadowy past, and even the best founded evidential arguments never attain to proof, in its stricter mathematical or logical senses. It’s all too common, indeed almost the norm, for genealogists to make this kind of error of attribution at some point up the ancestral chain where the evidence becomes scant and the research is less than comprehensive and exhaustive. Virtually the only ones of us who avoid this kind of error are those who rein in their speculation when the evidential background becomes fuzzy, or at least scrupulously avoid publishing their speculations without appending question marks to differentiate them from well-founded conclusions.

I’m speaking here of the case where your reported ySTR DNA matches are all or mostly to others with the same surname, but where they belong to a different patrilineage of that surname than the one you thought you belonged to. As noted in the opening paragraph, this kind of disconfirmation can turn out to be much more valuable than confirmation by redirecting your research into the right directions, and differentiating your particular surname patrilineage from certain other prominent patrilineages with which it might easily have been confused.

And in the more usual case where you have matches to others of your surname, this can give you a leg up in researching your newly discovered surname patrilineage, provided that one or more of your FTDNA-reported matches: (1) responds to your inquiries; and (2) has done a lot of solid research that s/he is willing to share. If yDNA testing unexpectedly shows that you've been barking up the wrong surname patrilineage tree, making contact with other genealogists who are your real patrilineage cousins can allow you to piggyback onto their work, and promote collaboration with them to attain the mutual goal of reconstructing your overall patrilineage tree back to your common patriarchal ancestor.

Unfortunately, only a minority of reported matches meet both of these criteria, for a variety of reasons: their email address may have lapsed; they may be too busy or inherently disorganized; they may not have done much research, hoping that DNA testing would provide them with a magical shortcut to knowledge that can only in the end be attained by research; or they may simply have ceased responding because doing so has brought them no benefits in the past, their own attempts at correspondence with matches having failed for any and all of the above reasons.

All of these impediments to sharing and collaboration can, however, be overcome, and yDNA testing can be turned into a valuable complement to your research, through the organization of DNA surname projects into a set of active standalone genealogical patrilineage projects.

Making the Most of your yDNA tests: Patrilineage Projects

FTDNA has provided little guidance to its customers as to how to take advantage of their test results (or even how to interpret them correctly, but that is another story), and only a basic framework for the surname projects the company has, to its credit, promoted and encouraged. Fortunately, this void has been filled, to widely varying degrees, by the host of volunteer FTDNA Surname project administrators who have emerged. And FTDNA’s skeleton framework for surname patrilineage projects has been significantly enlarged by the World Families Network, which provides a much more elaborate framework for yDNA surname projects. Though this canned WFN framework and its formats still leave much to be desired (e.g. haplotype charts are very awkward to examine), and the various potential components of a surname project are left largely undefined and with minimal linkage between them, a large proportion of FTDNA surname project admins have greatly improved their projects by taking advantage of WFN’s facilities, and the last thing I want to do is to discourage others from doing likewise.

Unfortunately, the WFN surname project framework, like FTDNA’s barebones framework, fails to embody the organizing principle which is the key to making the most, genealogically, of your ySTR DNA test results: matched testees are patrilineage cousins, not surname cousins. Two different patrilineages of the same surname, for example two different DENNISON patrilineages, are inherently no more likely to be related to each other than either of them is to any particular JONES patrilineage picked at random. And what genealogists are focused on are their particular patrilineal ancestries, not all the instances of the surname that their ancestry bears.

As an alternative, there is no reason why an FTDNA Surname project might not be conceived and organized as a mere umbrella for a set of standalone DNA-differentiated patrilineage projects, each of them constituted by a set of matched patrilineage cousins who may not even all bear the same surname. In fact, with larger patrilineages, it’s rare that they all do share the same surname, since NPEs are by no means rare.

It is my contention, based on a dozen years of experience organizing and administering surname-patrilineage projects structured in this way, that patrilineage-focused genealogical projects, predicated as they are on particular ySTR DNA patterns, can add significantly to, and leverage, the work of even the most self-sufficient and accomplished genealogists. And I would also say that the principal value of ySTR DNA testing is to bring together as many as possible of the serious genealogists representing a particular patrilineage for sharing and collaborative research; and that purpose can best be served by the organization of a DNA-based patrilineage project. Moreover, the value of the project to its members depends on two factors: (1) the number of tested members; and (2) the quality and extent of the genealogical research they have arrived at, either through their own efforts or by finding quality published research, for their particular ancestral line.

On the DNA side, the more patrilineage project members who have tested or extended their haplotypes to a particular level, the more likely it is that mutations shared by two or more members will turn up—shared mutations that more often than not mean that those who share them belong to a particular family sub-branch of the extended patrilineage.

The chances of discovering shared markers, like the accuracy of TMRCA estimates, are primarily a function of the number of members who have extended their haplotypes to a particular level, with the depth of their separated lineages being a secondary factor. The standard 37-marker test is usually sufficient to yield some shared mutations given enough haplotypes, but greatly increasing the number of markers tested, ideally to 111, is that much better.

Unfortunately, all or most members must extend to reap the benefits of the additional markers, and that can be expensive. However, a collective synchronized extension effort can be planned to coincide with FTDNA’s traditional December sale, and strategic selection can cut down the number of haplotypes that actually need to be extended in order to identify shared mutations that are far enough upstream to benefit the membership at large. If such upstream markers are found, certain other haplotypes can then be extended on a case-by-case basis, and for some markers, FTDNA offers inexpensive individual marker tests that obviate the need for a full extension.

If your ySTR test has resulted in a few reported patrilineage matches (and all the matches that FTDNA reports at 37 markers or better can be considered patrilineage cousins, regardless of their possibly divergent surnames) the best way to capitalize genealogically on your test results is to organize or join a specific patrilineage project for your line, not just the omnibus project for the surname in general. Any number of patrilineage cousins can benefit from contacting, and collaborating with, each other genealogically, and by forming a patrilineage project, though the opportunities for discovering shared mutations indicative of particular family sub-branches begin to emerge only when there are 5-7 haplotypes to work with, and as always, the more the better. Thus, successful patrilineage projects need to be constantly on the lookout for ways to increase their membership.

In fact, so important is the acquisition and testing of new members, particularly of distant cousins of existing members, that it can be worthwhile to focus on a likely brother or patrilineal cousin of a known early ancestor of surname X, try to trace this possible patrilineal relative down to a living male descendant surnamed X, and offer to sponsor a 37-marker test for him. If the test shows that this person is indeed a patrilineal cousin you will have made a significant addition to the genealogical knowledge of your patriline, and also quite likely helped to clarify the mutational pattern that marks your particular family branch. If the test results do not match, you will have eliminated a red herring from genealogical consideration, and contributed to both the genealogical and DNA knowledge of the other surname X patrilineage. Such genealogically-directed pre-emptive ySTR testing is likely to contribute more to your genealogical knowledge than merely extending your haplotype to 67 or 111 markers.

In summary, a focused patrilineage project can bring together the best and most knowledgeable genealogists of the patriline, identify the best published resources, and promote both collaborative research and planned testing projects. Once enough members of a patrilineage project have accrued, mutational patterns begin to emerge that characterize particular family branches, and synchronized upgrades of strategically selected project haplotypes to 67 or 111 markers can be planned to discover more shared mutations. A patrilineage project website can serve as a place to post the ancestral pedigrees of all the members, based on the work of the best and most knowledgeable genealogists for the line, and the internet presence established thereby can be an effective method of attracting additional patrilineage cousins to ySTR testing and project membership.

The ALLEN (I) Patrilineage Project, and the several patrilineage projects linked to the FTDNA DENNISON surname project, exemplify successful patrilineage projects of various sizes.

Principles of yChromosome DNA (yDNA) Analysis

The heading link above is to my eponymous paper on this subject, which provides explication, argument, and examples illustrating the principles, which, for those who don’t want to dig that deep, I summarize thus:

(1) The FTDNA 37-marker test is the one indispensable test for those who want to know for sure to which surname patrilineage they belong; this test is also sufficient to all but rule out the possibility that an NPE (Non-Paternity Event) has occurred in their patriline.

(2) GD comparisons (e.g. 35/37, 105/111) between two haplotypes (and TMRCA estimates based on them) do not provide genealogically useful estimates of closeness of relationship; they are therefore of little or no value in identifying the Closer Cousin Clusters (CCCs), which are genealogically meaningful.

(3) Analysis of mutational patterns across a patrilineage can identify CCCs, but only by means of shared, because inherited, mutations across uncommonly large sets of tested descendants who are quite distantly related to each other.

(4) The identification of and genealogical exploitation of Closer Cousin Clusters depends more on genealogical research than on ySTR DNA testing.

(5) Ultimately, the value of yDNA testing for genealogical purposes depends on the depth and quality of evidence that your patrilineage cousins are able and willing to share.

The Prospects for Additional yDNA Testing

Beyond basic patrilineage classification, the reason we DNA test, or extend the ySTR DNA haplotypes beyond the basic 37 marker set (to 67, or 111, or all the way to the expensive FTDNA BigY test) is to try to turn up mutations. All mutations have some value, at least potentially. Although BigY is expensive ($449 is the standard price as I write) it also provides, besides many more ySTR markers (though many of these are no-calls and most are relative duds) a whole separate parallel channel of mutations—a set of ySNP mutations that are collectively comparable in mutational frequency to the 67 marker ySTR test.

With respect to genealogical inference, mutations are of three kinds:

(1) those that are unique (unmatched) across the current set of patrilineage haplotypes, and which therefore occurred downstream of the most recent common patrilineal ancestor you have in common with any of the other members;

(2) those that are matched, but which occurred independently in your line of descent, and that of the person you are matched to;

(3) those that are matched across two or more haplotypes because they were all inherited from a common ancestor.

There’s a 1-10% chance that any particular shared mutation belongs in category (2) rather than (3).

Only mutations in category (3) can advance our knowledge of the mutational history tree, and therefore provide guidance for genealogical research, but mutations presently in category (1) may turn up in category (3) when new members come into the project with their DNA, or when existing members also extend their haplotypes.
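
The first cut, between category (1) and the two matched categories, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the member names and marker values are invented, the comparison against a reference (modal) haplotype is a simplification chosen here, and the function name is hypothetical. No tally of this kind can tell category (2) from category (3); that takes genealogical evidence.

```python
from collections import defaultdict

def classify_mutations(haplotypes, modal):
    """Group off-modal marker values by how many tested members carry them."""
    carriers = defaultdict(list)
    for member, values in haplotypes.items():
        for marker, value in values.items():
            if value != modal[marker]:
                carriers[(marker, value)].append(member)
    # Carried by one member only: category (1), unmatched so far.
    unique = {m: who for m, who in carriers.items() if len(who) == 1}
    # Carried by two or more: category (2) or (3), undecidable from DNA alone.
    matched = {m: who for m, who in carriers.items() if len(who) > 1}
    return unique, matched

# Hypothetical reference haplotype and project members (invented values).
modal = {"DYS393": 13, "DYS390": 24, "DYS19": 14}
haplotypes = {
    "member_A": {"DYS393": 13, "DYS390": 25, "DYS19": 14},
    "member_B": {"DYS393": 13, "DYS390": 25, "DYS19": 14},
    "member_C": {"DYS393": 14, "DYS390": 24, "DYS19": 14},
    "member_D": {"DYS393": 13, "DYS390": 24, "DYS19": 14},
}

unique, matched = classify_mutations(haplotypes, modal)
print("category (1), unmatched:", unique)
print("category (2) or (3), matched:", matched)
```

Here member_C's value would sit in category (1) until some new or extended haplotype matched it, at which point it would move to the matched set.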

Extending is always a gamble because one never knows what will, or will not, turn up. However, the odds are generally against turning up mutations by extending from 37 to 67, because most of the markers in the 38-67 band are relative duds that rarely mutate. It’s best, therefore, to extend to 111 in one fell swoop.

The value of extending is also roughly proportional to how deep your genealogy goes, and also how divergent your ancestry is from that of other tested members of the project. Thus, there is little point in testing or extending known close cousins; rather, extending the haplotypes of your most remote cousins is the most likely to benefit you and all the other project members.

Finally, the present value of extending is proportional to the number of people who have also extended. Thus, there is no present value for the first person to extend to 111 markers because there is no one else for him or her to match to. On the other hand, when many members with diverse lineages have already extended, extending an additional haplotype has a reasonable chance of turning up mutations in category (3), or at least of contributing to the set of mutations in category (1) that might later turn out to fall into category (3).


Other kinds of DNA Testing: Autosomal Genealogical

The most popular kind of DNA testing for genealogical purposes these days (probably because the tests are the cheapest) is autosomal DNA (atDNA) testing. Here, virtually all of the chromosomes of the subject’s genome are sampled for a kind of mutable variation called a SNP. Because SNP mutations are so rare, it’s necessary to sample many hundreds of thousands of SNP sites (compared to just dozens of STR sites in ySTR DNA testing) to produce haplotypes that are distinctive of the test subject’s particular heredity. As with ySTR DNA (yDNA) testing, the purveyors of atDNA tests report haplotype matches between each new test subject and other tested subjects already in their databases, and provide a means for the matching parties to contact each other. They also provide supplementary information and tools, of widely varying quality, for making sense of the comparisons. There are at present five companies that offer atDNA tests, and a thorough comparison of their offerings will be found on this ISOGG Wiki page.

The Problematics of atDNA Results Interpretation

As with yDNA testing (covered extensively elsewhere on this page) the purpose of atDNA testing for genealogical purposes is to help you determine just which ancestor(s) you and your matches have in common—or in other words to identify your mutual MRCA (Most Recent Common Ancestor)—or MRCAs, since the two of you may have inherited chunks of DNA from both members of an ancestral couple. Although the test results themselves, properly interpreted, provide estimates as to the generation (if not the specific ancestor) upon which the ancestries of a pair of matches converge, the farther back the MRCA the looser the estimates, and once the 3rd-4th cousin level is reached both matching and generational predictions rapidly start to break down.

The rapid deterioration in atDNA matching the farther back you go is due to a genetic phenomenon called crossover that slices up our inherited DNA into progressively smaller chunks down the generations. In fact, the amount of DNA that you and your matches have in common declines by some 50% for each generation back, and for 4th cousins there’s only about a 1 in 2 chance that their atDNA haplotypes will be reported as a match. Worse, the amount of DNA that 4th cousins typically share can vary so widely that they might present as anywhere from 2nd-7th cousins—or conversely, reported 4th cousins might not be related at all.
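
The halving can be made concrete with a little arithmetic. The Python sketch below is a rough illustration, not anything the testing companies publish: it assumes the textbook expectation that each generation passes on half of an ancestor's autosomal DNA, that nth cousins share exactly one ancestral couple n+1 generations back, and it ignores the wide random variation just described. The function name is hypothetical.

```python
def expected_shared_fraction(cousin_degree):
    """Expected fraction of autosomal DNA shared by full nth cousins."""
    generations_to_couple = cousin_degree + 1
    # The path from one cousin up to a common ancestor and down to the other
    # cousin crosses this many parent-child links, each halving the share.
    path_length = 2 * generations_to_couple
    per_ancestor = 0.5 ** path_length
    # The shared couple contributes two common ancestors.
    return 2 * per_ancestor

for degree in range(1, 6):
    print(f"cousin degree {degree}: {expected_shared_fraction(degree):.4%}")
```

On these assumptions first cousins are expected to share 12.5%, and each further degree of cousinship divides the figure by four, because a generation is added on both sides of the match; by 4th cousins the expectation is about 0.2%.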

Technically, matching is based on comparing strings of half-identical DNA, called HIRs (Half Identical Regions), whose lengths are measured in cMs (centimorgans), and both the total length in cMs of matching HIRs, and the length of the largest matching HIR chunk, are relevant to estimating the genetic distance back to the MRCA. This ISOGG-published table, derived from the research of atDNA expert Blaine Bettinger, provides some statistics for translating the total number of reported cMs into predicted relationships, and it recognizes that relatives more distant than 2nd cousins may have no shared DNA.

This table does not, however, take into consideration another factor complicating atDNA matching: the possibility that a match reported on the basis of total cMs may be a false positive unless there is also at least one DNA chunk that exceeds 10-15 cMs in size (the range of values depends on how conservative you want to be). The problem is that even though all of our DNA is inherited from ancestors, due to crossover it can become so finely sliced and diced that any given chunk could have been inherited from any of hundreds or thousands of ancestors any number of generations back, and so for purposes of identifying a particular MRCA within a reasonable genealogical timespan, many of the smaller matching chunks that feed into the total cMs are just noise. This paper by Blaine Bettinger, based on his own research, documents this serious and generally overlooked problem.

To counter this problem all of the testing companies, when reporting matches, ignore chunks below a certain threshold cM size, but this threshold varies quite widely from company to company and is typically set somewhere between 5 and 7 cMs. But Bettinger’s study found that 41% of chunks smaller than 10 cMs were just noise, and below 7 cMs the noise ratio rises to 60%. Testing not only yourself, but also at least one, and preferably both, of your parents, and analyzing all three sets of test results together using a chromosome browser, can help determine which of these small chunks may be significant, but that’s something that few bargain-hunting and shortcut-seeking atDNA testers do.
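
As a rough sketch of how such thresholds might be applied to a single reported match, the Python below drops segments under a minimum size and then checks whether at least one large segment remains. The segment lengths are invented, and the function, parameter, and key names are hypothetical; the 7 cM and 15 cM defaults are simply taken from the ranges quoted above, not from any company's actual matching rules.

```python
def assess_match(segments_cm, min_segment_cm=7.0, anchor_cm=15.0):
    """Summarize one match after discarding segments below min_segment_cm."""
    kept = [s for s in segments_cm if s >= min_segment_cm]
    largest = max(kept, default=0.0)
    return {
        "kept_segments": kept,
        "total_cm": sum(kept),
        "largest_cm": largest,
        # True only if some single segment reaches the larger threshold.
        "has_anchor_segment": largest >= anchor_cm,
    }

# Two invented matches: many small chunks versus one large chunk.
many_small = assess_match([5.2, 6.1, 7.4, 8.0, 5.9, 6.6])
one_large = assess_match([22.5, 5.1])

print(many_small)
print(one_large)
```

The first match has the larger raw total but no single segment big enough to support it, which is the situation the text warns about; the second rests on one substantial segment.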

The #1 atDNA testing company (at least considering the size of its proprietary database) is Ancestry, yet Ancestry doesn’t even provide its customers with the detailed data in terms of cMs that’s needed to evaluate the likelihood that a reported match is valid. Fortunately, Ancestry, and all but one of the testing companies, allow customers’ atDNA results to be uploaded to a third party atDNA matching site called GEDMATCH, where the subscriber can set his own minimum cM significance threshold. Anyone who is serious about using atDNA matching for genealogical purposes should do this, and not only for this reason. GEDMATCH (but only FTDNA and MyHeritage among testing companies) also allows the upload of GEDCOMs for displaying ancestral information, so that reported matches have an expeditious way to browse each other’s ancestral trees looking for surnames in common before wasting time trying to communicate with matches who may have little ancestral information to share. Sadly, only a small minority of GEDMATCH participants (or for that matter FTDNA customers) go to the trouble of creating and uploading their ancestral GEDCOM data.

Determining the specific MRCA you share with a reported atDNA match, even if the match is valid and not a false positive, can be a significant genealogical challenge, for a variety of reasons. The usual snap method is for both tested parties to compare lists of ancestral surnames, but as noted above, matches beyond the 3rd cousin level become increasingly problematic, as the size of the matching DNA chunks declines. Mere lists of surnames aren’t enough: it’s necessary also to correlate the degree of closeness of relationship with its predicted degree, and to restrict matching to a reasonable range of not more than about 6-7 generations back from the present, or to be more conservative, 4-5.

Second, every generation back from the present doubles the number of ancestors, and unless both you and your match have thoroughly filled out the more recent leaves of your overlapping family trees, many potential matches are going to be missed. At the 3rd cousin level, most people have 16 great-great-grandparents, and at the 4th cousin level, 32 third-great-grandparents, but only a minority of genealogists are even able to credibly identify all their ancestors back to those points.
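
The doubling is easy to tabulate. The short Python sketch below counts generations back from the tested person (parents are generation 1) and assumes that no ancestor appears twice in the tree, an assumption that intermarriage among related families can violate. The function names are hypothetical.

```python
def ancestors_in_generation(generations_back):
    """Number of ancestral slots exactly this many generations back."""
    return 2 ** generations_back

def cumulative_ancestors(generations_back):
    """All ancestral slots from the parents back to the given generation."""
    return sum(ancestors_in_generation(g) for g in range(1, generations_back + 1))

print("generation  in_generation  cumulative")
for g in range(1, 8):
    print(f"{g:>10}  {ancestors_in_generation(g):>13}  {cumulative_ancestors(g):>10}")
```

Both parties to a match face this table, so the research burden of finding the overlap grows on each side with every generation added.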

Third, one needs to be aware of the possibility of having more than one ancestral couple in common with each of your matches. In pre-1900 America, a large proportion of the population for many generations lived in and married within a small circle of neighboring or “allied families”, or even within their own extended family. It was commonplace for multiple siblings of one family to marry multiple siblings of another, and for descendants of these couples to marry each other, and even first cousin marriages were allowed. As a result of these endogamous cultural practices, thickets of cousins could be generated, and where they’ve continued down to the person tested, his/her own parents might have a MRCA or two in common.

I’ve already reviewed the pitfalls of assuming that matches reported on the strength of their total cM correspondence are valid, and noted that testing one or both parents, and using their specific raw results as a filter in a chromosome browser between your results and those of your reported match is a means of validating the significance of smaller DNA chunks below the 15 cM level. Testing parents is also a way to narrow down the matching possibilities to either the paternal or the maternal line—or both, where the parents themselves have a MRCA within the 4-5 generations back from them.

Contrariwise, I’ve also noted that every generation back doubles the number of ancestors who need to be researched by both matching parties in order to find the overlap. By the same token every generation back threatens to discard or slice up the large matching chunks that support a credible match between you and other fairly close relatives, and a way to address that problem is to test siblings or cousins who may have inherited large chunks that didn’t come down to you; again having parental DNA to work with helps confirm this possibility and to point you in the right direction for identifying the MRCA between your additionally tested sibling or cousin and an otherwise missed match.

I hope that these remarks suggest that while it’s cheap and easy to get your atDNA tested, and gratifying to be notified of hundreds or thousands of potential matches, deriving useful genealogical information from this kind of testing can be another story altogether. In fact, except for matches to close relatives (most of whom you’re probably already aware of), credible atDNA matching can easily devolve into complications and require a significant commitment of time, mental energy, and ideally additional money to test your parents (where possible) and/or other close relatives. All this is true also of the other kinds of DNA testing for genealogical purposes, but yDNA testing, even though it may provide only a classification by patrilineage for a single male line important to you, is at least relatively straightforward and produces at least that one unequivocal result that you can take to the bank.

In the end, the value of autosomal testing, as with all DNA testing, is largely dependent on the quantity and quality of the genealogical research that has been done, both by you and by your reported matches. Because we are all likely to already know most of our close relatives—precisely the ones whom we are most likely to obtain credible atDNA matches to—for serious genealogists whose goal is to push back their ancestral tree as far as they can, I strongly recommend that they focus on yDNA testing. The initial Family Tree DNA 37-marker test costs more, but being able to connect with other descendants of the same surname patrilineage whom it would otherwise have been difficult, if not impossible, to establish were related, pays bigger dividends in the end.

atDNA can actually be a worthwhile substitute for or adjunct to the more definitive yDNA testing. Thus, a female genealogist whose paternal line appears to have daughtered out with herself, and who hasn’t been able to find a male relative who bears her father’s surname to yDNA test, may be able to do so by testing her own atDNA and finding a matching male relative to yDNA test (or who has already been yDNA tested) who descends from the same patriline as her father. Or where a patrilineage has already been identified through yDNA testing, testing the atDNA of others who are believed or suspected to share the same patrilineage, can either confirm that and/or provide a more accurate estimate of just where their connection to the patrilineage lies, provided that it doesn’t go too far back.

Other kinds of DNA Testing: Haplogroups & Clades

Another important kind of yDNA testing is the determination of a man’s patrilineal haplogroup. Although of little use for genealogical purposes, the haplogroup can give one an idea of where a man’s remote male ancestors originally came from, going back thousands and tens of thousands of years.

Just as haplotype is determined by testing ySTR microsatellites on the yChromosome, so haplogroup is determined by testing SNP (Single Nucleotide Polymorphism) sites on the yChromosome (or ySNPs). Compared to ySTRs, ySNPs mutate very rarely, so rarely that when a ySNP mutation happens to occur in a particular father-son transmission event, it is considered practically unique, and is therefore sometimes called a UEP (Unique Event Polymorphism), although the chances are that many of these mutations aren't really unique, just so rare that it's unlikely a second occurrence of one will ever be found.

A Terminological Digression: “Haplogroup”, “Clade”, and “Patrilineage”

As I have explained elsewhere, the collection of ySTR values that constitute a man’s haplotype place him fairly reliably within a particular patrilineage, which I have defined narrowly to mean all the male descendants of a man who lived within genealogical time, or roughly the timespan since a particular hereditary surname came into use for that patrilineage. However, I must confess here to having somewhat hijacked the term “patrilineage” to represent this vitally important genealogical concept. In reality, “patrilineage” as it is generally used, has a wider application, meaning all the male descendants of any arbitrarily chosen male. In fact, since it can be shown that all living males descend from a single male yAdam who lived perhaps 40-60,000 years ago, all living males are ipso facto members of the same patrilineage, but at this point the term ceases to have much value.

This broader sense of patrilineage does help one in understanding haplogroups, though, because the first male bearer of a unique SNP mutation on the yChromosome becomes thereby the patriarch of his own patrilineage, and the founder of a new sub-haplogroup. Except that the terms “sub-haplogroup” and “patrilineage” aren’t much used in this context, but another term is: “clade”, or more often “subclade”. The term “clade” also has a broader meaning, but it is used (and understood, in this deep ancestry context) without qualification as a synonym for branching haplogroups, just as I have used patrilineage without qualification to represent the small recent portion of an ancestry that falls within the scope of genealogical research.

But why, exactly, has it been deemed necessary to bring in the alternate term “clade”, when “haplogroup” is meant (both terms being defined in a special restrictive sense)? I think it’s because what we really need to talk about is the way haplogroups are constantly branching off into subhaplogroups, and “subclades” sounds a little less awkward.

And, for that matter, why do we need the terms “haplogroup” or “clade”, when “patrilineage” (defined with a different scope from my “(genealogical) patrilineage” usage) would do as well? I suppose that it’s because as it is, when we see the words “haplogroup” or “clade”, as used by genetic genealogists, they invoke the deep ancestral context defined by SNP testing, just as “patrilineage”, in my usage, is meant to invoke its specifically genealogical meaning.

Back to the Concept of a Haplogroup or Clade

What’s important is to understand the underlying concepts: there is a single tree of descent from one ancient patriarch to all living men, which branches each time a ySNP occurs—a ySNP that we know about. Each such branch point defines a new subclade (or subhaplogroup—take your pick); thus every living man belongs to a set of nested subclades of an original haplogroup. Haplogrouping is a way of classifying a man’s kinship group from the top down.

Meanwhile, classifying a man into a patrilineage on the basis of a set of ySTR marker values called a haplotype represents the bottom-up approach. Eventually, as more and more ySNPs are found, these approaches may converge in many cases, but in the meantime it is useful to distinguish them, and this can most economically be done by differential terminology. Thus I reserve the term “patrilineage” for genealogical purposes (implying a reference to ySTR testing and haplotypes), and otherwise refer by preference to “clades and subclades” (implying thereby a reference to ySNP testing and haplogroups).

Although the ascertainment of one’s lowest order subclade requires SNP testing in most cases, membership in a more general clade, or haplogroup, can usually be inferred with a high degree of confidence from one’s haplotype. Thus, the general clade, or haplogroup, for the DENNISON DNA Surname Project Patrilineage 1 group is R1b1a2, perhaps the single most common subclade of the R1b haplogroup, shared by about 65-85% of all men who have British ancestry, depending on where in Britain they live. A haplogroup predictor program for inferring broad haplogroup from haplotype is available online, and besides that, the FTDNA testing company is committed to performing free SNP testing for any of its haplotype customers whose broad haplogroup cannot be inferred with confidence from their haplotype; beyond that, FTDNA and other companies offer detailed SNP testing for a more fully resolved subclade determination.

Progress in this haplogroup classification field has been so rapid that new, more recent SNPs (further articulating the tree) are being added constantly. The best way to keep abreast of new developments is to check the ISOGG Haplogroup Tree from time to time. Even the nomenclature has been changing so frequently of recent years, that the old “Henry System” style of nomenclature, in which one of the more articulated branches on R1b has now become R1b1a2a1a1a4a1a1, is giving way to the more compact (and stable) terminology, R-L237, where the “R” refers to the master haplogroup clade, and the “L237” to the most recent mutation in one of its particular branches.

Thus R1b itself has become R-M269, and the DENNISON Patrilineage 1 group mentioned above is now best designated, not as R1b1a2a1a1a, but as U106*, with the SNP mutation at U106 being the defining mutation of this haplogroup subclade, and the “*” meaning that all the currently known SNPs downstream of U106 have been tested and come up negative. If a new SNP more recent than U106 were to be discovered and added to the tree, the U106* designation would have to be changed to U106+, to indicate that at least one additional SNP test remains to be performed.

One’s haplogroup can harbor surprises. One DNA Surname project administrator I know who thought that his surname was of German origin, instead came up with a Norse (Viking) haplogroup, while my Robb genealogical patrilineage, which is clearly Scotch-Irish, turned out to come originally from northern Germany, probably part of the wave of Anglo-Saxon settlement that swept southern England in the wake of the Romans in the 4th Century—although my particular ancestors could have come over many hundreds of years either before or after that period.

All this is quite interesting in its own right, though it takes us far afield from genealogy, per se. However, one thing we may infer from the fact that two people with a common surname have different haplogroups, is that they have no common ancestor for at least thousands of years, and thus can hardly be of the same patrilineage.

Other kinds of DNA Testing: Mitochondrial DNA (mtDNA)

Besides the diploid (from two parents) DNA that lies coiled in the cell nucleus in a double helix, there is the mitochondrial DNA in the cell’s cytoplasm. Almost every cell has mitochondria, which act as the cell’s power plants, producing most of the chemical energy it runs on, and which depend on the nuclear DNA for much of their own construction plan. Mitochondria, which are thus crucial to the life cycle, also have their own DNA blueprints that are independent of the diploid nuclear DNA, and these are inherited directly from the mother, via the egg cell that plays host to the fertilizing sperm.

Thus, analogous to the patrilineal yChromosome, the same mitochondrial DNA that your mother got from her mother, and so forth, is also subject to mutations, which allows one’s maternal line ancestors to be classified into one of a handful of deep matrilineages descended from a small number of Eves who lived several tens of thousands of years ago. Mutations in two tested “hypervariable control regions” have been used to define mtDNA haplogroups, which in turn have been mapped onto the human population dispersion out of Africa. Thus, the general patterning of mtDNA, ascertainable with even a minimal mtDNA test, can tell you which branch of the deep ancestral human population tree your mother’s remote matrilineal ancestor got her mtDNA from. Given the known articulation of this tree so far, it’s likely that your matriline diverged from the main trunk of the tree many thousands, or even tens of thousands of years ago. My own mtDNA haplogroup, I1a1, is general but fairly rare (about 1%) throughout the Middle East, the Caucasus, Europe to Scandinavia, and extends even into Africa and Eurasia.

However, this form of testing has only limited genealogical value, in fact virtually none, I would say, unless you order the FMS (Full Mitochondrial Sequence) test from Family Tree DNA. FTDNA claims that a perfect match to someone else on your full mtDNA genome predicts a Most Recent Common (matrilineal) Ancestor who lived within the last 22 generations (95% confidence interval), with a 50-50 chance that she will have been born within the last 5 generations. Since this places the common ancestor most likely within the span of genealogical time, in principle this test could be used to identify common matrilineal ancestors in the same way that ySTR haplotype tests can be used to determine that two tested males are, or are not, of the same patrilineage.

But there is more to it than that. In the first place, the other numbers you will find in FTDNA’s referenced article are misleading. The range given for the 50-95% confidence intervals, 5-22 generations, doesn’t mean 125-550 years. The average length of a patrilineal generation is about 34 years, and for a matrilineal generation it’s 29 (see this paper), which would translate to a woman born, say, 1805 (50% chance), or at least one born by 1300 (95% chance).
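
The arithmetic behind these figures can be sketched in Python. Two things here are assumptions rather than anything stated in the source: the 25-year generation, which is simply what the 125-550 year range implies (550 / 22), and the tester's birth year of 1950, chosen because it reproduces the 1805 figure. The function names are hypothetical.

```python
NAIVE_GENERATION_YEARS = 25        # assumption: implied by "125-550 years"
MATRILINEAL_GENERATION_YEARS = 29  # figure quoted in the text
TESTER_BIRTH_YEAR = 1950           # assumption: reproduces the 1805 figure

def years_back(generations, years_per_generation):
    """Span of time covered by a number of generations."""
    return generations * years_per_generation

def mrca_birth_year(tester_birth_year, generations, years_per_generation):
    """Rough birth year of an ancestor the given number of generations back."""
    return tester_birth_year - years_back(generations, years_per_generation)

for generations in (5, 22):
    naive = years_back(generations, NAIVE_GENERATION_YEARS)
    matrilineal = years_back(generations, MATRILINEAL_GENERATION_YEARS)
    born = mrca_birth_year(TESTER_BIRTH_YEAR, generations, MATRILINEAL_GENERATION_YEARS)
    print(f"{generations} generations: {naive} vs {matrilineal} years, born about {born}")
```

With 29-year generations the range becomes 145-638 years, which for this assumed tester puts the 95% bound in the early 1300s, in line with the round figure of 1300 given above.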

Worse, few genealogists have been able to trace their earliest known matrilineal ancestor back more than a few generations. My own goes back just 7, to a woman born in 1719 (for an average generational length of 29), and I have been able to trace back that far only because the line runs back through New England, where complete vital records are available almost back to the very beginnings of settlement. In examining some of the mtDNA matches for people in my DNA projects, I happened to notice that the earliest known matrilineal ancestor of eminent genealogist Elizabeth Shown Mills goes back only to a woman born about 1750 in North Carolina, a much more difficult area to research because vital records or their surrogates are so scant. The problem is that there are far fewer records that name women by their maiden names (often just a birth record and a marriage record, if that) than for men, and with female surnames changing every generation, the trail goes quickly cold.

Although the prospects for linking up with others are thus much dimmer for descendants of a common matriline, there is no reason in principle why those with very tight matches (say either 0 or 1 mutational differences at the FMS level of testing) might not form matrilineage groups with posted matrilineal genealogies that if they can be extended deeply enough, might point to a common matriarchal origin.

In theory, mtDNA testing could be useful for certain more modest goals. For example, if there was a question as to whether several children of a family all have the same mother, or where one would like to know which of two wives was the mother of a particular child, the children’s mtDNA could be tested and compared with that of each other or of their mother. However, because the sons don’t pass on their mother’s mtDNA, this type of testing cannot be applied beyond the first generation unless the line of descent from a particular mother is purely matrilineal, i.e. that it runs only through daughter lines. Also, as with patrilineal yChromosome testing, mtDNA testing can only rarely discriminate between, say, a putative mother and her sister. Thus, in the common case where two sisters marry two brothers of another family, one can expect that the mtDNA of all of both sets of their children will be identical.

However, autosomal testing, such as FTDNA’s FamilyFinder test, can provide cheaper and more informative (if slightly less reliable) answers to such questions, because siblings, half-siblings, first cousins, etc., each typically share a certain range of DNA-matching percentages.

Beyond these practical genealogical considerations, you will find plenty of material on mtDNA and its mutation process by exploiting the online resources linked to the ISOGG Wiki on mtDNA testing, and at FTDNA’s FAQ on this subject.

Other kinds of DNA Testing: Ethnographic Testing

Still another kind of DNA testing with a wide time horizon is ethnographic testing, which doesn’t bother trying to construct a mutational tree of descent, but rather simply samples DNA from all over the genome looking for characteristic markers associated with various ethnic populations. This kind of testing, which takes into account all of one’s ancestors, and not just those at either edge of the tree (the purely patrilineal and matrilineal lines), has little to offer genealogy, but it does provide an estimate of the percentage contribution of various ethnic groups to one’s overall ancestry. It is thus one way to explore the popular tradition in many American families of Native American ancestry.

However, there are some caveats that come with this kind of testing. First, unless the Native American ancestor, for example, is rather recent, there is a significant chance that in the sampling, his/her DNA will be altogether missed. Second, many Europeans whose ancestors have never left the continent also have Native American ancestry in their makeup. How can that be? Because, besides the original east Asians who crossed the Bering Straits during a period when the land bridge to America opened, and thus became “Indians”, many of their own deep ancestors, out of Africa, went north and west to the Middle East and Europe instead of east into Asia, and so influenced the DNA on the opposite side of the world.

Thus, like all the other DNA tests, these ethnographic tests too need to be interpreted in the light of other, more conventional sorts of evidence; in this case, the evidence provided by archaeology.

Paternity and Forensic DNA Testing

This type of testing alone depends on no wider context than that of father and son. And because it aims for the maximum degree of certainty, testing both ySTR sites and SNPs, it is conclusive beyond any sane person’s definition of reasonable doubt. Much of the mutation rate literature relied on by genetic genealogists is predicated on paternity test databases, which have the advantage, thus, of completely eliminating the NPE factor. Paternity testing is itself the father of all the other kinds of DNA testing, with their varied purposes, and it is still the “gold standard” for measuring mutation rates; the only problem is that, like gold, paternity test results are relatively scarce, so they need to be fleshed out by data derived more problematically from genealogical DNA databases, applying sophisticated statistics and the mathematics of probability to try to compensate for the many unknowns in the equations.



A Brief DNA Glossary

(genealogical) patrilineage, CCC (Closer Cousin Cluster),
GD (Genetic Distance), haplotype, haplogroup,
MRCA (Most Recent Common Ancestor), MHT (Mutation History Tree),
NPE (Non-Paternity Event), RPH, TMRCA (Time to MRCA),
ySNP, and ySTR.

These and other terms follow alphabetically.
For a more extensive glossary, see the ISOGG Wiki Glossary

autosomal
pertaining to the numbered human chromosomes, 1-22; all the human chromosomes except the “sex chromosomes” (the yChromosome and the xChromosome)

CCC (Closer Cousin Cluster)
a subset of a patrilineage consisting of two or more ySTR DNA tested cousins who are more closely related to each other than they are to all the other tested cousins of the patrilineage. CCCs are most often determined by their members’ sharing a “defining” mutation or mutational pattern that all the other yDNA tested descendants of the patrilineage lack.
     CCCs may have a nested hierarchical relationship. That is, one CCC may incorporate another plus additional members who don’t fit into the lower level CCC. It may be useful in such cases to speak of the higher level CCC as a MetaCluster.

chromosome
one of 46 strands of the complete human DNA that constitute the genetic blueprint for each individual, organized into pairs, with one member of each pair inherited from the father, the other from the mother. 22 of these 23 chromosomal pairs are called autosomal chromosomes, while the remaining pair, made up of the xChromosome and the yChromosome, are called the sex chromosomes. Other species have variant numbers of chromosomes. The chromosomes of an organism taken as a whole are called the “genome”.

clade
a (once) living organism and all of its descendants; in the context of genetic testing of the male yChromosome, a common patriarch and all his male descendants.

crossover
a process that occurs during the replication of one of a parent’s two chromosomal strands to pass on to the next generation, in which part of the genetic material is taken from the other chromosomal strand instead; since crossover is likely to occur at some point on most chromosomes each generation, over time the segments of DNA passed on from ancestors get smaller and smaller, and eventually frustrate attempts to demonstrate relationship through autosomal DNA testing.

deep clade testing
the testing for particular ySNP values to determine a man’s most specific (closest to the present) haplogroup, also called a clade or subclade.

(genetic) deletion
a gap in the resultant DNA left by a flawed copying operation during DNA replication

genealogical time
the time period within which genealogical research is possible and practical—roughly coincident with the time since written records began to be kept identifying individuals by name, and especially by fixed hereditary surname.
     Since the time periods when hereditary surnames became general in particular populations have varied widely across the world, the concept of genealogical time is necessarily elastic. For Britain, however, it may be approximated as the period beginning sometime between 1350 and 1538 in England, and sometime between 1500 and 1700 in the Celtic areas of Britain (Wales, Scotland, and Ireland), though in the more rural Celtic areas there were people as late as 1800 who still didn’t have fixed hereditary surnames. Many of the British surnames themselves go back much farther, to about the time of the Conquest and before, though only in gentry lines. And in general there’s little prospect of tracing most commoner British lineages back much further than the systematic recording of vital events in the local parish registers, a practice that was mandated by law in 1538, but which in many parishes didn’t get going until the 1600s and beyond.

genetic distance (GD) (in the context of ySTR surname projects)
the number of mutation events that have occurred to a panel of tested ySTR markers in the descent of two male line cousins from their common male ancestor.
     Each generational passing of the male yChromosome from father to son represents a transmission event: an opportunity for one or more mutation events to occur amongst the set of tested ySTR markers on that chromosome. The GD is a count of the number of mutation events that have occurred down the generations in the lines of both male descendants. So, given that the tested markers mutate at widely varying, but roughly predictable, rates, GD provides an estimate of the closeness of the genetic relationship between two male patrilineal cousins.
     Usually, the genetic distance between the ySTR haplotypes of two men is simply the sum of the absolute differences between their marker values (the stepwise mutation model), but a simpler way of measuring GD is to count the number of markers that differ (the infinite alleles model), which usually provides a close approximation to the number of mutation events. Markers only occasionally mutate by two or more repeats in a single step: these are called multistep mutations.
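
The two ways of counting GD described above can be sketched in a few lines of Python. This is only an illustration: the function name and the marker values are made up for the example, and multicopy markers such as DYS464, which need special handling, are ignored.

```python
def genetic_distance(hap_a, hap_b, model="stepwise"):
    """Genetic distance between two ySTR haplotypes.

    Each haplotype is a dict mapping a marker name to its repeat count.
    Only markers tested in both haplotypes are compared.
    """
    shared = hap_a.keys() & hap_b.keys()
    if model == "stepwise":
        # Every repeat of difference counts as one mutation event.
        return sum(abs(hap_a[m] - hap_b[m]) for m in shared)
    if model == "infinite_alleles":
        # Every differing marker counts as one mutation event,
        # however many repeats apart the two values are.
        return sum(1 for m in shared if hap_a[m] != hap_b[m])
    raise ValueError(f"unknown model: {model}")


# Illustrative (invented) values for four markers.
cousin_a = {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11}
cousin_b = {"DYS393": 13, "DYS390": 26, "DYS19": 15, "DYS391": 11}

gd_stepwise = genetic_distance(cousin_a, cousin_b)                      # 2 + 1
gd_infinite = genetic_distance(cousin_a, cousin_b, "infinite_alleles")  # 2 markers differ
```

In this example the two models disagree only because one marker differs by two repeats, which the stepwise model reads as two mutation events and the infinite alleles model reads as one.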

half-identical
said of two humans who share at least one allele value at a particular SNP site. Long consecutive stretches of half-identical sampled SNPs, measured in cM (centimorgans, which adjust for the variant rates of crossover in different chromosomes), are indicative of shared descent from a common ancestor. The term HIR is sometimes used to mean half-identical region, whose length may be quantified either in cM or in the number of SNPs. The principal testing companies at present, 23andMe and FTDNA, consider anywhere from 5-7 cM (or about 500-700 SNPs) to be the minimum length that might indicate a reasonably close cousin relationship.
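
As a rough illustration of the half-identical test, the sketch below checks each sampled SNP site for a shared allele and reports the longest consecutive half-identical run, counted in SNPs. The function names and genotype data are invented for the example; converting a run to cM would require a genetic map, which is not attempted here.

```python
def half_identical(genotype_1, genotype_2):
    """True if two unordered genotypes (e.g. "AG" and "GG") share an allele."""
    return bool(set(genotype_1) & set(genotype_2))


def longest_half_identical_run(person_1, person_2):
    """Length, in SNPs, of the longest consecutive half-identical stretch.

    Each person is a list of genotypes for the same SNP sites, in
    chromosomal order.
    """
    best = current = 0
    for g1, g2 in zip(person_1, person_2):
        current = current + 1 if half_identical(g1, g2) else 0
        best = max(best, current)
    return best


# Six sampled SNP sites; the third site (CC vs TT) breaks the run.
person_1 = ["AA", "AG", "CC", "TT", "CT", "GG"]
person_2 = ["AG", "GG", "TT", "CT", "CC", "AG"]
run_length = longest_half_identical_run(person_1, person_2)
```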

haplogroup
the deep ancestry of a particular individual
The common male ancestor of the members of a ySNP haplogroup usually goes back many thousands, or even tens of thousands of years. Haplogroups should not be confused with the ySTR-based haplotypes that are used for genealogical purposes, where the common male ancestor goes back only hundreds of years.
     Haplogroups have a branching tree structure, dividing meta-groups like R, called “clades”, into “subclades” like R1b, or R1b1a2, with each subclade branch defined by the particular sequence of SNP mutations that have accumulated in the genome of the common male ancestor of members of that subclade. Thus, a subclade like R1b1a2 is defined by the chain of sequential SNP mutations: M173, M343, P25, P297, M269.
     As the haplogroup tree has been progressively articulated over the years, the original Henry System nomenclature for subclades has become increasingly unwieldy. There’s now a subclade of R1b1a2 denominated R1b1a2a1a1c2b2a1a1b2a1. For that reason, this old nomenclature is now deprecated in favor of one that appends to the first, defining, letter of the human haplotree, the name of the lowest level SNP that has tested positive.
     Thus, R1b1a2 is now preferably called R-M269, and its subclade R1b1a2a1a1c2b2a1a1b2a1 is called R-S3334. Since new SNPs are constantly being found, most people haven’t tested the latest of their line, and this is recognized by designating their haplogroup, e.g. R-M269+, while in cases where all the more recent (subordinate) SNPs have been tested, but come up negative, their haplogroup would be designated, e.g. R-M269*.
     For much more about haplogroup classification check out this section.
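
The shorthand naming rule described above can be sketched as a small Python function. The function name and the `downstream_status` labels are invented for this illustration and are not part of any testing company’s software.

```python
def shorthand_haplogroup(major_letter, positive_snps, downstream_status="unspecified"):
    """Build a shorthand haplogroup name such as "R-M269".

    positive_snps: SNPs that tested positive, ordered from oldest to most
        recent, so the last one is the lowest level SNP on the haplotree.
    downstream_status (hypothetical labels for this sketch):
        "untested"     -> more recent SNPs not yet tested, suffix "+"
        "all_negative" -> more recent SNPs all tested negative, suffix "*"
        anything else  -> no suffix
    """
    if not positive_snps:
        raise ValueError("at least one positive SNP is required")
    suffix = {"untested": "+", "all_negative": "*"}.get(downstream_status, "")
    return f"{major_letter}-{positive_snps[-1]}{suffix}"


# The chain of SNP mutations given above for the old-style R1b1a2.
r1b1a2_chain = ["M173", "M343", "P25", "P297", "M269"]

plain_name = shorthand_haplogroup("R", r1b1a2_chain)
untested_name = shorthand_haplogroup("R", r1b1a2_chain, "untested")
negative_name = shorthand_haplogroup("R", r1b1a2_chain, "all_negative")
```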

haplotype
a set of ySTR/mtDNA marker values associated with a particular individual (haplotypes are only rarely unique)
ySTR marker values (also called alleles) are determined by testing a subset of highly mutable microsatellite sites on the yChromosome called ySTRs.

IBD (Identical By Descent)
obfuscatory jargon for “inherited”, typically used to characterize a particular stretch of DNA that is known to have been inherited from some relatively recent ancestor (and perhaps shared with another descendant), as opposed to the same stretch of DNA that is IBS (Identical By State), meaning simply “identical” between two individuals and not known to have been inherited from a common ancestor.

infinite alleles mutation model
The assumption that each difference between ySTR marker values in a panel of tested ySTR marker values is due to a single mutation, even when there may have been a gain or loss of several repeats. This model of the way mutations work is a considerable simplification of the complex reality of the mutation process, but it provides a reasonable quantitative approximation to it over the period of genealogical time.

microsatellite
a stretch of DNA characterized by multiple repeats of the same sequence of 2-6 nucleotide bases, the letters in which the genetic code is written. Microsatellites occur throughout the genome, but the ones most useful for genealogical testing purposes are located on the yChromosome.

marker (in the context of DNA testing)
a stretch of DNA whose allele values are sampled as a means of identifying individuals or placing individuals within (deep) patrilineages

MRCA (Most Recent Common Ancestor)

MRCPA (Most Recent Common Patrilineal Ancestor)
These acronyms, whose expansions should be fairly self-explanatory, pertain, respectively, to a set of two or more related descendants, or two or more male descendants related through their common patriline, or patrilineage.
      Since my principal focus in genetic genealogy has been on yDNA testing and the working out of patrilineages, more often than not in my writing I just use the more familiar and inclusive term “MRCA” in lieu of the more precise “MRCPA”, relying on the context to flesh out my meaning.
      In the context of genealogical patrilineage projects whose members have all been shown to be descendants of a common MRCPA through yDNA testing, it’s important to understand that particular subsets of the members may have more recent MRCPAs than the common MRCPA shared by all, and I’ve defined these member subsets as Closer Cousin Clusters.
      Another closely related term is “TMRCA”, but since, unlike FTDNA and many of its customers, I make little use of this concept in my analyses, I haven’t bothered to differentiate a homologous term “TMRCPA”.

multicopy marker
a composite ySTR marker with two or more values whose order cannot be determined from the regular ySTR testing procedure. The most important of these markers is DYS464 which may actually consist of a set of from 2-7 or so repeating values, in a particular order, though the regular test will typically show only a block of four values, conventionally sorted into ascending numbers of repeats. FTDNA offers a special test for DYS464 called the DYS464X test, and where members of a particular patrilineage turn up with different sets of values for DYS464, it’s usually desirable for those with deviant values, and at least one member with the “normal” value to order this test. DYS464, taken as a whole, is by far the single most mutable marker, and therefore the single most valuable, and it is particularly important to ascertain its actual patterns within a patrilineage, when more than one such pattern is indicated.

multistep mutation
a single ySTR mutation that adds or subtracts two or more repeats from a marker. Probably no more than 1 out of 20-30 ySTR mutations that occur are multistep.

Mutation History Tree
a schematic tree of descent constructed for a set of descendant haplotypes of the same patrilineage that shows when particular mutations within the patrilineage tree of descent occurred, and thus how the tested members of the set are related. Here is a sample mutation history tree.

NPE (Non-Paternity Event)
in Western cultures, an unexpected disjunction somewhere in the paternal ancestral chain between the inherited surname and the inherited ySTR, due to a replacement of a son’s biological father (with his inherited surname) by a surrogate father with (usually) a different surname. The most frequent cause of NPEs historically was probably adoption, but there are many other possible causes, including out-of-wedlock births. See Identifying/Confirming Your Patrilineage and Disconfirming your patrilineage for more on NPEs.

nucleotide
There are four of these nucleotide bases, denominated “A”, “G”, “C”, and “T”, and they constitute the alphabet of the genetic code.

(genealogical) patrilineage
the patrilineal (male line) descendants of an earliest male ancestor, the patriarch, who lived within genealogical time.
      The patriarch of a patrilineage, thus defined, is typically the first of his male line to adopt a particular surname and pass it on to his children, and his patrilineal descendants will also bear that surname unless an NPE (Non-Paternity Event) has occurred.
      The most recent common patrilineal ancestor (MRCPA) of any particular set of ySTR DNA tested descendants is likely to be well downstream of the original patriarch. The methods (and pitfalls) of sorting people into genealogical patrilineages are discussed at length at Identifying/Confirming Your Patrilineage and Disconfirming your patrilineage.

patrilineage cousins
a set of tested or testable (male) paternal line cousins who are members of a patrilineage as defined above; most will thus bear a common surname, but where an NPE has occurred there may be males with other surnames who belong to the same biological patrilineage.
      More loosely, the term “patrilineage cousins” might be used to refer to other males or even females with male ancestors belonging to a particular patrilineage.

private ySNP
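a ySNP mutation that has occurred recently in a particular lineage, more recently than the terminal ySNP, and that is therefore either unique to one tested male’s sub-lineage or shared by only a small number of his closer patrilineal relatives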

recLOH mutation or event
An uncommon, but not rare, kind of mutation to a portion of the yChromosome that can affect more than one of a set of ySTR markers that usually mutate separately and independently. Read this article, and this one, to learn more.

repeat
one iteration of a sequence of nucleotide letters that is repeated a number of times to make up a ySTR marker; when the marker mutates, it usually gains or loses a single repeat. Occasionally, though, a multistep mutation will occur, adding or subtracting two or more repeats in a single mutation.

RPH (Root Prototype Haplotype)
the hypothetical haplotype of the MRCPA of all the ySTR DNA tested members of a particular patrilineage
     The RPH may be the haplotype of that member of a set of tested patrilineage cousins who is most closely related to all of the others, collectively, or an RPH may be constructed synthetically, by choosing for each marker the value which most likely belonged to the haplotype of the patriarchal founder of the patrilineage (usually the most common marker value across the set of tested haplotypes).
      For a fuller discussion of RPH (a term, and concept, developed by yours truly), see this paper.
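
The synthetic construction can be sketched as follows for the usual case, in which the most common value of each marker is taken as the founder’s value. This is only a minimal illustration, not the full method of the paper cited above: the function name and the haplotype data are invented, and ties between equally common values are resolved arbitrarily here, whereas in practice they call for judgment.

```python
from collections import Counter


def synthetic_rph(haplotypes):
    """Build a modal haplotype from a list of {marker: repeats} dicts.

    For each marker, the most common value across the tested haplotypes
    is chosen. A marker missing from some haplotypes is tallied only
    over those that have it.
    """
    markers = sorted({m for h in haplotypes for m in h})
    rph = {}
    for marker in markers:
        counts = Counter(h[marker] for h in haplotypes if marker in h)
        rph[marker] = counts.most_common(1)[0][0]
    return rph


# Three tested cousins, with invented values for three markers.
tested = [
    {"DYS393": 13, "DYS390": 24, "DYS19": 14},
    {"DYS393": 13, "DYS390": 25, "DYS19": 14},
    {"DYS393": 13, "DYS390": 24, "DYS19": 15},
]
rph = synthetic_rph(tested)
```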

SNP (Single Nucleotide Polymorphism)
an observed difference in allele values between single nucleotides on the chromosomal strands of two individuals of the same species. The term is also used to refer to the paired nucleotides, or "base pair" of the nuclear DNA of an individual of a diploid species, like we humans, who inherit a copy of each chromosome from each of our parents.
     In autosomal testing for genealogical purposes, large numbers of SNP sites (base pairs) are sampled across whole chromosomes in two individuals, with the aim of identifying long half-identical stretches that are likely indicative of shared DNA from a common ancestor.

stepwise mutation model
The assumption that each unit of difference between measured ySTR marker values is due to the gain or loss of a single repeat. This model of the way mutations work provides a close approximation to the complex reality of the mutation process.

terminal ySNP
the most recent known (through testing), and non-private, ySNP mutation in a particular yDNA-tested male’s yChromosome.
      Every male is situated somewhere on the hierarchical male yHaplotree of descent from yAdam, the ancestor of all living males, and the terminal ySNP for a particular male is simply the lowest level branching node on that tree known for that male.
      Actually, a male who has been tested on FTDNA’s BigY test is likely to also have additional private ySNPs—more recent ySNPs than the terminal SNP that may either be unique to his sub-lineage, or shared by a small number of his closer patrilineal relatives.

TMRCA (Time to the Most Recent Common Ancestor)
TMRCA, like genetic distance, is a measure of the closeness of relationship between two haplotypes. TMRCA may be measured in generations, or in years, where the number of years/generation is defined. TMRCA is calculated as a probabilistic function of the number of marker variations between the two haplotypes, and the calculation depends crucially on the estimated mutation rates for the particular markers that constitute the haplotype. Simple TMRCA calculators apply an average mutation rate across the marker panel, while more sophisticated calculators take account of which particular markers have mutated; if all the variant markers are fast ones, a closer relationship is indicated than if some of them are slow mutators. Another factor that may be taken into consideration is to adjust for the positive knowledge that there is no common ancestor back a certain number of generations from the present; this factor has the effect of pushing TMRCA farther back into the past. See my paper Deconstructing TMRCA & Genetic Distance for an extended discussion of TMRCA and GD (Genetic Distance).
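
To make the simplest case concrete, here is a rough point estimate of the kind a simple calculator might start from, applying one average mutation rate across the whole panel. It is a sketch under stated assumptions, not the method of any particular calculator: real calculators are probabilistic and report a range. The function name is invented, and the mutation rate in the example is a placeholder rather than a measured value.

```python
def estimate_generations_to_mrca(genetic_distance, markers_tested, avg_mutation_rate):
    """Rough number of generations back to the common ancestor.

    Assumes every marker mutates at the same average rate (per marker,
    per transmission event) and that each mutation event is observed.
    Two lines of descent, each g generations long, give 2 * g
    transmission events, so the expected number of mutations is
    2 * g * markers_tested * avg_mutation_rate. Solving for g:
    """
    if markers_tested <= 0 or avg_mutation_rate <= 0:
        raise ValueError("markers_tested and avg_mutation_rate must be positive")
    expected_mutations_per_generation = 2 * markers_tested * avg_mutation_rate
    return genetic_distance / expected_mutations_per_generation


# Placeholder inputs: GD of 3 on a 37-marker panel, with an assumed
# average rate of 0.004 mutations per marker per transmission event.
generations = estimate_generations_to_mrca(3, 37, 0.004)
years = generations * 30  # assuming 30 years per generation
```

Because this estimate ignores which markers mutated and how fast each one mutates, it corresponds only to the simple calculators described above, not to the more sophisticated ones.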

transmission event
the event of male parentage in which the yChromosome of the father is replicated, with the possibility of mutations, and passed on to a son.

yChromosome (or “Y Chromosome”)
the yChromosome is that one of the 23 paired human chromosomes that is possessed only by the male, and which is handed down virtually unchanged to each of his sons.

yDNA (or “Y-DNA”)
the DNA of the male yChromosome, which is said to be “non-recombinant” because (except for a tiny “pseudoautosomal” region containing 9 genes) it cannot recombine with its odd couple partner, a female xChromosome.
     Tests of panels of ySTR markers are offered by Family Tree DNA, and other companies, for genealogical purposes, and FTDNA and others also offer tests of particular ySNPs that are being used to reconstruct the deep ancestry of humankind.

ySNP (Single Nucleotide Polymorphism)
a single nucleotide on the male yChromosome for which a mutation has been found to occur; because such ySNP mutations occur so infrequently, they are used to mark branch points in the male descendancy from the original yAdam.

ySTR (Short Tandem Repeat)
a type of (male) yChromosome DNA sequence composed of multiple copies (or repeats) of the same multi-nucleotide sequence; another name for one of these sequences of repeats is microsatellite, and in the context of testing for genealogical purposes they are more familiarly called “markers”. Sets of these ySTR markers are preferred for constructing test haplotypes for genetic genealogical purposes, because they mutate much faster than single point (SNP) loci. Several hundred of these ySTR sites, or markers, have been identified, but only 120 or so are currently being tested for genealogical purposes.
     Because these sampled ySTR sites come from areas of the yChromosome that have no known genetic function, the only thing they are indicative of (and that only collectively, when a particular subset of 37 are tested) is membership in a particular (genealogical) patrilineage.

Last updated 22Apr2022
© John Barrett Robb