Too Many DNA Matches:
A Catalogue of Discovered DNA Regions

by Wesley Johnston
begun 8 Dec 2018, inspired by Diahan Southard's presentation at the 2018 Institute for Genetic Genealogy (i4gg) conference in San Diego
last updated 10 Oct 2024 - correct end location on chromosome 21
Copyright © 2024 by Wesley Johnston All rights reserved

I am definitely NOT the best person to do this catalogue. But at this point, no one else is doing it in a way that allows for ready identification of a specific area that might come up in a research project.

The ISOGG Wiki "Identical by descent" page's section on "Excess IBD sharing" comes closest, and DNAPainter uses that page's problem regions to warn about potential problems when you are doing chromosome painting.

Family History Fanatics also has a nice concise page on pileups, which builds on Debbie Kennett's 22 Jan 2018 post "Small Segments and Pileups".

I would really like to include a web page about GEDmatch Q-Matching, which addresses the SNP-poor regions, but I have yet to find a good web page on Q-Matching (although the Family History Fanatics page noted above does address it well, though not completely). There is a good 2 Aug 2020 statement by Wilbon Davis, in a reply to a query on the GEDmatch forums, that is worth quoting: "Be sure to recognize that Q only helps eliminate segments that are accidental matches. Rely more heavily on cM lengths for estimating kinship and IBD filtering."

So I am doing what I can, with what knowledge I have.
-- Wesley Johnston


Overview

When examining DNA matches, some regions of shared DNA come from a shared common ancestor. These are called IBD (Identical by Descent). But others come from various other situations that do not indicate that the DNA came from a recent common ancestor. These are in a group of match situations called IBS (Identical by State).

While there are more forms to consider, I will leave it at this being a catalogue of excess IBD: cases of too many DNA matches on the same region of the same chromosome. I had originally thought to include IBS due to a shared ancestral population (IBP), but those are not included.

I do think that it is very important for qualified researchers to research each of these and see just what they tell us. They are not useful for identifying common ancestors, and the catalogue is helpful for identifying when you have stumbled onto one of those cases, which is the main purpose of the catalogue. But I suspect that they have importance that we need to understand. This is fertile research ground for some enterprising well-qualified researcher seeking to "boldly go where no [researcher] has gone before".

At this point, I do not plan to classify any of these, as to which type they are or as to what they might mean. I simply want to capture all cases where this is found. I began (7 Feb 2021) a C22 Pileup Project, but I have not worked on it for a very long while now, and it really never got off the ground.

Standard Method for Identification and Definition of Excess IBD Regions

While this method has yet to be applied to every region shown below, it is the standard I have used since the c7 Mexican region in April 2024.

When I initially explore a newly-uploaded test result on GEDmatch, I create a tag group of about 100 of the closest matches. I then run triangulation of the reference kit with the other kits. Excess IBD regions appear as hundreds of triangulations on the same region. I can then copy and paste those triangulations into a spreadsheet where I can define the starting (minimum starting point) and ending (maximum ending point) of the region as well as the minimum, maximum and median number of centiMorgans. If deeper exploration of the region is desired, a GEDmatch Tier 1 Segment Search of up to 10,000 kits on the chromosome in question will probably expand the number of kits included.
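The spreadsheet step above can be sketched in a few lines of Python. This is only a minimal illustration, assuming the triangulation rows for one chromosome have already been reduced to (start, end, cM) values; the function name `summarize_region` and the sample numbers are made up for the example and are not GEDmatch output.

```python
from statistics import median


def summarize_region(segments):
    """Summarize triangulated segments that pile up on one chromosome.

    segments: iterable of (start_bp, end_bp, cm) tuples (hypothetical layout).
    Returns the region bounds (minimum start, maximum end) and the
    minimum, maximum and median centiMorgan values.
    """
    segments = list(segments)
    if not segments:
        raise ValueError("no segments supplied")
    starts = [start for start, _, _ in segments]
    ends = [end for _, end, _ in segments]
    cms = [cm for _, _, cm in segments]
    return {
        "start": min(starts),      # minimum starting point of the region
        "end": max(ends),          # maximum ending point of the region
        "min_cm": min(cms),
        "max_cm": max(cms),
        "median_cm": median(cms),
        "count": len(segments),
    }


# Invented sample rows, for illustration only.
example = [
    (100_000_000, 112_000_000, 12.0),
    (104_000_000, 125_000_000, 20.5),
    (102_500_000, 108_000_000, 6.1),
]
summary = summarize_region(example)
print(summary)
```

Taking the minimum start and maximum end gives the widest extent seen across all triangulations, which is why no single kit may cover the whole region.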

A related method starts with a GEDmatch Tier 1 Segment Search on a chromosome where a shared segment shows up in a one-to-one comparison of two kits that were not expected to share DNA. The resulting kits can then be placed in a tag group to enable the above analysis.

An important tool for working with discovered regions to determine the number of centiMorgans is the CM Estimator tool created by Jonny Perl on the DNA Painter website with data provided by Amy Williams of Cornell University.


KNOWN REGIONS

| Chr | Start | End | Len (cM) | Source - Details |
|---|---|---|---|---|
| 1 | 798,359 | 3,035,227 | 7.3 cM | This is a region that may be Scots-Irish DNA that shows up in some of the Johnstons of North Ireland kits. This may be the same as the region identified below for the Lake family which is also Irish or Scots-Irish DNA with which it significantly overlaps. |
| 1 | ~838,000 | ~5,600,000 | ~7-11? | This is a region that may be Irish DNA that shows up in some of the Loyalist Lake Family kits. It was identified by Craig Kanalley. So far, the kits that triangulate on this all have Irish or Scots-Irish ancestry, and some of them definitely did not inherit the region from their Lake ancestors. |
| 1 | 118,434,520 | 153,401,108 | 9.95 | https://dx.doi.org/10.1371/journal.pgen.1004144 - Table 3 (Build 37) |
| 2 | ~16,300,000 | ~28,800,000 | ~10.5 | This is a region that appears to be a possible Slavic excess IBD region. |
| 2 | 85,304,243 | 99,558,013 | 6.53 | https://dx.doi.org/10.1371/journal.pgen.1004144 - Table 3 (Build 37) |
| 2 | 132,695,025 | 141,442,636 | 9.16 | https://dx.doi.org/10.1371/journal.pgen.1004144 - Table 3 (Build 37) |
| 2 | 192,352,906 | 198,110,229 | 5.04 | https://dx.doi.org/10.1371/journal.pgen.1004144 - Table 3 (Build 37) |
| 3 | ~42,571,729 | ~73,704,141 | ~35.5 | This is a region that appears to be a possible Mexican excess IBD region. None of the kits seen have the entire region. The largest seen is 30.2 cM. However there is a significant gap between the starting positions of these high-cM kits at the low end of the region and a narrower region that starts about 51,802,912. However, there are many kits at or above 20 cM within even the narrower range. - This region was discovered by Wesley Johnston, with help from Kevin Borland, in April 2024. It may be specific to families originating in Jalisco, since some of the known kits are descendants of Colotlan (and Tlaltenango in Zacatecas) families. |
| 6 | 29,750,000 | 33,100,000 | | https://dx.doi.org/10.1371/journal.pgen.1004144 - HLA Region |
| 6 | ~70,300,000 | ~99,200,000 | 21.9 cM / 13056 SNPs | https://familylocket.com/the-irish-endogam-ish - County Cork, Ireland, pileup identified by Heidi Mathis |
| 7 | ~124,322,409 | ~153,028,287 | up to 33.5 | This appears to be a Mexican excess IBD region. The following is based on 601 triangulations on this region: largest 33.5, smallest 5.3, median 14.5. None of the kits seen have the entire region. - This region was discovered by Wesley Johnston in April 2024. |
| 8 | 9,600,000 | 10,400,000 | | https://www.genetics.org/content/186/1/295 - Charts 6 and 7: "On the basis of the simulations we used a conservative critical value requiring extreme regions to have more IBD than the most extreme region observed in any of the simulations (1.34% IBD sharing). Only two regions fulfilled this very conservative criterion: the HLA region on chromosome 6 and a region from 9.36 to 10.4 Mb on chromosome 8 (see Figure 6 for a closeup of the IBD sharing in these regions). The amount of IBD sharing in the two regions is not likely to be the result of neutral evolution, at least not under the demographic models used here." |
| 8 | 10,428,647 | 13,469,693 | 7.96 | https://dx.doi.org/10.1371/journal.pgen.1004144 - Table 3 (Build 37) |
| 9 | 38,293,483 | 72,605,261 | 8.15 | https://dx.doi.org/10.1371/journal.pgen.1004144 - Table 3 (Build 37) |
| 10 | 44,555,093 | 53,420,188 | 7.58 | https://dx.doi.org/10.1371/journal.pgen.1004144 - Table 3 (Build 37) |
| 15 | 20,060,673 | 25,145,260 | 10.46 | https://dx.doi.org/10.1371/journal.pgen.1004144 - Table 3 (Build 37) |
| 15 | 27,115,823 | 30,295,750 | 9.29 | https://dx.doi.org/10.1371/journal.pgen.1004144 - Table 3 (Build 37) |
| 15 | ~76,158,983 | 102,376,761 | ~49.2 | This is a region that appears to be a possible enormous Mexican excess IBD region. None of the kits seen have the entire region. The largest seen is 52.6 cM. - This region was discovered by Wesley Johnston, with help from Kevin Borland, in April 2024. It may be specific to families originating in Jalisco, since some of the known kits are descendants of Colotlan (and Tlaltenango in Zacatecas) families. But the number of kits included in this region is extremely large so that defining the geographic scope would take extensive research. |
| 16 | 19,393,606 | 24,031,556 | 6.18 | https://dx.doi.org/10.1371/journal.pgen.1004144 - Table 3 (Build 37) |
| 17 | ~9,308,301 | ~15,627,329 | 5.6-17.1 | This is a region that appears to be a possible Mexican excess IBD region. None of the kits seen have the entire region. - This region was discovered by Wesley Johnston in April 2024. It may be specific to families originating in Jalisco. |
| 17 | 59,519,083 | 64,970,531 | 6.23 | https://dx.doi.org/10.1371/journal.pgen.1004144 - Table 3 (Build 37) |
| 17 | 77,186,666 | 78,417,478 | 5.66 | https://dx.doi.org/10.1371/journal.pgen.1004144 - Table 3 (Build 37) |
| 17 | ~42,571,729 | ~73,704,141 | ~35.5 | This is a region that appears to be a possible Mexican excess IBD region. None of the kits seen have the entire region. The largest seen is 30.2 cM. However there is a significant gap between the starting positions of these high-cM kits at the low end of the region and a narrower region that starts about 51,802,912. However, there are many kits at or above 20 cM within even the narrower range. - This region was discovered by Wesley Johnston, with help from Kevin Borland, in April 2024. It may be specific to families originating in Jalisco, since some of the known kits are descendants of Colotlan (and Tlaltenango in Zacatecas) families. |
| 21 | 16,344,186 | 19,375,168 | 6.91 | https://dx.doi.org/10.1371/journal.pgen.1004144 - Table 3 (Build 37) |
| 22 | 16,051,881 | 25,095,451 | 20.82 | https://dx.doi.org/10.1371/journal.pgen.1004144 - Table 3 (Build 37) |
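
Since the main purpose of the catalogue is to recognize when a matching segment falls in one of these regions, here is a minimal Python sketch of that check. It assumes the segment positions are in the same genome build as the catalogue entries (the Table 3 rows are Build 37). The names `Region`, `CATALOGUE` and `overlapping_regions` are hypothetical, only three rows of the table are copied in, and the query segment is invented.

```python
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int
    end: int
    note: str


# A small excerpt of the catalogue above; extend with further rows as needed.
CATALOGUE = [
    Region("6", 29_750_000, 33_100_000, "HLA Region"),
    Region("8", 10_428_647, 13_469_693, "Table 3 (Build 37)"),
    Region("22", 16_051_881, 25_095_451, "Table 3 (Build 37)"),
]


def overlapping_regions(chrom, start, end, catalogue=CATALOGUE):
    """Return (region, fraction) pairs for catalogue regions the segment overlaps.

    fraction is the share of the segment's base pairs that fall inside
    the catalogued region.
    """
    if end <= start:
        raise ValueError("segment end must be greater than start")
    hits = []
    for region in catalogue:
        if region.chrom != chrom:
            continue
        overlap = min(end, region.end) - max(start, region.start)
        if overlap > 0:
            hits.append((region, overlap / (end - start)))
    return hits


# Invented segment on chromosome 6, for illustration only.
hits = overlapping_regions("6", 31_000_000, 35_000_000)
for region, fraction in hits:
    print(f"chr{region.chrom} {region.note}: {fraction:.0%} of segment overlaps")
```

An overlap is only a flag for closer inspection. The fraction is measured in base pairs, not centiMorgans, so it says nothing by itself about how much of the match's cM total comes from the region.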

Chromosome 11

"High IBD sharing signals were also seen on each side of the centromere on chromosome 11. These regions contain clusters of olfactory receptors."
[source: ISOGG Wiki on Excess IBD, citing "Albrechtsen A, Moltke I, Nielsen R. Natural selection and the distribution of identity-by-descent in the human genome. Genetics 2010; 186(1):295-308. See Table 2 for the genomic positions of the regions on the different chromosomes where the peaks of IBD sharing were found."]


Contact Information

Send E-mail to wwjohnston01@yahoo.com
