Monday, August 6, 2018

Color Clustering: Creating Color Clusters

Unsure of how other people were sorting their Shared Matches from AncestryDNA, I created my own method. This method is quick - it usually takes less than 10 minutes - and visually shows genetic connections while also "sorting" the matches into groups reflecting the test taker's great grandparents' lines.

Please test out this method and let me know what you think! Although it should be valuable for many genealogists, I think it will be especially useful for adoptees, Search Angels, and others who are trying to identify unknown close relatives.

COLOR CLUSTERING: The Method

STEP 1:
List all of your 2nd and 3rd cousin DNA matches. (Note: These are fictitious names, but the results are based on real matches.) 


STEP 2:
Assign a color to your first DNA match (for example, blue to Ralph.)


STEP 3:
Open the shared matches for that person (Ralph), and assign them each the same color in the same column (blue).


STEP 4: 
Find the first person who does not have a color assigned (Robert), and assign him a color in the next column (orange).


STEP 5:
Open the shared matches for that person (Robert), and assign them each the same color in the same column (orange).


STEP 6:
Repeat steps 4 & 5 until every match on your list has at least one color assigned.
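For anyone who would rather script the chart than color it by hand, the steps above can be sketched in a few lines of Python. This is only a rough sketch, not a finished tool: it assumes you have already typed up each person's shared-match list yourself, the function name color_cluster is made up for illustration, and "Maude" and all of the shared-match lists are invented example data. Column 0 plays the role of blue, column 1 orange, and so on.

```python
def color_cluster(matches, shared):
    """Assign cluster columns to matches, following the steps above.

    matches: list of match names in match-list order (closest first).
    shared:  dict mapping each name to the set of names on their
             shared-match list (typed up by hand; an assumption here).
    Returns a dict mapping each name to the column numbers they received.
    """
    in_list = set(matches)
    columns = {name: [] for name in matches}
    next_column = 0
    for name in matches:
        if columns[name]:
            continue  # already has a color, so move on to the next blank row
        # Start a new column with this match...
        columns[name].append(next_column)
        # ...and give the same color to everyone on their shared-match
        # list, above or below them, as long as they are on the chart.
        for other in shared.get(name, set()):
            if other in in_list and other != name:
                columns[other].append(next_column)
        next_column += 1
    return columns


# Invented example data: names and shared-match lists are illustrative only.
matches = ["Ralph", "Herbert", "Robert", "Stacy", "Maude"]
shared = {
    "Ralph": {"Herbert", "Stacy"},
    "Herbert": {"Ralph", "Stacy", "Robert"},
    "Robert": {"Herbert", "Stacy"},
    "Stacy": {"Ralph", "Herbert", "Robert"},
    "Maude": set(),
}
result = color_cluster(matches, shared)
for name in matches:
    print(name, result[name])
```

In this made-up data, Herbert and Stacy end up in both column 0 and column 1, which is the kind of overlap discussed in the analysis section.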




COLOR CLUSTERING: Analyzing the Results

4 Columns/No Overlap:

If your results show 4 distinct clusters, like below, without any overlap, your sort is likely showing matches to your 4 sets of great grandparents.


Fewer than 4 Columns:

If your results show fewer than 4 clusters (for example, 3), those clusters likely represent 3 of your 4 sets of great grandparents, and no one descended from the 4th set has tested and matched you at the 2nd/3rd cousin level.



Some Overlap:

If your results show 4 clusters but some of your matches have been assigned more than one color (for example, Herbert & Stacy are both blue and orange), there are two likely explanations. Your sort may be showing your 4 sets of great grandparents, with the two overlapping clusters (blue & orange) on the same side of your family. Or the overlapping clusters (blue & orange) might belong to a single set of great grandparents, in which case, in this example, you are missing matches for 1 of your 4 sets of great grandparents.



Lots of Overlap:

In this real example, there is a lot of overlap between all of the clusters except the yellow and brick red clusters. All of the overlapping clusters are on the maternal side of this test taker and visually show a lot of cousins marrying cousins, resulting in pedigree collapse. The paternal grandmother's side is represented by both the yellow and brick red clusters. The paternal grandfather's side has no cousins matching at the 2nd/3rd cousin levels. So, even though there are a lot of clusters and matches, this sort represents only 3 of the 4 sets of great grandparents for this individual.
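When a chart has a lot of overlap, it can help to treat every set of columns that share a match as one "main" cluster and then count what is left. Here is a rough Python sketch of that idea. It is an illustration only: the function name merge_overlapping and the example chart are made up, and a merged group still needs interpreting (it might be one side of the family or a single set of great grandparents).

```python
def merge_overlapping(columns):
    """Group column numbers that share at least one match.

    columns: dict mapping each match name to the list of column
             numbers (colors) assigned to that match.
    Returns a list of sets; each set is one merged group of columns.
    """
    parent = {}

    def find(c):
        # Follow parent links until we reach the group's representative.
        parent.setdefault(c, c)
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # shorten the chain as we go
            c = parent[c]
        return c

    for cols in columns.values():
        for c in cols:
            find(c)  # register every column, even ones with no overlap
        for c in cols[1:]:
            # This match sits in both columns, so join their groups.
            parent[find(c)] = find(cols[0])

    groups = {}
    for c in list(parent):
        groups.setdefault(find(c), set()).add(c)
    return sorted(groups.values(), key=min)


# Invented example chart: six columns, but only some of them overlap.
chart = {
    "Ralph": [0],
    "Herbert": [0, 1],
    "Robert": [1],
    "Stacy": [0, 1],
    "Maude": [2],
    "Edna": [3],
}
main_clusters = merge_overlapping(chart)
print(len(main_clusters), main_clusters)
```

Here the four colors collapse into three main groups, because columns 0 and 1 share Herbert and Stacy.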





A special thank you to everyone who allowed me to access their DNA and gave me feedback!

Please be aware: Your results may vary! This new method is still in its infancy and more test cases are needed to see how it works in various situations.

TIP: When I say "2nd and 3rd cousins," I am using the categories Ancestry has used to define them. The 3rd cousins appear to go down to 90 shared cM which works out well for this process.

TIP: If your chart is "too messy," look at the shared cM of your top matches and take off any that are above 400 shared cM. Then redo the chart. Hopefully, it'll be a lot "cleaner!"
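If you keep your match list in a spreadsheet or script, the cM cut-offs from the two tips above can be applied as a simple filter. This is just a sketch: the function name filter_by_cm is made up, the cM values are invented, and the 90 and 400 limits are the rough figures mentioned in the tips rather than hard rules.

```python
def filter_by_cm(matches, low=90, high=400):
    """Keep only matches whose shared cM falls between low and high.

    matches: list of (name, shared_cm) pairs.
    low/high: cut-offs taken from the tips above (approximate, not exact rules).
    """
    return [(name, cm) for name, cm in matches if low <= cm <= high]


# Invented example values, for illustration only.
all_matches = [("Ralph", 620), ("Robert", 310), ("Herbert", 95), ("Stacy", 62)]
kept = filter_by_cm(all_matches)
print(kept)
```

With these made-up numbers, Ralph is dropped for being above 400 cM and Stacy for being below 90 cM, leaving Robert and Herbert to chart.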

Happy Clustering!

79 comments:

  1. Thank you for sharing this method to separate matches. A visual method, it will be very valuable to adoptees and others working to break through brick walls in their genealogy. Looking forward to sharing it in the Facebook group, Texas DNA and Adoptee Search Support, and seeing the results.

    Replies
    1. You're welcome, Connie! And, I look forward to working with these new groups and to learning more from you. :)

  2. One thing I also did was for each color, I also assigned a number. The first color is 1, the second color is 2, etc. That way, I can also add a filter and see the overlaps a little easier as well as it also easily isolates everyone who is related just through one common ancestor, etc.

    Replies
    1. Did you know you can also filter each column by color? I didn't until I was working with this! You can go to each column and have the colors "rise" to the top. Then, for the next column, only sort people BELOW that. I might have to share an image. :)

  3. Thank you so much for sharing this, I'm really not sure I have sufficient close enough matches but as an adoptee looking for birth family, I'm going to give it a go.
    I have found that if you copy and paste names from Ancestry tree, it creates a hyperlink in the spreadsheet so you can click on it and refer back to your tree, which may be helpful for some people.

    Replies
    1. Thank you! That is such a neat 'trick'! I had no idea!

  4. I have been using your method since you showed it to me at GRIP, and love it. I have added centimorgans to the matches blocks, and also notes below the chart regarding common surnames, etc. This is now the first step I take when I take on a new unknown parentage case. Thanks for sharing!

    Replies
    1. Yay! It's my first step in unknown parentage cases, too. :)

  5. Intriguing idea! I help adoptees, and this looks like it will be very helpful! I can hardly wait to try it out! Thanks!

    Replies
    1. Helen, I hope you like it! I have found it very helpful in working with adoptees!

  6. Trying this for a match who's adopted. Of course, I get six columns, with overlaps. He also has two 'first cousin' matches. I notice if I start with them I get a vastly different chart. I'm assuming this works best without them.

    Replies
    1. I think it works better without them, but THEN you can see who the first cousins are and use them to figure out where other people belong... IF you know who the first cousins are. With adoptees, you likely don't. But, if you mentally add the overlapping columns together to create one, are you getting more like 3 or 4 clusters?

  7. Wow, this is simple and visual and powerful. I'm going to try it! Thank you SO much for explaining in such detail exactly what to do and how to interpret the outcome.

  8. I have a situation where a 3rd cousin match (100 cM) is not a shared match to any of the top matches I have assigned my 4 different colors to. I would assign her a different color, but she matches 2 others further down the list (who match one of the top matches and thus have a color assigned to them). Should she be assigned the same color as the two matches she does have in common? Also, is there a way to apply this same principle to 4th-5th cousin matches? Lots more of those to sort- :)

    Replies
    1. Hi, Lisa. You might end up with more than 4 different colors. (See my last example!) As you go down the list, you assign a new color to anyone who does not already have a color. I'll try to get in touch with you & see if I can help.

    2. I too am interested in knowing if it could help with 4th-5th cousins. Other than a match with my nephew I have no matching kit closer than 4th-5th (FTDNA) / 4 gens to MRCA (GEDmatch).

  9. I'm an adoptee, and was new to DNA and searching as of February this year.

    A week ago, I found my birthfather. (Ta-dah!) Finally. One thing I did was just put my close cousin matches on a piece of paper, closest matches at the bottom of the sheet, further matches higher on the sheet. Then, I did the matches at Ancestry between each one -- if they matched with someone on the sheet, I drew a line between them. When finished, I looked to see who was grouping with whom AND who had the most lines coming out of their spot on the page. It's somewhat primitive, but it helped me visualize the connections. I think your system is better.

    Replies
    1. Yours worked, though! CONGRATS! I have more steps to share and the rest I usually am using paper & pencil. :)

  10. Thanks for this Dana! I was so excited to try this, as I am a visual learner. But I'm not sure what to do with my results. I did my mother's test, she has 57 2nd and 3rd cousins. I have 10 colors. If we're talking about 4 sets of gr-grandparents, we're talking about 8 people. Shouldn't I only have 8 colors? What do I do with a person who has 3 colors in their row? Is this what you're calling overlap? I have one guy who has 5 colors. Does it sound like I didn't do it right or do I have a lot of intermarriage? I have a suspicion that my maternal grandparents might have been distantly related and this chart isn't showing that where I'm expecting to see it.

    Replies
    1. Look at my last example... you CAN have more than 4 or even 8 colors. But, when you mentally (or physically) put the overlapping clusters together, you should end up with only 3 or 4 "main" clusters. I will try to get in touch with you & see if I can help!

    2. I would love to chat with you about this Dana. It occurred to me this morning that my two extra colors might be NPE/adoptions.

  11. This comment has been removed by the author.

  12. I'm from a highly endogamous rural county where my great and sometimes great great grandparents arrived around 1800-1810. I also have a grandfather and his brother who married sisters. I didn't end up with as many columns as your endogamous example, but of my 36 second and third cousin matches, 10 shared 2 columns, 3 shared 3 columns, 1 shared four columns and 22 showed up in only one column. It was a good exercise. I think it gave me direction for those I had not yet identified as they now are in columns of those who I HAVE previously identified. Good job. Now I'll work on those others!!

  13. I've been meaning to do this at some point and your color-coding idea spurred me to do it today. The 2nd and 3rd cousins were straightforward. (Unfortunately, I have no second OR third cousins on my father's maternal line so I have three primary colors.) I then tackled some of the more distant cousins at Ancestry (some of which are 3rd cousins even though they're in the 4th-6th cousin range). I've got many groups!

    I'm glad I read the additional advice here. I also added centimorgans to the matches blocks and inserted a comment at the top cell of each column with a note as to the MRCA (if known). I also ended up adding a left-hand column with the cM number so I could re-sort by cM if needed.

    Replies
    1. Nice, Elizabeth! And, thank you for sharing how you are working with this basic method. I often write the cM on my printout - depending on what exactly I'm doing the sort for. Mostly, my work with this has been with adoptees. And, with my own family, I have added my great grandparents' surnames to the top of each column. It really helps to quickly see where people fit in!

  14. Are we supposed to use the estimated relationships generated by Ancestry or known/confirmed relatives? For ex, one of my AncestryDNA 2nd cousins is actually my 4th cousin.

    Replies
    1. I have been using Ancestry's estimated relationships, though I'm usually using this method with adoptees. If you know a specific person is really a 4th cousin, it would probably be best to leave them out. But, I think if you run it both ways, it will show it doesn't make much of a difference. But, I'd love to hear your results!

  15. Thank you Dana! This is a great tool.
    I'm trying to find out who my maternal grandfather was, and I only have 3 colours for 3rd cousin or closer matches, which I expected - I thought it would be unlikely I'd missed a match this close.
    I have many 4th cousin matches though, so I'm going to continue the system and assign the next 4th cousin without a color and go from there. Can you see any issues with me doing this? Any other additions on the spreadsheet you'd suggest? My maternal grandfather is unknown due to an affair, and he had Huntington's Disease.

    Replies
    1. Hi, Tonia. If you know the 3 color clusters are for your other 3 grandparents, I think there are two possibilities that I can see: one, that no one at a 2nd/3rd cousin level has tested, or two, maybe the maternal grandfather's family was actually related to one of your other lines and they are being mixed together. I'd suggest running the "Are Your Parents Related" tool on GEDmatch. And, I would continue looking down your 4th cousin matches until you find one - or several matches - who do not match any of your already formed clusters. Then, I would concentrate on working on those matches. Stay tuned for part 2... it might help you in your search! :)

  16. It just shows how much my grandmother's 2nd and 3rd cousins overlap. Still no clear picture for those who I can't figure out how they fit at all :(

    Replies
    1. Kim, If you share your results with me, I'll see if I can determine anything else from your matches.

  17. Greetings, Dana. Just a question. When I click on the shared matches of let's say one of my 3rd cousins, sometimes names appear which I have not seen before. I assume they are probably buried in the long list of 4th cousins somewhere. So, do I ignore them, add them to the original list of names, or go back and look for them in that list of 4th cousins and add them to the original list if/when I find them? Would I also do a shared match look on the newly discovered ones or just stick with looking at shared matches on the original 2nd/3rd cousin list? Thanks!

    Replies
    1. Hi, Chip. When you look at shared matches, they are still divided by Ancestry's predicted relationships (close family, 1st cousins, etc.) Just stick with those listed at the 2nd & 3rd cousin levels and there shouldn't be any extra names. Hope this helps!

  18. Hi Dana, I was only getting 2 columns so I went on to 4th Cousin matches. I put in the shared 4th cousin matches. Example - main person AB, one of their matches is CD and a shared match is EF. However when I search for EF in ABs list of matches on ancestry it's not to be found. Any idea why?
    Ps. Great work by the way

    Replies
    1. First of all, if you're only getting 2 clusters with 2nd & 3rd cousins, I would look to see how many cM your top matches are and consider dropping that down to 400 and seeing what effect it has. As far as why matches don't show up - it is because AB & CD are related to each other through different pieces of DNA than AB & EF. But, AB might have DNA in common with both CD & EF. Hope this helps!

  19. Hi Dana, Thank you so much for this. I ended up with 3 columns no overlap. I was able to sort them out to which great grandparents. I would like to find links to my gg grandparents. Do I just continue on and do the 4th cousin matches? I did my DNA quite a while ago and never knew what to do with it. Sad, I know, but this helps a lot. Baby steps for me. Thanks again.

    Replies
    1. Kelley, I'm so glad you found this helpful! :) I have not used it with 4th cousins yet, but some people in Blaine's GGT&T Facebook group have been trying it with mixed results. These matches are further out & share less DNA so it wouldn't be as simple and clear. But, I think you can make some progress with it. You might just add the highest 4th cousin matches & keep adding until the columns start getting too unorganized & you are losing the clear images. It likely would be more like 8 columns since you have 8 great grandparents.

    2. I am in the group so I'll check the feed. Thank You!

  20. Great method Dana. When you message your matches on Ancestry be sure and get their GEDmatch kit numbers. You can do this sort of color coding using GEDmatch Tier 1. Then when you look at the one to many page of unknown matches you can see if any of your color coded people show up and help you determine which line they are in for you. It can also help you find people who match on both sides of parental lines.

    Replies
    1. Great advice, Barbara!!! Thanks for this wonderful suggestion!

  21. Have just tried this chart and have 4 distinct columns - trouble is one of the columns I can't relate to a set of great grandparents as the 3 matches have an unknown parent (at different levels) - if I look at their 4th cousin matches they seem to have matches from both sides of my family. How on earth do I figure that one out?

    Replies
    1. Hi, Jo. Sometimes it is really clear. Sometimes it takes more analysis. Just a thought as you say this 4th column is matching on both sides of the family - have you tried GEDmatch's "Are Your Parents Related" tool? It doesn't even have to be a very close relationship, but it could "mess" with the charting.

  22. Hi Dana, I loved this approach :-) I extended my sort to include the entire first page of matches as I only have 11 2nd and 3rd cousins. I added three new columns beyond the first 4 for grandparents. The next two were for maternal undetermined and paternal undetermined. The last was completely undetermined. I was surprised that out of 50 matches I only had 3 completely undetermined, 5 maternal undetermined and 0 paternal undetermined. Super way to get a handle on Ancestry's frustrating muddle :-) Thanks for sharing.

    Replies
    1. Nice, Linda!!! Thanks for sharing. That's a great application!

  23. Would this work with FTDNA? And is each column a common surname to a few Christian names?

    Replies
    1. If you have enough close matches it would work with FTDNA. I would try starting with a range of 400 down to 90 cM of shared DNA. I haven't tried this yet, but please let me know how it works out! I just don't have as many matches on FTDNA.

  24. When you open the Shared Matches in Step 3, do you add 4th Cousins & beyond to your list or just stick to the 2nd & 3rd cousin range you already have? Thanks! I'm anxious to try this.

    Replies
    1. The particular adoptee I'm helping only has 8 2C & 3Cs, with only 3 color codes represented, so I'm thinking I'll need to look at 4th cousins as well, right? Just add to the original list? Thanks!

  25. I tried doing this using my dad's matches on Ancestry and he only has 3 3rd cousin matches and the rest are 4th cousins. When I click on the 3 3rd cousin matches to see the shared matches, they are all 4th cousin matches... Should I attempt to do this with 4th cousin matches?

    Replies
    1. I'm not sure it will work, but you can sure try! If so, you will likely be producing approximately 8 clusters instead. Please let me know how it works!

  26. I only have one (known) 2nd Cousin and 7 x 3rd Cousins (out of those 7, 2 are already placed in my tree). I have ended up with 6 colours with one column containing three matches (the already known ones). I only know of 3 sets of great grandparents (on my tree) as mother's father wasn't named on BC. Don't know what to do now.

    Replies
    1. I use this method to help adoptees and Unknown Parentage cases. You don't need to know how they are related to you. I just use everyone Ancestry assigns as a 2nd or 3rd cousin - NOT their actual relationships. And, depending on the result, I sometimes only use those under 400 shared cM. Have you put your DNA in "all of the ponds"? You can transfer for free to FamilyTreeDNA and MyHeritage. You could pay to be tested at 23andMe. And, GEDmatch is a 3rd party tool that is also helpful. You never know in which of these places the closest match might turn up! Also, I'd suggest joining the Facebook group, DNA Detectives, where volunteers offer suggestions for people looking for recent, unknown biological family.

  27. This is certainly a *DuH* moment for me. I know there will be lots of overlap in my matches (Can you say endogamy? Sure you can! 7 of my 8 greats are tied to somebody on the other side), but for hubby? This should be most excellent. He has a "normal" tree ... even if it does have gaping huge holes in it!

    Replies
    1. Susan, That sounds like a challenge! I have one of my 4 grandparents who has pedigree collapse. I really find it tough!!! "Normal" trees are so much easier to work with! :)

  28. Question: When going on to Step 4, do you add all matches? or only those below the name? Your last example showing lots of overlap seems to show you input all the shared matches of that particular match, both above and below their place in line. my last "empty" spot on my list also matches people above her, should I fill out the color for all previous matches or leave it as a single color? Thank you for sharing!

    Replies
    1. Good question, Heather! I do put ALL of the 2nd/3rd cousin matches, both above and below whatever person I'm working with.

  29. Thank you Dana, this is making things a little clearer. I am an adoptee fortunate to find my birth parents. In attempting to extend my tree I did my DNA and found my Grandfather was not my birth mother's father and indeed my GGreat is also not the named father. I used the above instructions and had 3 clear lines. My father's maternal and paternal matches were all matched together. I moved down to my first 4th cousin match and entered that with a new colour then checked the shared matches. This match enabled me to separate the ggrandmother's line from the ggrandfather's on the paternal line, giving me 4 distinct lines. Is this correct or have I totally misunderstood?

    Replies
    1. I am just starting to use this on 4th cousins, so this is a new area. But, it does sound like it worked in your case. Keep working and let me know what you figure out! :)

  30. Hi Dana, My question is are you using a Word Document? I know this is a really stupid question I just want to understand it correctly. Thank you for sharing, Maria

    Replies
    1. Hi, Maria. It isn't a stupid question! I am using Excel, but you could use a spreadsheet in Word. OR... you can even use a piece of paper with markers, crayons, or whatever. Whatever works for you!

    2. Great. Thank you so much for replying. 🤞🤞👼🙏💯

  31. Hi Dana, this is beautiful. I tried it for a man not related to me whose paternal grandfather's paternity is unknown. He only has eight 2nd-3rd cousins, and they group into five clusters with no overlap. Based on doing the genealogy, I can map three to his three known grandparents, the fourth to his paternal grandmother. The fifth I strongly believe is from his paternal grandfather (and if I incorporate 4th cousin matches, this fits). Now I'm seeing the same pattern (5 clusters, no overlap) for a cousin of mine. How regularly do you see more than 4 clusters with no overlap?

  32. Good question! Actually, I see 5 or 6 color clusters/columns sometimes. If there is NO overlap, like in your case, it is sometimes because your matches are sorting even "better" into great grandparents. Actually, I think of the 4 columns as 4 sets of great grandparents. But, if I see a 5th or 6th with no overlaps, I would look to see if it has broken down my tree even more for me. Best wishes and let me know if you figure this out!

  33. Hi Dana, I was excited to try this, but I can't seem to get anywhere. According to Ancestry I have 1 2nd cousin and 2 3rd cousins. Actually they are really 2 2C and a 2C1R. Another 2C2R is listed as a 4th cousin by Ancestry. I started with just what Ancestry says. My 2C's shared matches are all 4C or higher. Both of the 3C, according to Ancestry have no shared matches.
    I'm going to try adding my other known 2C2R, that Ancestry calls 4C and add the names of 4C from the shared matches for that cousin that has shared matches and see if it helps. Any suggestions? Thanks, Patsy

  34. Hi, Patsy. I'm not sure it will be helpful, but you might read my latest post about working with 4th cousins. Also, you might find some 2nd cousins and ask them to take DNA tests?

    Replies
    1. I'm reading that now. I'd done some of them this morning. What I did was to write down the cM shared, and who else each of my 2C's matches had as shared matches, then if they had a tree and locations I found in their tree. I just read your 2nd post and added the 4th great grandparents to them. Now I'll add the rest of my 4th cousins the same way I did this morning. Thanks!

    2. I just added all what Ancestry calls 4th cousins, and I actually found a 2nd cousin (listed as 4th!) We share great grandparents! She's only the 2nd cousin I have on my Mom's Dad's side.

  35. I manage most of our family's DNA tests and did as you suggested above for our English cousin, and I got all the first row matched except for two names, and they didn't match each other. What does that mean? In our close family, we know how we all match, so does that mean these first column matches are from ONE grandparent? Thanks

    Replies
    1. I'm not sure what you mean by "all the first row matched." Did you mean first column?

    2. yes sorry, must have been doing something else at the time as well ;)

    3. all the first column is the same color except two dna matches that don't match the first column nor each other.

  36. Thanks for a new tool. I found it through Blaine Bettinger's Facebook page. I filled in the chart with my one second and twenty-one third cousins. It helped with the people who didn't have a tree. I am looking for an unknown grandfather. I e-mailed the completed chart to my brother, who asked what he was looking at. After 45 minutes he said he was afraid he was starting to understand. I am the one doing the DNA "stuff". He did ask why the vast majority of "cousin" names were female. I believe most "home" genealogists are female. Any data? Again, thanks for the tool.

    Replies
    1. Hi, Robert. I'm glad it is helping! And, good point about females. No, I don't have any stats, but I would guess when I go to genealogy events we usually have 70-90% females. But, hopefully they are also testing male relatives!

  37. Thank you for sharing your method with everyone. Can you please tell me why you skip to the next blank person to start the new color? I am wondering about two matches that are related on the same line, but inherited different segments, so that while they match each other, they have different shared matches. I would appreciate your insight on this. Thank you

    Replies
    1. Hi, Kent. Sorry I missed your comment! I am skipping to the next blank person because it makes this procedure SO quick & visual. BUT, if you do EVERY person, you would see more about what segments they inherited - like you mentioned. I have yet to write about this, but I would call that Complete Color Clustering. I think you can use either method depending on what your goal is! Hope this helps, and thanks for the great question!

