Wednesday, August 8, 2018

Color Clustering: Identifying "In Common" Surnames

Please see an updated version of this post, and more on the Leeds Method of DNA Color Clustering, on my new website, www.danaleeds.com.

After creating Color Clusters using the new Color Cluster Method (aka Leeds Method), the next step is to identify the surnames associated with these groups. (For creating Color Clusters, please read my original Color Clustering post.)

Note: This method is especially useful for people working with adoptees or other unknown-parentage cases, where they do not already know which surnames to concentrate on!

COLOR CLUSTERS: Identifying Common Surnames

STEP 1: Create Color Clusters and determine which clusters you need to work with (or work with all of them).
Actual data from an adoptee I worked with, but names changed for privacy.
In this case, the adoptee identified the Blue Cluster as belonging to her biological mother's side. We were trying to identify her biological father, so we concentrated on the Orange and Yellow Clusters. (The Green column did not have a cluster.)

STEP 2: Determine which matches have trees and which do not, and label them.

I look at each match to see whether they have a tree, attached or unattached, and then label them "tree" or "no tree."

STEP 3: List the "4th Gen" (great-grandparent) surnames for each match with a tree. If a match's tree does not reach the 4th Generation, use the grandparents' or even the parents' surnames.

To find the surnames, open the match's "pedigree and surnames" page and look at the surnames under the "4th Gen" column. If their tree is complete enough, you will see 8 surnames at this level, one for each of the match's great-grandparents. In this example, both Gabby and Jamie have all 8 great-grandparents listed on their trees along with their surnames.
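
If you keep your match notes in a script rather than on paper or in a spreadsheet, Steps 2 and 3 can be sketched in Python as below. This is only an illustration under stated assumptions: the method itself needs no software, the `Match` record, its field names, and the `step3_surnames` helper are hypothetical, and the generation numbering (2 = parents, 3 = grandparents, 4 = great-grandparents, the "4th Gen" column) is a convention chosen for this sketch. Surnames are assumed to be copied by hand from each match's tree.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Match:
    """Hypothetical record for one DNA match; the field names are illustrative."""
    name: str
    cluster: str  # e.g. "Orange" or "Yellow"
    # Surnames copied by hand from the match's tree, keyed by generation:
    # 2 = parents, 3 = grandparents, 4 = great-grandparents ("4th Gen").
    # An empty dict means no tree (attached or unattached) was found.
    surnames_by_gen: dict[int, list[str]] = field(default_factory=dict)

    @property
    def tree_label(self) -> str:
        """STEP 2: label the match "tree" or "no tree"."""
        return "tree" if self.surnames_by_gen else "no tree"


def step3_surnames(match: Match) -> list[str]:
    """STEP 3: use 4th Gen surnames if present, else grandparents, else parents."""
    for gen in (4, 3, 2):
        surnames = match.surnames_by_gen.get(gen)
        if surnames:
            return surnames
    return []  # no tree, or a tree too sparse to use
```

For example, a hypothetical match whose tree only reaches the grandparents, `Match("Match A", "Orange", {3: ["Smith", "Jones"]})`, is labeled "tree" and contributes its grandparents' surnames, `["Smith", "Jones"]`.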

STEP 4: Identify common surnames, if any, in each Color Cluster.

(I find this step truly amazing!) I have highlighted the shared surnames:
  • Orange Cluster: Griffin & Bartles
  • Yellow Cluster: Paulson, Austin, and Gray
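
The comparison in Step 4 amounts to counting how many matches in a cluster list each surname. Here is a minimal Python sketch, assuming each cluster's Step 3 surname lists have already been typed in by hand; the function name `shared_surnames` and the default threshold of two matches are hypothetical choices for this illustration, not part of the method as described.

```python
from __future__ import annotations

from collections import Counter


def shared_surnames(clusters: dict[str, list[list[str]]],
                    min_matches: int = 2) -> dict[str, list[str]]:
    """STEP 4 sketch: for each cluster, list the surnames found in the
    Step 3 surname lists of at least `min_matches` different matches."""
    result = {}
    for cluster, surname_lists in clusters.items():
        counts = Counter()
        for surnames in surname_lists:
            # A set counts each surname once per match; strip/title
            # merges stray spaces and capitalization differences only.
            counts.update({s.strip().title() for s in surnames})
        result[cluster] = sorted(n for n, c in counts.items() if c >= min_matches)
    return result
```

For instance, if two of a cluster's hypothetical lists both contain Griffin and Bartles and no other surname repeats, that cluster's entry is `["Bartles", "Griffin"]`. Counting each surname once per match keeps a name that appears twice in one match's tree from looking shared. Spelling variants (Paulson and Polson, say) are not merged, so they still need a human eye.
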

STEP 5: Assign potential surnames to the Color Clusters, if identified, and use these clues to further your research!
At this point, you have clues as to what surnames you are looking for in each cluster. Continue your research using these clues!

You might also be able to use first cousins or other "close family" matches to help label these clusters. (And, a big thank you to John Motzi for his help in refining this process!)

Happy Clustering!

24 comments:

  1. Thank you! This looks fairly straightforward to do. I have some families that I would like to try it with once DNA testing is done.

    1. Thank you, Clorinda! I would love to hear about your results!

  2. Excellent explanation of how to use your system to identify ancestors, this will help so many people! Great work!

    1. Thanks, Connie! As I'm sure you realize, this can be especially helpful with adoptees who do not know what surnames they are looking for!

  3. Dana, I especially like your reminder that other people's trees are *clues* and not *facts*! So many trees are incomplete or downright wrong, which means we have to confirm anything and everything on someone else's tree. I have your first blog post printed out so I can follow along as I try this new method. TY!

    1. Marian, other people's trees, and even the clusters we are creating, are clues, not facts! DNA cannot stand alone except for direct family like parent/child or siblings. Let me know how it works for you!

  4. This is a fabulous process. I'm finally able to make sense of all those DNA matches. I've been able to make progress on my most troublesome line. Thanks!

    1. Beth, YAY!!! That's great to hear! Thank you for sharing!

  5. I'm working with a distant cousin trying to identify her bio dad. There are only a few matches with trees, a couple just have a few people on their trees. Would it be useful to try to make private mirror trees to possibly fill out the missing 4th gens? Of course just to generate clues to check out, not as facts.

    1. Hi, Laurie. I know a lot of people working with adoptees use mirror trees. I never have. So, if it helps, you can give it a try!

    2. Thanks Dana, love the visual-ness of this method!

  6. Hi Dana, this is exciting and I'm sure this is going to help me a lot.

    But I'd like to point something out, because I am Captain Obvious, and if it helps someone else, yay!

    My Mom's very first 2nd cousin match is a test that is managed by someone else (I know who they both are). When I pulled up his tree to grab those surnames, I went what the what? There was one person showing on that tab (the tester), yet it says there are over a thousand people in that tree. I went to view the full tree and the answer is clear. The test manager is the brother-in-law of the tester. The tree is the test manager's family, not the tester's. Since the tester is an in-law, he only got his rightful place in the tree, but none of his own family.

    I don't know how common this might be, but thank goodness I already had the tester in my offline database and had enough info on the in-law to figure it out.

    So if you're seeing weird things in your matches' trees that are managed by others, this might be one reason.

    On to the rest of them!

    1. Robin, thank you for sharing this example! I don't think I've come across this yet! Or, if I did, maybe it was just more obvious. Anyway, it is definitely something to watch out for!

  7. Having fun playing with this method, thanks. We don't have more than a couple of 3rd cousins, so I'm using the top 100-200 3rd-4th cousins. One thing I've done is to colour the font for female names red in my list (so many have initials etc.) and add '(mg name...)' if the kit is managed by someone, as there are often a few managed by the same person.

    1. I like your idea about changing the female names to red. I actually do add "by x" on mine, too. It can be very helpful! I should add that to a blog post. :) Thanks for sharing!

  8. Hi Dana,

    I happened across this blog the other day and have spent the last 3 full days color clustering!!

    I am adopted and used 23andMe for my DNA. Although a few matches are listed as 2nd cousins, I had MANY 3rd and 4th cousins. Because I have almost no information, I created a 10-column chart, hoping to see a pattern, and so far I have clusters in 9 of them. After reading all of the comments on the blog, I am not sure having 9 columns containing clusters makes sense. I also have a good amount of overlapping. I have surnames for 8 of the 10 columns, too.

    Should I keep plugging along??

    1. Hi, Susan. I think it mainly depends on how many "closer" cousins have tested. I generally use those with shared centimorgans of 90 to 400. But, you can go higher than 400 - it will just sometimes make two or more columns combine. After building the chart down to 90 cM, I just continue to add people but NOT create new columns. See my latest post for more information. I hope it helps! :)

    2. Hi Dana,
      I think my issue is that with 23andMe, the amount of cM is not listed. I only had 3 "Second to Fourth" and multiple "Third to Fourth" cousin matches. The percentage of DNA shared with me in the "Second to Fourth" group ranges from 1.44% to 0.81%. All of the cousins in both groups are then predicted to be 3rd cousins. So, what I did was work the chart with the 10 closest cousins in order of relationship strength and plug them into your method of clustering.
      So, I guess I have 2 questions. 1. I did find significant clustering in my last column (#10), but are you saying I shouldn't have gone out that many columns?
      2. Most importantly, could you please take a look at my chart to see what you think??
      I could really use your help.

      Susan

    3. Correction: I just found the shared DNA tool. The cousins I listed on my chart share between 74 cM and 107 cM.

    4. Susan, Great! I was going to point you towards that tool. So, you might try creating clusters with only those cousins that are 90 to 107 cM. Hopefully there are a handful of those. And then add those down to 74 cM in the SAME columns. I think most of the kits I have looked at just don’t have enough cousins to work with on 23andMe. That’s why I’ve been using this method with AncestryDNA. If you haven’t tested there, too, I’d recommend it!

    5. Hi Dana.
      I only had 2 cousins in the 90 to 107 cM range. So I created those two columns and found 14 clusters in the 1st column, for the cousin with 107 cM. The range ran from 107 cM down to 24 cM. Is 14 a significant number of clusters??

      In the 2nd column, with the match at 92 cM, I only formed 2 clusters, with the range being 92 cM to 42 cM.

      Because I had only the two, I created a 3rd and 4th column for the next highest cM matches that didn't fall into the first two columns. In column #3, with a match of 76 cM, there were 7 clusters within a range of 76 to 24 cM. In the 4th column, with a match at 74 cM, I found 10 clusters within the range of 74 to 24 cM.

      I am not sure what to do with this information at this point. Do I even have enough clustering to go any further??

      I have entered 480 of the 980 cousins that matched me on 23andMe, just to see the matches more clearly. I uploaded the data to GEDmatch, but the highest match there was only 64 cM, so I ordered an Ancestry test and it's on its way.

    6. Hi! I have not worked with only 2 cousins in that range before, and am not sure if it would work. I have only been using AncestryDNA because I, and most people, have so many more matches there. Glad you ordered a test! I hope you have a lot more matches there and can use this method. I will continue to investigate how this method can be used with fewer matches, but right now I don't know how it would work.

  9. Hello. I am working on my husband's DNA line (he was adopted). This tool is great! However, here is what happened for me. I ended up with 7 clusters; the first four were all related to the paternal tree (I know of at least 2 second-cousin marriages), and two of the clusters I identified as the maternal clusters. I am pretty sure of that because I do know my husband's biological mother, and there were people in those trees that matched her tree. I have one cluster of two people that I am not sure about.

    Here is what is odd. My second and third cousins fell neatly into the above clusters, but when I started looking at 4th cousins I started having crossover into the maternal cluster! What would that indicate? I am thinking it might be an earlier marriage between the two lines? But why wouldn't it show up until the 4th cousin check?

    Thanks again for this wonderful tool! It has helped me a lot.

    1. Hi, pruble. I think it is likely that these 4th cousins are matching two or more sides of your family because of THEIR ancestors. Also, sometimes I have one or two people left over who don't have a tree and aren't matching others (like your cluster of two), and I'm not sure where they belong. Hopefully, by working with 4th cousins, or after more matches show up, you'll be able to identify how they fit in the family.

