Background
Gramps has no native DNA data model. The existing workaround uses Person Attributes, Events, Associations, and Notes. This has three consequences: no standard schema, no purpose-built UI, and no reliable interoperability between addons.
Existing Addons
DNA Segment Map Gramplet
Visualises shared autosomal segments as coloured bands across chromosomes.
Data model: Associations of type "DNA" between people, with a Note containing segment data (Chromosome, Start, End, cM, SNPs) as structured text.
DNA Matches Gramplet
Uses the same Association + Note + Citation model as the DNA Segment Map Gramplet.
FamilyTreeDNA Gramplet (2024)
Imports and displays FTDNA-specific data for use with the DNA Segment Map gramplet.
Current De Facto Data Model
No official schema exists. Addons have converged on the following conventions.
Kit and identity data - Person Attributes
| Attribute key | Value |
| --- | --- |
| AncestryID | username |
| DNAkit | GEDmatch kit number |
| FTDNA kit | kit number |
| Y-DNA Hg | haplogroup (e.g. "I-L22*, confirmed") |
| mtDNA Hg | haplogroup |
Y-DNA and mtDNA test results - Event
Event type: "DNA Test" on the Person.
One Attribute per STR marker (DYS393 = 13, etc.).
HVR1, HVR2, Coding attributes for mtDNA mutations (comma-separated values).
Raw data files attached as Media.
Autosomal match data - Association
PersonRef between two people, relation type "DNA" (or "cM").
Attached Note: one segment per line as Chromosome, Start, End, cM, SNPs.
Attached Citations: one per testing provider.
Reciprocal records required (A -> B and B -> A).
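The Note convention above can be read with a small parser. The following is a minimal sketch, assuming plain comma-separated fields in the order Chromosome, Start, End, cM, SNPs. The names `Segment` and `parse_segment_note` and the sample values are hypothetical, and notes written by the real addons may contain header lines or extra columns that this sketch simply skips.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    chromosome: str
    start: int
    end: int
    cm: float
    snps: int


def parse_segment_note(text: str) -> List[Segment]:
    """Parse one segment per line: Chromosome, Start, End, cM, SNPs."""
    segments = []
    for line in text.splitlines():
        fields = [f.strip() for f in line.split(",")]
        if len(fields) != 5:
            continue  # skip blank or malformed lines
        chrom, start, end, cm, snps = fields
        try:
            segments.append(Segment(chrom, int(start), int(end), float(cm), int(snps)))
        except ValueError:
            continue  # skip header rows or non-numeric data
    return segments


# Hypothetical note text, for illustration only.
note = "7, 1000000, 9000000, 12.5, 2100\n12, 500000, 2500000, 6.1, 900"
big_on_7 = [s for s in parse_segment_note(note) if s.chromosome == "7" and s.cm > 10]
```

Any addon that wants to filter or sort segments has to repeat a parse like this on every read, because the Note is the only place the data lives.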
Limitations of current de facto model
No support for common ancestor research: a shared ancestor is a three-way relationship between two test takers and an ancestor, which Associations cannot represent.
No way to record an unidentified match. Associations link two Person records. If a match has not been identified in the tree, either a placeholder Person must be created (polluting the People database with non-people) or the match is simply not recorded. The vast majority of matches fall into this category.
Individual match records have no independent privacy or tagging. Privacy is inherited from the Person record. A sensitive match, such as an unknown biological relative or a non-paternity event, cannot be independently marked private or tagged without affecting the Person.
A Person Attribute has no provider context. The kit number is stored on the Person, but the format, the genome build, and the test type must all be inferred from naming conventions.
Segment coordinates have no associated genome build. GRCh37 and GRCh38 positions are not interchangeable.
Segment data cannot be queried without parsing. "Find all segments on chromosome 7 over 10 cM" requires re-parsing free text each time. No addon can filter or sort without reimplementing the parser.
Phase (maternal/paternal) is ephemeral. The Segment Map gramplet computes it from the tree at display time and discards it. It cannot be stored alongside the segment, corrected by the user, or used as a filter criterion.
More generally, filters and searches over DNA data cannot use any of the rich facilities that Gramps provides for navigating the database.
Proposed Data Model
DNA test data divides into two distinct things:
A kit / test - one person, one provider, one result set. It anchors many matches and carries structured result data.
A match - a pairwise relationship between two kits. Not owned by either person. A bilateral relationship that genealogists actively research, accumulate evidence for, and track to a common ancestor.
DNATest (primary object)
Represents a single test kit for a single person at a single provider.
| Field | Type | Notes |
| --- | --- | --- |
| handle | | |
| gramps_id | | |
| change | | |
| private | | |
| tag_list | | |
| person_handle | | |
| account_name | | |
| provider | | |
| kit_id | | Provider-assigned kit identifier (e.g. kit number on FTDNA, alphanumeric code on GEDmatch) |
| test_type | DNATestType | Autosomal, Y-DNA 12, Y-DNA 37, Y-DNA 67, Y-DNA 111, Big Y, mtDNA HVR1, mtDNA Full |
| genome_build | DNAGenomeBuildType | Coordinate system used by this kit's provider: GRCh37, GRCh38, or unknown |
| date | DateBase | When the test was taken or results received |
| haplogroup | str | e.g. "R-L21", "H1b"; populated for Y-DNA and mtDNA |
| attribute_list | AttrBase | Y-STR markers, mtDNA mutations, and other key-value metadata |
| citation_list | CitationBase | |
| note_list | NoteBase | |
| media_list | MediaBase | Raw data files (CSV, BAM, etc.) |
Marker and mutation storage
Y-STR markers and mtDNA mutations are stored as CUSTOM attributes in attribute_list. The marker or locus name becomes the type string and the allele or mutation value becomes the attribute value:
| Attribute type (CUSTOM string) | Attribute value |
| --- | --- |
| DYS393 | 13 |
| DYS390 | 24 |
| HVR1 | 16069T, 16126C |
| Coding | 315.1C |
Multi-copy Y-STR markers store both allele values as a comma-separated string, following the convention used by the existing DNA Gramplet addon.
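Reading these attribute values back is a simple split. The sketch below assumes the comma-separated convention described above. The helper name `split_marker_value` is hypothetical, and the DYS385 entry is an illustrative multi-copy marker with made-up values, not taken from the proposal.

```python
def split_marker_value(value: str) -> list:
    """Split a comma-separated attribute value into its individual entries."""
    return [part.strip() for part in value.split(",") if part.strip()]


# Attribute type -> attribute value.
# DYS385 is an illustrative multi-copy marker; its value is hypothetical.
attributes = {
    "DYS393": "13",
    "DYS385": "11,14",
    "HVR1": "16069T, 16126C",
}

alleles = {name: split_marker_value(value) for name, value in attributes.items()}
```

Values are kept as strings here because mtDNA mutations such as 16069T are not numeric.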
DNAMatch (primary object)
Represents a pairwise DNA match between two kits.
One side is the subject's DNATest (the kit whose match list this came from). The other side always has a DNATest record. The genealogical research task is assigning Person records to DNATest records; once done, the match is fully identified without any changes to DNAMatch itself.
| Field | Type | Notes |
| --- | --- | --- |
| handle | str | Internal DB key |
| gramps_id | str | Prefix "M%05d" -> M00001 |
| change | timestamp | |
| private | bool | |
| tag_list | list | |
| subject_test_handle | DNATest ref | The subject's DNA Test that generated this match |
| match_test_handle | DNATest ref | The matching DNA Test |
| shared_cm | float | Total shared centimorgans |
| percent_shared | float | Percentage of genome shared |
| segment_count | int | Number of shared segments |
| largest_segment_cm | float | Largest segment in cM |
| predicted_relationship | str | Platform's predicted relationship label |
| predicted_generations | float | Platform's estimated generations to MRCA |
| shared_ancestor_list | list of SharedAncestor | See below |
| segment_list | list of DNASegment | See below |
| attribute_list | AttrBase | |
| citation_list | CitationBase | |
| note_list | NoteBase | |
| media_list | MediaBase | |
DNASegment (secondary object of DNAMatch)
| Field | Type | Notes |
| --- | --- | --- |
| chromosome | str | "1"-"22", "X", "Y", "MT" |
| start_bp | int | Start position in base pairs |
| end_bp | int | End position in base pairs |
| start_rsid | str | RSID of the first SNP in the segment; nullable |
| end_rsid | str | RSID of the last SNP in the segment; nullable |
| shared_cm | float | cM for this segment |
| snp_count | int | Number of SNPs compared in this segment |
| phase | int | Class constants: PHASE_UNASSIGNED=0, PHASE_UNKNOWN=1, PHASE_MATERNAL=2, PHASE_PATERNAL=3 |
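With segments held as structured fields, a query such as "all segments on chromosome 7 over 10 cM" becomes a plain filter. The sketch below models DNASegment as a standalone Python dataclass for illustration only. It is not the prototype's class, the helper `segments_on` is hypothetical, and the first two sample segments are invented (the third reuses a row from the MyHeritage example in this proposal).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DNASegment:
    # Phase constants as listed in the field table.
    PHASE_UNASSIGNED = 0
    PHASE_UNKNOWN = 1
    PHASE_MATERNAL = 2
    PHASE_PATERNAL = 3

    chromosome: str
    start_bp: int
    end_bp: int
    shared_cm: float
    snp_count: int
    start_rsid: Optional[str] = None
    end_rsid: Optional[str] = None
    phase: int = PHASE_UNASSIGNED


def segments_on(segments, chromosome, min_cm):
    """Return segments on one chromosome that exceed a cM threshold."""
    return [s for s in segments if s.chromosome == chromosome and s.shared_cm > min_cm]


segments = [
    DNASegment("7", 1_000_000, 9_000_000, 12.5, 2100, phase=DNASegment.PHASE_MATERNAL),
    DNASegment("7", 20_000_000, 23_000_000, 7.0, 800),
    DNASegment("12", 106436777, 114410428, 7.2, 3712, "rs61940195", "rs999444"),
]
hits = segments_on(segments, "7", 10.0)
```

Because phase is a stored field, it can be corrected by the user and used in the same kind of filter.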
SharedAncestor (secondary object of DNAMatch)
The core research task on a match is identifying the individual or individuals through whom the shared DNA passes. Each SharedAncestor entry names one person. Multiple entries represent parallel working hypotheses, independently confirmed connections through separate lines, or both members of an ancestral couple - in which case one entry is added per person.
| Field | Type | Notes |
| --- | --- | --- |
| person_handle | Person ref | Nullable; the specific individual who is the MRCA |
| description | str | Free text for ancestors not yet in the tree |
| confidence | int | Class constants: CONF_POSSIBLE=0, CONF_PROBABLE=1, CONF_CONFIRMED=2, CONF_REJECTED=3 |
| citation_list | CitationBase | Documentary evidence for the connection |
| note_list | NoteBase | |
person_handle is nullable to allow recording a hypothesis before the person has been added to the tree. description carries the free-text description in that case and can supplement a linked person with additional context.
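The interplay between a nullable person_handle, the free-text description, and the confidence constants can be sketched as follows. This is an illustrative standalone dataclass, not the prototype's implementation. The `label` and `open_hypotheses` helpers, the handles, and the sample description are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SharedAncestor:
    # Confidence constants as listed in the field table.
    CONF_POSSIBLE = 0
    CONF_PROBABLE = 1
    CONF_CONFIRMED = 2
    CONF_REJECTED = 3

    person_handle: Optional[str] = None
    description: str = ""
    confidence: int = CONF_POSSIBLE

    def label(self) -> str:
        """Prefer the linked person; fall back to the free-text description."""
        if self.person_handle is not None:
            return f"person:{self.person_handle}"
        return self.description or "(undescribed)"


def open_hypotheses(ancestors):
    """Return entries that have not been rejected."""
    return [a for a in ancestors if a.confidence != SharedAncestor.CONF_REJECTED]


ancestors = [
    # Both members of an ancestral couple: one entry per person.
    SharedAncestor("p1", confidence=SharedAncestor.CONF_CONFIRMED),
    SharedAncestor("p2", confidence=SharedAncestor.CONF_CONFIRMED),
    # A hypothesis about someone not yet in the tree.
    SharedAncestor(description="Unknown Smith, b. ca. 1820"),
    SharedAncestor("p9", confidence=SharedAncestor.CONF_REJECTED),
]
labels = [a.label() for a in open_hypotheses(ancestors)]
```

Keeping rejected entries in the list, rather than deleting them, preserves a record of hypotheses that have already been ruled out.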
Provider export examples
GEDmatch match list export. Kit codes and names are anonymised. Email addresses are PII and are discarded on import. LargestXSeg and TotalXCM are X-DNA summaries not stored in the model; X segments appear in segment_list when chromosome browser data is fetched separately. Overlap is a GEDmatch-specific SNP overlap count with no equivalent in the schema and is discarded. TestCompany maps to the match DNATest's provider. Gen maps to predicted_generations.
MyHeritage chromosome browser export. Start RSID and End RSID map to start_rsid and end_rsid.
| Name | Match Name | Chromosome | Start Location | End Location | Start RSID | End RSID | Centimorgans | SNPs |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Jane Brown | John Smith | 12 | 106436777 | 114410428 | rs61940195 | rs999444 | 7.2 | 3712 |
| Jane Brown | John Smith | 15 | 29582647 | 33301508 | rs11071875 | rs1375933 | 7.2 | 1664 |
| Jane Brown | John Smith | 16 | 55880825 | 59662867 | rs11076120 | rs62059121 | 6.3 | 2432 |
| Jane Brown | John Smith | 18 | 10703200 | 25002778 | rs16975229 | rs2879447 | 14.6 | 5632 |
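The MyHeritage rows above map onto DNASegment fields column by column. The following is a minimal sketch, assuming the export is a comma-delimited file with exactly these column names. The `COLUMN_MAP` table and `rows_to_segments` function are hypothetical and are not part of any existing importer.

```python
import csv
import io

# Column name in the export -> (field name in the proposed DNASegment, converter).
COLUMN_MAP = {
    "Chromosome": ("chromosome", str),
    "Start Location": ("start_bp", int),
    "End Location": ("end_bp", int),
    "Start RSID": ("start_rsid", str),
    "End RSID": ("end_rsid", str),
    "Centimorgans": ("shared_cm", float),
    "SNPs": ("snp_count", int),
}


def rows_to_segments(csv_text):
    """Convert export rows into dicts keyed by DNASegment field names."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {field: convert(row[column]) for column, (field, convert) in COLUMN_MAP.items()}
        for row in reader
    ]


sample = (
    "Name,Match Name,Chromosome,Start Location,End Location,"
    "Start RSID,End RSID,Centimorgans,SNPs\n"
    "Jane Brown,John Smith,12,106436777,114410428,rs61940195,rs999444,7.2,3712\n"
    "Jane Brown,John Smith,18,10703200,25002778,rs16975229,rs2879447,14.6,5632\n"
)
segments = rows_to_segments(sample)
```

The Name and Match Name columns are not copied onto the segment; in the proposed model they identify the two DNATest records that the owning DNAMatch refers to.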
GEDmatch segment data. The "B37" prefix in the column headers declares the genome build explicitly, so genome_build can be set to GRCh37 without user input. The Segment threshold, Bunch limit, and SNP Density Ratio columns are GEDmatch algorithm parameters with no equivalent in the schema and are discarded on import.
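Detecting the build from a header prefix could look like the sketch below. The function `detect_genome_build` is hypothetical, and the header strings in the usage lines are assumed examples of a "B37"-prefixed column rather than a verified copy of a GEDmatch export. Prefixes other than B37 and B38 fall back to unknown.

```python
import re

# Build-number prefix -> genome_build value used in the proposed model.
BUILD_BY_NUMBER = {"37": "GRCh37", "38": "GRCh38"}


def detect_genome_build(headers):
    """Return the genome build implied by a 'Bnn' column-header prefix."""
    for header in headers:
        match = re.match(r"B(\d{2})\b", header.strip())
        if match:
            return BUILD_BY_NUMBER.get(match.group(1), "unknown")
    return "unknown"


# Assumed header names, for illustration only.
build = detect_genome_build(["Chr", "B37 Start Pos'n", "B37 End Pos'n", "SNPs"])
fallback = detect_genome_build(["Chr", "Start", "End"])
```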
Design Notes
Kit anchoring. Both sides of a match are represented by DNATest records. Importing a match list creates one DNATest per match (with account_name from the platform and person_handle null). Identifying a match means setting DNATest.person_handle on that record; the DNAMatch itself does not change. A person with kits at multiple providers has independent match lists per kit. A match from AncestryDNA and a match from 23andMe are separate DNAMatch records even if the other side is the same person.
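The kit anchoring flow can be sketched with two small stand-in classes. This is an illustration under stated assumptions, not the prototype's code. The classes carry only the fields needed here, `import_match_list` and the handle format are hypothetical, and the sketch assumes the match kit shares the subject's provider (a GEDmatch import would take it from TestCompany instead).

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class DNATest:
    handle: str
    provider: str
    account_name: str = ""
    person_handle: Optional[str] = None  # None until the match is identified


@dataclass
class DNAMatch:
    subject_test_handle: str
    match_test_handle: str
    shared_cm: float = 0.0


def import_match_list(subject, rows, tests, matches):
    """Create one unidentified DNATest and one DNAMatch per imported row."""
    for account_name, shared_cm in rows:
        handle = f"T{len(tests) + 1:04d}"
        tests[handle] = DNATest(handle, subject.provider, account_name)
        matches.append(DNAMatch(subject.handle, handle, shared_cm))


tests = {"T0001": DNATest("T0001", "MyHeritage", "Jane Brown", "P0001")}
matches = []
import_match_list(tests["T0001"], [("John Smith", 35.3)], tests, matches)

snapshot = asdict(matches[0])
# Identifying the match touches only the DNATest record.
tests["T0002"].person_handle = "P0042"
match_unchanged = asdict(matches[0]) == snapshot
```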
Haplogroup propagation. Haplogroup is stored on DNATest as the raw result reported by the provider. Confirmation, triangulation, and inference from tested relatives are derived analyses outside the scope of the data model; a future plugin could compute and display these without writing to the database.
Relationship display. No confirmed relationship field is stored. Once both DNATest records have person_handle set, the relationship is computable from the tree using Gramps' existing relationship calculator. A single stored string cannot represent multiple lines of descent (endogamy, pedigree collapse). The UI displays computed relationship(s) on demand. predicted_relationship is retained as a raw record of what the provider platform reported.
Genome build. Stored on DNATest.genome_build. The build is a property of the kit's provider platform (e.g. MyHeritage uses GRCh37, FTDNA uses GRCh38). Segment coordinates in any match are implicitly in the build of subject_test_handle.genome_build. GEDmatch issues separate kit codes per build; the kit code determines which build applies.
Gramps XML Format
New top-level sections
<dnatests> and <dnamatches> are added inside <database>, after <notes> and before <bookmarks>, following the same order convention as all other primary object types.
DNATest element
<person> is omitted when person_handle is null. <provider>, <test_type>, and <genome_build> use xml_str() and set_from_xml_str(), the same pattern as EventType. An unrecognised string in any of these fields round-trips as CUSTOM. <kit_id>, <haplogroup>, and <account_name> are omitted when empty.
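The CUSTOM round-trip behaviour can be illustrated with a standalone imitation of the type pattern. The class below is not the Gramps base class and not the prototype's DNAGenomeBuildType. Its integer codes and name table are assumptions made for this sketch, which only mirrors the xml_str() and set_from_xml_str() behaviour described above.

```python
# Hypothetical integer codes and names for this sketch only.
UNKNOWN, CUSTOM, GRCH37, GRCH38 = -1, 0, 1, 2
NAMES = {UNKNOWN: "Unknown", GRCH37: "GRCh37", GRCH38: "GRCh38"}
CODES = {name: code for code, name in NAMES.items()}


class GenomeBuildType:
    """Standalone imitation of the Gramps type pattern; not the real class."""

    def __init__(self):
        self.value = UNKNOWN
        self.string = ""

    def set_from_xml_str(self, text):
        """Map a known name to its code; keep anything else as CUSTOM."""
        if text in CODES:
            self.value, self.string = CODES[text], ""
        else:
            self.value, self.string = CUSTOM, text

    def xml_str(self):
        """Return the string that would be written to the XML file."""
        if self.value == CUSTOM:
            return self.string
        return NAMES[self.value]


known = GenomeBuildType()
known.set_from_xml_str("GRCh38")

custom = GenomeBuildType()
custom.set_from_xml_str("T2T-CHM13")  # a build name this sketch does not list
```

An unrecognised build name therefore survives an export and re-import unchanged, even though the application has no predefined constant for it.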
DNAMatch element
<person> within <shared_ancestor> is omitted when person_handle is null. confidence and phase are stored as their integer class constant values. start_rsid and end_rsid on <dna_segment> are omitted when null. The numeric stat fields (shared_cm, percent_shared, etc.) are omitted when zero.
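The omission rules can be sketched with the standard library's ElementTree. The element name `dnamatch` and the choice to write the stat fields as XML attributes are assumptions for this sketch, and `match_to_xml` is a hypothetical helper. Only `<dna_segment>`, the field names, and the omit-when-zero and omit-when-null rules come from the text above.

```python
import xml.etree.ElementTree as ET

STAT_FIELDS = ("shared_cm", "percent_shared", "segment_count", "largest_segment_cm")


def match_to_xml(stats, segments):
    """Serialise a match, leaving out zero stats and null RSIDs."""
    elem = ET.Element("dnamatch")  # assumed element name
    for name in STAT_FIELDS:
        value = stats.get(name, 0)
        if value:  # zero is treated as "not recorded" and is left out
            elem.set(name, str(value))
    for seg in segments:
        seg_elem = ET.SubElement(elem, "dna_segment", chromosome=seg["chromosome"])
        for name in ("start_rsid", "end_rsid"):
            if seg.get(name) is not None:
                seg_elem.set(name, seg[name])
    return elem


xml_text = ET.tostring(
    match_to_xml(
        {"shared_cm": 35.3, "percent_shared": 0.0},
        [{"chromosome": "12", "start_rsid": "rs61940195", "end_rsid": None}],
    ),
    encoding="unicode",
)
```

One consequence of this rule is that a reader cannot distinguish a genuinely zero value from an unrecorded one, which is acceptable for these statistics because a recorded match always shares some DNA.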
Bookmarks
The existing <bookmarks> section gains two new target values, one for DNA tests and one for DNA matches.
GEDCOM
GEDCOM export and import are out of scope. The GEDCOM community is actively discussing DNA extensions as of 2024 but no ratified standard exists. GEDCOM support should be revisited once a stable extension proposal is available.
This is a proposal of a new data model for maintaining DNA test and match data with Gramps.
Note: I maintain a working prototype of this design on the dna-core branch in my fork of gramps (comparison with current gramps)
User Interface Mockups
Screenshots are taken from my working prototype on the dna-core branch.
DNA Tests Navigator View
DNA Test Edit Form
DNA Matches Navigator View
DNA Match Edit Form
Shared Ancestor Edit Form
DNA Segment Edit Form