This past April, an alleged serial rapist and murderer, who had remained unidentified for over 40 years, was located by comparing a crime scene DNA profile to a public genetic genealogy database designed to identify biological relatives and reconstruct family trees. The so-called "Golden State Killer" had not placed his own profile in the database.
Instead, a number of his distant genetic cousins had, resulting in partial matches between themselves and the forensic profile. Investigators then traced the shared heritage of the relatives to great-great-great-grandparents and, using these connections as well as other public records, narrowed their search to just a handful of individuals, one of whom was found to be an exact genetic match to the crime scene sample.
Forensic use of genetic genealogy data is possible thanks to widening public participation in direct-to-consumer recreational genetic testing. The Federal Bureau of Investigation maintains a national forensic genetic database (which currently contains over 16 million unique profiles, over-representing individuals of non-European ancestry); each profile holds genetic information from only 13 to 20 variable gene regions, just enough to identify a suspect. However, since this database and related forensic databases were established, the nature of genetic profiling has significantly changed: direct-to-consumer genetic tests routinely use whole genome scans involving simultaneous analysis of hundreds of thousands of variants.
With such comprehensive genetic information, it becomes possible to discern more distant genetic relatives. Thus, even though public DNA collections are smaller than most law enforcement databases, the potential to connect a crime scene sample to biological relatives is enhanced. The successful use of one genealogy database (GEDMatch) in the Golden State Killer (GSK) case demonstrates the power of the approach, so much so that genetic profiles from over 100 similar cold cases are now being run through the same resource. Indeed, in the two months since the GSK case was first reported, five other cold cases have been solved using similar methods.
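A rough back-of-the-envelope sketch, in Python, may help show why marker count matters here. It rests on a standard textbook approximation: full n-th cousins are expected to share about (1/2)^(2n+1) of their autosomal DNA. The function names and the figure of 600,000 variants are illustrative assumptions, not details of any real testing service or of the GSK investigation, and actual relative matching relies on long shared DNA segments rather than simple marker counts.

```python
def expected_shared_fraction(cousin_degree: int) -> float:
    """Expected fraction of autosomal DNA shared by full n-th cousins.

    Textbook approximation: (1/2) ** (2n + 1), so first cousins share
    about 12.5%. Real pairs vary widely around this expectation.
    """
    if cousin_degree < 1:
        raise ValueError("cousin_degree must be 1 or greater")
    return 0.5 ** (2 * cousin_degree + 1)


def expected_markers_in_shared_dna(cousin_degree: int, n_markers: int) -> float:
    """Expected number of tested markers that fall within shared DNA.

    Assumes, for illustration only, that markers are spread evenly
    across the genome.
    """
    return expected_shared_fraction(cousin_degree) * n_markers


if __name__ == "__main__":
    # Sharing great-great-great-grandparents makes two people fourth cousins.
    for label, n_markers in [("forensic profile", 20), ("consumer test", 600_000)]:
        expected = expected_markers_in_shared_dna(4, n_markers)
        print(f"{label}: {n_markers} markers, {expected:.2f} expected in shared DNA")
```

Under these assumptions, fourth cousins are expected to share roughly 0.2 percent of their DNA. That corresponds to far less than one of 20 forensic markers, but to more than a thousand of 600,000 consumer-test variants, which is why only the denser data can reveal such distant relationships.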
Autonomy in the Genomic Age
While few would disagree with the importance of finally bringing to justice those who commit serious violent offenses, this new forensic genetic application has sparked broad discussion of privacy-related and ethical concerns. Previously, the main genetic databases accessible to the police were those containing the profiles of accused or convicted criminals, but now the DNA of many more "innocent bystanders," across multiple generations, is in play.
The genetic services that provide a venue for data sharing typically warn participants that their information can be used for purposes beyond those they intend, but there is no legal prohibition on the use of crowd-sourced public collections for forensic investigation. Some services, such as GEDMatch, now explicitly welcome possible law enforcement use.
The implication is that consumers must choose for themselves whether they are willing to bring their genetic information into the public sphere. Many have no problem doing so, seeing value in law enforcement access to such data. But the decisions of individuals to contribute their own genetic information inadvertently expose many others across their family tree, who may not be aware of, or interested in, their genetic relationships going public.
As one well-known statistical geneticist who predicted forensic uses of public genetic data noted: "You are a beacon who illuminates 300 people around you." By the same token, 300 people, most of whom you do not know and have probably never met, can illuminate your genetic information; indeed a recent analysis has suggested that most in the U.S. are identifiable in this way. There is nothing that you can do about it, no way to opt out. Thus, police interaction with such databases must be addressed as a public policy issue, not left to the informed consent of individual consumers.
When Consent Will Not Suffice
For those concerned by the broader implications of such practices, the simplest solution might be to discourage open access sharing of detailed genetic information. But let's say that we are willing to continue to allow those with an interest in genealogy to make their data readily searchable. What safeguards should we implement to ensure that the family members who don't want to opt in, or who don't have the ability to make that choice, remain unharmed? Their autonomy counts, too.
We might consider regulation similar to the kind that limits law enforcement use of forensic genetic databases of convicted and arrested individuals. For example, in California, familial searches can only be performed using the database of convicted individuals in cases of serious crimes with public safety implications where all other investigatory methods have been exhausted, and where single-source high-quality DNA is available for analysis. Further, California policy separates the genealogical investigative team from local detectives, so as to minimize the impact of incidental findings (such as unexpected non-paternity).
No such regulations currently govern law enforcement searches of public genealogical databases, and we know relatively little about the specifics of the GSK investigation. We do not know the methods used to infer genetic relationships, or their likelihood of mistakenly suggesting a relationship where none exists. Nor do we know the level of genetic identity considered relevant for subsequent follow-up. It is also unclear how law enforcement investigators combined the genetic information they received with other public records data. Together, these gaps leave room for an unknown degree of investigation into an unknown number of individuals.
Why This Matters
What has been revealed is that the GSK search resulted in the identification of 10 to 20 potential distant genetic relatives, which led to the investigation of 25 different family trees, 24 of which did not contain the alleged serial rapist and murderer. While some sources describe a pool of 100 possible male suspects identified from this exercise, others imply that the total number of relatives encompassed by the investigation was far larger. One account, for example, suggests that there were roughly 1,000 family members in just the one branch of the genealogy that included the alleged perpetrator. Importantly, the individual apprehended was not the first, or even second, but the third person subjected to enhanced police scrutiny: reports describe at least two false leads, including one where a warrant was issued to obtain a DNA sample.
These details, many of which only came to light after intense press coverage, raise a host of concerns about the methods employed and the degree to which they exposed otherwise innocent individuals to harms associated with unjustified privacy intrusions. Only with greater transparency and oversight will we be able to ensure that the interests of people curious about their family tree do not unfairly impinge on those of their mostly law-abiding near and distant genetic relatives.