Genomic Data Has a Diversity Problem, But Global Efforts Are Underway to Fix It

Genetic data sets skew too European, threatening to narrow who will benefit from future advances.
Genomics has begun its golden age. Just 20 years ago, sequencing a single genome cost nearly $3 billion and took over a decade. Today, the same feat can be achieved for a few hundred dollars and the better part of a day. Suddenly, the prospect of sequencing not just individuals, but whole populations, has become feasible.
The genetic differences between humans may seem meager, only around 0.1 percent of the genome on average, but this variation can have profound effects on an individual's risk of disease, responsiveness to medication, and even the dosage level that would work best.
Already, initiatives like the U.K.'s 100,000 Genomes Project - now expanding to 1 million genomes - and similarly massive sequencing projects in Iceland and the U.S. have begun collecting population-scale data in order to capture and study this variation.
The resulting data sets are immensely valuable to researchers and drug developers working to design new 'precision' medicines and diagnostics, and to gain insights that may benefit patients. Yet, because the majority of this data comes from developed countries with well-established scientific and medical infrastructure, the data collected so far is heavily biased towards Western populations with largely European ancestry.
This presents a startling and fast-emerging problem: groups that are under-represented in these datasets are likely to benefit less from the new wave of therapeutics, diagnostics, and insights, simply because they were tailored for the genetic profiles of people with European ancestry.
We may indeed be approaching a golden age of genomics-enabled precision medicine. But if the data bias persists then there is a risk, as with most golden ages throughout history, that the benefits will not be equally accessible to all, and existing inequalities will only be exacerbated.
To remedy the situation, a number of initiatives have sprung up to sequence genomes of under-represented groups, adding them to the datasets and ensuring that they too will benefit from the rapidly unfolding genomic revolution.
Global Gene Corp
The idea behind Global Gene Corp was born eight years ago at Harvard when Sumit Jamuar, co-founder and CEO, met up with his two other co-founders, both experienced geneticists, for a coffee.
"They were discussing the limitless applications of understanding your genetic code," said Jamuar, a business executive from New Delhi.
"And so, being a technology enthusiast type, I was excited and I turned to them and said hey, this is incredible! Could you sequence me and give me some insights? And they actually just turned around and said no, because it's not going to be useful for you - there's not enough reference for what a good Sumit looks like."
What started as a curiosity-driven conversation on the power of genomics ended with a commitment to tackle one of the field's biggest roadblocks - its lack of global representation.
Jamuar set out to begin with India, which has about 20 percent of the world's population, including over 4,000 different ethnicities, but contributes less than 2 percent of genomic data, he told Leaps.org.
Eight years later, Global Gene Corp's sequencing initiative is well underway, and is the largest in the history of the Indian subcontinent. The program is being carried out in collaboration with biotech giant Regeneron, with support from the Indian government, local communities, and the Indian healthcare ecosystem. In August 2020, Global Gene Corp's work was recognized through the $1 million 2020 Roddenberry award for organizations that advance the vision of 'Star Trek' creator Gene Roddenberry to better humanity.
Global Gene Corp also focuses on developing and implementing AI and machine learning tools to make sense of the deluge of genomic data. These tools are increasingly used by both industry and academia to guide future research by identifying particularly promising or clinically interesting genetic variants. But if the underlying data is skewed European, then the effectiveness of the computational analysis - along with the future advances and avenues of research that emerge from it - will be skewed towards Europeans too.
This problem has already begun to manifest itself in, for example, much higher levels of genetic misdiagnosis among non-Europeans tested for their risk of certain diseases, such as hypertrophic cardiomyopathy - an inherited disease of the heart muscle. Most of the genetic variants used in these tests were identified as being causal for the disease from studies of European genomes. However, many of these variants differ both in their distribution and clinical significance across populations, leading to many patients of non-European ancestry receiving false-positive test results - as their benign genetic variants were misclassified as pathogenic. Had even a small number of genomes from other ethnicities been included in the initial studies, these misdiagnoses could have been avoided.
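The misclassification described above can be illustrated with a simple frequency check. One common line of reasoning is that a variant seen often in any healthy reference population is unlikely to cause a rare inherited disease by itself. The sketch below is a toy illustration only: the allele frequencies, the population labels, and the 0.001 threshold are all invented assumptions, not values from the studies discussed here.

```python
from typing import Dict, Iterable

# Hypothetical allele frequencies for a single variant, by reference population.
# These numbers are invented for illustration only.
ALLELE_FREQ: Dict[str, float] = {
    "european": 0.0001,
    "african": 0.03,
    "south_asian": 0.002,
}


def too_common_to_be_causal(
    freqs: Dict[str, float],
    populations: Iterable[str],
    threshold: float = 0.001,  # assumed cutoff for a rare disease, illustrative
) -> bool:
    """Return True if the variant exceeds the threshold in any population we can see."""
    return max(freqs[p] for p in populations) > threshold


# With only European reference data, the variant looks rare, so nothing
# contradicts a "pathogenic" label.
european_only = too_common_to_be_causal(ALLELE_FREQ, ["european"])

# With a more diverse reference set, the same variant turns out to be common
# in another population, which argues against it being disease-causing.
diverse = too_common_to_be_causal(
    ALLELE_FREQ, ["european", "african", "south_asian"]
)

print(european_only, diverse)
```

In this toy setup the check only flags the variant once non-European reference data is included, which is the gap the paragraph above describes.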
"Unless we have a data set which is unbiased and representative, we're never going to achieve the success that we want," Jamuar says.
"When Siri was first launched, she could hardly recognize an accent which was not of a certain type, so if I was trying to speak to Siri, I would have to repeat myself multiple times and try to mimic an accent which wasn't my accent so that she could understand it.
"But over time the voice recognition technology improved tremendously because the training data was expanded to include people of very diverse backgrounds and their accents, so the algorithms were trained to be able to pick that up and it dramatically improved the technology. That's the way we have to think about it - without that good-quality diverse data, we will never be able to achieve the full potential of the computational tools."
While mapping India's rich genetic diversity has been the organization's primary focus so far, they plan, in time, to expand their work to other under-represented groups in Asia, the Middle East, Africa, and Latin America.
"As other like-minded people and partners join the mission, it just accelerates the achievement of what we have set out to do, which is to map out and organize the world's genomic diversity so that we can enable high-quality life and longevity benefits for everyone, everywhere," Jamuar says.
Empowering African Genomics
Africa is the birthplace of our species, and today still retains an inordinate amount of total human genetic diversity. Groups that left Africa and went on to populate the rest of the world, some 50,000 to 100,000 years ago, were likely small in number and only took a fraction of the total genetic diversity with them. This ancient bottleneck means that no other group in the world can match the level of genetic diversity seen in modern African populations.
Despite Africa's central importance in understanding the history and extent of human genetic diversity, the genomics of African populations remains wildly understudied. Addressing this disparity has become a central focus of the H3Africa Consortium, an initiative formally launched in 2012 with support from the African Academy of Sciences, the U.S. National Institutes of Health, and the UK's Wellcome Trust. Today, H3Africa supports over 50 projects across the continent, on an array of different research areas in genetics relevant to the health and heredity of Africans.
"Africa is the cradle of Humankind. So what that really means is that the populations that are currently living in Africa are among some of the oldest populations on the globe, and we know that the longer populations have had to go through evolutionary phases, the more variation there is in the genomes of people who live presently," says Zane Lombard, a principal investigator at H3Africa and Associate Professor of Human Genetics at the University of the Witwatersrand in Johannesburg, South Africa.
"So for that reason, African populations carry a huge amount of genetic variation and diversity, which is pretty much uncaptured. There's still a lot to learn as far as novel variation is concerned by looking at and studying African genomes."
A recent landmark H3Africa study, led by Lombard and published in Nature in October, sequenced the genomes of over 400 African individuals from 50 ethno-linguistic groups - many of which had never been sampled before.
Despite the relatively modest number of individuals sequenced in the study, over three million previously undescribed genetic variants were found, and complex patterns of ancestral migration were uncovered.
"In some of these ethno-linguistic groups they don't have a word for DNA, so we've had to really think about how to make sure that we communicate the purposes of different studies to participants so that you have true informed consent," says Lombard.
"The objective," she explained, "was to try and fill some of the gaps for many of these populations for which we didn't have any whole genome sequences or any genetic variation data...because if we're thinking about the future of precision medicine, if the patient is a member of a specific group where we don't know a lot about the genomic variation that exists in that group, it makes it really difficult to start thinking about clinical interpretation of their data."
From H3Africa's conception, the consortium's goal has not only been to better represent Africa's staggering genetic diversity in genomic data sets, but also to build Africa's domestic genomics capabilities and empower a new generation of African researchers. By doing so, the hope is that Africans will be able to set their own genomics agenda, and leapfrog to new and better ways of doing the work.
"The training that has happened on the continent and the number of new scientists, new students, and fellows that have come through the process and are now enabled to start their own research groups, to grow their own research in their countries, to be a spokesperson for genomics research in their countries, and to build that political will to do these larger types of sequencing initiatives - that is really a significant outcome from H3Africa as well, over and above all the science that's coming out," Lombard says.
"What has been created through H3Africa is just this locus of researchers and scientists and bioethicists who have the same goal at heart - to work towards adjusting the data bias and making sure that all global populations are represented in genomics."
Following the Footsteps of a 105-Year-Old Sprinter
No human has run a distance of 100 meters faster than Usain Bolt’s lightning streak in 2009. He set this record at age 22. But what will Bolt’s time be when he’s 105?
At the Louisiana Senior Games in November 2021, 105-year-old Julia Hawkins of Baton Rouge became the oldest woman to run 100 meters in an official competition, qualifying her for this year's National Senior Games. Perhaps not surprisingly, she was the only competitor in the race for people 105 and older. In this Leaps.org video, I interview Hawkins about her lifestyle habits over the decades. Then I ask Steven Austad, a pioneer in studying the mechanisms of aging, for his scientific insights into how those aspiring to become super-agers might follow in Hawkins' remarkable footsteps.
Matt Fuchs is the editor-in-chief of Leaps.org. He is also a contributing reporter to the Washington Post and has written for the New York Times, Time Magazine, WIRED and the Washington Post Magazine, among other outlets. Follow him on Twitter @fuchswriter.
Monkeypox produces more telltale signs than COVID-19. Scientists think that a “ring” vaccination strategy can be used when these signs appear to help with squelching the current outbreak of this disease.
A new virus has emerged and stoked fears of another pandemic: monkeypox. Since May 2022, it has been detected in 29 U.S. states, the District of Columbia, and Puerto Rico among international travelers and their close contacts. On a worldwide scale, as of June 30, there have been 5,323 cases in 52 countries.
The good news: An existing vaccine can go a long way toward preventing a catastrophic outbreak. Because monkeypox is a close relative of smallpox, the same vaccine can be used—and it is about 85 percent effective against the virus, according to the World Health Organization (WHO).
Also on the plus side, monkeypox is less contagious with milder illness than smallpox and, compared to COVID-19, produces more telltale signs. Scientists think that a “ring” vaccination strategy can be used when these signs appear to help with squelching this alarming outbreak.
How it’s transmitted
Monkeypox spreads between people primarily through direct contact with infectious sores, scabs, or bodily fluids. People also can catch it through respiratory secretions during prolonged, face-to-face contact, according to the Centers for Disease Control and Prevention (CDC).
As of June 30, there have been 396 documented monkeypox cases in the U.S., and the CDC has activated its Emergency Operations Center to mobilize additional personnel and resources. The U.S. Department of Health and Human Services is aiming to boost testing capacity and accessibility. No Americans have died from monkeypox during this outbreak but, during the COVID-19 pandemic (February 2020 to date), Africa has documented 12,141 cases and 363 deaths from monkeypox.
A person infected with monkeypox typically shows symptoms, such as fever and chills, while contagious. That makes it clearer when to avoid close contact with others, and so makes the disease easier to curtail than COVID-19.
Advantages of ring vaccination
For this reason, it’s feasible to vaccinate a “ring” of people around the infected individual rather than inoculating large swaths of the population. Ring vaccination proved effective in curbing the smallpox and Ebola outbreaks. As the monkeypox threat continues to loom, scientists view this as the best vaccine approach.
With many infections, “it normally would make sense to everyone to vaccinate more widely,” says Wesley C. Van Voorhis, a professor and director of the Center for Emerging and Re-emerging Infectious Diseases at the University of Washington School of Medicine in Seattle. However, “in this case, ring vaccination may be sufficient to contain the outbreak and also minimize the rare, but potentially serious side effects of the smallpox/monkeypox vaccine.”
There are two licensed smallpox vaccines in the United States: ACAM2000 (live vaccinia virus) and JYNNEOS (live, non-replicating virus). ACAM2000, Van Voorhis says, is the older smallpox vaccine that, in rare instances, can spread diffusely within the body and cause heart problems, as well as severe rash in people with eczema or serious infection in immunocompromised patients.
To prevent organ damage, the current recommendation would be to use the JYNNEOS vaccine, says Phyllis Kanki, a professor of health sciences in the division of immunology and infectious diseases at the Harvard T.H. Chan School of Public Health. However, according to a report on the CDC’s website, people with immunocompromising conditions could have a higher risk of getting a severe case of monkeypox, despite being vaccinated, and “might be less likely to mount an effective response after any vaccination, including after JYNNEOS.”
In the late 1960s, the ring vaccination strategy became part of the WHO’s mission to globally eradicate smallpox, with the last known natural case described in Somalia in 1977. Ring vaccination can also refer to how a clinical trial is designed, as was the case in 2015, when this approach was used for researching the benefits of an investigational Ebola vaccine in Guinea, Kanki says.
“Since monkeypox spreads by close contact and we have an effective vaccine, vaccinating high-risk individuals and their contacts may be a good strategy to limit transmission,” she says. She adds that privacy is an important ethical principle here, as people with monkeypox would need to disclose their close contacts so that those contacts could benefit from ring vaccination.
Rapid identification of cases and contacts—along with their cooperation—is essential for ring vaccination to be effective. Although mass vaccination also may work, the risk of infection to most of the population remains low while supply of the JYNNEOS vaccine is limited, says Stanley Deresinski, a clinical professor of medicine in the Infectious Disease Clinic at Stanford University School of Medicine.
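As a rough illustration of the ring idea described in this section, the sketch below picks out the people within a small number of contact steps of a confirmed case. The contact network, the names, and the choice of two steps (contacts plus contacts of contacts) are hypothetical assumptions made for the example, not a description of any health agency's actual protocol.

```python
from typing import Dict, List, Set

# Hypothetical contact network: each person maps to their reported close contacts.
CONTACTS: Dict[str, List[str]] = {
    "case": ["ana", "ben"],
    "ana": ["case", "cho"],
    "ben": ["case"],
    "cho": ["ana", "dev"],
    "dev": ["cho"],
    "eli": [],
}


def vaccination_ring(
    contacts: Dict[str, List[str]], index_case: str, depth: int = 2
) -> Set[str]:
    """Return everyone within `depth` contact steps of the index case, excluding the case."""
    seen = {index_case}
    frontier = {index_case}
    ring: Set[str] = set()
    for _ in range(depth):
        next_frontier: Set[str] = set()
        for person in frontier:
            for contact in contacts.get(person, []):
                if contact not in seen:
                    seen.add(contact)
                    next_frontier.add(contact)
        ring |= next_frontier
        frontier = next_frontier
    return ring


ring = vaccination_ring(CONTACTS, "case")
print(sorted(ring))
```

In this toy network only three of the five other people fall inside the ring, which is why the approach uses far fewer doses than mass vaccination. It also shows why the strategy depends on cases disclosing their contacts: anyone missing from the contact list never enters the ring.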
Other strategies for preventing transmission
Ideally, the vaccine should be administered within four days of an exposure, but it’s recommended for up to 14 days. The WHO also advocates more widespread vaccination campaigns in the population segment with the most cases so far: men who engage in sex with other men.
The virus appears to be spreading in sexual networks, which differs from what was seen in previously reported outbreaks of monkeypox (outside of Africa), where risk was associated with travel to central or west Africa or various types of contact with individuals or animals from those locales. There is no evidence of transmission by food, but contaminated articles in the environment such as bedding are potential sources of the virus, Deresinski says.
Severe cases of monkeypox can occur, but “transmission of the virus requires close contact,” he says. “There is no evidence of aerosol transmission, as occurs with SARS-CoV-2, although it must be remembered that the smallpox virus, a close relative of monkeypox, was transmitted by aerosol.”
Deresinski points to the fact that in 2003, monkeypox was introduced into the U.S. through imports from Ghana of infected small mammals, such as Gambian giant rats, as pets. They infected prairie dogs, which also were sold as pets and, ultimately, this resulted in 37 confirmed transmissions to humans and 10 probable cases. A CDC investigation identified no cases of human-to-human transmission. Then, in 2021, a traveler flew from Nigeria to Dallas through Atlanta, developing skin lesions several days after arrival. Another CDC investigation yielded 223 contacts, although 85 percent were deemed to be at only minimal risk and the remainder at intermediate risk. No new cases were identified.
How much should we be worried
But how serious a threat is monkeypox this time around? “Right now, the risk to the general public is very low,” says Scott Roberts, an assistant professor and associate medical director of infection prevention at Yale School of Medicine. “Monkeypox is spread through direct contact with infected skin lesions or through close contact for a prolonged period of time with an infected person. It is much less transmissible than COVID-19.”
The monkeypox incubation period—the time from infection until the onset of symptoms—is typically seven to 14 days but can range from five to 21 days, compared with only three days for the Omicron variant of COVID-19. With such a long incubation, there is a larger window to conduct contact tracing and vaccinate people before symptoms appear, which can prevent infection or lessen the severity.
But symptoms may present atypically or recognition may be delayed. “Ring vaccination works best with 100 percent adherence, and in the absence of a mandate, this is not achievable,” Roberts says.
At the outset of infection, symptoms include fever, chills, and fatigue. Several days later, a rash becomes noticeable, usually beginning on the face and spreading to other parts of the body, he says. The rash starts as flat lesions that become raised and fill with fluid, similar to manifestations of chickenpox. Once the lesions have scabbed over and the scabs have fallen off, a person is no longer contagious.
“It's an uncomfortable infection,” says Van Voorhis, the University of Washington School of Medicine professor. There may be swollen lymph nodes. Sores and rash are often limited to the genitals and areas around the mouth or rectum, suggesting intimate contact as the source of spread.
Symptoms of monkeypox usually last from two to four weeks. The WHO has estimated that 3 to 6 percent of cases are fatal. Although the virus is believed to infect various animal species, including rodents and monkeys in west and central Africa, “the animal reservoir for the virus is unknown,” says Kanki, the Harvard T.H. Chan School of Public Health professor.
Too often, viruses originate in parts of the world that are too poor to grapple with them and may lack the resources to invest in vaccines and treatments. “This disease is endemic in central and west Africa, and it has basically been ignored until it jumped to the north and infected Europeans, Americans, and Canadians,” Van Voorhis says. “We have to do a better job in health care and prevention all over the world. This is the kind of thing that comes back to bite us.”