Genomic Data Has a Diversity Problem, But Global Efforts Are Underway to Fix It
Genomics has begun its golden age. Just 20 years ago, sequencing a single genome cost nearly $3 billion and took over a decade. Today, the same feat can be achieved for a few hundred dollars and the better part of a day. Suddenly, the prospect of sequencing not just individuals, but whole populations, has become feasible.
The genetic differences between humans may seem meager, only around 0.1 percent of the genome on average, but this variation can have profound effects on an individual's risk of disease, responsiveness to medication, and even the dosage level that would work best.
Already, initiatives like the U.K.'s 100,000 Genomes Project - now expanding to 1 million genomes - and other similarly massive sequencing projects in Iceland and the U.S., have begun collecting population-scale data in order to capture and study this variation.
The resulting data sets are immensely valuable to researchers and drug developers working to design new 'precision' medicines and diagnostics, and to gain insights that may benefit patients. Yet, because the majority of this data comes from developed countries with well-established scientific and medical infrastructure, the data collected so far is heavily biased towards Western populations with largely European ancestry.
This presents a startling and fast-emerging problem: groups that are under-represented in these datasets are likely to benefit less from the new wave of therapeutics, diagnostics, and insights, simply because they were tailored for the genetic profiles of people with European ancestry.
We may indeed be approaching a golden age of genomics-enabled precision medicine. But if the data bias persists then there is a risk, as with most golden ages throughout history, that the benefits will not be equally accessible to all, and existing inequalities will only be exacerbated.
To remedy the situation, a number of initiatives have sprung up to sequence genomes of under-represented groups, adding them to the datasets and ensuring that they too will benefit from the rapidly unfolding genomic revolution.
Global Gene Corp
The idea behind Global Gene Corp was born eight years ago at Harvard when Sumit Jamuar, co-founder and CEO, met up with his two other co-founders, both experienced geneticists, for a coffee.
"They were discussing the limitless applications of understanding your genetic code," said Jamuar, a business executive from New Delhi.
"And so, being a technology enthusiast type, I was excited and I turned to them and said hey, this is incredible! Could you sequence me and give me some insights? And they actually just turned around and said no, because it's not going to be useful for you - there's not enough reference for what a good Sumit looks like."
What started as a curiosity-driven conversation on the power of genomics ended with a commitment to tackle one of the field's biggest roadblocks - its lack of global representation.
Jamuar set out to begin with India, which has about 20 percent of the world's population, including over 4000 different ethnicities, but contributes less than 2 percent of genomic data, he told Leaps.org.
Eight years later, Global Gene Corp's sequencing initiative is well underway, and is the largest in the history of the Indian subcontinent. The program is being carried out in collaboration with biotech giant Regeneron, with support from the Indian government, local communities, and the Indian healthcare ecosystem. In August 2020, Global Gene Corp's work was recognized through the $1 million 2020 Roddenberry award for organizations that advance the vision of 'Star Trek' creator Gene Roddenberry to better humanity.
Global Gene Corp also focuses on developing and implementing AI and machine learning tools to make sense of the deluge of genomic data. These tools are increasingly used by both industry and academia to guide future research by identifying particularly promising or clinically interesting genetic variants. But if the underlying data is skewed European, then the effectiveness of the computational analysis - along with the future advances and avenues of research that emerge from it - will be skewed towards Europeans too.
This problem has already begun to manifest itself in, for example, much higher levels of genetic misdiagnosis among non-Europeans tested for their risk of certain diseases, such as hypertrophic cardiomyopathy - an inherited disease of the heart muscle. Most of the genetic variants used in these tests were identified as being causal for the disease from studies of European genomes. However, many of these variants differ both in their distribution and clinical significance across populations, leading to many patients of non-European ancestry receiving false-positive test results - as their benign genetic variants were misclassified as pathogenic. Had even a small number of genomes from other ethnicities been included in the initial studies, these misdiagnoses could have been avoided.
"Unless we have a data set which is unbiased and representative, we're never going to achieve the success that we want," Jamuar says.
"When Siri was first launched, she could hardly recognize an accent which was not of a certain type, so if I was trying to speak to Siri, I would have to repeat myself multiple times and try to mimic an accent which wasn't my accent so that she could understand it.
"But over time the voice recognition technology improved tremendously because the training data was expanded to include people of very diverse backgrounds and their accents, so the algorithms were trained to be able to pick that up and it dramatically improved the technology. That's the way we have to think about it - without that good-quality diverse data, we will never be able to achieve the full potential of the computational tools."
While mapping India's rich genetic diversity has been the organization's primary focus so far, they plan, in time, to expand their work to other under-represented groups in Asia, the Middle East, Africa, and Latin America.
"As other like-minded people and partners join the mission, it just accelerates the achievement of what we have set out to do, which is to map out and organize the world's genomic diversity so that we can enable high-quality life and longevity benefits for everyone, everywhere," Jamuar says.
Empowering African Genomics
Africa is the birthplace of our species, and today still retains an inordinate amount of total human genetic diversity. Groups that left Africa and went on to populate the rest of the world, some 50,000 to 100,000 years ago, were likely small in number and only took a fraction of the total genetic diversity with them. This ancient bottleneck means that no other group in the world can match the level of genetic diversity seen in modern African populations.
Despite Africa's central importance in understanding the history and extent of human genetic diversity, the genomics of African populations remains wildly understudied. Addressing this disparity has become a central focus of the H3Africa Consortium, an initiative formally launched in 2012 with support from the African Academy of Sciences, the U.S. National Institutes of Health, and the UK's Wellcome Trust. Today, H3Africa supports over 50 projects across the continent, on an array of different research areas in genetics relevant to the health and heredity of Africans.
"Africa is the cradle of Humankind. So what that really means is that the populations that are currently living in Africa are among some of the oldest populations on the globe, and we know that the longer populations have had to go through evolutionary phases, the more variation there is in the genomes of people who live presently," says Zane Lombard, a principal investigator at H3Africa and Associate Professor of Human Genetics at the University of the Witwatersrand in Johannesburg, South Africa.
"So for that reason, African populations carry a huge amount of genetic variation and diversity, which is pretty much uncaptured. There's still a lot to learn as far as novel variation is concerned by looking at and studying African genomes."
A recent landmark H3Africa study, led by Lombard and published in Nature in October, sequenced the genomes of over 400 African individuals from 50 ethno-linguistic groups - many of which had never been sampled before.
Despite the relatively modest number of individuals sequenced in the study, over three million previously undescribed genetic variants were found, and complex patterns of ancestral migration were uncovered.
"In some of these ethno-linguistic groups they don't have a word for DNA, so we've had to really think about how to make sure that we communicate the purposes of different studies to participants so that you have true informed consent," says Lombard.
"The objective," she explained, "was to try and fill some of the gaps for many of these populations for which we didn't have any whole genome sequences or any genetic variation data...because if we're thinking about the future of precision medicine, if the patient is a member of a specific group where we don't know a lot about the genomic variation that exists in that group, it makes it really difficult to start thinking about clinical interpretation of their data."
From H3Africa's conception, the consortium's goal has not only been to better represent Africa's staggering genetic diversity in genomic data sets, but also to build Africa's domestic genomics capabilities and empower a new generation of African researchers. By doing so, the hope is that Africans will be able to set their own genomics agenda, and leapfrog to new and better ways of doing the work.
"The training that has happened on the continent and the number of new scientists, new students, and fellows that have come through the process and are now enabled to start their own research groups, to grow their own research in their countries, to be a spokesperson for genomics research in their countries, and to build that political will to do these larger types of sequencing initiatives - that is really a significant outcome from H3Africa as well. Over and above all the science that's coming out," Lombard says.
"What has been created through H3Africa is just this locus of researchers and scientists and bioethicists who have the same goal at heart - to work towards adjusting the data bias and making sure that all global populations are represented in genomics."
Scientists redesign bacteria to tackle the antibiotic resistance crisis
In 1945, almost two decades after Alexander Fleming discovered penicillin, he warned that as antibiotic use grew, the drugs might lose their effectiveness. He was prescient: the first case of penicillin resistance was reported two years later. Back then, not many people paid attention to Fleming’s warning. After all, the “golden era” of antibiotics had just begun. By the 1950s, three new antibiotics derived from soil bacteria (streptomycin, chloramphenicol, and tetracycline) could cure infectious diseases like tuberculosis, cholera, meningitis and typhoid fever, among others.
Today, these antibiotics and many of their successors developed through the 1980s are gradually losing their effectiveness. The extensive overuse and misuse of antibiotics has led to the rise of drug resistance. The livestock sector buys around 80 percent of all antibiotics sold in the U.S. every year. Farmers feed cows and chickens low doses of antibiotics to prevent infections and fatten up the animals, which eventually causes resistant bacterial strains to evolve. If manure from cattle is used on fields, the soil and vegetables can get contaminated with antibiotic-resistant bacteria. Another major factor is doctors overprescribing antibiotics to humans, particularly in low-income countries. Between 2000 and 2018, the global rates of human antibiotic consumption shot up by 46 percent.
In recent years, researchers have been exploring a promising avenue: the use of synthetic biology to engineer new bacteria that may work better than antibiotics. The need continues to grow, as a Lancet study linked antibiotic resistance to over 1.27 million deaths worldwide in 2019, surpassing HIV/AIDS and malaria. The western sub-Saharan Africa region had the highest death rate (27.3 people per 100,000).
To make matters worse, our remedy pipelines are drying up. Out of the 18 biggest pharmaceutical companies, 15 abandoned antibiotic development by 2013. According to the AMR Action Fund, venture capital has remained indifferent towards biotech start-ups developing new antibiotics. In 2019, at least two antibiotic start-ups filed for bankruptcy. As of December 2020, there were 43 new antibiotics in clinical development. But because they are based on previously known molecules, scientists say they are inadequate for treating multidrug-resistant bacteria. Researchers warn that if nothing changes, by 2050, antibiotic resistance could kill 10 million people annually.
The rise of synthetic biology
To avert this dire future, scientists have been working on alternative solutions using synthetic biology tools: genetically modifying good bacteria to fight the bad ones.
From the time life evolved on earth around 3.8 billion years ago, bacteria have engaged in biological warfare. They constantly strategize new methods to combat each other by synthesizing toxic proteins that kill competition.
For example, Escherichia coli produces bacteriocins, or toxins, to kill other strains of E. coli that attempt to colonize the same habitat. Microbes like E. coli (which are not all pathogenic) are also naturally present in the human microbiome. The human microbiome harbors up to 100 trillion symbiotic microbial cells. The majority of them are beneficial organisms residing in the gut at different compositions.
The chemicals that these “good bacteria” produce do not pose any health risks to us, but can be toxic to other bacteria, particularly to human pathogens. For the last three decades, scientists have been manipulating bacteria’s biological warfare tactics to our collective advantage.
In the late 1990s, researchers drew inspiration from electrical and computer engineering principles that involve constructing digital circuits to control devices. In certain ways, every cell in living organisms works like a tiny computer. The cell receives messages in the form of biochemical molecules that cling to its surface. Those messages get processed within the cell through a series of complex molecular interactions.
Synthetic biologists can harness these living cells’ information processing skills and use them to construct genetic circuits that carry out specific instructions, for example secreting a toxin that kills pathogenic bacteria. “Any synthetic genetic circuit is merely a piece of information that hangs around in the bacteria’s cytoplasm,” explains José Rubén Morones-Ramírez, a professor at the Autonomous University of Nuevo León, Mexico. Then the ribosome, which synthesizes proteins in the cell, processes that new information, making the compounds scientists want bacteria to make. “The genetic circuit remains separated from the living cell’s DNA,” Morones-Ramírez explains. When the engineered bacterium replicates, the genetic circuit doesn’t become part of its genome.
In 2000, Boston-based researchers constructed an E. coli with a genetic switch that toggled between turning genes on and off. Later, they built some safety checks into their bacteria. “To prevent unintentional or deleterious consequences, in 2009, we built a safety switch in the engineered bacteria’s genetic circuit that gets triggered after it gets exposed to a pathogen,” says James Collins, a professor of biological engineering at MIT and faculty member at Harvard University’s Wyss Institute. “After getting rid of the pathogen, the engineered bacteria is designed to switch off and leave the patient’s body.”
Seek and destroy
As the field of synthetic biology developed, scientists began using engineered bacteria to tackle superbugs. They first focused on Vibrio cholerae, which in the 19th and 20th centuries caused cholera pandemics in India, China, the Middle East, Europe, and the Americas. Like many other bacteria, V. cholerae cells communicate with each other via quorum sensing, a process in which the microorganisms release different signaling molecules to convey messages to their brethren. Highly intelligent by bacterial standards, some multidrug-resistant V. cholerae strains can also “collaborate” with other intestinal bacterial species to gain advantage and take hold of the gut. When untreated, cholera has a mortality rate of 25 to 50 percent, and outbreaks frequently occur in developing countries, especially during floods and droughts.
Sometimes, however, V. cholerae makes mistakes. In 2008, researchers at Cornell University observed that when quorum sensing V. cholerae accidentally released high concentrations of a signaling molecule called CAI-1, it had a counterproductive effect—the pathogen couldn’t colonize the gut.
So the group, led by John March, professor of biological and environmental engineering, developed a novel strategy to combat V. cholerae. They genetically engineered E. coli to eavesdrop on V. cholerae communication networks and equipped it with the ability to release the CAI-1 molecules. That interfered with V. cholerae’s progress. Two years later, the Cornell team showed that V. cholerae-infected mice treated with engineered E. coli had a 92 percent survival rate.
In 2011, Singapore-based scientists engineered E. coli to detect and destroy Pseudomonas aeruginosa, an often drug-resistant pathogen that causes pneumonia, urinary tract infections, and sepsis. Once the genetically engineered E. coli found its target through its quorum sensing molecules, it released a peptide that could eradicate 99 percent of P. aeruginosa cells in a test-tube experiment. The team outlined their work in a Molecular Systems Biology study.
“At the time, we knew that we were entering new, uncharted territory,” says Matthew Chang, an associate professor and synthetic biologist at the National University of Singapore and lead author of the study. “To date, we are still in the process of trying to understand how long these microbes stay in our bodies and how they might continue to evolve.”
More teams followed the same path. In a 2013 study, MIT researchers also genetically engineered E. coli to detect P. aeruginosa via the pathogen’s quorum-sensing molecules. It then destroyed the pathogen by secreting a lab-made toxin.
Probiotics that fight
A year later, in 2014, a Nature study found that the abundance of Ruminococcus obeum, a probiotic bacterium naturally occurring in the human microbiome, interrupts and reduces V. cholerae’s colonization by detecting the pathogen’s quorum sensing molecules. The natural accumulation of R. obeum in Bangladeshi adults helped them recover from cholera despite living in an area with frequent outbreaks.
These findings inspired researchers to sic the good bacteria present in foods like yogurt and kimchi onto the drug-resistant ones. So far, researchers have engineered various probiotic organisms to fight pathogenic bacteria like Staphylococcus aureus (a leading cause of skin, tissue, bone, joint and blood infections) and Clostridium perfringens (which causes watery diarrhea) in test-tube and animal experiments. In 2020, Russian scientists engineered a probiotic called Pichia pastoris to produce an enzyme called lysostaphin that eradicated S. aureus in vitro. Another 2020 study from China used an engineered probiotic bacterium, Lactobacillus casei, as a vaccine to prevent C. perfringens infection in rabbits.
In a study last year, Morones-Ramírez’s group at the Autonomous University of Nuevo León engineered E. coli to detect quorum-sensing molecules from methicillin-resistant Staphylococcus aureus, or MRSA, a notorious superbug. The E. coli then releases a bacteriocin that kills MRSA. “An antibiotic is just a molecule that is not intelligent,” says Morones-Ramírez. “On the other hand, engineered bacteria can be trained to target pathogens when they are at their most vulnerable metabolic stage in the human gut.”
Collins and Timothy Lu, an associate professor of biological engineering at MIT, found that engineered E. coli can help treat other conditions, such as phenylketonuria, a rare metabolic disorder that causes a build-up of the amino acid phenylalanine. Their start-up Synlogic aims to commercialize the technology, and has completed a phase 2 clinical trial.
Circumventing the challenges
The bacteria-engineering technique is not without pitfalls. One major challenge is that beneficial gut bacteria produce their own quorum-sensing molecules that can be similar to those that pathogens secrete. If an engineered bacterium’s biosensor is not specific enough, it will be ineffective.
Another concern is whether engineered bacteria might mutate after entering the gut. “As with any technology, there are risks where bad actors could have the capability to engineer a microbe to act quite nastily,” says Collins of MIT. But Collins and Morones-Ramírez both insist that the chances of the engineered bacteria mutating on their own are virtually non-existent. “It is extremely unlikely for the engineered bacteria to mutate,” Morones-Ramírez says. “Coaxing a living cell to do anything on command is immensely challenging. Usually, the greater risk is that the engineered bacteria entirely lose its functionality.”
However, the biggest challenge is bringing the curative bacteria to consumers. Pharmaceutical companies aren’t interested in antibiotics or their alternatives because they are less profitable than new medicines for non-infectious diseases. Unlike chronic conditions such as diabetes or cancer, which require long-term medication, infectious diseases are usually treated much more quickly. Running clinical trials is expensive, and antibiotic alternatives aren’t lucrative enough.
“Unfortunately, new medications for antibiotic resistant infections have been pushed to the bottom of the field,” says Lu of MIT. “It's not because the technology does not work. This is more of a market issue. Because clinical trials cost hundreds of millions of dollars, the only solution is that governments will need to fund them.” Lu stresses that societies must lobby to change how the modern healthcare industry works. “The whole world needs better treatments for antibiotic resistance.”
Meet Dr. Renee Wegrzyn, the first Director of President Biden's new health agency, ARPA-H
In today’s podcast episode, I talk with Renee Wegrzyn, appointed by President Biden as the first director of a health agency created last year, the Advanced Research Projects Agency for Health, or ARPA-H. It’s inspired by DARPA, the agency that develops innovations for the Defense department and has been credited with hatching world-changing technologies such as ARPANET, which became the internet.
Time will tell if ARPA-H will lead to similar achievements in the realm of health. That’s what President Biden and Congress expect in return for funding ARPA-H at 2.5 billion dollars over three years.
How will the agency figure out which projects to take on, especially with so many patient advocates for different diseases demanding moonshot funding for rapid progress?
I talked with Dr. Wegrzyn about the opportunities and challenges, what lessons ARPA-H is borrowing from Operation Warp Speed, how she decided on the first ARPA-H project that was announced recently, why a separate agency was needed instead of reforming HHS and the National Institutes of Health to be better at innovation, and how ARPA-H will make progress on disease prevention in addition to treatments for cancer, Alzheimer’s and diabetes, among many other health priorities.
Dr. Wegrzyn’s resume leaves no doubt of her suitability for this role. She was a program manager at DARPA, where she focused on applying gene editing and synthetic biology to the goal of improving biosecurity. For her work there, she received the Superior Public Service Medal and, in case that wasn’t enough ARPA experience, she also worked at another ARPA that leads advanced projects in intelligence, called IARPA. Before that, she ran technical teams in the private sector working on gene therapies and disease diagnostics, among other areas. She has been a vice president of business development at Ginkgo Bioworks and headed innovation at Concentric by Ginkgo. Her training and education include a PhD and undergraduate degree in applied biology from the Georgia Institute of Technology, and she did her postdoc as an Alexander von Humboldt Fellow in Heidelberg, Germany.
Dr. Wegrzyn told me that she’s “in the hot seat.” The pressure is on for ARPA-H, especially after the need and potential for health innovation were spotlighted by the pandemic and the unprecedented speed of vaccine development. We’ll soon find out if ARPA-H can produce game changers in health that are equivalent to DARPA’s creation of the internet.
ARPA-H - https://arpa-h.gov/
Dr. Wegrzyn profile - https://arpa-h.gov/people/renee-wegrzyn/
Dr. Wegrzyn Twitter - https://twitter.com/rwegrzyn?lang=en
President Biden Announces Dr. Wegrzyn's appointment - https://www.whitehouse.gov/briefing-room/statement...
Leaps.org coverage of ARPA-H - https://leaps.org/arpa/
ARPA-H program for joints to heal themselves - https://arpa-h.gov/news/nitro/
ARPA-H virtual talent search - https://arpa-h.gov/news/aco-talent-search/
Dr. Renee Wegrzyn was appointed director of ARPA-H last October.
Matt Fuchs is the editor-in-chief of Leaps.org and Making Sense of Science. He is also a contributing reporter to the Washington Post and has written for the New York Times, Time Magazine, WIRED and the Washington Post Magazine, among other outlets. Follow him @fuchswriter.