big data

Health firm Populytics tracks and analyzes patient data, and makes care suggestions based on that data.

(Photo by National Cancer Institute (left) and Andrew Leu on Unsplash)

The diabetic patient hit the danger zone.

Ideally, blood sugar, measured by an A1C test, rests at 5.9 or less. A 7 is elevated, according to the Diabetes Council. Over 10, and you're into the extreme danger zone, at risk of every diabetic crisis from kidney failure to blindness.

This patient's A1C was 10. Let's call her Jen for the sake of this story. (Although the facts of her case are real, the patient's actual name wasn't released due to privacy laws.)

Jen happens to live in Pennsylvania's Lehigh Valley, home of the nonprofit Lehigh Valley Health Network, which has eight hospital campuses and various clinics and other services. This network has invested more than $1 billion in IT infrastructure and founded Populytics, a spin-off firm that tracks and analyzes patient data, and makes care suggestions based on that data.

When Jen left the doctor's office, the Populytics data machine started churning, comparing her data against a wealth of information about likely future hospital visits if she did not comply with recommendations, as well as the potential positive impacts of outreach and early intervention.

About a month after Jen received the dangerous blood test results, a community outreach specialist with psychological training called her. Jen was on a list, generated by Populytics, of patients to contact for follow-up.

"It's a very gentle conversation," says Cathryn Kelly, who manages a care coordination team at Populytics. "The case manager provides them understanding and support and coaching." The goal, in this case, was small behavioral changes that would actually stick, like dietary ones.

In three months of working with a case manager, Jen's blood sugar had dropped to 7.2, a much safer range. The odds of her cycling back to the hospital ER or veering into kidney failure, or worse, had dropped significantly.

While the health network is extremely localized to one area of one state, using data to inform precise medical decision-making appears to be the wave of the future, says Ann Mongovern, the associate director of Health Care Ethics at the Markkula Center for Applied Ethics at Santa Clara University in California.

"Many hospitals and hospital systems don't yet try to do this at all, which is striking given where we're at in terms of our general technical ability in this society," Mongovern says.

How It Happened

While many hospitals make money by filling beds, the Lehigh Valley Health Network, as a nonprofit, accepts many patients on Medicaid and other government insurance plans that don't cover some of the costs of a hospitalization. The area's population is both poorer and older than national averages, according to U.S. Census data, meaning more people with higher medical needs who may not have the support to care for themselves. They end up in the ER, or worse, again and again.

In the early 2000s, LVHN CEO Dr. Brian Nester started wondering if his health network could develop a way to predict who is most likely to land themselves a pricey ICU stay -- and offer support before those people end up needing serious care.

"There was an early understanding, even if you go back to the (federal) balanced budget act of 1997, that we were just kicking the can down the road to having a functional financial model to deliver healthcare to everyone with a reasonable price," Nester says. "We've got a lot of people living longer without more of an investment in the healthcare trust."

Populytics, founded in 2013, was the result of years of planning and agonizing over those population numbers and cost concerns.

"We looked at our own health plan," Nester says. Out of all the employees and dependants on the LVHN's own insurance network, "roughly 1.5 percent of our 25,000 people — under 400 people — drove $30 million of our $130 million on insurance costs -- about 25 percent."

"You don't have to boil the ocean to take cost out of the system," he says. "You just have to focus on that 1.5%."

Take Jen, the diabetic patient. High blood sugar can lead to kidney failure, which can mean weekly expensive dialysis for 20 years. Investing in the data and staff to reach patients, he says, is "pennies compared to $100 bills."
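
The health-plan figures Nester cites can be checked with quick arithmetic. The sketch below uses only the numbers quoted above; the variable names are illustrative, and the per-person average is a derived figure, not one Nester gave.

```python
# Figures quoted by Nester; variable names are illustrative only.
members = 25_000                 # employees and dependents on the plan
total_cost_millions = 130        # total insurance costs, in $ millions
high_cost_millions = 30          # costs driven by the high-cost group

# 1.5 percent of members, computed in integers to avoid float rounding
high_cost_members = members * 15 // 1000

# Share of total spending attributable to that group
cost_share = high_cost_millions / total_cost_millions

# Average annual cost per high-cost member, in dollars (derived, not quoted)
avg_cost_per_member = high_cost_millions * 1_000_000 / high_cost_members

print(high_cost_members)             # 375
print(f"{cost_share:.1%}")           # 23.1%
print(f"${avg_cost_per_member:,.0f}")  # $80,000
```

On these numbers the group is 375 people, consistent with "under 400," and their share of spending works out to roughly 23 percent, a little below the "about 25 percent" in the quote.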

For most doctors, "there's no awareness for providers to know who they should be seeing vs. who they are seeing. There's no incentive, because the incentive is to see as many patients as you can," he says.

To change that, first the LVHN invested in the popular medical management system, Epic. Then, they negotiated with the top 18 insurance companies that cover patients in the region to allow access to their patient care data, which means they have reams of patient history to feed the analytics machine in order to make predictions about outcomes. Nester admits not every hospital could do that -- with 52 percent of the market share, LVHN had a very strong negotiating position.

Third-party services take that data and churn out analytics that feed models and care management plans. All identifying information is stripped from the data.

"We can do predictive modeling in patients," says Populytics President and CEO Gregory Kile. "We can identify care gaps. Those care gaps are noted as alerts when the patient presents at the office."

Kile uses himself as a hypothetical patient.

"I pull up Gregory Kile, and boom, I see a flag or an alert. I see he hasn't been in for his last blood test. There is a care gap there we need to complete."

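The kind of care-gap alert Kile describes could be sketched roughly as follows. This is a minimal illustration, not Populytics' system: the record structure, the 180-day interval, the patient name, and the dates are all assumptions made up for the example.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Hypothetical interval between expected blood tests (not from the article)
BLOOD_TEST_INTERVAL = timedelta(days=180)

@dataclass
class PatientRecord:
    name: str
    last_blood_test: Optional[date]  # None if no test is on file

def has_care_gap(record: PatientRecord, today: date) -> bool:
    """Return True when the patient is overdue for a blood test."""
    if record.last_blood_test is None:
        return True
    return today - record.last_blood_test > BLOOD_TEST_INTERVAL

# Invented example patient and dates
patient = PatientRecord(name="Example Patient", last_blood_test=date(2018, 1, 15))
if has_care_gap(patient, today=date(2018, 10, 1)):
    print(f"ALERT: {patient.name} is overdue for a blood test")
```

A real system would presumably track many kinds of gaps and draw on clinical guidelines rather than a single fixed interval.
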
"There's just so much more you can do with that information," he says, envisioning a future where follow-up for, say, knee replacement surgery and outcomes could be tracked, and either validated or changed.

Ethical Issues at the Forefront

Of course, embracing data use in such specific ways also brings up issues of security and patient safety. For example, says medical ethicist Mongovern, there are many touchpoints where breaches could occur. The public has a growing awareness of how data used to personalize their experiences, such as social media analytics, can also be monetized and sold in ways that benefit a company, but not the user. That's not to say data supporting medical decisions is a bad thing, she says, just one with potential for public distrust if not handled thoughtfully.

"You're going to need to do this to stay competitive," she says. "But there's obviously big challenges, not the least of which is patient trust."

One way LVHN uses the data is through monthly reports it calls registries, which list patients who have just come in contact with the health network, either through the hospital or through a doctor who works with it. The community outreach team members at Populytics take the names from the list, pull their records, and start calling. So far, a majority of the patients targeted, 62 percent, appear to embrace the effort.

Says Nester: "Most of these are vulnerable people who are thrilled to have someone care about them. So they engage, and when a person engages in their care, they take their insulin shots. It's not rocket science. The rocket science is in identifying who the people are — the delivery of care is easy."

Anne Miller
Anne Miller is an editor and writer based in Brooklyn who is particularly curious about how technology impacts our daily lives. Her byline has appeared in the New York Times, the Washington Post, the Wall Street Journal and Slate, and she's a regular contributor to Dell Perspectives — when she's not managing editorial projects for Fortune 500 firms. She holds a master's degree in Human-Computer Interaction from the Rensselaer Polytechnic Institute.

A future app may help you avoid getting the flu by informing you of your local risk on a given day.

(© Dmytro Flisak/Adobe)

Applied mathematician Sara del Valle works at the U.S.'s foremost nuclear weapons lab: Los Alamos. Once colloquially called Atomic City, it's a hidden place 45 minutes into the mountains northwest of Santa Fe. Here, engineers developed the first atomic bomb.

Today, Los Alamos is still a small science town, though no longer a secret, nor in the business of building new bombs. Instead, it's tasked with, among other things, keeping the stockpile of nuclear weapons safe and stable: not exploding when they're not supposed to (yes, please) and exploding if someone presses that red button (please, no).

Del Valle, though, doesn't work on any of that. Los Alamos is also interested in other kinds of booms—like the explosion of a contagious disease that could take down a city. Predicting (and, ideally, preventing) such epidemics is del Valle's passion. She hopes to develop an app that's like AccuWeather for germs: It would tell you your chance of getting the flu, or dengue or Zika, in your city on a given day. And like AccuWeather, it could help people alter their behavior to live better lives, whether that means staying home on a snowy morning or washing their hands on a sickness-heavy commute.

Sara del Valle of Los Alamos is working to predict and prevent epidemics using data and machine learning.

Since the beginning of del Valle's career, she's been driven by one thing: using data and predictions to help people behave practically around pathogens. As a kid, she'd always been good at math, but when she found out she could use it to capture the tentacular spread of disease, and not just manipulate abstractions, she was hooked.

When she made her way to Los Alamos, she started looking at what people were doing during outbreaks. Using social media like Twitter, Google search data, and Wikipedia, the team started to sift for trends. Were people talking about hygiene, like hand-washing? Or about being sick? Were they Googling information about mosquitoes? Searching Wikipedia for symptoms? And how did those things correlate with the spread of disease?

It was a new, faster way to think about how pathogens propagate in the real world. Usually, there's a 10- to 14-day lag in the U.S. between when doctors tap numbers into spreadsheets and when that information becomes public. By then, the world has moved on, and so has the disease—to other villages, other victims.

"We found there was a correlation between actual flu incidents in a community and the number of searches online and the number of tweets online," says del Valle. That was when she first let herself dream about a real-time forecast, not a 10-days-later backcast. Del Valle's group—computer scientists, mathematicians, statisticians, economists, public health professionals, epidemiologists, satellite analysis experts—has continued to work on the problem ever since their first Twitter parsing, in 2011.

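To make the idea of a correlation between flu incidence and online activity concrete, here is a minimal sketch that computes a Pearson correlation coefficient. The weekly numbers are invented for illustration and are not Los Alamos data; the article does not say which statistic or method del Valle's team actually used.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)

# Invented weekly numbers, for illustration only
weekly_flu_cases = [10, 20, 30, 40, 50]
weekly_flu_searches = [100, 220, 290, 410, 500]

r = pearson(weekly_flu_cases, weekly_flu_searches)
print(f"correlation: {r:.3f}")
```

A value near 1 means searches rise and fall with reported cases. That makes them a candidate early signal, though correlation alone does not make a forecast.
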
They've had their share of outbreaks to track. Looking back at the 2009 swine flu pandemic, they saw people buying face masks and paying attention to the cleanliness of their hands. "People were talking about whether or not they needed to cancel their vacation," she says, and also whether pork products—which have nothing to do with swine flu—were safe to buy.

They watched internet conversations during the measles outbreak in California. "There's a lot of online discussion about anti-vax sentiment, and people trying to convince people to vaccinate children and vice versa," she says.

Today, they work on predicting the spread of Zika, chikungunya, and dengue fever, as well as the plain old flu. And according to the CDC, that latter effort is going well.

Since 2015, the CDC has run the Epidemic Prediction Initiative, a competition in which teams like del Valle's submit weekly predictions of how raging the flu will be in particular locations, along with other ailments occasionally. Michael Johannson is co-founder and leader of the program, which began with the Dengue Forecasting Project. Its goal, he says, was to predict when dengue cases would blow up, when previously an area just had a low-level baseline of sick people. "You'll get this massive epidemic where all of a sudden, instead of 3,000 to 4,000 cases, you have 20,000 cases," he says. "They kind of come out of nowhere."

But the "kind of" is key: The outbreaks surely come out of somewhere and, if scientists applied research and data the right way, they could forecast the upswing and perhaps dodge a bomb before it hit big-time. Questions about how big, when, and where are also key to the flu.

A big part of these projects is the CDC giving the right researchers access to the right information, and the structure to both forecast useful public-health outcomes and to compare how well the models are doing. The extra information has been great for the Los Alamos effort. "We don't have to call departments and beg for data," says del Valle.

When data isn't available, "proxies"—things like symptom searches, tweets about empty offices, satellite images showing a green, wet, mosquito-friendly landscape—are helpful: You don't have to rely on anyone's health department.

At the latest meeting with all the prediction groups, del Valle's flu models took first and second place. But del Valle wants more than weekly numbers on a government website; she wants that weather-app-inspired fortune-teller, incorporating the many diseases you could get today, standing right where you are. "That's our dream," she says.

This plot shows the correlations between the online data stream from Wikipedia and various infectious diseases in different countries. The results of del Valle's predictive models are shown in brown, while the actual number of cases or illness rates are shown in blue.

(Courtesy del Valle)

The goal isn't to turn you into a germophobic agoraphobe. It's to make you more aware when you do go out. "If you know it's going to rain today, you're more likely to bring an umbrella," del Valle says. "When you go on vacation, you always look at the weather and make sure you bring the appropriate clothing. If you do the same thing for diseases, you think, 'There's Zika spreading in Sao Paulo, so maybe I should bring even more mosquito repellent and bring more long sleeves and pants.'"

They're not there yet (don't hold your breath, but do stop touching your mouth). She estimates it's at least a decade away, but advances in machine learning could accelerate that hypothetical timeline. "We're doing baby steps," says del Valle, starting with the flu in the U.S., dengue in Brazil, and other efforts in Colombia, Ecuador, and Canada. "Going from there to forecasting all diseases around the globe is a long way," she says.

But even AccuWeather started small: One man began predicting weather for a utility company, then helping ski resorts optimize their snowmaking. His influence snowballed, and now private forecasting apps, including AccuWeather's, populate phones across the planet. The company's progression hasn't been without controversy—privacy incursions, inaccuracy of long-term forecasts, fights with the government—but it has continued, for better and for worse.

Disease apps, perhaps spun out of a small, unlikely team at a nuclear-weapons lab, could grow and breed in a similar way. And both the controversies and public-health benefits that may someday spin out of them lie in the future, impossible to predict with certainty.

Sarah Scoles
Sarah Scoles is a freelance science journalist based in Denver. She is a contributing writer at Wired, a contributing editor at Popular Science, and the author of the book Making Contact: Jill Tarter and the Search for Extraterrestrial Intelligence.

A hacker activating a 3D rendering of DNA data.

(© Production Perig/Fotolia)

In February 2015, the health insurer Anthem revealed that criminal hackers had gained access to the company's servers, exposing the personal information of nearly 79 million patients. It's the largest known healthcare breach in history.

That year, the data of millions more would be compromised in one cyberattack after another on American insurers and other healthcare organizations. In fact, for the past several years, the number of reported data breaches has increased each year, from 199 in 2010 to 344 in 2017, according to a September 2018 analysis in the Journal of the American Medical Association.

The FBI's Edward You sees this as a worrying trend. He says hackers aren't just interested in your social security or credit card number. They're increasingly interested in stealing your medical information. Hackers can currently use this information to make fake identities, file fraudulent insurance claims, and order and sell expensive drugs and medical equipment. But beyond that, a new kind of cybersecurity threat is around the corner.

Mr. You and others worry that the vast amounts of healthcare data being generated for precision medicine efforts could leave the U.S. vulnerable to cyber and biological attacks. In the wrong hands, this data could be used to exploit or extort an individual, discriminate against certain groups of people, make targeted bioweapons, or give another country an economic advantage.

Precision medicine, of course, is the idea that medical treatments can be tailored to individuals based on their genetics, environment, lifestyle or other traits. But to do that requires collecting and analyzing huge quantities of health data from diverse populations. One research effort, called All of Us, launched by the U.S. National Institutes of Health last year, aims to collect genomic and other healthcare data from one million participants with the goal of advancing personalized medical care.

Other initiatives are underway by academic institutions and healthcare organizations. Electronic medical records, genetic tests, wearable health trackers, mobile apps, and social media are all sources of valuable healthcare data that a bad actor could potentially use to learn more about an individual or group of people.

"When you aggregate all of that data together, that becomes a very powerful profile of who you are," Mr. You says.

As a supervisory special agent in the biological countermeasures unit within the FBI's weapons of mass destruction directorate, Mr. You is tasked with imagining worst-case bioterror scenarios and figuring out how to prevent and prepare for them.

That used to mean focusing on threats like anthrax, Ebola, and smallpox—pathogens that could be used to intentionally infect people—"basically the dangerous bugs," as he puts it. In recent years, advances in gene editing and synthetic biology have given rise to fears that rogue, or even well-intentioned, scientists could create a virulent virus that's intentionally, or unintentionally, released outside the lab.

While Mr. You is still tracking those threats, he's been traveling around the country talking to scientists, lawyers, software engineers, cyber security professionals, government officials and CEOs about new security threats—those posed by genetic and other biological data.

Emerging threats

Mr. You says one possible situation he can imagine is the potential for nefarious actors to use an individual's sensitive medical information to extort or blackmail that person.

"If a foreign source, especially a criminal one, has your biological information, then they might have some particular insights into what your future medical needs might be and exploit that," he says. For instance, "what happens if you have a singular medical condition and an outside entity says they have a treatment for your condition?" You could get talked into paying a huge sum of money for a treatment that ends up being bogus.

Or what if hackers got hold of a politician's or high-profile CEO's health records? Say that person had a disease-causing genetic mutation that could affect their ability to carry out their job in the future and hackers threatened to expose that information. These scenarios may seem far-fetched, but Mr. You thinks they're becoming increasingly plausible.

On a wider scale, Kavita Berger, a scientist at Gryphon Scientific, a Washington, D.C.-area life sciences consulting firm, worries that data from different populations could be used to discriminate against certain groups of people, like minorities and immigrants.

For instance, the advocacy group Human Rights Watch in 2017 flagged a concerning trend in China's Xinjiang territory, a region with a history of government repression. Police there had purchased 12 DNA sequencers and were collecting and cataloging DNA samples from people to build a national database.

"The concern is that this particular province has a huge population of the Muslim minority in China," Ms. Berger says. "Now they have a really huge database of genetic sequences. You have to ask, why does a police station need 12 next-generation sequencers?"

Also alarming is the potential that large amounts of data from different groups of people could lead to customized bioweapons if that data ends up in the wrong hands.

Eleonore Pauwels, a research fellow on emerging cybertechnologies at United Nations University's Centre for Policy Research, says new insights gained from genomic and other data will give scientists a better understanding of how diseases occur and why certain people are more susceptible to certain diseases.

"As you get more and more knowledge about the genomic picture and how the microbiome and the immune system of different populations function, you could get a much deeper understanding about how you could target different populations for treatment but also how you could eventually target them with different forms of bioagents," Ms. Pauwels says.

Economic competitiveness

Another reason hackers might want to gain access to large genomic and other healthcare datasets is to give their country a leg up economically. Many large cyber-attacks on U.S. healthcare organizations have been tied to Chinese hacking groups.

"It's becoming clear that China is increasingly interested in getting access to massive data sets that come from different countries," Ms. Pauwels says.

A year after U.S. President Barack Obama conceived of the Precision Medicine Initiative in 2015—later renamed All of Us—China followed suit, announcing the launch of a 15-year, $9 billion precision health effort aimed at turning China into a global leader in genomics.

Chinese genomics companies, too, are expanding their reach outside of Asia. One company, WuXi NextCODE, which has offices in Shanghai, Reykjavik, and Cambridge, Massachusetts, has built an extensive library of genomes from the U.S., China and Iceland, and is now setting its sights on Ireland.

Another Chinese company, BGI, has partnered with Children's Hospital of Philadelphia and Sinai Health System in Toronto, and also formed a collaboration with the Smithsonian Institution to sequence all species on the planet. BGI has built its own advanced genomic sequencing machines to compete with U.S.-based Illumina.

Mr. You says having access to all this data could lead to major breakthroughs in healthcare, such as new blockbuster drugs. "Whoever has the largest, most diverse dataset is truly going to win the day and come up with something very profitable," he says.

Some direct-to-consumer genetic testing companies with offices in the U.S., like Dante Labs, also use BGI to process customers' DNA.

Experts worry that China could race ahead of the U.S. in precision medicine because of Chinese laws governing data sharing. Currently, China prohibits the exportation of genetic data without explicit permission from the government. Mr. You says this creates an asymmetry in data sharing between the U.S. and China.

"This is a biological space race and we just haven't woken up to the fact that we're in this race," he said in January at an American Society for Microbiology conference in Washington, D.C. "We don't have access to their data. There is absolutely no reciprocity."

Protecting your data

While Mr. You has been stressing the importance of data security to anyone who will listen, the National Academies of Sciences, Engineering, and Medicine, which makes scientific and policy recommendations on issues of national importance, has commissioned a study on "safeguarding the bioeconomy."

In the meantime, Ms. Berger says organizations that deal with people's health data should assess their security risks and identify potential vulnerabilities in their systems.

As for what individuals can do to protect themselves, she urges people to think about the different ways they're sharing healthcare data—such as via mobile health apps and wearables.

"Ask yourself, what's the benefit of sharing this? What are the potential consequences of sharing this?" she says.

Mr. You also cautions people to think twice before taking consumer DNA tests. They may seem harmless, he says, but at the end of the day, most people don't know where their genetic information is going. "If your genetic sequence is taken, once it's gone, it's gone. There's nothing you can do about it."

Emily Mullin
Emily Mullin is a science and biotech journalist whose work has appeared in The Washington Post, New York Times, Wall Street Journal, Scientific American, National Geographic and Smithsonian Magazine.

Digital blockchain concept.

(© Sashkin/Fotolia)

The hacker collective known as the Dark Overlord first surfaced in June 2016, when it advertised more than 600,000 patient files from three U.S. healthcare organizations for sale on the dark web. The group, which also attempted to extort ransom from its victims, soon offered another 9 million records pilfered from health insurance companies and provider networks across the country.

Last October, apparently seeking publicity as well as cash, the hackers stole a trove of potentially scandalous data from a celebrity plastic surgery clinic in London—including photos of in-progress genitalia- and breast-enhancement surgeries. "We have TBs [terabytes] of this shit. Databases, names, everything," a gang representative told a reporter. "There are some royal families in here."

Bandits like these are prowling healthcare's digital highways in growing numbers. Since 2009, federal regulators have counted nearly 5,000 major data breaches in the United States alone, affecting some 260 million individuals. Although hacker incidents represent less than 20 percent of the total breaches, they account for almost 80 percent of the affected patients. Such attacks expose patients to potential blackmail or identity theft, enable criminals to commit medical fraud or file false tax returns, and may even allow hostile state actors to sabotage electric grids or other infrastructure by e-mailing employees malware disguised as medical notices. According to the consulting agency Accenture, data theft will cost the healthcare industry $305 billion between 2015 and 2019, with annual totals doubling from $40 billion to $80 billion.

One possible solution to this crisis involves radically retooling the way healthcare data is stored and shared—by using blockchain, the still-emerging information technology that underlies cryptocurrencies such as Bitcoin. And blockchain-enabled IT systems, boosters say, could do much more than prevent the theft of medical data. Such networks could revolutionize healthcare delivery on many levels, creating efficiencies that would reduce medical errors, improve coordination between providers, drive down costs, and give researchers unprecedented insights into patterns of disease. Perhaps most transformative, blockchain could put patients in control of their own data, empowering them to access, share, and even sell their medical information as they see fit. Widespread adoption could result in "a new kind of healthcare economy, in which data and services are quantifiable and exchangeable, with strong guarantees around both the security and privacy of sensitive information," wrote W. Brian Smith, chief scientist of healthcare-blockchain startup PokitDok, in a recent white paper.

Around the world, entrepreneurs, corporations, and government agencies are hopping aboard the blockchain train. A survey by the IBM Institute for Business Value, released in late 2016, found that 16 percent of healthcare executives in 16 countries planned to begin implementing some form of the technology in the coming year; 90 percent planned to launch a pilot program in the next two years. In 2017, Estonia became the first country to switch its medical-records system to a blockchain-based framework. Great Britain and Dubai are exploring a similar move. Yet in countries with more fragmented health systems, most notably the U.S., the challenges remain formidable. Some of the most advanced healthcare applications envisioned for blockchain, moreover, raise technological and ethical questions whose answers may not arrive anytime soon.

What Exactly Is Blockchain, Anyway?

To understand the buzz around blockchain, it's necessary to grasp (at least loosely) how the technology works. Ordinary digital recordkeeping systems rely on a central administrator that acts as gatekeeper to a treasury of data; if you can sneak past the guard, you can often gain access to the entire hoard, and your intrusion may go undetected indefinitely. Blockchain, by contrast, employs a network of synchronized, replicated databases. Information is scattered among these nodes, rather than on a single server, and is exchanged through encrypted, peer-to-peer pathways. Each transaction is visible to every computer on the network, and must be approved by a majority in order to be successfully completed. Each batch of transactions, or "block," is date- and time-stamped, marked with the user's identity, and given a cryptographic code, which is posted to every node. These blocks form a "chain," preserved in an electronic ledger, that can be read by all users but can't be edited. Any unauthorized access, or attempt at tampering, can be quickly neutralized by these overlapping safeguards. Even if a hacker managed to break into the system, penetrating deeply would be extraordinarily difficult.
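
The "chain" part of that description can be illustrated with a toy example. In the sketch below, each block stores a timestamp, a user, its transactions, and the hash of the block before it, so altering an old block breaks the chain. It is a single-machine illustration only: the function names, timestamps, and user labels are invented, and it leaves out the replicated nodes, majority approval, and encryption that a real blockchain network depends on.

```python
import hashlib
import json

def block_hash(block: dict) -> str:
    """Hash a block's contents (everything except its own stored hash)."""
    payload = {k: v for k, v in block.items() if k != "hash"}
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()

def add_block(chain: list, timestamp: str, user: str, transactions: list) -> None:
    """Append a block that records the hash of the block before it."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    block = {
        "index": len(chain),
        "timestamp": timestamp,
        "user": user,
        "transactions": transactions,
        "prev_hash": prev_hash,
    }
    block["hash"] = block_hash(block)
    chain.append(block)

def is_valid(chain: list) -> bool:
    """Check every block's hash and its link to the previous block."""
    for i, block in enumerate(chain):
        if block["hash"] != block_hash(block):
            return False
        expected_prev = chain[i - 1]["hash"] if i > 0 else "0" * 64
        if block["prev_hash"] != expected_prev:
            return False
    return True

# Invented example entries
chain = []
add_block(chain, "2018-01-01T09:00:00Z", "clinic-a", ["checkup recorded"])
add_block(chain, "2018-01-02T14:30:00Z", "pharmacy-b", ["prescription filled"])
print(is_valid(chain))   # True

chain[0]["transactions"] = ["checkup deleted"]   # simulate tampering
print(is_valid(chain))   # False
```

Because every node in a real network holds a copy of the ledger, a tampered copy like this one would disagree with the others and could be rejected.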

Because blockchain technology shares transaction records throughout a network, it could eliminate communication bottlenecks between different components of the healthcare system (primary care physicians, specialists, nurses, and so on). And because blockchain-based systems are designed to incorporate programs known as "smart contracts," which automate functions previously requiring human intervention, they could reduce dangerous slipups as well as tedious and costly paperwork. For example, when a patient gets a checkup, sees a specialist, and fills a prescription, all these actions could be automatically recorded on his or her electronic health record (EHR), checked for errors, submitted for billing, and entered on insurance claims—which could be adjudicated and reimbursed automatically as well. "Blockchain has the potential to remove a lot of intermediaries from existing workflows, whether digital or nondigital," says Kamaljit Behera, an industry analyst for the consulting firm Frost & Sullivan.

The possible upsides don't end there. By creating a detailed, comprehensive, and immutable timeline of medical transactions, blockchain-based recordkeeping could help providers gauge a patient's long-term health patterns in a way that's never before been possible. In addition to data entered by their caregivers, individuals could use app-based technologies or wearables to transmit other information to their records, such as diet, exercise, and sleep patterns, adding new depth to their medical portraits.

Smart contracts could also allow patients to specify who has access to their data. "If you get an MRI and want your orthopedist to see it, you can add him to your network instead of carrying a CD into his office," explains Andrew Lippman, associate director of the MIT Media Lab, who helped create a prototype healthcare blockchain system called MedRec that's currently being tested at Beth Israel Deaconess Hospital in Boston. "Or you might make a smart contract to allow your son or daughter to access your healthcare records if something happens to you." Another option: permitting researchers to analyze your data for scientific purposes, whether anonymously or with your name attached.
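
The permission idea Lippman describes can be pictured with a small Python sketch. This is not MedRec's code or any real smart-contract language; the `RecordAccess` class, its method names, and the record types are assumptions made up for illustration. The point is simply that the patient, and only the patient, decides who may see which records.

```python
from dataclasses import dataclass, field


@dataclass
class RecordAccess:
    """Hypothetical per-patient permission list (illustration only)."""
    owner: str
    grants: dict = field(default_factory=dict)  # viewer -> set of record types

    def grant(self, caller: str, viewer: str, record_type: str) -> None:
        """Let a viewer see one type of record; only the owner may do this."""
        if caller != self.owner:
            raise PermissionError("only the patient can grant access")
        self.grants.setdefault(viewer, set()).add(record_type)

    def revoke(self, caller: str, viewer: str, record_type: str) -> None:
        """Withdraw a previously granted permission; only the owner may do this."""
        if caller != self.owner:
            raise PermissionError("only the patient can revoke access")
        self.grants.get(viewer, set()).discard(record_type)

    def can_view(self, viewer: str, record_type: str) -> bool:
        """The owner sees everything; others see only what was granted."""
        if viewer == self.owner:
            return True
        return record_type in self.grants.get(viewer, set())


access = RecordAccess(owner="patient")
access.grant("patient", "orthopedist", "mri")
```

In a real system the grant itself would be recorded as a transaction on the ledger, so there would be a lasting record of who was given access and when.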

The Recent History, and Looking Ahead

Over the past two years, a crowd of startups has begun vying for a piece of the emerging healthcare blockchain market. Some, like PokitDok and Atlanta-based Patientory, plan to mint proprietary cryptocurrencies, which investors can buy in lieu of stock, medical providers may earn as a reward for achieving better outcomes, and patients might score for meeting wellness goals or participating in clinical trials. (Patientory's initial coin offering, or ICO, raised more than $7 million in three days.) Several fledgling healthcare-blockchain companies have found powerful corporate partners: Intel for Silicon Valley's PokitDok, Kaiser Permanente for Patientory, Philips for Los Angeles-based Gem Health. At least one established provider network, Change Healthcare, is developing blockchain-based systems of its own. Two months ago, Change launched what it calls the first "enterprise-scale" blockchain network in U.S. healthcare—a system to track insurance claim submissions and remittances.

No one, however, has set a roll-out date for a full-blown, blockchain-based EHR system in this country. "We have yet to see anything move from the pilot phase to some kind of production status," says Debbie Bucci, an IT architect in the federal government's Office of the National Coordinator for Health Information Technology. Indeed, many experts expect healthcare blockchain to take root more slowly here than in nations with government-run national health services. In America, a typical patient may have dealings with a family doctor who keeps everything on paper, an assortment of hospitals that use different EHR systems, and an insurer whose system for processing claims is separate from that of the healthcare providers. To help bridge these gaps, a consortium called the Hyperledger Healthcare Working Group (which includes many of the leading players in the field) is developing standard protocols for blockchain interoperability and other functions. Adding to the complexity is the federal Health Insurance Portability and Accountability Act (HIPAA), which governs who can access patient data and under what circumstances. "Healthcare blockchain is in a very nascent stage," says Behera. "Coming up with regulations and other guidelines, and achieving large-scale implementation, will take some time."

How long? Behera, like other analysts, estimates that relatively simple applications, such as revenue-cycle management systems, could become commonplace in the next five years. More ambitious efforts might reach fruition in a decade or so. But once the infrastructure for healthcare blockchain is fully established, its uses could go far beyond keeping better EHRs.

A handful of scientists and entrepreneurs are already working to develop one visionary application: managing genomic data. Last month, Harvard University geneticist George Church—one of the most influential figures in his discipline—launched a business called Nebula Genomics. It aims to set up an exchange in which individuals can use "Neptune tokens" to purchase DNA sequencing, which will be stored in the company's blockchain-based system; research groups will be able to pay clients for their data using the same cryptocurrency. Luna DNA, founded by a team of biotech veterans in San Diego, plans a similar service, as does a Moscow-based startup called the Zenome Project.

Hossein Rahnama, CEO of the mobile-tech company Flybits and director of research at the Ryerson Centre for Cloud and Context-Aware Computing in Toronto, envisions a more personalized way of sharing genomic data via blockchain. His firm is working with a U.S. insurance company to develop a service that would allow clients in their 20s and 30s to connect with people in their 70s or 80s with similar genomes. The young clients would learn how the elders' lifestyle choices had influenced their health, so that they could modify their own habits accordingly. "It's intergenerational wisdom-sharing," explains Rahnama, who is 38. "I would actually pay to be a part of that network."

The ethical implications of buying and selling personal genomic data in an electronic marketplace are doubtless open to debate. Such commerce could greatly expand the pool of subjects for research in many areas of medicine, enabling the kinds of breakthroughs that only Big Data can provide. Yet it could also lead millions to surrender the most private information of all—the secrets of their cells—to buyers with less benign intentions. The Dark Overlord, one might argue, could not hope for a more satisfying victory.

These scenarios, however, are pure conjecture. After the first web page was posted, in 1991, Lippman observes, "a whole universe developed that you couldn't have imagined on Day 1." The same, he adds, is likely true for healthcare blockchain. "Our vision is to make medical records useful for you and for society, and to give you more control over your own identity. Time will tell."

Kenneth Miller
Kenneth Miller is a freelance writer based in Los Angeles. He is a contributing editor at Discover, and has reported from four continents for publications including Time, Life, Rolling Stone, Mother Jones, and Aeon. His honors include The ASJA Award for Best Science Writing and the June Roth Memorial Award for Medical Writing. Visit his website at

A panoramic view of DNA

(© vectorfusionart/Fotolia)

Netscape co-founder turned billionaire venture capitalist Marc Andreessen once posited that software was eating the world. He was right, and the takeover of software resulted in many things. One of them is data. Lots and lots and lots of data. In the previous two years, humanity created more data than in all of its prior history combined, and the amount will only increase. Think about it: The hundreds of 50KB emails you write a day, the dozens of 10MB photos, the minute-long, 350MB 4K video you shoot on your iPhone X add up to vast quantities of information. All that information needs to be stored. And that's becoming an issue as data volume outpaces storage space.

"There won't be enough silicon to store all the data we need. It's unlikely that we can make flash memory smaller. We have reached the physical limits," Victor Zhirnov, chief scientist at the Semiconductor Research Corporation, says. "We are facing a crisis that's comparable to the oil crisis in the 1970s. By 2050, we're going to need to store 10 to the 30 bits, compared to 10 to the 23 bits in 2016." Taken at face value, 10 to the 30 bits works out to each of the world's seven billion people owning roughly 70 million iPhone Xs with 256GB of storage space.

The race is on to find another medium capable of storing massive amounts of information in as small a space as possible. Zhirnov and other scientists are looking at the human body, looking to DNA. "Nature has nailed it," Luis Ceze, a professor in the Department of Computer Science and Engineering at the University of Washington, says. "DNA is a molecular storage medium that is remarkable. It's incredibly dense, many, many thousands of times denser than the densest technology that we have today. And DNA is remarkably general. Any information you can map in bits you can store in DNA." It's so dense -- able to store a theoretical maximum of 215 petabytes (215 million gigabytes) in a single gram -- that all the data ever produced could be stored in the back of a tractor trailer truck.
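
To make Ceze's point about mapping bits concrete, here is a minimal Python sketch. The particular assignment of bit pairs to letters (00 to A, 01 to C, 10 to G, 11 to T) and the `encode` and `decode` helpers are arbitrary choices for illustration, not the scheme used by any of the researchers quoted here; real systems add indexing and error correction on top.

```python
# Arbitrary illustrative mapping: two bits per DNA base.
BASE_FOR_BITS = {"00": "A", "01": "C", "10": "G", "11": "T"}
BITS_FOR_BASE = {base: bits for bits, base in BASE_FOR_BITS.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a string of bases, four bases per byte."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BASE_FOR_BITS[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Turn a string of bases back into the original bytes."""
    bits = "".join(BITS_FOR_BASE[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"Hi")      # "CAGACGGC"
restored = decode(strand)   # b"Hi"
```

Two letters of text become eight bases, and reading the bases back recovers the text exactly. Anything that can be written as bytes, from a sonnet to a video clip, can pass through the same mapping.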

Writing DNA can be an energy-efficient process, too. Consider how the human body is constantly writing and rewriting DNA, and does so on a couple thousand calories a day. And all it needs for storage is a cool, dark place, a significant energy savings when compared to server farms that require huge amounts of energy to run and even more energy to cool.

Researchers first succeeded in encoding data onto DNA in 2012, when Harvard University geneticists George Church and Sri Kosuri encoded a 52,000-word book in sequences of the bases A, C, G, and T. Their method achieved a density of only 1.28 petabytes per gram of DNA, however, a figure exceeded the next year when a group encoded all 154 Shakespeare sonnets and a 26-second clip of Martin Luther King's "I Have A Dream" speech. In 2017, Columbia University researchers Yaniv Erlich and Dina Zielinski made the process 60 percent more efficient.

The limiting factor today is cost. Erlich said the work his team did cost $7,000 to encode and decode two megabytes of data. To become useful in a widespread way, the price per megabyte needs to plummet. Even advocates concede this point. "Of course it is expensive," Zhirnov says. "But look how much magnetic storage cost in the 1980s. What you store today in your iPhone for virtually nothing would cost many millions of dollars in 1982." There's reason to think the price will continue to fall. Genome readers are improving, getting cheaper, faster, and smaller, and genome sequencing becomes cheaper every year, too. Picture it: tiny specks of inert DNA made from silicon or another material, stored in cool, dark, dry areas, preserved for all time.

Plus, DNA has another advantage over more traditional forms of storage: It's very easy to reproduce. "If you want a second copy of a hard disk drive, you need components for a disk drive, hook both drives up to a computer, and copy. That's a pain," Nick Goldman, a researcher at the European Bioinformatics Institute, says. "DNA, once you have that first sample, it's a process that is absolutely routine in thousands of laboratories around the world to multiply that using polymerase chain reaction [which uses temperature changes or other processes]. It just takes a few minutes to double a sample. A few more minutes, you double it again. Very quickly, you have thousands or millions of new copies."
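
Goldman's "double it again" arithmetic is easy to check with a few lines of Python. The sketch assumes an idealized reaction in which every cycle exactly doubles the sample, which real polymerase chain reaction only approximates; the function names are made up for this example.

```python
def copies_after(cycles: int, starting_copies: int = 1) -> int:
    """Copies present after a number of perfect doubling cycles."""
    return starting_copies * 2 ** cycles


def cycles_to_reach(target_copies: int, starting_copies: int = 1) -> int:
    """Count doubling cycles needed to reach at least the target."""
    cycles = 0
    copies = starting_copies
    while copies < target_copies:
        copies *= 2
        cycles += 1
    return cycles


thousand = cycles_to_reach(1_000)      # 10 cycles
million = cycles_to_reach(1_000_000)   # 20 cycles
```

Under that assumption, ten doublings turn a single copy into more than a thousand, and twenty turn it into more than a million.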

This ability to duplicate quickly and easily is a positive trait. But, of course, there's also the potential for danger. Does encoding on DNA, the very basis for life, present ethical issues? Could it get out of control and fundamentally alter life as we know it?

The chance is there, but it's remote. The first reason is that storage could be done with only two base pairs, which would serve as replacements for the 0 and 1 digits that make up all digital data. While doing so would decrease the possible density of the storage, it would virtually eliminate the risk that the sequences would be compatible with life.

But even if scientists and researchers choose to use four base pairs, other safeguards are in place that will prevent trouble. According to Ceze, the computer science professor, the snippets of DNA that they write are very short, around 150 nucleotides. This includes the title, the information that's being encoded, and tags to help organize where the snippet should fall in the larger sequence. Furthermore, they generally avoid repeated letters, which dramatically reduces the chance that a protein could be synthesized from the snippet.
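
One way to avoid repeated letters can be sketched in Python. This is an illustration of the general idea, not necessarily the method Ceze's group uses: the data is rewritten in base 3, and each digit selects one of the three bases that differ from the base just written, so no letter ever appears twice in a row. The function names and the starting base are assumptions made for the example.

```python
BASES = "ACGT"


def to_trits(data: bytes) -> list:
    """Write each byte as six base-3 digits (3**6 = 729 covers 0-255)."""
    trits = []
    for byte in data:
        digits = []
        for _ in range(6):
            byte, digit = divmod(byte, 3)
            digits.append(digit)
        trits.extend(reversed(digits))
    return trits


def encode_no_repeats(data: bytes, start: str = "A") -> str:
    """Each base-3 digit picks one of the three bases unlike the previous one."""
    previous = start  # imaginary base before the strand begins
    strand = []
    for trit in to_trits(data):
        choices = [b for b in BASES if b != previous]
        previous = choices[trit]
        strand.append(previous)
    return "".join(strand)


def decode_no_repeats(strand: str, start: str = "A") -> bytes:
    """Recover the base-3 digits from each step, then rebuild the bytes."""
    previous = start
    trits = []
    for base in strand:
        choices = [b for b in BASES if b != previous]
        trits.append(choices.index(base))
        previous = base
    out = bytearray()
    for i in range(0, len(trits), 6):
        value = 0
        for trit in trits[i:i + 6]:
            value = value * 3 + trit
        out.append(value)
    return bytes(out)


snippet = encode_no_repeats(b"DNA")
```

The trade-off is density: this sketch spends six bases on every byte, where a plain two-bits-per-base mapping would spend four.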

Inevitably, some DNA will get spilt. "But it's so unlikely that anything that gets created for storage would have a biological interpretation that could interfere with the mechanisms going on in a living organism that it doesn't worry me in the slightest," Goldman says. "We're not of concern for the people who are worried about the ethical issues of synthetic DNA. They are much more concerned about people deliberately engineering anthrax. In the future, we'll know enough about someone from a sample of their DNA that we could make a specific poison. That's the danger, not those of us who want to encode DNA for storage."

In the end, the realities and risks of encoding on DNA are the same as those of any scientific advancement: It's another system that is vulnerable to people with bad intentions but not one that is inherently unethical.

"Every human action has some ethical implications," Zhirnov says. "I can use a hammer to build a house or I can use it to harm another person. I don't see why DNA is in any way more or less ethical."

If that house can store all the knowledge in human history, it's worth learning how to build it.

Editor's Note: In response to readers' comments that silicon is one of the earth's most abundant materials, we reached back out to our source, Dr. Victor Zhirnov. He stands by his statement about a coming shortage of silicon, citing this research. The silicon oxide found in beach sand is unsuitable for semiconductors, he says, because the cost of purifying it would be prohibitive. For use in circuit-making, silicon must be refined to a purity of 99.9999999 percent. So the process begins by mining for pure quartz, which can only be found in relatively few places around the world.

Noah Davis
Noah Davis is a writer living in Brooklyn. Visit his website at