2017 Science, Technology and Environment

David J. Lipman and the GenBank Team

Built and heads the world’s largest and most influential repository of genetic sequence data now being used by biomedical researchers around the world, including those studying infectious, autoimmune and cardiovascular diseases

After nine people in four states fell sick with listeriosis, a foodborne illness caused by listeria bacteria, federal investigators went looking for a link among the victims. They found the likely culprit to be bacteria in frozen peas, carrots and other vegetables packaged by a company in Pasco, Washington, according to the Food and Drug Administration. Yet individuals who became ill lived as far away as Connecticut and Maryland.

Investigators were able to figure out the connection among the cases by examining the genome of the bacteria—akin to a DNA fingerprint—and identifying patients who were infected by the same strain. To keep others from getting sick, the frozen food company recalled more than 800 different types of products possibly linked to the outbreak.

Federal investigators got help with their sleuthing from an enormous and growing storehouse of hundreds of millions of DNA sequences compiled by the National Institutes of Health. GenBank, the world’s largest repository of genetic sequence data, is managed by Dr. David Lipman and his team at the National Center for Biotechnology Information, a world-class research center located in NIH’s National Library of Medicine.

“They are the world’s genomic Google,” said Dr. Alex Greninger, a resident in laboratory medicine at the University of Washington. Greninger said he is “constantly in awe” of the work done by the center’s team of computer scientists, molecular biologists, mathematicians, biochemists, research physicians and biologists.

Scientists studying virtually all types of infectious diseases, including influenza, hepatitis, Zika, Ebola, bacterial pneumonia, tuberculosis and malaria, are using the DNA sequence database. So are those tracking antibiotic resistance, a serious worldwide problem. Researchers also use the database to study chronic diseases including cardiovascular disease, rheumatoid arthritis and lupus.

Comparing pieces of a genome, the complete set of genetic material of a human or other organism, is similar to taking a DNA sample from a crime scene and seeing whether it matches that of a suspect. Matching the unique DNA sequence of the bacteria found in listeria patients, for example, indicated a high likelihood that the illnesses came from the same source.

According to Dr. Steven Musser, FDA’s deputy director for scientific operations, Lipman had the foresight to understand the value of trustworthy and available data on genome sequences for scientists and researchers throughout the world.

“He was a visionary in his ability to look at how important this information would be,” said Musser. “It’s one thing to have books in the library and another to have people look at them, read them and make use of them.”

Researchers not only make use of the genomic “books” that the biotechnology center provides, they also contribute to the collection by sending raw data for the center to annotate and add to the database.

GenBank is one of several enormous repositories developed under Lipman’s leadership that the public can use online for free anywhere around the world—from PubMed Central, an archive of articles from thousands of the world’s leading biomedical journals, to PubChem, a resource that connects chemical information with biological studies.

For GenBank, Lipman and his team process hundreds of thousands of genome sequences daily, using high-performance computing. They manage incoming data, clean it up, remove mistakes, annotate it and release it in a form that is easier for scientists, researchers and others to use.

“The government does an amazing job for the world,” Greninger said of how the center handles this “firehose of data” and makes sense of it.

Lipman, who has led the center since 1989 as its first and only director, doesn’t just manage the organization. He also has made “the single largest contribution to the development of these essential algorithms of the software, contributing many highly original and unique ideas and principles,” said Eugene Koonin, a senior investigator at the center.

A side benefit from the center’s work is that federal agencies, states and local governments now collaborate in ways they didn’t before, according to Musser. In the past, FDA and the Centers for Disease Control and Prevention “kind of fumbled around,” he said. “Now, we work together seamlessly.”

Collaborators also include the Department of Agriculture and state public health labs, as well as scientists and researchers in Europe and Canada.

“This is bringing people together who would not be brought together any other way,” Musser said. “It’s one of the most significant outcomes.”

GenBank started as a small and simple database. Now Lipman and his team realize they can do much more to make it easier for researchers to use the data.

“We’re doing many more steps than researchers would have done on their own. Now CDC and FDA folks can push a button and get the answer,” Lipman said.

“We’re really at the cusp of a transformation of how we do infectious disease surveillance and public health worldwide,” Lipman said. “This is no longer just a research tool on infectious disease, but a tool to combat it.”