The research, from UC San Diego and the J. Craig Venter Institute (JCVI), points toward non-invasive diagnostic tools based on genetic sequencing of fecal samples rather than blood sampling. The trillions of bacteria, viruses and other microbes that live in the human gut could offer insight for diagnosing diseases and gauging human health.
Researchers used metagenomics, a genetic sequencing technique that breaks up the DNA of the microbes in the human large intestine, or gut, on samples from 30 healthy people and 30 people with inflammatory bowel disease (IBD), which includes ulcerative colitis and Crohn’s disease. Around 600 billion DNA bases were sequenced and fed into a supercomputer, which reconstructed the abundance of each microbial species.
Each bacterial genome contains thousands of genes, each coding for a particular protein. Metagenomics allowed the reconstructed DNA to be translated into hundreds of thousands of proteins, which could then be grouped into 10,000 protein families. Grouping the protein families took 180,000 core-hours on the Gordon supercomputer at the San Diego Supercomputer Center, using software developed by JCVI associate professor Weizhong Li.
Researchers used what they called “fairly-out-of-the-bag” machine-learning techniques to find patterns in the protein family data and to identify and classify major differences in the large-intestine bacteria of the two sample groups. They first used standard biostatistics to identify the 100 protein families that best distinguish a healthy patient from one with disease. They then used those 100 protein families to build a machine-learning classifier, which was applied to classify the remaining 9,900 protein families.
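The two-step pattern described above (rank protein families with a statistical test, then build a classifier from the top-ranked ones) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the article does not say which statistical test or classifier the team used, so Welch's t-statistic and a nearest-centroid classifier stand in for them, the sketch classifies samples rather than protein families, and the data, function names and the two "planted" families are entirely hypothetical.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t-statistic comparing two groups of abundance values."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se


def top_families(healthy, sick, k):
    """Return the indices of the k families with the largest |t| between groups."""
    scores = []
    for f in range(len(healthy[0])):
        t = welch_t([s[f] for s in healthy], [s[f] for s in sick])
        scores.append((abs(t), f))
    scores.sort(reverse=True)
    return sorted(f for _, f in scores[:k])


def centroid(samples, families):
    """Mean abundance of each selected family across a group of samples."""
    return [statistics.mean([s[f] for s in samples]) for f in families]


def classify(sample, families, centroids):
    """Assign the label whose centroid is nearest on the selected families."""
    point = [sample[f] for f in families]
    return min(centroids, key=lambda label: math.dist(point, centroids[label]))


def toy_sample(s, is_sick, informative, n_families=6):
    """Hypothetical abundances: small deterministic variation, plus a shift of
    10 in the planted 'informative' families for sick samples."""
    return [
        (7 * s + 3 * f) % 5 + (10 if is_sick and f in informative else 0)
        for f in range(n_families)
    ]


INFORMATIVE = {1, 4}  # planted signal, made up for illustration
healthy = [toy_sample(s, False, INFORMATIVE) for s in range(4)]
sick = [toy_sample(s, True, INFORMATIVE) for s in range(4, 8)]

selected = top_families(healthy, sick, k=2)
centroids = {
    "healthy": centroid(healthy, selected),
    "ibd": centroid(sick, selected),
}
print(selected, classify([2, 12, 3, 2, 11, 4], selected, centroids))
```

In this toy data the ranking step recovers the two planted families, and the classifier then needs only those two values per sample. In the study the same idea was applied at a much larger scale: 100 families selected out of 10,000.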
“You can try to categorize healthy and sick people by looking at their intestinal bacterial composition,” said UC San Diego biomedical sciences graduate student and researcher Bryn Taylor. “But the differences are not always clear. Instead, when we categorize by the bacterial protein family levels, we see a distinct difference between healthy and sick people. This is because proteins are the workhorses of biology, and by analyzing the proteins produced by these bacteria, we can get an idea of what the bacteria are doing in your gut.”
The UC San Diego researchers hope this study will be a stepping stone toward analyzing one million individual genes, rather than the 10,000 protein families handled in the current work.
The study was presented at the 2016 IEEE Big Data International Conference in Washington, D.C., and was published on the Calit2 Qualcomm Institute website.