Thanks in part to a partnership with Intel, the newest version of the Genome Analysis Toolkit, developed by the Broad Institute of MIT and Harvard, will soon be released under an open source software license, enabling more researchers to do high-performance analytics on troves of genomic data from a wide array of sources.
The fourth iteration of the software, GATK4, featuring new tools and a rebuilt architecture, is available now to preview on the Broad Institute’s website. A beta release is expected next month.
More than 45,000 researchers worldwide use GATK, touted as the industry standard for identifying variations in DNA and RNA data. The new version both expands its analytics capabilities and offers support for performance-enhancing technologies such as Apache Spark. Its cloud-based framework allows for faster and more efficient processing of large volumes of genomic data.
“Thanks to the rapid adoption of cloud computing, researchers can finally do away with many of the infrastructure-related complications that have hampered progress, especially at smaller institutions and startups,” said GATK’s creator Eric Banks, senior director of data sciences and data engineering at the Broad Institute.
As a fully open-source product, the toolkit will be accessible to more researchers. Researchers at the Intel-Broad Center for Genomic Data Engineering have fine-tuned it to better support data sets that reside on private, public and hybrid clouds.
Other technology partners that contributed to GATK4 include Amazon Web Services, Cloudera, Google, IBM and Microsoft.
“Releasing GATK4 as open source was the obvious next step for our team,” said Geraldine Van der Auwera, associate director of outreach and communications at the Broad Institute, in a statement. “We believe it’s the most effective way to support the community, and we hope it continues to grow, innovate, and help researchers make insights that are essential for future human health breakthroughs.”
“Open source code is a foundation of efficient biomedical research,” said Brad Chapman, a research scientist at the Harvard T.H. Chan School of Public Health, in a statement. “It enables reproducibility, reuse and remixing by removing barriers for sharing and distributing analyses.”
“Open sourcing the GATK is a big deal for open genomics, and for open science in general,” added Jeremy Freeman, manager of computational biology at the Chan Zuckerberg Initiative. “Not only does it make this critical tool available to as broad as possible an audience for use, reuse, inspection and contribution – it provides a powerful example to the community for how an existing project can embrace open source.”