Since the premiere of the wildly popular 1993 dinosaur cloning film Jurassic Park, the science of genetic engineering and genomics featured in the film has advanced at a breathtaking pace. When the film was released, the Human Genome Project was already working to sequence the entire human genome for the first time. The project was completed in 2003, after 13 years and at a cost of $1 billion. Today, a human genome can be sequenced in less than a day for less than $1,000.
The Wellcome Sanger Institute in England, a leading genomics research organization, is on a mission to improve the health of all humans by developing a comprehensive understanding of the 23 pairs of chromosomes in human cells. It relies on cutting-edge technology to operate at remarkable speed and scale, reading and analyzing an average of 40 trillion DNA base pairs a day.
As DNA sequencing technologies and computational biology advance, high-performance computing (HPC) has become central to genomic research. HPC systems let researchers process massive volumes of sequencing data, solve complex computational problems and run resource-intensive analyses at scale.
Genomics at scale
Genomics is the study of the genes or genome of an organism. From treating cancer and fighting COVID-19 to better understanding the growth and cellular development of humans, parasites and microbes, the science of genomics is booming. According to Fortune Business Insights, the global genomics market is projected to grow from $27.81 billion in 2021 to $94.65 billion by 2028. Enabling this growth are HPC environments that contribute daily to a greater understanding of our biology, helping to accelerate the development of vaccines and other health interventions around the world.
Using HPC resources and the computational methods of bioinformatics, genomics researchers analyze massive amounts of DNA sequence data to find variations and mutations that affect health, disease, and drug response. For example, searching the roughly 3 billion base pairs of the human genome, which contains some 23,000 genes, requires an enormous amount of computation, storage, and networking resources.
After sequencing, billions of data points must be analyzed to look for mutations and other variations in the genome under study, whether a human's or a virus's. Computational biologists use pattern-matching algorithms, mathematical models, image processing, and other techniques to derive meaning from this genomic data.
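To make "pattern matching" a little more concrete, here is a minimal, hypothetical Python sketch of two of the simplest operations involved: listing point substitutions between a reference sequence and a sample, and finding every occurrence of a short motif. It assumes the two sequences are already aligned and of equal length (real pipelines first align billions of reads with dedicated tools), and the function names and toy sequences are illustrative only, not taken from any Sanger Institute pipeline.

```python
from typing import List, Tuple


def point_mutations(reference: str, sample: str) -> List[Tuple[int, str, str]]:
    """Return (0-based position, reference base, sample base) for each mismatch.

    Assumes both sequences are pre-aligned and the same length; this toy
    version ignores insertions and deletions entirely.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i, ref_base, alt_base)
        for i, (ref_base, alt_base) in enumerate(zip(reference.upper(), sample.upper()))
        if ref_base != alt_base
    ]


def find_motif(sequence: str, motif: str) -> List[int]:
    """Return every 0-based start position of motif in sequence, overlaps included."""
    seq, pat = sequence.upper(), motif.upper()
    positions = []
    start = seq.find(pat)
    while start != -1:
        positions.append(start)
        # Advance by one so overlapping matches are also reported.
        start = seq.find(pat, start + 1)
    return positions


if __name__ == "__main__":
    # Made-up toy sequences, not real genomic data.
    reference = "ATGTTTGTTTTTCTTGTT"
    sample = "ATGTTCGTTTTTCTTGTA"
    print(point_mutations(reference, sample))  # [(5, 'T', 'C'), (17, 'T', 'A')]
    print(find_motif(reference, "TTT"))  # [3, 7, 8, 9]
```

A naive scan like this is linear in sequence length; at genome scale, tools typically build indexes over the reference so that billions of short reads can be matched efficiently, which is part of why the work demands HPC resources.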
A genomic powerhouse
At the Sanger Institute, scientific research takes place at the intersection of genomics and HPC informatics. Institute scientists tackle some of the toughest challenges in genomic research, driving scientific discovery and pushing the boundaries of our understanding of human biology and pathogens. Among many other projects, the institute's Tree of Life program explores the diversity of complex organisms found in the UK through sequencing and cellular technologies. Scientists are also creating reference maps of different types of human cells.
Science at the scale conducted at the Sanger Institute requires access to vast amounts of data processing power. The Institute’s Informatics Support Group (ISG) helps meet this need by providing a high-performance computing environment for Sanger’s scientific research teams. The ISG team provides support, architecture design and development services for the Sanger Institute’s traditional HPC environment and an extensive OpenStack private cloud compute infrastructure, among other HPC resources.
Responding to a global health crisis
During the COVID-19 pandemic, the Institute began working closely with public health agencies and academic partners in the UK to sequence and analyze the evolution and spread of the SARS-CoV-2 virus. The work has been used to inform public health measures and help save lives.
As of September 2022, more than 2.2 million coronavirus genomes had been sequenced at the Wellcome Sanger Institute. The sequences are made available immediately to researchers around the world for analysis. Mutations affecting the virus's spike protein, which the virus uses to bind to and enter human cells, are of particular interest because the spike protein is the target of current vaccines. Scientists combine genomic data with other information to determine which mutations may affect the virus's ability to spread, cause disease, or evade an immune response.
Society's greater understanding of genomics, and of the information science that supports it, has accelerated vaccine development and our ability to respond to disease in ways that were never possible before. In the process, the world is witnessing the power of genomic science firsthand.
Read more about genomics, informatics, and HPC in this white paper and case study from the Wellcome Sanger Institute.
Intel® Advanced Analytics Technologies
Data analytics is the key to extracting the most value from your organization’s data. To build a productive, cost-effective analysis strategy that yields results, you need high-performance hardware that is optimized to work with the software you use.
Modern data analytics span a wide range of technologies, from dedicated analytics platforms and databases to deep learning and artificial intelligence (AI). Just getting started with analytics? Ready to develop your analytics strategy or improve your data quality? There is always room to grow, and Intel is ready to help. With analytics technologies and a deep ecosystem of partners, Intel accelerates the efforts of data scientists, analysts and developers in every industry. Find out more about Intel Advanced Analytics.