My intent is to upload the various file formats for genomic data as defined by the HL7 (Health Level Seven) Clinical Genomics working group, but I am too dumb to figure out how to upload documents to the OSEHRA site, so I will start by initiating a discussion.
Clinical practitioners can now interactively produce and query a patient report for genetic tests spanning over 2000 inherited diseases from a single whole-genome sequence, using www.genetests.org (www.ncbi.nlm.nih.gov/sites/GeneTests/?db=GeneTests) as a valid guide.
Human genomic data is accumulating at an unprecedented rate (see Figure 1 below). For example, BGI in Shenzhen, China has now installed over two exabytes (2 billion gigabytes) of storage to house DNA sequencing data. The institute will use the storage infrastructure to unify its 250 next-generation sequencers onto a single shared pool of storage with a single file system. BGI's computing platform exceeds 1000 teraflops, or one quadrillion floating-point operations per second. BGI, as it is now known, is the world's largest genome sequencing center. Its sequencing output is now more than 40,000 human genomes per year. Its key accomplishments include the first de novo sequencing and assembly of various mammalian species, including the human genome, with short-read sequencing (so-called “next generation sequencing”), and the first sequencing of an ancient human genome. It has received over $1.5 billion in collaborative U.S. funds from the China Bank.
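The unit conversions quoted for BGI's storage and computing capacity can be sanity-checked with a few lines of Python. This is only a sketch, assuming decimal (SI) prefixes; the constant and variable names are illustrative and do not come from any BGI source.

```python
# Decimal (SI) prefixes, assumed here; binary prefixes would give different totals.
GIGA = 10**9
TERA = 10**12
EXA = 10**18

# Storage: two exabytes expressed in gigabytes.
storage_bytes = 2 * EXA
storage_gigabytes = storage_bytes // GIGA

# Compute: 1000 teraflops expressed in floating-point operations per second.
compute_flops = 1000 * TERA

print(f"Storage: {storage_gigabytes:,} GB")
print(f"Compute: {compute_flops:,} FLOP/s")
```

Under these assumptions, two exabytes comes to 2 billion gigabytes and 1000 teraflops to 10^15 (one quadrillion) operations per second, matching the figures in the text.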
Storing and accessing the different files that contain patient genomic data represents a “Big Data” challenge, as elaborated in the PCAST NITRD “Big Data” Strategy Directive of 12/2010:
“Data volumes are growing exponentially”
- There are many reasons for this growth:
  - the creation of nearly all data today in digital form
  - a proliferation of sensors (e.g. Next-Generation Sequencing)
  - new data sources such as high-resolution imagery and video.
- The collection, management, and analysis of data is a fast-growing concern of NIT research.
- Automated analysis techniques such as data mining and machine learning facilitate the transformation of data into knowledge, and of knowledge into action.
“Every Federal agency needs to have a ‘big data’ strategy”
The next posts in this series will define the technical requirements and routes for EHR integration of these massive patient-specific data records.