tranSMART: Democratizing Translational Data to Accelerate Patient Care

This is a group effort, including the University of Michigan Medical School, University of London, Janssen, and Recombinant Data Corporation.


tranSMART is the vision of Dr. Eric Perakslis[1], now CIO and Chief Scientist (Informatics) at the U.S. Food & Drug Administration. It is based on the i2b2 platform[2], which was developed under an NIH-funded CTSA award to Harvard Medical School.  Dr. Perakslis’s vision was recently re-emphasized in a talk at the Bio-IT conference in Boston, MA[3]. tranSMART is open source, released under a GPL license.



There is a tremendous need to accelerate the translation of biomedical research discoveries and inventions into better patient care, while avoiding harm and tracking risk.  Many organizations, ranging from pharmaceutical and biotechnology companies to academic medical centers, hold silos of health data that could be shared using a common set of analytical tools without compromising intellectual property.  However, collaborative analysis of the medical research data sets needed to make data-driven decisions in translational research is not scalable today. Scientists with knowledge of biological processes cannot access prior research results without working through biostatisticians, IT support teams, and public data resources that rely on antiquated tools and methods. Often, biostatistician resources are misspent answering simple questions, and scientists receive limited answers. The ultimate result is that organizations lack the resources to ask advanced analytical questions of their data sets, and are constrained by barriers between institutions when collaborative data sharing in a more open environment could realize beneficial patient outcomes. Disparate groups lack the standard integration they need within and between data sets, across domains including ‘omics’, clinical research, and outcomes, linked with scientifically meaningful semantics.



A platform that enables scientists to share high-quality data across experimental data sets, with standardized storage, query, analytics, and visualization models, is needed to enable integrative, informatics-driven analyses. tranSMART is a community of organizations involved in clinical and translational research, including pharma, non-profit, government, and academic groups, who collaboratively build, share, and use a common data platform to break down the technical and cross-organizational barriers that prevent critical analyses across related data sets.

The tranSMART platform is a set of data models, shared data sets, data transformation utilities, and analytical web applications that accelerates discoveries within complex biological systems by creating a standardized and semantically integrated database of research results linked to reusable and scalable self-service analytics.


1.      Example Use Cases

A.     Use case #1: Pharma Biomarker stratification

Problem: A pharmaceutical company has executed a phase 3 oncology clinical trial. The compound was ‘partially effective’ for the indication, so a secondary phase 3 trial using a companion diagnostic as an inclusion criterion is viable.

Solution: Data from public studies, cell line assays, gene expression, prior trial results, and pathway analysis are loaded into tranSMART and are used to identify and verify candidate biomarkers.

B.     Use case #2: Pharma Indication selection

Problem: A compound is known to actively interfere with a key pathway in inflammation. It is approved for a single indication, but many others are under consideration. Selecting follow-up indications requires more knowledge.

Solution: Data regarding the pathway is integrated into tranSMART from public studies, cell line data, pre-clinical studies, clinical trial data, and observational studies of patients with positive results from the compound. These are combined to evaluate gene expression and genomic profiles/signatures of diseases, available alternative therapeutics, and personalized phenotypes in order to establish high-probability indications to focus on.

C.      Use case #3: Research Consortium Biomarker/mutation collaboration

Problem: The genomic basis for a disease is complex/multigenic. A number of studies (public and private) into the disease have generated differing results for potential markers. There is no location to ‘share data’ across groups to confirm novel hypotheses in individual data sets.

Solution: tranSMART is deployed to establish a collaborative, neutrally hosted system for sharing heterogeneous data sets across the groups committing to pre-competitive data sharing. Each group is given controlled access to data as established by the collaborators in a governance model, under which viable data sets are downloaded into local research applications.
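The controlled-access idea in this use case can be illustrated with a minimal Python sketch. It is not tranSMART's actual access-control implementation; the `Governance` class, its methods, and the group and data set names are all hypothetical and exist only to show how a governance model could decide which data sets a group may download.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Governance:
    """Hypothetical governance model: maps each group to the data sets it may read."""
    grants: Dict[str, Set[str]] = field(default_factory=dict)

    def grant(self, group: str, dataset: str) -> None:
        # Record that the collaborators agreed to give `group` access to `dataset`.
        self.grants.setdefault(group, set()).add(dataset)

    def can_download(self, group: str, dataset: str) -> bool:
        # Access is denied unless it was explicitly granted.
        return dataset in self.grants.get(group, set())

    def downloadable(self, group: str, datasets: List[str]) -> List[str]:
        # Filter a list of data sets down to those the group may download.
        return [d for d in datasets if self.can_download(group, d)]


# Usage with made-up group and data set names.
gov = Governance()
gov.grant("group_a", "study_1")
gov.grant("group_a", "study_2")
gov.grant("group_b", "study_2")

all_sets = ["study_1", "study_2", "study_3"]
allowed_a = gov.downloadable("group_a", all_sets)
allowed_b = gov.downloadable("group_b", all_sets)
allowed_c = gov.downloadable("group_c", all_sets)
```

The sketch denies by default: a group with no recorded grants sees nothing, which is one plausible reading of "controlled access" in a pre-competitive consortium.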

D.     Use case #4: Scientist Self-service knowledge search

Problem: An investigator studies the function of a specific gene relating to disease and treatment (e.g. IL7). The organization conducts many studies that include IL7 results as part of microarray and next-generation sequencing (NGS) experiments, but they are not available to the scientist.

Solution: Results from analyses across studies, including internal data, public data, and data from collaborations, are integrated into a search engine. The search engine supports both interactive searches on combinations of the gene, related pathways, and indications of interest, and subscriptions to saved searches that distribute ‘new results’ when they become available.
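The saved-search subscription described in this use case can be sketched in a few lines of Python. This is an illustration under stated assumptions, not tranSMART's search engine: the `Result` and `SavedSearch` classes, their fields, and the placeholder pathway and indication values are all hypothetical. The point is only that a saved search remembers which results it has already delivered, so that re-running it returns just the ‘new results’.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass(frozen=True)
class Result:
    """One analysis result; every field here is a hypothetical placeholder."""
    result_id: str
    gene: str
    pathway: str
    indication: str
    source: str  # e.g. "internal", "public", "collaboration"


@dataclass
class SavedSearch:
    """A stored query that remembers which results were already delivered."""
    gene: Optional[str] = None
    pathway: Optional[str] = None
    indication: Optional[str] = None
    seen_ids: Set[str] = field(default_factory=set)

    def matches(self, r: Result) -> bool:
        # A criterion left as None matches any value.
        pairs = ((self.gene, r.gene),
                 (self.pathway, r.pathway),
                 (self.indication, r.indication))
        return all(wanted is None or wanted == actual for wanted, actual in pairs)

    def new_results(self, index: List[Result]) -> List[Result]:
        # Return matching results not yet delivered, then mark them as seen.
        fresh = [r for r in index
                 if self.matches(r) and r.result_id not in self.seen_ids]
        self.seen_ids.update(r.result_id for r in fresh)
        return fresh


# Usage with made-up results; only the gene name IL7 comes from the text above.
index = [
    Result("r1", "IL7", "pathway-X", "indication-A", "internal"),
    Result("r2", "OTHER", "pathway-Y", "indication-B", "public"),
]
search = SavedSearch(gene="IL7")
first = search.new_results(index)    # delivers r1

index.append(Result("r3", "IL7", "pathway-X", "indication-C", "collaboration"))
second = search.new_results(index)   # delivers only r3, since r1 was already seen
```

A real deployment would run the saved search on a schedule or on data load and notify subscribers; the sketch leaves delivery out and only shows the bookkeeping.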

E.      Use case #5: Molecular Diagnostic Company selection of pharmacogenomic markers

Problem: The Science Team of a biotech company needs to assess the feasibility of using NGS-based diagnostics for its laboratory-developed test (LDT). They have identified variants from NGS data that may be informative for the development of a new pharmacogenomic test for psychiatrists, but they require confirmation from a large set of genomic sequences, as well as clarity on any outstanding regulatory issues that need to be addressed.

Solution: Results from analyses across studies, including internal data, public data, and data from collaborations, are integrated into a search engine. The search engine allows interactive searches on the pharmacodynamic gene sequence in the context of the 1000 Genomes Project and other public/private sequence database resources. The integrated database search enables the team to confirm access to a larger data set of human genome sequences, and to identify existing or anticipated regulatory hurdles that they must address before test validation and launch.

See for more information.







Tom Munnecke


Looks good.