Lessons learned from eMERGE: Mapping common Data Elements (DE)

The eMERGE (electronic MEdical Records and GEnomics: https://www.mc.vanderbilt.edu/victr/dcc/projects/acc/index.php/Main_Page) network was funded by the NIH (NHGRI) to mine Electronic Health Records (EHRs) for phenotype-genotype associations, using large population data drawn from EHRs and biorepositories. The network consists of five primary sites: (1) Vanderbilt, (2) The Marshfield Clinic, (3) Mayo, (4) Northwestern and (5) Group Health Seattle. There are also ancillary investigators at other sites.

I am not part of this NIH-funded program, but I would like to use it as a model for how OSEHRA could be configured for discovery of gene-phene associations. As is often the case, defining clinical phenotypes is a first step, and there are many databases available for this purpose. The International Health Terminology Standards Development Organisation (IHTSDO), which maintains SNOMED CT, offers one approach to harmonization. SNOMED CT is presumed by many to be the most comprehensive clinical terminology available, and it can be searched at www.hl7.org.

There are differing definitions of a Common Data Element; here is one (from www.biomedcentral.com/1755-8794/2/66):

"A Common Data Element (CDE) is a metadata definition with an informal explanation of its meaning and usage, a list of alternative names and definitions, units of measurement, and the type of values to be recorded. CDEs can be created for any kind of concept, measurement, or application, and, although grouped into "Data Element Concepts" for convenience, need not derive their meaning from their position in a complex hierarchy or graph. This is in contrast to the ontological approach to data definition, often used in bioinformatics applications, where each subclass is part of a specification for a representational vocabulary for a particular domain. Although classifications or ontologies can be added to a database of CDEs, they can be used to support navigation and inference on an application-specific basis: there is no requirement to locate a CDE within an existing domain ontology before recording the semantics of a data definition."

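To make the quoted definition concrete, here is a minimal Python sketch of a CDE as a flat metadata record. The class name, the field names, and the "Body height" example are my own illustrative assumptions; they are not taken from the cited paper or from any particular registry. The point is only that the record carries its own meaning (names, explanation, unit, value type), and that the "Data Element Concept" is a grouping label rather than a required position in a hierarchy.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CommonDataElement:
    """Hypothetical flat record for one CDE (field names are assumptions)."""
    name: str
    explanation: str                       # informal meaning and usage
    alternative_names: List[str] = field(default_factory=list)
    unit: Optional[str] = None             # unit of measurement, if any
    value_type: str = "string"             # type of values to be recorded
    # Grouping label only; the CDE does not derive its meaning from it.
    data_element_concept: Optional[str] = None


def find_by_any_name(cdes, label):
    """Return the CDEs whose name or alternative names equal `label`,
    ignoring case and surrounding whitespace."""
    wanted = label.strip().lower()
    return [
        cde for cde in cdes
        if wanted in {cde.name.lower(), *(a.lower() for a in cde.alternative_names)}
    ]


# Made-up example record, for illustration only.
height = CommonDataElement(
    name="Body height",
    explanation="Standing height of the patient at the time of the visit.",
    alternative_names=["Height", "Stature"],
    unit="cm",
    value_type="decimal",
    data_element_concept="Anthropometrics",
)

matches = find_by_any_name([height], "stature")
```

Note that the lookup never consults an ontology: a record can be registered and found by name before anyone decides where, or whether, it belongs in a domain hierarchy.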
The following table was taken and modified from Table 2 in "Mapping clinical phenotype data elements to standardized metadata repositories and controlled terminologies: the eMERGE network experience" (see: Pathak J et al. J Am Med Inform Assoc 2011;18:376-386 - http://jamia.bmj.com/content/18/4/376.long).

Table #1: Glossary of key terms and definitions. Modified from the original; any errors are mine!

"For querying the caDSR, we use the caDSR HTTP API, which allows an application to connect to caDSR remotely and search the database. The API provides various forms of functions for querying the caDSR, and returns the results in a well-formed XML document. As mentioned above, ...this is based on the ISO/IEC 11179 model for metadata registration and, as a result, decomposes the essence of a DE in well-formed parts, separating the conceptual entity (DE concept) from its physical representation in a database (value domain). The DE concept may be associated with an object class and a property, and the value domains have a list of permissible values (Figure 1). Consequently, our searches for appropriate string matches were restricted to the DE concept and permissible values of the CDEs in the caDSR."

Figure 1: NCI Cancer Data Standards Repository (caDSR) and the ISO/IEC 11179 model for metadata registries (from Pathak J et al. J Am Med Inform Assoc 2011;18:376-386 - http://jamia.bmj.com/content/18/4/376.long)
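
The decomposition described in the quote above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, the sample elements, their identifiers, and the case-insensitive substring match are all hypothetical, and the sketch works on in-memory records rather than calling the caDSR HTTP API (whose request format is not shown here). What it does reflect from the quoted passage is that the search is restricted to the DE concept and the permissible values, not to every field of the data element.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataElementConcept:
    """Conceptual entity: an object class paired with a property."""
    name: str
    object_class: str
    property_name: str


@dataclass
class ValueDomain:
    """Physical representation: the values a data element may take."""
    name: str
    permissible_values: List[str] = field(default_factory=list)


@dataclass
class DataElement:
    public_id: str            # hypothetical identifier, not a real caDSR ID
    concept: DataElementConcept
    value_domain: ValueDomain


def search_targets(de):
    """Strings eligible for matching: DE concept name and permissible values."""
    return [de.concept.name, *de.value_domain.permissible_values]


def match_elements(elements, term):
    """Case-insensitive substring match (an assumed matching rule)."""
    needle = term.strip().lower()
    return [
        de for de in elements
        if any(needle in target.lower() for target in search_targets(de))
    ]


# Made-up sample elements, for illustration only.
smoking = DataElement(
    "DE-0001",
    DataElementConcept("Person Smoking Status", "Person", "Smoking Status"),
    ValueDomain("Tobacco Use Category",
                ["Current smoker", "Former smoker", "Never smoker"]),
)
gender = DataElement(
    "DE-0002",
    DataElementConcept("Patient Gender", "Patient", "Gender"),
    ValueDomain("Gender Category", ["Female", "Male", "Unknown"]),
)
elements = [smoking, gender]

by_value = match_elements(elements, "former")      # hits a permissible value
by_concept = match_elements(elements, "gender")    # hits a DE concept name
by_domain = match_elements(elements, "tobacco")    # value domain name is not searched
```

Here the term "tobacco" finds nothing because it appears only in the value domain's name, which is outside the restricted search scope; widening `search_targets` would be the one place to change that.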


Well, I hope that this provides some value as the start of a discussion. I must admit that I am learning as I go, even though I have been funded in the past by the National Science Foundation to help develop a multi-scale ontology for cardiac anatomy. Let me know if this makes sense in the current context of the project.