A Proposal to create a Universal Namespace for Health Information Architectures

Proposal:

To introduce an architectural principle called a Universal Namespace: that all information objects have a unique name that can be universally referenced.  This has the effect of creating an information space framework in which clinical data, medical knowledge, genomic information, and other knowledge can be linked.  As a large-scale, fine-grained network of information, it allows security and privacy considerations to be handled with great power and simplicity.  By providing a simple but universal connectivity model, it provides a way of connecting information through linguistic references, rather than just special-purpose, static application program interfaces (APIs).  This allows users to design from a state of connectivity - they have some level of connectivity "out of the box" - which in many cases may be good enough.

Shifting our architectural perspective from a single, "integrated system" to a more general "information space" opens the doors to many forms of innovation.  This opens the architecture up to all of the web tools that are so prevalent in today's world.  It allows extension into the Linked Data movement, the semantic web, and integration with genomic knowledge management systems, personalized health, and innovations not yet considered.

This principle still supports integrated systems approaches.  Amazon.com, for example, is an independent system embedded in the web.  Inside its web site, it has huge technical capacity, and technology that is proprietary - and invisible - to the rest of the web.  But, by embedding itself in the web (rather than, say, using text messaging on cellphones or using a bank of dial-up modems) it opens itself to a broader world of connectivity.

This proposal offers very fine granularity for understanding and controlling patient privacy.  Some objects may be very tightly controlled (or invisible) to the general public (say, one's psychosexual history), while other objects may be generally viewable and searchable via Google (e.g. someone might opt in to an exercise/weight loss competition).

In a future post, I'll be discussing how I think this principle can lead to a notion of what I'll call emergent simplicity.  Yahoo, in its early days, tried to maintain a staff of web organizers, creating a hierarchical "table of contents" to the web.  This effort collapsed under its own weight.  Google emerged with its scalable search technology, introducing a much simpler way to find material on the web - an example of emergent simplicity.  Tim Berners-Lee could not have designed Google in 1995; the web had to evolve from its simple beginnings and go through a chaotic, evolutionary process in order for Google to emerge.

Similarly, we can look at the current explosion of complexity in health IT, privacy, genomics, personalization, cost containment, knowledge management, research, and "big data" as the breeding ground for the emergence of a new, higher-level understanding of health and technology.  Creating a universal namespace for health information can be seen as a baby step towards a new model of emergent simplicity in our health care system.

Discussion:

The notion of a universal resource identifier (URI) is one of the founding principles of the World Wide Web, as Tim Berners-Lee wrote in 1996:

The Web is a universal information space. It is a space in the sense that things in it have an address. The "addresses", "names", or as we call them here identifiers, are the subject of this article.  They are called Universal Resource Identifiers (URIs).

An information object is "on the web" if it has a URI.  Objects which have URIs are sometimes known as "First Class Objects" (FCOs).  The Web works best when any information object of value and identity is a first class object.  If something does not have a URI, you can't refer to it, and the power of the Web is the less for that.

By Universal I mean that the web is declared to be able to contain in principle every bit of information accessible by networks. It was designed to be able to include existing information systems such as FTP, and to be able simply in the future to be extendable to include any new information system.

The URI schemes identify various different types of information object, which play different roles in the protocols. Some identify services, connection end points, and so on, but a fundamental underlying architectural notion is of information objects - otherwise known as generic documents. These can be represented by strings of bits. An information object conveys something - it may be art, poetry, sensor values or mathematical equations.

The Semantic Web allows an information object to give information about anything - real objects, abstract concepts. In this case, by combining the identifier of a document with the identifier, within that document, of something it describes, one forms an identifier for anything. This is done with "#" and fragment identifiers, discussed later.
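The quoted idea - combining a document's identifier with a fragment naming something inside it - can be sketched in a few lines of Python. The URL and the fragment name here are hypothetical, purely for illustration:

```python
from urllib.parse import urldefrag

# A hypothetical document URI, and a fragment naming a concept it describes
doc = "http://example.org/records/patient-42"   # the information object itself
concept = doc + "#bloodPressure"                 # an identifier for a thing the document describes

# The "#" lets the identifier be split back into document + fragment
base, frag = urldefrag(concept)
print(base)  # http://example.org/records/patient-42
print(frag)  # bloodPressure
```

The document part says where the description lives; the fragment part names the thing described - together they form an identifier for "anything."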

This concept has gone on to be one of the most successful technological innovations of our time.  Facebook, Amazon, eBay, and Twitter present trillions of URLs to the world that can be referenced within a single linguistic framework.  These URIs can be used irrespective of one's geographical location, device, or type of communications connection to the internet.

Tim Berners-Lee envisioned the web as an open space for information to exist:

What was often difficult for people to understand about the design of the web was that there was nothing else beyond URLs, HTTP, and HTML.  There was no central computer “controlling” the web, no single network on which these protocols worked, not even an organization anywhere that “ran” the Web. The web was not a physical “thing” that existed in a certain “place.” It was a “space” in which information could exist.

If someone wants to tweet a book reference, they simply drag an Amazon URL (say, http://www.amazon.com/Creative-Destruction-Medicine-Digital-Revolution/d... ) into Twitter.  Someone else can make a comment, which can be referenced as well (e.g. http://www.amazon.com/review/R3UEA8TJP4YW12/ref=cm_cr_pr_perm?ie=UTF8&AS... ).  The URLs may become difficult to read manually, but the web can use them to identify the information uniquely.

Note that Amazon and Twitter did not require an "interface" between them - their connection was part of a linguistic relationship, managed by the web's design and infrastructure.  The web's connectivity is defined by this linguistic layer as an intrinsic property of the information space.  This connectivity does not preclude more specific API-style interfaces.  If Twitter wanted to use Amazon's shopping cart, for example, it could develop a specialized interface.

 

Comments

Stephen.Hufnagel

Universal Namespace for Health Information

This is an interesting proposal, which would require venture capital to start up. My understanding is that Tom is proposing a Facebook or Amazon.com analogue for healthcare information, which would be managed on the web. The big questions are

  • Who owns and manages the information in the “cloud”?
  • Will clinicians trust cloud information, and will they be willing to add information to the cloud "repository"?
  • How are medical-legal issues resolved?

PHRs are somewhat similar to Tom’s proposal and they have had mixed results. In all of these proposals, one must follow the money … who is willing to pay for this service? This is the PHR and also the HIE problem!  Facebook, Google and Amazon.com have advertising/sales business models to subsidize the perceived free services.

Define a credible business model and I would support this concept. Otherwise, I believe this proposal is beyond OSEHRA’s scope and capability …

VISTACarol

My understanding was somewhat different. . .

I was reading this to mean that all repositories of health data should be given unique namespaces (identifiers), much like domain names, which could allow the data within them to be uniquely identified, through a chain of meaning and location. Future repositories could be set up - from their beginning - to have all data be easily addressable, and existing repositories could potentially become addressable, though they would have to have a schema-parsing layer added.

A meta-data analysis would also be needed - generally for medicine, and specifically for various jurisdictions, institutions, etc. - involving the identification of what data items, or data combinations, are sensitive, and at what level. Just because I know the "address" of some data would not allow me to gain access to it.

I may not be understanding this completely correctly, but I didn't see this as a call to put all data into something singular and new, but to create both a methodology and a conceptual framework - an additional layer of meaning - to be used with existing and future systems. Most systems are already connected to the Internet to some extent, and it's a matter of their data or "query-able items" having unique identifiers (and parsable schemas) - allowing authorized users and entities to use data across systems.

Tom, am I understanding you?

 

jjensen

My take is similar

I tend to agree with Carol's interpretation that Tom is proposing a standardized naming convention that allows identification and exchange of data elements by creating unique addresses based upon both the 'owning' system and the meta-data. I believe that the trend toward requiring use of C32, or similar structures, for extracting and sharing data currently resident in disparate repositories is a step toward this goal. Tom's vision is to unbind the current patchwork by establishing a syntax and grammar structure to add more addresses seamlessly, while maintaining the integrity and security of the original data sources.

diverzulu

Unique identifier in the Health Semantic Community developments

If there is such a thing as a "Health Semantic Community" I think that some work has already been developed to try to uniquely identify different issues in that domain.

Please refer to the RTU (Referent Tracking Unit) work

http://www.referent-tracking.com/RTU/

and the papers that can be found there

http://www.referent-tracking.com/RTU/?page=papers

 

This is, of course, a research domain in itself, and it has to be approached with the seriousness of the institutions involved.

With the recent advent of technologies and scientific knowledge able to extract ontology instances automatically from EHRs, the work of Dr. Ceusters and his team has increased value, in my humble opinion.

 

Yours truly

David Mendes

conordowling

Every resource in every VistA gets identified - you too VA

to go narrow for a moment, giving every resource of every EHR a unique URL makes all the sense in the world and as Jenny says, current HIE specs overlay unique ids on those portions of EHR data that go into CCDs. Each publisher has a unique id and the contents of their documents are unique in this context.

The main problem with the "standard" mechanism is that it uses OIDs (Oy Vey, OIDs!). Why use a technology last seen as promising in the early 1990s when you can use URLs?

To scope up to part of what Tom is addressing: give every VistA resource a URL, and that's what I did/had to do in FMQL. Every VistA instance gets a unique base URL (ex/ vista.caregraf.org ) and once inside VistA, files establish context (file 2 is Patient so http://vista.caregraf.org/2 ) and their contents have unique ids (IENs), so record 9 in file 2 is http://vista.caregraf.org/2-9, the 301st Vital Measurement is ... This scales and it's unambiguous. Addressing and identifying information from an additional VistA (or EHR) is easy - just add a new base URL (vista.va.gov/boston1/ , vista.va.gov/westla_test/ ...)
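The convention described above - a base URL per VistA instance, a file number for context, an IEN for the record - can be sketched as a small helper. This is a sketch of the naming pattern, not FMQL's actual code, and the function name is my own:

```python
def vista_url(base, file_no, ien=None):
    """Build a URL for a VistA file or record, following the
    base/<file> and base/<file>-<ien> pattern described above."""
    path = file_no if ien is None else "%s-%s" % (file_no, ien)
    return "http://%s/%s" % (base, path)

# File 2 is the Patient file; record 9 is one patient in it
print(vista_url("vista.caregraf.org", "2"))       # http://vista.caregraf.org/2
print(vista_url("vista.caregraf.org", "2", "9"))  # http://vista.caregraf.org/2-9
```

Adding another VistA is just another base URL; the file/IEN part of the scheme stays the same, which is what makes it scale.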

You could say URLs liberate data, establishing each piece as a standalone entity. If you think about it, nothing properly exists without identity.

Two asides:
- if you move things: newvista.../2-9 owl:sameAs http://vista.caregraf.org/2-9 ...
- "same concept in two vistas": http://vista.caregraf.org/50-10 owl:sameAs anothervista/50-55
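The two asides above are just RDF statements. As a sketch, they can be emitted in N-Triples syntax with a few lines of Python (the "newvista" and "anothervista" URLs are placeholders, as in the asides):

```python
# The standard OWL property asserting that two URIs name the same thing
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

def same_as_triple(subject, obj):
    """Emit one N-Triples statement: <subject> owl:sameAs <object> ."""
    return "<%s> <%s> <%s> ." % (subject, SAME_AS, obj)

# A record that moved to a new host still names the same patient
print(same_as_triple("http://newvista.example.org/2-9",
                     "http://vista.caregraf.org/2-9"))
```

Because the link is expressed as data rather than code, any consumer that understands owl:sameAs can merge the two identifiers without a custom interface.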

As for being within OSEHRA's scope: if you want to document all the VistAs running and converge their code bases, then identifying each unambiguously is a good start. If you want to document where they differ, then identifying their content unambiguously is mandatory. You can't compare and contrast without identity.

I realize that the VA is reluctant to publish the setup of their VistAs, though that would be a great step forward for all: were there to be a place for identifying these setups, a base URL assigned by OSEHRA, then maybe ... (osehra.org/vistas/va/boston-1 ...)
Conor

Tom Munnecke

We aren't that far from a Universal namespace already

Thanks for your comments...

As Conor pointed out, we are already close to this architecture through his FMQL interpretation of the data dictionary. 

re: Steve's questions:

  • Who owns and manages the information in the “cloud”?
  • Will clinicians trust cloud information, and will they be willing to add information to the cloud "repository"?
  • How are medical-legal issues resolved?

These - and many more - are exactly the questions that need to be asked, and in the context of a modern networked approach rather than the hierarchical "castles and drawbridges" architectures that seem to be proliferating.  We sequester information into enterprise "castles" and then try to manage the flow of information via drawbridges.

For example, VA's NWHIN interface to Kaiser is a "drawbridge" model between the VA and Kaiser "castles."  The information flow is legally defined in a 39-page lawyer-friendly DURSA agreement, which I can't imagine any clinician ever reading or even knowing exists.  But, it keeps the bureaucrats happy.  We just push the data across the drawbridge, and assume that the hundreds of thousands of people who have access to this are well-behaved :)  This is sometimes called the "One Size Fits None" model of information sharing.

If we used a universal namespace approach, we would have the potential for a large-scale, fine-grained network approach.  We would have a simple, repeating model that could mediate access by content and context at whatever level of granularity is appropriate.  The location where a patient's blood sample was drawn at his home town clinic may not be particularly sensitive, but the location of a Navy Seal in some remote African country could be extremely confidential.  Similarly, genomic information that may appear to be innocuous in one context could assume a very different meaning as new research appears.

Understanding the contextual sensitivity of medical information (even its existence) requires a far more subtle approach than "castles and drawbridges."

I don't think that these principles are beyond the scope of OSEHRA - I think that they are necessary if we are ever going to understand the metadata-level operations of VistA.  (VistA FileMan security uses this level of granularity, by the way).

I think that there is a lot of technology out there today that is relevant to this:

Project hData http://www.projecthdata.org/ :  

hData is designed for ease of implementation and improved efficiency by reducing the size of the data set, implementing a single way to represent data, and using standard web best practices. Crucial to achieving these design goals is a clear separation of content and exchange specifications. For this, the HL7 hData Record Format defines the concept of a hData Content Profile (HCP), which allows the specification for a detailed content model (including syntax, semantics, and behavioral models) for a given business use case.

Security and Privacy Ontology: http://wiki.hl7.org/index.php?title=Security_and_Privacy_Ontology

One of the ways to determine the scope of the ontology is to sketch a list of questions that a knowledge base based on the ontology should be able to help answer.

  • Can Dr. Bob update Mr. Jones’ progress note?
  • Does Mr. Jones’ consent directive conflict with organizational policy?
  • Does Mr. Jones’ consent directive allow Dr. Bob to read his medical history?
  • Is there information in Mr. Jones’ surgical report that requires a higher level of confidentiality because of its sensitivity?

Semantic Web Health Care and Life Science group at W3C: http://www.w3.org/blog/hcls/

The mission of the Semantic Web Health Care and Life Sciences Interest Group (HCLS IG) is to develop, advocate for, and support the use of Semantic Web technologies across health care, life sciences, clinical research and translational medicine. These domains stand to gain tremendous benefit from intra- and inter-domain application of Semantic Web technologies as they depend on the interoperability of information from many disciplines. Please see the accompanying Use Cases and Rationale document.

My conversation with Isaac Kohane about the role of the Semantic Web in the future of the EHR:   http://youtu.be/VS8AIRFsxSw?t=4m10s 

Tim Berners-Lee's vision of Linked Data Cloud http://youtu.be/OM6XIICm_qo?t=32s providing an overarching technology for the semantic linking of data.