Progress on RDF/SPARQL repository for VistA Foundation Schema

This is a follow-up to my earlier proposal to use RDF and SPARQL to create a directed graph that maps out all of the foundational elements in a VistA instance.  My goal is to create a SPARQL endpoint that gives us common, universal access to all of the elements of the software "engine" that drives a VistA implementation.  I am calling this repository a Foundation, which is defined by a Foundation Schema (here is a rough draft of this schema).

This has many advantages:

  1. It creates a unique name for every element in the Foundation, called a URI.  This eliminates any ambiguity as to which element is being referenced, where it is located, and what version is being referenced.  This ensures that everyone using the same name is referring to the same object.
  2. It creates a common REST interface format, exposing elements through a web-accessible URL.  For example, this is a list of everything our test repository knows about the VistA routine AABSVE:
  3. It allows sophisticated searching and reporting through the SPARQL language.
  4. It allows programmatic access through APIs to do more specific activities, such as graphical displays, JavaScript-based user interaction, or integration with other software.
  5. It provides a common access point for the information within the Foundation.  Currently, OSEHRA maintains information on spreadsheets, flat files, wiki pages, GitHub repositories, Enterprise Architect UML, emails, discussion threads, etc.  A common repository naming these sources would allow us to have a common versioning and access process, as well as archiving and comparison across forked implementations (e.g. Indian Health Service RPMS, DoD CHCS, World VistA, etc.)
  6. While much of the Foundation will be internal-use only, other aspects of the Foundation may be suitable for linkage to the burgeoning semantic web/linked data cloud markets (e.g. knowledge bases, coding systems, etc.)
  7. The Foundation could be used to directly connect to larger government "mashups" - such as Enterprise Architecture linkage of OMB 300 information to specific hospital information, such as clinic location, patient loads, software installed, etc.
  8. It would allow us to network together the Foundation RDF/SPARQL Endpoints of all VistA instances, allowing system-wide queries.
  9. It would allow us to use sophisticated knowledge management tools, such as the Stanford Protégé ontology editor and the OWL language, for advanced ontological support.  This could be used to measure the consistency or the completeness of a Foundation, or to make advanced inferences about the relationships in the Foundation.
  10. It introduces an architectural concept that I'm calling a Semantic Overlay model for VistA - linking together disparate systems through RDF/semantic web technology, allowing information to be associated in a shared information space.  This loosely coupled, fine-grained, large-scale network approach is a radically new way of thinking about systems architecture.  Typical approaches assume tightly coupled, centralized "integrated systems" built on hierarchies that parallel the organization charts of the organizations involved.  Frequently, those organizations don't want to be integrated, so it becomes a delicate balance of telling Congress that things are being "integrated" while at the same time preserving the bureaucratic turf of the entities involved.  I have participated in and watched the VA/DoD integration battlefield play out for three decades now, and I think that a semantic overlay approach could be a "silver bullet" that allows all participants to preserve their turf while still sharing medical information in support of our veterans and warriors.  One metaphor is to think of the current designs as being based on Castles and Drawbridges - who controls which castles, who lowers which drawbridges, and who connects the drawbridges together.  A semantic overlay model could be considered a Passport model, providing more open potential passage across a much larger territory, but also much finer-grained, individualized control of who sees what, when, and how.  (Tip of the hat to Mike Davis of the VA for an early version of this metaphor.)
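
Points 1 through 3 above can be sketched in a few lines of Python.  This is a minimal illustration, not the Foundation Schema itself: the namespace http://example.org/vista/foundation/, the Routine class, and the helper names are all hypothetical.  They are chosen only to show how a routine such as AABSVE gets one unambiguous URI, how facts about it are stored as subject-predicate-object triples, and what a SPARQL query for "everything known about this element" might look like.

```python
from urllib.parse import quote

# Hypothetical namespace (assumption); the real Foundation Schema defines its own.
BASE = "http://example.org/vista/foundation/"

# Standard W3C predicates for typing and labeling a resource.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def routine_uri(name: str) -> str:
    """Mint the URI for a VistA routine under the assumed namespace."""
    # Percent-encode the name so characters such as "%" stay unambiguous.
    return BASE + "routine/" + quote(name, safe="")


# A tiny in-memory graph: each fact is a (subject, predicate, object) triple.
# "schema#Routine" is an illustrative class name, not taken from the real schema.
TRIPLES = [
    (routine_uri("AABSVE"), RDF_TYPE, BASE + "schema#Routine"),
    (routine_uri("AABSVE"), RDFS_LABEL, "AABSVE"),
]


def describe(subject: str, triples) -> list:
    """Return every (predicate, object) pair recorded for one subject."""
    return [(p, o) for s, p, o in triples if s == subject]


def describe_query(subject: str) -> str:
    """Build the SPARQL query that asks an endpoint the same question."""
    return f"SELECT ?p ?o WHERE {{ <{subject}> ?p ?o }}"


if __name__ == "__main__":
    uri = routine_uri("AABSVE")
    print(uri)
    for predicate, obj in describe(uri, TRIPLES):
        print(predicate, obj)
    print(describe_query(uri))
```

Because every fact hangs off the same URI, the in-memory lookup and the SPARQL query ask the same question; only the place where the triples live differs.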

Moving the Prototype Forward:

I will be the first to acknowledge that this is my first serious foray into RDF/SPARQL semantic web design.  It is a steep learning curve for me, but at the same time, it resonates with much of the original design concepts of VistA from its earliest beginnings.  There are many levels of understanding involved, from "abstraction junkies" who can leap tall ontologies in a single bound to many others (most?) who will look at RDF and think "what in the world is this all about?"

It's a little like moving from arithmetic to algebra.  You can do a lot of things knowing only arithmetic.  You can calculate the square footage of a room using arithmetic and a table that lists areas by length and width.  For rooms that happen to fit the table, this works fine.  However, understanding area as an algebraic formula (length times width) gives us a much richer toolset.  Learning algebra requires some effort, but lifts us up to new ways of understanding the world that are unimaginable from an arithmetic point of view.

Semantic web/RDF thinking is analogous to moving to algebra instead of arithmetic.  Not everyone has to learn all the details - not everyone in IT has to know SQL, nor would everyone using RDF need to know SPARQL.

Regardless, I think it is important that we move forward with a working proof of concept of a Foundation repository.  This would give us real-world experience in getting the data into a common format, as well as in understanding how the queries work, etc.

Here are some steps I suggest:

  1. Open up an OSEHRA RDF repository.  I suggest using the Sesame 2 server available at  The advantage to using this server is that it supports the latest version of SPARQL.  There are many other servers available, ranging from free to very expensive.  This server seems to be a good starting point for investigation.
  2. Load the refactoring teams' RDF into this repository, as well as other RDF available from other sources - Conor Dowling's data dictionary RDF would be of particular interest.
  3. Get other input on how this repository might be used for SKIDS, visualization of graphs, architecture, integration into an Eclipse-based VistA refactoring tool, etc.
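
Steps 1 and 2 above could look roughly like this from a client's point of view.  The sketch below uses only the Python standard library and sends nothing over the network: it serializes triples as N-Triples (a line-based W3C RDF format), builds a SPARQL Protocol GET URL, and flattens a SPARQL JSON results document into simple rows.  The endpoint URL, the repository name "foundation", and all helper names are assumptions for illustration; the actual address depends on how the server is deployed.

```python
import json
from typing import NamedTuple
from urllib.parse import urlencode

# Hypothetical endpoint (assumption); substitute the real repository address.
ENDPOINT = "http://localhost:8080/openrdf-sesame/repositories/foundation"


class Literal(NamedTuple):
    """Marks a plain string value, as opposed to a URI."""
    value: str


def nt_term(term) -> str:
    """Render one term in N-Triples syntax: "literal" or <uri>."""
    if isinstance(term, Literal):
        escaped = (term.value.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
        return f'"{escaped}"'
    return f"<{term}>"


def to_ntriples(triples) -> str:
    """Serialize (subject, predicate, object) triples, one statement per line."""
    return "".join(
        f"{nt_term(s)} {nt_term(p)} {nt_term(o)} .\n" for s, p, o in triples
    )


def sparql_get_url(endpoint: str, query: str) -> str:
    """Build a SPARQL Protocol GET URL carrying the query as a parameter."""
    return endpoint + "?" + urlencode({"query": query})


def flatten_results(results_json: str) -> list:
    """Reduce a SPARQL JSON results document to {variable: value} rows."""
    doc = json.loads(results_json)
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in doc["results"]["bindings"]
    ]
```

The N-Triples text is what would be uploaded in step 2; how the upload itself is performed depends on the server and is left out here.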

I think that this is a very exciting technology to explore, but it will also require significant intellectual horsepower from a broad variety of folks to pull it off.  To move beyond this level of demonstration will require additional funding.