Code Convergence and Higher-Level Tools for Large-Scale Code Management

Let me congratulate Conor Dowling on his excellent work to date using his Semantic VistA technology. I think this is a good start on getting a handle on the complexity of the multiple versions, and also an interesting platform for exploring the future evolution of the technology.

Let me also say that, while I was one of the original creators of the Package file, I have not been "in the trenches" with the various forks of VistA over the years, and I am not an expert on what each group has done to the architecture.

I do know, however, that refactoring, converging, moving to SKIDS, and preparing for iEHR is a tremendous strain to put on any system, particularly one as large as VistA. Sooner or later, OSEHRA/VistA needs to address the fact that we are dealing with an Ultra-Large-Scale system, e.g.

I see some problems with relying on the Package File for convergence:

1.  It may not be up to date.

2.  It does not cover all of the elements of a full installation, e.g. other languages, elements, domain-specific languages, file and table builds, etc.

3.  The Package file and the underlying objects it points to can get out of sync: folks might modify routines without updating the Package file, or vice versa.

4.  I'm not sure how this would connect with future architectural migrations to other languages and implementation formats.

5.  I'm not sure how gracefully the XINDEX dependency mapping would fit into the Package file.
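To make problem 3 concrete: checking for drift between what the Package file says a package contains and what is actually installed reduces to two set differences. This is a minimal sketch that assumes the routine names have already been pulled out of both sources into plain lists; the extraction step is not shown, and the function name `package_drift` and the `ZZ*` routine names are hypothetical.

```python
from typing import Dict, Iterable, List


def package_drift(package_file_routines: Iterable[str],
                  installed_routines: Iterable[str]) -> Dict[str, List[str]]:
    """Compare the routines a Package file entry claims with those installed.

    Both inputs are hypothetical lists of routine names; how they are read
    from FileMan and from the routine directory is left out of this sketch.
    """
    declared = set(package_file_routines)
    actual = set(installed_routines)
    return {
        # Listed in the Package file but not present in the instance
        "missing": sorted(declared - actual),
        # Present in the instance but never recorded in the Package file
        "unregistered": sorted(actual - declared),
    }


print(package_drift(["ZZA1", "ZZA2"], ["ZZA2", "ZZA3"]))
# {'missing': ['ZZA1'], 'unregistered': ['ZZA3']}
```

A check like this only detects drift after the fact; it does not tell you which side (the Package file or the routines) is authoritative.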

I've been talking with Conor and others about some ideas for a broader toolkit to accomplish these goals, and more. The idea is to create a single directed graph that expresses all of the relationships in an open RDF/Semantic Web-based repository. This "Foundation" would be a collection of everything necessary to run an instance of VistA, and could associate whatever source code (in whatever language), documentation, files, ontologies, versions, installation histories, etc. are defined by a Foundation Schema.
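The "single directed graph" idea can be illustrated with nothing more than subject-predicate-object triples: each triple is a labeled edge, and a query is a pattern with wildcards. This is a minimal in-memory sketch standing in for a real RDF store; the `TripleStore` class, the `ex:` prefixes, and the predicate names are hypothetical and are not part of any agreed Foundation Schema.

```python
from typing import Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class TripleStore:
    """A toy RDF-like graph: every triple is a labeled, directed edge."""

    def __init__(self) -> None:
        self.triples: Set[Triple] = set()

    def add(self, s: str, p: str, o: str) -> None:
        self.triples.add((s, p, o))

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> Iterator[Triple]:
        """Yield matching triples in sorted order; None acts as a wildcard,
        roughly like a variable in a SPARQL triple pattern."""
        for t in sorted(self.triples):
            if ((s is None or t[0] == s)
                    and (p is None or t[1] == p)
                    and (o is None or t[2] == o)):
                yield t


store = TripleStore()
# Hypothetical relationships of different kinds, all in one graph
store.add("ex:routine/ZZA1", "ex:partOf", "ex:package/ZZ")
store.add("ex:routine/ZZA1", "ex:calls", "ex:routine/ZZA2")
store.add("ex:file/999000", "ex:partOf", "ex:package/ZZ")
store.add("ex:doc/ZZ-manual", "ex:documents", "ex:package/ZZ")

# Everything that is part of package ZZ, whatever kind of element it is
members = [s for s, _, _ in store.match(p="ex:partOf", o="ex:package/ZZ")]
print(members)  # ['ex:file/999000', 'ex:routine/ZZA1']
```

The point of the design is that routines, files, and documentation all live in the same graph, so one query shape covers all of them.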

The Foundation of a VistA instance would be visible through a SPARQL endpoint, giving us a common entry point from which to discover or query everything we need to know about a VistA instance and to compare different versions. XINDEX could emit its results in an RDF format, to be deposited into the Foundation repository.
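As an illustration of what "XINDEX could emit its results in an RDF format" might look like, here is a minimal sketch that serializes routine-to-routine call edges as N-Triples, a line-based RDF syntax that most triple stores can bulk-load. It assumes XINDEX's cross-reference output has already been parsed into (caller, callee) pairs; that parsing step, the `http://example.org/vista/` namespace, and the `calls` predicate are all hypothetical.

```python
from typing import Iterable, Tuple
from urllib.parse import quote

BASE = "http://example.org/vista/"  # hypothetical namespace
CALLS = BASE + "schema#calls"       # hypothetical predicate


def routine_iri(name: str) -> str:
    # M routine names may start with '%' (e.g. %ZIS); percent-encode the
    # name so the IRI stays valid, and wrap it in angle brackets for N-Triples.
    return f"<{BASE}routine/{quote(name, safe='')}>"


def calls_to_ntriples(pairs: Iterable[Tuple[str, str]]) -> str:
    """Serialize (caller, callee) pairs as de-duplicated, sorted N-Triples."""
    lines = {f"{routine_iri(a)} <{CALLS}> {routine_iri(b)} ." for a, b in pairs}
    return "\n".join(sorted(lines))


print(calls_to_ntriples([("ZZA1", "ZZA2"), ("ZZA1", "%ZIS")]))
```

Because each line is a complete triple, output from different XINDEX runs (or different VistA instances) can be concatenated or diffed line by line.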

This would also enforce a naming convention and procedures to ensure that each element in the repository is uniquely named and, if necessary, versioned. Having a unique URI per element ensures that everyone knows which version of the object they are dealing with, and its history through the system.
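One possible naming convention, sketched under assumptions: every element gets a URI built from a base namespace, an element type, and its name, and a versioned URI appends a version segment. The unversioned form can then identify the element across its history, while the versioned form pins one revision. The namespace, path layout, and `mint_uri` function are hypothetical, not an agreed OSEHRA convention.

```python
from typing import Optional
from urllib.parse import quote

BASE = "http://example.org/vista/"  # hypothetical namespace


def mint_uri(kind: str, name: str, version: Optional[str] = None) -> str:
    """Build a unique URI for a repository element (hypothetical scheme).

    kind    -- element type, e.g. "routine", "file", "package"
    name    -- element name; percent-encoded so characters such as '%'
               or spaces cannot break the URI
    version -- optional version label; omitted for the unversioned URI
    """
    parts = [quote(kind, safe=""), quote(name, safe="")]
    if version is not None:
        parts.append(quote(version, safe=""))
    return BASE + "/".join(parts)


print(mint_uri("routine", "%ZIS"))            # http://example.org/vista/routine/%25ZIS
print(mint_uri("package", "ZZ DEMO", "1.0"))  # http://example.org/vista/package/ZZ%20DEMO/1.0
```

Encoding every segment, rather than trusting names to be URI-safe, is what keeps the "unique URI per element" guarantee from depending on how individual packages happen to name things.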

Once we build a repository, we can analyze it directly through SPARQL queries, customized software, or general-purpose directed-graph display software. We can run the repository as tightly or as informally as we wish (or let different elements have different policies). We can use the simpler RDFS schema (my first cut at it is at ) or we can move to much more sophisticated OWL support that could do all kinds of consistency checking, inferencing, and dependency modeling.
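As an example of the kind of analysis the repository would support: "what does this routine depend on, directly or indirectly?" is a reachability question over the call edges. In SPARQL 1.1 it could be written as a property-path query along the lines of `SELECT ?dep WHERE { ex:ZZA1 ex:calls+ ?dep }` (with `ex:` declared as a prefix). The sketch below computes the same transitive closure with a breadth-first search so it runs without a triple store; the edge list and routine names are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set, Tuple


def transitive_dependencies(edges: Iterable[Tuple[str, str]], start: str) -> Set[str]:
    """Return every node reachable from `start` in one or more hops.

    Intended to mirror a SPARQL 1.1 `calls+` path: `start` itself is
    included only if a cycle leads back to it.
    """
    graph: Dict[str, List[str]] = defaultdict(list)
    for src, dst in edges:
        graph[src].append(dst)

    seen: Set[str] = set()
    queue = deque(graph[start])
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # already visited; also stops infinite loops on call cycles
        seen.add(node)
        queue.extend(graph[node])
    return seen


calls = [("ZZA1", "ZZA2"), ("ZZA2", "ZZA3"), ("ZZA3", "ZZA2"), ("ZZB1", "ZZA1")]
print(sorted(transitive_dependencies(calls, "ZZA1")))  # ['ZZA2', 'ZZA3']
```

Running the same query against two versions of an instance and comparing the results is one simple way to see how a change ripples through dependencies.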

This also opens the door to dealing with the "Big Data" problems the VA is already facing, and will face at an accelerating rate in the future. For example, the petabytes of data from the Million Veteran Program could be expressed in Semantic Web format and connected through Linked Data Cloud principles. This would allow VistA to evolve as a "Semantic Overlay" model, connecting to other resources through the semantic definitions of RDF and its associated standards.

Of course, privacy and security issues are growing exponentially with the prevalence and interoperability of data. This approach would allow us to treat privacy and security controls at the same semantic level as the data, "ratcheting up" privacy and security to match the increased visibility and interoperability of our data. Mike Davis from the VA has done some excellent work on this at

I think that Conor's initial work is a good step in a broader direction to use semantic web technology to deal with the scale and complexity of our task at hand.

I'd be delighted to work further on this design... I think it has huge potential...

Here's my other posting on RDF