At the last Architecture phone call, I suggested using semantic web/ontology technology to describe the VistA architecture, which would give us a formal, common platform for discussing everything required to install and operate a VistA instance.
Attached is the proposal, which points to the RDF Schema (RDFS) file at http://www.metavista.name/foundation/foundation.rdfs.
An RDF approach to managing the VistA software foundation
January 9, 2012
This is a proposal to use a common Resource Description Framework (RDF) approach to identify and manage the components in the OSEHRA software effort.
What is RDF?
The Resource Description Framework (RDF, http://www.w3.org/RDF/) is a standard model for data interchange on the Web, published by the World Wide Web Consortium (W3C, http://www.w3.org). RDF has features that facilitate data merging even if the underlying schemas differ, and it specifically supports the evolution of schemas over time without requiring all the data consumers to be changed.
RDF extends the linking structure of the Web to use URIs (Uniform Resource Identifiers; a URL is one form of URI) to name the relationship between things as well as the two ends of the link (this combination is usually referred to as a “triple”). This simple model allows structured and semi-structured data to be mixed, exposed, and shared across different applications.
This linking structure forms a directed, labeled graph, where the edges represent the named link between two resources, represented by the graph nodes. This graph view is the easiest possible mental model for RDF and is often used in easy-to-understand visual explanations.
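To make the merging and graph ideas concrete, here is a minimal Python sketch that uses only the standard library rather than an RDF toolkit. The two sources and their contents are hypothetical, and short strings stand in for full URIs. Because neither source uses blank nodes, merging them is plain set union, and the result can be read as a directed, labeled graph.

```python
from collections import defaultdict

# Two hypothetical sources describing overlapping VistA resources.
# Plain strings stand in for full URIs to keep the sketch short.
source_a = {
    ("QUASAR", "calls", "ADD^VADPT"),
    ("ADD^VADPT", "is_part_of", "Registration"),
}
source_b = {
    ("ADD^VADPT", "is_part_of", "Registration"),  # same assertion as in source_a
    ("VADPT", "is_a", "Routine"),
}

# With no blank nodes, merging two RDF graphs is set union:
# identical triples collapse and shared names join the graphs.
merged = source_a | source_b

def to_adjacency(triples):
    """Map each subject node to its outgoing (label, target) edges."""
    graph = defaultdict(list)
    for s, p, o in sorted(triples):
        graph[s].append((p, o))
    return dict(graph)

graph = to_adjacency(merged)
for node, edges in graph.items():
    for label, target in edges:
        print(f"{node} --{label}--> {target}")
```

A full RDF merge must also keep blank nodes from different sources distinct; this sketch sidesteps that by using only named resources.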
RDF represents complex relationships as a collection of assertions, each made up of a subject, a predicate (or “verb”), and an object. For example, consider this entry from the XINDEX Fanin file:
3. PACKAGE NAME: QUASAR
RDF might express this information as triples (in pseudocode):
QUASAR is a package.
ADD^VADPT is an entrypoint.
VADPT is a routine.
ADD^VADPT is part of the Registration Package.
QUASAR calls ADD^VADPT.
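The pseudocode triples above can be written out in N-Triples, a simple line-based RDF syntax. This Python sketch is illustrative only: the `#` fragment form of the Foundation namespace, the `http://example.org/vista/` base for instances, and the local names `Entrypoint` and `is_part_of` are assumptions that mirror the pseudocode and may not match the published schema. Note that `^` is not allowed inside an N-Triples IRI, so MUMPS entry-point names must be percent-encoded.

```python
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Hypothetical namespaces: neither the "#" fragment form nor the
# example.org instance base comes from the proposal.
FND = "http://www.metavista.name/foundation/foundation.rdfs#"
VISTA = "http://example.org/vista/"

def inst(name):
    """Mint an instance IRI, percent-encoding reserved characters
    such as '^' (ADD^VADPT becomes ADD%5EVADPT)."""
    return VISTA + quote(name, safe="")

# The pseudocode triples, one (subject, predicate, object) each.
triples = [
    (inst("QUASAR"), RDF_TYPE, FND + "Package"),
    (inst("ADD^VADPT"), RDF_TYPE, FND + "Entrypoint"),  # hypothetical class
    (inst("VADPT"), RDF_TYPE, FND + "Routine"),
    (inst("ADD^VADPT"), FND + "is_part_of", inst("Registration")),
    (inst("QUASAR"), FND + "calls", inst("ADD^VADPT")),
]

def to_ntriples(rows):
    """Serialize IRI-only triples, one '<s> <p> <o> .' statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in rows)

document = to_ntriples(triples)
print(document)
```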
An RDF Schema http://www.w3.org/TR/2004/REC-rdf-primer-20040210/#rdfschema is a way of formally describing what is said in RDF expressions. For example, it would describe the fact that we are using Routines and Packages as subjects and objects, and “is_a,” “is_part_of,” and “calls” as predicates (verbs) that describe the relationships between them. Schemas may be simple (or not even predefined), or they may allow rich expression of assertions and consistency constraints. Protégé http://protege.stanford.edu/ is an open source ontology editor and knowledge-base framework that can be used to manage these schemas.
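Schema statements also carry meaning that software can act on. The Python sketch below applies three of the RDFS entailment rules (rdfs2 for `rdfs:domain`, rdfs3 for `rdfs:range`, rdfs9 for `rdfs:subClassOf`) until nothing new can be inferred. The `Component` class and the domain and range given to `is_part_of` are hypothetical, not read from foundation.rdfs, and prefixed names such as `rdf:type` are plain strings.

```python
# Prefixed names are plain strings in this sketch.
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
DOMAIN = "rdfs:domain"
RANGE = "rdfs:range"

# Hypothetical schema fragment: Component and the domain/range of
# is_part_of are assumptions, not taken from foundation.rdfs.
schema = {
    ("is_part_of", DOMAIN, "Component"),
    ("is_part_of", RANGE, "Package"),
    ("Package", SUBCLASS, "Component"),
}
# Instance data from the pseudocode example.
data = {("ADD^VADPT", "is_part_of", "Registration")}

def rdfs_closure(triples):
    """Apply RDFS rules rdfs2 (domain), rdfs3 (range) and rdfs9
    (subClassOf) repeatedly until no new triple is produced."""
    closure = set(triples)
    while True:
        new = set()
        for s, p, o in closure:
            for s2, p2, o2 in closure:
                if p2 == DOMAIN and s2 == p:        # rdfs2
                    new.add((s, TYPE, o2))
                elif p2 == RANGE and s2 == p:       # rdfs3
                    new.add((o, TYPE, o2))
                elif p2 == SUBCLASS and p == TYPE and o == s2:  # rdfs9
                    new.add((s, TYPE, o2))
        if new <= closure:  # fixpoint reached
            return closure
        closure |= new

inferred = rdfs_closure(schema | data)
for triple in sorted(inferred - schema - data):
    print(triple)
```

RDFS infers rather than rejects: a resource used as the object of `is_part_of` simply becomes a `Package`. Enforcing consistency constraints calls for OWL or a separate validation step, and the quadratic loop here is only suitable for small examples.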
RDF assertions may be stored in a triple-store database (e.g. the open source Sesame triple store http://www.openrdf.org/ ) to create a directed graph of information. This is more flexible than a traditional relational database accessed via SQL, because new kinds of relationships can be added as new triples without redesigning a table schema.
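At its core, a triple store is a set of triples indexed so that any combination of bound subject, predicate, and object can be looked up without scanning everything. The toy Python class below sketches that idea with three in-memory indexes; it is not Sesame's implementation, although persistent stores commonly keep several index orderings of this kind on disk.

```python
from collections import defaultdict

class TripleStore:
    """A toy in-memory triple store with subject-, predicate- and
    object-keyed indexes."""

    def __init__(self):
        self.spo = defaultdict(set)  # subject -> {(predicate, object)}
        self.pos = defaultdict(set)  # predicate -> {(object, subject)}
        self.osp = defaultdict(set)  # object -> {(subject, predicate)}

    def add(self, s, p, o):
        self.spo[s].add((p, o))
        self.pos[p].add((o, s))
        self.osp[o].add((s, p))

    def match(self, s=None, p=None, o=None):
        """Yield triples matching a pattern; None acts as a wildcard."""
        # Pick the most selective index for the bound terms.
        if s is not None:
            candidates = ((s, p2, o2) for p2, o2 in self.spo.get(s, ()))
        elif p is not None:
            candidates = ((s2, p, o2) for o2, s2 in self.pos.get(p, ()))
        elif o is not None:
            candidates = ((s2, p2, o) for s2, p2 in self.osp.get(o, ()))
        else:
            candidates = ((s2, p2, o2) for s2, pairs in self.spo.items()
                          for p2, o2 in pairs)
        # Filter on whichever terms the chosen index did not cover.
        for t in candidates:
            if (p is None or t[1] == p) and (o is None or t[2] == o):
                yield t

store = TripleStore()
store.add("QUASAR", "calls", "ADD^VADPT")
store.add("ADD^VADPT", "is_part_of", "Registration")
store.add("VADPT", "is_a", "Routine")

# Reverse lookup, the kind of question a fan-in report answers.
callers = sorted(s for s, _, _ in store.match(p="calls", o="ADD^VADPT"))
print(callers)
```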
SPARQL http://www.w3.org/TR/rdf-sparql-query/ is a query language to express queries across diverse data sources and schemas, whether the data is stored natively as RDF or viewed as RDF via middleware. SPARQL contains capabilities for querying required and optional graph patterns along with their conjunctions and disjunctions. SPARQL also supports extensible value testing and constraining queries by source RDF graph. A SPARQL endpoint http://semanticweb.org/wiki/SPARQL_endpoint is a machine-readable interface to a knowledge base that can return query results in both machine- and human-readable formats.
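The heart of a SPARQL query is a basic graph pattern: a conjunction of triple patterns whose shared variables must bind consistently. The following Python sketch evaluates such a pattern over an in-memory list. It is a teaching aid, not a SPARQL engine, and it omits OPTIONAL, UNION, FILTER and everything else; the equivalent SPARQL appears in a comment with a hypothetical `:` prefix.

```python
def is_var(term):
    """In this sketch, as in SPARQL, terms starting with '?' are variables."""
    return term.startswith("?")

def match_pattern(pattern, triple, binding):
    """Return binding extended so that pattern matches triple, or None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if is_var(term):
            # Bind an unbound variable; an already-bound one must agree.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended

def evaluate_bgp(patterns, triples):
    """Evaluate a basic graph pattern: the conjunction of all patterns,
    joined on shared variables."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings

triples = [
    ("QUASAR", "calls", "ADD^VADPT"),
    ("ADD^VADPT", "is_part_of", "Registration"),
    ("VADPT", "is_a", "Routine"),
]
# Roughly: SELECT ?ep ?pkg WHERE { :QUASAR :calls ?ep . ?ep :is_part_of ?pkg }
results = evaluate_bgp(
    [("QUASAR", "calls", "?ep"), ("?ep", "is_part_of", "?pkg")], triples)
print(results)
```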
The Foundation Schema: An RDF approach to VistA.
This paper proposes the creation of an RDF Schema called Foundation that would provide a common SPARQL endpoint with which to describe everything required to install and operate an instance of VistA:
1. Name all of the elements that are required to operate an instance of VistA. This includes routines, globals, packages, FileMan files and descriptions, test scripts, documentation, APIs, entry points, ontologies, file and table builds, device information, etc.
2. Name all of the relationships between these elements, such as which routines call other routines, which packages they belong to, which files they use, what code they execute, and how they relate to external activities.
3. Create a triple store repository to collect all of this information and provide a SPARQL endpoint for queries, statistical analysis, comparison between versions (e.g. the VA and HIS forks), and tracking changes to a given fork over time.
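Once each fork is loaded as a set of triples that name shared elements with the same URIs, the comparison between versions in item 3 reduces to set difference. In this Python sketch the two snapshots and the Z-namespace call `EN^ZZLOCAL` are hypothetical, and plain strings stand in for URIs.

```python
from collections import Counter

# Hypothetical snapshots of two forks; strings stand in for URIs.
va_snapshot = {
    ("QUASAR", "calls", "ADD^VADPT"),
    ("ADD^VADPT", "is_part_of", "Registration"),
}
local_fork = {
    ("QUASAR", "calls", "EN^ZZLOCAL"),  # hypothetical local routine
    ("ADD^VADPT", "is_part_of", "Registration"),
}

def diff_snapshots(old, new):
    """Return (added, removed) triples between two snapshots, sorted."""
    return sorted(new - old), sorted(old - new)

def predicate_counts(snapshot):
    """A simple statistic: number of assertions per predicate."""
    return Counter(p for _, p, _ in snapshot)

added, removed = diff_snapshots(va_snapshot, local_fork)
print("added:", added)
print("removed:", removed)
print(predicate_counts(local_fork))
```

The approach only works if both forks mint identical URIs for the same routine or file; agreeing on that naming is part of what the Foundation schema would need to settle.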
This schema would provide a formal mechanism for defining the “glue” that pulls together all of the software and elements within VistA, and it is capable of expressing the meta-level information that has driven VistA over the years. For example, Semantic Vista http://www.caregraf.org/semanticvista is an approach to expressing VistA FileMan metadata in RDF format. The information gleaned from the XINDEX refactoring project http://www.osehra.org/group/ehr-refactoring-services could be formatted in the Foundation RDF format, which would allow the links between routines, packages, and FileMan files and fields to be expressed. The OSEHRA SKIDS project http://www.osehra.org/group/skids could use this repository for source code version control that is able to express the subtleties of the FileMan data dictionary. This framework could also support the efforts of the architecture group http://www.osehra.org/group/architecture.
Through SPARQL, the Foundation schema could also be linked to other repositories, for example genomics ontologies http://www.osehra.org/blog/lessons-learned-emerge-mapping-common-data-elements-de-1 or an XML feed of the Federal Enterprise Architecture http://www.itdashboard.gov/data_feeds.
An Experimental Prototype
An RDF Schema for the Foundation is available at http://www.metavista.name/foundation/foundation.rdfs. The initial version defines classes (which may be used as subjects or objects): Routine, Package, File, Global, Parameter, X_code (executable code), and Language. It defines properties (which may be used as verbs): calls, entrypoint, contains, has_input_parameter, has_output_parameter, embedded_languge, set_global, kills_global, reads_global, uses_file, and uses_field.
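As a quick consistency check, the class and property names listed above can be treated as a closed vocabulary and used to flag unrecognized terms in harvested triples. This Python sketch copies the names verbatim from the list (including `embedded_languge`), uses hypothetical sample data built around an invented routine `ZZDEMO`, and applies a closed-world check that RDFS itself does not make; the published RDFS file remains the authoritative list.

```python
RDF_TYPE = "rdf:type"  # prefixed name as a plain string

# Names copied verbatim from the initial-version list above.
CLASSES = {"Routine", "Package", "File", "Global", "Parameter",
           "X_code", "Language"}
PROPERTIES = {"calls", "entrypoint", "contains", "has_input_parameter",
              "has_output_parameter", "embedded_languge", "set_global",
              "kills_global", "reads_global", "uses_file", "uses_field"}

def unknown_terms(triples):
    """Flag predicates, and rdf:type objects, missing from the vocabulary.
    This is a closed-world check, unlike RDFS semantics."""
    problems = []
    for _, p, o in triples:
        if p == RDF_TYPE:
            if o not in CLASSES:
                problems.append(("class", o))
        elif p not in PROPERTIES:
            problems.append(("property", p))
    return problems

# Hypothetical harvested triples; ZZDEMO is an invented routine name.
sample = [
    ("ZZDEMO", RDF_TYPE, "Routine"),
    ("ZZDEMO", "reads_global", "^ZZDEMO"),
    ("ZZDEMO", "uses_files", "ZZDEMO_FILE"),  # misspelled property
    ("ZZDEMO", RDF_TYPE, "Entrypoint"),       # not a class in this version
]
print(unknown_terms(sample))
```

Flagging `uses_files` catches a likely typo for `uses_file`, and flagging `Entrypoint` reflects that the initial version models entry points through the `entrypoint` property rather than as a class.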
An experimental data store using this schema is available online at http://vistaewd.net:8980/openrdf-workbench/repositories/gpltest/contexts, courtesy of George Lilly.
An RDF version of the Patient file, as captured by Conor Dowling of http://www.caregraf.org/semanticvista, has also been loaded into the repository.
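A client talks to a SPARQL endpoint over HTTP using the W3C SPARQL Protocol: the query text goes in a `query` parameter, and the `Accept` header selects the result format. The sketch below builds such a request with the Python standard library without sending it. The endpoint URL is hypothetical: the workbench address above is a browser interface, and the repository's actual HTTP endpoint path should be confirmed before use.

```python
from urllib.parse import parse_qs, urlencode, urlparse
from urllib.request import Request

# Hypothetical endpoint: the path follows the usual Sesame server layout
# (/openrdf-sesame/repositories/<id>) but is an assumption, not the
# address of the repository mentioned above.
ENDPOINT = "http://localhost:8080/openrdf-sesame/repositories/foundation"

# SPARQL 1.0 query listing every class that has instances.
QUERY = """PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT DISTINCT ?type WHERE { ?s rdf:type ?type }"""

def build_query_request(endpoint, query):
    """Build, without sending, a SPARQL Protocol GET request that asks
    for results in the SPARQL JSON results format."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})

req = build_query_request(ENDPOINT, QUERY)
# Round trip: the encoded URL carries the query text unchanged.
decoded = parse_qs(urlparse(req.full_url).query)["query"][0]
print(req.get_method(), req.full_url)

# Sending it against a live endpoint would look like:
#   import json, urllib.request
#   with urllib.request.urlopen(req) as resp:
#       rows = json.load(resp)["results"]["bindings"]
```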