VISUAL CROSS-REFERENCE of PACKAGES, ROUTINES & GLOBALS is now available

VISUAL CROSS-REFERENCE of PACKAGES/ROUTINES/GLOBALS at: http://code.osehra.org/dox/ <-- NEW

The OSEHRA VistA Visual Cross-Reference codebase documentation is based on an automated XINDEX analysis and can be accessed directly via http://code.osehra.org/dox or from the OSEHRA web page -> Resources -> Development Tools -> Web Based Code Review.

 REQUESTED ACTION: Please provide suggestions for improvement.

Visual Cross-Reference needs to add:

  • Global/local file module/routine fan-in (e.g., verify ICRs)
  • Current module/routine version & patches
  • Patches and their Module dependencies
  • Module Install version dependencies
  • Package/module/patch links to online VistA Documentation Library (VDL)

Comments

A good start

Tom Munnecke:

This is a good start at a static code-level analysis... some suggestions:

1) The visual display for the larger images was not really helpful... waaaay too wide for the browser, so it lost its effectiveness. Perhaps a standard collapsible hierarchy display would be more effective.

2) Have you considered displaying this in a wiki format, so that each page is "wikified" with a "recent changes" listing, everything is fully hyperlinked, the sections are deep-linkable, etc.?  I would suggest MediaWiki (http://www.mediawiki.org/wiki/MediaWiki), the wiki used to drive Wikipedia.  This would also allow adding metadata and tagging to the pages, for example, to track their status, or things like the use of a domain-specific language, archetypes, patterns, workflow process dynamics, REST exposure, cross-package semantics, etc.

3) I recall using a variation of this utility back when I was Kernel team lead, and messing around with some dynamic displays, tracking the run-time behaviors of the packages.

I might mention the parable of the blind men and the elephant...  http://en.wikipedia.org/wiki/Blind_men_and_an_elephant

 

It was six men of Indostan
To learning much inclined,
Who went to see the Elephant
(Though all of them were blind),
That each by observation
Might satisfy his mind

They conclude that the elephant is like a wall, snake, spear, tree, fan or rope, depending upon where they touch. 

Moral:

So oft in theologic wars,
The disputants, I ween,
Rail on in utter ignorance
Of what each other mean,
And prate about an Elephant
Not one of them has seen!

 

The Meta-Moral:

We can imagine VistA as one very big elephant, and need to recognize that there are many perspectives to be dealt with :)


Visual X-ref suggestions

Karen Clark:

I agree with Tom that the view is too wide, but it's a helpful resource.  Suggestions for improvement for package contents and dependencies:

XINDEX is a great place to start to determine package dependencies, but it's only a start and probably captures on average only 60% of a package's internal/external references.

To perform a more complete x-ref of package dependencies, a list of files belonging to each package needs to be set up and maintained... at a minimum: file #, file name, global reference, and custodial package.  Having this definition would allow the XINDEX results to do a better job of determining whether global references made within routines, or routine calls to ^DIC, ^DIE, and ^DIQ (file references within a routine - you have to look at the routine itself to get these), are internal to the package or external to another package, and to map them accordingly in the x-ref.
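A minimal Python sketch of that classification step. The custody entries and the reference format below are hypothetical stand-ins; only the idea - resolve each XINDEX global reference against a file-custody table - is what's being illustrated:

# file # -> (file name, global reference, custodial package); a stand-in
# for the maintained package-file list described above.
FILE_CUSTODY = {
    "2":  ("PATIENT",      "^DPT(",  "Registration"),
    "52": ("PRESCRIPTION", "^PSRX(", "Outpatient Pharmacy"),
}

def classify_reference(global_ref, calling_package):
    """Label a global reference found by XINDEX as internal or external."""
    for file_no, (name, prefix, custodian) in FILE_CUSTODY.items():
        if global_ref.startswith(prefix):
            kind = "internal" if custodian == calling_package else "external"
            return kind, file_no, name, custodian
    return "unknown", None, None, None

print(classify_reference("^DPT(DFN,0)", "Outpatient Pharmacy"))
# -> ('external', '2', 'PATIENT', 'Registration')

The same lookup would apply to ^DIC/^DIE/^DIQ calls once the file number has been pulled from the routine itself.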

After completing the list of package files and the steps above, the files themselves should be checked for pointers to external files, and those should be mapped as well.

Next up is to get a list of all components within a package - not just files and routines, but options, remote procedures, protocols, templates, mail groups, etc.  This will give you an estimate of a package's size and can be used for further package dependency listings and possibly in determining a 'refactor' priority list.

Next up... OK, these aren't really in a 'next up' order ;-) ... are remote procedures external to a package.  These are found in the Option file for Broker-type options.  What RPCs does a GUI call that are external to its package?

Next... external interfaces.  What protocols are used?  To whom are they sending data?  What type of data (message type)?  Both sending and receiving applications are defined in VistA, as are the message and event types.  Remember that many of the entries in the Protocol file (#101) are for ListMan and not HL7.

Now the GUIs.  We know what RPCs a GUI calls within an application - that's in the Option file - but GUIs can also make FileMan calls using Delphi FileMan components and FileMan RPC wrappers (DDR FILER, etc.).  These are the same type of calls the routines are making, e.g., ^DIE calls - you need to know what file (at minimum) and what field to know the dependencies.

Automating this process is a must, because the OSEHRA source code should be updated with new FOIA patches or you will soon be out of date.  New packages are added, new APIs are created, etc.  The OSEHRA VA database needs to be patched.

For the Visual X-ref... it would be nice if the links to other packages (maybe two links) provided two types of output: one to show the dependencies (routine calls, files, RPCs, etc.) and the other to jump to the other package itself - which you already have.


Not necessarily "too wide", just "different"

Tom Munnecke:

First of all, let me say that all of this information is very important, and is very much in keeping with what I envisioned to be part of the Kernel back when I was lead of the kernel development team.  I saw all of this as being just a part of the development process, to be integrated into the actual running process as much as possible, not just external procedures.  The fact that we are now doing this in "retro" mode 25 years later should be a lesson learned in the costs of not investing in software infrastructure, but instead just heaping new functionality onto the outer layers of the application :(

I see several issues here:

1.  Is it possible to understand the dynamic operation of VistA from a static analysis of its source code?

2. If we understand a package "in the small," will we understand its behavior at scale?

3. What are the levels of abstraction by which we can view VistA?

These are intertwined, so let me start with #3, the levels of abstraction.

Imagine that we had a system running on a spreadsheet written in Java, taking a "linked web" approach to studying disease rates by income level.  The user writes the application in a spreadsheet macro language, issuing SPARQL queries against RDF definitions in a Semantic Web format, inserting the numbers into columns that are then graphed using the spreadsheet tools.  The compiled Java code is executed on the Java Virtual Machine, which in turn executes OS code, which executes machine instructions, which execute microprogrammed instructions on the CPUs.

We can look at this example from 8 layers of abstraction:

  1. Semantic Web, SPARQL/RDF
  2. Macro Language
  3. Spreadsheet functions
  4. Java
  5. Java Virtual Machine (JVM)
  6. Operating system
  7. CPU instruction set
  8. CPU micro instructions

One could argue that they are all saying the same thing; that they are all equivalent to a Turing machine, so understanding any one layer is equivalent to understanding the others.  And if you were able to freeze the semantic web and RDF information, and everything in between, you might be able to take this static problem and compile it into an extremely fast, purpose-built set of micro instructions that would run that snapshot incredibly fast.  Of course, it would be incredibly brittle, failing if even the tiniest adaptation was required anywhere along the line.

The next question becomes, "what is the appropriate level of abstraction to look at this problem?"  We can simplify this and lump levels 5-8 as a black box, just assuming that the JVM folks will do their job for whatever installation we use.  We could read the Java Source code to understand the exact behavior of the system.  But even that would not tell us what the user was trying to say with the macros and SparQL queries against the semantic web.

The system crosses what I call an "organic threshold" between the static, compiled Java and the macro level.  We leave the realm of language as a finite state grammar, and enter a linguistic realm of a much richer generative language.  Users can deal with the meaning of the data directly, rather than worrying about the procedural issues of getting it.

There is a tension between system design at the "finite state" Java level and the "generative" spreadsheet/macro levels.  There are those who feel that the system should only do what it is expressly programmed for and behave only according to specifications.  This is very important if we are designing a shuttle launch system, for example.  The last thing we want is the pilot deciding to add a macro to the flight control software to change the launch parameters.  And, while we might want CFOs to use spreadsheets and macros to analyze quarterly results, we wouldn't want them to tweak the underlying accounting system or audit trail.

Supporting an EHR, however, is a much different process than launching a shuttle.  There is no one single point of view - there are hundreds.  The data is highly dynamic, the hospital is changing, and medical knowledge is changing. Peter Drucker called the hospital the most complex organization in our society.

After doing the VistA and CHCS I architectures, I realized that the underlying issue in medical informatics is dealing with complexity.  I studied at the Santa Fe Institute, and hung out with Tim Berners-Lee starting when he was just a programmer at CERN developing the World Wide Web.  I studied Learning Communities and Pattern Languages.  Tim went on to shift his original focus from the "document-oriented" web to the Semantic Web.

I saw my work with VistA as "creating a speech community" - and my source-code level work as ways of creating tools for enabling this community.  I was just trying to overcome the "failure to communicate" I saw; the EHR was just one form of communication.

Looking back, I think my metadata design work on the data dictionary foreshadowed the semantic web (see how simple it was to transform the data dictionary to SPARQL/RDF in SemanticVistA), the design of FileMan foreshadowed NoSQL technologies, and the integration of user communities, decentralization, and electronic communication foreshadowed Learning Communities.  FORUM (a centralized MailMan system supporting 50,000 users, package distribution, online support, etc.) was a combined social network/content management system akin to Drupal (only designed from a networked perspective, not Drupal's hierarchical approach).

My concern is that trying to refactor VistA solely from the static source-code perspective is going to become explosively complex as we discover ever-greater linkages, indirection, and embedded sublanguages.  The temptation would be to jettison this adaptability in order to pigeonhole the system into a chosen (finite-state) toolset.  If done in ignorance of the higher-level abstractions and considerations of the "learning community" level that is so important to the VA, this could lead to a disastrous collapse of the system - we would have won the battle but lost the war.  VA Washington COS Ross Fletcher hints at this in this interview: http://www.youtube.com/watch?v=ai6OgG-wIt4

I know that some will say that I'm being too abstract and general; that we need to deal with the alligators instead of draining the swamp, etc.  (I heard this constantly while designing VistA: "What?  You are going to waste time designing a data dictionary to do things at a meta level when we need to get this code out next month?  Let's just hard-code it and worry about your abstractions later.")

But I think if we took a broader perspective of what we were doing, and how we might support the semantic web, triplestore databases, learning communities, patterns of health, cloud-based "ensembles" rather than "program the org chart" integration efforts, we might find things are actually much simpler than just staring at today's source code.  I think we might find that we are inducing much of the complexity in the system, blaming it on the source code rather than our ways of organizing it.

 


Important point!

Carol Monahan:

 I agree with what you're saying, and I want to draw particular attention to a point you made toward the end:

"My concern is that trying to refactor VistA solely from only the static source code perspective is going to become explosively complex as we discover ever-greater linkages, indirection, and embedded sublanguages.  The temptation would be to jettison this adaptability in order to pigeonhole the system into a chosen (finite-state) toolset.  If done in ignorance of the higher level abstractions and considerations of the "learning community" level that is so important to the VA, this could lead to a disasterous collapse of the system - we would have won the battle but lost the war. "

We should all be sure we understand both how and why the system works, before coming to a conclusion about what form the next step in its evolution should take.

 


Updates to VistA Visual Cross Reference

Peter Li:

Update to VistA Visual Cross Reference (http://code.osehra.org/dox/index.html):

The links in the directed graph are now clickable - when clicked, they show the dependencies (routines and globals).  We are still working to add other dependencies such as file pointers, RPCs, etc.

There are other improvements to the directed graph, such as an indication of the number and type of dependencies on other packages, and sorting of dependent packages from left to right based on the number of dependencies.

We've created a JIRA task, "Improvement to the Visual Cross Reference" (see http://issues.osehra.org/browse/CGCT-1), to allow the open source community to submit requests for improvement.  There are currently two subtasks being worked on (a sketch of how their output could feed the package graph follows the list):

1. File Pointer dependencies - based on extraction of file pointers from the "Map Pointer Relation" option of the FileMan Data Dictionary Utility.

2. Dependencies associated with the ^DIC, ^DIE, and ^DIQ routines - based on improvements to XINDEX.
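As a rough illustration of how the file-pointer subtask could feed the package graph, here is a minimal Python sketch.  It assumes the pointer map has already been parsed into (pointing file, pointed-to file) pairs; the record layout and sample values are hypothetical, not the actual "Map Pointer Relation" output format:

from collections import defaultdict

# file # -> custodial package, plus file-to-file pointer pairs (made up).
FILE_TO_PACKAGE = {"2": "Registration", "52": "Outpatient Pharmacy", "200": "Kernel"}
POINTER_PAIRS = [("52", "2"), ("52", "200"), ("2", "200")]

def package_dependencies(pairs, custody):
    """Roll file-level pointers up to package-level dependency edges."""
    edges = defaultdict(int)
    for src_file, dst_file in pairs:
        src_pkg, dst_pkg = custody.get(src_file), custody.get(dst_file)
        if src_pkg and dst_pkg and src_pkg != dst_pkg:
            edges[(src_pkg, dst_pkg)] += 1  # count pointers on each edge
    return dict(edges)

print(package_dependencies(POINTER_PAIRS, FILE_TO_PACKAGE))
# {('Outpatient Pharmacy', 'Registration'): 1,
#  ('Outpatient Pharmacy', 'Kernel'): 1, ('Registration', 'Kernel'): 1}

The per-edge counts are the kind of data the directed graph needs for the "number and type of dependencies" annotation mentioned above.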

 


This looks good..

Tom Munnecke:

This looks like a great step forward... what tools are you using to do this?  Is it available for SPARQL queries?

Also, are you using Data dictionary references as literals on the graph, or do you plan to continue the graph through the Data dictionary?

Here is an RDF schema I am playing with:

 

# MetaVistA Foundation RDF Schema - Tom Munnecke, 12/5/2011
# The purpose of this schema is to provide a single conceptual framework to describe all of the
# foundational elements of the software and metadata elements in the VistA EHR, and a SPARQL
# endpoint for querying the internal software properties of the system.
# The audience for this semantic web is architects, programmers, and other technical people
# interested in the internal operation of the VistA system.

# The MetaVistA prefix 'mv:' refers to MetaVistA-specific elements
@prefix mv: <http://metavista.name/foundation#> .
# vista: refers to VistA-specific concrete terms within a given Foundation
@prefix vista: <http://osehra.org/ns#> .
# Define other standard abbreviations
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .

# Note that up-arrows and spaces are replaced with underscores for the sake of compatibility
# with the rest of the RDF syntax.

# Define MetaVistA RDFS types (names that can be used as Subject or Object).  These are the
# "dots" being connected on the graph.  "Label" is the text inside the dot.
mv:Routine rdf:type rdfs:Class ;
    rdfs:label "Routine" ;
    rdfs:comment "An ANS MUMPS routine" .
mv:Package rdf:type rdfs:Class ;
    rdfs:label "Package" ;
    rdfs:comment "A collection of routines to perform some application process" .
mv:File rdf:type rdfs:Class ;
    rdfs:label "FileMan_File" ;
    rdfs:comment "A file defined in the Data Dictionary" .
mv:Global rdf:type rdfs:Class ;
    rdfs:label "Global" ;
    rdfs:comment "An ANS MUMPS global" .
mv:Parameter rdf:type rdfs:Class ;
    rdfs:label "Parameter" ;
    rdfs:comment "The name of a parameter passed to a routine" .
mv:X_Code rdf:type rdfs:Class ;
    rdfs:label "Executable_Code" ;
    rdfs:comment "Executable ANS MUMPS code" .
mv:Language rdf:type rdfs:Class ;
    rdfs:label "Language" ;
    rdfs:comment "A language used within VistA" .

# Define MetaVistA RDFS properties (names that can be used to define verbs).  These are the
# links connecting the dots on the graph.
# Domain names the kinds of subjects that this verb (property) relates to.
# Range names the kinds of objects that this verb relates to.
mv:calls rdf:type rdf:Property ;
    rdfs:comment "Defines a routine called by another routine" ;
    rdfs:label "calls" ;
    rdfs:domain mv:Routine ;
    rdfs:range mv:Routine .
mv:entryPoint rdf:type rdf:Property ;
    rdfs:comment "Defines a tag of a routine used for entering the package" ;
    rdfs:label "entryPoint" ;
    rdfs:domain mv:Routine ;
    rdfs:range mv:Routine .
mv:contains rdf:type rdf:Property ;
    rdfs:comment "Defines a routine contained in a Package" ;
    rdfs:label "contains" ;
    rdfs:domain mv:Package ;
    rdfs:range mv:Routine .
mv:has_input_parameter rdf:type rdf:Property ;
    rdfs:comment "Defines a parameter input to a routine" ;
    rdfs:label "has_input_parameter" ;
    rdfs:domain mv:Routine ;
    rdfs:range mv:Parameter .
mv:has_output_parameter rdf:type rdf:Property ;
    rdfs:comment "Defines a parameter output from a routine" ;
    rdfs:label "has_output_parameter" ;
    rdfs:domain mv:Routine ;
    rdfs:range mv:Parameter .
mv:embedded_language rdf:type rdf:Property ;
    rdfs:comment "A reference to another computer language embedded in a routine" ;
    rdfs:label "embedded_language" ;
    rdfs:domain mv:Routine ;
    rdfs:domain mv:Global ;
    rdfs:range mv:Language .
mv:sets_global rdf:type rdf:Property ;
    rdfs:comment "A reference to code that directly sets a MUMPS global" ;
    rdfs:label "sets_global" ;
    rdfs:domain mv:Routine ;
    rdfs:domain mv:X_Code ;
    rdfs:range mv:Global .
mv:kills_global rdf:type rdf:Property ;
    rdfs:comment "A reference to code that directly kills a MUMPS global" ;
    rdfs:label "kills_global" ;
    rdfs:domain mv:Routine ;
    rdfs:domain mv:X_Code ;
    rdfs:range mv:Global .
mv:reads_global rdf:type rdf:Property ;
    rdfs:comment "A reference to code that directly reads a MUMPS global" ;
    rdfs:label "reads_global" ;
    rdfs:domain mv:Routine ;
    rdfs:domain mv:X_Code ;
    rdfs:range mv:Global .
mv:uses_file rdf:type rdf:Property ;
    rdfs:comment "A reference that uses a file in FileMan" ;
    rdfs:label "uses_file" ;
    rdfs:domain mv:Routine ;
    rdfs:domain mv:Global ;
    rdfs:range mv:File .
mv:uses_field rdf:type rdf:Property ;
    rdfs:comment "A reference to a routine that uses a field in FileMan" ;
    rdfs:label "uses_field" ;
    rdfs:domain mv:Routine ;
    rdfs:domain mv:Global ;
    rdfs:range mv:Language .
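For anyone who wants to experiment: here is one way the schema could be loaded and queried from Python using the rdflib library.  This is a minimal sketch; the file name metavista.ttl and the instance triples are assumptions for illustration only:

from rdflib import Graph, Namespace

MV = Namespace("http://metavista.name/foundation#")
VISTA = Namespace("http://osehra.org/ns#")

g = Graph()
g.parse("metavista.ttl", format="turtle")  # the schema above, saved to a file

# Add a couple of made-up instance triples to query against.
g.add((VISTA.PSORELD1, MV.calls, VISTA.PSOUTIL))
g.add((VISTA.PSORELD1, MV.reads_global, VISTA.DPT))

q = "SELECT ?callee WHERE { vista:PSORELD1 mv:calls ?callee . }"
for row in g.query(q, initNs={"mv": MV, "vista": VISTA}):
    print(row.callee)  # -> http://osehra.org/ns#PSOUTIL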

 


Re: Tools used to generate the documentation

Jason Li:

 

Hi Tom,

We used a set of Python scripts to generate the Visual Cross-Reference documentation. All these scripts are publicly available in the "OSEHRA-Automated-Testing" Git repository (under the "Dox" directory):

http://code.osehra.org/gitweb?p=OSEHRA-Automated-Testing.git;a=summary

Currently, only Globals that have a FileMan file number are displayed under the "Global Alphabetical List". The Globals alphabetical list and the Global/Package mapping are based on a "File Custody Spreadsheet" that was provided by the VA.

Just as Peter Li pointed out, we will be working on adding "Data Dictionary" file pointer references, Option files, and Protocols to the Cross-Reference documentation.

Also, in the repository there is a utility Python script called CrossRefExternalize.py (still in a very early development stage):

http://code.osehra.org/gitweb?p=OSEHRA-Automated-Testing.git;a=blob;f=Dox/PythonScripts/CrossRefExternalize.py

With some changes, this script could be used to output the cross reference information in the RDF format you just proposed.

Thanks,

-Jason


UNKNOWNs

Robert Sax:

I see a lot of routines in the UNKNOWN package in this analysis. What does this mean? It seems like there is no source code for them. Are these routines that were removed but are still referenced? If so, why is this in a production system? Any plans to clean up these and other errors (such as the syntax errors when importing some of the routines into GT.M/Caché)?

 

Rob


Syntax errors

Christopher Edwards:

Rob:

To answer the second half of your question (syntax errors while importing into GT.M or Caché): most of the errors you are probably seeing are in the Kernel of VistA and are expected.  VistA runs on multiple M(UMPS) platforms (GT.M and Caché, for example) and requires vendor-specific or OS-specific syntax (VMS is different from Windows and *nix) to be able to run on each platform.  The VistA Kernel contains versions of routines for the different platforms (for example, ZOSVONT - ZOSV for Caché on NT - and ZOSVGUX - for GT.M on *nix) and can be configured during setup (D ^ZTMGRSET, D ^DINIT, etc.) for the platform VistA is running on.
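As a loose illustration of that configuration idea (this is not the actual ^ZTMGRSET logic, just the shape of it; the table below is a made-up fragment):

# One OS-view routine per (M implementation, OS) pair; setup selects
# the right one for the running platform.
ZOSV_ROUTINES = {
    ("Cache", "NT"):  "ZOSVONT",  # the Caché-on-NT routine named above
    ("GT.M", "*nix"): "ZOSVGUX",  # the GT.M-on-*nix routine named above
    # ... other platform pairs elided ...
}

def pick_osview_routine(m_impl, os_name):
    try:
        return ZOSV_ROUTINES[(m_impl, os_name)]
    except KeyError:
        raise ValueError("no OS-view routine for %s on %s" % (m_impl, os_name))

print(pick_osview_routine("GT.M", "*nix"))  # -> ZOSVGUX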

There is a percentage of code that is Caché-specific (uses Caché ObjectScript), and there are places where Caché interprets the M(UMPS) standard differently than GT.M, which cause syntax errors on GT.M but work fine on Caché.

Adding better GT.M support is obviously important for the community (providing a completely FOSS [Free and Open Source Software] stack).  Thus, finding, documenting, and providing patches for instances of syntax errors outside of platform-specific code will provide better GT.M support and suppress the rest of the syntax errors you may be seeing.


Syntax errors

Robert Sax:

This makes sense for most of the errors; however, I still see some that I would expect are not valid in any flavor of MUMPS. I have tried them on GT.M and Caché and each results in errors. For example:

Outpatient Pharmacy/Routines/PSORELD1.m:156: S $P(ORC,"|",21)=$P(SITE,"^",1)_CS_CS_$P(SITE,"^",6)_
** Unterminated concatenation

Outpatient Pharmacy/Routines/PSORELD1.m:175: S $P(RXE,"|",9)=TRADENM_
** Unterminated concatenation

Kernel/Routines/ZISG3.m:37: .I %1=X!(%1]X S Y=% S X="" Q
** Unterminated LPAREN

Kernel/Routines/ZISHMSU.m:14: E  U:$D(IO(1,%I) %I S POP=1 Q
** Unterminated LPAREN

Dental/Routines/DENTA14.m:9:P1 I $D(DENTREL) Q:'$D(^DENT(221,DENT,1))  S Y(1)=$P(^(.1),"^",2) I 'Y(1)!<DENTSD1!Y(1)>DENTED Q
** OR LESS THAN? (!<)

Uncategorized/Routines/MUSMCR3.m:5: S DPT01=$P(^DPT(DFN,0),U),DPT01=$P(DPT01,",",2)_" _$P(DPT01,",",1)
** Missing double quote

I'm still curious how syntax errors like this get into the code base. I even tried Caché Studio on DENTA14, and it shows an error both in the editor and during compile.

Rob


Syntax errors

Christopher Edwards:

Rob:

I am not extremely familiar with the routines you pointed out; however, I do agree that they have obvious syntax errors.  I did notice that one is for MSM (ZISHMSU) and others have dates in the '80s.

Christopher


UNKNOWNs

Jason Li:

Hi Rob,

To answer the first part of the question:

Any routine that is NOT in the OSEHRA git repository will be categorized as UNKNOWN with the current implementation. There are a couple of cases in which a routine can fall into the UNKNOWN category:

1) Any routine starting with %. As far as I know, we currently do not have those routines in the OSEHRA git repository - %DTC, a routine under VA FileMan, for example.

2) Routines that are redacted by the VA; we have no way of obtaining the source code for those.

As it is right now, there are a total of 246 such routines (you can see them under the package "UNKNOWN").
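The categorization rule itself is simple - a minimal Python sketch, where the repository index is a hypothetical stand-in:

# Stand-in for an index of routines present in the OSEHRA git repository.
REPO_ROUTINES = {"PSORELD1": "Outpatient Pharmacy", "XINDEX": "Toolkit"}

def categorize(routine_name):
    """Anything not found in the repository falls into UNKNOWN."""
    return REPO_ROUTINES.get(routine_name, "UNKNOWN")

print(categorize("%DTC"))  # -> UNKNOWN (percent routines are not in the repo)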

We are in the process of identifying those routines and will re-shuffle them if needed.

Thanks,
- Jason

UNKNOWNs

Robert Sax:

So does this mean the OSEHRA code base does not work / will not behave like the Platinum code? How does GT.M/Caché behave when it encounters calls to the missing code? Does the plan for OSEHRA include replacing the missing parts with open source equivalents?

 

Rob


UNKNOWNs

Christopher Edwards:

Rob:

GT.M/Caché responds to errors in the code by throwing errors that can be caught with an error trap (like most other programming languages with exception processing).  In VistA user mode a default error trap is created, and the errors it records can be inspected by a programmer by running D ^XTER in programmer mode. If you are running in programmer mode without an explicit error trap set, you may see the error on the command line.

Hope this helps.

Christopher


VISUAL CROSS-REFERENCE of PACKAGES, ROUTINES & GLOBALS is now available

Tom Munnecke:

Great work, Jason, Joseph, and Brad!

I have a test OpenRDF Sesame server to play with triple stores at http://vistaewd.net:8980/openrdf-workbench/repositories/mv1/ - the RDF for the Patient file is already loaded (courtesy of Conor Dowling).

and here is an example of exploring the schema with SPARQL:

http://vistaewd.net:8980/openrdf-workbench/repositories/mv1/explore?reso...

You might consider driving your software through a schema that is loaded at run time, rather than hard-coded per routine, which would allow you to adapt the software to new RDF predicates (verbs linking subjects and objects - the links on your graphs).
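A minimal sketch of that idea in Python with rdflib: discover the predicates from the schema at run time instead of hard-coding one emitter per verb, so a new predicate added to the .ttl file is picked up without a code change (the file name and the extractor-table idea are assumptions):

from rdflib import Graph, RDF, RDFS

g = Graph()
g.parse("metavista.ttl", format="turtle")

# Every property declared in the schema, with its domain(s) and range.
for prop in g.subjects(RDF.type, RDF.Property):
    domains = list(g.objects(prop, RDFS.domain))
    rng = g.value(prop, RDFS.range)
    print(prop, domains, rng)

# An extractor could now consult a table keyed by predicate URI to decide
# how to detect each verb in the source code, so the traversal itself
# never needs to change when the schema grows.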

What I'd like to end up with is a Foundation schema that maps *everything* involved in installing and operating an instance of VistA in a single, integrated directed graph. This would give us a SPARQL endpoint to map the installation, the localization and customization procedures, the updates, etc.

This would also give us a handle on defining the consistency relationships required for an install. I am still getting up to speed on OWL (would love to talk to anyone already smart in this), but I am curious just how far we can push inferencing, reasoning, contradictions, etc. to understand a given instance of a foundation.

The same schemas, of course, could be used for testing, S/Kids, documentation, etc.


