The goal of this project is to create a new language file for Fileman. Two of the main goals for the language file are as follows:
- Create a file whose .01 field is the language name, rather than the entry’s internal entry number.
- Create a file that is standards based and useful for health care applications.
Having a comprehensive language file is also a requirement for Meaningful Use Stages I & II.

Steps for creating the new Language File
The file design was done by Fredrick D. S. Marshall. Here is its final design:
CONDENSED DATA DICTIONARY---LANGUAGE FILE (#.85)
UCI: DEV,FDE                                               VERSION: 22.2
STORED IN: ^DI(.85,                                  NOV 9,2012    PAGE 1
--------------------------------------------------------------------------------

FILE SECURITY
   DD SECURITY    : ^             DELETE SECURITY: ^
   READ SECURITY  :               LAYGO SECURITY : ^
   WRITE SECURITY : ^

CROSS REFERENCED BY:
   ALTERNATE NAME(F)

FILE #.85 INDEXED BY:
   NAME (B), TWO LETTER CODE (C), THREE LETTER CODE (D),
   ALTERNATE THREE LETTER CODE (E)

FILE STRUCTURE

FIELD     FIELD
NUMBER    NAME

.001      ID NUMBER (NJ10,0), [ ]
.01       NAME (RFJ60), [0;1]
.02       TWO LETTER CODE (FJ2), [0;2]
.03       THREE LETTER CODE (FJ3), [0;3]
.04       FOUR LETTER CODE (FJ4), [0;4]
.05       ALTERNATE THREE LETTER CODE (FJ3), [0;5]
.06       SCOPE (S), [0;6]
.07       TYPE (S), [0;7]
.08       LINGUISTIC CATEGORY (*P.85'), [0;8]
.09       MEMBER OF LANGUAGE SET (*P.85'), [0;9]
1         ALTERNATE NAME (Multiple-.8501), [1;0]
          .01  ALTERNATE NAME (MFJ60), [0;1]
10        DESCRIPTION (Multiple-.8502), [10;0]
          .01  DESCRIPTION (Wx), [0;1]
10.1      ORDINAL NUMBER FORMAT (K), [ORD;E1,245]
10.2      DATE/TIME FORMAT (K), [DD;E1,245]
10.21     DATE/TIME FORMAT (FMTE) (K), [FMTE;E1,245]
10.22     TIME (K), [TIME;E1,245]
10.3      CARDINAL NUMBER FORMAT (K), [CRD;E1,245]
10.4      UPPERCASE CONVERSION (K), [UC;E1,245]
10.5      LOWERCASE CONVERSION (K), [LC;E1,245]
20.2      DATE INPUT (K), [20.2;E1,245]

INDEX AND CROSS-REFERENCE LIST -- FILE #.85                11/9/12    PAGE 1
-------------------------------------------------------------------------------
File #.85

  New-Style Indexes:

  B (#1046)     FIELD     REGULAR     IR     LOOKUP & SORTING
    Unique for:  Key A (#122), File #.85
    Short Descr: Regular new-style B Index
    Set Logic:   S ^DI(.85,"B",X,DA)=""
    Kill Logic:  K ^DI(.85,"B",X,DA)
    Whole Kill:  K ^DI(.85,"B")
    X(1):        NAME  (.85,.01)  (Subscr 1)  (forwards)

  C (#1047)     FIELD     REGULAR     IR     LOOKUP & SORTING
    Short Descr: Regular new style index on two letter language codes
    Set Logic:   S ^DI(.85,"C",X,DA)=""
    Kill Logic:  K ^DI(.85,"C",X,DA)
    Whole Kill:  K ^DI(.85,"C")
    X(1):        TWO LETTER CODE  (.85,.02)  (Subscr 1)  (forwards)

  D (#1048)     FIELD     REGULAR     IR     LOOKUP & SORTING
    Unique for:  Key B (#123), File #.85
    Short Descr: Regular new-style index for three letter abbreviations for languages
    Set Logic:   S ^DI(.85,"D",$E(X,1,30),DA)=""
    Kill Logic:  K ^DI(.85,"D",$E(X,1,30),DA)
    Whole Kill:  K ^DI(.85,"D")
    X(1):        THREE LETTER CODE  (.85,.03)  (Subscr 1)  (Len 30)  (forwards)

  E (#1049)     FIELD     MUMPS     IR     LOOKUP & SORTING
    Short Descr: (Pseudo-)Mneumonic index for the Alternate three letter code
    Description: This will add entries to the C index for the three letter
                 code a la the mnemonic style. If you need re-cross-reference
                 this field, you need to kill of the entries in the regular C
                 index, set the C index, and then set this index to update the
                 C with the mnemonic xrefs.
    Set Logic:   S ^DI(.85,"D",X,DA)=1
    Kill Logic:  K ^DI(.85,"D",X,DA)
    X(1):        ALTERNATE THREE LETTER CODE  (.85,.05)  (Subscr 1)  (forwards)

Subfile #.8501
  Traditional Cross-References:

  B      REGULAR
         Field:  ALTERNATE NAME  (.8501,.01)
         1)= S ^DI(.85,DA(1),1,"B",$E(X,1,30),DA)=""
         2)= K ^DI(.85,DA(1),1,"B",$E(X,1,30),DA)

  F      REGULAR     WHOLE FILE (#.85)
         Field:  ALTERNATE NAME  (.8501,.01)
         Description: Whole file cross-reference for ALTERNATE NAME multiple.
         1)= S ^DI(.85,"F",$E(X,1,30),DA(1),DA)=""
         2)= K ^DI(.85,"F",$E(X,1,30),DA(1),DA)
         3)= WHOLE FILE CROSS REFERENCE FOR ALTERNATE NAME
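The small Python model below shows what the index definitions above actually store for one entry. Python is used only so the model can be run and checked outside M. Each index is a dict that maps a value to {IEN: node value}, standing in for ^DI(.85,"X",value,IEN). The entry field names (name, two, three, alt_three, alt_names) are invented for this sketch. The F cross-reference's two subscripts, DA(1) and DA, are flattened into a tuple. Notice that the E index writes into D with a node value of 1, which is presumably the mnemonic-style marker its description refers to.

```python
from collections import defaultdict


def build_indexes(entries):
    """Model the B, C, D, E and F set logic for language entries.

    entries maps an IEN to a dict with the hypothetical keys
    name, two, three, alt_three and alt_names (a list).
    Returns {index name: {value: {subscript(s): node value}}}.
    """
    idx = {name: defaultdict(dict) for name in ("B", "C", "D", "F")}
    for ien, e in entries.items():
        if e.get("name"):
            idx["B"][e["name"]][ien] = ""        # S ^DI(.85,"B",X,DA)=""
        if e.get("two"):
            idx["C"][e["two"]][ien] = ""         # S ^DI(.85,"C",X,DA)=""
        if e.get("three"):
            # S ^DI(.85,"D",$E(X,1,30),DA)=""
            idx["D"][e["three"][:30]][ien] = ""
        if e.get("alt_three"):
            # E index: S ^DI(.85,"D",X,DA)=1 (note: into D, value 1)
            idx["D"][e["alt_three"]][ien] = 1
        for sub, alt in enumerate(e.get("alt_names", []), start=1):
            # S ^DI(.85,"F",$E(X,1,30),DA(1),DA)="" with (DA(1),DA) as a tuple
            idx["F"][alt[:30]][(ien, sub)] = ""
    return idx
```

For German, both DEU (the THREE LETTER CODE) and GER (the ALTERNATE THREE LETTER CODE) end up in the D index, but only GER carries the 1.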
The language data was also compiled by Mr. Marshall, from ISO 639, ISO 639-1 and ISO 639-2, including the bibliographic codes (known as ISO 639-2/B). He compiled the data in an OpenOffice spreadsheet. Where necessary, he edited the ISO data to make entries easier for end users to pick. For example, Northern Frisian becomes Frisian, Northern, with Northern Frisian kept as a synonym. This editing is a manual process.

Creating the new file
Modifying one of Fileman’s own files is hard, because Fileman won’t let you touch them. Mr. Marshall has already identified the places in Fileman that don’t allow you to select its own files for editing: DICRW and DICRW1. In addition, I identified a place in DIFROM that won’t let you export a Fileman file. Fileman treats any file whose number is less than 2 as one of its own files.
With that obstacle removed, I can use the MODIFY FILE ATTRIBUTES option to manipulate the language file. But now I have another problem: the types of the old fields cannot be redefined, so I have to delete the fields and then re-create them. I opted for the sledgehammer method. I backed up ^DD(.85) and then killed off all the data definitions using K ^DD(.85). Then, by trial and error, I rebuilt enough of ^DD(.85) to be able to manipulate it. Along the way, the menu option kept erroring out on undefined global variables.
One thing I want the reader to notice is that I didn’t delete the old data in the language file. Why not? Some of it was rather long and I would rather not type it in again. The fields I didn’t want to retype were ones I was planning to keep anyway.
I work my merry way through the fields until I get back to the ones I said I wanted to keep. I edit the ZWRITE output of my ^DD(.85) backup, putting a SET command in front of the nodes for the fields I want to restore, and then I load it in, like this:
S ^DD(.85,10.1,0)="ORDINAL NUMBER FORMAT^K^^ORD;E1,245^K:$L(X)>245 X D:$D(X) ^DIM"
S ^DD(.85,10.1,3)="This is Standard MUMPS code."
S ^DD(.85,10.1,9)="@"
S ^DD(.85,10.1,21,0)="^^6^6^2941121^^^^"
S ^DD(.85,10.1,21,1,0)="MUMPS code used to transfer a number in Y to its ordinal equivalent in"
S ^DD(.85,10.1,21,2,0)="this language. The code should set Y to the ordinal equivalent without"
S ^DD(.85,10.1,21,3,0)="altering any other variables in the environment. Ex. in English:"
S ^DD(.85,10.1,21,4,0)=" Y=1 becomes Y=1ST"
S ^DD(.85,10.1,21,5,0)=" Y=2 becomes Y=2ND"
S ^DD(.85,10.1,21,6,0)=" Y=3 becomes Y=3RD etc."
S ^DD(.85,10.1,"DT")=2940307
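The Python sketch below shows how saved ZWRITE output can be turned into SET commands for only the fields being restored. It assumes that the ^DD(.85) backup was saved as one ZWRITE line per node and that the fields to restore are chosen by field number. The keep_fields filter is an illustration and not part of the workflow described here. ZWRITE generally prints each node as a global reference followed by a valid M value, so prefixing S is all it takes to make a line executable.

```python
import re

# Captures the field number in a ZWRITE line such as ^DD(.85,10.1,0)="..."
FIELD_RE = re.compile(r'^\^DD\(\.85,([0-9.]+),')


def zwrite_to_sets(zwrite_lines, keep_fields):
    """Return 'S <node>=<value>' commands for the fields in keep_fields."""
    commands = []
    for line in zwrite_lines:
        line = line.strip()
        match = FIELD_RE.match(line)
        # Header nodes like ^DD(.85,0) and index nodes like ^DD(.85,"B",...)
        # do not match the pattern and are skipped.
        if match and match.group(1) in keep_fields:
            commands.append("S " + line)
    return commands
```

For example, zwrite_to_sets(backup_lines, {"10.1"}) keeps every 10.1 node, including the "DT" node, and drops the rest.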
Anyone who hard-sets DD nodes needs to remember to re-index them. I do that like this:
S I=.85 S DA(1)=I,DIK="^DD("_I_"," D IXALL^DIK
With that, the data dictionary is in place, and I have Mr. Marshall review it.
After a few corrections, I start entering the data by hand for the core languages, which are:
ID   Name
1    English
2    German
3    Spanish
4    French
5    Finnish
6    Italian
7    Portuguese
10   Arabic
11   Russian
12   Greek
18   Hebrew
After entering the data by hand, I confirm that the indexing works properly. I find that I misplaced one of the indexes during one of the revisions. I correct it, correct the old index by hand, and then re-cross-reference to get the correct data into the indexes. Next, I build the DILAINIT for the core languages, which I will incorporate into the DINITs later. Building the DILAINIT relies on my earlier work modifying DIFROM so that it can transport new-style keys and indexes. That part was easy. I would like to show it, but I didn’t keep any screen captures of it.
My next task was to build the DINIT routines; I did this on my local machine rather than on the development environment because the development environment was the canonical source for the new language file. I didn’t want to clobber it in my attempt to get it bootstrapped.
I first confirm that the DILAINIT produces the exact same file on the source development system (yay, my mods work!). I then use the data in DILAINIT to populate the DINIT data routines for language, as follows:
DILAI001 -> DINIT011
DILAI002 -> DINIT012
DILAI003 -> DINIT013
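The renaming in the list above follows a fixed offset: adding 10 to the DILAI suffix gives the DINIT suffix. Here is that rule as a tiny Python helper. It assumes the offset applies only to these three data routines.

```python
import re


def dinit_target(dilainit_routine):
    """Map DILAI00n to DINIT01n, assuming the +10 offset from the list above."""
    match = re.fullmatch(r"DILAI(\d{3})", dilainit_routine)
    if match is None:
        raise ValueError(f"not a DILAINIT data routine: {dilainit_routine}")
    # Zero-pad the new suffix back to three digits.
    return f"DINIT{10 + int(match.group(1)):03d}"
```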
I used vimdiff to decide which lines to copy over. Not all of them, it turns out, because the loader line in DIFROM-style routines is slightly different.
Now I turn my attention to DINIT itself. I must say modifying DINIT is scary, because of its centrality to Fileman.
Here are my modifications to DINIT, in detail:
OSETC+18: New line. Kill off the old language file’s DD, DIC and data, since we are changing its definitions:
K ^DIC(.85),^DD(.85),^DD(.8501),^DD(.8502),^DI(.85) ; VEN/SMH - Kill the language file old DD, DIC and data. (22.2)
EGP+4 through EGP+14 are new lines that install new-style keys and indexes on the language file:
 ; Keys and new style indexes installer ; new in FM V22.2
 N DIFRSA S DIFRSA=$NA(^UTILITY("KX",$J)) ; Tran global for Keys and Indexes
 N DIFRFILE S DIFRFILE=0 ; Loop through files
 F  S DIFRFILE=$O(@DIFRSA@("IX",DIFRFILE)) Q:'DIFRFILE  D
 . K ^TMP("DIFROMS2",$J,"TRIG")
 . N DIFRD S DIFRD=0
 . F  S DIFRD=$O(@DIFRSA@("IX",DIFRFILE,DIFRD)) Q:'DIFRD  D DDIXIN^DIFROMSX(DIFRFILE,DIFRD,DIFRSA) ; install New Style Indexes
 . K ^TMP("DIFROMS2",$J,"TRIG")
 . S DIFRD=0
 . F  S DIFRD=$O(@DIFRSA@("KEY",DIFRFILE,DIFRD)) Q:'DIFRD  D DDKEYIN^DIFROMSY(DIFRFILE,DIFRD,DIFRSA) ; install keys
 K @DIFRSA ; kill off tran global
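The new EGP code is a pair of nested $ORDER walks over the transport global. The Python model below mirrors that structure. Dicts stand in for ^UTILITY("KX",$J), and sorted() stands in for $ORDER's numeric collation. install_index and install_key are placeholder callbacks for DDIXIN^DIFROMSX and DDKEYIN^DIFROMSY, and the kills of ^TMP("DIFROMS2",$J,"TRIG") are left out.

```python
def install_keys_and_indexes(tran, install_index, install_key):
    """Walk a transport tree shaped like ^UTILITY("KX",$J).

    tran["IX"][file][n] holds index definitions and
    tran["KEY"][file][n] holds key definitions.
    """
    for file_no in sorted(tran.get("IX", {})):            # outer $O over files
        for n in sorted(tran["IX"][file_no]):             # $O over indexes
            install_index(file_no, n)                     # D DDIXIN^DIFROMSX(...)
        for n in sorted(tran.get("KEY", {}).get(file_no, {})):  # $O over keys
            install_key(file_no, n)                       # D DDKEYIN^DIFROMSY(...)
    tran.clear()                                          # K @DIFRSA
```

The model keeps one property of the M code: the outer loop visits only files that have something under "IX". A file with keys but no new-style indexes would therefore be skipped. That is harmless for the language file, because its keys are backed by its B and D indexes.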
DATA+1 was modified to add a K D1 in every loop iteration, because D1 leaks out of calls and causes the matching algorithm for keys in MATCHKEY^DITR1 to fail. The original line comes first, then the modified one:
S DTO=0,DMRG=1,DTO(0)=^(D),Z=^(D)_"0)",D0=^(D,0),@Z=D0,DFR(1)="^UTILITY(U,$J,DDF(1),D0,",DKP=0 F D0=0:0 S D0=$O(^UTILITY(U,$J,DDF(1),D0)) S:D0="" D0=-1 Q:'$D(^(D0,0)) S Z=^(0) D I^DITR
S DTO=0,DMRG=1,DTO(0)=^(D),Z=^(D)_"0)",D0=^(D,0),@Z=D0,DFR(1)="^UTILITY(U,$J,DDF(1),D0,",DKP=0 F D0=0:0 S D0=$O(^UTILITY(U,$J,DDF(1),D0)) S:D0="" D0=-1 K D1 Q:'$D(^(D0,0)) S Z=^(0) D I^DITR
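M has no block-local scope: a variable that a routine sets without NEWing it stays visible to the caller afterwards. The toy Python model below shows why killing D1 at the top of each iteration matters. A dict plays the shared symbol table. transfer is a hypothetical stand-in that only sets D1 and then reads it back; it does not reproduce the real I^DITR or MATCHKEY^DITR1 logic.

```python
def transfer(env, entry):
    """Hypothetical stand-in for a call like I^DITR.

    It sets D1 only when the entry has subentries and never NEWs it,
    so the value stays behind in the shared symbol table.
    """
    if entry.get("subentries"):
        env["D1"] = entry["subentries"][-1]
    # Later logic that reads D1 sees whatever is defined, stale or not.
    return env.get("D1")


def run_loop(entries, kill_d1):
    env = {}      # one shared symbol table, like M locals
    seen = []
    for entry in entries:
        if kill_d1:
            env.pop("D1", None)   # the K D1 added to DATA+1
        seen.append(transfer(env, entry))
    return seen
```

Without the kill, the second entry (which has no subentries) still sees the D1 left over from the first entry.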
I think I am done. But not yet. I run the new DINIT in a destination test system, and I find that the subfiles don’t fire the cross-references on them. I slowly panic. I spend a couple of hours debugging, comparing DINIT with DIFROM (DIFROM re-indexes the subfile cross-references), and what I find leads me to make the next change:
EGP+3: The data dictionaries for the language file’s subfiles are now indexed too. Somewhere along the way I re-discovered that our design adds two new subfiles, .8501 and .8502, which DINIT doesn’t know about. The original line comes first, then the modified one:
F I=.84,.841,.842,.844,.845,.847,.8471,.85 D XX^DINIT3
F I=.84,.841,.842,.844,.845,.847,.8471,.85,.8501,.8502 D XX^DINIT3 ; VEN/SMH - added .8501 and .8502 for new lang file
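The bug comes down to a set difference: the language file’s DD numbers that the re-index loop never visits. The quick Python check below uses the file numbers from the two lines above. They are written as strings so that they compare exactly as their canonical M forms.

```python
# DD numbers re-indexed by DINIT before the change (from the original FOR line)
REINDEXED_BEFORE = [".84", ".841", ".842", ".844", ".845", ".847", ".8471", ".85"]
# The language file and its two subfiles in the new design
LANGUAGE_DDS = [".85", ".8501", ".8502"]


def missing_from_reindex(reindexed, needed):
    """Return the DD numbers in needed that the FOR loop never visits."""
    present = set(reindexed)
    return [dd for dd in needed if dd not in present]
```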
My main task for the day is to load the rest of Mr. Marshall’s spreadsheet into the language file. I had to change the spreadsheet format by hand. After a warning from XINDEX, I discover that some characters in the spreadsheet are Unicode, so I tell OpenOffice to export the file as ASCII.
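One way to find the characters behind the XINDEX warning before exporting is to scan for anything outside 7-bit ASCII. Here is a minimal Python sketch. The sample lines in the test are made up.

```python
def non_ascii_report(lines):
    """Return (line number, column, character) for each non-ASCII character."""
    hits = []
    for line_no, line in enumerate(lines, start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                hits.append((line_no, col, ch))
    return hits
```

Reading the exported CSV with a UTF-8 file handle and passing it in directly works, because iterating a file yields its lines.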
I write a routine to load the data from the spreadsheet and set it directly into the globals. It turns out that parsing a CSV file is harder than you’d think.
KBANLANI ; VEN/SMH - Import language file ; 11/2/12 12:38pm
 ;;No package
 ;
 ;ID Number (.001),Name (.01),Two Letter Code (.02),Three Letter Code (.03),Four Letter Code (.04),Alternate Three Letter Code (.05),Scope (.06),Type (.07),Linguistic Category (.08),Member of Language Set (.09),Alternate Name (1)
 ;8,Abkhaz,ab,abk,,,Individual,Living,Northwest Caucasian,,Abkhazian|Abxazo
 ;16,Afar,aa,aar,,,Individual,Living,Afro-Asiatic,,Qafar Af|'Afar Af|Adal|Afaraf
 ; ^DI(.85,2,0) = GERMAN^DE^DEU^^GER
 ; ^DI(.85,2,1,0) = ^.8501^7^7
 ; ^DI(.85,2,1,1,0) = GERMAN, STANDARD
 ; ^DI(.85,2,1,2,0) = STANDARD GERMAN
 ; ^DI(.85,2,1,3,0) = DEUTSCH
 ; ^DI(.85,2,1,4,0) = DEUTSCH SPRACHE
 ; ^DI(.85,2,1,5,0) = TEDESCO
 ; ^DI(.85,2,1,6,0) = MODERN GERMAN (1500-)
 ; ^DI(.85,2,1,7,0) = GERMAN,MODERN (1500-)
 N POP
 D OPEN^%ZISH("LF","/home/dev/lang_file","langFile4ImportsamASCII.csv","R")
 Q:POP
 U IO
 N X R X:0 ; get rid of first row.
 N Q S Q="""" ; double quote character
 N C S C="," ; comma
 N GREF S GREF=$NA(^DI(.85))
 F  R X:0 Q:$$STATUS^%ZISH()  D
 . U $P W X,! U IO ; echo the row to the terminal
 . N P S P=1 ; piece
 . S X=$$UP^XLFSTR(X) ; upper case
 . N ID S ID=$P(X,C,P) ; first piece, no commas expected
 . S P=P+1
 . N NAME S NAME=$P(X,C,P) ; second piece, quoted commas expected
 . I $E(NAME)=Q S NAME=$P(X,Q,2) ; if quoted, value is what's inside the quotes
 . N I F I=1:1 Q:$E(NAME,I)=""  S:$E(NAME,I)=C P=P+1 ; count the commas inside the quotes and increment piece
 . S P=P+1 ; increment piece
 . N ABB2 S ABB2=$P(X,C,P) ; 2 LTR ABBR
 . S P=P+1
 . N ABB3 S ABB3=$P(X,C,P) ; 3 LTR ABBR
 . S P=P+1
 . N ABB4 S ABB4=$P(X,C,P) ; 4 LTR ABBR
 . S P=P+1
 . N AB3A S AB3A=$P(X,C,P) ; ALT 3 LTR ABBR
 . S P=P+5 ; skip Scope, Type, Linguistic Category, Member of Language Set
 . N ALTN S ALTN=$P(X,C,P,99) ; OTHER NAMES, possibly quoted.
 . I $E(ALTN)=Q S ALTN=$P(ALTN,Q,2) ; unquote it.
 . S @GREF@(ID,0)=NAME_U_ABB2_U_ABB3_U_ABB4_U_AB3A
 . I ALTN="" Q  ; done
 . S $P(@GREF@(ID,1,0),U,2)=.8501
 . S $P(@GREF@(ID,1,0),U,3)=$L(ALTN,"|")
 . S $P(@GREF@(ID,1,0),U,4)=$L(ALTN,"|")
 . F I=1:1:$L(ALTN,"|") S @GREF@(ID,1,I,0)=$P(ALTN,"|",I)
 D CLOSE^%ZISH("LF")
 QUIT
At first I pointed GREF at a scratch global while I spent what seemed an endless amount of time trying to balance commas and quotes. Once I was sure everything was fine, I changed it to ^DI(.85), where the language file is stored.
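For comparison, a CSV parser that understands quoting removes the need to count commas by hand. The Python sketch below uses the standard csv module to build the same node layout the routine writes. The 0 node is NAME^2^3^4^ALT3, and a ^.8501^n^n header sits over the alternate names. The sketch assumes the file has the eleven columns listed in the routine’s header comment and that any field containing a comma is quoted. It does not reproduce the M code’s $P(X,C,P,99) behavior of taking the whole rest of the line. The Frisian row in the test is hypothetical.

```python
import csv
import io


def rows_to_nodes(csv_text):
    """Build {subscripts: value} pairs shaped like the ^DI(.85) nodes KBANLANI sets."""
    nodes = {}
    reader = csv.reader(io.StringIO(csv_text.upper()))  # the loader upper-cases too
    next(reader, None)  # skip the header row
    for row in reader:
        if not row or not row[0]:
            continue  # skip blank lines
        row += [""] * (11 - len(row))  # pad rows missing trailing empty columns
        ien, name, two, three, four, alt_three = row[:6]
        # Columns 7-10 (scope, type, category, language set) are skipped,
        # as the M routine's S P=P+5 does.
        nodes[(ien, 0)] = "^".join([name, two, three, four, alt_three])
        alt_names = row[10]
        if not alt_names:
            continue
        parts = alt_names.split("|")
        nodes[(ien, 1, 0)] = f"^.8501^{len(parts)}^{len(parts)}"
        for i, alt in enumerate(parts, start=1):
            nodes[(ien, 1, i, 0)] = alt
    return nodes
```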
Once I am happy with my file, I index it using IXALL^DIK.
I then check the sanity of the file, using the Verify Fields option in Fileman. Right away I get key-related errors. I realize that it’s the 3 letter abbreviation which is the secondary key, not the two letter abbreviation. So I fix that. I run verify again, doing it field by field, and I get no errors. I then examine the global visually. I find another bug of my own making. The pseudo-mnemonic index is supposed to plug into the D index, not the C index.
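The key errors make sense if you recall that Fileman requires key fields to have values and enforces uniqueness on the key. Many ISO 639-2 languages have no two-letter ISO 639-1 code, which is one likely reason the two-letter code cannot serve as a key. Below is a Python sketch of that check, run one candidate field at a time. The field names two and three are hypothetical, and so is entry 99 in the test.

```python
from collections import Counter


def key_violations(entries, field):
    """Check one candidate key field: values must be present and unique.

    Returns (IENs with no value, values used more than once).
    """
    missing = sorted(ien for ien, e in entries.items() if not e.get(field))
    counts = Counter(e[field] for e in entries.values() if e.get(field))
    duplicates = sorted(v for v, n in counts.items() if n > 1)
    return missing, duplicates
```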
Now I make the DILAINITs, and test them on destination systems, and I am satisfied that they work properly.
My very last step is to carefully change the DINITs on the development system so that the Language File will be bootstrapped with Fileman. And that’s now done.