
The support site for the Unified Compliance Framework


The UCF_Meta_Data_Type complex element components

The first thing you are going to need to do is establish a suite of list management rules and element components. There is no sense in even getting started with any type of research or mapping before you know what you are going to do with that research or mapping.

And there are certain inherent problems you are going to need to think about and ensure that you have answers for as well - we call this overcoming the "Jurassic Park syndrome". If you can think back to the original Jurassic Park movie, the biggest problem that they had in their database structure was that they were counting to ensure they didn't lose any dinosaurs. The problem was, they weren't counting to see if the number was increasing. Many database or list developers don't really think much about their lists or databases before they start building them. You need to ensure you don't fall into this trap.

Therefore, you'll need to think through your

  • ID structure

  • Version tracking

  • Change tracking

  • Hierarchical display and sorting

  • And what you are going to do about deprecation

It is our firm belief that all of these elements need to be tracked as a meta data (data about data) component of each of your lists. Therefore, we propose that the following items be a part of every list that you create and manage:

The Meta_Data_Type complex element components

The Meta_Data_Type complex element is found in every UCF XML specification in version 2 and greater. The Meta_Data_Type complex element currently contains eight elements within its structure.

Within the list below, the "XXs" would be replaced with each XML list's appropriate identifier, such as "AD" within the Authority Document list, or "CE" for the Controls list.

The Meta_Data_Type structure below is derived from the Unified Compliance Framework's XML structure. Within that structure, each element's name begins with "UCF_", acting as an organizational identifier. In the example that follows, each element's name begins with "Co_", which acts as a generic replacement for the UCF organizational identifier. There are already several compliance mapping teams, such as Nanoroq out of Japan (NAN_) and the Center for Internet Security (CIS_), that leverage the UCF's XML structure and have therefore replaced the UCF organizational identifier with their own.

Co_XX_Release_Version (xs:string) - mandatory

The first element within the meta data component is used to signify which release the table belongs to. While formatted as a string in the XSD, the text will be as follows:

Q3 09 - Final

The first section "Q3" represents which quarter the release belongs to.

The second section "09" is the two-digit year the release belongs to.

The final section "Final" represents the status of the release. Final is just that, the final, fully QA'd release. Pre-release will signify that this table belongs to a work in progress release, such as those sent to our XML licensees when we are working through the addition of new elements. And Update represents a critical update to a final release in case there is something urgent that needs to be sent out to the licensees.
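The three sections described above can be pulled apart mechanically. The following is a minimal sketch in Python, assuming the string always follows the "Q3 09 - Final" layout. The names ReleaseVersion and parse_release_version are hypothetical and are not part of the UCF specification. The year is kept as text so that the leading zero in "09" survives.

```python
import re
from typing import NamedTuple


class ReleaseVersion(NamedTuple):
    quarter: int   # 1 through 4
    year: str      # two-digit year, kept as text ("09")
    status: str    # for example "Final"


# Assumed layout: "Q<quarter> <two-digit year> - <status>"
_RELEASE_RE = re.compile(r"^Q([1-4]) (\d{2}) - (.+)$")


def parse_release_version(text: str) -> ReleaseVersion:
    """Split a release version string into quarter, year, and status."""
    match = _RELEASE_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized release version: {text!r}")
    quarter, year, status = match.groups()
    return ReleaseVersion(int(quarter), year, status)


print(parse_release_version("Q3 09 - Final"))
```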

Co_XX_ID (ID_Type) or (ID_Type2) - mandatory

Co_XX_ID is the unique and persistent identifier for each record (for example, each authority document) and is restricted by the global definition of ID_Type (five digit) or ID_Type2 (seven digit). We use the XX_ID as the identifier so that if there is a change or discrepancy in any of the record's other information, any linked references to the record will not change. For the same reason, we use the XX_ID field as the linking field when referencing this list from other lists.

One key note about the XX_ID element is that even though the element appears to be a string of numbers, the element should be treated as text. The reason for this is twofold:

1. If treated as a number, the leading zeroes are most often deleted or at least ignored by the database.

2. Other elements, like Genealogy, which use the ID, are looking for text elements instead of number elements (the reasoning for which will be made clear under the Genealogy discussion below).

The ID element is created when the record in question is created and is always assigned the next highest non-used, non-reserved ID in the system for that particular list.
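To illustrate why the ID is treated as text, and one possible reading of "next highest non-used, non-reserved ID", here is a small Python sketch. The function names, the seven-digit default width, and the way reserved IDs are skipped are our assumptions for illustration; the UCF database system's actual assignment logic is not shown in this article.

```python
from typing import Set


def format_id(number: int, width: int = 7) -> str:
    """Render an ID as zero-padded text so leading zeroes are preserved."""
    return str(number).zfill(width)


def next_id(used: Set[str], reserved: Set[str], width: int = 7) -> str:
    """Return the next ID above the highest used one, skipping reserved IDs."""
    candidate = max((int(i) for i in used), default=-1) + 1
    while format_id(candidate, width) in used or format_id(candidate, width) in reserved:
        candidate += 1
    return format_id(candidate, width)


# Treating the ID as a number silently drops the leading zeroes.
print(int("0000042"))        # 42
print(format_id(42))         # 0000042
print(next_id({"0000000", "0000001"}, {"0000002"}))  # 0000003
```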

This element will always be present in all UCF XML lists.

Co_XX_ID_CheckDigit (xs:integer) - mandatory

We humans have to use numbers. However, when entering numbers, we humans also have a tendency to screw up the entry or copying of those numbers. A Dutch mathematician named Jacobus Verhoeff conducted a study of 12,000 numerical errors and from that, proposed a check digit calculation scheme that catches all single errors as well as all adjacent transpositions and most other errors.

To ensure that the IDs assigned by the system have integrity during input as well as distribution while being transferred into various formats (such as Excel, Word, Text, XML), each ID will also have its own checksum value stored in a checksum field. Currently, the methodology for creating and verifying the checksum follows the Verhoeff calculation format.

This element will always be present in all UCF XML lists.

The CheckDigit is created along with the record's ID as a calculation by the UCF database system. As such, once assigned it should never change because the ID will never change. A sample calculation format is shown in the use case scenarios.
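For readers who want to see the Verhoeff scheme in action, the following Python sketch uses the standard published Verhoeff tables. It assumes the check digit is computed over the ID exactly as stored, including its leading zeroes; whether the UCF database system does precisely this is not stated here, so treat the function names and that choice as assumptions.

```python
# Standard Verhoeff tables: dihedral group D5 multiplication,
# position-dependent permutations, and inverses.
_D = (
    (0, 1, 2, 3, 4, 5, 6, 7, 8, 9),
    (1, 2, 3, 4, 0, 6, 7, 8, 9, 5),
    (2, 3, 4, 0, 1, 7, 8, 9, 5, 6),
    (3, 4, 0, 1, 2, 8, 9, 5, 6, 7),
    (4, 0, 1, 2, 3, 9, 5, 6, 7, 8),
    (5, 9, 8, 7, 6, 0, 4, 3, 2, 1),
    (6, 5, 9, 8, 7, 1, 0, 4, 3, 2),
    (7, 6, 5, 9, 8, 2, 1, 0, 4, 3),
    (8, 7, 6, 5, 9, 3, 2, 1, 0, 4),
    (9, 8, 7, 6, 5, 4, 3, 2, 1, 0),
)
_P = (
    (0, 1, 2, 3, 4, 5, 6, 7, 8, 9),
    (1, 5, 7, 6, 2, 8, 3, 0, 9, 4),
    (5, 8, 0, 3, 7, 9, 6, 1, 4, 2),
    (8, 9, 1, 6, 0, 4, 3, 5, 2, 7),
    (9, 4, 5, 3, 1, 2, 6, 8, 7, 0),
    (4, 2, 8, 6, 5, 7, 3, 9, 0, 1),
    (2, 7, 9, 3, 8, 0, 6, 4, 1, 5),
    (7, 0, 4, 6, 9, 1, 3, 2, 5, 8),
)
_INV = (0, 4, 3, 2, 1, 5, 6, 7, 8, 9)


def verhoeff_check_digit(id_text: str) -> int:
    """Compute the Verhoeff check digit for a string of decimal digits."""
    c = 0
    # Walk the digits right to left; position 0 is left for the check digit.
    for i, ch in enumerate(reversed(id_text)):
        c = _D[c][_P[(i + 1) % 8][int(ch)]]
    return _INV[c]


def verhoeff_is_valid(id_text: str, check_digit: int) -> bool:
    """Recompute the check digit and compare it with the stored value."""
    return verhoeff_check_digit(id_text) == check_digit


print(verhoeff_check_digit("236"))    # 3
print(verhoeff_is_valid("263", 3))    # False: adjacent digits were swapped
```

Because the check digit lives in its own field, verification here simply recomputes the digit from the ID text and compares it with the stored value.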

Co_XX_Genealogy (xs:string) - optional

Within the UCF, a record's genealogy is a set of UCF IDs strung together as distinct words (e.g., 0000000 0000001 0000002) that represent (from right to left) the current record's parent, grand-parent, great-grand-parent, on back to the very root element that spawned the list. At minimum, every record will have a genealogy of 0000000 which represents the root record within the list.

The genealogy element is initially created by the UCF database system when the record in question is created. If the record in question is moved lower or higher in the taxonomy, the genealogy is automatically re-calculated and the value will change to reflect the new taxonomic structure. Because the UCF editorial team does not have edit privileges for this element, the genealogy will always reflect the taxonomic position the record was last stored in. If there is a dispute about the record's genealogy, the dispute is an editorial one, and not a programming one.
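As a rough illustration of how a genealogy string can be read and re-calculated, here is a Python sketch. It assumes each record knows only its immediate parent's ID and that the root ID is 0000000. The parents mapping and the function names are hypothetical and do not describe the UCF database system's internals.

```python
from typing import Dict

ROOT_ID = "0000000"


def parent_id(genealogy: str) -> str:
    """The right-most ID in the genealogy is the record's parent."""
    return genealogy.split()[-1]


def build_genealogy(record_id: str, parents: Dict[str, str]) -> str:
    """Walk parent links up to the root and return the IDs root first."""
    chain = []
    current = parents[record_id]
    while True:
        if current in chain:
            raise ValueError("cycle detected in parent links")
        chain.append(current)
        if current == ROOT_ID:
            break
        current = parents[current]
    return " ".join(reversed(chain))


# Hypothetical parent links: record ID -> parent ID.
parents = {"0000001": "0000000", "0000002": "0000001", "0000003": "0000002"}
print(build_genealogy("0000003", parents))   # 0000000 0000001 0000002

# Moving the record directly under the root changes its genealogy.
parents["0000003"] = ROOT_ID
print(build_genealogy("0000003", parents))   # 0000000
```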

This element will not always be present in all UCF XML lists. Some lists, like the Auditable Artifacts list, do not require a genealogy and therefore the element will not be present in the XML schema.

Co_XX_Sort_ID (xs:integer) - optional

We sort our displayed information according to a taxonomic display hierarchy (which means that the genealogy plays a vital role). For the most part, each element in any of our lists is given a three-digit sort identifier. We then append the record's sort identifier to its parent's sort identifier to create its Sort ID. We treat this numeric Sort ID as a text field so that we can run our sort routine from left to right in the character string.
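The effect of treating the Sort ID as text can be seen in a short Python sketch. The make_sort_id helper and the sample values are hypothetical; the only behavior taken from the description above is that a three-digit identifier is appended to the parent's Sort ID and the result is sorted as a character string.

```python
def make_sort_id(parent_sort_id: str, position: int) -> str:
    """Append a three-digit sort identifier to the parent's Sort ID."""
    if not 0 <= position <= 999:
        raise ValueError("position must fit in three digits")
    return parent_sort_id + f"{position:03d}"


first = make_sort_id("", 1)            # "001"
second = make_sort_id("", 2)           # "002"
children = [make_sort_id(first, n) for n in (2, 1, 10)]

sort_ids = [second] + children + [first]

# Sorting as text keeps each child directly under its parent.
print(sorted(sort_ids))
# ['001', '001001', '001002', '001010', '002']

# Sorting as numbers would pull "002" ahead of the children of "001".
print(sorted(sort_ids, key=int))
# ['001', '002', '001001', '001002', '001010']
```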

There are some exceptions to the numeric Sort ID field, namely in the glossary and vendor lists wherein the Sort ID is actually the genealogical name of the record's predecessors through its title. For instance, in the vendor list one of the vendors might be Sybari, which is a subsidiary of Microsoft. Therefore, its Sort ID would be "Microsoft Sybari".

The Sort ID is created and managed in the same manner as the genealogy (it is a dynamic calculation). It directly reflects the record's place within the taxonomic hierarchy and is therefore uneditable by the UCF's editorial team (although the team does set the sort order, the system handles the ID to manage the sort order). Any disputes with the validity of the sort ID are in effect a dispute with where the UCF's editorial team placed the record in question within the taxonomic structure.

This element will not always be present in all UCF XML lists. Some lists, like the Auditable Artifacts list, do not require sorting and therefore the element will not be present in the XML schema.

Co_XX_Live_Status (xs:integer) - mandatory

Because the UCF™ treats every ID as both unique and persistent, we never delete an ID once used, nor do we re-use the ID. Therefore, if we have to redact a record, we merely mark the Live Status as moving from 1 (live) to 0 (redacted).

All records are initially created and marked by the system as Live (1). There are certain scripts that the UCF's database team will run to ensure that two instances of automated deprecation take place:

1. If an Authority Document has been deprecated, all of its citations will be deprecated.

2. If a control has no citations pointing to it, the control in question will be deprecated.

Other than the instances noted above, records must be deprecated as an editorial process and approved by both the editorial reviewer and the editorial approver. When the Live Status is set to deprecated (0), there might also be a corresponding setting for the Deprecated By element, but this is not mandatory.
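The two automated rules noted above might be sketched in Python as follows. The in-memory data layout (dictionaries keyed by ID, and citation records with ad_id and control_id fields) is purely an assumption for illustration, as is the choice to count only live citations when deciding whether a control still has citations pointing to it.

```python
from typing import Dict, List


def apply_automated_deprecation(
    authority_docs: Dict[str, int],
    citations: List[dict],
    controls: Dict[str, int],
) -> None:
    """Apply both automated deprecation rules in place (1 = live, 0 = deprecated)."""
    # Rule 1: citations of a deprecated Authority Document are deprecated.
    for citation in citations:
        if authority_docs[citation["ad_id"]] == 0:
            citation["live"] = 0

    # Rule 2: a control with no (live) citations pointing to it is deprecated.
    cited_controls = {c["control_id"] for c in citations if c["live"] == 1}
    for control_id in controls:
        if control_id not in cited_controls:
            controls[control_id] = 0


# Hypothetical sample data.
authority_docs = {"0000001": 1, "0000002": 0}
citations = [
    {"id": "0000100", "ad_id": "0000001", "control_id": "0000500", "live": 1},
    {"id": "0000101", "ad_id": "0000002", "control_id": "0000501", "live": 1},
]
controls = {"0000500": 1, "0000501": 1}

apply_automated_deprecation(authority_docs, citations, controls)
print(citations[1]["live"], controls)
```

Note that nothing is deleted: every ID stays in place and only its Live Status changes.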

This element will always be present in all UCF XML lists.

Co_XX_Deprecated_By (xs:string) - optional

If a record in the UCF needs to be deprecated, the record will not be deleted from the system. Instead, the record will be marked as deprecated (its "Live Status" field will be set to 0), and the Deprecated By field will be filled out with the ID(s) of the record(s) that took its place (if any).

Initially this element is blank and only a UCF editorial process can indicate a Deprecated By content change. That change is then reviewed by the editorial reviewer and editorial approver. If there are contents in this field, the Live Status field must be set to deprecated (0).

This element will not be present in every UCF XML list, and therefore will not always be present in the XML schema.

Co_XX_Deprecation_Notes (xs:string) - optional

Deprecation notes are new to version 2.1 of the UCF, and we've done as good a job as possible back-filling them to ensure that we have covered our bases.

In a nutshell, when our mappers, reviewers, or approvers have made the decision to deprecate one of the records in the various XML tables, they will add their deprecation notes, their reasoning, to this field. There is no set format for what they write, so there aren't any hard and fast editorial rules, other than that something has to be added to the field during deprecation.
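If you adopt the same rules for your own lists, a consistency check along the following lines may be useful. This Python sketch assumes a record is held as a dictionary with live_status, deprecated_by, and deprecation_notes keys; those key names and the wording of the messages are hypothetical. It also assumes every deprecated record is expected to carry notes, which may not hold for older or automatically deprecated records.

```python
from typing import List


def deprecation_problems(record: dict) -> List[str]:
    """Return a list of rule violations for one record (empty if consistent)."""
    problems = []
    live = record.get("live_status")
    deprecated_by = (record.get("deprecated_by") or "").strip()
    notes = (record.get("deprecation_notes") or "").strip()

    if live not in (0, 1):
        problems.append("Live_Status must be 1 (live) or 0 (deprecated)")
    if deprecated_by and live != 0:
        problems.append("Deprecated_By is filled in but Live_Status is not 0")
    if live == 0 and not notes:
        problems.append("record is deprecated but has no Deprecation_Notes")
    return problems


print(deprecation_problems(
    {"live_status": 1, "deprecated_by": "0000042", "deprecation_notes": ""}
))
```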

Co_XX_Date_Added (xs:date) - mandatory

Date_Added is a date stamp for when the record was created.

This element is created when the record is entered into the UCF's Master Content database and not the working database. We chose this method because the UCF team's editorial process is a fluid one which allows, during the editing process, for records to be added, moved, deleted, or even "un-deleted" fluidly until the lock-date that ends the editorial process. Once the lock-date has been reached, all of the records are then finalized from the "working" list and uploaded as a batch to the Master Content database, which also triggers the change log process. Therefore, it is common to see all new records for any given quarter being added on the same date.

Because the Date Added element is controlled post-editorial process, the UCF database system manages everything automatically.

This element will always be present in all UCF XML lists.

Co_XX_Date_Modified (xs:date) - mandatory

Date_Modified is a date stamp for when the record was modified. We use this as a key field for tracking all roll forward and roll backward field calculations. The initial date reflects the date the authority document was added to the database.

This element is created and updated when the record is entered into the UCF's Master Content database and not the working database. We chose this method because the UCF team's editorial process is a fluid one which allows, during the editing process, for records to be added, moved, deleted, or even "un-deleted" fluidly until the lock-date that ends the editorial process. Once the lock-date has been reached, all of the records are then finalized from the "working" list and uploaded as a batch to the Master Content database, which also triggers the change log process, which relies on this field to trigger that a change has taken place in the record. Therefore, it is common to see all new records for any given quarter being "modified" on the same date, and all modifications for the quarter to happen on the same date as well.

We have heard from multiple XML licensees that they would rather have the exact date and time that the record was modified instead of the batch upload date. That isn't possible, given that all of the XML licensees also want us to produce a compact and digestible change log. A change log based upon the exact date of modification would already have produced several instances with over ten changes for certain records, changes that were of no consequence to either the XML licensee or an end user because they were simply a part of our internal editorial process. Therefore, to save processing time on the change log and to shorten the (already very heavy) change log, we made the strategic decision to limit both date modified and date added to the batch upload dates.

Because the Date Modified element is controlled post-editorial process, the UCF database system manages everything automatically.

This element will always be present in all UCF XML lists.

Generic XML list structure

Now that we've discussed the information you are going to need to have in your meta data, let's take a second and discuss a methodology to structure your XML lists overall. We suggest that each list have a minimum of two item-level elements: the meta data element and a basic info element. Optionally, if your list connects to other lists, you might want to add an external references element. Below is a screen shot of one of the UCF's XML lists showing all three top-level elements for each item.

[image]

A sample XML list with its top-level elements

List elements should contain items

Within each of your XML Lists, you should have a single list element. Never put more than one list in any given XML file. The list should contain an element that describes each record, or item, in the list.

Items should contain the main two (or three) elements

Each item (or record) should be split into the main elements that we've already described. Because we've covered the meta data element structure above, we won't elaborate on that any further.

UCF_Basic_Info

This complex element holds each record's real content. The record's name, description, and other key information is found within the Basic_Info element.
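To make the list, item, meta data, and basic info layering concrete, here is a small Python sketch that reads such a list with the standard library's xml.etree.ElementTree module. The sample document and every element name in it (Co_AD_List, Co_AD_Item, Co_AD_Meta_Data, Co_AD_Basic_Info, Co_AD_Name) are hypothetical stand-ins using the generic "Co_" prefix; consult the relevant XSD for the real names.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample list: one list element, one item, two top-level parts.
SAMPLE = """\
<Co_AD_List>
  <Co_AD_Item>
    <Co_AD_Meta_Data>
      <Co_AD_ID>0000001</Co_AD_ID>
      <Co_AD_Live_Status>1</Co_AD_Live_Status>
    </Co_AD_Meta_Data>
    <Co_AD_Basic_Info>
      <Co_AD_Name>Example authority document</Co_AD_Name>
    </Co_AD_Basic_Info>
  </Co_AD_Item>
</Co_AD_List>
"""


def read_items(xml_text: str) -> list:
    """Return one dictionary per item, keeping the ID as text."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.findall("Co_AD_Item"):
        items.append({
            "id": item.findtext("Co_AD_Meta_Data/Co_AD_ID"),
            "live": item.findtext("Co_AD_Meta_Data/Co_AD_Live_Status") == "1",
            "name": item.findtext("Co_AD_Basic_Info/Co_AD_Name"),
        })
    return items


print(read_items(SAMPLE))
```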

UCF_References

The UCF_References complex element holds a set of non-normalized references to external lists so that you can ensure your links are correct when you tie each list's appropriate primary key to the external list's foreign key.

The list of references is always defined from the current list's perspective. What this means is that each list has certain links to other lists, such as the Authority Documents list links to both the Citations list and the Terms list as shown in the diagram that follows. Therefore, within the Authority Document List XML file, the References element might look like this:

<UCF_References>

<UCF_Terms_List> This is the name of the XML list being referenced

<UCF_Term_Item_ID> This is the name of the ID element within the list referenced above

We provide this information as a reference to show how the primary list you are working with correlates with other tables within the UCF's content model.

You do not need to actually use these references for anything as long as you remember to connect the primary list to the correlated lists using the primary key and foreign key relationships.
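As a rough picture of what connecting the primary list to a correlated list through its keys looks like, consider the following Python sketch. The in-memory structures, IDs, and names are hypothetical; in practice the link pairs would come from whichever list carries the foreign key.

```python
from typing import Dict, List, Tuple


def terms_for_document(
    ad_id: str,
    links: List[Tuple[str, str]],
    terms: Dict[str, str],
) -> List[str]:
    """Follow (authority document ID, term ID) pairs to the term names."""
    return [
        terms[term_id]
        for doc_id, term_id in links
        if doc_id == ad_id and term_id in terms
    ]


# Hypothetical data: term ID -> term name, and document-to-term links.
terms = {"0000010": "audit", "0000011": "control"}
links = [("0000001", "0000010"), ("0000001", "0000011"), ("0000002", "0000010")]

print(terms_for_document("0000001", links, terms))   # ['audit', 'control']
```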

XML allows you to document your structures - so do it

XML is a great language for documenting structured lists. Each element has the ability to add documentation to that element. We suggest that you maintain the descriptions for each of these elements the same way that the UCF team maintains the descriptions for its XSD elements - online in individual web pages that match the structure and top-to-bottom order of the XSDs themselves. A great example of this can be found online at http://netfrontiers.com/ucf-xml/the-ucf-metrics-xml-specificat.html.
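XSD carries per-element documentation in xs:annotation and xs:documentation children, which makes it straightforward to pull the descriptions out and publish them. The Python sketch below does this with the standard library's xml.etree.ElementTree module. The sample schema and its documentation text are made up for illustration and are not taken from a UCF XSD.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema fragment with one documented element.
SAMPLE_XSD = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Co_XX_Live_Status" type="xs:integer">
    <xs:annotation>
      <xs:documentation>1 for a live record, 0 for a redacted one.</xs:documentation>
    </xs:annotation>
  </xs:element>
</xs:schema>
"""


def element_docs(xsd_text: str) -> dict:
    """Map each documented element name to its documentation text."""
    root = ET.fromstring(xsd_text)
    docs = {}
    for element in root.iter(XS + "element"):
        doc = element.find(f"{XS}annotation/{XS}documentation")
        if doc is not None and doc.text:
            docs[element.get("name")] = doc.text.strip()
    return docs


print(element_docs(SAMPLE_XSD))
```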
