
The support site for the Unified Compliance Framework


Creating order out of chaos - the UCF's taxonomic ontology

When you walk into a library, have you ever thought about the cataloging system that they had to put into place in order for you to find something? Without a way to bring order out of the chaos, what you'd have is more or less a jumble of books on a table, like the diagram of authority documents shown below left. However, once organized, the documents can be examined according to their category and version history.

[image]

A jumble of documents organized neatly

This categorization is formally called a taxonomic ontology. While the term might sound fancy, it actually isn't complicated at all. You need to care about all of this because the UCF's lists are in this taxonomic ontology format, which we'll refer to simply as a hierarchical list.

What the heck is a taxonomic ontology?

A taxonomy is more or less a hierarchical relationship of words, categories, and concepts. Before we delve any further into this definition, let's talk for a second about what we do naturally. It is in our nature to classify what we encounter, if only to help make sense of our surroundings. We look at a chair and think "wooden chair," or "comfortable chair," both of which are subordinate terms to the category of "chair." We all categorize that which we encounter. You might have heard the terms:

Domain, Kingdom, Phylum, Class, Order, Family, Genus, Species

This is the basic, internationally accepted scientific taxonomy by which biologists group and categorize species of organisms. We'll cover the basics of classification first, then turn our attention to the taxonomy.

A classification is a defined method for organizing information. In general, classifications group similar things together. There is no one way to define similarity, so many different classifications of the same things may exist side by side: similarity depends on what the things being classified mean to the classifier. The intended use of a classification is therefore an integral part of deciding which properties the classification system should be based upon.

One of the key properties of classifications is that they can be nested within one another in order to create a hierarchy. This allows the various groups within a classification to be split into ever greater detail based upon the relatedness among the meanings being considered. And there is no limit to the depth of a hierarchical classification.

    • A taxonomy is a formal classification of the entities being sorted out that ensures there are no incomplete or duplicate meanings.

So if a taxonomy is the formalized version of classification, when we put taxonomy and hierarchy together, we get a taxonomic hierarchy. Taxonomic hierarchies, according to Daniel A. Aberra, are classificatory systems that are supposed to reflect the way that speakers of a language categorize the world of experience. And a well-formed taxonomy should offer an orderly and efficient set of categories at different levels of specificity, defining the following elements:

    • The term itself with its definition

    • The hierarchical relationship of the term to other terms

    • Synonyms or cross references to similar terms

However, according to Daniel Aberra, the world isn't that simple, as most taxonomies differ between disciplines. We have found that the same holds true within the world of IT and compliance metrics. Either folks are using different categorization terms to mean the same thing, or the same categorization terms to mean something different.

A quick background on taxonomies

In order to make sense of the world, we have a natural tendency to categorize the bits of information we process. This means that there must be something that underlies our categorization, some type of thinking and decision-making process. To some, a category is a "class of objects that we believe belong together," which is the definition I like the best, as it leaves categorization an open-ended process to be developed as we develop our sense of that which is around us.

Two basic taxonomic methods - scope and dependency

Most taxonomies follow one of two thought patterns. They either divide the world according to scope, or they divide the world according to dependency. Let's look at both methodologies, as both methodologies are used by the UCF team.

The taxonomy of scope

While many taxonomists use fancy terms like superordinate and hyponym, we aren't expecting anyone within the IT compliance space to do likewise. Again, we are going to use different (simpler) words for the same thing: broader scope, similar, and narrower scope.

[image]

Hierarchical relationship

The rules for deciding where an item should be placed (in simple language) are as follows:

    • A term is broader in scope when it has more attributes or features than another term it is being compared to. If you think in terms of software and application, software could be an operating system, utility program, major application, minor application, etc. because software is anything other than hardware or firmware or files. The scope could then be narrowed down to application, then application management, and even further to application portfolio.

    • Terms are similar when they can be used in place of each other. "Application" and "program" are similar terms: we often hear people use the two interchangeably in this context. Both terms are on the same basic level as each other.

    • To make things even simpler, a term that is more generic than the base term should be thought of as being of broader scope than the base term. A term that is more specific than the base term should be thought of as being narrower in scope than the base term.
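The three scope relationships can be captured in a small data structure. The following Python sketch is illustrative only: the `Term` class and its field names are assumptions made for this example and are not part of any UCF format. It simply records, for each term, its broader term, its similar terms, and its narrower terms, using the software example from the rules above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Term:
    """One entry in a scope taxonomy (hypothetical structure, for illustration)."""
    name: str
    broader: Optional["Term"] = None               # the more generic term, if any
    similar: list = field(default_factory=list)    # interchangeable terms
    narrower: list = field(default_factory=list)   # more specific terms

    def add_narrower(self, name):
        """Create a more specific term beneath this one and return it."""
        child = Term(name, broader=self)
        self.narrower.append(child)
        return child

    def path(self):
        """Return term names from the broadest term down to this one."""
        names = []
        node = self
        while node is not None:
            names.append(node.name)
            node = node.broader
        return list(reversed(names))


software = Term("software")
application = software.add_narrower("application")
application.similar.append("program")
management = application.add_narrower("application management")
portfolio = management.add_narrower("application portfolio")

# Walk from the broadest term to the narrowest one.
print(" > ".join(portfolio.path()))
```

Walking the `broader` links from any term reproduces the narrowing of scope described above, from software down to application portfolio.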

The taxonomy of dependency

The taxonomy of scope moves from broader to narrower terms. The taxonomy of dependency creates an order out of items that cannot stand alone. It is used for creating a hierarchical order of tasks or controls that are contingent upon other tasks or controls in the list.

    • Given any list of tasks or controls, those tasks or controls that are able to stand alone, free of dependence or support from other tasks, can become a simple unordered list, such as the first couple of the UCF's IT Impact Zones:

Leadership and high level objectives
Audits and risk management
Monitoring and measurement

The presence (or absence) of any one of these three impact zones within an organization's compliance program is not dependent upon the others. Therefore, they can stand alone or stand together without one having sovereignty over another.

    • When a task or control relies on another task or control to be in place first, it must follow a hierarchy of dependency.

Let's take the example of UCF Common Control ID 00594 "Ensure audit logs contain a timestamp which tracks user activity." In order to place this control into its proper hierarchical order among other controls, we have to break the control down into its parts and examine it for dependencies. First, the control is looking for a timestamp within an audit log, so we know that the control is dependent upon an audit log being in place. The taxonomy therefore needs to show that logging operations are in place before this control can be added. Our example list changes to include logging operations, as well as our new control, under the Monitoring and measurement Impact Zone, because logging is a narrower topic and our control is dependent upon having logging in place.

Leadership and high level objectives
Audits and risk management
Monitoring and measurement
    Establish logging operations
        Ensure audit logs contain a timestamp which tracks user activity

The second thing we notice is that we are looking for a timestamp tracking user activity, which tells us that we need to have proper date and time entries. And we have to have a way of identifying different users. Both of these must be in place before we can ensure that the audit logs contain a timestamp that tracks user activity.

Leadership and high level objectives
Audits and risk management
Monitoring and measurement
    Establish logging operations
        Ensure the logs maintain proper date and time entries
        Log user identification
        Ensure audit logs contain a timestamp which tracks user activity

Notice how the Monitoring and measurement list moves from things that must be accomplished first through those things that should be accomplished next through last? Before the last control can take place, those controls that are hierarchically above it must have taken place. That is the taxonomy of dependency in a very simple fashion.
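One way to check an order of dependency is to write down what each control relies on and let a topological sort produce a valid sequence. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later). The control titles come from the example above; the `prerequisites` dictionary, and the assumption that the two intermediate controls each rely on logging operations, are ours for illustration.

```python
from graphlib import TopologicalSorter

# Each control maps to the set of controls that must be in place first.
# The exact dependencies are an assumption made for this sketch.
prerequisites = {
    "Establish logging operations": set(),
    "Ensure the logs maintain proper date and time entries": {
        "Establish logging operations",
    },
    "Log user identification": {
        "Establish logging operations",
    },
    "Ensure audit logs contain a timestamp which tracks user activity": {
        "Ensure the logs maintain proper date and time entries",
        "Log user identification",
    },
}

# static_order() yields every control after all of its prerequisites.
order = list(TopologicalSorter(prerequisites).static_order())

for step, control in enumerate(order, start=1):
    print(step, control)
```

Any sequence produced this way places a control only after everything it depends on, which is the property the hierarchical list is meant to express.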

Taxonomies + ontologies = hierarchical lists

So far we've talked about the taxonomic structure part - items that are broader in scope, similar in scope, or narrower in scope, or tasks and controls that must be accomplished according to an order of dependency. What we haven't talked about is the other fancy word - ontology. Simply put, an ontology is an organization of a knowledge domain into a taxonomic hierarchy. In other words, the ontology part is the rules you use to decide which items are broader, similar, or narrower, or which tasks or controls are dependent upon another.

With one exception - the glossary - each list that the UCF™ team produces is in this taxonomic ontology form. In order for you to understand our list system, you need to understand the inter-relationship of the three core ID fields that we use to create and maintain our lists: the ID, the genealogy, and the Sort ID.

In short, the genealogy field gives us the rough order and hierarchical level of each record, the Sort ID tells us how to sort the records precisely, and the ID field is the anchor: the primary key by which to link the control list to other tables.

The ID field

This is the UCF's unique and persistent identifier that we test for. It is the primary key element of the UCF CE Basic Info Table as well as the Audit, Metrics, Authority Documents, Glossary, and individual Guidance tables.

It is unique across all records and, in our case, is never re-used once assigned to a list item. The ID is a five-digit number treated as text, with leading zeros.
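Because the ID is text rather than a number, the leading zeros are part of the value and are lost if the ID is converted to an integer. A minimal Python sketch of formatting and validating such an ID follows; the function names `format_ucf_id` and `is_valid_ucf_id` are hypothetical helpers for this example, not part of any UCF tooling.

```python
import re

# Exactly five ASCII digits, e.g. "00594".
ID_PATTERN = re.compile(r"[0-9]{5}")


def format_ucf_id(number):
    """Render an integer as a five-digit ID with leading zeros."""
    if not 0 <= number <= 99999:
        raise ValueError(f"ID out of range: {number}")
    return f"{number:05d}"


def is_valid_ucf_id(text):
    """Return True if the text is a five-digit ID."""
    return ID_PATTERN.fullmatch(text) is not None


print(format_ucf_id(594))
print(is_valid_ucf_id("00594"))
print(is_valid_ucf_id("594"))
# Converting to int drops the leading zeros, so keep IDs as text.
print(int("00594"))
```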

Record 00000

Every list we provide contains a root record with ID 00000. This is the source record for that list. It is important to include it in your data if you're going to do anything more complex with the data than merely display it in a hierarchical list. Otherwise, it can be discarded.

The XML representation of the ID

Each list names its ID field slightly differently from the others. Each list's ID field is in the XML format of <UCF_XXX_ID>00000</UCF_XXX_ID>, where the "XXX" identifies the list. Here's a sample of the ID field names we have:

    • UCF_XXX_ID = generic representation used in documentation

    • UCF_AD_ID = Authority Document ID

    • UCF_CE_ID = Control Entry ID (also used for the Audit Question and various Guidance lists as they are derived from the same table)

    • UCF_Asset_ID = Common Asset Enumerator ID (also used for auditable artifacts as they are derived from the same table)

    • UCF_Vendor_ID = Vendor IDs for creators of assets

    • UCF_Glossary_ID = Glossary term ID
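As a rough illustration of reading these ID fields, the Python sketch below pulls every UCF_AD_ID value out of a small XML fragment with the standard library's `xml.etree.ElementTree`. Only the UCF_AD_ID element name comes from the list above; the `records`, `record`, and `title` elements are placeholders invented for this example, so check the actual layout of the file you receive.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: only the UCF_AD_ID element name is taken from
# the documentation; the wrapper elements are assumptions.
SAMPLE = """
<records>
  <record><UCF_AD_ID>00000</UCF_AD_ID><title>Authority Document List</title></record>
  <record><UCF_AD_ID>00001</UCF_AD_ID><title>Sarbanes Oxley Guidance</title></record>
  <record><UCF_AD_ID>00010</UCF_AD_ID><title>US Congress</title></record>
</records>
"""

root = ET.fromstring(SAMPLE.strip())

# Element text is returned as a string, so leading zeros are preserved.
ids = [element.text for element in root.iter("UCF_AD_ID")]
print(ids)
```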

As we just said, each and every record within our various lists has a unique and persistent ID, starting with the list itself which always begins with 00000. In our example list below we show several authority documents, along with their originating authorities (in italic) and the general guidance area that the authority document belongs to (in bold) in a hierarchical view along with their associated UCF ID fields. Notice here that the order of the list and the hierarchy isn't drawn specifically from the ID field. That's because there are two other fields you need to understand that come into play in order to create, format, and sort this list. And those fields are the genealogy and the sort ID.

Title                                             ID
Authority Document List                           00000
Sarbanes Oxley Guidance                           00001
  US Congress                                     00010
    Sarbanes Oxley Act                            00004
  US Office of Management and Budget              00012
    A 123                                         00005
      A 123 Implementation Guide                  00008
  US Public Company Accounting Oversight Board    00011
    PCAOB AS 2                                    00006
    PCAOB AS 3                                    00007
    PCAOB AS 5                                    00000
Banking and Finance Guidance                      00002
Payment Card Guidance                             00003

A sample Authority Document list with ID fields

The genealogy field

Within the UCF, a record's genealogy is a set of UCF IDs strung together as distinct words (e.g., 00000 00001 00002) that represent (from right to left) the current record's parent, grand-parent, great-grand-parent, on back to the very root element that spawned the list. At minimum, every record other than the root itself will have a genealogy of 00000, which represents the root record within the list.
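Since the genealogy is just a run of IDs separated by spaces, splitting it on whitespace is enough to recover a record's ancestors. The short Python sketch below shows one way to do that; the helper names are ours, and it assumes the IDs are separated by spaces as in the example above.

```python
def split_genealogy(genealogy):
    """Return the ancestor IDs in a genealogy string, root first."""
    return genealogy.split()


def parent_id(genealogy):
    """Return the right-most ID (the parent), or None for an empty genealogy."""
    ancestors = split_genealogy(genealogy)
    return ancestors[-1] if ancestors else None


genealogy = "00000 00001 00002"
ancestors = split_genealogy(genealogy)

print("root:  ", ancestors[0])
print("parent:", parent_id(genealogy))
# Reading right to left gives parent, grand-parent, and so on.
print("right to left:", list(reversed(ancestors)))
```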

Each list names its genealogy field slightly differently from the others. Each list's genealogy field is in the XML format of <UCF_XXX_genealogy>00000 00001</UCF_XXX_genealogy>, where the "XXX" identifies the list. Here's a sample of the genealogy field names:

    • UCF_XXX_Genealogy = generic representation used in documentation

    • UCF_AD_Genealogy = Authority Document genealogy

    • UCF_CE_Genealogy = Control Entry genealogy (also used for the Audit Question and various Guidance lists as they are derived from the same table)

    • UCF_Asset_Genealogy = Common Asset Enumerator genealogy (also used for auditable artifacts as they are derived from the same table)

    • UCF_Vendor_Genealogy = Vendor genealogy for creators of assets

The genealogy field in use

In order to derive a record's hierarchical status, the genealogy field is used in a dynamic calculation formula that you will have to create. Because we treat the genealogy field as a text field (as with the ID field), the words in the genealogy field can be counted to set the hierarchy level of the record. The following table shows the genealogy information for each of the records in our example as well as the derived hierarchical level for each record.

How you create your hierarchical level calculation is up to you. We've got programmer's notes for calculating the hierarchy value for a couple of applications and can send them to you if you want them. Once you have the hierarchical level defined, you'll want to assign a style level to the record display as we do. In database format, the style can be as simple as adding extra spaces in front of each record according to its hierarchy level. In HTML, Excel, and Word format, the style should be a CSS style.

Title                                             ID      Genealogy                  Hierarchy
Authority Document List                           00000                              0
Sarbanes Oxley Guidance                           00001   00000                      1
  US Congress                                     00010   00000 00001                2
    Sarbanes Oxley Act                            00004   00000 00001 00010          3
  US Office of Management and Budget              00012   00000 00001                2
    A 123                                         00005   00000 00001 00012          3
      A 123 Implementation Guide                  00008   00000 00001 00012 00005    4
  US Public Company Accounting Oversight Board    00011   00000 00001                2
    PCAOB AS 2                                    00006   00000 00001 00011          3
    PCAOB AS 3                                    00007   00000 00001 00011          3
    PCAOB AS 5                                    00000   00000 00001 00011          3
Banking and Finance Guidance                      00002   00000                      1
Payment Card Guidance                             00003   00000                      1

A sample Authority Document list with Genealogy and Hierarchy fields
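The hierarchy calculation described above, counting the words in the genealogy field, can be sketched in a few lines of Python. This is one possible approach rather than the programmer's notes mentioned earlier; the `hierarchy_level` function and the tuple layout are assumptions for the example, while the records themselves are taken from the table.

```python
def hierarchy_level(genealogy):
    """Hierarchy level = number of IDs (words) in the genealogy text."""
    return len(genealogy.split())


# (title, ID, genealogy) for a few records from the sample list.
records = [
    ("Authority Document List", "00000", ""),
    ("Sarbanes Oxley Guidance", "00001", "00000"),
    ("US Congress", "00010", "00000 00001"),
    ("Sarbanes Oxley Act", "00004", "00000 00001 00010"),
    ("US Office of Management and Budget", "00012", "00000 00001"),
    ("A 123", "00005", "00000 00001 00012"),
    ("A 123 Implementation Guide", "00008", "00000 00001 00012 00005"),
    ("Banking and Finance Guidance", "00002", "00000"),
]

levels = [hierarchy_level(genealogy) for _, _, genealogy in records]

# Simple text styling: indent two spaces per level below level 1.
for (title, ucf_id, _), level in zip(records, levels):
    indent = "  " * max(level - 1, 0)
    print(f"{ucf_id}  {level}  {indent}{title}")
```

The same level value could just as easily select a CSS class instead of a run of spaces.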

The Sort ID field

The last field we need to understand is UCF_CE_Hierarchy_Sort_ID (Sort ID). Sort ID is how you will force your list to sort properly. It is composed of sets of three digits, treated as text so that it can be sorted properly. The best way to explain Sort ID is to continue with the example we started above. We are going to ignore the first record in the example for now, as it is only the reference root record.

Do not assume that, because the Sort ID is structured as sets of numbers, those numbers correspond directly to the genealogy. What we want you to notice is that the right-most set of numbers of the Sort ID has no correlation to the right-most numbers of the genealogy. The numbers in the genealogy are derived directly from stringing together the primary ID fields as distinct words. The sets of numbers in the Sort ID field are:

    • 3 characters long in order to allow for sorting lists up to 999 records,

    • derived from our team's manually entering those numbers to achieve the sort order that makes the most ontological sense, and

    • structured so that sorts are performed based upon hierarchical level.
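The fixed three-character width is what makes a plain text sort work: without the zero padding, "10" would sort before "9". The Python sketch below checks that a value has the expected shape and shows the effect of sorting as text. The pattern, the helper name `sort_id_groups`, the single-space separator, and the sample values are assumptions for illustration.

```python
import re

# One or more three-digit sets separated by single spaces, e.g. "000 001 003".
SORT_ID_PATTERN = re.compile(r"[0-9]{3}(?: [0-9]{3})*")


def sort_id_groups(sort_id):
    """Split a Sort ID into its three-digit sets, validating the format."""
    if SORT_ID_PATTERN.fullmatch(sort_id) is None:
        raise ValueError(f"not a valid Sort ID: {sort_id!r}")
    return sort_id.split(" ")


unsorted_ids = ["000 002", "000 001 002", "000 001", "000 001 001"]

# Plain text sorting puts each parent directly ahead of its children.
sorted_ids = sorted(unsorted_ids)
for sort_id in sorted_ids:
    print(sort_id, sort_id_groups(sort_id))

# Without zero padding, text sorting goes wrong.
print(sorted(["9", "10"]))
```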

Notice in the following example that all records in hierarchy level 1 have Sort IDs that are two number sets long, and that the second set runs from top to bottom as 001, 002, and 003. This allows the programmer or end user to remove from display all hierarchical levels greater than 2 and still have the remaining records in the list sort and display properly. The same thing can be said for each and every hierarchical level. Likewise, if any of the records were deleted or removed from view, the remaining records would still sort correctly.

Title                                             ID      Sort ID                Hierarchy
Authority Document List                           00000   000                    0
Sarbanes Oxley Guidance                           00001   000 001                1
  US Congress                                     00010   000 001 001            2
    Sarbanes Oxley Act                            00004   000 001 001 001        3
  US Office of Management and Budget              00012   000 001 002            2
    A 123                                         00005   000 001 002 001        3
      A 123 Implementation Guide                  00008   000 001 002 001 001    4
  US Public Company Accounting Oversight Board    00011   000 001 003            2
    PCAOB AS 2                                    00006   000 001 003 001        3
    PCAOB AS 3                                    00007   000 001 003 002        3
    PCAOB AS 5                                    00000   000 001 003 003        3
Banking and Finance Guidance                      00002   000 002                1
Payment Card Guidance                             00003   000 003                1

A sample Authority Document list with Sort ID and Hierarchy fields
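To see that hiding the deeper levels leaves the rest of the list in the right order, the Python sketch below filters a shuffled set of records by hierarchy level and then sorts what remains by Sort ID as text. The record values come from the sample list; the `visible_records` function and the tuple layout are assumptions made for this example.

```python
# (title, Sort ID, hierarchy level), deliberately out of order.
records = [
    ("Payment Card Guidance", "000 003", 1),
    ("Sarbanes Oxley Act", "000 001 001 001", 3),
    ("US Congress", "000 001 001", 2),
    ("Authority Document List", "000", 0),
    ("Banking and Finance Guidance", "000 002", 1),
    ("Sarbanes Oxley Guidance", "000 001", 1),
    ("US Office of Management and Budget", "000 001 002", 2),
    ("A 123", "000 001 002 001", 3),
]


def visible_records(records, max_level):
    """Keep records at or above max_level, sorted by Sort ID as text."""
    kept = [record for record in records if record[2] <= max_level]
    return sorted(kept, key=lambda record: record[1])


titles = [title for title, _, _ in visible_records(records, 2)]
for title in titles:
    print(title)
```

Because each Sort ID begins with its parent's Sort ID, the remaining records stay grouped under their parents no matter which levels are removed.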
