purpose

This is the specification for the lexical and syntactic constructs of the components of CDCL.

Contrasting CDCL and Booliette

CDCL has what is known as an "authoring form". This authoring form is what people use to write and read disclosure control policy. People use CDCL authoring form to create rulesheets that declare and express policy rules.

This lexical discussion covers the authoring form's mechanisms by which rulesheets are organized and constructed so that policy may be uniformly interpreted. In contrast, the page on Booliette discusses a formalized language for non-technical personnel to express Boolean algebra in a way that is easier to understand than traditional means of expression. Booliette is intended to be an approximation of a natural language so that ease of use, clarity, and precision share equal importance.

Booliette is the most important of the mechanisms that CDCL authoring form employs in its grammar. CDCL therefore contains Booliette, while the converse is not true. Booliette, along with the remainder of the non-trivial mechanisms, is formally described in the grammar specification.

Policy authors will use Booliette within CDCL authoring form to declare the rules' stipulations, from the simple to the complex as appropriate, which must be observed and honored in order to enforce proper outcomes during data disclosure events. It's possible that one eventually may find uses for Booliette outside the domain of CDCL authoring form.


rulesheet examples

Some readers may be experts in related fields and will likely have expectations about CDCL based on prior experience in those fields. It may therefore help to understand CDCL before examining rulesheets. One may choose to review the fast-track CDCL introduction first and save reading this site's content for a later time. Either way, several rulesheet examples are offered below.

For this discussion of CDCL Authoring Form, it may be helpful to become familiar with some example rulesheets.


Perspective for Drafting Policy in CDCL

It is critical for authors to understand that a CDCL Gatepoint interprets the policies they write from a specific perspective: the document under redaction is examined node by node (i.e. element by element) and, at the point of evaluation of each node, the policies are examined to determine the applicable rules and their outcomes. This is significantly different from the unsupported perspective of beginning with the policies' rules one by one and thereafter poring over the document to achieve policy compliance for each given rule.

Each perspective lends itself to a different style of drafting rules. Although the proposed vocabulary of CDCL-purposed keywords in Booliette could be used for either style, some words would be used far more heavily in one style than in the other. Some of these words represent searches through the document, so a style that relies heavily on them would be redundant at best and could degrade performance at worst. The unsupported perspective's style would do just that. An author who understands the supported perspective should therefore use words for searching (like "present-document contains") sparingly and rely instead on the word "present-item" (synonymous with "content").

In other words, it is a dangerous intuition for an author to believe that a rule's Nodeset specification (the for-content block) can be defined without citing "present-item"/"any-item" or "content"/"all-content": such a rule would be redundantly evaluated against all of the document's nodes on every node-by-node visit the Gatepoint makes anyway. Of course, this could be legitimate at times, but the likely rate of mistakes is far too high to ignore.

Furthermore, "present-item" and its synonyms (e.g. "content") must not exist anywhere outside a rule's Nodeset specification (the for-content block). For the purpose of clarity, which is an important facet of the entire CDCL solution, we have decided that a rule should have both a Nodeset specification and a separate condition/user specification (i.e. the Condition-set). If the "present-item" subject were allowed to exist in propositions anywhere within a rule, particularly within a Condition-set specification (the for-conditions block), there would be no sense in having these two separate areas of specification (i.e. the two kinds of blocks). We feel that erasing the distinction between these two blocks by allowing "present-item" anywhere within the rule would harm rule clarity.

Just as important as clarity, maintaining both the separation of these two specifications (Nodeset and Condition-set) and the prohibition on citing "present-item" outside the Nodeset specification allows CDCL to log a serialized representation of the combined and evaluated document/policy that exhibits rule applicability. For auditing purposes, this will be a great benefit by relieving auditors from the necessity to manually evaluate the policy against the document. And should an auditor choose to conduct manual evaluation, this affords an opportunity to verify that a Gatepoint is operating correctly.
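The supported perspective and the separation of the two blocks can be illustrated with a minimal Python sketch. It is not the Gatepoint's actual algorithm: the `Rule` class, the dictionary shape of nodes and contexts, and the `evaluate_document` helper are all hypothetical names chosen for illustration. The Nodeset predicate receives only the present item, the Condition-set predicate receives only the context, and the loop yields a simple applicability log.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    rule_id: int
    outcome: str
    # Nodeset specification (for-content): sees only the present item.
    for_content: Callable[[dict], bool]
    # Condition-set (for-conditions): sees only the context, never the node.
    for_conditions: Callable[[dict], bool]


def evaluate_document(nodes, rules, context):
    """Visit each node once and record which rules apply to it."""
    log = []
    for node in nodes:  # the supported, node-by-node perspective
        for rule in rules:
            if rule.for_content(node) and rule.for_conditions(context):
                log.append((node["name"], rule.rule_id, rule.outcome))
    return log


# Hypothetical document nodes and a rule modelled loosely on rule 7.
nodes = [
    {"name": "ssn", "semantics": {"PII"}},
    {"name": "office", "semantics": set()},
]
rules = [
    Rule(
        7,
        "disclose",
        for_content=lambda item: "PII" in item["semantics"],
        for_conditions=lambda ctx: "auditor" in ctx["inherent-role-list"],
    ),
]
context = {"inherent-role-list": {"auditor"}}
applicability_log = evaluate_document(nodes, rules, context)
```

Because the Condition-set never inspects the node, each log entry pairs exactly one node with one rule, which is the kind of record an auditor could check by hand.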


aliasing in CDCL

CDCL authoring form offers its own aliasing feature that is different from Booliette's aliasing mechanism, but both mechanisms behave the same way. (In brief, it's a purely lexical substitution that occurs, logically speaking, before any processing takes place.)

Aliasing permits the substitution of simple character sequences for unwieldy, visually dense textual constructs such as URIs, XML literals, SQL queries, etc. Aliasing also gives the rulesheet author the means by which policy may be expressed in a localized dialect suitable for that author and that policy's stakeholder.

The scope of a CDCL alias declaration is the Rulesheet. CDCL alias resolution occurs before any Booliette alias resolution occurs. The CDCL authoring form alias feature has the following convention:

  1. The word alias on its own line.
  2. On the following line, indented one level, the word replace: on its own line.
  3. On the following line, indented two levels, the character sequence to be removed, on its own line; this is typically the shortened form found throughout the rulesheet.
  4. On the following line, indented back to one level (the same as for replace:), the word with: on its own line.
  5. On the following line, indented two levels, the character sequence to be used instead of the removed sequence; this is typically the longer, unwieldy form.
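The alias convention can be sketched in Python as follows. This is an illustration under stated assumptions, not the reference implementation: the helper names `parse_aliases` and `apply_aliases` are hypothetical, the URI is a made-up example, indentation depth is not checked, and the substitution is a plain substring replacement with no notion of token boundaries.

```python
# A rulesheet fragment laid out per the alias convention (example URI is invented).
SHEET = """\
alias
     replace:
          PII
     with:
          http://example.org/semantics#PII
"""


def parse_aliases(rulesheet_text):
    """Collect (short form, long form) pairs from alias blocks."""
    # Assumption: indentation is ignored; only the five-line sequence matters.
    lines = [line.strip() for line in rulesheet_text.splitlines()]
    pairs = []
    i = 0
    while i < len(lines):
        block = lines[i : i + 5]
        if (
            len(block) == 5
            and block[0] == "alias"
            and block[1] == "replace:"
            and block[3] == "with:"
        ):
            pairs.append((block[2], block[4]))
            i += 5
        else:
            i += 1
    return pairs


def apply_aliases(text, pairs):
    """Purely lexical substitution, performed before any other processing."""
    for short_form, long_form in pairs:
        text = text.replace(short_form, long_form)
    return text


pairs = parse_aliases(SHEET)
expanded = apply_aliases("* content has-semantic PII", pairs)
```

A real implementation would also need to decide how aliases interact when one short form is a substring of another; the sketch leaves that open.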


CDCL Thesaurus and management of keyword synonyms

Although the authoring form's alias feature can give a rulesheet author a tool for expressing identical policy logic using different words, use of alias is restricted to within the rulesheet in which it appears. Often, globally available synonyms for authoring form keywords will be useful, and for this purpose, there is the CDCL thesaurus.

In order to achieve support for synonyms of authoring form organizational constructs, such as when translating the constructs to Spanish, French, or any language (e.g. keyword "rule" in English to "regla" in Spanish), CDCL relies on the thesaurus of keywords. It may be too high a security risk to use an approach that would dynamically determine the thesaurus through a service. So, under consideration is the merit and cost of bundling a thesaurus with CDCL deployments.

A thesaurus is a special-purpose variant of a Semantic Registry. A CDCL implementation may be configured to know of a thesaurus (zero to many of them). For more information about CDCL implementations of Gatepoints, Rulesheet Repositories, Rulesheet Editing, and Syntax Checking, see the sections on decision making process flow and collation.
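As a rough illustration of a bundled thesaurus, the Python sketch below maps a synonym to its canonical authoring form keyword. The dictionary representation and the `canonical_keyword` helper are assumptions made for this example; only the "regla" to "rule" pairing comes from the text above.

```python
# Hypothetical bundled thesaurus: synonym -> canonical authoring form keyword.
SPANISH_THESAURUS = {"regla": "rule"}


def canonical_keyword(word, thesauri=(SPANISH_THESAURUS,)):
    """Return the canonical keyword for `word`, consulting zero to many thesauri."""
    for thesaurus in thesauri:
        if word in thesaurus:
            return thesaurus[word]
    # Words with no synonym entry are assumed to be canonical already.
    return word
```

With no thesaurus configured (`thesauri=()`), every word is returned unchanged, matching the "zero to many" configuration described above.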


CDCL Runtime Entities

CDCL recognizes many information entities whose values are determined at runtime. These entities may be found within four different explicit contexts:

  1. document context (containing the data that is subject to redaction)
  2. user context (for the recipient user(s) to whom the data is destined)
  3. execution context (containing CDCL Gatepoint environmental information, e.g. current time)
  4. client context (containing the calling software system's environmental information, e.g. data exchange identity & business purpose)

There is an implicit context of the collated collection of rulesheets, which is called the rulesheet deck. However, there is no information entity within this context that is directly citable.

This is only a partial list and small sample of entity references likely to be used within authoring form:

Although the aliases' values are not known until runtime (and in fact may wind up being UNDEFINED at runtime), these items may be referenced in CDCL Rules. Accordingly, CDCL reserves the aforementioned alias names for the respective runtime entities. These aliases are undeclared, because the declaration of an alias presupposes an exact knowledge of the value being aliased. These special aliases will assume values supplied to the CDCL Gatepoint processor at runtime.
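A minimal Python sketch of how reserved runtime aliases might be resolved follows. The names `current-time` and `inherent-role-list` are borrowed from the precedence example elsewhere in this specification; their assignment to the execution and user contexts, the `UNDEFINED` sentinel object, and the `resolve_entity` helper are assumptions for illustration only.

```python
# Sentinel for an entity whose value was not supplied at runtime.
UNDEFINED = object()

# Assumed mapping of reserved alias name -> the context that supplies it.
RESERVED_ALIASES = {
    "current-time": "execution",
    "inherent-role-list": "user",
}


def resolve_entity(name, contexts):
    """Look up a reserved alias in the runtime contexts given to the Gatepoint."""
    if name not in RESERVED_ALIASES:
        raise KeyError(f"{name!r} is not a reserved runtime alias")
    context = contexts.get(RESERVED_ALIASES[name], {})
    # A missing value is not an error; the entity is simply UNDEFINED.
    return context.get(name, UNDEFINED)


contexts = {"execution": {"current-time": "15:30:00"}, "user": {}}
now = resolve_entity("current-time", contexts)
roles = resolve_entity("inherent-role-list", contexts)
```

Here `roles` ends up as `UNDEFINED`, which a rule evaluator would have to handle explicitly rather than treat as an empty list.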


the logical structure of CDCL

CDCL is built of several discrete components, sequenced or nested as the case may be:


validating and evaluating rulesheets

Validation of rulesheets involves checks for compliance of both syntax and logical coherence.

the order of rule evaluation

There is no guarantee of the order in which rules are evaluated.

the order of rule precedence

The order of precedence of rules is not the same thing as the order in which rules are evaluated. The order of rule precedence is a future feature of CDCL rule authoring form, of CDCL rule fundamental form, and of outcome decision making. This feature is intended to satisfy Condition Scope Control.

In authoring form, one may dictate the precedence of one or more rules over another rule by using the unless keyword. For example:

			
     rule
          id:
               7
          apply-outcomes:
               disclose
          for-content:
               * content has-semantic PII
          for-conditions:
               * all-true
                    * inherent-role-list has-semantic auditor
                    * current-time is-greater-scalar-value-than "15:00:00"

     rule
          id:
               8
          unless:
               3 or 4 or 7 applies
          apply-outcomes:
               disclose-and-hold-for-review
                    immediate-reply-to-recipient:
                         The document is being held for manual review prior to a disclosure decision.
                         Please contact my.reviewer@thefed.gov for review results.
                    address-list:
                         my.reviewer@thefed.gov
                    subject:
                         Gatepoint Hold for Review
                    body:
                         See attachment.  Sincerely, your friendly neighborhood Gatepoint.
          for-content:
               * content has-semantic PII
          for-conditions:
               * inherent-role-list has-semantic auditor
			
		
In outcome decision making, evaluation of a rule is skipped entirely if the rule occupies any position other than the first in a precedence sequence, unless the rule is under immediate consideration within a precedence resolution operation. That is the case when the rule in first position is found inapplicable, whereupon the rule or rules in secondary position are considered. For example, suppose the very first rule to be evaluated were rule 8 above. It would be skipped because it has lower precedence than rules 3, 4, and 7. Some time later, rule 7 is evaluated and determined to be inapplicable; likewise rules 4 and 3. Following the sequence of precedence to rule 8, rule 8 is now no longer skipped and is evaluated for applicability, but only once rules 3, 4, and 7 have all been found inapplicable. (Note that in following the sequence of precedence, there may be more rules than just 8 to seek out and evaluate.)
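The precedence walk described above can be sketched in Python. Since rule precedence is a future feature, this is only one possible reading: it assumes that a rule named in an unless clause "applies" only if it survives its own unless clause, and that unless clauses never form a cycle. The function name `applicable_rules` and the dictionary representation are hypothetical.

```python
def applicable_rules(unless, is_applicable):
    """Return the ids of rules that apply once precedence is resolved.

    unless: dict mapping rule id -> list of rule ids in its unless clause.
    is_applicable: callable deciding a rule's own content/condition match.
    """
    verdicts = {}

    def applies(rule_id):
        if rule_id not in verdicts:
            # A rule stays skipped while any higher-precedence rule applies.
            blocked = any(applies(other) for other in unless[rule_id])
            verdicts[rule_id] = (not blocked) and is_applicable(rule_id)
        return verdicts[rule_id]

    return sorted(rule_id for rule_id in unless if applies(rule_id))


# Rule 8 yields to rules 3, 4, and 7, as in the listing above.
unless = {3: [], 4: [], 7: [], 8: [3, 4, 7]}

# Rules 3, 4, and 7 are all inapplicable, so rule 8 is finally evaluated.
only_eight = applicable_rules(unless, lambda rule_id: rule_id == 8)

# Rule 7 applies, so rule 8 is skipped even though it would match.
seven_wins = applicable_rules(unless, lambda rule_id: rule_id in {7, 8})
```

The memoized `verdicts` table makes the result independent of the order in which rules are first visited, consistent with the lack of any guaranteed evaluation order.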


the lexical elements of Authoring CDCL

The logical model described above would be written in CDCL Authoring Form as follows. This listing is annotated below with a more formal treatment of the language elements.

The characters : { } = are literals, required to be in the code as shown. Remarks in parentheses are descriptions or discussions; neither the parentheses nor their content form a part of CDCL syntax.

Booliette syntax is discussed on its own page. None of the syntactic requirements of CDCL are applicable to the Booliette elements; they follow their own rules.

	
include
The include keyword is used to make an external resource, typically another rulesheet, available within the context of the rulesheet. This keyword can occur 0 to n times.

     doctype: (alias or full URI)
     primary-custodian: (alias or full URI)
     revision: (fragment identifier)
     under-these-conditions: (booliette statements about stake)

alias
This keyword can occur 0 to n times.

     replace: (fragment identifier)
     with: (fragment identifier)

default-rule
The default-rule keyword must occur once in a rulesheet.

     id: (fragment identifier)
     apply-outcomes: (outcome alias or full URI)

rule
The rule keyword can occur from 0 to n times in a rulesheet.

     id: (fragment identifier)
     apply-outcomes: (outcome alias or full URI)
     for-content: (booliette statements about present node or present document)
     for-conditions: (booliette statements about recipient user; "present-item" alias not permitted)

rule

     id: (fragment identifier)
     apply-outcomes: (outcome alias or full URI)
     for-content: (booliette statements about present node or present document)
     for-conditions: (booliette statements about recipient user; "present-item" alias not permitted)

(...and so on. The number of rules in a rulesheet is unrestricted.)
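The occurrence counts stated for the top-level keywords lend themselves to a simple syntax check. The Python sketch below is one possible approach and makes two assumptions that the specification does not confirm: top-level keywords are the only unindented lines in a rulesheet, and `check_cardinality` is a hypothetical helper name.

```python
from collections import Counter

# (minimum, maximum) occurrences per rulesheet; None means unbounded ("n").
CARDINALITY = {
    "include": (0, None),
    "alias": (0, None),
    "default-rule": (1, 1),
    "rule": (0, None),
}


def check_cardinality(rulesheet_text):
    """Return a list of messages for keywords that occur too few or too many times."""
    # Assumption: top-level keywords are exactly the unindented, non-blank lines.
    counts = Counter(
        line.strip()
        for line in rulesheet_text.splitlines()
        if line and not line[0].isspace()
    )
    errors = []
    for keyword, (low, high) in CARDINALITY.items():
        found = counts[keyword]
        if found < low or (high is not None and found > high):
            upper = "n" if high is None else high
            errors.append(f"{keyword}: found {found}, expected {low}..{upper}")
    return errors


valid_sheet = "default-rule\n     id:\n          1\nrule\n     id:\n          7\n"
missing_default = "rule\n     id:\n          7\n"
```

Calling `check_cardinality(valid_sheet)` yields no messages, while `check_cardinality(missing_default)` reports the absent default-rule. Checks for logical coherence would need a separate pass.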