Gatepoint's purpose

The Gatepoint offers disclosure control as a service to requesting client systems, which, in turn, are likely participants in an information exchange. In addition, the Gatepoint offers auditing services of the actions it has taken. In offering these services, the Gatepoint collates the appropriate policies from one or more communities of stakeholders in the policy-compliant disclosure of data. Once the Gatepoint concludes collation, it conducts policy evaluation for the applicability of rules and marks up the document with the reconciled outcomes (i.e. it serves as a policy decision point) and enforces those outcomes from the applicable rules within the policies (i.e. it serves as a policy enforcement point). Enforcement is achieved through any sort of interpretation or transformation of the document's content. For example, a reconciled outcome of redact-and-admit for a piece of information like "judgment=convicted" might cause that information in the ultimate document to be transformed to "judgment=REDACTED". The sorts of interpretations or transformations can be configurable and may vary from Gatepoint to Gatepoint. However, it will likely be the case that certain domains of redaction problems will require uniform or standard configurations. Ultimately, the Gatepoint logs its actions for potential audit purposes.
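
The enforcement example above can be illustrated with a minimal sketch. It assumes a document is already parsed into flat name/value pairs and that a reconciled outcome has been recorded per name. The function name `enforce`, the placeholder constant, and the pass-through of unmarked fields are assumptions made for illustration and are not defined by CDCL.

```python
from typing import Dict

# Hypothetical labels; an actual Gatepoint may configure these differently.
REDACT_AND_ADMIT = "Redact and Admit"
PLACEHOLDER = "REDACTED"


def enforce(fields: Dict[str, str], outcomes: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of `fields` in which every value governed by a
    redact-and-admit outcome is replaced by the placeholder.

    Assumption: fields without a recorded outcome pass through unchanged.
    """
    result = {}
    for name, value in fields.items():
        if outcomes.get(name) == REDACT_AND_ADMIT:
            result[name] = PLACEHOLDER
        else:
            result[name] = value
    return result


document = {"judgment": "convicted", "court": "district"}
decisions = {"judgment": REDACT_AND_ADMIT}
print(enforce(document, decisions))
```

Other transformations (removal, substitution, encryption) could be swapped in at the same point, which is one way the per-Gatepoint configurability mentioned above might be realized.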


How clients interact with a Gatepoint

A Gatepoint can be integrated in one of two modes: linearly or orthogonally. The linear approach finds the Gatepoint acting as a proxy between two parties engaged in a data exchange. The orthogonal approach finds the Gatepoint acting in a typical service fashion to be used at will by a given process. In the latter approach, the resulting document is delivered back to the requesting process, whereas in the former approach, the resulting document is delivered not to the requesting process but to the second entity. See the interaction diagram.


What metadata shall be returned to a calling client?

Success status, if the document to be redacted was successfully redacted.

Failure status, if not all outcomes could be honored for the document to be redacted (when enforcement was to be done) or if there was some other error in processing, such as failure to locate the primary custodian's rulesheet. This status also covers failures to send alerts and failures to execute holds.

The metadata might also list all document portions that could not be decrypted and all data within the document to be redacted that were assigned the default outcome (this occurs in a Gatepoint process step known as Prepare). It might also list all rulesheet URIs that were not queried or that were queried yet resulted in no returned rulesheet from the repository.

This metadata might be inappropriate to log but would be helpful and informative to the calling client for troubleshooting purposes.
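
The metadata described in this section could be carried in a structure such as the following sketch. All class, field, and method names here are hypothetical, and the example URI is made up. CDCL does not prescribe this shape.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Status(Enum):
    SUCCESS = "success"
    FAILURE = "failure"


@dataclass
class ResponseMetadata:
    """Hypothetical container for what a Gatepoint returns to a client."""
    status: Status
    failure_reason: str = ""
    # Document portions that could not be decrypted.
    undecryptable_portions: List[str] = field(default_factory=list)
    # Data assigned the default outcome during the Prepare step.
    default_outcome_data: List[str] = field(default_factory=list)
    # Rulesheet URIs that were never queried.
    rulesheet_uris_not_queried: List[str] = field(default_factory=list)
    # Rulesheet URIs that were queried but returned no rulesheet.
    rulesheet_uris_without_rulesheet: List[str] = field(default_factory=list)

    def needs_attention(self) -> bool:
        """True when the client may want to troubleshoot the request."""
        return self.status is Status.FAILURE or bool(
            self.undecryptable_portions
            or self.rulesheet_uris_not_queried
            or self.rulesheet_uris_without_rulesheet
        )


meta = ResponseMetadata(
    Status.SUCCESS,
    rulesheet_uris_not_queried=["urn:example:rulesheet:42"],
)
print(meta.needs_attention())
```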


Activity diagram

The top level steps in the process flow for a disclosure control request are found in the activity diagram.


Gatepoint's application programming interface (API)

See the UML diagram of the API.

Additional information:
See the Gatepoint UML class diagram.

See the authoring form rule parser UML class diagram.

See the rulesheet compiler UML class diagram and its associated ANTLR UML class diagram.


How outcome reconciliation is conducted

For a future version of CDCL, outcome reconciliation might be dependent upon CDCL Condition Scope Control. Until that time, the CDCL engine shall place "Redact and Deny" as the highest priority outcome. Next in priority is "Redact and Admit". This is followed by "Disclose for Hold Review". Last comes "Disclose". This is called the cascade or the priority of outcome "overrides". All data nodes must be governed by at least one of these outcomes, which are collectively called state-change-independent outcomes. If multiple, different, and applicable outcomes exist for a node, the priority list documented here shall apply to reconcile the single outcome used to govern the node in the result document. Other outcomes may exist for adding information to the result document, such as secondary disclosure constraints/obligations/conditions or alerts, but these supplement each member of the four core outcomes documented here.
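
The cascade can be sketched as a lookup against an ordered list. The outcome names and their order come from the paragraph above. The function name `reconcile` and the choice to raise an error for an empty candidate list are assumptions for illustration.

```python
from typing import Iterable

# Highest priority first, as documented in the cascade.
CASCADE = [
    "Redact and Deny",
    "Redact and Admit",
    "Disclose for Hold Review",
    "Disclose",
]


def reconcile(candidates: Iterable[str]) -> str:
    """Pick the single governing outcome from all applicable outcomes.

    Raises ValueError if there are no candidates or if a candidate is not
    one of the four core outcomes.
    """
    candidates = list(candidates)
    if not candidates:
        raise ValueError("a node must be governed by at least one outcome")
    # The outcome with the smallest index in CASCADE has the highest priority.
    return min(candidates, key=CASCADE.index)


print(reconcile(["Disclose", "Redact and Admit", "Disclose for Hold Review"]))
```

Because the result depends only on the set of candidates, the order in which rules contribute their outcomes does not change the reconciled outcome.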

Upon determining the reconciled outcome for a node, that node shall be decorated with that decision. If the node already possesses a decoration with a potentially different reconciled outcome and a newly formed decision must be implemented, the existing decoration shall be replaced with the new decoration reflecting the new decision. Of course, the new decision is permitted to be dependent on the existing decoration, as requirements may warrant (e.g. a newly formed decision to disclose could be dependent on no existing decoration to redact). Only one reconciled outcome decoration may exist for a node.
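
A minimal sketch of the single-decoration rule follows. It assumes one possible requirement, taken from the parenthetical example above: a new decision to disclose takes effect only if the node is not already decorated with a redact outcome. The `Node` class and the `decorate` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

REDACT_OUTCOMES = {"Redact and Deny", "Redact and Admit"}


@dataclass
class Node:
    name: str
    # At most one reconciled-outcome decoration exists per node.
    decoration: Optional[str] = None


def decorate(node: Node, proposed: str) -> None:
    """Replace the node's decoration with the newly formed decision.

    Assumed requirement: a proposed "Disclose" does not displace an
    existing redact decoration.
    """
    if proposed == "Disclose" and node.decoration in REDACT_OUTCOMES:
        return  # the new decision depends on the existing decoration
    node.decoration = proposed  # the old decoration, if any, is replaced


tax_id = Node("TaxIDInfo")
decorate(tax_id, "Disclose")
decorate(tax_id, "Redact and Admit")
decorate(tax_id, "Disclose")
print(tax_id.decoration)
```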


What happens if a document were to be submitted repeatedly for disclosure control?

Because policy may be dependent on time of day, there is no guarantee that the outcomes applied in a first pass will still apply to the data that remains within the document during a successive pass through a Gatepoint (ostensibly occurring at a later point in time). Despite this, Gatepoint behavior guarantees three things. One is that the Primary Custodian's policy will be present and be evaluated no matter how many times a document passes through Gatepoints, or else the Gatepoint will raise an error condition. The second is that contentious outcomes will be reconciled according to a known "cascade" priority (a.k.a. the outcome override). The third is that it does not matter in which order a specific Gatepoint evaluates policies and rules within those policies. In other words, behavior is independent of the order in which a Gatepoint chooses to evaluate policy rules.

There potentially are additional policy dependencies which preclude making any other guarantees of Gatepoint behavior in this circumstance. Policy may be dependent on the state of the document. So, the presence or absence of information within the document itself may impact applicability of policy outcomes. Furthermore, policy may be dependent on a change in state of the document. For example, one may have a rule that is conditional on the event of redaction of a piece of information (i.e. it was present on the input document and will be absent from the output document) in order to send an alert. With all other things held constant, policy may be dependent on the Gatepoint's execution context, which is the data that is specific to a Gatepoint instance. This execution context data cannot be assumed to be forever constant. However, the execution context data, such as date and time, will remain unchanged within a given disclosure control transaction, so the order in which policies and rules are evaluated within that transaction does not matter.

A stakeholder that is not the primary custodian might not have its policy included for evaluation due to failure to establish a connection between the Gatepoint and the Rulesheet Repository, due to policy expiration, or due to a variety of other reasons. After repeated attempts to retrieve a stakeholder's policy, if such policy is not found and included, the Gatepoint will proceed with the disclosure control transaction subject to the aforementioned behavioral guarantees.


Errors that may be generated during Gatepoint operations

Also see the validation error messages. Rulesheet validation ought to be conducted by Rulesheet Repositories at rulesheet check-in time. This is not to say that rulesheet validation cannot also be performed by the Gatepoint. In fact, many of the logical coherence checks in rulesheet validation as well as some of the checks listed below could be performed by both the Gatepoint and the repository.


Modelling Rulesheets and Documents within the Gatepoint

The textual information in documents and in policy rulesheets is modeled in some way at run time within the Gatepoint. One option for modeling is a network of nodes (i.e. a graph) that represents the rulesheet content and document content (as well as the data concerning the user context, client context, and execution context). It's important to emphasize that CDCL has no expectations on the mechanisms chosen for modeling these items. Anyone implementing the CDCL specification may choose a modeling approach that works best for one's circumstances. The remainder of this section discusses topics pertinent to graph models.

Disabling a node versus removing a node
In some circumstances, a node that is extant in a graph does not accurately model the real-world information. Such a node could be removed from the graph. However, at times it may be prudent for logging/archival/audit purposes to preserve the fact that the node had existed in the graph, and therefore it should not be removed. When such a node exists in a graph and its removal is imprudent, the node should be disabled. The status of node enablement/disablement is a property of the node.

Withering

Withering is a concept that could be used within a graph model to facilitate Gatepoint process flow operations. Withering is essentially a means by which graph traversal can be governed. The present position during graph traversal is a node that can be called Node_PP for Node Present Position. An arbitrary, non-negative quantity of arc-nodes (i.e. directed edges) lead away from the Node_PP. The node to which an arc-node leads or "points" is a destination node of the Node_PP, and this destination node can be called Node_D. The only arc-nodes that may wither are those with Node_D of type Logic Node or Anonymous Node. However, it may be that not all arc-nodes with Node_D of these types are subject to withering. This may necessitate creation of a special sub-type of arc to Logic Node that is a Withering Arc to Logic Node and a special sub-type of arc to Anonymous Node that is a Withering Arc to Anonymous Node. When a withered arc-node is detected on the Node_PP, it is a signal that the arc-node should not be traversed. No graph analysis operation is compelled to honor the withering of arc-nodes. It is at the discretion of a given process to honor or not honor a withered arc-node.

How is an arc-node indicated as being withered?
The withered status is an attribute of the arc. If arc attributes are not supported in the data model, then the withered status of the arc could become an attribute of the Node_PP.

How does an arc-node become withered?
Any graph analysis process can define the wither. For the purposes of CDCL, the evaluation of a Logic Node as FALSE will cause all arc-nodes of the Withering Arc to Logic Node type leading to it (not away from it) to be withered. In other words, the process responsible for evaluating the Logic Node is a likely candidate to also be the process responsible for withering the proper arc-nodes. This withering must propagate recursively backwards, one generation of nodes at a time, through the graph as long as the arc-nodes are of a type that withers, and if so, such arc-nodes are themselves withered. For example, the propagation of withering occurs when an arc-node withers that leads away from a Node_PP of type Anonymous Node to a Node_D of type Logic Node and when the Node_PP, in turn, has arc-nodes leading to it that are of type Withering Arc to Anonymous Node.
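
The backward propagation described above might be sketched as follows. This is an illustration under stated assumptions, not a prescribed implementation: arcs are plain objects, the two withering sub-types are collapsed into a single boolean `withering` flag, and the recursion is written as an iterative queue. All class and method names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Arc:
    source: str            # the Node_PP the arc leads away from
    dest: str              # the Node_D the arc points to
    withering: bool        # True for a "Withering Arc to ..." sub-type
    withered: bool = False


class Graph:
    def __init__(self, arcs: Iterable[Arc]) -> None:
        self.arcs: List[Arc] = list(arcs)
        # Index arcs by destination so incoming arcs are easy to find.
        self.incoming: Dict[str, List[Arc]] = {}
        for arc in self.arcs:
            self.incoming.setdefault(arc.dest, []).append(arc)

    def wither_from(self, logic_node: str) -> None:
        """Call when `logic_node` has been evaluated as FALSE."""
        queue = deque([logic_node])
        while queue:
            node = queue.popleft()
            for arc in self.incoming.get(node, []):
                # Only arcs of a withering type wither; arcs that are
                # already withered are skipped, which also guards cycles.
                if arc.withering and not arc.withered:
                    arc.withered = True
                    queue.append(arc.source)  # go back one generation


g = Graph([
    Arc("root", "anon", withering=True),
    Arc("anon", "logic", withering=True),
    Arc("root", "other", withering=False),
    Arc("other", "logic", withering=False),
])
g.wither_from("logic")
print([(a.source, a.dest, a.withered) for a in g.arcs])
```

A traversal that chooses to honor withering would simply skip arcs whose `withered` flag is set. A traversal that does not honor it would ignore the flag.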

How does an arc-node become un-withered?
For the purposes of CDCL, no process causes arc-nodes to "unwither".


Gatepoint deployment topology and architecture

Gatepoint clustering

An instance of a deployed Gatepoint will include customer-administered configuration data along with customer-administered information concerning connection management, system management, and network management specific to the CDCL Gatepoint. In addition, such an instance may include caching proxies for operational performance enhancement, and these proxies could be for either Stakeholder Directories queries or Semantic Registries / Type Definition Registries queries or both. Not to be forgotten are the operating system resource dependencies such an instance has, such as to the file system for storage and retrieval of system-to-system credentials (e.g. X.509 certificate).

When considering the possibility of a customer's preference to deploy a cluster of Gatepoints to achieve fault-tolerance or performance enhancement, care must be taken to design the CDCL system so that multiple simultaneous Gatepoints may access virtually single, stateful resources without contention. If the given resource is not truly single, then at least its state must be common across all its instances so that all clustered Gatepoints will behave similarly. For example, if within such a cluster there were multiple instances of a semantic registry proxy, then each proxy's state must be replicated across its peers.

Employing Proxies in CDCL

A forward caching proxy would improve performance of Gatepoint URI verification requests of Semantic Registries and Type Definition Registries. The risk is that cached information may already be stale when it is served.

In a linear deployment approach, a "clandestine" or intercepting proxy (often called a transparent proxy, correctly or incorrectly) is a type of forward proxy that would benefit deployment installations of CDCL Gatepoints where customers cannot afford to or are unwilling to make the necessary modifications to their systems to act as Gatepoint clients. Such a deployment would have the Gatepoint insinuate itself between two parties already engaged in a legacy exchange of information. And a reverse proxy would benefit a Gatepoint application of a web site (or web application) content filter.

There are open-source proxy solutions that could be examined for integration with CDCL.

Connections management

System/network management

Disclosure control engine operations

See the activity diagram and UML diagram of the API.

"document", "redacted document", & "unparsed document" all mean unparsed documents in some external format, such as XML, GJXML, RDF, a serialized Java object, etc. "unparsed document" means either a document prior to disclosure control or subsequent to disclosure control. "document" typically means the input argument to the disclosure control engine: i.e. a "present document" or "message" prior to parsing and prior to any redaction.

Any stakeholder or primary custodian may author a disclosure outcome rule. The sentiment is held that primary custodians deserve greater consideration than stakeholders. This sentiment could be reflected by defaulting primary custodians' rulesheets to the "Redact and Deny" outcome, whereas all other stakeholders' rulesheets would default to the "Disclose" outcome. However, this might place undue burdens on rulesheet authors to do what is in their best interests.

The return of a non-schema-compliant redacted document may itself reveal that redacted content was originally present in the document; by inference, an auditor could use the compliance failure to gain knowledge of the un-redacted document's state. This is a risk for rulesheet authors who use redact outcomes.

A Gatepoint shall allow a disclosure control request to have multiple document/user-ctx/client-ctx triplets. For example, it's not unreasonable to think that some business process orchestrator might attempt to do two document disclosures (one disclosure between "parties A" and a different disclosure between "parties B") as a single business transaction. So, it might want to set the conditions appropriate for doing so, and the Gatepoint would assist by utilizing a single, static execution context at the Gatepoint. See Multiple Request Coordination on the CDCL Forum, a place listing many CDCL topics of interest for the open source community.

What will happen if there is a collision based on rules written for different users where multiple users could be part of the incoming user context?
Two forces are applicable in such a situation. First, let's remember that the cascade is the outcome resolution mechanism that handles any outcome discrepancies. Second is the ability for the rule author to specify how to examine a multiple user context via possessive determiners. This is the means by which the author wishes to govern how a rule will be found applicable in multi-user context situations (and therefore have the rule's outcome made a candidate for the cascade). For example, a multi-user context might contain Jim and Pattabi. Jim is supposed to be granted access to, let's say, a TaxIDInfo node while Pattabi is not. But when both characters are the intended users of the info, the author is at liberty through use of possessive determiners either to make the outcome dependent on the qualities of just one member of the group or to make it dependent on the common qualities, if any, found throughout every member of the group.
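
The two choices available to the rule author can be sketched as an "any member" test versus an "every member" test. The determiner labels `"any"` and `"all"`, the `tax_cleared` attribute, and the function name are hypothetical. The actual possessive determiner syntax is defined by CDCL and is not shown here.

```python
from typing import Callable, Dict, List

User = Dict[str, object]


def rule_applies(users: List[User],
                 predicate: Callable[[User], bool],
                 determiner: str) -> bool:
    """Decide whether a rule is applicable to a multi-user context."""
    if not users:
        return False  # assumption: an empty user context matches nothing
    if determiner == "any":
        # Applicable if at least one member has the quality.
        return any(predicate(u) for u in users)
    if determiner == "all":
        # Applicable only if every member shares the quality.
        return all(predicate(u) for u in users)
    raise ValueError(f"unknown determiner: {determiner}")


context = [
    {"name": "Jim", "tax_cleared": True},
    {"name": "Pattabi", "tax_cleared": False},
]


def cleared(user: User) -> bool:
    return bool(user["tax_cleared"])


print(rule_applies(context, cleared, "any"))
print(rule_applies(context, cleared, "all"))
```

In either case the rule's outcome, once found applicable, is only a candidate for the cascade and is still subject to reconciliation against other applicable outcomes.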

The Gatepoint assumes the directory will ensure unique appearances of any given matching stakeholder within the collection it is to return. The risk that must be assumed by stakeholders is that a party may have an entry in multiple stakeholder directories with rulesheet URLs resolving to independent rulesheets. Each of these independent rulesheets may have contradictory policy defined within, and the combination of these rulesheets may have been neither intended nor contemplated by the stakeholder. It is recommended (not required), therefore, that stakeholders refrain from managing their disclosure policy across multiple rulesheet repositories and refrain from establishing entries across multiple stakeholder directories (except for failover purposes). Two or more stakeholder URIs (aka aliases) can never be considered by a Gatepoint to be for a single stakeholder entity, even if in reality a single entity were indeed identified by multiple stakeholder URIs.

The collection of rulesheet repositories returned by a stakeholder directory for each given stakeholder is assumed to be pre-sorted by the directory in order of priority. For each alias that has more than one stakeholder directory giving a response, the alias's collection of Rulesheet Repository URIs shall be merged from all responses so that the resulting collection is in the "merge order" that matches the Stakeholder Directories priority order and has no more than one entry per Rulesheet Repository. For example, if a Gatepoint has its Stakeholder Directories prioritized as directoryA (priority number 1) and directoryB (priority number 2), and if stakeholder known by alias "AnchorsAweighStakeholder" is given a Rulesheet Repository list from directoryA of "repository5, repository3" and from directoryB of "repository1, repository3, repository5", then the collection of Rulesheet Repository URIs to manage shall be "repository5, repository3, repository1".
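
The merge order can be sketched as an order-preserving de-duplication across the directory responses. The function name and the list-of-lists input shape are assumptions. The example data is the one given above.

```python
from typing import List


def merge_repositories(responses: List[List[str]]) -> List[str]:
    """Merge Rulesheet Repository lists for one alias.

    `responses` must be ordered by Stakeholder Directory priority, and each
    inner list is assumed to be pre-sorted by its directory. The first
    occurrence of a repository wins; later duplicates are dropped.
    """
    merged: List[str] = []
    seen = set()
    for repository_list in responses:
        for repository in repository_list:
            if repository not in seen:
                seen.add(repository)
                merged.append(repository)
    return merged


directory_a = ["repository5", "repository3"]                  # priority 1
directory_b = ["repository1", "repository3", "repository5"]   # priority 2
print(merge_repositories([directory_a, directory_b]))
# ['repository5', 'repository3', 'repository1']
```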

Why does the execution context contain document sender's information? Should this information not instead be found within the client context?
The execution context contains the sender's information because that information is taken off the wire. This sender info would likely be gleaned from the X.509 certificate supplied in the application transport layer, which would have no involvement in the CDCL messaging layer where the client context would be found.

It is possible that the logging of the decided rulesheet deck could reveal that redacted content was originally present in the document if the rules cite any document dependencies; by inference, an auditor could use the decided rulesheet deck and its withered and un-withered rules to gain knowledge of the un-redacted document's state. This is a risk for rulesheet authors who use redact-and-deny outcomes.