Metadata Requirements for Evidence

by
David Bearman, Archives & Museum Informatics
and
Ken Sochats, University of Pittsburgh, School of Information Science

Introduction: Towards A Reference Model for Business Acceptable Communications
Managers in application domains from commerce to health care, and from research and development to manufacturing, are seeking to define standards for data interchange adequate for their business purposes. The literature is replete with discussions of how to enable end-to-end electronic business interaction, how to support the requirements of electronic patient records or electronic laboratory notebooks, and how to implement the documentation demanded by CALS or ISO-9000.(1)
At the same time, managers of existing information networks and technical personnel charged with planning the National Information Infrastructure of the future are encountering requirements to identify data, control access to it, manage its software dependencies, represent its business meaning, and document its use in these vast, distributed, heterogeneous computing environments.(2) Many observers feel that unless we can satisfy requirements for "integrity", "authenticity", "reliability" and "archiving" of digital information, the National and Global Information Infrastructures will never be able to support serious work.(3)

The professions traditionally concerned with evidence and records have not ignored these emerging requirements.(4) At the University of Pittsburgh School of Library and Information Science, faculty and students engaged in a research project funded by the National Historical Publications and Records Commission have been examining the "Functional Requirements for Recordkeeping" as defined in a broad range of sources from law, regulation and best practices. From this "literary warrant" they have derived a specification of the attributes of "recordness" or evidentiality.(5)

The specification defines thirteen properties which are identified in law, regulation and best practices throughout society as the fundamental properties of records.(6) These characteristics can be formally expressed as "production rules" or logical statements of simple observable attributes.(7) One problem associated with deriving a set of requirements from the written prose of the literature is that the specifications are often ambiguous, imprecise and subject to a high degree of interpretation. The research group therefore elected to represent the specification of each requirement formally as a set of production rules. The production rule formalism was used both during the process of developing each requirement and as the representation mechanism for it. This helped to guarantee that the requirement specifications were as explicit and unambiguous as possible. It also allowed the specifications to be logically refined until their component statements were observable states or properties. These observables provided the foundation for the identification of a specific set of metadata which, when present, satisfies the informational needs of the specification. If this metadata is inextricably linked to, and retained with, the data associated with each business transaction, it guarantees that the data object will be usable over time, be accessible only under the terms and conditions established by its creator, and have the properties required to be fully trustworthy for purposes of executing business.(8) Additional metadata retained in system and organizational accountability documentation assures full evidentiality.

In order to facilitate implementation of environments in which electronic evidence can be created, the project has taken its findings one step further and proposed a "Reference Model" for "Business Acceptable Communications (BAC)". The metadata requirements for evidentiality or "recordness" are necessary components of the reference model. One could imagine this as a scheme for addressing electronic envelopes containing business communications that would ensure that the envelope could be opened by different computers and software in the future and its contents would still be completely understandable. Because it has been empirically found to correspond closely with the metadata specified in or required by strategies adopted by a range of discrete standardization efforts underway in a variety of application niches, the reference model appears to have relevance to, and value for, the process of defining standards for any type of BAC.(9)

The need for such a standard is widespread. Not only would it make communications received over networks trustworthy for the purposes of conducting business, help to ensure accountability, and protect organizations against the risk of losing proof of their past behavior; it would also greatly simplify:

the management of huge volumes of communications from heterogeneous hosts,
the proper retention and disposition of records,
auditing the use of records for business, and
the appropriate management of private, secure, proprietary and business confidential data.

A side effect of such a recordkeeping standard is that it will enhance the business value of the data that it preserves. These business benefits include:

providing data for market and other research.
documenting decisions, policies, events, etc.
documenting R&D and other business related processes.

To understand what data is necessary for such communications, we must begin by examining the nature of electronic evidence (or the essential properties of records).

Records are at one and the same time the carriers, products and documentation of business transactions. Transactions (trans-actions) by definition are actions communicated from one person to another, from a person to a store of information (such as a filing cabinet or computer database) and thereby available to another person at a later time, or communications from a store of information to a person or another computer.(10) Because such trans-actions must leave the mind, computer memory, or software process in which they are created (or must be used, "over-the-shoulder" as it were, by a person with access to the same computer memory), the transaction record must be conveyed across a software layer, and typically across a number of hardware devices.

Not all data that has been communicated across software and hardware layers is a record. In fact, most information created by and managed in information systems is not a record because it lacks the properties of evidence. Records-oriented professionals within organizations, such as senior management, legal counsel, auditors, Freedom of Information and Privacy officials, and archivists, all require records, and not just information, but creators of the records frequently need only continued access to the information to support their work. Therefore, application environments that support the ongoing work of the organization frequently, or even usually, do not satisfy the requirements for creating evidence. In the remainder of this paper we distinguish between the terms "records" and "data", using "records" exclusively when we mean information that provides evidence of a transaction.

The Functional Requirements for Evidence within Recordkeeping

Any organization that wants to use electronic documentation as evidence in the future will need to satisfy the requirements of evidence in the normal course of conducting its business. It has been difficult to do so in the computer based communications environments we have implemented in the past because applications software sold by third parties has not met these requirements. Information systems are generally designed to hold timely, non-redundant and manipulable information, while recordkeeping systems store time bound, inviolable and redundant records. Few, if any, in-house information managers have been able to devote the energy to rigorous definition of the distinct requirements for recordkeeping or, if they had, would have been able to envision how to satisfy these throughout all systems. Without such explicit and testable specifications, computing application and electronic communications systems have failed to satisfy the requirements for recordkeeping and are, therefore, a growing liability to companies even while they are contributing directly to day-to-day corporate effectiveness.

The University of Pittsburgh research project identified hundreds of sources in law, regulation, best practice guidelines, and general societal discourse which relate to the properties of evidence. From these it is clear that if records that are critical to the organization for long term accountability and to protect its rights are not created by transactions, they cannot be created after the fact from data in information systems. Information captured in the process of communication will only be evidence if the content, structure and context metadata required to satisfy the functional requirements for recordkeeping is captured, maintained and usable. The requirements of recordkeeping are corporate requirements, not those of any given business function or application, and are therefore present for any communications. They are the foundation of good business practices and are essential elements in reducing the risks of increased liabilities and decreased opportunities that accompany poor recordkeeping practices.

The functional requirements in table 1 below are derived from the many sources we consulted which defined what constitutes evidence. In addition to interviewing experts, we have systematically reviewed hundreds of sources considered authoritative by lawyers, auditors, information technology specialists and archivists and records managers. In these sources we have identified statements that pertain explicitly to the characteristics or attributes of evidence or records. Analysis of these authoritative sources revealed twenty functional requirements for evidence which fell into three broad categories. In retrospect the small number of requirements should not have surprised us, since they reflect a relatively tight social consensus about what it means for written testimony about an act in the past to be considered trustworthy in the future.

Table 1: FUNCTIONAL REQUIREMENTS FOR EVIDENCE IN RECORDKEEPING

Conscientious Organization
	Compliant (1)
Accountable Recordkeeping System
	Responsible
		Assigned (2)
		Documented (3)
	Implemented (4)
	Consistent (5)
Functional Records
	Comprehensive (6)
	Identifiable (7)
	Complete
		Accurate (8)
		Understandable (9)
		Meaningful (10)
	Authorized (11)
	Preserved
		Inviolate (12)
		Coherent (13)
		Auditable (14)
	Removable (15)
	Exportable (16)
	Accessible
		Available (17)
		Renderable (18)
		Evidential (19)
	Redactable (20)

The full requirement and specification is reproduced in Appendix 1.

Over the course of the past two years, this prose requirements statement has been subjected to rigorous analysis as we expressed it in a formal representation. This version, which we call the "Production Rules Representation of the Functional Requirements for Evidence", has forced us to operationalize a number of concepts that are not very precise in the literary warrant and were not specific enough in the prose specification to ensure that a computer system would be able to validate their presence or absence. Care has been taken in the development of the specification of the requirements to include only those elements that are required to delineate the requirement. It is very easy to fall into the trap of overspecification and include statements that would pre-define some level of implementation. For this reason, some of the requirements appear to be very abstract. These higher level specifications need to be further defined by the implementer to identify specific system design artifacts. We have made an effort to ensure that only observable data or calculations from observables reside at the leaf nodes of the production rules. The observables consist of metadata, and a very limited predicate vocabulary has been used to simplify system requirements for auditing the production rules. The production rules representation is reproduced in Appendix 2.
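
As an illustration only, the idea of production rules whose leaf nodes are observable metadata can be sketched in Python. The element names (record_id, transaction_datetime, originator, authorization_ref, delegation_ref) and the two rules shown are hypothetical and are not taken from Appendix 2; the predicate vocabulary is reduced here to a presence test combined by conjunction and disjunction.

```python
from typing import Any, Callable, Dict

Metadata = Dict[str, Any]
Rule = Callable[[Metadata], bool]


def present(element: str) -> Rule:
    """Leaf predicate: the named metadata element is observable (present and non-empty)."""
    return lambda md: md.get(element) not in (None, "", [])


def all_of(*rules: Rule) -> Rule:
    """Conjunction: every component rule must hold."""
    return lambda md: all(rule(md) for rule in rules)


def any_of(*rules: Rule) -> Rule:
    """Disjunction: at least one component rule must hold."""
    return lambda md: any(rule(md) for rule in rules)


# Hypothetical rules, for illustration only (not the Appendix 2 rules).
RULES: Dict[str, Rule] = {
    "Identifiable": all_of(present("record_id"), present("transaction_datetime")),
    "Authorized": all_of(
        present("originator"),
        any_of(present("authorization_ref"), present("delegation_ref")),
    ),
}


def audit(md: Metadata) -> Dict[str, bool]:
    """Evaluate every rule against one record's metadata."""
    return {name: rule(md) for name, rule in RULES.items()}
```

Because every rule bottoms out in a test on a metadata element, an audit of this kind needs access only to the record's metadata, not to its content.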

Metadata Specifications for Evidence

Ideally a metadata specification for evidence would be completely deducible from the Production Rules version of the Functional Requirements for Evidence. We believe we have achieved such a specification and that it serves to identify the data required for such purposes as are proposed in the draft NIST standard for a "Record Description Record" or by the Research Libraries Group/Commission on Preservation and Access Task Force on Archiving of Digital Information. We also believe this specification satisfies the needs for entries in electronic laboratory notebooks, electronic patient records and multivalent electronic commerce.

The functional requirements for recordkeeping dictate the creation of records that are comprehensive, identifiable (bounded), complete (containing content, structure and context), and authorized. These four properties are defined by the requirements in sufficient detail to permit us to specify what metadata items would need to describe them in order to audit these properties. This descriptive metadata cannot be separated from them or changed after the record has been created. Several additional requirements define how the data must be maintained and ultimately how it and other metadata can be used when the record is accessed in the future. The metadata created with the record must allow the record to be preserved over time and ensure that it will continue to be usable long after the individuals, computer systems and even information standards under which it was created have ceased to be. The metadata required to ensure that functional requirements are satisfied must be captured by the overall system through which business is conducted, which includes personnel, policy, hardware and software.

We envisage transactions taking place as metadata encapsulated objects, although records might not be physically stored in this manner. When transmitted, the contents of the transaction would be preceded by information identifying the record, the terms for access, the way to open and read it, and the business meaning of the communication much as a train of baggage cars is preceded by an engine. Metadata encapsulated objects may contain other metadata encapsulated objects, because records frequently consist of other records brought together under a new "cover", as when correspondence, reports and results of database projections are forwarded to a management committee for decision. They may also contain the information content of previous records which have been copied into an information system where the creator of this transaction has had the opportunity to modify them; in this case they may contain a citation to a previous record but would not contain the encapsulated version of the previous record.
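
A minimal sketch of such a nested object, assuming a much simplified set of metadata fields, might look as follows in Python. The class name, field names and sample identifiers are illustrative assumptions, not part of the reference model.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List


@dataclass
class MetadataEncapsulatedObject:
    """A record whose metadata precedes, and travels with, its content."""
    record_id: str                 # identifies the record
    terms: Dict[str, str]          # terms for access
    structure: Dict[str, str]      # how to open and read the content
    context: Dict[str, str]        # business meaning of the communication
    content: bytes = b""
    # Prior records forwarded intact under this new "cover".
    enclosed: List["MetadataEncapsulatedObject"] = field(default_factory=list)
    # Identifiers of prior records whose content was copied and possibly
    # modified: these are cited, not enclosed.
    cited: List[str] = field(default_factory=list)

    def walk(self) -> Iterator["MetadataEncapsulatedObject"]:
        """Yield this record, then every record enclosed within it, depth first."""
        yield self
        for inner in self.enclosed:
            yield from inner.walk()


letter = MetadataEncapsulatedObject(
    "rec-001", {"access": "internal"}, {"encoding": "text"},
    {"function": "correspondence"}, b"Dear committee ...")
report = MetadataEncapsulatedObject(
    "rec-002", {"access": "internal"}, {"encoding": "text"},
    {"function": "reporting"}, b"Quarterly results ...")
cover = MetadataEncapsulatedObject(
    "rec-003", {"access": "committee"}, {"encoding": "text"},
    {"function": "decision"}, b"Forwarded for decision.",
    enclosed=[letter, report], cited=["rec-000"])
```

Here the cover record encloses the letter and the report unchanged, while the hypothetical rec-000, whose content was copied and edited, appears only as a citation.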

Ideally the contents of all data objects that we want to communicate would be "interoperable" and encoded in standard formats to give them a degree of software independence (the actual degree depends on how long any given "standard" can be expected to remain a standard, which in archival terms is not very long). In any event, many data objects we create today will not be standard and the metadata with which we label them must flag the dependencies of the data (including their dependency on standards) so that a future review of record headers can locate sources of brittleness and segregate records requiring migration to new software formats before they become unreadable.(11)
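
The kind of header review described above could be sketched as follows, assuming each record header carries a list of declared dependencies. The header layout and the format label "WordProc-5.1" are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def records_needing_migration(headers: Iterable[Dict], at_risk: Set[str]) -> Dict[str, List[str]]:
    """Map each at-risk dependency to the identifiers of the records that declare it."""
    affected: Dict[str, List[str]] = {}
    for header in headers:
        for dependency in header.get("dependencies", []):
            if dependency in at_risk:
                affected.setdefault(dependency, []).append(header["record_id"])
    return affected


# Sample headers: only metadata is inspected, never the record content.
headers = [
    {"record_id": "rec-001", "dependencies": ["SGML", "ASCII"]},
    {"record_id": "rec-002", "dependencies": ["WordProc-5.1"]},
    {"record_id": "rec-003", "dependencies": ["WordProc-5.1", "ASCII"]},
]
```

Calling records_needing_migration(headers, {"WordProc-5.1"}) would segregate rec-002 and rec-003 for migration while leaving rec-001 untouched.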

Our concept of evidence makes it important to know when records were used and how, in what ways they were filed, classified and restricted in the past, and, if they have been destroyed under proper disposition authority, when and by whom that act took place. It is also important to us to know what redacted versions of records were released over time. Transactional data reflecting the history of its use (events in its life subsequent to creation), provides the documentation traditionally associated with archival description, but instead of such data residing only at aggregate levels, it is possible to define electronic records metadata structures that enable us to search for specific records based on information about the instance or concrete business transaction which generated them.

In addition to ensuring that the data we capture is a record, and can serve as evidence, metadata should be defined so that it makes data objects communicated across software and hardware layers (and therefore any communications over a network):

self-documenting
self-authenticating
self-redacting
self-migrating
self-disposing

These properties, while important for simplifying the management of records (especially in an inter-networked environment in which hundreds of millions of records are created daily), can be made to be direct consequences of keeping records if attention is paid to the structuring of the metadata that makes records evidence.

Furthermore, a system for metadata management which has appropriate modularity and content standardization can support formally auditing the business system which generated the information object transactions and the software, hardware, procedures and policies surrounding a system to determine where they contribute, or fail to contribute, to the creation, maintenance and use of evidence. While no system of management can be self-auditing, a communications system built to ensure that appropriate metadata is captured for evidence can support a level of management accountability that it was never previously possible to implement or enforce.

We recognize however that a specification based solely on necessary and sufficient conditions for recordness does not address certain other desirable functionalities of a business communications environment because evidentiality is not the only requirement for such a system. Among the other requirements we have seen being addressed in the effort to develop widely applicable models for network metadata management are:

support for a system of access and use rights management
support for networked information discovery and retrieval
support for registration of intellectual property

Therefore, we have proposed a draft Reference Model for Business Acceptable Communications that attempts to specifically address these additional requirements as part of a dialog that must take place between advocates of mechanisms to support these different fundamental purposes through an overall structure for metadata encapsulated objects.(12)

The Proposed Reference Model

The metadata elements needed to execute the production rules expression of the University of Pittsburgh Functional Requirements for Evidence possess no intrinsic order. Criteria for ordering these elements must be derived from scenarios of their anticipated use within an overall system of recordkeeping.

The initial clustering of these data elements to achieve functional modularity led researchers to organize them in six layers, which they labelled:

  1. Registration
  2. Terms & Conditions
  3. Structure
  4. Context
  5. Content
  6. History of Use

The addition of the requirements noted earlier from other object standardization efforts designed to provide support for a system of access and use rights management, for networked information discovery and retrieval and for registration of intellectual property suggests a need to add substantially to the properties identified as necessary for assurance of evidence, in layers devoted to identification and terms and conditions. In particular it suggests a need for "resolving" agents or services for dealing with terms and conditions of access or use and managing information discovery and retrieval for the aggregate resource of which a given record forms a part. This required us to introduce a resource descriptor element (not present in the December 1994 draft) that points to a compilation of which the record might be a part and through which it would be accessed.

It was evident in the discussion of NIDR at the Spring 1995 CNI meeting (13) that the kinds of relevance ranking and intellectual content representation for information retrieval functions being considered essential to the networked information discovery and retrieval requirement operate at the level of compilations, repositories or services for records of business transactions, and that these publications, services, or repositories will have quite different descriptive data associated with them. Indeed this is consistent with the focus of the Library of Congress Electronic cataloging meeting in October 1994 and the recent announcement by OCLC of its intention to catalog Internet resources.

It was also clear from the discussion of network management issues associated with identification of intellectual property (14) that much more attention needs to be given to naming of objects in this domain than has been necessary for the more limited purposes of unique identification of evidence. The simplifying reality that no change can take place in a record and that any interaction with a record, even looking at it or forwarding it, creates a new record, ignores the social dependence of the concept of original creation at the foundation of intellectual property. This (summer 1995) draft of the Reference Model, therefore, attempts to place the specific and limited requirements for metadata of evidence in the context of the other tasks that have been imputed to such descriptors. It does so by renaming the layer previously labeled "Registration" by the new name "Handle" indicating both the requirement for more robust methods of identification than are necessary to evidence and the need for documentation of the contents, or pointers to documentation of the contents, to facilitate discovery and retrieval. No effort is made here to elaborate on how these additional requirements could or should represent the information required to satisfy their further requirements, since this will best be done by the communities most concerned with that functionality. The clusters were described as "metadata/properties" reflecting the distaste expressed for the term metadata by spokesmen for these communities at the recent CNI meeting.

Rather than pursue these matters any further, this paper explores the metadata content required by the Functional Requirements for Recordkeeping which dictate mandatory and optional data elements within defined data clusters at each of the six layers of the metadata model. In certain areas, particularly regarding structural dependencies of data objects representing non-textual content, we have specified a potentially extensible set of modality specific data elements. This reflects the recognition that we can never completely specify the data that will be required to document the structural dependencies of future data types.

The clusters are part of the reference model and must always occur, but the optional metadata elements may or may not be present based on characteristics of the application. The metadata content directly related to satisfying requirements for evidence is mandatory. Hence evidence, required for the conduct of business and for accountability, is ensured by a "Metadata Encapsulated Object" conforming to the reference model for "Business Acceptable Communications". The metadata content which contributes to recordkeeping, or management of records, but is not essential to evidence, is optional. Metadata content useful for specific domains or business functions may be defined by those domains as mandatory for business in that domain or optional within the domain. All such metadata would be optional for anyone outside the domain. The layers and clusters of the reference model are shown in Table 2, below.

Table 2: Outline of the Reference Model for Business Acceptable Communications, showing layers and data clusters

Handle Layer
	Registration Metadata/Properties
	Record Identifier
	Information Discovery and Retrieval
Terms & Conditions Layer
	Rights Status Metadata
	Access Metadata
	Use Metadata
	Retention Metadata
Structural Layer
	File Identification
	File Encoding Metadata
	File Rendering Metadata
	Record Rendering Metadata
	Content Structure Metadata
	Source Metadata
Contextual Layer
	Transaction Context
	Responsibility
	Business Function
Content Layer
	Content-Description
Use History Layer
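
Since the clusters must always occur, conformance of an object to the outline in Table 2 can be checked mechanically. The following Python sketch assumes that each cluster belongs to the layer heading that precedes it in the table and that an object is represented as a dictionary of layers, each holding a dictionary of clusters; that representation is an assumption made for illustration.

```python
from typing import Dict, List

# Layers and clusters as listed in Table 2.
REFERENCE_MODEL: Dict[str, List[str]] = {
    "Handle": ["Registration Metadata/Properties", "Record Identifier",
               "Information Discovery and Retrieval"],
    "Terms & Conditions": ["Rights Status Metadata", "Access Metadata",
                           "Use Metadata", "Retention Metadata"],
    "Structural": ["File Identification", "File Encoding Metadata",
                   "File Rendering Metadata", "Record Rendering Metadata",
                   "Content Structure Metadata", "Source Metadata"],
    "Contextual": ["Transaction Context", "Responsibility", "Business Function"],
    "Content": ["Content-Description"],
    "Use History": [],
}


def missing_clusters(meo: Dict[str, Dict]) -> List[str]:
    """List every layer or cluster of the reference model absent from the object."""
    missing: List[str] = []
    for layer, clusters in REFERENCE_MODEL.items():
        if layer not in meo:
            missing.append(layer)
            continue
        for cluster in clusters:
            if cluster not in meo[layer]:
                missing.append(f"{layer} / {cluster}")
    return missing
```

A cluster may be present but empty where all of its elements are optional for the application; the check concerns only the presence of the cluster itself.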

The purpose of specifying metadata as part of this model is to ensure recordness. When the metadata needed by a specialized domain has an essential application related purpose but is not required for recordness, it is preferable to satisfy this application purpose by definition of a standard interchange format. The interchange standard can be cited in the metadata for Business Acceptable Communications and the data content can then be opened by knowledge of the requirements and structures of the standard without further elaboration. This has the dual advantage of efficiency of definition and ease of migratability as all records corresponding to a specified protocol can be re-presented in a new standard if the old format is superseded.

Implementation Issues

How can the Reference Model be implemented?

We imagine two possible scenarios for the long term: in the first, each organization implements its own methods for capturing metadata and encapsulating objects, while in the second the requirement is imposed on software developers and networks as a consequence of standards adopted by their clients. In either case, any implementation must acknowledge that the level of analysis and documentation at which computer mediated communications are evidence is that of the individual business transaction. This means that we are not concerned with capturing metadata at higher levels, such as that of the resource or lower levels such as that of a single document, data item/file, or computer system transaction. It also means that the data and metadata we capture must always be related back to a specific transaction whether corporate or personal. It also requires that we be able to incorporate multiple physical files within a single record, and any number of prior records within a new record.

Practically, one might view the implementation of metadata recording and management as a continuum. At one end, all of the metadata is encapsulated, stored and transported with the record. The record in this case is physically self explanatory. The problem with this is that a high amount of overhead is associated with every action taken with a record.

The opposite approach would be to store none of the metadata with the record. The metadata would be stored in a data base on a kind of archival server and each record would contain a pointer to its metadata. While this approach avoids the overhead associated with communicating records, it requires more sophisticated management and is susceptible to problems associated with corrupt pointers decoupling a record from its metadata.

The actual implementation adopted by an organization will probably lie somewhere between these two extremes. In the inter-organizational exchange of records, the first model must be used. For intra-organizational use it is likely that an intermediate approach will be used. Some metadata will be encapsulated and carried with the record, particularly metadata that will be used by subsequent processes or procedures. Other metadata, such as citations of business rules, standards and legal authority, will be stored in a data base and referenced by pointers encapsulated with the record.
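
The intermediate approach can be sketched as follows, assuming a record that carries some metadata directly and refers to the rest by key into an archival data base, modelled here as a plain dictionary. All names and sample values are hypothetical. The sketch also shows the two practical consequences noted above: a corrupt pointer must be detected, and pointers must be resolved into encapsulated metadata before a record leaves the organization.

```python
from typing import Dict


class DanglingPointer(LookupError):
    """A record points at metadata that the archival store does not hold."""


def resolve_metadata(record: Dict, archival_store: Dict[str, Dict]) -> Dict:
    """Merge encapsulated metadata with the metadata reached through pointers."""
    resolved = dict(record["encapsulated"])
    for name, pointer in record["pointers"].items():
        if pointer not in archival_store:
            raise DanglingPointer(f"{record['record_id']}: {name} -> {pointer}")
        resolved[name] = archival_store[pointer]
    return resolved


def encapsulate_fully(record: Dict, archival_store: Dict[str, Dict]) -> Dict:
    """Produce the self-explanatory form required for inter-organizational exchange."""
    return {
        "record_id": record["record_id"],
        "encapsulated": resolve_metadata(record, archival_store),
        "pointers": {},
        "content": record["content"],
    }


store = {"rule-17": {"citation": "Purchasing procedure, section 4"}}
record = {
    "record_id": "rec-010",
    "encapsulated": {"originator": "clerk-7"},
    "pointers": {"business_rule": "rule-17"},
    "content": "Order approved.",
}
```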

Very few existing information systems are designed to execute business rules or document business processes. Therefore they will typically create a new computer record for a software transaction which involves no business transaction but changes the data in the system (such as the background saving of a file on which a user is working), but create no record of common business transactions which do not change data in the system but nevertheless require evidence (such as querying a database for decision support). Implementers will need to impose the concept of business transactions, rather than that of systems transactions, on their environment. In addition, they will need to interface with business process models to capture appropriate identifying data.

It would be possible to design application software that recognized business transaction boundaries, but the differences between organizations would likely make implementing such software complex and maintaining its knowledge of local business processes costly. Nevertheless, certain parameterized features of application systems can already be employed to ensure the satisfaction of some of the functional requirements for recordkeeping. For example, word processing systems can support corporate record creating requirements if the users of such systems exclusively employ style sheets defined in such a way as to distinguish between transactions based on their process location and business purposes. Geographic information systems often have reporting features that allow the user to create output files of all the relevant layers of data incorporated into a query response.

At least in the short term, however, the need to create electronic records with metadata conforming to the reference model will require systems implementers (possibly in cooperation with users) to construct traps outside of applications software in which they can capture the metadata required for evidential transactions. Assuming no changes were made in applications systems, implementers could capture some of the requisite data items in the user interface layer. For example, the information captured when users sign on to the system could be enhanced so that authorizations, logical business location, and types of transactions allowed to an individual are brought into system memory for assignment to transactions as needed. Implementers could then provide icons representing the business tasks in which a user may engage (based on process data models and business rules of the organization) rather than icons representing software applications.

Indeed the user interface could easily be designed so that users never open software applications directly, but rather they engage special 'clients' which open and configure the application software in a way that utilizes its style sheets, macros, self-documenting features, for the particular business function in which the user is engaged. This way only clients representing specific business processes can be admitted and the rules governing such uses can be imposed on the system. These clients could then provide the necessary metadata to identify the business transaction when a record of it is created.
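
A rough sketch of these user interface interventions, assuming a session object populated at sign-on and a wrapper that launches an application on behalf of a named business task, is given below. The class, function and field names are hypothetical, and the application is stood in for by any callable returning content.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Dict, FrozenSet


@dataclass(frozen=True)
class Session:
    """Information captured once, at sign-on, and reused for every transaction."""
    user: str
    business_location: str
    allowed_tasks: FrozenSet[str]


def run_business_task(session: Session, task: str, application: Callable[[], str]) -> Dict:
    """Open the application for one business task and describe the resulting record."""
    if task not in session.allowed_tasks:
        raise PermissionError(f"{session.user} is not authorized for task {task!r}")
    content = application()  # the user never opens the application directly
    return {
        "context": {
            "originator": session.user,
            "business_location": session.business_location,
            "business_task": task,
            "captured_at": datetime.now(timezone.utc).isoformat(),
        },
        "content": content,
    }


session = Session("jdoe", "purchasing", frozenset({"approve-order"}))
record = run_business_task(session, "approve-order", lambda: "Order 1 approved.")
```

The contextual metadata is supplied by the client from the session, so the application software itself needs no modification.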

Alternatively, or in conjunction with user interface interventions, implementers could develop an "evidence" service in the Application Platform Interface to capture transactions addressed in particular ways and assign them metadata attributes required to ensure their authenticity and survival. Such a service could be customized with the rules of a particular business so as to identify transactions of specific types and adhere to the appropriate retention periods, access and use rights, and filing rules.

Finally, information systems staff could identify components in the systems architecture, from storage devices serving as corporate file rooms to telecommunication switches linking to other LANs, WANs or systems, which could assign yet other metadata attributes to records when they are communicated. Thus records filed in certain places and under particular headings would be given metadata attributes upon arrival at the filing server application. Records deemed to be lacking appropriate metadata to leave an organization's boundaries, or even to pass outside the LAN serving one work group, could be assigned those attributes or returned to sender to provide the necessary descriptors. In conjunction with corporate policy and procedure, individuals could participate in completing document routing and description templates for all transactions, or be required to default to pre-set templates for a series of identical transactions.
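
The boundary check described above might be sketched as follows. The list of attributes required to leave the organization, the filing headings and the default values are all hypothetical; in practice they would come from corporate policy.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical policy: attributes a record must carry to cross the boundary.
REQUIRED_TO_LEAVE = ("record_id", "access_terms", "retention")

# Hypothetical attributes assigned according to the heading a record is filed under.
DEFAULTS_BY_HEADING: Dict[str, Dict[str, str]] = {
    "contracts": {"retention": "permanent", "access_terms": "parties only"},
}


def route_outbound(record: Dict) -> Tuple[str, Union[Dict, List[str]]]:
    """Assign attributes implied by the filing heading, then release or return the record."""
    metadata = dict(record["metadata"])
    for key, value in DEFAULTS_BY_HEADING.get(metadata.get("filing_heading"), {}).items():
        metadata.setdefault(key, value)  # never overwrite what the sender supplied
    missing = [key for key in REQUIRED_TO_LEAVE if not metadata.get(key)]
    if missing:
        return ("returned_to_sender", missing)
    return ("released", {**record, "metadata": metadata})
```

A record filed under a known heading acquires the missing descriptors automatically, while any other incomplete record goes back to its sender with a list of what is lacking.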

One of the questions that must be answered by research is whether some metadata elements are easier to capture in certain layers of the architecture than others. We believe that certain of the metadata required for recordness, specifically that pertaining to compliant organizations and accountable systems, will best be documented by means other than transaction level metadata capture because of the inherent inefficiency of capture of such systemic proofs at a transaction level and the difficulty of a system ascertaining the state of organizational compliance or of its own logical correctness.

In sum, it appears that through a combination of policy, implementation and design, and standards, an organization can ensure that only "business acceptable communications" are generated from its information systems, maintained in its recordkeeping systems, and made available through its information retrieval systems.

Implementers will recognize that when a user requests a record, a copy of that record is passed to the information retrieval sub-system, but if the user opens the record contents under the control of another application, the contents are incorporated within the application in which he or she is working and will become part of the contents of a new transaction. If the user intends only to append or forward a record, this does not involve opening the record and may, in some environments, be accomplished by pointing to it while in others it will involve incorporating an encapsulated version of the record within the current transaction.

When users generate a communication in this environment, a "Business Acceptable Communication", encapsulated by metadata necessary to ensure its integrity and longevity, would presumably be split off from the information system stream and sent to a recordkeeping system or warehouse where it would be kept intact. Another version of the transaction would normally remain within the application environment where it would be available for further manipulation, update and editing, or would do the jobs of updating databases, launching procedures or generating reports in that environment. From the perspective of the business, all data in information systems can be treated as a convenience copy, to be kept as long as required for on-going business purposes and to be altered as desired to increase efficiency.

When needed, records from recordkeeping systems may be copied to information systems which require their content, but the record itself will never be deleted from, or changed within, the recordkeeping system except with specific records disposition authority. Recordkeeping systems will store and provide access to metadata encapsulated objects. Sometimes the purpose of such access will be to make use of the data content of records in subsequent business transactions which create their own records. These transactions will take place through application systems, which like most information systems, are not designed to make or keep records.
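The distinction between the record held intact and the convenience copies handed to information systems can be sketched as follows. The class and method names are hypothetical, and an in-memory dictionary stands in for whatever storage a real recordkeeping system would use.

```python
import copy


class DispositionError(Exception):
    """Raised when a change to the store lacks the required authority."""


class RecordkeepingStore:
    """Holds records intact; callers only ever receive copies."""

    def __init__(self):
        self._records = {}

    def __contains__(self, record_id):
        return record_id in self._records

    def file(self, record_id, record):
        if record_id in self._records:
            # A filed record is never overwritten in place.
            raise DispositionError(f"{record_id} is already filed")
        self._records[record_id] = copy.deepcopy(record)

    def copy_out(self, record_id):
        # Hand out a copy so later edits cannot reach the stored record.
        return copy.deepcopy(self._records[record_id])

    def dispose(self, record_id, authority=None):
        if not authority:
            raise DispositionError("disposal requires a records disposition authority")
        del self._records[record_id]
```

An application that alters the copy it receives changes only its own working data; the stored record is removed solely through `dispose`, and only when an authority is cited.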

Sometimes the purpose of access is simply to view the records outside of the business purposes of the creating organization. Traditionally such reference uses of archives have not created new records, although logically they are the record of the use of the archives which is itself a function of the organization. In an evidential environment, viewing a record in conjunction with a business transaction creates a new record for the recordkeeping system and leaves a transaction trail in the original record.

There is no specific computing model that must be employed in the maintenance of recordkeeping systems although it may seem that the discussion of communicated transactions to this point has used the terminology of object orientation. Once these transactions are communicated (typically by a serial process but always in such a way as to produce a serial record on the receiving end, and hence as an encapsulated object), they can be treated as if the metadata was structured database information in a standard relational, hierarchical or flat file database management system. A simple method for doing this securely would be to store a hash of the contents in the metadata record and a hash of the metadata record with the contents.
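The cross-hashing method suggested above can be sketched in a few lines. The choice of SHA-256 and of a canonical JSON serialization for the metadata are assumptions of this example, not part of the proposal, and the function names are hypothetical. The content hash is computed over the content alone, so the two hashes do not depend on each other circularly.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def canonical(metadata: dict) -> bytes:
    """Serialize metadata deterministically so its hash is reproducible."""
    return json.dumps(metadata, sort_keys=True, separators=(",", ":")).encode("utf-8")


def seal(content: bytes, metadata: dict):
    """Split a record into a metadata record and a content envelope.

    The metadata record carries a hash of the content; the content
    envelope carries a hash of the finished metadata record.
    """
    meta_record = dict(metadata, content_sha256=sha256_hex(content))
    content_envelope = {
        "content": content,
        "metadata_sha256": sha256_hex(canonical(meta_record)),
    }
    return meta_record, content_envelope


def verify(meta_record: dict, content_envelope: dict) -> bool:
    """Check that neither half has changed since sealing."""
    content_ok = meta_record.get("content_sha256") == sha256_hex(content_envelope["content"])
    meta_ok = content_envelope.get("metadata_sha256") == sha256_hex(canonical(meta_record))
    return content_ok and meta_ok
```

With this arrangement the metadata can be held in an ordinary database table while the contents are stored elsewhere, and a change to either half alone is detected by `verify`. Hashes of this kind do not by themselves protect against a party able to rewrite both halves consistently.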

Logically, metadata content must either follow an external standard or contain its own declarations (e.g., meta-meta-data). It would be far more efficient for society at large if, instead of requiring individual organizations to implement systems in ways that supported the requirements for evidence, a standard for communications could be adopted that placed the burden for creating metadata encapsulated objects on the application software and network software developers.

The definition of a standard for Business Acceptable Communications presumes the existence of software and services that can use the metadata which must be associated with an object. Specific types of services, for example, are envisioned to follow up on address information contained in the Terms and Conditions metadata layer to translate it into concrete prices, permissions, and data views. The presumption is that Terms and Conditions metadata will be expressed in abstract categorical terms, not in concrete terms, so that it can be processed correctly as the situation variables (inflation, changes in permissions based on elapse of time since the event, re-engineered business processes, etc.) change. The model for implementing control based on Terms and Conditions is that a "resolver" will be put in place by the owner/creator of a record that is designed to operate against the categorically expressed terms and conditions data in a dynamic manner. This allows the terms and conditions to be calculated for the moment, based on the user, and sensitive to the conditions of use. It is envisioned that these applications will be maintained by those interested in restricting rights and their functionality can be ensured in part by establishing a mechanism that allows users access to the records if no restrictive permissions manager is operating.
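A resolver operating against categorically expressed terms might be sketched as below. The category names ("open", "closed-30-years"), the role "creator" and all type and function names are hypothetical; the only behavior taken from the model itself is that terms are resolved at the moment of use and that access is allowed when no resolver is operating.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Terms:
    """Terms and Conditions held as an abstract category, not a concrete rule."""
    access_category: str


@dataclass(frozen=True)
class Request:
    user_role: str
    on: date


def years_between(start: date, end: date) -> int:
    """Whole years elapsed from start to end."""
    return end.year - start.year - ((end.month, end.day) < (start.month, start.day))


def owner_resolver(terms: Terms, created: date, request: Request) -> bool:
    """Example resolver maintained by the owner of the records."""
    if terms.access_category == "open":
        return True
    if terms.access_category == "closed-30-years":
        # The creator always has access; others gain it as time elapses.
        return request.user_role == "creator" or years_between(created, request.on) >= 30
    return False


def may_access(terms: Terms, created: date, request: Request, resolver=None) -> bool:
    if resolver is None:
        # No restrictive permissions manager is operating: allow access.
        return True
    return resolver(terms, created, request)
```

Because the record stores only the category, the same metadata yields a refusal for a member of the public five years after creation and a grant thirty years after, without the record ever being rewritten.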

Conclusions:

The Reference Model for Business Acceptable Communications discussed here and proposed in the accompanying formal presentation, builds upon and extends work underway in standards committees in many areas. It attempts to provide a generic structure and theoretical grounding for work items proposing metadata encapsulated objects as a tactic for interchange. While it will doubtless be refined before a fully acceptable reference model is adopted, it is our hope that the formulation of this model will place the functional requirements for evidence at the heart of any discussion of what makes a business communication acceptable.

The Reference Model acknowledges, but does not solve, some fundamental problems in the distributed network environment. For example, a major concern is how the identifier uniquely assigned by one domain is guaranteed to be unique when the object is incorporated into a universe in which identifiers assigned by other domains are present. Obviously uniqueness can be ensured by combining a unique identifier within a domain with a unique identifier for the domain. The problematic aspect of this is that domain identifiers need to be truly unique to a person or organization but we want to define a system in which the domain identifier does not have to carry too much intelligence and yet can be meaningfully related to its successor and precursor identifiers. Also, it must be possible to issue domain identifiers without serious overhead. Billions of unique business transactions will flow through worldwide communications systems within and between organizations and between individuals and/or computers, daily. It must be possible to uniquely identify all these. Mechanisms for unique identification of computing systems and sources of communications are being worked out for such open domains as the Internet (by the IETF) but also need to be developed within specific corporate communications contexts.
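The combination of a domain identifier with an identifier unique within the domain can be sketched as follows. The names RecordId and Domain are hypothetical, and the sketch assumes that domain identifiers have already been issued uniquely, which is precisely the open problem noted above. The link to a predecessor domain is kept beside the identifier rather than encoded in it, so the identifier itself carries little intelligence.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional


@dataclass(frozen=True)
class RecordId:
    """Globally unique as long as domain_id is unique among domains."""
    domain_id: str
    local_id: int

    def __str__(self) -> str:
        return f"{self.domain_id}:{self.local_id}"


class Domain:
    """Issues local identifiers and records which domain it succeeded."""

    def __init__(self, domain_id: str, predecessor: Optional[str] = None):
        self.domain_id = domain_id
        self.predecessor = predecessor  # relation held outside the identifier
        self._counter = count(1)

    def new_record_id(self) -> RecordId:
        return RecordId(self.domain_id, next(self._counter))
```

Two domains may both issue local identifier 1 without conflict, because the pair of domain and local identifier is what identifies the record.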

It may also be necessary to search for records that satisfy criteria based on their content, even though this is not essentially an archival requirement. The Reference Model is designed to hold metadata that can satisfy such requirements but it is not currently populated by metadata designed to support networked information discovery and retrieval (NIDR). Recent work in this area by the Coalition for Networked Information and by the U.S. library community may define structures within this cluster, although the problem of defining what records are "about", rather than what they are "of", has been a vexing one since the advent of archives. The volume of records that are created has always defied cataloging individual records, and the content description of records, which are not created to be about their content but rather as a consequence of business transactions, tends in any case to be either misleading or inadequate. Substantial practical research will continue to be required to determine how best to provide access to records of specific kinds or records documenting particular types of transactions.

Another concern is how, as a practical matter, best to monitor metadata values in order to make the necessary software migrations at appropriate times in the life of records. Not only do we need to make sure to migrate the records to new structures before the old ones are no longer supported, we need to make good decisions about logical mappings in order not to introduce too much noise with every migration and ultimately lose the message in digital copying as surely as we did with multi-generational copying of analog messages. Needless to say, some people also worry that these software migrations, if they continue to need to be done as often as once a decade or more, will become too costly to support and that as a consequence some records of value will be abandoned. Within the environment in which recordkeeping takes place, stringent approaches to configuration management will be essential to ensure that record documentation retains critical usable metadata.

At the same time, it is noteworthy that the proposed approach to archiving and to maintenance of business acceptable communications does not require us to include information about physical formats and media within the record metadata. Rather the environment in which records are kept will need to be one in which managers move data from one medium to another as required to assure backup and preservation of the data. It is presumed that media that are currently supported will always be used and that data transfer to current media will take place in the normal course of operations. Documentation of the data processing center backup and recovery functions is not part of the model because the model presumes that day-to-day data management will be responsible.

Footnotes/Citations

1) The Functional Requirements for Recordkeeping project has compiled a database of "warrant" for the requirements we have defined. The most up-to-date version of the requirements, specifications, production rules, metadata standards, literary warrant and research papers on the variables in electronic recordkeeping in organizations are maintained on the project WWW server at:

http://www2.lis.pitt.edu/~nhprc

Examples of the kinds of sources from which "literary warrant" has been drawn include:
  • Code of Federal Regulations, 36 CFR PART 1234 -- Electronic Records Management. Subpart C Standards for the Creation, Use, Preservation, and Disposition of Electronic Records
  • Electronic Industry Data Exchange. ASC 12 Convention : Version 3 : Electronic Industry Data Guidelines. Washington Publishing Co., 1994
  • Federal Rules of Evidence. 1990
  • Guttman, B. Computer security considerations in Federal procurements. National Institute of Standards and Technology, NIST Pub 800-4
  • Institute of Internal Auditors Research Foundation. Systems Auditability and Control Report Researched by Price Waterhouse, 1991
  • ISO 9001 Quality systems - Model for quality assurance in design/development, production, installation and servicing, 1987
  • FIRMR : federal information resources management regulation. Washington, DC : U.S. General Services Administration, Office of Information Resources Management, 1990
  • McCormick on Evidence. 4th ed. by John William Strong, general editor. (St.Paul, Minn: West Pub. Co, 1992)
  • Miller, Larry P., GAAS guide: a comprehensive restatement of all current promulgated generally accepted accounting principles. San Diego : Harcourt Brace Professional Pub.; 1994

2) For an example, see: IEEE Mass Storage Systems Standards Technical Committee Metadata Project, Second Meeting on Metadata for the Administration and Access of Stored Information, Austin, Texas, February 17-18, 1994. Documents discussed at this meeting included:

  • "The Intelligent Archive" (UCRL-TB-115079-6 Lawrence Livermore Laboratory, Carol Hunter Project Manager)
  • "Whitepaper on Data Management", Robyne Sumpter, Lawrence Livermore Laboratory Feb.10, 1994
  • "A Metadata Capability Supporting the Hierarchical Storage and Access of Large Abstract Data Entities", J.C.Almond and Rekha Singhal, University of Texas CHPC

3) For example, see:

  • Clifford Lynch, "The Integrity of Digital Information: Mechanics and Definitional Issues", Journal of the American Society for Information Science, vol.45#10, December 1994 p.737-744;
  • Peter Graham, "Intellectual Preservation in the Electronic Environment", Proceedings, Library Collections and Technical Services 1992 pp.18-32 (Chicago, ALA, 1992);
  • Henry Perritt, "Public Information in the National Information Infrastructure", Report to the Regulatory Information Service Center, General Services Administration and to the Administrator, Office of Information and Regulatory Affairs, Office of Management and Budget, 5/20/94
  • Other activities, currently underway, to which the reference model seems relevant are the Research Libraries Group and Commission on Preservation and Access Joint Task Force on Archiving Digital Documents, the Coalition for Networked Information sponsored working group on Networked Information Discovery and Retrieval, and the National Institute of Standards proposed Federal Information Processing Standard for "Record Description Records".

4) New York State Archives & Records Administration, Guidelines for the Legal Acceptance of Public Records in an Emerging Electronic Environment (Albany, Dept. of Education, 1994) 35pp.

5) David Bearman, Electronic Evidence: Strategies for Managing Records in Contemporary Organizations (Pittsburgh, Archives & Museum Informatics, 1994)

6) NHPRC grant #93-030, "Variables in the Satisfaction of Requirements for Electronic Records Management"

7) David Bearman and Ken Sochats, "Formalizing Functional Requirements for Recordkeeping" unpublished draft paper included in University of Pittsburgh Recordkeeping Functional Requirements Project: Reports and Working Papers (LIS055/LS94001) September 1994

8) David Bearman, Functional Requirements for Recordkeeping: Metadata Specification (Unpublished Draft, 2/21/94)

9) References to:

  • Datastream for Folder Interchange (ISO 161/17-WG6 NWI)
  • Electronic Document Interchange (EDI) standards, including EDIFACT
  • ATM protocols
  • Spatial Data Interchange Format (SDIF) and DIGEST

10) David Bearman, "Electronic Records Management Guidelines: A Manual for Development and Implementation" in United Nations, Administrative Coordinating Committee for Information Systems, Management of Electronic Records: Issues and Guidelines (New York, UN, 1990) reprinted in Electronic Evidence, op. cit., fn. 5

11) There is a consensus that "preservation" in the electronic environment means refreshing. For an early, but still sound, articulation of the reasons, see: Margaret Hedstrom, "Optical Disks: Are Archivists Repeating the Mistakes of the Past?", Archives & Museum Informatics Newsletter, vol.2 (1988) p.52; also her "Electronic Archives: Integrity and Access in the Networked Environment" in Stephanie Kenna and Seamus Ross, eds., Networking in the Humanities (London, Bowker/Saur, 1995) p.77-95

12) We believe this model takes into account requirements such as those implied by the plans of the German Government for its move from Bonn to Berlin over the next decade. In that planning process it has become obvious that much of the communication between governmental departments will take place electronically between individuals with little if any face to face contact who will require secure and authenticated communications and the ability to make and keep records. In defining an architecture to support these requirements, the PoliTeam, established for this purpose, defined an architecture that could take advantage of the functional requirements for recordkeeping, but they did not identify those requirements. In reforms in the Dutch civil service over the past several years, earlier opening of government records was one objective, and the studies undertaken to support this goal revealed a need to begin to plan for electronic communications systems. In their reforms, the Dutch government has begun to take advantage of the functional requirements for recordkeeping and is encountering many of the same issues of metadata management being addressed by this paper. The Canadian government has been defining "Guidelines on the Management of Electronic Records in the Electronic Work Environment" as a component of the "Electronic Work Environment (EWE) Vision" being promulgated by the Canadian Treasury Board. Popularizations of the implications of these activities have been published recently by Terry Cook in "It's 10 O'Clock, Do You Know Where Your Data Are", Technology Review, January 1995; also his "Electronic Records, Paper Minds: The Revolution in Information Management and Archives in the Post-Custodial and Post-Modernist Era", Archives & Manuscripts, vol.22#2, p.300-328

13) Clifford Lynch presentation (unpublished) at the Coalition for Networked Information meeting, Spring 1995

14) Bill Arms presentation (unpublished) at the Coalition for Networked Information meeting, Spring 1995

APPENDIX 1. FUNCTIONAL REQUIREMENTS FOR EVIDENCE IN RECORDKEEPING
APPENDIX 2. PRODUCTION RULE REPRESENTATION OF REQUIREMENTS FOR EVIDENCE
APPENDIX 3. REFERENCE MODEL FOR BUSINESS ACCEPTABLE COMMUNICATIONS