TRAC Internal Review - January 2013

A fairly consistent set of standards has emerged from the digital repository and archives fields, establishing best practices for a "trusted" digital repository. These standards are used to audit a digital repository and/or give it a seal of approval, deeming it a repository in which the integrity and authenticity of the digital files within can be trusted. As the InformaCam system moves from a research prototype into its next phase of development, it is useful to evaluate it against these practices (as a first step toward an internal audit), as a means of identifying strengths, weaknesses, and areas for further development of the server side of the InformaCam system.

It should be noted that the concept of "trusted" applied in these standards overlaps considerably with the risks and assessments that must also be considered in the digital security field, to which InformaCam more aptly belongs, but the two are not equivalent. In certain areas, InformaCam's interpretation of the guidelines and practices will likely be far more exacting than what other institutions have implemented, and will need alternative auditing criteria. It should also be noted that the InformaCam system is still only a research implementation, so many of the standards are not immediately applicable. Instead, this audit serves more as a means to define future guideposts for development.

TRAC, a set of criteria and a checklist that became the ISO 16363 standard, was chosen as the standard against which to evaluate InformaCam in this document because of its broader applicability and the possibility of a corresponding external audit.

Jump to Final Conclusions at the end of this document to read the findings of this internal audit.

Background

There are a few initiatives/practices currently in place:
  • ISO 16363:2012 Space data and information transfer systems -- Audit and certification of trustworthy digital repositories
    This standard formalizes a cross-disciplinary, cross-institutional effort to define the characteristics of a trustworthy digital repository, referred to as TRAC (Trustworthy Repositories Audit and Certification).
  • Data Seal of Approval
    This is a set of 16 guidelines, created by Data Archiving and Networked Services (DANS), used to establish trusted data management for scientific research; it is overseen by an international board. A seal is awarded to a digital repository after the board reviews and approves the 16-point assessment.
  • DRAMBORA
    This is a toolkit, put together by the U.K.'s Digital Curation Centre and DigitalPreservationEurope (DPE), for conducting an internal self-audit of the integrity of a digital repository.

These practices/initiatives consider OAIS, developed by NASA, as the starting framework for the design and implementation of a digital repository (i.e., the organization of people, as well as the software applications used to support storage practices, and the terminology that is applied). However, the initiatives/standards listed above attempt to go beyond abstraction, and define actual practices that should be in place to be a "trusted" digital repository. There are a number of research-based digital repositories that implement these practices.

TRAC Overview

TRAC defines a trusted digital repository as follows:

"A trusted, 'long-term digital repository is a complex and interrelated system' (nestor 2006). However, more than just the 'digital preservation system' drives the management of the digital materials. In determining trustworthiness, one must look at the entire system in which the digital information is managed... A trusted digital repository will understand threats to and risks within its systems... these potential threats include media failure, hardware failure, software failure, communication errors, failure of network services, media and hardware obsolescence, software obsolescence, operator error, natural disaster, external attack, internal attack, economic failure, and organizational failure."

The standards for certification of trusted digital repositories are broken into the following areas:
  • Organizational Infrastructure
    This portion of the criteria covers things such as financial stability, liabilities and legalities, ownership and governance of the digital repository system, staffing, and so on.
  • Digital Object Management
    This portion of the criteria covers user interactions, workflows, and data architecture, such as how and when a file is accepted into the system, what metadata is required at that point, how access is maintained, and who may have access.
  • Technologies, Technical Infrastructure and Security
    This section identifies criteria for technologies determined to be acceptable for trusted digital repositories, and the security standards that should be in place.


Organizational Infrastructure

Criteria | Met? | Current Application | Proposed Next Steps | Next Phase Priority
A1.1. Repository has a mission statement that reflects a commitment to the long-term retention of, management of, and access to digital information. | ? | In statement of work? | Guardian Project should require each client to define a mission before an installation if Guardian Project maintains some level of responsibility | Medium
A1.2. Repository has an appropriate, formal succession plan, contingency plans, and/or escrow arrangements in place in case the repository ceases to operate or the governing or funding institution substantially changes its scope. | No | Not met | Option 1: Push responsibility onto Witness; Option 2: Roll into "social business plan" | Urgent
A2.1. Repository has identified and established the duties that it needs to perform and has appointed staff with adequate skills and experience to fulfill these duties. | No | The focus (appropriately) has been on development of the software, not on servicing it | Define the service, then define a clear breakdown of responsibilities between organizations | High
A2.2. Repository has the appropriate number of staff to support all functions and services. | No | See A2.1 | See A2.1 | Medium
A2.3. Repository has an active professional development program in place that provides staff with skills and expertise development opportunities. | Yes | Guardian Project develops on an as-needed consultant basis; skills are pulled from various areas | Current Guardian Project developers are familiarizing themselves with the code base | Low
A3.1. Repository has defined its designated community(ies) and associated knowledge base(s) and has publicly accessible definitions and policies in place to dictate how its preservation service requirements will be met. | Partial | Tutorials and how-to steps have been written for the software; a preservation-specific knowledge base has not been completed | Finish documentation to include preservation definitions | Low
A3.2. Repository has procedures and policies in place, and mechanisms for their review, update, and development as the repository grows and as technology and community practice evolve. | No | Not met | Make part of SLA development; include development of technology maintenance | Low
A3.3. Repository maintains written policies that specify the nature of any legal permissions required to preserve digital content over time, and repository can demonstrate that these permissions have been acquired when needed. | No | Not met | Create terms of agreement that users must accept before installing and using the InformaCam system | High
A3.4. Repository is committed to formal, periodic review and assessment to ensure responsiveness to technological developments and evolving requirements. | Yes | Steps to conduct an internal audit; discussions and acknowledgement of the need for an external audit; phase reviews | Direct contact with users, with requirements gathering conducted at the beginning of the next phase | Medium
A3.5. Repository has policies and procedures to ensure that feedback from producers and users is sought and addressed over time. | Yes | See A3.4 | See A3.4 | Medium
A3.6. Repository has a documented history of the changes to its operations, procedures, software, and hardware that, where appropriate, is linked to relevant preservation strategies and describes potential effects on preserving digital content. | No | Not met | Make part of SLA development | Medium
A3.7. Repository commits to transparency and accountability in all actions supporting the operation and management of the repository, especially those that affect the preservation of digital content over time. | Partial | As an open-source project and a grant-reporting project, transparency has been maintained | Transparency practices need to be formally documented in relation to repository operations | Low
A3.8 Repository commits to defining, collecting, tracking, and providing, on demand, its information integrity measurements. | No | Not met | Better define how the system will log actors and actions taken on the storage application; define how the system will routinely conduct checksum and bit corrosion checks; better define how the server side will associate user identity with all actions | High
A3.9 Repository commits to a regular schedule of self-assessment and certification and, if certified, commits to notifying certifying bodies of operational changes that will change or nullify its certification status. | No | With the project in the research stage, this is not currently a priority | Initial steps are being taken now | Low
A4.1. Repository has short- and long-term business planning processes in place to sustain the repository over time. | No | Not met | Discussions have begun, but clear funding needs, goals, and action plans are not yet established | Urgent
A4.2. Repository has in place processes to review and adjust business plans at least annually. | No | Not met | Make part of SLA development work | Low
A4.3. Repository's financial practices and procedures are transparent, compliant with relevant accounting standards and practices, and audited by third parties in accordance with territorial legal requirements. | Yes | All funding requirements are being met with the funding agencies | N/A | Met
A4.4. Repository has ongoing commitment to analyze and report on risk, benefit, investment, and expenditure (including assets, licenses, and liabilities). | Yes | See A4.3 | See A4.3 |
A4.5. Repository commits to monitoring for and bridging gaps in funding. | No | While funding requirements for software development have been met, the funding to provide the service of a repository has not yet been fully evaluated or covered | Make part of SLA development; seek additional funders and partners (e.g., E.U., U.N., continue with IBA, etc.) or adopt a pay-for-service model instead of grant funding | Urgent
A5.1 If repository manages, preserves, and/or provides access to digital materials on behalf of another organization, it has and maintains appropriate contracts or deposit agreements. | No | Not met | Make part of SLA development | Urgent
A5.2 Repository contracts or deposit agreements must specify and transfer all necessary preservation rights, and those rights transferred must be documented. | No | Not met | Make part of SLA development | Urgent
A5.3 Repository has specified all appropriate aspects of acquisition, maintenance, access, and withdrawal in written agreements with depositors and other relevant parties. | No | Not met | Make part of SLA development; delineate responsible parties | Medium
A5.4 Repository tracks and manages intellectual property rights and restrictions on use of repository content as required by deposit agreement, contract, or license. | No | Not met | Make part of SLA development; add appropriate data requirements to the metadata schema; delineate responsible parties | High
A5.5 If repository ingests digital content with unclear ownership/rights, policies are in place to address liability and challenges to those rights. | No | Not met | See A5.4 | High

Organizational Infrastructure Conclusions

While some of the criteria in this section will not directly influence the next phase of software development, several items are urgent. At this time, InformaCam resides in a nebulous space: the Guardian Project develops and maintains a research implementation that clients may access to review the stage and usability of the product. However, for a "trusted" digital repository, it becomes important to clearly delineate the responsibilities that the Guardian Project is taking on, the legalities of ownership of media submitted to the system, and the funding sources for these responsibilities before accepting any "true" submissions into an implementation. Therefore, it is recommended that development of an SLA begin now, and that appropriate long-term funding paths be identified (whether through social entrepreneurship, further non-profit funding sources, or full assumption of responsibility by other organizations), as this work involves significant lag time.

This work should not be considered an obstacle to development. Instead, the urgency should be understood in relation to 1) ensuring that the expenses involved in appropriately maintaining a trusted repository can be met, and 2) the related societal cost that would result if media submitted to the repository were deemed inadmissible as court evidence because of fluctuating funding, multiple migrations, undefined responsibilities, etc.

Digital Object Management

Criteria | Met? | Current Application | Proposed Next Steps | Next Phase Priority
B1.1. Repository identifies properties it will preserve for digital objects. | Yes | See J3M spec | Review METS and Witness metadata schema to incorporate any additional requirements | High
B1.2. Repository clearly specifies the information that needs to be associated with digital material at the time of its deposit (i.e., SIP). | Yes | See B1.1 | | Low
B1.3. Repository has mechanisms to authenticate the source of all materials. | Yes | Chain of Custody functionality | Document process and test | High
B1.4. Repository's ingest process verifies each submitted object (i.e., SIP) for completeness and correctness as specified in B1.2. | ? | J3M is applied, but what does this mean in terms of "archival" requirements? | See B1.1 | Medium
B1.5. Repository obtains sufficient physical control over the digital objects to preserve them (Ingest: content acquisition). | ? | This falls into the service level, which is not yet defined | Include in SLA development | High
B1.6. Repository provides producer/depositor with appropriate responses at predefined points during the ingest processes. | Partial | The mobile-to-server flow is fairly well developed, but admin work, such as reviewing whether a submission completed, is not fully developed yet | Identify strong use cases and develop | Medium
B1.7. Repository can demonstrate when preservation responsibility is formally accepted for the contents of the submitted data objects (i.e., SIPs). | Partial | See B1.6 | See B1.6 | Medium
B1.8. Repository has contemporaneous records of actions and administration processes that are relevant to preservation. | No | Not met | Review current practices with UT, IBA, and Witness; determine next steps for implementation | Low
B2.1. Repository has an identifiable, written definition for each AIP or class of information preserved by the repository. | Partial | See B1.1 | | Low
B2.2. Repository has a definition of each AIP (or class) that is adequate to fit long-term preservation needs. | No | Not met | See B1.1 | Low
B2.3. Repository has a description of how AIPs are constructed from SIPs. | No | Not met | See B1.1 | Low
B2.4. Repository can demonstrate that all submitted objects (i.e., SIPs) are either accepted as whole or part of an eventual archival object (i.e., AIP), or otherwise disposed of in a recorded fashion. | No | Not met | See B1.1 | Low
B2.5. Repository has and uses a naming convention that generates visible, persistent, unique identifiers for all archived objects (i.e., AIPs). | Yes | See B1.1 | See B1.1 | Low
B2.6. If unique identifiers are associated with SIPs before ingest, the repository preserves the identifiers in a way that maintains a persistent association with the resultant archived object (e.g., AIP). | Yes | See B1.1 | See B1.1 | Low
B2.7. Repository demonstrates that it has access to necessary tools and resources to establish authoritative semantic or technical context of the digital objects it contains (i.e., access to appropriate international Representation Information and format registries). | No | Since this is still a research implementation, not applicable | Needs to be considered when developing the SLA | High
B2.8 Repository records/registers Representation Information (including formats) ingested. | Yes | See B1.1 | See B1.1 | Low
B2.9 Repository acquires preservation metadata (i.e., PDI) for its associated Content Information. | Yes | See B1.1 | See B1.1 | Low
B2.10 Repository has a documented process for testing understandability of the information content and bringing the information content up to the agreed level of understandability. | No | J3M is a well-documented, well-defined schema, but it has not yet been tested in a broader community | Usability testing with IBA groups; find potential peer groups to review J3M | Low
B2.11 Repository verifies each AIP for completeness and correctness at the point it is generated. | ? | Chain of custody is established, and initial capture is defined | Flesh out full AIP requirements; automate the verification process in the system | Medium
B2.12 Repository provides an independent mechanism for audit of the integrity of the repository collection/content. | No | This is still a research implementation, so not applicable | Perform internal TRAC audit as a first step; meet with UT to determine applicable standards; build in necessary features this round | High
B2.13 Repository has contemporaneous records of actions and administration processes that are relevant to preservation (AIP creation). | No | See B2.12 | See B2.12 | High
B3.1. Repository has documented preservation strategies. | No | See B2.12 | See B2.12 | Medium
B3.2. Repository has mechanisms in place for monitoring and notification when Representation Information (including formats) approaches obsolescence or is no longer viable. | | See B2.12 | See B2.12; will include creation of an "archive" master at the time representations are created | High
B3.3 Repository has mechanisms to change its preservation plans as a result of its monitoring activities. | Partial | InformaCam is an entirely open-source system, and any changes required would be at the discretion of a group implementing it | Consider more deeply the malleability of the current stack around the creation of media representations and the metadata schema | Medium
B3.4. Repository can provide evidence of the effectiveness of its preservation planning. | No | See B2.12 | See B2.12 | Medium
B4.1. Repository employs documented preservation strategies. | No | See B2.12 | See B2.12 | Medium
B4.2. Repository implements/responds to strategies for archival object (i.e., AIP) storage and migration. | No | See B2.12 | See B2.12 | Medium
B4.3 Repository preserves the Content Information of archival objects (i.e., AIPs). | Partial | See B2.12 | See B2.12 | Medium
B4.4 Repository actively monitors integrity of archival objects (i.e., AIPs). | No | See B2.12 | See B2.12 | High
B4.5 Repository has contemporaneous records of actions and administration processes that are relevant to preservation (Archival Storage). | No | See B2.12 | See B2.12 | High
B5.1 Repository articulates minimum metadata requirements to enable the designated community to discover and identify material of interest. | Yes | A research version of the search feature has been implemented | Refine geo search; fill out catalog for better testing; refine search results | Medium
B5.2 Repository captures or creates minimum descriptive metadata and ensures that it is associated with the archived object (i.e., AIP). | Yes | J3M is a well-defined, well-documented metadata schema that is currently implemented | See B2.12 | Low
B5.3 Repository can demonstrate that referential integrity is created between all archived objects (i.e., AIPs) and associated descriptive information. | Partial | Representations that | See B2.12; will more deeply consider longer life cycles for media in the repository and other "representations" that will be created | Medium
B5.4 Repository can demonstrate that referential integrity is maintained between all archived objects (i.e., AIPs) and associated descriptive information. | Partial | See B5.3 | See B5.3 | Medium
B6.1 Repository documents and communicates to its designated community what access and delivery options are available. | Partial | Help documentation has been written; some admin features exist | Refine documentation to reflect the preservation practices decided on; build out admin features to support the full life cycle of media | Medium-High
B6.2 Repository has implemented a policy for recording all access actions (includes requests, orders etc.) that meet the requirements of the repository and information producers/depositors. | No | See B2.12 | See B2.12 | Low
B6.3 Repository ensures that agreements applicable to access conditions are adhered to. | No | As a research implementation, not applicable | See B2.12 | High
B6.4 Repository has documented and implemented access policies (authorization rules, authentication requirements) consistent with deposit agreements for stored objects. | No | Users submitting to the system must have a certificate registered with the system; however, installation of the app is open, and which rules and authentication apply is uncertain because it is a fully open-source app | See B6.3 | High
B6.5 Repository access management system fully implements access policy. | No | See B6.4 | | High
B6.6 Repository logs all access management failures, and staff review inappropriate "access denial" incidents. | No | Research implementation, so not yet developed | Implement a more robust log system for all activities | High
B6.7 Repository can demonstrate that the process that generates the requested digital object(s) (i.e., DIP) is completed in relation to the request. | Partial | Chain of custody is currently maintained | Implement a more robust log system for all activities | High
B6.8 Repository can demonstrate that the process that generates the requested digital object(s) (i.e., DIP) is correct in relation to the request. | No | See B6.7 | See B6.7 | High
B6.9 Repository demonstrates that all access requests result in a response of acceptance or rejection. | Partial | See B6.7 | See B6.7 | High
B6.10 Repository enables the dissemination of authentic copies of the original or objects traceable to originals. | No | Focus is on sharing the representations | Full life cycle and admin features still need to be defined and developed | Medium

Digital Object Management Conclusions

Much of this section relates to the metadata associated with submitted media. The J3M schema is a great start, and it has already been established that METS and other archival metadata could be wrapped around J3M at the time of submission. The next phase would include work to review and incorporate any additional metadata determined to be useful or necessary. In addition, the server-side schema should be fleshed out further to maintain relationships between representations of an original and their corresponding metadata and any related technical documentation.

However, it is also important to define more clearly who the "designated" communities are and what their acceptable "access" levels will be. While a PEM file is created for any user submitting media, user authentication through the web admin interface is not tied to it. Varying permission levels are not associated with the various user actions, and certain actions that can be performed on an object are not tied to a corresponding user identity. Searchable, readily usable logs of all system actions do not exist. And formal "preservation strategies" still need to be defined before the system can accept responsibility for media.
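One possible way to tie the PEM file already created for each submitting user to the web admin identity is to key admin accounts on a fingerprint of that certificate. This is a sketch under assumptions, not the current InformaCam implementation: it presumes the PEM file holds an X.509 certificate, and the `IDENTITIES` registry and the `register`/`resolve` helpers are hypothetical. Only Python's standard `ssl.PEM_cert_to_DER_cert` and `hashlib` are used.

```python
import hashlib
import ssl

# Hypothetical registry mapping certificate fingerprints to web admin accounts.
IDENTITIES: dict[str, str] = {}


def cert_fingerprint(pem_text: str) -> str:
    """SHA-256 fingerprint (hex) of a PEM-encoded certificate.

    ssl.PEM_cert_to_DER_cert strips the BEGIN/END CERTIFICATE markers and
    base64-decodes the body; the digest is taken over those DER bytes.
    """
    der = ssl.PEM_cert_to_DER_cert(pem_text)
    return hashlib.sha256(der).hexdigest()


def register(pem_text: str, account: str) -> str:
    """Bind a submitter's certificate to a web admin account; return the fingerprint."""
    fingerprint = cert_fingerprint(pem_text)
    IDENTITIES[fingerprint] = account
    return fingerprint


def resolve(pem_text: str):
    """Return the account bound to this certificate, or None if it is unknown."""
    return IDENTITIES.get(cert_fingerprint(pem_text))
```

With this in place, every admin action could be attributed to the same fingerprint that identifies the submitter on the mobile side, rather than to a separate web login.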

Technologies, Technical Infrastructure & Security

Criteria | Met? | Current Application | Proposed Next Steps | Next Phase Priority
C1.1 Repository functions on well-supported operating systems and other core infrastructural software. | Partial | Technologies chosen for the InformaCam stack are strong; however, as a research implementation, a full community for InformaCam still needs to be developed | Identify existing open-source communities that would have an interest in the technology; promote InformaCam | Medium
C1.2 Repository ensures that it has adequate hardware and software support for backup functionality sufficient for the repository's services and for the data held, e.g., metadata associated with access controls, repository main content. | No | Research implementation; not applicable yet | See A2.1 | High
C1.3 Repository manages the number and location of copies of all digital objects. | Partial | See B5.3 | See B5.3 | Medium
C1.4 Repository has mechanisms in place to ensure any/multiple copies of digital objects are synchronized. | Partial | See B5.3 | See B5.3 | Medium
C1.5 Repository has effective mechanisms to detect bit corruption or loss. | No | As a research implementation, not yet established | Meet with UT; identify how other repository applications approach this; build out | High
C1.6 Repository reports to its administration all incidents of data corruption or loss, and steps taken to repair/replace corrupt or lost data. | No | See C1.5 | See C1.5 |
C1.7 Repository has defined processes for storage media and/or hardware change (e.g., refreshing, migration). | No | See B2.12 | See B2.12 | Medium
C1.8 Repository has a documented change management process that identifies changes to critical processes that potentially affect the repository's ability to comply with its mandatory responsibilities. | No | See B2.12 | See B2.12 | Medium
C1.9 Repository has a process for testing the effect of critical changes to the system. | No | As a research implementation, not applicable | Can include in review of current Witness practices, etc. | Low
C1.10 Repository has a process to react to the availability of new software security updates based on a risk-benefit assessment. | No | Currently building out a Chef recipe for automated builds | Create a build process that enables more rapid and recordable updates of the system when security updates become available (open questions: is Chef too time-consuming to maintain, and will software patches and upgrades realistically get updated in the Chef file?) | High
C2.1 Repository has hardware technologies appropriate to the services it provides to its designated communities and has procedures in place to receive and monitor notifications, and evaluate when hardware technology changes are needed. | Partial | Technologies selected are appropriate for a research implementation | Formal notification practices need to be established once service ownership is also established | Low
C2.2 Repository has software technologies appropriate to the services it provides to its designated community(ies) and has procedures in place to receive and monitor notifications, and evaluate when software technology changes are needed. | Partial | Technologies selected are appropriate for long-term implementations (the Java version will be revised; dependence on Matlab is being removed) | See C2.1 | Medium
C3.1 Repository maintains a systematic analysis of such factors as data, systems, personnel, physical plant, and security needs. | No | Not applicable as a research implementation | Include in development of the SLA | High
C3.2 Repository has implemented controls to adequately address each of the defined security needs. | No | See C3.1 | See C3.1 | High
C3.3 Repository staff have delineated roles, responsibilities, and authorizations related to implementing changes within the system. | No | See A2.1 | See A2.1 | Medium
C3.4 Repository has suitable written disaster preparedness and recovery plan(s), including at least one off-site backup of all preserved information together with an offsite copy of the recovery plan(s). | No | As a research implementation, not applicable | Make part of SLA development | Urgent

Technologies, Technical Infrastructure & Security Conclusions

Security of the media and its corresponding chain of custody is a strength of the InformaCam research implementation. It therefore makes sense that many of the weaknesses identified in this section correspond to criteria flagged as urgent in the Organizational Infrastructure section. For example, it is urgent that back-up systems be in place before InformaCam begins accepting responsibility for media; however, responsibility and funding for this work still need to be defined.

However, some items relate directly to software development in this next phase as well. As identified in the previous section, searchable, readily usable logs of all system actions need to be developed. Synchronization between representations and original media submissions needs to be accurately maintained. Dependence on non-standard Java and on Matlab needs to be phased out. And automated checksum and bit corrosion checks need to be created.

Final Conclusions

Overall, the InformaCam system is heading in an appropriate direction. Many of the priorities that have been identified are natural steps for a system that is evolving toward service quality. However, out of this internal audit, the following items should be considered for development in this next phase, either to ensure the system is being developed in a way that it can verify data integrity using industry standards, or to ensure that appropriate resources are dedicated to maintaining full responsibility for court-admissible evidence:

1. SLA development

Guardian Project needs to develop an SLA with any organization for which it provides "trusted" digital repository services. This SLA must define, but is not limited to, the following:
  • a repository mission
  • the types of media that will be accepted into the repository (i.e., the types of media for which the repository can assure access and integrity)
  • ownership / intellectual property rights established for media submitted
  • the preservation standards that will be met for any media submitted
  • the level of security that will be maintained
  • the people (designated communities) who will be allowed access, and how
  • the audit tools that will be used to verify the integrity of the repository
  • which organization will be responsible for which service entity (e.g., who is running the servers, who is maintaining bug reports, who is running helpdesk, etc.)
  • contingency plans in place for failure of service
  • contingency plans in place for loss of funding/organizational structure
  • and last, and most importantly, how funding will be provided to ensure the trusted digital repository requirements can be met

2. Business plan

Long-term funding for the digital repository services must be defined. This could take the form of a social venture, continued strong relationships with non-profit granting agencies, deep partnerships with viable organizations, or some combination of these three. Potential partner organizations could be for-profit companies (e.g., Google), governments (e.g., E.U.), or research institutions with leading and committed digital repository programs (e.g., Yale, Cambridge, UC Berkeley, etc.).

3. Metadata extended

The J3M metadata schema was not designed against METS or other digital repository schema systems. This is not a shortcoming, since the schema is appropriate for its location and purpose. However, work must be done during this phase to identify a path for integrating incoming J3M metadata with METS, as well as with other metadata standards that have been identified as already in use (e.g., the schema developed with UT and Witness).

In addition, a more robust and malleable means of maintaining relationships (and their associated metadata) between representations and the originating media submission needs to be developed and incorporated within this server-side metadata schema.
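A minimal sketch of what wrapping J3M in METS could look like, assuming J3M is serialized as JSON and is therefore carried base64-encoded in a METS `mdWrap`/`binData` element with `MDTYPE="OTHER"`. The function name `wrap_j3m`, the file-dictionary keys, and the `USE` values `ORIGINAL` and `REPRESENTATION` are hypothetical; a real METS profile would come out of the review described above. Grouping the original and its representations under one `structMap` division is one way to record the relationship between them.

```python
import base64
import json
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def _m(tag: str) -> str:
    """Qualify a tag name with the METS namespace (ElementTree Clark notation)."""
    return f"{{{METS_NS}}}{tag}"


def wrap_j3m(j3m: dict, original: dict, representations: list) -> ET.Element:
    """Build a minimal METS document around a J3M record.

    `original` and each item of `representations` are dicts with the
    hypothetical keys 'id', 'href', 'mimetype', and 'sha256'.
    """
    root = ET.Element(_m("mets"))

    # Descriptive metadata: the J3M record, embedded as base64-encoded JSON.
    dmd = ET.SubElement(root, _m("dmdSec"), ID="DMD_J3M")
    wrap = ET.SubElement(dmd, _m("mdWrap"), MDTYPE="OTHER",
                         OTHERMDTYPE="J3M", MIMETYPE="application/json")
    payload = json.dumps(j3m, sort_keys=True).encode("utf-8")
    ET.SubElement(wrap, _m("binData")).text = base64.b64encode(payload).decode("ascii")

    # File inventory, with a fixity value recorded for every file.
    file_sec = ET.SubElement(root, _m("fileSec"))
    # One structural division ties the original and its representations together.
    struct_map = ET.SubElement(root, _m("structMap"))
    submission = ET.SubElement(struct_map, _m("div"), TYPE="submission", DMDID="DMD_J3M")

    for use, files in (("ORIGINAL", [original]), ("REPRESENTATION", representations)):
        group = ET.SubElement(file_sec, _m("fileGrp"), USE=use)
        for f in files:
            file_el = ET.SubElement(group, _m("file"), ID=f["id"], MIMETYPE=f["mimetype"],
                                    CHECKSUM=f["sha256"], CHECKSUMTYPE="SHA-256")
            ET.SubElement(file_el, _m("FLocat"),
                          {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": f["href"]})
            ET.SubElement(submission, _m("fptr"), FILEID=f["id"])
    return root
```

`ET.tostring(wrap_j3m(...), encoding="unicode")` serializes the result. Keeping J3M intact inside `binData` means the original evidentiary record is never rewritten; archival metadata is layered around it instead.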

4. Preservation Standards

Formal preservation practices for the video and images accepted into InformaCam should be established. Video and digital image preservation is a fairly well-established field at this point, and several leading organizations (e.g., the Library of Congress) have published standards that can be readily adopted. However, once standards are selected, some software changes will also be necessary to implement them (e.g., if the best-practice preservation format for images is considered to be TIFF, the InformaCam server would need to create this format at the time of ingest, alongside the web-ready representations).

5. Build Out Audit-able Repo Tools

Software development is needed to create a robust evidence log of all system actions, including:
  • any connections made to the system, and by which actor (user or system)
  • any basic user actions taken (e.g., annotation added, new representation created, representation viewed, new master created, etc.)
  • any system notifications sent (e.g., software update needed, unusual behavior detected, etc.)
  • any system updates made (e.g., patch added, stack software updated, etc.)
  • when checksum and bit corrosion checks were made
  • {add to this after UT conversation}
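A sketch of one possible shape for such an evidence log. Chaining each entry to the SHA-256 hash of the previous entry is a technique swapped in here for illustration; this document does not specify it. It makes after-the-fact edits detectable, provided the latest hash is also stored somewhere the repository operator cannot silently rewrite. The class, field, actor, and action names are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(record: dict) -> str:
    """SHA-256 over a canonical JSON encoding of every field except the hash itself."""
    body = {k: v for k, v in record.items() if k != "hash"}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


class EvidenceLog:
    """Append-only, hash-chained log of system actions (in-memory sketch)."""

    def __init__(self):
        self.entries = []

    def append(self, actor: str, action: str, target: str, detail=None, when=None) -> dict:
        record = {
            "timestamp": (when or datetime.now(timezone.utc)).isoformat(),
            "actor": actor,      # user or system component that acted
            "action": action,    # e.g. "annotation_added", "fixity_check"
            "target": target,    # object acted on
            "detail": detail or {},
            "prev": self.entries[-1]["hash"] if self.entries else GENESIS,
        }
        record["hash"] = _digest(record)
        self.entries.append(record)
        return record

    def verify(self) -> bool:
        """True only if every entry is unmodified and links to its predecessor."""
        prev = GENESIS
        for record in self.entries:
            if record["prev"] != prev or record["hash"] != _digest(record):
                return False
            prev = record["hash"]
        return True
```

A production version would persist entries to durable storage and periodically publish or escrow the latest hash; the in-memory list here only demonstrates the chaining.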

In addition, a system to automate checksum and bit corrosion checks needs to be developed.
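A minimal fixity-check sketch under stated assumptions: SHA-256 as the checksum algorithm, a manifest recorded once at ingest, and files stored on a local file system. The function names are hypothetical. Running `check_fixity` on a schedule and writing its report into the evidence log would cover both the automation and the record-keeping described above.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large videos are never fully loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict:
    """Record a checksum for every file under root (run once, at ingest)."""
    return {
        p.relative_to(root).as_posix(): sha256_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def check_fixity(root: Path, manifest: dict) -> dict:
    """Re-hash every file in the manifest and classify it as ok, corrupt, or missing."""
    report = {"ok": [], "corrupt": [], "missing": []}
    for rel_path, expected in manifest.items():
        path = root / rel_path
        if not path.is_file():
            report["missing"].append(rel_path)
        elif sha256_file(path) != expected:
            report["corrupt"].append(rel_path)
        else:
            report["ok"].append(rel_path)
    return report


if __name__ == "__main__":
    # Demo: ingest one file, change a byte, and run the check.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "original.mp4").write_bytes(b"video bytes")
        manifest = build_manifest(root)
        (root / "original.mp4").write_bytes(b"video bytez")  # simulated bit corruption
        print(check_fixity(root, manifest))
```

The demo reports `original.mp4` under `corrupt`. In practice the manifest itself would need protecting (for example, by logging it at ingest), since an attacker who can rewrite both a file and its stored checksum defeats the check.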

6. Designated Communities and Corresponding Access Levels

The permissions/authentication on the server side need to be enhanced to support more granular levels of access, e.g., view privileges for a media submission vs. annotation rights vs. download of the original. These granular levels of access also need to be recorded within the evidence logs. In addition, if this phase of development will also include the creation of feeds/sharing with a more "public" community, the software will also need to maintain distinctions between the "vault" containing the originating media and a front end that many, varied users could be hitting via a public URL/access point.
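A sketch of how such granular permissions might be expressed, using Python's `enum.Flag`. The role names and permission set are hypothetical examples drawn from the distinctions named above (viewing, annotating, downloading the original). Every decision, including denials, is appended to a list so that it can feed the evidence log.

```python
from enum import Flag, auto


class Perm(Flag):
    """Individual rights on a media submission (hypothetical set)."""
    VIEW_REPRESENTATION = auto()
    ANNOTATE = auto()
    DOWNLOAD_ORIGINAL = auto()
    ADMINISTER = auto()


# Hypothetical roles: "public" only ever sees web-ready representations,
# while access to the vault copy of the original requires DOWNLOAD_ORIGINAL.
ROLES = {
    "public": Perm.VIEW_REPRESENTATION,
    "reviewer": Perm.VIEW_REPRESENTATION | Perm.ANNOTATE,
    "custodian": Perm.VIEW_REPRESENTATION | Perm.ANNOTATE | Perm.DOWNLOAD_ORIGINAL,
    "admin": Perm.VIEW_REPRESENTATION | Perm.ANNOTATE | Perm.DOWNLOAD_ORIGINAL | Perm.ADMINISTER,
}


def authorize(user: str, role: str, needed: Perm, log: list) -> bool:
    """Check a single permission and record the decision, including denials."""
    granted = ROLES.get(role, Perm(0))  # unknown roles get no rights
    allowed = needed in granted
    log.append({"user": user, "role": role, "permission": needed.name, "allowed": allowed})
    return allowed
```

Using flags keeps each role a single composable value, and logging denials alongside grants supports the TRAC expectation (B6.6) that access management failures are recorded and reviewable.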
