eDiscovery – the search for relevant information for investigations
Every day, the amount of data in organisations grows and so do the opportunities to misuse it. With eDiscovery, data is located, secured and searched for use as evidence in civil or criminal proceedings. Our author is a specialist in this topic and explains the complex process of eDiscovery.
Differentiating eDiscovery from digital forensics
Since both disciplines deal with electronically stored information in the context of investigations, it is often assumed that they are one and the same. However, their purposes differ. The main purpose of eDiscovery is to identify potentially relevant data and metadata and to secure or collect them in a “forensically sound” manner, i.e. the data and metadata must remain intact. The collected data is then not examined in the context of the storage media or a user’s behaviour. Instead, after a processing step, the investigating lawyers or professionals review it in a review platform in order to disclose the relevant documents to the opposing party or to include them in an investigation report. In the context of fraud investigations and criminal prosecutions, data collection in eDiscovery usually requires the explicit use of forensic methods, e.g. to verify the authenticity or other characteristics of a piece of evidence, to recover data that has already been deleted, or to collect data that is difficult to access. In digital forensics, for example, the entire hard drive is mirrored and searched for hidden data to determine who did what, where and why on a computer or in an application.
eDiscovery in legal cases and investigations
eDiscovery, also known as eDisclosure, has its origins in common law countries such as the US and the UK, where a wide range of potentially relevant data must be secured and disclosed in the context of legal cases and internal or external investigations. In 2006, the US Federal Rules of Civil Procedure were amended to regulate the requirements for the preservation, preparation and disclosure of electronically stored information (ESI) in the context of pre-trial discovery. ESI explicitly includes so-called unstructured data such as emails, chats, documents on drives, telephone connection data and call recordings. In addition, structured data from applications and databases is often relevant for investigations and also belongs to the category of ESI.
In contrast to common law, the civil law of continental European countries imposes a much more limited duty of disclosure. However, similarly large amounts of data must now be secured and disclosed in the context of official investigations. As the use of electronic means of communication such as email, chat and social media is constantly increasing and the digitalisation of data and work processes continues to advance, investigations are confronted with ever larger volumes of data. In many cases, the volume of data collected amounts to several hundred gigabytes or even terabytes, and even after several data reduction steps, tens of thousands of documents often remain for review.
Special review platforms are used for this purpose, even for small amounts of data, instead of printing out the data or checking it for relevance in its native application. A review platform can process large amounts of data, remove duplicates in a traceable way and make the data searchable. Such platforms typically also offer analytical methods (Technology Assisted Review, TAR) that make the review efficient far beyond the use of search terms and deliver the most reliable results possible.
eDiscovery best practices have developed from case law and from the working groups of a leading think tank. They specify the requirements for securing, processing, reviewing and producing the data. It is important that decisions and results are documented to ensure traceability and reproducibility. Well-documented process flows that are actually followed, together with a review platform, are indispensable for this. Last but not least, data identified as relevant must be released to the counterparty or to authorities, which are often located abroad, in a so-called “production”. Here, too, a review platform offers optimised functions and proven solutions to meet such requirements efficiently and reliably. Processing and disclosure are subject to country-specific data protection laws and, where applicable, to further industry- and country-specific laws that restrict disclosure or require certain protective measures. There is a standard process for eDiscovery, which is described in detail below.
Steps in the eDiscovery process
The individual steps of the eDiscovery process can be explained using the so-called “Electronic Discovery Reference Model (EDRM)”. In practice, the sequence is often not linear as shown in the model, but iterative. For example, data from certain persons is prioritised and collected at the beginning, and once it has been reviewed, data from further persons is added. The individual steps are explained below.
Electronic Discovery Reference Model
Source: edrm.net V3.0, adapted
Information governance deals with the provision of communication technologies and applications authorised for use in the company as well as the management, classification and security of information. Furthermore, information governance overlaps with records management, which regulates the retention periods of information in the ordinary course of business (records retention management). During implementation, the framework conditions (in the form of guidelines and processes) are created for the consistent and orderly handling of ESI by the company and its employees.
Data identification is about which categories of data are potentially relevant and where and by whom they have been stored. The first step is to identify the possible persons as well as information and data types and sources:
- Which employees (custodians) and/or other persons are the focus of the investigation, and which are potentially relevant?
- What kind of information is relevant, e.g. internal and/or external communication, marketing material, transaction data, etc.?
- Is structured data in databases relevant, e.g. information in a client relationship management system that can be filtered and collected comparatively easily due to a unique key, e.g. the customer number?
- What types of unstructured data are relevant, e.g. emails, chats, social media as well as loose files on drives that typically do not contain structured classification?
- Is physical data required, such as archival data in folders or employee records, each of which is converted to electronic data by scanning for eDiscovery?
Once these have been identified, an inventory of the available information and data sources is drawn up, along with a plan for obtaining the data. Of critical importance is the question of how complete and intact the data is in the various systems and storage locations. In the absence of a central electronic archive for email, for example, it will most likely be necessary to collect emails not only from the server or the cloud but also from existing backup tapes and from user-specific archives on local hard drives. In addition, employees may have worked in different locations or countries during the period under investigation, so that different sources are affected.
In the context of US legal cases or external investigations, the companies concerned have often sent very broadly worded instructions to suspend the destruction of documents (legal holds) to employees and IT managers. As a result, companies often stop deleting not only the data potentially relevant to the case in question, but any data at all, even though the retention period under the Records Retention Policy is limited. Yet deleting data once its retention period has expired is precisely what data protection law requires.
Securing and collecting data for the purpose of eDiscovery goes beyond the original purpose of the data and therefore requires legal legitimisation, which should be addressed as part of the collection process. For this purpose, the reason for the collection and its scope in terms of persons, data types and time period must be clearly documented. On this basis, the data collection is carried out jointly by the eDiscovery specialist and the company’s IT department. The following points are also of central importance:
- Use of a forensically sound method
- Documented, repeatable processes
- Verified collection results and traceable protocols (chain of custody)
- Use of state-of-the-art software and hardware
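One common way to make collection results verifiable is to record a cryptographic hash for every collected file and to re-check those hashes on each copy. The following is a minimal sketch under stated assumptions, not a description of any particular collection tool: the function names `sha256_of_file`, `build_manifest` and `verify_copy` are hypothetical, and SHA-256 is chosen here only as an example of a widely used hash function.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_copy(source_manifest: dict[str, str], copy_root: Path) -> list[str]:
    """Return the relative paths that are missing or altered in the copy."""
    copy_manifest = build_manifest(copy_root)
    return sorted(
        rel
        for rel, digest in source_manifest.items()
        if copy_manifest.get(rel) != digest
    )
```

A manifest built at collection time can be stored with the chain-of-custody record. An empty list from `verify_copy` then indicates that every collected file is present and unchanged in the copy.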
Data processing (Processing)
Data processing and preparation by the eDiscovery specialist involves converting the data into a uniform, readable format so that it can be reviewed precisely and efficiently in a review platform. The data is extracted from its proprietary format, encrypted data is decrypted as far as possible, and non-searchable formats are made searchable using OCR (“Optical Character Recognition”). Furthermore, parts of the collected data are excluded (culling) and the data is deduplicated according to principles agreed with the lawyers. A distinction is made here between global deduplication (across all custodians) and deduplication per employee (custodian). Deduplication is also defined either at object level (individual emails, attachments or loose files) or at family level (an email together with its attachments). In addition, there are methods to display only the longest and most complete email of a communication chain (email threading). The earlier emails on which the chain builds are suppressed and displayed only if they are not fully contained in the complete email communication, e.g. when an email was separately forwarded to one or more other persons (branches). The quality, clarity and comprehensibility of email threading results can vary considerably depending on the provider’s technology.
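The difference between global and per-custodian deduplication can be illustrated with a short sketch. It assumes, for illustration only, that duplicates are detected at object level by hashing the raw content of each item. Real review platforms typically hash normalised fields and may work at family level, so the `Document` class and the `deduplicate` function below are hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    custodian: str
    content: bytes


def deduplicate(documents, scope="global"):
    """Keep the first document seen for each content hash.

    scope="global":    one copy is kept across all custodians.
    scope="custodian": one copy is kept per custodian.
    """
    if scope not in ("global", "custodian"):
        raise ValueError("scope must be 'global' or 'custodian'")
    seen = set()
    kept = []
    for doc in documents:
        digest = hashlib.sha256(doc.content).hexdigest()
        # The custodian becomes part of the key only for per-custodian scope.
        key = digest if scope == "global" else (doc.custodian, digest)
        if key not in seen:
            seen.add(key)
            kept.append(doc)
    return kept
```

With global scope, a report held by two custodians survives only once. With custodian scope, each custodian keeps one copy, which preserves the information about who held the document at the cost of a larger review set.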
Review and Analysis
The review has different phases and goals, which are recorded in a protocol before the start and further developed if necessary. At the beginning, the documents relevant to the core of the investigation must be found as quickly and reliably as possible (Key Document Identification) so that the risks and costs can be assessed (Early Case Assessment). For disclosure to the counterparty, e.g. in the context of US pre-trial discovery, all documents relevant in the broader sense must then also be found (Relevancy Review). In the process, the documents are marked as relevant or not relevant on the basis of predefined categories (tagging and issue coding). Studies have shown that human review reaches its limits, especially with large amounts of data and large teams, because even when given the same instructions, reviewers often arrive at different assessments or make wrong decisions. The experience and qualifications of the individual reviewers and a sophisticated quality assurance process are therefore of great importance, but they are not the only factors. With the right choice of method and an experienced practitioner, analytical procedures such as predictive coding can not only significantly reduce costs and time, but also produce better-quality results than a purely human review.
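How far two reviewers diverge on the same documents can be quantified as part of quality assurance. The sketch below is an illustration that is not taken from the article: it computes the simple agreement rate and Cohen’s kappa, a standard statistic that corrects agreement for chance. The function names and the example tags are assumptions.

```python
from collections import Counter


def agreement_rate(codes_a, codes_b):
    """Share of documents on which both reviewers assigned the same tag."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both reviewers must code the same non-empty set")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return matches / len(codes_a)


def cohens_kappa(codes_a, codes_b):
    """Agreement between two reviewers, corrected for chance agreement."""
    n = len(codes_a)
    observed = agreement_rate(codes_a, codes_b)
    freq_a = Counter(codes_a)
    freq_b = Counter(codes_b)
    # Probability that both reviewers pick the same tag purely by chance.
    expected = sum(freq_a[tag] * freq_b[tag] for tag in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both reviewers used a single identical tag throughout
    return (observed - expected) / (1 - expected)
```

A kappa close to 1 indicates consistent coding. Values well below that suggest that the review protocol or the reviewer instructions may need refinement before the review continues.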
Data output (production)
In US pre-trial discovery, the data identified as relevant is disclosed to the other party’s law firm in accordance with the scope negotiated at the outset. Before the data is released, a privilege and data privacy review must be conducted to identify any information that needs to be excluded or redacted. The disclosure of data to foreign authorities may require an Administrative Assistance Proceeding. Usually, framework conditions and formats for these data deliveries are defined at the beginning and must be strictly adhered to.
Optimal cooperation between specialists and lawyers
Lawyers and eDiscovery specialists already work closely together in the data identification and collection phase. There are different types of eDiscovery specialists, most of whom offer a purely technical advisory function. Increasingly, however, there are other specialists who focus on targeted data identification (key document identification), keyword consulting and review using analytical methods. Under the instructions of the lawyers, they ensure optimal use of the review platform and apply targeted analytical methods effectively and correctly. The substantive decisions, however, such as the scope of the review, the problem definition and the selection and prioritisation of the custodians, are made by the lawyers. For a smooth process, it is advantageous if the lawyers, in addition to their legal knowledge, also have a basic understanding of these concepts or involve a specialist, as the technical and methodological decisions usually have a significant impact on the investigation in terms of scope, quality, time and costs. In the US, this type of division of labour and cooperation is already largely established, whereas in Europe, due to a lack of supply and knowledge, a more traditional division of labour between lawyer and technical eDiscovery specialist still prevails.
Technical eDiscovery specialists check the quality and completeness of the data, especially during collection, processing and production. If any data gaps occur, the lawyers must be informed and the cause analysed, documented and, if possible, remedied. An exception report lists data that could not be extracted during processing or was not made available in the review platform due to file size or other reasons. The handling of exceptions should follow a strict, well-logged process, and the handling of special exceptions (e.g. encrypted or very large files) should be discussed with the lawyers.
In most cases, the fact that data is available in a review platform does not mean that the entire data set can be reviewed. Ideally, lawyers work with specialists to define search strategies and search terms that limit the amount of data for review as far as possible. The correct selection and optimisation of search terms is central to preparing the review. Search terms should be varied and broad enough to cover all aspects of the review, yet deliver results that are as precise as possible and minimise so-called false positives. If the terms are too broad, this can lead to a flood of irrelevant documents (low precision); if they are too specific, relevant documents are excluded (low recall). The development of case-specific linguistic models in cooperation with the lawyers can address this problem and help to balance precision and recall. At the end of the review process, the relevant facts and findings flow into the investigation report. This is often complemented by other sources of information (e.g. staff surveys, analysis of structured data, etc.). The relevant data is handed over as a production to the lawyers of the other party or to the authorities.
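The trade-off between precision and recall can be made concrete by testing candidate search terms against a small sample that has already been coded by hand. The following is a minimal sketch assuming a plain case-insensitive substring search; the `search` and `precision_recall` helpers are hypothetical, and real review platforms offer far richer query syntax.

```python
def search(documents, terms):
    """Return IDs of documents containing at least one term (case-insensitive)."""
    lowered = [term.lower() for term in terms]
    return {
        doc_id
        for doc_id, text in documents.items()
        if any(term in text.lower() for term in lowered)
    }


def precision_recall(hits, relevant):
    """Precision: share of hits that are relevant.
    Recall: share of relevant documents that were hit."""
    hits, relevant = set(hits), set(relevant)
    true_positives = len(hits & relevant)
    precision = true_positives / len(hits) if hits else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall
```

Running several candidate term lists against the same coded sample shows which list loses relevant documents and which one floods the review with irrelevant hits, before the terms are applied to the full data set.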
eDiscovery is an independent discipline in the context of legal cases and internal and external investigations. It deals with the identification, securing, technical processing, review and production of electronic information. In common law countries such as the US it is subject to special rules, and mistakes can result in severe sanctions. The examination of personal data is also subject to European data protection requirements and other industry- and country-specific restrictions. Because the process is both error-prone and open to optimisation, a division of labour between the investigating lawyer or expert, the technical eDiscovery service provider and the search and analysis specialist, which is already common in the USA, is becoming more and more established in Europe. Specialised eDiscovery experts can use analytical methods to reliably find the key documents in the initial phase of an investigation and can ensure an efficient and effective process for the disclosure of data to third parties.
[1] The Sedona Conference Working Group on Electronic Document Retention & Production (WG1)
[2] Rosenthal and Zeunert, “E-Discovery and Data Protection: Challenges and Approaches for Multinational Companies”, in “E-Discovery and Information Governance”, ES Verlag, 2011
[3] https://www.edrm.net
[4] Zeunert, Brupbacher, Dix: “GDPR adds to the risk profile for eDiscovery related preservation and collection activities for companies”; conference paper; ABA Cross-Border Discovery Institute, Brussels
[5] TREC Studies at http://trec-legal.umiacs.umd.edu/
- ABA Cross-Border Discovery Institute, 2018, Brussels: “GDPR adds to the risk profile for eDiscovery related preservation and collection activities for companies”; Christian Zeunert, Dr. Oliver M. Brupbacher, Dr. Alexander Dix
- Lead article “E-Discovery und Information Governance” ES Verlag, 2011, “E-Discovery und Datenschutz, Herausforderungen und Lösungsansätze für multinationale Unternehmen”, Christian Zeunert, David Rosenthal
- “Compliance: Structure – Management – Risk Realms” C.F. Müller; Chapter 6 C : “Document Management”, 2013
- Article in The Sedona Conference® Journal: “Cross border discovery – practical considerations and solutions for multinationals”, 2013
- Senior Editor: The Sedona Conference® International Principles on Discovery, Disclosure & Data Protection, 2011