EDRM and the Bolch Judicial Institute at Duke Law recently released Technology Assisted Review (TAR) Guidelines (Guidelines) with the aim “to objectively define and explain technology-assisted review for members of the judiciary and the legal profession.” Among the topics covered are the validation and reliability measures practitioners can use to defend their TAR processes. This post summarizes that validation and reliability guidance, which has the potential to become a widely referenced authority on the topic.
According to EDRM, there are no “bright-line rules” governing what constitutes a reasonable review, and no single standard measurement to validate the results of TAR. Instead, the inquiry is generally guided by principles of reasonableness and proportionality as set forth in Rule 26 of the Federal Rules of Civil Procedure.
Although not without controversy, validation of TAR processes is often based on the “recall” rate, an estimate of the proportion of relevant documents in the collection that the TAR process has identified. If a team does not achieve what EDRM refers to as a “reasonable level” of recall, it may consider additional training or other steps to improve the results and move more relevant documents into the predicted relevant set. EDRM identified the following tips to consider:
- Use a limited number of human reviewers to train the algorithm. A larger number of reviewers can lead to more inconsistent coding, which may increase the amount of quality control review needed and lengthen the time it takes to achieve a reasonable result.
- Measure validity against a control set, a random sample drawn from the entire TAR data set and reviewed by a human reviewer at the beginning of training. When further training no longer substantially improves the computer’s classifications of the control set documents, practitioners can consider training complete. The algorithm may then be applied to unreviewed documents.
- Consider the two primary approaches to measuring recall, both of which rely on estimated “richness” (the proportion of relevant documents): review a random sample of 1) the entire TAR set and 2) the predicted non-relevant set, and calculate the percentage of relevant documents in each. Together, these samples estimate the overall number of relevant documents in the data set and how many of them the TAR model would exclude from review (a worked sketch of this arithmetic follows this list).
- Assess the importance of the relevant documents found during validation that TAR categorized as non-relevant. If they are significant to the case and non-duplicative of what is already in the relevant set, consider additional training to capture more such documents.
- Carefully consider how validity is affected when new documents are introduced into the TAR set after training has started, or when the scope of relevance changes during training.
- Consider discussing TAR validation metrics with the opposing party before beginning the review. The Antitrust Division of the Department of Justice (DOJ Antitrust Division) and the Federal Trade Commission’s Bureau of Competition, for example, require prior approval of the specific TAR method a party proposes, as well as disclosure of several statistics, including the total number of manually reviewed documents, the process used to determine validity, and “all statistical analyses utilized or generated by the [party] . . . related to the precision, recall, accuracy, validation, or quality of its document production.” The DOJ Antitrust Division also often requires confirmation that subject-matter experts will review the seed set and training rounds. In a litigation setting, practitioners should ensure they meet any obligations regarding disclosure to opposing counsel laid out in an ESI Protocol.
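To make the sampling arithmetic behind these recall estimates concrete, here is a minimal Python sketch. The function name `estimate_relevant`, the document counts, and the sample results are hypothetical assumptions chosen for illustration, not figures from the Guidelines. The sketch extrapolates richness from a random sample of the full TAR set, estimates how many relevant documents remain in the predicted non-relevant set, and derives an estimated recall.

```python
# Illustrative sketch of the recall arithmetic described above. The document
# counts and sample results are hypothetical assumptions, not figures from
# the EDRM Guidelines.

def estimate_relevant(population_size: int, sample_size: int, relevant_in_sample: int) -> float:
    """Extrapolate the number of relevant documents in a population
    from the proportion observed in a random sample."""
    richness = relevant_in_sample / sample_size   # proportion relevant in the sample
    return richness * population_size             # point estimate for the whole population

# 1) Random sample of the entire TAR set -> estimated richness of the collection.
total_docs = 1_000_000
est_relevant_total = estimate_relevant(total_docs, sample_size=2_000, relevant_in_sample=150)

# 2) Random sample of the predicted non-relevant set -> estimated relevant
#    documents the model would exclude from review.
predicted_nonrelevant = 700_000
est_relevant_missed = estimate_relevant(predicted_nonrelevant, sample_size=2_000, relevant_in_sample=10)

# Estimated recall: the share of all relevant documents captured by the TAR process.
est_recall = (est_relevant_total - est_relevant_missed) / est_relevant_total

print(f"Estimated richness of collection: {est_relevant_total / total_docs:.1%}")  # ~7.5%
print(f"Estimated relevant docs missed:   {est_relevant_missed:,.0f}")             # ~3,500
print(f"Estimated recall:                 {est_recall:.1%}")                       # ~95.3%
```

Because these figures are point estimates drawn from random samples, the resulting recall rate is itself an estimate rather than an exact measurement.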
Again, there is no standard measurement to validate the results of TAR, and no objectively acceptable recall rate. Instead, validation is a project-specific assessment based on reasonableness and proportionality under the circumstances: how much could the result be improved by further review, at what cost, and does the value of the additional relevant information that might be found justify the additional review effort required to find it?
The Guidelines represent only a first step in EDRM’s effort to promote broader acceptance of TAR and to develop standards for its use; a separate “best practices” guide is expected this summer. On balance, EDRM’s guidance adds to the body of work from other sources, including The Sedona Conference, that cautions against rigid requirements for TAR validation and favors a flexible, case-specific evaluation.