@arn-e
Created January 9, 2012 17:45
Document Review and the High Cost of Civil Litigation
-----------------------------------------------------
Despite soaring discovery costs, the legal industry has by many accounts been slow to adapt to
the 21st century. Legal discovery refers to the initial phase of litigation during which the
disputing parties exchange information relevant to the case. Today, the majority of material
gathered consists of electronically stored information (ESI). ESI is an umbrella term used to
describe all electronic data, including e-mail (Outlook, Entourage, EML / RFC-2822) and e-docs
(PDF, plain text).

Traditionally, ESI has been reviewed in what is called a *linear* fashion: teams of attorneys read
documents one by one and code each for *responsiveness* (relevance to the case). This form of review
dates back to the decades when legal documents were paper. It remains the most prevalent method for
document review today, typically aided by relatively straightforward culling techniques such as
Boolean search and de-duplication. Because it relies heavily on staffing hours, linear review is
expensive and scales poorly against the ever-increasing volume of ESI.
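
The two culling techniques named above can be sketched in a few lines of Python. This is only an
illustration under stated assumptions, not a description of any particular review platform: the
function names (`deduplicate`, `boolean_match`, `cull`), the query parameters (`all_of`, `any_of`,
`none_of`), and the four-document corpus are all hypothetical. De-duplication here means dropping
exact duplicates by hashing whitespace- and case-normalized text; near-duplicate detection is out of
scope.

```python
import hashlib
import re


def normalize(text):
    """Lowercase and collapse whitespace so trivial formatting differences hash alike."""
    return re.sub(r"\s+", " ", text).strip().lower()


def deduplicate(docs):
    """Keep the first document seen for each distinct content hash."""
    seen = set()
    unique = {}
    for doc_id, text in docs.items():
        digest = hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique[doc_id] = text
    return unique


def boolean_match(text, all_of=(), any_of=(), none_of=()):
    """AND over all_of, OR over any_of (if given), NOT over none_of."""
    words = set(re.findall(r"[a-z0-9']+", text.lower()))
    if not all(term in words for term in all_of):
        return False
    if any_of and not any(term in words for term in any_of):
        return False
    return not any(term in words for term in none_of)


def cull(docs, **query):
    """De-duplicate first, then keep only documents matching the Boolean query."""
    unique = deduplicate(docs)
    return [doc_id for doc_id, text in unique.items() if boolean_match(text, **query)]


# Hypothetical corpus: doc2 duplicates doc1 apart from case and spacing.
corpus = {
    "doc1": "Please review the merger contract before Friday.",
    "doc2": "please review the  merger contract before friday.",
    "doc3": "Lunch on Friday?",
    "doc4": "The contract draft mentions the merger, but it is privileged.",
}

kept = cull(corpus, all_of=("merger", "contract"), none_of=("privileged",))
print(kept)
```

Culling of this kind only shrinks the pile. Every document that survives still has to be read and
coded by an attorney, which is why the cost of linear review tracks staffing hours.
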

New technologies are slowly being introduced into this space, but the fundamental review paradigm
has yet to change. This is in large part due to the issue of *defensibility*: the ability of a review
process to withstand legal scrutiny. Those responsible for the review of ESI are reluctant to abandon
tried-and-true practices in favor of more advanced yet legally untested methodologies.
Machine learning, for instance, can make document review more efficient, but it is not without
potential pitfalls. Sophisticated approaches such as predictive coding (supervised learning) and
generative models (unsupervised learning) have not yet faced a serious legal challenge. Put simply:
why risk a legal battle over the use of machine learning techniques, and, in the event of a
challenge, how best to justify those techniques to a judge?
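
To make "predictive coding" concrete, here is a minimal sketch of the supervised-learning idea:
attorneys code a small seed set, a classifier is trained on it, and the classifier then suggests
codes for unreviewed documents. The choice of a multinomial naive Bayes model with add-one smoothing
is an assumption made for brevity, not a claim about what any commercial tool uses, and the function
names, labels, and seed documents are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(labeled_docs):
    """Count tokens per label from (text, label) pairs coded by reviewers."""
    word_counts = {}        # label -> Counter of token frequencies
    doc_counts = Counter()  # label -> number of training documents
    vocab = set()
    for text, label in labeled_docs:
        tokens = tokenize(text)
        word_counts.setdefault(label, Counter()).update(tokens)
        doc_counts[label] += 1
        vocab.update(tokens)
    return word_counts, doc_counts, vocab


def predict(model, text):
    """Return the label with the highest log-probability under naive Bayes."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in doc_counts:
        score = math.log(doc_counts[label] / total_docs)  # class prior
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token in vocab:  # tokens never seen in training are ignored
                count = word_counts[label][token]
                # Add-one (Laplace) smoothing avoids log(0) for unseen pairs.
                score += math.log((count + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical seed set, as if coded by reviewing attorneys.
seed_set = [
    ("The merger agreement was signed by both boards", "responsive"),
    ("Attached is the revised merger contract for review", "responsive"),
    ("Are we still on for lunch on Friday", "not_responsive"),
    ("The office party is moved to next week", "not_responsive"),
]

model = train(seed_set)
print(predict(model, "Please send the signed merger contract"))
print(predict(model, "Lunch party next Friday"))
```

Even in a toy like this, the suggested code depends entirely on the seed set and on modeling choices
such as smoothing and tokenization, which are the kinds of details a party would have to explain if
the method were challenged.
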