arn-e · January 9, 2012 17:45
 Document Review and the High Cost of Civil Litigation
 -----------------------------------------------------
 Despite soaring discovery costs, the legal industry has by many accounts been slow to adapt to  
 the 21st century.  Legal discovery refers to the initial phase of litigation during which the  
 disputing parties exchange information relevant to the case.  Today, the majority of material  
 gathered consists of electronically stored information (ESI).  ESI is an umbrella term used to  
 describe all electronic data, including e-mail (Outlook, Entourage, EML / RFC-2822) and e-docs  
 (PDF, plain text).  

 Traditionally, ESI has been reviewed in what is called a *linear* fashion: teams of attorneys
 examine documents one by one and code each for *responsiveness* (relevance to the case).  This
 form of review dates back to the decades when legal documents were kept on paper.
 Today, this remains the most prevalent method for performing document review, typically aided by  
 relatively straightforward culling techniques such as Boolean search and de-duplication.  Because
 its cost scales with attorney hours, linear review is expensive and struggles to keep pace with
 the ever-increasing volume of ESI.
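
The two culling techniques mentioned above can be illustrated in a few lines.  What follows is a
minimal Python sketch under stated assumptions, not any review platform's actual implementation:
de-duplication here is *exact* matching on a SHA-256 hash of case- and whitespace-normalized text
(real tools may instead hash native files or metadata), and the Boolean filter supports only
AND / OR / NOT over single terms.  The function names and the sample corpus are hypothetical.

```python
import hashlib
import re


def normalize(text: str) -> str:
    # Lower-case and collapse runs of whitespace so trivially different copies hash alike.
    return " ".join(text.lower().split())


def deduplicate(docs: dict[str, str]) -> dict[str, str]:
    """Keep the first document seen for each distinct normalized-text hash."""
    seen: set[str] = set()
    unique: dict[str, str] = {}
    for doc_id, text in docs.items():
        digest = hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique[doc_id] = text
    return unique


def boolean_match(text: str, all_of=(), any_of=(), none_of=()) -> bool:
    """Evaluate (all_of AND ...) AND (any_of OR ...) AND NOT (none_of OR ...) on word tokens."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    return (
        all(t.lower() in words for t in all_of)
        and (not any_of or any(t.lower() in words for t in any_of))
        and not any(t.lower() in words for t in none_of)
    )


def cull(docs: dict[str, str], **query) -> dict[str, str]:
    """De-duplicate first, then keep only documents that satisfy the Boolean query."""
    return {d: t for d, t in deduplicate(docs).items() if boolean_match(t, **query)}


# Hypothetical sample corpus; DOC-002 differs from DOC-001 only in case and spacing.
corpus = {
    "DOC-001": "Please review the merger agreement before Friday.",
    "DOC-002": "please review the  merger agreement before FRIDAY.",
    "DOC-003": "Lunch on Friday?",
    "DOC-004": "Draft merger terms attached.",
}
result = cull(corpus, all_of=["merger"], none_of=["lunch"])
print(sorted(result))  # ['DOC-001', 'DOC-004']
```

Running de-duplication before the search means each distinct document is evaluated (and later
reviewed) only once; both steps shrink the review set, but neither changes the one-by-one
review that follows.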

 New technologies are slowly being introduced into this space; however, the fundamental review
 paradigm has yet to change.  This is in large part due to the issue of defensibility, i.e., whether
 a review methodology can withstand legal scrutiny.  Those responsible for the review of ESI are
 reluctant to abandon tried-and-true best practices in favor of more advanced but legally untested
 methodologies.
 Machine learning, for instance, can be applied to document review to increase efficiency;
 however, it is not without potential pitfalls.  Sophisticated approaches such as predictive coding
 (supervised learning) and generative models (unsupervised learning) have not yet faced a serious
 legal challenge.  Put simply: why risk a legal battle over the use of machine learning techniques,
 and, in the event of a challenge, how best to justify them to a judge?
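
To make "predictive coding (supervised learning)" concrete: attorneys code a small seed set by
hand, a classifier learns from those decisions, and the remaining documents are ranked by their
predicted likelihood of responsiveness.  The sketch below assumes a two-class multinomial naive
Bayes classifier written with the Python standard library; it is one simple choice among many and
not the method of any particular vendor.  The class name, seed documents, and document IDs are
hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesCoder:
    """Hypothetical two-class multinomial naive Bayes with add-one (Laplace) smoothing.

    True = responsive, False = non-responsive.  The seed set must contain at
    least one document of each class, otherwise that class prior is zero.
    """

    def fit(self, docs: list[str], labels: list[bool]) -> "NaiveBayesCoder":
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = Counter(labels)
        for text, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        self.word_totals = {lab: sum(c.values()) for lab, c in self.word_counts.items()}
        return self

    def _log_score(self, text: str, label: bool) -> float:
        # log P(label) + sum over known words of log P(word | label)
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        denom = self.word_totals[label] + len(self.vocab)
        for word in tokenize(text):
            if word in self.vocab:  # words never seen in the seed set are skipped
                score += math.log((self.word_counts[label][word] + 1) / denom)
        return score

    def prob_responsive(self, text: str) -> float:
        a, b = self._log_score(text, True), self._log_score(text, False)
        m = max(a, b)  # subtract the max before exponentiating to avoid underflow
        return math.exp(a - m) / (math.exp(a - m) + math.exp(b - m))


# Hypothetical seed set coded by reviewers.
seed_docs = [
    "merger agreement draft terms",
    "merger negotiation pricing terms",
    "lunch plans friday",
    "holiday party invitation",
]
seed_labels = [True, True, False, False]
model = NaiveBayesCoder().fit(seed_docs, seed_labels)

# Rank unreviewed documents so the likeliest responsive ones are reviewed first.
unreviewed = {
    "DOC-101": "revised merger terms attached",
    "DOC-102": "friday lunch at noon",
}
ranked = sorted(unreviewed, key=lambda d: model.prob_responsive(unreviewed[d]), reverse=True)
for doc_id in ranked:
    print(doc_id, round(model.prob_responsive(unreviewed[doc_id]), 2))
```

Ranking, rather than issuing a hard yes/no call, lets reviewers see the likeliest responsive
documents first; where to draw a cutoff, and how to validate it, is exactly the defensibility
question raised above.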