NSF IIS Program
Project # 1065250

Development and Evaluation of Search Technology
for Discovery of Evidence in Civil Litigation



Douglas W. Oard, PI, University of Maryland

David S. Doermann, Co-PI, University of Maryland

David A. Kirsch, Co-PI, University of Maryland

Maura R. Grossman, Consultant

Dave Lewis, Consultant, David D. Lewis Consulting

William E. Webber, Postdoctoral Scholar, University of Maryland

Jiaul Paik, Postdoctoral Scholar, University of Maryland

Mossaab Bagdouri, Ph.D. Student, University of Maryland

Heejung Byun, Ph.D. Student, University of Maryland

Ning Gao, Ph.D. Student, University of Maryland

Jyothi Vinjumur, Ph.D. Student, University of Maryland

Rashmi Sankepally, Masters Student, University of Maryland

Bryan Toth, Eleanor Roosevelt High School, Greenbelt, MD.

Marjorie Desamito, Eleanor Roosevelt High School, Greenbelt, MD.

There is a crisis brewing in the civil litigation system of the United States, the system on which our society depends as the ultimate arbiter for commercial and personal disputes. Plaintiffs and defendants are entitled to request relevant evidence from each other, but the cost of finding that evidence is growing rapidly. Although born-digital records might seem easier to find than their older paper counterparts, rapid growth in the volume, diversity, and possible locations of these records has actually made it harder to find the proverbial needles within the many digital haystacks. These growing costs raise concerns about access to justice, which the courts must balance against the imperative for discovery and exchange of relevant evidence. Demonstrably effective technology that can improve the cost-effectiveness of this "e-discovery" process is crucial if we are to avoid legal gridlock.

Search problems occupy the boundary between formal computation and human cognition. Techniques are being developed in this project to automatically decide within minutes on the responsiveness of more documents than one person could examine in a lifetime. These techniques use "supervised learning," automatically teaching the software to replicate the kinds of decisions that people make on representative examples. Using Finite Population Annotation, a new framework for integrating learning with evaluation, novel methods are being developed to achieve and measure the highest possible effectiveness for any specified level of human effort. These learning methods draw on rich approaches to representing the content of both born-digital structured documents and scanned paper. The effectiveness of automated review techniques must be accurately measured, both to support decisions by legal professionals and the courts about which methods to use, and to help developers further improve their algorithms. Recent advances in measurement science are being applied to conduct rigorous evaluations of effectiveness despite the inevitable differences of opinion that human reviewers may have about the responsiveness of specific documents.
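
As a rough illustration of the supervised learning idea described above, the following minimal sketch trains a small naive Bayes classifier on a handful of human-labeled examples and then measures precision and recall on separately labeled documents. It is not the project's method and does not implement Finite Population Annotation; the classifier choice, the function names (tokenize, train, predict, precision_recall), and the toy documents are all assumptions made up for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def train(examples):
    """Build word and document counts from (text, is_responsive) pairs."""
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, doc_counts, vocab


def predict(model, text):
    """Return True if the document scores higher as responsive than not."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in (True, False):
        total_words = sum(word_counts[label].values())
        # Log prior plus add-one smoothed log likelihood of each known word.
        score = math.log(doc_counts[label] / total_docs)
        for word in tokenize(text):
            if word in vocab:
                score += math.log(
                    (word_counts[label][word] + 1) / (total_words + len(vocab))
                )
        scores[label] = score
    return scores[True] > scores[False]


def precision_recall(model, labeled):
    """Compare predictions with human labels on held-out documents."""
    tp = fp = fn = 0
    for text, label in labeled:
        guess = predict(model, text)
        if guess and label:
            tp += 1
        elif guess and not label:
            fp += 1
        elif label and not guess:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical training examples labeled by a human reviewer.
training = [
    ("contract pricing agreement with supplier", True),
    ("pricing discussion for the supplier contract", True),
    ("lunch menu for friday", False),
    ("office party friday afternoon", False),
]
# Hypothetical held-out examples used only for measurement.
held_out = [
    ("supplier contract draft", True),
    ("friday lunch plans", False),
]

model = train(training)
precision, recall = precision_recall(model, held_out)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In a real matter the held-out documents would be a random sample drawn from the collection, and the resulting estimates would carry sampling uncertainty as well as the reviewer disagreement noted above; this sketch ignores both.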

The legal system demands technology whose effectiveness has been demonstrated on collections that are representative of what is actually expected in a real case. For that reason, this project is creating unprecedented real-world test collections in collaboration with the National Institute of Standards and Technology's Text Retrieval Conference (TREC). The project's results will help to shape professional practice through workshops for legal and technical stakeholders, and through university courses that prepare the next generation of attorneys and information professionals to employ these new capabilities. Moreover, technology from this project has broad applicability beyond the law, including preparation of systematic reviews of scientific literature, scholarly access to digital archives, and government responses to public information requests from citizens.



Advisory board:

Jason R. Baron, Drinker Biddle & Reath LLP

David Madigan, Columbia University

Martin D. Pichinson, Sherwood Partners, LLC

Page created: February 19, 2011
Last updated: May 14, 2012

This material is based upon work supported by the National Science Foundation under Grant No. 1065250. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.