Project # 1065250
Participants: Douglas W. Oard, PI, University of Maryland
David S. Doermann, Co-PI, University of Maryland
David A. Kirsch, Co-PI, University of Maryland
Maura R. Grossman, Consultant
Dave Lewis, Consultant, David D. Lewis Consulting
William E. Webber, Postdoctoral Scholar, University of Maryland
Advisory board: Jason R. Baron, University of Maryland
David Madigan, Columbia University
Martin D. Pichinson, Sherwood Partners, LLC
Interns: Marjorie Desamito, Eleanor Roosevelt High School, Greenbelt, MD.
Bryan Toth, Eleanor Roosevelt High School, Greenbelt, MD.
There is a crisis brewing in the civil litigation system of the United States, the system on which our society depends as the ultimate arbiter for commercial and personal disputes. Plaintiffs and defendants are entitled to request relevant evidence from each other, but the cost of finding that evidence is growing rapidly. Although born-digital records might seem easier to find than their older paper counterparts, rapid growth in the volume, diversity, and possible locations of these records has actually made it harder to find the proverbial needles within the many digital haystacks. These growing costs raise concerns about access to justice, which the courts must balance against the imperative for discovery and exchange of relevant evidence. Demonstrably effective technology that can improve the cost-effectiveness of this "e-discovery" process is crucial if we are to avoid legal gridlock.
Search problems occupy the boundary between formal computation and human cognition. Techniques are being developed in this project to decide automatically, within minutes, on the responsiveness of more documents than one person could examine in a lifetime. These techniques use "supervised learning," in which software is trained on representative examples to replicate the kinds of decisions that people make. Using Finite Population Annotation, a new framework for integrating learning with evaluation, novel methods are being developed to achieve and measure the highest possible effectiveness for any specified level of human effort. These learning methods draw on rich approaches to representing the content of both born-digital structured documents and scanned paper. The effectiveness of automated review techniques must be accurately measured, both to support decisions by legal professionals and the courts about which methods to use, and to help developers further improve their algorithms. Recent advances in measurement science are being applied to conduct rigorous evaluations of effectiveness despite the inevitable differences of opinion that human reviewers may have about the responsiveness of specific documents.
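To make the supervised-learning idea concrete, the sketch below trains a tiny multinomial Naive Bayes classifier on labeled example documents and then predicts whether a new document is "responsive." This is an illustrative toy only, not the project's actual methods: the training documents, labels, and tokenizer here are all hypothetical, and real e-discovery systems use far richer document representations and learning algorithms.

```python
import math
from collections import Counter

def tokenize(text):
    # Hypothetical, deliberately simple tokenizer for illustration.
    return text.lower().split()

def train(docs, labels):
    """Count token frequencies per class for add-one-smoothed Naive Bayes."""
    counts = {label: Counter() for label in set(labels)}
    priors = Counter(labels)
    for doc, label in zip(docs, labels):
        counts[label].update(tokenize(doc))
    vocab = {t for c in counts.values() for t in c}
    return counts, priors, vocab

def classify(doc, counts, priors, vocab):
    """Return the label that maximizes the smoothed log-posterior."""
    total = sum(priors.values())
    best, best_score = None, float("-inf")
    for label, prior in priors.items():
        denom = sum(counts[label].values()) + len(vocab)
        score = math.log(prior / total)
        for token in tokenize(doc):
            score += math.log((counts[label][token] + 1) / denom)
        if score > best_score:
            best, best_score = label, score
    return best

# Hypothetical labeled examples standing in for human review decisions.
train_docs = [
    "merger agreement draft attached for review",
    "board approved the merger terms yesterday",
    "lunch menu for the office party",
    "reminder to submit parking permit forms",
]
train_labels = ["responsive", "responsive", "nonresponsive", "nonresponsive"]

counts, priors, vocab = train(train_docs, train_labels)
print(classify("please review the merger agreement", counts, priors, vocab))
# → responsive
```

Once trained, the per-document decision is just a handful of arithmetic operations, which is why such classifiers can render verdicts on collections far larger than any human team could read.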
The legal system demands technology whose effectiveness has been demonstrated on collections that are representative of what is actually expected in a real case. For that reason, this project is creating unprecedented real world test collections in collaboration with the National Institute of Standards and Technology's Text Retrieval Conference (TREC). The project's results will help to shape professional practice through workshops for legal and technical stakeholders, and through university courses to prepare the next generation of attorneys and information professionals to employ these new capabilities. Moreover, technology from this project has broad applicability beyond the law, including preparation of systematic reviews of scientific literature, scholarly access to digital archives, and government responses to public information requests from citizens.
Page created: February 19, 2011
Last updated: May 14, 2012
This material is based upon work supported by the National Science Foundation under Grant No. 1065250. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.