NSF IIS Program Project # 1065250
Participants:
Douglas W. Oard, PI, University of Maryland
David S. Doermann, Co-PI, University of Maryland
David A. Kirsch, Co-PI, University of Maryland
Maura R. Grossman, Consultant
Dave Lewis, Consultant, David D. Lewis Consulting
William E. Webber, Postdoctoral Scholar, University of Maryland
Jiaul Paik, Postdoctoral Scholar, University of Maryland
Mossaab Bagdouri, Ph.D. Student, University of Maryland
Heejung Byun, Ph.D. Student, University of Maryland
Ning Gao, Ph.D. Student, University of Maryland
Jyothi Vinjumur, Ph.D. Student, University of Maryland
Rashmi Sankepally, Masters Student, University of Maryland
Bryan Toth, Eleanor Roosevelt High School, Greenbelt, MD
Marjorie Desamito, Eleanor Roosevelt High School, Greenbelt, MD
There is a crisis brewing in the civil litigation
system of the United States, the system on which our society
depends
as the ultimate arbiter for commercial and personal disputes.
Plaintiffs and defendants are entitled to request relevant
evidence from each other, but the cost of finding that evidence
is growing rapidly. Although born-digital records might seem
easier to find than their older paper counterparts, rapid
growth in the volume, diversity, and possible locations of
these records has actually made it harder to find the proverbial
needles within the many digital haystacks. These growing
costs raise concerns about access to justice, which the courts
must balance against the imperative for discovery and exchange
of relevant evidence. Demonstrably effective technology that
can improve the cost-effectiveness of this "e-discovery" process
is crucial if we are to avoid legal gridlock.

Search problems occupy the boundary between formal computation and human cognition. Techniques are being developed in this project to automatically decide within minutes on the responsiveness of more documents than one person could examine in a lifetime. These techniques use "supervised learning," automatically teaching the software to replicate the kinds of decisions that people make on representative examples. Using Finite Population Annotation, a new framework for integrating learning with evaluation, novel methods are being developed to achieve and measure the highest possible effectiveness for any specified level of human effort. These learning methods draw on rich approaches to representing the content of both born-digital structured documents and scanned paper.

The effectiveness of automated review techniques must be accurately measured, both to support decisions by legal professionals and by the courts about which methods to use, and to help developers further improve their algorithms. Recent advances in measurement science are being applied to conduct rigorous evaluations of effectiveness despite the inevitable differences of opinion that some human reviewers may have about the responsiveness of specific documents.

The legal system demands technology whose effectiveness
has been demonstrated on collections that are representative
of what is actually expected in a real case. For that reason,
this project is creating unprecedented real-world test collections
in collaboration with the National Institute of Standards and
Technology's Text Retrieval Conference (TREC). The project's
results will help to shape professional practice through workshops
for legal and technical stakeholders, and through university
courses to prepare the next generation of attorneys and information
professionals to employ these new capabilities. Moreover, technology
from this project has broad applicability beyond the law, including
preparation of systematic reviews of scientific literature,
scholarly access to digital archives, and government responses
to public information requests from citizens.
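
As a rough illustration of the "supervised learning" idea described above (software learning to replicate human responsiveness decisions from labeled examples), here is a minimal sketch. It is not the project's method: it assumes a simple Naive Bayes word-count classifier chosen only for brevity, and every name in it (`train`, `is_responsive`, `EXAMPLES`) and every example document is hypothetical.

```python
import math
from collections import Counter

# Hypothetical reviewer-labeled examples: True = responsive, False = not.
EXAMPLES = [
    ("pricing agreement with the supplier contract", True),
    ("contract renewal pricing terms attached", True),
    ("lunch menu for the office party", False),
    ("party photos from the weekend", False),
]

def tokenize(text):
    """Lowercase whitespace tokenization (an assumption made for simplicity)."""
    return text.lower().split()

def train(examples):
    """Count words per label; assumes both labels occur at least once."""
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, doc_counts, vocab

def is_responsive(model, text):
    """Compare log-probabilities of the two labels with add-one smoothing."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in (True, False):
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                score += math.log(
                    (word_counts[label][word] + 1) / (total_words + len(vocab))
                )
        scores[label] = score
    return scores[True] > scores[False]

model = train(EXAMPLES)
```

With these toy examples, `is_responsive(model, "please review the supplier contract pricing")` returns `True` and `is_responsive(model, "photos from the office party")` returns `False`. A real review effort would train on far more reviewer decisions and would measure effectiveness on a held-out sample rather than trusting the classifier's output directly.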
Advisory board:
Jason R. Baron, Drinker Biddle & Reath LLP
David Madigan, Columbia University
Martin D. Pichinson, Sherwood Partners, LLC
Page created: February 19, 2011
Last updated: May 14, 2012
This material is based upon work supported by the National Science Foundation under Grant No. 1065250. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.