The Center for Language and Speech Processing




About CLSP

Workshops

Confusion-based Statistical Language Modeling for Machine Translation and Speech Recognition

How can we decide that one sentence is more likely in a language than another, especially if neither sentence has ever been seen before in its entirety? And why would we want to? The answer to the second question is that many natural language applications, such as machine translation and automatic speech recognition, produce a multitude of possible sentences as output (of translation or recognition), and the likelihood of each candidate sentence in the language is a key way to choose between them. New methods for answering the first question are the topic of this summer workshop project. For the same "true" output, the set of competing outputs ("confusions") depends on the application: in speech recognition, the confusions typically sound similar (such as "their" and "there"), while in machine translation, the confusions depend on ambiguities that arise in the translation process for a particular language pair (different for, say, Chinese and German when translating into English). In this project, we will investigate techniques to automatically generate possible confusions for a particular task and to learn statistical models of language from such confusions. These models can then be used to do a better job of choosing which of the alternative outputs of a particular system is best. This project is a chance to work on cutting-edge speech and natural language applications and to get your hands dirty under the hood of state-of-the-art systems while trying to make them better.
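To make the idea of learning from confusions concrete, here is a minimal sketch, not the workshop's actual method: a perceptron-style reranker over unigram and bigram counts, trained so that each "true" sentence outscores its confusions. The function names, the feature set, and the toy confusion sets are all assumptions made up for illustration; a real system would generate confusions automatically from a recognizer or translation system.

```python
from collections import defaultdict

def ngram_features(sentence, n_max=2):
    """Count unigram and bigram features of a whitespace-tokenized sentence."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    feats = defaultdict(int)
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats[" ".join(tokens[i:i + n])] += 1
    return feats

def score(weights, sentence):
    """Linear score: sum of feature count times learned weight."""
    return sum(weights.get(f, 0.0) * c
               for f, c in ngram_features(sentence).items())

def best_candidate(weights, candidates):
    """Return the highest-scoring candidate (first one wins on ties)."""
    return max(candidates, key=lambda s: score(weights, s))

def train_perceptron(confusion_sets, epochs=5):
    """Train on (reference, confusions) pairs.

    Whenever a confusion scores at least as high as the reference,
    move the weights toward the reference's features and away from
    the features of the wrongly preferred confusion.
    """
    weights = defaultdict(float)
    for _ in range(epochs):
        for reference, confusions in confusion_sets:
            # The reference goes last so that ties count as errors.
            guess = best_candidate(weights, confusions + [reference])
            if guess != reference:
                for f, c in ngram_features(reference).items():
                    weights[f] += c
                for f, c in ngram_features(guess).items():
                    weights[f] -= c
    return weights

# Hypothetical hand-written confusion sets (illustrative only).
TOY_CONFUSION_SETS = [
    ("they left their bags there",
     ["they left there bags their", "they left their bags their"]),
    ("there is their house",
     ["their is there house", "there is there house"]),
]

weights = train_perceptron(TOY_CONFUSION_SETS)

# Choose between two alternative outputs for an unseen sentence.
choice = best_candidate(weights, ["they left there house",
                                  "they left their house"])
print(choice)
```

The learned weights reward word sequences that appear in the true sentences and penalize those that appear only in the confusions, so the same scoring function can then be used to choose among alternative outputs for sentences not seen in training.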

Final Presentation

Final Report

Team Members

Senior Members
   Brian Roark roark at cslu dot ogi dot edu Oregon Health and Science University
   Philipp Koehn pkoehn at inf dot ed dot ac dot uk University of Edinburgh
   Kenji Sagae sagae at ict dot usc dot edu University of Southern California
   Keith Hall keith dot hall at mac dot com Google
   Dan Bikel dbikel at google dot com Google
   Sanjeev Khudanpur khudanpur at jhu dot edu Johns Hopkins University
   Chris Callison-Burch ccb at cs dot jhu dot edu Johns Hopkins University
Graduate Students
   Emily Tucker emilytucker at gmail dot com Oregon Health and Science University
   Eva Hasler e dot hasler at sms dot ed dot ac dot uk University of Edinburgh
   Maider Lehr maiderlehr at gmail dot com Oregon Health and Science University
   Yuan Cao yuan dot cao at jhu dot edu Johns Hopkins University
   Puyang Xu puyangxu at gmail dot com Johns Hopkins University
   Charley Chan tsz at jhu dot edu Johns Hopkins University
Undergraduate Students
   Darcey Riley DystopianAnomaly at gmail dot com University of Rochester
   Nathan Glenn garfieldnate at gmail dot com Brigham Young University
Affiliate Members
   Damianos Karakos damianos at jhu dot edu Johns Hopkins University
   Zhifei Li zhifei dot work at gmail dot com Google
   Adam Lopez alopez at cs dot jhu dot edu Johns Hopkins University
   Matt Post post at jhu dot edu Johns Hopkins University
   Murat Saraclar murat dot saraclar at boun dot edu dot tr Boğaziçi University
   Izhak Shafran zakshafran at gmail dot com Oregon Health and Science University