Markus Dreyer
I am a Ph.D. student in Natural Language Processing (NLP) at Johns Hopkins University (JHU) and a member of the CLSP and the HLTCOE.
Here is my curriculum vitae.
E-mail: m...@gmail.com.
News: I defended my dissertation in 2010 and started as a Research Scientist at SDL Language Weaver.
Research Interests
Natural language processing, machine translation,
computational morphology, machine learning, finite-state
modeling, parsing
That's me in scenic Iceland.
Publications
- "Discovering Morphological Paradigms from Plain Text Using a Dirichlet Process Mixture Model"
Markus Dreyer and Jason Eisner (2011).
In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Edinburgh.
[pdf]
- "Hill Climbing on Speech Lattices: A New Rescoring Framework"
A. Rastrow, M. Dreyer, A. Sethy, S. Khudanpur, B. Ramabhadran and M. Dredze (2011).
In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Prague.
[pdf]
- "A Non-Parametric Model for the Discovery of Inflectional Paradigms from Plain Text Using Graphical Models over Strings"
Markus Dreyer (2011).
Ph.D. Thesis, JHU, Baltimore.
[website]
- "Graphical Models over Multiple Strings"
Markus Dreyer and Jason Eisner (2009).
In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Singapore.
[pdf | bib | slides: keynote'09, mov small/large, pdf export]
- "Latent-Variable Modeling of String Transductions With Finite-State Methods"
Markus Dreyer, Jason Smith, Jason Eisner (2008).
In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Honolulu, Hawaii.
[pdf (small fix in Fig. 1) | bib]
- "Machine Translation System Combination using ITG-based Alignments"
Damianos Karakos, Jason Eisner, Sanjeev Khudanpur and Markus Dreyer (2008).
In Proceedings of the Conference of the Association for Computational Linguistics (ACL), Columbus, Ohio.
[pdf | bib]
- "Exploiting Prosody for PCFGs with Latent Annotations"
Markus Dreyer and Izhak Shafran (2007).
In Proceedings of Interspeech, Antwerp, Belgium.
[pdf | bib]
- "Comparing Reordering Constraints for SMT Using Efficient BLEU Oracle Computation"
Markus Dreyer, Keith Hall, Sanjeev Khudanpur (2007).
In Proceedings of the HLT-NAACL Workshop on Syntax and Structure in Statistical Translation (SSST), Rochester, New York.
[pdf | bib]
- "Better Informed Training of Latent Syntactic Features"
Markus Dreyer and Jason Eisner (2006).
In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Sydney, Australia.
[pdf | bib]
- "Vine Parsing and Minimum Risk Reranking for Speed and Precision"
Markus Dreyer, David A. Smith, Noah A. Smith (2006).
In Proceedings of the Tenth Conference on Computational Natural Language Learning (CoNLL), New York.
[pdf | bib | slides]
- "Statistical Machine Translation by Parsing"
A. Burbank, M. Carpuat, S. Clark, M. Dreyer, P. Fox, D. Groves, K. Hall, M. Hearne, I. D. Melamed, Y. Shen, A. Way, B. Wellington, and D. Wu (2005).
CLSP Technical Report.
[pdf]
Software
- fstrain
I wrote this C++ toolkit for efficient training of finite-state machines. It includes an implementation of the expectation semiring, used with OpenFst, to represent and manipulate finite-state machines with (log-linear) features. It uses R for parameter optimization and can handle potentially divergent objective functions.
You can use fstrain to train globally normalized sequence models (e.g. for POS tagging or NER), string-to-string transductions that may include deletions and insertions (e.g. for lemmatization), or simple maxent classifiers. It is always possible to compose several smaller models and train them jointly, e.g. for a factorial CRF. The tarball contains a README and doxygen documentation. I hope to add some tutorial-style documentation in the near future.
Download: [fstrain-0.1.tar.gz]
- dyna
I am a member of the Dyna team. Some time ago, I wrote the Dyna frontend parser, the type inference system and some program transformations, such as automatic binarization of dynamic programs.
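For readers unfamiliar with the expectation semiring mentioned under fstrain, here is a minimal Python sketch of the idea. It is not fstrain's code or API: the class `Expectation`, the helpers `arc` and `expected_count`, and the toy paths are all hypothetical names and numbers chosen for illustration. Each weight is a pair (p, r), where p is a path probability mass and r is that mass multiplied by a feature count; adding and multiplying such pairs yields the expected feature count over all paths.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expectation:
    """Hypothetical expectation-semiring element (p, r).

    p: probability mass; r: p-weighted total of a feature count.
    """
    p: float
    r: float

    def __add__(self, other):
        # Semiring "plus": combine alternative paths by summing componentwise.
        return Expectation(self.p + other.p, self.r + other.r)

    def __mul__(self, other):
        # Semiring "times": extend a path; r follows the product rule.
        return Expectation(self.p * other.p,
                           self.p * other.r + other.p * self.r)


ZERO = Expectation(0.0, 0.0)  # identity for plus
ONE = Expectation(1.0, 0.0)   # identity for times


def arc(prob, count):
    """Weight of one arc with probability `prob` that fires the feature `count` times."""
    return Expectation(prob, prob * count)


def expected_count(paths):
    """Expected feature count over explicitly listed paths (assumed toy setting)."""
    total = ZERO
    for path in paths:
        weight = ONE
        for a in path:
            weight = weight * a
        total = total + weight
    # Dividing by total.p normalizes globally over all paths.
    return total.r / total.p


# Two toy paths: one fires the feature once, the other twice.
toy_paths = [
    [arc(0.5, 1), arc(0.4, 0)],
    [arc(0.5, 0), arc(0.6, 2)],
]
result = expected_count(toy_paths)
```

Enumerating paths is only feasible for toy inputs like this one. In a finite-state toolkit the same plus and times operations would presumably be applied by a dynamic-programming pass over the machine, so the sum over exponentially many paths is never written out.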
Contact Information
Center for Language and Speech Processing
3400 N. Charles Street, CSE 321
Baltimore, MD 21218-2691
E-mail: m...@gmail.com
Phone: 410-516-6837
Fax: 410-516-5050