Nathan Green, Ph.D.

WMT Dependency Annotations

WMT annotations are described in the paper "Improvements to Syntax-based Machine Translation using Ensemble Dependency Parsers", presented at the Second Workshop on Hybrid Approaches to Translation (HyTra) at ACL 2013. The annotations were made on the analytical layer of the Czech-English data set from the 2012 Workshop on Machine Translation (WMT 2012).

The main purpose of this data set is to give researchers gold-standard dependency annotations for a data set that also contains gold-standard translations. This should allow researchers to analyze the effect of dependency annotations on machine translation.

License:

The data sets were made available as part of the shared task for machine translation at WMT 2012. More information can be found at http://www.statmt.org/wmt12/translation-task.html.

The data below, together with our annotations, are made available under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 (CC BY-NC-SA 3.0) license.

Contact Info:

If you have any concerns or questions, please contact:

Nathan Green: nathan@nathangreen.com

Zdeněk Žabokrtský: zabokrtsky@ufal.mff.cuni.cz

Publications:

If you use this data set in a publication, we would appreciate it if you cited the following paper:

Nathan Green and Zdeněk Žabokrtský. Improvements to Syntax-based Machine Translation using Ensemble Dependency Parsers. In Proceedings of the ACL 2013 Second Workshop on Hybrid Approaches to Translation (HyTra), Sofia, Bulgaria, 2013.

Acknowledgments:

This research has received funding from the European Commission’s 7th Framework Program (FP7) under grant agreement n° 238405 (CLARA). Additionally, this work has used language resources developed, stored, and/or distributed by the LINDAT-Clarin project of the Ministry of Education of the Czech Republic (project LM2010013).

File Downloads:

Treex format: Version 0.1

CoNLL format: Version 0.1
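The CoNLL download can be read with a few lines of Python. The sketch below assumes the file follows the 10-column, tab-separated CoNLL-X layout (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL) with blank lines between sentences. This page does not state which CoNLL variant is used, so it is worth checking a few lines of the file first. The `Token` class and `read_conll` function are hypothetical names chosen for illustration, and the sample sentence is made up.

```python
import io
from dataclasses import dataclass
from typing import Iterator, List, TextIO


@dataclass
class Token:
    """One row of a CoNLL-X file (the PHEAD/PDEPREL columns are ignored)."""
    id: int        # 1-based position in the sentence
    form: str      # word form
    lemma: str
    cpostag: str   # coarse-grained POS tag
    postag: str    # fine-grained POS tag
    feats: str     # morphological features, "_" if absent
    head: int      # id of the governing token; 0 means the artificial root
    deprel: str    # dependency relation label


def read_conll(stream: TextIO) -> Iterator[List[Token]]:
    """Yield sentences as lists of Tokens from a CoNLL-X style stream.

    Assumption: tab-separated rows with at least 8 columns, and
    sentences separated by one or more blank lines.
    """
    sentence: List[Token] = []
    for raw in stream:
        line = raw.rstrip("\r\n")
        if not line.strip():
            # A blank line closes the current sentence (if any).
            if sentence:
                yield sentence
                sentence = []
            continue
        cols = line.split("\t")
        if len(cols) < 8:
            raise ValueError(f"expected >= 8 tab-separated columns: {line!r}")
        sentence.append(
            Token(int(cols[0]), cols[1], cols[2], cols[3], cols[4],
                  cols[5], int(cols[6]), cols[7])
        )
    if sentence:  # the file may end without a trailing blank line
        yield sentence


if __name__ == "__main__":
    # A made-up three-token sentence; the labels are illustrative only.
    sample = (
        "1\tThe\tthe\tDT\tDT\t_\t2\tAtr\t_\t_\n"
        "2\tdog\tdog\tNN\tNN\t_\t3\tSb\t_\t_\n"
        "3\tbarks\tbark\tVBZ\tVBZ\t_\t0\tPred\t_\t_\n"
        "\n"
    )
    for sent in read_conll(io.StringIO(sample)):
        for tok in sent:
            # Token ids are 1-based and consecutive, so id k sits at index k-1.
            governor = "ROOT" if tok.head == 0 else sent[tok.head - 1].form
            print(f"{tok.form} --{tok.deprel}--> {governor}")
```

To read a downloaded file instead of the inline sample, pass an open text handle such as `open(path, encoding="utf-8")`, where `path` is wherever you saved the CoNLL download. Reading lazily, one sentence at a time, keeps memory use flat even for large files.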