Johann Roturier
Researcher

Johann Roturier's current research interests lie at the intersection of natural language processing, localization, and human factors in security. Johann completed his Ph.D. in 2007 with a thesis that investigated the impact of controlled language rules on various characteristics of machine-translated documentation. Since then, he has transferred some of his research findings into production processes, co-authored several papers and patents, and worked with multiple product teams.

During that time, he has also authored a book on localization (published by Routledge), taken part in standardization activities, served on numerous program committees for top-tier conferences (e.g., ACL, NAACL, EMNLP), co-supervised several Ph.D. students in Computer Science and Applied Language, and acted as the scientific representative of the FP7 ACCEPT collaborative research project.

Selected Academic Papers

Foreebank: Syntactic Analysis of Customer Support Forums

In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP 2015)
We present a new treebank of English and French technical forum content which has been annotated for grammatical errors and phrase structure. This double annotation allows us to empirically measure the effect of errors on parsing performance. While it is slightly easier to parse the corrected versions of the forum sentences, the errors are not the main factor in making this kind of text hard to parse.
This paper introduces the Foreebank data set, which was created for training parsers of user-generated content.
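Because every forum sentence exists in an original and a corrected version, each with its own phrase-structure annotation, the effect of errors can be measured by parsing both versions, scoring each against its gold tree, and comparing the scores. The sketch below assumes that parsing performance is measured with a PARSEVAL-style labelled bracket F1, the metric reported by tools such as evalb; evalb's special handling of punctuation and preterminals is omitted, and the spans are invented toy data, not Foreebank annotations. The function name `corpus_bracket_f1` and the span layout are illustrative assumptions.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# Assumed span layout: (constituent label, start token index, end token index).
Span = Tuple[str, int, int]

def corpus_bracket_f1(pairs: Sequence[Tuple[List[Span], List[Span]]]) -> float:
    """Micro-averaged labelled bracket F1 over (gold, predicted) sentence pairs."""
    matched = gold_total = pred_total = 0
    for gold, predicted in pairs:
        gold_counts, pred_counts = Counter(gold), Counter(predicted)
        # Multiset intersection: a bracket matches at most as often as it occurs in both.
        matched += sum((gold_counts & pred_counts).values())
        gold_total += len(gold)
        pred_total += len(predicted)
    if matched == 0:
        return 0.0
    precision = matched / pred_total
    recall = matched / gold_total
    return 2 * precision * recall / (precision + recall)

# Hypothetical toy sentence of five tokens; here both versions share one gold tree.
GOLD = [("S", 0, 5), ("NP", 0, 1), ("VP", 1, 5), ("NP", 3, 5)]
ORIGINAL_PARSE = [("S", 0, 5), ("NP", 0, 1), ("VP", 1, 5), ("NP", 4, 5)]   # parse of the error-laden text
CORRECTED_PARSE = [("S", 0, 5), ("NP", 0, 1), ("VP", 1, 5), ("NP", 3, 5)]  # parse of the corrected text

print(corpus_bracket_f1([(GOLD, ORIGINAL_PARSE)]))   # 0.75
print(corpus_bracket_f1([(GOLD, CORRECTED_PARSE)]))  # 1.0
```

Scores are pooled over all sentences before computing precision and recall (micro-averaging), which is how corpus-level bracket scores are usually reported; the gap between the two corpus scores is then one measure of how much the errors cost the parser.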

Evaluation of Machine-Translated User Generated Content: A pilot study based on User Ratings

In Proceedings of the 16th Annual Conference of the European Association for Machine Translation (EAMT 2012)

Examining the Adoption and Abandonment of Security, Privacy, and Identity Theft Protection Practices

In Proceedings of the ACM CHI Conference on Human Factors in Computing Systems (CHI 2020) (Honorable Mention Award)
Our online survey of 902 individuals studies why users struggle to adhere to expert-recommended security, privacy, and identity-protection practices. We examined 30 of these practices, finding that gender, education, technical background, and prior negative experiences correlate with practice adoption levels. We found that practices were abandoned when they were perceived as low-value or inconvenient, or when they were overridden by subjective judgment. We discuss how tools and expert recommendations can better align with user needs.

DCU-Symantec Submission for the WMT 2012 Quality Estimation Task

In Proceedings of the 7th Workshop on Statistical Machine Translation (NAACL 2012)

A Detailed Analysis of Phrase-based and Syntax-based Machine Translation: The Search for Systematic Differences

In Proceedings of the 10th Conference of the Association for Machine Translation in the Americas (AMTA 2012)

DCU-Symantec at the WMT 2013 Quality Estimation Shared Task

In Proceedings of the 8th Workshop on Statistical Machine Translation (ACL 2013)

Using Automatic Machine Translation Metrics to Analyze the Impact of Source Reformulations

In Proceedings of the 10th Biennial Conference of the Association for Machine Translation in the Americas (AMTA 2012)

Syntax and Semantics in Quality Estimation of Machine Translation

In Proceedings of the 8th Workshop on Syntax, Semantics and Structure in Statistical Translation (SSST-8)

Evaluation of MT systems to translate user generated content

In Proceedings of the 13th Machine Translation Summit (MT Summit XIII)

Community-based post-editing of machine-translated content: monolingual vs. bilingual

In Proceedings of the 2nd Workshop on Post-editing Technology and Practice (WPTP 2013), held at MT Summit XIV

The ACCEPT Post-Editing environment: a flexible and customisable online tool to perform and analyse machine translation post-editing

In Proceedings of the 14th Machine Translation Summit (MT Summit 2013)

Domain adaptation in statistical machine translation of user-forum data using component-level mixture modeling

In Proceedings of the 13th Machine Translation Summit (MT Summit XIII)

Translation Quality-Based Supplementary Data Selection by Incremental Update of Translation Models

In Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012)

Domain Adaptation in SMT of User-Generated Forum Content Guided by OOV Word Reduction: Normalization and/or Supplementary Data?

In Proceedings of the 16th Annual Conference of the European Association for Machine Translation (EAMT 2012)

Quality Estimation of English-French Machine Translation: A Detailed Study of the Role of Syntax

In Proceedings of the 25th International Conference on Computational Linguistics (COLING 2014)

Bootstrapping a Natural Language Interface to a Cyber Security Event Collection System using a Hybrid Translation Approach

In Proceedings of the 17th Machine Translation Summit (MT Summit XVII)
We present a system that uses a natural language interface to generate Elasticsearch query strings for English-speaking cyberthreat hunters, security analysts, or responders (agents).
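To illustrate the kind of output such an interface produces, the sketch below turns an English request into a Lucene-style query string of the sort Elasticsearch accepts in a `query_string` query. It is only a toy, rule-based stand-in: it does not reproduce the paper's hybrid translation approach, and the rules, the function name `to_query_string`, and the field names (`event.category`, `event.outcome`, `host.name`, `source.ip`, modelled loosely on the Elastic Common Schema) are illustrative assumptions.

```python
import re

# Hypothetical phrase-to-clause rules; the field names are assumptions chosen
# for illustration, not the schema used by the system described above.
RULES = [
    (re.compile(r"\bfailed logins?\b", re.I),
     "event.category:authentication AND event.outcome:failure"),
    (re.compile(r"\bon host (\S+)", re.I), r'host.name:"\1"'),
    (re.compile(r"\bfrom ip (\d{1,3}(?:\.\d{1,3}){3})\b", re.I), r'source.ip:"\1"'),
]

def to_query_string(request: str) -> str:
    """Map an English request to a query string by applying each matching rule."""
    clauses = []
    for pattern, template in RULES:
        match = pattern.search(request)
        if match:
            # Match.expand() fills captured groups (\1) into the clause template;
            # values are quoted so reserved characters such as '-' stay literal.
            clauses.append(f"({match.expand(template)})")
    return " AND ".join(clauses)

print(to_query_string("Show me failed logins on host web-01 from ip 10.0.0.5"))
# prints: (event.category:authentication AND event.outcome:failure) AND (host.name:"web-01") AND (source.ip:"10.0.0.5")
```

Each clause is parenthesized before being joined so that operators inside a clause cannot change the meaning of the combined query. A hand-written rule list like this covers only the phrasings it anticipates, which is the limitation that learned or hybrid translation approaches aim to overcome.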

Quality Estimation-guided Data Selection for Domain Adaptation of SMT

In Proceedings of the 14th Machine Translation Summit (MT Summit 2013)

Who Knows I Like Jelly Beans? An Investigation Into Search Privacy

In Proceedings of the 22nd Privacy Enhancing Technologies Symposium (PETS 2022)