Markus Freitag

Head of Google Translate Research

Email: freitag [at] google [dot] com

Short CV

I am currently a Senior Staff Research Scientist and head of Google Translate Research in Mountain View, CA. My research interests include machine translation, human and automatic evaluation of NLP systems, decoding strategies, model training, and data processing. Before joining Google, I worked as a Research Staff Member at IBM in Yorktown Heights, NY. I received my PhD in Computer Science from RWTH Aachen University in 2015 under the supervision of Prof. Dr. Hermann Ney.

Current Research Interests

Since 2021, we have primarily focused on researching better ways to evaluate the performance of NLP systems, either by humans or automatically via learned metrics. We have demonstrated that these improvements can be directly applied to NLP systems to improve their performance. For example, we can incorporate a learned metric as a utility/reward function into the inference algorithm. If you want to learn more about the importance of evaluation and how to use human feedback to improve NLP systems, you should watch my invited talk at HumEval 2022.
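
A minimal sketch of what using a metric as a utility function at inference time can look like, in the style of minimum Bayes risk (MBR) decoding: among several sampled candidates, pick the one with the highest average utility against the others. The names `toy_utility` and `mbr_select` are hypothetical, and the unigram-F1 score is only an assumed stand-in for a learned metric; this is an illustration, not the setup used in any of the papers listed below.

```python
from collections import Counter


def toy_utility(hypothesis: str, reference: str) -> float:
    """Unigram F1 overlap; an assumed stand-in for a learned metric."""
    hyp_tokens = hypothesis.split()
    ref_tokens = reference.split()
    if not hyp_tokens or not ref_tokens:
        return 0.0
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(hyp_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(hyp_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def mbr_select(candidates, utility=toy_utility):
    """Return the candidate with the highest average utility against the others."""
    if not candidates:
        raise ValueError("need at least one candidate")
    if len(candidates) == 1:
        return candidates[0]

    def expected_utility(i):
        # The other candidates act as pseudo-references for candidate i.
        others = [c for j, c in enumerate(candidates) if j != i]
        return sum(utility(candidates[i], o) for o in others) / len(others)

    best = max(range(len(candidates)), key=expected_utility)
    return candidates[best]


# Hypothetical candidate translations sampled from a model.
samples = [
    "the cat sat on the mat",
    "the cat sat on a mat",
    "a dog barked loudly",
]
print(mbr_select(samples))
```

Swapping `toy_utility` for a stronger metric changes which candidate wins without touching the selection logic, which is the sense in which better evaluation can carry over to better system output.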

Community Contributions

Program Committee: ACL, EMNLP, EACL, NAACL, NeurIPS, WMT, EAMT, MT Summit, IWSLT, Eval4NLP
Area Chair: EMNLP 2021, EACL 2023, EMNLP 2023, NeurIPS 2023, ICLR 2023
Senior Area Chair: ACL 2023
Organizer of the WMT Metrics Shared Task: WMT 2020, WMT 2021, WMT 2022, WMT 2023

Invited Talks

Importance of Focusing on Evaluation
MT Marathon 2022 [pdf]
A journey of MT research - Why it is crucial to work on evaluation
HumEval 2022 [recording]
A list of older invited talks will follow soon.

Publications

2023

MBR and QE Finetuning: Training-time Distillation of the Best and Most Expensive Decoding Methods
Mara Finkelstein, Subhajit Naskar, Mehdi Mirzazadeh, Apurva Shah, Markus Freitag
Arxiv [pdf]
Pinpoint, Not Criticize: Refining Large Language Models via Fine-Grained Actionable Feedback
Wenda Xu, Daniel Deutsch, Mara Finkelstein, Juraj Juraska, Biao Zhang, Zhongtao Liu, William Yang Wang, Lei Li, Markus Freitag
Arxiv [pdf]
Quality Control at Your Fingertips: Quality-Aware Translation Models
Christian Tomani, David Vilar, Markus Freitag, Colin Cherry, Subhajit Naskar, Mara Finkelstein, Daniel Cremers
Arxiv [pdf]
Results of WMT23 Metrics Shared Task: Metrics Might Be Guilty but References Are Not Innocent
Markus Freitag, Nitika Mathur, Chi-kiu Lo, Eleftherios Avramidis, Ricardo Rei, Brian Thompson, Tom Kocmi, Frederic Blain, Daniel Deutsch, Craig Stewart, Chrysoula Zerva, Sheila Castilho, Alon Lavie, George Foster
WMT 2023 [pdf]
The Devil Is in the Errors: Leveraging Large Language Models for Fine-grained Machine Translation Evaluation
Patrick Fernandes, Daniel Deutsch, Mara Finkelstein, Parker Riley, André Martins, Graham Neubig, Ankush Garg, Jonathan Clark, Markus Freitag, Orhan Firat
WMT 2023 [pdf]
There’s No Data like Better Data: Using QE Metrics for MT Data Filtering
Jan-Thorsten Peter, David Vilar, Daniel Deutsch, Mara Finkelstein, Juraj Juraska, Markus Freitag
WMT 2023 [pdf]
Training and Meta-Evaluating Machine Translation Evaluation Metrics at the Paragraph Level
Daniel Deutsch, Juraj Juraska, Mara Finkelstein, Markus Freitag
WMT 2023 [pdf]
Findings of the 2023 Conference on Machine Translation (WMT23): LLMs Are Here but Not Quite There Yet
Tom Kocmi, Eleftherios Avramidis, Rachel Bawden, Ondřej Bojar, Anton Dvorkovich, Christian Federmann, Mark Fishel, Markus Freitag, Thamme Gowda, Roman Grundkiewicz, Barry Haddow, Philipp Koehn, Benjamin Marie, Christof Monz, Makoto Morishita, Kenton Murray, Makoto Nagata, Toshiaki Nakazawa, Martin Popel, Maja Popović, Mariya Shmatova
WMT 2023 [pdf]
Findings of the WMT 2023 Shared Task on Automatic Post-Editing
Pushpak Bhattacharyya, Rajen Chatterjee, Markus Freitag, Diptesh Kanojia, Matteo Negri, Marco Turchi
WMT 2023 [pdf]
MetricX-23: The Google Submission to the WMT 2023 Metrics Shared Task
Juraj Juraska, Mara Finkelstein, Daniel Deutsch, Aditya Siddhant, Mehdi Mirzazadeh, Markus Freitag
WMT 2023 [pdf]
Quality Estimation Using Minimum Bayes Risk
Subhajit Naskar, Daniel Deutsch, Markus Freitag
WMT 2023 [pdf]
Epsilon Sampling Rocks: Investigating Sampling Strategies for Minimum Bayes Risk Decoding for Machine Translation
Markus Freitag, Behrooz Ghorbani, Patrick Fernandes
EMNLP 2023 [pdf]
Ties Matter: Modifying Kendall's Tau for Modern Metric Meta-Evaluation
Daniel Deutsch, George Foster, Markus Freitag
EMNLP 2023 [pdf]
INSTRUCTSCORE: Towards Explainable Text Generation Evaluation with Automatic Feedback
Wenda Xu, Danqing Wang, Liangming Pan, Zhenqiao Song, Markus Freitag, William Yang Wang, Lei Li
EMNLP 2023 [pdf]
Prompting PaLM for Translation: Assessing Strategies and Performance
David Vilar, Markus Freitag, Colin Cherry, Jiaming Luo, Viresh Ratnakar, George Foster
ACL 2023 [pdf]
PaLM 2 Technical Report
Google
Arxiv [pdf]
Scaling Laws for Multilingual Neural Machine Translation
Patrick Fernandes, Behrooz Ghorbani, Xavier Garcia, Markus Freitag, Orhan Firat
ICML 2023 [pdf]
Language Models are Multilingual Chain-of-thought Reasoners
Freda Shi, Mirac Suzgun, Markus Freitag, Xuezhi Wang, Suraj Srivats, Soroush Vosoughi, Hyung Won Chung, Yi Tay, Sebastian Ruder, Denny Zhou, Dipanjan Das, Jason Wei
ICLR 2023 [pdf]

2022

High Quality Rather than High Model Probability: Minimum Bayes Risk Decoding with Neural Metrics
Markus Freitag, David Grangier, Qijun Tan, Bowen Liang
TACL [pdf] [video]
A Natural Diet: Towards Improving Naturalness of Machine Translation Output
Markus Freitag, David Vilar, David Grangier, Colin Cherry, George Foster
ACL 2022 [pdf] [video]
Results of WMT22 Metrics Shared Task: Stop Using BLEU – Neural Metrics Are Better and More Robust
Markus Freitag, Ricardo Rei, Nitika Mathur, Chi-kiu Lo, Craig Stewart, Eleftherios Avramidis, Tom Kocmi, George Foster, Alon Lavie, André FT Martins
WMT 2022 [pdf]
Scaling Laws for Neural Machine Translation
Behrooz Ghorbani, Orhan Firat, Markus Freitag, Ankur Bapna, Maxim Krikun, Xavier Garcia, Ciprian Chelba, Colin Cherry
ICLR 2022 [pdf]
Original or Translated? A Causal Analysis of the Impact of Translationese on Machine Translation Performance
Jingwei Ni, Zhijing Jin, Markus Freitag, Mrinmaya Sachan, Bernhard Schölkopf
NAACL 2022 [pdf] [video]
Toward More Effective Human Evaluation for Machine Translation
Belén Saldías Fuentes, George Foster, Markus Freitag, Qijun Tan
HumEval [pdf] [video]
Proceedings of the Seventh Conference on Machine Translation (WMT)
Philipp Koehn, Loïc Barrault, Ondřej Bojar, Fethi Bougares, Rajen Chatterjee, Marta R Costa-jussà, Christian Federmann, Mark Fishel, Alexander Fraser, Markus Freitag, Yvette Graham, Roman Grundkiewicz, Paco Guzman, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Tom Kocmi, André FT Martins, Makoto Morishita, Christof Monz, Masaaki Nagata, Toshiaki Nakazawa, Matteo Negri, Aurelie Neveol, Mariana Neves, Martin Popel, Marco Turchi, Marcos Zampieri
WMT 2022 [pdf]
Findings of the WMT 2022 Shared Task on Automatic Post-Editing
Pushpak Bhattacharyya, Rajen Chatterjee, Markus Freitag, Diptesh Kanojia, Matteo Negri, Marco Turchi
WMT 2022 [pdf]
On Systematic Style Differences between Unsupervised and Supervised MT and an Application for High-Resource Machine Translation
Kelly Marchisio, Markus Freitag, David Grangier
NAACL 2022 [pdf] [video]

2021

Experts, Errors, and Context: A Large-Scale Study of Human Evaluation for Machine Translation
Markus Freitag, George Foster, David Grangier, Viresh Ratnakar, Qijun Tan, Wolfgang Macherey
TACL [pdf]
Results of the WMT21 Metrics Shared Task: Evaluating Metrics with Expert-based Human Evaluations on TED and News Domain
Markus Freitag, Ricardo Rei, Nitika Mathur, Chi-kiu Lo, Craig Stewart, George Foster, Alon Lavie, Ondřej Bojar
WMT 2021 [pdf]
Assessing Reference-Free Peer Evaluation for Machine Translation
Sweta Agrawal, George Foster, Markus Freitag, Colin Cherry
NAACL 2021 [pdf] [video]
Findings of the 2021 Conference on Machine Translation (WMT21)
Farhad Akhbardeh, Arkady Arkhangorodsky, Magdalena Biesialska, Ondřej Bojar, Rajen Chatterjee, Vishrav Chaudhary, Marta R. Costa-jussa, Cristina España-Bonet, Angela Fan, Christian Federmann, Markus Freitag, Yvette Graham, Roman Grundkiewicz, Barry Haddow, Leonie Harter, Kenneth Heafield, Christopher Homan, Matthias Huck, Kwabena Amponsah-Kaakyire, Jungo Kasai, Daniel Khashabi, Kevin Knight, Tom Kocmi, Philipp Koehn, Nicholas Lourie, Christof Monz, Makoto Morishita, Masaaki Nagata, Ajay Nagesh, Toshiaki Nakazawa, Matteo Negri, Santanu Pal, Allahsera Auguste Tapo, Marco Turchi, Valentin Vydrin, Marcos Zampieri
WMT 2021 [pdf]
Using Machine Translation to Localize Task Oriented NLG Output
Scott Roy, Cliff Brunk, Kyu-Young Kim, Justin Zhao, Markus Freitag, Mihir Kale, Gagan Bansal, Sidharth Mudgal, Chris Varano
Arxiv [pdf]

2020

BLEU might be Guilty but References are not Innocent
Markus Freitag, David Grangier, Isaac Caswell
EMNLP 2020 [pdf] [video]
Translationese as a Language in “Multilingual” NMT
Parker Riley, Isaac Caswell, Markus Freitag, David Grangier
ACL 2020 [pdf] [video]
Results of the WMT20 Metrics Shared Task
Nitika Mathur, Johnny Wei, Markus Freitag, Qingsong Ma, Ondřej Bojar
WMT 2020 [pdf]
Human-Paraphrased References Improve Neural Machine Translation
Markus Freitag, George Foster, David Grangier, Colin Cherry
WMT 2020 [pdf] [video]
Complete Multilingual Neural Machine Translation
Markus Freitag, Orhan Firat
WMT 2020 [pdf] [video]
Learning to Evaluate Translation Beyond English: BLEURT Submissions to the WMT Metrics 2020 Shared Task
Thibault Sellam, Amy Pu, Hyung Won Chung, Sebastian Gehrmann, Qijun Tan, Markus Freitag, Dipanjan Das, Ankur P Parikh
WMT 2020 [pdf] [video]
KoBE: Knowledge-Based Machine Translation Evaluation
Zorik Gekhman, Roee Aharoni, Genady Beryozkin, Markus Freitag, Wolfgang Macherey
EMNLP 2020 [pdf] [video]

2019

APE at Scale and Its Implications on MT Evaluation Biases
Markus Freitag, Isaac Caswell, Scott Roy
WMT 2019 [pdf]

2018

Unsupervised Natural Language Generation with Denoising Autoencoders
Markus Freitag, Scott Roy
EMNLP 2018 [pdf]

2017

Beam Search Strategies for Neural Machine Translation
Markus Freitag, Yaser Al-Onaizan
ACL 2017 [pdf]
Ensemble Distillation for Neural Machine Translation
Markus Freitag, Yaser Al-Onaizan, Baskaran Sankaran
Arxiv [pdf]
Attention-based Vocabulary Selection for NMT Decoding
Baskaran Sankaran, Markus Freitag, Yaser Al-Onaizan
Arxiv [pdf]

2016

Fast Domain Adaptation for Neural Machine Translation
Markus Freitag, Yaser Al-Onaizan
Arxiv [pdf]

Pre-2016

You can find a complete list of my publications on my Google Scholar page.

Thesis

Investigations on Machine Translation System Combination
Markus Freitag
PhD dissertation at RWTH Aachen University, Germany [pdf]