Markus Freitag


Head of Google Translate Research

Email: freitag [at] google [dot] com

Google Scholar  /  Twitter  /  LinkedIn


Short CV

I am currently a Senior Staff Research Scientist and head of Google Translate Research in Mountain View, CA. My research interests include machine translation, human and automatic evaluation of NLP systems, decoding strategies, model training, and data processing. Before joining Google, I worked as a Research Staff Member at IBM in Yorktown Heights, NY. I received my PhD in Computer Science from RWTH Aachen University in 2015 under the supervision of Prof. Dr. Hermann Ney.

Current Research Interests

From 2021 onwards, we have primarily focused on researching better ways to evaluate the performance of NLP systems, either by humans or automatically via learned metrics. We have demonstrated that these evaluation improvements can be applied directly to NLP systems to improve their performance. For example, a learned metric can be incorporated as a utility/reward function into the inference algorithm. If you want to learn more about the importance of evaluation and how to use human feedback to improve NLP systems, you can watch my invited talk at HumEval 2022.
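
As a rough illustration of using a metric as a utility function at inference time, the sketch below picks one translation from a set of sampled candidates in the style of Minimum Bayes Risk (MBR) decoding. It is a minimal sketch under stated assumptions, not the implementation from any paper listed on this page: the function names `unigram_f1` and `mbr_select` are hypothetical, and the toy unigram-overlap score only stands in for a learned metric.

```python
from collections import Counter
from typing import Callable, Sequence


def unigram_f1(hypothesis: str, reference: str) -> float:
    """Toy utility: unigram F1 overlap (an assumed stand-in for a learned metric)."""
    hyp, ref = Counter(hypothesis.split()), Counter(reference.split())
    overlap = sum((hyp & ref).values())  # multiset intersection of tokens
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def mbr_select(
    candidates: Sequence[str],
    utility: Callable[[str, str], float] = unigram_f1,
) -> str:
    """Return the candidate with the highest average utility, using the
    other candidates as pseudo-references."""
    if not candidates:
        raise ValueError("candidates must not be empty")

    def expected_utility(i: int) -> float:
        others = [c for j, c in enumerate(candidates) if j != i]
        if not others:
            return 0.0
        return sum(utility(candidates[i], o) for o in others) / len(others)

    best_index = max(range(len(candidates)), key=expected_utility)
    return candidates[best_index]


# Hypothetical candidate translations sampled from a model.
samples = [
    "the cat sat on the mat",
    "the cat sat on a mat",
    "a dog stood near the door",
]
best = mbr_select(samples)
```

In this toy setup the selected candidate is the one that agrees most with the rest of the sample set. Swapping `unigram_f1` for a stronger metric changes which candidate is preferred without changing the selection procedure.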

Community Contributions

  • Program Committee: ACL, EMNLP, EACL, NAACL, NeurIPS, WMT, EAMT, MT Summit, IWSLT, Eval4NLP, COLM
  • Area Chair: EMNLP 2021, EACL 2023, EMNLP 2023, NeurIPS 2023, ICLR 2023
  • Senior Area Chair: ACL 2023, ACL 2024
  • Organizer of the WMT Metrics Shared Task: WMT 2020, WMT 2021, WMT 2022, WMT 2023


Invited Talks

  • Importance of Focusing on Evaluation
    MT Marathon 2022 [pdf]
  • A journey of MT research - Why it is crucial to work on evaluation
    HumEval 2022 [recording]
  • A list of older invited talks will follow soon.

Publications

    2024

  • Finding Replicable Human Evaluations via Stable Ranking Probability
    Parker Riley, Dan Deutsch, George Foster, Viresh Ratnakar, Ali Dabirmoghaddam, Markus Freitag
    NAACL 2024 [pdf]
  • Pinpoint, Not Criticize: Refining Large Language Models via Fine-Grained Actionable Feedback
    Wenda Xu, Daniel Deutsch, Mara Finkelstein, Juraj Juraska, Biao Zhang, Zhongtao Liu, William Yang Wang, Lei Li, Markus Freitag
    NAACL 2024 [pdf]
  • Quality-Aware Translation Models: Efficient Generation and Quality Estimation in a Single Model
    Christian Tomani, David Vilar, Markus Freitag, Colin Cherry, Subhajit Naskar, Mara Finkelstein, Daniel Cremers
    ACL 2024 [pdf]
  • MBR and QE Finetuning: Training-time Distillation of the Best and Most Expensive Decoding Methods
    Mara Finkelstein, Subhajit Naskar, Mehdi Mirzazadeh, Apurva Shah, Markus Freitag
    ICLR 2024 [pdf]

    2023

  • Results of WMT23 Metrics Shared Task: Metrics Might Be Guilty but References Are Not Innocent
    Markus Freitag, Nitika Mathur, Chi-kiu Lo, Eleftherios Avramidis, Ricardo Rei, Brian Thompson, Tom Kocmi, Frederic Blain, Daniel Deutsch, Craig Stewart, Chrysoula Zerva, Sheila Castilho, Alon Lavie and George Foster
    WMT 2023 [pdf]
  • The Devil Is in the Errors: Leveraging Large Language Models for Fine-grained Machine Translation Evaluation
    Patrick Fernandes, Daniel Deutsch, Mara Finkelstein, Parker Riley, André Martins, Graham Neubig, Ankush Garg, Jonathan Clark, Markus Freitag, Orhan Firat
    WMT 2023 [pdf]
  • There’s No Data like Better Data: Using QE Metrics for MT Data Filtering
    Jan-Thorsten Peter, David Vilar, Daniel Deutsch, Mara Finkelstein, Juraj Juraska and Markus Freitag
    WMT 2023 [pdf]
  • Training and Meta-Evaluating Machine Translation Evaluation Metrics at the Paragraph Level
    Daniel Deutsch, Juraj Juraska, Mara Finkelstein and Markus Freitag
    WMT 2023 [pdf]
  • Findings of the 2023 Conference on Machine Translation (WMT23): LLMs Are Here but Not Quite There Yet
    Tom Kocmi, Eleftherios Avramidis, Rachel Bawden, Ondřej Bojar, Anton Dvorkovich, Christian Federmann, Mark Fishel, Markus Freitag, Thamme Gowda, Roman Grundkiewicz, Barry Haddow, Philipp Koehn, Benjamin Marie, Christof Monz, Makoto Morishita, Kenton Murray, Makoto Nagata, Toshiaki Nakazawa, Martin Popel, Maja Popović and Mariya Shmatova
    WMT 2023 [pdf]
  • Findings of the WMT 2023 Shared Task on Automatic Post-Editing
    Pushpak Bhattacharyya, Rajen Chatterjee, Markus Freitag, Diptesh Kanojia, Matteo Negri and Marco Turchi
    WMT 2023 [pdf]
  • MetricX-23: The Google Submission to the WMT 2023 Metrics Shared Task
    Juraj Juraska, Mara Finkelstein, Daniel Deutsch, Aditya Siddhant, Mehdi Mirzazadeh and Markus Freitag
    WMT 2023 [pdf]
  • Quality Estimation Using Minimum Bayes Risk
    Subhajit Naskar, Daniel Deutsch and Markus Freitag
    WMT 2023 [pdf]
  • Epsilon Sampling Rocks: Investigating Sampling Strategies for Minimum Bayes Risk Decoding for Machine Translation
    Markus Freitag, Behrooz Ghorbani, Patrick Fernandes
    EMNLP 2023 [pdf]
  • Ties Matter: Modifying Kendall's Tau for Modern Metric Meta-Evaluation
    Daniel Deutsch, George Foster, Markus Freitag
    EMNLP 2023 [pdf]
  • INSTRUCTSCORE: Towards Explainable Text Generation Evaluation with Automatic Feedback
    Wenda Xu, Danqing Wang, Liangming Pan, Zhenqiao Song, Markus Freitag, William Yang Wang, Lei Li
    EMNLP 2023 [pdf]
  • Prompting PaLM for Translation: Assessing Strategies and Performance
    David Vilar, Markus Freitag, Colin Cherry, Jiaming Luo, Viresh Ratnakar, George Foster
    ACL 2023 [pdf]
  • PaLM 2 Technical Report
    Google
    arXiv [pdf]
  • Scaling Laws for Multilingual Neural Machine Translation
    Patrick Fernandes, Behrooz Ghorbani, Xavier Garcia, Markus Freitag, Orhan Firat
    ICML 2023 [pdf]
  • Language Models are Multilingual Chain-of-thought Reasoners
    Freda Shi, Mirac Suzgun, Markus Freitag, Xuezhi Wang, Suraj Srivats, Soroush Vosoughi, Hyung Won Chung, Yi Tay, Sebastian Ruder, Denny Zhou, Dipanjan Das, Jason Wei
    ICLR 2023 [pdf]

    2022

  • High Quality Rather than High Model Probability: Minimum Bayes Risk Decoding with Neural Metrics
    Markus Freitag, David Grangier, Qijun Tan, Bowen Liang
    TACL [pdf] [video]
  • A Natural Diet: Towards Improving Naturalness of Machine Translation Output
    Markus Freitag, David Vilar, David Grangier, Colin Cherry, George Foster
    ACL 2022 [pdf] [video]
  • Results of WMT22 Metrics Shared Task: Stop Using BLEU – Neural Metrics Are Better and More Robust
    Markus Freitag, Ricardo Rei, Nitika Mathur, Chi-kiu Lo, Craig Stewart, Eleftherios Avramidis, Tom Kocmi, George Foster, Alon Lavie, André FT Martins
    WMT 2022 [pdf]
  • Scaling Laws for Neural Machine Translation
    Behrooz Ghorbani, Orhan Firat, Markus Freitag, Ankur Bapna, Maxim Krikun, Xavier Garcia, Ciprian Chelba, Colin Cherry
    ICLR 2022 [pdf]
  • Original or Translated? A Causal Analysis of the Impact of Translationese on Machine Translation Performance
    Jingwei Ni, Zhijing Jin, Markus Freitag, Mrinmaya Sachan, Bernhard Schölkopf
    NAACL 2022 [pdf] [video]
  • Toward More Effective Human Evaluation for Machine Translation
    Belén Saldías Fuentes, George Foster, Markus Freitag, Qijun Tan
    HumEval [pdf] [video]
  • Proceedings of the Seventh Conference on Machine Translation (WMT)
    Philipp Koehn, Loïc Barrault, Ondřej Bojar, Fethi Bougares, Rajen Chatterjee, Marta R Costa-jussà, Christian Federmann, Mark Fishel, Alexander Fraser, Markus Freitag, Yvette Graham, Roman Grundkiewicz, Paco Guzman, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Tom Kocmi, André FT Martins, Makoto Morishita, Christof Monz, Masaaki Nagata, Toshiaki Nakazawa, Matteo Negri, Aurelie Neveol, Mariana Neves, Martin Popel, Marco Turchi, Marcos Zampieri
    WMT 2022 [pdf]
  • Findings of the WMT 2022 Shared Task on Automatic Post-Editing
    Pushpak Bhattacharyya, Rajen Chatterjee, Markus Freitag, Diptesh Kanojia, Matteo Negri, Marco Turchi
    WMT 2022 [pdf]
  • On Systematic Style Differences between Unsupervised and Supervised MT and an Application for High-Resource Machine Translation
    Kelly Marchisio, Markus Freitag, David Grangier
    NAACL 2022 [pdf] [video]

    2021

  • Experts, Errors, and Context: A Large-Scale Study of Human Evaluation for Machine Translation
    Markus Freitag, George Foster, David Grangier, Viresh Ratnakar, Qijun Tan, Wolfgang Macherey
    TACL [pdf]
  • Results of the WMT21 Metrics Shared Task: Evaluating Metrics with Expert-based Human Evaluations on TED and News Domain
    Markus Freitag, Ricardo Rei, Nitika Mathur, Chi-kiu Lo, Craig Stewart, George Foster, Alon Lavie, Ondřej Bojar
    WMT 2021 [pdf]
  • Assessing Reference-Free Peer Evaluation for Machine Translation
    Sweta Agrawal, George Foster, Markus Freitag, Colin Cherry
    NAACL 2021 [pdf] [video]
  • Findings of the 2021 Conference on Machine Translation (WMT21)
    Farhad Akhbardeh, Arkady Arkhangorodsky, Magdalena Biesialska, Ondřej Bojar, Rajen Chatterjee, Vishrav Chaudhary, Marta R. Costa-jussa, Cristina España-Bonet, Angela Fan, Christian Federmann, Markus Freitag, Yvette Graham, Roman Grundkiewicz, Barry Haddow, Leonie Harter, Kenneth Heafield, Christopher Homan, Matthias Huck, Kwabena Amponsah-Kaakyire, Jungo Kasai, Daniel Khashabi, Kevin Knight, Tom Kocmi, Philipp Koehn, Nicholas Lourie, Christof Monz, Makoto Morishita, Masaaki Nagata, Ajay Nagesh, Toshiaki Nakazawa, Matteo Negri, Santanu Pal, Allahsera Auguste Tapo, Marco Turchi, Valentin Vydrin, Marcos Zampieri
    WMT 2021 [pdf]
  • Using Machine Translation to Localize Task Oriented NLG Output
    Scott Roy, Cliff Brunk, Kyu-Young Kim, Justin Zhao, Markus Freitag, Mihir Kale, Gagan Bansal, Sidharth Mudgal, Chris Varano
    arXiv [pdf]

    2020

  • BLEU might be Guilty but References are not Innocent
    Markus Freitag, David Grangier, Isaac Caswell
    EMNLP 2020 [pdf] [video]
  • Translationese as a Language in “Multilingual” NMT
    Parker Riley, Isaac Caswell, Markus Freitag, David Grangier
    ACL 2020 [pdf] [video]
  • Results of the WMT20 Metrics Shared Task
    Nitika Mathur, Johnny Wei, Markus Freitag, Qingsong Ma, Ondřej Bojar
    WMT 2020 [pdf]
  • Human-Paraphrased References Improve Neural Machine Translation
    Markus Freitag, George Foster, David Grangier, Colin Cherry
    WMT 2020 [pdf] [video]
  • Complete Multilingual Neural Machine Translation
    Markus Freitag, Orhan Firat
    WMT 2020 [pdf] [video]
  • Learning to Evaluate Translation Beyond English: BLEURT Submissions to the WMT Metrics 2020 Shared Task
    Thibault Sellam, Amy Pu, Hyung Won Chung, Sebastian Gehrmann, Qijun Tan, Markus Freitag, Dipanjan Das, Ankur P Parikh
    WMT 2020 [pdf] [video]
  • KoBE: Knowledge-Based Machine Translation Evaluation
    Zorik Gekhman, Roee Aharoni, Genady Beryozkin, Markus Freitag, Wolfgang Macherey
    EMNLP 2020 [pdf] [video]

    2019

  • APE at Scale and Its Implications on MT Evaluation Biases
    Markus Freitag, Isaac Caswell, Scott Roy
    WMT 2019 [pdf]

    2018

  • Unsupervised Natural Language Generation with Denoising Autoencoders
    Markus Freitag, Scott Roy
    EMNLP 2018 [pdf]

    2017

  • Beam Search Strategies for Neural Machine Translation
    Markus Freitag, Yaser Al-Onaizan
    ACL 2017 [pdf]
  • Ensemble Distillation for Neural Machine Translation
    Markus Freitag, Yaser Al-Onaizan, Baskaran Sankaran
    arXiv [pdf]
  • Attention-based Vocabulary Selection for NMT Decoding
    Baskaran Sankaran, Markus Freitag, Yaser Al-Onaizan
    arXiv [pdf]

    2016

  • Fast Domain Adaptation for Neural Machine Translation
    Markus Freitag, Yaser Al-Onaizan
    arXiv [pdf]

    Pre-2016

    You can find a complete list of my publications on my Google Scholar page.

Thesis

  • Investigations on Machine Translation System Combination
    Markus Freitag
    PhD dissertation at RWTH Aachen University, Germany [pdf]