Markus Freitag


Head of Google Translate Research & Gemini i18n Co-Lead

Email: freitag [at] google [dot] com

Google Scholar  /  Twitter  /  LinkedIn


Short CV

As a Senior Staff Research Scientist at Google, I lead Google Translate Research and co-lead Gemini i18n. My research focuses on multilingual Large Language Models (pre- and post-training), machine translation, and both human and automatic evaluation. Prior to joining Google, I was a Research Staff Member at IBM T.J. Watson Research Center. I earned my PhD in Computer Science in 2015 from RWTH Aachen University, advised by Prof. Dr. Hermann Ney.

Community Contributions

  • (Senior) Area Chair (most recent): NeurIPS 2025, ICML 2025, NeurIPS 2024, ICLR 2024, ACL 2024, NeurIPS 2023, ICLR 2023, EMNLP 2023, ACL 2023, EACL 2023
  • Program Committee: ACL, EMNLP, EACL, NAACL, NeurIPS, WMT, EAMT, MT Summit, IWSLT, Eval4NLP, COLM
  • Shared Task Organizer: WMT Metrics Task (2020 - 2025), WMT General MT Task (2021 - 2025), WMT Multilingual Instruction Following Task (2025)

Publications

    2025

  • Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities
    Gemini Team
    arXiv 2025 [pdf]
  • Preliminary Ranking of WMT25 General Machine Translation Systems
    Tom Kocmi, Eleftherios Avramidis, Rachel Bawden, Ondřej Bojar, Konstantin Dranch, Anton Dvorkovich, Sergey Dukanov, Natalia Fedorova, Mark Fishel, Markus Freitag, Thamme Gowda, Roman Grundkiewicz, Barry Haddow, Marzena Karpinska, Philipp Koehn, Howard Lakougna, Jessica Lundin, Kenton Murray, Masaaki Nagata, Stefano Perrella, Lorenzo Proietti, Martin Popel, Maja Popović, Parker Riley, Mariya Shmatova, Steinþór Steingrímsson, Lisa Yankovskaya, Vilém Zouhar
    WMT 2025 [pdf]
  • You Cannot Feed Two Birds with One Score: the Accuracy-Naturalness Tradeoff in Translation
    Gergely Flamich, David Vilar, Jan-Thorsten Peter, Markus Freitag
    COLM 2025 [pdf]
  • Enhancing Human Evaluation in Machine Translation with Comparative Judgment
    Yixiao Song, Parker Riley, Daniel Deutsch, Markus Freitag
    ACL 2025 [pdf]
  • WMT24++: Expanding the Language Coverage of WMT24 to 55 Languages & Dialects
    Daniel Deutsch, Eleftheria Briakou, Isaac Caswell, Mara Finkelstein, Rebecca Galor, Juraj Juraska, Geza Kovacs, Alison Lui, Ricardo Rei, Jason Riesa, Shruti Rijhwani, Parker Riley, Elizabeth Salesky, Firas Trabelsi, Stephanie Winkler, Biao Zhang, Markus Freitag
    ACL 2025 [pdf]
  • Overestimation in LLM Evaluation: A Controlled Large-Scale Study on Data Contamination's Impact on Machine Translation
    Muhammed Yusuf Kocyigit, Eleftheria Briakou, Daniel Deutsch, Jiaming Luo, Colin Cherry, Markus Freitag
    ICML 2025 [pdf]
    2024

  • Efficient Minimum Bayes Risk Decoding using Low-Rank Matrix Completion Algorithms
    Firas Trabelsi, David Vilar, Mara Finkelstein, Markus Freitag
    NeurIPS 2024 [pdf]
  • From Jack of All Trades to Master of One: Specializing LLM-based Autoraters to a Test Set
    Mara Finkelstein, Daniel Deutsch, Parker Riley, Juraj Juraska, Geza Kovacs, Markus Freitag
    ICML 2024 [pdf]
  • Mitigating Metric Bias in Minimum Bayes Risk Decoding
    Geza Kovacs, Daniel Deutsch, Markus Freitag
    WMT 2024 [pdf]
  • Learning from Others' Mistakes: Finetuning Machine Translation Models with Span-Level Error Annotations
    Lily H Zhang, Hamid Dadkhahi, Mara Finkelstein, Firas Trabelsi, Jiaming Luo, Markus Freitag
    ICML 2024 [pdf]
  • Beyond Human-Only: Evaluating Human-Machine Collaboration for Collecting High-Quality Translation Data
    Zhongtao Liu, Parker Riley, Daniel Deutsch, Alison Lui, Mengmeng Niu, Apu Shah, Markus Freitag
    WMT 2024 [pdf]
  • MetricX-24: The Google Submission to the WMT 2024 Metrics Shared Task
    Juraj Juraska, Daniel Deutsch, Mara Finkelstein, Markus Freitag
    WMT 2024 [pdf]
  • On the Implications of Verbose LLM Outputs: A Case Study in Translation Evaluation
    Eleftheria Briakou, Zhongtao Liu, Colin Cherry, Markus Freitag
    WMT 2024 [pdf]
  • Translating Step-by-Step: Decomposing the Translation Process for Improved Translation Quality of Long-Form Texts
    Eleftheria Briakou, Jiaming Luo, Colin Cherry, Markus Freitag
    WMT 2024 [pdf]
  • Introducing the NewsPaLM MBR and QE Dataset: LLM-Generated High-Quality Parallel Data Outperforms Traditional Web-Crawled Data
    Mara Finkelstein, David Vilar, Markus Freitag
    WMT 2024 [pdf]
  • Preliminary WMT24 Ranking of General MT Systems and LLMs
    Tom Kocmi, Eleftherios Avramidis, Rachel Bawden, Ondřej Bojar, Anton Dvorkovich, Christian Federmann, Mark Fishel, Markus Freitag, Thamme Gowda, Roman Grundkiewicz, Barry Haddow, Marzena Karpinska, Philipp Koehn, Benjamin Marie, Kenton Murray, Masaaki Nagata, Martin Popel, Maja Popović, Mariya Shmatova, Steinþór Steingrímsson, Vilém Zouhar
    WMT 2024 [pdf]
  • Are LLMs Breaking MT Metrics? Results of the WMT24 Metrics Shared Task
    Markus Freitag, Nitika Mathur, Daniel Deutsch, Chi-Kiu Lo, Eleftherios Avramidis, Ricardo Rei, Brian Thompson, Frederic Blain, Tom Kocmi, Jiayi Wang, David Ifeoluwa Adelani, Marianna Buchicchio, Chrysoula Zerva, Alon Lavie
    WMT 2024 [pdf]
  • Findings of the WMT24 General Machine Translation Shared Task: The LLM Era is Here but MT is not Solved Yet
    Tom Kocmi, Eleftherios Avramidis, Rachel Bawden, Ondřej Bojar, Anton Dvorkovich, Christian Federmann, Mark Fishel, Markus Freitag, Thamme Gowda, Roman Grundkiewicz, Barry Haddow, Marzena Karpinska, Philipp Koehn, Benjamin Marie, Christof Monz, Kenton Murray, Masaaki Nagata, Martin Popel, Maja Popović, Mariya Shmatova
    WMT 2024 [pdf]
  • Findings of the Quality Estimation Shared Task at WMT 2024: Are LLMs Closing the Gap in QE?
    Chrysoula Zerva, Frédéric Blain, José GC De Souza, Diptesh Kanojia, Sourabh Deoghare, Nuno M Guerreiro, Giuseppe Attanasio, Ricardo Rei, Constantin Orasan, Matteo Negri, Marco Turchi, Rajen Chatterjee, Pushpak Bhattacharyya, Markus Freitag, André Martins
    WMT 2024 [pdf]
  • Finding Replicable Human Evaluations via Stable Ranking Probability
    Parker Riley, Dan Deutsch, George Foster, Viresh Ratnakar, Ali Dabirmoghaddam, Markus Freitag
    NAACL 2024 [pdf]
  • Pinpoint, Not Criticize: Refining Large Language Models via Fine-Grained Actionable Feedback
    Wenda Xu, Daniel Deutsch, Mara Finkelstein, Juraj Juraska, Biao Zhang, Zhongtao Liu, William Yang Wang, Lei Li, Markus Freitag
    NAACL 2024 [pdf]
  • Quality-Aware Translation Models: Efficient Generation and Quality Estimation in a Single Model
    Christian Tomani, David Vilar, Markus Freitag, Colin Cherry, Subhajit Naskar, Mara Finkelstein, Daniel Cremers
    ACL 2024 [pdf]
  • MBR and QE Finetuning: Training-time Distillation of the Best and Most Expensive Decoding Methods
    Mara Finkelstein, Subhajit Naskar, Mehdi Mirzazadeh, Apurva Shah, Markus Freitag
    ICLR 2024 [pdf]
    2023

  • Results of WMT23 Metrics Shared Task: Metrics Might Be Guilty but References Are Not Innocent
    Markus Freitag, Nitika Mathur, Chi-kiu Lo, Eleftherios Avramidis, Ricardo Rei, Brian Thompson, Tom Kocmi, Frederic Blain, Daniel Deutsch, Craig Stewart, Chrysoula Zerva, Sheila Castilho, Alon Lavie, George Foster
    WMT 2023 [pdf]
  • The Devil Is in the Errors: Leveraging Large Language Models for Fine-grained Machine Translation Evaluation
    Patrick Fernandes, Daniel Deutsch, Mara Finkelstein, Parker Riley, André Martins, Graham Neubig, Ankush Garg, Jonathan Clark, Markus Freitag, Orhan Firat
    WMT 2023 [pdf]
  • There’s No Data like Better Data: Using QE Metrics for MT Data Filtering
    Jan-Thorsten Peter, David Vilar, Daniel Deutsch, Mara Finkelstein, Juraj Juraska, Markus Freitag
    WMT 2023 [pdf]
  • Training and Meta-Evaluating Machine Translation Evaluation Metrics at the Paragraph Level
    Daniel Deutsch, Juraj Juraska, Mara Finkelstein, Markus Freitag
    WMT 2023 [pdf]
  • Findings of the 2023 Conference on Machine Translation (WMT23): LLMs Are Here but Not Quite There Yet
    Tom Kocmi, Eleftherios Avramidis, Rachel Bawden, Ondřej Bojar, Anton Dvorkovich, Christian Federmann, Mark Fishel, Markus Freitag, Thamme Gowda, Roman Grundkiewicz, Barry Haddow, Philipp Koehn, Benjamin Marie, Christof Monz, Makoto Morishita, Kenton Murray, Makoto Nagata, Toshiaki Nakazawa, Martin Popel, Maja Popović, Mariya Shmatova
    WMT 2023 [pdf]
  • Findings of the WMT 2023 Shared Task on Automatic Post-Editing
    Pushpak Bhattacharyya, Rajen Chatterjee, Markus Freitag, Diptesh Kanojia, Matteo Negri, Marco Turchi
    WMT 2023 [pdf]
  • MetricX-23: The Google Submission to the WMT 2023 Metrics Shared Task
    Juraj Juraska, Mara Finkelstein, Daniel Deutsch, Aditya Siddhant, Mehdi Mirzazadeh, Markus Freitag
    WMT 2023 [pdf]
  • Quality Estimation Using Minimum Bayes Risk
    Subhajit Naskar, Daniel Deutsch, Markus Freitag
    WMT 2023 [pdf]
  • Epsilon Sampling Rocks: Investigating Sampling Strategies for Minimum Bayes Risk Decoding for Machine Translation
    Markus Freitag, Behrooz Ghorbani, Patrick Fernandes
    EMNLP 2023 [pdf]
  • Ties Matter: Modifying Kendall's Tau for Modern Metric Meta-Evaluation
    Daniel Deutsch, George Foster, Markus Freitag
    EMNLP 2023 [pdf]
  • INSTRUCTSCORE: Towards Explainable Text Generation Evaluation with Automatic Feedback
    Wenda Xu, Danqing Wang, Liangming Pan, Zhenqiao Song, Markus Freitag, William Yang Wang, Lei Li
    EMNLP 2023 [pdf]
  • Prompting PaLM for Translation: Assessing Strategies and Performance
    David Vilar, Markus Freitag, Colin Cherry, Jiaming Luo, Viresh Ratnakar, George Foster
    ACL 2023 [pdf]
  • PaLM 2 Technical Report
    Google
    arXiv [pdf]
  • Scaling Laws for Multilingual Neural Machine Translation
    Patrick Fernandes, Behrooz Ghorbani, Xavier Garcia, Markus Freitag, Orhan Firat
    ICML 2023 [pdf]
  • Language Models are Multilingual Chain-of-thought Reasoners
    Freda Shi, Mirac Suzgun, Markus Freitag, Xuezhi Wang, Suraj Srivats, Soroush Vosoughi, Hyung Won Chung, Yi Tay, Sebastian Ruder, Denny Zhou, Dipanjan Das, Jason Wei
    ICLR 2023 [pdf]
    2022

  • High Quality Rather than High Model Probability: Minimum Bayes Risk Decoding with Neural Metrics
    Markus Freitag, David Grangier, Qijun Tan, Bowen Liang
    TACL [pdf] [video]
  • A Natural Diet: Towards Improving Naturalness of Machine Translation Output
    Markus Freitag, David Vilar, David Grangier, Colin Cherry, George Foster
    ACL 2022 [pdf] [video]
  • Results of WMT22 Metrics Shared Task: Stop Using BLEU – Neural Metrics Are Better and More Robust
    Markus Freitag, Ricardo Rei, Nitika Mathur, Chi-kiu Lo, Craig Stewart, Eleftherios Avramidis, Tom Kocmi, George Foster, Alon Lavie, André FT Martins
    WMT 2022 [pdf]
  • Scaling Laws for Neural Machine Translation
    Behrooz Ghorbani, Orhan Firat, Markus Freitag, Ankur Bapna, Maxim Krikun, Xavier Garcia, Ciprian Chelba, Colin Cherry
    ICLR 2022 [pdf]
  • Original or Translated? A Causal Analysis of the Impact of Translationese on Machine Translation Performance
    Jingwei Ni, Zhijing Jin, Markus Freitag, Mrinmaya Sachan, Bernhard Schölkopf
    NAACL 2022 [pdf] [video]
  • Toward More Effective Human Evaluation for Machine Translation
    Belén Saldías Fuentes, George Foster, Markus Freitag, Qijun Tan
    HumEval [pdf] [video]
  • Proceedings of the Seventh Conference on Machine Translation (WMT)
    Philipp Koehn, Loïc Barrault, Ondřej Bojar, Fethi Bougares, Rajen Chatterjee, Marta R Costa-jussà, Christian Federmann, Mark Fishel, Alexander Fraser, Markus Freitag, Yvette Graham, Roman Grundkiewicz, Paco Guzman, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Tom Kocmi, André FT Martins, Makoto Morishita, Christof Monz, Masaaki Nagata, Toshiaki Nakazawa, Matteo Negri, Aurelie Neveol, Mariana Neves, Martin Popel, Marco Turchi, Marcos Zampieri
    WMT 2022 [pdf]
  • Findings of the WMT 2022 Shared Task on Automatic Post-Editing
    Pushpak Bhattacharyya, Rajen Chatterjee, Markus Freitag, Diptesh Kanojia, Matteo Negri, Marco Turchi
    WMT 2022 [pdf]
  • On Systematic Style Differences between Unsupervised and Supervised MT and an Application for High-Resource Machine Translation
    Kelly Marchisio, Markus Freitag, David Grangier
    NAACL 2022 [pdf] [video]
    2021

  • Experts, Errors, and Context: A Large-Scale Study of Human Evaluation for Machine Translation
    Markus Freitag, George Foster, David Grangier, Viresh Ratnakar, Qijun Tan, Wolfgang Macherey
    TACL [pdf]
  • Results of the WMT21 Metrics Shared Task: Evaluating Metrics with Expert-based Human Evaluations on TED and News Domain
    Markus Freitag, Ricardo Rei, Nitika Mathur, Chi-kiu Lo, Craig Stewart, George Foster, Alon Lavie, Ondřej Bojar
    WMT 2021 [pdf]
  • Assessing Reference-Free Peer Evaluation for Machine Translation
    Sweta Agrawal, George Foster, Markus Freitag, Colin Cherry
    NAACL 2021 [pdf] [video]
  • Findings of the 2021 Conference on Machine Translation (WMT21)
    Farhad Akhbardeh, Arkady Arkhangorodsky, Magdalena Biesialska, Ondřej Bojar, Rajen Chatterjee, Vishrav Chaudhary, Marta R. Costa-jussa, Cristina España-Bonet, Angela Fan, Christian Federmann, Markus Freitag, Yvette Graham, Roman Grundkiewicz, Barry Haddow, Leonie Harter, Kenneth Heafield, Christopher Homan, Matthias Huck, Kwabena Amponsah-Kaakyire, Jungo Kasai, Daniel Khashabi, Kevin Knight, Tom Kocmi, Philipp Koehn, Nicholas Lourie, Christof Monz, Makoto Morishita, Masaaki Nagata, Ajay Nagesh, Toshiaki Nakazawa, Matteo Negri, Santanu Pal, Allahsera Auguste Tapo, Marco Turchi, Valentin Vydrin, Marcos Zampieri
    WMT 2021 [pdf]
  • Using Machine Translation to Localize Task Oriented NLG Output
    Scott Roy, Cliff Brunk, Kyu-Young Kim, Justin Zhao, Markus Freitag, Mihir Kale, Gagan Bansal, Sidharth Mudgal, Chris Varano
    arXiv [pdf]
    2020

  • BLEU might be Guilty but References are not Innocent
    Markus Freitag, David Grangier, Isaac Caswell
    EMNLP 2020 [pdf] [video]
  • Translationese as a Language in “Multilingual” NMT
    Parker Riley, Isaac Caswell, Markus Freitag, David Grangier
    ACL 2020 [pdf] [video]
  • Results of the WMT20 Metrics Shared Task
    Nitika Mathur, Johnny Wei, Markus Freitag, Qingsong Ma, Ondřej Bojar
    WMT 2020 [pdf]
  • Human-Paraphrased References Improve Neural Machine Translation
    Markus Freitag, George Foster, David Grangier, Colin Cherry
    WMT 2020 [pdf] [video]
  • Complete Multilingual Neural Machine Translation
    Markus Freitag, Orhan Firat
    WMT 2020 [pdf] [video]
  • Learning to Evaluate Translation Beyond English: BLEURT Submissions to the WMT Metrics 2020 Shared Task
    Thibault Sellam, Amy Pu, Hyung Won Chung, Sebastian Gehrmann, Qijun Tan, Markus Freitag, Dipanjan Das, Ankur P Parikh
    WMT 2020 [pdf] [video]
  • KoBE: Knowledge-Based Machine Translation Evaluation
    Zorik Gekhman, Roee Aharoni, Genady Beryozkin, Markus Freitag, Wolfgang Macherey
    EMNLP 2020 [pdf] [video]
    2019

  • APE at Scale and Its Implications on MT Evaluation Biases
    Markus Freitag, Isaac Caswell, Scott Roy
    WMT 2019 [pdf]
    2018

  • Unsupervised Natural Language Generation with Denoising Autoencoders
    Markus Freitag, Scott Roy
    EMNLP 2018 [pdf]
    2017

  • Beam Search Strategies for Neural Machine Translation
    Markus Freitag, Yaser Al-Onaizan
    ACL 2017 [pdf]
  • Ensemble Distillation for Neural Machine Translation
    Markus Freitag, Yaser Al-Onaizan, Baskaran Sankaran
    arXiv [pdf]
  • Attention-based Vocabulary Selection for NMT Decoding
    Baskaran Sankaran, Markus Freitag, Yaser Al-Onaizan
    arXiv [pdf]
    2016

  • Fast Domain Adaptation for Neural Machine Translation
    Markus Freitag, Yaser Al-Onaizan
    arXiv [pdf]
    Pre-2016

    You can find a complete list of my publications on my Google Scholar page.

    Thesis

    • Investigations on Machine Translation System Combination
      Markus Freitag
      PhD dissertation at RWTH Aachen University, Germany [pdf]