Zhang-Wei Hong

Research: I develop reinforcement learning (RL) methods for computational discovery, the search for novel solutions in domains ranging from materials science to robotics. My work addresses two fundamental challenges in applying RL to real-world discovery:

  • Learning from sparse feedback: Discovery problems often provide limited reward signals, making it difficult for RL agents to learn effectively. My research develops principled approaches to accelerate learning despite sparse supervision. [NeurIPS'24, NeurIPS'23, ICML'23, ICML'23, ICLR'23, ICLR'22, ICLR'22]
  • Generating diverse solutions: Many discovery problems require multiple high-quality candidates rather than a single optimum. I develop methods that produce diverse solution sets for applications like drug discovery and AI safety. [ICLR'24]

Bio: I am a Principal Investigator and Research Staff Member at the MIT-IBM Watson AI Lab. I received my Ph.D. in Electrical Engineering and Computer Science from MIT, advised by Prof. Pulkit Agrawal. I was a recipient of the Qualcomm Innovation Fellowship (2024). Prior to MIT, I earned my B.S. and M.S. from National Tsing Hua University, where I worked with Prof. Chun-Yi Lee and Prof. Min Sun. I have also collaborated with Prof. Jan Peters at TU Darmstadt and conducted research at Preferred Networks.


  Honors & Awards

Qualcomm Innovation Fellowship, North America 2024


  Mentorship

I am fortunate to work with talented students and researchers:

Prospective collaborators: I am always open to collaborating on ideas related to reinforcement learning. Feel free to reach out via email.



  Experience

Principal Investigator | Research Staff Member Jan. 2025 - Present
MIT-IBM Watson AI Lab
Research Intern Jun. 2023 - Sep. 2023
MIT-IBM Watson AI Lab
Advisor: Akash Srivastava
Research Intern Jun. 2022 - Oct. 2022
Microsoft Research Montreal
Advisors: Romain Laroche and Remi Tachet des Combes

Research Intern Jun. 2019 - Oct. 2019
Preferred Networks
Advisors: Prabhat Nagarajan and Dr. Guilherme Maeda

Research Intern Feb. 2019 - Jun. 2019
Appier
Advisor: Prof. Min Sun

Visiting Researcher Jul. 2018 - Oct. 2018
Intelligent Autonomous Systems (IAS) Group, TU Darmstadt
Advisor: Prof. Jan Peters



  Selected Publications

RL Tango: Reinforcing Generator and Verifier Together for Language Reasoning
Kaiwen Zha, Zhengqi Gao, Maohao Shen, Zhang-Wei Hong, Duane S. Boning, Dina Katabi
NeurIPS, 2025
Paper | Code

BOAD: Discovering Hierarchical Software Engineering Agents via Bandit Optimization
Iris Xu, Guangtao Zeng, Zexue He, Charles Jin, Aldo Pareja, Dan Gutfreund, Chuang Gan, Zhang-Wei Hong
ICLR, 2026
Paper | Code

Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search
Maohao Shen, Guangtao Zeng, Zhenting Qi, Zhang-Wei Hong, Zhenfang Chen, Wei Lu, Gregory Wornell, Subhro Das, David Cox, Chuang Gan
ICML, 2025
Paper | Code

Red Teaming Language-Conditioned Robot Models via Vision Language Models
Sathwik Karnik*, Zhang-Wei Hong*, Nishant Abhangi*, Yen-Chen Lin, Tsun-Hsuan Wang, Pulkit Agrawal
NeurIPS Safe Generative AI Workshop, 2024
Paper | Bibtex

Curiosity-driven Red-teaming for Large Language Models
Zhang-Wei Hong, Idan Shenfeld, Tsun-Hsuan Wang, Yung-Sung Chuang, Aldo Pareja, James R. Glass, Akash Srivastava, Pulkit Agrawal
ICLR, 2024  |  MIT News
Paper | Code | Bibtex


  Course Materials

Lecture Notes: 6.8200 Computational Sensorimotor Learning, MIT


  Invited Talks

ZEW Workshop on Red Teaming Generative AI Models
Microsoft Turing Team
Macro Eyes
Toronto AI in Robotics Seminar, University of Toronto


  Teaching

6.484 Computational Sensorimotor Learning, MIT Spring 2022

6.S090 Deep Learning for Control, MIT Spring 2021

Deep Learning Institute, NVIDIA Taiwan Spring 2018





template from jonbarron