Zhang-Wei Hong
Research: I develop reinforcement learning (RL) methods for computational discovery—finding novel solutions in domains ranging from materials science to robotics. My work addresses two fundamental challenges in applying RL to real-world discovery:
- Learning from sparse feedback: Discovery problems often provide limited reward signals, making it difficult for RL agents to learn effectively. My research develops principled approaches to accelerate learning despite sparse supervision. [NeurIPS'24, NeurIPS'23, ICML'23, ICML'23, ICLR'23, ICLR'22, ICLR'22]
- Generating diverse solutions: Many discovery problems require multiple high-quality candidates rather than a single optimum. I develop methods that produce diverse solution sets for applications like drug discovery and AI safety. [ICLR'24]
Bio:
I am a Principal Investigator and Research Staff Member at the MIT-IBM Watson AI Lab. I received my Ph.D. in Electrical Engineering and Computer Science from MIT, advised by Prof. Pulkit Agrawal. I was a recipient of the Qualcomm Innovation Fellowship (2024). Prior to MIT, I earned my B.S. and M.S. from National Tsing Hua University, where I worked with Prof. Chun-Yi Lee and Prof. Min Sun. I have also collaborated with Prof. Jan Peters at TU Darmstadt and conducted research at Preferred Networks.
Mentorship
I am fortunate to work with talented students and researchers:
- Iris Xu - MIT Undergraduate, former intern at MIT-IBM Watson AI Lab
- Harry Sillifant - MIT Undergraduate
- Sunshine Jiang - MIT MEng
- Raina Wu - MIT Undergraduate
- Anna Yang - MIT MEng
- Chen Bo Calvin Zhang - MIT Visiting Researcher, ETH Zurich Master's student, now at Scale AI
- Phat Nguyen - UMass Amherst Undergraduate, now Research Assistant at MIT CSAIL
- Nishant Abhangi - MIT Undergraduate, now at Sunrise Futures LLC
- Zechu Li - MIT Visiting Student, Master's student at TU Darmstadt
- Siddhant Mukherjee - MIT Undergraduate, now at Citadel
- Sathwik Karnik - MIT Undergraduate, now Ph.D. student at Stanford
- Chi-Chang Lee - National Taiwan University Master, now Ph.D. student at UMD
- Srinath Mahankali - MIT Undergraduate, now founding engineer at stealth startup
- Eric Chen - MIT Undergraduate, now Ph.D. student at MIT
Prospective collaborators: I am always open to collaborating on ideas related to reinforcement learning. Feel free to reach out via email.
Research intern
Feb. 2019 - Jun. 2019
Appier
Advisor: Prof. Min Sun
RL for Foundation Models
Applications in Science
Fundamental RL
RL Tango: Reinforcing Generator and Verifier Together for Language Reasoning
Kaiwen Zha, Zhengqi Gao, Maohao Shen, Zhang-Wei Hong, Duane S. Boning, Dina Katabi
NeurIPS, 2025
Paper | Code
BOAD: Discovering Hierarchical Software Engineering Agents via Bandit Optimization
Iris Xu, Guangtao Zeng, Zexue He, Charles Jin, Aldo Pareja, Dan Gutfreund, Chuang Gan, Zhang-Wei Hong
ICLR, 2026
Paper | Code
Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search
Maohao Shen, Guangtao Zeng, Zhenting Qi, Zhang-Wei Hong, Zhenfang Chen, Wei Lu, Gregory Wornell, Subhro Das, David Cox, Chuang Gan
ICML, 2025
Paper | Code
Red Teaming Language-Conditioned Robot Models via Vision Language Models
Sathwik Karnik*, Zhang-Wei Hong*, Nishant Abhangi*, Yen-Chen Lin, Tsun-Hsuan Wang, Pulkit Agrawal
NeurIPS Safe Generative AI Workshop, 2024
Paper | Bibtex
Curiosity-driven Red-teaming for Large Language Models
Zhang-Wei Hong, Idan Shenfeld, Tsun-Hsuan Wang, Yung-Sung Chuang, Aldo Pareja, James R. Glass, Akash Srivastava, Pulkit Agrawal
ICLR, 2024 | MIT News
Paper | Code | Bibtex
A Multimodal Robotic Platform for Multi-Element Electrocatalyst Discovery
Zhen Zhang, Zhichu Ren, Chia-Wei Hsu, Weibin Chen, Zhang-Wei Hong, Chi-Feng Lee, Aubrey Penn, Hongbin Xu, Daniel J Zheng, Shuhan Miao, Yimeng Huang, Yifan Gao, Weiyin Chen, Hugh Smith, Yaoshen Niu, Yunsheng Tian, Ying-Rui Lu, Yu-Cheng Shao, Sipei Li, Hsiao-Tsu Wang, Iwnetim I Abate, Pulkit Agrawal, Yang Shao-Horn, Ju Li
Nature, 2025
Paper
A Distributional Reinforcement Learning Model for Optimal Glucose Control After Cardiac Surgery
Jacob M Desman, Zhang-Wei Hong, Moein Sabounchi, Ashwin S Sawant, Jaskirat Gill, Ana C Costa, Gagan Kumar, Rajeev Sharma, Arpeta Gupta, Paul McCarthy, Veena Nandwani, Doug Powell, Alexandra Carideo, Donnie Goodwin, Sanam Ahmed, Umesh Gidwani, Matthew A Levin, Robin Varghese, Farzan Filsoufi, Robert Freeman, Avniel Shetreat-Klein, Alexander W Charney, Ira Hofer, Lili Chan, David Reich, Patricia Kovatch, Roopa Kohli-Seth, Monica Kraft, Pulkit Agrawal, John A Kellum, Girish N Nadkarni, Ankit Sakhuja
npj Digital Medicine, 2025
Paper
Towards Generating Stable Materials via Large Language Models with Reinforcement Learning Finetuning
Zhang-Wei Hong*, Nofit Segal*, Aviv Netanyahu, Hoje Chun, Rafael Gomez-Bombarelli, Pulkit Agrawal
NeurIPS AI4Science Workshop, 2025
Paper
Maximizing Velocity by Minimizing Energy
Srinath Mahankali*, Chi-Chang Lee*, Gabriel B. Margolis, Zhang-Wei Hong, Pulkit Agrawal
ICRA, 2024
Paper (coming soon)
Stubborn: A Strong Baseline for Indoor Object Navigation
Haokuan Luo, Albert Yue, Zhang-Wei Hong, Pulkit Agrawal
IROS, 2022
Paper | Code
Virtual-to-Real: Learning to Control in Visual Semantic Segmentation
Zhang-Wei Hong, Yu-Ming Chen, Shih-Yang Su, Tzu-Yun Shann, Yi-Hsiang Chang, Hsuan-Kung Yang, Brian Hsi-Lin Ho, Chih-Chieh Tu, Yueh-Chuan Chang, Tsu-Ching Hsiao, Hsin-Wei Hsiao, Sih-Pin Lai, Chun-Yi Lee
IJCAI, 2018 (Oral)
Paper | Project
Tailored Primitive Initialization is the Secret Key to Reinforcement Learning
Yihang Yao, Guangtao Zeng, Raina Wu, Yang Zhang, Ding Zhao, Zhang-Wei Hong, Chuang Gan
arXiv, 2025
Paper
Going Beyond Heuristics by Imposing Policy Improvement as a Constraint
Chi-Chang Lee, Zhang-Wei Hong, Pulkit Agrawal
NeurIPS, 2024
Paper | Code | Bibtex
Random Latent Exploration for Deep Reinforcement Learning
Srinath Mahankali, Zhang-Wei Hong, Ayush Sekhari, Alexander Rakhlin, Pulkit Agrawal
ICML, 2024
Website | Paper
Beyond Uniform Sampling: Offline Reinforcement Learning with Imbalanced Datasets
Zhang-Wei Hong, Aviral Kumar, Sathwik Karnik, Abhishek Bhandwaldar, Akash Srivastava, Joni Pajarinen, Romain Laroche, Abhishek Gupta, Pulkit Agrawal
NeurIPS, 2023
Paper | Code
TGRL: An Algorithm for Teacher Guided Reinforcement Learning
Idan Shenfeld, Zhang-Wei Hong, Aviv Tamar, Pulkit Agrawal
ICML, 2023
Paper | Website | Code
Parallel Q-Learning: Scaling Off-policy Reinforcement Learning under Massively Parallel Simulation
Zechu Li*, Tao Chen*, Zhang-Wei Hong, Anurag Ajay, Pulkit Agrawal
ICML, 2023
Paper | Code
Harnessing Mixed Offline Reinforcement Learning Datasets via Trajectory Reweighting
Zhang-Wei Hong, Pulkit Agrawal, Remi Tachet des Combes, Romain Laroche
ICLR, 2023
Paper | Code
Redeeming Intrinsic Rewards via Constrained Optimization
Eric Chen*, Zhang-Wei Hong*, Joni Pajarinen, Pulkit Agrawal (* indicates equal contribution)
NeurIPS, 2022 | MIT News
Paper | Website | Code
Bilinear Value Networks for Multi-goal Reinforcement Learning
Zhang-Wei Hong*, Ge Yang*, Pulkit Agrawal (* indicates equal contribution)
ICLR, 2022
Paper | Code
Topological Experience Replay
Zhang-Wei Hong, Tao Chen, Yen-Chen Lin, Joni Pajarinen, Pulkit Agrawal
ICLR, 2022
Paper | Code
Reducing the Deployment-Time Inference Control Costs of Deep Reinforcement Learning Agents via an Asymmetric Architecture
Chin-Jui Chang, Yu-Wei Chu, Chao-Hsien Ting, Hao-Kang Liu, Zhang-Wei Hong, Chun-Yi Lee
ICRA, 2021
Paper
Periodic Intra-Ensemble Knowledge Distillation for Reinforcement Learning
Zhang-Wei Hong, Prabhat Nagarajan, Guilherme Maeda
ECML, 2021
Paper
Adversarial Active Exploration for Inverse Dynamics Model Learning
Zhang-Wei Hong, Tsu-Jui Fu, Tzu-Yun Shann, Yi-Hsiang Chang, Chun-Yi Lee
CoRL, 2019 (Oral)
Paper | Project
Diversity-Driven Exploration Strategy for Deep Reinforcement Learning
Zhang-Wei Hong, Tzu-Yun Shann, Shih-Yang Su, Yi-Hsiang Chang, Tsu-Jui Fu, Chun-Yi Lee
NeurIPS, 2018
Paper | Project
Deep Policy Inference Q-Network for Multi-Agent Systems
Zhang-Wei Hong, Shih-Yang Su, Tzu-Yun Shann, Yi-Hsiang Chang, Chun-Yi Lee
AAMAS, 2018 (Oral)
Paper
Tactics of Adversarial Attack on Deep Reinforcement Learning Agents
Yen-Chen Lin, Zhang-Wei Hong, Yuan-Hong Liao, Meng-Li Shih, Ming-Yu Liu, Min Sun
IJCAI, 2017
Paper | Project