[Resume]
My research interests include Reinforcement Learning, Machine Learning and Capybara.
Experience
[09/2024 - now] PhD student majoring in Computer Science at the University of Southampton (transfer from Liverpool), advised by Prof. Chao Huang.
[09/2023 - 09/2024] PhD student majoring in Computer Science at the University of Liverpool (transfer to Southampton).
[04/2022 - 09/2023] Reinforcement Learning and Gaming AI Researcher working in the P2 team of Parametrix.AI.
[09/2019 - 04/2022] MS Student majoring in Computer Science at the Nanjing University of Aeronautics and Astronautics, supervised by Prof. Xiaoyang Tan.
[09/2015 - 06/2019] Undergrad Student majoring in Mathematics at the Nanjing University of Aeronautics and Astronautics.
Publications
Variational Delayed Policy Optimization.
Qingyuan Wu, Simon Sinong Zhan, Yixuan Wang, Yuhui Wang, Chung-Wei Lin, Chen Lv, Qi Zhu, Chao Huang.
[NeurIPS 2024], Conference on Neural Information Processing Systems, 2024, Spotlight.
Paper / Code
Boosting Reinforcement Learning with Strongly Delayed Feedback Through Auxiliary Short Delays.
Qingyuan Wu, Simon Sinong Zhan, Yixuan Wang, Yuhui Wang, Chung-Wei Lin, Chen Lv, Qi Zhu, Jürgen Schmidhuber, Chao Huang.
[ICML 2024], International Conference on Machine Learning, 2024, Poster.
Paper / Code / Poster
Highway Value Iteration Networks.
Yuhui Wang, Weida Li, Francesco Faccio, Qingyuan Wu, Jürgen Schmidhuber.
[ICML 2024], International Conference on Machine Learning, 2024, Poster.
Paper
State-wise safe reinforcement learning with pixel observations.
Simon Sinong Zhan, Yixuan Wang, Qingyuan Wu, Ruochen Jiao, Chao Huang, Qi Zhu.
[L4DC 2024], Learning for Dynamics and Control Conference, 2024, Poster.
Paper / Code
Pre-prints
Inverse Delayed Reinforcement Learning.
Simon Sinong Zhan, Qingyuan Wu, Zhian Ruan, Frank Yang, Philip Wang, Yixuan Wang, Ruochen Jiao, Chao Huang, Qi Zhu.
Paper
Model-based Reward Shaping for Adversarial Inverse Reinforcement Learning in Stochastic Environments.
Simon Sinong Zhan, Qingyuan Wu, Philip Wang, Yixuan Wang, Ruochen Jiao, Chao Huang, Qi Zhu.
Paper
Scaling Value Iteration Networks to 5000 Layers for Extreme Long-Term Planning.
Yuhui Wang, Qingyuan Wu, Weida Li, Dylan R. Ashley, Francesco Faccio, Chao Huang, Jürgen Schmidhuber.
Paper
Highway reinforcement learning.
Yuhui Wang, Miroslav Strupl, Francesco Faccio, Qingyuan Wu, Haozhe Liu, Michał Grudzień, Xiaoyang Tan, Jürgen Schmidhuber.
Paper
Expected-Max Ensembled Q-learning with Temporally-Varying Exploration.
Qingyuan Wu, Yuhui Wang.
Paper
Greedy-Step Off-Policy Reinforcement Learning.
Yuhui Wang, Qingyuan Wu, Pengcheng He, Xiaoyang Tan.
Paper