[Resume]

My research interests include Reinforcement Learning, Machine Learning and Capybara.

Experience

  • [09/2024 - now] PhD student majoring in Computer Science at the University of Southampton (transferred from Liverpool), advised by Prof. Chao Huang.

  • [09/2023 - 09/2024] PhD student majoring in Computer Science at the University of Liverpool (transferred to Southampton).

  • [04/2022 - 09/2023] Reinforcement Learning and Gaming AI Researcher on the P2 team at Parametrix.AI.

  • [09/2019 - 04/2022] MS student majoring in Computer Science at Nanjing University of Aeronautics and Astronautics, supervised by Prof. Xiaoyang Tan.

  • [09/2015 - 06/2019] Undergraduate student majoring in Mathematics at Nanjing University of Aeronautics and Astronautics.

Publications

  • Variational Delayed Policy Optimization.
    Qingyuan Wu, Simon Sinong Zhan, Yixuan Wang, Yuhui Wang, Chung-Wei Lin, Chen Lv, Qi Zhu, Chao Huang.
    [NeurIPS 2024], Conference on Neural Information Processing Systems, 2024, Spotlight.
    Paper / Code

  • Boosting Reinforcement Learning with Strongly Delayed Feedback Through Auxiliary Short Delays.
    Qingyuan Wu, Simon Sinong Zhan, Yixuan Wang, Yuhui Wang, Chung-Wei Lin, Chen Lv, Qi Zhu, Jürgen Schmidhuber, Chao Huang.
    [ICML 2024], International Conference on Machine Learning, 2024, Poster.
    Paper / Code / Poster

  • Highway Value Iteration Networks.
    Yuhui Wang, Weida Li, Francesco Faccio, Qingyuan Wu, Jürgen Schmidhuber.
    [ICML 2024], International Conference on Machine Learning, 2024, Poster.
    Paper

  • State-wise Safe Reinforcement Learning with Pixel Observations.
    Simon Sinong Zhan, Yixuan Wang, Qingyuan Wu, Ruochen Jiao, Chao Huang, Qi Zhu.
    [L4DC 2024], Learning for Dynamics and Control Conference, 2024, Poster.
    Paper / Code

Pre-prints

  • Inverse Delayed Reinforcement Learning.
    Simon Sinong Zhan, Qingyuan Wu, Zhian Ruan, Frank Yang, Philip Wang, Yixuan Wang, Ruochen Jiao, Chao Huang, Qi Zhu.
    Paper

  • Model-based Reward Shaping for Adversarial Inverse Reinforcement Learning in Stochastic Environments.
    Simon Sinong Zhan, Qingyuan Wu, Philip Wang, Yixuan Wang, Ruochen Jiao, Chao Huang, Qi Zhu.
    Paper

  • Scaling Value Iteration Networks to 5000 Layers for Extreme Long-Term Planning.
    Yuhui Wang, Qingyuan Wu, Weida Li, Dylan R. Ashley, Francesco Faccio, Chao Huang, Jürgen Schmidhuber.
    Paper

  • Highway Reinforcement Learning.
    Yuhui Wang, Miroslav Strupl, Francesco Faccio, Qingyuan Wu, Haozhe Liu, Michał Grudzień, Xiaoyang Tan, Jürgen Schmidhuber.
    Paper

  • Expected-Max Ensembled Q-learning with Temporally-Varying Exploration.
    Qingyuan Wu, Yuhui Wang.
    Paper

  • Greedy-Step Off-Policy Reinforcement Learning.
    Yuhui Wang, Qingyuan Wu, Pengcheng He, Xiaoyang Tan.
    Paper