Jianfei Ma

I am a Ph.D. student in Computer Science at the National University of Singapore, advised by Prof. Wee Sun Lee, with a focus on reinforcement learning.

I received my Bachelor's degree in Statistics from Northwestern Polytechnical University, where I studied mathematics and statistics.

Email  /  CV  /  Google Scholar  /  GitHub

Research

I'm broadly interested in deep reinforcement learning. My current research focuses on structured memory, multi-step world models, and hierarchical RL. My ultimate goal is to realize autonomous AI systems for a wide spectrum of applications that benefit society.

The core research problems that I am actively working on, and will continue to pursue, are:

  • Sample Efficiency
    How to acquire skills and adapt to novel situations with minimal data is a long-standing challenge in the field.
  • World Models
    Predictive models with a universal state representation enable fine-grained planning and reasoning.
  • Intrinsic Motivation
    Without hand-crafted rewards, how could an agent maintain its well-being and possibly refine its policy?
  • Options
    An autonomous agent must divide and conquer a task with subgoals by temporally abstracting knowledge and action.

Discerning Temporal Difference Learning
Jianfei Ma
AAAI, 2024
arXiv

We propose an emphasis-aware TD learning method, DTD, which takes into account the significance of historical states and better controls TD error propagation, allowing more efficient credit assignment.

Generative Intrinsic Optimization: Intrinsic Control with Model Learning
Jianfei Ma
NeurIPS Workshop IMOL, 2023
arXiv

We provide a theoretical analysis of mutual information maximization, accompanied by a variational approach to jointly learn the posterior and the transition model.

The Point to Which Soft Actor-Critic Converges
Jianfei Ma
ICLR Tiny Papers, 2023
arXiv

We bridge soft Q-learning (SQL) and soft actor-critic (SAC) by showing that, in the limit, they converge to the same solution, which turns an arduous optimization into an easier one.

Distillation Policy Optimization
Jianfei Ma
Preprint, 2023
arXiv / code

A general learning framework for a set of on-policy algorithms that makes full use of off-policy data, supported by a Unified Advantage Estimate (UAE) and a residual baseline.

Entropy Augmented Reinforcement Learning
Jianfei Ma
Preprint, 2022
arXiv

We propose a general entropy augmentation technique that can mitigate the exploration issues of TRPO and PPO and is also applicable to other algorithms.

Misc

  • I am passionate about soccer, for which I established a team called "Boyzone".
  • I love painting, music and dancing.
  • I am into the rock band AC/DC, for its power, spirit and simplicity.



Design and source code from Jon Barron