Jianfei Ma

I am currently an independent researcher with a focus on reinforcement learning.

I received my Bachelor's degree in Statistics from Northwestern Polytechnical University, where I studied mathematics and statistics.

Email  /  CV  /  Google Scholar  /  Github

Research

I'm broadly interested in deep reinforcement learning. My current research focuses on structured memory, model-based planning, and temporal abstraction, using inference and optimization to improve sample efficiency. The ultimate goal is to realize autonomous AI systems for a wide spectrum of applications that benefit society.

The core research problems that I am actively working on, and will continue to address, are:

  • Sample Efficiency
    How to learn new skills with minimal data is a long-standing challenge in the field.
  • Intrinsic Motivation
    Artificial goals are rich in nature and dispense with the need for hand-crafted rewards.
  • World Model
    A universal internal state representation enables fast planning and leverages different kinds of sensory information.
  • Options
    Acquiring diverse skills requires tackling a set of MDPs, which leads to the emergence of general, autonomous agents.

Discerning Temporal Difference Learning
Jianfei Ma
AAAI, 2024
arXiv

We propose an emphasis-aware TD learning method, DTD, which takes into account the significance of historical states and better controls TD error propagation, allowing more efficient credit assignment.

Generative Intrinsic Optimization: Intrinsic Control with Model Learning
Jianfei Ma
NeurIPS Workshop IMOL, 2023
arXiv

We provide a theoretical analysis of mutual information maximization, accompanied by a variational approach to jointly learn the posterior and the transition model.

The Point to Which Soft Actor-Critic Converges
Jianfei Ma
ICLR Tiny Papers, 2023
arXiv

We bridge soft Q-learning (SQL) and Soft Actor-Critic (SAC) by showing that, in the limit, they converge to the same solution, which turns an arduous optimization into an easier one.

Distillation Policy Optimization
Jianfei Ma
Preprint, 2023
arXiv / code

A general learning framework for a set of on-policy algorithms that makes full use of off-policy data, supported by a Unified Advantage Estimate (UAE) and a residual baseline.

Misc

  • I am passionate about soccer and founded a team called "Boyzone".
  • I love painting, music and dancing.



Design and source code from Jon Barron