Ruizhe Chen
Ph.D. Student

Ruizhe Chen is a Ph.D. candidate in Computer Science at Zhejiang University, advised by Prof. Zuozhu Liu, and expects to graduate in June 2026. His research centers on large-model post-training and multimodal video understanding. Previously, he contributed to Qwen3-VL as a research intern with Alibaba’s Qwen team. He has published in top venues including NeurIPS, ICLR, ACL, EMNLP, and NAACL, and is currently seeking positions focused on large multimodal models, especially video-LLMs.


Education
  • Zhejiang University
    Department of Computer Science
    Ph.D. Student
    Sep. 2021 - present
  • Zhejiang University
    B.S. in Electrical Engineering
    Sep. 2017 - Jul. 2021
Work Experience
  • Qwen Team, Alibaba Group
    Contributed to Qwen3-VL with a focus on video understanding and agentic RL.
    Research Intern
    2025
News
  • Oct 2025: Served as a core contributor to Qwen3-VL; the technical report has been released.
  • Aug 2025: Four papers accepted by EMNLP 2025.
  • Apr 2025: Four papers accepted by ACL 2025.
  • Feb 2025: Two papers accepted by ICLR 2025. One paper accepted by NAACL 2025.
  • Aug 2021: Started Ph.D. at Zhejiang University.
Selected Publications
Datasets and Recipes for Video Temporal Grounding via Reinforcement Learning

EMNLP 2025

This paper introduces a two-stage SFT+RL framework that improves Video Temporal Grounding accuracy and robustness.

DiffPO: Diffusion-styled Preference Optimization for Efficient Inference-Time Alignment of Large Language Models

ACL 2025

DiffPO (Diffusion-styled Preference Optimization) is a plug-and-play, policy-agnostic inference-time alignment method that aligns LLMs at the sentence level, reducing latency while improving alignment quality across benchmarks and model scales.

PAD: Personalized Alignment of LLMs at Decoding-Time

Ruizhe Chen, Zuozhu Liu

ICLR 2025

Personalized alignment of large language models at decoding time.

Learnable Privacy Neurons Localization in Language Models

Ruizhe Chen, Tianxiang Hu, Zuozhu Liu

ACL 2024

Large Language Model Safety (Privacy).

Fast Model Debias with Machine Unlearning

Ruizhe Chen, Jianfei Yang, Zuozhu Liu

NeurIPS 2023

Deep Learning Fairness, Large Language Model Fairness, Machine Unlearning via Influence Functions.
