Binjie Zhang (张斌杰)

Ph.D. Candidate, Computer Science
Show Lab, National University of Singapore (NUS)
E-mail: binjie97[AT]u.nus.edu

Biography

I am a Ph.D. candidate in Computer Science at the National University of Singapore (NUS), advised by Assistant Prof. Mike Zheng Shou at Show Lab. My research interests lie in video understanding, vision-language models, and world models, with a focus on deployable and compatible visual representations.

Before joining NUS, I received my M.Eng. in Computer Science and Technology from Tsinghua University, supervised by Prof. Chun Yuan, and my B.Eng. in Information Engineering from East China University of Science and Technology (ECUST), where I ranked 1 / 92.

I spent several years as an AI research intern at Tencent ARC Lab (Shenzhen), working on:

  • Compatible representation learning and model upgrades for large-scale image/video retrieval systems.

  • Cross-modality video understanding, including video-text retrieval and temporal grounding.

Broadly, my research aims to make visual and multimodal models easier to upgrade, reuse, and deploy in real systems, while enabling embodied and egocentric agents to anticipate both actions and visual futures.

News

  • [2026] Submitted GR-MCT (tool-using agents with multi-modal CoT) and PRCFC (lifelong imitation learning) to CVPR 2026 as first author.

  • [2025] Ego-centric Predictive Model Conditioned on Hand Trajectories submitted to ICLR 2025.

  • [2023] Darwinian Model Upgrades accepted to AAAI 2023.

  • [2022] Towards Universal Backward-Compatible Representation Learning accepted to IJCAI 2022 (long oral).

  • [2022] Hot-Refresh Model Upgrades accepted to ICLR 2022; the work received the Tencent Technology Breakthrough Award and the SZCCF Science and Technology Award.

Selected Publications

(* indicates first author or core contribution)

  1. GR-MCT: Group-Relative Reasoning with Multi-Modal CoT for Tool-Using Agents*
    CVPR, under submission, 2026.

  2. PRCFC: Lifelong Imitation Learning via Prototype Replay and Coarse-to-Fine Compatibility*
    CVPR, under submission, 2026.

  3. Ego-centric Predictive Model Conditioned on Hand Trajectories*
    ICLR, under review, 2025.
    [Paper]

  4. TaCA: Upgrading Your Visual Foundation Model with a Task-Agnostic Compatible Adapter*
    Binjie Zhang, Yixiao Ge, Xuyuan Xu, Mike Zheng Shou
    arXiv, 2023.
    [Paper]

  5. Darwinian Model Upgrades: Model Evolving with Selective Compatibility*
    Binjie Zhang, Yixiao Ge, Xuyuan Xu, Chun Yuan
    AAAI, 2023.
    [Paper]

  6. Towards Universal Backward-Compatible Representation Learning*
    Binjie Zhang, Yixiao Ge, Yantao Shen, Yu Li, Chun Yuan
    IJCAI (long oral), 2022.
    [Paper]

  7. Hot-Refresh Model Upgrades with Regression-Alleviating Compatible Training in Image Retrieval*
    Binjie Zhang, Yixiao Ge, Yantao Shen, Shupeng Su, Chun Yuan, Ying Shan
    ICLR, 2022.
    [Paper]

A more complete and up-to-date list is available on my Google Scholar profile.

Projects

Robot Lifelong Learning (PRCFC)
Developed PRCFC, a lightweight framework for lifelong imitation learning that uses compact prototype replay and coarse-to-fine compatibility regularization to reduce forgetting and improve cross-task transfer on the LIBERO benchmark.

Ego-centric Predictive Model
Designed a two-stage model that predicts future hand trajectories and uses them to guide a Latent Diffusion Model for egocentric future video generation, achieving state-of-the-art performance on Ego4D, BridgeData, and RLBench.

Task-Agnostic Compatible Adapter (TaCA)
Proposed TaCA, a parameter-efficient adapter that enables seamless upgrades between visual foundation models (e.g., CLIP variants) without retraining downstream task models, validated on large-scale video-text retrieval and video recognition benchmarks.

Hot-Refresh Model Upgrades
Studied model regression in hot-refresh upgrades for large-scale image retrieval systems, proposing training regularization and uncertainty-based backfilling strategies to maintain compatibility without re-encoding existing galleries.

Research Experience

  • Ph.D. Candidate, Show Lab @ NUS
    Jan. 2023 – Present, Singapore
    Advisor: Mike Zheng Shou
    Research on video understanding, vision-language models, world models, and compatible visual representations.

  • AI Research Intern, Tencent ARC Lab – Compatible Representation Learning Group
    2020 – 2023, Shenzhen, China
    Advisors: Yixiao Ge, Yantao Shen
    Worked on backward-compatible representation learning and model upgrades for large-scale image and video retrieval systems, including multi-GPU PyTorch pipelines and hot-refresh deployment strategies.

  • AI Research Intern, Tencent ARC Lab – Cross-Modality Video Understanding Group
    2019 – 2020, Shenzhen, China
    Advisors: Yu Li, Ying Shan
    Developed multimodal models for video-text retrieval and temporal grounding on large-scale video-language datasets.

Education

  • National University of Singapore (NUS)
    Ph.D. in Computer Science, Jan. 2023 – Jan. 2027 (expected), Singapore
    Advisor: Mike Zheng Shou

  • Tsinghua University (THU)
    M.Eng. in Computer Science and Technology, Aug. 2019 – Jul. 2022, Beijing, China
    Advisor: Chun Yuan
    Research focus: compatible representation learning; cross-modal video understanding.

  • East China University of Science and Technology (ECUST)
    B.Eng. in Information Engineering, Aug. 2015 – Jul. 2019, Shanghai, China
    Cumulative GPA: 3.77 / 4.00, Ranking: 1 / 92.

Selected Awards

  • Tencent Technology Breakthrough Award – Hot-Refresh Model Upgrades, 2022.

  • SZCCF Science and Technology Award – Efficient Model Upgrades, 2022.

  • Excellent Master's Degree Graduate in Beijing & Outstanding Master's Graduation Thesis, 2022.

  • Annual College Personage Award (the highest student honor at ECUST), 2018.

  • National Scholarship for Undergraduates (twice), Ministry of Education of China, 2016 & 2017.

Academic Service

  • Reviewer: ECCV, ICCV, AAAI, IJCAI, CVPR, ICLR

© Binjie Zhang · Last updated: Nov. 2025