I studied at Sichuan University (SCU) from 2020 to 2024,
where I majored in Computer Science & Technology. My
major GPA (CS courses) was 3.79/4 (89.39/100), and my
overall GPA was 3.78/4 (89.25/100).
During my time at Sichuan University, I worked as a research assistant at MachineILab
from 2022 to 2024, advised by Prof. Jizhe Zhou.
I participated in one National Natural Science Foundation of China project and one National Key
R&D Program of China project.
My research interests lie in large language models (LLMs), retrieval-augmented generation (RAG),
and recommender systems (RecSys). I focus on both the theoretical foundations and the
practical applications of LLM-based systems.
My previous research focused primarily on computer vision topics such as tampering
detection and object recognition. I have also contributed to the design of high-impact
benchmarks such as HiBench and to comprehensive surveys such as WebAgents.
My work has been published in top-tier conferences, including
NeurIPS 2024 (spotlight) and AAAI 2025, and has accumulated
300+ citations, with an h-index of 5.
🏆 2026-05-16 - Our paper Atomic Intent Reasoning: Bringing LLM Semantics to Industrial Cross-Domain Recommendations was accepted for presentation in the KDD 2026 Main Conference ADS Track! 🎉
🏆 2026-04 - Our paper SUPERGLASSES was accepted to CVPR 2026 Findings! 🎉
💼 2025-10 - Joined the Kuaishou E-commerce Team as a research intern.
📝 2025-09 - Appointed as Topic Coordinator for
Frontiers in Artificial Intelligence (Impact Factor: 4.7, CiteScore: 7.3,
Logic and Reasoning Section) and Frontiers in Big Data (Impact Factor: 2.3,
CiteScore: 6.1).
🏆 2025-08-20 - Our work QA-Dragon was accepted by the KDD 2025 Workshop for
Multimodal Retrieval Augmented Generation.
🎤 2025-08-07 - Invited to give a talk "Understanding Hierarchical Data with
Large Language Models: RAG, Structural Reasoning, and Future Directions" at KDD 2025 Reasoning Day in Toronto,
Canada! 🎙️
🏅 2025-06-18 - Achieved 12th place globally in the KDD Cup 2025 - Meta CRAG-MM Multimodal
Retrieval Challenge among hundreds of international teams! 🌍
🏆 2025-05-16 - Our benchmark paper HiBench was accepted to the KDD 2025 Benchmark Track! 🎉
📝 2024-09-15 - Appointed as Topic Coordinator for Frontiers in Artificial
Intelligence (Impact Factor: 4.7, CiteScore: 7.3, Logic and Reasoning Section) and Frontiers in Big
Data (Impact Factor: 2.3, CiteScore: 6.1).
[KDD'26 ADS Track] Atomic Intent Reasoning: Bringing LLM Semantics to Industrial Cross-Domain Recommendations
Zhuohang Jiang, Yuxin Chen, Shijie Wang, Haohao Qu, Zhou Jindong, Wenqi Fan, Li Qing, Dongxu Liang, Jun Wang
We introduce AIR, an LLM-driven cross-domain recommendation framework that moves semantic reasoning
offline and composes user intent representations efficiently online. AIR achieves about 400x
inference acceleration, reaches state-of-the-art results on public benchmarks, and improves
Kuaishou E-commerce online GMV by +3.446% in large-scale A/B testing.
[CVPR'26 Findings] SUPERGLASSES: Benchmarking Vision Language Models as Intelligent Agents for AI Smart Glasses
Zhuohang Jiang, Xu Yuan, Haohao Qu, Shanru Lin, Kanglong Liu, Wenqi Fan, Qing Li
SUPERGLASSES is a real-world smart-glasses VQA benchmark with 2,422 egocentric image-question pairs
across 14 domains and 8 query types. We evaluate 26 VLMs and propose SUPERLENS, a multimodal RAG
agent that surpasses GPT-4o by 2.19% in this setting.
[KDD'25 Workshop] QA-Dragon: Query-Aware Dynamic RAG System for Knowledge-Intensive Visual Question Answering
Zhuohang Jiang, Pangjing Wu, Xu Yuan, Wenqi Fan, Qing Li
QA-Dragon is a query-aware dynamic RAG system for knowledge-intensive VQA. By routing queries across
domains and retrieval strategies, it supports multimodal, multi-turn, and multi-hop reasoning and
improves Meta CRAG-MM Challenge performance by up to 6.35% over baselines.
[KDD'25] A Survey of WebAgents: Towards Next-Generation AI Agents for Web Automation with Large
Foundation Models
Liangbo Ning, Ziran Liang, Zhuohang Jiang, Haohao Qu, Yujuan Ding, Wenqi Fan, Xiao-yong Wei,
Shanru Lin, Hui Liu, Philip S. Yu, Qing Li
This survey reviews WebAgents powered by large foundation models, focusing on architectures,
training methods, and trustworthiness. It organizes recent progress in web automation and outlines
key directions for building more reliable autonomous web agents.
[KDD'25] HiBench: Benchmarking LLMs Capability on Hierarchical Structure Reasoning
Zhuohang Jiang, Pangjing Wu, Ziran Liang, Peter Q. Chen, Xu Yuan, Ye Jia, Jiancheng Tu,
Chen Li, Peter H.F. Ng, Qing Li
HiBench is a benchmark for evaluating hierarchical structure reasoning in LLMs, covering 30 tasks
and 39,519 queries across six scenarios. Experiments on 20 LLMs reveal strengths in basic hierarchy
reasoning and limitations on complex or implicit structures; a compact instruction dataset further
improves model performance.
[AAAI'25] Mesoscopic Insights: Orchestrating Multi-Scale & Hybrid Architecture for Image
Manipulation Localization
Xuekang Zhu, Xiaochen Ma, Lei Su, Zhuohang Jiang, Bo Du, Xiwen Wang, Zeyu Lei, Wentao Feng,
Chi-Man Pun, Jizhe Zhou
This work introduces Mesorch, a mesoscopic architecture for image manipulation localization that
combines macro-level semantic cues with micro-level tampering traces. Across four datasets, the
proposed models improve accuracy, efficiency, and robustness over prior methods.
[NeurIPS'24] IMDL-BenCo: A Comprehensive Benchmark and Codebase for Image Manipulation
Detection & Localization
Xiaochen Ma, Xuekang Zhu, Lei Su, Bo Du, Zhuohang Jiang, Bingkui Tong, Zeyu Lei, Xinyu Yang,
Chi-Man Pun, Jiancheng Lv, Jizhe Zhou
IMDL-BenCo is a comprehensive benchmark and modular codebase for image manipulation detection and
localization. It standardizes training, evaluation, metrics, robustness tests, and eight strong
baselines, enabling more reliable comparison across IMDL methods.
Beyond Visual Appearances: Privacy-sensitive Objects Identification via Hybrid Graph Reasoning
Zhuohang Jiang, Bingkui Tong, Xia Du, Ahmed Alhammadi, Jizhe Zhou
PrivacyGuard formulates privacy-sensitive object identification as a visual reasoning problem over
structured scene context. It builds heterogeneous scene graphs, balances privacy classes with
contextual perturbation, and reasons over hybrid graph paths to capture subtle context changes.
[ICONIP'23] TPTGAN: Two-Path Transformer-Based Generative Adversarial Network Using
Joint Magnitude Masking and Complex Spectral Mapping for Speech Enhancement
Zhaoyi Liu, Zhuohang Jiang, Wendian Luo, Zhuoyao Fan, Haoda Di, Yufan Long, Haizhou Wang
TPTGAN is a two-path Transformer-based metric GAN for speech enhancement in the time-frequency
domain. It jointly predicts magnitude masks and complex spectra to improve enhanced speech
quality.
IML-ViT: Benchmarking Image Manipulation Localization by Vision Transformer
Xiaochen Ma, Bo Du, Zhuohang Jiang, Ahmed Y. Al Hammadi, Jizhe Zhou
IML-ViT is a ViT-based benchmark model for image manipulation localization that combines
high-resolution inputs, multi-scale features, and manipulation-edge supervision. It outperforms
prior localization methods on five public datasets.
Perceptual MAE for Image Manipulation Localization: A High-level Vision Learner Focusing
on Low-level Features
Xiaochen Ma, Zhuohang Jiang, Xiong Xu, Chi-Man Pun, Jizhe Zhou
Perceptual MAE enhances masked autoencoders for image manipulation localization with
high-resolution inputs and perceptual supervision. The model combines high-level semantic
understanding with low-level tampering cues and achieves strong results on five datasets.
Selected Projects
HiBench: Benchmark for Hierarchical Reasoning
First Author & Team Leader • KDD 2025 Benchmark Track • 2025
Designed and developed the first comprehensive benchmark for evaluating LLMs' capability on
hierarchical structure reasoning. The benchmark covers six representative scenarios with 39,519
queries of varying hierarchical complexity.
Key contributions: (1) led the architectural design and implementation of the evaluation framework;
(2) coordinated a multi-institutional team across different time zones; (3) open-sourced the
complete toolkit, including the dataset, evaluation metrics, and baseline implementations.
The benchmark was accepted as an oral presentation at KDD 2025 and is being adopted by multiple
research groups for hierarchical reasoning evaluation.
Meta CRAG-MM: Multimodal Retrieval Challenge
Team Leader • KDD Cup 2025 • 2025
Led a team to 12th place globally among hundreds of international teams in the Meta CRAG-MM
Multimodal Retrieval Challenge.
Key contributions: (1) designed novel multimodal fusion architectures combining vision and language
understanding; (2) implemented efficient retrieval-augmented generation pipelines; (3) coordinated
team efforts in model development, hyperparameter optimization, and submission strategies.
The challenge focused on developing AI systems that understand and retrieve information from
multimodal content, which aligns with current trends in large multimodal models.
Developed the first comprehensive benchmark and codebase for Image Manipulation Detection &
Localization (IMDL).
Key contributions: (1) implemented GPU-accelerated evaluation metrics for fair and efficient
comparison; (2) designed a modular codebase architecture that is easy to customize and extend;
(3) co-authored the manuscript, which was selected as a Spotlight paper at NeurIPS 2024.
The benchmark includes 8 state-of-the-art models, 15 evaluation metrics, and comprehensive
robustness evaluation protocols, supporting standardization and reproducibility in the field.
Understanding Hierarchical Data with Large Language Models: RAG, Structural Reasoning, and Future
Directions
Invited Talks • Reasoning Day @ KDD 2025 • Toronto, ON, Canada • Aug 2025
Invited to deliver a presentation at the KDD 2025 Reasoning Day workshop.
The talk explored recent developments in leveraging Large Language Models for hierarchical data
understanding, with a particular focus on Retrieval-Augmented Generation (RAG) systems and
structural reasoning capabilities. Key topics:
(1) Novel approaches to hierarchical data representation in LLM contexts,
(2) Integration of structural reasoning with retrieval-augmented generation,
(3) Future research directions in reasoning-enhanced AI systems,
(4) Practical applications and deployment considerations for hierarchical reasoning in real-world
scenarios.
This invitation recognizes the impact of our HiBench work and positions our research at the
forefront of LLM reasoning capabilities.
Sichuan University, Chengdu, Sichuan, China
B.E. in Computer Science and Technology • Sep. 2020 to Jun. 2024
Hong Kong Polytechnic University, Hong Kong, China
Ph.D. in Computer Science and Technology • Sep. 2024 to Present
Experience
National University of Singapore (NUS)
Summer School Participant • Aug. 2023
• Participated in an intensive research program at the School of Computing
• Completed a face recognition project using CNN-based feature extraction and similarity matching
• Gained international research experience and cross-cultural collaboration skills
DICALab, Sichuan University
Research Assistant • Sep. 2022 to Jun. 2024
Advisor: Prof. Jizhe Zhou
• Developed graph-based frameworks for privacy-sensitive object detection
• Participated in a National Natural Science Foundation of China project
• Contributed to a National Key R&D Program of China project
• Co-authored multiple publications in top-tier conferences and journals
Kuaishou E-commerce Team, Kuaishou Inc.
Research Intern • Oct. 2025 to Present
Beijing, China
• Conducting end-to-end recommendation research based on large language models