Zhuohang Jiang

I am currently pursuing a PhD degree at the Hong Kong Polytechnic University. My supervisors are Qing Li and Wenqi Fan. My current cumulative GPA is 3.60/4.00.

I studied at Sichuan University (SCU) from 2020 to 2024, where I majored in Computer Science & Technology. My major GPA (CS courses) was 3.79/4 (89.39/100), and my overall GPA was 3.78/4 (89.25/100).

During my time at Sichuan University, I worked as a research assistant at MachineILab from 2022 to 2024, advised by Prof. Jizhe Zhou. I participated in one National Natural Science Foundation of China project and one National Key R&D Program of China project.

Email  /  CV  /  Google Scholar  /  GitHub


Research Topics

My research interests lie in large language models (LLMs), retrieval-augmented generation (RAG), and recommender systems (RecSys). I focus on both the theoretical foundations and the practical applications of LLM-based systems.

My previous research was primarily focused on topics within computer vision, such as tampering detection and object recognition tasks. I have contributed to the design of high-impact benchmarks, such as HiBench, and comprehensive surveys like WebAgents. My work has been published in top-tier conferences, including NeurIPS 2024 (spotlight) and AAAI 2025, and has accumulated 300+ citations, with an h-index of 5.


News

🏆 2026-05-16 - Our paper Atomic Intent Reasoning: Bringing LLM Semantics to Industrial Cross-Domain Recommendations was accepted for presentation in the ADS Track of the KDD 2026 main conference! 🎉

🏆 2026-04 - Our paper SUPERGLASSES was accepted to CVPR 2026 Findings! 🎉

💼 2025-10 - Joined the Kuaishou E-commerce Team as a research intern.

📝 2025-09 - Appointed as Topic Coordinator for Frontiers in Artificial Intelligence (Impact Factor: 4.7, CiteScore: 7.3, Logic and Reasoning Section) and Frontiers in Big Data (Impact Factor: 2.3, CiteScore: 6.1).

🏆 2025-08-20 - Our work QA-Dragon was accepted by the KDD 2025 Workshop for Multimodal Retrieval Augmented Generation.

🎤 2025-08-07 - Invited to give a talk "Understanding Hierarchical Data with Large Language Models: RAG, Structural Reasoning, and Future Directions" at KDD 2025 Reasoning Day in Toronto, Canada! 🎙️

🏅 2025-06-18 - Achieved 12th place globally in KDD Cup 2025 - Meta CRAG-MM Multimodal Retrieval Challenge among hundreds of international teams! 🌍

🏆 2025-05-16 - Our benchmark paper HiBench was accepted to the KDD 2025 Benchmark Track! 🎉

🎉 2025-05-07 - Our survey paper A Survey of WebAgents: Towards Next-Generation AI Agents for Web Automation with Large Foundation Models was accepted to the KDD 2025 Tutorial Track! 🎊

📘 2025-03-01 - Completed the HiBench paper and released the code and dataset on GitHub and Hugging Face.

🌟 2025-01-15 - Mesoscopic Insights: Orchestrating Multi-Scale & Hybrid Architecture for Image Manipulation Localization was published in AAAI 2025.

🏆 2024-12-01 - IMDL-BenCo was published in the NeurIPS 2024 Benchmark Track and received a Spotlight award.

🎓 2024-09-01 - Began my PhD studies at the Hong Kong Polytechnic University.

🎓 2024-06-26 - Received the Outstanding Graduate Award from Sichuan University and Sichuan Province! 🎉

🎓 2024-06-26 - Graduated from Sichuan University with a bachelor's degree.

🛠️ 2024-06-12 - Completed the collaborative project IMDL-BenCo and finished the paper IMDL-BenCo: A Comprehensive Benchmark and Codebase for Image Manipulation Detection & Localization.

🔍 2024-05-24 - Finished the paper Beyond Visual Appearances: Privacy-sensitive Objects Identification via Hybrid Graph Reasoning.

📚 2023-08-01 - Participated in the NUS Summer School research program at the National University of Singapore and completed a face recognition project! 🇸🇬


Selected Publications
[KDD'26 ADS Track] Atomic Intent Reasoning: Bringing LLM Semantics to Industrial Cross-Domain Recommendations
Zhuohang Jiang, Yuxin Chen, Shijie Wang, Haohao Qu, Zhou Jindong, Wenqi Fan, Li Qing, Dongxu Liang, Jun Wang
CCF-A KDD 2026 ADS Track
We introduce AIR, an LLM-driven cross-domain recommendation framework that moves semantic reasoning offline and composes user intent representations efficiently online. AIR achieves about 400x inference acceleration, reaches state-of-the-art results on public benchmarks, and improves Kuaishou E-commerce online GMV by +3.446% in large-scale A/B testing.
[CVPR'26 Findings] SUPERGLASSES: Benchmarking Vision Language Models as Intelligent Agents for AI Smart Glasses
Zhuohang Jiang, Xu Yuan, Haohao Qu, Shanru Lin, Kanglong Liu, Wenqi Fan, Qing Li
CCF-A arXiv
SUPERGLASSES is a real-world smart-glasses VQA benchmark with 2,422 egocentric image-question pairs across 14 domains and 8 query types. We evaluate 26 VLMs and propose SUPERLENS, a multimodal RAG agent that surpasses GPT-4o by 2.19% on this setting.
[KDD'25 Workshop] QA‑Dragon: Query‑Aware Dynamic RAG System for Knowledge‑Intensive Visual Question Answering
Zhuohang Jiang, Pangjing Wu, Xu Yuan, Wenqi Fan, Qing Li
CCF-A arXiv
QA-Dragon is a query-aware dynamic RAG system for knowledge-intensive VQA. By routing queries across domains and retrieval strategies, it supports multimodal, multi-turn, and multi-hop reasoning and improves Meta CRAG-MM Challenge performance by up to 6.35% over baselines.
[KDD'25] A Survey of WebAgents: Towards Next-Generation AI Agents for Web Automation with Large Foundation Models
Liangbo Ning, Ziran Liang, Zhuohang Jiang, Haohao Qu, Yujuan Ding, Wenqi Fan, Xiao-yong Wei, Shanru Lin, Hui Liu, Philip S. Yu, Qing Li
CCF-A arXiv
This survey reviews WebAgents powered by large foundation models, focusing on architectures, training methods, and trustworthiness. It organizes recent progress in web automation and outlines key directions for building more reliable autonomous web agents.
[KDD'25] HiBench: Benchmarking LLMs Capability on Hierarchical Structure Reasoning
Zhuohang Jiang, Pangjing Wu, Ziran Liang, Peter Q. Chen, Xu Yuan, Ye Jia, Jiancheng Tu, Chen Li, Peter H.F. Ng, Qing Li
CCF-A arXiv GitHub Hugging Face
HiBench is a benchmark for evaluating hierarchical structure reasoning in LLMs, covering 30 tasks and 39,519 queries across six scenarios. Experiments on 20 LLMs reveal strengths in basic hierarchy reasoning and limitations on complex or implicit structures; a compact instruction dataset further improves model performance.
[AAAI'25] Mesoscopic Insights: Orchestrating Multi-Scale & Hybrid Architecture for Image Manipulation Localization
Xuekang Zhu, Xiaochen Ma, Lei Su, Zhuohang Jiang, Bo Du, Xiwen Wang, Zeyu Lei, Wentao Feng, Chi-Man Pun, Jizhe Zhou
CCF-A arXiv
This work introduces Mesorch, a mesoscopic architecture for image manipulation localization that combines macro-level semantic cues with micro-level tampering traces. Across four datasets, the proposed models improve accuracy, efficiency, and robustness over prior methods.
[NeurIPS'24] IMDL-BenCo: A Comprehensive Benchmark and Codebase for Image Manipulation Detection & Localization
Xiaochen Ma, Xuekang Zhu, Lei Su, Bo Du, Zhuohang Jiang, Bingkui Tong, Zeyu Lei, Xinyu Yang, Chi-Man Pun, Jiancheng Lv, Jizhe Zhou
CCF-A arXiv GitHub
IMDL-BenCo is a comprehensive benchmark and modular codebase for image manipulation detection and localization. It standardizes training, evaluation, metrics, robustness tests, and eight strong baselines, enabling more reliable comparison across IMDL methods.

Selected Projects
HiBench: Benchmark for Hierarchical Reasoning
First Author & Team Leader • KDD 2025 Benchmark Track • 2025
arXiv GitHub Hugging Face
Designed and developed the first comprehensive benchmark for evaluating LLMs' capability on hierarchical structure reasoning. The benchmark encompasses six representative scenarios with 39,519 queries across varying hierarchical complexity. Key contributions: (1) Led the architectural design and implementation of the evaluation framework, (2) Coordinated a multi-institutional team across different time zones, (3) Open-sourced the complete toolkit including dataset, evaluation metrics, and baseline implementations. The benchmark has been accepted as an oral presentation at KDD 2025 and is being adopted by multiple research groups for hierarchical reasoning evaluation.
Meta CRAG-MM: Multimodal Retrieval Challenge
Team Leader • KDD Cup 2025 • 2025
Led a team to achieve 12th place globally among hundreds of international teams in the Meta CRAG-MM Multimodal Retrieval Challenge. Key contributions: (1) Designed novel multimodal fusion architectures combining vision and language understanding, (2) Implemented efficient retrieval-augmented generation pipelines, (3) Coordinated team efforts in model development, hyperparameter optimization, and submission strategies. The challenge focused on developing AI systems capable of understanding and retrieving information from multimodal content, which aligns with current trends in large multimodal models.
IMDL-BenCo: Benchmark for Image Manipulation Detection & Localization
Co-First Author • NeurIPS 2024 Benchmark Track (Spotlight) • 2024
arXiv GitHub
Developed the first comprehensive benchmark and codebase for Image Manipulation Detection & Localization (IMDL). Key contributions: (1) Implemented GPU-accelerated evaluation metrics for fair and efficient comparison, (2) Designed modular codebase architecture enabling easy customization and extension, (3) Co-authored the manuscript that received a Spotlight Award at NeurIPS 2024. The benchmark includes 8 state-of-the-art models, 15 evaluation metrics, and comprehensive robustness evaluation protocols, significantly advancing the field's standardization and reproducibility.

Invited Talks
Understanding Hierarchical Data with Large Language Models: RAG, Structural Reasoning, and Future Directions
Invited Talk • Reasoning Day @ KDD 2025 • Toronto, ON, Canada • Aug 2025
Invited to deliver a presentation at the KDD 2025 Reasoning Day workshop.
The talk explored recent developments in using Large Language Models for hierarchical data understanding, with a particular focus on Retrieval-Augmented Generation (RAG) systems and structural reasoning capabilities.
Key topics:
(1) Novel approaches to hierarchical data representation in LLM contexts,
(2) Integration of structural reasoning with retrieval-augmented generation,
(3) Future research directions in reasoning-enhanced AI systems,
(4) Practical applications and deployment considerations for hierarchical reasoning in real-world scenarios.
This invitation recognizes the impact of our HiBench work and positions our research at the forefront of LLM reasoning capabilities.

Education
Sichuan University, Chengdu, Sichuan, China
B.E. in Computer Science and Technology • Sep. 2020 to Jun. 2024

Hong Kong Polytechnic University, Hong Kong, China
Ph.D. in Computer Science and Technology • Sep. 2024 to Present

Experience
National University of Singapore (NUS)
Summer School Participant • Aug. 2023
• Participated in an intensive research program at the School of Computing
• Completed a face recognition project using CNN-based feature extraction and similarity matching
• Gained international research experience and cross-cultural collaboration skills

DICALab, Sichuan University
Research Assistant • Sep. 2022 to Jun. 2024
Advisor: Prof. Jizhe Zhou
• Developed graph-based frameworks for privacy-sensitive object detection
• Participated in a National Natural Science Foundation of China project
• Contributed to a National Key R&D Program of China project
• Co-authored multiple publications in top-tier conferences and journals

Kuaishou E-commerce Team, Kuaishou Inc.
Research Intern • Oct. 2025 to Present
Beijing, China
• Conducted end-to-end recommendation research based on large language models


Selected Awards
KDD Cup 2025 — Meta CRAG-MM
Toronto, Canada, 2025
12th Place (Global)
NeurIPS 2024 — IMDL‑BenCo (Co‑first Author)
Vancouver, Canada, 2024
Spotlight Award
Outstanding Graduate
Sichuan University & Sichuan Province, 2024
Top Achievement
Tencent Scholarship
Sichuan University, China, 2023
Top 2%
A-Level Certificate
Comprehensive Quality Evaluation, China, 2023
Excellence
Comprehensive First Class Scholarship
Sichuan University, Sichuan, China, 2022
Top 1%
Outstanding Students of Sichuan University
Sichuan University, Sichuan, China, 2022
Top 5%

Professional Service
Conference & Journal Reviewer
2023-2025
TIP, ECCV, NeurIPS, KDD, AAAI, IoTJ
Conference & Journal Topic Coordinator
2025-2026
Frontiers in Artificial Intelligence, Frontiers in Big Data
Teaching Assistant
The Hong Kong Polytechnic University (PolyU)
Artificial Intelligence (COMP4431)
NLP Practicum (COMP5423)
Database System (COMP2411)

Skills
Research Topics: Large Language Models (LLMs), Retrieval‑Augmented Generation (RAG), Recommender Systems (RecSys)
Frameworks & Tools: PyTorch, Hugging Face, NumPy, Docker, Git, Anaconda
Languages: Mandarin (native), English (fluent, IELTS 6.5)

Updated at Sep. 2025 · Template inspired by Jon Barron