Agent-based Video Reasoning

Primary supervisor

Zhixi Cai

Co-supervisors

Hamid Rezatofighi
Fucai Ke

Videos contain rich information about actions, events, interactions, and changes over time. While recent AI models have made strong progress in video understanding, reasoning over complex video content remains challenging, especially when the task requires understanding temporal context or connecting information across different moments.

This project explores agent-based approaches for video reasoning. The goal is to investigate how AI agents can support more flexible, structured, and reliable reasoning over video content. The project can be adapted to the student’s interests, including video question answering, temporal reasoning, long-video understanding, or evaluation of video reasoning systems.

Aim/outline

The aim of this project is to study how agent-based AI systems can improve reasoning over video content.

The student will review related work, select a suitable video reasoning task, develop or evaluate an agent-based approach, and compare it with relevant baseline methods. The final project should include experimental evaluation and analysis of the strengths and limitations of the proposed approach.

Required knowledge

Good Python programming skills, basic machine learning knowledge, and an interest in video understanding or AI agents.

Experience with deep learning, PyTorch, Hugging Face, computer vision, large language models, video-language models, or agent-based AI systems is helpful.
