Primary supervisor
Zhixi Cai
Co-supervisors
Videos contain rich information about actions, events, interactions, and changes over time. While recent AI models have made strong progress in video understanding, reasoning over complex video content remains challenging, especially when the task requires understanding temporal context or connecting information across different moments.
This project explores agent-based approaches for video reasoning. The goal is to investigate how AI agents can support more flexible, structured, and reliable reasoning over video content. The project can be adapted to the student’s interests, including video question answering, temporal reasoning, long-video understanding, or evaluation of video reasoning systems.
Aim/outline
The aim of this project is to study how agent-based AI systems can improve reasoning over video content.
The student will review related work, select a suitable video reasoning task, develop or evaluate an agent-based approach, and compare it with relevant baseline methods. The final project should include experimental evaluation and analysis of the strengths and limitations of the proposed approach.
Required knowledge
Good Python programming skills, basic machine learning knowledge, and interest in video understanding or AI agents.
Experience with deep learning, PyTorch, Hugging Face, computer vision, large language models, video-language models, or agent-based AI systems is beneficial.