WALR — Width-Aware Language Reward for Vision-Language-Action Models

Primary supervisor

Co-supervisors

Sukai Huang

This project addresses the language-ignoring problem in embodied AI, where robots learn visual shortcuts instead of following instructions. Building on our preprint, which establishes the relationship between planning width (instruction granularity) and learning difficulty, you will develop WALR, a reward design framework that adapts to instruction complexity. WALR scales language-grounding rewards according to instruction granularity (coarse vs. fine) and penalizes vision-only behavior, with the goal of training robust Vision-Language-Action (VLA) models that follow what humans tell them.
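As a rough illustration of the idea, the sketch below combines a task reward with a width-scaled language-grounding bonus and a vision-only penalty. Everything in it is an assumption made for illustration rather than the project's actual design: the function names, the `midpoint`, `steepness` and `penalty_weight` parameters, the convention that scores lie in [0, 1], and the choice to let the scale grow with width (the direction of the scaling is one of the things the project would need to settle).

```python
import math


def width_scale(width: float, midpoint: float = 3.0, steepness: float = 1.0) -> float:
    """Sigmoid scaling factor in (0, 1).

    Assumption: the factor increases with planning width; `midpoint` and
    `steepness` are hypothetical hyperparameters.
    """
    return 1.0 / (1.0 + math.exp(-steepness * (width - midpoint)))


def walr_reward(
    task_reward: float,
    grounding_score: float,    # assumed in [0, 1]: agreement between behavior and instruction
    vision_only_score: float,  # assumed in [0, 1]: how well behavior is explained without the instruction
    width: float,
    penalty_weight: float = 0.5,  # hypothetical weight on the vision-only penalty
) -> float:
    """Task reward plus a width-scaled grounding bonus, minus a vision-only penalty."""
    grounding_bonus = width_scale(width) * grounding_score
    vision_only_penalty = penalty_weight * vision_only_score
    return task_reward + grounding_bonus - vision_only_penalty
```

In this sketch the grounding and vision-only scores are plain inputs; how to estimate them (for example, by comparing the policy's actions with and without the instruction) is left open.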

Aim/outline

Review the literature on reward design for embodied agents, VLA models, and planning width theory.
Implement a width-aware reward function with language-consistency penalties using the BEHAVIOR-1K simulation benchmark (https://behavior.stanford.edu/index.html).
Develop width-conditioned scaling functions (sigmoid-based) that adapt reward signals to instruction granularity.
Design a hindsight relabeling strategy that converts failed trajectories into learning opportunities at appropriate granularity levels.
Train VLA policies using RL (PPO/SAC) with WALR rewards and evaluate on household manipulation tasks.
Analyze asymmetric generalization (coarse→fine vs. fine→coarse) and language grounding metrics.
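
The hindsight relabeling step in the outline above could look something like the following sketch. It assumes that each coarse instruction comes with a decomposition into fine-grained steps and that each trajectory is annotated with the steps it actually completed; the `Trajectory` class, its fields, and `hindsight_relabel` are hypothetical names introduced only for this example, not part of any existing codebase.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Trajectory:
    instruction: str                  # instruction the policy was conditioned on
    completed_steps: Tuple[str, ...]  # fine-grained sub-goals achieved, in order (assumed annotation)
    success: bool                     # whether the instruction was satisfied


def hindsight_relabel(traj: Trajectory, plan: Tuple[str, ...]) -> List[Trajectory]:
    """Turn a failed trajectory into successful examples at a finer granularity.

    `plan` is an assumed decomposition of the coarse instruction into fine steps.
    """
    if traj.success:
        return [traj]  # nothing to relabel

    # Length of the longest prefix of the plan that the agent completed.
    n_done = 0
    for planned, done in zip(plan, traj.completed_steps):
        if planned != done:
            break
        n_done += 1

    # One relabeled example per completed prefix: the new instruction lists
    # the fine-grained steps achieved so far, which the trajectory satisfies.
    return [
        Trajectory(
            instruction=", then ".join(plan[:k]),
            completed_steps=plan[:k],
            success=True,
        )
        for k in range(1, n_done + 1)
    ]
```

For example, a failed attempt at "clean the table" that completed the first two of three planned steps would yield two relabeled examples, one for the first step alone and one for the first two steps in sequence. A full implementation would also need to slice the underlying observations and actions to match each prefix, which this sketch omits.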

URLs/references

Preprint paper: Instruction Granularity as Planning Width: Formalizing and Benchmarking for Vision-Language-Action Models https://sino-huang.github.io/biography/publications/ijcai26_preprint.pdf

IMPORTANT NOTE

For any inquiries regarding this project, please contact Dr. Sukai Huang via sukai.huang@monash.edu

Required knowledge

Python programming and PyTorch
Deep reinforcement learning (PPO, SAC, reward shaping)
Vision-Language Models (CLIP, VLM basics) or multimodal learning
Understanding of imitation learning and behavior cloning
Strong analytical skills for ablation studies and statistical analysis
