WALR — Width-Aware Language Reward for Vision-Language-Action Models

Primary supervisor

Hamid Rezatofighi

Co-supervisors


This project addresses the language-ignoring problem in embodied AI, in which robots learn visual shortcuts instead of following instructions. Building on our preprint, which establishes the relationship between planning width (instruction granularity) and learning difficulty, you will develop WALR, a reward design framework that adapts to instruction complexity. WALR scales language-grounding rewards according to instruction granularity (coarse vs. fine) and penalizes vision-only behavior, with the goal of training robust Vision-Language-Action (VLA) models that reliably follow human instructions.
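As a rough illustration of the idea only (not the formulation from the preprint), the reward can be pictured as a task reward plus a language-grounding term whose weight is a sigmoid of the instruction's planning width, minus a penalty for vision-only behavior. Everything named here is an assumption made for the sketch: the functions `width_scale` and `walr_reward`, the parameters `k`, `w0`, and `lam`, the choice to let the weight grow with width, and the idea of measuring vision-only behavior as a similarity score between actions taken with and without the instruction.

```python
import math


def width_scale(width: float, k: float = 1.0, w0: float = 2.0) -> float:
    """Sigmoid weight in (0, 1) as a function of planning width.

    k (slope) and w0 (midpoint) are hypothetical hyperparameters; a negative
    k would make the weight shrink with width instead of grow.
    """
    return 1.0 / (1.0 + math.exp(-k * (width - w0)))


def walr_reward(
    task_reward: float,
    grounding_score: float,
    vision_only_score: float,
    width: float,
    lam: float = 0.5,
    k: float = 1.0,
    w0: float = 2.0,
) -> float:
    """Combine task reward, width-scaled grounding reward, and a penalty.

    grounding_score: assumed measure of how well behavior matches the instruction.
    vision_only_score: assumed measure of how little the action changes when the
        instruction is masked (higher means the policy is ignoring language).
    """
    scale = width_scale(width, k, w0)
    return task_reward + scale * grounding_score - lam * vision_only_score


# Example: an instruction whose width sits at the sigmoid midpoint (weight 0.5).
reward = walr_reward(task_reward=1.0, grounding_score=0.8,
                     vision_only_score=0.2, width=2.0)
```

Keeping the scaling function separate from the reward makes it straightforward to swap the sigmoid for another shape during ablations.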

Aim/outline

  1. Literature review on reward design for embodied agents, VLA models, and planning width theory.

  2. Implement a width-aware reward function with language-consistency penalties in the BEHAVIOR-1K simulator (https://behavior.stanford.edu/index.html).

  3. Develop width-conditioned scaling functions (sigmoid-based) that adapt reward signals to instruction granularity.

  4. Design a hindsight relabeling strategy that converts failed trajectories into learning opportunities at appropriate granularity levels.

  5. Train VLA policies using RL (PPO/SAC) with WALR rewards and evaluate on household manipulation tasks.

  6. Analyze asymmetric generalization (coarse→fine vs. fine→coarse) and language grounding metrics.
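To make the hindsight relabeling step (item 4) more concrete, here is a minimal sketch under stated assumptions; the actual strategy is something the project will design. The `Trajectory` type, the `relabel` function, the two granularity labels, and the representation of progress as a list of completed subgoal strings are all hypothetical. A failed trajectory is rewritten as one or more successful examples for what the agent did achieve, either per subgoal (fine) or as a single summary instruction (coarse).

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Trajectory:
    instruction: str      # instruction the agent was given
    achieved: List[str]   # subgoals actually completed, in order (assumed available)
    success: bool         # whether the original instruction was satisfied


def relabel(traj: Trajectory, granularity: str) -> List[Trajectory]:
    """Turn a failed trajectory into successful examples for what it did achieve."""
    if traj.success or not traj.achieved:
        return []  # nothing to relabel
    if granularity == "fine":
        # One relabeled example per completed subgoal.
        return [
            replace(traj, instruction=goal, achieved=[goal], success=True)
            for goal in traj.achieved
        ]
    if granularity == "coarse":
        # One example whose instruction summarizes all completed subgoals.
        summary = ", then ".join(traj.achieved)
        return [replace(traj, instruction=summary, success=True)]
    raise ValueError(f"unknown granularity: {granularity!r}")


failed = Trajectory("tidy the kitchen", ["open drawer", "pick up spoon"], False)
fine_examples = relabel(failed, "fine")
coarse_examples = relabel(failed, "coarse")
```

In a real pipeline each fine-grained example would also carry only the observation and action segment for its subgoal; that bookkeeping is omitted here for brevity.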

URLs/references

Preprint paper: Instruction Granularity as Planning Width: Formalizing and Benchmarking for Vision-Language-Action Models https://sino-huang.github.io/biography/publications/ijcai26_preprint.pdf

IMPORTANT NOTE

For any inquiries regarding this project, please contact Dr. Sukai Huang via sukai.huang@monash.edu

Required knowledge

  • Python programming and PyTorch

  • Deep reinforcement learning (PPO, SAC, reward shaping)

  • Vision-Language Models (CLIP, VLM basics) or multimodal learning

  • Understanding of imitation learning and behavior cloning

  • Strong analytical skills for ablation studies and statistical analysis