Primary supervisor

Ehsan Shareghi

The web is filled with content, yet language agents (an emerging family of AI systems) are still far from capable of tapping into the information available on the web during their course of action. This project will move in this exciting direction by building a language agent that, for any given web page, can (1) write a Python crawler on the fly, and (2) identify the page's core content.
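To make task (2) concrete, here is a minimal hand-written baseline for identifying core content, using only the Python standard library. It is a sketch under stated assumptions and not the project's method: the project aims for a language agent to produce this kind of logic per page. The names `CoreContentParser`, `extract_core_content`, `SKIP_TAGS`, and the `min_words` threshold are all hypothetical choices made for illustration.

```python
from html.parser import HTMLParser

# Assumption: text inside these tags is navigation or other boilerplate.
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside", "form"}


class CoreContentParser(HTMLParser):
    """Collects <p> text that sits outside the tags listed in SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0        # > 0 while inside a SKIP_TAGS element
        self.in_paragraph = False
        self.buffer = []
        self.paragraphs = []

    def _flush(self):
        # Store the paragraph collected so far, with whitespace normalised.
        if self.in_paragraph:
            text = " ".join("".join(self.buffer).split())
            if text:
                self.paragraphs.append(text)
        self.buffer = []
        self.in_paragraph = False

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag == "p" and self.skip_depth == 0:
            self._flush()          # handles a previous <p> left unclosed
            self.in_paragraph = True

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag == "p":
            self._flush()

    def handle_data(self, data):
        if self.in_paragraph and self.skip_depth == 0:
            self.buffer.append(data)


def extract_core_content(html, min_words=5):
    """Return paragraphs with at least `min_words` words (a crude heuristic)."""
    parser = CoreContentParser()
    parser.feed(html)
    parser.close()
    parser._flush()                # keep a trailing unclosed paragraph
    return [p for p in parser.paragraphs if len(p.split()) >= min_words]
```

Fixed rules like these tend to break on pages whose layout differs from what the rules expect, which is the gap an agent that writes a crawler per page is meant to close.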

Student cohort

Double Semester

Aim/outline

  • An open-source system
  • A publication in ACL/EMNLP/NAACL/EACL

Required knowledge

  • Must: fluency in Python and PyTorch
  • Must: academic or working knowledge of Large Language Models
  • Preferred: experience with writing crawlers
  • Preferred: experience building a small fine-tuned language model (e.g., LLaMA)