Primary supervisor
Ehsan Shareghi

The web is full of content, but language agents (an emerging family of AI systems) are still far from able to tap into the information available on the web while carrying out a task. This project pursues this direction by building a language agent that, for any given web page, can (1) write a Python crawler on the fly and (2) identify the page's core content.
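To make goal (2) concrete, here is a minimal, non-learned baseline for core-content identification, using only the Python standard library. It is an illustrative sketch, not the project's method: the tag lists, the `min_words` threshold, and the names `BlockTextParser` and `extract_core_content` are all assumptions made for this example. A language agent would be expected to do considerably better than a fixed heuristic like this.

```python
from html.parser import HTMLParser

# Assumption: text inside these tags is treated as boilerplate, never content.
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside"}
# Assumption: these tags delimit candidate content blocks.
BLOCK_TAGS = {"p", "article", "section", "div", "li"}


class BlockTextParser(HTMLParser):
    """Collects the text of each block, ignoring text inside SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._skip_depth = 0
        self._buffer = []

    def _flush(self):
        # Normalise whitespace and store the block if it is non-empty.
        text = " ".join("".join(self._buffer).split())
        if text:
            self.blocks.append(text)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._buffer.append(data)

    def close(self):
        super().close()
        self._flush()


def extract_core_content(html, min_words=5):
    """Return text blocks with at least `min_words` words (hypothetical heuristic)."""
    parser = BlockTextParser()
    parser.feed(html)
    parser.close()
    return [block for block in parser.blocks if len(block.split()) >= min_words]
```

Calling `extract_core_content(page_html)` on a fetched page returns a list of candidate content paragraphs. Navigation menus, footers, and very short fragments are dropped, which is roughly the behaviour a learned system would need to improve on.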
Student cohort
Double Semester
Aim/outline
- An open-source system
- A publication in ACL/EMNLP/NAACL/EACL
URLs/references
Required knowledge
- Must: fluency in Python and PyTorch
- Must: academic or working knowledge of Large Language Models
- Preferred: experience with writing crawlers
- Preferred: have built a small fine-tuned language model (e.g., LLaMA)