NeuroDistSys (NDS): Optimized Distributed Training and Inference on Large-Scale Distributed Systems

Primary supervisor

Mohammad Goudarzi

In NeuroDistSys (NDS), we aim to design and implement cutting-edge techniques for optimizing the training and inference of Machine Learning (ML) models across large-scale distributed systems. Leveraging advanced AI and distributed-computing strategies, the project focuses on deploying ML models on real-world distributed infrastructures and on improving system performance, scalability, and efficiency by optimizing resource usage (e.g., GPUs, CPUs, and energy consumption). Researchers and students will explore innovative approaches to reduce latency, increase throughput, and enable real-time resource management, preparing them for impactful roles in AI, cloud computing, and large-scale system design. Practical examples include, but are not limited to, distributed inference of foundation models across heterogeneous server environments.
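
To make the heterogeneous-inference example concrete, below is a minimal Python sketch of one possible approach: splitting a model's layers into contiguous pipeline stages sized in proportion to each server's measured throughput. Everything in it (the Server record, the layers_per_sec figures, the machine names) is a hypothetical illustration, not a description of the project's actual method.

from dataclasses import dataclass

@dataclass
class Server:
    name: str
    layers_per_sec: float  # measured per-server throughput (hypothetical numbers)

def partition_layers(num_layers: int, servers: list[Server]) -> list[tuple[str, int, int]]:
    """Assign contiguous layer ranges in proportion to server throughput."""
    total = sum(s.layers_per_sec for s in servers)
    quotas = [num_layers * s.layers_per_sec / total for s in servers]
    alloc = [int(q) for q in quotas]  # floor of each proportional share
    # Largest-remainder rounding: hand leftover layers to the servers
    # whose shares were truncated the most.
    leftovers = sorted(range(len(servers)), key=lambda i: quotas[i] - alloc[i], reverse=True)
    for i in leftovers[: num_layers - sum(alloc)]:
        alloc[i] += 1
    stages, start = [], 0
    for server, n in zip(servers, alloc):
        stages.append((server.name, start, start + n - 1))
        start += n
    return stages

if __name__ == "__main__":
    # Hypothetical heterogeneous cluster: one fast GPU, one slow GPU, one CPU node.
    cluster = [Server("gpu-a100", 12.0), Server("gpu-t4", 4.0), Server("cpu-node", 1.0)]
    for name, first, last in partition_layers(32, cluster):
        print(f"{name}: layers {first}-{last}")

Run as-is, this assigns layers 0-22, 23-29, and 30-31 of a 32-layer model to the three machines. A real system would refine such a static split with profiling, communication-cost modelling, and runtime re-balancing, which is exactly the kind of optimization this project investigates.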

Feel free to visit my website and contact me for more information.

 

