HSA4ESMLS PPoSS: LARGE: Co-designing Hardware, Software, and Algorithms to Enable Extreme-Scale Machine Learning Systems

Research

The primary goal of this project is to build a co-designed framework of hardware, software, and algorithms that enables extreme-scale ML for emerging AIoT and IoS systems. The project consists of five research thrusts. Thrust 1 develops hardware, computer architecture, and compiler approaches to address the scalability issue in AIoT and IoS systems by enforcing large-scale split learning on devices. Thrust 2 investigates extreme-scale ML on resource-constrained embedded devices by designing a new system framework that adaptively partitions and offloads ML workloads. Thrust 3 addresses system and data unreliability by designing new cross-layer algorithms and hardware techniques. Thrust 4 investigates algorithm, hardware, and software co-design to enable secure and privacy-preserving ML systems at scale. Thrust 5 designs and implements an IoS testbed and a smart building testbed to evaluate the proposed system designs.