With the explosive growth of Web2 (interactive, data-driven) services, the internet is changing rapidly, and datacenters play a pivotal role. Datacenters house state-of-the-art compute, memory, network, and other resources for processing, storing, and distributing colossal quantities of data, yet they are limited by power and real estate constraints. Because of heterogeneity in compute (different server architectures), memory (capacity and type, volatile or non-volatile), and network topology and bandwidth, together with workload diversity, job-to-host placement has a significant impact on Quality-of-Service (QoS) and system efficiency. In addition, the shift in datacenter workload software architectures from monolithic to microservice-based structures has led to higher interactivity within datacenters and more varied QoS metrics. Despite considerable attention to this domain, the difficulty of determining resource needs, coupled with the heterogeneity of resources and workloads within datacenters, means that state-of-the-art systems rely heavily on manually tuned parameters.

In this dissertation, we design a scalable framework that uses low-overhead workload and hardware characterization, complemented with network traffic usage information, to address the above challenges. We allow resource sharing through multi-tenancy for improved utilization while ensuring QoS. Our framework avoids reliance on predefined workload classifications and resource needs. We localize traffic and resource allocation to reduce communication costs and contention while meeting workload resource needs. By identifying resource bottlenecks for individual workloads using hardware performance and network traffic information, we can determine the tradeoff between resource sharing (improved utilization) and contention (reduced QoS). Rather than allocating resources based on peak needs, we develop mechanisms to determine the sensitivity of an application to interference and use this metric to determine the benefits of colocation. We validate our techniques using a combination of evaluation on individual servers and datacenter traces.

 

Advisor: Prof. Sandhya Dwarkadas (Computer Science)  

Committee: Prof. Chen Ding (Computer Science), Prof. Yuhao Zhu (Computer Science), Prof. Michael Huang (ECE), and Dr. Parth Malani (Meta Platforms Inc.) 

Chair: Prof. Xing Qiu (Biostatistics and Computational Biology)   
