275 Hutchison Rd, Rochester, NY 14620

"Building Better Data-Intensive Systems Using Machine Learning"


Database systems have traditionally relied on empirical approaches and handcrafted rules that encode human intuitions or heuristics to store large-scale data and process user queries over it. These well-tuned approaches and rules work well in the general-purpose case, but are seldom optimal for any particular application because they are not tailored to its specific properties (e.g., user workload patterns). Further, they fail to account for the complex interactions with the environment in which the database system runs (e.g., hardware and operating system). One possible solution is to build a specialized system from scratch, tailored to each use case. Although such a specialized system can achieve orders-of-magnitude better performance, building it is time-consuming and requires substantial manual effort. This creates an urgent need for automated solutions that abstract away system-building complexities while getting as close as possible to the performance of specialized systems.

In this talk, I will show how we leverage machine learning to instance-optimize the performance of query scheduling and execution operations in database systems. In particular, I will show how deep reinforcement learning can be used to fully replace a traditional query scheduler. I will also show how even simpler learned models, such as piecewise linear models that approximate the cumulative distribution function (CDF), can significantly improve the performance of fundamental data structures and operations, such as hash tables and in-memory joins. More broadly, this line of work is an eye-opener for researchers and practitioners, showing the huge potential of machine learning techniques for solving systems problems in general.
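
To make the CDF idea concrete, the following is a minimal sketch of how a piecewise linear approximation of the key CDF could serve as a hash function that spreads skewed keys evenly over buckets. It is an illustration under stated assumptions, not the speaker's implementation: the function names (fit_piecewise_cdf, predict_cdf, learned_bucket) and the choice of equally spaced quantile knots are hypothetical, and the keys are assumed to be numeric and mostly distinct.

```python
from bisect import bisect_right


def fit_piecewise_cdf(keys, num_segments):
    """Fit knots (key, cdf) at evenly spaced ranks of the sorted keys.

    Hypothetical helper: a real learned index would pick knots more carefully.
    """
    sorted_keys = sorted(keys)
    n = len(sorted_keys)
    knot_keys, knot_cdfs = [], []
    for s in range(num_segments + 1):
        idx = (s * (n - 1)) // num_segments
        key = sorted_keys[idx]
        if knot_keys and key == knot_keys[-1]:
            continue  # skip repeated keys so every segment has nonzero width
        knot_keys.append(key)
        knot_cdfs.append(idx / (n - 1) if n > 1 else 0.0)
    return knot_keys, knot_cdfs


def predict_cdf(model, key):
    """Estimate the CDF at `key` by linear interpolation between knots."""
    knot_keys, knot_cdfs = model
    if key <= knot_keys[0]:
        return knot_cdfs[0]
    if key >= knot_keys[-1]:
        return knot_cdfs[-1]
    # Index of the segment whose left knot is <= key.
    i = bisect_right(knot_keys, key) - 1
    x0, x1 = knot_keys[i], knot_keys[i + 1]
    y0, y1 = knot_cdfs[i], knot_cdfs[i + 1]
    return y0 + (y1 - y0) * (key - x0) / (x1 - x0)


def learned_bucket(model, key, num_buckets):
    """Map a key to a bucket by scaling its estimated CDF position."""
    position = predict_cdf(model, key)
    return min(num_buckets - 1, int(position * num_buckets))


# Skewed keys: squares cluster near zero and thin out at the high end.
keys = [i * i for i in range(1000)]
model = fit_piecewise_cdf(keys, num_segments=10)
buckets = [learned_bucket(model, k, num_buckets=100) for k in keys]
```

Because the estimated CDF is monotone, the mapping preserves key order, and keys from a skewed distribution land in roughly equally loaded buckets as long as the model tracks the true distribution closely.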

Bio: Ibrahim Sabek is a postdoc at MIT and an NSF/CRA Computing Innovation Fellow. He is interested in building the next generation of machine learning-empowered data management, processing, and analysis systems. Before MIT, he received his Ph.D. from the University of Minnesota, Twin Cities, where he studied machine learning techniques for spatial data management and analysis. His Ph.D. work received the University-wide Best Doctoral Dissertation Honorable Mention from the University of Minnesota in 2021. He was also awarded first place in the graduate student research competition (SRC) at ACM SIGSPATIAL 2019 and was the best paper runner-up at ACM SIGSPATIAL 2018. Outside of academia, Ibrahim enjoys playing and watching soccer.
