I will start as an Assistant Professor in Computer Science at Princeton University in Fall 2025. I am looking for PhD students to join my group. If interested, please apply through Princeton and mention my name in your statement.
I am currently an applied scientist at AWS, where I work on autonomics in Amazon Redshift. I was previously a CS PhD student in the MIT Data Systems Group, where I was advised by Prof. Tim Kraska. My research was partly supported by a Meta PhD Fellowship.
I research machine learning and optimization techniques for data systems, with a focus on instance-optimization, a new design paradigm for building data systems that can automatically self-optimize to achieve the best performance for any specific application or use case. I have leveraged instance-optimization to introduce novel designs for data storage layouts (1, 2, 3, 4), database indexes (5, 6, 7) and end-to-end data systems (8, 9). See here for a more detailed description of my past and future research directions.
[CV] [Google Scholar] [Twitter] [Research Statement] [Teaching Statement]
Automated Multidimensional Data Layouts in Amazon Redshift. [blog] [press release]
Jialin Ding, Matt Abrams, Sanghita Bandyopadhyay, Luciano Di Palma, Yanzhu Ji, Davide Pagano, Gopal Paliwal, Panos Parchas, Pascal Pfeil, Orestis Polychroniou, Gaurav Saxena, Aamer Shah, Amina Voloder, Sherry Xiao, Davis Zhang, Tim Kraska.
SIGMOD 2024 Industrial Track.
SageDB: An Instance-Optimized Data Analytics System. [talk]
Jialin Ding, Ryan Marcus, Andreas Kipf, Vikram Nathan, Aniruddha Nrusimha, Kapil Vaidya, Alexander van Renen and Tim Kraska.
VLDB 2023.
APEX: A High-Performance Learned Index on Persistent Memory.
Baotong Lu, Jialin Ding, Eric Lo, Umar Farooq Minhas and Tianzheng Wang.
VLDB 2022.
Self-Organizing Data Containers. [talk]
Samuel Madden, Jialin Ding, Tim Kraska, Sivaprasad Sudhir, David Cohen, Timothy Mattson and Nesime Tatbul.
CIDR 2022.
Tsunami: A Learned Multi-dimensional Index for Correlated Data and Skewed Workloads. [news] [talk]
Jialin Ding, Vikram Nathan, Mohammad Alizadeh and Tim Kraska.
VLDB 2021.
Instance-Optimized Data Layouts for Cloud Analytics Workloads. [talk]
Jialin Ding, Umar Farooq Minhas, Badrish Chandramouli, Chi Wang, Yinan Li, Ying Li, Donald Kossmann, Johannes Gehrke and Tim Kraska.
SIGMOD 2021.
ALEX: An Updatable Adaptive Learned Index. [talk] [seminar talk] [code]
Jialin Ding, Umar Farooq Minhas, Jia Yu, Chi Wang, Jaeyoung Do, Hantian Zhang, Yinan Li, Badrish Chandramouli, Johannes Gehrke, Donald Kossmann, David Lomet and Tim Kraska.
SIGMOD 2020.
Learning Multi-dimensional Indexes. [talk] [seminar talk]
Vikram Nathan*, Jialin Ding*, Mohammad Alizadeh and Tim Kraska.
SIGMOD 2020.
SageDB: A Learned Database System. [the morning paper]
Tim Kraska, Mohammad Alizadeh, Alex Beutel, Ed Chi, Jialin Ding, Ani Kristo, Guillaume Leclerc, Samuel Madden, Hongzi Mao and Vikram Nathan.
CIDR 2019.
Moment-Based Quantile Sketches for Efficient High Cardinality Aggregation Queries. [the morning paper] [blog]
Edward Gan, Jialin Ding, Kai Sheng Tai, Vatsal Sharan and Peter Bailis.
VLDB 2018.
Learning Bit Allocations for Z-Order Layouts in Analytic Data Systems.
Jenny Gao, Jialin Ding, Sivaprasad Sudhir and Samuel Madden.
ML for Systems Workshop @ NeurIPS 2023.
The Case for Learned Spatial Indexes.
Varun Pandey, Alexander van Renen, Andreas Kipf, Ibrahim Sabek, Jialin Ding and Alfons Kemper.
AIDB Workshop @ VLDB 2020.
LISA: Towards Learned DNA Sequence Search.
Darryl Ho, Jialin Ding, Sanchit Misra, Nesime Tatbul, Vikram Nathan, Vasimuddin Md and Tim Kraska.
Systems for ML Workshop @ NeurIPS 2019. Oral Presentation.
Learning Multi-dimensional Indexes. [talk]
Vikram Nathan*, Jialin Ding*, Mohammad Alizadeh and Tim Kraska.
ML for Systems Workshop @ NeurIPS 2019. Oral Presentation.
Efficient Mergeable Quantile Sketches using Moments.
Edward Gan, Jialin Ding and Peter Bailis.
SysML 2018. Extended Abstract.
A Machine-Compiled Database of Genome-Wide Association Studies.
Volodymyr Kuleshov, Jialin Ding, Braden Hancock, Alexander Ratner, Christopher Ré, Serafim Batzoglou and Michael Snyder.
ISMB 2017. Short Paper.
A Machine-compiled Database of Genome-wide Association Studies.
Volodymyr Kuleshov, Jialin Ding, Christopher Vo, Braden Hancock, Alexander Ratner, Yang Li, Christopher Ré, Serafim Batzoglou and Michael Snyder.
Nature Communications 2019.
MacroBase: Prioritizing Attention in Fast Data.
Firas Abuzaid, Peter Bailis, Jialin Ding, Edward Gan, Samuel Madden, Deepak Narayanan, Kexin Rong and Sahaana Suri.
TODS 2018.
jialind@amazon.com