Expanding boundaries in scalable session-based recommendations

Abstract

This thesis explores scalable solutions for session-based recommendation systems, tackling four key challenges: large product catalogs, millions of users, sparse interaction data, and the debugging of large-scale recommendation datasets. Conducted in collaboration with Bol, a European e-commerce platform, this research bridges the gap between academic advancements and the practical demands of industrial systems. Chapter 2 focuses on designing and deploying algorithms and frameworks that process billions of interactions with high efficiency and low latency. These algorithms and frameworks are rigorously evaluated through both offline experiments and real-world deployments. In Chapter 3, the thesis introduces VMIS-kNN, an improved version of the state-of-the-art VS-kNN algorithm. With improved time complexity and optimizations such as a prebuilt index, VMIS-kNN enhances scalability and responsiveness, enabling recommendation computations within milliseconds. Empirical evaluations across six datasets and implementations in multiple programming languages demonstrate its effectiveness. Additionally, the thesis presents Serenade, a production-ready, stateful recommendation system with high throughput and low latency. Serenade integrates seamlessly into large-scale e-commerce platforms, significantly improving engagement-related business metrics. Chapter 4 highlights the Etude framework, which provides a systematic approach for benchmarking the inference performance of neural network-based session recommendation models under various deployment scenarios. Furthermore, Chapter 5 introduces KMC-Shapley, a scalable method for estimating Data Shapley Values in sequential kNN-based recommendation systems. This technique enhances the debugging of large-scale recommendation datasets by combining algorithmic rigor with practical utility.
The research underscores the importance of balancing predictive performance, system efficiency, and ecological sustainability. The findings confirm the effectiveness of nearest neighbor methods in specific e-commerce contexts, the crucial impact of system latency on user acceptance, and the value of data valuation techniques in maintaining the integrity of kNN-based recommendation systems. The open-source tools and methodologies developed in this thesis advance the state of the art while offering practical insights for industry professionals. By combining theoretical innovation with real-world applicability, this research contributes to the field of session-based recommendation systems.

Publication: PhD Thesis, University of Amsterdam