Analyzing and Predicting Purchase Intent in E-commerce: Anonymous vs. Identified Customers


The popularity of e-commerce platforms continues to grow. Being able to understand, model, and predict customer behavior is an essential ingredient of successful search and recommendation services on an e-commerce platform. In particular, the ability to model, capture, and reliably predict purchase intent of customers is essential for customizing the user experience through personalized result presentations, recommendations, and special offers. Previous work has considered a broad range of prediction models as well as features inferred from clickstream data to record session characteristics, and features inferred from user data to record customer characteristics. So far, most previous work in the area of purchase prediction has focused on known customers, largely ignoring so-called anonymous sessions, that is, sessions initiated by a non-logged-in or unrecognized customer. However, in the de-identified data from a large European e-commerce platform available to us, more than 50% of the sessions start as anonymous sessions. In this paper, we focus on purchase prediction for both anonymous and logged-in sessions on an e-commerce platform. We start with a descriptive analysis of purchase vs. non-purchase sessions. This analysis informs the definition of a feature-based model for purchase prediction for anonymous sessions and logged-in sessions; our models consider a range of session-based features for anonymous sessions, such as the channel type, the number of visited pages, and the device type. For identified user sessions, our analysis points to customer history data as a valuable discriminator between purchase and non-purchase sessions.
Based on our analysis, we then build two types of predictors: a predictor for anonymous sessions that accurately predicts purchase intent, beating a production-ready predictor by over 17.54% in F1-score, and a predictor for identified customers that uses session data as well as customer history and achieves an F1-score of 96.20% on held-out data collected from a real-world retail platform. Finally, we discuss the broader practical implications of our findings.

eCommerce workshop at SIGIR