CSAN: Contextual Self-Attention Network
for User Sequential Recommendation

Introduction

Sequential recommendation is an important task for online user-oriented services, such as purchasing products, watching videos, and consuming social media. Recent work usually uses RNN-based methods to derive an overall embedding of the whole behavior sequence, which fails to discriminate the significance of individual user behaviors and thus degrades recommendation performance. Moreover, RNN-based encodings have a fixed size, which makes further recommendation applications inefficient and inflexible. The online sequential behaviors of a user are generally heterogeneous, polysemous, and dynamically context-dependent. In this paper, we propose a unified Contextual Self-Attention Network (CSAN) to address these three properties. Our model projects heterogeneous user behaviors into a common latent semantic space. The output is then fed into a feature-wise self-attention network to capture the polysemy of user behaviors. In addition, we propose forward and backward position encoding matrices to model dynamic contextual dependency. Extensive experiments on two real-world datasets demonstrate the superior performance of the proposed model compared with other state-of-the-art algorithms.
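The projection of heterogeneous behaviors into a common latent semantic space can be pictured with a small sketch. It assumes that each behavior type has its own fixed-size raw feature vector and its own linear projection followed by tanh; the class name `BehaviorProjector`, the behavior type names, and all dimensions are hypothetical and are not taken from the CSAN paper or its code.

```python
import math
import random


class BehaviorProjector:
    """Hypothetical sketch: map behaviors of different types into one space.

    Each behavior type (e.g. reading an article vs. watching a video) has
    its own raw feature size. A separate linear layer per type maps every
    behavior to the same latent_dim, followed by a tanh nonlinearity.
    """

    def __init__(self, type_dims, latent_dim, seed=0):
        rng = random.Random(seed)
        self.latent_dim = latent_dim
        # One weight matrix (latent_dim rows x raw_dim columns) per type.
        self.weights = {
            t: [[rng.gauss(0.0, 1.0 / math.sqrt(dim)) for _ in range(dim)]
                for _ in range(latent_dim)]
            for t, dim in type_dims.items()
        }
        # One bias vector per type, initialized to zero.
        self.biases = {t: [0.0] * latent_dim for t in type_dims}

    def project(self, behavior_type, features):
        """Project one raw feature vector of the given type."""
        w = self.weights[behavior_type]
        b = self.biases[behavior_type]
        if len(features) != len(w[0]):
            raise ValueError("feature size does not match behavior type")
        return [math.tanh(sum(wi * xi for wi, xi in zip(row, features)) + bi)
                for row, bi in zip(w, b)]

    def encode_sequence(self, sequence):
        """Map a list of (behavior_type, features) pairs to latent vectors."""
        return [self.project(t, x) for t, x in sequence]


# Usage with two hypothetical behavior types of different raw sizes.
projector = BehaviorProjector({"read_article": 5, "watch_video": 3}, latent_dim=4)
latent = projector.encode_sequence([
    ("read_article", [0.1, 0.0, 0.3, 0.2, 0.5]),
    ("watch_video", [1.0, 0.5, 0.0]),
])
```

A per-type projection is one simple way to make behaviors of different sizes comparable before attention is applied; the embedding network actually used in CSAN may differ.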

Compared with existing sequential recommendation methods, the main contributions of our proposed CSAN can be summarized as follows:

We propose a novel contextual self-attention network for sequential recommendation, which leverages users' historical behaviors more effectively and has high computational efficiency.
We propose to employ an embedding network, a self-attention mechanism, and position encoding to deal with the heterogeneity, polysemy, and dynamic contextual dependency of user sequential behaviors, respectively. This accurately captures the user's interests and the critical information for sequential recommendation.
Extensive experimental results on both a single-type behavior dataset and a multi-type, multi-modal behavior dataset demonstrate the superior performance of the proposed model compared with other state-of-the-art algorithms. In addition, we introduce a multi-type, multi-modal behavior dataset.
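The self-attention mechanism and forward/backward position encoding listed above can be illustrated with a pure-Python sketch. It assumes an additive, per-feature score with hypothetical weight vectors `a` and `b`, and it substitutes simple direction masks (each position sees itself plus only earlier, or only later, behaviors) for the paper's position encoding matrices; it is not the exact CSAN formulation.

```python
import math

NEG_INF = float("-inf")


def direction_mask(n, forward=True):
    """Hypothetical stand-in for a position encoding matrix.

    Forward: position i may attend to positions j <= i (past and itself).
    Backward: position i may attend to positions j >= i (future and itself).
    Disallowed pairs get -inf, so their softmax weight is exactly zero.
    """
    return [[0.0 if (j <= i if forward else j >= i) else NEG_INF
             for j in range(n)] for i in range(n)]


def feature_wise_attention(xs, a, b, mask):
    """Feature-wise self-attention over a sequence of d-dim vectors.

    For query i, key j and feature k (a, b are hypothetical weights):
        score[i][j][k] = tanh(a[k] * xs[i][k] + b[k] * xs[j][k]) + mask[i][j]
    The softmax runs over j separately for every feature k, so each
    dimension gets its own attention distribution.
    """
    n, d = len(xs), len(xs[0])
    outputs = []
    for i in range(n):
        out = []
        for k in range(d):
            scores = [math.tanh(a[k] * xs[i][k] + b[k] * xs[j][k]) + mask[i][j]
                      for j in range(n)]
            m = max(scores)  # finite, because position i always sees itself
            exps = [math.exp(s - m) for s in scores]
            total = sum(exps)
            out.append(sum(e / total * xs[j][k] for j, e in enumerate(exps)))
        outputs.append(out)
    return outputs


def bidirectional_context(xs, a, b):
    """Run both directions and concatenate them per position (2*d features)."""
    n = len(xs)
    fw = feature_wise_attention(xs, a, b, direction_mask(n, forward=True))
    bw = feature_wise_attention(xs, a, b, direction_mask(n, forward=False))
    return [f + g for f, g in zip(fw, bw)]
```

Because the softmax is taken per feature, two behaviors can be weighted differently in different dimensions, which is one way to let the same item contribute different "meanings" depending on its context.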

Motivation

User behaviors are inherently heterogeneous, polysemous, and dynamically context-dependent:

Fig. 1. A schematic diagram of the behavior sequences of two users. Text describes the content topics of user actions at different timestamps. Red rectangles highlight the two users' different attention to the same article, caused by their different contextual behaviors.

Framework

Fig. 2. Schematic illustration of the sequence modeling architecture.

Proposed Contextual Self-Attention Network

Fig. 3. Illustration of the Contextual Self-Attention Network (CSAN) model.

Dataset

Zhihu Dataset (All_users) 5.87 GB
The dataset contains one year of dynamic activities from 17,723 users.

Zhihu Dataset (Selected_users) 5.47 GB
The dataset contains one year of dynamic activities from 10,458 users. These users were selected by filtering: each retained user has more than 10 dynamic behaviors.
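The selection rule for this subset (strictly more than 10 behaviors per user) can be expressed in a few lines. The `(user_id, behavior)` record layout and the function name `select_active_users` are hypothetical; the released files may be organized differently.

```python
from collections import Counter


def select_active_users(events, min_behaviors=10):
    """Return the users with strictly more than `min_behaviors` behaviors.

    `events` is an iterable of (user_id, behavior) records; this record
    layout is a hypothetical assumption, not the released file format.
    """
    counts = Counter(user_id for user_id, _ in events)
    return {user for user, c in counts.items() if c > min_behaviors}
```

Note that a user with exactly 10 behaviors is excluded, matching "more than 10".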

Last updated on 2018/04/24