How does the two-tower system work in recommender systems


The two-tower model is a type of collaborative filtering approach used in large-scale recommendation systems. It is called a “two-tower” system because it consists of two neural networks, or “towers,” that work together to generate personalized recommendations for users.

How does it work

The first tower is called the “user tower.” It takes as input a user’s historical interactions with items, such as the products they have purchased or the movies they have watched, and converts this information into a fixed-length embedding vector that represents the user’s preferences. This embedding vector is later compared against the embeddings produced by the second tower.

The second tower is called the “item tower.” It takes as input the metadata of all items in the catalog, such as the title, description, genre, and other features. The item tower also converts this information into a fixed-length embedding vector that represents each item.

The embedding vectors from the user and item towers are then compared using a similarity function, such as the dot product or cosine similarity. The similarity score indicates how well the user’s preferences match each item in the catalog. The items with the highest similarity scores are recommended to the user.
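To make this concrete, here is a minimal PyTorch sketch of the two towers and the scoring step. The tower architecture, feature dimensions, and catalog size are placeholder assumptions for illustration, not a reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Tower(nn.Module):
    """A small MLP that maps raw features to a fixed-length embedding."""
    def __init__(self, input_dim, embedding_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, 128),
            nn.ReLU(),
            nn.Linear(128, embedding_dim),
        )

    def forward(self, x):
        return self.net(x)

user_tower = Tower(input_dim=100)  # e.g., encoded interaction history
item_tower = Tower(input_dim=50)   # e.g., encoded item metadata

user_features = torch.randn(1, 100)    # one user
item_features = torch.randn(1000, 50)  # a catalog of 1,000 items

# L2-normalize so the dot product equals cosine similarity.
user_emb = F.normalize(user_tower(user_features), dim=-1)
item_embs = F.normalize(item_tower(item_features), dim=-1)

scores = user_emb @ item_embs.T                  # shape: (1, 1000)
top_scores, top_items = scores.topk(10, dim=-1)  # recommend the 10 best items
```

Note that the two towers never exchange inputs; they only meet at the similarity computation, which is what makes it possible to precompute item embeddings independently of any particular user.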

The two-tower recommendation system is a popular approach to personalized recommendation because it can handle large-scale and sparse data sets, and can capture complex user-item interactions. It has been used in a variety of applications, such as e-commerce, streaming services, and social media platforms.

How training is done using deep learning

In the two-tower recommendation system, the neural networks that generate the user and item embeddings need to be optimized so that the dot product between a user embedding and an item embedding is high for items the user has interacted with (e.g., purchased) and low for items they have not. This is achieved through training, where the model is presented with a set of observed user-item interactions and learns to predict the likelihood of each user interacting with each item in the future.

During training, the model is optimized to minimize a loss function, which measures the difference between the predicted and actual user-item interactions. A commonly used loss function in recommendation systems is the binary cross-entropy loss, which penalizes the model for incorrect predictions.
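For concreteness, a minimal sketch of the binary cross-entropy loss applied to dot-product scores for a batch of (user, item) pairs; the random embeddings here stand in for real tower outputs.

```python
import torch
import torch.nn.functional as F

batch, dim = 4, 64
user_embs = torch.randn(batch, dim, requires_grad=True)  # stand-in user tower output
item_embs = torch.randn(batch, dim, requires_grad=True)  # stand-in item tower output

# Dot-product score for each (user, item) pair in the batch.
logits = (user_embs * item_embs).sum(dim=-1)             # shape: (batch,)

# 1 = observed interaction (positive), 0 = sampled negative.
labels = torch.tensor([1., 0., 1., 0.])

# Binary cross-entropy on the raw scores; the sigmoid is applied internally.
loss = F.binary_cross_entropy_with_logits(logits, labels)
loss.backward()
```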

To optimize the neural networks, backpropagation is used to compute the gradients of the loss with respect to the model parameters. The gradients are then used to update the model parameters using an optimization algorithm such as stochastic gradient descent (SGD) or Adam. The process of updating the model parameters is repeated for multiple epochs until the model converges to a set of optimal parameters.
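Putting the pieces together, a toy end-to-end training loop with Adam might look like the following; the linear towers and synthetic data are stand-ins for a real model and dataset.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy linear towers and synthetic data, for illustration only.
user_tower = nn.Linear(100, 64)
item_tower = nn.Linear(50, 64)

user_x = torch.randn(256, 100)
item_x = torch.randn(256, 50)
labels = torch.randint(0, 2, (256,)).float()

params = list(user_tower.parameters()) + list(item_tower.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)

for epoch in range(5):
    logits = (user_tower(user_x) * item_tower(item_x)).sum(dim=-1)
    loss = F.binary_cross_entropy_with_logits(logits, labels)
    optimizer.zero_grad()
    loss.backward()   # backpropagation computes the gradients
    optimizer.step()  # Adam updates the parameters
```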

By optimizing the neural networks in this way, the model learns to generate user and item embeddings that capture the underlying patterns and relationships in the data, and can make accurate predictions of user-item interactions. This allows the two-tower recommendation system to provide personalized recommendations that are tailored to the preferences of each individual user.

Efficient search at inference time

When calculating the dot product of a user embedding with all the item embeddings in the item tower, there are several techniques that can be used to make the computation more efficient and faster. Here are a few approaches:

Use matrix multiplication: Rather than calculating the dot product between the user embedding and each item embedding one by one, it is more efficient to perform a single matrix multiplication between the user embedding and the entire item embedding matrix. This can be done with a library such as NumPy or PyTorch, both of which are optimized for matrix computations.
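A small NumPy sketch of this, with made-up sizes:

```python
import numpy as np

num_items, dim = 100_000, 64
user_emb = np.random.randn(dim).astype(np.float32)
item_embs = np.random.randn(num_items, dim).astype(np.float32)

# One vectorized matrix-vector product instead of a Python loop
# over 100,000 individual dot products.
scores = item_embs @ user_emb          # shape: (num_items,)
top_items = np.argsort(-scores)[:10]   # indices of the 10 best-scoring items
```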

Use approximate nearest neighbor (ANN) search: When the number of items is very large, it can be computationally expensive to calculate the dot product between the user embedding and all the item embeddings. One approach to speed up the search is to use an approximate nearest neighbor search algorithm, such as locality-sensitive hashing (LSH) or k-d trees. These algorithms allow us to quickly identify a smaller set of candidate items that are most similar to the user’s preferences.
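As one concrete option, the FAISS library provides ANN indexes. Below is a minimal sketch using its HNSW index; the index type and parameters are illustrative choices rather than part of the two-tower method itself.

```python
import faiss
import numpy as np

dim = 64
item_embs = np.random.randn(100_000, dim).astype(np.float32)

# Build a graph-based ANN index over the catalog once, offline.
# 32 is the number of graph neighbors per node; inner product matches
# the dot-product scoring used by the model.
index = faiss.IndexHNSWFlat(dim, 32, faiss.METRIC_INNER_PRODUCT)
index.add(item_embs)

# At serving time, retrieve the approximate top-10 items for a user.
user_emb = np.random.randn(1, dim).astype(np.float32)
scores, item_ids = index.search(user_emb, 10)
```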

Use a cache: Since the item catalog changes far less often than recommendation requests arrive, the item embeddings can be computed once after training and cached, rather than recomputed per request. If user embeddings are also stable between model updates, the top-scoring items for each user can be precomputed and cached as well. By caching these results, we avoid having to recompute them every time a user requests recommendations, which can significantly speed up the recommendation process.
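A simple sketch of precomputing a per-user top-K cache after each model update; the array sizes are illustrative.

```python
import numpy as np

# Illustrative embeddings; in practice these come from the trained towers.
user_embs = np.random.randn(1_000, 64).astype(np.float32)
item_embs = np.random.randn(10_000, 64).astype(np.float32)

K = 10
scores = user_embs @ item_embs.T                 # (num_users, num_items)
topk_cache = np.argsort(-scores, axis=1)[:, :K]  # cached top-K item ids per user

def recommend(user_id: int) -> np.ndarray:
    # Serving a request is now a cache lookup, not a similarity search.
    return topk_cache[user_id]
```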

Use parallelization: If the hardware allows for it, the dot products between the user embedding and all item embeddings can be computed in parallel, using multi-threading on CPUs or a GPU, which can further speed up the recommendation process.
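For example, with PyTorch the same matrix multiplication can be moved onto a GPU when one is available, and a whole batch of users can be scored in one parallel operation (sizes again illustrative):

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Score a whole batch of users against the full catalog in one matmul,
# which runs in parallel across GPU cores when CUDA is available.
user_embs = torch.randn(512, 64, device=device)
item_embs = torch.randn(100_000, 64, device=device)

scores = user_embs @ item_embs.T                 # shape: (512, 100_000)
top_scores, top_items = scores.topk(10, dim=-1)  # top-10 items per user
```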

By combining these techniques, the computation of the dot product between the user embedding and all item embeddings can be made significantly faster, which helps the recommendation system serve results at scale.


Author: robot learner