Creating a Recommender System (Part II)

Jan 5, 2017 | Developer

After the resounding success of the first article on recommender systems, Álvaro Santos is back with some further insight into creating a recommender system.

Coming soon: A follow-up Meetup in Madrid to go even further into this exciting topic. Stay tuned!


In the previous article of this series, we explained what a recommender system is, describing its main parts and providing some basic algorithms which are frequently used in these systems. We also explained how to code some functions to read JSON files and to map the data in MongoDB and ElasticSearch using Spark SQL and Spark connectors.

This second part will cover:

  • Generating our Collaborative Filtering model.
  • Pre-calculating product / user recommendations.
  • Launching a small REST server to interact with the recommender.
  • Querying the data store to retrieve content-based recommendations.
  • Mixing the different types of recommendations to create a hybrid recommender.

Collaborative Filtering Algorithm

In this recommender service example, we have chosen Alternating Least Squares (ALS) as the algorithm for Collaborative Filtering. ALS is the only Collaborative Filtering algorithm that Spark implements, but it has been widely tested and performs well, so it is well suited to this example project.

You can learn more about Alternating Least Squares at this link.

Recommender Trainer

In this section, we will code the program that will create our Collaborative Filtering model. It will pre-calculate all recommendations to ensure a faster service.

First of all, we should read all the reviews from MongoDB:
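As a minimal sketch of this step, assuming the MongoDB Spark connector 2.x is on the classpath, the reviews could be loaded as a DataFrame as follows. The host, database and collection names are placeholders, not values from the original project.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ReviewReader {
  // Builds a connection string of the form mongodb://host/database.collection.
  def reviewsUri(host: String, database: String, collection: String): String =
    s"mongodb://$host/$database.$collection"

  // Loads the whole collection as a DataFrame through the connector's data source.
  // "com.mongodb.spark.sql.DefaultSource" is the format name used by connector 2.x;
  // other connector versions may expect a different name.
  def readReviews(spark: SparkSession, uri: String): DataFrame =
    spark.read
      .format("com.mongodb.spark.sql.DefaultSource")
      .option("uri", uri)
      .load()
}
```

A call could look like `ReviewReader.readReviews(spark, ReviewReader.reviewsUri("localhost:27017", "recommender", "reviews"))`, where every argument is hypothetical.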

However, the Spark API cannot use the data as it comes from the DB. We must transform our ratings DataFrame into an RDD of Spark ratings:
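One way to do this conversion is sketched below. The column names `userId`, `productId` and `rating` are assumptions about the reviews schema. The sketch also assumes the identifiers are numeric, because the MLlib `Rating` class only accepts `Int` ids.

```scala
import org.apache.spark.mllib.recommendation.Rating
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

object RatingsConverter {
  // Assumed columns: userId, productId, rating (hypothetical names).
  def toRatings(reviews: DataFrame): RDD[Rating] =
    reviews
      .select(
        col("userId").cast("int"),
        col("productId").cast("int"),
        col("rating").cast("double"))
      .na.drop() // skip rows with missing or non-numeric values
      .rdd
      .map(row => Rating(row.getInt(0), row.getInt(1), row.getDouble(2)))
}
```

If the ids are strings, they first need to be mapped to integers (for example with a lookup table), which is outside the scope of this sketch.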

Now it is time to create our ALS model:
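With the RDD-based MLlib API, training can be as short as the sketch below. The default values for rank, number of iterations and regularization are illustrative assumptions, not the values used in the original project, and should be tuned for real data.

```scala
import org.apache.spark.mllib.recommendation.{ALS, MatrixFactorizationModel, Rating}
import org.apache.spark.rdd.RDD

object ModelTrainer {
  // rank: number of latent factors; lambda: regularization parameter.
  // The defaults here are placeholders chosen for illustration only.
  def train(
      ratings: RDD[Rating],
      rank: Int = 10,
      iterations: Int = 10,
      lambda: Double = 0.01): MatrixFactorizationModel =
    ALS.train(ratings, rank, iterations, lambda)
}
```

The returned `MatrixFactorizationModel` holds one latent feature vector per user and per product, which the later steps rely on.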

Once we have trained the model, the next step is to pre-calculate the recommendations. First, however, we should create two lists, one with the products and one with the users:
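A simple way to obtain both lists, assuming they are derived from the same ratings RDD used for training, is to project each id and remove duplicates:

```scala
import org.apache.spark.mllib.recommendation.Rating
import org.apache.spark.rdd.RDD

object IdLists {
  // Distinct ids of the users that have rated at least one product.
  def userIds(ratings: RDD[Rating]): RDD[Int] =
    ratings.map(_.user).distinct()

  // Distinct ids of the products that have received at least one rating.
  def productIds(ratings: RDD[Rating]): RDD[Int] =
    ratings.map(_.product).distinct()
}
```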

Then we need to calculate the user recommendations using the Spark API and save the data to MongoDB:
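The sketch below shows one possible shape for this step. It uses `recommendProductsForUsers` from the MLlib model and, as an assumption, writes the result with the MongoDB Spark connector 2.x. The case classes, field names and target URI are hypothetical.

```scala
import org.apache.spark.mllib.recommendation.MatrixFactorizationModel
import org.apache.spark.sql.{DataFrame, SparkSession}

object UserRecommendations {
  // Hypothetical document layout: one document per user with its top products.
  final case class Rec(productId: Int, score: Double)
  final case class UserRecs(userId: Int, recs: Seq[Rec])

  // Computes the n best products for every user known to the model.
  def topN(spark: SparkSession, model: MatrixFactorizationModel, n: Int): DataFrame = {
    import spark.implicits._
    model.recommendProductsForUsers(n)
      .map { case (user, ratings) =>
        UserRecs(user, ratings.map(r => Rec(r.product, r.rating)).toSeq)
      }
      .toDF()
  }

  // Writes the recommendations to the collection named in the URI
  // (mongodb://host/database.collection), replacing previous content.
  def save(recs: DataFrame, uri: String): Unit =
    recs.write
      .format("com.mongodb.spark.sql.DefaultSource")
      .mode("overwrite")
      .option("uri", uri)
      .save()
}
```

Storing one document per user keeps the later lookup in the REST service down to a single query by user id.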

Finally, we have to pre-calculate the product recommendations. Spark does not provide a direct way of calculating product-to-product recommendations, so we will measure how similar the products are using cosine similarity:
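Cosine similarity between two vectors a and b is their dot product divided by the product of their lengths: cos(a, b) = (a · b) / (‖a‖ ‖b‖). The sketch below applies it to product feature vectors, which in MLlib are available as `model.productFeatures`. For simplicity it assumes the vectors have been collected into an in-memory map, which only works for a small catalogue. The object and function names are illustrative.

```scala
object CosineSimilarity {
  // Cosine of the angle between two vectors; 0.0 if either vector is all zeros.
  def cosine(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "vectors must have the same dimension")
    val dot = a.zip(b).map { case (x, y) => x * y }.sum
    val normA = math.sqrt(a.map(x => x * x).sum)
    val normB = math.sqrt(b.map(x => x * x).sum)
    if (normA == 0.0 || normB == 0.0) 0.0 else dot / (normA * normB)
  }

  // For each product, the n most similar other products, best first.
  def mostSimilar(
      features: Map[Int, Array[Double]],
      n: Int): Map[Int, Seq[(Int, Double)]] =
    features.map { case (id, vec) =>
      val scored = features.toSeq.collect {
        case (otherId, otherVec) if otherId != id =>
          (otherId, cosine(vec, otherVec))
      }
      id -> scored.sortBy { case (_, score) => -score }.take(n)
    }
}
```

Comparing every pair of products is quadratic in the number of products, so a large catalogue would need a distributed or approximate approach instead.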

Recommender Service

After saving all the products/reviews and the pre-calculated Collaborative Filtering recommendations in the DBs, it is time to create a simple REST service that will retrieve the final recommendations. For that purpose we have selected a framework that is simple, elegant and written in pure Scala.

Creating URL mappings for our recommender service is very simple:

Now it is time to code our recommendation services. We should start with the Collaborative Filtering recommendations. This case is simple: because they have been pre-calculated, we can just read them from MongoDB:

Then we should code the content-based recommendations. Although we have not pre-calculated these types of recommendations, it is quite simple to obtain them using ElasticSearch. To do this, we need to ask the server which products best match certain criteria:
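As an illustration, and assuming the criterion is "products whose text resembles a given product", the request body could be an ElasticSearch `more_like_this` query. The helper below only builds the JSON body. The index name, field names and tuning values are assumptions, and sending the request is left to whichever client the service uses.

```scala
object ContentBasedQuery {
  // Builds the body of a search request that asks for documents similar to an
  // already indexed product. Inputs are inserted as-is (no JSON escaping), so
  // this sketch is only safe for trusted, simple identifiers.
  def moreLikeThis(
      index: String,
      productId: String,
      fields: Seq[String],
      size: Int): String = {
    val fieldList = fields.map(f => "\"" + f + "\"").mkString(", ")
    s"""{
       |  "size": $size,
       |  "query": {
       |    "more_like_this": {
       |      "fields": [$fieldList],
       |      "like": [{ "_index": "$index", "_id": "$productId" }],
       |      "min_term_freq": 1,
       |      "max_query_terms": 25
       |    }
       |  }
       |}""".stripMargin
  }
}
```

Older ElasticSearch versions also expect a `_type` entry inside each `like` document, so the body may need adjusting to the version in use.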

For hybrid recommendations, the theory is simple: use different types of recommendations and combine their output using weights.
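A minimal sketch of this idea follows. It assumes each recommender returns positive scores, rescales them so that the best item of each source scores 1.0, and then adds the weighted scores per product. The type names and the weights are illustrative, not taken from the original project.

```scala
object HybridRecommender {
  final case class Recommendation(productId: String, score: Double)

  // Divides scores by the highest one so that sources with different score
  // ranges become comparable. Assumes positive scores; otherwise unchanged.
  def scaleByMax(recs: Seq[Recommendation]): Seq[Recommendation] = {
    val max = if (recs.isEmpty) 0.0 else recs.map(_.score).max
    if (max <= 0.0) recs else recs.map(r => r.copy(score = r.score / max))
  }

  // Weighted sum of collaborative filtering (cf) and content-based (cb) scores.
  def combine(
      cf: Seq[Recommendation],
      cb: Seq[Recommendation],
      cfWeight: Double,
      cbWeight: Double,
      n: Int): Seq[Recommendation] = {
    val weighted =
      scaleByMax(cf).map(r => r.copy(score = r.score * cfWeight)) ++
        scaleByMax(cb).map(r => r.copy(score = r.score * cbWeight))
    weighted
      .groupBy(_.productId)
      .map { case (id, rs) => Recommendation(id, rs.map(_.score).sum) }
      .toSeq
      .sortBy(r => -r.score)
      .take(n)
  }
}
```

A product suggested by both recommenders accumulates both weighted scores, so it tends to rank above products suggested by only one of them.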


In the second part of the series, we have learnt how to:

1. Create Collaborative Filtering recommendations using Spark.
2. Obtain content-based recommendations using ElasticSearch.
3. Combine several types of recommendations to create a hybrid recommender.

If you are interested in finding out more, the code is freely available in my GitHub repository.

Senior software engineer with more than 10 years’ experience. For the past 3 years I have focused entirely on Big Data projects, in which I have developed several personalization services used by millions of users, giving them a better experience, and have worked on company data transformation.