What is a recommendation system?
Recommender systems are machine learning models that predict a user's interest in an item based on the user's past preferences. In practice, these predictions are used to recommend similar items to the user, or to recommend an item to other users with similar tastes. Popular recommendation techniques include:
- Collaborative filtering
- Content-based filtering
- Hybrid recommendation systems
We designed a content-based recommender system that identifies movies a user is likely to enjoy, based on the movies they have previously watched or selected.
Content-based Recommendation
Content-based recommendation uses the description or properties of items, together with the user's preferences, to make a recommendation. An item is recommended based on its similarity to items the user has liked in the past or is currently looking at. Similarity is calculated from the item's description, for example its category, tags, or genre.
For example, suppose our dataset contains four movies: Captain America: The First Avenger (Action/Adventure), Pride and Prejudice (Romance/Drama), Captain America: The Winter Soldier (Action/Adventure), and Sense and Sensibility (Romance/Drama). If the user likes or searches for the first two movies, the engine will recommend the third and fourth.
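The four-movie example can be reproduced with a small sketch. It assumes each movie is described only by its genre tags and uses Jaccard similarity (shared tags divided by all tags) as the similarity measure; both choices are illustrative assumptions, since the app itself compares plot descriptions rather than genres. The helper names `jaccard` and `recommend` are hypothetical.

```python
# Toy content-based recommender: each movie is described only by its genre tags.
MOVIES = {
    "Captain America: The First Avenger": {"Action", "Adventure"},
    "Pride and Prejudice": {"Romance", "Drama"},
    "Captain America: The Winter Soldier": {"Action", "Adventure"},
    "Sense and Sensibility": {"Romance", "Drama"},
}


def jaccard(a: set, b: set) -> float:
    """Fraction of tags two items share (1.0 = identical tag sets)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def recommend(liked: list, catalog: dict) -> list:
    """For each liked movie, pick the most similar movie not liked or already recommended."""
    recommendations = []
    for title in liked:
        candidates = [m for m in catalog if m not in liked and m not in recommendations]
        best = max(candidates, key=lambda m: jaccard(catalog[title], catalog[m]))
        recommendations.append(best)
    return recommendations


print(recommend(["Captain America: The First Avenger", "Pride and Prejudice"], MOVIES))
```

Because the two Captain America films share identical tags, as do the two Austen adaptations, each liked movie is matched with its genre twin.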
Our Approach
We built a content-based recommendation system that recommends movies to a user based on the movies the user has selected. Our dataset contains around 19,000 Bollywood and Hollywood movies, along with their genres and plot details, as well as supporting attributes such as cast and directors.
Recommendations are made based on each movie's plot description.
How do we predict similarity in movies?
We used topic modeling to identify movies that are close to each other. Topic modeling is an unsupervised technique that automatically identifies the topics present in a collection of texts and uncovers the hidden patterns in that corpus.
Treating each movie plot as a document, we learned a set of topics over the whole collection of plots and used them to determine which movies were close to each other. We used Latent Dirichlet Allocation (LDA) to generate these topics.
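Before topic modeling, each plot has to be turned into a document over a shared vocabulary. The sketch below assumes a simple pipeline (lower-casing, keeping alphabetic tokens, dropping a tiny stop-word list) and two made-up plot snippets; the write-up does not describe the actual preprocessing, and `build_corpus` is a hypothetical helper.

```python
import re

# Hypothetical plot snippets standing in for the ~19,000 real plot descriptions.
PLOTS = {
    "Movie A": "A soldier becomes a super soldier and fights a secret army.",
    "Movie B": "Two sisters search for love and marriage in a small English village.",
}

# Tiny illustrative stop-word list; a real pipeline would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "in", "for", "of", "to"}


def tokenize(text):
    """Lower-case the text, keep alphabetic tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def build_corpus(plots):
    """Turn each plot into a list of word ids over a vocabulary shared by all plots."""
    tokenized = {title: tokenize(plot) for title, plot in plots.items()}
    vocab = sorted({w for tokens in tokenized.values() for w in tokens})
    word_id = {w: i for i, w in enumerate(vocab)}
    docs = {title: [word_id[w] for w in tokens] for title, tokens in tokenized.items()}
    return vocab, docs


vocab, docs = build_corpus(PLOTS)
print(len(vocab), docs["Movie A"])
```

Each document is now a bag of word ids, which is the input format a topic model such as LDA works with.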
LDA for Topic Modelling
Latent Dirichlet Allocation (LDA) is an unsupervised learning algorithm that discovers the topics in a collection of documents, together with the words that characterize each topic. LDA builds on the idea that words have strong semantic associations with particular topics, so each topic can be described by a group of related words. LDA assumes that each document is produced from a mixture of topics, and that each topic generates words according to its own probability distribution. Given a set of documents, LDA works backwards to infer which topics would most plausibly have produced them. The figure below illustrates this idea: a document is a mixture of topics (politics, economics, and people's opinions), and each of those topics is represented by a set of words.
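The generative story LDA assumes can be simulated directly. This sketch uses a hypothetical model with three made-up topics over a six-word vocabulary and an assumed Dirichlet parameter `alpha`; it draws a topic mixture for one document, then draws each word by first picking a topic and then a word from that topic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy model: 3 topics over a 6-word vocabulary.
VOCAB = ["war", "battle", "love", "marriage", "money", "market"]
# PHI[k] is topic k's distribution over words (each row sums to 1).
PHI = np.array([
    [0.45, 0.45, 0.02, 0.02, 0.03, 0.03],  # a "war" topic
    [0.02, 0.02, 0.45, 0.45, 0.03, 0.03],  # a "romance" topic
    [0.03, 0.03, 0.02, 0.02, 0.45, 0.45],  # an "economics" topic
])
ALPHA = np.full(3, 0.5)  # Dirichlet prior on each document's topic mixture


def generate_document(n_words):
    """Sample one document from the LDA generative process."""
    theta = rng.dirichlet(ALPHA)              # 1. draw this document's topic mixture
    words = []
    for _ in range(n_words):
        z = rng.choice(len(theta), p=theta)   # 2. pick a topic for this word
        w = rng.choice(len(VOCAB), p=PHI[z])  # 3. pick a word from that topic
        words.append(VOCAB[w])
    return theta, words


theta, words = generate_document(12)
print(theta.round(2), words)
```

Fitting LDA means running this process in reverse: observing only the words and estimating the mixtures and topic-word distributions.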
To summarize, LDA assumes that a document is a mixture of topics drawn from the document-topic distribution, and that each topic is made up of words drawn from the topic-word distribution. In practice we already have a corpus of documents, so we are not interested in generating new ones; instead, we infer how the existing documents were generated from the underlying topics and words.
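LDA can be fitted in several ways, and the write-up does not say which inference method or library was used. As one illustration, here is a minimal sketch of collapsed Gibbs sampling, a standard LDA inference technique: it repeatedly re-samples the topic of each word given all other assignments, then reads off the document-topic matrix `theta` and the topic-word matrix `phi`. The hyperparameters `alpha` and `beta`, the iteration count and the toy corpus are assumptions chosen for illustration.

```python
import numpy as np


def lda_gibbs(docs, n_topics, n_words, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling.

    docs: list of documents, each a list of word ids in [0, n_words).
    Returns (theta, phi): document-topic and topic-word distributions.
    """
    rng = np.random.default_rng(seed)
    n_dk = np.zeros((len(docs), n_topics))  # tokens in doc d assigned to topic k
    n_kw = np.zeros((n_topics, n_words))    # times word w is assigned to topic k
    n_k = np.zeros(n_topics)                # total tokens assigned to topic k

    # Start from a random topic assignment for every token.
    z = [rng.integers(n_topics, size=len(doc)) for doc in docs]
    for d, doc in enumerate(docs):
        for w, k in zip(doc, z[d]):
            n_dk[d, k] += 1
            n_kw[k, w] += 1
            n_k[k] += 1

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove this token's current assignment from the counts.
                n_dk[d, k] -= 1
                n_kw[k, w] -= 1
                n_k[k] -= 1
                # Full conditional p(z = k | all other assignments), up to a constant.
                p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + n_words * beta)
                k = rng.choice(n_topics, p=p / p.sum())
                # Record the new assignment.
                z[d][i] = k
                n_dk[d, k] += 1
                n_kw[k, w] += 1
                n_k[k] += 1

    theta = (n_dk + alpha) / (n_dk + alpha).sum(axis=1, keepdims=True)
    phi = (n_kw + beta) / (n_kw + beta).sum(axis=1, keepdims=True)
    return theta, phi


# Toy corpus: word ids 0-1 behave like one theme, 2-3 like another.
docs = [[0, 1, 0, 1, 0], [1, 0, 1, 1], [2, 3, 2, 3], [3, 2, 3, 2, 2]]
theta, phi = lda_gibbs(docs, n_topics=2, n_words=4)
print(theta.round(2))
```

Each row of `theta` is one document's topic mixture, which is the per-movie representation that movie-to-movie comparisons are made on.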
Finding similar movies
Once LDA has produced the document-topic matrix, we use the Hellinger distance, a metric that measures the distance between two probability distributions, to find movies similar to the ones the user selected. From these movies with similar plots, we recommend the top 2 for each selected movie.
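The Hellinger step can be sketched as follows. For two discrete distributions P and Q it computes H(P, Q) = sqrt(0.5 * sum_i (sqrt(p_i) - sqrt(q_i))^2), which is 0 for identical distributions and 1 for distributions with no overlap. The five-movie `theta` matrix and titles are hypothetical, and excluding the user's other selected movies from the results is an assumption rather than something the write-up specifies.

```python
import numpy as np


def hellinger(p, q):
    """Hellinger distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))


def recommend(theta, titles, selected, top_n=2):
    """For each selected title, return the top_n closest other movies by Hellinger distance.

    theta: (n_movies, n_topics) document-topic matrix; each row sums to 1.
    Assumption: the user's selected movies are never recommended back to them.
    """
    index = {t: i for i, t in enumerate(titles)}
    recs = {}
    for title in selected:
        i = index[title]
        dists = np.array([hellinger(theta[i], row) for row in theta])
        # Closest first, skipping the movie itself and the other selections.
        order = [j for j in np.argsort(dists) if titles[j] not in selected]
        recs[title] = [titles[j] for j in order[:top_n]]
    return recs


# Hypothetical topic mixtures for five movies over three topics.
titles = ["Movie A", "Movie B", "Movie C", "Movie D", "Movie E"]
theta = np.array([
    [0.80, 0.15, 0.05],
    [0.75, 0.20, 0.05],
    [0.10, 0.80, 0.10],
    [0.70, 0.10, 0.20],
    [0.05, 0.15, 0.80],
])
print(recommend(theta, titles, ["Movie A"]))
```

Hellinger distance suits this step because rows of the document-topic matrix are probability distributions, and unlike KL divergence it is symmetric and bounded.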
Using the movie recommender
Go to Recommendation and enter the movies you want recommendations for. You can get recommendations for one, two, or even three movies!
Click the "Click to Submit" button to get your list of recommended movies. Check out their cast, plot, and other information!
Check out the creators of this app before you go. :)
And finally, let the binge-watching begin! ;)