How We Used NLP and Django to Build a Movie Suggestion Website & Twitterbot

Vince Salvino

Monday 3:50 p.m.–4:40 p.m.
Audience level: Intermediate or Advanced

Description

Our team built a movie recommendation engine for the Cleveland International Film Festival. This talk walks through how we built a Django project that scrapes film data from a website, builds a film similarity index using Django models and Natural Language Processing techniques, and automatically shows suggested films via a website and Twitterbot (ciff.coderedcorp.com and @CIFFbot).

Abstract

The Cleveland International Film Festival (CIFF) is a two-week-long event featuring hundreds of foreign, independent, and new films making their debut on the silver screen. For anyone less than a film buff, choosing a movie to watch at the film fest is hard: there are no reviews, no IMDb info, and no Netflix/Hulu suggestions. Yes, it’s truly byzantine in that one must actually read all the movie descriptions to decide which one to watch.

With a handful of Python libraries and two days, we developers at CodeRed built a movie recommendation engine for the CIFF. This talk outlines each step we took to build the recommendation engine, website, and Twitterbot, all centered around a Django project. Overall, this talk offers a complete look at the various parts and pieces that go into building a full-featured Django site, as well as exposure to doing entry-level artificial intelligence in Python.

Parsing Movie Data from an Existing Website

Using urllib and BeautifulSoup, we created a Django management command to crawl an existing website, scrape the relevant data, and seed our database with Django models.
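The parsing half of that step can be sketched as below. This is only an illustration: the markup (an h2 with class "film-title" followed by a p with class "film-description"), the class name FilmListingParser, and the sample HTML are all assumptions, and the standard library's html.parser stands in for BeautifulSoup so the sketch runs with no third-party packages. Fetching pages with urllib and saving the results through Django models inside a management command are left out.

```python
from html.parser import HTMLParser


class FilmListingParser(HTMLParser):
    """Collect title/description pairs from hypothetical film markup."""

    def __init__(self):
        super().__init__()
        self.films = []       # finished {"title": ..., "description": ...} dicts
        self._field = None    # name of the field currently being read, if any
        self._current = {}    # the film record being assembled

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class", "")
        # Assumed markup: <h2 class="film-title"> and <p class="film-description">.
        if tag == "h2" and css_class == "film-title":
            self._field = "title"
        elif tag == "p" and css_class == "film-description":
            self._field = "description"

    def handle_data(self, data):
        # Only keep text that appears inside one of the two tracked elements.
        if self._field:
            self._current[self._field] = self._current.get(self._field, "") + data

    def handle_endtag(self, tag):
        if self._field == "title" and tag == "h2":
            self._field = None
        elif self._field == "description" and tag == "p":
            self._field = None
            # In this assumed layout, the description closes out one film.
            self.films.append({k: v.strip() for k, v in self._current.items()})
            self._current = {}


def parse_films(html):
    """Return a list of film dicts found in an HTML string."""
    parser = FilmListingParser()
    parser.feed(html)
    parser.close()
    return parser.films


# Made-up sample page, standing in for the scraped website.
SAMPLE_HTML = """
<div class="film"><h2 class="film-title">First Film</h2>
<p class="film-description">A quiet drama about a lighthouse keeper.</p></div>
<div class="film"><h2 class="film-title">Second Film</h2>
<p class="film-description">A comedy set in a lighthouse.</p></div>
"""

films = parse_films(SAMPLE_HTML)
for film in films:
    print(film["title"], "|", film["description"])
```

In a Django project, each dict returned by parse_films would presumably become one model instance, created from the management command's handle method.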

Using Natural Language Processing to Build a Similarity Index

Using functionality in the Natural Language Toolkit (pip package: nltk), we created another Django management command to analyze all the movies stored in our Django models. Each movie had to be compared to every other movie in order to create a complete index.
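To make the all-pairs comparison concrete, here is a minimal sketch. The abstract does not say which similarity measure was used, so this one assumes bag-of-words cosine similarity, with a regex tokenizer and a tiny hand-written stopword list standing in for NLTK's tokenizers and corpora. The function names and sample descriptions are invented for illustration. Because similarity is symmetric, n films need n(n-1)/2 comparisons rather than n squared.

```python
import math
import re
from collections import Counter
from itertools import combinations

# Tiny illustrative stopword list; NLTK ships a much fuller one.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "on", "to", "is", "about", "with"}


def tokenize(text):
    """Lowercase the text and keep alphabetic words that are not stopwords."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (Counter objects)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_similarity_index(descriptions):
    """Map each title to a list of (other_title, score), best match first."""
    vectors = {title: Counter(tokenize(text)) for title, text in descriptions.items()}
    index = {title: [] for title in descriptions}
    # combinations() yields each unordered pair once, so every score is
    # computed a single time and recorded for both films.
    for left, right in combinations(vectors, 2):
        score = cosine_similarity(vectors[left], vectors[right])
        index[left].append((right, score))
        index[right].append((left, score))
    for neighbours in index.values():
        neighbours.sort(key=lambda pair: pair[1], reverse=True)
    return index


# Made-up descriptions, standing in for the scraped festival data.
descriptions = {
    "Lighthouse Drama": "A quiet drama about a lighthouse keeper and the sea.",
    "Lighthouse Comedy": "A comedy set in a lighthouse by the sea.",
    "City Heist": "Thieves plan a bank robbery in the city.",
}

index = build_similarity_index(descriptions)
print(index["Lighthouse Drama"][0][0])
```

In a Django setting the scores would more likely be stored in a model relating two films than held in a dict, so the website and bot can read them without recomputing anything.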

Building the Website

Thanks to the simplicity of Django, our website required very little back-end code to implement. The talk will show how we implemented dead simple searching using Haystack and Whoosh, as well as how we kept the load times speedy using tricks in the Django queryset API.

The Twitterbot

Thanks to twython, integrating a Django site with Twitter is extremely simple. We also created a Django management command, called via a Linux cron job, to post movie recommendations to Twitter whenever a movie showtime was coming up or had recently ended.
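The selection logic behind such a cron-driven command might look roughly like the sketch below. Everything specific here is an assumption: the 30-minute window, the dict layout for showtimes, the function names, and the tweet wording. The Django management command wrapper and the actual posting through twython are omitted, so the sketch only decides what would be announced and composes the text.

```python
from datetime import datetime, timedelta


def showtimes_to_announce(showtimes, now, window=timedelta(minutes=30)):
    """Return (kind, show) pairs for shows starting soon or just ended.

    `showtimes` is a list of dicts with "title", "start" and "end" keys
    (a hypothetical layout; a real project would query Django models).
    """
    due = []
    for show in showtimes:
        if now <= show["start"] <= now + window:
            due.append(("starting_soon", show))
        elif now - window <= show["end"] <= now:
            due.append(("just_ended", show))
    return due


def compose_tweet(kind, show, suggestion):
    """Build the status text, trimmed to Twitter's 280-character limit."""
    if kind == "starting_soon":
        text = (
            f"{show['title']} starts at {show['start']:%H:%M}. "
            f"You might also like {suggestion}."
        )
    else:
        text = f"Just saw {show['title']}? You might also like {suggestion}."
    return text[:280]


# Made-up schedule; `now` is fixed so the example is repeatable.
now = datetime(2017, 4, 1, 18, 0)
schedule = [
    {"title": "Film A", "start": datetime(2017, 4, 1, 18, 15), "end": datetime(2017, 4, 1, 20, 0)},
    {"title": "Film B", "start": datetime(2017, 4, 1, 16, 0), "end": datetime(2017, 4, 1, 17, 45)},
    {"title": "Film C", "start": datetime(2017, 4, 1, 21, 0), "end": datetime(2017, 4, 1, 23, 0)},
]

due = showtimes_to_announce(schedule, now)
tweets = [compose_tweet(kind, show, "Film C") for kind, show in due]
for tweet in tweets:
    print(tweet)
```

With a window like this, the cron interval should match it (or the command should record what it has already posted) so that the same showtime is not announced twice.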

An additional high-level overview of this project can be found in this blog post: https://www.coderedcorp.com/blog/how-we-built-an-ai-bot-for-the-cleveland-internati/

