Correlating Movie Trends and Short-Form Content

Statistical Modelling
Project Overview
Written in Jupyter Notebook, this project covers data preprocessing, exploratory data analysis, statistical inference, and association rule mining on The Movie Database and the YouTube Trending Videos dataset.

Motivated by the increasingly common claim that “our collective attention span has been decreasing because of short-form content”, we set out to examine where the movie industry and the short-form video scene meet. We wanted to know whether patterns were emerging and whether we could find correlations between variables of both movies and short-form videos.
My Contributions
With my team of 3, I performed EDA on datasets from The Movie Database and YouTube Trending Videos to identify viewing trends

I applied a chi-square test of independence and association rule mining to uncover relationships between release timing and genre popularity

I visualized results using heatmaps, bar graphs, and parallel coordinate graphs in matplotlib for stakeholder-ready presentations
This project opened my eyes to the realities of data science, particularly how hard it is to get the data we want from open sources. It also exposed me to real-world patterns and trend analytics, in this case for movies and YouTube videos.

It also equipped me with the skills needed to conduct cohesive research and data analysis, from preprocessing and visualization to statistical inference and rule mining.

Overall, the results we got from this project were, to say the least, interesting for a casual movie-goer, and they also give movie production companies a guide to which genres to release in which month to achieve higher user ratings.
My Roles
Exploratory Data Analysis
Preprocessed, formatted, and visualized The Movie Database and YouTube Trending Videos datasets
Statistical Inferencing
Applied a chi-square test of independence to genre and release month, finding a statistically significant dependence between a movie's genre and the month it was released
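
To illustrate how a chi-square test of independence works on genre and month, here is a minimal sketch in plain Python. It is not the notebook's code: the `movies` list is made-up toy data, and `chi_square_independence` is a hypothetical helper written for this example. It builds the genre-by-month contingency table, computes the counts expected if genre and month were independent, and sums the squared deviations.

```python
from collections import Counter

# Hypothetical (genre, release_month) pairs; NOT taken from the project's data.
movies = [
    ("Horror", "Oct"), ("Horror", "Oct"), ("Horror", "Oct"), ("Horror", "Dec"),
    ("Family", "Dec"), ("Family", "Dec"), ("Family", "Dec"), ("Family", "Oct"),
    ("Action", "Jul"), ("Action", "Jul"), ("Action", "Jul"), ("Action", "Dec"),
]


def chi_square_independence(pairs):
    """Return (statistic, degrees_of_freedom) for a two-way table of counts."""
    observed = Counter(pairs)                      # cell counts of the table
    row_totals = Counter(row for row, _ in pairs)  # movies per genre
    col_totals = Counter(col for _, col in pairs)  # movies per month
    n = len(pairs)

    statistic = 0.0
    for row, row_total in row_totals.items():
        for col, col_total in col_totals.items():
            # Count expected in this cell if genre and month were independent.
            expected = row_total * col_total / n
            statistic += (observed[(row, col)] - expected) ** 2 / expected

    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


statistic, dof = chi_square_independence(movies)
print(f"chi2 = {statistic:.2f}, dof = {dof}")
```

The statistic is then compared against a chi-square distribution with `dof` degrees of freedom. A library routine such as `scipy.stats.chi2_contingency` would also return the p-value directly. With real data, each expected cell count should be reasonably large for the test to be reliable, which this toy table does not satisfy.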
Association Rule Mining
Combined movie genre ratings with release month to derive rules on which movie genres are more likely to score high in which month
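
As a rough illustration of the idea behind these rules, the sketch below scores rules of the form {genre, month} -> {high rating} using support, confidence, and lift. It rests on assumptions: the `transactions` are invented, the `rating=high` / `rating=low` buckets and the thresholds are hypothetical, and the helper names are made up for this example rather than taken from the project.

```python
# Hypothetical transactions: each movie is a set of labelled items.
# The data and the high/low rating buckets are invented for illustration.
transactions = [
    {"genre=Horror", "month=Oct", "rating=high"},
    {"genre=Horror", "month=Oct", "rating=high"},
    {"genre=Horror", "month=Dec", "rating=low"},
    {"genre=Family", "month=Dec", "rating=high"},
    {"genre=Family", "month=Dec", "rating=high"},
    {"genre=Family", "month=Oct", "rating=low"},
    {"genre=Action", "month=Jul", "rating=high"},
    {"genre=Action", "month=Dec", "rating=low"},
]


def support(itemset, rows):
    """Fraction of rows that contain every item in `itemset`."""
    return sum(itemset <= row for row in rows) / len(rows)


def rule_metrics(antecedent, consequent, rows):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    support_both = support(antecedent | consequent, rows)
    confidence = support_both / support(antecedent, rows)
    lift = confidence / support(consequent, rows)
    return support_both, confidence, lift


def mine_high_rating_rules(rows, min_support=0.2, min_confidence=0.8):
    """Keep {genre, month} -> {rating=high} rules that pass both thresholds."""
    consequent = frozenset({"rating=high"})
    # Candidate antecedents: every genre/month combination seen in the data,
    # so the antecedent support is never zero.
    antecedents = {
        frozenset(item for item in row if not item.startswith("rating="))
        for row in rows
    }
    rules = []
    for antecedent in antecedents:
        sup, conf, lift = rule_metrics(antecedent, consequent, rows)
        if sup >= min_support and conf >= min_confidence:
            rules.append((sorted(antecedent), sup, conf, lift))
    return sorted(rules)


for items, sup, conf, lift in mine_high_rating_rules(transactions):
    print(f"{items} -> rating=high  support={sup:.2f} conf={conf:.2f} lift={lift:.2f}")
```

A lift above 1 means the genre and month combination co-occurs with high ratings more often than it would by chance. On a full dataset, a frequent-itemset algorithm such as Apriori would normally generate the candidate itemsets instead of this direct enumeration.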