Do Movie Ratings Reflect Audience Sentiment?

Scraped IMDb movies and applied a BERT sentiment model to 3.7k reviews. Aggregated sentiment scores were compared with official ratings using hypothesis testing and ANOVA to identify systematic mismatches.

Timeline: Sept 2025
Contribution: Self-Project
Tags: Selenium, LLMs, Analysis, Scraping

Objective

Audience reviews can be nuanced: some movies are perceived as overhyped, while others may be underrated. This project investigates whether IMDb ratings align with the underlying sentiment expressed in user reviews, using a BERT-based sentiment model and statistical analysis.

Approach & Methodology

1. Data Source and Preprocessing

Movie metadata and user reviews were scraped from IMDb using a custom Python pipeline built with Selenium, BeautifulSoup, and Requests.
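A minimal sketch of the review-parsing step of such a pipeline is shown below. The CSS selectors and HTML structure are illustrative assumptions, not IMDb's actual markup (which changes over time); in the real pipeline, Selenium would drive a browser to render pages whose reviews load dynamically before handing the HTML to BeautifulSoup.

```python
from bs4 import BeautifulSoup

def parse_reviews(html: str) -> list[dict]:
    """Extract review title and body text from a page of review markup.

    The class names below are hypothetical placeholders for whatever
    selectors the scraper actually targets on IMDb review pages.
    """
    soup = BeautifulSoup(html, "html.parser")
    reviews = []
    for card in soup.select("div.review-container"):
        title = card.select_one("a.title")
        body = card.select_one("div.text")
        reviews.append({
            "title": title.get_text(strip=True) if title else "",
            "text": body.get_text(strip=True) if body else "",
        })
    return reviews

# Demo on a static snippet; a live run would first fetch the page
# (e.g. via Selenium's driver.page_source or requests.get(url).text):
sample = """
<div class="review-container">
  <a class="title">Overhyped but fun</a>
  <div class="text">Great visuals, thin plot.</div>
</div>
"""
print(parse_reviews(sample))
```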

Preprocessing: text cleaning included removing escape sequences and formatting artifacts so the reviews fed cleanly into the sentiment model.
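A cleaning step along these lines could look as follows; the specific substitutions are assumptions about what "escape sequences and formatting artifacts" covers, not the project's exact rules.

```python
import re

def clean_review(text: str) -> str:
    """Strip scraping leftovers so the text feeds cleanly into the model."""
    text = text.replace("\\n", " ").replace("\\t", " ")  # literal escape sequences
    text = re.sub(r"<[^>]+>", " ", text)                 # stray HTML tags
    text = re.sub(r"\s+", " ", text)                     # collapse whitespace runs
    return text.strip()

print(clean_review("Great movie!\\n<br/>Loved\tthe   score."))
```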

2. BERT Sentiment Scoring

The sentiment model used is nlptown/bert-base-multilingual-uncased-sentiment. Running it through the Hugging Face Inference API was too slow, so inference was run on Kaggle with a GPU instead. For each review, the model returns a probability distribution over the five star ratings, i.e. a dictionary mapping each rating to its probability.
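The scoring step might be sketched as below. The model predicts five "star" classes (labels like "4 stars"), and the transformers pipeline with `top_k=None` returns every class with its probability. Since loading the model requires a large download, only the post-processing into a rating-to-probability dictionary is exercised here; the probabilities in the example are illustrative, not real model output.

```python
# Loading the real model (commented out; requires a download and, practically, a GPU):
# from transformers import pipeline
# classifier = pipeline("sentiment-analysis",
#                       model="nlptown/bert-base-multilingual-uncased-sentiment",
#                       top_k=None)
# raw = classifier(review_text, truncation=True)[0]

def to_star_probs(raw: list[dict]) -> dict[int, float]:
    """Map pipeline output like {'label': '4 stars', 'score': 0.41} to {4: 0.41}."""
    return {int(item["label"].split()[0]): item["score"] for item in raw}

# Illustrative output shape for one review:
raw = [{"label": "1 star", "score": 0.03}, {"label": "2 stars", "score": 0.07},
       {"label": "3 stars", "score": 0.15}, {"label": "4 stars", "score": 0.41},
       {"label": "5 stars", "score": 0.34}]
print(to_star_probs(raw))
```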

3. Aggregating Sentiments

For each review, a probability-weighted mean of the star ratings was computed and multiplied by 2 to convert the 1-5 scale to a rating out of 10. Reviews were then grouped by movie, the BERT sentiment averaged per movie, and the result joined with movies.csv to produce the final dataset used for analysis. A deviation column was added, defined as round(actual rating - bert rating, 2).
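The aggregation described above can be sketched in pandas as follows. The tiny frames and column names (`movie_id`, `bert_rating`, `imdb_rating`) are stand-ins for the project's actual schema.

```python
import pandas as pd

def bert_rating(star_probs: dict[int, float]) -> float:
    """Probability-weighted star rating, scaled from the 1-5 scale to /10."""
    return 2 * sum(star * prob for star, prob in star_probs.items())

# Illustrative per-review scores in place of the real scraped data:
reviews = pd.DataFrame({
    "movie_id": ["tt001", "tt001", "tt002"],
    "bert_rating": [
        bert_rating({1: .0, 2: .1, 3: .2, 4: .4, 5: .3}),
        bert_rating({1: .0, 2: .0, 3: .1, 4: .3, 5: .6}),
        bert_rating({1: .3, 2: .4, 3: .2, 4: .1, 5: .0}),
    ],
})
movies = pd.DataFrame({"movie_id": ["tt001", "tt002"], "imdb_rating": [8.1, 5.9]})

# Per-movie mean sentiment, joined with the metadata (movies.csv in the project):
final = (reviews.groupby("movie_id", as_index=False)["bert_rating"].mean()
                .merge(movies, on="movie_id"))
final["deviation"] = (final["imdb_rating"] - final["bert_rating"]).round(2)
print(final)
```

A positive deviation means the official rating runs above the sentiment implied by the reviews (a candidate "overhyped" movie); a negative one suggests an underrated movie.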

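The hypothesis testing and ANOVA mentioned in the summary could be run on the deviation column roughly as follows. The deviation values and the three-way grouping (e.g. by genre) below are synthetic assumptions for illustration, not the project's real data or groups.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for the real (imdb_rating - bert_rating) column:
rng = np.random.default_rng(0)
deviations = rng.normal(loc=0.4, scale=0.8, size=200)

# One-sample t-test: is the mean deviation significantly different from zero,
# i.e. do official ratings systematically run above or below review sentiment?
t_stat, p_value = stats.ttest_1samp(deviations, popmean=0.0)

# One-way ANOVA: does mean deviation differ across groups of movies
# (e.g. genres)? Three synthetic groups here.
g1, g2, g3 = deviations[:70], deviations[70:140], deviations[140:]
f_stat, p_anova = stats.f_oneway(g1, g2, g3)

print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
print(f"F = {f_stat:.2f}, p = {p_anova:.4f}")
```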

Results & Insights

(Project screenshots)

Notebook and Code Source