This project analyzes the IMDb Top 250 Movies dataset using Python in Google Colab. The analysis covers data exploration, data cleaning, and deriving insights from the dataset.
The dataset lists the top-rated movies on IMDb along with their ratings, release details, budgets, and earnings. This project aims to explore the dataset, clean the data, perform exploratory data analysis (EDA), and visualize the findings.
The dataset used in this project is the IMDb Top 250 Movies Excel file, which includes the following columns:
- Movie Names: The names of the top-rated movies on IMDb.
- Ratings: IMDb ratings for each movie.
- Count of Ratings: The total number of ratings submitted by users for each movie.
- Release Date: The date when each movie was officially released.
- Country: The country of origin for each movie.
- Budget: The estimated budget for producing each movie.
- Domestic Gross: Earnings within the country of origin.
- Domestic Weekend Gross: Opening weekend earnings within the country of origin.
- Worldwide Gross: Total global earnings.
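As a minimal sketch, the file could be loaded in Colab roughly as follows. The file name `imdb_top_250.xlsx`, the helper `load_movies`, and the short column names are assumptions for illustration; the keys of the mapping assume the spreadsheet headers match the list above exactly.

```python
import pandas as pd

# Hypothetical short names for the columns listed above. The keys assume the
# spreadsheet headers match that list exactly; adjust them to the real file.
COLUMN_NAMES = {
    "Movie Names": "title",
    "Ratings": "rating",
    "Count of Ratings": "rating_count",
    "Release Date": "release_date",
    "Country": "country",
    "Budget": "budget",
    "Domestic Gross": "domestic_gross",
    "Domestic Weekend Gross": "domestic_weekend_gross",
    "Worldwide Gross": "worldwide_gross",
}


def load_movies(path):
    """Read the Excel sheet and rename its columns to the short names."""
    df = pd.read_excel(path)  # reading .xlsx files requires the openpyxl package
    return df.rename(columns=COLUMN_NAMES)


# Example call (the file name is an assumption):
# movies = load_movies("imdb_top_250.xlsx")
# movies.info()
```

Short, lowercase column names are only a convenience for the later steps; the original headers work just as well if you prefer to keep them.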
Key steps in data cleaning included:
- Converting dates to datetime format.
- Standardizing currency values to USD.
- Handling missing values and ensuring correct data types.
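The cleaning steps above can be sketched as follows. This is an illustration under stated assumptions, not the project's actual notebook code: the column names are hypothetical, the sample rows are made up, and monetary values are assumed to already be in USD but stored as text such as `$1,000,000`. Converting amounts from other currencies would need exchange rates, which are not given here and are left out.

```python
import pandas as pd

# Hypothetical column names; adjust them to the headers in the actual sheet.
MONEY_COLUMNS = ["budget", "domestic_gross", "domestic_weekend_gross", "worldwide_gross"]


def parse_money(value):
    """Convert values such as '$1,000,000' to a float; return NaN if that fails."""
    if pd.isna(value):
        return float("nan")
    if isinstance(value, (int, float)):
        return float(value)
    # Drop currency symbols, thousands separators, and whitespace.
    digits = "".join(ch for ch in str(value) if ch.isdigit() or ch == ".")
    try:
        return float(digits)
    except ValueError:
        return float("nan")


def clean_movies(df):
    """Return a cleaned copy with datetime dates and numeric rating and money columns."""
    out = df.copy()
    out["release_date"] = pd.to_datetime(out["release_date"], errors="coerce")
    out["rating"] = pd.to_numeric(out["rating"], errors="coerce")
    for col in MONEY_COLUMNS:
        if col in out.columns:
            out[col] = out[col].map(parse_money)
    # Rows without a rating or a parseable release date are dropped here;
    # missing money values are kept as NaN so the row can still be used.
    return out.dropna(subset=["rating", "release_date"]).reset_index(drop=True)


# Made-up rows, for illustration only.
raw = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C"],
    "rating": ["9.0", "8.5", None],
    "release_date": ["2001-05-04", "1995-11-22", "2010-07-16"],
    "budget": ["$1,000,000", None, "2500000"],
    "worldwide_gross": ["$3,000,000", "$4,000,000", None],
})
cleaned = clean_movies(raw)
print(cleaned.dtypes)
```

Keeping missing budgets as NaN rather than filling them with zero is a design choice in this sketch: a zero budget would distort any later comparison of budget against earnings.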
EDA included:
- Visualization of rating distributions.
- Analysis of release year trends.
- Comparison of budgets and worldwide gross earnings.
- Identification of top-grossing movies.
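A rough sketch of how these EDA steps might be computed with pandas is shown below. The column names and the four sample rows are made up for illustration, and averaging ratings per release year is one assumed way to look at the release year trend.

```python
import pandas as pd

# Made-up rows standing in for the cleaned dataset (amounts in millions of USD).
movies = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C", "Movie D"],
    "rating": [8.0, 9.0, 8.5, 8.1],
    "release_date": pd.to_datetime(
        ["2000-01-01", "2000-06-01", "2010-01-01", "2010-03-01"]
    ),
    "budget": [10.0, 50.0, 20.0, 100.0],
    "worldwide_gross": [30.0, 40.0, 200.0, 90.0],
})

# Rating distribution: count, mean, spread, and quartiles.
rating_summary = movies["rating"].describe()

# Release year trend: mean rating per release year (an assumed aggregation).
rating_by_year = movies.groupby(movies["release_date"].dt.year)["rating"].mean()

# Top-grossing movies, with their budgets kept alongside for comparison.
top_grossing = movies.nlargest(2, "worldwide_gross")[
    ["title", "budget", "worldwide_gross"]
]

# Budget versus worldwide gross as a simple ratio per movie.
gross_to_budget = movies["worldwide_gross"] / movies["budget"]

print(rating_summary)
print(rating_by_year)
print(top_grossing)
```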
Visualizations included:
- Rating Distribution: Histogram of movie ratings.
- Release Year Trend: Line chart of ratings over years.
- Top 5 Movies: Bar chart of the highest worldwide grossing movies.
- Budget vs. Earnings: Scatter plot of budget against worldwide earnings.
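The four charts could be drawn with matplotlib along these lines. This is a sketch, not the project's plotting code: `plot_overview` is a hypothetical helper, the column names are assumptions, and the line chart uses the mean rating per release year as one possible reading of "ratings over years".

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_overview(movies, top_n=5):
    """Draw the four charts on a 2x2 grid and return the figure and axes."""
    fig, axes = plt.subplots(2, 2, figsize=(12, 8))

    # Rating distribution: histogram of movie ratings.
    axes[0, 0].hist(movies["rating"].dropna(), bins=10)
    axes[0, 0].set(title="Rating distribution", xlabel="Rating", ylabel="Movies")

    # Release year trend: mean rating per release year (assumed aggregation).
    by_year = movies.groupby(movies["release_date"].dt.year)["rating"].mean()
    axes[0, 1].plot(by_year.index, by_year.values, marker="o")
    axes[0, 1].set(title="Mean rating by release year", xlabel="Year", ylabel="Rating")

    # Top movies by worldwide gross: horizontal bars, largest at the top.
    top = movies.nlargest(top_n, "worldwide_gross")
    axes[1, 0].barh(top["title"], top["worldwide_gross"])
    axes[1, 0].invert_yaxis()
    axes[1, 0].set(title=f"Top {top_n} by worldwide gross", xlabel="Worldwide gross")

    # Budget vs. earnings: one point per movie.
    axes[1, 1].scatter(movies["budget"], movies["worldwide_gross"], alpha=0.7)
    axes[1, 1].set(title="Budget vs. worldwide gross", xlabel="Budget",
                   ylabel="Worldwide gross")

    fig.tight_layout()
    return fig, axes
```

In Colab, calling `plot_overview(movies)` in a cell should render the figure inline. Returning the figure also makes it possible to save it with `fig.savefig(...)`.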
Key insights:
- A high budget does not guarantee high earnings; factors outside this dataset, such as storyline and cast, likely also matter.
- Top-grossing movies have varied budgets, suggesting that financial success depends on multiple factors.
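One way to put a number on the budget and earnings relationship is a rank correlation. The sketch below is an added illustration rather than part of the original analysis: `budget_gross_rank_correlation` is a hypothetical helper, the column names are assumptions, and the sample values are made up. It computes Spearman's correlation as the Pearson correlation of the ranks.

```python
import pandas as pd


def budget_gross_rank_correlation(movies):
    """Spearman rank correlation between budget and worldwide gross.

    Computed as the Pearson correlation of the ranks, using only rows
    where both values are present.
    """
    pairs = movies[["budget", "worldwide_gross"]].dropna()
    return pairs["budget"].rank().corr(pairs["worldwide_gross"].rank())


# Made-up values, for illustration only.
sample = pd.DataFrame({
    "budget": [10.0, 50.0, 20.0, 100.0],
    "worldwide_gross": [30.0, 40.0, 200.0, 90.0],
})
print(budget_gross_rank_correlation(sample))
```

A value near 1 would mean that bigger budgets almost always go with bigger earnings; values well below 1 are consistent with budget being only one of several factors.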
This analysis of the IMDb Top 250 Movies dataset examines how movie ratings, budgets, and earnings relate to one another. It shows that budget alone does not determine a movie's box-office success, which may be useful for movie enthusiasts, researchers, and industry professionals.