🎭 Sentiment Analysis on IMDb Reviews

This project applies Natural Language Processing (NLP) techniques to classify IMDb movie reviews as positive or negative. It follows a two-phase approach:

🔍 Phase 1: Text preprocessing and exploratory analysis using R and tidyverse libraries.
🤖 Phase 2 (in progress): Sentiment classification models in Python using Scikit-learn and PyTorch.

📁 Dataset

IMDb Reviews Dataset (Kaggle)
50,000 reviews labeled as positive or negative.

📌 Project Phases

🧪 Phase 1: Text Processing in R

Text cleaning: lowercasing, stopword removal, HTML cleanup
Tokenization and lemmatization
POS tagging with udpipe
Visualization: word clouds, bar charts, n-gram analysis
📈 Full EDA Notebook on RPubs
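
Phase 1 itself is written in R, but the same cleaning steps have to be reproducible in Python when new reviews are scored in Phase 2. The following is a minimal standard-library sketch of HTML cleanup, lowercasing, tokenization, and stopword removal. It is not the project's R code: the function name `clean_review`, the regular expressions, and the tiny stopword set are illustrative assumptions. Lemmatization and POS tagging are left out because they need a language model such as the one udpipe provides.

```python
import html
import re
from typing import List

# Tiny illustrative stopword set (assumption); a real run would use a full
# list, for example the ones shipped with tidytext in R or NLTK in Python.
STOPWORDS = {"a", "an", "the", "is", "was", "it", "this", "and", "of", "to", "in"}

TAG_RE = re.compile(r"<[^>]+>")    # HTML tags such as <br />
TOKEN_RE = re.compile(r"[a-z']+")  # runs of lowercase letters and apostrophes


def clean_review(text: str) -> List[str]:
    """Hypothetical helper: strip HTML, lowercase, tokenize, drop stopwords."""
    text = TAG_RE.sub(" ", text)   # replace HTML tags with a space
    text = html.unescape(text)     # decode entities such as &amp;
    tokens = TOKEN_RE.findall(text.lower())
    return [token for token in tokens if token not in STOPWORDS]


print(clean_review("This movie was GREAT!<br /><br />A joy to watch &amp; rewatch."))
# → ['movie', 'great', 'joy', 'watch', 'rewatch']
```

Keeping the cleaning in one small function makes it easier to apply identical preprocessing at training time and at prediction time.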

🧠 Phase 2: Modeling in Python (Coming Soon)

Input data: the Phase 1 output, exported as IMDB-cleaned.csv
Model candidates:
- Logistic Regression
- Naive Bayes
- Support Vector Machines
- PyTorch-based classifier
Metrics: Accuracy, F1-score, ROC-AUC
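
As a rough illustration of how the scikit-learn candidates and metrics listed above could fit together, here is a minimal sketch. It is not the project's implementation, which is still planned. The TF-IDF features, the hyperparameters, the `evaluate` helper, and the toy reviews are all assumptions made for the example; real training would read IMDB-cleaned.csv instead. The PyTorch-based classifier is not covered.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy stand-in data, made up for illustration (1 = positive, 0 = negative).
train_texts = [
    "great movie loved the acting",
    "wonderful film brilliant story",
    "loved it great fun",
    "terrible movie hated the plot",
    "awful film boring story",
    "hated it boring mess",
]
train_labels = [1, 1, 1, 0, 0, 0]
test_texts = ["brilliant acting great fun", "boring plot awful mess"]
test_labels = [1, 0]

# Three of the candidate models named in the README, with assumed settings.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "naive_bayes": MultinomialNB(),
    "linear_svm": LinearSVC(),
}


def evaluate(model):
    """Hypothetical helper: fit TF-IDF + model, return the three metrics."""
    pipeline = make_pipeline(TfidfVectorizer(), model)
    pipeline.fit(train_texts, train_labels)
    predicted = pipeline.predict(test_texts)
    # ROC-AUC needs a continuous score: the positive-class probability when
    # the model offers one, otherwise the signed distance to the boundary.
    if hasattr(model, "predict_proba"):
        scores = pipeline.predict_proba(test_texts)[:, 1]
    else:
        scores = pipeline.decision_function(test_texts)
    return {
        "accuracy": accuracy_score(test_labels, predicted),
        "f1": f1_score(test_labels, predicted),
        "roc_auc": roc_auc_score(test_labels, scores),
    }


results = {name: evaluate(model) for name, model in candidates.items()}
for name, metrics in results.items():
    print(name, metrics)
```

Wrapping the vectorizer and the classifier in one pipeline keeps the vocabulary fitted on training data only, which avoids leaking test reviews into the features.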

📦 Deliverables

sentiment-analysis.Rmd: Full EDA notebook in R
IMDB-cleaned.csv: Preprocessed dataset
model_sentiment.py: Sentiment classification model (planned)
Streamlit / Flask deployment (planned)

🚀 Deployment Ideas

Build an interactive dashboard (Streamlit)
Deploy a REST API using FastAPI
(Optional) Batch or near-real-time sentiment processing with Apache Spark

📚 Skills & Tools Used

R: tidyverse, tidytext, udpipe, ggplot2, SnowballC
Python: scikit-learn, NLTK, PyTorch (planned)
EDA & Reporting: R Markdown, RPubs

🧠 Author

Manuel Alejandro Matías Astorga
Data Scientist | Physicist | Machine Learning Enthusiast
📄 Portfolio Website · LinkedIn

✨ Feel free to fork, contribute or reach out if you're working on similar projects!

