alexmatiasas/Sentiment-Analysis

🎭 Sentiment Analysis on IMDb Reviews

This project applies Natural Language Processing (NLP) techniques to classify IMDb movie reviews as expressing positive or negative sentiment. It follows a two-phase approach:

  • 🔍 Phase 1: Text preprocessing and exploratory analysis using R and tidyverse libraries.
  • 🤖 Phase 2 (in progress): Sentiment classification models in Python using Scikit-learn and PyTorch.

📁 Dataset

📌 Project Phases

🧪 Phase 1: Text Processing in R

  • Text cleaning: lowercasing, stopword removal, HTML cleanup
  • Tokenization and lemmatization
  • Part-of-speech (POS) tagging with udpipe
  • Visualization: word clouds, bar charts, n-gram analysis
  • 📈 Full EDA Notebook on RPubs
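In this project the cleaning steps above are done in R. Looking ahead to the Python phase, here is a minimal, dependency-free Python sketch of the same ideas: HTML cleanup, lowercasing, tokenization, stopword removal, and bigram counting for n-gram analysis. The tiny `STOPWORDS` set and the helper names `clean_review` and `bigram_counts` are hypothetical illustrations, not code from the repository. Lemmatization and POS tagging (handled by udpipe in R) are deliberately left out.

```python
import html
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for illustration only; a real
# pipeline would use a full list (e.g. tidytext's stop_words in R or NLTK's in Python).
STOPWORDS = {"a", "an", "the", "and", "or", "is", "was", "it", "this", "of", "to", "in"}

TAG_RE = re.compile(r"<[^>]+>")    # matches HTML tags such as <br />
TOKEN_RE = re.compile(r"[a-z']+")  # runs of lowercase letters and apostrophes


def clean_review(text: str) -> list[str]:
    """Decode entities, strip HTML tags, lowercase, tokenize, and drop stopwords."""
    text = html.unescape(text)     # e.g. "&amp;" -> "&"
    text = TAG_RE.sub(" ", text)   # replace each tag with a space
    text = text.lower()
    tokens = TOKEN_RE.findall(text)
    return [tok for tok in tokens if tok not in STOPWORDS]


def bigram_counts(tokens: list[str]) -> Counter:
    """Count adjacent token pairs, the basic unit of bigram (n = 2) analysis."""
    return Counter(zip(tokens, tokens[1:]))


if __name__ == "__main__":
    review = "This movie was <br /><br />GREAT &amp; the acting was great acting!"
    tokens = clean_review(review)
    print(tokens)
    print(bigram_counts(tokens).most_common(2))
```

On the sample review, the `<br />` tags, the `&amp;` entity, and the stopwords disappear, and the repeated pair ("great", "acting") comes out as the most frequent bigram.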

🧠 Phase 2: Modeling in Python (Coming Soon)

  • Cleaned data from Phase 1 exported as IMDB-cleaned.csv
  • Model candidates:
    • Logistic Regression
    • Naive Bayes
    • Support Vector Machines
    • PyTorch-based classifier
  • Metrics: Accuracy, F1-score, ROC-AUC
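To make the planned evaluation concrete, the sketch below trains a small multinomial Naive Bayes model (one of the listed candidates) on tokenized reviews and scores it with accuracy, F1, and ROC-AUC. It is written in plain Python so each formula is visible. It is an illustration under assumptions, not the project's model: the `"positive"`/`"negative"` labels, the toy data, and all class and function names are hypothetical. In practice, scikit-learn's `MultinomialNB`, `accuracy_score`, `f1_score`, and `roc_auc_score` would replace these helpers.

```python
import math
from collections import Counter


class MultinomialNB:
    """Multinomial Naive Bayes over token lists with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {tok for doc in docs for tok in doc}
        self.log_prior, self.log_lik = {}, {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            counts = Counter(tok for d in class_docs for tok in d)
            # Denominator: all tokens seen in class c plus alpha per vocabulary word.
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            self.log_lik[c] = {t: math.log((counts[t] + self.alpha) / denom) for t in self.vocab}
        return self

    def _log_scores(self, doc):
        # Unnormalized log-posterior per class; out-of-vocabulary tokens are ignored.
        return {
            c: self.log_prior[c] + sum(self.log_lik[c][t] for t in doc if t in self.vocab)
            for c in self.classes
        }

    def predict_proba(self, doc):
        scores = self._log_scores(doc)
        top = max(scores.values())  # subtract the max for numerical stability
        exp = {c: math.exp(s - top) for c, s in scores.items()}
        total = sum(exp.values())
        return {c: v / total for c, v in exp.items()}

    def predict(self, doc):
        scores = self._log_scores(doc)
        return max(scores, key=scores.get)


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1(y_true, y_pred, positive="positive"):
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    # F1 = 2TP / (2TP + FP + FN); taken as 0 when there are no true positives.
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def roc_auc(y_true, scores, positive="positive"):
    """Probability that a random positive review outscores a random negative one (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data; real input would be tokenized rows of IMDB-cleaned.csv.
    train_docs = [["great", "acting"], ["wonderful", "film"], ["boring", "plot"], ["awful", "acting"]]
    train_y = ["positive", "positive", "negative", "negative"]
    test_docs = [["great", "film"], ["boring", "acting"]]
    test_y = ["positive", "negative"]

    model = MultinomialNB().fit(train_docs, train_y)
    preds = [model.predict(d) for d in test_docs]
    probs = [model.predict_proba(d)["positive"] for d in test_docs]
    print(preds, accuracy(test_y, preds), f1(test_y, preds), roc_auc(test_y, probs))
```

ROC-AUC is computed here from the predicted probability of the positive class rather than the hard labels, which is why `predict_proba` exists. The pairwise comparison is quadratic in the number of reviews, fine for a sketch but slow on a full IMDb test split, which is one more reason to switch to a library implementation there.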

📦 Deliverables

  • sentiment-analysis.Rmd: Full EDA notebook in R
  • IMDB-cleaned.csv: Preprocessed dataset
  • model_sentiment.py: Sentiment classification model (planned)
  • Streamlit / Flask deployment (planned)

🚀 Deployment Ideas

  • Build an interactive dashboard (Streamlit)
  • Deploy a REST API using FastAPI
  • (Optional) Real-time or batch sentiment processing with Apache Spark

📚 Skills & Tools Used

  • R: tidyverse, tidytext, udpipe, ggplot2, SnowballC
  • Python: scikit-learn, NLTK, PyTorch (planned)
  • EDA & Reporting: R Markdown, RPubs

🧠 Author

Manuel Alejandro Matías Astorga
Data Scientist | Physicist | Machine Learning Enthusiast
📄 Portfolio Website · LinkedIn


✨ Feel free to fork, contribute, or reach out if you're working on similar projects!

About

Sentiment Analysis and Natural Language Processing
