This project builds a fake news detector using Natural Language Processing (NLP) and machine learning. A Multinomial Naïve Bayes classifier is trained on TF-IDF features extracted from the Fake.csv
dataset to classify news articles as FAKE or REAL.
Ensure you have Python installed, then install the required dependencies:
pip install pandas numpy matplotlib seaborn scikit-learn
📁 fake-news-detection
│── 📄 fake_news_detection.py # Main script
│── 📄 README.md # Project documentation
│── 📊 Fake.csv # Dataset (should be placed here)
import pandas as pd
import numpy as np
import re
import string
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
df = pd.read_csv("Fake.csv", on_bad_lines="skip", encoding='utf-8')
print(df.columns) # Display column names
print(df['subject'].unique()) # Check unique values in 'subject'
df.rename(columns={'subject': 'label'}, inplace=True) # Rename 'subject' to 'label'
def clean_text(text):
    text = text.lower() # Convert to lowercase
    text = re.sub(r'\[.*?\]', '', text) # Remove text inside brackets
    text = re.sub(r"https?://\S+|www\.\S+", '', text) # Remove URLs
    text = re.sub(r"<.*?>+", '', text) # Remove HTML tags
    text = re.sub(r"[^\w\s]", '', text) # Remove punctuation
    text = text.strip() # Remove leading/trailing spaces
    return text
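To make the effect of each cleaning step concrete, here is a small standalone sketch that repeats the same regular expressions and runs them on a made-up headline. The sample string is an assumption chosen for illustration, not data from Fake.csv.

```python
import re

def clean_text(text):
    """Same cleaning steps as clean_text above, repeated so this snippet runs on its own."""
    text = text.lower()                                 # lowercase
    text = re.sub(r'\[.*?\]', '', text)                 # drop bracketed text
    text = re.sub(r"https?://\S+|www\.\S+", '', text)   # drop URLs
    text = re.sub(r"<.*?>+", '', text)                  # drop HTML tags
    text = re.sub(r"[^\w\s]", '', text)                 # drop punctuation
    return text.strip()

# Hypothetical sample input, not taken from the dataset
sample = "BREAKING [video]: Read more at https://example.com <b>NOW</b>!"
cleaned = clean_text(sample)
print(repr(cleaned))

# Removed spans leave their surrounding spaces behind, so runs of
# whitespace can remain; splitting and re-joining collapses them.
collapsed = " ".join(cleaned.split())
print(repr(collapsed))
```

Note that the function does not collapse repeated spaces. TfidfVectorizer's default tokenizer ignores extra whitespace, so this should not affect the features, but it may matter if the cleaned text is displayed or compared directly.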
df['content'] = df['content'].apply(clean_text)
X = df['content']
y = df['label']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
vectorizer = TfidfVectorizer(max_features=5000)
X_train_tfidf = vectorizer.fit_transform(X_train)
X_test_tfidf = vectorizer.transform(X_test)
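As a rough illustration of what the TF-IDF step computes, the sketch below hand-rolls the weighting on a three-document toy corpus using only the standard library. It assumes scikit-learn's default settings for TfidfVectorizer (raw term counts, smoothed IDF of the form ln((1 + n) / (1 + df)) + 1, and L2 normalization) and uses plain whitespace splitting, which is simpler than the library's tokenizer. The documents are invented for the example.

```python
import math
from collections import Counter

# Toy corpus, made up for illustration
docs = [
    "government announces new budget",
    "shocking secret the government hides",
    "new study shocking results",
]
tokenized = [d.split() for d in docs]   # simplification: whitespace tokens only
n_docs = len(tokenized)

# Document frequency: in how many documents each term appears
doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

def idf(term):
    # Smoothed inverse document frequency (assumed to match the sklearn default)
    return math.log((1 + n_docs) / (1 + doc_freq[term])) + 1

def tfidf_vector(tokens):
    counts = Counter(tokens)
    weights = {t: c * idf(t) for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()}   # L2-normalized

vec = tfidf_vector(tokenized[0])
for term, weight in sorted(vec.items()):
    print(f"{term:12s} {weight:.3f}")
```

Terms that occur in only one document ("announces", "budget") receive a higher weight than terms shared across documents ("government", "new"), which is the property that makes TF-IDF useful for telling articles apart.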
model = MultinomialNB()
model.fit(X_train_tfidf, y_train)
y_pred = model.predict(X_test_tfidf)
print(f"Accuracy: {accuracy_score(y_test, y_pred):.2f}")
print("\nClassification Report:\n", classification_report(y_test, y_pred))
# Confusion Matrix
plt.figure(figsize=(6,4))
sns.heatmap(confusion_matrix(y_test, y_pred), annot=True, fmt='d', cmap='Blues')
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.title('Confusion Matrix')
plt.show()
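For readers new to these metrics, the following sketch shows how the four cells of a binary confusion matrix relate to accuracy, precision, and recall. The label lists are hypothetical (1 = fake, 0 = real) and exist only to make the arithmetic visible; they are not results from this project.

```python
# Hypothetical labels for illustration only: 1 = fake, 0 = real
y_true = [1, 1, 1, 0, 0, 0, 1, 0, 0]
y_hat  = [1, 0, 1, 0, 1, 0, 1, 0, 1]

pairs = list(zip(y_true, y_hat))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # fake, predicted fake
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # fake, predicted real
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # real, predicted fake
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # real, predicted real

# Same layout as sklearn's confusion_matrix: rows = actual, columns = predicted
matrix = [[tn, fp],
          [fn, tp]]
print(matrix)

accuracy = (tp + tn) / len(pairs)
precision = tp / (tp + fp)   # of everything flagged fake, how much was fake
recall = tp / (tp + fn)      # of all fake articles, how many were caught
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
```

Accuracy alone can hide an imbalance between the two kinds of error, which is why the classification report also lists precision and recall per class.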
def predict_news(news_text):
    news_text = clean_text(news_text)
    news_vectorized = vectorizer.transform([news_text])
    prediction = model.predict(news_vectorized)
    # Assumes the label column encodes fake news as 1; adjust if the labels are strings
    return "FAKE NEWS" if prediction[0] == 1 else "REAL NEWS"
# Example Usage
news = "Breaking news! The direct dbt of 72000 per month will be given to each person"
print("\nPrediction:", predict_news(news))
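One thing worth checking before relying on the `prediction[0] == 1` comparison: a scikit-learn classifier returns the same label values it saw during training. The toy sketch below, which uses invented texts and the string labels "FAKE" and "REAL" as assumptions, shows that with string labels the prediction is itself a string, so a comparison against 1 would never be true.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy, made-up training data for illustration only
texts = [
    "aliens secretly control the world bank",
    "miracle pill cures every disease overnight",
    "city council approves new road budget",
    "central bank keeps interest rates unchanged",
]
labels = ["FAKE", "FAKE", "REAL", "REAL"]

# Vectorizer and classifier chained so they are always fitted and applied together
pipeline = make_pipeline(TfidfVectorizer(), MultinomialNB())
pipeline.fit(texts, labels)

# The classifier remembers the distinct label values it was trained on
classes = list(pipeline.named_steps["multinomialnb"].classes_)
print(classes)

prediction = pipeline.predict(["miracle pill cures disease"])[0]
print(prediction)
```

Printing `model.classes_` after training is a quick way to confirm which values `predict_news` should compare against.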
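✅ Data preprocessing (lowercasing; removing bracketed text, URLs, HTML tags, and punctuation) improved model accuracy.
✅ TF-IDF vectorization, capped at 5,000 features, turned each article into a numeric feature vector.
✅ The Multinomial Naïve Bayes model classifies the articles; the accuracy score and classification report printed by the script show how well.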
🔹 Try different machine learning models (e.g., Random Forest, LSTM).
🔹 Improve text preprocessing by removing stopwords.
🔹 Deploy the model using Flask or Streamlit for a web-based interface.
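As a starting point for the stopword suggestion above, here is a minimal sketch of stopword removal in plain Python. The `STOPWORDS` set and the `remove_stopwords` helper are hypothetical names introduced for this example, and the word list is a tiny hand-picked sample rather than a complete one.

```python
# A tiny, hand-picked stopword list for illustration; not exhaustive
STOPWORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "and", "will", "be"}

def remove_stopwords(text):
    """Drop common function words from already-cleaned, lowercase text."""
    return " ".join(word for word in text.split() if word not in STOPWORDS)

cleaned = "the direct dbt of 72000 per month will be given to each person"
filtered = remove_stopwords(cleaned)
print(filtered)
```

An alternative that needs no extra code is passing `stop_words="english"` to `TfidfVectorizer`, which applies scikit-learn's built-in English list during vectorization.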
👤 Your Name | GitHub Profile
⭐ If you like this project, don't forget to give it a star on GitHub! ⭐
🚀 Happy Coding & Stay Informed! 📰