TASK 2 : Model Steps – Spam Classification using Logistic Regression
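- Step 1 : Import relevant libraries: A library is a collection of reusable code modules that a program can import to perform specific operations. Here, numpy provides array operations and pandas provides tabular data handling. The apt commands are run in a terminal on Debian/Ubuntu; the import statements go in the Python script.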
sudo apt install -y python3-pandas
sudo apt install -y python3-numpy
import numpy as np
import pandas as pd
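- Step 2 : Import sklearn library: scikit-learn (sklearn) provides simple and efficient tools for predictive data analysis. CountVectorizer converts text into word counts, train_test_split divides the data into training and test sets, and LogisticRegression is the classifier.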
sudo apt install -y python3-sklearn
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
- Step 3 : Loading Data: The dataset is loaded from a CSV file using pandas. This dataset contains emails labeled as 'spam' or 'ham' (not spam). The labels in the 'target' column are then converted to numbers: 1 for spam and 0 for ham.
data = pd.read_csv('https://raw.githubusercontent.com/AiDevNepal/ai-saturdays-workshop-8/master/data/spam.csv')
data['target'] = np.where(data['target']=='spam', 1, 0)
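To illustrate the label encoding above, here is a minimal sketch on a small hand-made DataFrame. The rows are invented for illustration and are not taken from spam.csv; only the 'text' and 'target' column names come from the code above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows, not taken from the real spam.csv
toy = pd.DataFrame({
    "text": ["Win a free prize now", "Are we meeting at noon?", "Claim your reward"],
    "target": ["spam", "ham", "spam"],
})

# Map 'spam' to 1 and every other label to 0
toy["target"] = np.where(toy["target"] == "spam", 1, 0)

print(toy["target"].tolist())  # [1, 0, 1]
```

Note that np.where maps anything other than 'spam' to 0, so a misspelled or unexpected label would silently be treated as ham.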
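- Step 4 : Train and Test Data – Splitting: The dataset is split into training and test sets using train_test_split. By default 25% of the rows are held out for testing, and random_state=0 makes the split reproducible.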
X_train, X_test, Y_train, Y_test = train_test_split(data['text'], data['target'], random_state=0)
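The behaviour of the split can be checked on a tiny made-up dataset. This is a sketch only: the eight messages and labels are hypothetical, and the variable names are chosen so they do not clash with the ones used in the steps.

```python
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 8 messages with alternating labels
texts = [f"message {i}" for i in range(8)]
labels = [0, 1, 0, 1, 0, 1, 0, 1]

# With no test_size given, 25% of the samples go to the test set
texts_train, texts_test, labels_train, labels_test = train_test_split(
    texts, labels, random_state=0
)
print(len(texts_train), len(texts_test))  # 6 2

# Using the same random_state reproduces exactly the same split
repeat = train_test_split(texts, labels, random_state=0)
print(repeat[0] == texts_train)  # True
```

If a different proportion is wanted, train_test_split accepts a test_size argument, and stratify can be used to keep the spam/ham ratio the same in both sets.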
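- Step 5 : Vectorization and Model Training: CountVectorizer converts each message into a vector of word counts. The vocabulary is learned from the training data only (fit_transform) and then applied unchanged to the test data (transform). A LogisticRegression model is then fitted on the vectorized training data.
vectorizer = CountVectorizer()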
X_train_vectorized = vectorizer.fit_transform(X_train)
X_test_vectorized = vectorizer.transform(X_test)
model = LogisticRegression(max_iter=1000)
model.fit(X_train_vectorized, Y_train)
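The difference between fit_transform and transform used above can be seen on a toy example. The sentences below are made up for illustration; the sketch assumes the default CountVectorizer settings (lowercasing, tokens of two or more characters).

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy documents
train_docs = ["free prize free", "meeting at noon"]
test_docs = ["free lunch at noon"]

vec = CountVectorizer()
train_counts = vec.fit_transform(train_docs)  # learns the vocabulary and counts words
test_counts = vec.transform(test_docs)        # reuses the learned vocabulary

# Columns are ordered alphabetically by word
print(sorted(vec.vocabulary_))   # ['at', 'free', 'meeting', 'noon', 'prize']
print(train_counts.toarray())    # [[0 2 0 0 1]
                                 #  [1 0 1 1 0]]
print(test_counts.toarray())     # [[1 1 0 1 0]]
```

The word "lunch" never appeared in the training documents, so it has no column and is ignored in the test vector. This is why the vectorizer is fitted on the training set only.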
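- Step 6 : Model Prediction and Testing: The trained model is evaluated on the held-out test set. model.score returns the mean accuracy, that is, the fraction of test messages classified correctly.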
print("Accuracy:", model.score(X_test_vectorized, Y_test))
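To show what the accuracy figure means, the sketch below trains a model on a handful of invented messages, predicts labels for two unseen ones, and computes the accuracy by hand for comparison with score. The messages, labels, and the names clf and vec are hypothetical and separate from the workshop dataset; with so little data the predictions themselves are not meaningful.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical toy training data: 1 = spam, 0 = ham
train_texts = [
    "win a free prize now",
    "are we meeting at noon",
    "claim your free reward",
    "see you at the office",
]
train_labels = np.array([1, 0, 1, 0])

vec = CountVectorizer()
clf = LogisticRegression(max_iter=1000)
clf.fit(vec.fit_transform(train_texts), train_labels)

# Hypothetical unseen messages with their true labels
test_texts = ["free prize inside", "meeting moved to noon"]
test_labels = np.array([1, 0])
test_matrix = vec.transform(test_texts)

# predict returns one label (0 or 1) per message
predictions = clf.predict(test_matrix)

# Accuracy is the fraction of predictions that match the true labels
manual_accuracy = float(np.mean(predictions == test_labels))
print(manual_accuracy, clf.score(test_matrix, test_labels))
```

The two printed numbers should be equal, since score computes the same mean internally.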