Hack, Learn, Secure: Your Cybersecurity Playground

Developed by the Secure and Intelligent Distribution Computing (SIDC) Lab

Example – Email Spam Detection

TASK 2 : Model Steps – Spam Classification using Logistic Regression

Step 1 : Import relevant libraries: A library is a collection of reusable code modules that a program can call for specific operations. Here, NumPy provides array operations and pandas provides tabular data handling.


          # Run in a terminal (Debian/Ubuntu) to install the packages:
          sudo apt install -y python3-pandas

          sudo apt install -y python3-numpy

          # Then, in Python:
          import numpy as np

          import pandas as pd

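Step 2 : Import sklearn library: scikit-learn (imported as sklearn) provides simple and efficient tools for predictive data analysis. CountVectorizer turns text into word counts, train_test_split divides the data into training and test sets, and LogisticRegression is the classifier.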


          # Run in a terminal (Debian/Ubuntu) to install scikit-learn:
          sudo apt install -y python3-sklearn

          # Then, in Python:
          from sklearn.feature_extraction.text import CountVectorizer

          from sklearn.model_selection import train_test_split

          from sklearn.linear_model import LogisticRegression
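
Before moving on, it can help to confirm that the three packages import cleanly. The check below is a minimal sketch, assuming the install commands above succeeded; the versions dictionary is only an illustrative name.

```python
import numpy as np
import pandas as pd
import sklearn

# Collect the installed version of each package. An ImportError on the
# lines above means the corresponding install step did not succeed.
versions = {
    "numpy": np.__version__,
    "pandas": pd.__version__,
    "scikit-learn": sklearn.__version__,
}
for name, version in versions.items():
    print(name, version)
```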

Step 3 : Loading Data: The dataset is loaded from a CSV file using pandas. This dataset contains emails labeled as 'spam' or 'ham' (not spam). The labels in the target column are then encoded as numbers: 1 for spam and 0 for ham.


          data = pd.read_csv('https://raw.githubusercontent.com/AiDevNepal/ai-saturdays-workshop-8/master/data/spam.csv')

          data['target'] = np.where(data['target']=='spam', 1, 0)
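
The label encoding can be tried offline on a few made-up rows. This is a sketch only: the toy DataFrame is a hypothetical stand-in that reuses the text and target column names from the code above, and its rows are not taken from the real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the spam dataset (same column names).
toy = pd.DataFrame({
    "text": [
        "Win a free prize now",
        "Are we still meeting at noon?",
        "Free entry, claim your cash",
    ],
    "target": ["spam", "ham", "spam"],
})

# Same encoding as in Step 3: 'spam' becomes 1, anything else becomes 0.
toy["target"] = np.where(toy["target"] == "spam", 1, 0)

print(toy["target"].tolist())  # [1, 0, 1]
```

Counting the encoded labels (for example with toy["target"].value_counts()) is a quick way to see how balanced the two classes are before training.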

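Step 4 : Train and Test Data – Splitting: The dataset is split into training and test sets using train_test_split (by default, 25% of the rows go to the test set). CountVectorizer then converts each email into a vector of word counts. It is fitted on the training text only, and the test text is transformed with that same vocabulary so the model never sees information from the test set during training.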


          X_train, X_test, Y_train, Y_test = train_test_split(data['text'], data['target'], random_state=0)

          vectorizer = CountVectorizer()

          X_train_vectorized = vectorizer.fit_transform(X_train)

          X_test_vectorized = vectorizer.transform(X_test)
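
To see what the vectorizer produces, the same fit_transform / transform pattern can be run on a handful of short messages. The sketch below uses made-up texts (train_texts and test_texts are hypothetical names, not part of the dataset) and shows that words unseen during fitting are ignored at transform time.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy messages, not taken from the real dataset.
train_texts = ["free prize now", "meeting at noon", "claim free cash"]
test_texts = ["free lunch at noon"]

vectorizer = CountVectorizer()

# Learn the vocabulary from the training texts and count words per message.
X_train_counts = vectorizer.fit_transform(train_texts)

# Reuse the learned vocabulary; "lunch" was never seen, so it is dropped.
X_test_counts = vectorizer.transform(test_texts)

vocab = sorted(vectorizer.vocabulary_)
print(vocab)
print(X_train_counts.shape)      # (3, 8): 3 messages, 8 distinct words
print(X_test_counts.toarray())   # one row, columns in the order of vocab
```

Because the test matrix is built from the training vocabulary, both matrices have the same number of columns, which is what the model expects in the later steps.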

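Step 5 : Model Training: A logistic regression classifier is fitted on the vectorized training data. Setting max_iter=1000 raises the solver's iteration limit above the default of 100 so it has room to converge.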


          model = LogisticRegression(max_iter=1000)

          model.fit(X_train_vectorized, Y_train)
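
After fitting, the model holds one weight per vocabulary word, and inspecting those weights gives a rough sense of what it learned. The following is an illustrative sketch on made-up messages, not the tutorial's dataset; texts, labels and weights are hypothetical names.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical toy messages: 1 = spam, 0 = ham.
texts = [
    "free prize now", "claim free cash", "win free money",
    "meeting at noon", "lunch at home", "call at five",
]
labels = [1, 1, 1, 0, 0, 0]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

model = LogisticRegression(max_iter=1000)
model.fit(X, labels)

# Map each vocabulary word to its learned weight.
# Positive weights push a message toward spam, negative ones toward ham.
weights = {
    word: model.coef_[0][idx]
    for word, idx in vectorizer.vocabulary_.items()
}
for word, weight in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{word:10s} {weight:+.3f}")
```

In this toy set, "free" appears in every spam message and gets the largest positive weight, while "at" appears only in ham messages and gets a negative one.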

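Step 6 : Model Prediction and Testing: model.score returns the mean accuracy on the held-out test set, that is, the fraction of test emails classified correctly.


          print("Accuracy:", model.score(X_test_vectorized, Y_test))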
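
Accuracy alone can look good when one class dominates the data, so it is often worth checking precision and recall for the spam class as well, and trying the model on individual messages. The sketch below does both on made-up data; the train_texts and test_texts lists are hypothetical, and precision_score and recall_score come from sklearn.metrics, which the tutorial itself does not import.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score

# Hypothetical toy data: 1 = spam, 0 = ham.
train_texts = [
    "free prize now", "claim free cash", "win free money",
    "meeting at noon", "lunch at home", "call at five",
]
train_labels = [1, 1, 1, 0, 0, 0]
test_texts = ["free cash prize", "lunch at noon", "win money now", "call at home"]
test_labels = [1, 0, 1, 0]

vectorizer = CountVectorizer()
model = LogisticRegression(max_iter=1000)
model.fit(vectorizer.fit_transform(train_texts), train_labels)

# Vectorize unseen messages with the training vocabulary, then predict.
X_test = vectorizer.transform(test_texts)
predictions = model.predict(X_test)

print("Predictions:", predictions.tolist())
print("Accuracy:", model.score(X_test, test_labels))
# Precision: of the messages flagged as spam, how many really are spam.
print("Precision:", precision_score(test_labels, predictions))
# Recall: of the real spam messages, how many were flagged.
print("Recall:", recall_score(test_labels, predictions))
```

The same two lines (vectorizer.transform followed by model.predict) can be applied to any new email text once the model from Step 5 has been trained.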