Naive Bayes

Naive Bayes is a simple yet powerful algorithm for classification tasks in machine learning. It is based on Bayes’ Theorem, which describes the probability of an event given prior knowledge of conditions related to that event. The “naive” part of the name comes from the assumption that all features are conditionally independent of each other given the class, which is often not the case in real-world data.

Dataset

Our dataset has four categorical features (Outlook, Temperature, Humidity, Windy) and a binary label (Play):

Outlook   Temperature  Humidity  Windy  Play
sunny     hot          high      false  no
sunny     hot          high      true   no
overcast  hot          high      false  yes
rainy     mild         high      false  yes
rainy     cool         normal    false  yes
rainy     cool         normal    true   no
overcast  cool         normal    true   yes
sunny     mild         high      false  no
sunny     cool         normal    false  yes
rainy     mild         normal    false  yes
sunny     mild         normal    true   yes
overcast  mild         high      true   yes
overcast  hot          normal    false  yes
rainy     mild         high      true   no

Theory

Bayes’ Theorem

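We want to predict the class $y$ (Play = yes or no) for a new day described by the evidence $E$ = (sunny, cool, high, true), so we need the posterior $P(y \mid E)$. According to Bayes’ theorem, this is proportional to the prior $P(y)$ multiplied by the likelihood $P(E \mid y)$: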

$$P(y \mid E) \propto P(y) \cdot P(E \mid y)$$
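
For reference, the proportionality comes from the full statement of Bayes’ theorem. The evidence term $P(E)$ in the denominator is the same for every class, so it can be dropped when we only need to compare classes:

```latex
P(y \mid E) = \frac{P(y) \cdot P(E \mid y)}{P(E)},
\qquad
P(E) = \sum_{y' \in \{\text{yes},\,\text{no}\}} P(y') \cdot P(E \mid y')
```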

The “naive” assumption is that all features are independent given the class, so the likelihood factorizes into one term per feature:

$$P(E \mid y) = P(\text{sunny} \mid y) \cdot P(\text{cool} \mid y) \cdot P(\text{high} \mid y) \cdot P(\text{true} \mid y)$$

This gives us the final formula we need to compare:

$$Y_{\text{predicted}} = \arg\max_{y \in \{\text{yes},\,\text{no}\}} \left[ P(y) \cdot P(\text{sunny} \mid y) \cdot P(\text{cool} \mid y) \cdot P(\text{high} \mid y) \cdot P(\text{true} \mid y) \right]$$
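
As a rough check of this formula, the sketch below evaluates both class scores for the sample (sunny, cool, high, true) from raw frequency counts, with no smoothing applied. The names `rows` and `unsmoothed_score` are illustrative assumptions and are not part of the original code.

```python
from fractions import Fraction

# Rows copied from the dataset table: (outlook, temperature, humidity, windy, play).
rows = [
    ("sunny", "hot", "high", "false", "no"),
    ("sunny", "hot", "high", "true", "no"),
    ("overcast", "hot", "high", "false", "yes"),
    ("rainy", "mild", "high", "false", "yes"),
    ("rainy", "cool", "normal", "false", "yes"),
    ("rainy", "cool", "normal", "true", "no"),
    ("overcast", "cool", "normal", "true", "yes"),
    ("sunny", "mild", "high", "false", "no"),
    ("sunny", "cool", "normal", "false", "yes"),
    ("rainy", "mild", "normal", "false", "yes"),
    ("sunny", "mild", "normal", "true", "yes"),
    ("overcast", "mild", "high", "true", "yes"),
    ("overcast", "hot", "normal", "false", "yes"),
    ("rainy", "mild", "high", "true", "no"),
]


def unsmoothed_score(sample, label):
    """Return P(label) * prod_i P(sample[i] | label) from raw counts (hypothetical helper)."""
    in_class = [r for r in rows if r[-1] == label]
    score = Fraction(len(in_class), len(rows))  # prior P(label)
    for i, value in enumerate(sample):
        matches = sum(1 for r in in_class if r[i] == value)
        score *= Fraction(matches, len(in_class))  # likelihood P(value | label)
    return score


sample = ("sunny", "cool", "high", "true")
scores = {label: unsmoothed_score(sample, label) for label in ("yes", "no")}
predicted = max(scores, key=scores.get)
print(scores, predicted)
```

With these counts the score for `no` (18/875) exceeds the score for `yes` (1/189), so the formula selects `no`.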

Likelihood Calculations (with Laplace Smoothing)

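Our code uses Laplace (or Add-1) Smoothing to prevent zero-probability problems: a feature value that never occurs with a class would otherwise get probability 0 and wipe out the whole product. The formula for each conditional probability is: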

$$P(x_i \mid y) = \frac{\text{count}(x_i, y) + 1}{\text{count}(y) + k}$$

Where:

  • $\text{count}(x_i, y)$ is the number of times the feature value $x_i$ appears with class $y$.

  • $\text{count}(y)$ is the total number of times class $y$ appears.

  • $k$ is the total number of unique values for that feature (e.g., $k=3$ for Outlook, $k=2$ for Windy).
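
The smoothing formula can be sketched on its own as follows. This is a minimal illustration under the definitions above, where $k$ is counted over the whole dataset. The helper name `smoothed_likelihood` and the variable `rows` are assumptions introduced for this example.

```python
from fractions import Fraction

# Rows copied from the dataset table: (outlook, temperature, humidity, windy, play).
rows = [
    ("sunny", "hot", "high", "false", "no"),
    ("sunny", "hot", "high", "true", "no"),
    ("overcast", "hot", "high", "false", "yes"),
    ("rainy", "mild", "high", "false", "yes"),
    ("rainy", "cool", "normal", "false", "yes"),
    ("rainy", "cool", "normal", "true", "no"),
    ("overcast", "cool", "normal", "true", "yes"),
    ("sunny", "mild", "high", "false", "no"),
    ("sunny", "cool", "normal", "false", "yes"),
    ("rainy", "mild", "normal", "false", "yes"),
    ("sunny", "mild", "normal", "true", "yes"),
    ("overcast", "mild", "high", "true", "yes"),
    ("overcast", "hot", "normal", "false", "yes"),
    ("rainy", "mild", "high", "true", "no"),
]


def smoothed_likelihood(feature_index, value, label):
    """Laplace-smoothed P(value | label) = (count(value, label) + 1) / (count(label) + k)."""
    in_class = [r for r in rows if r[-1] == label]
    count_xy = sum(1 for r in in_class if r[feature_index] == value)
    # k: number of distinct values this feature takes anywhere in the dataset
    k = len({r[feature_index] for r in rows})
    return Fraction(count_xy + 1, len(in_class) + k)


# "overcast" never occurs with "no", so the raw estimate would be 0/5.
p_overcast_no = smoothed_likelihood(0, "overcast", "no")  # (0 + 1) / (5 + 3)
p_true_yes = smoothed_likelihood(3, "true", "yes")        # (3 + 1) / (9 + 2)
print(p_overcast_no, p_true_yes)
```

Here the unseen combination (overcast, no) gets a small non-zero probability of 1/8 instead of 0, so it can no longer zero out the product on its own.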

Implementation

dataset = [
    ['sunny', 'hot', 'high', 'false', 'no'],
    ['sunny', 'hot', 'high', 'true', 'no'],
    ['overcast', 'hot', 'high', 'false', 'yes'],
    ['rainy', 'mild', 'high', 'false', 'yes'],
    ['rainy', 'cool', 'normal', 'false', 'yes'],
    ['rainy', 'cool', 'normal', 'true', 'no'],
    ['overcast', 'cool', 'normal', 'true', 'yes'],
    ['sunny', 'mild', 'high', 'false', 'no'],
    ['sunny', 'cool', 'normal', 'false', 'yes'],
    ['rainy', 'mild', 'normal', 'false', 'yes'],
    ['sunny', 'mild', 'normal', 'true', 'yes'],
    ['overcast', 'mild', 'high', 'true', 'yes'],
    ['overcast', 'hot', 'normal', 'false', 'yes'],
    ['rainy', 'mild', 'high', 'true', 'no']
]
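def train_naive_bayes(data):
    """Count how often each label, and each feature value per label, occurs."""
    label_counts = {}    # label -> number of rows with that label
    feature_counts = {}  # label -> feature name -> feature value -> count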

    for row in data:
        outlook, temp, humidity, windy, label = row

        label_counts[label] = label_counts.get(label, 0) + 1

        if label not in feature_counts:
            feature_counts[label] = {"Outlook": {}, "Temp": {}, "Humidity": {}, "Windy": {}}

        feature_counts[label]["Outlook"][outlook] = feature_counts[label]["Outlook"].get(outlook, 0) + 1
        feature_counts[label]["Temp"][temp] = feature_counts[label]["Temp"].get(temp, 0) + 1
        feature_counts[label]["Humidity"][humidity] = feature_counts[label]["Humidity"].get(humidity, 0) + 1
        feature_counts[label]["Windy"][windy] = feature_counts[label]["Windy"].get(windy, 0) + 1

    return label_counts, feature_counts

label_counts, feature_counts = train_naive_bayes(dataset)
print("Label Counts:", label_counts)
print("Feature Counts:", feature_counts)
Label Counts: {'no': 5, 'yes': 9}
Feature Counts: {'no': {'Outlook': {'sunny': 3, 'rainy': 2}, 'Temp': {'hot': 2, 'cool': 1, 'mild': 2}, 'Humidity': {'high': 4, 'normal': 1}, 'Windy': {'false': 2, 'true': 3}}, 'yes': {'Outlook': {'overcast': 4, 'rainy': 3, 'sunny': 2}, 'Temp': {'hot': 2, 'mild': 4, 'cool': 3}, 'Humidity': {'high': 3, 'normal': 6}, 'Windy': {'false': 6, 'true': 3}}}
def predict_naive_bayes(x, label_counts, feature_counts):
    total = sum(label_counts.values())
    probs = {}
    
    feature_names = ["Outlook", "Temp", "Humidity", "Windy"]

    for label in label_counts:
        probs[label] = label_counts[label] / total

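        for i, feature in enumerate(feature_names):
            value = x[i]  # Get 'sunny', then 'cool', etc.

            count = feature_counts[label][feature].get(value, 0)

            # k must count every distinct value of the feature across all
            # classes (e.g. 3 for Outlook), not only those seen with this label.
            num_options = len({v for counts in feature_counts.values() for v in counts[feature]})

            probs[label] *= (count + 1) / (label_counts[label] + num_options)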

    return max(probs, key=probs.get)


test_sample = ['sunny', 'cool', 'high', 'true']
prediction = predict_naive_bayes(test_sample, label_counts, feature_counts)

print("Test Sample:", test_sample)
print("Predicted Class:", prediction)
Test Sample: ['sunny', 'cool', 'high', 'true']
Predicted Class: no
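
The function above returns only the winning label. If the class probabilities themselves are of interest, one possible variant (a sketch, not the original implementation) accumulates log-probabilities and normalizes at the end, which also avoids underflow when many features are multiplied. The function name `posterior_naive_bayes` is an assumption. The count dictionaries are copied from the training output printed earlier.

```python
import math

# Counts copied from the training output shown above.
label_counts = {"no": 5, "yes": 9}
feature_counts = {
    "no": {"Outlook": {"sunny": 3, "rainy": 2},
           "Temp": {"hot": 2, "cool": 1, "mild": 2},
           "Humidity": {"high": 4, "normal": 1},
           "Windy": {"false": 2, "true": 3}},
    "yes": {"Outlook": {"overcast": 4, "rainy": 3, "sunny": 2},
            "Temp": {"hot": 2, "mild": 4, "cool": 3},
            "Humidity": {"high": 3, "normal": 6},
            "Windy": {"false": 6, "true": 3}},
}
feature_names = ["Outlook", "Temp", "Humidity", "Windy"]


def posterior_naive_bayes(x, label_counts, feature_counts):
    """Return normalized P(label | x) with Laplace smoothing, computed in log space."""
    total = sum(label_counts.values())
    log_scores = {}
    for label, n_label in label_counts.items():
        log_score = math.log(n_label / total)  # log prior
        for value, feature in zip(x, feature_names):
            # k: distinct values of this feature across every class
            k = len(set().union(*(fc[feature] for fc in feature_counts.values())))
            count = feature_counts[label][feature].get(value, 0)
            log_score += math.log((count + 1) / (n_label + k))
        log_scores[label] = log_score
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_scores.values())
    unnormalized = {label: math.exp(s - top) for label, s in log_scores.items()}
    norm = sum(unnormalized.values())
    return {label: v / norm for label, v in unnormalized.items()}


posterior = posterior_naive_bayes(["sunny", "cool", "high", "true"], label_counts, feature_counts)
print(posterior)
```

For the test sample this assigns the larger share of probability to `no`, in agreement with the predicted class above.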

scikit-learn

from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import classification_report, confusion_matrix

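df = pd.DataFrame(dataset, columns=['outlook', 'temperature', 'humidity', 'windy', 'play'])

# Encode every column (features and label) as integer codes.
# Note: GaussianNB treats these codes as continuous values, not as categories.
le = LabelEncoder()
for col in df.columns:
    df[col] = le.fit_transform(df[col])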

x = df.drop('play', axis=1)
y = df['play']

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=42)

clf = GaussianNB()
clf.fit(x_train, y_train)

y_pred = clf.predict(x_test)

print("Classification Report:")
print(classification_report(y_test, y_pred))

print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))
Classification Report:
              precision    recall  f1-score   support

           0       0.50      0.50      0.50         2
           1       0.67      0.67      0.67         3

    accuracy                           0.60         5
   macro avg       0.58      0.58      0.58         5
weighted avg       0.60      0.60      0.60         5

Confusion Matrix:
[[1 1]
 [1 2]]
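
Because every feature here is categorical, a closer match to the hand-written implementation may be scikit-learn's `CategoricalNB`, which models each feature as a discrete distribution and applies additive smoothing through its `alpha` parameter. The sketch below is an alternative to the `GaussianNB` run above, not a replacement for it. It assumes fitting on all 14 rows rather than on a train/test split, and the variable names are illustrative.

```python
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import OrdinalEncoder

# Rows copied from the dataset: (outlook, temperature, humidity, windy, play).
rows = [
    ("sunny", "hot", "high", "false", "no"),
    ("sunny", "hot", "high", "true", "no"),
    ("overcast", "hot", "high", "false", "yes"),
    ("rainy", "mild", "high", "false", "yes"),
    ("rainy", "cool", "normal", "false", "yes"),
    ("rainy", "cool", "normal", "true", "no"),
    ("overcast", "cool", "normal", "true", "yes"),
    ("sunny", "mild", "high", "false", "no"),
    ("sunny", "cool", "normal", "false", "yes"),
    ("rainy", "mild", "normal", "false", "yes"),
    ("sunny", "mild", "normal", "true", "yes"),
    ("overcast", "mild", "high", "true", "yes"),
    ("overcast", "hot", "normal", "false", "yes"),
    ("rainy", "mild", "high", "true", "no"),
]
X_raw = [list(r[:4]) for r in rows]
y = [r[4] for r in rows]

# Map each categorical feature to integer category codes.
encoder = OrdinalEncoder(dtype=int)
X = encoder.fit_transform(X_raw)

# alpha=1.0 corresponds to Laplace (add-1) smoothing.
model = CategoricalNB(alpha=1.0)
model.fit(X, y)

sample = encoder.transform([["sunny", "cool", "high", "true"]])
predicted = model.predict(sample)[0]
print(predicted)
```

Under these assumptions the model should predict `no` for the sample, the same class as the manual implementation.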