Naïve Bayes — The Idiot Genius of Algorithms: Machine Learning in Python

--

Probability and classification are two of the most important concepts in Machine Learning, and they often go hand in hand. We use various algorithms to classify data into distinguishable classes. One such algorithm is the Naïve Bayes Algorithm.

Image Source: Simran Kaur Arora

“A learner that uses Bayes’ theorem and assumes the effects are independent given the cause is called a Naïve Bayes classifier. That’s because, well, that’s such a naïve assumption.”
— Pedro Domingos

The basics of Naïve Bayes Algorithm

In order to understand Naïve Bayes, we require some basic knowledge of probability and the Bayes Theorem. To build this up, let's consider an example:

A die and a coin are thrown. What is the probability of a HEAD and a 4?
We know that the probability P of an event E can be calculated as P(E) = Number of Favorable Outcomes / Total Number of Outcomes. Since the coin toss and the die roll are independent events, their probabilities multiply, so for this example: P(HEAD and 4) = (1/2) * (1/6) = 1/12.

There are a few different types of probabilities that we need to understand before starting with the Naïve Bayes Algorithm; the short sketch after this list illustrates the first three with two dice.

  • Conditional Probability:
    It is the probability of an Event A occurring given that another Event B has already occurred, written P(A|B).
  • Joint Probability:
    It is the probability of two or more Events happening simultaneously, written P(A∩B).
  • Marginal Probability:
    It is the probability of an Event occurring irrespective of the outcomes of other Events, written P(A).
  • Proportionality:
    It is the relationship between two quantities that are multiplicatively connected by a constant, written A ∝ B.
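
To make these concrete, here is a minimal sketch (the event definitions are purely illustrative) that computes the marginal, joint, and conditional probabilities for two fair dice by enumerating all 36 equally likely outcomes:

from itertools import product

# Enumerate all 36 equally likely outcomes of rolling two fair dice
outcomes = list(product(range(1, 7), repeat=2))

A = [o for o in outcomes if o[0] == 4]          # Event A: first die shows 4
B = [o for o in outcomes if o[0] + o[1] == 7]   # Event B: the dice sum to 7
A_and_B = [o for o in A if o in B]              # both A and B occur

p_A = len(A) / len(outcomes)            # Marginal:    P(A)   = 6/36
p_B = len(B) / len(outcomes)            # Marginal:    P(B)   = 6/36
p_joint = len(A_and_B) / len(outcomes)  # Joint:       P(A∩B) = 1/36
p_A_given_B = p_joint / p_B             # Conditional: P(A|B) = 1/6

print(p_A, p_B, p_joint, p_A_given_B)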

Bayes Theorem

The Naïve Bayes Classifier is based on Bayes Theorem with the “Naïve Assumption” that the features are independent of each other. To understand this assumption, we shall look into what Bayes Theorem states:

P(A|B) = [ P(B|A) × P(A) ] / P(B)    (Image Source: luminousmen.com)
  • Posterior Probability:
    It is the probability of Event A occurring given that Event B has already occurred.
  • Prior Probability:
    It is the probability of Event A occurring before any relevant evidence is taken into account.
  • Likelihood:
    It is the probability of Event B occurring given that Event A has already occurred.
  • Marginalization:
    It is the probability of Event B occurring.

Given a feature vector X=(x₁,x₂,…,xₙ) and a class variable Cₖ, Bayes Theorem states that:

P(Cₖ|X) = [ P(X|Cₖ) × P(Cₖ) ] / P(X)

Here, P(Cₖ|X) is the Posterior Probability, P(X|Cₖ) is the Likelihood, P(Cₖ) the Prior Probability, and P(X) the Marginalization factor, i.e. the Prior Probability of the predictor.
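
As a quick worked example, here is a sketch with made-up numbers: a hypothetical spam filter, where C is the class "spam" and X is the feature "the message contains the word free":

# Hypothetical numbers, purely for illustration
p_spam = 0.30              # Prior:      P(C)
p_free_given_spam = 0.60   # Likelihood: P(X|C)
p_free = 0.25              # Evidence:   P(X)

# Bayes Theorem: P(C|X) = P(X|C) * P(C) / P(X)
p_spam_given_free = p_free_given_spam * p_spam / p_free
print(p_spam_given_free)   # 0.72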

Using the chain rule of probability, we can expand the likelihood P(X|Cₖ) as:

P(X|Cₖ) = P(x₁,…,xₙ|Cₖ) = P(x₁|x₂,…,xₙ, Cₖ)*P(x₂|x₃,…,xₙ, Cₖ)*…*P(xₙ₋₁|xₙ, Cₖ)*P(xₙ|Cₖ)

The Naïve Assumption

It is extremely difficult and expensive to calculate P(X|Cₖ) using the above formula. Instead, Naïve Bayes assumes conditional independence of each feature. This makes it easier to find the Posterior Probability as:

Assuming: P(xᵢ|xᵢ₊₁,…,xₙ, Cₖ) = P(xᵢ|Cₖ)

We can conclude: P(X|Cₖ) = P(x₁,…,xₙ|Cₖ) = ∏ᵢ₌₁ⁿ P(xᵢ|Cₖ)

Thus, the Posterior Probability can be written as:

P(Cₖ|X) = [ P(Cₖ) × ∏ᵢ₌₁ⁿ P(xᵢ|Cₖ) ] / P(X)

Here, P(X) is the Prior Probability of the predictor, which is constant for a given input. So we can say:

P(Cₖ|X) ∝ P(Cₖ) × ∏ᵢ₌₁ⁿ P(xᵢ|Cₖ)

Finally, we need to find the class Cₖ that maximizes P(Cₖ) × ∏ᵢ₌₁ⁿ P(xᵢ|Cₖ):

Ĉ = argmaxₖ P(Cₖ) × ∏ᵢ₌₁ⁿ P(xᵢ|Cₖ)
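
The following minimal sketch shows this final step with made-up priors and per-feature likelihoods: score each class by P(Cₖ) × ∏ᵢ P(xᵢ|Cₖ) and pick the maximum:

import numpy as np

# Made-up priors and per-feature likelihoods P(xᵢ|Cₖ) for two classes
priors = {"C0": 0.5, "C1": 0.5}
likelihoods = {
    "C0": [0.2, 0.7, 0.1],  # P(x1|C0), P(x2|C0), P(x3|C0)
    "C1": [0.4, 0.3, 0.5],
}

# Score each class by P(Ck) * prod_i P(xi|Ck) and predict the maximum
scores = {c: priors[c] * np.prod(likelihoods[c]) for c in priors}
print(scores)                       # approximately {'C0': 0.007, 'C1': 0.03}
print(max(scores, key=scores.get))  # C1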

Types of Naïve Bayes

There are three types of Naïve Bayes classifiers commonly used in practice; a short usage sketch follows this list.

  • Multinomial Naïve Bayes:
    It assumes that each P(xᵢ|Cₖ) follows a multinomial distribution. It is mainly used in document classification problems, where it looks at the frequency of words.
  • Bernoulli Naïve Bayes:
    It is similar to Multinomial Naïve Bayes, except that the predictors are Boolean (a word is either present or absent).
  • Gaussian Naïve Bayes:
    It assumes that continuous feature values are sampled from a Gaussian distribution.
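
All three variants are available in scikit-learn. A minimal sketch, assuming scikit-learn is installed (the random data here is purely illustrative):

import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=100)                # two classes

X_counts = rng.integers(0, 10, size=(100, 5))   # word counts    -> Multinomial
X_bool = rng.integers(0, 2, size=(100, 5))      # present/absent -> Bernoulli
X_cont = rng.normal(size=(100, 5))              # continuous     -> Gaussian

print(MultinomialNB().fit(X_counts, y).predict(X_counts[:3]))
print(BernoulliNB().fit(X_bool, y).predict(X_bool[:3]))
print(GaussianNB().fit(X_cont, y).predict(X_cont[:3]))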

Implementing Naïve Bayes from Scratch

  • Import Required Libraries:
import numpy as np
  • Define the Naïve Bayes Class and Functions:
class NaiveBayes:
    def __init__(self, X, y):
        self.no_examples, self.no_features = X.shape
        self.no_classes = len(np.unique(y))
        self.eps = 1e-6  # small constant to avoid division by zero

    def fit(self, X, y):
        # Estimate the per-class mean, variance, and prior from the training data
        self.classes_mean = {}
        self.classes_variance = {}
        self.classes_prior = {}
        for c in range(self.no_classes):
            X_c = X[y == c]
            self.classes_mean[str(c)] = np.mean(X_c, axis=0)
            self.classes_variance[str(c)] = np.var(X_c, axis=0)
            self.classes_prior[str(c)] = X_c.shape[0] / X.shape[0]

    def predict(self, X):
        # Score every class for every example, then pick the most probable class
        probs = np.zeros((X.shape[0], self.no_classes))
        for c in range(self.no_classes):
            prior = self.classes_prior[str(c)]
            probs_c = self.density_function(X, self.classes_mean[str(c)], self.classes_variance[str(c)])
            probs[:, c] = probs_c + np.log(prior)
        return np.argmax(probs, axis=1)

    # Log of the Gaussian PDF, summed over all features:
    def density_function(self, x, mean, sigma):
        const = -self.no_features / 2 * np.log(2 * np.pi) - 0.5 * np.sum(np.log(sigma + self.eps))
        probs = 0.5 * np.sum(np.power(x - mean, 2) / (sigma + self.eps), axis=1)
        return const - probs

Note: The log of the Probability Density Function is used here. It maps probabilities from the 0–1 range to the −∞–0 range and converts multiplication into addition, which avoids numerical underflow when many small probabilities are multiplied and also reduces the computational expense of the algorithm.
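
A quick way to see why this matters: the direct product of many small probabilities underflows to zero in floating point, while the sum of their logs stays perfectly representable. A minimal demonstration:

import numpy as np

p = np.full(1000, 0.01)    # 1000 small per-feature likelihoods

print(np.prod(p))          # 0.0 -- the direct product underflows to zero
print(np.sum(np.log(p)))   # about -4605.2 -- the log-sum is fine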

  • Load the Dataset:

Note: Dataset used can be found here. Download Data, Download Targets.

X = np.loadtxt("/content/data.txt", delimiter=",")
# Targets in the file are 1-indexed; shift them down to 0-indexed class labels
y = np.loadtxt("/content/targets.txt") - 1
print(X.shape, y.shape)
  • Fit the Data:
# Create an Object of Class NaiveBayes
NB = NaiveBayes(X, y)
# Fit the Data
NB.fit(X, y)
  • Predict and Check Accuracy:
# Predict on the training data
y_pred = NB.predict(X)
# Check accuracy
print(f"Accuracy: {np.mean(y_pred == y) * 100}%")

This article gives an insight into the working of, and the math behind, the Naïve Bayes Algorithm, along with a basic understanding of the different types of probability involved in building a Naïve Bayes model from scratch. It is recommended to read more about the math behind all the types of Naïve Bayes Algorithms.

Thanks for reading.