In the previous article, we learned about Classification, where the goal is to predict categories such as:

  • Spam or Not Spam
  • Fraud or Genuine
  • Pass or Fail
  • Disease or No Disease

A classification model often needs to answer a question like:

"What is the probability that this observation belongs to a particular class?"

For example:

  • Probability a customer will churn = 0.85
  • Probability an email is spam = 0.92
  • Probability a patient has a disease = 0.76

But there is a challenge.

Many Machine Learning models naturally produce outputs ranging from:

$$-\infty \text{ to } +\infty$$

Probabilities, however, must always lie between:

$$0 \text{ and } 1$$

How do we convert any numerical value into a valid probability?

The answer is the Sigmoid Function.

The Sigmoid Function is one of the most important mathematical functions in Machine Learning and forms the foundation of Logistic Regression.

What is the Sigmoid Function?

The Sigmoid Function is a mathematical function that converts any real number into a value between 0 and 1.

Formula:

$$\sigma(x)=\frac{1}{1+e^{-x}}$$

Where:

  • $x$ = Input value
  • $e$ = Euler's number (≈ 2.718)
  • $\sigma(x)$ = Output probability

Regardless of how large or small the input is, the output always remains between 0 and 1.

Why Do We Need the Sigmoid Function?

Suppose a model produces:

$$x=100$$

Can this be a probability?

No.

Probabilities cannot exceed 1.

Similarly:

$$x=-50$$

cannot represent a probability either, because probabilities cannot be negative.

The Sigmoid Function transforms such values into:

$$0 \le \sigma(x) \le 1$$

making them valid probabilities.

Understanding the Shape of the Sigmoid Curve

The Sigmoid Function produces an S-shaped curve.

1.0 |                 ****
    |             ****
0.5 |---------****
    |     ****
0.0 |****
    +------------------->
                      x

This shape is why it is often called the:

S-Curve

Key Property

No matter what value we provide, the output always lies between:

$$0 \text{ and } 1$$

This makes it ideal for classification problems.

Example Calculations

Case 1

Input:

$$x=0$$

Formula:

$$\sigma(0) = \frac{1}{1+e^0} = \frac{1}{2}$$

Result:

0.5

Case 2

Input:

$$x=5$$

Result:

0.993

Very close to 1.

Case 3

Input:

$$x=-5$$

Result:

0.007

Very close to 0.

Interpretation

Large Positive Inputs:

Probability ≈ 1

Large Negative Inputs:

Probability ≈ 0

Input Near Zero:

Probability ≈ 0.5

Why is 0.5 Important?

Notice:

$$\sigma(0)=0.5$$

This creates a natural decision boundary.

Classification systems often use:

Probability ≥ 0.5 → Class 1

Probability < 0.5 → Class 0

Input vs Output Table

Input (x)    Sigmoid Output
-10          0.000045
-5           0.0067
-2           0.119
0            0.5
2            0.881
5            0.993
10           0.99995

Observe:

  • Large negatives approach 0
  • Large positives approach 1
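The values in the table can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the helper name `sigmoid` and the print formatting are illustrative choices, not part of any particular library.

```python
import math

def sigmoid(x):
    """Logistic sigmoid: maps any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

# Inputs taken from the table above
inputs = [-10, -5, -2, 0, 2, 5, 10]

# Pair each input with its sigmoid output
table = [(x, sigmoid(x)) for x in inputs]

for x, y in table:
    print(f"{x:>4}  {y:.6f}")
```

Rounding the printed values gives the table entries, with outputs shrinking toward 0 on the negative side and growing toward 1 on the positive side.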

Understanding the Transformation

Without Sigmoid:

Input:
-∞ → +∞

With Sigmoid:

Output:
0 → 1

The function compresses an infinite range into a probability range.

Why Not Use a Straight Line?

Suppose:

$$y=x$$

Input:

$$x=10$$

Output:

10

Not a valid probability.

A probability function must:

  • Stay between 0 and 1
  • Increase smoothly
  • Be mathematically differentiable

Sigmoid satisfies all these requirements.

Symmetry of the Sigmoid Function

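The curve is symmetric about the point where:

$$x=0 \text{ and } \sigma(x)=0.5$$

In other words, $\sigma(-x) = 1 - \sigma(x)$.

Example:

x     Output
-2    0.119
 2    0.881

The probabilities mirror each other around 0.5: 0.119 + 0.881 = 1.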
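The mirroring can be checked numerically. The sketch below tests the identity σ(-x) = 1 - σ(x) for a few sample inputs; the inputs are arbitrary and the helper `sigmoid` is defined here only for illustration.

```python
import math

def sigmoid(x):
    """Logistic sigmoid: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

# Compare sigmoid(-x) with 1 - sigmoid(x) for a few sample inputs
checks = []
for x in [0.5, 2, 5]:
    left = sigmoid(-x)
    right = 1.0 - sigmoid(x)
    checks.append(math.isclose(left, right))
    print(x, left, right)

print(all(checks))
```

Each pair agrees up to floating-point rounding, which is why the comparison uses `math.isclose` rather than `==`.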

Saturation Regions

Sigmoid has two saturation regions.

Left Saturation

Very negative inputs:

Output ≈ 0

Right Saturation

Very positive inputs:

Output ≈ 1

In these regions, changes in input produce very small output changes.
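To make saturation concrete, the following sketch compares how much the output moves for the same one-unit step in the input, once near zero and once deep in the right saturation region. The helper `output_change` and the chosen inputs (0 and 10) are assumptions made for illustration.

```python
import math

def sigmoid(x):
    """Logistic sigmoid: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def output_change(x, step=1.0):
    """Change in sigmoid output when the input moves from x to x + step."""
    return sigmoid(x + step) - sigmoid(x)

near_zero = output_change(0.0)    # steep middle of the curve
saturated = output_change(10.0)   # right saturation region

print(near_zero, saturated)
```

Near zero the output moves by roughly 0.23, while at x = 10 the same step changes the output by less than 0.0001.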

Sigmoid as a Probability Generator

Suppose a model computes:

$$z=3$$

Applying Sigmoid:

$$\sigma(3) \approx 0.95$$

Interpretation:

95% probability of belonging to Class 1.

Example: Loan Approval

Model Output:

$$z=2$$

Sigmoid:

0.88

Interpretation:

88% probability that the loan should be approved.

Example: Spam Detection

Model Output:

$$z=-4$$

Sigmoid:

0.018

Interpretation:

1.8% probability that the email is spam.

Prediction:

Not Spam.

Sigmoid and Decision Making

The Sigmoid Function itself does not produce classes.

It produces probabilities.

Example:

0.92

The classifier then applies a threshold.

Typically:

Threshold = 0.5

Threshold-Based Classification

Example:

Probability    Prediction
0.90           Class 1
0.70           Class 1
0.55           Class 1
0.40           Class 0
0.10           Class 0
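The thresholding rule can be sketched as a small function. The name `classify` is hypothetical; the probabilities are the ones from the table above, and the rule used is "Class 1 when the probability is at least the threshold".

```python
def classify(probability, threshold=0.5):
    """Return 1 if the probability reaches the threshold, otherwise 0."""
    return 1 if probability >= threshold else 0

# Probabilities from the table above
probabilities = [0.90, 0.70, 0.55, 0.40, 0.10]

predictions = [classify(p) for p in probabilities]
print(predictions)  # [1, 1, 1, 0, 0]
```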

Sigmoid in Logistic Regression

Logistic Regression first computes:

$$z=\beta_0+\beta_1x_1+\beta_2x_2+\cdots+\beta_nx_n$$

This value can be any real number.

Then:

Sigmoid transforms it into:

$$P(y=1)$$

This probability is used for classification.
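The two steps, linear score followed by sigmoid, might look like this for a single observation. The intercept, coefficients, and feature values below are hypothetical numbers picked so that z comes out close to 2; they are not fitted to any data, and `predict_proba` here is an illustrative helper rather than a library function.

```python
import math

def sigmoid(z):
    """Logistic sigmoid: 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(features, coefficients, intercept):
    """P(y=1) for one observation: sigmoid of the linear score z."""
    z = intercept + sum(b * x for b, x in zip(coefficients, features))
    return sigmoid(z)

# Hypothetical values chosen only for illustration (not fitted to data)
intercept = -1.0            # beta_0
coefficients = [0.8, 0.5]   # beta_1, beta_2
features = [2.0, 2.8]       # x_1, x_2

# z = -1.0 + 0.8 * 2.0 + 0.5 * 2.8, which is approximately 2
probability = predict_proba(features, coefficients, intercept)
print(round(probability, 2))
```

With these numbers the probability rounds to 0.88, matching the value the article gives for a score of z = 2.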

Visualization of Logistic Regression Process

Features
   ↓
Linear Equation
   ↓
z
   ↓
Sigmoid Function
   ↓
Probability
   ↓
Class Label

Derivative of the Sigmoid Function

One reason Sigmoid became popular is its simple derivative.

Formula:

$$\sigma'(x)=\sigma(x)(1-\sigma(x))$$

This property simplifies optimization and learning.
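One way to gain confidence in the derivative formula is to compare it with a numerical estimate. The sketch below uses a central finite difference; the step size `h` and the test points are arbitrary choices, and the function names are illustrative.

```python
import math

def sigmoid(x):
    """Logistic sigmoid: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(x):
    """Analytic derivative: sigmoid(x) * (1 - sigmoid(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

def numerical_derivative(f, x, h=1e-5):
    """Central finite-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

for x in [-3.0, 0.0, 2.0]:
    analytic = sigmoid_derivative(x)
    numeric = numerical_derivative(sigmoid, x)
    print(x, analytic, numeric)
```

The two columns agree to many decimal places. Note that the analytic form reuses the already computed value of σ(x), so the derivative costs only one extra multiplication.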

Advantages of the Sigmoid Function

  • Outputs valid probabilities
  • Smooth and differentiable
  • Easy to interpret
  • Works well for binary classification
  • Forms the foundation of Logistic Regression

Limitations of the Sigmoid Function

Vanishing Gradient Problem

For very large positive or negative inputs:

Gradient becomes extremely small.

This can slow learning in deep neural networks.
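The effect can be illustrated with the derivative σ(x)(1 - σ(x)). The sketch below evaluates it at a few inputs and then shows a deliberately simplified picture of a deep network: one sigmoid-derivative factor per layer, with weights ignored. The layer count of 10 is an arbitrary assumption.

```python
import math

def sigmoid(x):
    """Logistic sigmoid: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    """Derivative of the sigmoid: sigmoid(x) * (1 - sigmoid(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

# Gradient at the center versus deep in the saturation regions
grads = {x: sigmoid_grad(x) for x in [0, 5, 10, -10]}
for x, g in grads.items():
    print(x, g)

# Simplified view of backpropagation: one derivative factor per layer,
# weights ignored. Even the largest possible factor (0.25) shrinks fast.
layers = 10
best_case_product = 0.25 ** layers
print(best_case_product)
```

The gradient peaks at 0.25 when x = 0 and is far smaller at x = ±10. Multiplying ten best-case factors together already gives a value below one millionth.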

Not Zero-Centered

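Outputs range from:

$$0 \text{ to } 1$$

rather than:

$$-1 \text{ to } 1$$

Because every output is positive, the values passed to the next layer are never centered on zero, which can make gradient-based optimization less efficient.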

Python Implementation

Using NumPy:

import numpy as np

def sigmoid(x):
    """Return 1 / (1 + e^(-x)); works on scalars and NumPy arrays."""
    return 1 / (1 + np.exp(-x))

Example:

print(sigmoid(0))

Output:

0.5

Example:

print(sigmoid(5))

Output:

0.9933071490757153 (≈ 0.993)
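One practical caveat: for very negative inputs, the exponent in `exp(-x)` becomes huge. NumPy then emits an overflow warning, and the standard-library `math.exp` raises `OverflowError`. A common workaround is to branch on the sign of the input so the exponential is only ever taken of a non-positive number. The scalar version below is a sketch of that idea; the name `stable_sigmoid` is hypothetical, and library routines such as `scipy.special.expit` are an alternative.

```python
import math

def stable_sigmoid(x):
    """Sigmoid that only ever exponentiates a non-positive number."""
    if x >= 0:
        # exp(-x) is at most 1 here, so it cannot overflow
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, rewrite 1 / (1 + e^(-x)) as e^x / (1 + e^x)
    e = math.exp(x)
    return e / (1.0 + e)

print(stable_sigmoid(0))      # 0.5
print(stable_sigmoid(-1000))  # 0.0
print(stable_sigmoid(1000))   # 1.0
```

Both branches compute the same function; only the algebraic form differs.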

Visualizing the Sigmoid Function

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-10, 10, 100)

y = 1 / (1 + np.exp(-x))

plt.plot(x, y)

plt.xlabel("x")
plt.ylabel("Sigmoid(x)")

plt.show()

Real-World Applications

Logistic Regression

Converts model outputs into probabilities.

Medical Diagnosis

Disease probability estimation.

Credit Scoring

Probability of loan default.

Marketing

Customer churn prediction.

Fraud Detection

Probability of fraudulent activity.

Common Mistakes

Thinking Sigmoid Produces Classes

Incorrect.

Sigmoid produces probabilities.

Thresholds convert probabilities into classes.

Assuming Probability Equals Certainty

A probability of:

0.80

means likely, not guaranteed.

Ignoring Threshold Selection

The default threshold is often 0.5, but different applications may require different thresholds.
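The sketch below shows how the same set of probabilities leads to different predictions as the threshold moves. The probabilities and the three thresholds (0.3, 0.5, 0.8) are made-up values for illustration; in practice the threshold would be chosen from the costs of false positives and false negatives in the application.

```python
def classify_all(probabilities, threshold):
    """Label each probability 1 if it reaches the threshold, else 0."""
    return [1 if p >= threshold else 0 for p in probabilities]

# Hypothetical model outputs for five observations
probabilities = [0.92, 0.76, 0.55, 0.40, 0.18]

lenient = classify_all(probabilities, 0.3)   # flags more cases as Class 1
default = classify_all(probabilities, 0.5)
strict = classify_all(probabilities, 0.8)    # flags only confident cases

print(lenient)  # [1, 1, 1, 1, 0]
print(default)  # [1, 1, 1, 0, 0]
print(strict)   # [1, 0, 0, 0, 0]
```

A lower threshold catches more positives at the price of more false alarms, which may suit disease screening; a higher one does the opposite, which may suit cases where a false alarm is expensive.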

Best Practices

  • Interpret sigmoid outputs as probabilities
  • Use suitable classification thresholds
  • Combine with proper evaluation metrics
  • Understand probability calibration
  • Visualize probability distributions when possible

Sigmoid Function Workflow

A typical classification workflow is:

  1. Compute linear score
  2. Apply sigmoid function
  3. Obtain probability
  4. Apply threshold
  5. Predict class label

Why the Sigmoid Function is Important

The Sigmoid Function acts as the bridge between mathematical model outputs and real-world probabilities. Without it, Logistic Regression would produce unrestricted numerical values that cannot be interpreted as probabilities.

By compressing any real number into a range between 0 and 1, the Sigmoid Function enables probability-based decision making, making it one of the most important mathematical concepts in classification problems.

In the next article, we will explore Logistic Regression Intuition, where we will use the Sigmoid Function to understand how classification models learn decision boundaries and make predictions.