Sigmoid Function in Machine Learning

Last updated: Jun 13, 2026

Author: Christy Harshitha Dakarapu

In the previous article, we learned about Classification, where the goal is to predict categories such as:

Spam or Not Spam
Fraud or Genuine
Pass or Fail
Disease or No Disease

A classification model often needs to answer a question like:

"What is the probability that this observation belongs to a particular class?"

For example:

Probability a customer will churn = 0.85
Probability an email is spam = 0.92
Probability a patient has a disease = 0.76

But there is a challenge.

Many Machine Learning models naturally produce outputs ranging from:

$-\infty \text{ to } +\infty$

Probabilities, however, must always lie between:

$0 \text{ and } 1$

How do we convert any numerical value into a valid probability?

The answer is the Sigmoid Function.

The Sigmoid Function is one of the most important mathematical functions in Machine Learning and forms the foundation of Logistic Regression.

What is the Sigmoid Function?

The Sigmoid Function is a mathematical function that converts any real number into a value between 0 and 1.

Formula:

$\sigma(x)=\frac{1}{1+e^{-x}}$

Where:

$x$ = Input value
$e$ = Euler's number (≈ 2.718)
$\sigma(x)$ = Output probability

Regardless of how large or small the input is, the output always remains between 0 and 1.

Why Do We Need the Sigmoid Function?

Suppose a model produces:

x=100

Can this be a probability?

No.

Probabilities cannot exceed 1.

Similarly:

x=-50

cannot be a probability either, because probabilities cannot be negative.

The Sigmoid Function transforms such values into:

$0 < \sigma(x) < 1$

making them valid probabilities.

Understanding the Shape of the Sigmoid Curve

The Sigmoid Function produces an S-shaped curve.


1.0 |                 ****
    |             ****
0.5 |---------****
    |     ****
0.0 |****
    +------------------->
         x

This shape is why it is often called the S-Curve.

Key Property

No matter what value we provide, the output always lies between:

$0 \text{ and } 1$

This makes it ideal for classification problems.

Example Calculations

Case 1

Input:

x=0

Formula:

$\sigma(0) = \frac{1}{1+e^{-0}} = \frac{1}{1+1} = \frac{1}{2}$

Result:

0.5

Case 2

Input:

x=5

Result:

0.993

Very close to 1.

Case 3

Input:

x=-5

Result:

0.007

Very close to 0.
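Cases 2 and 3 come from the same formula as Case 1. As a worked sketch, with the exponentials rounded:

$\sigma(5) = \frac{1}{1+e^{-5}} \approx \frac{1}{1+0.00674} \approx 0.9933$

$\sigma(-5) = \frac{1}{1+e^{5}} \approx \frac{1}{1+148.41} \approx 0.0067$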

Interpretation

Large Positive Inputs:


Probability ≈ 1

Large Negative Inputs:


Probability ≈ 0

Input Near Zero:


Probability ≈ 0.5

Why is 0.5 Important?

Notice:

$\sigma(0)=0.5$

This creates a natural decision boundary.

Classification systems often use:


Probability ≥ 0.5
      ↓
Class 1

Probability < 0.5
      ↓
Class 0

Input vs Output Table

Input (x)	Sigmoid Output
-10	0.000045
-5	0.0067
-2	0.119
0	0.5
2	0.881
5	0.993
10	0.99995

Observe:

Large negatives approach 0
Large positives approach 1
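The table can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the helper name `sigmoid` and the six-decimal formatting are choices made here for illustration.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

# Inputs taken from the table above.
inputs = [-10, -5, -2, 0, 2, 5, 10]

for x in inputs:
    # Six decimal places are enough to see the tails approach 0 and 1.
    print(f"{x:>4}  {sigmoid(x):.6f}")
```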

Understanding the Transformation

Without Sigmoid:


Input:
-∞ → +∞

With Sigmoid:


Output:
0 → 1

The function compresses an infinite range into a probability range.

Why Not Use a Straight Line?

Suppose:

y=x

Input:

x=10

Output:

10

Not a valid probability.

A probability function must:

Stay between 0 and 1
Increase smoothly
Be mathematically differentiable

Sigmoid satisfies all these requirements.

Symmetry of the Sigmoid Function

The curve is symmetric about the point:

$(x, \sigma(x)) = (0, 0.5)$

That is, the outputs for $x$ and $-x$ always add up to 1.

Example:

x	Output
-2	0.119
2	0.881

The probabilities mirror each other around 0.5.
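This mirroring follows directly from the formula, since $\sigma(-x) = 1 - \sigma(x)$:

$1-\sigma(x) = 1-\frac{1}{1+e^{-x}} = \frac{e^{-x}}{1+e^{-x}} = \frac{1}{e^{x}+1} = \sigma(-x)$

For the pair in the table: $1 - 0.881 = 0.119$.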

Saturation Regions

Sigmoid has two saturation regions.

Left Saturation

Very negative inputs:


Output ≈ 0

Right Saturation

Very positive inputs:


Output ≈ 1

In these regions, changes in input produce very small output changes.
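To make the saturation effect concrete, the sketch below compares how far the output moves for the same one-unit change in input, once near zero and once deep in each tail. The helper `output_change` and the chosen inputs are illustrative assumptions, not part of any library.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def output_change(x: float, step: float = 1.0) -> float:
    """How much the sigmoid output moves when the input goes from x to x + step."""
    return sigmoid(x + step) - sigmoid(x)

center = output_change(0.0)        # input moves 0 -> 1, middle of the curve
right_tail = output_change(10.0)   # input moves 10 -> 11, right saturation
left_tail = output_change(-11.0)   # input moves -11 -> -10, left saturation

print(f"near zero : {center:.6f}")
print(f"right tail: {right_tail:.6f}")
print(f"left tail : {left_tail:.6f}")
```

The same step that moves the output by more than 0.2 near zero barely registers in either tail.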

Sigmoid as a Probability Generator

Suppose a model computes:

z=3

Applying Sigmoid:

$\sigma(3)\approx 0.95$

Interpretation:

About a 95% probability of belonging to Class 1.

Example: Loan Approval

Model Output:

z=2

Sigmoid:

0.88

Interpretation:

88% probability that the loan should be approved.

Example: Spam Detection

Model Output:

z=-4

Sigmoid:

0.018

Interpretation:

1.8% probability that the email is spam.

Prediction:

Not Spam.

Sigmoid and Decision Making

The Sigmoid Function itself does not produce classes.

It produces probabilities.

Example:


0.92

The classifier then applies a threshold.

Typically:


Threshold = 0.5

Threshold-Based Classification

Example:

Probability	Prediction
0.90	Class 1
0.70	Class 1
0.55	Class 1
0.40	Class 0
0.10	Class 0
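The thresholding step in the table can be sketched as follows. The function name `classify` is hypothetical, and the rule "greater than or equal to the threshold means Class 1" follows the convention used earlier in this article.

```python
def classify(probability: float, threshold: float = 0.5) -> int:
    """Return 1 when the probability reaches the threshold, otherwise 0."""
    return 1 if probability >= threshold else 0

# Probabilities from the table above.
probabilities = [0.90, 0.70, 0.55, 0.40, 0.10]
predictions = [classify(p) for p in probabilities]

for p, label in zip(probabilities, predictions):
    print(f"{p:.2f} -> Class {label}")
```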

Sigmoid in Logistic Regression

Logistic Regression first computes:

$z=\beta_0+\beta_1x_1+\beta_2x_2+\cdots+\beta_nx_n$

This value can be any real number.

Then the Sigmoid Function transforms it into:

$P(y=1)=\sigma(z)$

This probability is used for classification.
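As a rough sketch of these two steps for a single observation, the code below computes the linear score and passes it through the sigmoid. The intercept, coefficients, and feature values are hypothetical numbers chosen only for illustration; in practice the coefficients would be learned from data.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(features, intercept, coefficients):
    """P(y=1) for one observation: sigmoid of the linear score z."""
    # z = beta_0 + beta_1*x_1 + ... + beta_n*x_n
    z = intercept + sum(b * x for b, x in zip(coefficients, features))
    return sigmoid(z)

# Hypothetical values, not fitted to any data.
intercept = -1.0
coefficients = [0.8, -0.5]
features = [3.0, 1.0]

p = predict_probability(features, intercept, coefficients)
print(f"P(y=1) = {p:.3f}")
```

Here the linear score is z = -1.0 + 0.8(3.0) - 0.5(1.0) = 0.9, and the sigmoid maps it to a probability between 0 and 1.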

Visualization of Logistic Regression Process


Features
    ↓
Linear Equation
    ↓
z
    ↓
Sigmoid Function
    ↓
Probability
    ↓
Class Label

Derivative of the Sigmoid Function

One reason Sigmoid became popular is its simple derivative.

Formula:

$\sigma'(x)=\sigma(x)(1-\sigma(x))$

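Because the derivative can be computed directly from the output $\sigma(x)$, which is already available after the forward calculation, gradient computations during training are cheap.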
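For readers who want to see where the derivative formula comes from, one way to derive it uses the chain rule together with the identity $1-\sigma(x)=\frac{e^{-x}}{1+e^{-x}}$:

$\sigma'(x) = \frac{d}{dx}\left(1+e^{-x}\right)^{-1} = \frac{e^{-x}}{\left(1+e^{-x}\right)^{2}} = \frac{1}{1+e^{-x}}\cdot\frac{e^{-x}}{1+e^{-x}} = \sigma(x)\left(1-\sigma(x)\right)$

At $x=0$ this gives $\sigma'(0) = 0.5 \times 0.5 = 0.25$, which is the largest value the derivative takes.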

Advantages of the Sigmoid Function

Outputs valid probabilities
Smooth and differentiable
Easy to interpret
Works well for binary classification
Forms the foundation of Logistic Regression

Limitations of the Sigmoid Function

Vanishing Gradient Problem

For very large positive or negative inputs, the gradient becomes extremely small.

This can slow learning in deep neural networks.
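The sketch below evaluates the gradient at a few inputs using the derivative identity $\sigma'(x)=\sigma(x)(1-\sigma(x))$. The final line is a deliberately simplified illustration that ignores weights: it assumes backpropagation multiplies one such factor per sigmoid layer and shows what happens even in the best case of 0.25 per layer.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(x: float) -> float:
    """Derivative via the identity sigma'(x) = sigma(x) * (1 - sigma(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

for x in [0, 2, 5, 10]:
    print(f"x = {x:>2}  gradient = {sigmoid_derivative(x):.6f}")

# Simplified illustration (weights ignored): ten sigmoid layers, each
# contributing at most 0.25, scale the gradient by at most 0.25 ** 10.
print(f"best case across 10 layers: {0.25 ** 10:.2e}")
```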

Not Zero-Centered

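Outputs range:

$0 \text{ to } 1$

rather than:

$-1 \text{ to } 1$

Because every output is positive, the gradients for the weights of a following neuron tend to share the same sign, which can make optimization less efficient.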

Python Implementation

Using NumPy:


import numpy as np

def sigmoid(x):
    # Works element-wise on NumPy arrays as well as on single numbers.
    return 1 / (1 + np.exp(-x))

Example:


print(sigmoid(0))

Output:

0.5

Example:


print(sigmoid(5))

Output (rounded to three decimal places):


0.993
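One caveat, offered as a sketch rather than a required change: for very negative inputs, $e^{-x}$ becomes enormous. With plain Python floats, `math.exp` raises `OverflowError` in that situation, and NumPy typically emits an overflow warning instead. A common workaround branches on the sign of the input so that the exponent is never positive. The name `stable_sigmoid` is an illustrative choice, and the version below handles single numbers only.

```python
import math

def stable_sigmoid(x: float) -> float:
    """Sigmoid that only ever exponentiates a non-positive number."""
    if x >= 0:
        # Standard form; exp(-x) is at most 1 here.
        return 1.0 / (1.0 + math.exp(-x))
    # Equivalent form e^x / (1 + e^x); exp(x) is below 1 here.
    e = math.exp(x)
    return e / (1.0 + e)

print(stable_sigmoid(0))
print(stable_sigmoid(-1000))  # the naive formula would overflow on this input
```

Both branches compute the same mathematical function; only the arithmetic route differs.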

Visualizing the Sigmoid Function


import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-10, 10, 100)

y = 1 / (1 + np.exp(-x))

plt.plot(x, y)

plt.xlabel("x")
plt.ylabel("Sigmoid(x)")

plt.show()

Real-World Applications

Logistic Regression

Converts model outputs into probabilities.

Medical Diagnosis

Disease probability estimation.

Credit Scoring

Probability of loan default.

Marketing

Customer churn prediction.

Fraud Detection

Probability of fraudulent activity.

Common Mistakes

Thinking Sigmoid Produces Classes

Incorrect.

Sigmoid produces probabilities.

Thresholds convert probabilities into classes.

Assuming Probability Equals Certainty

A probability of 0.80 means the outcome is likely, not guaranteed.

Ignoring Threshold Selection

The default threshold is often 0.5, but different applications may require different thresholds.
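To see how much the threshold matters, the sketch below labels the same set of probabilities under three different thresholds. The probabilities and threshold values are hypothetical, and `predict_labels` is a helper defined here for illustration.

```python
def predict_labels(probabilities, threshold):
    """Label each probability 1 if it reaches the threshold, otherwise 0."""
    return [1 if p >= threshold else 0 for p in probabilities]

# Hypothetical predicted probabilities for five observations.
probabilities = [0.90, 0.70, 0.55, 0.40, 0.10]

for threshold in (0.3, 0.5, 0.8):
    labels = predict_labels(probabilities, threshold)
    print(f"threshold {threshold}: {labels}")
```

A lower threshold flags more observations as Class 1, while a higher one flags fewer. Which trade-off is appropriate depends on the cost of each kind of error in the application.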

Best Practices

Interpret sigmoid outputs as probabilities
Use suitable classification thresholds
Combine with proper evaluation metrics
Understand probability calibration
Visualize probability distributions when possible

Sigmoid Function Workflow

A typical classification workflow is:

Compute linear score
Apply sigmoid function
Obtain probability
Apply threshold
Predict class label

Why the Sigmoid Function is Important

The Sigmoid Function acts as the bridge between mathematical model outputs and real-world probabilities. Without it, Logistic Regression would produce unrestricted numerical values that cannot be interpreted as probabilities.

By compressing any real number into a range between 0 and 1, the Sigmoid Function enables probability-based decision making, making it one of the most important mathematical concepts in classification problems.

In the next article, we will explore Logistic Regression Intuition, where we will use the Sigmoid Function to understand how classification models learn decision boundaries and make predictions.