In the previous article, we learned about Classification, where the goal is to predict categories such as:
- Spam or Not Spam
- Fraud or Genuine
- Pass or Fail
- Disease or No Disease
A classification model often needs to answer a question like:
"What is the probability that this observation belongs to a particular class?"
For example:
- Probability a customer will churn = 0.85
- Probability an email is spam = 0.92
- Probability a patient has a disease = 0.76
But there is a challenge.
Many Machine Learning models naturally produce outputs ranging from:
−∞ to +∞
Probabilities, however, must always lie between:
0 and 1
How do we convert any numerical value into a valid probability?
The answer is the Sigmoid Function.
The Sigmoid Function is one of the most important mathematical functions in Machine Learning and forms the foundation of Logistic Regression.
What is the Sigmoid Function?
The Sigmoid Function is a mathematical function that converts any real number into a value between 0 and 1.
Formula:
$$\sigma(x) = \frac{1}{1 + e^{-x}}$$
Where:
- x = Input value
- e = Euler's number (≈ 2.718)
- σ(x) = Output probability
Regardless of how large or small the input is, the output always remains between 0 and 1.
Why Do We Need the Sigmoid Function?
Suppose a model produces:
x = 100
Can this be a probability?
No.
Probabilities cannot exceed 1.
Similarly:
x = −50
cannot represent a probability, because probabilities cannot be negative.
The Sigmoid Function transforms such values into:
0 < σ(x) < 1
making them valid probabilities.
Understanding the Shape of the Sigmoid Curve
The Sigmoid Function produces an S-shaped curve.
1.0 |                ****
    |            ****
0.5 |--------****
    |    ****
0.0 |****
    +------------------->
              x
This shape is why it is often called the:
S-Curve
Key Property
No matter what value we provide, the output always lies between:
0 and 1
This makes it ideal for classification problems.
Example Calculations
Case 1
Input:
x = 0
Formula:
$$\sigma(0) = \frac{1}{1 + e^{0}} = \frac{1}{2}$$
Result:
0.5
Case 2
Input:
x = 5
Result:
0.993
Very close to 1.
Case 3
Input:
x = −5
Result:
0.007
Very close to 0.
Interpretation
Large Positive Inputs:
Probability ≈ 1
Large Negative Inputs:
Probability ≈ 0
Input Near Zero:
Probability ≈ 0.5
Why is 0.5 Important?
Notice:
σ(0) = 0.5
This creates a natural decision boundary.
Classification systems often use:
Probability ≥ 0.5
↓
Class 1
Probability < 0.5
↓
Class 0
Input vs Output Table
| Input (x) | Sigmoid Output |
|---|---|
| -10 | 0.000045 |
| -5 | 0.0067 |
| -2 | 0.119 |
| 0 | 0.5 |
| 2 | 0.881 |
| 5 | 0.993 |
| 10 | 0.99995 |
Observe:
- Large negatives approach 0
- Large positives approach 1
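As a quick check, the table values can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the variable names `inputs` and `table` are chosen here for illustration.

```python
import math

def sigmoid(x):
    """Logistic sigmoid: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

# Same inputs as the table above.
inputs = [-10, -5, -2, 0, 2, 5, 10]
table = {x: sigmoid(x) for x in inputs}

# Print each input next to its sigmoid output, rounded to six decimals.
for x, p in table.items():
    print(f"{x:>4} -> {p:.6f}")
```

Every value stays strictly between 0 and 1, even at the extremes of this range.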
Understanding the Transformation
Without Sigmoid:
Input:
-∞ → +∞
With Sigmoid:
Output:
0 → 1
The function compresses an infinite range into a probability range.
Why Not Use a Straight Line?
Suppose:
y = x
Input:
x = 10
Output:
10
Not a valid probability.
A function that turns a model's score into a probability should:
- Stay between 0 and 1
- Increase smoothly
- Be mathematically differentiable
Sigmoid satisfies all these requirements.
Symmetry of the Sigmoid Function
The curve is symmetric about the point:
x = 0, σ(x) = 0.5
Example:
| x | Output |
|---|---|
| -2 | 0.119 |
| 2 | 0.881 |
The probabilities mirror each other around 0.5.
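The mirroring in the table can be written as σ(−x) = 1 − σ(x). The following is a minimal sketch that checks this numerically; the helper name `symmetry_gap` is introduced here only for illustration.

```python
import math

def sigmoid(x):
    """Logistic sigmoid: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def symmetry_gap(x):
    """Absolute difference between sigmoid(-x) and 1 - sigmoid(x)."""
    return abs(sigmoid(-x) - (1.0 - sigmoid(x)))

# Each pair of printed values should agree up to floating-point rounding.
for x in [0.5, 2.0, 5.0]:
    print(x, sigmoid(-x), 1.0 - sigmoid(x))
```

Any gap that remains is floating-point rounding, not a property of the function itself.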
Saturation Regions
Sigmoid has two saturation regions.
Left Saturation
Very negative inputs:
Output ≈ 0
Right Saturation
Very positive inputs:
Output ≈ 1
In these regions, changes in input produce very small output changes.
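To make saturation concrete, the sketch below measures how far the output moves when the input increases by one unit, starting from different points on the curve. The helper `output_change` and the chosen starting points are illustrative assumptions.

```python
import math

def sigmoid(x):
    """Logistic sigmoid: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def output_change(x, step=1.0):
    """How much the sigmoid output moves when the input goes from x to x + step."""
    return sigmoid(x + step) - sigmoid(x)

# The same one-unit step matters less and less as x moves away from zero.
for x in [0, 4, 8]:
    print(f"x = {x}: change = {output_change(x):.6f}")
```

Near zero, a one-unit step shifts the output by more than 0.2; by x = 8 the same step changes it by less than 0.001.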
Sigmoid as a Probability Generator
Suppose a model computes:
z = 3
Applying Sigmoid:
σ(3) ≈ 0.95
Interpretation:
95% probability of belonging to Class 1.
Example: Loan Approval
Model Output:
z = 2
Sigmoid:
0.88
Interpretation:
88% probability that the loan should be approved.
Example: Spam Detection
Model Output:
z = −4
Sigmoid:
0.018
Interpretation:
1.8% probability that the email is spam.
Prediction:
Not Spam.
Sigmoid and Decision Making
The Sigmoid Function itself does not produce classes.
It produces probabilities.
Example:
0.92
The classifier then applies a threshold.
Typically:
Threshold = 0.5
Threshold-Based Classification
Example:
| Probability | Prediction |
|---|---|
| 0.90 | Class 1 |
| 0.70 | Class 1 |
| 0.55 | Class 1 |
| 0.40 | Class 0 |
| 0.10 | Class 0 |
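The thresholding step in the table can be sketched as a small function. The name `classify` and the stricter threshold of 0.8 are hypothetical choices used only to show how the cut-off changes borderline predictions.

```python
def classify(probability, threshold=0.5):
    """Return 1 if the probability reaches the threshold, otherwise 0."""
    return 1 if probability >= threshold else 0

# Probabilities from the table above, with the default threshold of 0.5.
probabilities = [0.90, 0.70, 0.55, 0.40, 0.10]
predictions = [classify(p) for p in probabilities]
print(predictions)  # [1, 1, 1, 0, 0]

# A stricter, hypothetical threshold changes the borderline cases.
strict = [classify(p, threshold=0.8) for p in probabilities]
print(strict)  # [1, 0, 0, 0, 0]
```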
Sigmoid in Logistic Regression
Logistic Regression first computes:
$$z = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n$$
This value can be any real number.
Then Sigmoid transforms it into:
$$P(y = 1) = \sigma(z)$$
This probability is used for classification.
Visualization of Logistic Regression Process
Features
↓
Linear Equation
↓
z
↓
Sigmoid Function
↓
Probability
↓
Class Label
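The flow above can be sketched end to end in a few lines. The coefficients and feature values below are made up purely for illustration and are not fitted to any data; a real model would learn them during training.

```python
import math

def sigmoid(z):
    """Logistic sigmoid: 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(features, intercept, weights, threshold=0.5):
    """Features -> linear score z -> sigmoid -> probability -> class label."""
    z = intercept + sum(w * x for w, x in zip(weights, features))
    probability = sigmoid(z)
    label = 1 if probability >= threshold else 0
    return z, probability, label

# Hypothetical coefficients and feature values, chosen only for illustration.
intercept = -1.0        # beta_0
weights = [0.8, -0.5]   # beta_1, beta_2
features = [3.0, 1.0]   # x_1, x_2

z, probability, label = predict(features, intercept, weights)
print(z, probability, label)
```

With these made-up numbers the linear score is about 0.9, which the sigmoid maps to a probability of roughly 0.71, so the predicted label is Class 1.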
Derivative of the Sigmoid Function
One reason Sigmoid became popular is its simple derivative.
Formula:
σ′(x)=σ(x)(1−σ(x))
Because the derivative is expressed in terms of the function's own output, it can be computed cheaply during gradient-based training.
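One way to gain confidence in the derivative formula is to compare it against a numerical estimate. The sketch below uses a central finite difference as an independent check; the helper names and the step size `h` are assumptions made for this example.

```python
import math

def sigmoid(x):
    """Logistic sigmoid: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(x):
    """Derivative via the identity sigma'(x) = sigma(x) * (1 - sigma(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

def numerical_derivative(f, x, h=1e-5):
    """Central finite difference, used here only as an independent check."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

# The two columns should agree to many decimal places.
for x in [-3.0, 0.0, 2.0]:
    print(x, sigmoid_derivative(x), numerical_derivative(sigmoid, x))
```

The derivative reaches its largest value, 0.25, at x = 0.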
Advantages of the Sigmoid Function
- Outputs valid probabilities
- Smooth and differentiable
- Easy to interpret
- Works well for binary classification
- Forms the foundation of Logistic Regression
Limitations of the Sigmoid Function
Vanishing Gradient Problem
For very large positive or negative inputs:
Gradient becomes extremely small.
This can slow learning in deep neural networks.
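The sketch below illustrates the effect. It prints the sigmoid gradient at a few inputs, then computes a simplified best case for a chain of layers: it multiplies only the activation gradients and ignores the weights, which is an assumption made to keep the example short. The name `best_case_chain` is hypothetical.

```python
import math

def sigmoid(x):
    """Logistic sigmoid: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_gradient(x):
    """Gradient of the sigmoid: sigma(x) * (1 - sigma(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

# The gradient peaks at 0.25 (x = 0) and shrinks quickly away from zero.
for x in [0, 2, 5, 10]:
    print(f"x = {x:>2}: gradient = {sigmoid_gradient(x):.6f}")

def best_case_chain(depth):
    """Upper bound on a product of `depth` sigmoid gradients (each at most 0.25)."""
    return 0.25 ** depth

# Even in the best case, ten stacked sigmoid gradients multiply to under 1e-6.
print(best_case_chain(10))
```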
Not Zero-Centered
Outputs range:
0 to 1
rather than:
−1 to 1
This can sometimes affect optimization efficiency.
Python Implementation
Using NumPy:
import numpy as np

def sigmoid(x):
    # Map any real number (or NumPy array) to a value between 0 and 1.
    return 1 / (1 + np.exp(-x))
Example:
print(sigmoid(0))
Output:
0.5
Example:
print(sigmoid(5))
Output (rounded to three decimals):
0.993
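One practical caveat: for very negative inputs such as x = −1000, the direct formula evaluates `np.exp(1000)`, which overflows and typically triggers a NumPy runtime warning, even though the final result is still 0. A common workaround is sketched below; the function name `stable_sigmoid` is introduced here and is not part of NumPy.

```python
import numpy as np

def stable_sigmoid(x):
    """Sigmoid that avoids overflow in np.exp for large-magnitude inputs."""
    x = np.asarray(x, dtype=float)
    # exp(-|x|) always lies in [0, 1], so it can never overflow.
    e = np.exp(-np.abs(x))
    # x >= 0: 1 / (1 + e^(-x));  x < 0: e^x / (1 + e^x). Both equal sigmoid(x).
    return np.where(x >= 0, 1.0 / (1.0 + e), e / (1.0 + e))

values = stable_sigmoid([-1000.0, -5.0, 0.0, 5.0, 1000.0])
print(values)
```

At the extremes the result rounds to exactly 0.0 or 1.0 in floating point, although the mathematical value never reaches either bound. If SciPy is available, `scipy.special.expit` provides a ready-made sigmoid.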
Visualizing the Sigmoid Function
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(-10, 10, 100)
y = 1 / (1 + np.exp(-x))
plt.plot(x, y)
plt.xlabel("x")
plt.ylabel("Sigmoid(x)")
plt.show()
Real-World Applications
Logistic Regression
Converts model outputs into probabilities.
Medical Diagnosis
Disease probability estimation.
Credit Scoring
Probability of loan default.
Marketing
Customer churn prediction.
Fraud Detection
Probability of fraudulent activity.
Common Mistakes
Thinking Sigmoid Produces Classes
Incorrect.
Sigmoid produces probabilities.
Thresholds convert probabilities into classes.
Assuming Probability Equals Certainty
A probability of:
0.80
means likely, not guaranteed.
Ignoring Threshold Selection
The default threshold is often 0.5, but different applications may require different thresholds.
Best Practices
- Interpret sigmoid outputs as probabilities
- Use suitable classification thresholds
- Combine with proper evaluation metrics
- Understand probability calibration
- Visualize probability distributions when possible
Sigmoid Function Workflow
A typical classification workflow is:
- Compute linear score
- Apply sigmoid function
- Obtain probability
- Apply threshold
- Predict class label
Why the Sigmoid Function is Important
The Sigmoid Function acts as the bridge between mathematical model outputs and real-world probabilities. Without it, Logistic Regression would produce unrestricted numerical values that cannot be interpreted as probabilities.
By compressing any real number into a range between 0 and 1, the Sigmoid Function enables probability-based decision making, making it one of the most important mathematical concepts in classification problems.
In the next article, we will explore Logistic Regression Intuition, where we will use the Sigmoid Function to understand how classification models learn decision boundaries and make predictions.