The Sigmoid Function: A Key Tool in Neural Networks

Quick summary
  • The sigmoid function (also called the logistic function) maps any real value to (0, 1), which makes it directly useful for producing probabilities from a neural network's output layer.
  • It has a derivative everywhere, so it works with backpropagation without special handling.
  • At large positive or large negative inputs it saturates: the gradient drops close to zero, so little error signal reaches earlier layers.
  • It therefore remains standard in the output layer of binary classifiers, but is now rarely chosen for hidden layers.
Key concepts
  • Definition and formula: σ(x) = 1 / (1 + e⁻ˣ). The output always falls strictly between 0 and 1, and the derivative is σ(x)(1 − σ(x)), which can be computed from the already-evaluated output.
  • Implementation in neural networks: Each neuron computes a weighted sum z, then applies σ(z). During training, that same output feeds directly into the gradient calculation.
  • Advantages and disadvantages: The output is a directly interpretable probability and the gradient exists everywhere, but the function flattens at extreme values and always outputs positive numbers, which slows convergence in deep networks.

Updated: 2026-05-16

The sigmoid function maps any real value to a number between 0 and 1, making it the natural tool for expressing probabilities within a neural network. Its characteristic S-shape has made it a cornerstone of binary classification for decades.

Key takeaways

  • The sigmoid squashes output into (0, 1), ideal for interpreting results as probabilities.
  • It is differentiable at every point, enabling use with backpropagation.
  • It suffers from saturation and vanishing gradients at extreme inputs.
  • It remains the standard function in the output layer for binary classification.
  • ReLU and tanh have replaced it in deep hidden layers.

Definition and formula

The sigmoid function, also called the logistic function and written σ(x) later in this article, is defined as:

f(x) = 1 / (1 + e⁻ˣ)

where e is Euler’s number (≈ 2.718), the base of the natural logarithm. Its fundamental properties:

  • As x → +∞, f(x) → 1.
  • As x → −∞, f(x) → 0.
  • At x = 0, f(0) = 0.5.
Sigmoid logistic curve showing the characteristic S-shape between 0 and 1

The smoothness of this curve is key: the sigmoid’s derivative exists at every point, allowing gradients to be computed and network weights adjusted during training.
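
The definition and the three limiting values above can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the two-branch form is one common way to avoid floating-point overflow for large |x|, and it is an implementation choice made here, not something the formula itself requires.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid 1 / (1 + e^-x), written to avoid overflow in exp."""
    if x >= 0:
        # exp(-x) is at most 1 here, so it cannot overflow
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x use the algebraically equal form e^x / (1 + e^x)
    ex = math.exp(x)
    return ex / (1.0 + ex)

if __name__ == "__main__":
    # Values approach 0 on the left, equal 0.5 at the origin, approach 1 on the right
    for x in (-1000.0, -5.0, 0.0, 5.0, 1000.0):
        print(f"sigmoid({x:g}) = {sigmoid(x):.6f}")
```

Calling the naive form `1 / (1 + math.exp(-x))` with x = -1000 would raise `OverflowError`, which is why the negative branch is handled separately.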

Implementation in neural networks

In a neural network, the sigmoid is applied to each neuron’s weighted input sum:

  1. Compute z = w₁x₁ + w₂x₂ + … + b (weighted sum + bias).
  2. Apply sigmoid: a = σ(z).
  3. Output a feeds the next layer or is the final prediction.
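
The three steps above can be sketched for a single neuron as follows. The function name `neuron_forward` and the example weights and inputs are hypothetical, chosen only so that the weighted sum comes out to exactly zero.

```python
import math

def sigmoid(z: float) -> float:
    # Plain form; adequate for the moderate values of z used in this sketch
    return 1.0 / (1.0 + math.exp(-z))

def neuron_forward(weights, inputs, bias):
    """Return (z, a) for one neuron: weighted sum, then sigmoid activation."""
    # Step 1: weighted sum plus bias
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    # Step 2: apply the sigmoid
    a = sigmoid(z)
    # Step 3: the caller passes a to the next layer or reads it as the prediction
    return z, a

if __name__ == "__main__":
    # Hypothetical values: 0.5 * 2.0 + (-0.25) * 4.0 + 0.0 = 0.0
    z, a = neuron_forward([0.5, -0.25], [2.0, 4.0], 0.0)
    print(z, a)
```

With these inputs z is 0, so the activation is 0.5, the midpoint of the curve.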

During training, the sigmoid participates in gradient calculation. Its derivative is σ’(x) = σ(x)(1 − σ(x)), an elegant expression computed directly from the already-evaluated output, without re-evaluating the function.
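
As a sanity check on that derivative, the sketch below computes the gradient from the stored output a = σ(x) and compares it with a central finite difference. The helper names are illustrative and do not come from any particular library.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad_from_output(a: float) -> float:
    """Derivative of the sigmoid, given its already-computed output a."""
    return a * (1.0 - a)

def numeric_grad(f, x: float, h: float = 1e-5) -> float:
    """Central finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

if __name__ == "__main__":
    for x in (-3.0, 0.0, 2.0):
        a = sigmoid(x)  # forward pass: evaluated once and reused
        print(x, sigmoid_grad_from_output(a), numeric_grad(sigmoid, x))
```

The two columns should agree to many decimal places, and the analytic version needs no further call to `exp` once the forward pass has stored a.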

Symbolic representation of a neuron with sigmoid activation function

Advantages and disadvantages

Advantages:

  • Output interpretable as a probability between 0 and 1.
  • Differentiable at every point: compatible with backpropagation.
  • Monotonically increasing: ordering relationships are preserved.

Disadvantages:

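  • Saturation: for |x| > 5, the function flattens and the gradient falls below about 0.007, since σ’(5) ≈ 0.0066.
  • Vanishing gradient: the derivative never exceeds 0.25 (its value at x = 0), so in deep networks the gradients multiplied layer by layer shrink quickly and can die out before reaching early layers.
  • Non-zero-centred output: all outputs are positive, so the weight gradients of a downstream neuron all share the same sign on each update, which can slow convergence.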

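These problems motivated the adoption of ReLU for hidden layers. The step function, which predates the sigmoid, is not differentiable at its threshold and has zero gradient everywhere else, so it serves mainly for conceptual analysis of binary activations.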
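
The saturation and vanishing-gradient effects can be put into numbers with a short sketch. It rests on a simplifying assumption: each layer contributes only its sigmoid derivative, and the weights that also appear in the chain rule are ignored, so `best_case_gradient_factor` (a hypothetical helper) is an upper bound on the activation contribution, not a full backpropagation.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x: float) -> float:
    a = sigmoid(x)
    return a * (1.0 - a)

def best_case_gradient_factor(depth: int) -> float:
    """Largest possible product of sigmoid derivatives across `depth` layers.

    Assumption: weights are ignored; each layer contributes at most 0.25,
    which is the derivative at x = 0.
    """
    return 0.25 ** depth

if __name__ == "__main__":
    # Saturation: the local gradient collapses as |x| grows
    for x in (0.0, 2.0, 5.0, 10.0):
        print(f"sigmoid'({x:g}) = {sigmoid_grad(x):.6f}")
    # Vanishing gradient: even the best case shrinks geometrically with depth
    for depth in (1, 5, 10):
        print(f"depth {depth}: factor <= {best_case_gradient_factor(depth):.2e}")
```

Even in the best case, ten stacked sigmoid layers scale the gradient by less than one millionth, which is the practical reason ReLU is preferred in deep hidden layers.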

Practical use cases

The sigmoid remains the preferred choice in specific scenarios:

  • Binary classification output layer: does a patient have elevated cardiac risk? Is an email spam?
  • Probability modelling: conversion prediction in marketing campaigns, credit scoring.
  • Gates in LSTM architectures: memory cells use sigmoid to control what information to retain or discard.
  • Logistic regression: the sigmoid is the mathematical core of one of the most widely used statistical models in industry.
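
To illustrate the logistic regression case in the list above, here is a minimal one-feature sketch trained by plain gradient descent on the mean cross-entropy loss. The toy data, learning rate and step count are all hypothetical, and a real project would normally use an established library instead of this loop.

```python
import math

def sigmoid(z: float) -> float:
    # Two-branch form so exp never receives a large positive argument
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def train_logistic_regression(xs, ys, lr=0.5, steps=200):
    """Fit p(y=1 | x) = sigmoid(w*x + b) by gradient descent on mean cross-entropy."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # With cross-entropy loss, the gradient with respect to z is (p - y)
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical toy data: negative inputs are class 0, positive inputs are class 1
XS = [-2.0, -1.0, 1.0, 2.0]
YS = [0, 0, 1, 1]

if __name__ == "__main__":
    w, b = train_logistic_regression(XS, YS)
    for x in (-2.0, 0.0, 2.0):
        print(f"P(y=1 | x={x:+.1f}) = {sigmoid(w * x + b):.3f}")
```

Because the sigmoid output is read as a probability, the predictions for clearly negative inputs end up near 0 and those for clearly positive inputs near 1.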

An applied example: in image analysis systems, the output layer of a binary classifier (does this X-ray contain a lesion?) almost always uses sigmoid. For multi-class classification, the alternative is softmax.
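
The link between the two functions can be made concrete: a softmax over two logits, with the second fixed at 0, gives the same probability as the sigmoid of the first. The sketch below checks this numerically. Fixing the second logit at 0 is a convention adopted for the comparison, and the helper names are illustrative.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def softmax(logits):
    """Softmax over a list of logits, shifted by the maximum for stability."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

if __name__ == "__main__":
    for z in (-3.0, 0.0, 1.5, 4.0):
        # Class 0 has logit z; class 1 has logit fixed at 0 (assumed convention)
        print(z, softmax([z, 0.0])[0], sigmoid(z))
```

This is why a binary classifier needs only one sigmoid output, while softmax is used once there are three or more mutually exclusive classes.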

Conclusion

The sigmoid function remains the standard choice for the output layer of a binary classifier that needs to produce an interpretable probability. Its limitations in deep layers are real but well understood: using it where it fits and delegating to ReLU or tanh in hidden layers is the key to solid neural network design.


Written by

CEO - Jacar Systems

Passionate about technology, cloud infrastructure and artificial intelligence. Writes about DevOps, AI, platforms and software from Madrid.