Mathematical Formulation of Artificial Neural Network Input

Updated: 2026-05-03

Beneath the surface of any neural network lies linear algebra: vectors, matrices, and functions. Understanding the mathematical formulation of the input is not an academic exercise — it is the foundation for debugging models, diagnosing gradient problems, and choosing activation functions with sound reasoning.

Key takeaways

  • Each input to a neural network is represented as a column vector x of n dimensions.
  • The hidden layer applies a linear transformation via a weight matrix W and bias vector b, followed by a non-linear activation function.
  • The activation function introduces non-linearity without which the network would be equivalent to a simple linear regression.
  • Training adjusts weights W and b by minimising a loss function via gradient descent with backpropagation.
  • Vanishing and exploding gradients are the main mathematical obstacles when training deep networks.

The mathematical representation of the input

An input data sample is represented as a column vector of dimension n:

$$\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} \in \mathbb{R}^n$$

Where each xᵢ is a feature of the data point: a pixel in an image, an encoded word in text, a numerical value in a table. For a batch of B samples, the input is organised as a matrix X of dimension B × n, enabling parallel processing of multiple samples via efficient matrix operations on the GPU.
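
As a rough illustration in NumPy (the dimensions and values below are arbitrary, chosen only for the example), a single sample x and a batch X could be represented like this:

```python
import numpy as np

rng = np.random.default_rng(0)

# One sample as a column vector x in R^n (here n = 4 arbitrary features).
x = np.array([[0.2], [1.5], [-0.3], [0.7]])   # shape (n, 1) = (4, 1)

# A batch of B samples stacked as rows: X has shape (B, n).
B, n = 32, 4
X = rng.normal(size=(B, n))                   # shape (32, 4)

print(x.shape, X.shape)                       # (4, 1) (32, 4)
```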

The hidden layer: linear transformation followed by non-linearity

For a hidden layer with M neurons, the operation is:

z = Wx + b

h = f(z)

Where:

  • W is the weight matrix of dimension M × n. Row j of W contains the weights for neuron j.
  • b is the bias vector of dimension M, which allows shifting the activation independently of the input.
  • f is the activation function, applied element-wise.

The bias b is critical: without it, the pre-activation z is zero whenever the input is zero, regardless of the weights, and each neuron’s decision boundary is forced to pass through the origin, limiting the network’s expressive capacity.
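
A minimal sketch of this layer in NumPy, assuming ReLU as the activation f and toy dimensions chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

n, M = 4, 3                        # input dimension and number of hidden neurons
x = rng.normal(size=(n, 1))        # input as a column vector, shape (n, 1)
W = rng.normal(size=(M, n))        # weight matrix: row j holds the weights of neuron j
b = np.zeros((M, 1))               # bias vector, shape (M, 1)

z = W @ x + b                      # linear transformation, shape (M, 1)
h = np.maximum(0.0, z)             # element-wise ReLU: f(z) = max(0, z)

print(z.shape, h.shape)            # (3, 1) (3, 1)
```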

Activation functions: why they matter

Without a non-linear activation function, the composition of linear layers remains linear. A 100-layer network without activations is mathematically equivalent to a single linear layer. The most commonly used activation functions are:

  • Sigmoid: f(z) = 1 / (1 + e⁻ᶻ) — output in (0,1), useful in the output layer for binary classification, but prone to vanishing gradients in deep networks.
  • ReLU (Rectified Linear Unit): f(z) = max(0, z) — computationally efficient and mitigates the vanishing gradient problem; the default function in hidden layers of deep networks.
  • Tanh: f(z) = (eᶻ – e⁻ᶻ) / (eᶻ + e⁻ᶻ) — zero-centred output, better behaviour than sigmoid in hidden layers.
  • Softmax: used in the output layer for multi-class classification — converts a vector of arbitrary values into a probability distribution. See Softmax function.
  • Linear function: f(z) = z — no transformation, used in the output layer for regression. See linear activation function.

[Figure: logistic (sigmoid) curve mapping real-valued inputs to values between 0 and 1]
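
The functions in the list above are straightforward to write down; the following sketch (plain NumPy, with an arbitrary test vector) shows one common way to implement them, including the usual max-subtraction trick for a numerically stable softmax:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))        # output in (0, 1)

def relu(z):
    return np.maximum(0.0, z)              # max(0, z), element-wise

def tanh(z):
    return np.tanh(z)                      # zero-centred output in (-1, 1)

def softmax(z):
    e = np.exp(z - np.max(z))              # subtract the max for numerical stability
    return e / e.sum()                     # a probability distribution that sums to 1

z = np.array([-2.0, 0.0, 3.0])
for f in (sigmoid, relu, tanh, softmax):
    print(f.__name__, f(z))
```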

The backpropagation algorithm

Training consists of adjusting W and b to minimise a loss function L (e.g., cross-entropy for classification, MSE for regression). The backpropagation algorithm computes the gradient of L with respect to each parameter using the chain rule:

$$\frac{\partial L}{\partial \mathbf{W}^{(l)}} = \frac{\partial L}{\partial \mathbf{h}^{(l)}} \cdot \frac{\partial \mathbf{h}^{(l)}}{\partial \mathbf{z}^{(l)}} \cdot \frac{\partial \mathbf{z}^{(l)}}{\partial \mathbf{W}^{(l)}}$$

The process, step by step:

  1. Forward pass: compute the network’s prediction.
  2. Loss calculation: compare the prediction with the true label.
  3. Backward pass: compute gradients layer by layer from output to input.
  4. Parameter update: with gradient descent, W ← W – η · ∂L/∂W, where η is the learning rate.
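
To make the four steps concrete, here is a minimal self-contained NumPy sketch of one training step for a toy network with a single ReLU hidden layer, a linear output and an MSE loss; the data, dimensions and learning rate are arbitrary illustrative choices, not taken from the article:

```python
import numpy as np

rng = np.random.default_rng(0)
n, M, eta = 4, 3, 0.1                      # input size, hidden neurons, learning rate

# Toy data: a single sample x and a scalar regression target y (arbitrary values).
x = rng.normal(size=(n, 1))
y = np.array([[1.0]])

# Parameters: one ReLU hidden layer, one linear output neuron.
W1, b1 = rng.normal(size=(M, n)), np.zeros((M, 1))
W2, b2 = rng.normal(size=(1, M)), np.zeros((1, 1))

# 1. Forward pass: compute the prediction.
z1 = W1 @ x + b1                           # (M, 1)
h1 = np.maximum(0.0, z1)                   # ReLU
y_hat = W2 @ h1 + b2                       # (1, 1), linear output

# 2. Loss: squared error for this single sample.
loss = 0.5 * ((y_hat - y) ** 2).item()

# 3. Backward pass: chain rule, from the output back to the input.
d_yhat = y_hat - y                         # dL/dy_hat
dW2, db2 = d_yhat @ h1.T, d_yhat           # gradients of the output layer
d_h1 = W2.T @ d_yhat                       # dL/dh1
d_z1 = d_h1 * (z1 > 0)                     # ReLU derivative: 1 where z1 > 0, else 0
dW1, db1 = d_z1 @ x.T, d_z1                # gradients of the hidden layer

# 4. Parameter update: W <- W - eta * dL/dW (one gradient-descent step).
W1 -= eta * dW1; b1 -= eta * db1
W2 -= eta * dW2; b2 -= eta * db2

print(round(loss, 4))
```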

The vanishing gradient problem occurs when gradients become exponentially small as they propagate backwards through many layers — mainly with sigmoid and tanh. ReLU and its variants (Leaky ReLU, ELU) mitigate this problem. The opposite problem — exploding gradients — is addressed with gradient clipping.
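
As an illustration of the clipping idea, the sketch below rescales a set of gradients so that their global L2 norm never exceeds a chosen threshold; the function name, threshold and example values are hypothetical, chosen only for the demonstration:

```python
import numpy as np

def clip_by_global_norm(grads, max_norm):
    """Rescale a list of gradient arrays so their combined L2 norm is at most max_norm."""
    total_norm = np.sqrt(sum(float(np.sum(g ** 2)) for g in grads))
    scale = min(1.0, max_norm / (total_norm + 1e-12))
    return [g * scale for g in grads]

# Example: the combined norm of these gradients is sqrt(3**2 + 4**2 + 12**2) = 13.
grads = [np.array([3.0, 4.0]), np.array([12.0])]
clipped = clip_by_global_norm(grads, max_norm=5.0)
print([g.tolist() for g in clipped])       # every gradient rescaled by 5/13
```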

For more architectural context, see neural networks and deep learning. For multi-class classification from the output layer perspective, see Softmax function. Practical use of these models in rapid benchmarks is covered in LazyPredict in Python.

Conclusion

The mathematical formulation of a neural network is elegant in its structure: linear algebra in each layer, non-linearity in each activation, and iterative optimisation in training. Understanding these fundamentals is not optional for anyone who wants to go beyond using libraries as black boxes. Diagnosing a model that does not converge, choosing the correct activation function, or designing the right architecture all depend directly on understanding the mathematics underneath.

Written by

CEO - Jacar Systems

Passionate about technology, cloud infrastructure and artificial intelligence. Writes about DevOps, AI, platforms and software from Madrid.