How Important is Linear Algebra for Deep Learning? (Part 3 of 3)

Eman
5 min read · Sep 21, 2024


Algebra you didn’t know you needed!

No way!

Have you really checked out Part 1 and Part 2? If not, go ahead and take a look! 😐

This is Part 3 of a 3-part series on Linear Algebra, where I’ll dive into the key concepts you need to understand the principles behind deep learning.

Linear Algebra

Part 3:

In this section, we’ll cover:

  1. Tensors
  2. PCA (Principal Component Analysis)

1. Tensors

A tensor is a generalization of scalars, vectors, and matrices to higher dimensions.

For instance:

  • A scalar is a 0D tensor (just one number).
  • A vector is a 1D tensor (an ordered list of numbers).
  • A matrix is a 2D tensor (a grid of numbers with rows and columns).

From there, you can have 3D, 4D, or even higher-dimensional tensors. Each dimension is like an extra “direction” you can move in when navigating through the data.
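
Here is a quick sketch of those tensor ranks in NumPy (the shapes are arbitrary; NumPy calls the number of dimensions `ndim`):

```python
import numpy as np

scalar = np.array(7)                 # 0D tensor: a single number
vector = np.array([1.0, 2.0, 3.0])   # 1D tensor: an ordered list of numbers
matrix = np.array([[1, 2], [3, 4]])  # 2D tensor: rows and columns
cube = np.zeros((2, 3, 4))           # 3D tensor: one extra "direction" to move in

for t in (scalar, vector, matrix, cube):
    print(t.ndim, t.shape)
# 0 ()
# 1 (3,)
# 2 (2, 2)
# 3 (2, 3, 4)
```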

Any input to a deep learning model (text, images, etc.) is first converted into tensors.

For example:

  • A sentence might be turned into a 2D tensor, where each row is a vector representing a word.
  • An image is usually a 3D tensor with dimensions representing height, width, and color channels.
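
To make those two examples concrete, here is a minimal sketch (the word vectors and image size are made up for illustration; real models use learned embeddings with hundreds of dimensions):

```python
import numpy as np

# A 4-word sentence, where each word is represented by a 3-dimensional vector.
sentence = np.array([
    [0.1, 0.3, 0.5],   # "linear"
    [0.2, 0.1, 0.9],   # "algebra"
    [0.7, 0.4, 0.2],   # "is"
    [0.6, 0.8, 0.3],   # "fun"
])
print(sentence.shape)  # (4, 3): 4 words x 3 features -> a 2D tensor

# A small RGB image: height x width x color channels.
image = np.random.rand(28, 28, 3)
print(image.shape)     # (28, 28, 3) -> a 3D tensor
```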

So, if we have a 2D tensor, why not just call it a matrix since matrices are also 2D?

We could, but the word “tensor” works for any number of dimensions (1D, 2D, 3D, and so on), which keeps the terminology consistent in deep learning, where we often deal with data beyond two dimensions. A matrix is just the 2D special case. And since every kind of input (text, images, audio) ends up as a tensor once it is converted to numbers, using “tensor” gives us one general term for all of them.

2. Principal Component Analysis (PCA)

Principal Component Analysis (PCA) is a method that reduces the dimensionality of large datasets by extracting key components that retain important information while filtering out noise.

PCA is a specific application of Linear Transformations that focuses on simplifying data.

When we apply PCA, we’re effectively performing a linear transformation to reduce the dimensionality, which means we’re reshaping the data in a way that highlights its most important features.

But wait, if linear transformations are applied in neural network layers, why do we need PCA?

PCA is often used as a preprocessing step before feeding data into a neural network, reducing noise and computational complexity.
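
As a rough sketch of that preprocessing step (using scikit-learn’s PCA; the dataset here is random placeholder data and 20 components is an arbitrary choice):

```python
import numpy as np
from sklearn.decomposition import PCA

# Placeholder dataset: 500 samples with 100 features each.
X = np.random.rand(500, 100)

# Keep only the 20 directions with the most variance.
pca = PCA(n_components=20)
X_reduced = pca.fit_transform(X)

print(X.shape, "->", X_reduced.shape)                         # (500, 100) -> (500, 20)
print("variance kept:", pca.explained_variance_ratio_.sum())

# X_reduced (not X) would then be fed to the neural network.
```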

Let’s make it a bit simpler…

It takes high-dimensional data and projects it into a lower-dimensional space while striving to preserve as much of the relevant information as possible.

This process is akin to finding the most informative “angles” from which to view the data.

Imagine … you’re photographing a 3D object like a statue

If you take pictures from every angle, some images will look very similar and won’t add much new information about the statue. You want to find the best angles that show its most important features.

PCA: I can do this :-)

It looks at your data, which might have many dimensions (like all those camera angles), finds the directions along which the data varies the most (these represent its most important features), and discards the less important ones.

How huh??

  1. PCA starts by creating a covariance matrix from your data (after centering it, i.e., subtracting each feature’s mean). This matrix shows how the features vary together and helps reveal their relationships.

  2. Next, it calculates the eigenvalues and eigenvectors of that covariance matrix. The eigenvectors point in the directions of maximum variance in the data, while the eigenvalues indicate the magnitude of that variance.

  3. It then sorts the eigenvalues in descending order and selects the top eigenvectors (those with the largest eigenvalues) as the principal components.

  4. Finally, it projects the original data onto these principal components, producing a lower-dimensional representation that retains the key features.
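
Putting those four steps together, here is a small from-scratch sketch in NumPy (for real projects you would normally reach for a library implementation such as scikit-learn’s PCA):

```python
import numpy as np

def pca(X, k):
    """Project X (n_samples x n_features) onto its top-k principal components."""
    # Center the data so the covariance matrix is meaningful.
    X_centered = X - X.mean(axis=0)

    # Step 1: covariance matrix of the features (n_features x n_features).
    cov = np.cov(X_centered, rowvar=False)

    # Step 2: eigenvalues (variance magnitudes) and eigenvectors (directions).
    eigenvalues, eigenvectors = np.linalg.eigh(cov)  # eigh: cov is symmetric

    # Step 3: sort by eigenvalue in descending order and keep the top k eigenvectors.
    order = np.argsort(eigenvalues)[::-1]
    components = eigenvectors[:, order[:k]]

    # Step 4: project the centered data onto the principal components.
    return X_centered @ components

X = np.random.rand(200, 10)  # toy data: 200 samples, 10 features
X_2d = pca(X, k=2)
print(X_2d.shape)            # (200, 2)
```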

Applications of PCA in Deep Learning

  1. Convolutional Neural Networks (CNNs): PCA can be used to reduce the dimensionality of image data before feeding it into a CNN, potentially improving training efficiency. A classic example is face recognition with PCA (the “eigenfaces” approach).
  2. Autoencoders: PCA principles are often used in designing autoencoders, which are neural networks that learn efficient data codings.
  3. Generative Models: In generative adversarial networks (GANs) and variational autoencoders (VAEs), PCA can help in understanding and manipulating the latent space.
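
For the third point, a common trick is to project high-dimensional latent vectors down to 2D so they can be plotted and inspected. A minimal sketch (the latent codes below are just random placeholders standing in for the output of a trained encoder):

```python
import numpy as np
from sklearn.decomposition import PCA

# Pretend these are 128-dimensional latent codes from a trained VAE or GAN encoder.
latent_codes = np.random.randn(1000, 128)

# Project the latent space down to 2 dimensions for visualization.
latent_2d = PCA(n_components=2).fit_transform(latent_codes)
print(latent_2d.shape)  # (1000, 2)
```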

Despite its power, PCA isn’t always the go-to method for dimensionality reduction.

Because ….

  • As a form of linear transformation, it assumes linear relationships between variables, which may not hold in complex real-world datasets.
  • Its results can be significantly affected by the scale of the input variables, so features are usually standardized first (see the sketch after this list).
  • The resulting components can be difficult to interpret in terms of the original features.
  • Although it aims to preserve the important information, some nuanced details can still be lost in the process.
  • For large datasets, it can be computationally expensive.
  • It works best with continuous variables and may not be appropriate for categorical data.
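
To see the scale issue from the second bullet in action, here is a short sketch (using scikit-learn; the oversized feature is invented for illustration):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
X[:, 0] *= 1000  # one feature measured in much larger units

# Without standardization, the large-scale feature dominates the first component.
print(PCA(n_components=1).fit(X).explained_variance_ratio_)        # close to 1.0

# After standardization, each feature contributes on an equal footing.
X_scaled = StandardScaler().fit_transform(X)
print(PCA(n_components=1).fit(X_scaled).explained_variance_ratio_)  # roughly 1/3
```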

That’s it!

I hope you find this little series on the importance of linear algebra for deep learning helpful.

If you have any questions or want me to cover another topic, feel free to let me know in the comments; I'm excited to write about it! 🤓

Show your support by 👏
