ℹ️ Skipped - page is already crawled
| Filter | Status | Condition | Details |
|---|---|---|---|
| HTTP status | PASS | download_http_code = 200 | HTTP 200 |
| Age cutoff | PASS | download_stamp > now() - 6 MONTH | 2.6 months ago |
| History drop | PASS | isNull(history_drop_reason) | No drop reason |
| Spam/ban | PASS | fh_dont_index != 1 AND ml_spam_score = 0 | ml_spam_score=0 |
| Canonical | PASS | meta_canonical IS NULL OR = '' OR = src_unparsed | Not set |
| Property | Value |
|---|---|
| URL | https://medium.com/thedeephub/convolutional-neural-networks-a-comprehensive-guide-5cc0b5eae175 |
| Last Crawled | 2026-01-18 21:13:54 (2 months ago) |
| First Indexed | 2024-02-12 10:35:22 (2 years ago) |
| HTTP Status Code | 200 |
| Meta Title | Convolutional Neural Networks: A Comprehensive Guide | by Jorgecardete | The Deep Hub | Medium |
| Meta Description | Convolutional Neural Networks: A Comprehensive Guide Exploring the power of CNNs in image analysis Table of contents What are Convolutional Neural Networks? Convolutional … |
| Meta Canonical | null |
| Boilerpipe Text | Exploring the power of CNNs in image analysis
14 min read · Feb 7, 2024
Image created by the author with DALL-E 3
Table of contents: What are Convolutional Neural Networks? · Convolutional layers · Channels · Stride · Padding · Pooling Layers · Flattening layers · Activation functions in CNNs
Convolutional Neural Networks, commonly referred to as CNNs, are a specialized type of neural network designed to process and classify images.
If you are new to this field you might be thinking: how is it possible to classify an image?
Well… images are also numbers!
Digital images are essentially grids of tiny units called pixels. Each pixel represents the smallest unit of an image and holds information about the color and intensity at that particular point.
Pixel representation | Source
Typically, each pixel is composed of three values corresponding to the red, green, and blue (RGB) color channels. These values determine the color and intensity of that pixel.
You can use the following tool to understand better how the RGB vector is formed:
Geogebra RGB tool | Source
In contrast, in a grayscale image, each pixel carries a single value that represents the intensity of light at that point, usually ranging from black (0) to white (255).
Grayscale image | Source
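To make the pixel-grid idea concrete, here is a minimal sketch (plain Python, with hypothetical values chosen for illustration) of a grayscale image as a grid of intensities:

```python
# A grayscale image is just a grid of intensities, 0 (black) to 255 (white).
# Hypothetical 3x3 example: a bright diagonal on a dark background.
image = [
    [255,  10,  10],
    [ 10, 255,  10],
    [ 10,  10, 255],
]

height, width = len(image), len(image[0])
print(height, width)  # 3 3
print(image[1][1])    # 255 -> the center pixel is bright
```

An RGB image would simply carry three such grids, one per color channel.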
How do CNNs work?
To understand how a CNN functions, let's recap some basic concepts about neural networks.
(If you are reading this post I am assuming that you are familiar with basic neural networks. If that's not the case, I strongly recommend you read this article.)
1.- Neurons: The most basic unit in a neural network. Each neuron computes a weighted sum of its inputs, to which a non-linear function, known as the activation function, is applied.
Neuron representation | Source
2.- Input layer: Each neuron in the input layer corresponds to one of the input features.
For instance, in an image classification task where the input is a 28 × 28-pixel image, the input layer would have 784 neurons (one for each pixel).
3.- Hidden layers: The layers between the input and the output layer. Each neuron in these layers computes a weighted sum of the outputs of the neurons in the previous layer and passes the result through a non-linear function.
4.- Output layer: The number of neurons in the output layer corresponds to the number of output classes (if we are facing a regression problem, the output layer will have only one neuron).
For example, in a classification task with digits from 0 to 9, the output layer would have 10 neurons.
Neural network process | Source: 3Blue1Brown
Once a prediction is made, a loss is calculated and the network enters a self-improving iterative process through which the weights are adjusted with backpropagation to reduce this error.
Now we are ready to understand convolutional neural networks!
The first question we should ask ourselves: what makes a CNN different from a basic neural network?
Convolutional layers
They are the fundamental building blocks of CNNs. These layers perform a critical mathematical operation known as convolution.
This process entails the application of specialized filters, known as kernels, that traverse the input image to learn complex visual patterns.
Kernels are essentially small matrices of numbers. These filters move across the image performing element-wise multiplication with the part of the image they cover, extracting features such as edges, textures, and shapes.
Kernel operation | Source
In the figure above, visualize the input as an image transformed into pixels. We multiply each term of the image by a 3 × 3 matrix (this shape can vary) and pass the result into an output matrix.
There are various methods to decide the digits inside the kernel, depending on the effect you want to achieve, such as detecting edges, blurring, or sharpening…
But what are we doing exactly? Let's take a deeper look at it.
Convolution Operation
The convolution operation involves multiplying the kernel values by the original pixel values of the image and then summing up the results.
This is a basic example with a 2 × 2 kernel:
We start in the top-left corner of the input:
(0 × 0) + (1 × 1) + (3 × 2) + (4 × 3) = 19
Then we slide one pixel to the right and perform the same operation:
(1 × 0) + (2 × 1) + (4 × 2) + (5 × 3) = 25
After we complete the first row we move one pixel down and start again from the left:
(3 × 0) + (4 × 1) + (6 × 2) + (7 × 3) = 37
Finally, we again slide one pixel to the right:
(4 × 0) + (5 × 1) + (7 × 2) + (8 × 3) = 43
The output matrix of this process is known as the feature map.
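The worked example above can be reproduced with a short sketch in plain Python (note that, strictly speaking, this sliding multiply-and-sum is cross-correlation, which is what deep-learning frameworks implement under the name "convolution"):

```python
def convolve2d(image, kernel, stride=1):
    """Valid (no-padding) 2-D convolution as described above:
    slide the kernel over the image, multiply element-wise, sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            for a in range(kh):
                for b in range(kw):
                    total += image[i * stride + a][j * stride + b] * kernel[a][b]
            row.append(total)
        output.append(row)
    return output

# The worked example from the text: 3x3 input, 2x2 kernel, stride 1.
image  = [[0, 1, 2],
          [3, 4, 5],
          [6, 7, 8]]
kernel = [[0, 1],
          [2, 3]]
print(convolve2d(image, kernel))  # [[19, 25], [37, 43]]
```

The output matrix is exactly the feature map computed step by step above.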
Perfect, now we understand how this operation works! But…
Why is it so useful? We are just multiplying and adding pixels; how can we extract image features doing this?
For now, I won't be diving deeper into the convolution operation because I don't consider it pivotal for understanding conv nets in the beginning.
However, if you are very curious, I will leave you what I believe to be the best public answer to that question:
That's it, you've understood the most fundamental concept behind CNNs, convolutional layers!
At this point, you may have a bunch of doubts (at least I had them). I mean, we understand how a convolution works, but:
Do kernels always traverse the image matrix one pixel at a time?
What happens with the pixels in the corners? We only pass over them once; what if they hold an important feature?
And what about RGB images? We stated that they are represented in 3 dimensions, so how does the kernel traverse them?
These are a lot of questions, but don't worry, all of them have an easy answer.
We'll start by understanding three essential components inside convolutional layers:
Channels
Stride
Padding
1.- Channels
As I explained before, digital images are often composed of three channels (RGB), which are represented as three different matrices.
RGB decomposed image | Source
For an RGB image, there are typically separate kernels for each color channel, because different features might be more visible or relevant in one channel compared to the others.
Convolution operation in Red, Green, and Blue channels | Source
Depth of the layer
The "depth" of a layer refers to the number of kernels it contains. Each filter produces a separate feature map, and the collection of these feature maps forms the complete output of the layer.
The output normally has multiple channels, where each channel is a feature map corresponding to a particular kernel.
In the case of RGB, we typically use one channel for each of the 3 matrices, but we can add as many as we want.
For example, say you have a grayscale image of a cat: you could create a channel specialized in detecting the ears and another in detecting the mouth.
CNN representation | Source
This image illustrates the concept quite well: think of each layer in the convolution as a feature map with a different kernel (don't worry about the pooling part for now, we'll break it down in a minute).
⚠️ BE CAREFUL not to confuse the channels in the convolutional layer with the color channels in the image. That was a representative example to understand the concept, but you can add as many channels as you want.
Each channel will detect a different feature in the image based on the values you assign to its kernel.
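As a rough sketch of the depth idea: applying two kernels to one input yields two feature maps, so the output depth equals the number of kernels. The kernel values below are hypothetical, chosen only to suggest edge-like responses:

```python
def convolve2d(image, kernel):
    """Valid 2-D convolution of one channel with one kernel."""
    kh, kw = len(kernel), len(kernel[0])
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(len(image[0]) - kw + 1)]
            for i in range(len(image) - kh + 1)]

image = [[0, 1, 2],
         [3, 4, 5],
         [6, 7, 8]]

# Two hypothetical kernels; each one will produce its own feature map.
kernels = [
    [[1, -1], [1, -1]],   # illustrative vertical-difference filter
    [[1, 1], [-1, -1]],   # illustrative horizontal-difference filter
]

feature_maps = [convolve2d(image, k) for k in kernels]
print(len(feature_maps))  # 2 -> output depth equals the number of kernels
```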
2.- Stride
We have discussed that in a convolution a kernel moves through the pixels of an image, but we haven't talked about the different ways in which it can do so.
Stride refers to the number of pixels by which a kernel moves across the input image.
The example we saw before had a stride of 1, but this can change. Let's see a visual representation:
Stride = 1
Hyperparameters of a Convolutional Layer | Source
Stride = 2
Hyperparameters of a Convolutional Layer | Source
A stride of 2 not only changes the way the convolution iterates over the input but also shrinks the output (to 2 × 2).
Taking this into account we can conclude that:
A larger stride will produce smaller output dimensions (as it covers the input image faster), whereas a smaller stride results in larger output dimensions.
But why would we want to change the stride?
Increasing the stride allows the filter to cover a larger area of the input image, which can be useful for capturing more global features. In contrast, lowering the stride captures finer and more local details.
In addition, increasing the stride helps control overfitting and reduces computational cost, as it shrinks the spatial dimensions of the feature map.
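A minimal sketch of how stride affects the output size, assuming a valid convolution with no padding (the 5 × 5 input and 3 × 3 kernel sizes are hypothetical):

```python
def conv_output_size(input_size, kernel_size, stride):
    """Spatial size of a valid (no-padding) convolution's output."""
    return (input_size - kernel_size) // stride + 1

# Hypothetical 5x5 input with a 3x3 kernel:
print(conv_output_size(5, 3, stride=1))  # 3 -> 3x3 feature map
print(conv_output_size(5, 3, stride=2))  # 2 -> 2x2 feature map
```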
3.- Padding
Padding refers to the addition of extra pixels around the edge of the input image.
When you focus on the pixels at the image's edges, you'll notice that we traverse them fewer times compared to those positioned in the center.
The purpose of padding is to adjust the spatial size of the output of a convolution operation and to preserve spatial information at the borders.
Let's see another example with the CNN explainer:
Padding = 0 (focus on the edges and count how many times the kernel passes through them)
Hyperparameters of a Convolutional Layer | Source
Padding = 1
Hyperparameters of a Convolutional Layer | Source
Now the kernel passes over the edge pixels more times, giving us more information about them.
In which cases do you want to apply padding?
Mainly when the edges of the image contain useful information that you want to capture. You can increase the padding up to the kernel size you are using.
And how does it affect the output?
Padding increases the size of the output feature map. If you increase the padding while keeping the kernel size and stride constant, the convolution operation has more "room" to take place, resulting in a larger output.
The output size of a convolutional layer can be calculated using the following formula:
Output = (Input − Kernel + 2 × Padding) / Stride + 1
Where "2 × Padding" accounts for padding applied to both the left and right sides (or top and bottom) of the input, and "+ 1" accounts for the initial position of the filter, which starts at the beginning of the padded input.
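The formula can be sketched directly in code (integer division is used here; real frameworks differ in how they handle sizes that don't divide evenly):

```python
def conv_output_size(input_size, kernel_size, padding, stride):
    """Output = (Input - Kernel + 2 * Padding) / Stride + 1"""
    return (input_size - kernel_size + 2 * padding) // stride + 1

# Hypothetical 5x5 input, 3x3 kernel, stride 1:
print(conv_output_size(5, 3, padding=0, stride=1))  # 3 -> output shrinks
print(conv_output_size(5, 3, padding=1, stride=1))  # 5 -> size is preserved
```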
⚠️ This is a visual explanation of padding, but at a practical level it doesn't have to be the same on all sides of the image. The padding dimensions can be asymmetric or even follow a custom padding design.
If you have reached this point, you can now officially say that you know how convolutional layers work!
Nevertheless, this is not the end of the journey…
There is a common misconception among beginners that conv layers are convolutional neural networks. Convolutional layers are an essential component, but as their name indicates, they are a LAYER inside CNNs.
We have covered the most important part of CNNs, but there are still two other special types of layers that we have to understand:
Pooling Layers
Flattening Layers
Pooling Layers
Before explaining how these layers work, it's crucial to have this clear:
Although convolutional layers can decrease the output size, their principal objective is not DIMENSIONALITY REDUCTION. The main objective of convolutional layers is FEATURE EXTRACTION.
In fact, in most cases we are not reducing the dimensions of our data, because we are creating new channels that weren't there before; so even if our feature map dimensions are smaller, we have more of them.
Convolutional neural network representation | Source
Take a look at this example: here we might be reducing our feature map a bit in each convolutional layer, but we are creating many more channels.
What about the subsampling layers?
Those are pooling layers, and their main objective is indeed dimensionality reduction!
How Pooling Layers Work
Imagine you have a large image and want to make it smaller while keeping all the important features, like edges and colors.
The pooling layer operates independently on every depth slice of the input. It resizes the input spatially, using the Max or Average of the values in a window slid over the input data.
Max and Avg Pooling Layers | Source
In this example, we have reduced the feature map from (4 × 4) to (2 × 2).
What is the difference between pooling and the convolution operation?
In pooling, we are not applying any kernel to the input data; we are just simplifying the information with a math operation (Max or Avg).
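A minimal sketch of max and average pooling over a single channel, reproducing the 4 × 4 → 2 × 2 reduction described above (the feature-map values are hypothetical):

```python
def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Slide a window over one channel and keep its max (or average)."""
    out = []
    for i in range(0, len(feature_map) - size + 1, stride):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, stride):
            window = [feature_map[i + a][j + b]
                      for a in range(size) for b in range(size)]
            row.append(max(window) if mode == "max" else sum(window) / len(window))
        out.append(row)
    return out

# Hypothetical 4x4 feature map reduced to 2x2:
fm = [[1, 3, 2, 4],
      [5, 7, 6, 8],
      [1, 2, 3, 4],
      [5, 6, 7, 8]]
print(pool2d(fm, mode="max"))  # [[7, 8], [6, 8]]
print(pool2d(fm, mode="avg"))  # [[4.0, 5.0], [3.5, 5.5]]
```

For a multi-channel input, the same function would simply be called once per channel, which is why pooling never changes the channel count.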
What about the channels: does pooling also reduce the number of channels?
You must understand this: pooling layers DO NOT REDUCE THE NUMBER OF CHANNELS. Each pooling operation IS APPLIED INDEPENDENTLY TO EACH CHANNEL of the input data.
Let's see another example; channels can be a bit complex to visualize at first, and I want to ensure that you understand correctly how they work.
Layers inside a CNN | Source
This is a good representation: here you can see how each pooling layer reduces the spatial dimensions but not the number of channels. The number of channels is not reduced until the end of the architecture.
With convolutional and pooling layers we CAN'T reduce the number of channels, only add more to the existing ones.
So why and how do we combine all these channels?
After convolutional and pooling layers have extracted relevant features from the input image, we have to turn this high-dimensional feature map into a format suitable for feeding into fully connected layers.
Here's where flattening layers come into action!
Flattening layers
Imagine you have a grid of data (like pixels in a feature map), and you want to line up all of these grid points in a single, long line. That's what flattening does: it takes the entire feature map and reorganizes it into a single, long vector.
Flattening concept | Source
⚠️ Although flattening changes the shape of the data, it does not change the actual information.
Why do we need flattening layers?
Integration of features: by flattening the feature maps into a vector, the network can integrate the spatially distributed features it extracted for tasks such as classification.
Compatibility with dense layers: fully connected (dense) layers are designed to operate on 1-dimensional data; hence, flattening is a necessary step to transition from the multidimensional tensors produced by convolutional layers to the format required by dense layers.
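Flattening can be sketched in a few lines; here a hypothetical two-channel 2 × 2 output of the conv/pool stages is lined up into one vector:

```python
# Flattening: line every value of a multi-channel feature map up in one vector.
# Hypothetical output of the conv/pool stages: 2 channels of 2x2 each.
feature_maps = [
    [[1, 2], [3, 4]],   # channel 1
    [[5, 6], [7, 8]],   # channel 2
]

flat = [value
        for channel in feature_maps
        for row in channel
        for value in row]
print(flat)       # [1, 2, 3, 4, 5, 6, 7, 8]
print(len(flat))  # 8 = channels * height * width, ready for a dense layer
```

Note that no value was changed or dropped; only the shape differs.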
This leads us to our next question: why do we need dense layers in CNNs?
While convolutional layers are good at detecting features in input data, dense layers are essential for integrating these features into predictions.
For example, if we design a convolutional neural network for facial recognition, early layers might detect edges and textures, while dense layers might interpret these to recognize specific facial features.
Without dense layers, CNNs would not be able to perform the high-level tasks that are often required, such as classifying images, detecting objects, or making predictions based on visual inputs.
CNN recap
Up to this point, we have reviewed the whole CNN structure:
Convolutional Layers
Pooling Layers
Flattening Layers
Dense Layers
along with the fundamental concepts of channels, stride, and pooling.
We could say that we have joined all the pieces of the puzzle! Or maybe not… what about activation functions and backpropagation?
Backpropagation functions similarly to how it does in feed-forward neural networks, but with some special adjustments. I won't focus much on its technical details; you can check out this very interesting article to learn more about it. If you know nothing about backpropagation, you can start by taking a look at my publication.
However, I will certainly take a look at activation functions.
Activation functions in Convolutional Neural Networks
As you may know, activation functions are indispensable; otherwise, we would just be creating a very large linear model.
As in simple neural networks, we also need these non-linear terms in ConvNets. However, not all the layers we have seen have an activation function.
Let's use an image as a reference to visualize this. By now you should understand the representation without any problem! Just one little thing…
The first two pooling layers are not shown in this diagram. This is another way of visualizing CNNs; it doesn't mean that they are not there. Just imagine a filter between each layer that makes the feature maps smaller.
Complete CNN representation | Source
In the feature extraction part, the activations are in the convolutional layers. The process is quite straightforward: after each convolution operation, you pass the result through an activation function.
Convolutional layer structure | Source
The pooling and flattening layers DON'T have an activation function.
As we explained before, the main function of pooling layers is dimensionality reduction, and the main purpose of flattening layers is restructuring the data into a 1D vector. We don't need to include non-linearities for doing that. Nevertheless, we do need activation functions for extracting complex features (we won't be able to capture the relevant characteristics of an image with only a linear function).
In the classification part, all the fully connected layers and the output layer have an activation function, as in simple neural nets. Here we also need an activation function because we are using the extracted features to make a classification or a prediction, and the algorithm has to learn complex interactions (as a simple neural network would do).
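A minimal sketch of the feature-extraction step: the activation (ReLU here, a common choice, though the article does not prescribe a specific one) is applied element-wise to the feature map right after the convolution:

```python
# Sketch: in the feature-extraction part, the non-linearity (here ReLU)
# is applied element-wise to the feature map after each convolution.
def relu(x):
    return max(0, x)

# Hypothetical feature map produced by a convolution:
feature_map = [[-2, 3],
               [ 0, -5]]

activated = [[relu(v) for v in row] for row in feature_map]
print(activated)  # [[0, 3], [0, 0]]
```

Pooling or flattening applied afterwards would then operate on these activated values with no further non-linearity.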
Thanks for reading! If you like the article, make sure to clap (up to 50!) and follow me on Medium to stay updated with my new articles. Also, make sure to follow my new publication! |
| Markdown | [The Deep Hub](https://medium.com/thedeephub?source=post_page---publication_nav-d0b4131403b5-5cc0b5eae175---------------------------------------)
Your data science hub. A Medium publication dedicated to exchanging ideas and empowering your knowledge.
# Convolutional Neural Networks: A Comprehensive Guide
## Exploring the power of CNNs in image analysis
[Jorgecardete](https://medium.com/@jorgecardete?source=post_page---byline--5cc0b5eae175---------------------------------------)
14 min read · Feb 7, 2024

Image created by the author with DALL-E 3
### **Table of contents**
1. [What are Convolutional Neural Networks?](https://medium.com/thedeephub/convolutional-neural-networks-a-comprehensive-guide-5cc0b5eae175#6caf)
2. [Convolutional layers](https://medium.com/thedeephub/convolutional-neural-networks-a-comprehensive-guide-5cc0b5eae175#b6fa)
3. [Channels](https://medium.com/thedeephub/convolutional-neural-networks-a-comprehensive-guide-5cc0b5eae175#316d)
4. [Stride](https://medium.com/thedeephub/convolutional-neural-networks-a-comprehensive-guide-5cc0b5eae175#8a64)
5. [Padding](https://medium.com/thedeephub/convolutional-neural-networks-a-comprehensive-guide-5cc0b5eae175#d74c)
6. [Pooling Layers](https://medium.com/thedeephub/convolutional-neural-networks-a-comprehensive-guide-5cc0b5eae175#83ba)
7. [Flattening layers](https://medium.com/thedeephub/convolutional-neural-networks-a-comprehensive-guide-5cc0b5eae175#7285)
8. [Activation functions in CNNs](https://medium.com/thedeephub/convolutional-neural-networks-a-comprehensive-guide-5cc0b5eae175#fc3b)
**Convolutional Neural Networks**, commonly referred to as **CNNs**, are a specialized type of neural network designed to process and classify images.
If you are new to this field you might be thinking: **how is it possible to classify an image?**
Well… **images are also numbers\!**
Digital images are essentially **grids of tiny units called pixels**. Each pixel represents the smallest unit of an image and holds information **about the color and intensity at that particular point**.

Pixel representation \| [Source](https://medium.com/r?url=https%3A%2F%2Fwww.javatpoint.com%2Fconcept-of-pixel)
Typically, each pixel is composed of three values corresponding to the **red, green, and blue (RGB)** color channels. These values determine the **color and intensity** of that pixel.
> You can use the following [tool](https://www.geogebra.org/m/Dq2A7aRv) to understand better **how the RGB vector is formed:**

Geogebra RGB tool \| [Source](https://medium.com/r?url=https%3A%2F%2Fwww.geogebra.org%2Fm%2FDq2A7aRv)
In contrast, in a **grayscale image**, each pixel carries a single value that represents the intensity of light at that point.
> Usually ranging **from black (0) to white (255)**.

Grayscale image \| [Source](https://medium.com/r?url=https%3A%2F%2Fwww.nzfaruqui.com%2Ftag%2Faccessing-the-pixel-value-grayscale-image%2F)
### How do CNNs work?
To understand how a CNN functions, let's recap some basic concepts about neural networks.
> (If you are reading this post I am assuming that you are familiar with **basic neural networks**. If that's not the case, I strongly recommend you read this [article](https://towardsdatascience.com/understanding-neural-networks-19020b758230).)
1\.- **Neurons:** The most basic unit in a neural network. Each neuron computes a **weighted sum of its inputs**, to which a **non-linear function**, known as the **activation function**, is applied.

Neuron representation \| [Source](https://thedatafrog.com/en/articles/logistic-regression/)
2\.- **Input layer:** Each neuron in the input layer corresponds to one of the input features.
> For instance, in an image classification task where the input is a **28 x 28-pixel image**, the input layer would have **784 neurons** (one for each pixel).
3\.- **Hidden Layers:** The layers between the input and the output layer. Each neuron in these layers computes a **weighted sum** of the outputs of the neurons in the previous layer and passes the result through a **non-linear function**.
4\.- **Output Layer:** The number of neurons in the output layer corresponds to the number of output classes (In case we are facing a **regression** problem the output layer will only have **one neuron**).
> For example, in a classification task with digits **from 0 to 9**, the output layer would have **10 neurons**.

Neural Network process \| Source: 3Blue1Brown
Once a prediction is made, a **loss** is calculated and the network enters a **self-improvement iterative process** through which the weights are adjusted with [**backpropagation**](https://medium.com/towards-artificial-intelligence/backpropagation-2eeb25201095) to reduce this error.
Now we are ready **to understand convolutional neural networks\!**
The first question we should ask ourselves:
- What makes a CNN different from a basic neural network?
### Convolutional layers
They are the fundamental building blocks of CNNs. These layers perform a critical mathematical operation known as **convolution**.
This process entails the application of **specialized filters known as kernels**, that traverse through the input image to learn complex visual patterns.
**Kernels** are essentially small matrices of numbers. These filters move across the image performing **element-wise multiplication** with the part of the image they cover, extracting features such as **edges, textures, and shapes**.

Kernel operation \| [Source](https://medium.com/r?url=https%3A%2F%2Fcommons.wikimedia.org%2Fwiki%2FFile%3A2D_Convolution_Animation.gif)
In the figure above, visualize the input as an image transformed into pixels.
We multiply each term of the image by a 3 × 3 matrix (this shape can vary) **and pass it into an output matrix**.
There are various methods to decide the digits inside the kernel. This will depend on the effect you want to achieve such as detecting edges, blurring, sharpeningâŚ
> But what are we doing exactly?
Let's take a deeper look at it.
**Convolution Operation** The convolution operation involves multiplying **the kernel values** by the **original pixel values** of the image and then **summing up the results**.
This is a basic example with a 2 × 2 kernel:

We start in the top-left corner of the input:
- *(0 × 0) + (1 × 1) + (3 × 2) + (4 × 3) =* ***19***
Then we slide one pixel to the right and perform the same operation:
- *(1 × 0) + (2 × 1) + (4 × 2) + (5 × 3) =* ***25***
After we complete the first row we move one pixel down and start again from the left:
- *(3 × 0) + (4 × 1) + (6 × 2) + (7 × 3) =* ***37***
Finally, we again slide one pixel to the right:
- *(4 × 0) + (5 × 1) + (7 × 2) + (8 × 3) =* ***43***
The output matrix of this process is known as the **feature map**.
Perfect, now we understand how **this operation works\!** But…
> Why is it so useful? We are just multiplying and adding pixels; how can we extract image features doing this?
For now, I won't be diving deeper into the convolution operation because I don't consider it pivotal for understanding conv nets in the beginning.
However, if you are very curious, I will leave you what I believe to be **the best public answer** to that question:
That's it, you've understood the most fundamental concept behind CNNs, **Convolutional Layers**\!
At this point, you may be having a bunch of doubts (at least I had them).
I mean, we understand **how a convolution works**, but:
- Do kernels always traverse the image matrix **one pixel at a time**?
- What happens with the **pixels in the corners**? We only pass over them once; what if they hold an important feature?
- And what about **RGB images**? We stated that they are represented in **3 dimensions**, so how does the kernel traverse them?
These are a lot of questions, but don't worry, all of them have an easy answer.
We'll start by understanding **three essential components** inside convolutional layers:
1. [*Channels*](https://medium.com/thedeephub/convolutional-neural-networks-a-comprehensive-guide-5cc0b5eae175#316d)
2. [*Stride*](https://medium.com/thedeephub/convolutional-neural-networks-a-comprehensive-guide-5cc0b5eae175#8a64)
3. [*Padding*](https://medium.com/thedeephub/convolutional-neural-networks-a-comprehensive-guide-5cc0b5eae175#d74c)
### 1\.- Channels
As I explained before, digital images are often composed of **three channels (RGB)** which are represented in three different matrices.

RGB decomposed image \| [Source](https://medium.com/r?url=https%3A%2F%2Fwww.geeksforgeeks.org%2Fmatlab-rgb-image-representation%2F)
For an RGB image, there are typically **separate kernels for each color channel** because different features might be more visible or relevant in one channel compared to the others.

Convolution operation in Red, Green, and Blue channels \| [Source](https://medium.com/r?url=https%3A%2F%2Ftowardsdatascience.com%2Fintuitively-understanding-convolutions-for-deep-learning-1f6f42faee1)
- **Depth of the layer**
The **"depth"** of a layer refers to the number of kernels it contains. Each filter produces a separate **feature map**, and the collection of these feature maps forms the ***complete output of the layer***.
> The output normally has multiple channels, where each channel is a feature map corresponding to a particular kernel.
In the case of RGB, we typically use **one channel** for each of the 3 matrices, but we can add as many as we want.
> **For example**, say you have a grayscale image of a cat: you could create a channel specialized in detecting the ears and another in detecting the mouth.

CNN representation \| [Source](https://medium.com/r?url=https%3A%2F%2Fwww.topcoder.com%2Fthrive%2Farticles%2Foverview-of-convolutional-neural-networks%3Futm_source%3Dthrive%26utm_campaign%3Dthrive-feed%26utm_medium%3Drss-feed)
This image illustrates the concept quite well: think of each layer in the convolution as a feature map with a different kernel (don't worry about the pooling part for now, we'll break it down in a minute).
> **⚠️ BE CAREFUL** not to confuse the channels in the convolutional layer with the color channels in the image. That was a representative example to understand the concept, but **you can add as many channels as you want**.
>
> Each channel will detect a **different feature** in the image based on the values you assign to its kernel.
### 2\.- Stride
We have discussed that in a convolution a kernel moves through the pixels of an image, but we haven't talked about the different ways in which it can do so.
Stride refers to **the number of pixels by which a kernel moves across the input image**.
The example we saw before had a stride of 1, but this can change.
Let's see a visual representation:
- Stride = 1

Hyperparameters of a Convolutional Layer \| [Source](https://poloclub.github.io/cnn-explainer/)
- Stride = 2

Hyperparameters of a Convolutional Layer \| [Source](https://poloclub.github.io/cnn-explainer/)
A stride of 2 not only changes the way the convolution iterates over the input but also makes the output smaller (2 × 2).
Taking this into account we can conclude that:
> A **larger stride** will produce smaller output dimensions (as it covers the input image faster), whereas a **smaller stride** results in a larger output dimension.
> But why would we want to change the stride?
**Increasing** the stride will allow the filter to cover a **larger area of the input image**, which can be useful for capturing **more global features**.
In contrast, **lowering** the stride will capture **finer and more local details**.
In addition, increasing the stride can help control **overfitting** and **reduce computational cost**, since it shrinks the spatial dimensions of the feature map.
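The effect of stride on the output size can be sketched in plain Python (the helper name and the values are illustrative): the same 2 × 2 kernel over a 5 × 5 input yields a 4 × 4 map with stride 1 but only a 2 × 2 map with stride 2.

```python
def convolve2d(image, kernel, stride=1):
    """Valid 2D convolution with a configurable stride."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    return [
        [
            sum(image[i * stride + u][j * stride + v] * kernel[u][v]
                for u in range(kh) for v in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

image = [[c + 5 * r for c in range(5)] for r in range(5)]  # 5x5 input
kernel = [[1, 1], [1, 1]]                                  # 2x2 summing kernel

out1 = convolve2d(image, kernel, stride=1)  # 4x4 output
out2 = convolve2d(image, kernel, stride=2)  # 2x2 output: larger stride, smaller map
print(len(out1), len(out2))  # 4 2
```

Note the stride only changes how far the window jumps between positions; the kernel itself is unchanged.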
### 3\.- Padding
Padding refers to the **addition of extra pixels around the edge** of the input image.
When you focus on the pixels at the image's edges, you'll notice that **the kernel traverses them fewer times** compared to those **positioned in the center**.
The purpose of padding is to **adjust the spatial size** of the output of a convolutional operation and to **preserve spatial information at the borders**.
Let's see another example with the [CNN explainer](https://poloclub.github.io/cnn-explainer/):
- Padding = 0 (focus on the edges and count how many times the kernel is passing through them)

Hyperparameters of a Convolutional Layer \| [Source](https://poloclub.github.io/cnn-explainer/)
- Padding = 1

Hyperparameters of a Convolutional Layer \| [Source](https://poloclub.github.io/cnn-explainer/)
Now the kernel passes over the edge pixels more times, so we get more information about them.
> In which cases do you want to apply padding?
Mainly when the edges of the image **contain useful information** that you want to capture.
> You can typically increase the padding up to one less than the kernel size; beyond that, some kernel positions would cover only padded values.
> And how does it affect the output field?
Padding **increases the size of the output feature map**. If you increase the padding while keeping the kernel size and stride constant, the convolution operation has more "room" to take place, **resulting in a larger output**.
The output size of a convolutional layer can be calculated using the following formula:

Output = ⌊(Input − Kernel + 2 × Padding) / Stride⌋ + 1
Where
- **"2 × Padding"** accounts for padding applied to both the left and right sides (or top and bottom sides) of the input.
- **"+ 1"** accounts for the initial position of the filter, which starts at the beginning of the padded input.
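As a quick numeric check of that output-size formula, here is a small helper (the function name is hypothetical) using floor division to handle strides that don't divide evenly:

```python
def conv_output_size(input_size, kernel_size, padding=0, stride=1):
    # floor((Input - Kernel + 2 * Padding) / Stride) + 1
    return (input_size - kernel_size + 2 * padding) // stride + 1

# Examples in the spirit of the CNN explainer (5x5 input, 3x3 kernel):
print(conv_output_size(5, 3, padding=0, stride=1))  # 3
print(conv_output_size(5, 3, padding=1, stride=1))  # 5 ("same" size preserved)
print(conv_output_size(5, 3, padding=0, stride=2))  # 2
```

Padding of 1 with a 3 × 3 kernel is the classic "same" configuration: the output keeps the input's spatial size.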
> ⚠️ This is a visual explanation of padding, but at a practical level it doesn't always have to be **the same on all sides of the image**.
>
> The padding dimensions can be **asymmetric** or even have a **custom padding** design.
If you have reached this point, you can officially say that you know how convolutional layers work!
Nevertheless, **this is not the end of the journey…**
There is a common misconception among beginners that convolutional layers *are* Convolutional Neural Networks.
Well, convolutional layers are an essential component, but as their name indicates, they are a **LAYER** inside CNNs.
We have comprehended the most important part of CNNs, but there are still **two other special types of layers** that we have to understand:
- Pooling Layers
- Flattening Layers
### Pooling Layers
Before explaining how these layers work **it's crucial to have this clear**:
> Although Convolutional Layers can decrease the output size, their principal objective is not **DIMENSIONALITY REDUCTION**.
>
> The main objective of Convolutional Layers is**FEATURE EXTRACTION**.
In fact, in most cases we are **not reducing the dimensions** of our data, because we are creating **new channels** that weren't there before: even if each feature map is smaller, **we have more of them**.

Convolutional neural network representation \| [Source](https://en.wikipedia.org/wiki/Convolutional_neural_network)
Take a look at this example: we may be slightly reducing the feature-map dimensions in each convolutional layer, but we are creating many more channels.
> What about the subsampling layers?
Those are pooling layers, and their main objective is indeed **dimensionality reduction!**
### **How Pooling Layers Work**
Imagine you have a large image and want to make it smaller but keep **all the important features** like edges and colors.
The pooling layer operates independently on every depth slice of the input. It resizes it spatially, using the **Max** or **Average** of the values in a window slid over the input data.

**Max** and **Avg** Pooling Layers \| [Source](https://medium.com/r?url=https%3A%2F%2Fpub.towardsai.net%2Fintroduction-to-pooling-layers-in-cnn-dafe61eabe34)
In this example, we have reduced the feature map from (4 × 4) to (2 × 2).
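A minimal sketch of that operation in plain Python (helper names are illustrative, not from the article): a 2 × 2 window with stride 2 slides over a single 4 × 4 channel, keeping either the max or the average of each window.

```python
def pool2d(fmap, size=2, stride=2, op=max):
    """Pool one channel: slide a window and reduce it with op (max or avg)."""
    out = []
    for i in range(0, len(fmap) - size + 1, stride):
        row = []
        for j in range(0, len(fmap[0]) - size + 1, stride):
            window = [fmap[i + u][j + v]
                      for u in range(size) for v in range(size)]
            row.append(op(window))
        out.append(row)
    return out

def avg(window):
    return sum(window) / len(window)

fmap = [[1, 3, 2, 4],
        [5, 7, 6, 8],
        [9, 2, 1, 0],
        [3, 4, 5, 6]]

max_pooled = pool2d(fmap, op=max)  # [[7, 8], [9, 6]]
avg_pooled = pool2d(fmap, op=avg)  # [[4.0, 5.0], [4.5, 3.0]]
```

To pool a multi-channel input you would simply call `pool2d` once per channel, which is exactly why the channel count is untouched.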
> What is the difference between pooling and the convolution operation?
In **pooling**, we are not applying any learned kernel to the input data; we are just **simplifying the information** with a fixed operation (Max or Avg).
> And what about the channels? Does pooling also reduce their number?
You must understand this:
> Pooling layers **DO NOT REDUCE THE NUMBER OF CHANNELS**.
>
> Each pooling operation **IS APPLIED INDEPENDENTLY TO EACH CHANNEL** of the input data.
Let's see another example. Channels can be a bit complex to visualize at first, and I want to make sure you understand how they work.

Layers inside a CNN \| [Source](https://jacobheyman702.medium.com/different-pooling-layers-for-cnn-4652a5103d62)
This is a good representation: you can see how each pooling layer **reduces the spatial dimensions** but does not **reduce the number of channels**.
The number of channels is not reduced until the end of the architecture.
> Pooling layers **CAN'T reduce the number of channels**, and the convolutional layers in these architectures only add more to the existing ones.
> So why and how do we combine all these channels?
After **convolutional** and **pooling** layers have **extracted relevant features** from the input image we have to turn this high-dimensional feature map into a format suitable for feeding into fully connected layers.
Here´s where **flattening layers come into action\!**
### Flattening layers
Imagine you have a grid of data (like pixels in a feature map), and you want to line up all of these grid points in a single, long line.
Thatâs what flattening does. It takes the entire feature map and reorganizes it into a **single, long vector**.

Flattening concept \| [Source](https://www.superdatascience.com/blogs/convolutional-neural-networks-cnn-step-3-flattening)
> âŁď¸ Although flattening changes the shape of the data, it does not make any changes to the actual information.
> Why do we need flattening layers?
**Integration of features:** By flattening the feature maps into a vector, the network can integrate the spatially distributed features it has extracted for tasks such as classification.
**Compatibility with dense layers:** Fully connected (dense) layers are designed to operate on **1-dimensional data**, so flattening is a necessary step to transition from the multidimensional tensors produced by convolutional layers to the format dense layers require.
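A sketch of flattening in plain Python (the helper name is hypothetical): the values are only reordered, never changed, and the vector length is channels × height × width.

```python
def flatten(feature_maps):
    """feature_maps: list of channels, each a 2D grid -> one 1D list."""
    return [value
            for channel in feature_maps
            for row in channel
            for value in row]

# Two 2x2 channels -> a vector of length 2 * 2 * 2 = 8
fmaps = [[[1, 2], [3, 4]],
         [[5, 6], [7, 8]]]
vector = flatten(fmaps)
print(vector)  # [1, 2, 3, 4, 5, 6, 7, 8]
```

Every value from the feature maps appears exactly once in the vector, which is why flattening loses no information, only spatial layout.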
This leads us to our next question:
> Why do we need Dense Layers in CNNs?
While convolutional layers are good at **detecting features** in input data, dense layers are essential for **integrating these features into predictions**.
For example, if we design a convolutional neural network for **facial recognition**, early layers might detect **edges and textures**, while dense layers might interpret these to **recognize specific facial features**.
> Without dense layers, CNNs would not be able to perform the **high-level tasks** that are often required, such as **classifying images**, **detecting objects**, or **making predictions** based on visual inputs.
### CNN recap
Up to this point, we have revised the whole CNN structure:
- Convolutional Layers
- Pooling layers
- Flattening layers
- Dense layers
With the fundamental concepts of **channels, stride**, and **pooling**.
We could say that we have joined all the pieces of the puzzle\!
Or maybe not… **what about activation functions and backpropagation?**
**Backpropagation** works in CNNs much as it does in feed-forward neural networks, with some special adjustments. I won't focus much on its technical details.
> You can check out this very interesting article to know more about it\!
[Convolutions and Backpropagations Ever since AlexNet won the ImageNet competition in 2012, Convolutional Neural Networks (CNNs) have become ubiquitous⌠pavisj.medium.com](https://pavisj.medium.com/convolutions-and-backpropagations-46026a8f5d2c?source=post_page-----5cc0b5eae175---------------------------------------)
> If you know nothing about Backpropagation you can start by taking a look at my publication:
[Backpropagation From Mystery to Mastery: Decoding the engine behind Neural Networks. medium.com](https://medium.com/@jorgecardete/backpropagation-2eeb25201095?source=post_page-----5cc0b5eae175---------------------------------------)
However, I will certainly take a look at **activation functions**.
### Activation functions in Convolutional Neural Networks
As you may know, activation functions are indispensable; without them, we would just be building a very large linear model.
As in simple neural networks, we also need these **non-linear terms** in ConvNets. However, **not all the layers we have seen have an activation function.**
Let's use an image as a reference to visualize this. Now you should understand the representation without any problem! Just one little thingâŚ
> The first two **pooling layers** are not shown in this diagram; this is just another way of visualizing CNNs. It doesn't mean they are not there, just imagine a **filter between each layer that makes the feature maps smaller**.

Complete CNN representation \| [Source](https://developersbreach.com/convolution-neural-network-deep-learning/)
In the **feature extraction** part, the activations live in the **convolutional layers**. The process is quite straightforward: after each convolution operation, you apply an activation function to the result (the function is applied to the sum of the products, not multiplied by it).

Convolutional layer structure \| [Source](https://learnopencv.com/understanding-convolutional-neural-networks-cnn/)
The **pooling** and **flattening** layers **DON'T have an activation function**.
As we explained before the main function of pooling layers is **dimensionality reduction** and the main purpose of flattening layers is **restructuring the data into a 1D vector**.
We **don't need to include non-linearities** to do that. Nevertheless, we do need activation functions to extract **complex features** (we wouldn't be able to capture the relevant characteristics of an image with only a linear function).
In the **classification part**, all the fully connected layers and the output layer will have an activation function, as in simple neural nets.
Here we also need an activation function because we are using the features extracted to make a classification or a prediction, and the algorithm has to **learn complex interactions** (as a simple neural network would do).
### **Activations – Convolutional and dense layers**
**ReLU:** the most common activation function. It outputs the input directly **if it is positive**; otherwise, it **outputs zero**. It has the benefit of **reducing training time** and mitigating the **vanishing gradient problem**.
**Leaky ReLU:** A variation of ReLU that allows a **small, non-zero gradient** when the unit is inactive, which can help **prevent dead neurons** during training.
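Both functions are simple enough to write down directly (plain Python; the 0.01 slope for Leaky ReLU is a common default, not a fixed rule):

```python
def relu(x):
    # positive inputs pass through unchanged; negatives are zeroed
    return x if x > 0 else 0.0

def leaky_relu(x, alpha=0.01):
    # negatives keep a small non-zero slope, avoiding "dead" neurons
    return x if x > 0 else alpha * x

print(relu(2.5))         # 2.5
print(relu(-3.0))        # 0.0
print(leaky_relu(-3.0))  # -0.03
```

The only difference between the two is how negative inputs are treated, which is exactly what keeps gradients flowing through inactive units with Leaky ReLU.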
[Activation functions: ReLU vs. Leaky ReLU Itâs never too late to board the âLearning and discussing the insightsâ train, and here are my two cents on my recent⌠medium.com](https://medium.com/mlearning-ai/activation-functions-relu-vs-leaky-relu-b8272dc0b1be?source=post_page-----5cc0b5eae175---------------------------------------)
### **Activations – Output layer**
**Sigmoid:** Produces an output in the **range (0, 1)**. It's **not commonly used in hidden layers anymore** due to the vanishing gradient problem, but it's still used for **binary classification** in the output layer.
**Tanh (Hyperbolic Tangent):** Outputs values in the **range (-1, 1)**. It is similar to the sigmoid but can give better training performance for some problems because its output range is zero-centered.
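Using only the standard library, the output ranges are easy to verify (a quick sketch, not tied to any specific framework):

```python
import math

def sigmoid(x):
    # squashes any real input into the open interval (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

print(sigmoid(0.0))    # 0.5, the midpoint of (0, 1)
print(math.tanh(0.0))  # 0.0, the midpoint of (-1, 1)

# Even fairly large inputs stay strictly inside the intervals:
print(0.0 < sigmoid(-5.0) < sigmoid(5.0) < 1.0)       # True
print(-1.0 < math.tanh(-5.0) < math.tanh(5.0) < 1.0)  # True
```

Note that tanh is zero-centered while sigmoid is not, which is the property the paragraph above alludes to.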
[Activation Functions in Neural Networks Sigmoid, tanh, Softmax, ReLU, Leaky ReLU EXPLAINED !!! towardsdatascience.com](https://towardsdatascience.com/activation-functions-neural-networks-1cbd9f8d91d6?source=post_page-----5cc0b5eae175---------------------------------------)
### Bibliography
1. [***Polo Club of Data Science****. (2020). CNN Explainer.*](https://poloclub.github.io/cnn-explainer/)
2. [***IBM****. (2020). Convolutional Neural Networks.*](https://www.ibm.com/topics/convolutional-neural-networks)
3. [***Saha, S.*** *(2018). A Comprehensive Guide to Convolutional Neural Networks â the ELI5 way. Towards Data Science.*](https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53)
4. [***Cireșan, D. C.*** *(2016). Convolutional Neural Networks for Visual Recognition. Springer International Publishing.*](https://doi.org/10.1007/978-3-319-57550-6)
5. [***DeepLearning.TV.*** *(2019). Convolutional Neural Networks (CNNs) explained. \[Video\]. YouTube.*](https://www.youtube.com/watch?v=KuXjwB4LzSA&t=22s)
*Thanks for reading! If you like the article make sure to clap (up to 50!) and follow me on* [*Medium*](https://medium.com/@jorgecardete) *to stay updated with my new articles.*
Also, make sure to follow **my** **new publication\!**
[The Deep Hub Your data science hub. A Medium publication dedicated to exchanging ideas and empowering your knowledge. medium.com](https://medium.com/thedeephub?source=post_page-----5cc0b5eae175---------------------------------------)
[Published in The Deep Hub](https://medium.com/thedeephub?source=post_page---post_publication_info--5cc0b5eae175---------------------------------------)
[Written by Jorgecardete](https://medium.com/@jorgecardete?source=post_page---post_author_info--5cc0b5eae175---------------------------------------)