ℹ️ Skipped - page is already crawled
| Filter | Status | Condition | Details |
|---|---|---|---|
| HTTP status | PASS | download_http_code = 200 | HTTP 200 |
| Age cutoff | PASS | download_stamp > now() - 6 MONTH | 0 months ago (distributed domain, exempt) |
| History drop | PASS | isNull(history_drop_reason) | No drop reason |
| Spam/ban | PASS | fh_dont_index != 1 AND ml_spam_score = 0 | ml_spam_score=0 |
| Canonical | PASS | meta_canonical IS NULL OR = '' OR = src_unparsed | Not set |
| Property | Value |
|---|---|
| URL | https://en.wikipedia.org/wiki/Convolutional_neural_network |
| Last Crawled | 2026-04-06 10:24:25 (12 hours ago) |
| First Indexed | 2014-07-05 06:15:28 (11 years ago) |
| HTTP Status Code | 200 |
| Meta Title | Convolutional neural network - Wikipedia |
| Meta Description | null |
| Meta Canonical | null |
| Boilerpipe Text | A
convolutional neural network
(
CNN
) is a type of
feedforward neural network
that learns
features
via filter (or
kernel
) optimization. This type of
deep learning
network has been applied to process and make
predictions
from many different types of data including text, images and audio.
[
1
]
CNNs are the de-facto standard in deep learning-based approaches to
computer vision
[
2
]
and
image processing
, and have only recently been replaced—in some cases—by newer architectures such as the
transformer
.
Vanishing gradients
and exploding gradients, seen during
backpropagation
in earlier neural networks, are prevented by the
regularization
that comes from using shared weights over fewer connections.
[
3
]
[
4
]
For example, for
each
neuron in the fully-connected layer, 10,000 weights would be required for processing an image sized 100 × 100 pixels. However, applying cascaded
convolution
(or cross-correlation) kernels,
[
5
]
[
6
]
only 25 weights for each convolutional layer are required to process 5x5-sized tiles.
[
7
]
[
8
]
Higher-layer features are extracted from wider context windows, compared to lower-layer features.
Some applications of CNNs include:
image and video recognition
,
[
9
]
recommender systems
,
[
10
]
image classification
,
image segmentation
,
medical image analysis
,
natural language processing
,
[
11
]
brain–computer interfaces
,
[
12
]
and
financial
time series
.
[
13
]
CNNs are also known as
shift invariant
or
space invariant artificial neural networks
, based on the shared-weight architecture of the
convolution
kernels or filters that slide along input features and provide translation-
equivariant
responses known as feature maps.
[
14
]
[
15
]
Counter-intuitively, most convolutional neural networks are not
invariant to translation
, due to the downsampling operation they apply to the input.
[
16
]
Feedforward neural networks
are usually fully connected networks, that is, each neuron in one
layer
is connected to all neurons in the next
layer
. The "full connectivity" of these networks makes them prone to
overfitting
data. Typical ways of regularization, or preventing overfitting, include penalizing parameters during training (such as weight decay) or trimming connectivity (skipped connections, dropout, etc.). Robust datasets also increase the probability that CNNs will learn the generalized principles that characterize a given dataset rather than the biases of a poorly populated set.
[
17
]
Convolutional networks were
inspired
by
biological
processes
[
18
]
[
19
]
[
20
]
[
21
]
in that the connectivity pattern between
neurons
resembles the organization of the animal
visual cortex
. Individual
cortical neurons
respond to stimuli only in a restricted region of the
visual field
known as the
receptive field
. The receptive fields of different neurons partially overlap such that they cover the entire visual field.
CNNs use relatively little pre-processing compared to other
image classification algorithms
. This means that the network learns to optimize the filters (or kernels) through automated learning, whereas in traditional algorithms these filters are
hand-engineered
. This simplifies and automates the process, improving efficiency and scalability by removing human-intervention bottlenecks.
Comparison of the
LeNet
(1995) and
AlexNet
(2012) convolution, pooling and dense layers
A convolutional neural network consists of an input layer,
hidden layers
and an output layer. In a convolutional neural network, the hidden layers include one or more layers that perform convolutions. Typically this includes a layer that performs a
dot product
of the convolution kernel with the layer's input matrix. This product is usually the
Frobenius inner product
, and its activation function is commonly
ReLU
. As the convolution kernel slides along the input matrix for the layer, the convolution operation generates a feature map, which in turn contributes to the input of the next layer. This is followed by other layers such as
pooling layers
, fully connected layers, and normalization layers.
A convolutional neural network is also closely related to a
matched filter
.
[
22
]
Convolutional layers
In a CNN, the input is a
tensor
with shape:
(number of inputs) × (input height) × (input width) × (input
channels
)
After passing through a convolutional layer, the image becomes abstracted to a feature map, also called an activation map, with shape:
(number of inputs) × (feature map height) × (feature map width) × (feature map
channels
).
Convolutional layers convolve the input and pass its result to the next layer. This is similar to the response of a neuron in the visual cortex to a specific stimulus.
[
23
]
Each convolutional neuron processes data only for its
receptive field
.
1D convolutional neural network feed forward example
Although
fully connected feedforward neural networks
can be used to learn features and classify data, this architecture is generally impractical for larger inputs (e.g., high-resolution images), which would require massive numbers of neurons because each pixel is a relevant input feature. A fully connected layer for an image of size 100 × 100 has 10,000 weights for
each
neuron in the second layer. Convolution reduces the number of free parameters, allowing the network to be deeper.
[
7
]
For example, using a 5 × 5 tiling region, with the same shared weights at every position, requires only 25 weights. Using shared weights means there are many fewer parameters, which helps avoid the vanishing gradients and exploding gradients problems seen during
backpropagation
in earlier neural networks.
[
3
]
[
4
]
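The parameter counts quoted above can be checked with a few lines of arithmetic. The following is only an illustrative sketch: the function names are hypothetical, the image is assumed to be single-channel, and bias terms are left out.

```python
def dense_weights_per_neuron(height, width, channels=1):
    """Weights one fully connected neuron needs to see every input value."""
    return height * width * channels


def shared_kernel_weights(kernel_height, kernel_width, channels=1):
    """Weights in one convolution kernel that is reused at every position."""
    return kernel_height * kernel_width * channels


dense = dense_weights_per_neuron(100, 100)  # 100 x 100 single-channel image
shared = shared_kernel_weights(5, 5)        # one shared 5 x 5 kernel

print(dense)            # 10000
print(shared)           # 25
print(dense // shared)  # 400
```

Under these assumptions a single shared kernel needs 400 times fewer weights than one fully connected neuron, independent of how many positions the kernel is applied to.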
To speed processing, standard convolutional layers can be replaced by depthwise separable convolutional layers,
[
24
]
which are based on a depthwise convolution followed by a pointwise convolution. The
depthwise convolution
is a spatial convolution applied independently over each channel of the input tensor, while the
pointwise convolution
is a standard convolution restricted to the use of 1 × 1 kernels.
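To illustrate why the depthwise separable form is cheaper, the sketch below compares weight counts for a standard layer and for a depthwise convolution followed by a 1 × 1 pointwise convolution. The layer sizes (3 × 3 kernels, 64 input channels, 128 output channels) are assumed purely for illustration, and biases are ignored.

```python
def standard_conv_params(kernel_size, in_channels, out_channels):
    """Every output channel has its own kernel spanning all input channels."""
    return kernel_size * kernel_size * in_channels * out_channels


def depthwise_separable_params(kernel_size, in_channels, out_channels):
    """One spatial kernel per input channel, then a 1 x 1 channel-mixing step."""
    depthwise = kernel_size * kernel_size * in_channels
    pointwise = 1 * 1 * in_channels * out_channels
    return depthwise + pointwise


# Hypothetical layer sizes chosen only for the example.
standard = standard_conv_params(3, 64, 128)
separable = depthwise_separable_params(3, 64, 128)

print(standard)   # 73728
print(separable)  # 8768
```

For these sizes the separable layer uses roughly one eighth of the weights; the exact ratio depends on the kernel size and channel counts.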
Convolutional networks may include local and/or global pooling layers along with traditional convolutional layers. Pooling layers reduce the dimensions of data by combining the outputs of neuron clusters at one layer into a single neuron in the next layer. Local pooling combines small clusters; tiling sizes such as 2 × 2 are commonly used. Global pooling acts on all the neurons of the feature map.
[
25
]
[
26
]
There are two common types of pooling in popular use: max and average.
Max pooling
uses the maximum value of each local cluster of neurons in the feature map,
[
27
]
[
28
]
while
average pooling
takes the average value.
Fully connected layers
Fully connected layers connect every neuron in one layer to every neuron in another layer. This arrangement is the same as in a traditional
multilayer perceptron
neural network (MLP). Each neuron in the fully connected layer receives input from all the neurons in the previous layer. These inputs are weighted and summed with the corresponding biases, and then passed through an activation function to perform a nonlinear transformation, generating the output. The flattened matrix goes through a fully connected layer to classify the images.
In neural networks, each neuron receives input from some number of locations in the previous layer. In a convolutional layer, each neuron receives input from only a restricted area of the previous layer called the neuron's
receptive field
. Typically the area is a square (e.g. 5 by 5 neurons), whereas in a fully connected layer the receptive field is the
entire previous layer
. Thus, in each convolutional layer, each neuron takes input from a larger area in the input than previous layers. This is due to applying the convolution over and over, which takes the value of a pixel into account, as well as its surrounding pixels. When using dilated layers, the number of pixels in the receptive field remains constant, but the field is more sparsely populated as its dimensions grow when combining the effect of several layers.
To manipulate the receptive field size as desired, there are some alternatives to the standard convolutional layer. For example, atrous or dilated convolution
[
29
]
[
30
]
expands the receptive field size without increasing the number of parameters by interleaving visible and blind regions. Moreover, a single dilated convolutional layer can comprise filters with multiple dilation ratios,
[
31
]
thus having a variable receptive field size.
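The effect of dilation on the receptive field can be sketched numerically. The example below assumes stride-1 layers and measures the field along one axis; the helper names are hypothetical. A kernel of size K with dilation d spans d·(K − 1) + 1 input positions while still holding only K weights.

```python
def effective_kernel_size(kernel_size, dilation):
    """Input span covered by a dilated kernel; the weight count is unchanged."""
    return dilation * (kernel_size - 1) + 1


def stacked_receptive_field(kernel_size, dilations):
    """Receptive field along one axis of stride-1 layers applied in sequence."""
    field = 1
    for dilation in dilations:
        field += effective_kernel_size(kernel_size, dilation) - 1
    return field


print(effective_kernel_size(3, 1))             # 3
print(effective_kernel_size(3, 4))             # 9
print(stacked_receptive_field(3, [1, 1, 1]))   # 7
print(stacked_receptive_field(3, [1, 2, 4]))   # 15
```

Three ordinary 3-wide layers see 7 input positions, while the same three layers with dilations 1, 2 and 4 see 15, with the same number of parameters in both cases.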
Each neuron in a neural network computes an output value by applying a specific function to the input values received from the receptive field in the previous layer. The function that is applied to the input values is determined by a vector of weights and a bias (typically real numbers). Learning consists of iteratively adjusting these biases and weights.
The vectors of weights and biases are called
filters
and represent particular
features
of the input (e.g., a particular shape). A distinguishing feature of CNNs is that many neurons can share the same filter. This reduces the
memory footprint
because a single bias and a single vector of weights are used across all receptive fields that share that filter, as opposed to each receptive field having its own bias and vector weighting.
[
32
]
A deconvolutional neural network is essentially the reverse of a CNN. It consists of deconvolutional layers and unpooling layers.
[
33
]
A deconvolutional layer is the transpose of a convolutional layer. Specifically, a convolutional layer can be written as a multiplication with a matrix, and a deconvolutional layer is multiplication with the transpose of that matrix.
[
34
]
An unpooling layer expands the layer. The max-unpooling layer is the simplest, as it simply copies each entry multiple times. For example, a 2-by-2 max-unpooling layer is $[x] \mapsto \begin{bmatrix} x & x \\ x & x \end{bmatrix}$.
Deconvolution layers are used in image generators. By default, they create periodic checkerboard artifacts, which can be fixed by upscale-then-convolve.
[
35
]
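The matrix view of a deconvolutional (transposed convolutional) layer can be made concrete in one dimension. The sketch below assumes a stride-1 layer without padding, implemented as cross-correlation; the helper names and the example kernel are hypothetical.

```python
def conv_matrix(kernel, input_length):
    """Matrix whose product with an input vector equals a stride-1,
    unpadded 1-D cross-correlation with `kernel`."""
    rows = input_length - len(kernel) + 1
    matrix = [[0] * input_length for _ in range(rows)]
    for i in range(rows):
        for j, weight in enumerate(kernel):
            matrix[i][i + j] = weight
    return matrix


def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def transpose(matrix):
    return [list(column) for column in zip(*matrix)]


kernel = [1, 2, 3]
M = conv_matrix(kernel, 5)            # 3 x 5 matrix

y = matvec(M, [1, 0, 0, 0, 1])        # convolutional layer: length 5 -> 3
z = matvec(transpose(M), [1, 0, 0])   # transposed layer: length 3 -> 5

print(y)  # [1, 0, 3]
print(z)  # [1, 2, 3, 0, 0]
```

The transposed layer maps a short vector back to the longer input size, and a single nonzero activation writes a copy of the kernel into the output.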
CNNs are often compared to the way the brain achieves vision processing in living
organisms
.
[
36
]
Receptive fields in the visual cortex
Work by
Hubel
and
Wiesel
in the 1950s and 1960s showed that cat
visual cortices
contain neurons that individually respond to small regions of the
visual field
. Provided the eyes are not moving, the region of visual space within which visual stimuli affect the firing of a single neuron is known as its
receptive field
.
[
37
]
Neighboring cells have similar and overlapping receptive fields. Receptive field size and location vary systematically across the cortex to form a complete map of visual space.
[
citation needed
]
The cortex in each hemisphere represents the contralateral
visual field
.
[
citation needed
]
Their 1968 paper identified two basic visual cell types in the brain:
[
19
]
simple cells
, whose output is maximized by straight edges having particular orientations within their receptive field
complex cells
, which have larger
receptive fields
, whose output is insensitive to the exact position of the edges in the field.
Hubel and Wiesel also proposed a cascading model of these two types of cells for use in pattern recognition tasks.
[
38
]
[
37
]
Fukushima's analog threshold elements in a vision model
In 1969,
Kunihiko Fukushima
introduced a multilayer visual feature detection network, inspired by the above-mentioned work of Hubel and Wiesel, in which "All the elements in one layer have the same set of interconnecting coefficients; the arrangement of the elements and their interconnections are all homogeneous over a given layer." This is the essential core of a convolutional network, but the weights were not trained. In the same paper, Fukushima also introduced the
ReLU
(rectified linear unit)
activation function
.
[
39
]
[
40
]
Neocognitron, origin of the trainable CNN architecture
The "
neocognitron
"
[
18
]
was introduced by Fukushima in 1980.
[
20
]
[
28
]
[
1
]
The neocognitron introduced the two basic types of layers:
"S-layer": a shared-weights receptive-field layer, later known as a convolutional layer, which contains units whose receptive fields cover a patch of the previous layer. A shared-weights receptive-field group (a "plane" in neocognitron terminology) is often called a filter, and a layer typically has several such filters.
"C-layer": a downsampling layer that contains units whose receptive fields cover patches of previous convolutional layers. Such a unit typically computes a weighted average of the activations of the units in its patch, and applies inhibition (divisive normalization) pooled from a somewhat larger patch and across different filters in a layer, and applies a saturating activation function. The patch weights are nonnegative and are not trainable in the original neocognitron. The downsampling and competitive inhibition help to classify features and objects in visual scenes even when the objects are shifted.
Several
supervised
and
unsupervised learning
algorithms have been proposed over the decades to train the weights of a neocognitron.
[
18
]
Today, however, the CNN architecture is usually trained through
backpropagation
.
Fukushima's ReLU activation function was not used in his neocognitron since all the weights were nonnegative; lateral inhibition was used instead. The rectifier has become a very popular activation function for CNNs and
deep neural networks
in general.
[
41
]
Convolution in time
The term "convolution" first appears in neural networks in a paper by Toshiteru Homma, Les Atlas, and Robert Marks II at the first
Conference on Neural Information Processing Systems
in 1987. Their paper replaced multiplication with convolution in time, inherently providing shift invariance, motivated by and connecting more directly to the
signal-processing concept of a filter
, and demonstrated it on a speech recognition task.
[
8
]
They also pointed out that as a data-trainable system, convolution is essentially equivalent to correlation since reversal of the weights does not affect the final learned function ("For convenience, we denote * as correlation instead of convolution. Note that convolving a(t) with b(t) is equivalent to correlating a(-t) with b(t).").
[
8
]
Modern CNN implementations typically do correlation and call it convolution, for convenience, as they did here.
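The equivalence between convolution and correlation with a reversed kernel can be checked directly. The following sketch uses unpadded ("valid") 1-D operations on plain lists; the function names and sample values are illustrative assumptions.

```python
def correlate_valid(signal, kernel):
    """Slide the kernel over the signal without flipping it."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def convolve_valid(signal, kernel):
    """True convolution: the kernel is reversed before sliding."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[k - 1 - j] for j in range(k))
            for i in range(len(signal) - k + 1)]


signal = [1, 2, 3, 4, 5]
kernel = [1, 2, 0]

corr = correlate_valid(signal, kernel)
conv = convolve_valid(signal, kernel)

print(corr)  # [5, 8, 11]
print(conv)  # [7, 10, 13]
print(conv == correlate_valid(signal, kernel[::-1]))  # True
```

Because the kernel weights are learned, a network trained with correlation simply learns the reversed kernel, so the two operations give the same family of functions.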
Time delay neural networks
The
time delay neural network
(TDNN) was introduced in 1987 by
Alex Waibel
et al. for phoneme recognition and was an early convolutional network exhibiting shift-invariance.
[
42
]
A TDNN is a 1-D convolutional neural net where the convolution is performed along the time axis of the data. It is the first CNN utilizing weight sharing in combination with training by gradient descent, using
backpropagation
.
[
43
]
Thus, while also using a pyramidal structure as in the neocognitron, it performed a global optimization of the weights instead of a local one.
[
42
]
TDNNs are convolutional networks that share weights along the temporal dimension.
[
44
]
They allow speech signals to be processed time-invariantly. In 1990 Hampshire and Waibel introduced a variant that performs a two-dimensional convolution.
[
45
]
Since these TDNNs operated on spectrograms, the resulting phoneme recognition system was invariant to both time and frequency shifts, as with images processed by a neocognitron.
TDNNs improved the performance of far-distance speech recognition.
[
46
]
Image recognition with CNNs trained by gradient descent
Denker et al. (1989) designed a 2-D CNN system to recognize hand-written
ZIP Code
numbers.
[
47
]
However, the lack of an efficient training method to determine the
kernel
coefficients of the involved convolutions meant that all the coefficients had to be laboriously hand-designed.
[
48
]
Following the advances in the training of 1-D CNNs by Waibel et al. (1987),
Yann LeCun
et al. (1989)
[
48
]
used back-propagation to learn the convolution kernel coefficients directly from images of hand-written numbers. Learning was thus fully automatic, performed better than manual coefficient design, and was suited to a broader range of image recognition problems and image types.
Wei Zhang et al. (1988)
[
14
]
[
15
]
used back-propagation to train the convolution kernels of a CNN for alphabet recognition. The model was called a shift-invariant pattern recognition neural network before the name CNN was coined in the early 1990s. Wei Zhang et al. also applied the same CNN without the last fully connected layer for medical image object segmentation (1991)
[
49
]
and breast cancer detection in mammograms (1994).
[
50
]
This approach became a foundation of modern
computer vision
.
In 1990 Yamaguchi et al. introduced the concept of max pooling, a fixed filtering operation that calculates and propagates the maximum value of a given region. They did so by combining TDNNs with max pooling to realize a speaker-independent isolated word recognition system.
[
27
]
In their system they used several TDNNs per word, one for each
syllable
. The results of each TDNN over the input signal were combined using max pooling and the outputs of the pooling layers were then passed on to networks performing the actual word classification.
In a variant of the neocognitron called the
cresceptron
, instead of using Fukushima's spatial averaging with inhibition and saturation, J. Weng et al. in 1993 used max pooling, where a downsampling unit computes the maximum of the activations of the units in its patch,
[
51
]
introducing this method into the vision field.
Max pooling is often used in modern CNNs.
[
52
]
LeNet-5, a pioneering 7-level convolutional network by
LeCun
et al. in 1995,
[
53
]
classifies hand-written numbers on
checks
digitized in 32×32 pixel images. The ability to process higher-resolution images requires larger and more layers of convolutional neural networks, so this technique is constrained by the availability of computing resources.
It was superior to other commercial courtesy amount reading systems (as of 1995). The system was integrated in
NCR
's check reading systems, and fielded in several American banks since June 1996, reading millions of checks per day.
[
54
]
Shift-invariant neural network
A shift-invariant neural network was proposed by Wei Zhang et al. for image character recognition in 1988.
[
14
]
[
15
]
It is a modified neocognitron that keeps only the convolutional interconnections between the image feature layers and the last fully connected layer. The model was trained with back-propagation. The training algorithm was further improved in 1991
[
55
]
to improve its generalization ability. The model architecture was modified by removing the last fully connected layer and applied to medical image segmentation (1991)
[
49
]
and automatic detection of breast cancer in
mammograms (1994)
.
[
50
]
A different convolution-based design was proposed in 1988
[
56
]
for application to decomposition of one-dimensional
electromyography
convolved signals via de-convolution. This design was modified in 1989 to other de-convolution-based designs.
[
57
]
[
58
]
GPU implementations
Although CNNs were invented in the 1980s, their breakthrough in the 2000s required fast implementations on
graphics processing units
(GPUs).
In 2004, it was shown by K. S. Oh and K. Jung that standard neural networks can be greatly accelerated on GPUs. Their implementation was 20 times faster than an equivalent implementation on
CPU
.
[
59
]
In 2005, another paper also emphasised the value of
GPGPU
for
machine learning
.
[
60
]
The first GPU-implementation of a CNN was described in 2006 by K. Chellapilla et al. Their implementation was 4 times faster than an equivalent implementation on CPU.
[
61
]
In the same period, GPUs were also used for unsupervised training of
deep belief networks
.
[
62
]
[
63
]
[
64
]
[
65
]
In 2010, Dan Ciresan et al. at
IDSIA
trained deep feedforward networks on GPUs.
[
66
]
In 2011, they extended this to CNNs, achieving a speedup by a factor of 60 compared to CPU training.
[
25
]
In 2011, the network won an image recognition contest where it achieved superhuman performance for the first time.
[
67
]
Then they won more competitions and achieved state of the art on several benchmarks.
[
68
]
[
52
]
[
28
]
Subsequently,
AlexNet
, a similar GPU-based CNN by Alex Krizhevsky et al. won the
ImageNet Large Scale Visual Recognition Challenge
2012.
[
69
]
It was an early catalytic event for the
AI boom
.
Compared to the training of CNNs using
GPUs
, not much attention has been given to CPUs. Viebke et al. (2019) parallelize CNNs through thread- and
SIMD
-level parallelism that is available on the
Intel Xeon Phi
.
[
70
]
[
71
]
Distinguishing features
In the past, traditional
multilayer perceptron
(MLP) models were used for image recognition.
[
example needed
]
However, the full connectivity between nodes caused the
curse of dimensionality
, and was computationally intractable with higher-resolution images. A 1000×1000-pixel image with
RGB color
channels has 3 million weights per fully-connected neuron, which is too high to feasibly process efficiently at scale.
CNN layers arranged in 3 dimensions
For example, in
CIFAR-10
, images are only of size 32×32×3 (32 wide, 32 high, 3 color channels), so a single fully connected neuron in the first hidden layer of a regular neural network would have 32*32*3 = 3,072 weights. A 200×200 image, however, would lead to neurons that have 200*200*3 = 120,000 weights.
Also, such network architecture does not take into account the spatial structure of data, treating input pixels which are far apart in the same way as pixels that are close together. This ignores
locality of reference
in data with a grid-topology (such as images), both computationally and semantically. Thus, full connectivity of neurons is wasteful for purposes such as image recognition that are dominated by
spatially local
input patterns.
Convolutional neural networks are variants of multilayer perceptrons, designed to emulate the behavior of a
visual cortex
. These models mitigate the challenges posed by the MLP architecture by exploiting the strong spatially local correlation present in natural images. As opposed to MLPs, CNNs have the following distinguishing features:
3D volumes of neurons. The layers of a CNN have neurons arranged in
3 dimensions
: width, height and depth.
[
72
]
Each neuron inside a convolutional layer is connected to only a small region of the layer before it, called a receptive field. Distinct types of layers, both locally and completely connected, are stacked to form a CNN architecture.
Local connectivity: following the concept of receptive fields, CNNs exploit spatial locality by enforcing a local connectivity pattern between neurons of adjacent layers. The architecture thus ensures that the learned "filters" produce the strongest response to a spatially local input pattern. Stacking many such layers leads to nonlinear filters that become increasingly global (i.e. responsive to a larger region of pixel space) so that the network first creates representations of small parts of the input, then from them assembles representations of larger areas.
Shared weights: In CNNs, each filter is replicated across the entire visual field. These replicated units share the same parameterization (weight vector and bias) and form a feature map. This means that all the neurons in a given convolutional layer respond to the same feature within their specific response field. Replicating units in this way allows for the resulting activation map to be
equivariant
under shifts of the locations of input features in the visual field, i.e. they grant translational
equivariance
—given that the layer has a stride of one.
[
73
]
Pooling: In a CNN's
pooling layers
, feature maps are divided into rectangular sub-regions, and the features in each rectangle are independently down-sampled to a single value, commonly by taking their average or maximum value. In addition to reducing the sizes of feature maps, the pooling operation grants a degree of local
translational invariance
to the features contained therein, allowing the CNN to be more robust to variations in their positions.
[
16
]
Together, these properties allow CNNs to achieve better generalization on
vision problems
. Weight sharing dramatically reduces the number of
free parameters
learned, thus lowering the memory requirements for running the network and allowing the training of larger, more powerful networks.
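The translational equivariance granted by shared weights, and its dependence on a stride of one, can be seen in a small 1-D experiment. This is a sketch under stated assumptions: the signal, the kernel and the helper names are made up for illustration, and the signal is zero at its ends so that shifting does not push content off the edge.

```python
def correlate_valid(signal, kernel):
    """Stride-1, unpadded 1-D cross-correlation with one shared kernel."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def shift_right(values, steps):
    """Shift a list right by `steps`, filling with zeros on the left."""
    return [0] * steps + values[:len(values) - steps]


signal = [0, 0, 1, 3, 2, 0, 0, 0]
kernel = [1, -1]

out = correlate_valid(signal, kernel)
out_of_shifted = correlate_valid(shift_right(signal, 2), kernel)

# Stride 1: shifting the input shifts the feature map by the same amount.
print(out_of_shifted == shift_right(out, 2))  # True

# Stride 2 (keep every second output): a one-step input shift now changes
# the values themselves instead of merely moving them.
strided = out[::2]
strided_of_shifted = correlate_valid(shift_right(signal, 1), kernel)[::2]
print(strided)             # [0, -2, 2, 0]
print(strided_of_shifted)  # [0, -1, 1, 0]
```

With stride 1 the feature map moves together with the input. After subsampling, the two outputs are no longer shifted copies of one another.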
A CNN architecture is formed by a stack of distinct layers that transform the input volume into an output volume (e.g. holding the class scores) through a differentiable function. A few distinct types of layers are commonly used. These are further discussed below.
Neurons of a convolutional layer (blue), connected to their receptive field (red)
Convolutional layer
A worked example of performing a convolution. The convolution has stride 1, zero-padding, with kernel size 3-by-3. The convolution kernel is a
discrete Laplacian operator
.
The convolutional layer is the core building block of a CNN. The layer's parameters consist of a set of learnable filters (or
kernels
), which have a small receptive field, but extend through the full depth of the input volume. During the forward pass, each filter is
convolved
across the width and height of the input volume, computing the
dot product
between the filter entries and the input, producing a 2-dimensional
activation map
of that filter. As a result, the network learns filters that activate when it detects some specific type of
feature
at some spatial position in the input.
[
74
]
[
nb 1
]
Stacking the activation maps for all filters along the depth dimension forms the full output volume of the convolution layer. Every entry in the output volume can thus also be interpreted as an output of a neuron that looks at a small region in the input. Each entry in an activation map uses the same set of parameters that define the filter.
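A minimal sketch of the forward pass for a single filter on a single-channel input is given below, using stride 1 and zero padding so that the activation map has the same size as the input. The 3-by-3 kernel is one common form of the discrete Laplacian, chosen here as an assumption; the function name and the test image are likewise hypothetical. As in most CNN libraries, the kernel is not flipped (cross-correlation).

```python
def conv2d_same(image, kernel):
    """Stride-1 cross-correlation with zero padding on a 2-D list.
    Assumes a square kernel with odd side length."""
    height, width = len(image), len(image[0])
    k = len(kernel)
    pad = k // 2
    output = [[0] * width for _ in range(height)]
    for row in range(height):
        for col in range(width):
            total = 0
            for i in range(k):
                for j in range(k):
                    r, c = row + i - pad, col + j - pad
                    # Positions outside the image count as zeros.
                    if 0 <= r < height and 0 <= c < width:
                        total += image[r][c] * kernel[i][j]
            output[row][col] = total
    return output


laplacian = [[0, 1, 0],
             [1, -4, 1],
             [0, 1, 0]]

image = [[0, 0, 0],
         [0, 1, 0],
         [0, 0, 0]]

feature_map = conv2d_same(image, laplacian)
for row in feature_map:
    print(row)
# [0, 1, 0]
# [1, -4, 1]
# [0, 1, 0]
```

Every output entry is computed with the same nine weights, which is the parameter sharing described above. A full convolutional layer would repeat this for each filter and sum over the input channels.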
Self-supervised learning
has been adapted for use in convolutional layers by using sparse patches with a high-mask ratio and a global response normalization layer.
[
citation needed
]
Typical CNN architecture
When dealing with high-dimensional inputs such as images, it is impractical to connect neurons to all neurons in the previous volume because such a network architecture does not take the spatial structure of the data into account. Convolutional networks exploit spatially local correlation by enforcing a
sparse local connectivity
pattern between neurons of adjacent layers: each neuron is connected to only a small region of the input volume.
The extent of this connectivity is a
hyperparameter
called the
receptive field
of the neuron. The connections are
local in space
(along width and height), but always extend along the entire depth of the input volume. Such an architecture ensures that the learned filters produce the strongest response to a spatially local input pattern.
[
75
]
Spatial arrangement
Three
hyperparameters
control the size of the output volume of the convolutional layer: the depth,
stride
, and padding size:
The
depth
of the output volume controls the number of neurons in a layer that connect to the same region of the input volume. These neurons learn to activate for different features in the input. For example, if the first convolutional layer takes the raw image as input, then different neurons along the depth dimension may activate in the presence of various oriented edges, or blobs of color.
Stride
controls how depth columns around the width and height are allocated. If the stride is 1, then we move the filters one pixel at a time. This leads to heavily
overlapping
receptive fields between the columns, and to large output volumes. For any integer $S > 0$, a stride $S$ means that the filter is translated $S$ units at a time per output. In practice, $S \geq 3$ is rare. A greater stride means smaller overlap of receptive fields and smaller spatial dimensions of the output volume.
[
76
]
Sometimes, it is convenient to pad the input with zeros (or other values, such as the average of the region) on the border of the input volume. The size of this padding is a third hyperparameter. Padding provides control of the output volume's spatial size. In particular, sometimes it is desirable to exactly preserve the spatial size of the input volume; this is commonly referred to as "same" padding.
Three example padding conditions. Replication condition means that the pixel outside is padded with the closest pixel inside. The reflection padding is where the pixel outside is padded with the pixel inside, reflected across the boundary of the image. The circular padding is where the pixel outside wraps around to the other side of the image.
The spatial size of the output volume is a function of the input volume size $W$, the kernel field size $K$ of the convolutional layer neurons, the stride $S$, and the amount of zero padding $P$ on the border. The number of neurons that "fit" in a given volume is then:
$$\frac{W - K + 2P}{S} + 1$$
If this number is not an
integer
, then the strides are incorrect and the neurons cannot be tiled to fit across the input volume in a
symmetric
way. In general, setting zero padding to be $P = (K - 1)/2$ for kernel size $K$ when the stride is $S = 1$ ensures that the input volume and output volume will have the same size spatially. However, it is not always completely necessary to use all of the neurons of the previous layer. For example, a neural network designer may decide to use just a portion of padding.
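The output-size rule can be written as a small helper. With input size W, kernel size K, stride S and zero padding P, the number of positions along one axis is (W − K + 2P)/S + 1. The sketch below is illustrative only: the function name is hypothetical, and it raises an error when the result is not an integer, following the text, whereas many libraries round down instead.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Number of kernel positions along one axis: (W - K + 2P) / S + 1."""
    span = input_size - kernel_size + 2 * padding
    if span < 0 or span % stride != 0:
        raise ValueError("kernel does not tile the padded input evenly")
    return span // stride + 1


# Padding of (K - 1) / 2 with stride 1 preserves the spatial size.
print(conv_output_size(32, 5, stride=1, padding=2))  # 32

# A larger stride shrinks the output.
print(conv_output_size(7, 3, stride=2))              # 3
```

A call such as `conv_output_size(10, 3, stride=2)` raises `ValueError`, since (10 − 3)/2 + 1 is not an integer.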
A parameter sharing scheme is used in convolutional layers to control the number of free parameters. It relies on the assumption that if a patch feature is useful to compute at some spatial position, then it should also be useful to compute at other positions. Denoting a single 2-dimensional slice of depth as a
depth slice
, the neurons in each depth slice are constrained to use the same weights and bias.
Since all neurons in a single depth slice share the same parameters, the forward pass in each depth slice of the convolutional layer can be computed as a
convolution
of the neuron's weights with the input volume.
[
nb 2
]
Therefore, it is common to refer to the sets of weights as a filter (or a
kernel
), which is convolved with the input. The result of this convolution is an
activation map
, and the set of activation maps for each different filter are stacked together along the depth dimension to produce the output volume. Parameter sharing contributes to the
translation invariance
of the CNN architecture.
[
16
]
Sometimes, the parameter sharing assumption may not make sense. This is especially the case when the input images to a CNN have some specific centered structure; for which we expect completely different features to be learned on different spatial locations. One practical example is when the inputs are faces that have been centered in the image: we might expect different eye-specific or hair-specific features to be learned in different parts of the image. In that case it is common to relax the parameter sharing scheme, and instead simply call the layer a "locally connected layer". In this layer, the convolutional kernels' parameters are not shared. Instead, the network learns independent weights and biases for each spatial location. This allows each location to have its own feature-learning ability, making it better suited to handle images with distinct central structures or irregular features.
Max pooling with a 2x2 filter and stride = 2
Another important concept of CNNs is pooling, which is used as a form of non-linear
down-sampling
. Pooling provides downsampling because it reduces the spatial dimensions (height and width) of the input feature maps while retaining the most important information. There are several non-linear functions to implement pooling, where
max pooling
and
average pooling
are the most common. Pooling aggregates information from small regions of the input creating
partitions
of the input feature map, typically using a fixed-size window (like 2x2) and applying a stride (often 2) to move the window across the input.
[
77
]
Note that without using a stride greater than 1, pooling would not perform downsampling, as it would simply move the pooling window across the input one step at a time, without reducing the size of the feature map. In other words, the stride is what actually causes the downsampling by determining how much the pooling window moves over the input.
Intuitively, the exact location of a feature is less important than its rough location relative to other features. This is the idea behind the use of pooling in convolutional neural networks. The pooling layer serves to progressively reduce the spatial size of the representation, to reduce the number of parameters,
memory footprint
and amount of computation in the network, and hence to also control
overfitting
. This is known as down-sampling. It is common to periodically insert a pooling layer between successive convolutional layers (each one typically followed by an activation function, such as a
ReLU layer
) in a CNN architecture.
[
74
]
: 460–461
While pooling layers contribute to local translation invariance, they do not provide global translation invariance in a CNN, unless a form of global pooling is used.
[
16
]
[
73
]
The pooling layer commonly operates independently on every depth slice of the input and resizes it spatially. A very common form of max pooling is a layer with filters of size 2×2, applied with a stride of 2, which subsamples every depth slice in the input by 2 along both width and height, discarding 75% of the activations: $f_{X,Y}(S) = \max_{a,b \in \{0,1\}} S_{2X+a,\,2Y+b}$.
In this case, every
max operation
is over 4 numbers. The depth dimension remains unchanged (this is true for other forms of pooling as well).
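A worked example of 2×2 max pooling with stride 2 is sketched below in NumPy. The function name `max_pool_2x2` and the input values are illustrative, and the sketch assumes that the height and width are even.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on a (C, H, W) volume; H and W must be even."""
    c, h, w = x.shape
    # Split each spatial axis into (blocks, 2) and take the max within each block.
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

x = np.array([[[1, 3, 2, 1],
               [4, 6, 5, 7],
               [8, 2, 0, 1],
               [3, 4, 9, 2]]])
pooled = max_pool_2x2(x)
print(pooled.tolist())  # [[[6, 7], [8, 9]]]
```

Each output value is the maximum of four inputs, so 12 of the 16 activations are discarded while the depth (here 1) is unchanged.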
In addition to max pooling, pooling units can use other functions, such as
average
pooling or
ℓ2-norm
pooling. Average pooling was often used historically but has recently fallen out of favor compared to max pooling, which generally performs better in practice.
[
78
]
Due to the effects of fast spatial reduction of the size of the representation,
there is a recent trend towards using smaller filters
[
79
]
or discarding pooling layers altogether.
[
80
]
RoI pooling to size 2x2. In this example region proposal (an input parameter) has size 7x5.
Channel max pooling
A channel max pooling (CMP) layer applies the max pooling (MP) operation along the channel axis, across corresponding positions of consecutive feature maps, in order to eliminate redundant information. CMP gathers the significant features into fewer channels, which is important for fine-grained image classification, where more discriminative features are needed. Another advantage of CMP is that it reduces the number of channels of the feature maps before they are connected to the first fully connected (FC) layer. As with the MP operation, the input and output feature maps of a CMP layer are denoted $\mathbf{F} \in \mathbb{R}^{C \times M \times N}$ and $\mathbf{C} \in \mathbb{R}^{c \times M \times N}$, respectively, where $C$ and $c$ are the channel numbers of the input and output feature maps, and $M$ and $N$ are the width and height of the feature maps. The CMP operation changes only the channel number of the feature maps; unlike the MP operation, it leaves their width and height unchanged.
[
81
]
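A minimal NumPy sketch of channel max pooling follows. It assumes non-overlapping groups of consecutive channels; the text does not fix the window or stride along the channel axis, so the `group` parameter and the function name are assumptions of this example.

```python
import numpy as np

def channel_max_pool(f, group):
    """Max over non-overlapping groups of `group` consecutive channels.

    f: feature maps of shape (C, M, N), with C divisible by `group`.
    Returns an array of shape (C // group, M, N); the spatial size is unchanged.
    """
    c, m, n = f.shape
    return f.reshape(c // group, group, m, n).max(axis=1)

f = np.arange(4 * 2 * 2).reshape(4, 2, 2)
out = channel_max_pool(f, group=2)
print(out.shape)  # (2, 2, 2)
```

Only the channel count changes (from 4 to 2 here); the 2×2 spatial size is preserved, in contrast to spatial max pooling.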
See
[
82
]
[
83
]
for reviews for pooling methods.
ReLU is the abbreviation of
rectified linear unit
. It was proposed by
Alston Householder
in 1941,
[
84
]
and used in CNN by
Kunihiko Fukushima
in 1969.
[
39
]
ReLU applies the non-saturating
activation function
$f(x) = \max(0, x)$.
[
69
]
It effectively removes negative values from an activation map by setting them to zero.
[
85
]
It introduces
nonlinearity
to the
decision function
and in the overall network without affecting the receptive fields of the convolution layers.
In 2011, Xavier Glorot, Antoine Bordes and
Yoshua Bengio
found that ReLU enables better training of deeper networks,
[
86
]
compared to widely used activation functions prior to 2011.
Other functions can also be used to increase nonlinearity, for example the saturating
hyperbolic tangent
$f(x) = \tanh(x)$, and the
sigmoid function
$\sigma(x) = (1 + e^{-x})^{-1}$. ReLU is often preferred to other functions because it trains the neural network several times faster without a significant penalty to
generalization
accuracy.
[
87
]
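The difference between the non-saturating ReLU and the saturating alternatives can be seen in a few lines of NumPy. The helper names `relu` and `sigmoid` are defined here for illustration.

```python
import numpy as np

def relu(x):
    """Rectified linear unit: negative inputs are set to zero."""
    return np.maximum(0.0, x)

def sigmoid(x):
    """Logistic sigmoid, which saturates towards 0 and 1."""
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))      # negatives become 0, positives pass through unchanged
print(np.tanh(x))   # bounded between -1 and 1
print(sigmoid(x))   # bounded between 0 and 1
```

ReLU grows without bound for positive inputs, whereas the outputs of the hyperbolic tangent and the sigmoid flatten out for inputs of large magnitude.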
Fully connected layer
After several convolutional and max pooling layers, the final classification is done via fully connected layers. Neurons in a fully connected layer have connections to all activations in the previous layer, as seen in regular (non-convolutional)
artificial neural networks
. Their activations can thus be computed as an
affine transformation
, with
matrix multiplication
followed by a bias offset (
vector addition
of a learned or fixed bias term).
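The affine transformation of a fully connected layer can be sketched as follows. The shapes (8 feature maps of size 4×4 feeding 10 output units) are hypothetical and chosen only for illustration.

```python
import numpy as np

def fully_connected(x, weights, bias):
    """Affine map y = W x + b applied to a flattened activation volume."""
    return weights @ x.ravel() + bias

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 4))           # hypothetical final feature maps
weights = rng.standard_normal((10, x.size))  # 10 output units, 128 inputs each
bias = np.zeros(10)
y = fully_connected(x, weights, bias)
print(y.shape)  # (10,)
```

Every output unit has its own weight for each of the 128 input activations, which is what distinguishes this layer from a convolutional one.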
The "loss layer", or "
loss function
", exemplifies how
training
penalizes the deviation between the predicted output of the network, and the
true
data labels (during supervised learning). Various
loss functions
can be used, depending on the specific task.
The
Softmax
loss function is used for predicting a single class of
K
mutually exclusive classes.
[
nb 3
]
Sigmoid
cross-entropy
loss is used for predicting
K
independent probability values in
$[0,1]$.
Euclidean
loss is used for
regressing
to
real-valued
labels
$(-\infty, \infty)$.
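The three losses named above can be sketched for a single training example as follows. The function names are illustrative, and the factor 1/2 in the Euclidean loss is a common convention assumed here rather than something stated in the text.

```python
import numpy as np

def softmax_cross_entropy(logits, label):
    """Loss for one example with a single correct class index `label`."""
    shifted = logits - logits.max()              # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[label]

def sigmoid_cross_entropy(logits, targets):
    """Sum of K independent binary cross-entropies; targets lie in [0, 1]."""
    # Stable form of -[t*log(s) + (1-t)*log(1-s)] with s = sigmoid(logits).
    return np.sum(np.maximum(logits, 0) - logits * targets
                  + np.log1p(np.exp(-np.abs(logits))))

def euclidean_loss(pred, target):
    """Half the squared L2 distance between prediction and real-valued target."""
    return 0.5 * np.sum((pred - target) ** 2)

print(softmax_cross_entropy(np.zeros(4), label=2))                      # log(4)
print(sigmoid_cross_entropy(np.zeros(3), np.array([1.0, 0.0, 1.0])))    # 3*log(2)
print(euclidean_loss(np.array([1.0, 2.0]), np.array([0.0, 0.0])))       # 2.5
```

With all logits equal to zero, the softmax assigns probability 1/4 to each of four classes and the sigmoid assigns 1/2 to each output, which gives the values noted in the comments.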
Hyperparameters are various settings that are used to control the learning process. CNNs use more
hyperparameters
than a standard multilayer perceptron (MLP).
Padding is the addition of (typically) 0-valued pixels on the borders of an image. This is done so that the border pixels are not undervalued (lost) from the output because they would ordinarily participate in only a single receptive field instance. The padding applied is typically one less than the corresponding kernel dimension. For example, a convolutional layer using 3x3 kernels would receive a 2-pixel pad, that is 1 pixel on each side of the image.
[
citation needed
]
The stride is the number of pixels that the analysis window moves on each iteration. A stride of 2 means that each kernel is offset by 2 pixels from its predecessor.
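The combined effect of kernel size, padding and stride on the spatial output size follows the standard relation (W − K + 2P)/S + 1, where P is the padding on each side. The helper below is a small sketch of that arithmetic; its name and argument order are illustrative.

```python
def conv_output_size(w, k, p, s):
    """Output width for input width w, kernel size k, padding p per side, stride s."""
    return (w - k + 2 * p) // s + 1

print(conv_output_size(32, 3, 1, 1))  # 32: a 3x3 kernel with 1 pixel of padding per side keeps the size
print(conv_output_size(32, 3, 0, 1))  # 30: without padding the border is lost
print(conv_output_size(32, 2, 0, 2))  # 16: a stride of 2 halves the size
```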
Since feature map size decreases with depth, layers near the input layer tend to have fewer filters while higher layers can have more. To equalize computation at each layer, the product of feature values $v_a$ with pixel position is kept roughly constant across layers. Preserving more information about the input would require keeping the total number of activations (number of feature maps times number of pixel positions) non-decreasing from one layer to the next.
The number of feature maps directly controls the capacity and depends on the number of available examples and task complexity.
Filter (or kernel) size
Common filter sizes found in the literature vary greatly, and are usually chosen based on the data set. Typical filter sizes range from 1x1 to 7x7. As two famous examples,
AlexNet
used 3x3, 5x5, and 11x11.
Inceptionv3
used 1x1, 3x3, and 5x5.
The challenge is to find the right level of granularity so as to create abstractions at the proper scale, given a particular data set, and without
overfitting
.
Pooling type and size
Max pooling
is typically used, often with a 2x2 dimension. This implies that the input is drastically
downsampled
, reducing processing cost.
Greater pooling
reduces the dimension
of the signal, and may result in unacceptable
information loss
. Often, non-overlapping pooling windows perform best.
[
78
]
Dilation involves ignoring pixels within a kernel. This potentially reduces processing and memory requirements without significant signal loss. A dilation of 2 on a 3x3 kernel expands the kernel to 5x5, while still processing 9 (evenly spaced) pixels. Specifically, the processed pixels after the dilation are the cells (1,1), (1,3), (1,5), (3,1), (3,3), (3,5), (5,1), (5,3), (5,5), where (i,j) denotes the cell of the i-th row and j-th column in the expanded 5x5 kernel. Accordingly, a dilation of 3 expands the kernel to 7x7.
[
citation needed
]
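Under the convention used above, in which a dilation of 2 turns a 3x3 kernel into a 5x5 one, the expanded size of a k×k kernel with dilation d is d(k − 1) + 1. The sketch below computes that size and lists the cells that are actually read; the function names are illustrative.

```python
def effective_kernel_size(k, d):
    """Side length of a k x k kernel after applying dilation d."""
    return d * (k - 1) + 1

def dilated_taps(k, d):
    """1-based (row, column) cells of the expanded kernel that are actually read."""
    positions = [1 + d * i for i in range(k)]
    return [(r, c) for r in positions for c in positions]

print(effective_kernel_size(3, 2))  # 5
print(dilated_taps(3, 2))
# [(1, 1), (1, 3), (1, 5), (3, 1), (3, 3), (3, 5), (5, 1), (5, 3), (5, 5)]
```

The number of processed pixels stays at k² regardless of the dilation; only their spacing changes.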
Translation equivariance and aliasing
It is commonly assumed that CNNs are invariant to shifts of the input. Convolution or pooling layers within a CNN that do not have a stride greater than one are indeed
equivariant
to translations of the input.
[
73
]
However, layers with a stride greater than one ignore the
Nyquist–Shannon sampling theorem
and might lead to
aliasing
of the input signal.
[
73
]
While, in principle, CNNs are capable of implementing anti-aliasing filters, it has been observed that this does not happen in practice,
[
88
]
so the resulting models are not equivariant to translations.
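The effect of striding on equivariance can be demonstrated on a one-dimensional toy signal. The sketch below assumes wrap-around (circular) borders so that shifts are exact; the signal, kernel and function name are made up for the example.

```python
import numpy as np

def circular_correlate(x, kernel, stride=1):
    """1-D cross-correlation with wrap-around borders, then subsampling by `stride`."""
    n, k = len(x), len(kernel)
    full = np.array([sum(kernel[j] * x[(i + j) % n] for j in range(k))
                     for i in range(n)])
    return full[::stride]

x = np.array([0.0, 1.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0])
kernel = np.array([1.0, -1.0])
shifted = np.roll(x, 1)  # translate the input by one sample

# Stride 1: shifting the input shifts the output by the same amount.
y1 = circular_correlate(x, kernel, stride=1)
y1_shifted = circular_correlate(shifted, kernel, stride=1)
print(np.allclose(np.roll(y1, 1), y1_shifted))  # True

# Stride 2: after a one-sample input shift, the output is no shift of y2 at all.
y2 = circular_correlate(x, kernel, stride=2)
y2_shifted = circular_correlate(shifted, kernel, stride=2)
print(any(np.allclose(np.roll(y2, s), y2_shifted) for s in range(len(y2))))  # False
```

With stride 2 the layer samples only every second response, so a one-sample shift of the input exposes responses that the unshifted output never contained.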
Furthermore, if a CNN makes use of fully connected layers, translation equivariance does not imply translation invariance, as the fully connected layers are not invariant to shifts of the input.
[
89
]
[
16
]
One solution for complete translation invariance is avoiding any down-sampling throughout the network and applying global average pooling at the last layer.
[
73
]
Additionally, several other partial solutions have been proposed, such as
anti-aliasing
before downsampling operations,
[
90
]
spatial transformer networks,
[
91
]
data augmentation
, subsampling combined with pooling,
[
16
]
and
capsule neural networks
.
[
92
]
The accuracy of the final model is typically estimated on a sub-part of the dataset set apart at the start, often called a test set. Alternatively, methods such as
k
-fold cross-validation
are applied. Other strategies include using
conformal prediction
.
[
93
]
[
94
]
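The index bookkeeping behind k-fold cross-validation can be sketched as follows. The generator name `k_fold_indices` is illustrative; training and scoring a model on each split is left out.

```python
import numpy as np

def k_fold_indices(n_samples, k, rng):
    """Yield (train_idx, test_idx) pairs; each sample is held out exactly once."""
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]

rng = np.random.default_rng(0)
splits = list(k_fold_indices(10, k=5, rng=rng))
print(len(splits), len(splits[0][0]), len(splits[0][1]))  # 5 8 2
```

The accuracy estimate is then the average of the k scores obtained on the held-out folds.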
Regularization methods
Regularization
is a process of introducing additional information to solve an
ill-posed problem
or to prevent
overfitting
. CNNs use various types of regularization.
Because networks have so many parameters, they are prone to overfitting. One method to reduce overfitting is
dropout
, introduced in 2014.
[
95
]
At each training stage, individual nodes are either "dropped out" of the net (ignored) with probability $1-p$ or kept with probability $p$, so that a reduced network is left; incoming and outgoing edges to a dropped-out node are also removed. Only the reduced network is trained on the data in that stage. The removed nodes are then reinserted into the network with their original weights.
In the training stages, $p$ is usually 0.5; for input nodes, it is typically much higher because information is directly lost when input nodes are ignored.
At testing time after training has finished, one would ideally like to find a sample average of all possible $2^n$ dropped-out networks (for $n$ nodes that may be dropped); unfortunately this is infeasible for large values of $n$. However, an approximation can be found by using the full network with each node's output weighted by a factor of $p$, so the
expected value
of the output of any node is the same as in the training stages. This is the biggest contribution of the dropout method: although it effectively generates $2^n$ neural nets, and as such allows for model combination, at test time only a single network needs to be tested.
By avoiding training all nodes on all training data, dropout decreases overfitting. The method also significantly improves training speed. This makes the model combination practical, even for
deep neural networks
. The technique seems to reduce node interactions, leading them to learn more robust features
that better generalize to new data.
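The training-time and test-time behaviour of dropout can be sketched as follows, with `p_keep` denoting the probability of keeping a node. This follows the scheme described above, in which outputs are scaled at test time; many libraries instead rescale during training ("inverted dropout"), which is not shown here. The function names are illustrative.

```python
import numpy as np

def dropout_train(activations, p_keep, rng):
    """Keep each node with probability p_keep; dropped nodes output zero."""
    mask = rng.random(activations.shape) < p_keep
    return activations * mask

def dropout_test(activations, p_keep):
    """Use the full network, scaling outputs by p_keep to match the training-time expectation."""
    return activations * p_keep

rng = np.random.default_rng(0)
a = np.ones(100_000)
train_mean = dropout_train(a, 0.5, rng).mean()
test_mean = dropout_test(a, 0.5).mean()
print(train_mean, test_mean)  # the two means agree up to sampling noise
```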
DropConnect is the generalization of dropout in which each connection, rather than each output unit, can be dropped with probability $1-p$. Each unit thus receives input from a random subset of units in the previous layer.
[
96
]
DropConnect is similar to dropout as it introduces dynamic sparsity within the model, but differs in that the sparsity is on the weights, rather than the output vectors of a layer. In other words, the fully connected layer with DropConnect becomes a sparsely connected layer in which the connections are chosen at random during the training stage.
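A training-time forward pass with DropConnect can be sketched by masking the weight matrix instead of the output vector. The function name and the keep probability `p_keep` are assumptions of this example, and the inference-time approximation used in practice is not shown.

```python
import numpy as np

def dropconnect_forward(x, weights, bias, p_keep, rng):
    """Fully connected layer in which each weight is kept with probability p_keep."""
    mask = rng.random(weights.shape) < p_keep   # one Bernoulli draw per connection
    return (weights * mask) @ x + bias

rng = np.random.default_rng(0)
x = rng.standard_normal(6)
weights = rng.standard_normal((4, 6))
bias = np.zeros(4)
y = dropconnect_forward(x, weights, bias, p_keep=0.5, rng=rng)
print(y.shape)  # (4,)
```

Because the mask has the shape of the weight matrix, each output unit sees its own random subset of the inputs.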
A major drawback to dropout is that it does not have the same benefits for convolutional layers, where the neurons are not fully connected.
In stochastic pooling, a technique introduced in 2013 before dropout,
[
97
]
the conventional
deterministic
pooling operations were replaced with a stochastic procedure, where the activation within each pooling region is picked randomly according to a
multinomial distribution
, given by the activities within the pooling region. This approach is free of hyperparameters and can be combined with other regularization approaches, such as dropout and
data augmentation
.
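Stochastic pooling over a single region can be sketched as below. The sketch assumes non-negative activations (for example, after a ReLU), so that the activations can be normalized into sampling probabilities; the function name and the handling of an all-zero region are choices made for this example.

```python
import numpy as np

def stochastic_pool_region(region, rng):
    """Sample one activation from a pooling region with probability proportional to its value."""
    values = region.ravel()
    total = values.sum()
    if total == 0:                       # all-zero region: nothing to sample
        return 0.0
    probs = values / total               # multinomial probabilities from the activations
    return rng.choice(values, p=probs)

rng = np.random.default_rng(0)
region = np.array([[1.0, 0.0],
                   [3.0, 0.0]])
samples = [float(stochastic_pool_region(region, rng)) for _ in range(1000)]
print(sorted(set(samples)))  # only the non-zero activations are ever selected
```

In this region the activation 3.0 is chosen with probability 3/4 and the activation 1.0 with probability 1/4, whereas max pooling would always return 3.0.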
An alternate view of stochastic pooling is that it is equivalent to standard max pooling but with many copies of an input image, each having small local
deformations
. This is similar to explicit
elastic deformations
of the input images,
[
98
]
which deliver excellent performance on the
MNIST data set
.
[
98
]
Using stochastic pooling in a multilayer model gives an exponential number of deformations since the selections in higher layers are independent of those below.
Because the degree of model overfitting is determined by both its power and the amount of training it receives, providing a convolutional network with more training examples can reduce overfitting. Because there is often not enough available data to train, especially considering that some part should be spared for later testing, two approaches are to either generate new data from scratch (if possible) or perturb existing data to create new examples. The latter approach has been used since the mid-1990s.
[
53
]
For example, input images can be cropped, rotated, or rescaled to create new examples with the same labels as the original training set.
[
99
]
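A label-preserving perturbation of this kind can be sketched in NumPy as follows. For simplicity the sketch restricts rotation to multiples of 90 degrees and omits rescaling; the function names and the crop size are illustrative.

```python
import numpy as np

def random_crop(image, size, rng):
    """Cut a random size x size window out of an (H, W, C) image."""
    h, w, _ = image.shape
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    return image[top:top + size, left:left + size]

def augment(image, label, rng, crop_size=24):
    """Return a perturbed copy of `image` that keeps the original label."""
    out = random_crop(image, crop_size, rng)
    k = int(rng.integers(0, 4))
    out = np.rot90(out, k)               # rotate by k * 90 degrees in the image plane
    return out, label

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))
new_image, new_label = augment(image, label=7, rng=rng)
print(new_image.shape, new_label)  # (24, 24, 3) 7
```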
One of the simplest methods to prevent overfitting of a network is to simply stop the training before overfitting has had a chance to occur. It comes with the disadvantage that the learning process is halted.
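One common way to decide when to stop is to monitor the loss on a held-out validation set and halt once it has not improved for a fixed number of epochs. The sketch below assumes that criterion; the `patience` parameter and the loss values are hypothetical and are not taken from the text.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the epoch index at which training stops.

    Training stops once `patience` epochs have passed without a new best
    validation loss; otherwise it runs through the final epoch.
    """
    best = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch
    return len(val_losses) - 1

losses = [0.9, 0.7, 0.6, 0.62, 0.65, 0.7, 0.8]
print(early_stopping_epoch(losses, patience=3))  # 5
```

In this example the best validation loss occurs at epoch 2, and training stops three epochs later; the weights saved at epoch 2 would normally be the ones kept.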
Number of parameters
Another simple way to prevent overfitting is to limit the number of parameters, typically by limiting the number of hidden units in each layer or limiting network depth. For convolutional networks, the filter size also affects the number of parameters. Limiting the number of parameters restricts the predictive power of the network directly, reducing the complexity of the function that it can perform on the data, and thus limits the amount of overfitting. This is equivalent to a "
zero norm
".
A simple form of added regularizer is weight decay, which simply adds an additional error, proportional to the sum of weights (
L1 norm
) or squared magnitude (
L2 norm
) of the weight vector, to the error at each node. The level of acceptable model complexity can be reduced by increasing the proportionality constant (the 'alpha' hyperparameter), thus increasing the penalty for large weight vectors.
L2 regularization is the most common form of regularization. It can be implemented by penalizing the squared magnitude of all parameters directly in the objective. The L2 regularization has the intuitive interpretation of heavily penalizing peaky weight vectors and preferring diffuse weight vectors. Due to multiplicative interactions between weights and inputs this has the useful property of encouraging the network to use all of its inputs a little rather than some of its inputs a lot.
L1 regularization is also common. It makes the weight vectors sparse during optimization. In other words, neurons with L1 regularization end up using only a sparse subset of their most important inputs and become nearly invariant to the noisy inputs. L1 and L2 regularization can be combined; this is called
elastic net regularization
.
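The penalty terms can be written out directly. In the sketch below, the function names and the two example weight vectors are illustrative; `alpha` is the proportionality constant.

```python
import numpy as np

def l1_penalty(w, alpha):
    """alpha times the sum of absolute weights."""
    return alpha * np.sum(np.abs(w))

def l2_penalty(w, alpha):
    """alpha times the sum of squared weights."""
    return alpha * np.sum(w ** 2)

def elastic_net_penalty(w, alpha1, alpha2):
    """Combination of the L1 and L2 penalties."""
    return l1_penalty(w, alpha1) + l2_penalty(w, alpha2)

peaky = np.array([1.0, 0.0, 0.0, 0.0])
diffuse = np.array([0.25, 0.25, 0.25, 0.25])
print(l2_penalty(peaky, 1.0), l2_penalty(diffuse, 1.0))  # 1.0 0.25
print(l1_penalty(peaky, 1.0), l1_penalty(diffuse, 1.0))  # 1.0 1.0
```

Both vectors have the same L1 penalty, but the L2 penalty of the peaky vector is four times larger, which illustrates why L2 regularization favours diffuse weight vectors.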
Max norm constraints
Another form of regularization is to enforce an absolute upper bound on the magnitude of the weight vector for every neuron and use
projected gradient descent
to enforce the constraint. In practice, this corresponds to performing the parameter update as normal, and then enforcing the constraint by clamping the weight vector $\vec{w}$ of every neuron to satisfy $\|\vec{w}\|_2 < c$. Typical values of $c$ are on the order of 3–4. Some papers report improvements
[
100
]
when using this form of regularization.
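The projection step applied after each parameter update can be sketched as follows. The sketch assumes that each row of the weight matrix holds one neuron's weight vector and writes the bound as `c`; the function name is illustrative. Rows are rescaled so that their norm is at most `c`.

```python
import numpy as np

def clip_max_norm(weights, c):
    """Rescale each row (one neuron's weight vector) so its L2 norm is at most c."""
    norms = np.linalg.norm(weights, axis=1, keepdims=True)
    scale = np.minimum(1.0, c / np.maximum(norms, 1e-12))
    return weights * scale

w = np.array([[3.0, 4.0],    # norm 5, scaled down to norm 3
              [1.0, 0.0]])   # norm 1, already within the bound
clipped = clip_max_norm(w, c=3.0)
print(np.linalg.norm(clipped, axis=1))
```

Weight vectors that already satisfy the bound are left untouched, so the constraint affects only neurons whose weights have grown too large.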
Hierarchical coordinate frames
Pooling loses the precise spatial relationships between high-level parts (such as nose and mouth in a face image). These relationships are needed for identity recognition. Overlapping the pools, so that each feature occurs in multiple pools, helps retain the information. Translation alone cannot extrapolate the understanding of geometric relationships to a radically new viewpoint, such as a different orientation or scale. On the other hand, people are very good at extrapolating; after seeing a new shape once they can recognize it from a different viewpoint.
[
101
]
An earlier common way to deal with this problem is to train the network on transformed data in different orientations, scales, lighting, etc. so that the network can cope with these variations. This is computationally intensive for large data-sets. The alternative is to use a hierarchy of coordinate frames and use a group of neurons to represent a conjunction of the shape of the feature and its pose relative to the
retina
. The pose relative to the retina is the relationship between the coordinate frame of the retina and the intrinsic features' coordinate frame.
[
102
]
Thus, one way to represent something is to embed the coordinate frame within it. This allows large features to be recognized by using the consistency of the poses of their parts (e.g. nose and mouth poses make a consistent prediction of the pose of the whole face). This approach ensures that the higher-level entity (e.g. face) is present when the lower-level entities (e.g. nose and mouth) agree on their prediction of its pose. The vectors of neuronal activity that represent pose ("pose vectors") allow spatial transformations to be modeled as linear operations, which makes it easier for the network to learn the hierarchy of visual entities and generalize across viewpoints. This is similar to the way the human
visual system
imposes coordinate frames in order to represent shapes.
[
103
]
CNNs are often used in
image recognition
systems. In 2012, an
error rate
of 0.23% on the
MNIST database
was reported.
[
28
]
Another paper on using CNN for image classification reported that the learning process was "surprisingly fast"; in the same paper, the best published results as of 2011 were achieved in the MNIST database and the NORB database.
[
25
]
Subsequently, a similar CNN called
AlexNet
[
104
]
won the
ImageNet Large Scale Visual Recognition Challenge
2012.
When applied to
facial recognition
, CNNs achieved a large decrease in error rate.
[
105
]
Another paper reported a 97.6% recognition rate on "5,600 still images of more than 10 subjects".
[
21
]
CNNs were used to assess
video quality
in an objective way after manual training; the resulting system had a very low
root mean square error
.
[
106
]
The
ImageNet Large Scale Visual Recognition Challenge
is a benchmark in object classification and detection, with millions of images and hundreds of object classes. In the ILSVRC 2014,
[
107
]
a large-scale visual recognition challenge, almost every highly ranked team used CNN as their basic framework. The winner
GoogLeNet
[
108
]
(the foundation of
DeepDream
) increased the mean average
precision
of object detection to 0.439329, and reduced classification error to 0.06656, the best result at the time. Its network used more than 30 layers. The performance of convolutional neural networks on the ImageNet tests was close to that of humans.
[
109
]
The best algorithms still struggle with objects that are small or thin, such as a small ant on a stem of a flower or a person holding a quill in their hand. They also have trouble with images that have been distorted with filters, an increasingly common phenomenon with modern digital cameras. By contrast, those kinds of images rarely trouble humans. Humans, however, tend to have trouble with other issues. For example, they are not good at classifying objects into fine-grained categories such as the particular breed of dog or species of bird, whereas convolutional neural networks handle this.
[
citation needed
]
In 2015, a many-layered CNN demonstrated the ability to spot faces from a wide range of angles, including upside down, even when partially occluded, with competitive performance. The network was trained on a database of 200,000 images that included faces at various angles and orientations and a further 20 million images without faces. They used batches of 128 images over 50,000 iterations.
[
110
]
Compared to image data domains, there is relatively little work on applying CNNs to video classification. Video is more complex than images since it has another (temporal) dimension. However, some extensions of CNNs into the video domain have been explored. One approach is to treat space and time as equivalent dimensions of the input and perform convolutions in both time and space.
[
111
]
[
112
]
Another way is to fuse the features of two convolutional neural networks, one for the spatial and one for the temporal stream.
[
113
]
[
114
]
[
115
]
Long short-term memory
(LSTM)
recurrent
units are typically incorporated after the CNN to account for inter-frame or inter-clip dependencies.
[
116
]
[
117
]
Unsupervised learning
schemes for training spatio-temporal features have been introduced, based on Convolutional Gated Restricted
Boltzmann Machines
[
118
]
and Independent Subspace Analysis.
[
119
]
Such approaches are applied in
text-to-video models
.
[
citation needed
]
Natural language processing
CNNs have also been explored for
natural language processing
. CNN models are effective for various NLP problems and achieved excellent results in
semantic parsing
,
[
120
]
search query retrieval,
[
121
]
sentence modeling,
[
122
]
classification,
[
123
]
prediction
[
124
]
and other traditional NLP tasks.
[
125
]
Compared to traditional language processing methods such as
recurrent neural networks
, CNNs can represent different contextual realities of language that do not rely on a series-sequence assumption, while RNNs are better suited when classical time series modeling is required.
[
126
]
[
127
]
[
128
]
[
129
]
Animal behavior detection
CNNs have been applied in ecological and behavioral research to automatically detect and quantify animal behavior from visual data,
[
130
]
[
131
]
enabling identification of animals,
[
132
]
[
133
]
tracking of individuals,
[
134
]
estimation of pose,
[
135
]
[
136
]
[
137
]
and classification of specific actions such as feeding,
[
138
]
and social interactions.
[
131
]
[
138
]
Combined with multi-object tracking and temporal modeling, these systems can extract behavioral sequences over extended recordings, reducing reliance on manual annotation and increasing throughput for studies of individual variation, social networks, and collective dynamics.
A CNN with 1-D convolutions was used on time series in the frequency domain (spectral residual) by an unsupervised model to detect anomalies in the time domain.
[
139
]
CNNs have been used in
drug discovery
. Predicting the interaction between molecules and biological
proteins
can identify potential treatments. In 2015, Atomwise introduced AtomNet, the first deep learning neural network for
structure-based drug design
.
[
140
]
The system trains directly on 3-dimensional representations of chemical interactions. Similar to how image recognition networks learn to compose smaller, spatially proximate features into larger, complex structures,
[
141
]
AtomNet discovers chemical features, such as
aromaticity
,
sp³ carbons
, and
hydrogen bonding
. Subsequently, AtomNet was used to predict novel candidate
biomolecules
for multiple disease targets, most notably treatments for the
Ebola virus
[
142
]
and
multiple sclerosis
.
[
143
]
CNNs have been used in the game of
checkers
. From 1999 to 2001,
Fogel
and Chellapilla published papers showing how a convolutional neural network could learn to play checkers using co-evolution. The learning process did not use prior human professional games, but rather focused on a minimal set of information contained in the checkerboard: the location and type of pieces, and the difference in number of pieces between the two sides. Ultimately, the program (
Blondie24
) was tested on 165 games against players and ranked in the highest 0.4%.
[
144
]
[
145
]
It also earned a win against the program
Chinook
at its "expert" level of play.
[
146
]
CNNs have been used in
computer Go
. In December 2014, Clark and
Storkey
published a paper showing that a CNN trained by supervised learning from a database of human professional games could outperform
GNU Go
and win some games against
Monte Carlo tree search
Fuego 1.1 in a fraction of the time it took Fuego to play.
[
147
]
Later it was announced that a large 12-layer convolutional neural network had correctly predicted the professional move in 55% of positions, equalling the accuracy of a
6 dan
human player. When the trained convolutional network was used directly to play games of Go, without any search, it beat the traditional search program GNU Go in 97% of games, and matched the performance of the
Monte Carlo tree search
program Fuego simulating ten thousand playouts (about a million positions) per move.
[
148
]
Two CNNs driving MCTS, one for choosing moves to try ("policy network") and one for evaluating positions ("value network"), were used by
AlphaGo
, the first to beat the best human player at the time.
[
149
]
Time series forecasting
Recurrent neural networks are generally considered the best neural network architectures for time series forecasting (and sequence modeling in general), but recent studies show that convolutional networks can perform comparably or even better.
[
150
]
[
13
]
Dilated convolutions
[
151
]
might enable one-dimensional convolutional neural networks to effectively learn time series dependences.
[
152
]
Convolutions can be implemented more efficiently than RNN-based solutions, and they do not suffer from vanishing (or exploding) gradients.
[
153
]
Convolutional networks can provide an improved forecasting performance when there are multiple similar time series to learn from.
[
154
]
CNNs can also be applied to further tasks in time series analysis (e.g., time series classification
[
155
]
or quantile forecasting
[
156
]
).
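One reason dilated convolutions help with long-range dependencies is that the receptive field of a stack of dilated layers grows with the sum of the dilations. The helper below sketches that arithmetic for stride-1 one-dimensional convolutions; the doubling schedule of dilations is a common choice assumed for the example, not something stated in the text.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field (in time steps) of stacked stride-1 dilated 1-D convolutions."""
    return 1 + (kernel_size - 1) * sum(dilations)

print(receptive_field(2, [1, 2, 4, 8]))  # 16
print(receptive_field(2, [1, 1, 1, 1]))  # 5
```

With the same four layers and the same number of weights, doubling the dilation at each layer covers 16 time steps instead of 5.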
Cultural heritage and 3D-datasets
As archaeological findings such as
clay tablets
with
cuneiform writing
are increasingly acquired using
3D scanners
, benchmark datasets are becoming available, including
HeiCuBeDa
[
157
]
providing almost 2000 normalized 2-D and 3-D datasets prepared with the
GigaMesh Software Framework
.
[
158
]
So
curvature
-based measures are used in conjunction with geometric neural networks (GNNs), e.g. for period classification of those clay tablets, which are among the oldest documents of human history.
[
159
]
[
160
]
For many applications, little training data is available. Convolutional neural networks usually require a large amount of training data in order to avoid
overfitting
. A common technique is to train the network on a larger data set from a related domain. Once the network parameters have converged, an additional training step is performed using the in-domain data to fine-tune the network weights; this is known as
transfer learning
. Furthermore, this technique allows convolutional network architectures to successfully be applied to problems with tiny training sets.
[
161
]
Human interpretable explanations
End-to-end training and prediction are common practice in
computer vision
. However, human interpretable explanations are required for
critical systems
such as
self-driving cars
.
[
162
]
With recent advances in
visual salience
,
spatial attention
, and
temporal attention
, the most critical spatial regions/temporal instants could be visualized to justify the CNN predictions.
[
163
]
[
164
]
A deep Q-network (DQN) is a type of deep learning model that combines a deep neural network with
Q-learning
, a form of
reinforcement learning
. Unlike earlier reinforcement learning agents, DQNs that utilize CNNs can learn directly from high-dimensional sensory inputs via reinforcement learning.
[
165
]
Preliminary results were presented in 2014, with an accompanying paper in February 2015.
[
166
]
The research described an application to
Atari 2600
gaming. Other deep reinforcement learning models preceded it.
[
167
]
Deep belief networks
Convolutional deep belief networks
(CDBN) have a structure very similar to convolutional neural networks and are trained similarly to deep belief networks. Therefore, they exploit the 2D structure of images, like CNNs do, and make use of pre-training like
deep belief networks
. They provide a generic structure that can be used in many image and signal processing tasks. Benchmark results on standard image datasets like CIFAR
[
168
]
have been obtained using CDBNs.
[
169
]
Neural abstraction pyramid
The feed-forward architecture of convolutional neural networks was extended in the neural abstraction pyramid
[
170
]
by lateral and feedback connections. The resulting recurrent convolutional network allows for the flexible incorporation of contextual information to iteratively resolve local ambiguities. In contrast to previous models, image-like outputs at the highest resolution were generated, e.g., for semantic segmentation, image reconstruction, and object localization tasks.
Caffe
: A library for convolutional neural networks. Created by the Berkeley Vision and Learning Center (BVLC). It supports both CPU and GPU. Developed in
C++
, and has
Python
and
MATLAB
wrappers.
Deeplearning4j
: Deep learning in
Java
and
Scala
on multi-GPU-enabled
Spark
. A general-purpose deep learning library for the JVM production stack running on a C++ scientific computing engine. Allows the creation of custom layers. Integrates with Hadoop and Kafka.
Dlib
: A toolkit for making real world machine learning and data analysis applications in C++.
Microsoft Cognitive Toolkit
: A deep learning toolkit written by Microsoft with several unique features enhancing scalability over multiple nodes. It supports full-fledged interfaces for training in C++ and Python and with additional support for model inference in
C#
and Java.
TensorFlow
:
Apache 2.0
-licensed Theano-like library with support for CPU, GPU, Google's proprietary
tensor processing unit
(TPU),
[
171
]
and mobile devices.
Theano
: The reference deep-learning library for Python with an API largely compatible with the popular
NumPy
library. Allows user to write symbolic mathematical expressions, then automatically generates their derivatives, saving the user from having to code gradients or backpropagation. These symbolic expressions are automatically compiled to
CUDA
code for a fast,
on-the-GPU
implementation.
Torch
: A
scientific computing
framework with wide support for machine learning algorithms, written in
C
and
Lua
.
Attention (machine learning)
Circuit (neural network)
Convolution
Deep learning
Natural-language processing
Neocognitron
Scale-invariant feature transform
Time delay neural network
Vision processing unit
^ When applied to other types of data than image data, such as sound data, "spatial position" may variously correspond to different points in the time domain, frequency domain, or other mathematical spaces.
^ hence the name "convolutional layer"
^ So-called categorical data.
^
a
b
LeCun, Yann; Bengio, Yoshua; Hinton, Geoffrey (2015-05-28).
"Deep learning"
.
Nature
.
521
(7553):
436–
444.
Bibcode
:
2015Natur.521..436L
.
doi
:
10.1038/nature14539
.
ISSN
1476-4687
.
PMID
26017442
.
^
LeCun, Y.; Boser, B.; Denker, J. S.; Henderson, D.; Howard, R. E.; Hubbard, W.; Jackel, L. D. (December 1989).
"Backpropagation Applied to Handwritten Zip Code Recognition"
.
Neural Computation
.
1
(4):
541–
551.
doi
:
10.1162/neco.1989.1.4.541
.
ISSN
0899-7667
.
^
a
b
Venkatesan, Ragav; Li, Baoxin (2017-10-23).
Convolutional Neural Networks in Visual Computing: A Concise Guide
. CRC Press.
ISBN
978-1-351-65032-8
.
Archived
from the original on 2023-10-16
. Retrieved
2020-12-13
.
^
a
b
Balas, Valentina E.; Kumar, Raghvendra; Srivastava, Rajshree (2019-11-19).
Recent Trends and Advances in Artificial Intelligence and Internet of Things
. Springer Nature.
ISBN
978-3-030-32644-9
.
Archived
from the original on 2023-10-16
. Retrieved
2020-12-13
.
^
Zhang, Yingjie; Soon, Hong Geok; Ye, Dongsen; Fuh, Jerry Ying Hsi; Zhu, Kunpeng (September 2020). "Powder-Bed Fusion Process Monitoring by Machine Vision With Hybrid Convolutional Neural Networks".
IEEE Transactions on Industrial Informatics
.
16
(9):
5769–
5779.
Bibcode
:
2020ITII...16.5769Z
.
doi
:
10.1109/TII.2019.2956078
.
ISSN
1941-0050
.
S2CID
213010088
.
^
Chervyakov, N.I.; Lyakhov, P.A.; Deryabin, M.A.; Nagornov, N.N.; Valueva, M.V.; Valuev, G.V. (September 2020).
"Residue Number System-Based Solution for Reducing the Hardware Cost of a Convolutional Neural Network"
.
Neurocomputing
.
407
:
439–
453.
doi
:
10.1016/j.neucom.2020.04.018
.
S2CID
219470398
.
Archived
from the original on 2023-06-29
. Retrieved
2023-08-12
.
Convolutional neural networks represent deep learning architectures that are currently used in a wide range of applications, including computer vision, speech recognition, malware detection, time series analysis in finance, and many others.
^
a
b
Aghdam, Hamed Habibi; Heravi, Elnaz Jahani (2017-05-30).
Guide to convolutional neural networks: a practical application to traffic-sign detection and classification
. Cham, Switzerland: Springer.
ISBN
978-3-319-57549-0
.
OCLC
987790957
.
^
a
b
c
Homma, Toshiteru; Les Atlas; Robert Marks II (1987).
"An Artificial Neural Network for Spatio-Temporal Bipolar Patterns: Application to Phoneme Classification"
(PDF)
.
Advances in Neural Information Processing Systems
.
1
:
31–
40.
Archived
(PDF)
from the original on 2022-03-31
. Retrieved
2022-03-31
.
The notion of convolution or correlation used in the models presented is popular in engineering disciplines and has been applied extensively to designing filters, control systems, etc.
^
Valueva, M.V.; Nagornov, N.N.; Lyakhov, P.A.; Valuev, G.V.; Chervyakov, N.I. (2020). "Application of the residue number system to reduce hardware costs of the convolutional neural network implementation".
Mathematics and Computers in Simulation
.
177
. Elsevier BV:
232–
243.
doi
:
10.1016/j.matcom.2020.04.031
.
ISSN
0378-4754
.
S2CID
218955622
.
Convolutional neural networks are a promising tool for solving the problem of pattern recognition.
^
van den Oord, Aaron; Dieleman, Sander; Schrauwen, Benjamin (2013-01-01). Burges, C. J. C.; Bottou, L.; Welling, M.; Ghahramani, Z.; Weinberger, K. Q. (eds.).
Deep content-based music recommendation
(PDF)
. Curran Associates, Inc. pp.
2643–
2651.
Archived
(PDF)
from the original on 2022-03-07
. Retrieved
2022-03-31
.
^
Collobert, Ronan; Weston, Jason (2008-01-01). "A unified architecture for natural language processing".
Proceedings of the 25th international conference on Machine learning - ICML '08
. New York, NY, US: ACM. pp.
160–
167.
doi
:
10.1145/1390156.1390177
.
ISBN
978-1-60558-205-4
.
S2CID
2617020
.
^
Avilov, Oleksii; Rimbert, Sebastien; Popov, Anton; Bougrain, Laurent (July 2020).
"Deep Learning Techniques to Improve Intraoperative Awareness Detection from Electroencephalographic Signals"
.
2020 42nd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC)
(PDF)
. Vol. 2020. Montreal, QC, Canada: IEEE. pp.
142–
145.
doi
:
10.1109/EMBC44109.2020.9176228
.
ISBN
978-1-7281-1990-8
.
PMID
33017950
.
S2CID
221386616
.
Archived
(PDF)
from the original on 2022-05-19
. Retrieved
2023-07-21
.
^
a
b
Tsantekidis, Avraam; Passalis, Nikolaos; Tefas, Anastasios; Kanniainen, Juho; Gabbouj, Moncef; Iosifidis, Alexandros (July 2017). "Forecasting Stock Prices from the Limit Order Book Using Convolutional Neural Networks".
2017 IEEE 19th Conference on Business Informatics (CBI)
. Thessaloniki, Greece: IEEE. pp.
7–
12.
doi
:
10.1109/CBI.2017.23
.
ISBN
978-1-5386-3035-8
.
S2CID
4950757
.
^
a
b
c
Zhang, Wei (1988).
"Shift-invariant pattern recognition neural network and its optical architecture"
.
Proceedings of Annual Conference of the Japan Society of Applied Physics
.
Archived
from the original on 2020-06-23
. Retrieved
2020-06-22
.
^
a
b
c
Zhang, Wei (1990).
"Parallel distributed processing model with local space-invariant interconnections and its optical architecture"
.
Applied Optics
.
29
(32):
4790–
7.
Bibcode
:
1990ApOpt..29.4790Z
.
doi
:
10.1364/AO.29.004790
.
PMID
20577468
.
Archived
from the original on 2017-02-06
. Retrieved
2016-09-22
.
^
a
b
c
d
e
f
Mouton, Coenraad; Myburgh, Johannes C.; Davel, Marelie H. (2020).
"Stride and Translation Invariance in CNNs"
. In Gerber, Aurona (ed.).
Artificial Intelligence Research
. Communications in Computer and Information Science. Vol. 1342. Cham: Springer International Publishing. pp.
267–
281.
arXiv
:
2103.10097
.
doi
:
10.1007/978-3-030-66151-9_17
.
ISBN
978-3-030-66151-9
.
S2CID
232269854
.
Archived
from the original on 2021-06-27
. Retrieved
2021-03-26
.
^
Kurtzman, Thomas (August 20, 2019).
"Hidden bias in the DUD-E dataset leads to misleading performance of deep learning in structure-based virtual screening"
.
PLOS ONE
.
14
(8) e0220113.
Bibcode
:
2019PLoSO..1420113C
.
doi
:
10.1371/journal.pone.0220113
.
PMC
6701836
.
PMID
31430292
.
^
a
b
c
Fukushima, K. (2007).
"Neocognitron"
.
Scholarpedia
.
2
(1): 1717.
Bibcode
:
2007SchpJ...2.1717F
.
doi
:
10.4249/scholarpedia.1717
.
^
a
b
Hubel, D. H.; Wiesel, T. N. (1968-03-01).
"Receptive fields and functional architecture of monkey striate cortex"
.
The Journal of Physiology
.
195
(1):
215–
243.
doi
:
10.1113/jphysiol.1968.sp008455
.
ISSN
0022-3751
.
PMC
1557912
.
PMID
4966457
.
^
a
b
Fukushima, Kunihiko (1980).
"Neocognitron: A Self-organizing Neural Network Model for a Mechanism of Pattern Recognition Unaffected by Shift in Position"
(PDF)
.
Biological Cybernetics
.
36
(4):
193–
202.
doi
:
10.1007/BF00344251
.
PMID
7370364
.
S2CID
206775608
.
Archived
(PDF)
from the original on 3 June 2014
. Retrieved
16 November
2013
.
^
a
b
Matsugu, Masakazu; Katsuhiko Mori; Yusuke Mitari; Yuji Kaneda (2003).
"Subject independent facial expression recognition with robust face detection using a convolutional neural network"
(PDF)
.
Neural Networks
.
16
(5):
555–
559.
Bibcode
:
2003NN.....16..555M
.
doi
:
10.1016/S0893-6080(03)00115-1
.
PMID
12850007
.
Archived
(PDF)
from the original on 13 December 2013
. Retrieved
17 November
2013
.
^
Convolutional Neural Networks Demystified: A Matched Filtering Perspective Based Tutorial
https://arxiv.org/abs/2108.11663v3
^
"Convolutional Neural Networks (LeNet) – DeepLearning 0.1 documentation"
.
DeepLearning 0.1
. LISA Lab. Archived from
the original
on 28 December 2017
. Retrieved
31 August
2013
.
^
Chollet, François (2017-04-04). "Xception: Deep Learning with Depthwise Separable Convolutions".
arXiv
:
1610.02357
[
cs.CV
].
^
a
b
c
Ciresan, Dan; Ueli Meier; Jonathan Masci; Luca M. Gambardella; Jurgen Schmidhuber (2011).
"Flexible, High Performance Convolutional Neural Networks for Image Classification"
(PDF)
.
Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence, Volume Two
.
2
:
1237–
1242.
Archived
(PDF)
from the original on 5 April 2022
. Retrieved
17 November
2013
.
^
Krizhevsky
, Alex.
"ImageNet Classification with Deep Convolutional Neural Networks"
(PDF)
.
Archived
(PDF)
from the original on 25 April 2021
. Retrieved
17 November
2013
.
^
a
b
Yamaguchi, Kouichi; Sakamoto, Kenji; Akabane, Toshio; Fujimoto, Yoshiji (November 1990).
A Neural Network for Speaker-Independent Isolated Word Recognition
. First International Conference on Spoken Language Processing (ICSLP 90). Kobe, Japan. Archived from
the original
on 2021-03-07
. Retrieved
2019-09-04
.
^
a
b
c
d
Ciresan, Dan; Meier, Ueli; Schmidhuber, Jürgen (June 2012). "Multi-column deep neural networks for image classification".
2012 IEEE Conference on Computer Vision and Pattern Recognition
. New York, NY:
Institute of Electrical and Electronics Engineers
(IEEE). pp.
3642–
3649.
arXiv
:
1202.2745
.
CiteSeerX
10.1.1.300.3283
.
doi
:
10.1109/CVPR.2012.6248110
.
ISBN
978-1-4673-1226-4
.
OCLC
812295155
.
S2CID
2161592
.
^
Yu, Fisher; Koltun, Vladlen (2016-04-30). "Multi-Scale Context Aggregation by Dilated Convolutions".
arXiv
:
1511.07122
[
cs.CV
].
^
Chen, Liang-Chieh; Papandreou, George; Schroff, Florian; Adam, Hartwig (2017-12-05). "Rethinking Atrous Convolution for Semantic Image Segmentation".
arXiv
:
1706.05587
[
cs.CV
].
^
Duta, Ionut Cosmin; Georgescu, Mariana Iuliana; Ionescu, Radu Tudor (2021-08-16). "Contextual Convolutional Neural Networks".
arXiv
:
2108.07387
[
cs.CV
].
^
LeCun, Yann.
"LeNet-5, convolutional neural networks"
.
Archived
from the original on 24 February 2021
. Retrieved
16 November
2013
.
^
Zeiler, Matthew D.; Taylor, Graham W.; Fergus, Rob (November 2011).
"Adaptive deconvolutional networks for mid and high level feature learning"
.
2011 International Conference on Computer Vision
. IEEE. pp.
2018–
2025.
doi
:
10.1109/iccv.2011.6126474
.
ISBN
978-1-4577-1102-2
.
^
Dumoulin, Vincent; Visin, Francesco (2018-01-11),
A guide to convolution arithmetic for deep learning
,
arXiv
:
1603.07285
^
Odena, Augustus; Dumoulin, Vincent; Olah, Chris (2016-10-17).
"Deconvolution and Checkerboard Artifacts"
.
Distill
.
1
(10) e3.
doi
:
10.23915/distill.00003
.
ISSN
2476-0757
.
^
van Dyck, Leonard Elia; Kwitt, Roland; Denzler, Sebastian Jochen; Gruber, Walter Roland (2021).
"Comparing Object Recognition in Humans and Deep Convolutional Neural Networks—An Eye Tracking Study"
.
Frontiers in Neuroscience
.
15
750639.
doi
:
10.3389/fnins.2021.750639
.
ISSN
1662-453X
.
PMC
8526843
.
PMID
34690686
.
^
a
b
Hubel, DH; Wiesel, TN (October 1959).
"Receptive fields of single neurones in the cat's striate cortex"
.
J. Physiol
.
148
(3):
574–
91.
doi
:
10.1113/jphysiol.1959.sp006308
.
PMC
1363130
.
PMID
14403679
.
^
David H. Hubel and Torsten N. Wiesel (2005).
Brain and visual perception: the story of a 25-year collaboration
. Oxford University Press US. p. 106.
ISBN
978-0-19-517618-6
.
Archived
from the original on 2023-10-16
. Retrieved
2019-01-18
.
^
a
b
Fukushima, K. (1969). "Visual feature extraction by a multilayered network of analog threshold elements".
IEEE Transactions on Systems Science and Cybernetics
.
5
(4):
322–
333.
Bibcode
:
1969ITSSC...5..322F
.
doi
:
10.1109/TSSC.1969.300225
.
^
Schmidhuber, Juergen
(2022). "Annotated History of Modern AI and Deep Learning".
arXiv
:
2212.11279
[
cs.NE
].
^
Ramachandran, Prajit; Zoph, Barret; Le, Quoc V. (October 16, 2017). "Searching for Activation Functions".
arXiv
:
1710.05941
[
cs.NE
].
^
a
b
Waibel, Alex (18 December 1987).
Phoneme Recognition Using Time-Delay Neural Networks
(PDF)
. Meeting of the Institute of Electrical, Information and Communication Engineers (IEICE). Tokyo, Japan.
^
Alexander Waibel
et al.,
Phoneme Recognition Using Time-Delay Neural Networks
Archived
2021-02-25 at the
Wayback Machine
IEEE Transactions on Acoustics, Speech, and Signal Processing, Volume 37, No. 3, pp. 328–339, March 1989.
^
LeCun, Yann; Bengio, Yoshua (1995).
"Convolutional networks for images, speech, and time series"
. In Arbib, Michael A. (ed.).
The handbook of brain theory and neural networks
(Second ed.). The MIT press. pp.
276–
278.
Archived
from the original on 2020-07-28
. Retrieved
2019-12-03
.
^
John B. Hampshire and Alexander Waibel,
Connectionist Architectures for Multi-Speaker Phoneme Recognition
Archived
2022-03-31 at the
Wayback Machine
, Advances in Neural Information Processing Systems, 1990, Morgan Kaufmann.
^
Ko, Tom; Peddinti, Vijayaditya; Povey, Daniel; Seltzer, Michael L.; Khudanpur, Sanjeev (March 2018).
A Study on Data Augmentation of Reverberant Speech for Robust Speech Recognition
(PDF)
. The 42nd IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2017). New Orleans, LA, US.
Archived
(PDF)
from the original on 2018-07-08
. Retrieved
2019-09-04
.
^
Denker, J S, Gardner, W R, Graf, H. P, Henderson, D, Howard, R E, Hubbard, W, Jackel, L D, Baird, H S, and Guyon (1989)
Neural network recognizer for hand-written zip code digits
Archived
2018-08-04 at the
Wayback Machine
, AT&T Bell Laboratories
^
a
b
Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, L. D. Jackel,
Backpropagation Applied to Handwritten Zip Code Recognition
Archived
2020-01-10 at the
Wayback Machine
; AT&T Bell Laboratories
^
a
b
Zhang, Wei (1991).
"Image processing of human corneal endothelium based on a learning network"
.
Applied Optics
.
30
(29):
4211–
7.
Bibcode
:
1991ApOpt..30.4211Z
.
doi
:
10.1364/AO.30.004211
.
PMID
20706526
.
Archived
from the original on 2017-02-06
. Retrieved
2016-09-22
.
^
a
b
Zhang, Wei (1994).
"Computerized detection of clustered microcalcifications in digital mammograms using a shift-invariant artificial neural network"
.
Medical Physics
.
21
(4):
517–
24.
Bibcode
:
1994MedPh..21..517Z
.
doi
:
10.1118/1.597177
.
PMID
8058017
.
Archived
from the original on 2017-02-06
. Retrieved
2016-09-22
.
^
Weng, J; Ahuja, N; Huang, TS (1993). "Learning recognition and segmentation of 3-D objects from 2-D images".
1993 (4th) International Conference on Computer Vision
. IEEE. pp.
121–
128.
doi
:
10.1109/ICCV.1993.378228
.
ISBN
0-8186-3870-2
.
S2CID
8619176
.
^
a
b
Schmidhuber, Jürgen (2015).
"Deep Learning"
.
Scholarpedia
.
10
(11):
1527–
54.
CiteSeerX
10.1.1.76.1541
.
doi
:
10.1162/neco.2006.18.7.1527
.
PMID
16764513
.
S2CID
2309950
.
Archived
from the original on 2016-04-19
. Retrieved
2019-01-20
.
^
a
b
Lecun, Y.; Jackel, L. D.; Bottou, L.; Cortes, C.; Denker, J. S.; Drucker, H.; Guyon, I.; Muller, U. A.; Sackinger, E.; Simard, P.; Vapnik, V. (August 1995).
Learning algorithms for classification: A comparison on handwritten digit recognition
(PDF)
. World Scientific. pp.
261–
276.
doi
:
10.1142/2808
.
ISBN
978-981-02-2324-3
.
Archived
(PDF)
from the original on 2 May 2023.
^
Lecun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. (November 1998). "Gradient-based learning applied to document recognition".
Proceedings of the IEEE
.
86
(11):
2278–
2324.
Bibcode
:
1998IEEEP..86.2278L
.
doi
:
10.1109/5.726791
.
^
Zhang, Wei (1991).
"Error Back Propagation with Minimum-Entropy Weights: A Technique for Better Generalization of 2-D Shift-Invariant NNs"
.
Proceedings of the International Joint Conference on Neural Networks
.
Archived
from the original on 2017-02-06
. Retrieved
2016-09-22
.
^
Daniel Graupe, Ruey Wen Liu, George S Moschytz."
Applications of neural networks to medical signal processing
Archived
2020-07-28 at the
Wayback Machine
". In Proc. 27th IEEE Decision and Control Conf., pp. 343–347, 1988.
^
Daniel Graupe, Boris Vern, G. Gruener, Aaron Field, and Qiu Huang. "
Decomposition of surface EMG signals into single fiber action potentials by means of neural network
Archived
2019-09-04 at the
Wayback Machine
". Proc. IEEE International Symp. on Circuits and Systems, pp. 1008–1011, 1989.
^
Qiu Huang, Daniel Graupe, Yi Fang Huang, Ruey Wen Liu."
Identification of firing patterns of neuronal signals
." In Proc. 28th IEEE Decision and Control Conf., pp. 266–271, 1989.
https://ieeexplore.ieee.org/document/70115
Archived
2022-03-31 at the
Wayback Machine
^
Oh, KS; Jung, K (2004). "GPU implementation of neural networks".
Pattern Recognition
.
37
(6):
1311–
1314.
Bibcode
:
2004PatRe..37.1311O
.
doi
:
10.1016/j.patcog.2004.01.013
.
^
Dave Steinkraus; Patrice Simard; Ian Buck (2005).
"Using GPUs for Machine Learning Algorithms"
.
12th International Conference on Document Analysis and Recognition (ICDAR 2005)
. pp.
1115–
1119.
doi
:
10.1109/ICDAR.2005.251
.
Archived
from the original on 2022-03-31
. Retrieved
2022-03-31
.
^
Kumar Chellapilla; Sid Puri; Patrice Simard (2006).
"High Performance Convolutional Neural Networks for Document Processing"
. In Lorette, Guy (ed.).
Tenth International Workshop on Frontiers in Handwriting Recognition
. Suvisoft.
Archived
from the original on 2020-05-18
. Retrieved
2016-03-14
.
^
Hinton, GE; Osindero, S; Teh, YW (Jul 2006). "A fast learning algorithm for deep belief nets".
Neural Computation
.
18
(7):
1527–
54.
CiteSeerX
10.1.1.76.1541
.
doi
:
10.1162/neco.2006.18.7.1527
.
PMID
16764513
.
S2CID
2309950
.
^
Bengio, Yoshua; Lamblin, Pascal; Popovici, Dan; Larochelle, Hugo (2007).
"Greedy Layer-Wise Training of Deep Networks"
(PDF)
.
Advances in Neural Information Processing Systems
:
153–
160.
Archived
(PDF)
from the original on 2022-06-02
. Retrieved
2022-03-31
.
^
Ranzato, MarcAurelio; Poultney, Christopher; Chopra, Sumit; LeCun, Yann (2007).
"Efficient Learning of Sparse Representations with an Energy-Based Model"
(PDF)
.
Advances in Neural Information Processing Systems
.
Archived
(PDF)
from the original on 2016-03-22
. Retrieved
2014-06-26
.
^
Raina, R; Madhavan, A; Ng, Andrew (14 June 2009).
"Large-scale deep unsupervised learning using graphics processors"
(PDF)
.
Proceedings of the 26th Annual International Conference on Machine Learning
. ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning. pp.
873–
880.
doi
:
10.1145/1553374.1553486
.
ISBN
978-1-60558-516-1
.
S2CID
392458
.
Archived
(PDF)
from the original on 8 December 2020
. Retrieved
22 December
2023
.
^
Ciresan, Dan; Meier, Ueli; Gambardella, Luca; Schmidhuber, Jürgen (2010). "Deep big simple neural nets for handwritten digit recognition".
Neural Computation
.
22
(12):
3207–
3220.
arXiv
:
1003.0358
.
Bibcode
:
2010NeCom..22.3207C
.
doi
:
10.1162/NECO_a_00052
.
PMID
20858131
.
S2CID
1918673
.
^
"IJCNN 2011 Competition result table"
.
OFFICIAL IJCNN2011 COMPETITION
. 2010.
Archived
from the original on 2021-01-17
. Retrieved
2019-01-14
.
^
Schmidhuber, Jürgen (17 March 2017).
"History of computer vision contests won by deep CNNs on GPU"
.
Archived
from the original on 19 December 2018
. Retrieved
14 January
2019
.
^
a
b
Krizhevsky, Alex; Sutskever, Ilya; Hinton, Geoffrey E. (2017-05-24).
"ImageNet classification with deep convolutional neural networks"
(PDF)
.
Communications of the ACM
.
60
(6):
84–
90.
doi
:
10.1145/3065386
.
ISSN
0001-0782
.
S2CID
195908774
.
Archived
(PDF)
from the original on 2017-05-16
. Retrieved
2018-12-04
.
^
Viebke, Andre; Memeti, Suejb; Pllana, Sabri; Abraham, Ajith (2019). "CHAOS: a parallelization scheme for training convolutional neural networks on Intel Xeon Phi".
The Journal of Supercomputing
.
75
(1):
197–
227.
arXiv
:
1702.07908
.
doi
:
10.1007/s11227-017-1994-x
.
S2CID
14135321
.
^
Viebke, Andre; Pllana, Sabri (2015).
"The Potential of the Intel (R) Xeon Phi for Supervised Deep Learning"
.
2015 IEEE 17th International Conference on High Performance Computing and Communications, 2015 IEEE 7th International Symposium on Cyberspace Safety and Security, and 2015 IEEE 12th International Conference on Embedded Software and Systems
.
IEEE Xplore
. IEEE 2015. pp.
758–
765.
doi
:
10.1109/HPCC-CSS-ICESS.2015.45
.
ISBN
978-1-4799-8937-9
.
S2CID
15411954
.
Archived
from the original on 2023-03-06
. Retrieved
2022-03-31
.
^
Hinton, Geoffrey (2012).
"ImageNet Classification with Deep Convolutional Neural Networks"
.
NIPS'12: Proceedings of the 25th International Conference on Neural Information Processing Systems - Volume 1
.
1
:
1097–
1105.
Archived
from the original on 2019-12-20
. Retrieved
2021-03-26
– via ACM.
^
a
b
c
d
e
Azulay, Aharon; Weiss, Yair (2019).
"Why do deep convolutional networks generalize so poorly to small image transformations?"
.
Journal of Machine Learning Research
.
20
(184):
1–
25.
ISSN
1533-7928
.
Archived
from the original on 2022-03-31
. Retrieved
2022-03-31
.
^
a
b
Géron, Aurélien (2019).
Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow
. Sebastopol, CA: O'Reilly Media.
ISBN
978-1-492-03264-9
. p. 448.
^
Li, Zewen; Liu, Fan; Yang, Wenjie; Peng, Shouheng; Zhou, Jun (December 2022). "A Survey of Convolutional Neural Networks: Analysis, Applications, and Prospects".
IEEE Transactions on Neural Networks and Learning Systems
.
33
(12):
6999–
7019.
arXiv
:
2004.02806
.
Bibcode
:
2022ITNNL..33.6999L
.
doi
:
10.1109/TNNLS.2021.3084827
.
hdl
:
10072/405164
.
PMID
34111009
.
^
"CS231n Convolutional Neural Networks for Visual Recognition"
.
cs231n.github.io
.
Archived
from the original on 2019-10-23
. Retrieved
2017-04-25
.
^
Nirthika, Rajendran; Manivannan, Siyamalan; Ramanan, Amirthalingam; Wang, Ruixuan (2022-04-01).
"Pooling in convolutional neural networks for medical image analysis: a survey and an empirical study"
.
Neural Computing and Applications
.
34
(7):
5321–
5347.
doi
:
10.1007/s00521-022-06953-8
.
ISSN
1433-3058
.
PMC
8804673
.
PMID
35125669
.
^
a
b
Scherer, Dominik; Müller, Andreas C.; Behnke, Sven (2010).
"Evaluation of Pooling Operations in Convolutional Architectures for Object Recognition"
(PDF)
.
Artificial Neural Networks (ICANN), 20th International Conference on
. Thessaloniki, Greece: Springer. pp.
92–
101.
Archived
(PDF)
from the original on 2018-04-03
. Retrieved
2016-12-28
.
^
Graham, Benjamin (2014-12-18). "Fractional Max-Pooling".
arXiv
:
1412.6071
[
cs.CV
].
^
Springenberg, Jost Tobias; Dosovitskiy, Alexey; Brox, Thomas; Riedmiller, Martin (2014-12-21). "Striving for Simplicity: The All Convolutional Net".
arXiv
:
1412.6806
[
cs.LG
].
^
Ma, Zhanyu; Chang, Dongliang; Xie, Jiyang; Ding, Yifeng; Wen, Shaoguo; Li, Xiaoxu; Si, Zhongwei; Guo, Jun (2019). "Fine-Grained Vehicle Classification With Channel Max Pooling Modified CNNs".
IEEE Transactions on Vehicular Technology
.
68
(4). Institute of Electrical and Electronics Engineers (IEEE):
3224–
3233.
Bibcode
:
2019ITVT...68.3224M
.
doi
:
10.1109/tvt.2019.2899972
.
ISSN
0018-9545
.
S2CID
86674074
.
^
Zafar, Afia; Aamir, Muhammad; Mohd Nawi, Nazri; Arshad, Ali; Riaz, Saman; Alruban, Abdulrahman; Dutta, Ashit Kumar; Almotairi, Sultan (2022-08-29).
"A Comparison of Pooling Methods for Convolutional Neural Networks"
.
Applied Sciences
.
12
(17): 8643.
Bibcode
:
2022ApSci..12.8643Z
.
doi
:
10.3390/app12178643
.
ISSN
2076-3417
.
^
Gholamalinezhad, Hossein; Khosravi, Hossein (2020-09-16),
Pooling Methods in Deep Neural Networks, a Review
,
arXiv
:
2009.07485
^
Householder, Alston S. (June 1941).
"A theory of steady-state activity in nerve-fiber networks: I. Definitions and preliminary lemmas"
.
The Bulletin of Mathematical Biophysics
.
3
(2):
63–
69.
doi
:
10.1007/BF02478220
.
ISSN
0007-4985
.
^
Romanuke, Vadim (2017).
"Appropriate number and allocation of ReLUs in convolutional neural networks"
.
Research Bulletin of NTUU "Kyiv Polytechnic Institute"
.
1
(1):
69–
78.
doi
:
10.20535/1810-0546.2017.1.88156
.
^
Xavier Glorot; Antoine Bordes;
Yoshua Bengio
(2011).
Deep sparse rectifier neural networks
(PDF)
. AISTATS. Archived from
the original
(PDF)
on 2016-12-13
. Retrieved
2023-04-10
.
Rectifier and softplus activation functions. The second one is a smooth version of the first.
^
Krizhevsky, A.; Sutskever, I.; Hinton, G. E. (2012).
"Imagenet classification with deep convolutional neural networks"
(PDF)
.
Advances in Neural Information Processing Systems
.
1
:
1097–
1105.
Archived
(PDF)
from the original on 2022-03-31
. Retrieved
2022-03-31
.
^
Ribeiro, Antonio H.; Schön, Thomas B. (2021). "How Convolutional Neural Networks Deal with Aliasing".
ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
. pp.
2755–
2759.
arXiv
:
2102.07757
.
doi
:
10.1109/ICASSP39728.2021.9414627
.
ISBN
978-1-7281-7605-5
.
S2CID
231925012
.
^
Myburgh, Johannes C.; Mouton, Coenraad; Davel, Marelie H. (2020).
"Tracking Translation Invariance in CNNS"
. In Gerber, Aurona (ed.).
Artificial Intelligence Research
. Communications in Computer and Information Science. Vol. 1342. Cham: Springer International Publishing. pp.
282–
295.
arXiv
:
2104.05997
.
doi
:
10.1007/978-3-030-66151-9_18
.
ISBN
978-3-030-66151-9
.
S2CID
233219976
.
Archived
from the original on 2022-01-22
. Retrieved
2021-03-26
.
^
Zhang, Richard (2019-04-25).
Making Convolutional Networks Shift-Invariant Again
.
OCLC
1106340711
.
^
Jaderberg, Max; Simonyan, Karen; Zisserman, Andrew; Kavukcuoglu, Koray (2015).
"Spatial Transformer Networks"
(PDF)
.
Advances in Neural Information Processing Systems
.
28
.
Archived
(PDF)
from the original on 2021-07-25
. Retrieved
2021-03-26
– via NIPS.
^
Sabour, Sara; Frosst, Nicholas; Hinton, Geoffrey E. (2017-10-26).
Dynamic Routing Between Capsules
.
OCLC
1106278545
.
^
Matiz, Sergio;
Barner, Kenneth E.
(2019-06-01).
"Inductive conformal predictor for convolutional neural networks: Applications to active learning for image classification"
.
Pattern Recognition
.
90
:
172–
182.
Bibcode
:
2019PatRe..90..172M
.
doi
:
10.1016/j.patcog.2019.01.035
.
ISSN
0031-3203
.
S2CID
127253432
.
Archived
from the original on 2021-09-29
. Retrieved
2021-09-29
.
^
Wieslander, Håkan; Harrison, Philip J.; Skogberg, Gabriel; Jackson, Sonya; Fridén, Markus; Karlsson, Johan; Spjuth, Ola; Wählby, Carolina (February 2021).
"Deep Learning With Conformal Prediction for Hierarchical Analysis of Large-Scale Whole-Slide Tissue Images"
.
IEEE Journal of Biomedical and Health Informatics
.
25
(2):
371–
380.
Bibcode
:
2021IJBHI..25..371W
.
doi
:
10.1109/JBHI.2020.2996300
.
ISSN
2168-2208
.
PMID
32750907
.
S2CID
219885788
.
^
Srivastava, Nitish; Geoffrey Hinton; Alex Krizhevsky; Ilya Sutskever; Ruslan Salakhutdinov (2014).
"Dropout: A Simple Way to Prevent Neural Networks from Overfitting"
(PDF)
.
Journal of Machine Learning Research
.
15
(1):
1929–
1958.
Archived
(PDF)
from the original on 2016-01-19
. Retrieved
2015-01-03
.
^
"Regularization of Neural Networks using DropConnect | ICML 2013 | JMLR W&CP"
.
jmlr.org
:
1058–
1066. 2013-02-13.
Archived
from the original on 2017-08-12
. Retrieved
2015-12-17
.
^
Zeiler, Matthew D.; Fergus, Rob (2013-01-15). "Stochastic Pooling for Regularization of Deep Convolutional Neural Networks".
arXiv
:
1301.3557
[
cs.LG
].
^
a
b
Platt, John; Steinkraus, Dave; Simard, Patrice Y. (August 2003).
"Best Practices for Convolutional Neural Networks Applied to Visual Document Analysis – Microsoft Research"
.
Microsoft Research
.
Archived
from the original on 2017-11-07
. Retrieved
2015-12-17
.
^
Hinton, Geoffrey E.; Srivastava, Nitish; Krizhevsky, Alex; Sutskever, Ilya; Salakhutdinov, Ruslan R. (2012). "Improving neural networks by preventing co-adaptation of feature detectors".
arXiv
:
1207.0580
[
cs.NE
].
^
"Dropout: A Simple Way to Prevent Neural Networks from Overfitting"
.
jmlr.org
.
Archived
from the original on 2016-03-05
. Retrieved
2015-12-17
.
^
Hinton, Geoffrey (1979). "Some demonstrations of the effects of structural descriptions in mental imagery".
Cognitive Science
.
3
(3):
231–
250.
doi
:
10.1016/s0364-0213(79)80008-7
.
^
Rock, Irvin. "The frame of reference." The legacy of Solomon Asch: Essays in cognition and social psychology (1990): 243–268.
^
G. Hinton, Coursera lectures on Neural Networks, 2012, URL:
https://www.coursera.org/learn/neural-networks
Archived
2016-12-31 at the
Wayback Machine
^
Dave Gershgorn (18 June 2018).
"The inside story of how AI got good enough to dominate Silicon Valley"
.
Quartz
.
Archived
from the original on 12 December 2019
. Retrieved
5 October
2018
.
^
Lawrence, Steve; C. Lee Giles; Ah Chung Tsoi; Andrew D. Back (1997). "Face Recognition: A Convolutional Neural Network Approach".
IEEE Transactions on Neural Networks
.
8
(1):
98–
113.
CiteSeerX
10.1.1.92.5813
.
doi
:
10.1109/72.554195
.
PMID
18255614
.
S2CID
2883848
.
^
Le Callet, Patrick; Christian Viard-Gaudin; Dominique Barba (2006).
"A Convolutional Neural Network Approach for Objective Video Quality Assessment"
(PDF)
.
IEEE Transactions on Neural Networks
.
17
(5):
1316–
1327.
Bibcode
:
2006ITNN...17.1316L
.
doi
:
10.1109/TNN.2006.879766
.
PMID
17001990
.
S2CID
221185563
.
Archived
(PDF)
from the original on 24 February 2021
. Retrieved
17 November
2013
.
^
"ImageNet Large Scale Visual Recognition Competition 2014 (ILSVRC2014)"
.
Archived
from the original on 5 February 2016
. Retrieved
30 January
2016
.
^
Szegedy, Christian; Liu, Wei; Jia, Yangqing; Sermanet, Pierre; Reed, Scott E.; Anguelov, Dragomir; Erhan, Dumitru; Vanhoucke, Vincent; Rabinovich, Andrew (2015). "Going deeper with convolutions".
IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, USA, June 7–12, 2015
. IEEE Computer Society. pp.
1–
9.
arXiv
:
1409.4842
.
doi
:
10.1109/CVPR.2015.7298594
.
ISBN
978-1-4673-6964-0
.
^
Russakovsky, Olga
; Deng, Jia; Su, Hao; Krause, Jonathan; Satheesh, Sanjeev; Ma, Sean; Huang, Zhiheng;
Karpathy, Andrej
; Khosla, Aditya; Bernstein, Michael; Berg, Alexander C.; Fei-Fei, Li (2014). "ImageNet Large Scale Visual Recognition Challenge".
arXiv
:
1409.0575
[
cs.CV
].
^
"The Face Detection Algorithm Set To Revolutionize Image Search"
.
Technology Review
. February 16, 2015.
Archived
from the original on 20 September 2020
. Retrieved
27 October
2017
.
^
Baccouche, Moez; Mamalet, Franck; Wolf, Christian; Garcia, Christophe; Baskurt, Atilla (2011-11-16). "Sequential Deep Learning for Human Action Recognition". In Salah, Albert Ali; Lepri, Bruno (eds.).
Human Behavior Understanding
. Lecture Notes in Computer Science. Vol. 7065. Springer Berlin Heidelberg. pp.
29–
39.
CiteSeerX
10.1.1.385.4740
.
doi
:
10.1007/978-3-642-25446-8_4
.
ISBN
978-3-642-25445-1
.
^
Ji, Shuiwang; Xu, Wei; Yang, Ming; Yu, Kai (2013-01-01). "3D Convolutional Neural Networks for Human Action Recognition".
IEEE Transactions on Pattern Analysis and Machine Intelligence
.
35
(1):
221–
231.
Bibcode
:
2013ITPAM..35..221J
.
CiteSeerX
10.1.1.169.4046
.
doi
:
10.1109/TPAMI.2012.59
.
ISSN
0162-8828
.
PMID
22392705
.
S2CID
1923924
.
^
Huang, Jie; Zhou, Wengang; Zhang, Qilin; Li, Houqiang; Li, Weiping (2018). "Video-based Sign Language Recognition without Temporal Segmentation".
arXiv
:
1801.10111
[
cs.CV
].
^
Karpathy, Andrej, et al. "
Large-scale video classification with convolutional neural networks
Archived
2019-08-06 at the
Wayback Machine
." IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2014.
^
Simonyan, Karen; Zisserman, Andrew (2014). "Two-Stream Convolutional Networks for Action Recognition in Videos".
arXiv
:
1406.2199
[
cs.CV
].
^
Wang, Le; Duan, Xuhuan; Zhang, Qilin; Niu, Zhenxing; Hua, Gang; Zheng, Nanning (2018-05-22).
"Segment-Tube: Spatio-Temporal Action Localization in Untrimmed Videos with Per-Frame Segmentation"
(PDF)
.
Sensors
.
18
(5): 1657.
Bibcode
:
2018Senso..18.1657W
.
doi
:
10.3390/s18051657
.
ISSN
1424-8220
.
PMC
5982167
.
PMID
29789447
.
Archived
(PDF)
from the original on 2021-03-01
. Retrieved
2018-09-14
.
^
Duan, Xuhuan; Wang, Le; Zhai, Changbo; Zheng, Nanning; Zhang, Qilin; Niu, Zhenxing; Hua, Gang (2018). "Joint Spatio-Temporal Action Localization in Untrimmed Videos with Per-Frame Segmentation".
2018 25th IEEE International Conference on Image Processing (ICIP)
. 25th IEEE International Conference on Image Processing (ICIP). pp.
918–
922.
doi
:
10.1109/icip.2018.8451692
.
ISBN
978-1-4799-7061-2
.
Taylor, Graham W.; Fergus, Rob; LeCun, Yann; Bregler, Christoph (2010-01-01). Convolutional Learning of Spatio-temporal Features. Proceedings of the 11th European Conference on Computer Vision: Part VI. ECCV'10. Berlin, Heidelberg: Springer-Verlag. pp. 140–153. ISBN 978-3-642-15566-6. Archived from the original on 2022-03-31. Retrieved 2022-03-31.
Le, Q. V.; Zou, W. Y.; Yeung, S. Y.; Ng, A. Y. (2011-01-01). "Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis". CVPR 2011. CVPR '11. Washington, DC, US: IEEE Computer Society. pp. 3361–3368. CiteSeerX 10.1.1.294.5948. doi:10.1109/CVPR.2011.5995496. ISBN 978-1-4577-0394-2. S2CID 6006618.
Grefenstette, Edward; Blunsom, Phil; de Freitas, Nando; Hermann, Karl Moritz (2014-04-29). "A Deep Architecture for Semantic Parsing". arXiv:1404.7296 [cs.CL].
Mesnil, Gregoire; Deng, Li; Gao, Jianfeng; He, Xiaodong; Shen, Yelong (April 2014). "Learning Semantic Representations Using Convolutional Neural Networks for Web Search – Microsoft Research". Microsoft Research. Archived from the original on 2017-09-15. Retrieved 2015-12-17.
Kalchbrenner, Nal; Grefenstette, Edward; Blunsom, Phil (2014-04-08). "A Convolutional Neural Network for Modelling Sentences". arXiv:1404.2188 [cs.CL].
Kim, Yoon (2014-08-25). "Convolutional Neural Networks for Sentence Classification". arXiv:1408.5882 [cs.CL].
Collobert, Ronan, and Jason Weston. "A unified architecture for natural language processing: Deep neural networks with multitask learning". Proceedings of the 25th International Conference on Machine Learning. ACM, 2008. Archived 2019-09-04 at the Wayback Machine.
Collobert, Ronan; Weston, Jason; Bottou, Leon; Karlen, Michael; Kavukcuoglu, Koray; Kuksa, Pavel (2011-03-02). "Natural Language Processing (almost) from Scratch". arXiv:1103.0398 [cs.LG].
Yin, W; Kann, K; Yu, M; Schütze, H (2017-03-02). "Comparative study of CNN and RNN for natural language processing". arXiv:1702.01923 [cs.LG].
Bai, S.; Kolter, J.S.; Koltun, V. (2018). "An empirical evaluation of generic convolutional and recurrent networks for sequence modeling". arXiv:1803.01271 [cs.LG].
Gruber, N. (2021). "Detecting dynamics of action in text with a recurrent neural network". Neural Computing and Applications. 33 (12): 15709–15718. doi:10.1007/S00521-021-06190-5. S2CID 236307579.
Haotian, J.; Zhong, Li; Qianxiao, Li (2021). "Approximation Theory of Convolutional Architectures for Time Series Modelling". International Conference on Machine Learning. arXiv:2107.09355.
Bohnslav, James P; Wimalasena, Nivanthika K; Clausing, Kelsey J; Dai, Yu Y; Yarmolinsky, David A; Cruz, Tomás; Kashlan, Adam D; Chiappe, M Eugenia; Orefice, Lauren L; Woolf, Clifford J; Harvey, Christopher D (2021-09-02). "DeepEthogram, a machine learning pipeline for supervised behavior classification from raw pixels". eLife. 10: e63377. doi:10.7554/eLife.63377. ISSN 2050-084X. PMC 8455138. PMID 34473051.
Gernat, Tim; Jagla, Tobias; Jones, Beryl M.; Middendorf, Martin; Robinson, Gene E. (2023-01-27). "Automated monitoring of honey bees with barcodes and artificial intelligence reveals two distinct social networks from a single affiliative behavior". Scientific Reports. 13 (1): 1541. Bibcode:2023NatSR..13.1541G. doi:10.1038/s41598-022-26825-4. ISSN 2045-2322. PMC 9883485. PMID 36707534.
Norouzzadeh, Mohammad Sadegh; Nguyen, Anh; Kosmala, Margaret; Swanson, Alexandra; Palmer, Meredith S.; Packer, Craig; Clune, Jeff (2018-06-19). "Automatically identifying, counting, and describing wild animals in camera-trap images with deep learning". Proceedings of the National Academy of Sciences. 115 (25): E5716–E5725. Bibcode:2018PNAS..115E5716N. doi:10.1073/pnas.1719367115. ISSN 0027-8424. PMC 6016780. PMID 29871948.
Svenning, Asger; Mougeot, Guillaume; Alison, Jamie; Chevalier, Daphne; Molina, Nisa Luise Chavez; Ong, Song-Quan; Bjerge, Kim; Carrillo, Juli; Hoeye, Toke Thomas (2025-04-14). "A General Method for Detection and Segmentation of Terrestrial Arthropods in Images". bioRxiv 10.1101/2025.04.08.647223.
Torrents, Jordi; Costa, Tiago; De Polavieja, Gonzalo G. (2025-06-02). "New idtracker.ai: rethinking multi-animal tracking as a representation learning problem to increase accuracy and reduce tracking times". bioRxiv 10.1101/2025.05.30.657023.
Mathis, Alexander; Mamidanna, Pranav; Cury, Kevin M.; Abe, Taiga; Murthy, Venkatesh N.; Mathis, Mackenzie Weygandt; Bethge, Matthias (September 2018). "DeepLabCut: markerless pose estimation of user-defined body parts with deep learning". Nature Neuroscience. 21 (9): 1281–1289. doi:10.1038/s41593-018-0209-y. ISSN 1097-6256. PMID 30127430.
Graving, Jacob M; Chae, Daniel; Naik, Hemal; Li, Liang; Koger, Benjamin; Costelloe, Blair R; Couzin, Iain D (2019-10-01). "DeepPoseKit, a software toolkit for fast and robust animal pose estimation using deep learning". eLife. 8: e47994. Bibcode:2019eLife...847994G. doi:10.7554/eLife.47994. ISSN 2050-084X. PMC 6897514. PMID 31570119.
Pereira, Talmo D.; Tabris, Nathaniel; Matsliah, Arie; Turner, David M.; Li, Junyu; Ravindranath, Shruthi; Papadoyannis, Eleni S.; Normand, Edna; Deutsch, David S.; Wang, Z. Yan; McKenzie-Smith, Grace C.; Mitelut, Catalin C.; Castro, Marielisa Diez; D’Uva, John; Kislin, Mikhail (May 2022). "Publisher Correction: SLEAP: A deep learning system for multi-animal pose tracking". Nature Methods. 19 (5): 628. doi:10.1038/s41592-022-01495-2. ISSN 1548-7091. PMC 9119847. PMID 35468969.
Arac, Ahmet; Zhao, Pingping; Dobkin, Bruce H.; Carmichael, S. Thomas; Golshani, Peyman (2019-05-07). "DeepBehavior: A Deep Learning Toolbox for Automated Analysis of Animal and Human Behavior Imaging Data". Frontiers in Systems Neuroscience. 13: 20. doi:10.3389/fnsys.2019.00020. ISSN 1662-5137. PMC 6513883. PMID 31133826.
Ren, Hansheng; Xu, Bixiong; Wang, Yujing; Yi, Chao; Huang, Congrui; Kou, Xiaoyu; Xing, Tony; Yang, Mao; Tong, Jie; Zhang, Qi (2019). "Time-Series Anomaly Detection Service at Microsoft". Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. arXiv:1906.03821. doi:10.1145/3292500.3330680. S2CID 182952311.
Wallach, Izhar; Dzamba, Michael; Heifets, Abraham (2015-10-09). "AtomNet: A Deep Convolutional Neural Network for Bioactivity Prediction in Structure-based Drug Discovery". arXiv:1510.02855 [cs.LG].
Yosinski, Jason; Clune, Jeff; Nguyen, Anh; Fuchs, Thomas; Lipson, Hod (2015-06-22). "Understanding Neural Networks Through Deep Visualization". arXiv:1506.06579 [cs.CV].
"Toronto startup has a faster way to discover effective medicines". The Globe and Mail. Archived from the original on 2015-10-20. Retrieved 2015-11-09.
"Startup Harnesses Supercomputers to Seek Cures". KQED Future of You. 2015-05-27. Archived from the original on 2018-12-06. Retrieved 2015-11-09.
Chellapilla, K; Fogel, DB (1999). "Evolving neural networks to play checkers without relying on expert knowledge". IEEE Trans Neural Netw. 10 (6): 1382–91. Bibcode:1999ITNN...10.1382C. doi:10.1109/72.809083. PMID 18252639.
Chellapilla, K.; Fogel, D.B. (2001). "Evolving an expert checkers playing program without using human expertise". IEEE Transactions on Evolutionary Computation. 5 (4): 422–428. Bibcode:2001ITEC....5..422C. doi:10.1109/4235.942536.
Fogel, David (2001). Blondie24: Playing at the Edge of AI. San Francisco, CA: Morgan Kaufmann. ISBN 978-1-55860-783-5.
Clark, Christopher; Storkey, Amos (2014). "Teaching Deep Convolutional Neural Networks to Play Go". arXiv:1412.3409 [cs.AI].
Maddison, Chris J.; Huang, Aja; Sutskever, Ilya; Silver, David (2014). "Move Evaluation in Go Using Deep Convolutional Neural Networks". arXiv:1412.6564 [cs.LG].
"AlphaGo – Google DeepMind". Archived from the original on 30 January 2016. Retrieved 30 January 2016.
Bai, Shaojie; Kolter, J. Zico; Koltun, Vladlen (2018-04-19). "An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling". arXiv:1803.01271 [cs.LG].
Yu, Fisher; Koltun, Vladlen (2016-04-30). "Multi-Scale Context Aggregation by Dilated Convolutions". arXiv:1511.07122 [cs.CV].
Borovykh, Anastasia; Bohte, Sander; Oosterlee, Cornelis W. (2018-09-17). "Conditional Time Series Forecasting with Convolutional Neural Networks". arXiv:1703.04691 [stat.ML].
Mittelman, Roni (2015-08-03). "Time-series modeling with undecimated fully convolutional neural networks". arXiv:1508.00317 [stat.ML].
Chen, Yitian; Kang, Yanfei; Chen, Yixiong; Wang, Zizhuo (2019-06-11). "Probabilistic Forecasting with Temporal Convolutional Neural Network". arXiv:1906.04397 [stat.ML].
Zhao, Bendong; Lu, Huanzhang; Chen, Shangfeng; Liu, Junliang; Wu, Dongya (2017-02-01). "Convolutional neural networks for time series classi". Journal of Systems Engineering and Electronics. 28 (1): 162–169. doi:10.21629/JSEE.2017.01.18.
Petneházi, Gábor (2019-08-21). "QCNN: Quantile Convolutional Neural Network". arXiv:1908.07978 [cs.LG].
Hubert Mara (2019-06-07), HeiCuBeDa Hilprecht – Heidelberg Cuneiform Benchmark Dataset for the Hilprecht Collection (in German), heiDATA – institutional repository for research data of Heidelberg University, doi:10.11588/data/IE8CCN
Hubert Mara and Bartosz Bogacz (2019), "Breaking the Code on Broken Tablets: The Learning Challenge for Annotated Cuneiform Script in Normalized 2D and 3D Datasets", Proceedings of the 15th International Conference on Document Analysis and Recognition (ICDAR) (in German), Sydney, Australia, pp. 148–153, doi:10.1109/ICDAR.2019.00032, ISBN 978-1-7281-3014-9, S2CID 211026941
Bogacz, Bartosz; Mara, Hubert (2020), "Period Classification of 3D Cuneiform Tablets with Geometric Neural Networks", Proceedings of the 17th International Conference on Frontiers of Handwriting Recognition (ICFHR), Dortmund, Germany
Presentation of the ICFHR paper on Period Classification of 3D Cuneiform Tablets with Geometric Neural Networks on YouTube
Durjoy Sen Maitra; Ujjwal Bhattacharya; S.K. Parui, "CNN based common approach to handwritten character recognition of multiple scripts", in Document Analysis and Recognition (ICDAR), 2015 13th International Conference on, pp. 1021–1025, 23–26 Aug. 2015. Archived 2023-10-16 at the Wayback Machine.
"NIPS 2017". Interpretable ML Symposium. 2017-10-20. Archived from the original on 2019-09-07. Retrieved 2018-09-12.
Zang, Jinliang; Wang, Le; Liu, Ziyi; Zhang, Qilin; Hua, Gang; Zheng, Nanning (2018). "Attention-Based Temporal Weighted Convolutional Neural Network for Action Recognition". Artificial Intelligence Applications and Innovations. IFIP Advances in Information and Communication Technology. Vol. 519. Cham: Springer International Publishing. pp. 97–108. arXiv:1803.07179. doi:10.1007/978-3-319-92007-8_9. ISBN 978-3-319-92006-1. ISSN 1868-4238. S2CID 4058889.
Wang, Le; Zang, Jinliang; Zhang, Qilin; Niu, Zhenxing; Hua, Gang; Zheng, Nanning (2018-06-21). "Action Recognition by an Attention-Aware Temporal Weighted Convolutional Neural Network" (PDF). Sensors. 18 (7): 1979. Bibcode:2018Senso..18.1979W. doi:10.3390/s18071979. ISSN 1424-8220. PMC 6069475. PMID 29933555. Archived (PDF) from the original on 2018-09-13. Retrieved 2018-09-14.
Ong, Hao Yi; Chavez, Kevin; Hong, Augustus (2015-08-18). "Distributed Deep Q-Learning". arXiv:1508.04186v2 [cs.LG].
Mnih, Volodymyr; et al. (2015). "Human-level control through deep reinforcement learning". Nature. 518 (7540): 529–533. Bibcode:2015Natur.518..529M. doi:10.1038/nature14236. PMID 25719670. S2CID 205242740.
Sun, R.; Sessions, C. (June 2000). "Self-segmentation of sequences: automatic formation of hierarchies of sequential behaviors". IEEE Transactions on Systems, Man, and Cybernetics - Part B: Cybernetics. 30 (3): 403–418. Bibcode:2000ITSMB..30..403S. CiteSeerX 10.1.1.11.226. doi:10.1109/3477.846230. ISSN 1083-4419. PMID 18252373.
"Convolutional Deep Belief Networks on CIFAR-10" (PDF). Archived (PDF) from the original on 2017-08-30. Retrieved 2017-08-18.
Lee, Honglak; Grosse, Roger; Ranganath, Rajesh; Ng, Andrew Y. (1 January 2009). "Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations". Proceedings of the 26th Annual International Conference on Machine Learning. ACM. pp. 609–616. CiteSeerX 10.1.1.149.6800. doi:10.1145/1553374.1553453. ISBN 978-1-60558-516-1. S2CID 12008458.
Behnke, Sven (2003). Hierarchical Neural Networks for Image Interpretation (PDF). Lecture Notes in Computer Science. Vol. 2766. Springer. doi:10.1007/b11963. ISBN 978-3-540-40722-5. S2CID 1304548. Archived (PDF) from the original on 2017-08-10. Retrieved 2016-12-28.
Choi, Rene Y.; Coyner, Aaron S.; Kalpathy-Cramer, Jayashree; Chiang, Michael F.; Campbell, J. Peter (February 2020). "Introduction to Machine Learning, Neural Networks, and Deep Learning". Wired. Archived from the original on January 13, 2018. Retrieved March 6, 2017.
CS231n: Convolutional Neural Networks for Visual Recognition, Andrej Karpathy's Stanford computer science course on CNNs in computer vision
vdumoulin/conv_arithmetic: A technical report on convolution arithmetic in the context of deep learning. Animations of convolutions. |
| Markdown |
## Contents
- [(Top)](https://en.wikipedia.org/wiki/Convolutional_neural_network)
- [1 Architecture](https://en.wikipedia.org/wiki/Convolutional_neural_network#Architecture)
- [1\.1 Convolutional layers](https://en.wikipedia.org/wiki/Convolutional_neural_network#Convolutional_layers)
- [1\.2 Pooling layers](https://en.wikipedia.org/wiki/Convolutional_neural_network#Pooling_layers)
- [1\.3 Fully connected layers](https://en.wikipedia.org/wiki/Convolutional_neural_network#Fully_connected_layers)
- [1\.4 Receptive field](https://en.wikipedia.org/wiki/Convolutional_neural_network#Receptive_field)
- [1\.5 Weights](https://en.wikipedia.org/wiki/Convolutional_neural_network#Weights)
- [1\.6 Deconvolutional](https://en.wikipedia.org/wiki/Convolutional_neural_network#Deconvolutional)
- [2 History](https://en.wikipedia.org/wiki/Convolutional_neural_network#History)
- [2\.1 Receptive fields in the visual cortex](https://en.wikipedia.org/wiki/Convolutional_neural_network#Receptive_fields_in_the_visual_cortex)
- [2\.2 Fukushima's analog threshold elements in a vision model](https://en.wikipedia.org/wiki/Convolutional_neural_network#Fukushima's_analog_threshold_elements_in_a_vision_model)
- [2\.3 Neocognitron, origin of the trainable CNN architecture](https://en.wikipedia.org/wiki/Convolutional_neural_network#Neocognitron,_origin_of_the_trainable_CNN_architecture)
- [2\.4 Convolution in time](https://en.wikipedia.org/wiki/Convolutional_neural_network#Convolution_in_time)
- [2\.5 Time delay neural networks](https://en.wikipedia.org/wiki/Convolutional_neural_network#Time_delay_neural_networks)
- [2\.6 Image recognition with CNNs trained by gradient descent](https://en.wikipedia.org/wiki/Convolutional_neural_network#Image_recognition_with_CNNs_trained_by_gradient_descent)
- [2\.6.1 Max pooling](https://en.wikipedia.org/wiki/Convolutional_neural_network#Max_pooling)
- [2\.6.2 LeNet-5](https://en.wikipedia.org/wiki/Convolutional_neural_network#LeNet-5)
- [2\.7 Shift-invariant neural network](https://en.wikipedia.org/wiki/Convolutional_neural_network#Shift-invariant_neural_network)
- [2\.8 GPU implementations](https://en.wikipedia.org/wiki/Convolutional_neural_network#GPU_implementations)
- [3 Distinguishing features](https://en.wikipedia.org/wiki/Convolutional_neural_network#Distinguishing_features)
- [4 Building blocks](https://en.wikipedia.org/wiki/Convolutional_neural_network#Building_blocks)
- [4\.1 Convolutional layer](https://en.wikipedia.org/wiki/Convolutional_neural_network#Convolutional_layer)
- [4\.1.1 Local connectivity](https://en.wikipedia.org/wiki/Convolutional_neural_network#Local_connectivity)
- [4\.1.2 Spatial arrangement](https://en.wikipedia.org/wiki/Convolutional_neural_network#Spatial_arrangement)
- [4\.1.3 Parameter sharing](https://en.wikipedia.org/wiki/Convolutional_neural_network#Parameter_sharing)
- [4\.2 Pooling layer](https://en.wikipedia.org/wiki/Convolutional_neural_network#Pooling_layer)
- [4\.2.1 Channel max pooling](https://en.wikipedia.org/wiki/Convolutional_neural_network#Channel_max_pooling)
- [4\.3 ReLU layer](https://en.wikipedia.org/wiki/Convolutional_neural_network#ReLU_layer)
- [4\.4 Fully connected layer](https://en.wikipedia.org/wiki/Convolutional_neural_network#Fully_connected_layer)
- [4\.5 Loss layer](https://en.wikipedia.org/wiki/Convolutional_neural_network#Loss_layer)
- [5 Hyperparameters](https://en.wikipedia.org/wiki/Convolutional_neural_network#Hyperparameters)
- [5\.1 Padding](https://en.wikipedia.org/wiki/Convolutional_neural_network#Padding)
- [5\.2 Stride](https://en.wikipedia.org/wiki/Convolutional_neural_network#Stride)
- [5\.3 Number of filters](https://en.wikipedia.org/wiki/Convolutional_neural_network#Number_of_filters)
- [5\.4 Filter (or kernel) size](https://en.wikipedia.org/wiki/Convolutional_neural_network#Filter_\(or_kernel\)_size)
- [5\.5 Pooling type and size](https://en.wikipedia.org/wiki/Convolutional_neural_network#Pooling_type_and_size)
- [5\.6 Dilation](https://en.wikipedia.org/wiki/Convolutional_neural_network#Dilation)
- [6 Translation equivariance and aliasing](https://en.wikipedia.org/wiki/Convolutional_neural_network#Translation_equivariance_and_aliasing)
- [7 Evaluation](https://en.wikipedia.org/wiki/Convolutional_neural_network#Evaluation)
- [8 Regularization methods](https://en.wikipedia.org/wiki/Convolutional_neural_network#Regularization_methods)
- [8\.1 Empirical](https://en.wikipedia.org/wiki/Convolutional_neural_network#Empirical)
- [8\.1.1 Dropout](https://en.wikipedia.org/wiki/Convolutional_neural_network#Dropout)
- [8\.1.2 DropConnect](https://en.wikipedia.org/wiki/Convolutional_neural_network#DropConnect)
- [8\.1.3 Stochastic pooling](https://en.wikipedia.org/wiki/Convolutional_neural_network#Stochastic_pooling)
- [8\.1.4 Artificial data](https://en.wikipedia.org/wiki/Convolutional_neural_network#Artificial_data)
- [8\.2 Explicit](https://en.wikipedia.org/wiki/Convolutional_neural_network#Explicit)
- [8\.2.1 Early stopping](https://en.wikipedia.org/wiki/Convolutional_neural_network#Early_stopping)
- [8\.2.2 Number of parameters](https://en.wikipedia.org/wiki/Convolutional_neural_network#Number_of_parameters)
- [8\.2.3 Weight decay](https://en.wikipedia.org/wiki/Convolutional_neural_network#Weight_decay)
- [8\.2.4 Max norm constraints](https://en.wikipedia.org/wiki/Convolutional_neural_network#Max_norm_constraints)
- [9 Hierarchical coordinate frames](https://en.wikipedia.org/wiki/Convolutional_neural_network#Hierarchical_coordinate_frames)
- [10 Applications](https://en.wikipedia.org/wiki/Convolutional_neural_network#Applications)
- [10\.1 Image recognition](https://en.wikipedia.org/wiki/Convolutional_neural_network#Image_recognition)
- [10\.2 Video analysis](https://en.wikipedia.org/wiki/Convolutional_neural_network#Video_analysis)
- [10\.3 Natural language processing](https://en.wikipedia.org/wiki/Convolutional_neural_network#Natural_language_processing)
- [10\.4 Animal behavior detection](https://en.wikipedia.org/wiki/Convolutional_neural_network#Animal_behavior_detection)
- [10\.5 Anomaly detection](https://en.wikipedia.org/wiki/Convolutional_neural_network#Anomaly_detection)
- [10\.6 Drug discovery](https://en.wikipedia.org/wiki/Convolutional_neural_network#Drug_discovery)
- [10\.7 Checkers game](https://en.wikipedia.org/wiki/Convolutional_neural_network#Checkers_game)
- [10\.8 Go](https://en.wikipedia.org/wiki/Convolutional_neural_network#Go)
- [10\.9 Time series forecasting](https://en.wikipedia.org/wiki/Convolutional_neural_network#Time_series_forecasting)
- [10\.10 Cultural heritage and 3D-datasets](https://en.wikipedia.org/wiki/Convolutional_neural_network#Cultural_heritage_and_3D-datasets)
- [11 Fine-tuning](https://en.wikipedia.org/wiki/Convolutional_neural_network#Fine-tuning)
- [12 Human interpretable explanations](https://en.wikipedia.org/wiki/Convolutional_neural_network#Human_interpretable_explanations)
- [13 Related architectures](https://en.wikipedia.org/wiki/Convolutional_neural_network#Related_architectures)
- [13\.1 Deep Q-networks](https://en.wikipedia.org/wiki/Convolutional_neural_network#Deep_Q-networks)
- [13\.2 Deep belief networks](https://en.wikipedia.org/wiki/Convolutional_neural_network#Deep_belief_networks)
- [13\.3 Neural abstraction pyramid](https://en.wikipedia.org/wiki/Convolutional_neural_network#Neural_abstraction_pyramid)
- [14 Notable libraries](https://en.wikipedia.org/wiki/Convolutional_neural_network#Notable_libraries)
- [15 See also](https://en.wikipedia.org/wiki/Convolutional_neural_network#See_also)
- [16 Notes](https://en.wikipedia.org/wiki/Convolutional_neural_network#Notes)
- [17 References](https://en.wikipedia.org/wiki/Convolutional_neural_network#References)
- [18 External links](https://en.wikipedia.org/wiki/Convolutional_neural_network#External_links)
# Convolutional neural network
From Wikipedia, the free encyclopedia
Type of feedforward neural network
A **convolutional neural network** (**CNN**) is a type of [feedforward neural network](https://en.wikipedia.org/wiki/Feedforward_neural_network "Feedforward neural network") that learns [features](https://en.wikipedia.org/wiki/Feature_engineering "Feature engineering") via filter (or [kernel](https://en.wikipedia.org/wiki/Kernel_\(image_processing\) "Kernel (image processing)")) optimization. This type of [deep learning](https://en.wikipedia.org/wiki/Deep_learning "Deep learning") network has been applied to process and make [predictions](https://en.wikipedia.org/wiki/Prediction#Statistics "Prediction") from many different types of data including text, images and audio.[\[1\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-LeCun2015-1) CNNs are the de-facto standard in deep learning-based approaches to [computer vision](https://en.wikipedia.org/wiki/Computer_vision "Computer vision")[\[2\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-2) and [image processing](https://en.wikipedia.org/wiki/Image_processing "Image processing"), and have only recently been replaced—in some cases—by newer architectures such as the [transformer](https://en.wikipedia.org/wiki/Transformer_\(deep_learning\) "Transformer (deep learning)").
[Vanishing gradients](https://en.wikipedia.org/wiki/Vanishing_gradient_problem "Vanishing gradient problem") and exploding gradients, seen during [backpropagation](https://en.wikipedia.org/wiki/Backpropagation "Backpropagation") in earlier neural networks, are mitigated by the [regularization](https://en.wikipedia.org/wiki/Regularization_\(mathematics\) "Regularization (mathematics)") that comes from using shared weights over fewer connections.[\[3\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-auto3-3)[\[4\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-auto2-4) For example, *each* neuron in a fully connected layer would require 10,000 weights to process an image of 100 × 100 pixels. With cascaded *convolution* (or cross-correlation) kernels,[\[5\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-5)[\[6\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-6) however, only 25 weights per convolutional layer are required to process 5 × 5 tiles.[\[7\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-auto1-7)[\[8\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-homma-8) Higher-layer features are extracted from wider context windows than lower-layer features.
Some applications of CNNs include:
- [image and video recognition](https://en.wikipedia.org/wiki/Computer_vision "Computer vision"),[\[9\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-Valueva_Nagornov_Lyakhov_Valuev_2020_pp._232%E2%80%93243-9)
- [recommender systems](https://en.wikipedia.org/wiki/Recommender_system "Recommender system"),[\[10\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-10)
- [image classification](https://en.wikipedia.org/wiki/Image_classification "Image classification"),
- [image segmentation](https://en.wikipedia.org/wiki/Image_segmentation "Image segmentation"),
- [medical image analysis](https://en.wikipedia.org/wiki/Medical_image_computing "Medical image computing"),
- [natural language processing](https://en.wikipedia.org/wiki/Natural_language_processing "Natural language processing"),[\[11\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-11)
- [brain–computer interfaces](https://en.wikipedia.org/wiki/Brain%E2%80%93computer_interface "Brain–computer interface"),[\[12\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-12) and
- financial [time series](https://en.wikipedia.org/wiki/Time_series "Time series").[\[13\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-Tsantekidis_7%E2%80%9312-13)
CNNs are also known as **shift invariant** or **space invariant artificial neural networks**, based on the shared-weight architecture of the [convolution](https://en.wikipedia.org/wiki/Convolution "Convolution") kernels or filters that slide along input features and provide translation-[equivariant](https://en.wikipedia.org/wiki/Equivariant_map "Equivariant map") responses known as feature maps.[\[14\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-:0-14)[\[15\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-:1-15) Counter-intuitively, most convolutional neural networks are not [invariant to translation](https://en.wikipedia.org/wiki/Translation_invariant "Translation invariant"), due to the downsampling operation they apply to the input.[\[16\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-:6-16)
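The difference between equivariance and invariance can be illustrated with a minimal one-dimensional sketch. The signal, the kernel values, and the stride-2 subsampling used as a stand-in for downsampling are all illustrative assumptions, not taken from any particular network.

```python
import numpy as np

# Hypothetical 1D input and a simple difference kernel (illustrative values).
kernel = np.array([1.0, 0.0, -1.0])
signal = np.array([0.0, 0.0, 1.0, 2.0, 3.0, 0.0, 0.0, 0.0])
shifted = np.roll(signal, 1)  # the same pattern moved one step to the right

# Cross-correlation with stride 1 and no padding.
response = np.correlate(signal, kernel, mode="valid")
response_shifted = np.correlate(shifted, kernel, mode="valid")

# Equivariance: shifting the input shifts the feature map by the same amount.
is_equivariant = np.array_equal(response_shifted[1:], response[:-1])

# Keeping every second response (a crude form of downsampling) breaks this:
# the subsampled maps are no longer shifted copies of each other.
down = response[::2]
down_shifted = response_shifted[::2]
survives_downsampling = np.array_equal(down_shifted[1:], down[:-1])

print(is_equivariant, survives_downsampling)
```

In this sketch the stride-1 responses are shifted copies of one another, while the subsampled responses differ in value, which is one way to see why downsampling interferes with translation invariance.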
[Feedforward neural networks](https://en.wikipedia.org/wiki/Feedforward_neural_network "Feedforward neural network") are usually fully connected networks; that is, each neuron in one [layer](https://en.wikipedia.org/wiki/Layer_\(deep_learning\) "Layer (deep learning)") is connected to all neurons in the next [layer](https://en.wikipedia.org/wiki/Layer_\(deep_learning\) "Layer (deep learning)"). The "full connectivity" of these networks makes them prone to [overfitting](https://en.wikipedia.org/wiki/Overfitting "Overfitting") data. Typical ways of regularizing, or preventing overfitting, include penalizing parameters during training (such as weight decay) or trimming connectivity (skipped connections, dropout, etc.). Robust datasets also increase the probability that CNNs will learn the generalized principles that characterize a given dataset rather than the biases of a poorly populated set.[\[17\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-17)
Convolutional networks were [inspired](https://en.wikipedia.org/wiki/Mathematical_biology "Mathematical biology") by [biological](https://en.wikipedia.org/wiki/Biological "Biological") processes[\[18\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-fukuneoscholar-18)[\[19\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-hubelwiesel1968-19)[\[20\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-intro-20)[\[21\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-robust_face_detection-21) in that the connectivity pattern between [neurons](https://en.wikipedia.org/wiki/Artificial_neuron "Artificial neuron") resembles the organization of the animal [visual cortex](https://en.wikipedia.org/wiki/Visual_cortex "Visual cortex"). Individual [cortical neurons](https://en.wikipedia.org/wiki/Cortical_neuron "Cortical neuron") respond to stimuli only in a restricted region of the [visual field](https://en.wikipedia.org/wiki/Visual_field "Visual field") known as the [receptive field](https://en.wikipedia.org/wiki/Receptive_field "Receptive field"). The receptive fields of different neurons partially overlap such that they cover the entire visual field.
CNNs use relatively little pre-processing compared to other [image classification algorithms](https://en.wikipedia.org/wiki/Image_classification "Image classification"). This means that the network learns to optimize the filters (or kernels) through automated learning, whereas in traditional algorithms these filters are [hand-engineered](https://en.wikipedia.org/wiki/Feature_engineering "Feature engineering"). This simplifies and automates the process, improving efficiency and scalability by removing the bottleneck of human intervention.
## Architecture
Main article: [Layer (deep learning)](https://en.wikipedia.org/wiki/Layer_\(deep_learning\) "Layer (deep learning)")
Comparison of the [LeNet](https://en.wikipedia.org/wiki/LeNet "LeNet") (1995) and [AlexNet](https://en.wikipedia.org/wiki/AlexNet "AlexNet") (2012) convolution, pooling and dense layers
A convolutional neural network consists of an input layer, [hidden layers](https://en.wikipedia.org/wiki/Artificial_neural_network#Organization "Artificial neural network") and an output layer. In a convolutional neural network, the hidden layers include one or more layers that perform convolutions. Typically this includes a layer that performs a [dot product](https://en.wikipedia.org/wiki/Dot_product "Dot product") of the convolution kernel with the layer's input matrix. This product is usually the [Frobenius inner product](https://en.wikipedia.org/wiki/Frobenius_inner_product "Frobenius inner product"), and its activation function is commonly [ReLU](https://en.wikipedia.org/wiki/Rectifier_\(neural_networks\) "Rectifier (neural networks)"). As the convolution kernel slides along the input matrix for the layer, the convolution operation generates a feature map, which in turn contributes to the input of the next layer. This is followed by other layers such as [pooling layers](https://en.wikipedia.org/wiki/Pooling_layer "Pooling layer"), fully connected layers, and normalization layers. A convolutional layer is also closely related to a [matched filter](https://en.wikipedia.org/wiki/Matched_filter "Matched filter").[\[22\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-22)
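The sliding dot product described above can be sketched in a few lines of NumPy. This is a minimal illustration that assumes a single-channel input, stride 1 and no padding; the function name `conv2d_relu`, the example image and the edge-detecting kernel are hypothetical.

```python
import numpy as np

def conv2d_relu(image, kernel, bias=0.0):
    """Slide `kernel` over `image` (stride 1, no padding), then apply ReLU."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            # Frobenius inner product: elementwise product, then sum.
            feature_map[i, j] = np.sum(patch * kernel) + bias
    return np.maximum(feature_map, 0.0)  # ReLU activation

# Illustrative input: dark left half, bright right half.
image = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1]], dtype=float)
# Illustrative kernel that responds to dark-to-bright vertical edges.
vertical_edge = np.array([[-1.0, 1.0],
                          [-1.0, 1.0]])

feature_map = conv2d_relu(image, vertical_edge)
print(feature_map)
```

The resulting feature map is nonzero only in the column where the kernel straddles the edge. Practical libraries compute the same quantity with far more efficient routines than this explicit double loop.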
### Convolutional layers
In a CNN, the input is a [tensor](https://en.wikipedia.org/wiki/Tensor_\(machine_learning\) "Tensor (machine learning)") with shape:
(number of inputs) × (input height) × (input width) × (input [channels](https://en.wikipedia.org/wiki/Channel_\(digital_image\) "Channel (digital image)"))
After passing through a convolutional layer, the image becomes abstracted to a feature map, also called an activation map, with shape:
(number of inputs) × (feature map height) × (feature map width) × (feature map [channels](https://en.wikipedia.org/wiki/Channel_\(digital_image\) "Channel (digital image)")).
Convolutional layers convolve the input and pass its result to the next layer. This is similar to the response of a neuron in the visual cortex to a specific stimulus.[\[23\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-deeplearning-23) Each convolutional neuron processes data only for its [receptive field](https://en.wikipedia.org/wiki/Receptive_field "Receptive field").
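As a rough sketch of how the input shape maps to the feature map shape, the helper below applies the commonly used size formula for a square kernel. The `stride` and `padding` parameters, the function name and the example numbers are assumptions introduced for illustration; they are not defined in the text above.

```python
def conv_output_shape(input_shape, kernel_size, out_channels, stride=1, padding=0):
    """Feature map shape for a (inputs, height, width, channels) tensor.

    Assumes a square kernel and the same stride and zero padding on both axes.
    """
    n, h, w, _ = input_shape
    out_h = (h + 2 * padding - kernel_size) // stride + 1
    out_w = (w + 2 * padding - kernel_size) // stride + 1
    # One output channel per filter in the layer.
    return (n, out_h, out_w, out_channels)

# Hypothetical batch of 32 RGB images of 100 x 100 pixels, 16 filters of 5 x 5.
shape = conv_output_shape((32, 100, 100, 3), kernel_size=5, out_channels=16)
print(shape)  # (32, 96, 96, 16)
```

The number of feature map channels equals the number of filters, independent of the number of input channels.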
1D convolutional neural network feed forward example
Although [fully connected feedforward neural networks](https://en.wikipedia.org/wiki/Multilayer_perceptron "Multilayer perceptron") can be used to learn features and classify data, this architecture is generally impractical for larger inputs (e.g., high-resolution images), which would require massive numbers of neurons because each pixel is a relevant input feature. A fully connected layer for an image of size 100 × 100 has 10,000 weights for *each* neuron in the second layer. Convolution reduces the number of free parameters, allowing the network to be deeper.[\[7\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-auto1-7) For example, using a 5 × 5 tiling region, with the same shared weights at every position, requires only 25 learnable weights. Using shared weights means there are many fewer parameters, which helps avoid the vanishing gradients and exploding gradients problems seen during [backpropagation](https://en.wikipedia.org/wiki/Backpropagation "Backpropagation") in earlier neural networks.[\[3\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-auto3-3)[\[4\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-auto2-4)
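The arithmetic behind this comparison can be written out directly. The helper names below are hypothetical, a single-channel image is assumed, and bias terms are left out for simplicity.

```python
def dense_weights_per_neuron(height, width, channels=1):
    """Weights needed by ONE fully connected neuron that sees every pixel."""
    return height * width * channels

def conv_weights_per_filter(kernel_h, kernel_w, channels=1):
    """Weights in ONE convolutional filter, shared across all positions."""
    return kernel_h * kernel_w * channels

dense = dense_weights_per_neuron(100, 100)  # 100 * 100 = 10,000
conv = conv_weights_per_filter(5, 5)        # 5 * 5 = 25

print(dense, conv, dense // conv)
```

Under these assumptions, one shared 5 × 5 filter uses 400 times fewer weights than a single fully connected neuron on the same image.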
To speed processing, standard convolutional layers can be replaced by depthwise separable convolutional layers,[\[24\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-24) which are based on a depthwise convolution followed by a pointwise convolution. The *depthwise convolution* is a spatial convolution applied independently over each channel of the input tensor, while the *pointwise convolution* is a standard convolution restricted to the use of 1 × 1 kernels.
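One way to see the saving is to count weights. The sketch below assumes square kernels, one depthwise filter per input channel, and no bias terms; the function names and the channel counts are illustrative.

```python
def standard_conv_weights(kernel_size, in_channels, out_channels):
    """Each output channel has its own kernel spanning all input channels."""
    return kernel_size * kernel_size * in_channels * out_channels

def depthwise_separable_weights(kernel_size, in_channels, out_channels):
    # Depthwise step: one spatial kernel per input channel.
    depthwise = kernel_size * kernel_size * in_channels
    # Pointwise step: a 1 x 1 convolution that mixes channels.
    pointwise = in_channels * out_channels
    return depthwise + pointwise

# Hypothetical layer: 3 x 3 kernels, 32 input channels, 64 output channels.
standard = standard_conv_weights(3, 32, 64)          # 18,432
separable = depthwise_separable_weights(3, 32, 64)   # 288 + 2,048 = 2,336

print(standard, separable)
```

For this hypothetical layer the separable form needs roughly one eighth of the weights, and the number of multiplications per output position shrinks in a similar proportion.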
### Pooling layers
Convolutional networks may include local and/or global pooling layers along with traditional convolutional layers. Pooling layers reduce the dimensions of data by combining the outputs of neuron clusters at one layer into a single neuron in the next layer. Local pooling combines small clusters; tiling sizes such as 2 × 2 are commonly used. Global pooling acts on all the neurons of the feature map.[\[25\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-flexible-25)[\[26\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-26) There are two common types of pooling in popular use: max and average. *Max pooling* uses the maximum value of each local cluster of neurons in the feature map,[\[27\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-Yamaguchi111990-27)[\[28\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-mcdns-28) while *average pooling* takes the average value.
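Local and global pooling can be sketched as follows for a single two-dimensional feature map. The sketch assumes non-overlapping square tiles and crops any rows or columns that do not fill a complete tile; the function name `pool2d` and the example values are hypothetical.

```python
import numpy as np

def pool2d(feature_map, size=2, mode="max"):
    """Non-overlapping local pooling over size x size tiles."""
    h, w = feature_map.shape
    # Crop so that both dimensions are multiples of the tile size.
    h, w = h - h % size, w - w % size
    tiles = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    if mode == "max":
        return tiles.max(axis=(1, 3))
    if mode == "average":
        return tiles.mean(axis=(1, 3))
    raise ValueError("mode must be 'max' or 'average'")

feature_map = np.array([[1, 2, 5, 6],
                        [3, 4, 7, 8],
                        [9, 10, 13, 14],
                        [11, 12, 15, 16]], dtype=float)

local_max = pool2d(feature_map, size=2, mode="max")      # [[4, 8], [12, 16]]
local_avg = pool2d(feature_map, size=2, mode="average")  # [[2.5, 6.5], [10.5, 14.5]]

# Global pooling reduces the whole feature map to a single value.
global_max = feature_map.max()
global_avg = feature_map.mean()
```

Each 2 × 2 tile of the 4 × 4 map is reduced to one value, so the local outputs are 2 × 2, while global pooling yields a single number per feature map.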
### Fully connected layers
Fully connected layers connect every neuron in one layer to every neuron in another layer. This is the same arrangement as in a traditional [multilayer perceptron](https://en.wikipedia.org/wiki/Multilayer_perceptron "Multilayer perceptron") neural network (MLP). Each neuron in the fully connected layer receives input from all the neurons in the previous layer. These inputs are weighted and summed with the corresponding biases, and then passed through an activation function to perform a nonlinear transformation, generating the output. In a CNN, the feature maps are flattened into a vector that goes through one or more fully connected layers to classify the image.
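A minimal sketch of this computation is shown below. The choice of ReLU as the activation, the layer sizes, and the randomly drawn weights are assumptions made only for illustration.

```python
import numpy as np

def fully_connected(inputs, weights, biases):
    """Weighted sum of all inputs plus bias, followed by ReLU (an assumed choice)."""
    return np.maximum(weights @ inputs + biases, 0.0)

# Hypothetical output of earlier layers: two 2 x 2 feature maps.
feature_maps = np.arange(8, dtype=float).reshape(2, 2, 2)
flat = feature_maps.reshape(-1)  # flatten to a vector of 8 values

rng = np.random.default_rng(0)
weights = rng.normal(size=(3, flat.size))  # 3 neurons, each with 8 weights
biases = np.zeros(3)

outputs = fully_connected(flat, weights, biases)
print(outputs.shape)  # (3,)
```

Every one of the 3 output neurons has its own weight for each of the 8 inputs, giving 24 weights in total, in contrast to the shared filters of a convolutional layer.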
### Receptive field
In neural networks, each neuron receives input from some number of locations in the previous layer. In a convolutional layer, each neuron receives input from only a restricted area of the previous layer called the neuron's *receptive field*. Typically the area is a square (e.g. 5 by 5 neurons), whereas in a fully connected layer the receptive field is the *entire previous layer*. Thus, in each successive convolutional layer, each neuron takes input from a larger area of the original input than neurons in previous layers do. This is due to applying the convolution over and over, which takes into account the value of a pixel as well as its surrounding pixels. When using dilated layers, the number of pixels in the receptive field remains constant, but the field is more sparsely populated as its dimensions grow when combining the effect of several layers.
To manipulate the receptive field size as desired, there are some alternatives to the standard convolutional layer. For example, atrous or dilated convolution[\[29\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-29)[\[30\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-30) expands the receptive field size without increasing the number of parameters by interleaving visible and blind regions. Moreover, a single dilated convolutional layer can comprise filters with multiple dilation ratios,[\[31\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-31) thus having a variable receptive field size.
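The growth of the receptive field through a stack of layers can be estimated with a short calculation along one spatial axis. This is a sketch using the usual recurrence for stacked convolutions; the function name and the layer configurations are hypothetical.

```python
def receptive_field(layers):
    """Receptive field (along one axis) of a neuron after a stack of layers.

    `layers` is a list of (kernel_size, stride, dilation) tuples, ordered
    from the input towards the output.
    """
    size = 1  # a single input pixel
    jump = 1  # spacing, in input pixels, between adjacent neurons of the current layer
    for kernel_size, stride, dilation in layers:
        # Dilation spreads the same number of taps over a wider span.
        span = dilation * (kernel_size - 1) + 1
        size += (span - 1) * jump
        jump *= stride
    return size

# Three ordinary 3-tap layers versus three layers with dilations 1, 2 and 4.
plain = receptive_field([(3, 1, 1), (3, 1, 1), (3, 1, 1)])    # 7
dilated = receptive_field([(3, 1, 1), (3, 1, 2), (3, 1, 4)])  # 15

print(plain, dilated)
```

Both stacks use the same number of weights per layer, yet the dilated stack covers 15 input positions instead of 7 in this example.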
### Weights
Each neuron in a neural network computes an output value by applying a specific function to the input values received from the receptive field in the previous layer. The function that is applied to the input values is determined by a vector of weights and a bias (typically real numbers). Learning consists of iteratively adjusting these biases and weights.
The vectors of weights and biases are called *filters* and represent particular [features](https://en.wikipedia.org/wiki/Feature_\(machine_learning\) "Feature (machine learning)") of the input (e.g., a particular shape). A distinguishing feature of CNNs is that many neurons can share the same filter. This reduces the [memory footprint](https://en.wikipedia.org/wiki/Memory_footprint "Memory footprint") because a single bias and a single vector of weights are used across all receptive fields that share that filter, as opposed to each receptive field having its own bias and weight vector.[\[32\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-LeCun-32)
### Deconvolutional
A deconvolutional neural network is essentially the reverse of a CNN. It consists of deconvolutional layers and unpooling layers.[\[33\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-33)
A deconvolutional layer is the transpose of a convolutional layer. Specifically, a convolutional layer can be written as a multiplication with a matrix, and a deconvolutional layer is multiplication with the transpose of that matrix.[\[34\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-34)
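The matrix view can be made concrete in one dimension. The sketch below assumes stride 1 and no padding; the helper `conv_matrix`, the kernel and the input values are illustrative.

```python
import numpy as np

def conv_matrix(kernel, input_len):
    """Matrix C such that C @ x is the stride-1, unpadded cross-correlation of x."""
    k = len(kernel)
    out_len = input_len - k + 1
    matrix = np.zeros((out_len, input_len))
    for i in range(out_len):
        matrix[i, i:i + k] = kernel  # the same weights appear in every row
    return matrix

kernel = np.array([1.0, 2.0, 3.0])
x = np.array([1.0, 0.0, 2.0, 0.0, 1.0])

C = conv_matrix(kernel, input_len=x.size)  # shape (3, 5)
y = C @ x      # convolutional layer: 5 values -> 3 values
z = C.T @ y    # transposed ("deconvolutional") layer: 3 values -> 5 values

print(y, z)
```

The transposed layer restores the input's length but not, in general, its values, so it is the transpose of the convolution rather than its inverse.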
An unpooling layer expands the layer. The max-unpooling layer is the simplest, as it simply copies each entry multiple times. For example, a 2-by-2 max-unpooling layer is $[x]\mapsto {\begin{bmatrix}x&x\\x&x\end{bmatrix}}$.
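The copying behaviour described here can be sketched with a Kronecker product. The function name `unpool_by_copy` and the example input are hypothetical.

```python
import numpy as np

def unpool_by_copy(x, size=2):
    """Expand each entry of x into a size x size block filled with that entry."""
    return np.kron(x, np.ones((size, size), dtype=x.dtype))

x = np.array([[1, 2],
              [3, 4]])
expanded = unpool_by_copy(x, size=2)
print(expanded)
# [[1 1 2 2]
#  [1 1 2 2]
#  [3 3 4 4]
#  [3 3 4 4]]
```

Each entry of the 2 × 2 input becomes a 2 × 2 block, so the output is 4 × 4.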
Deconvolutional layers are used in image generators. By default, they create periodic checkerboard artifacts, which can be fixed by upscaling and then convolving.[\[35\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-35)
## History
\[[edit](https://en.wikipedia.org/w/index.php?title=Convolutional_neural_network&action=edit§ion=8 "Edit section: History")\]
CNNs are often compared to the way the brain achieves vision processing in living [organisms](https://en.wikipedia.org/wiki/Organisms "Organisms").[\[36\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-36)
### Receptive fields in the visual cortex
Main article: [Surround suppression](https://en.wikipedia.org/wiki/Surround_suppression "Surround suppression")
Work by [Hubel](https://en.wikipedia.org/wiki/David_H._Hubel "David H. Hubel") and [Wiesel](https://en.wikipedia.org/wiki/Torsten_Wiesel "Torsten Wiesel") in the 1950s and 1960s showed that cat [visual cortices](https://en.wikipedia.org/wiki/Visual_cortex "Visual cortex") contain neurons that individually respond to small regions of the [visual field](https://en.wikipedia.org/wiki/Visual_field "Visual field"). Provided the eyes are not moving, the region of visual space within which visual stimuli affect the firing of a single neuron is known as its [receptive field](https://en.wikipedia.org/wiki/Receptive_field "Receptive field").[\[37\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-:4-37) Neighboring cells have similar and overlapping receptive fields. Receptive field size and location vary systematically across the cortex to form a complete map of visual space.\[*[citation needed](https://en.wikipedia.org/wiki/Wikipedia:Citation_needed "Wikipedia:Citation needed")*\] The cortex in each hemisphere represents the contralateral [visual field](https://en.wikipedia.org/wiki/Visual_field "Visual field").\[*[citation needed](https://en.wikipedia.org/wiki/Wikipedia:Citation_needed "Wikipedia:Citation needed")*\]
Their 1968 paper identified two basic visual cell types in the brain:[\[19\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-hubelwiesel1968-19)
- [simple cells](https://en.wikipedia.org/wiki/Simple_cell "Simple cell"), whose output is maximized by straight edges having particular orientations within their receptive field
- [complex cells](https://en.wikipedia.org/wiki/Complex_cell "Complex cell"), which have larger [receptive fields](https://en.wikipedia.org/wiki/Receptive_field "Receptive field"), whose output is insensitive to the exact position of the edges in the field.
Hubel and Wiesel also proposed a cascading model of these two types of cells for use in pattern recognition tasks.[\[38\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-38)[\[37\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-:4-37)
### Fukushima's analog threshold elements in a vision model
In 1969, [Kunihiko Fukushima](https://en.wikipedia.org/wiki/Kunihiko_Fukushima "Kunihiko Fukushima") introduced a multilayer visual feature detection network, inspired by the above-mentioned work of Hubel and Wiesel, in which "All the elements in one layer have the same set of interconnecting coefficients; the arrangement of the elements and their interconnections are all homogeneous over a given layer." This is the essential core of a convolutional network, but the weights were not trained. In the same paper, Fukushima also introduced the [ReLU](https://en.wikipedia.org/wiki/Rectifier_\(neural_networks\) "Rectifier (neural networks)") (rectified linear unit) [activation function](https://en.wikipedia.org/wiki/Activation_function "Activation function").[\[39\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-Fukushima1969-39)[\[40\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-DLhistory-40)
### Neocognitron, origin of the trainable CNN architecture
The "[neocognitron](https://en.wikipedia.org/wiki/Neocognitron "Neocognitron")"[\[18\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-fukuneoscholar-18) was introduced by Fukushima in 1980.[\[20\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-intro-20)[\[28\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-mcdns-28)[\[1\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-LeCun2015-1) The neocognitron introduced the two basic types of layers:
- "S-layer": a shared-weights receptive-field layer, later known as a convolutional layer, which contains units whose receptive fields cover a patch of the previous layer. A shared-weights receptive-field group (a "plane" in neocognitron terminology) is often called a filter, and a layer typically has several such filters.
- "C-layer": a downsampling layer that contains units whose receptive fields cover patches of previous convolutional layers. Such a unit typically computes a weighted average of the activations of the units in its patch, applies inhibition (divisive normalization) pooled from a somewhat larger patch and across different filters in a layer, and then applies a saturating activation function. The patch weights are nonnegative and are not trainable in the original neocognitron. The downsampling and competitive inhibition help to classify features and objects in visual scenes even when the objects are shifted.
Several [supervised](https://en.wikipedia.org/wiki/Supervised_learning "Supervised learning") and [unsupervised learning](https://en.wikipedia.org/wiki/Unsupervised_learning "Unsupervised learning") algorithms have been proposed over the decades to train the weights of a neocognitron.[\[18\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-fukuneoscholar-18) Today, however, the CNN architecture is usually trained through [backpropagation](https://en.wikipedia.org/wiki/Backpropagation "Backpropagation").
Fukushima's ReLU activation function was not used in his neocognitron since all the weights were nonnegative; lateral inhibition was used instead. The rectifier has become a very popular activation function for CNNs and [deep neural networks](https://en.wikipedia.org/wiki/Deep_learning "Deep learning") in general.[\[41\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-41)
### Convolution in time
The term "convolution" first appears in neural networks in a paper by Toshiteru Homma, Les Atlas, and Robert Marks II at the first [Conference on Neural Information Processing Systems](https://en.wikipedia.org/wiki/Conference_on_Neural_Information_Processing_Systems "Conference on Neural Information Processing Systems") in 1987. Their paper replaced multiplication with convolution in time, inherently providing shift invariance, motivated by and connecting more directly to the [signal-processing concept of a filter](https://en.wikipedia.org/wiki/Linear_shift-invariant_filter "Linear shift-invariant filter"), and demonstrated it on a speech recognition task.[\[8\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-homma-8) They also pointed out that as a data-trainable system, convolution is essentially equivalent to correlation since reversal of the weights does not affect the final learned function ("For convenience, we denote \* as correlation instead of convolution. Note that convolving a(t) with b(t) is equivalent to correlating a(-t) with b(t).").[\[8\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-homma-8) Modern CNN implementations typically do correlation and call it convolution, for convenience, as they did here.
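The equivalence between convolution and correlation noted above can be checked numerically. The following is a small sketch using NumPy's conventions for `np.convolve` and `np.correlate`; the sample arrays are arbitrary. Under these conventions, reversing the kernel turns one operation into the other, which is why a trainable system can learn the same function with either.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])   # signal
b = np.array([0.0, 1.0, 0.5])   # kernel (the learned weights in a CNN)

conv = np.convolve(a, b, mode="full")
# Correlating with the time-reversed kernel gives the same result.
corr_flipped = np.correlate(a, b[::-1], mode="full")

assert np.allclose(conv, corr_flipped)
print(conv)  # [0.  1.  2.5 4.  1.5]
```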
### Time delay neural networks
The [time delay neural network](https://en.wikipedia.org/wiki/Time_delay_neural_network "Time delay neural network") (TDNN) was introduced in 1987 by [Alex Waibel](https://en.wikipedia.org/wiki/Alex_Waibel "Alex Waibel") et al. for phoneme recognition and was an early convolutional network exhibiting shift-invariance.[\[42\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-Waibel1987-42) A TDNN is a 1-D convolutional neural net where the convolution is performed along the time axis of the data. It was the first CNN to combine weight sharing with training by gradient descent, using [backpropagation](https://en.wikipedia.org/wiki/Backpropagation "Backpropagation").[\[43\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-speechsignal-43) Thus, while also using a pyramidal structure as in the neocognitron, it performed a global optimization of the weights instead of a local one.[\[42\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-Waibel1987-42)
TDNNs are convolutional networks that share weights along the temporal dimension.[\[44\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-44) They allow speech signals to be processed time-invariantly. In 1990 Hampshire and Waibel introduced a variant that performs a two-dimensional convolution.[\[45\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-Hampshire1990-45) Since these TDNNs operated on spectrograms, the resulting phoneme recognition system was invariant to both time and frequency shifts, as with images processed by a neocognitron.
TDNNs improved the performance of far-distance speech recognition.[\[46\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-Ko2017-46)
### Image recognition with CNNs trained by gradient descent
Denker et al. (1989) designed a 2-D CNN system to recognize hand-written [ZIP Code](https://en.wikipedia.org/wiki/ZIP_Code "ZIP Code") numbers.[\[47\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-47) However, the lack of an efficient training method to determine the [kernel](https://en.wikipedia.org/wiki/Kernel_\(image_processing\) "Kernel (image processing)") coefficients of the involved convolutions meant that all the coefficients had to be laboriously hand-designed.[\[48\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-:2-48)
Following the advances in the training of 1-D CNNs by Waibel et al. (1987), [Yann LeCun](https://en.wikipedia.org/wiki/Yann_LeCun "Yann LeCun") et al. (1989)[\[48\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-:2-48) used back-propagation to learn the convolution kernel coefficients directly from images of hand-written numbers. Learning was thus fully automatic, performed better than manual coefficient design, and was suited to a broader range of image recognition problems and image types. Wei Zhang et al. (1988)[\[14\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-:0-14)[\[15\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-:1-15) used back-propagation to train the convolution kernels of a CNN for alphabet recognition. The model was called a shift-invariant pattern recognition neural network before the name CNN was coined in the early 1990s. Wei Zhang et al. also applied the same CNN without the last fully connected layer for medical image object segmentation (1991)[\[49\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-:wz1991-49) and breast cancer detection in mammograms (1994).[\[50\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-:wz1994-50)
This approach became a foundation of modern [computer vision](https://en.wikipedia.org/wiki/Computer_vision "Computer vision").
#### Max pooling
In 1990 Yamaguchi et al. introduced the concept of max pooling, a fixed filtering operation that calculates and propagates the maximum value of a given region. They did so by combining TDNNs with max pooling to realize a speaker-independent isolated word recognition system.[\[27\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-Yamaguchi111990-27) In their system they used several TDNNs per word, one for each [syllable](https://en.wikipedia.org/wiki/Syllable "Syllable"). The results of each TDNN over the input signal were combined using max pooling and the outputs of the pooling layers were then passed on to networks performing the actual word classification.
In a variant of the neocognitron called the *cresceptron*, instead of using Fukushima's spatial averaging with inhibition and saturation, J. Weng et al. in 1993 used max pooling, where a downsampling unit computes the maximum of the activations of the units in its patch,[\[51\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-weng1993-51) introducing this method into the vision field.
Max pooling is often used in modern CNNs.[\[52\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-schdeepscholar-52)
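The operation itself, in which each downsampling unit outputs the maximum activation in its patch, can be sketched as follows. This is an illustrative NumPy version assuming non-overlapping square patches on a single 2-D feature map; `max_pool2d` is a hypothetical helper, not the historical implementation.

```python
import numpy as np

def max_pool2d(x, size=2):
    """Max pooling over non-overlapping size-by-size patches of a 2-D array."""
    h, w = x.shape
    if h % size or w % size:
        raise ValueError("height and width must be multiples of the pool size")
    # Split rows and columns into (patch index, offset inside patch) pairs.
    blocks = x.reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

x = np.array([[1, 3, 2, 0],
              [4, 2, 1, 1],
              [0, 0, 5, 6],
              [1, 2, 7, 8]])
pooled = max_pool2d(x)
print(pooled)
# [[4 2]
#  [2 8]]
```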
#### LeNet-5
Main article: [LeNet](https://en.wikipedia.org/wiki/LeNet "LeNet")
LeNet-5, a pioneering 7-level convolutional network by [LeCun](https://en.wikipedia.org/wiki/Yann_LeCun "Yann LeCun") et al. in 1995,[\[53\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-lecun95-53) classifies hand-written numbers on [checks](https://en.wikipedia.org/wiki/Cheque "Cheque") digitized in 32×32 pixel images. Processing higher-resolution images requires larger convolutional networks with more layers, so this technique is constrained by the availability of computing resources.
It was superior to other commercial courtesy amount reading systems (as of 1995). The system was integrated in [NCR](https://en.wikipedia.org/wiki/NCR_Voyix "NCR Voyix")'s check reading systems, and has been fielded in several American banks since June 1996, reading millions of checks per day.[\[54\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-54)
### Shift-invariant neural network
A shift-invariant neural network was proposed by Wei Zhang et al. for image character recognition in 1988.[\[14\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-:0-14)[\[15\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-:1-15) It is a Neocognitron modified to keep only the convolutional interconnections between the image feature layers and the last fully connected layer. The model was trained with back-propagation. The training algorithm was refined in 1991[\[55\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-55) to improve its generalization ability. The model architecture was then modified by removing the last fully connected layer and applied to medical image segmentation (1991)[\[49\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-:wz1991-49) and automatic detection of breast cancer in [mammograms](https://en.wikipedia.org/wiki/Mammography "Mammography") (1994).[\[50\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-:wz1994-50)
A different convolution-based design was proposed in 1988[\[56\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-56) for application to decomposition of one-dimensional [electromyography](https://en.wikipedia.org/wiki/Electromyography "Electromyography") convolved signals via de-convolution. This design was modified in 1989 to other de-convolution-based designs.[\[57\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-57)[\[58\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-58)
### GPU implementations
Although CNNs were invented in the 1980s, their breakthrough in the 2000s required fast implementations on [graphics processing units](https://en.wikipedia.org/wiki/Graphics_processing_unit "Graphics processing unit") (GPUs).
In 2004, it was shown by K. S. Oh and K. Jung that standard neural networks can be greatly accelerated on GPUs. Their implementation was 20 times faster than an equivalent implementation on [CPU](https://en.wikipedia.org/wiki/CPU "CPU").[\[59\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-59) In 2005, another paper also emphasised the value of [GPGPU](https://en.wikipedia.org/wiki/GPGPU "GPGPU") for [machine learning](https://en.wikipedia.org/wiki/Machine_learning "Machine learning").[\[60\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-60)
The first GPU-implementation of a CNN was described in 2006 by K. Chellapilla et al. Their implementation was 4 times faster than an equivalent implementation on CPU.[\[61\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-61) In the same period, GPUs were also used for unsupervised training of [deep belief networks](https://en.wikipedia.org/wiki/Deep_belief_network "Deep belief network").[\[62\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-62)[\[63\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-63)[\[64\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-64)[\[65\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-LSD_1-65)
In 2010, Dan Ciresan et al. at [IDSIA](https://en.wikipedia.org/wiki/IDSIA "IDSIA") trained deep feedforward networks on GPUs.[\[66\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-66) In 2011, they extended this to CNNs, achieving a 60-fold speedup over CPU training.[\[25\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-flexible-25) In 2011, the network won an image recognition contest, achieving superhuman performance for the first time.[\[67\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-67) They then won more competitions and achieved state-of-the-art results on several benchmarks.[\[68\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-68)[\[52\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-schdeepscholar-52)[\[28\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-mcdns-28)
Subsequently, [AlexNet](https://en.wikipedia.org/wiki/AlexNet "AlexNet"), a similar GPU-based CNN by Alex Krizhevsky et al. won the [ImageNet Large Scale Visual Recognition Challenge](https://en.wikipedia.org/wiki/ImageNet_Large_Scale_Visual_Recognition_Challenge "ImageNet Large Scale Visual Recognition Challenge") 2012.[\[69\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-:02-69) It was an early catalytic event for the [AI boom](https://en.wikipedia.org/wiki/AI_boom "AI boom").
Compared to the training of CNNs on [GPUs](https://en.wikipedia.org/wiki/GPU "GPU"), training on CPUs has received little attention. Viebke et al. (2019) parallelize CNNs using the thread- and [SIMD](https://en.wikipedia.org/wiki/SIMD "SIMD")\-level parallelism available on the [Intel Xeon Phi](https://en.wikipedia.org/wiki/Xeon_Phi "Xeon Phi").[\[70\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-70)[\[71\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-71)
## Distinguishing features
In the past, traditional [multilayer perceptron](https://en.wikipedia.org/wiki/Multilayer_perceptron "Multilayer perceptron") (MLP) models were used for image recognition.\[*[example needed](https://en.wikipedia.org/wiki/Wikipedia:AUDIENCE "Wikipedia:AUDIENCE")*\] However, the full connectivity between nodes caused the [curse of dimensionality](https://en.wikipedia.org/wiki/Curse_of_dimensionality "Curse of dimensionality"), and was computationally intractable with higher-resolution images. A 1000×1000-pixel image with [RGB color](https://en.wikipedia.org/wiki/RGB_color_model "RGB color model") channels requires 3 million weights per fully-connected neuron, which is too many to process efficiently at scale.
[](https://en.wikipedia.org/wiki/File:Conv_layers.png)
CNN layers arranged in 3 dimensions
For example, in [CIFAR-10](https://en.wikipedia.org/wiki/CIFAR-10 "CIFAR-10"), images are only of size 32×32×3 (32 wide, 32 high, 3 color channels), so a single fully connected neuron in the first hidden layer of a regular neural network would have 32\*32\*3 = 3,072 weights. A 200×200 image, however, would lead to neurons that have 200\*200\*3 = 120,000 weights.
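The arithmetic behind these figures is simple enough to check directly. In the sketch below, `fc_weights_per_neuron` is a hypothetical helper, and the 5×5 filter at the end is an assumed size chosen only for contrast.

```python
def fc_weights_per_neuron(width, height, channels):
    """Weights needed by one fully connected neuron that sees every input value."""
    return width * height * channels

cifar = fc_weights_per_neuron(32, 32, 3)        # 3072
medium = fc_weights_per_neuron(200, 200, 3)     # 120000
large = fc_weights_per_neuron(1000, 1000, 3)    # 3000000

# For contrast: one convolutional filter with an assumed 5x5 receptive field
# over 3 channels has the same number of weights at any image resolution.
conv_filter = 5 * 5 * 3                          # 75

print(cifar, medium, large, conv_filter)  # 3072 120000 3000000 75
```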
Also, such network architecture does not take into account the spatial structure of data, treating input pixels which are far apart in the same way as pixels that are close together. This ignores [locality of reference](https://en.wikipedia.org/wiki/Locality_of_reference "Locality of reference") in data with a grid-topology (such as images), both computationally and semantically. Thus, full connectivity of neurons is wasteful for purposes such as image recognition that are dominated by [spatially local](https://en.wikipedia.org/wiki/Spatial_locality "Spatial locality") input patterns.
Convolutional neural networks are variants of multilayer perceptrons, designed to emulate the behavior of a [visual cortex](https://en.wikipedia.org/wiki/Visual_cortex "Visual cortex"). These models mitigate the challenges posed by the MLP architecture by exploiting the strong spatially local correlation present in natural images. As opposed to MLPs, CNNs have the following distinguishing features:
- 3D volumes of neurons. The layers of a CNN have neurons arranged in [3 dimensions](https://en.wikipedia.org/wiki/Three-dimensional_space "Three-dimensional space"): width, height and depth.[\[72\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-72) Each neuron inside a convolutional layer is connected to only a small region of the layer before it, called a receptive field. Distinct types of layers, both locally and completely connected, are stacked to form a CNN architecture.
- Local connectivity: following the concept of receptive fields, CNNs exploit spatial locality by enforcing a local connectivity pattern between neurons of adjacent layers. The architecture thus ensures that the learned "filters" produce the strongest response to a spatially local input pattern. Stacking many such layers leads to nonlinear filters that become increasingly global (i.e. responsive to a larger region of pixel space) so that the network first creates representations of small parts of the input, then from them assembles representations of larger areas.
- Shared weights: In CNNs, each filter is replicated across the entire visual field. These replicated units share the same parameterization (weight vector and bias) and form a feature map. This means that all the neurons in a given feature map respond to the same feature within their specific receptive field. Replicating units in this way makes the resulting activation map [equivariant](https://en.wikipedia.org/wiki/Equivariant_map "Equivariant map") under shifts of the locations of input features in the visual field, i.e. it grants translational [equivariance](https://en.wikipedia.org/wiki/Equivariant_map "Equivariant map"), provided that the layer has a stride of one.[\[73\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-:5-73)
- Pooling: In a CNN's [pooling layers](https://en.wikipedia.org/wiki/Pooling_layer "Pooling layer"), feature maps are divided into rectangular sub-regions, and the features in each rectangle are independently down-sampled to a single value, commonly by taking their average or maximum value. In addition to reducing the sizes of feature maps, the pooling operation grants a degree of local [translational invariance](https://en.wikipedia.org/wiki/Translational_symmetry "Translational symmetry") to the features contained therein, allowing the CNN to be more robust to variations in their positions.[\[16\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-:6-16)
Together, these properties allow CNNs to achieve better generalization on [vision problems](https://en.wikipedia.org/wiki/Computer_vision "Computer vision"). Weight sharing dramatically reduces the number of [free parameters](https://en.wikipedia.org/wiki/Free_parameter "Free parameter") learned, thus lowering the memory requirements for running the network and allowing the training of larger, more powerful networks.
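The difference between the equivariance granted by shared weights and the invariance granted by pooling can be seen in a small 1-D sketch. The signal, kernel, and shift below are arbitrary assumptions, and global max pooling stands in for a pooling layer; the feature is kept away from the borders so that the circular shift from `np.roll` behaves like a plain translation.

```python
import numpy as np

kernel = np.array([1.0, -1.0, 0.5])   # one shared filter, stride 1

x = np.zeros(10)
x[3:5] = [1.0, 2.0]                   # a small feature in the input
x_shifted = np.roll(x, 2)             # the same feature, two positions later

y = np.correlate(x, kernel, mode="valid")
y_shifted = np.correlate(x_shifted, kernel, mode="valid")

# Equivariance: shifting the input shifts the feature map by the same amount.
assert np.allclose(y_shifted, np.roll(y, 2))

# Invariance after pooling: the pooled response ignores where the feature was.
assert y.max() == y_shifted.max()
```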
## Building blocks
A CNN architecture is formed by a stack of distinct layers that transform the input volume into an output volume (e.g. holding the class scores) through a differentiable function. A few distinct types of layers are commonly used. These are further discussed below.
[](https://en.wikipedia.org/wiki/File:Conv_layer.png)
Neurons of a convolutional layer (blue), connected to their receptive field (red)
### Convolutional layer
[](https://en.wikipedia.org/wiki/File:Convolutional_neural_network,_convolution_worked_example.png)
A worked example of performing a convolution. The convolution has stride 1, zero-padding, with kernel size 3-by-3. The convolution kernel is a [discrete Laplacian operator](https://en.wikipedia.org/wiki/Discrete_Laplace_operator "Discrete Laplace operator").
The convolutional layer is the core building block of a CNN. The layer's parameters consist of a set of learnable filters (or [kernels](https://en.wikipedia.org/wiki/Kernel_\(image_processing\) "Kernel (image processing)")), which have a small receptive field, but extend through the full depth of the input volume. During the forward pass, each filter is [convolved](https://en.wikipedia.org/wiki/Convolution "Convolution") across the width and height of the input volume, computing the [dot product](https://en.wikipedia.org/wiki/Dot_product "Dot product") between the filter entries and the input, producing a 2-dimensional [activation map](https://en.wikipedia.org/wiki/Activation_function "Activation function") of that filter. As a result, the network learns filters that activate when it detects some specific type of [feature](https://en.wikipedia.org/wiki/Feature_\(machine_learning\) "Feature (machine learning)") at some spatial position in the input.[\[74\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-G%C3%A9ron_Hands-on_ML_2019-74)[\[nb 1\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-75)
Stacking the activation maps for all filters along the depth dimension forms the full output volume of the convolutional layer. Every entry in the output volume can thus also be interpreted as an output of a neuron that looks at a small region in the input. Each entry in an activation map uses the same set of parameters that define the filter.
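A deliberately naive forward pass makes this structure concrete. This is a sketch only, assuming stride 1, no padding, a channels-last layout, and the cross-correlation convention; `conv_layer_forward` and the array shapes are illustrative choices, and practical implementations are vectorized.

```python
import numpy as np

def conv_layer_forward(x, filters, biases):
    """x: (H, W, C) input volume; filters: (F, K, K, C); biases: (F,).

    Returns an output volume of shape (H-K+1, W-K+1, F).
    """
    H, W, C = x.shape
    F, K, _, _ = filters.shape
    out = np.zeros((H - K + 1, W - K + 1, F))
    for f in range(F):                      # one activation map per filter
        for i in range(H - K + 1):
            for j in range(W - K + 1):
                patch = x[i:i + K, j:j + K, :]   # local region, full depth
                # Dot product between the filter entries and the patch.
                out[i, j, f] = np.sum(patch * filters[f]) + biases[f]
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 3))
filters = rng.standard_normal((4, 3, 3, 3))
biases = np.zeros(4)

out = conv_layer_forward(x, filters, biases)
print(out.shape)  # (6, 6, 4)
```

Each of the four slices `out[:, :, f]` is one activation map, and every entry in it is computed with the same `filters[f]` and `biases[f]`.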
[Self-supervised learning](https://en.wikipedia.org/wiki/Self-supervised_learning "Self-supervised learning") has been adapted for use in convolutional layers by using sparse patches with a high-mask ratio and a global response normalization layer.\[*[citation needed](https://en.wikipedia.org/wiki/Wikipedia:Citation_needed "Wikipedia:Citation needed")*\]
#### Local connectivity
[](https://en.wikipedia.org/wiki/File:Typical_cnn.png)
Typical CNN architecture
When dealing with high-dimensional inputs such as images, it is impractical to connect neurons to all neurons in the previous volume because such a network architecture does not take the spatial structure of the data into account. Convolutional networks exploit spatially local correlation by enforcing a [sparse local connectivity](https://en.wikipedia.org/wiki/Sparse_network "Sparse network") pattern between neurons of adjacent layers: each neuron is connected to only a small region of the input volume.
The extent of this connectivity is a [hyperparameter](https://en.wikipedia.org/wiki/Hyperparameter_optimization "Hyperparameter optimization") called the [receptive field](https://en.wikipedia.org/wiki/Receptive_field "Receptive field") of the neuron. The connections are [local in space](https://en.wikipedia.org/wiki/Spatial_locality "Spatial locality") (along width and height), but always extend along the entire depth of the input volume. Such an architecture ensures that the learned filters produce the strongest response to a spatially local input pattern.[\[75\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-76)
#### Spatial arrangement
Three [hyperparameters](https://en.wikipedia.org/wiki/Hyperparameter_\(machine_learning\) "Hyperparameter (machine learning)") control the size of the output volume of the convolutional layer: the depth, [stride](https://en.wikipedia.org/wiki/Stride_of_an_array "Stride of an array"), and padding size:
- The *depth* of the output volume controls the number of neurons in a layer that connect to the same region of the input volume. These neurons learn to activate for different features in the input. For example, if the first convolutional layer takes the raw image as input, then different neurons along the depth dimension may activate in the presence of various oriented edges, or blobs of color.
- *Stride* controls how depth columns around the width and height are allocated. If the stride is 1, then we move the filters one pixel at a time. This leads to heavily [overlapping](https://en.wikipedia.org/wiki/Intersection_\(set_theory\) "Intersection (set theory)") receptive fields between the columns, and to large output volumes. For any integer $S > 0$, a stride *S* means that the filter is translated *S* units at a time per output. In practice, $S \geq 3$ is rare. A greater stride means smaller overlap of receptive fields and smaller spatial dimensions of the output volume.[\[76\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-77)
- Sometimes, it is convenient to pad the input with zeros (or other values, such as the average of the region) on the border of the input volume. The size of this padding is a third hyperparameter. Padding provides control of the output volume's spatial size. In particular, it is sometimes desirable to exactly preserve the spatial size of the input volume; this is commonly referred to as "same" padding.
[](https://en.wikipedia.org/wiki/File:Convolutional_neural_network,_boundary_conditions.png)
Three example padding conditions. Replication condition means that the pixel outside is padded with the closest pixel inside. The reflection padding is where the pixel outside is padded with the pixel inside, reflected across the boundary of the image. The circular padding is where the pixel outside wraps around to the other side of the image.
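These boundary conditions, together with zero padding, can be tried on a single row of pixels with NumPy's `np.pad`. The mapping of the caption's names onto NumPy modes is an assumption of this sketch: `edge` for replication, `reflect` for reflection (which does not repeat the border pixel; `symmetric` does), and `wrap` for circular padding.

```python
import numpy as np

row = np.array([1, 2, 3, 4])

zero = np.pad(row, 2, mode="constant")     # [0 0 1 2 3 4 0 0]
replicate = np.pad(row, 2, mode="edge")    # [1 1 1 2 3 4 4 4]
reflect = np.pad(row, 2, mode="reflect")   # [3 2 1 2 3 4 3 2]
circular = np.pad(row, 2, mode="wrap")     # [3 4 1 2 3 4 1 2]

for name, padded in [("zero", zero), ("replicate", replicate),
                     ("reflect", reflect), ("circular", circular)]:
    print(f"{name:>9}: {padded}")
```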
The spatial size of the output volume is a function of the input volume size $W$, the kernel field size $K$ of the convolutional layer neurons, the stride $S$, and the amount of zero padding $P$ on the border. The number of neurons that "fit" in a given volume is then:

$$\frac{W-K+2P}{S}+1.$$

If this number is not an [integer](https://en.wikipedia.org/wiki/Integer "Integer"), then the strides are incorrect and the neurons cannot be tiled to fit across the input volume in a [symmetric](https://en.wikipedia.org/wiki/Symmetry "Symmetry") way. In general, setting zero padding to be $P=(K-1)/2$ when the stride is $S=1$ ensures that the input volume and output volume will have the same size spatially. However, it is not always completely necessary to use all of the neurons of the previous layer. For example, a neural network designer may decide to use just a portion of padding.
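The output-size rule can be checked with a few lines of Python. This is a small sketch; the function name `conv_output_size` is hypothetical, and the choice to raise an error when the filter does not tile the input is an assumption of this example (some frameworks silently round down instead).

```python
def conv_output_size(w, k, s=1, p=0):
    """Number of neurons that fit along one spatial dimension: (W - K + 2P) / S + 1."""
    span = w - k + 2 * p
    if span < 0:
        raise ValueError("kernel is larger than the padded input")
    if span % s != 0:
        raise ValueError("stride does not tile the padded input evenly")
    return span // s + 1


print(conv_output_size(7, 3, s=1, p=0))   # 5
print(conv_output_size(7, 3, s=2, p=0))   # 3
# "Same" padding for stride 1: P = (K - 1) / 2 keeps the spatial size.
print(conv_output_size(32, 5, s=1, p=2))  # 32
```

For instance, `conv_output_size(10, 3, s=2, p=0)` raises an error, because (10 - 3) / 2 is not an integer.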
#### Parameter sharing
A parameter sharing scheme is used in convolutional layers to control the number of free parameters. It relies on the assumption that if a patch feature is useful to compute at some spatial position, then it should also be useful to compute at other positions. Denoting a single 2-dimensional slice of depth as a *depth slice*, the neurons in each depth slice are constrained to use the same weights and bias.
Since all neurons in a single depth slice share the same parameters, the forward pass in each depth slice of the convolutional layer can be computed as a [convolution](https://en.wikipedia.org/wiki/Convolution "Convolution") of the neuron's weights with the input volume.[\[nb 2\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-78) Therefore, it is common to refer to the sets of weights as a filter (or a [kernel](https://en.wikipedia.org/wiki/Kernel_\(image_processing\) "Kernel (image processing)")), which is convolved with the input. The result of this convolution is an [activation map](https://en.wikipedia.org/wiki/Activation_function "Activation function"), and the set of activation maps for each different filter are stacked together along the depth dimension to produce the output volume. Parameter sharing contributes to the [translation invariance](https://en.wikipedia.org/wiki/Translational_symmetry "Translational symmetry") of the CNN architecture.[\[16\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-:6-16)
Sometimes, the parameter sharing assumption may not make sense. This is especially the case when the input images to a CNN have some specific centered structure, for which we expect completely different features to be learned at different spatial locations. One practical example is when the inputs are faces that have been centered in the image: we might expect different eye-specific or hair-specific features to be learned in different parts of the image. In that case it is common to relax the parameter sharing scheme and instead call the layer a "locally connected layer". In this layer, the convolutional kernels' parameters are not shared. Instead, the network learns independent weights and biases for each spatial location. This allows each location to have its own feature-learning ability, making it better suited to handle images with distinct central structures or irregular features.
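The effect of parameter sharing on the number of free parameters can be made concrete with a rough count. The layer sizes below (a 32×32×3 input, sixteen 5×5 filters, stride 1, no padding, hence a 28×28 output) are illustrative assumptions, and the function names are hypothetical.

```python
def conv_params(k, c_in, filters):
    """Shared weights: one k*k*c_in kernel plus one bias per filter."""
    return filters * (k * k * c_in + 1)


def locally_connected_params(k, c_in, filters, out_h, out_w):
    """No sharing: every output location has its own kernel and bias."""
    return out_h * out_w * filters * (k * k * c_in + 1)


shared = conv_params(k=5, c_in=3, filters=16)
unshared = locally_connected_params(k=5, c_in=3, filters=16, out_h=28, out_w=28)
print(shared)    # 1216
print(unshared)  # 953344
```

Under these assumptions the locally connected layer has 784 times as many parameters, one full set per output position.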
### Pooling layer
Main article: [Pooling layer](https://en.wikipedia.org/wiki/Pooling_layer "Pooling layer")
Worked example of 2x2 maxpooling with stride 2
Max pooling with a 2x2 filter and stride = 2
Another important concept of CNNs is pooling, which is used as a form of non-linear [down-sampling](https://en.wikipedia.org/wiki/Downsampling_\(signal_processing\) "Downsampling (signal processing)"). Pooling provides downsampling because it reduces the spatial dimensions (height and width) of the input feature maps while retaining the most important information. There are several non-linear functions to implement pooling, where *max pooling* and *average pooling* are the most common. Pooling aggregates information from small regions of the input creating [partitions](https://en.wikipedia.org/wiki/Partition_of_a_set "Partition of a set") of the input feature map, typically using a fixed-size window (like 2x2) and applying a stride (often 2) to move the window across the input.[\[77\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-79) Note that without using a stride greater than 1, pooling would not perform downsampling, as it would simply move the pooling window across the input one step at a time, without reducing the size of the feature map. In other words, the stride is what actually causes the downsampling by determining how much the pooling window moves over the input.
Intuitively, the exact location of a feature is less important than its rough location relative to other features. This is the idea behind the use of pooling in convolutional neural networks. The pooling layer serves to progressively reduce the spatial size of the representation, to reduce the number of parameters, [memory footprint](https://en.wikipedia.org/wiki/Memory_footprint "Memory footprint") and amount of computation in the network, and hence to also control [overfitting](https://en.wikipedia.org/wiki/Overfitting "Overfitting"). This is known as down-sampling. It is common to periodically insert a pooling layer between successive convolutional layers (each one typically followed by an activation function, such as a [ReLU layer](https://en.wikipedia.org/wiki/Convolutional_neural_network#ReLU_layer)) in a CNN architecture.[\[74\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-G%C3%A9ron_Hands-on_ML_2019-74): 460–461 While pooling layers contribute to local translation invariance, they do not provide global translation invariance in a CNN, unless a form of global pooling is used.[\[16\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-:6-16)[\[73\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-:5-73) The pooling layer commonly operates independently on every depth, or slice, of the input and resizes it spatially. A very common form of max pooling is a layer with filters of size 2×2, applied with a stride of 2, which subsamples every depth slice in the input by 2 along both width and height, discarding 75% of the activations:

$$f_{X,Y}(S)=\max_{a,b=0}^{1}S_{2X+a,2Y+b}.$$

In this case, every [max operation](https://en.wikipedia.org/wiki/Maximum "Maximum") is over 4 numbers. The depth dimension remains unchanged (this is true for other forms of pooling as well).
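The 2×2, stride-2 max pooling rule above can be written directly from the formula. This is a minimal sketch on one depth slice stored as a nested list; the function name `max_pool_2x2` and the example values are illustrative, and odd trailing rows or columns are simply dropped here.

```python
def max_pool_2x2(s):
    """f[X][Y] = max over a, b in {0, 1} of s[2X + a][2Y + b]."""
    height, width = len(s), len(s[0])
    return [
        [
            max(s[2 * x + a][2 * y + b] for a in (0, 1) for b in (0, 1))
            for y in range(width // 2)
        ]
        for x in range(height // 2)
    ]


depth_slice = [
    [1, 3, 2, 4],
    [5, 6, 7, 8],
    [3, 2, 1, 0],
    [1, 2, 3, 4],
]
print(max_pool_2x2(depth_slice))  # [[6, 8], [3, 4]]
```

The 16 input activations are reduced to 4, which is the 75% reduction mentioned above.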
In addition to max pooling, pooling units can use other functions, such as [average](https://en.wikipedia.org/wiki/Average "Average") pooling or [ℓ2\-norm](https://en.wikipedia.org/wiki/Euclidean_norm "Euclidean norm") pooling. Average pooling was often used historically but has recently fallen out of favor compared to max pooling, which generally performs better in practice.[\[78\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-Scherer-ICANN-2010-80)
Because pooling rapidly reduces the spatial size of the representation, there is a recent trend towards using smaller filters[\[79\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-81) or discarding pooling layers altogether.[\[80\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-82)
RoI pooling to size 2x2. In this example region proposal (an input parameter) has size 7x5.
#### Channel max pooling
A channel max pooling (CMP) layer applies the max pooling (MP) operation along the channel dimension, among the corresponding positions of consecutive feature maps, in order to eliminate redundant information. CMP gathers the significant features within fewer channels, which is important for fine-grained image classification, a task that needs more discriminating features. Another advantage of CMP is that it reduces the channel number of the feature maps before they connect to the first fully connected (FC) layer. As with the MP operation, the input and output feature maps of a CMP layer are denoted $\mathbf{F} \in \mathbb{R}^{C \times M \times N}$ and $\mathbf{C} \in \mathbb{R}^{c \times M \times N}$, respectively, where $C$ and $c$ are the channel numbers of the input and output feature maps, and $M$ and $N$ are the width and the height of the feature maps, respectively. Note that the CMP operation only changes the channel number of the feature maps. The width and the height of the feature maps are not changed, which is different from the MP operation.[\[81\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-Ma_Chang_Xie_Ding_2019_pp._3224%E2%80%933233-83)
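A rough sketch of the idea follows. It assumes, purely for illustration, that the input channels are split into non-overlapping groups of consecutive channels and that each group is reduced by an element-wise maximum; the cited paper may use a different window or stride along the channel axis. The function name `channel_max_pool` is hypothetical.

```python
def channel_max_pool(feature_maps, group):
    """Reduce C channels to C // group channels by taking the element-wise
    maximum over each block of `group` consecutive channels.
    `feature_maps` is a nested list indexed as [channel][row][column]."""
    channels = len(feature_maps)
    if channels % group != 0:
        raise ValueError("channel count must be divisible by the group size")
    rows, cols = len(feature_maps[0]), len(feature_maps[0][0])
    pooled = []
    for start in range(0, channels, group):
        block = feature_maps[start:start + group]
        pooled.append(
            [[max(ch[m][n] for ch in block) for n in range(cols)] for m in range(rows)]
        )
    return pooled


# Four 1x2 feature maps reduced to two; the spatial size is unchanged.
f = [[[1, 5]], [[2, 0]], [[7, 1]], [[3, 9]]]
print(channel_max_pool(f, group=2))  # [[[2, 5]], [[7, 9]]]
```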
See [\[82\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-84)[\[83\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-85) for reviews of pooling methods.
### ReLU layer
ReLU is the abbreviation of [rectified linear unit](https://en.wikipedia.org/wiki/Rectifier_\(neural_networks\) "Rectifier (neural networks)"). It was proposed by [Alston Householder](https://en.wikipedia.org/wiki/Alston_Scott_Householder "Alston Scott Householder") in 1941,[\[84\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-86) and used in CNN by [Kunihiko Fukushima](https://en.wikipedia.org/wiki/Kunihiko_Fukushima "Kunihiko Fukushima") in 1969.[\[39\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-Fukushima1969-39) ReLU applies the non-saturating [activation function](https://en.wikipedia.org/wiki/Activation_function "Activation function") $f(x)=\max(0,x)$.[\[69\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-:02-69) It effectively removes negative values from an activation map by setting them to zero.[\[85\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-Romanuke4-87) It introduces [nonlinearity](https://en.wikipedia.org/wiki/Nonlinearity_\(disambiguation\) "Nonlinearity (disambiguation)") to the [decision function](https://en.wikipedia.org/wiki/Decision_boundary "Decision boundary") and in the overall network without affecting the receptive fields of the convolution layers. In 2011, Xavier Glorot, Antoine Bordes and [Yoshua Bengio](https://en.wikipedia.org/wiki/Yoshua_Bengio "Yoshua Bengio") found that ReLU enables better training of deeper networks,[\[86\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-glorot2011-88) compared to widely used activation functions prior to 2011.
Other functions can also be used to increase nonlinearity, for example the saturating [hyperbolic tangent](https://en.wikipedia.org/wiki/Hyperbolic_tangent "Hyperbolic tangent") $f(x)=\tanh(x)$, $f(x)=|\tanh(x)|$, and the [sigmoid function](https://en.wikipedia.org/wiki/Sigmoid_function "Sigmoid function") $\sigma(x)=(1+e^{-x})^{-1}$. ReLU is often preferred to other functions because it trains the neural network several times faster without a significant penalty to [generalization](https://en.wikipedia.org/wiki/Generalization_\(learning\) "Generalization (learning)") accuracy.[\[87\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-89)
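The difference between the non-saturating ReLU and the saturating alternatives can be seen by comparing their derivatives for a large input. The sketch below uses only the Python standard library; the helper names are illustrative.

```python
import math


def relu(x):
    return max(0.0, x)


def relu_grad(x):
    return 1.0 if x > 0 else 0.0


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)


def tanh_grad(x):
    return 1.0 - math.tanh(x) ** 2


# For a large positive input the saturating functions have a vanishing
# derivative, while the derivative of ReLU stays at 1.
x = 10.0
print(relu_grad(x), tanh_grad(x), sigmoid_grad(x))
```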
### Fully connected layer
After several convolutional and max pooling layers, the final classification is done via fully connected layers. Neurons in a fully connected layer have connections to all activations in the previous layer, as seen in regular (non-convolutional) [artificial neural networks](https://en.wikipedia.org/wiki/Artificial_neural_network "Artificial neural network"). Their activations can thus be computed as an [affine transformation](https://en.wikipedia.org/wiki/Affine_transformation "Affine transformation"), with [matrix multiplication](https://en.wikipedia.org/wiki/Matrix_multiplication "Matrix multiplication") followed by a bias offset ([vector addition](https://en.wikipedia.org/wiki/Vector_addition "Vector addition") of a learned or fixed bias term).
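As a minimal sketch of this affine transformation, each output neuron takes a weighted sum of all input activations and adds its bias. The function name and the example numbers are illustrative assumptions.

```python
def fully_connected(x, weights, bias):
    """Compute W x + b, where `weights` has one row per output neuron."""
    return [
        sum(w * xi for w, xi in zip(row, x)) + b
        for row, b in zip(weights, bias)
    ]


x = [1.0, 2.0, 3.0]            # flattened activations of the previous layer
weights = [[1.0, 0.0, 0.0],    # two output neurons, three inputs each
           [0.0, 1.0, 1.0]]
bias = [0.5, -1.0]
print(fully_connected(x, weights, bias))  # [1.5, 4.0]
```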
### Loss layer
Main articles: [Loss function](https://en.wikipedia.org/wiki/Loss_function "Loss function") and [Loss functions for classification](https://en.wikipedia.org/wiki/Loss_functions_for_classification "Loss functions for classification")
The "loss layer", or "[loss function](https://en.wikipedia.org/wiki/Loss_function "Loss function")", specifies how [training](https://en.wikipedia.org/wiki/Training "Training") penalizes the deviation between the predicted output of the network and the [true](https://en.wikipedia.org/wiki/Ground_truth "Ground truth") data labels (during supervised learning). Various [loss functions](https://en.wikipedia.org/wiki/Loss_function "Loss function") can be used, depending on the specific task.
The [Softmax](https://en.wikipedia.org/wiki/Softmax_function "Softmax function") loss function is used for predicting a single class of *K* mutually exclusive classes.[\[nb 3\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-90) [Sigmoid](https://en.wikipedia.org/wiki/Sigmoid_function "Sigmoid function") [cross-entropy](https://en.wikipedia.org/wiki/Cross_entropy "Cross entropy") loss is used for predicting *K* independent probability values in $[0,1]$. [Euclidean](https://en.wikipedia.org/wiki/Euclidean_distance "Euclidean distance") loss is used for [regressing](https://en.wikipedia.org/wiki/Regression_\(machine_learning\) "Regression (machine learning)") to [real-valued](https://en.wikipedia.org/wiki/Real_number "Real number") labels in $(-\infty,\infty)$.
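The three losses can be sketched for a single example as follows. The function names are hypothetical, the network outputs are assumed to be raw scores (logits) for the two cross-entropy losses, and the factor of one half in the Euclidean loss is a common convention assumed here rather than something stated above.

```python
import math


def softmax_cross_entropy(logits, target_index):
    """Negative log-probability of the one correct class among K exclusive classes."""
    m = max(logits)  # subtract the maximum for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target_index]


def sigmoid_cross_entropy(logits, targets):
    """Sum of K independent binary cross-entropies, with targets in [0, 1]."""
    total = 0.0
    for z, t in zip(logits, targets):
        p = 1.0 / (1.0 + math.exp(-z))
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total


def euclidean_loss(predictions, labels):
    """Half the squared Euclidean distance to real-valued labels."""
    return 0.5 * sum((y - t) ** 2 for y, t in zip(predictions, labels))


print(softmax_cross_entropy([0.0, 0.0], 0))           # log(2)
print(sigmoid_cross_entropy([0.0, 0.0], [1.0, 0.0]))  # 2 * log(2)
print(euclidean_loss([1.0, 2.0], [1.0, 4.0]))         # 2.0
```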
## Hyperparameters
Hyperparameters are various settings that are used to control the learning process. CNNs use more [hyperparameters](https://en.wikipedia.org/wiki/Hyperparameter_\(machine_learning\) "Hyperparameter (machine learning)") than a standard multilayer perceptron (MLP).
### Padding
Padding is the addition of (typically) 0-valued pixels on the borders of an image. Without padding, border pixels are underrepresented in the output, because they would ordinarily participate in only a single receptive field instance. The total padding applied along a dimension is typically one less than the corresponding kernel dimension. For example, a convolutional layer using 3x3 kernels would receive a 2-pixel pad, that is, 1 pixel on each side of the image.\[*[citation needed](https://en.wikipedia.org/wiki/Wikipedia:Citation_needed "Wikipedia:Citation needed")*\]
### Stride
The stride is the number of pixels that the analysis window moves on each iteration. A stride of 2 means that each kernel is offset by 2 pixels from its predecessor.
### Number of filters
Since feature map size decreases with depth, layers near the input layer tend to have fewer filters while higher layers can have more. To equalize computation at each layer, the product of feature values $v_a$ with pixel position is kept roughly constant across layers. Preserving more information about the input would require keeping the total number of activations (number of feature maps times number of pixel positions) non-decreasing from one layer to the next.
The number of feature maps directly controls the capacity and depends on the number of available examples and task complexity.
### Filter (or kernel) size
Common filter sizes found in the literature vary greatly, and are usually chosen based on the data set. Typical filter sizes range from 1x1 to 7x7. As two famous examples, [AlexNet](https://en.wikipedia.org/wiki/AlexNet "AlexNet") used 3x3, 5x5, and 11x11. [Inceptionv3](https://en.wikipedia.org/wiki/Inceptionv3 "Inceptionv3") used 1x1, 3x3, and 5x5.
The challenge is to find the right level of granularity so as to create abstractions at the proper scale, given a particular data set, and without [overfitting](https://en.wikipedia.org/wiki/Overfitting "Overfitting").
### Pooling type and size
[Max pooling](https://en.wikipedia.org/wiki/Max_pooling "Max pooling") is typically used, often with a 2x2 dimension. This implies that the input is drastically [downsampled](https://en.wikipedia.org/wiki/Downsampling_\(signal_processing\) "Downsampling (signal processing)"), reducing processing cost.
Greater pooling [reduces the dimension](https://en.wikipedia.org/wiki/Dimensionality_reduction "Dimensionality reduction") of the signal, and may result in unacceptable [information loss](https://en.wikipedia.org/wiki/Data_loss "Data loss"). Often, non-overlapping pooling windows perform best.[\[78\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-Scherer-ICANN-2010-80)
### Dilation
Dilation involves ignoring pixels within a kernel. This potentially reduces the memory needed for processing without significant signal loss. A dilation of 2 on a 3x3 kernel expands the kernel to 5x5, while still processing 9 (evenly spaced) pixels. Specifically, the processed pixels after the dilation are the cells (1,1), (1,3), (1,5), (3,1), (3,3), (3,5), (5,1), (5,3), (5,5), where (i,j) denotes the cell of the i-th row and j-th column in the expanded 5x5 kernel. More generally, under this convention a dilation of $d$ expands a $K \times K$ kernel to an effective size of $d(K-1)+1$; accordingly, a dilation of 3 expands the 3x3 kernel to 7x7, and a dilation of 4 expands it to 9x9.\[*[citation needed](https://en.wikipedia.org/wiki/Wikipedia:Citation_needed "Wikipedia:Citation needed")*\]
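The cells sampled by a dilated kernel can be enumerated with a short sketch. It assumes the convention of the 5x5 example above, in which a dilation of d places neighbouring taps d cells apart, so that the effective size is d(K - 1) + 1; the function name `dilated_taps` is hypothetical.

```python
def dilated_taps(k, d):
    """Return the effective kernel size and the 1-based (row, column) cells
    that a k x k kernel with dilation d actually reads."""
    size = d * (k - 1) + 1
    taps = [(1 + i * d, 1 + j * d) for i in range(k) for j in range(k)]
    return size, taps


size, taps = dilated_taps(k=3, d=2)
print(size)  # 5
print(taps)  # [(1, 1), (1, 3), (1, 5), (3, 1), (3, 3), (3, 5), (5, 1), (5, 3), (5, 5)]
```

The number of taps stays at 9 regardless of the dilation; only their spacing changes.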
## Translation equivariance and aliasing
It is commonly assumed that CNNs are invariant to shifts of the input. Convolution or pooling layers within a CNN that do not have a stride greater than one are indeed [equivariant](https://en.wikipedia.org/wiki/Equivariant_map "Equivariant map") to translations of the input.[\[73\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-:5-73) However, layers with a stride greater than one ignore the [Nyquist–Shannon sampling theorem](https://en.wikipedia.org/wiki/Nyquist%E2%80%93Shannon_sampling_theorem "Nyquist–Shannon sampling theorem") and might lead to [aliasing](https://en.wikipedia.org/wiki/Aliasing "Aliasing") of the input signal.[\[73\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-:5-73) While, in principle, CNNs are capable of implementing anti-aliasing filters, it has been observed that this does not happen in practice,[\[88\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-91) so such layers yield models that are not equivariant to translations.
Furthermore, if a CNN makes use of fully connected layers, translation equivariance does not imply translation invariance, as the fully connected layers are not invariant to shifts of the input.[\[89\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-92)[\[16\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-:6-16) One solution for complete translation invariance is avoiding any down-sampling throughout the network and applying global average pooling at the last layer.[\[73\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-:5-73) Additionally, several other partial solutions have been proposed, such as [anti-aliasing](https://en.wikipedia.org/wiki/Anti-aliasing_filter "Anti-aliasing filter") before downsampling operations,[\[90\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-93) spatial transformer networks,[\[91\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-94) [data augmentation](https://en.wikipedia.org/wiki/Data_augmentation "Data augmentation"), subsampling combined with pooling,[\[16\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-:6-16) and [capsule neural networks](https://en.wikipedia.org/wiki/Capsule_neural_network "Capsule neural network").[\[92\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-95)
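The distinction can be illustrated on a toy one-dimensional signal. The sketch below assumes circular boundary handling so that equivariance at stride 1 is exact, and uses a global sum in place of global average pooling (the two differ only by a constant factor for a fixed output length). The helper names and the example values are illustrative.

```python
def circ_conv(x, kernel, stride=1):
    """1-D cross-correlation with circular (wrap-around) boundary handling."""
    n = len(x)
    return [
        sum(kernel[j] * x[(i + j) % n] for j in range(len(kernel)))
        for i in range(0, n, stride)
    ]


def shift(x, t):
    """Circularly shift a list to the right by t positions (0 < t < len(x))."""
    return x[-t:] + x[:-t]


x = [0, 1, 4, 2, 0, 0, 0, 0]
kernel = [1, 2]

# Stride 1: shifting the input shifts the output (equivariance) ...
assert circ_conv(shift(x, 1), kernel) == shift(circ_conv(x, kernel), 1)
# ... so a global pooling of the output does not change (invariance).
assert sum(circ_conv(x, kernel)) == sum(circ_conv(shift(x, 1), kernel))

# Stride 2: a one-pixel shift changes which samples are kept, and even the
# globally pooled output differs.
print(sum(circ_conv(x, kernel, stride=2)))            # 10
print(sum(circ_conv(shift(x, 1), kernel, stride=2)))  # 11
```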
## Evaluation
The accuracy of the final model is typically estimated on a sub-part of the dataset set apart at the start, often called a test set. Alternatively, methods such as [*k*\-fold cross-validation](https://en.wikipedia.org/wiki/Cross-validation_\(statistics\) "Cross-validation (statistics)") are applied. Other strategies include using [conformal prediction](https://en.wikipedia.org/wiki/Conformal_prediction "Conformal prediction").[\[93\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-96)[\[94\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-97)
## Regularization methods
Main article: [Regularization (mathematics)](https://en.wikipedia.org/wiki/Regularization_\(mathematics\) "Regularization (mathematics)")
[Regularization](https://en.wikipedia.org/wiki/Regularization_\(mathematics\) "Regularization (mathematics)") is a process of introducing additional information to solve an [ill-posed problem](https://en.wikipedia.org/wiki/Ill-posed_problem "Ill-posed problem") or to prevent [overfitting](https://en.wikipedia.org/wiki/Overfitting "Overfitting"). CNNs use various types of regularization.
### Empirical
#### Dropout
Because networks have so many parameters, they are prone to overfitting. One method to reduce overfitting is [dropout](https://en.wikipedia.org/wiki/Dropout_\(neural_networks\) "Dropout (neural networks)"), introduced in 2014.[\[95\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-98) At each training stage, individual nodes are either "dropped out" of the net (ignored) with probability $1-p$ or kept with probability $p$, so that a reduced network is left; incoming and outgoing edges to a dropped-out node are also removed. Only the reduced network is trained on the data in that stage. The removed nodes are then reinserted into the network with their original weights.
In the training stages, $p$ is usually 0.5; for input nodes, it is typically much higher because information is directly lost when input nodes are ignored.
At testing time after training has finished, we would ideally like to find a sample average of all possible $2^{n}$ dropped-out networks; unfortunately this is unfeasible for large values of $n$. However, we can find an approximation by using the full network with each node's output weighted by a factor of $p$, so the [expected value](https://en.wikipedia.org/wiki/Expected_value "Expected value") of the output of any node is the same as in the training stages. This is the biggest contribution of the dropout method: although it effectively generates $2^{n}$ neural nets, and as such allows for model combination, at test time only a single network needs to be tested.
By avoiding training all nodes on all training data, dropout decreases overfitting. The method also significantly improves training speed. This makes the model combination practical, even for [deep neural networks](https://en.wikipedia.org/wiki/Deep_neural_network "Deep neural network"). The technique seems to reduce node interactions, leading them to learn more robust features\[*[clarification needed](https://en.wikipedia.org/wiki/Wikipedia:Please_clarify "Wikipedia:Please clarify")*\] that better generalize to new data.
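The training-time and test-time behaviour of dropout described above can be sketched as follows. Here `p` is the probability of keeping a node, as in the text; the function names and the fixed random seed are illustrative assumptions. Many libraries instead rescale the kept activations by 1/p during training ("inverted dropout"), which is a different but equivalent convention not shown here.

```python
import random


def dropout_train(activations, p, rng):
    """Training stage: keep each node with probability p, otherwise output 0."""
    return [a if rng.random() < p else 0.0 for a in activations]


def dropout_test(activations, p):
    """Test time: use every node, with its output weighted by p."""
    return [p * a for a in activations]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
sample = dropout_train([1.0] * 1000, p=0.5, rng=rng)

# The average training-time output is close to the test-time output p * 1.0.
print(sum(sample) / len(sample))
print(dropout_test([2.0, 4.0], p=0.5))  # [1.0, 2.0]
```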
#### DropConnect
DropConnect is the generalization of dropout in which each connection, rather than each output unit, can be dropped with probability $1-p$. Each unit thus receives input from a random subset of units in the previous layer.[\[96\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-99)
DropConnect is similar to dropout as it introduces dynamic sparsity within the model, but differs in that the sparsity is on the weights, rather than the output vectors of a layer. In other words, the fully connected layer with DropConnect becomes a sparsely connected layer in which the connections are chosen at random during the training stage.
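A training-time forward pass of a fully connected layer with DropConnect might look like the sketch below. The mask is drawn per weight rather than per output unit; the function name is hypothetical, biases are omitted for brevity, and the test-time procedure of the original method is not shown.

```python
import random


def dropconnect_forward(x, weights, p, rng):
    """Keep each individual connection with probability p (drop it with
    probability 1 - p) and compute the resulting sparse weighted sums."""
    outputs = []
    for row in weights:
        total = 0.0
        for w, xi in zip(row, x):
            if rng.random() < p:  # this connection survives
                total += w * xi
        outputs.append(total)
    return outputs


rng = random.Random(0)
x = [1.0, 2.0, 3.0]
weights = [[1.0, 1.0, 1.0], [0.5, 0.5, 0.5]]
print(dropconnect_forward(x, weights, p=0.5, rng=rng))
```

With `p=1.0` every connection is kept and the layer reduces to an ordinary fully connected layer.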
#### Stochastic pooling
A major drawback to dropout is that it does not have the same benefits for convolutional layers, where the neurons are not fully connected.
Even before dropout, in 2013, a technique called stochastic pooling was proposed,[\[97\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-100) in which the conventional [deterministic](https://en.wikipedia.org/wiki/Deterministic_algorithm "Deterministic algorithm") pooling operations are replaced with a stochastic procedure: the activation within each pooling region is picked randomly according to a [multinomial distribution](https://en.wikipedia.org/wiki/Multinomial_distribution "Multinomial distribution") given by the activities within the pooling region. This approach is free of hyperparameters and can be combined with other regularization approaches, such as dropout and [data augmentation](https://en.wikipedia.org/wiki/Data_augmentation "Data augmentation").
An alternate view of stochastic pooling is that it is equivalent to standard max pooling but with many copies of an input image, each having small local [deformations](https://en.wikipedia.org/wiki/Deformation_theory "Deformation theory"). This is similar to explicit [elastic deformations](https://en.wikipedia.org/wiki/Elastic_deformation "Elastic deformation") of the input images,[\[98\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-:3-101) which delivers excellent performance on the [MNIST data set](https://en.wikipedia.org/wiki/MNIST_database "MNIST database").[\[98\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-:3-101) Using stochastic pooling in a multilayer model gives an exponential number of deformations since the selections in higher layers are independent of those below.
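The sampling step for a single pooling region can be sketched as below. It assumes non-negative activations (for example after a ReLU), so that normalizing them gives valid multinomial probabilities, and it returns 0 for an all-zero region; both choices, and the function name, are assumptions of this illustration. Only the training-time sampling is shown.

```python
import random


def stochastic_pool(region, rng):
    """Pick one activation from the region with probability proportional
    to its value (activations are assumed to be non-negative)."""
    if sum(region) == 0:
        return 0.0
    return rng.choices(region, weights=region, k=1)[0]


rng = random.Random(0)
region = [1.0, 0.0, 3.0, 0.0]  # flattened 2x2 pooling region
picks = [stochastic_pool(region, rng) for _ in range(1000)]

# Larger activations are selected more often; zeros are never selected.
print(picks.count(3.0), picks.count(1.0), picks.count(0.0))
```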
#### Artificial data
Main article: [Data augmentation](https://en.wikipedia.org/wiki/Data_augmentation "Data augmentation")
Because the degree of model overfitting is determined by both its power and the amount of training it receives, providing a convolutional network with more training examples can reduce overfitting. Because there is often not enough available data to train, especially considering that some part should be spared for later testing, two approaches are to either generate new data from scratch (if possible) or perturb existing data to create new examples. The latter approach has been used since the mid-1990s.[\[53\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-lecun95-53) For example, input images can be cropped, rotated, or rescaled to create new examples with the same labels as the original training set.[\[99\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-102)
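Perturbing existing data can be sketched with two simple label-preserving transformations on an image stored as a nested list. The helper names, the particular crops, and the restriction to a 90-degree rotation are illustrative assumptions; real pipelines typically use random crops, arbitrary angles and interpolation.

```python
def crop(image, top, left, height, width):
    """Return the height x width window whose top-left corner is (top, left)."""
    return [row[left:left + width] for row in image[top:top + height]]


def rotate90(image):
    """Rotate the image by 90 degrees clockwise."""
    return [list(col) for col in zip(*image[::-1])]


def augment(image, label):
    """Create new training examples that keep the label of the original."""
    h, w = len(image), len(image[0])
    variants = [
        crop(image, 0, 0, h - 1, w - 1),  # top-left crop
        crop(image, 1, 1, h - 1, w - 1),  # bottom-right crop
        rotate90(image),
    ]
    return [(variant, label) for variant in variants]


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
for variant, label in augment(image, label="seven"):
    print(label, variant)
```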
### Explicit
#### Early stopping
Main article: [Early stopping](https://en.wikipedia.org/wiki/Early_stopping "Early stopping")
One of the simplest methods to prevent overfitting of a network is to simply stop the training before overfitting has had a chance to occur. It comes with the disadvantage that the learning process is halted.
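One common way to decide when to stop is to monitor the loss on held-out validation data and halt once it has stopped improving. The sketch below is one such rule, not the only one: the use of a validation loss, the `patience` parameter, and the function name are all assumptions of this example rather than details given above.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return (best_epoch, stop_epoch): training stops at the first epoch
    that is `patience` epochs past the best validation loss seen so far."""
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch
    return best_epoch, len(val_losses) - 1


# The validation loss improves until epoch 2 and then rises, so training
# stops at epoch 4 and the weights saved at epoch 2 would be kept.
losses = [1.0, 0.8, 0.7, 0.75, 0.9, 0.6]
print(early_stopping_epoch(losses, patience=2))  # (2, 4)
```

The last value in the example (0.6) is never reached, which illustrates the disadvantage noted above: halting the learning process can miss later improvements.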
#### Number of parameters
Another simple way to prevent overfitting is to limit the number of parameters, typically by limiting the number of hidden units in each layer or limiting network depth. For convolutional networks, the filter size also affects the number of parameters. Limiting the number of parameters restricts the predictive power of the network directly, reducing the complexity of the function that it can compute on the data, and thus limits the amount of overfitting. This is equivalent to a "[zero norm](https://en.wikipedia.org/wiki/Zero_norm "Zero norm")".
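The effect of filter size and layer width on the parameter count can be checked with a small calculation. The helper names below are hypothetical; the formulas are the standard counts for a 2-D convolutional layer and a fully connected layer.

```python
def conv_params(kernel_h, kernel_w, in_channels, out_channels, bias=True):
    """Weights in a 2-D convolutional layer: one kernel per (input, output) channel pair."""
    weights = kernel_h * kernel_w * in_channels * out_channels
    return weights + (out_channels if bias else 0)

def dense_params(in_features, out_features, bias=True):
    """Weights in a fully connected layer: one per (input, output) pair."""
    return in_features * out_features + (out_features if bias else 0)

# A single 5x5 filter on a one-channel image versus one fully connected
# neuron looking at a whole 100x100 image.
single_filter = conv_params(5, 5, 1, 1, bias=False)
single_neuron = dense_params(100 * 100, 1, bias=False)

# Shrinking the filter from 5x5 to 3x3 at 64 -> 64 channels.
large_layer = conv_params(5, 5, 64, 64)
small_layer = conv_params(3, 3, 64, 64)
```

Here `single_filter` is 25 and `single_neuron` is 10,000, while going from 5 × 5 to 3 × 3 filters reduces the 64-channel layer from 102,464 to 36,928 parameters.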
#### Weight decay
A simple form of added regularizer is weight decay, which adds an additional error, proportional to the sum of the absolute values of the weights ([L1 norm](https://en.wikipedia.org/wiki/L1-norm "L1-norm")) or to the squared magnitude ([L2 norm](https://en.wikipedia.org/wiki/L2_norm "L2 norm")) of the weight vector, to the error at each node. The level of acceptable model complexity can be reduced by increasing the proportionality constant (the 'alpha' hyperparameter), thus increasing the penalty for large weight vectors.
L2 regularization is the most common form of regularization. It can be implemented by penalizing the squared magnitude of all parameters directly in the objective. L2 regularization has the intuitive interpretation of heavily penalizing peaky weight vectors and preferring diffuse weight vectors. Due to multiplicative interactions between weights and inputs, this has the useful property of encouraging the network to use all of its inputs a little rather than some of its inputs a lot.
L1 regularization is also common. It makes the weight vectors sparse during optimization. In other words, neurons with L1 regularization end up using only a sparse subset of their most important inputs and become nearly invariant to the noisy inputs. L1 and L2 regularization can be combined; this is called [elastic net regularization](https://en.wikipedia.org/wiki/Elastic_net_regularization "Elastic net regularization").
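The penalties described above can be written down directly. In this sketch the function names and the two example weight vectors are hypothetical, and the L2 term omits the factor of 1/2 that some formulations include.

```python
def l1_penalty(weights):
    """Sum of absolute values of the weights."""
    return sum(abs(w) for w in weights)

def l2_penalty(weights):
    """Squared magnitude of the weight vector."""
    return sum(w * w for w in weights)

def regularized_loss(data_loss, weights, alpha_l1=0.0, alpha_l2=0.0):
    """Data loss plus weighted penalties; both alphas non-zero gives elastic net."""
    return data_loss + alpha_l1 * l1_penalty(weights) + alpha_l2 * l2_penalty(weights)

peaky = [1.0, 0.0, 0.0, 0.0]        # relies on one input a lot
diffuse = [0.25, 0.25, 0.25, 0.25]  # relies on every input a little

peaky_loss = regularized_loss(1.0, peaky, alpha_l2=0.5)
diffuse_loss = regularized_loss(1.0, diffuse, alpha_l2=0.5)
```

Both vectors have the same L1 penalty (1.0), but the L2 penalty is 1.0 for the peaky vector and 0.25 for the diffuse one, so under L2 regularization the diffuse vector yields the lower total loss (1.125 versus 1.5).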
#### Max norm constraints
Another form of regularization is to enforce an absolute upper bound on the magnitude of the weight vector for every neuron and use [projected gradient descent](https://en.wikipedia.org/wiki/Sparse_approximation#Projected_Gradient_Descent "Sparse approximation") to enforce the constraint. In practice, this corresponds to performing the parameter update as normal, and then enforcing the constraint by clamping the weight vector $\vec{w}$ of every neuron to satisfy $\|\vec{w}\|_{2} < c$. Typical values of $c$ are on the order of 3–4. Some papers report improvements[\[100\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-103) when using this form of regularization.
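The update-then-clamp procedure might look as follows for a single neuron. This is a minimal sketch with hypothetical function names; it rescales an over-long vector onto the ball of radius `c`, so the result satisfies the bound with equality rather than strictly.

```python
import math

def clamp_max_norm(weights, c=3.0):
    """Rescale `weights` so that its L2 norm does not exceed c."""
    norm = math.sqrt(sum(w * w for w in weights))
    if norm <= c:
        return list(weights)  # already inside the constraint set
    scale = c / norm
    return [w * scale for w in weights]

def sgd_step_with_max_norm(weights, grads, lr=0.1, c=3.0):
    """Ordinary gradient step followed by projection onto the norm ball."""
    updated = [w - lr * g for w, g in zip(weights, grads)]
    return clamp_max_norm(updated, c)

clamped = clamp_max_norm([3.0, 4.0], c=3.0)    # norm 5 is scaled down to 3
untouched = clamp_max_norm([1.0, 2.0], c=3.0)  # norm below 3 is left as is
```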
## Hierarchical coordinate frames
Pooling loses the precise spatial relationships between high-level parts (such as nose and mouth in a face image). These relationships are needed for identity recognition. Overlapping the pools, so that each feature occurs in multiple pools, helps retain the information. Translation alone cannot extrapolate the understanding of geometric relationships to a radically new viewpoint, such as a different orientation or scale. On the other hand, people are very good at extrapolating; after seeing a new shape once they can recognize it from a different viewpoint.[\[101\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-104)
An earlier common way to deal with this problem is to train the network on transformed data in different orientations, scales, lighting, etc. so that the network can cope with these variations. This is computationally intensive for large data sets. The alternative is to use a hierarchy of coordinate frames and use a group of neurons to represent a conjunction of the shape of the feature and its pose relative to the [retina](https://en.wikipedia.org/wiki/Retina "Retina"). The pose relative to the retina is the relationship between the coordinate frame of the retina and the intrinsic features' coordinate frame.[\[102\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-105)
Thus, one way to represent something is to embed the coordinate frame within it. This allows large features to be recognized by using the consistency of the poses of their parts (e.g. nose and mouth poses make a consistent prediction of the pose of the whole face). This approach ensures that the higher-level entity (e.g. face) is present when the lower-level entities (e.g. nose and mouth) agree on their predictions of its pose. The vectors of neuronal activity that represent pose ("pose vectors") allow spatial transformations to be modeled as linear operations, which makes it easier for the network to learn the hierarchy of visual entities and generalize across viewpoints. This is similar to the way the human [visual system](https://en.wikipedia.org/wiki/Visual_system "Visual system") imposes coordinate frames in order to represent shapes.[\[103\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-106)
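The idea of parts agreeing on the pose of the whole can be made concrete with a toy example. The sketch below is an illustration only, not the method of the cited works: it assumes poses are 3 × 3 homogeneous matrices restricted to uniform scale and translation, and the part-to-whole matrices `NOSE_TO_FACE` and `MOUTH_TO_FACE` are hypothetical constants that a real model would have to learn.

```python
def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def pose(scale, tx, ty):
    """Homogeneous 2-D transform: uniform scale followed by translation."""
    return [[scale, 0.0, tx], [0.0, scale, ty], [0.0, 0.0, 1.0]]

def poses_agree(p, q, tol=1e-6):
    """True if two pose matrices match entry by entry within `tol`."""
    return all(abs(p[i][j] - q[i][j]) <= tol
               for i in range(3) for j in range(3))

# Hypothetical fixed relationships: the nose sits one unit above the
# centre of the face, the mouth one unit below it.
NOSE_TO_FACE = pose(1.0, 0.0, -1.0)
MOUTH_TO_FACE = pose(1.0, 0.0, 1.0)

def face_present(nose_pose, mouth_pose):
    """Each part predicts the face pose by a linear map; agreement signals a face."""
    face_from_nose = matmul(nose_pose, NOSE_TO_FACE)
    face_from_mouth = matmul(mouth_pose, MOUTH_TO_FACE)
    return poses_agree(face_from_nose, face_from_mouth)

consistent = face_present(pose(2.0, 10.0, 22.0), pose(2.0, 10.0, 18.0))
inconsistent = face_present(pose(2.0, 10.0, 22.0), pose(2.0, 30.0, 18.0))
```

In the first call both parts predict the same face pose (scale 2 at position (10, 20)), so `consistent` is `True`; in the second the mouth is displaced sideways, the predictions differ, and `inconsistent` is `False`.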
## Applications
### Image recognition
CNNs are often used in [image recognition](https://en.wikipedia.org/wiki/Image_recognition "Image recognition") systems. In 2012, an [error rate](https://en.wikipedia.org/wiki/Per-comparison_error_rate "Per-comparison error rate") of 0.23% on the [MNIST database](https://en.wikipedia.org/wiki/MNIST_database "MNIST database") was reported.[\[28\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-mcdns-28) Another paper on using CNNs for image classification reported that the learning process was "surprisingly fast"; in the same paper, the best published results as of 2011 were achieved on the MNIST database and the NORB database.[\[25\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-flexible-25) Subsequently, a similar CNN called [AlexNet](https://en.wikipedia.org/wiki/AlexNet "AlexNet")[\[104\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-quartz-107) won the [ImageNet Large Scale Visual Recognition Challenge](https://en.wikipedia.org/wiki/ImageNet_Large_Scale_Visual_Recognition_Challenge "ImageNet Large Scale Visual Recognition Challenge") 2012.
When applied to [facial recognition](https://en.wikipedia.org/wiki/Facial_recognition_system "Facial recognition system"), CNNs achieved a large decrease in error rate.[\[105\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-108) Another paper reported a 97.6% recognition rate on "5,600 still images of more than 10 subjects".[\[21\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-robust_face_detection-21) CNNs were used to assess [video quality](https://en.wikipedia.org/wiki/Video_quality "Video quality") in an objective way after manual training; the resulting system had a very low [root mean square error](https://en.wikipedia.org/wiki/Root_mean_square_error "Root mean square error").[\[106\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-video_quality-109)
The [ImageNet Large Scale Visual Recognition Challenge](https://en.wikipedia.org/wiki/ImageNet_Large_Scale_Visual_Recognition_Challenge "ImageNet Large Scale Visual Recognition Challenge") (ILSVRC) is a benchmark in object classification and detection, with millions of images and hundreds of object classes. In ILSVRC 2014,[\[107\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-ILSVRC2014-110) almost every highly ranked team used a CNN as its basic framework. The winner, [GoogLeNet](https://en.wikipedia.org/wiki/GoogLeNet "GoogLeNet")[\[108\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-googlenet-111) (the foundation of [DeepDream](https://en.wikipedia.org/wiki/DeepDream "DeepDream")), increased the mean average [precision](https://en.wikipedia.org/wiki/Precision_and_recall "Precision and recall") of object detection to 0.439329 and reduced classification error to 0.06656, the best result at that time. Its network used more than 30 layers. The performance of convolutional neural networks on the ImageNet tests was close to that of humans.[\[109\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-112) The best algorithms still struggle with objects that are small or thin, such as a small ant on a stem of a flower or a person holding a quill in their hand. They also have trouble with images that have been distorted with filters, an increasingly common phenomenon with modern digital cameras. By contrast, those kinds of images rarely trouble humans. Humans, however, tend to have trouble with other tasks. For example, they are not good at classifying objects into fine-grained categories such as the particular breed of dog or species of bird, whereas convolutional neural networks handle this.\[*[citation needed](https://en.wikipedia.org/wiki/Wikipedia:Citation_needed "Wikipedia:Citation needed")*\]
In 2015, a many-layered CNN demonstrated the ability to spot faces from a wide range of angles, including upside down, even when partially occluded, with competitive performance. The network was trained on a database of 200,000 images that included faces at various angles and orientations and a further 20 million images without faces. Training used batches of 128 images over 50,000 iterations.[\[110\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-113)
### Video analysis
Compared to image data domains, there is relatively little work on applying CNNs to video classification. Video is more complex than images since it has another (temporal) dimension. However, some extensions of CNNs into the video domain have been explored. One approach is to treat space and time as equivalent dimensions of the input and perform convolutions in both time and space.[\[111\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-114)[\[112\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-115) Another way is to fuse the features of two convolutional neural networks, one for the spatial and one for the temporal stream.[\[113\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-116)[\[114\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-117)[\[115\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-118) [Long short-term memory](https://en.wikipedia.org/wiki/Long_short-term_memory "Long short-term memory") (LSTM) [recurrent](https://en.wikipedia.org/wiki/Recurrent_neural_network "Recurrent neural network") units are typically incorporated after the CNN to account for inter-frame or inter-clip dependencies.[\[116\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-Wang_Duan_Zhang_Niu_p=1657-119)[\[117\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-Duan_Wang_Zhai_Zheng_2018_p.-120) [Unsupervised learning](https://en.wikipedia.org/wiki/Unsupervised_learning "Unsupervised learning") schemes for training spatio-temporal features have been introduced, based on Convolutional Gated Restricted [Boltzmann Machines](https://en.wikipedia.org/wiki/Boltzmann_machine "Boltzmann machine")[\[118\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-121) and Independent Subspace Analysis.[\[119\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-122) Applications include [text-to-video models](https://en.wikipedia.org/wiki/Text-to-video_model "Text-to-video model").\[*[citation needed](https://en.wikipedia.org/wiki/Wikipedia:Citation_needed "Wikipedia:Citation needed")*\]
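Treating time as just another convolution axis can be sketched with a small, unoptimized routine. It assumes a single-channel video stored as nested lists indexed `[time][row][column]`, no padding, and stride 1; the function name and the toy data are hypothetical. As is usual in CNN implementations, the kernel is not flipped, so this is strictly a cross-correlation.

```python
def conv3d_valid(video, kernel):
    """Slide a (time, height, width) kernel over a video without padding."""
    T, H, W = len(video), len(video[0]), len(video[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):
        frame = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                acc = 0.0
                # Sum over the kernel's temporal and spatial extent alike.
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += video[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(acc)
            frame.append(row)
        out.append(frame)
    return out

# Three 2x2 frames: dark, then bright, then bright again.
video = [[[0.0, 0.0], [0.0, 0.0]],
         [[1.0, 1.0], [1.0, 1.0]],
         [[1.0, 1.0], [1.0, 1.0]]]
# A 2x1x1 kernel that responds to change between consecutive frames.
temporal_diff = [[[-1.0]], [[1.0]]]
motion = conv3d_valid(video, temporal_diff)
```

The output has two frames: the first is all ones (brightness changed between frames 0 and 1) and the second is all zeros (no change between frames 1 and 2), which shows how a kernel with temporal extent can pick up motion that a purely spatial kernel cannot.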
### Natural language processing
CNNs have also been explored for [natural language processing](https://en.wikipedia.org/wiki/Natural_language_processing "Natural language processing"). CNN models are effective for various NLP problems and achieved excellent results in [semantic parsing](https://en.wikipedia.org/wiki/Semantic_parsing "Semantic parsing"),[\[120\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-123) search query retrieval,[\[121\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-124) sentence modeling,[\[122\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-125) classification,[\[123\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-126) prediction[\[124\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-127) and other traditional NLP tasks.[\[125\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-128) Compared to traditional language processing methods such as [recurrent neural networks](https://en.wikipedia.org/wiki/Recurrent_neural_networks "Recurrent neural networks"), CNNs can represent different contextual realities of language that do not rely on a series-sequence assumption, while RNNs are better suited when classical time series modeling is required.[\[126\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-129)[\[127\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-130)[\[128\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-131)[\[129\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-132)
### Animal behavior detection
CNNs have been applied in ecological and behavioral research to automatically detect and quantify animal behavior from visual data,[\[130\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-133)[\[131\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-:7-134) enabling identification of animals,[\[132\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-135)[\[133\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-136) tracking of individuals,[\[134\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-137) estimation of pose,[\[135\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-138)[\[136\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-139)[\[137\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-140) and classification of specific actions such as feeding,[\[138\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-:8-141) and social interactions.[\[131\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-:7-134)[\[138\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-:8-141) Combined with multi-object tracking and temporal modeling, these systems can extract behavioral sequences over extended recordings, reducing reliance on manual annotation and increasing throughput for studies of individual variation, social networks, and collective dynamics.
### Anomaly detection
A CNN with 1-D convolutions was used on time series in the frequency domain (spectral residual) by an unsupervised model to detect anomalies in the time domain.[\[139\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-142)
### Drug discovery
CNNs have been used in [drug discovery](https://en.wikipedia.org/wiki/Drug_discovery "Drug discovery"). Predicting the interaction between molecules and biological [proteins](https://en.wikipedia.org/wiki/Protein "Protein") can identify potential treatments. In 2015, Atomwise introduced AtomNet, the first deep learning neural network for [structure-based drug design](https://en.wikipedia.org/wiki/Structure-based_drug_design "Structure-based drug design").[\[140\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-143) The system trains directly on 3-dimensional representations of chemical interactions. Similar to how image recognition networks learn to compose smaller, spatially proximate features into larger, complex structures,[\[141\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-144) AtomNet discovers chemical features, such as [aromaticity](https://en.wikipedia.org/wiki/Aromaticity "Aromaticity"), [sp3 carbons](https://en.wikipedia.org/wiki/Orbital_hybridisation "Orbital hybridisation"), and [hydrogen bonding](https://en.wikipedia.org/wiki/Hydrogen_bond "Hydrogen bond"). Subsequently, AtomNet was used to predict novel candidate [biomolecules](https://en.wikipedia.org/wiki/Biomolecule "Biomolecule") for multiple disease targets, most notably treatments for the [Ebola virus](https://en.wikipedia.org/wiki/Ebola_virus "Ebola virus")[\[142\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-145) and [multiple sclerosis](https://en.wikipedia.org/wiki/Multiple_sclerosis "Multiple sclerosis").[\[143\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-146)
### Checkers game
CNNs have been used in the game of [checkers](https://en.wikipedia.org/wiki/Draughts "Draughts"). From 1999 to 2001, [Fogel](https://en.wikipedia.org/wiki/David_B._Fogel "David B. Fogel") and Chellapilla published papers showing how a convolutional neural network could learn to play checkers using co-evolution. The learning process did not use prior human professional games, but rather focused on a minimal set of information contained in the checkerboard: the location and type of pieces, and the difference in number of pieces between the two sides. Ultimately, the program ([Blondie24](https://en.wikipedia.org/wiki/Blondie24 "Blondie24")) was tested on 165 games against players and ranked in the highest 0.4%.[\[144\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-147)[\[145\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-148) It also earned a win against the program [Chinook](https://en.wikipedia.org/wiki/Chinook_\(draughts_player\) "Chinook (draughts player)") at its "expert" level of play.[\[146\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-149)
### Go
CNNs have been used in [computer Go](https://en.wikipedia.org/wiki/Computer_Go "Computer Go"). In December 2014, Clark and [Storkey](https://en.wikipedia.org/wiki/Amos_Storkey "Amos Storkey") published a paper showing that a CNN trained by supervised learning from a database of human professional games could outperform [GNU Go](https://en.wikipedia.org/wiki/GNU_Go "GNU Go") and win some games against [Monte Carlo tree search](https://en.wikipedia.org/wiki/Monte_Carlo_tree_search "Monte Carlo tree search") Fuego 1.1 in a fraction of the time it took Fuego to play.[\[147\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-150) Later it was announced that a large 12-layer convolutional neural network had correctly predicted the professional move in 55% of positions, equalling the accuracy of a [6 dan](https://en.wikipedia.org/wiki/Go_ranks_and_ratings "Go ranks and ratings") human player. When the trained convolutional network was used directly to play games of Go, without any search, it beat the traditional search program GNU Go in 97% of games, and matched the performance of the [Monte Carlo tree search](https://en.wikipedia.org/wiki/Monte_Carlo_tree_search "Monte Carlo tree search") program Fuego simulating ten thousand playouts (about a million positions) per move.[\[148\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-151)
[AlphaGo](https://en.wikipedia.org/wiki/AlphaGo "AlphaGo"), the first program to beat the best human player at the time, used two CNNs to drive MCTS: a "policy network" for choosing moves to try and a "value network" for evaluating positions.[\[149\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-152)
### Time series forecasting
Recurrent neural networks are generally considered the best neural network architectures for time series forecasting (and sequence modeling in general), but recent studies show that convolutional networks can perform comparably or even better.[\[150\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-153)[\[13\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-Tsantekidis_7%E2%80%9312-13) Dilated convolutions[\[151\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-154) might enable one-dimensional convolutional neural networks to effectively learn time series dependences.[\[152\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-155) Convolutions can be implemented more efficiently than RNN-based solutions, and they do not suffer from vanishing (or exploding) gradients.[\[153\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-156) Convolutional networks can provide an improved forecasting performance when there are multiple similar time series to learn from.[\[154\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-157) CNNs can also be applied to further tasks in time series analysis (e.g., time series classification[\[155\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-158) or quantile forecasting[\[156\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-159)).
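A dilated one-dimensional convolution can be sketched in a few lines. The version below is additionally causal (each output depends only on current and past samples), which is a common choice for forecasting but an assumption here rather than something stated above; the function names are hypothetical.

```python
def dilated_causal_conv1d(series, kernel, dilation=1):
    """y[t] = sum_k kernel[k] * series[t - k * dilation].

    Samples before the start of the series are treated as zero.
    """
    out = []
    for t in range(len(series)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * series[idx]
        out.append(acc)
    return out

def receptive_field(kernel_size, dilations):
    """Number of past samples seen by a stack of dilated layers with stride 1."""
    return 1 + (kernel_size - 1) * sum(dilations)

# With dilation 2, each output combines x[t] and x[t - 2].
y = dilated_causal_conv1d([1.0, 2.0, 3.0, 4.0, 5.0], [1.0, 1.0], dilation=2)
span = receptive_field(2, [1, 2, 4, 8])
```

Doubling the dilation at each layer makes the receptive field grow exponentially with depth: four layers of size-2 kernels with dilations 1, 2, 4 and 8 already cover 16 time steps.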
### Cultural heritage and 3D-datasets
As archaeological findings such as [clay tablets](https://en.wikipedia.org/wiki/Clay_tablet "Clay tablet") with [cuneiform writing](https://en.wikipedia.org/wiki/Cuneiform "Cuneiform") are increasingly acquired using [3D scanners](https://en.wikipedia.org/wiki/3D_scanner "3D scanner"), benchmark datasets are becoming available, including *HeiCuBeDa*,[\[157\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-HeiCuBeDa_Hilprecht-160) which provides almost 2000 normalized 2-D and 3-D datasets prepared with the [GigaMesh Software Framework](https://en.wikipedia.org/wiki/GigaMesh_Software_Framework "GigaMesh Software Framework").[\[158\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-ICDAR19-161) [Curvature](https://en.wikipedia.org/wiki/Curvature "Curvature")-based measures are used in conjunction with geometric neural networks (GNNs), e.g. for period classification of these clay tablets, which are among the oldest documents of human history.[\[159\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-ICFHR20-162)[\[160\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-ICFHR20_Presentation-163)
## Fine-tuning
For many applications, little training data is available. Convolutional neural networks usually require a large amount of training data in order to avoid [overfitting](https://en.wikipedia.org/wiki/Overfitting "Overfitting"). A common technique is to train the network on a larger data set from a related domain. Once the network parameters have converged, an additional training step is performed using the in-domain data to fine-tune the network weights; this is known as [transfer learning](https://en.wikipedia.org/wiki/Transfer_learning "Transfer learning"). Furthermore, this technique allows convolutional network architectures to be applied successfully to problems with tiny training sets.[\[161\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-164)
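One frequently used variant of the fine-tuning step keeps the early, pretrained layers frozen and updates only the last layers on the in-domain data. The sketch below assumes that variant, which the text above does not prescribe; the layer names, weights, and gradients are made-up placeholders.

```python
def sgd_step(params, grads, trainable, lr=0.1):
    """Apply one gradient step, leaving layers outside `trainable` frozen."""
    updated = {}
    for name, weights in params.items():
        if name in trainable:
            updated[name] = [w - lr * g for w, g in zip(weights, grads[name])]
        else:
            updated[name] = list(weights)  # frozen: copy the pretrained values
    return updated

# Weights obtained from training on the larger, related data set.
pretrained = {"conv1": [0.5, -0.5], "head": [1.0, 2.0]}
# Gradients computed on a batch of in-domain data (placeholder values).
in_domain_grads = {"conv1": [1.0, 1.0], "head": [1.0, -1.0]}

fine_tuned = sgd_step(pretrained, in_domain_grads, trainable={"head"}, lr=0.5)
```

After the step, `conv1` still holds its pretrained weights while `head` has moved to `[0.5, 2.5]`. Passing every layer name in `trainable` would instead fine-tune all weights.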
## Human interpretable explanations
End-to-end training and prediction are common practice in [computer vision](https://en.wikipedia.org/wiki/Computer_vision "Computer vision"). However, human interpretable explanations are required for [critical systems](https://en.wikipedia.org/wiki/Safety-critical_system "Safety-critical system") such as [self-driving cars](https://en.wikipedia.org/wiki/Self-driving_car "Self-driving car").[\[162\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-Interpretable_ML_Symposium_2017-165) With recent advances in [visual salience](https://en.wikipedia.org/wiki/Salience_\(neuroscience\) "Salience (neuroscience)"), [spatial attention](https://en.wikipedia.org/wiki/Visual_spatial_attention "Visual spatial attention"), and [temporal attention](https://en.wikipedia.org/wiki/Visual_temporal_attention "Visual temporal attention"), the most critical spatial regions/temporal instants could be visualized to justify the CNN predictions.[\[163\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-Zang_Wang_Liu_Zhang_2018_pp._97%E2%80%93108-166)[\[164\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-Wang_Zang_Zhang_Niu_p=1979-167)
## Related architectures
### Deep Q-networks
A deep Q-network (DQN) is a type of deep learning model that combines a deep neural network with [Q-learning](https://en.wikipedia.org/wiki/Q-learning "Q-learning"), a form of [reinforcement learning](https://en.wikipedia.org/wiki/Reinforcement_learning "Reinforcement learning"). Unlike earlier reinforcement learning agents, DQNs that utilize CNNs can learn directly from high-dimensional sensory inputs via reinforcement learning.[\[165\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-Ong_Chavez_Hong_2015-168)
Preliminary results were presented in 2014, with an accompanying paper in February 2015.[\[166\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-DQN-169) The research described an application to [Atari 2600](https://en.wikipedia.org/wiki/Atari_2600 "Atari 2600") gaming. Other deep reinforcement learning models preceded it.[\[167\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-170)
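The Q-learning part of a DQN can be summarized by its regression target. In the sketch below the Q-values are plain lists; in an actual DQN they would be the outputs of the CNN for a given screen image. The function names and the numbers are hypothetical, and details such as replay buffers or target networks are left out.

```python
def q_learning_target(reward, next_q_values, gamma=0.99, done=False):
    """Target for Q(s, a): reward plus discounted best value of the next state."""
    if done:
        return reward  # no future reward after a terminal state
    return reward + gamma * max(next_q_values)

def greedy_action(q_values):
    """Index of the action with the highest estimated value."""
    return max(range(len(q_values)), key=lambda a: q_values[a])

target = q_learning_target(1.0, [0.5, 2.0], gamma=0.5)
terminal_target = q_learning_target(1.0, [0.5, 2.0], gamma=0.5, done=True)
action = greedy_action([0.1, 0.7, 0.3])
```

Training then adjusts the network so that its prediction for the chosen action moves towards `target`, here 1.0 + 0.5 × 2.0 = 2.0.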
### Deep belief networks
Main article: [Deep belief network](https://en.wikipedia.org/wiki/Deep_belief_network "Deep belief network")
[Convolutional deep belief networks](https://en.wikipedia.org/wiki/Convolutional_deep_belief_network "Convolutional deep belief network") (CDBN) have a structure very similar to convolutional neural networks and are trained similarly to deep belief networks. Therefore, they exploit the 2D structure of images, like CNNs do, and make use of pre-training like [deep belief networks](https://en.wikipedia.org/wiki/Deep_belief_network "Deep belief network"). They provide a generic structure that can be used in many image and signal processing tasks. Benchmark results on standard image datasets like CIFAR[\[168\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-CDBN-CIFAR-171) have been obtained using CDBNs.[\[169\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-CDBN-172)
### Neural abstraction pyramid
The feed-forward architecture of convolutional neural networks was extended in the neural abstraction pyramid[\[170\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-173) by lateral and feedback connections. The resulting recurrent convolutional network allows for the flexible incorporation of contextual information to iteratively resolve local ambiguities. In contrast to previous models, image-like outputs at the highest resolution were generated, e.g., for semantic segmentation, image reconstruction, and object localization tasks.
## Notable libraries
- [Caffe](https://en.wikipedia.org/wiki/Caffe_\(software\) "Caffe (software)"): A library for convolutional neural networks. Created by the Berkeley Vision and Learning Center (BVLC). It supports both CPU and GPU. Developed in [C++](https://en.wikipedia.org/wiki/C%2B%2B "C++"), and has [Python](https://en.wikipedia.org/wiki/Python_\(programming_language\) "Python (programming language)") and [MATLAB](https://en.wikipedia.org/wiki/MATLAB "MATLAB") wrappers.
- [Deeplearning4j](https://en.wikipedia.org/wiki/Deeplearning4j "Deeplearning4j"): Deep learning in [Java](https://en.wikipedia.org/wiki/Java_\(programming_language\) "Java (programming language)") and [Scala](https://en.wikipedia.org/wiki/Scala_\(programming_language\) "Scala (programming language)") on multi-GPU-enabled [Spark](https://en.wikipedia.org/wiki/Apache_Spark "Apache Spark"). A general-purpose deep learning library for the JVM production stack running on a C++ scientific computing engine. Allows the creation of custom layers. Integrates with Hadoop and Kafka.
- [Dlib](https://en.wikipedia.org/wiki/Dlib "Dlib"): A toolkit for making real world machine learning and data analysis applications in C++.
- [Microsoft Cognitive Toolkit](https://en.wikipedia.org/wiki/Microsoft_Cognitive_Toolkit "Microsoft Cognitive Toolkit"): A deep learning toolkit written by Microsoft with several unique features enhancing scalability over multiple nodes. It supports full-fledged interfaces for training in C++ and Python and with additional support for model inference in [C\#](https://en.wikipedia.org/wiki/C_Sharp_\(programming_language\) "C Sharp (programming language)") and Java.
- [TensorFlow](https://en.wikipedia.org/wiki/TensorFlow "TensorFlow"): [Apache 2.0](https://en.wikipedia.org/wiki/Apache_License#Version_2.0 "Apache License")\-licensed Theano-like library with support for CPU, GPU, Google's proprietary [tensor processing unit](https://en.wikipedia.org/wiki/Tensor_processing_unit "Tensor processing unit") (TPU),[\[171\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-174) and mobile devices.
- [Theano](https://en.wikipedia.org/wiki/Theano_\(software\) "Theano (software)"): The reference deep-learning library for Python with an API largely compatible with the popular [NumPy](https://en.wikipedia.org/wiki/NumPy "NumPy") library. It allows the user to write symbolic mathematical expressions and then automatically generates their derivatives, saving the user from having to code gradients or backpropagation by hand. These symbolic expressions are automatically compiled to [CUDA](https://en.wikipedia.org/wiki/CUDA "CUDA") code for a fast, [on-the-GPU](https://en.wikipedia.org/wiki/Compute_kernel "Compute kernel") implementation.
- [Torch](https://en.wikipedia.org/wiki/Torch_\(machine_learning\) "Torch (machine learning)"): A [scientific computing](https://en.wikipedia.org/wiki/Scientific_computing "Scientific computing") framework with wide support for machine learning algorithms, written in [C](https://en.wikipedia.org/wiki/C_\(programming_language\) "C (programming language)") and [Lua](https://en.wikipedia.org/wiki/Lua_\(programming_language\) "Lua (programming language)").
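
The Theano entry above describes writing symbolic expressions and having their derivatives generated automatically. As a rough illustration of that idea only, here is a minimal pure-Python sketch. The class names (`Var`, `Const`, `Add`, `Mul`) and the `grad` and `eval` methods are hypothetical and are not Theano's API, and nothing here is compiled to CUDA.

```python
from dataclasses import dataclass

# Hypothetical mini expression language (NOT Theano's API): each node can
# evaluate itself and return a new expression for its derivative.

@dataclass(frozen=True)
class Const:
    value: float
    def eval(self, env): return self.value
    def grad(self, wrt): return Const(0.0)

@dataclass(frozen=True)
class Var:
    name: str
    def eval(self, env): return env[self.name]
    def grad(self, wrt): return Const(1.0 if self.name == wrt else 0.0)

@dataclass(frozen=True)
class Add:
    left: object
    right: object
    def eval(self, env): return self.left.eval(env) + self.right.eval(env)
    def grad(self, wrt): return Add(self.left.grad(wrt), self.right.grad(wrt))

@dataclass(frozen=True)
class Mul:
    left: object
    right: object
    def eval(self, env): return self.left.eval(env) * self.right.eval(env)
    def grad(self, wrt):
        # Product rule: (f * g)' = f' * g + f * g'
        return Add(Mul(self.left.grad(wrt), self.right),
                   Mul(self.left, self.right.grad(wrt)))

# Symbolic expression for a squared linear unit: loss = (w * x + b)^2
w, x, b = Var("w"), Var("x"), Var("b")
pre = Add(Mul(w, x), b)
loss = Mul(pre, pre)

# Derivative expressions are built automatically; no hand-written gradients.
d_w = loss.grad("w")
d_b = loss.grad("b")

env = {"w": 2.0, "x": 3.0, "b": 1.0}
print(loss.eval(env))  # 49.0
print(d_w.eval(env))   # 42.0, i.e. 2 * (w*x + b) * x
print(d_b.eval(env))   # 14.0, i.e. 2 * (w*x + b)
```

The user only describes `loss`; the gradient expressions follow mechanically from the structure of the expression tree. Real libraries additionally simplify and compile such graphs, which this sketch does not attempt.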
## See also
- [Attention (machine learning)](https://en.wikipedia.org/wiki/Attention_\(machine_learning\) "Attention (machine learning)")
- [Circuit (neural network)](https://en.wikipedia.org/wiki/Circuit_\(neural_network\) "Circuit (neural network)")
- [Convolution](https://en.wikipedia.org/wiki/Convolution "Convolution")
- [Deep learning](https://en.wikipedia.org/wiki/Deep_learning "Deep learning")
- [Natural-language processing](https://en.wikipedia.org/wiki/Natural-language_processing "Natural-language processing")
- [Neocognitron](https://en.wikipedia.org/wiki/Neocognitron "Neocognitron")
- [Scale-invariant feature transform](https://en.wikipedia.org/wiki/Scale-invariant_feature_transform "Scale-invariant feature transform")
- [Time delay neural network](https://en.wikipedia.org/wiki/Time_delay_neural_network "Time delay neural network")
- [Vision processing unit](https://en.wikipedia.org/wiki/Vision_processing_unit "Vision processing unit")
## Notes
1. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-75)** When applied to other types of data than image data, such as sound data, "spatial position" may variously correspond to different points in the [time domain](https://en.wikipedia.org/wiki/Time_domain "Time domain"), [frequency domain](https://en.wikipedia.org/wiki/Frequency_domain "Frequency domain"), or other [mathematical spaces](https://en.wikipedia.org/wiki/Space_\(mathematics\) "Space (mathematics)").
2. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-78)** Hence the name "convolutional layer".
3. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-90)** So-called [categorical data](https://en.wikipedia.org/wiki/Categorical_data "Categorical data").
## References
1. ^ [***a***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-LeCun2015_1-0) [***b***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-LeCun2015_1-1)
LeCun, Yann; Bengio, Yoshua; Hinton, Geoffrey (2015-05-28). ["Deep learning"](https://hal.science/hal-04206682). *Nature*. **521** (7553): 436–444\. [Bibcode](https://en.wikipedia.org/wiki/Bibcode_\(identifier\) "Bibcode (identifier)"):[2015Natur.521..436L](https://ui.adsabs.harvard.edu/abs/2015Natur.521..436L). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1038/nature14539](https://doi.org/10.1038%2Fnature14539). [ISSN](https://en.wikipedia.org/wiki/ISSN_\(identifier\) "ISSN (identifier)") [1476-4687](https://search.worldcat.org/issn/1476-4687). [PMID](https://en.wikipedia.org/wiki/PMID_\(identifier\) "PMID (identifier)") [26017442](https://pubmed.ncbi.nlm.nih.gov/26017442).
2. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-2)**
LeCun, Y.; Boser, B.; Denker, J. S.; Henderson, D.; Howard, R. E.; Hubbard, W.; Jackel, L. D. (December 1989). ["Backpropagation Applied to Handwritten Zip Code Recognition"](https://ieeexplore.ieee.org/document/6795724). *Neural Computation*. **1** (4): 541–551\. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1162/neco.1989.1.4.541](https://doi.org/10.1162%2Fneco.1989.1.4.541). [ISSN](https://en.wikipedia.org/wiki/ISSN_\(identifier\) "ISSN (identifier)") [0899-7667](https://search.worldcat.org/issn/0899-7667).
3. ^ [***a***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-auto3_3-0) [***b***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-auto3_3-1)
Venkatesan, Ragav; Li, Baoxin (2017-10-23). [*Convolutional Neural Networks in Visual Computing: A Concise Guide*](https://books.google.com/books?id=bAM7DwAAQBAJ&q=vanishing+gradient). CRC Press. [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-1-351-65032-8](https://en.wikipedia.org/wiki/Special:BookSources/978-1-351-65032-8 "Special:BookSources/978-1-351-65032-8"). [Archived](https://web.archive.org/web/20231016190415/https://books.google.com/books?id=bAM7DwAAQBAJ&q=vanishing+gradient#v=snippet&q=vanishing%20gradient&f=false) from the original on 2023-10-16. Retrieved 2020-12-13.
4. ^ [***a***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-auto2_4-0) [***b***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-auto2_4-1)
Balas, Valentina E.; Kumar, Raghvendra; Srivastava, Rajshree (2019-11-19). [*Recent Trends and Advances in Artificial Intelligence and Internet of Things*](https://books.google.com/books?id=XRS_DwAAQBAJ&q=exploding+gradient). Springer Nature. [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-3-030-32644-9](https://en.wikipedia.org/wiki/Special:BookSources/978-3-030-32644-9 "Special:BookSources/978-3-030-32644-9"). [Archived](https://web.archive.org/web/20231016190414/https://books.google.com/books?id=XRS_DwAAQBAJ&q=exploding+gradient#v=snippet&q=exploding%20gradient&f=false) from the original on 2023-10-16. Retrieved 2020-12-13.
5. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-5)**
Zhang, Yingjie; Soon, Hong Geok; Ye, Dongsen; Fuh, Jerry Ying Hsi; Zhu, Kunpeng (September 2020). "Powder-Bed Fusion Process Monitoring by Machine Vision With Hybrid Convolutional Neural Networks". *IEEE Transactions on Industrial Informatics*. **16** (9): 5769–5779\. [Bibcode](https://en.wikipedia.org/wiki/Bibcode_\(identifier\) "Bibcode (identifier)"):[2020ITII...16.5769Z](https://ui.adsabs.harvard.edu/abs/2020ITII...16.5769Z). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1109/TII.2019.2956078](https://doi.org/10.1109%2FTII.2019.2956078). [ISSN](https://en.wikipedia.org/wiki/ISSN_\(identifier\) "ISSN (identifier)") [1941-0050](https://search.worldcat.org/issn/1941-0050). [S2CID](https://en.wikipedia.org/wiki/S2CID_\(identifier\) "S2CID (identifier)") [213010088](https://api.semanticscholar.org/CorpusID:213010088).
6. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-6)**
Chervyakov, N.I.; Lyakhov, P.A.; Deryabin, M.A.; Nagornov, N.N.; Valueva, M.V.; Valuev, G.V. (September 2020). ["Residue Number System-Based Solution for Reducing the Hardware Cost of a Convolutional Neural Network"](https://linkinghub.elsevier.com/retrieve/pii/S092523122030583X). *Neurocomputing*. **407**: 439–453\. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1016/j.neucom.2020.04.018](https://doi.org/10.1016%2Fj.neucom.2020.04.018). [S2CID](https://en.wikipedia.org/wiki/S2CID_\(identifier\) "S2CID (identifier)") [219470398](https://api.semanticscholar.org/CorpusID:219470398). [Archived](https://web.archive.org/web/20230629155646/https://linkinghub.elsevier.com/retrieve/pii/S092523122030583X) from the original on 2023-06-29. Retrieved 2023-08-12. "Convolutional neural networks represent deep learning architectures that are currently used in a wide range of applications, including computer vision, speech recognition, malware dedection, time series analysis in finance, and many others."
7. ^ [***a***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-auto1_7-0) [***b***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-auto1_7-1)
Aghdam, Hamed Habibi; Heravi, Elnaz Jahani (2017-05-30). *Guide to convolutional neural networks: a practical application to traffic-sign detection and classification*. Cham, Switzerland: Springer. [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-3-319-57549-0](https://en.wikipedia.org/wiki/Special:BookSources/978-3-319-57549-0 "Special:BookSources/978-3-319-57549-0"). [OCLC](https://en.wikipedia.org/wiki/OCLC_\(identifier\) "OCLC (identifier)") [987790957](https://search.worldcat.org/oclc/987790957).
8. ^ [***a***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-homma_8-0) [***b***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-homma_8-1) [***c***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-homma_8-2)
Homma, Toshiteru; Les Atlas; Robert Marks II (1987). ["An Artificial Neural Network for Spatio-Temporal Bipolar Patterns: Application to Phoneme Classification"](https://proceedings.neurips.cc/paper_files/paper/1987/file/853f7b3615411c82a2ae439ab8c4c96e-Paper.pdf) (PDF). *Advances in Neural Information Processing Systems*. **1**: 31–40\. [Archived](https://web.archive.org/web/20220331211142/https://proceedings.neurips.cc/paper/1987/file/98f13708210194c475687be6106a3b84-Paper.pdf) (PDF) from the original on 2022-03-31. Retrieved 2022-03-31. "The notion of convolution or correlation used in the models presented is popular in engineering disciplines and has been applied extensively to designing filters, control systems, etc."
9. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-Valueva_Nagornov_Lyakhov_Valuev_2020_pp._232%E2%80%93243_9-0)**
Valueva, M.V.; Nagornov, N.N.; Lyakhov, P.A.; Valuev, G.V.; Chervyakov, N.I. (2020). "Application of the residue number system to reduce hardware costs of the convolutional neural network implementation". *Mathematics and Computers in Simulation*. **177**. Elsevier BV: 232–243\. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1016/j.matcom.2020.04.031](https://doi.org/10.1016%2Fj.matcom.2020.04.031). [ISSN](https://en.wikipedia.org/wiki/ISSN_\(identifier\) "ISSN (identifier)") [0378-4754](https://search.worldcat.org/issn/0378-4754). [S2CID](https://en.wikipedia.org/wiki/S2CID_\(identifier\) "S2CID (identifier)") [218955622](https://api.semanticscholar.org/CorpusID:218955622). "Convolutional neural networks are a promising tool for solving the problem of pattern recognition."
10. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-10)**
van den Oord, Aaron; Dieleman, Sander; Schrauwen, Benjamin (2013-01-01). Burges, C. J. C.; Bottou, L.; Welling, M.; Ghahramani, Z.; Weinberger, K. Q. (eds.). [*Deep content-based music recommendation*](https://proceedings.neurips.cc/paper/2013/file/b3ba8f1bee1238a2f37603d90b58898d-Paper.pdf) (PDF). Curran Associates, Inc. pp. 2643–2651\. [Archived](https://web.archive.org/web/20220307172303/https://proceedings.neurips.cc/paper/2013/file/b3ba8f1bee1238a2f37603d90b58898d-Paper.pdf) (PDF) from the original on 2022-03-07. Retrieved 2022-03-31.
11. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-11)**
Collobert, Ronan; Weston, Jason (2008-01-01). "A unified architecture for natural language processing". *Proceedings of the 25th international conference on Machine learning - ICML '08*. New York, NY, US: ACM. pp. 160–167\. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1145/1390156.1390177](https://doi.org/10.1145%2F1390156.1390177). [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-1-60558-205-4](https://en.wikipedia.org/wiki/Special:BookSources/978-1-60558-205-4 "Special:BookSources/978-1-60558-205-4"). [S2CID](https://en.wikipedia.org/wiki/S2CID_\(identifier\) "S2CID (identifier)") [2617020](https://api.semanticscholar.org/CorpusID:2617020).
12. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-12)**
Avilov, Oleksii; Rimbert, Sebastien; Popov, Anton; Bougrain, Laurent (July 2020). ["Deep Learning Techniques to Improve Intraoperative Awareness Detection from Electroencephalographic Signals"](https://ieeexplore.ieee.org/document/9176228). [*2020 42nd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC)*](https://hal.inria.fr/hal-02920320/file/Avilov_EMBC2020.pdf) (PDF). Vol. 2020. Montreal, QC, Canada: IEEE. pp. 142–145\. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1109/EMBC44109.2020.9176228](https://doi.org/10.1109%2FEMBC44109.2020.9176228). [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-1-7281-1990-8](https://en.wikipedia.org/wiki/Special:BookSources/978-1-7281-1990-8 "Special:BookSources/978-1-7281-1990-8"). [PMID](https://en.wikipedia.org/wiki/PMID_\(identifier\) "PMID (identifier)") [33017950](https://pubmed.ncbi.nlm.nih.gov/33017950). [S2CID](https://en.wikipedia.org/wiki/S2CID_\(identifier\) "S2CID (identifier)") [221386616](https://api.semanticscholar.org/CorpusID:221386616). [Archived](https://web.archive.org/web/20220519135428/https://hal.inria.fr/hal-02920320/file/Avilov_EMBC2020.pdf) (PDF) from the original on 2022-05-19. Retrieved 2023-07-21.
13. ^ [***a***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-Tsantekidis_7%E2%80%9312_13-0) [***b***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-Tsantekidis_7%E2%80%9312_13-1)
Tsantekidis, Avraam; Passalis, Nikolaos; Tefas, Anastasios; Kanniainen, Juho; Gabbouj, Moncef; Iosifidis, Alexandros (July 2017). "Forecasting Stock Prices from the Limit Order Book Using Convolutional Neural Networks". *2017 IEEE 19th Conference on Business Informatics (CBI)*. Thessaloniki, Greece: IEEE. pp. 7–12\. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1109/CBI.2017.23](https://doi.org/10.1109%2FCBI.2017.23). [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-1-5386-3035-8](https://en.wikipedia.org/wiki/Special:BookSources/978-1-5386-3035-8 "Special:BookSources/978-1-5386-3035-8"). [S2CID](https://en.wikipedia.org/wiki/S2CID_\(identifier\) "S2CID (identifier)") [4950757](https://api.semanticscholar.org/CorpusID:4950757).
14. ^ [***a***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-:0_14-0) [***b***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-:0_14-1) [***c***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-:0_14-2)
Zhang, Wei (1988). ["Shift-invariant pattern recognition neural network and its optical architecture"](https://drive.google.com/file/d/1nN_5odSG_QVae54EsQN_qSz-0ZsX6wA0/view?usp=sharing). *Proceedings of Annual Conference of the Japan Society of Applied Physics*. [Archived](https://web.archive.org/web/20200623051222/https://drive.google.com/file/d/1nN_5odSG_QVae54EsQN_qSz-0ZsX6wA0/view?usp=sharing) from the original on 2020-06-23. Retrieved 2020-06-22.
15. ^ [***a***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-:1_15-0) [***b***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-:1_15-1) [***c***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-:1_15-2)
Zhang, Wei (1990). ["Parallel distributed processing model with local space-invariant interconnections and its optical architecture"](https://drive.google.com/file/d/0B65v6Wo67Tk5ODRzZmhSR29VeDg/view?usp=sharing). *Applied Optics*. **29** (32): 4790–7\. [Bibcode](https://en.wikipedia.org/wiki/Bibcode_\(identifier\) "Bibcode (identifier)"):[1990ApOpt..29.4790Z](https://ui.adsabs.harvard.edu/abs/1990ApOpt..29.4790Z). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1364/AO.29.004790](https://doi.org/10.1364%2FAO.29.004790). [PMID](https://en.wikipedia.org/wiki/PMID_\(identifier\) "PMID (identifier)") [20577468](https://pubmed.ncbi.nlm.nih.gov/20577468). [Archived](https://web.archive.org/web/20170206111407/https://drive.google.com/file/d/0B65v6Wo67Tk5ODRzZmhSR29VeDg/view?usp=sharing) from the original on 2017-02-06. Retrieved 2016-09-22.
16. ^ [***a***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-:6_16-0) [***b***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-:6_16-1) [***c***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-:6_16-2) [***d***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-:6_16-3) [***e***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-:6_16-4) [***f***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-:6_16-5)
Mouton, Coenraad; Myburgh, Johannes C.; Davel, Marelie H. (2020). ["Stride and Translation Invariance in CNNs"](https://link.springer.com/chapter/10.1007%2F978-3-030-66151-9_17). In Gerber, Aurona (ed.). *Artificial Intelligence Research*. Communications in Computer and Information Science. Vol. 1342. Cham: Springer International Publishing. pp. 267–281\. [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[2103\.10097](https://arxiv.org/abs/2103.10097). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1007/978-3-030-66151-9\_17](https://doi.org/10.1007%2F978-3-030-66151-9_17). [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-3-030-66151-9](https://en.wikipedia.org/wiki/Special:BookSources/978-3-030-66151-9 "Special:BookSources/978-3-030-66151-9"). [S2CID](https://en.wikipedia.org/wiki/S2CID_\(identifier\) "S2CID (identifier)") [232269854](https://api.semanticscholar.org/CorpusID:232269854). [Archived](https://web.archive.org/web/20210627074505/https://link.springer.com/chapter/10.1007%2F978-3-030-66151-9_17) from the original on 2021-06-27. Retrieved 2021-03-26.
17. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-17)**
Kurtzman, Thomas (August 20, 2019). ["Hidden bias in the DUD-E dataset leads to misleading performance of deep learning in structure-based virtual screening"](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6701836). *PLOS ONE*. **14** (8) e0220113. [Bibcode](https://en.wikipedia.org/wiki/Bibcode_\(identifier\) "Bibcode (identifier)"):[2019PLoSO..1420113C](https://ui.adsabs.harvard.edu/abs/2019PLoSO..1420113C). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1371/journal.pone.0220113](https://doi.org/10.1371%2Fjournal.pone.0220113). [PMC](https://en.wikipedia.org/wiki/PMC_\(identifier\) "PMC (identifier)") [6701836](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6701836). [PMID](https://en.wikipedia.org/wiki/PMID_\(identifier\) "PMID (identifier)") [31430292](https://pubmed.ncbi.nlm.nih.gov/31430292).
18. ^ [***a***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-fukuneoscholar_18-0) [***b***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-fukuneoscholar_18-1) [***c***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-fukuneoscholar_18-2)
Fukushima, K. (2007). ["Neocognitron"](https://doi.org/10.4249%2Fscholarpedia.1717). *Scholarpedia*. **2** (1): 1717. [Bibcode](https://en.wikipedia.org/wiki/Bibcode_\(identifier\) "Bibcode (identifier)"):[2007SchpJ...2.1717F](https://ui.adsabs.harvard.edu/abs/2007SchpJ...2.1717F). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.4249/scholarpedia.1717](https://doi.org/10.4249%2Fscholarpedia.1717).
19. ^ [***a***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-hubelwiesel1968_19-0) [***b***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-hubelwiesel1968_19-1)
Hubel, D. H.; Wiesel, T. N. (1968-03-01). ["Receptive fields and functional architecture of monkey striate cortex"](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1557912). *The Journal of Physiology*. **195** (1): 215–243\. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1113/jphysiol.1968.sp008455](https://doi.org/10.1113%2Fjphysiol.1968.sp008455). [ISSN](https://en.wikipedia.org/wiki/ISSN_\(identifier\) "ISSN (identifier)") [0022-3751](https://search.worldcat.org/issn/0022-3751). [PMC](https://en.wikipedia.org/wiki/PMC_\(identifier\) "PMC (identifier)") [1557912](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1557912). [PMID](https://en.wikipedia.org/wiki/PMID_\(identifier\) "PMID (identifier)") [4966457](https://pubmed.ncbi.nlm.nih.gov/4966457).
20. ^ [***a***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-intro_20-0) [***b***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-intro_20-1)
Fukushima, Kunihiko (1980). ["Neocognitron: A Self-organizing Neural Network Model for a Mechanism of Pattern Recognition Unaffected by Shift in Position"](https://www.cs.princeton.edu/courses/archive/spr08/cos598B/Readings/Fukushima1980.pdf) (PDF). *Biological Cybernetics*. **36** (4): 193–202\. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1007/BF00344251](https://doi.org/10.1007%2FBF00344251). [PMID](https://en.wikipedia.org/wiki/PMID_\(identifier\) "PMID (identifier)") [7370364](https://pubmed.ncbi.nlm.nih.gov/7370364). [S2CID](https://en.wikipedia.org/wiki/S2CID_\(identifier\) "S2CID (identifier)") [206775608](https://api.semanticscholar.org/CorpusID:206775608). [Archived](https://web.archive.org/web/20140603013137/http://www.cs.princeton.edu/courses/archive/spr08/cos598B/Readings/Fukushima1980.pdf) (PDF) from the original on 3 June 2014. Retrieved 16 November 2013.
21. ^ [***a***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-robust_face_detection_21-0) [***b***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-robust_face_detection_21-1)
Matsugu, Masakazu; Katsuhiko Mori; Yusuke Mitari; Yuji Kaneda (2003). ["Subject independent facial expression recognition with robust face detection using a convolutional neural network"](http://www.iro.umontreal.ca/~pift6080/H09/documents/papers/sparse/matsugo_etal_face_expression_conv_nnet.pdf) (PDF). *Neural Networks*. **16** (5): 555–559\. [Bibcode](https://en.wikipedia.org/wiki/Bibcode_\(identifier\) "Bibcode (identifier)"):[2003NN.....16..555M](https://ui.adsabs.harvard.edu/abs/2003NN.....16..555M). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1016/S0893-6080(03)00115-1](https://doi.org/10.1016%2FS0893-6080%2803%2900115-1). [PMID](https://en.wikipedia.org/wiki/PMID_\(identifier\) "PMID (identifier)") [12850007](https://pubmed.ncbi.nlm.nih.gov/12850007). [Archived](https://web.archive.org/web/20131213022740/http://www.iro.umontreal.ca/~pift6080/H09/documents/papers/sparse/matsugo_etal_face_expression_conv_nnet.pdf) (PDF) from the original on 13 December 2013. Retrieved 17 November 2013.
22. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-22)** "Convolutional Neural Networks Demystified: A Matched Filtering Perspective Based Tutorial". [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[2108\.11663v3](https://arxiv.org/abs/2108.11663v3).
23. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-deeplearning_23-0)**
["Convolutional Neural Networks (LeNet) – DeepLearning 0.1 documentation"](https://web.archive.org/web/20171228091645/http://deeplearning.net/tutorial/lenet.html). *DeepLearning 0.1*. LISA Lab. Archived from [the original](http://deeplearning.net/tutorial/lenet.html) on 28 December 2017. Retrieved 31 August 2013.
24. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-24)**
Chollet, François (2017-04-04). "Xception: Deep Learning with Depthwise Separable Convolutions". [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[1610\.02357](https://arxiv.org/abs/1610.02357) \[[cs.CV](https://arxiv.org/archive/cs.CV)\].
25. ^ [***a***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-flexible_25-0) [***b***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-flexible_25-1) [***c***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-flexible_25-2)
Ciresan, Dan; Ueli Meier; Jonathan Masci; Luca M. Gambardella; Jurgen Schmidhuber (2011). ["Flexible, High Performance Convolutional Neural Networks for Image Classification"](https://people.idsia.ch/~juergen/ijcai2011.pdf) (PDF). *Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence, Volume Two*. **2**: 1237–1242\. [Archived](https://web.archive.org/web/20220405190128/https://people.idsia.ch/~juergen/ijcai2011.pdf) (PDF) from the original on 5 April 2022. Retrieved 17 November 2013.
26. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-26)**
[Krizhevsky](https://en.wikipedia.org/wiki/Alex_Krizhevsky "Alex Krizhevsky"), Alex. ["ImageNet Classification with Deep Convolutional Neural Networks"](https://image-net.org/static_files/files/supervision.pdf) (PDF). [Archived](https://web.archive.org/web/20210425025127/http://www.image-net.org/static_files/files/supervision.pdf) (PDF) from the original on 25 April 2021. Retrieved 17 November 2013.
27. ^ [***a***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-Yamaguchi111990_27-0) [***b***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-Yamaguchi111990_27-1)
Yamaguchi, Kouichi; Sakamoto, Kenji; Akabane, Toshio; Fujimoto, Yoshiji (November 1990). [*A Neural Network for Speaker-Independent Isolated Word Recognition*](https://web.archive.org/web/20210307233750/https://www.isca-speech.org/archive/icslp_1990/i90_1077.html). First International Conference on Spoken Language Processing (ICSLP 90). Kobe, Japan. Archived from [the original](https://www.isca-speech.org/archive/icslp_1990/i90_1077.html) on 2021-03-07. Retrieved 2019-09-04.
28. ^ [***a***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-mcdns_28-0) [***b***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-mcdns_28-1) [***c***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-mcdns_28-2) [***d***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-mcdns_28-3)
Ciresan, Dan; Meier, Ueli; Schmidhuber, Jürgen (June 2012). "Multi-column deep neural networks for image classification". *2012 IEEE Conference on Computer Vision and Pattern Recognition*. New York, NY: [Institute of Electrical and Electronics Engineers](https://en.wikipedia.org/wiki/Institute_of_Electrical_and_Electronics_Engineers "Institute of Electrical and Electronics Engineers") (IEEE). pp. 3642–3649\. [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[1202\.2745](https://arxiv.org/abs/1202.2745). [CiteSeerX](https://en.wikipedia.org/wiki/CiteSeerX_\(identifier\) "CiteSeerX (identifier)") [10\.1.1.300.3283](https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.300.3283). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1109/CVPR.2012.6248110](https://doi.org/10.1109%2FCVPR.2012.6248110). [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-1-4673-1226-4](https://en.wikipedia.org/wiki/Special:BookSources/978-1-4673-1226-4 "Special:BookSources/978-1-4673-1226-4"). [OCLC](https://en.wikipedia.org/wiki/OCLC_\(identifier\) "OCLC (identifier)") [812295155](https://search.worldcat.org/oclc/812295155). [S2CID](https://en.wikipedia.org/wiki/S2CID_\(identifier\) "S2CID (identifier)") [2161592](https://api.semanticscholar.org/CorpusID:2161592).
29. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-29)**
Yu, Fisher; Koltun, Vladlen (2016-04-30). "Multi-Scale Context Aggregation by Dilated Convolutions". [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[1511\.07122](https://arxiv.org/abs/1511.07122) \[[cs.CV](https://arxiv.org/archive/cs.CV)\].
30. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-30)**
Chen, Liang-Chieh; Papandreou, George; Schroff, Florian; Adam, Hartwig (2017-12-05). "Rethinking Atrous Convolution for Semantic Image Segmentation". [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[1706\.05587](https://arxiv.org/abs/1706.05587) \[[cs.CV](https://arxiv.org/archive/cs.CV)\].
31. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-31)**
Duta, Ionut Cosmin; Georgescu, Mariana Iuliana; Ionescu, Radu Tudor (2021-08-16). "Contextual Convolutional Neural Networks". [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[2108\.07387](https://arxiv.org/abs/2108.07387) \[[cs.CV](https://arxiv.org/archive/cs.CV)\].
32. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-LeCun_32-0)**
LeCun, Yann. ["LeNet-5, convolutional neural networks"](http://yann.lecun.com/exdb/lenet/). [Archived](https://web.archive.org/web/20210224225707/http://yann.lecun.com/exdb/lenet/) from the original on 24 February 2021. Retrieved 16 November 2013.
33. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-33)**
Zeiler, Matthew D.; Taylor, Graham W.; Fergus, Rob (November 2011). ["Adaptive deconvolutional networks for mid and high level feature learning"](https://dx.doi.org/10.1109/iccv.2011.6126474). *2011 International Conference on Computer Vision*. IEEE. pp. 2018–2025\. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1109/iccv.2011.6126474](https://doi.org/10.1109%2Ficcv.2011.6126474). [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-1-4577-1102-2](https://en.wikipedia.org/wiki/Special:BookSources/978-1-4577-1102-2 "Special:BookSources/978-1-4577-1102-2").
34. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-34)**
Dumoulin, Vincent; Visin, Francesco (2018-01-11), *A guide to convolution arithmetic for deep learning*, [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[1603\.07285](https://arxiv.org/abs/1603.07285)
35. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-35)**
Odena, Augustus; Dumoulin, Vincent; Olah, Chris (2016-10-17). ["Deconvolution and Checkerboard Artifacts"](https://distill.pub/2016/deconv-checkerboard/). *Distill*. **1** (10) e3. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.23915/distill.00003](https://doi.org/10.23915%2Fdistill.00003). [ISSN](https://en.wikipedia.org/wiki/ISSN_\(identifier\) "ISSN (identifier)") [2476-0757](https://search.worldcat.org/issn/2476-0757).
36. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-36)**
van Dyck, Leonard Elia; Kwitt, Roland; Denzler, Sebastian Jochen; Gruber, Walter Roland (2021). ["Comparing Object Recognition in Humans and Deep Convolutional Neural Networks—An Eye Tracking Study"](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8526843). *Frontiers in Neuroscience*. **15** 750639. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.3389/fnins.2021.750639](https://doi.org/10.3389%2Ffnins.2021.750639). [ISSN](https://en.wikipedia.org/wiki/ISSN_\(identifier\) "ISSN (identifier)") [1662-453X](https://search.worldcat.org/issn/1662-453X). [PMC](https://en.wikipedia.org/wiki/PMC_\(identifier\) "PMC (identifier)") [8526843](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8526843). [PMID](https://en.wikipedia.org/wiki/PMID_\(identifier\) "PMID (identifier)") [34690686](https://pubmed.ncbi.nlm.nih.gov/34690686).
37. ^ [***a***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-:4_37-0) [***b***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-:4_37-1)
Hubel, DH; Wiesel, TN (October 1959). ["Receptive fields of single neurones in the cat's striate cortex"](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1363130). *J. Physiol*. **148** (3): 574–91\. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1113/jphysiol.1959.sp006308](https://doi.org/10.1113%2Fjphysiol.1959.sp006308). [PMC](https://en.wikipedia.org/wiki/PMC_\(identifier\) "PMC (identifier)") [1363130](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1363130). [PMID](https://en.wikipedia.org/wiki/PMID_\(identifier\) "PMID (identifier)") [14403679](https://pubmed.ncbi.nlm.nih.gov/14403679).
38. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-38)**
David H. Hubel and Torsten N. Wiesel (2005). [*Brain and visual perception: the story of a 25-year collaboration*](https://books.google.com/books?id=8YrxWojxUA4C&pg=PA106). Oxford University Press US. p. 106. [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)")
[978-0-19-517618-6](https://en.wikipedia.org/wiki/Special:BookSources/978-0-19-517618-6 "Special:BookSources/978-0-19-517618-6"). [Archived](https://web.archive.org/web/20231016190414/https://books.google.com/books?id=8YrxWojxUA4C&pg=PA106#v=onepage&q&f=false) from the original on 2023-10-16. Retrieved 2019-01-18.
39. ^ [***a***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-Fukushima1969_39-0) [***b***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-Fukushima1969_39-1)
Fukushima, K. (1969). "Visual feature extraction by a multilayered network of analog threshold elements". *IEEE Transactions on Systems Science and Cybernetics*. **5** (4): 322–333\. [Bibcode](https://en.wikipedia.org/wiki/Bibcode_\(identifier\) "Bibcode (identifier)"):[1969ITSSC...5..322F](https://ui.adsabs.harvard.edu/abs/1969ITSSC...5..322F). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1109/TSSC.1969.300225](https://doi.org/10.1109%2FTSSC.1969.300225).
40. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-DLhistory_40-0)**
[Schmidhuber, Juergen](https://en.wikipedia.org/wiki/Juergen_Schmidhuber "Juergen Schmidhuber") (2022). "Annotated History of Modern AI and Deep Learning". [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[2212\.11279](https://arxiv.org/abs/2212.11279) \[[cs.NE](https://arxiv.org/archive/cs.NE)\].
41. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-41)**
Ramachandran, Prajit; Zoph, Barret; Le, Quoc V. (October 16, 2017). "Searching for Activation Functions". [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[1710\.05941](https://arxiv.org/abs/1710.05941) \[[cs.NE](https://arxiv.org/archive/cs.NE)\].
42. ^ [***a***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-Waibel1987_42-0) [***b***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-Waibel1987_42-1)
Waibel, Alex (18 December 1987). [*Phoneme Recognition Using Time-Delay Neural Networks*](https://isl.iar.kit.edu/downloads/Pheome_Recognition_Using_Time-Delay_Neural_Networks_SP87-100_6.pdf) (PDF). Meeting of the Institute of Electrical, Information and Communication Engineers (IEICE). Tokyo, Japan.
43. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-speechsignal_43-0)** [Alexander Waibel](https://en.wikipedia.org/wiki/Alex_Waibel "Alex Waibel") et al., *[Phoneme Recognition Using Time-Delay Neural Networks](http://www.inf.ufrgs.br/~engel/data/media/file/cmp121/waibel89_TDNN.pdf) [Archived](https://web.archive.org/web/20210225163001/http://www.inf.ufrgs.br/~engel/data/media/file/cmp121/waibel89_TDNN.pdf) 2021-02-25 at the [Wayback Machine](https://en.wikipedia.org/wiki/Wayback_Machine "Wayback Machine")*, IEEE Transactions on Acoustics, Speech, and Signal Processing, Volume 37, No. 3, pp. 328–339, March 1989.
44. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-44)**
LeCun, Yann; Bengio, Yoshua (1995). ["Convolutional networks for images, speech, and time series"](https://www.researchgate.net/publication/2453996). In Arbib, Michael A. (ed.). *The handbook of brain theory and neural networks* (Second ed.). The MIT Press. pp. 276–278\. [Archived](https://web.archive.org/web/20200728164116/https://www.researchgate.net/publication/2453996_Convolutional_Networks_for_Images_Speech_and_Time-Series) from the original on 2020-07-28. Retrieved 2019-12-03.
45. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-Hampshire1990_45-0)** John B. Hampshire and Alexander Waibel, *[Connectionist Architectures for Multi-Speaker Phoneme Recognition](https://proceedings.neurips.cc/paper/1989/file/979d472a84804b9f647bc185a877a8b5-Paper.pdf) [Archived](https://web.archive.org/web/20220331225059/https://proceedings.neurips.cc/paper/1989/file/979d472a84804b9f647bc185a877a8b5-Paper.pdf) 2022-03-31 at the [Wayback Machine](https://en.wikipedia.org/wiki/Wayback_Machine "Wayback Machine")*, Advances in Neural Information Processing Systems, 1990, Morgan Kaufmann.
46. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-Ko2017_46-0)**
Ko, Tom; Peddinti, Vijayaditya; Povey, Daniel; Seltzer, Michael L.; Khudanpur, Sanjeev (March 2017). [*A Study on Data Augmentation of Reverberant Speech for Robust Speech Recognition*](https://www.danielpovey.com/files/2017_icassp_reverberation.pdf) (PDF). The 42nd IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2017). New Orleans, LA, US. [Archived](https://web.archive.org/web/20180708072725/http://danielpovey.com/files/2017_icassp_reverberation.pdf) (PDF) from the original on 2018-07-08. Retrieved 2019-09-04.
47. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-47)** Denker, J S, Gardner, W R, Graf, H P, Henderson, D, Howard, R E, Hubbard, W, Jackel, L D, Baird, H S, and Guyon, I (1989) [Neural network recognizer for hand-written zip code digits](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.852.5499&rep=rep1&type=pdf) [Archived](https://web.archive.org/web/20180804013916/http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.852.5499&rep=rep1&type=pdf) 2018-08-04 at the [Wayback Machine](https://en.wikipedia.org/wiki/Wayback_Machine "Wayback Machine"), AT\&T Bell Laboratories
48. ^ [***a***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-:2_48-0) [***b***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-:2_48-1) Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, L. D. Jackel, [Backpropagation Applied to Handwritten Zip Code Recognition](http://yann.lecun.com/exdb/publis/pdf/lecun-89e.pdf) [Archived](https://web.archive.org/web/20200110090230/http://yann.lecun.com/exdb/publis/pdf/lecun-89e.pdf) 2020-01-10 at the [Wayback Machine](https://en.wikipedia.org/wiki/Wayback_Machine "Wayback Machine"); AT\&T Bell Laboratories
49. ^ [***a***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-:wz1991_49-0) [***b***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-:wz1991_49-1)
Zhang, Wei (1991). ["Image processing of human corneal endothelium based on a learning network"](https://drive.google.com/file/d/0B65v6Wo67Tk5cm5DTlNGd0NPUmM/view?usp=sharing). *Applied Optics*. **30** (29): 4211–7\. [Bibcode](https://en.wikipedia.org/wiki/Bibcode_\(identifier\) "Bibcode (identifier)"):[1991ApOpt..30.4211Z](https://ui.adsabs.harvard.edu/abs/1991ApOpt..30.4211Z). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1364/AO.30.004211](https://doi.org/10.1364%2FAO.30.004211). [PMID](https://en.wikipedia.org/wiki/PMID_\(identifier\) "PMID (identifier)") [20706526](https://pubmed.ncbi.nlm.nih.gov/20706526). [Archived](https://web.archive.org/web/20170206122612/https://drive.google.com/file/d/0B65v6Wo67Tk5cm5DTlNGd0NPUmM/view?usp=sharing) from the original on 2017-02-06. Retrieved 2016-09-22.
50. ^ [***a***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-:wz1994_50-0) [***b***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-:wz1994_50-1)
Zhang, Wei (1994). ["Computerized detection of clustered microcalcifications in digital mammograms using a shift-invariant artificial neural network"](https://drive.google.com/file/d/0B65v6Wo67Tk5Ml9qeW5nQ3poVTQ/view?usp=sharing). *Medical Physics*. **21** (4): 517–24\. [Bibcode](https://en.wikipedia.org/wiki/Bibcode_\(identifier\) "Bibcode (identifier)"):[1994MedPh..21..517Z](https://ui.adsabs.harvard.edu/abs/1994MedPh..21..517Z). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1118/1.597177](https://doi.org/10.1118%2F1.597177). [PMID](https://en.wikipedia.org/wiki/PMID_\(identifier\) "PMID (identifier)") [8058017](https://pubmed.ncbi.nlm.nih.gov/8058017). [Archived](https://web.archive.org/web/20170206030321/https://drive.google.com/file/d/0B65v6Wo67Tk5Ml9qeW5nQ3poVTQ/view?usp=sharing) from the original on 2017-02-06. Retrieved 2016-09-22.
51. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-weng1993_51-0)**
Weng, J; Ahuja, N; Huang, TS (1993). "Learning recognition and segmentation of 3-D objects from 2-D images". *1993 (4th) International Conference on Computer Vision*. IEEE. pp. 121–128\. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1109/ICCV.1993.378228](https://doi.org/10.1109%2FICCV.1993.378228). [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)")
[0-8186-3870-2](https://en.wikipedia.org/wiki/Special:BookSources/0-8186-3870-2 "Special:BookSources/0-8186-3870-2"). [S2CID](https://en.wikipedia.org/wiki/S2CID_\(identifier\) "S2CID (identifier)") [8619176](https://api.semanticscholar.org/CorpusID:8619176).
52. ^ [***a***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-schdeepscholar_52-0) [***b***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-schdeepscholar_52-1)
Schmidhuber, Jürgen (2015). ["Deep Learning"](http://www.scholarpedia.org/article/Deep_Learning). *Scholarpedia*. **10** (11): 32832. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.4249/scholarpedia.32832](https://doi.org/10.4249%2Fscholarpedia.32832). [Archived](https://web.archive.org/web/20160419024349/http://www.scholarpedia.org/article/Deep_Learning) from the original on 2016-04-19. Retrieved 2019-01-20.
53. ^ [***a***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-lecun95_53-0) [***b***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-lecun95_53-1)
Lecun, Y.; Jackel, L. D.; Bottou, L.; Cortes, C.; Denker, J. S.; Drucker, H.; Guyon, I.; Muller, U. A.; Sackinger, E.; Simard, P.; Vapnik, V. (August 1995). [*Learning algorithms for classification: A comparison on handwritten digit recognition*](http://yann.lecun.com/exdb/publis/pdf/lecun-95a.pdf) (PDF). World Scientific. pp. 261–276\. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1142/2808](https://doi.org/10.1142%2F2808). [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)")
[978-981-02-2324-3](https://en.wikipedia.org/wiki/Special:BookSources/978-981-02-2324-3 "Special:BookSources/978-981-02-2324-3"). [Archived](https://web.archive.org/web/20230502220356/http://yann.lecun.com/exdb/publis/pdf/lecun-95a.pdf) (PDF) from the original on 2 May 2023.
54. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-54)**
Lecun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. (November 1998). "Gradient-based learning applied to document recognition". *Proceedings of the IEEE*. **86** (11): 2278–2324\. [Bibcode](https://en.wikipedia.org/wiki/Bibcode_\(identifier\) "Bibcode (identifier)"):[1998IEEEP..86.2278L](https://ui.adsabs.harvard.edu/abs/1998IEEEP..86.2278L). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1109/5.726791](https://doi.org/10.1109%2F5.726791).
55. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-55)**
Zhang, Wei (1991). ["Error Back Propagation with Minimum-Entropy Weights: A Technique for Better Generalization of 2-D Shift-Invariant NNs"](https://drive.google.com/file/d/0B65v6Wo67Tk5dkJTcEMtU2c5Znc/view?usp=sharing). *Proceedings of the International Joint Conference on Neural Networks*. [Archived](https://web.archive.org/web/20170206155801/https://drive.google.com/file/d/0B65v6Wo67Tk5dkJTcEMtU2c5Znc/view?usp=sharing) from the original on 2017-02-06. Retrieved 2016-09-22.
56. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-56)** Daniel Graupe, Ruey Wen Liu, George S Moschytz. "[Applications of neural networks to medical signal processing](https://www.researchgate.net/profile/Daniel_Graupe2/publication/241130197_Applications_of_signal_and_image_processing_to_medicine/links/575eef7e08aec91374b42bd2.pdf) [Archived](https://web.archive.org/web/20200728164114/https://www.researchgate.net/profile/Daniel_Graupe2/publication/241130197_Applications_of_signal_and_image_processing_to_medicine/links/575eef7e08aec91374b42bd2.pdf) 2020-07-28 at the [Wayback Machine](https://en.wikipedia.org/wiki/Wayback_Machine "Wayback Machine")". In Proc. 27th IEEE Decision and Control Conf., pp. 343–347, 1988.
57. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-57)** Daniel Graupe, Boris Vern, G. Gruener, Aaron Field, and Qiu Huang. "[Decomposition of surface EMG signals into single fiber action potentials by means of neural network](https://ieeexplore.ieee.org/abstract/document/100522/) [Archived](https://web.archive.org/web/20190904161656/https://ieeexplore.ieee.org/abstract/document/100522/) 2019-09-04 at the [Wayback Machine](https://en.wikipedia.org/wiki/Wayback_Machine "Wayback Machine")". Proc. IEEE International Symp. on Circuits and Systems, pp. 1008–1011, 1989.
58. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-58)** Qiu Huang, Daniel Graupe, Yi Fang Huang, Ruey Wen Liu. "[Identification of firing patterns of neuronal signals](http://www.academia.edu/download/42092095/graupe_huang_q_huang_yf_liu_rw_1989.pdf)\[*[dead link](https://en.wikipedia.org/wiki/Wikipedia:Link_rot "Wikipedia:Link rot")*\]." In Proc. 28th IEEE Decision and Control Conf., pp. 266–271, 1989. <https://ieeexplore.ieee.org/document/70115> [Archived](https://web.archive.org/web/20220331211138/https://ieeexplore.ieee.org/document/70115) 2022-03-31 at the [Wayback Machine](https://en.wikipedia.org/wiki/Wayback_Machine "Wayback Machine")
59. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-59)**
Oh, KS; Jung, K (2004). "GPU implementation of neural networks". *Pattern Recognition*. **37** (6): 1311–1314\. [Bibcode](https://en.wikipedia.org/wiki/Bibcode_\(identifier\) "Bibcode (identifier)"):[2004PatRe..37.1311O](https://ui.adsabs.harvard.edu/abs/2004PatRe..37.1311O). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1016/j.patcog.2004.01.013](https://doi.org/10.1016%2Fj.patcog.2004.01.013).
60. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-60)**
Dave Steinkraus; Patrice Simard; Ian Buck (2005). ["Using GPUs for Machine Learning Algorithms"](https://www.computer.org/csdl/proceedings-article/icdar/2005/24201115/12OmNylKAVX). *12th International Conference on Document Analysis and Recognition (ICDAR 2005)*. pp. 1115–1119\. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1109/ICDAR.2005.251](https://doi.org/10.1109%2FICDAR.2005.251). [Archived](https://web.archive.org/web/20220331211138/https://www.computer.org/csdl/proceedings-article/icdar/2005/24201115/12OmNylKAVX) from the original on 2022-03-31. Retrieved 2022-03-31.
61. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-61)**
Kumar Chellapilla; Sid Puri; Patrice Simard (2006). ["High Performance Convolutional Neural Networks for Document Processing"](https://hal.inria.fr/inria-00112631/document). In Lorette, Guy (ed.). *Tenth International Workshop on Frontiers in Handwriting Recognition*. Suvisoft. [Archived](https://web.archive.org/web/20200518193413/https://hal.inria.fr/inria-00112631/document) from the original on 2020-05-18. Retrieved 2016-03-14.
62. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-62)**
Hinton, GE; Osindero, S; Teh, YW (Jul 2006). "A fast learning algorithm for deep belief nets". *Neural Computation*. **18** (7): 1527–54\. [CiteSeerX](https://en.wikipedia.org/wiki/CiteSeerX_\(identifier\) "CiteSeerX (identifier)") [10\.1.1.76.1541](https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.76.1541). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1162/neco.2006.18.7.1527](https://doi.org/10.1162%2Fneco.2006.18.7.1527). [PMID](https://en.wikipedia.org/wiki/PMID_\(identifier\) "PMID (identifier)") [16764513](https://pubmed.ncbi.nlm.nih.gov/16764513). [S2CID](https://en.wikipedia.org/wiki/S2CID_\(identifier\) "S2CID (identifier)") [2309950](https://api.semanticscholar.org/CorpusID:2309950).
63. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-63)**
Bengio, Yoshua; Lamblin, Pascal; Popovici, Dan; Larochelle, Hugo (2007). ["Greedy Layer-Wise Training of Deep Networks"](https://proceedings.neurips.cc/paper/2006/file/5da713a690c067105aeb2fae32403405-Paper.pdf) (PDF). *Advances in Neural Information Processing Systems*: 153–160\. [Archived](https://web.archive.org/web/20220602144141/https://proceedings.neurips.cc/paper/2006/file/5da713a690c067105aeb2fae32403405-Paper.pdf) (PDF) from the original on 2022-06-02. Retrieved 2022-03-31.
64. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-64)**
Ranzato, MarcAurelio; Poultney, Christopher; Chopra, Sumit; LeCun, Yann (2007). ["Efficient Learning of Sparse Representations with an Energy-Based Model"](http://yann.lecun.com/exdb/publis/pdf/ranzato-06.pdf) (PDF). *Advances in Neural Information Processing Systems*. [Archived](https://web.archive.org/web/20160322112400/http://yann.lecun.com/exdb/publis/pdf/ranzato-06.pdf) (PDF) from the original on 2016-03-22. Retrieved 2014-06-26.
65. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-LSD_1_65-0)**
Raina, R; Madhavan, A; Ng, Andrew (14 June 2009). ["Large-scale deep unsupervised learning using graphics processors"](http://robotics.stanford.edu/~ang/papers/icml09-LargeScaleUnsupervisedDeepLearningGPU.pdf) (PDF). *Proceedings of the 26th Annual International Conference on Machine Learning*. ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning. pp. 873–880\. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1145/1553374.1553486](https://doi.org/10.1145%2F1553374.1553486). [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)")
[978-1-60558-516-1](https://en.wikipedia.org/wiki/Special:BookSources/978-1-60558-516-1 "Special:BookSources/978-1-60558-516-1"). [S2CID](https://en.wikipedia.org/wiki/S2CID_\(identifier\) "S2CID (identifier)") [392458](https://api.semanticscholar.org/CorpusID:392458). [Archived](https://web.archive.org/web/20201208104513/http://robotics.stanford.edu/~ang/papers/icml09-LargeScaleUnsupervisedDeepLearningGPU.pdf) (PDF) from the original on 8 December 2020. Retrieved 22 December 2023.
66. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-66)**
Ciresan, Dan; Meier, Ueli; Gambardella, Luca; Schmidhuber, Jürgen (2010). "Deep big simple neural nets for handwritten digit recognition". *Neural Computation*. **22** (12): 3207–3220\. [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[1003\.0358](https://arxiv.org/abs/1003.0358). [Bibcode](https://en.wikipedia.org/wiki/Bibcode_\(identifier\) "Bibcode (identifier)"):[2010NeCom..22.3207C](https://ui.adsabs.harvard.edu/abs/2010NeCom..22.3207C). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1162/NECO\_a\_00052](https://doi.org/10.1162%2FNECO_a_00052). [PMID](https://en.wikipedia.org/wiki/PMID_\(identifier\) "PMID (identifier)") [20858131](https://pubmed.ncbi.nlm.nih.gov/20858131). [S2CID](https://en.wikipedia.org/wiki/S2CID_\(identifier\) "S2CID (identifier)") [1918673](https://api.semanticscholar.org/CorpusID:1918673).
67. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-67)**
["IJCNN 2011 Competition result table"](https://benchmark.ini.rub.de/gtsrb_results.html). *OFFICIAL IJCNN2011 COMPETITION*. 2010. [Archived](https://web.archive.org/web/20210117024729/https://benchmark.ini.rub.de/gtsrb_results.html) from the original on 2021-01-17. Retrieved 2019-01-14.
68. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-68)**
Schmidhuber, Jürgen (17 March 2017). ["History of computer vision contests won by deep CNNs on GPU"](https://people.idsia.ch/~juergen/computer-vision-contests-won-by-gpu-cnns.html). [Archived](https://web.archive.org/web/20181219224934/http://people.idsia.ch/~juergen/computer-vision-contests-won-by-gpu-cnns.html) from the original on 19 December 2018. Retrieved 14 January 2019.
69. ^ [***a***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-:02_69-0) [***b***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-:02_69-1)
Krizhevsky, Alex; Sutskever, Ilya; Hinton, Geoffrey E. (2017-05-24). ["ImageNet classification with deep convolutional neural networks"](https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf) (PDF). *Communications of the ACM*. **60** (6): 84–90\. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1145/3065386](https://doi.org/10.1145%2F3065386). [ISSN](https://en.wikipedia.org/wiki/ISSN_\(identifier\) "ISSN (identifier)") [0001-0782](https://search.worldcat.org/issn/0001-0782). [S2CID](https://en.wikipedia.org/wiki/S2CID_\(identifier\) "S2CID (identifier)") [195908774](https://api.semanticscholar.org/CorpusID:195908774). [Archived](https://web.archive.org/web/20170516174757/http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf) (PDF) from the original on 2017-05-16. Retrieved 2018-12-04.
70. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-70)**
Viebke, Andre; Memeti, Suejb; Pllana, Sabri; Abraham, Ajith (2019). "CHAOS: a parallelization scheme for training convolutional neural networks on Intel Xeon Phi". *The Journal of Supercomputing*. **75** (1): 197–227\. [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[1702\.07908](https://arxiv.org/abs/1702.07908). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1007/s11227-017-1994-x](https://doi.org/10.1007%2Fs11227-017-1994-x). [S2CID](https://en.wikipedia.org/wiki/S2CID_\(identifier\) "S2CID (identifier)") [14135321](https://api.semanticscholar.org/CorpusID:14135321).
71. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-71)**
Viebke, Andre; Pllana, Sabri (2015). ["The Potential of the Intel (R) Xeon Phi for Supervised Deep Learning"](http://lnu.diva-portal.org/smash/record.jsf?pid=diva2%3A877421&dswid=4277). *2015 IEEE 17th International Conference on High Performance Computing and Communications, 2015 IEEE 7th International Symposium on Cyberspace Safety and Security, and 2015 IEEE 12th International Conference on Embedded Software and Systems*. *IEEE Xplore*. IEEE 2015. pp. 758–765\. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1109/HPCC-CSS-ICESS.2015.45](https://doi.org/10.1109%2FHPCC-CSS-ICESS.2015.45). [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)")
[978-1-4799-8937-9](https://en.wikipedia.org/wiki/Special:BookSources/978-1-4799-8937-9 "Special:BookSources/978-1-4799-8937-9"). [S2CID](https://en.wikipedia.org/wiki/S2CID_\(identifier\) "S2CID (identifier)") [15411954](https://api.semanticscholar.org/CorpusID:15411954). [Archived](https://web.archive.org/web/20230306003530/http://lnu.diva-portal.org/smash/record.jsf?pid=diva2:877421&dswid=4277) from the original on 2023-03-06. Retrieved 2022-03-31.
72. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-72)**
Krizhevsky, Alex; Sutskever, Ilya; Hinton, Geoffrey E. (2012). ["ImageNet Classification with Deep Convolutional Neural Networks"](https://dl.acm.org/doi/10.5555/2999134.2999257). *NIPS'12: Proceedings of the 25th International Conference on Neural Information Processing Systems - Volume 1*. **1**: 1097–1105\. [Archived](https://web.archive.org/web/20191220014019/https://dl.acm.org/citation.cfm?id=2999134.2999257) from the original on 2019-12-20. Retrieved 2021-03-26 – via ACM.
73. ^ [***a***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-:5_73-0) [***b***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-:5_73-1) [***c***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-:5_73-2) [***d***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-:5_73-3) [***e***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-:5_73-4)
Azulay, Aharon; Weiss, Yair (2019). ["Why do deep convolutional networks generalize so poorly to small image transformations?"](https://jmlr.org/papers/v20/19-519.html). *Journal of Machine Learning Research*. **20** (184): 1–25\. [ISSN](https://en.wikipedia.org/wiki/ISSN_\(identifier\) "ISSN (identifier)") [1533-7928](https://search.worldcat.org/issn/1533-7928). [Archived](https://web.archive.org/web/20220331211138/https://jmlr.org/papers/v20/19-519.html) from the original on 2022-03-31. Retrieved 2022-03-31.
74. ^ [***a***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-G%C3%A9ron_Hands-on_ML_2019_74-0) [***b***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-G%C3%A9ron_Hands-on_ML_2019_74-1)
Géron, Aurélien (2019). *Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow*. Sebastopol, CA: O'Reilly Media. [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)")
[978-1-492-03264-9](https://en.wikipedia.org/wiki/Special:BookSources/978-1-492-03264-9 "Special:BookSources/978-1-492-03264-9"), p. 448.
75. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-76)**
Li, Zewen; Liu, Fan; Yang, Wenjie; Peng, Shouheng; Zhou, Jun (December 2022). "A Survey of Convolutional Neural Networks: Analysis, Applications, and Prospects". *IEEE Transactions on Neural Networks and Learning Systems*. **33** (12): 6999–7019\. [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[2004\.02806](https://arxiv.org/abs/2004.02806). [Bibcode](https://en.wikipedia.org/wiki/Bibcode_\(identifier\) "Bibcode (identifier)"):[2022ITNNL..33.6999L](https://ui.adsabs.harvard.edu/abs/2022ITNNL..33.6999L). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1109/TNNLS.2021.3084827](https://doi.org/10.1109%2FTNNLS.2021.3084827). [hdl](https://en.wikipedia.org/wiki/Hdl_\(identifier\) "Hdl (identifier)"):[10072/405164](https://hdl.handle.net/10072%2F405164). [PMID](https://en.wikipedia.org/wiki/PMID_\(identifier\) "PMID (identifier)") [34111009](https://pubmed.ncbi.nlm.nih.gov/34111009).
76. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-77)**
["CS231n Convolutional Neural Networks for Visual Recognition"](https://cs231n.github.io/convolutional-networks/). *cs231n.github.io*. [Archived](https://web.archive.org/web/20191023031945/https://cs231n.github.io/convolutional-networks/) from the original on 2019-10-23. Retrieved 2017-04-25.
77. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-79)**
Nirthika, Rajendran; Manivannan, Siyamalan; Ramanan, Amirthalingam; Wang, Ruixuan (2022-04-01). ["Pooling in convolutional neural networks for medical image analysis: a survey and an empirical study"](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8804673). *Neural Computing and Applications*. **34** (7): 5321–5347\. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1007/s00521-022-06953-8](https://doi.org/10.1007%2Fs00521-022-06953-8). [ISSN](https://en.wikipedia.org/wiki/ISSN_\(identifier\) "ISSN (identifier)") [1433-3058](https://search.worldcat.org/issn/1433-3058). [PMC](https://en.wikipedia.org/wiki/PMC_\(identifier\) "PMC (identifier)") [8804673](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8804673). [PMID](https://en.wikipedia.org/wiki/PMID_\(identifier\) "PMID (identifier)") [35125669](https://pubmed.ncbi.nlm.nih.gov/35125669).
78. ^ [***a***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-Scherer-ICANN-2010_80-0) [***b***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-Scherer-ICANN-2010_80-1)
Scherer, Dominik; Müller, Andreas C.; Behnke, Sven (2010). ["Evaluation of Pooling Operations in Convolutional Architectures for Object Recognition"](http://ais.uni-bonn.de/papers/icann2010_maxpool.pdf) (PDF). *Artificial Neural Networks (ICANN), 20th International Conference on*. Thessaloniki, Greece: Springer. pp. 92–101\. [Archived](https://web.archive.org/web/20180403185041/http://ais.uni-bonn.de/papers/icann2010_maxpool.pdf) (PDF) from the original on 2018-04-03. Retrieved 2016-12-28.
79. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-81)**
Graham, Benjamin (2014-12-18). "Fractional Max-Pooling". [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[1412\.6071](https://arxiv.org/abs/1412.6071) \[[cs.CV](https://arxiv.org/archive/cs.CV)\].
80. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-82)**
Springenberg, Jost Tobias; Dosovitskiy, Alexey; Brox, Thomas; Riedmiller, Martin (2014-12-21). "Striving for Simplicity: The All Convolutional Net". [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[1412\.6806](https://arxiv.org/abs/1412.6806) \[[cs.LG](https://arxiv.org/archive/cs.LG)\].
81. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-Ma_Chang_Xie_Ding_2019_pp._3224%E2%80%933233_83-0)**
Ma, Zhanyu; Chang, Dongliang; Xie, Jiyang; Ding, Yifeng; Wen, Shaoguo; Li, Xiaoxu; Si, Zhongwei; Guo, Jun (2019). "Fine-Grained Vehicle Classification With Channel Max Pooling Modified CNNs". *IEEE Transactions on Vehicular Technology*. **68** (4). Institute of Electrical and Electronics Engineers (IEEE): 3224–3233\. [Bibcode](https://en.wikipedia.org/wiki/Bibcode_\(identifier\) "Bibcode (identifier)"):[2019ITVT...68.3224M](https://ui.adsabs.harvard.edu/abs/2019ITVT...68.3224M). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1109/tvt.2019.2899972](https://doi.org/10.1109%2Ftvt.2019.2899972). [ISSN](https://en.wikipedia.org/wiki/ISSN_\(identifier\) "ISSN (identifier)") [0018-9545](https://search.worldcat.org/issn/0018-9545). [S2CID](https://en.wikipedia.org/wiki/S2CID_\(identifier\) "S2CID (identifier)") [86674074](https://api.semanticscholar.org/CorpusID:86674074).
82. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-84)**
Zafar, Afia; Aamir, Muhammad; Mohd Nawi, Nazri; Arshad, Ali; Riaz, Saman; Alruban, Abdulrahman; Dutta, Ashit Kumar; Almotairi, Sultan (2022-08-29). ["A Comparison of Pooling Methods for Convolutional Neural Networks"](https://doi.org/10.3390%2Fapp12178643). *Applied Sciences*. **12** (17): 8643. [Bibcode](https://en.wikipedia.org/wiki/Bibcode_\(identifier\) "Bibcode (identifier)"):[2022ApSci..12.8643Z](https://ui.adsabs.harvard.edu/abs/2022ApSci..12.8643Z). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.3390/app12178643](https://doi.org/10.3390%2Fapp12178643). [ISSN](https://en.wikipedia.org/wiki/ISSN_\(identifier\) "ISSN (identifier)") [2076-3417](https://search.worldcat.org/issn/2076-3417).
83. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-85)**
Gholamalinezhad, Hossein; Khosravi, Hossein (2020-09-16), *Pooling Methods in Deep Neural Networks, a Review*, [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[2009\.07485](https://arxiv.org/abs/2009.07485)
84. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-86)**
Householder, Alston S. (June 1941). ["A theory of steady-state activity in nerve-fiber networks: I. Definitions and preliminary lemmas"](http://link.springer.com/10.1007/BF02478220). *The Bulletin of Mathematical Biophysics*. **3** (2): 63–69\. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1007/BF02478220](https://doi.org/10.1007%2FBF02478220). [ISSN](https://en.wikipedia.org/wiki/ISSN_\(identifier\) "ISSN (identifier)") [0007-4985](https://search.worldcat.org/issn/0007-4985).
85. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-Romanuke4_87-0)**
Romanuke, Vadim (2017). ["Appropriate number and allocation of ReLUs in convolutional neural networks"](https://doi.org/10.20535%2F1810-0546.2017.1.88156). *Research Bulletin of NTUU "Kyiv Polytechnic Institute"*. **1** (1): 69–78\. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.20535/1810-0546.2017.1.88156](https://doi.org/10.20535%2F1810-0546.2017.1.88156).
86. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-glorot2011_88-0)**
Xavier Glorot; Antoine Bordes; [Yoshua Bengio](https://en.wikipedia.org/wiki/Yoshua_Bengio "Yoshua Bengio") (2011). [*Deep sparse rectifier neural networks*](https://web.archive.org/web/20161213022121/http://jmlr.org/proceedings/papers/v15/glorot11a/glorot11a.pdf) (PDF). AISTATS. Archived from [the original](http://jmlr.org/proceedings/papers/v15/glorot11a/glorot11a.pdf) (PDF) on 2016-12-13. Retrieved 2023-04-10. "Rectifier and softplus activation functions. The second one is a smooth version of the first."
87. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-89)**
Krizhevsky, A.; Sutskever, I.; Hinton, G. E. (2012). ["ImageNet classification with deep convolutional neural networks"](https://proceedings.neurips.cc/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf) (PDF). *Advances in Neural Information Processing Systems*. **1**: 1097–1105\. [Archived](https://web.archive.org/web/20220331224736/https://proceedings.neurips.cc/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf) (PDF) from the original on 2022-03-31. Retrieved 2022-03-31.
88. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-91)**
Ribeiro, Antonio H.; Schön, Thomas B. (2021). "How Convolutional Neural Networks Deal with Aliasing". *ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)*. pp. 2755–2759\. [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[2102\.07757](https://arxiv.org/abs/2102.07757). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1109/ICASSP39728.2021.9414627](https://doi.org/10.1109%2FICASSP39728.2021.9414627). [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-1-7281-7605-5](https://en.wikipedia.org/wiki/Special:BookSources/978-1-7281-7605-5 "Special:BookSources/978-1-7281-7605-5"). [S2CID](https://en.wikipedia.org/wiki/S2CID_\(identifier\) "S2CID (identifier)") [231925012](https://api.semanticscholar.org/CorpusID:231925012).
89. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-92)**
Myburgh, Johannes C.; Mouton, Coenraad; Davel, Marelie H. (2020). ["Tracking Translation Invariance in CNNs"](https://link.springer.com/chapter/10.1007%2F978-3-030-66151-9_18). In Gerber, Aurona (ed.). *Artificial Intelligence Research*. Communications in Computer and Information Science. Vol. 1342. Cham: Springer International Publishing. pp. 282–295\. [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[2104\.05997](https://arxiv.org/abs/2104.05997). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1007/978-3-030-66151-9\_18](https://doi.org/10.1007%2F978-3-030-66151-9_18). [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-3-030-66151-9](https://en.wikipedia.org/wiki/Special:BookSources/978-3-030-66151-9 "Special:BookSources/978-3-030-66151-9"). [S2CID](https://en.wikipedia.org/wiki/S2CID_\(identifier\) "S2CID (identifier)") [233219976](https://api.semanticscholar.org/CorpusID:233219976). [Archived](https://web.archive.org/web/20220122015258/http://link.springer.com/chapter/10.1007/978-3-030-66151-9_18) from the original on 2022-01-22. Retrieved 2021-03-26.
90. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-93)**
Zhang, Richard (2019-04-25). *Making Convolutional Networks Shift-Invariant Again*. [OCLC](https://en.wikipedia.org/wiki/OCLC_\(identifier\) "OCLC (identifier)") [1106340711](https://search.worldcat.org/oclc/1106340711).
91. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-94)**
Jaderberg, Max; Simonyan, Karen; Zisserman, Andrew; Kavukcuoglu, Koray (2015). ["Spatial Transformer Networks"](https://proceedings.neurips.cc/paper/2015/file/33ceb07bf4eeb3da587e268d663aba1a-Paper.pdf) (PDF). *Advances in Neural Information Processing Systems*. **28**. [Archived](https://web.archive.org/web/20210725115312/https://proceedings.neurips.cc/paper/2015/file/33ceb07bf4eeb3da587e268d663aba1a-Paper.pdf) (PDF) from the original on 2021-07-25. Retrieved 2021-03-26 – via NIPS.
92. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-95)**
Sabour, Sara; Frosst, Nicholas; Hinton, Geoffrey E. (2017-10-26). *Dynamic Routing Between Capsules*. [OCLC](https://en.wikipedia.org/wiki/OCLC_\(identifier\) "OCLC (identifier)") [1106278545](https://search.worldcat.org/oclc/1106278545).
93. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-96)**
Matiz, Sergio; [Barner, Kenneth E.](https://en.wikipedia.org/wiki/Kenneth_E._Barner "Kenneth E. Barner") (2019-06-01). ["Inductive conformal predictor for convolutional neural networks: Applications to active learning for image classification"](https://www.sciencedirect.com/science/article/abs/pii/S003132031930055X). *Pattern Recognition*. **90**: 172–182\. [Bibcode](https://en.wikipedia.org/wiki/Bibcode_\(identifier\) "Bibcode (identifier)"):[2019PatRe..90..172M](https://ui.adsabs.harvard.edu/abs/2019PatRe..90..172M). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1016/j.patcog.2019.01.035](https://doi.org/10.1016%2Fj.patcog.2019.01.035). [ISSN](https://en.wikipedia.org/wiki/ISSN_\(identifier\) "ISSN (identifier)") [0031-3203](https://search.worldcat.org/issn/0031-3203). [S2CID](https://en.wikipedia.org/wiki/S2CID_\(identifier\) "S2CID (identifier)") [127253432](https://api.semanticscholar.org/CorpusID:127253432). [Archived](https://web.archive.org/web/20210929092610/https://www.sciencedirect.com/science/article/abs/pii/S003132031930055X) from the original on 2021-09-29. Retrieved 2021-09-29.
94. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-97)**
Wieslander, Håkan; Harrison, Philip J.; Skogberg, Gabriel; Jackson, Sonya; Fridén, Markus; Karlsson, Johan; Spjuth, Ola; Wählby, Carolina (February 2021). ["Deep Learning With Conformal Prediction for Hierarchical Analysis of Large-Scale Whole-Slide Tissue Images"](https://doi.org/10.1109%2FJBHI.2020.2996300). *IEEE Journal of Biomedical and Health Informatics*. **25** (2): 371–380\. [Bibcode](https://en.wikipedia.org/wiki/Bibcode_\(identifier\) "Bibcode (identifier)"):[2021IJBHI..25..371W](https://ui.adsabs.harvard.edu/abs/2021IJBHI..25..371W). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1109/JBHI.2020.2996300](https://doi.org/10.1109%2FJBHI.2020.2996300). [ISSN](https://en.wikipedia.org/wiki/ISSN_\(identifier\) "ISSN (identifier)") [2168-2208](https://search.worldcat.org/issn/2168-2208). [PMID](https://en.wikipedia.org/wiki/PMID_\(identifier\) "PMID (identifier)") [32750907](https://pubmed.ncbi.nlm.nih.gov/32750907). [S2CID](https://en.wikipedia.org/wiki/S2CID_\(identifier\) "S2CID (identifier)") [219885788](https://api.semanticscholar.org/CorpusID:219885788).
95. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-98)**
Srivastava, Nitish; Hinton, Geoffrey; Krizhevsky, Alex; Sutskever, Ilya; Salakhutdinov, Ruslan (2014). ["Dropout: A Simple Way to Prevent Neural Networks from Overfitting"](http://www.cs.toronto.edu/~rsalakhu/papers/srivastava14a.pdf) (PDF). *Journal of Machine Learning Research*. **15** (1): 1929–1958\. [Archived](https://web.archive.org/web/20160119155849/http://www.cs.toronto.edu/~rsalakhu/papers/srivastava14a.pdf) (PDF) from the original on 2016-01-19. Retrieved 2015-01-03.
96. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-99)**
["Regularization of Neural Networks using DropConnect \| ICML 2013 \| JMLR W\&CP"](http://proceedings.mlr.press/v28/wan13.html). *jmlr.org*: 1058–1066\. 2013-02-13. [Archived](https://web.archive.org/web/20170812080411/http://proceedings.mlr.press/v28/wan13.html) from the original on 2017-08-12. Retrieved 2015-12-17.
97. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-100)**
Zeiler, Matthew D.; Fergus, Rob (2013-01-15). "Stochastic Pooling for Regularization of Deep Convolutional Neural Networks". [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[1301\.3557](https://arxiv.org/abs/1301.3557) \[[cs.LG](https://arxiv.org/archive/cs.LG)\].
98. ^ [***a***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-:3_101-0) [***b***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-:3_101-1)
Platt, John; Steinkraus, Dave; Simard, Patrice Y. (August 2003). ["Best Practices for Convolutional Neural Networks Applied to Visual Document Analysis – Microsoft Research"](https://www.microsoft.com/en-us/research/publication/best-practices-for-convolutional-neural-networks-applied-to-visual-document-analysis/?from=http%3A%2F%2Fresearch.microsoft.com%2Fapps%2Fpubs%2F%3Fid%3D68920). *Microsoft Research*. [Archived](https://web.archive.org/web/20171107112839/https://www.microsoft.com/en-us/research/publication/best-practices-for-convolutional-neural-networks-applied-to-visual-document-analysis/?from=http%3A%2F%2Fresearch.microsoft.com%2Fapps%2Fpubs%2F%3Fid%3D68920) from the original on 2017-11-07. Retrieved 2015-12-17.
99. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-102)**
Hinton, Geoffrey E.; Srivastava, Nitish; Krizhevsky, Alex; Sutskever, Ilya; Salakhutdinov, Ruslan R. (2012). "Improving neural networks by preventing co-adaptation of feature detectors". [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[1207\.0580](https://arxiv.org/abs/1207.0580) \[[cs.NE](https://arxiv.org/archive/cs.NE)\].
100. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-103)**
["Dropout: A Simple Way to Prevent Neural Networks from Overfitting"](https://jmlr.org/papers/v15/srivastava14a.html). *jmlr.org*. [Archived](https://web.archive.org/web/20160305010425/http://jmlr.org/papers/v15/srivastava14a.html) from the original on 2016-03-05. Retrieved 2015-12-17.
101. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-104)**
Hinton, Geoffrey (1979). "Some demonstrations of the effects of structural descriptions in mental imagery". *Cognitive Science*. **3** (3): 231–250\. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1016/s0364-0213(79)80008-7](https://doi.org/10.1016%2Fs0364-0213%2879%2980008-7).
102. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-105)** Rock, Irvin. "The frame of reference." The legacy of Solomon Asch: Essays in cognition and social psychology (1990): 243–268.
103. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-106)** G. Hinton, Coursera lectures on Neural Networks, 2012, URL: <https://www.coursera.org/learn/neural-networks> [Archived](https://web.archive.org/web/20161231174321/https://www.coursera.org/learn/neural-networks) 2016-12-31 at the [Wayback Machine](https://en.wikipedia.org/wiki/Wayback_Machine "Wayback Machine")
104. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-quartz_107-0)**
Dave Gershgorn (18 June 2018). ["The inside story of how AI got good enough to dominate Silicon Valley"](https://qz.com/1307091/the-inside-story-of-how-ai-got-good-enough-to-dominate-silicon-valley/). *[Quartz](https://en.wikipedia.org/wiki/Quartz_\(website\) "Quartz (website)")*. [Archived](https://web.archive.org/web/20191212224842/https://qz.com/1307091/the-inside-story-of-how-ai-got-good-enough-to-dominate-silicon-valley/) from the original on 12 December 2019. Retrieved 5 October 2018.
105. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-108)**
Lawrence, Steve; C. Lee Giles; Ah Chung Tsoi; Andrew D. Back (1997). "Face Recognition: A Convolutional Neural Network Approach". *IEEE Transactions on Neural Networks*. **8** (1): 98–113\. [CiteSeerX](https://en.wikipedia.org/wiki/CiteSeerX_\(identifier\) "CiteSeerX (identifier)") [10\.1.1.92.5813](https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.92.5813). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1109/72.554195](https://doi.org/10.1109%2F72.554195). [PMID](https://en.wikipedia.org/wiki/PMID_\(identifier\) "PMID (identifier)") [18255614](https://pubmed.ncbi.nlm.nih.gov/18255614). [S2CID](https://en.wikipedia.org/wiki/S2CID_\(identifier\) "S2CID (identifier)") [2883848](https://api.semanticscholar.org/CorpusID:2883848).
106. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-video_quality_109-0)**
Le Callet, Patrick; Christian Viard-Gaudin; Dominique Barba (2006). ["A Convolutional Neural Network Approach for Objective Video Quality Assessment"](https://hal.archives-ouvertes.fr/file/index/docid/287426/filename/A_convolutional_neural_network_approach_for_objective_video_quality_assessment_completefinal_manuscript.pdf) (PDF). *IEEE Transactions on Neural Networks*. **17** (5): 1316–1327\. [Bibcode](https://en.wikipedia.org/wiki/Bibcode_\(identifier\) "Bibcode (identifier)"):[2006ITNN...17.1316L](https://ui.adsabs.harvard.edu/abs/2006ITNN...17.1316L). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1109/TNN.2006.879766](https://doi.org/10.1109%2FTNN.2006.879766). [PMID](https://en.wikipedia.org/wiki/PMID_\(identifier\) "PMID (identifier)") [17001990](https://pubmed.ncbi.nlm.nih.gov/17001990). [S2CID](https://en.wikipedia.org/wiki/S2CID_\(identifier\) "S2CID (identifier)") [221185563](https://api.semanticscholar.org/CorpusID:221185563). [Archived](https://web.archive.org/web/20210224123804/https://hal.archives-ouvertes.fr/file/index/docid/287426/filename/A_convolutional_neural_network_approach_for_objective_video_quality_assessment_completefinal_manuscript.pdf) (PDF) from the original on 24 February 2021. Retrieved 17 November 2013.
107. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-ILSVRC2014_110-0)**
["ImageNet Large Scale Visual Recognition Competition 2014 (ILSVRC2014)"](https://image-net.org/challenges/LSVRC/2014/results). [Archived](https://web.archive.org/web/20160205153105/http://www.image-net.org/challenges/LSVRC/2014/results) from the original on 5 February 2016. Retrieved 30 January 2016.
108. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-googlenet_111-0)**
Szegedy, Christian; Liu, Wei; Jia, Yangqing; Sermanet, Pierre; Reed, Scott E.; Anguelov, Dragomir; Erhan, Dumitru; Vanhoucke, Vincent; Rabinovich, Andrew (2015). "Going deeper with convolutions". *IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, USA, June 7–12, 2015*. IEEE Computer Society. pp. 1–9\. [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[1409\.4842](https://arxiv.org/abs/1409.4842). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1109/CVPR.2015.7298594](https://doi.org/10.1109%2FCVPR.2015.7298594). [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-1-4673-6964-0](https://en.wikipedia.org/wiki/Special:BookSources/978-1-4673-6964-0 "Special:BookSources/978-1-4673-6964-0").
109. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-112)**
[Russakovsky, Olga](https://en.wikipedia.org/wiki/Olga_Russakovsky "Olga Russakovsky"); Deng, Jia; Su, Hao; Krause, Jonathan; Satheesh, Sanjeev; Ma, Sean; Huang, Zhiheng; [Karpathy, Andrej](https://en.wikipedia.org/wiki/Andrej_Karpathy "Andrej Karpathy"); Khosla, Aditya; Bernstein, Michael; Berg, Alexander C.; Fei-Fei, Li (2014). "ImageNet Large Scale Visual Recognition Challenge". [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[1409\.0575](https://arxiv.org/abs/1409.0575) \[[cs.CV](https://arxiv.org/archive/cs.CV)\].
110. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-113)**
["The Face Detection Algorithm Set To Revolutionize Image Search"](https://www.technologyreview.com/2015/02/16/169357/the-face-detection-algorithm-set-to-revolutionize-image-search/). *Technology Review*. February 16, 2015. [Archived](https://web.archive.org/web/20200920130711/https://www.technologyreview.com/2015/02/16/169357/the-face-detection-algorithm-set-to-revolutionize-image-search/) from the original on 20 September 2020. Retrieved 27 October 2017.
111. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-114)**
Baccouche, Moez; Mamalet, Franck; Wolf, Christian; Garcia, Christophe; Baskurt, Atilla (2011-11-16). "Sequential Deep Learning for Human Action Recognition". In Salah, Albert Ali; Lepri, Bruno (eds.). *Human Behavior Understanding*. Lecture Notes in Computer Science. Vol. 7065. Springer Berlin Heidelberg. pp. 29–39\. [CiteSeerX](https://en.wikipedia.org/wiki/CiteSeerX_\(identifier\) "CiteSeerX (identifier)") [10\.1.1.385.4740](https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.385.4740). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1007/978-3-642-25446-8\_4](https://doi.org/10.1007%2F978-3-642-25446-8_4). [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-3-642-25445-1](https://en.wikipedia.org/wiki/Special:BookSources/978-3-642-25445-1 "Special:BookSources/978-3-642-25445-1").
112. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-115)**
Ji, Shuiwang; Xu, Wei; Yang, Ming; Yu, Kai (2013-01-01). "3D Convolutional Neural Networks for Human Action Recognition". *IEEE Transactions on Pattern Analysis and Machine Intelligence*. **35** (1): 221–231\. [Bibcode](https://en.wikipedia.org/wiki/Bibcode_\(identifier\) "Bibcode (identifier)"):[2013ITPAM..35..221J](https://ui.adsabs.harvard.edu/abs/2013ITPAM..35..221J). [CiteSeerX](https://en.wikipedia.org/wiki/CiteSeerX_\(identifier\) "CiteSeerX (identifier)") [10\.1.1.169.4046](https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.169.4046). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1109/TPAMI.2012.59](https://doi.org/10.1109%2FTPAMI.2012.59). [ISSN](https://en.wikipedia.org/wiki/ISSN_\(identifier\) "ISSN (identifier)") [0162-8828](https://search.worldcat.org/issn/0162-8828). [PMID](https://en.wikipedia.org/wiki/PMID_\(identifier\) "PMID (identifier)") [22392705](https://pubmed.ncbi.nlm.nih.gov/22392705). [S2CID](https://en.wikipedia.org/wiki/S2CID_\(identifier\) "S2CID (identifier)") [1923924](https://api.semanticscholar.org/CorpusID:1923924).
113. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-116)**
Huang, Jie; Zhou, Wengang; Zhang, Qilin; Li, Houqiang; Li, Weiping (2018). "Video-based Sign Language Recognition without Temporal Segmentation". [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[1801\.10111](https://arxiv.org/abs/1801.10111) \[[cs.CV](https://arxiv.org/archive/cs.CV)\].
114. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-117)** Karpathy, Andrej, et al. "[Large-scale video classification with convolutional neural networks](https://www.cv-foundation.org/openaccess/content_cvpr_2014/papers/Karpathy_Large-scale_Video_Classification_2014_CVPR_paper.pdf) [Archived](https://web.archive.org/web/20190806022753/https://www.cv-foundation.org/openaccess/content_cvpr_2014/papers/Karpathy_Large-scale_Video_Classification_2014_CVPR_paper.pdf) 2019-08-06 at the [Wayback Machine](https://en.wikipedia.org/wiki/Wayback_Machine "Wayback Machine")." IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2014.
115. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-118)**
Simonyan, Karen; Zisserman, Andrew (2014). "Two-Stream Convolutional Networks for Action Recognition in Videos". [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[1406\.2199](https://arxiv.org/abs/1406.2199) \[[cs.CV](https://arxiv.org/archive/cs.CV)\].
116. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-Wang_Duan_Zhang_Niu_p=1657_119-0)**
Wang, Le; Duan, Xuhuan; Zhang, Qilin; Niu, Zhenxing; Hua, Gang; Zheng, Nanning (2018-05-22). ["Segment-Tube: Spatio-Temporal Action Localization in Untrimmed Videos with Per-Frame Segmentation"](https://qilin-zhang.github.io/_pages/pdfs/Segment-Tube_Spatio-Temporal_Action_Localization_in_Untrimmed_Videos_with_Per-Frame_Segmentation.pdf) (PDF). *Sensors*. **18** (5): 1657. [Bibcode](https://en.wikipedia.org/wiki/Bibcode_\(identifier\) "Bibcode (identifier)"):[2018Senso..18.1657W](https://ui.adsabs.harvard.edu/abs/2018Senso..18.1657W). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.3390/s18051657](https://doi.org/10.3390%2Fs18051657). [ISSN](https://en.wikipedia.org/wiki/ISSN_\(identifier\) "ISSN (identifier)") [1424-8220](https://search.worldcat.org/issn/1424-8220). [PMC](https://en.wikipedia.org/wiki/PMC_\(identifier\) "PMC (identifier)") [5982167](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5982167). [PMID](https://en.wikipedia.org/wiki/PMID_\(identifier\) "PMID (identifier)") [29789447](https://pubmed.ncbi.nlm.nih.gov/29789447). [Archived](https://web.archive.org/web/20210301195518/https://qilin-zhang.github.io/_pages/pdfs/Segment-Tube_Spatio-Temporal_Action_Localization_in_Untrimmed_Videos_with_Per-Frame_Segmentation.pdf) (PDF) from the original on 2021-03-01. Retrieved 2018-09-14.
117. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-Duan_Wang_Zhai_Zheng_2018_p._120-0)**
Duan, Xuhuan; Wang, Le; Zhai, Changbo; Zheng, Nanning; Zhang, Qilin; Niu, Zhenxing; Hua, Gang (2018). "Joint Spatio-Temporal Action Localization in Untrimmed Videos with Per-Frame Segmentation". *2018 25th IEEE International Conference on Image Processing (ICIP)*. pp. 918–922\. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1109/icip.2018.8451692](https://doi.org/10.1109%2Ficip.2018.8451692). [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-1-4799-7061-2](https://en.wikipedia.org/wiki/Special:BookSources/978-1-4799-7061-2 "Special:BookSources/978-1-4799-7061-2").
118. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-121)**
Taylor, Graham W.; Fergus, Rob; LeCun, Yann; Bregler, Christoph (2010-01-01). [*Convolutional Learning of Spatio-temporal Features*](https://dl.acm.org/doi/10.5555/1888212). Proceedings of the 11th European Conference on Computer Vision: Part VI. ECCV'10. Berlin, Heidelberg: Springer-Verlag. pp. 140–153\. [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-3-642-15566-6](https://en.wikipedia.org/wiki/Special:BookSources/978-3-642-15566-6 "Special:BookSources/978-3-642-15566-6"). [Archived](https://web.archive.org/web/20220331211137/https://dl.acm.org/doi/10.5555/1888212) from the original on 2022-03-31. Retrieved 2022-03-31.
119. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-122)**
Le, Q. V.; Zou, W. Y.; Yeung, S. Y.; Ng, A. Y. (2011-01-01). "Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis". *CVPR 2011*. CVPR '11. Washington, DC, US: IEEE Computer Society. pp. 3361–3368\. [CiteSeerX](https://en.wikipedia.org/wiki/CiteSeerX_\(identifier\) "CiteSeerX (identifier)") [10\.1.1.294.5948](https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.294.5948). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1109/CVPR.2011.5995496](https://doi.org/10.1109%2FCVPR.2011.5995496). [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-1-4577-0394-2](https://en.wikipedia.org/wiki/Special:BookSources/978-1-4577-0394-2 "Special:BookSources/978-1-4577-0394-2"). [S2CID](https://en.wikipedia.org/wiki/S2CID_\(identifier\) "S2CID (identifier)") [6006618](https://api.semanticscholar.org/CorpusID:6006618).
120. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-123)**
Grefenstette, Edward; Blunsom, Phil; de Freitas, Nando; Hermann, Karl Moritz (2014-04-29). "A Deep Architecture for Semantic Parsing". [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[1404\.7296](https://arxiv.org/abs/1404.7296) \[[cs.CL](https://arxiv.org/archive/cs.CL)\].
121. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-124)**
Mesnil, Gregoire; Deng, Li; Gao, Jianfeng; He, Xiaodong; Shen, Yelong (April 2014). ["Learning Semantic Representations Using Convolutional Neural Networks for Web Search – Microsoft Research"](https://www.microsoft.com/en-us/research/publication/learning-semantic-representations-using-convolutional-neural-networks-for-web-search/?from=http%3A%2F%2Fresearch.microsoft.com%2Fapps%2Fpubs%2Fdefault.aspx%3Fid%3D214617). *Microsoft Research*. [Archived](https://web.archive.org/web/20170915160617/https://www.microsoft.com/en-us/research/publication/learning-semantic-representations-using-convolutional-neural-networks-for-web-search/?from=http%3A%2F%2Fresearch.microsoft.com%2Fapps%2Fpubs%2Fdefault.aspx%3Fid%3D214617) from the original on 2017-09-15. Retrieved 2015-12-17.
122. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-125)**
Kalchbrenner, Nal; Grefenstette, Edward; Blunsom, Phil (2014-04-08). "A Convolutional Neural Network for Modelling Sentences". [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[1404\.2188](https://arxiv.org/abs/1404.2188) \[[cs.CL](https://arxiv.org/archive/cs.CL)\].
123. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-126)**
Kim, Yoon (2014-08-25). "Convolutional Neural Networks for Sentence Classification". [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[1408\.5882](https://arxiv.org/abs/1408.5882) \[[cs.CL](https://arxiv.org/archive/cs.CL)\].
124. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-127)** Collobert, Ronan, and Jason Weston. "[A unified architecture for natural language processing: Deep neural networks with multitask learning](https://thetalkingmachines.com/sites/default/files/2018-12/unified_nlp.pdf) [Archived](https://web.archive.org/web/20190904161653/https://thetalkingmachines.com/sites/default/files/2018-12/unified_nlp.pdf) 2019-09-04 at the [Wayback Machine](https://en.wikipedia.org/wiki/Wayback_Machine "Wayback Machine")." Proceedings of the 25th International Conference on Machine Learning. ACM, 2008.
125. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-128)**
Collobert, Ronan; Weston, Jason; Bottou, Leon; Karlen, Michael; Kavukcuoglu, Koray; Kuksa, Pavel (2011-03-02). "Natural Language Processing (almost) from Scratch". [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[1103\.0398](https://arxiv.org/abs/1103.0398) \[[cs.LG](https://arxiv.org/archive/cs.LG)\].
126. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-129)**
Yin, W; Kann, K; Yu, M; Schütze, H (2017-03-02). "Comparative study of CNN and RNN for natural language processing". [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[1702\.01923](https://arxiv.org/abs/1702.01923) \[[cs.LG](https://arxiv.org/archive/cs.LG)\].
127. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-130)**
Bai, S.; Kolter, J.S.; Koltun, V. (2018). "An empirical evaluation of generic convolutional and recurrent networks for sequence modeling". [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[1803\.01271](https://arxiv.org/abs/1803.01271) \[[cs.LG](https://arxiv.org/archive/cs.LG)\].
128. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-131)**
Gruber, N. (2021). "Detecting dynamics of action in text with a recurrent neural network". *Neural Computing and Applications*. **33** (12): 15709–15718\. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1007/S00521-021-06190-5](https://doi.org/10.1007%2FS00521-021-06190-5). [S2CID](https://en.wikipedia.org/wiki/S2CID_\(identifier\) "S2CID (identifier)") [236307579](https://api.semanticscholar.org/CorpusID:236307579).
129. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-132)**
Jiang, Haotian; Li, Zhong; Li, Qianxiao (2021). "Approximation Theory of Convolutional Architectures for Time Series Modelling". *International Conference on Machine Learning*. [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[2107\.09355](https://arxiv.org/abs/2107.09355).
130. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-133)**
Bohnslav, James P; Wimalasena, Nivanthika K; Clausing, Kelsey J; Dai, Yu Y; Yarmolinsky, David A; Cruz, Tomás; Kashlan, Adam D; Chiappe, M Eugenia; Orefice, Lauren L; Woolf, Clifford J; Harvey, Christopher D (2021-09-02). ["DeepEthogram, a machine learning pipeline for supervised behavior classification from raw pixels"](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8455138). *eLife*. **10** e63377. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.7554/eLife.63377](https://doi.org/10.7554%2FeLife.63377). [ISSN](https://en.wikipedia.org/wiki/ISSN_\(identifier\) "ISSN (identifier)") [2050-084X](https://search.worldcat.org/issn/2050-084X). [PMC](https://en.wikipedia.org/wiki/PMC_\(identifier\) "PMC (identifier)") [8455138](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8455138). [PMID](https://en.wikipedia.org/wiki/PMID_\(identifier\) "PMID (identifier)") [34473051](https://pubmed.ncbi.nlm.nih.gov/34473051).
131. ^ [***a***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-:7_134-0) [***b***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-:7_134-1)
Gernat, Tim; Jagla, Tobias; Jones, Beryl M.; Middendorf, Martin; Robinson, Gene E. (2023-01-27). ["Automated monitoring of honey bees with barcodes and artificial intelligence reveals two distinct social networks from a single affiliative behavior"](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9883485). *Scientific Reports*. **13** (1) 1541. [Bibcode](https://en.wikipedia.org/wiki/Bibcode_\(identifier\) "Bibcode (identifier)"):[2023NatSR..13.1541G](https://ui.adsabs.harvard.edu/abs/2023NatSR..13.1541G). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1038/s41598-022-26825-4](https://doi.org/10.1038%2Fs41598-022-26825-4). [ISSN](https://en.wikipedia.org/wiki/ISSN_\(identifier\) "ISSN (identifier)") [2045-2322](https://search.worldcat.org/issn/2045-2322). [PMC](https://en.wikipedia.org/wiki/PMC_\(identifier\) "PMC (identifier)") [9883485](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9883485). [PMID](https://en.wikipedia.org/wiki/PMID_\(identifier\) "PMID (identifier)") [36707534](https://pubmed.ncbi.nlm.nih.gov/36707534).
132. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-135)**
Norouzzadeh, Mohammad Sadegh; Nguyen, Anh; Kosmala, Margaret; Swanson, Alexandra; Palmer, Meredith S.; Packer, Craig; Clune, Jeff (2018-06-19). ["Automatically identifying, counting, and describing wild animals in camera-trap images with deep learning"](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6016780). *Proceedings of the National Academy of Sciences*. **115** (25): E5716–E5725. [Bibcode](https://en.wikipedia.org/wiki/Bibcode_\(identifier\) "Bibcode (identifier)"):[2018PNAS..115E5716N](https://ui.adsabs.harvard.edu/abs/2018PNAS..115E5716N). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1073/pnas.1719367115](https://doi.org/10.1073%2Fpnas.1719367115). [ISSN](https://en.wikipedia.org/wiki/ISSN_\(identifier\) "ISSN (identifier)") [0027-8424](https://search.worldcat.org/issn/0027-8424). [PMC](https://en.wikipedia.org/wiki/PMC_\(identifier\) "PMC (identifier)") [6016780](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6016780). [PMID](https://en.wikipedia.org/wiki/PMID_\(identifier\) "PMID (identifier)") [29871948](https://pubmed.ncbi.nlm.nih.gov/29871948).
133. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-136)**
Svenning, Asger; Mougeot, Guillaume; Alison, Jamie; Chevalier, Daphne; Molina, Nisa Luise Chavez; Ong, Song-Quan; Bjerge, Kim; Carrillo, Juli; Hoeye, Toke Thomas (2025-04-14). "A General Method for Detection and Segmentation of Terrestrial Arthropods in Images". [bioRxiv](https://en.wikipedia.org/wiki/BioRxiv_\(identifier\) "BioRxiv (identifier)") [10\.1101/2025.04.08.647223](https://doi.org/10.1101%2F2025.04.08.647223).
134. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-137)**
Torrents, Jordi; Costa, Tiago; De Polavieja, Gonzalo G. (2025-06-02). "New idtracker.ai: rethinking multi-animal tracking as a representation learning problem to increase accuracy and reduce tracking times". [bioRxiv](https://en.wikipedia.org/wiki/BioRxiv_\(identifier\) "BioRxiv (identifier)") [10\.1101/2025.05.30.657023](https://doi.org/10.1101%2F2025.05.30.657023).
135. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-138)**
Mathis, Alexander; Mamidanna, Pranav; Cury, Kevin M.; Abe, Taiga; Murthy, Venkatesh N.; Mathis, Mackenzie Weygandt; Bethge, Matthias (September 2018). ["DeepLabCut: markerless pose estimation of user-defined body parts with deep learning"](https://www.nature.com/articles/s41593-018-0209-y). *Nature Neuroscience*. **21** (9): 1281–1289\. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1038/s41593-018-0209-y](https://doi.org/10.1038%2Fs41593-018-0209-y). [ISSN](https://en.wikipedia.org/wiki/ISSN_\(identifier\) "ISSN (identifier)") [1097-6256](https://search.worldcat.org/issn/1097-6256). [PMID](https://en.wikipedia.org/wiki/PMID_\(identifier\) "PMID (identifier)") [30127430](https://pubmed.ncbi.nlm.nih.gov/30127430).
136. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-139)**
Graving, Jacob M; Chae, Daniel; Naik, Hemal; Li, Liang; Koger, Benjamin; Costelloe, Blair R; Couzin, Iain D (2019-10-01). ["DeepPoseKit, a software toolkit for fast and robust animal pose estimation using deep learning"](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6897514). *eLife*. **8** e47994. [Bibcode](https://en.wikipedia.org/wiki/Bibcode_\(identifier\) "Bibcode (identifier)"):[2019eLife...847994G](https://ui.adsabs.harvard.edu/abs/2019eLife...847994G). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.7554/eLife.47994](https://doi.org/10.7554%2FeLife.47994). [ISSN](https://en.wikipedia.org/wiki/ISSN_\(identifier\) "ISSN (identifier)") [2050-084X](https://search.worldcat.org/issn/2050-084X). [PMC](https://en.wikipedia.org/wiki/PMC_\(identifier\) "PMC (identifier)") [6897514](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6897514). [PMID](https://en.wikipedia.org/wiki/PMID_\(identifier\) "PMID (identifier)") [31570119](https://pubmed.ncbi.nlm.nih.gov/31570119).
137. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-140)**
Pereira, Talmo D.; Tabris, Nathaniel; Matsliah, Arie; Turner, David M.; Li, Junyu; Ravindranath, Shruthi; Papadoyannis, Eleni S.; Normand, Edna; Deutsch, David S.; Wang, Z. Yan; McKenzie-Smith, Grace C.; Mitelut, Catalin C.; Castro, Marielisa Diez; D’Uva, John; Kislin, Mikhail (May 2022). ["Publisher Correction: SLEAP: A deep learning system for multi-animal pose tracking"](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9119847). *Nature Methods*. **19** (5): 628. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1038/s41592-022-01495-2](https://doi.org/10.1038%2Fs41592-022-01495-2). [ISSN](https://en.wikipedia.org/wiki/ISSN_\(identifier\) "ISSN (identifier)") [1548-7091](https://search.worldcat.org/issn/1548-7091). [PMC](https://en.wikipedia.org/wiki/PMC_\(identifier\) "PMC (identifier)") [9119847](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9119847). [PMID](https://en.wikipedia.org/wiki/PMID_\(identifier\) "PMID (identifier)") [35468969](https://pubmed.ncbi.nlm.nih.gov/35468969).
138. ^ [***a***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-:8_141-0) [***b***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-:8_141-1)
Arac, Ahmet; Zhao, Pingping; Dobkin, Bruce H.; Carmichael, S. Thomas; Golshani, Peyman (2019-05-07). ["DeepBehavior: A Deep Learning Toolbox for Automated Analysis of Animal and Human Behavior Imaging Data"](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6513883). *Frontiers in Systems Neuroscience*. **13** 20. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.3389/fnsys.2019.00020](https://doi.org/10.3389%2Ffnsys.2019.00020). [ISSN](https://en.wikipedia.org/wiki/ISSN_\(identifier\) "ISSN (identifier)") [1662-5137](https://search.worldcat.org/issn/1662-5137). [PMC](https://en.wikipedia.org/wiki/PMC_\(identifier\) "PMC (identifier)") [6513883](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6513883). [PMID](https://en.wikipedia.org/wiki/PMID_\(identifier\) "PMID (identifier)") [31133826](https://pubmed.ncbi.nlm.nih.gov/31133826).
139. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-142)**
Ren, Hansheng; Xu, Bixiong; Wang, Yujing; Yi, Chao; Huang, Congrui; Kou, Xiaoyu; Xing, Tony; Yang, Mao; Tong, Jie; Zhang, Qi (2019). "Time-Series Anomaly Detection Service at Microsoft". *Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining*. [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[1906\.03821](https://arxiv.org/abs/1906.03821). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1145/3292500.3330680](https://doi.org/10.1145%2F3292500.3330680). [S2CID](https://en.wikipedia.org/wiki/S2CID_\(identifier\) "S2CID (identifier)") [182952311](https://api.semanticscholar.org/CorpusID:182952311).
140. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-143)**
Wallach, Izhar; Dzamba, Michael; Heifets, Abraham (2015-10-09). "AtomNet: A Deep Convolutional Neural Network for Bioactivity Prediction in Structure-based Drug Discovery". [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[1510\.02855](https://arxiv.org/abs/1510.02855) \[[cs.LG](https://arxiv.org/archive/cs.LG)\].
141. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-144)**
Yosinski, Jason; Clune, Jeff; Nguyen, Anh; Fuchs, Thomas; Lipson, Hod (2015-06-22). "Understanding Neural Networks Through Deep Visualization". [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[1506\.06579](https://arxiv.org/abs/1506.06579) \[[cs.CV](https://arxiv.org/archive/cs.CV)\].
142. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-145)**
["Toronto startup has a faster way to discover effective medicines"](https://www.theglobeandmail.com/report-on-business/small-business/starting-out/toronto-startup-has-a-faster-way-to-discover-effective-medicines/article25660419/). *The Globe and Mail*. [Archived](https://web.archive.org/web/20151020040115/http://www.theglobeandmail.com/report-on-business/small-business/starting-out/toronto-startup-has-a-faster-way-to-discover-effective-medicines/article25660419/) from the original on 2015-10-20. Retrieved 2015-11-09.
143. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-146)**
["Startup Harnesses Supercomputers to Seek Cures"](https://www.kqed.org/futureofyou/3461/startup-harnesses-supercomputers-to-seek-cures). *KQED Future of You*. 2015-05-27. [Archived](https://web.archive.org/web/20181206234956/https://www.kqed.org/futureofyou/3461/startup-harnesses-supercomputers-to-seek-cures) from the original on 2018-12-06. Retrieved 2015-11-09.
144. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-147)**
Chellapilla, K; Fogel, DB (1999). "Evolving neural networks to play checkers without relying on expert knowledge". *IEEE Trans Neural Netw*. **10** (6): 1382–91\. [Bibcode](https://en.wikipedia.org/wiki/Bibcode_\(identifier\) "Bibcode (identifier)"):[1999ITNN...10.1382C](https://ui.adsabs.harvard.edu/abs/1999ITNN...10.1382C). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1109/72.809083](https://doi.org/10.1109%2F72.809083). [PMID](https://en.wikipedia.org/wiki/PMID_\(identifier\) "PMID (identifier)") [18252639](https://pubmed.ncbi.nlm.nih.gov/18252639).
145. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-148)**
Chellapilla, K.; Fogel, D.B. (2001). "Evolving an expert checkers playing program without using human expertise". *IEEE Transactions on Evolutionary Computation*. **5** (4): 422–428\. [Bibcode](https://en.wikipedia.org/wiki/Bibcode_\(identifier\) "Bibcode (identifier)"):[2001ITEC....5..422C](https://ui.adsabs.harvard.edu/abs/2001ITEC....5..422C). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1109/4235.942536](https://doi.org/10.1109%2F4235.942536).
146. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-149)**
[Fogel, David](https://en.wikipedia.org/wiki/David_B._Fogel "David B. Fogel") (2001). *Blondie24: Playing at the Edge of AI*. San Francisco, CA: Morgan Kaufmann. [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-1-55860-783-5](https://en.wikipedia.org/wiki/Special:BookSources/978-1-55860-783-5 "Special:BookSources/978-1-55860-783-5").
147. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-150)**
Clark, Christopher; Storkey, Amos (2014). "Teaching Deep Convolutional Neural Networks to Play Go". [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[1412\.3409](https://arxiv.org/abs/1412.3409) \[[cs.AI](https://arxiv.org/archive/cs.AI)\].
148. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-151)**
Maddison, Chris J.; Huang, Aja; Sutskever, Ilya; Silver, David (2014). "Move Evaluation in Go Using Deep Convolutional Neural Networks". [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[1412\.6564](https://arxiv.org/abs/1412.6564) \[[cs.LG](https://arxiv.org/archive/cs.LG)\].
149. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-152)**
["AlphaGo – Google DeepMind"](https://web.archive.org/web/20160130230207/http://www.deepmind.com/alpha-go.html). Archived from [the original](https://www.deepmind.com/alpha-go.html) on 30 January 2016. Retrieved 30 January 2016.
150. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-153)**
Bai, Shaojie; Kolter, J. Zico; Koltun, Vladlen (2018-04-19). "An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling". [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[1803\.01271](https://arxiv.org/abs/1803.01271) \[[cs.LG](https://arxiv.org/archive/cs.LG)\].
151. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-154)**
Yu, Fisher; Koltun, Vladlen (2016-04-30). "Multi-Scale Context Aggregation by Dilated Convolutions". [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[1511\.07122](https://arxiv.org/abs/1511.07122) \[[cs.CV](https://arxiv.org/archive/cs.CV)\].
152. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-155)**
Borovykh, Anastasia; Bohte, Sander; Oosterlee, Cornelis W. (2018-09-17). "Conditional Time Series Forecasting with Convolutional Neural Networks". [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[1703\.04691](https://arxiv.org/abs/1703.04691) \[[stat.ML](https://arxiv.org/archive/stat.ML)\].
153. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-156)**
Mittelman, Roni (2015-08-03). "Time-series modeling with undecimated fully convolutional neural networks". [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[1508\.00317](https://arxiv.org/abs/1508.00317) \[[stat.ML](https://arxiv.org/archive/stat.ML)\].
154. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-157)**
Chen, Yitian; Kang, Yanfei; Chen, Yixiong; Wang, Zizhuo (2019-06-11). "Probabilistic Forecasting with Temporal Convolutional Neural Network". [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[1906\.04397](https://arxiv.org/abs/1906.04397) \[[stat.ML](https://arxiv.org/archive/stat.ML)\].
155. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-158)**
Zhao, Bendong; Lu, Huanzhang; Chen, Shangfeng; Liu, Junliang; Wu, Dongya (2017-02-01). "Convolutional neural networks for time series classification". *Journal of Systems Engineering and Electronics*. **28** (1): 162–169\. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.21629/JSEE.2017.01.18](https://doi.org/10.21629%2FJSEE.2017.01.18).
156. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-159)**
Petneházi, Gábor (2019-08-21). "QCNN: Quantile Convolutional Neural Network". [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[1908\.07978](https://arxiv.org/abs/1908.07978) \[[cs.LG](https://arxiv.org/archive/cs.LG)\].
157. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-HeiCuBeDa_Hilprecht_160-0)**
[Hubert Mara](https://en.wikipedia.org/wiki/Hubert_Mara "Hubert Mara") (2019-06-07), *HeiCuBeDa Hilprecht – Heidelberg Cuneiform Benchmark Dataset for the Hilprecht Collection* (in German), heiDATA – institutional repository for research data of Heidelberg University, [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.11588/data/IE8CCN](https://doi.org/10.11588%2Fdata%2FIE8CCN)
158. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-ICDAR19_161-0)**
Hubert Mara and Bartosz Bogacz (2019), "Breaking the Code on Broken Tablets: The Learning Challenge for Annotated Cuneiform Script in Normalized 2D and 3D Datasets", *Proceedings of the 15th International Conference on Document Analysis and Recognition (ICDAR)* (in German), Sydney, Australia, pp. 148–153, [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1109/ICDAR.2019.00032](https://doi.org/10.1109%2FICDAR.2019.00032), [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-1-7281-3014-9](https://en.wikipedia.org/wiki/Special:BookSources/978-1-7281-3014-9 "Special:BookSources/978-1-7281-3014-9"), [S2CID](https://en.wikipedia.org/wiki/S2CID_\(identifier\) "S2CID (identifier)") [211026941](https://api.semanticscholar.org/CorpusID:211026941)
159. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-ICFHR20_162-0)**
Bogacz, Bartosz; Mara, Hubert (2020), "Period Classification of 3D Cuneiform Tablets with Geometric Neural Networks", *Proceedings of the 17th International Conference on Frontiers of Handwriting Recognition (ICFHR)*, Dortmund, Germany
160. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-ICFHR20_Presentation_163-0)** [Presentation of the ICFHR paper on Period Classification of 3D Cuneiform Tablets with Geometric Neural Networks](https://www.youtube.com/watch?v=-iFntE51HRw) on [YouTube](https://en.wikipedia.org/wiki/YouTube_video_\(identifier\) "YouTube video (identifier)")
161. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-164)** Durjoy Sen Maitra; Ujjwal Bhattacharya; S.K. Parui, ["CNN based common approach to handwritten character recognition of multiple scripts"](https://ieeexplore.ieee.org/document/7333916) [Archived](https://web.archive.org/web/20231016190918/https://ieeexplore.ieee.org/document/7333916) 2023-10-16 at the [Wayback Machine](https://en.wikipedia.org/wiki/Wayback_Machine "Wayback Machine"), in *2015 13th International Conference on Document Analysis and Recognition (ICDAR)*, pp. 1021–1025, 23–26 Aug. 2015
162. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-Interpretable_ML_Symposium_2017_165-0)**
["NIPS 2017"](https://web.archive.org/web/20190907063237/http://interpretable.ml/). *Interpretable ML Symposium*. 2017-10-20. Archived from [the original](http://interpretable.ml/) on 2019-09-07. Retrieved 2018-09-12.
163. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-Zang_Wang_Liu_Zhang_2018_pp._97%E2%80%93108_166-0)**
Zang, Jinliang; Wang, Le; Liu, Ziyi; Zhang, Qilin; Hua, Gang; Zheng, Nanning (2018). "Attention-Based Temporal Weighted Convolutional Neural Network for Action Recognition". *Artificial Intelligence Applications and Innovations*. IFIP Advances in Information and Communication Technology. Vol. 519. Cham: Springer International Publishing. pp. 97–108\. [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[1803\.07179](https://arxiv.org/abs/1803.07179). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1007/978-3-319-92007-8\_9](https://doi.org/10.1007%2F978-3-319-92007-8_9). [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-3-319-92006-1](https://en.wikipedia.org/wiki/Special:BookSources/978-3-319-92006-1 "Special:BookSources/978-3-319-92006-1"). [ISSN](https://en.wikipedia.org/wiki/ISSN_\(identifier\) "ISSN (identifier)") [1868-4238](https://search.worldcat.org/issn/1868-4238). [S2CID](https://en.wikipedia.org/wiki/S2CID_\(identifier\) "S2CID (identifier)") [4058889](https://api.semanticscholar.org/CorpusID:4058889).
164. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-Wang_Zang_Zhang_Niu_p=1979_167-0)**
Wang, Le; Zang, Jinliang; Zhang, Qilin; Niu, Zhenxing; Hua, Gang; Zheng, Nanning (2018-06-21). ["Action Recognition by an Attention-Aware Temporal Weighted Convolutional Neural Network"](https://qilin-zhang.github.io/_pages/pdfs/sensors-18-01979-Action_Recognition_by_an_Attention-Aware_Temporal_Weighted_Convolutional_Neural_Network.pdf) (PDF). *Sensors*. **18** (7): 1979. [Bibcode](https://en.wikipedia.org/wiki/Bibcode_\(identifier\) "Bibcode (identifier)"):[2018Senso..18.1979W](https://ui.adsabs.harvard.edu/abs/2018Senso..18.1979W). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.3390/s18071979](https://doi.org/10.3390%2Fs18071979). [ISSN](https://en.wikipedia.org/wiki/ISSN_\(identifier\) "ISSN (identifier)") [1424-8220](https://search.worldcat.org/issn/1424-8220). [PMC](https://en.wikipedia.org/wiki/PMC_\(identifier\) "PMC (identifier)") [6069475](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6069475). [PMID](https://en.wikipedia.org/wiki/PMID_\(identifier\) "PMID (identifier)") [29933555](https://pubmed.ncbi.nlm.nih.gov/29933555). [Archived](https://web.archive.org/web/20180913040055/https://qilin-zhang.github.io/_pages/pdfs/sensors-18-01979-Action_Recognition_by_an_Attention-Aware_Temporal_Weighted_Convolutional_Neural_Network.pdf) (PDF) from the original on 2018-09-13. Retrieved 2018-09-14.
165. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-Ong_Chavez_Hong_2015_168-0)**
Ong, Hao Yi; Chavez, Kevin; Hong, Augustus (2015-08-18). "Distributed Deep Q-Learning". [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[1508\.04186v2](https://arxiv.org/abs/1508.04186v2) \[[cs.LG](https://arxiv.org/archive/cs.LG)\].
166. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-DQN_169-0)**
Mnih, Volodymyr; et al. (2015). "Human-level control through deep reinforcement learning". *Nature*. **518** (7540): 529–533\. [Bibcode](https://en.wikipedia.org/wiki/Bibcode_\(identifier\) "Bibcode (identifier)"):[2015Natur.518..529M](https://ui.adsabs.harvard.edu/abs/2015Natur.518..529M). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1038/nature14236](https://doi.org/10.1038%2Fnature14236). [PMID](https://en.wikipedia.org/wiki/PMID_\(identifier\) "PMID (identifier)") [25719670](https://pubmed.ncbi.nlm.nih.gov/25719670). [S2CID](https://en.wikipedia.org/wiki/S2CID_\(identifier\) "S2CID (identifier)") [205242740](https://api.semanticscholar.org/CorpusID:205242740).
167. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-170)**
Sun, R.; Sessions, C. (June 2000). "Self-segmentation of sequences: automatic formation of hierarchies of sequential behaviors". *IEEE Transactions on Systems, Man, and Cybernetics - Part B: Cybernetics*. **30** (3): 403–418\. [Bibcode](https://en.wikipedia.org/wiki/Bibcode_\(identifier\) "Bibcode (identifier)"):[2000ITSMB..30..403S](https://ui.adsabs.harvard.edu/abs/2000ITSMB..30..403S). [CiteSeerX](https://en.wikipedia.org/wiki/CiteSeerX_\(identifier\) "CiteSeerX (identifier)") [10\.1.1.11.226](https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.11.226). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1109/3477.846230](https://doi.org/10.1109%2F3477.846230). [ISSN](https://en.wikipedia.org/wiki/ISSN_\(identifier\) "ISSN (identifier)") [1083-4419](https://search.worldcat.org/issn/1083-4419). [PMID](https://en.wikipedia.org/wiki/PMID_\(identifier\) "PMID (identifier)") [18252373](https://pubmed.ncbi.nlm.nih.gov/18252373).
168. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-CDBN-CIFAR_171-0)**
["Convolutional Deep Belief Networks on CIFAR-10"](http://www.cs.toronto.edu/~kriz/conv-cifar10-aug2010.pdf) (PDF). [Archived](https://web.archive.org/web/20170830060223/http://www.cs.toronto.edu/~kriz/conv-cifar10-aug2010.pdf) (PDF) from the original on 2017-08-30. Retrieved 2017-08-18.
169. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-CDBN_172-0)**
Lee, Honglak; Grosse, Roger; Ranganath, Rajesh; Ng, Andrew Y. (1 January 2009). "Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations". *Proceedings of the 26th Annual International Conference on Machine Learning*. ACM. pp. 609–616\. [CiteSeerX](https://en.wikipedia.org/wiki/CiteSeerX_\(identifier\) "CiteSeerX (identifier)") [10\.1.1.149.6800](https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.149.6800). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1145/1553374.1553453](https://doi.org/10.1145%2F1553374.1553453). [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-1-60558-516-1](https://en.wikipedia.org/wiki/Special:BookSources/978-1-60558-516-1 "Special:BookSources/978-1-60558-516-1"). [S2CID](https://en.wikipedia.org/wiki/S2CID_\(identifier\) "S2CID (identifier)") [12008458](https://api.semanticscholar.org/CorpusID:12008458).
170. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-173)**
Behnke, Sven (2003). [*Hierarchical Neural Networks for Image Interpretation*](https://www.ais.uni-bonn.de/books/LNCS2766.pdf) (PDF). Lecture Notes in Computer Science. Vol. 2766. Springer. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1007/b11963](https://doi.org/10.1007%2Fb11963). [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-3-540-40722-5](https://en.wikipedia.org/wiki/Special:BookSources/978-3-540-40722-5 "Special:BookSources/978-3-540-40722-5"). [S2CID](https://en.wikipedia.org/wiki/S2CID_\(identifier\) "S2CID (identifier)") [1304548](https://api.semanticscholar.org/CorpusID:1304548). [Archived](https://web.archive.org/web/20170810020001/http://www.ais.uni-bonn.de/books/LNCS2766.pdf) (PDF) from the original on 2017-08-10. Retrieved 2016-12-28.
171. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-174)**
Choi, Rene Y.; Coyner, Aaron S.; Kalpathy-Cramer, Jayashree; Chiang, Michael F.; Campbell, J. Peter (February 2020). ["Introduction to Machine Learning, Neural Networks, and Deep Learning"](https://tvst.arvojournals.org/article.aspx?articleid=2762344). *Translational Vision Science & Technology*.
## External links
- [CS231n: Convolutional Neural Networks for Visual Recognition](https://cs231n.github.io/) — [Andrej Karpathy](https://en.wikipedia.org/wiki/Andrej_Karpathy "Andrej Karpathy")'s [Stanford](https://en.wikipedia.org/wiki/Stanford_University "Stanford University") computer science course on CNNs in computer vision
- [vdumoulin/conv\_arithmetic: A technical report on convolution arithmetic in the context of deep learning](https://github.com/vdumoulin/conv_arithmetic). Animations of convolutions.
| [Artificial intelligence](https://en.wikipedia.org/wiki/Artificial_intelligence "Artificial intelligence") (AI) | |
|---|---|
| [History](https://en.wikipedia.org/wiki/History_of_artificial_intelligence "History of artificial intelligence") [timeline](https://en.wikipedia.org/wiki/Timeline_of_artificial_intelligence "Timeline of artificial intelligence") [Glossary](https://en.wikipedia.org/wiki/Glossary_of_artificial_intelligence "Glossary of artificial intelligence") [Companies](https://en.wikipedia.org/wiki/List_of_artificial_intelligence_companies "List of artificial intelligence companies") [Projects](https://en.wikipedia.org/wiki/List_of_artificial_intelligence_projects "List of artificial intelligence projects") | |

Retrieved from "<https://en.wikipedia.org/w/index.php?title=Convolutional_neural_network&oldid=1346333455>"
[Categories](https://en.wikipedia.org/wiki/Help:Category "Help:Category"):
- [Neural network architectures](https://en.wikipedia.org/wiki/Category:Neural_network_architectures "Category:Neural network architectures")
- [Computer vision](https://en.wikipedia.org/wiki/Category:Computer_vision "Category:Computer vision")
- [Computational neuroscience](https://en.wikipedia.org/wiki/Category:Computational_neuroscience "Category:Computational neuroscience")
- This page was last edited on 31 March 2026, at 07:34 (UTC).
- Text is available under the [Creative Commons Attribution-ShareAlike 4.0 License](https://en.wikipedia.org/wiki/Wikipedia:Text_of_the_Creative_Commons_Attribution-ShareAlike_4.0_International_License "Wikipedia:Text of the Creative Commons Attribution-ShareAlike 4.0 International License"); additional terms may apply. By using this site, you agree to the [Terms of Use](https://foundation.wikimedia.org/wiki/Special:MyLanguage/Policy:Terms_of_Use "foundation:Special:MyLanguage/Policy:Terms of Use") and [Privacy Policy](https://foundation.wikimedia.org/wiki/Special:MyLanguage/Policy:Privacy_policy "foundation:Special:MyLanguage/Policy:Privacy policy"). Wikipedia® is a registered trademark of the [Wikimedia Foundation, Inc.](https://wikimediafoundation.org/), a non-profit organization.
[Add topic](https://en.wikipedia.org/wiki/Convolutional_neural_network) |
| Readable Markdown | A **convolutional neural network** (**CNN**) is a type of [feedforward neural network](https://en.wikipedia.org/wiki/Feedforward_neural_network "Feedforward neural network") that learns [features](https://en.wikipedia.org/wiki/Feature_engineering "Feature engineering") via filter (or [kernel](https://en.wikipedia.org/wiki/Kernel_\(image_processing\) "Kernel (image processing)")) optimization. This type of [deep learning](https://en.wikipedia.org/wiki/Deep_learning "Deep learning") network has been applied to process and make [predictions](https://en.wikipedia.org/wiki/Prediction#Statistics "Prediction") from many types of data, including text, images and audio.[\[1\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-LeCun2015-1) CNNs are the de facto standard in deep learning-based approaches to [computer vision](https://en.wikipedia.org/wiki/Computer_vision "Computer vision")[\[2\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-2) and [image processing](https://en.wikipedia.org/wiki/Image_processing "Image processing"), and have only recently been replaced, in some cases, by newer architectures such as the [transformer](https://en.wikipedia.org/wiki/Transformer_\(deep_learning\) "Transformer (deep learning)").
[Vanishing gradients](https://en.wikipedia.org/wiki/Vanishing_gradient_problem "Vanishing gradient problem") and exploding gradients, seen during [backpropagation](https://en.wikipedia.org/wiki/Backpropagation "Backpropagation") in earlier neural networks, are prevented by the [regularization](https://en.wikipedia.org/wiki/Regularization_\(mathematics\) "Regularization (mathematics)") that comes from using shared weights over fewer connections.[\[3\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-auto3-3)[\[4\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-auto2-4) For example, *each* neuron in a fully connected layer would require 10,000 weights to process a 100 × 100 pixel image. However, with cascaded *convolution* (or cross-correlation) kernels,[\[5\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-5)[\[6\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-6) only 25 weights per convolutional layer are required to process 5 × 5-sized tiles.[\[7\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-auto1-7)[\[8\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-homma-8) Higher-layer features are extracted from wider context windows than lower-layer features.
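The weight counts in the example above can be checked with a few lines of arithmetic. The following is a minimal sketch, not taken from any cited source: the function names are hypothetical, a single-channel image and stride-1 convolutions are assumed, and bias terms are not counted. The last helper illustrates why higher layers see wider context windows when 5 × 5 kernels are stacked.

```python
def dense_weights_per_neuron(height, width, channels=1):
    """Weights needed by ONE fully connected neuron that sees every pixel."""
    return height * width * channels


def conv_kernel_weights(kernel_h, kernel_w, in_channels=1):
    """Weights in ONE shared convolution kernel (bias not counted)."""
    return kernel_h * kernel_w * in_channels


def stacked_receptive_field(kernel_size, num_layers):
    """Side length of the input window seen by one unit after stacking
    `num_layers` stride-1 convolutions with square kernels (assumption)."""
    return 1 + num_layers * (kernel_size - 1)


dense = dense_weights_per_neuron(100, 100)   # 100 x 100 image, one channel
conv = conv_kernel_weights(5, 5)             # one shared 5 x 5 kernel
fields = [stacked_receptive_field(5, n) for n in (1, 2, 3)]

print(dense, conv, fields)
```

Under these assumptions the dense neuron needs 10,000 weights and the shared kernel needs 25, while the window seen by one unit grows from 5 to 9 to 13 pixels per side over three stacked layers.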
Some applications of CNNs include:
- [image and video recognition](https://en.wikipedia.org/wiki/Computer_vision "Computer vision"),[\[9\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-Valueva_Nagornov_Lyakhov_Valuev_2020_pp._232%E2%80%93243-9)
- [recommender systems](https://en.wikipedia.org/wiki/Recommender_system "Recommender system"),[\[10\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-10)
- [image classification](https://en.wikipedia.org/wiki/Image_classification "Image classification"),
- [image segmentation](https://en.wikipedia.org/wiki/Image_segmentation "Image segmentation"),
- [medical image analysis](https://en.wikipedia.org/wiki/Medical_image_computing "Medical image computing"),
- [natural language processing](https://en.wikipedia.org/wiki/Natural_language_processing "Natural language processing"),[\[11\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-11)
- [brain–computer interfaces](https://en.wikipedia.org/wiki/Brain%E2%80%93computer_interface "Brain–computer interface"),[\[12\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-12) and
- financial [time series](https://en.wikipedia.org/wiki/Time_series "Time series").[\[13\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-Tsantekidis_7%E2%80%9312-13)
CNNs are also known as **shift invariant** or **space invariant artificial neural networks**, based on the shared-weight architecture of the [convolution](https://en.wikipedia.org/wiki/Convolution "Convolution") kernels or filters that slide along input features and provide translation-[equivariant](https://en.wikipedia.org/wiki/Equivariant_map "Equivariant map") responses known as feature maps.[\[14\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-:0-14)[\[15\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-:1-15) Counter-intuitively, most convolutional neural networks are not [invariant to translation](https://en.wikipedia.org/wiki/Translation_invariant "Translation invariant"), due to the downsampling operation they apply to the input.[\[16\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-:6-16)
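The difference between translation equivariance and translation invariance can be illustrated on a one-dimensional signal. This is a hedged sketch under stated assumptions: the helper names are hypothetical, circular (wrap-around) boundaries are used so that the shift identity holds exactly, and plain stride-2 subsampling stands in for the downsampling step.

```python
def cross_correlate_1d(signal, kernel):
    """Circular cross-correlation: the kernel slides along the signal and
    indices wrap around at the end (an assumption made for exactness)."""
    n = len(signal)
    return [
        sum(kernel[j] * signal[(i + j) % n] for j in range(len(kernel)))
        for i in range(n)
    ]


def shift(signal, k):
    """Circularly shift a list to the right by k positions."""
    k %= len(signal)
    return signal[-k:] + signal[:-k] if k else list(signal)


def downsample(signal, stride=2):
    """Keep every `stride`-th sample."""
    return signal[::stride]


signal = [0, 1, 3, 2, 0, 0, 0, 0]
kernel = [1, -1]

features = cross_correlate_1d(signal, kernel)
features_of_shifted = cross_correlate_1d(shift(signal, 1), kernel)

# Equivariance: shifting the input shifts the feature map by the same amount.
is_equivariant = features_of_shifted == shift(features, 1)

# After stride-2 downsampling, a one-sample input shift changes the values
# themselves, so the result is neither identical to nor a shifted copy of
# the original downsampled features.
pooled = downsample(features)
pooled_of_shifted = downsample(features_of_shifted)
is_invariant = pooled_of_shifted == pooled
is_shifted_copy = any(
    pooled_of_shifted == shift(pooled, k) for k in range(len(pooled))
)

print(is_equivariant, is_invariant, is_shifted_copy)
```

In this toy case the convolution alone is equivariant, but the downsampled output fails both checks, which is consistent with the statement that downsampling breaks translation invariance.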
[Feedforward neural networks](https://en.wikipedia.org/wiki/Feedforward_neural_network "Feedforward neural network") are usually fully connected networks, that is, each neuron in one [layer](https://en.wikipedia.org/wiki/Layer_\(deep_learning\) "Layer (deep learning)") is connected to all neurons in the next [layer](https://en.wikipedia.org/wiki/Layer_\(deep_learning\) "Layer (deep learning)"). The "full connectivity" of these networks makes them prone to [overfitting](https://en.wikipedia.org/wiki/Overfitting "Overfitting") data. Typical regularization methods, which aim to prevent overfitting, include penalizing parameters during training (such as weight decay) and trimming connectivity (skipped connections, dropout, etc.). Robust datasets also increase the probability that CNNs will learn the generalized principles that characterize a given dataset rather than the biases of a poorly populated set.[\[17\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-17)
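Two of the regularization methods named above, weight decay and dropout, can be sketched in a few lines. This is an illustrative sketch only: the function names, the learning rate, the decay coefficient and the "inverted dropout" rescaling are assumptions chosen for the example, not details given in the article.

```python
import random


def sgd_step_with_weight_decay(weights, grads, lr=0.1, decay=0.01):
    """One gradient step where an L2 penalty adds `decay * w` to each
    gradient, pulling every weight toward zero."""
    return [w - lr * (g + decay * w) for w, g in zip(weights, grads)]


def dropout(activations, drop_prob=0.5, rng=None):
    """Inverted dropout (assumed variant): zero each activation with
    probability `drop_prob` and rescale survivors by 1 / (1 - drop_prob)."""
    rng = rng or random.Random(0)
    keep = 1.0 - drop_prob
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


# With zero gradients, only the decay term acts and the weights shrink.
decayed = sgd_step_with_weight_decay([1.0, -2.0], [0.0, 0.0], lr=0.1, decay=0.5)

# Each activation is either dropped (0.0) or kept and rescaled (2.0).
dropped = dropout([1.0] * 8, drop_prob=0.5, rng=random.Random(42))

print(decayed, dropped)
```

Dropout is applied only during training in this formulation; at inference time the activations would be passed through unchanged.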
Convolutional networks were [inspired](https://en.wikipedia.org/wiki/Mathematical_biology "Mathematical biology") by [biological](https://en.wikipedia.org/wiki/Biological "Biological") processes[\[18\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-fukuneoscholar-18)[\[19\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-hubelwiesel1968-19)[\[20\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-intro-20)[\[21\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-robust_face_detection-21) in that the connectivity pattern between [neurons](https://en.wikipedia.org/wiki/Artificial_neuron "Artificial neuron") resembles the organization of the animal [visual cortex](https://en.wikipedia.org/wiki/Visual_cortex "Visual cortex"). Individual [cortical neurons](https://en.wikipedia.org/wiki/Cortical_neuron "Cortical neuron") respond to stimuli only in a restricted region of the [visual field](https://en.wikipedia.org/wiki/Visual_field "Visual field") known as the [receptive field](https://en.wikipedia.org/wiki/Receptive_field "Receptive field"). The receptive fields of different neurons partially overlap such that they cover the entire visual field.
CNNs use relatively little pre-processing compared to other [image classification algorithms](https://en.wikipedia.org/wiki/Image_classification "Image classification"). This means that the network learns to optimize the filters (or kernels) through automated learning, whereas in traditional algorithms these filters are [hand-engineered](https://en.wikipedia.org/wiki/Feature_engineering "Feature engineering"). This simplifies and automates the process, improving efficiency and scalability by removing human-intervention bottlenecks.
Comparison of the [LeNet](https://en.wikipedia.org/wiki/LeNet "LeNet") (1995) and [AlexNet](https://en.wikipedia.org/wiki/AlexNet "AlexNet") (2012) convolution, pooling and dense layers
A convolutional neural network consists of an input layer, [hidden layers](https://en.wikipedia.org/wiki/Artificial_neural_network#Organization "Artificial neural network") and an output layer. In a convolutional neural network, the hidden layers include one or more layers that perform convolutions. Typically this includes a layer that performs a [dot product](https://en.wikipedia.org/wiki/Dot_product "Dot product") of the convolution kernel with the layer's input matrix. This product is usually the [Frobenius inner product](https://en.wikipedia.org/wiki/Frobenius_inner_product "Frobenius inner product"), and its activation function is commonly [ReLU](https://en.wikipedia.org/wiki/Rectifier_\(neural_networks\) "Rectifier (neural networks)"). As the convolution kernel slides along the input matrix for the layer, the convolution operation generates a feature map, which in turn contributes to the input of the next layer. This is followed by other layers such as [pooling layers](https://en.wikipedia.org/wiki/Pooling_layer "Pooling layer"), fully connected layers, and normalization layers. In this respect, a convolutional neural network is closely related to a [matched filter](https://en.wikipedia.org/wiki/Matched_filter "Matched filter").[\[22\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-22)
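The sliding-kernel computation described above can be sketched in plain Python. This is a minimal illustration rather than a reference implementation: it assumes a single-channel input, stride 1 and no padding, and the function names, the 4 × 4 input and the 2 × 2 kernel are hypothetical values chosen for the example.

```python
def relu(x):
    """Rectified linear unit: max(0, x)."""
    return x if x > 0 else 0.0


def conv2d_valid(image, kernel, bias=0.0):
    """Slide `kernel` over `image` (stride 1, no padding) and return the feature map.

    Each output entry is the Frobenius inner product of the kernel with the
    image patch beneath it, plus a bias, passed through ReLU.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Element-wise product of kernel and patch, summed over all entries.
            total = sum(
                kernel[a][b] * image[i + a][j + b]
                for a in range(kh)
                for b in range(kw)
            )
            row.append(relu(total + bias))
        feature_map.append(row)
    return feature_map


# Hypothetical 4x4 input containing a vertical edge, and a 2x2 kernel
# that responds to a dark-to-bright transition from left to right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
kernel = [
    [-1, 1],
    [-1, 1],
]
feature_map = conv2d_valid(image, kernel)
print(feature_map)  # the middle column, where the edge lies, is the only non-zero one
```

The resulting 3 × 3 feature map is non-zero only where the kernel straddles the edge, which is the sense in which a filter "detects" a feature.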
### Convolutional layers
In a CNN, the input is a [tensor](https://en.wikipedia.org/wiki/Tensor_\(machine_learning\) "Tensor (machine learning)") with shape:
(number of inputs) × (input height) × (input width) × (input [channels](https://en.wikipedia.org/wiki/Channel_\(digital_image\) "Channel (digital image)"))
After passing through a convolutional layer, the image becomes abstracted to a feature map, also called an activation map, with shape:
(number of inputs) × (feature map height) × (feature map width) × (feature map [channels](https://en.wikipedia.org/wiki/Channel_\(digital_image\) "Channel (digital image)")).
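As a rough sketch of how the two shapes relate, the helper below computes the feature-map shape from the input shape. It assumes the commonly used output-size rule floor((H + 2P − K) / S) + 1 with a square kernel; the stride `S`, padding `P` and the function name are not introduced in the text above and are assumptions of this example.

```python
def conv_output_shape(input_shape, num_filters, kernel_size, stride=1, padding=0):
    """Return (inputs, height, width, channels) of the feature map.

    `input_shape` is (number of inputs, height, width, channels). The number of
    output channels equals the number of filters in the layer.
    """
    n, h, w, _ = input_shape
    out_h = (h + 2 * padding - kernel_size) // stride + 1
    out_w = (w + 2 * padding - kernel_size) // stride + 1
    return (n, out_h, out_w, num_filters)


# Hypothetical batch of eight 32x32 RGB images passed through 16 filters of size 5x5.
print(conv_output_shape((8, 32, 32, 3), num_filters=16, kernel_size=5))
print(conv_output_shape((8, 32, 32, 3), num_filters=16, kernel_size=5, padding=2))
```

With no padding the spatial size shrinks from 32 to 28, while a padding of 2 preserves it.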
Convolutional layers convolve the input and pass its result to the next layer. This is similar to the response of a neuron in the visual cortex to a specific stimulus.[\[23\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-deeplearning-23) Each convolutional neuron processes data only for its [receptive field](https://en.wikipedia.org/wiki/Receptive_field "Receptive field").
1D convolutional neural network feed forward example
Although [fully connected feedforward neural networks](https://en.wikipedia.org/wiki/Multilayer_perceptron "Multilayer perceptron") can be used to learn features and classify data, this architecture is generally impractical for larger inputs (e.g., high-resolution images), which would require massive numbers of neurons because each pixel is a relevant input feature. A fully connected layer for an image of size 100 × 100 has 10,000 weights for *each* neuron in the second layer. Convolution reduces the number of free parameters, allowing the network to be deeper.[\[7\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-auto1-7) For example, using a 5 × 5 tiling region, each with the same shared weights, requires only 25 learnable weights. Using shared weights means there are many fewer parameters, which helps avoid the vanishing gradients and exploding gradients problems seen during [backpropagation](https://en.wikipedia.org/wiki/Backpropagation "Backpropagation") in earlier neural networks.[\[3\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-auto3-3)[\[4\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-auto2-4)
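The arithmetic behind this comparison can be checked with a short sketch. The function names are hypothetical, biases are left out of the count, and a single-channel image is assumed by default.

```python
def dense_weights_per_neuron(height, width, channels=1):
    """Weights feeding one fully connected neuron: one per input value."""
    return height * width * channels


def shared_kernel_weights(kernel_height, kernel_width, channels=1):
    """Weights in one convolution kernel, reused at every position of the input."""
    return kernel_height * kernel_width * channels


dense = dense_weights_per_neuron(100, 100)   # 100 x 100 single-channel image
shared = shared_kernel_weights(5, 5)         # 5 x 5 tiling region
print(dense, shared, dense // shared)
```

For these sizes, one shared kernel uses 400 times fewer weights than a single fully connected neuron.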
To speed processing, standard convolutional layers can be replaced by depthwise separable convolutional layers,[\[24\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-24) which are based on a depthwise convolution followed by a pointwise convolution. The *depthwise convolution* is a spatial convolution applied independently over each channel of the input tensor, while the *pointwise convolution* is a standard convolution restricted to the use of 1 × 1 kernels.
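A minimal sketch of the two stages, written in plain Python for a channels-first input, is given below. It assumes stride 1, no padding and square kernels; the function names and the small tensors are hypothetical and serve only to illustrate the idea.

```python
def conv2d_single(channel, kernel):
    """Valid, stride-1 cross-correlation of one HxW channel with one k x k kernel."""
    k = len(kernel)
    out_h, out_w = len(channel) - k + 1, len(channel[0]) - k + 1
    return [
        [
            sum(kernel[a][b] * channel[i + a][j + b] for a in range(k) for b in range(k))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def depthwise_conv(x, kernels):
    """Apply one spatial kernel per input channel, independently of the others."""
    return [conv2d_single(ch, k) for ch, k in zip(x, kernels)]


def pointwise_conv(x, weights):
    """Mix channels with 1 x 1 kernels: weights[o][c] maps input channel c to output o."""
    h, w = len(x[0]), len(x[0][0])
    return [
        [[sum(w_o[c] * x[c][i][j] for c in range(len(x))) for j in range(w)] for i in range(h)]
        for w_o in weights
    ]


# Hypothetical 2-channel 3x3 input, 2x2 depthwise kernels, 3 output channels.
x = [
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    [[1, 0, 1], [0, 1, 0], [1, 0, 1]],
]
depthwise_kernels = [
    [[1, 0], [0, 1]],   # channel 0: sum of the main diagonal of each patch
    [[1, 1], [1, 1]],   # channel 1: sum of each patch
]
pointwise_weights = [[1, 0], [0, 1], [1, -1]]

y = pointwise_conv(depthwise_conv(x, depthwise_kernels), pointwise_weights)
print(y)
```

In this example the separable layer uses 2·4 + 2·3 = 14 weights, whereas a standard convolution with the same kernel size and channel counts would use 2·2·2·3 = 24.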
Convolutional networks may include local and/or global pooling layers along with traditional convolutional layers. Pooling layers reduce the dimensions of data by combining the outputs of neuron clusters at one layer into a single neuron in the next layer. Local pooling combines small clusters; tiling sizes such as 2 × 2 are commonly used. Global pooling acts on all the neurons of the feature map.[\[25\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-flexible-25)[\[26\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-26) There are two common types of pooling in popular use: max and average. *Max pooling* uses the maximum value of each local cluster of neurons in the feature map,[\[27\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-Yamaguchi111990-27)[\[28\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-mcdns-28) while *average pooling* takes the average value.
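Local and global pooling, in both the max and average variants, can be sketched as follows. The example assumes non-overlapping windows (stride equal to the window size) on a single feature map; the function names and the 4 × 4 feature map are hypothetical.

```python
def pool2d(feature_map, size=2, mode="max"):
    """Local pooling over non-overlapping size x size windows."""
    if mode not in ("max", "average"):
        raise ValueError("mode must be 'max' or 'average'")
    out = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            window = [feature_map[i + a][j + b] for a in range(size) for b in range(size)]
            row.append(max(window) if mode == "max" else sum(window) / len(window))
        out.append(row)
    return out


def global_pool(feature_map, mode="max"):
    """Global pooling: reduce the whole feature map to a single value."""
    values = [v for row in feature_map for v in row]
    return max(values) if mode == "max" else sum(values) / len(values)


fm = [
    [1, 3, 2, 0],
    [5, 7, 4, 6],
    [9, 1, 8, 2],
    [3, 5, 0, 4],
]
print(pool2d(fm, 2, "max"))
print(pool2d(fm, 2, "average"))
print(global_pool(fm, "max"), global_pool(fm, "average"))
```

A 2 × 2 window halves each spatial dimension, so the 4 × 4 map becomes 2 × 2, while global pooling reduces it to one number per feature map.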
### Fully connected layers
Fully connected layers connect every neuron in one layer to every neuron in another layer, as in a traditional [multilayer perceptron](https://en.wikipedia.org/wiki/Multilayer_perceptron "Multilayer perceptron") neural network (MLP). Each neuron in the fully connected layer receives input from all the neurons in the previous layer. These inputs are weighted, summed together with the neuron's bias, and then passed through an activation function to perform a nonlinear transformation, generating the output. To classify images, the feature maps are flattened into a vector and passed through a fully connected layer.
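The computation of a fully connected layer on a flattened feature map might look like the sketch below. The weights, biases and the choice of ReLU as activation are arbitrary values picked for illustration, and the function names are hypothetical.

```python
def flatten(feature_map):
    """Turn a 2-D feature map into a flat vector, row by row."""
    return [v for row in feature_map for v in row]


def dense_layer(inputs, weights, biases, activation):
    """Each output neuron: activation(weighted sum of all inputs + its bias)."""
    return [
        activation(sum(w * x for w, x in zip(w_row, inputs)) + b)
        for w_row, b in zip(weights, biases)
    ]


def relu(z):
    return max(0.0, z)


inputs = flatten([[1, 2], [3, 4]])            # 4 inputs
weights = [[1, 0, 0, 1], [-1, -1, -1, -1]]    # 2 neurons, one weight per input
biases = [0.5, 1.0]
outputs = dense_layer(inputs, weights, biases, relu)
print(outputs)
```

Every output neuron has one weight per input value, which is why the number of weights grows with the size of the flattened input.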
In neural networks, each neuron receives input from some number of locations in the previous layer. In a convolutional layer, each neuron receives input from only a restricted area of the previous layer called the neuron's *receptive field*. Typically the area is a square (e.g. 5 by 5 neurons). In a fully connected layer, by contrast, the receptive field is the *entire previous layer*. In a stack of convolutional layers, each neuron therefore takes input from a larger area of the original input than the neurons in the preceding layers do. This is because the convolution is applied repeatedly, and each application takes into account the value of a pixel as well as its surrounding pixels. When using dilated layers, the number of pixels in the receptive field remains constant, but the field is more sparsely populated as its dimensions grow when combining the effect of several layers.
To manipulate the receptive field size as desired, there are some alternatives to the standard convolutional layer. For example, atrous or dilated convolution[\[29\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-29)[\[30\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-30) expands the receptive field size without increasing the number of parameters by interleaving visible and blind regions. Moreover, a single dilated convolutional layer can comprise filters with multiple dilation ratios,[\[31\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-31) thus having a variable receptive field size.
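The growth of the receptive field across stacked layers, with and without dilation, can be estimated with a short sketch. It assumes the usual recurrence in which each layer adds (kernel size − 1) × dilation × (product of the strides of the earlier layers) to the receptive field extent; the function name and the layer configurations are hypothetical.

```python
def receptive_field(layers):
    """Extent (along one axis) of the input region seen by one output neuron.

    `layers` is a list of (kernel_size, stride, dilation) tuples, ordered from
    the input towards the output.
    """
    extent, jump = 1, 1
    for kernel_size, stride, dilation in layers:
        extent += (kernel_size - 1) * dilation * jump
        jump *= stride   # spacing, in input pixels, between neighbouring neurons
    return extent


plain = [(3, 1, 1), (3, 1, 1), (3, 1, 1)]     # three ordinary 3x3 layers
dilated = [(3, 1, 1), (3, 1, 2), (3, 1, 4)]   # same kernels, dilation 1, 2, 4
print(receptive_field(plain), receptive_field(dilated))
```

Both stacks have the same number of weights per kernel, yet the dilated stack covers an extent of 15 pixels instead of 7.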
Each neuron in a neural network computes an output value by applying a specific function to the input values received from the receptive field in the previous layer. The function that is applied to the input values is determined by a vector of weights and a bias (typically real numbers). Learning consists of iteratively adjusting these biases and weights.
The vectors of weights and biases are called *filters* and represent particular [features](https://en.wikipedia.org/wiki/Feature_\(machine_learning\) "Feature (machine learning)") of the input (e.g., a particular shape). A distinguishing feature of CNNs is that many neurons can share the same filter. This reduces the [memory footprint](https://en.wikipedia.org/wiki/Memory_footprint "Memory footprint") because a single bias and a single vector of weights are used across all receptive fields that share that filter, as opposed to each receptive field having its own bias and vector weighting.[\[32\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-LeCun-32)
A deconvolutional neural network is essentially the reverse of a CNN. It consists of deconvolutional layers and unpooling layers.[\[33\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-33)
A deconvolutional layer is the transpose of a convolutional layer. Specifically, a convolutional layer can be written as a multiplication with a matrix, and a deconvolutional layer is multiplication with the transpose of that matrix.[\[34\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-34)
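The matrix view can be made concrete for a one-dimensional signal. The sketch below builds the matrix of a stride-1, unpadded convolution (computed as cross-correlation) and multiplies by its transpose; the helper names and the kernel values are hypothetical.

```python
def conv_matrix(kernel, n):
    """Matrix C such that C @ x is the valid, stride-1 correlation of x (length n)."""
    k = len(kernel)
    rows = n - k + 1
    return [[kernel[j - i] if 0 <= j - i < k else 0 for j in range(n)] for i in range(rows)]


def transpose(matrix):
    return [list(column) for column in zip(*matrix)]


def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


kernel = [1, 2, 3]
C = conv_matrix(kernel, 5)                    # 3 x 5: maps length 5 to length 3

y = matvec(C, [1, 0, 0, 0, 1])                # convolutional layer: 5 -> 3
z = matvec(transpose(C), [1, 0, 0])           # transposed layer:    3 -> 5
print(y, z)
```

Multiplying by the transpose maps a length-3 vector back to length 5, so it restores the shape of the original input, but it is not in general the inverse of the convolution.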
An unpooling layer expands the layer. The max-unpooling layer is the simplest, as it simply copies each entry multiple times. For example, a 2-by-2 max-unpooling layer is $[x] \mapsto \begin{bmatrix} x & x \\ x & x \end{bmatrix}$.
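The copying rule described above can be sketched in a few lines. The function name is hypothetical, and the example implements only the simple entry-copying scheme from the text, not variants that record the positions of the original maxima.

```python
def unpool_by_copying(x, size=2):
    """Expand a 2-D map by copying every entry into a size x size block."""
    out = []
    for row in x:
        expanded = [value for value in row for _ in range(size)]
        out.extend(list(expanded) for _ in range(size))
    return out


print(unpool_by_copying([[1, 2], [3, 4]]))
```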
Deconvolution layers are used in image generators. By default, they create periodic checkerboard artifacts, which can be avoided by upscaling and then convolving.[\[35\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-35)
CNNs are often compared to the way the brain achieves vision processing in living [organisms](https://en.wikipedia.org/wiki/Organisms "Organisms").[\[36\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-36)
### Receptive fields in the visual cortex
Work by [Hubel](https://en.wikipedia.org/wiki/David_H._Hubel "David H. Hubel") and [Wiesel](https://en.wikipedia.org/wiki/Torsten_Wiesel "Torsten Wiesel") in the 1950s and 1960s showed that cat [visual cortices](https://en.wikipedia.org/wiki/Visual_cortex "Visual cortex") contain neurons that individually respond to small regions of the [visual field](https://en.wikipedia.org/wiki/Visual_field "Visual field"). Provided the eyes are not moving, the region of visual space within which visual stimuli affect the firing of a single neuron is known as its [receptive field](https://en.wikipedia.org/wiki/Receptive_field "Receptive field").[\[37\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-:4-37) Neighboring cells have similar and overlapping receptive fields. Receptive field size and location vary systematically across the cortex to form a complete map of visual space.\[*[citation needed](https://en.wikipedia.org/wiki/Wikipedia:Citation_needed "Wikipedia:Citation needed")*\] The cortex in each hemisphere represents the contralateral [visual field](https://en.wikipedia.org/wiki/Visual_field "Visual field").\[*[citation needed](https://en.wikipedia.org/wiki/Wikipedia:Citation_needed "Wikipedia:Citation needed")*\]
Their 1968 paper identified two basic visual cell types in the brain:[\[19\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-hubelwiesel1968-19)
- [simple cells](https://en.wikipedia.org/wiki/Simple_cell "Simple cell"), whose output is maximized by straight edges having particular orientations within their receptive field
- [complex cells](https://en.wikipedia.org/wiki/Complex_cell "Complex cell"), which have larger [receptive fields](https://en.wikipedia.org/wiki/Receptive_field "Receptive field"), whose output is insensitive to the exact position of the edges in the field.
Hubel and Wiesel also proposed a cascading model of these two types of cells for use in pattern recognition tasks.[\[38\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-38)[\[37\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-:4-37)
### Fukushima's analog threshold elements in a vision model
In 1969, [Kunihiko Fukushima](https://en.wikipedia.org/wiki/Kunihiko_Fukushima "Kunihiko Fukushima") introduced a multilayer visual feature detection network, inspired by the above-mentioned work of Hubel and Wiesel, in which "All the elements in one layer have the same set of interconnecting coefficients; the arrangement of the elements and their interconnections are all homogeneous over a given layer." This is the essential core of a convolutional network, but the weights were not trained. In the same paper, Fukushima also introduced the [ReLU](https://en.wikipedia.org/wiki/Rectifier_\(neural_networks\) "Rectifier (neural networks)") (rectified linear unit) [activation function](https://en.wikipedia.org/wiki/Activation_function "Activation function").[\[39\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-Fukushima1969-39)[\[40\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-DLhistory-40)
### Neocognitron, origin of the trainable CNN architecture
The "[neocognitron](https://en.wikipedia.org/wiki/Neocognitron "Neocognitron")"[\[18\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-fukuneoscholar-18) was introduced by Fukushima in 1980.[\[20\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-intro-20)[\[28\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-mcdns-28)[\[1\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-LeCun2015-1) The neocognitron introduced the two basic types of layers:
- "S-layer": a shared-weights receptive-field layer, later known as a convolutional layer, which contains units whose receptive fields cover a patch of the previous layer. A shared-weights receptive-field group (a "plane" in neocognitron terminology) is often called a filter, and a layer typically has several such filters.
- "C-layer": a downsampling layer that contains units whose receptive fields cover patches of previous convolutional layers. Such a unit typically computes a weighted average of the activations of the units in its patch, applies inhibition (divisive normalization) pooled from a somewhat larger patch and across different filters in a layer, and applies a saturating activation function. The patch weights are nonnegative and are not trainable in the original neocognitron. The downsampling and competitive inhibition help to classify features and objects in visual scenes even when the objects are shifted.
Several [supervised](https://en.wikipedia.org/wiki/Supervised_learning "Supervised learning") and [unsupervised learning](https://en.wikipedia.org/wiki/Unsupervised_learning "Unsupervised learning") algorithms have been proposed over the decades to train the weights of a neocognitron.[\[18\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-fukuneoscholar-18) Today, however, the CNN architecture is usually trained through [backpropagation](https://en.wikipedia.org/wiki/Backpropagation "Backpropagation").
Fukushima's ReLU activation function was not used in his neocognitron since all the weights were nonnegative; lateral inhibition was used instead. The rectifier has become a very popular activation function for CNNs and [deep neural networks](https://en.wikipedia.org/wiki/Deep_learning "Deep learning") in general.[\[41\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-41)
### Convolution in time
The term "convolution" first appears in neural networks in a paper by Toshiteru Homma, Les Atlas, and Robert Marks II at the first [Conference on Neural Information Processing Systems](https://en.wikipedia.org/wiki/Conference_on_Neural_Information_Processing_Systems "Conference on Neural Information Processing Systems") in 1987. Their paper replaced multiplication with convolution in time, inherently providing shift invariance, motivated by and connecting more directly to the [signal-processing concept of a filter](https://en.wikipedia.org/wiki/Linear_shift-invariant_filter "Linear shift-invariant filter"), and demonstrated it on a speech recognition task.[\[8\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-homma-8) They also pointed out that as a data-trainable system, convolution is essentially equivalent to correlation since reversal of the weights does not affect the final learned function ("For convenience, we denote \* as correlation instead of convolution. Note that convolving a(t) with b(t) is equivalent to correlating a(-t) with b(t).").[\[8\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-homma-8) Modern CNN implementations typically do correlation and call it convolution, for convenience, as they did here.
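The relationship between convolution and correlation noted above can be checked numerically. The sketch below uses one-dimensional signals with no padding; the function names and the sample values are hypothetical.

```python
def correlate_valid(signal, kernel):
    """Cross-correlation: slide the kernel over the signal without flipping it."""
    k = len(kernel)
    return [
        sum(kernel[a] * signal[i + a] for a in range(k))
        for i in range(len(signal) - k + 1)
    ]


def convolve_valid(signal, kernel):
    """Convolution: the kernel is traversed in reverse order at each position."""
    k = len(kernel)
    return [
        sum(kernel[a] * signal[i + k - 1 - a] for a in range(k))
        for i in range(len(signal) - k + 1)
    ]


signal = [1, 2, 3, 4, 5]
kernel = [1, 0, -1]

print(convolve_valid(signal, kernel))            # true convolution
print(correlate_valid(signal, kernel))           # correlation, same kernel
print(correlate_valid(signal, kernel[::-1]))     # correlation, reversed kernel
```

Correlating with the reversed kernel reproduces the convolution exactly, so a network that learns its kernels can represent the same functions with either operation.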
### Time delay neural networks
The [time delay neural network](https://en.wikipedia.org/wiki/Time_delay_neural_network "Time delay neural network") (TDNN) was introduced in 1987 by [Alex Waibel](https://en.wikipedia.org/wiki/Alex_Waibel "Alex Waibel") et al. for phoneme recognition and was an early convolutional network exhibiting shift-invariance.[\[42\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-Waibel1987-42) A TDNN is a 1-D convolutional neural net where the convolution is performed along the time axis of the data. It is the first CNN utilizing weight sharing in combination with training by gradient descent, using [backpropagation](https://en.wikipedia.org/wiki/Backpropagation "Backpropagation").[\[43\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-speechsignal-43) Thus, while also using a pyramidal structure as in the neocognitron, it performed a global optimization of the weights instead of a local one.[\[42\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-Waibel1987-42)
TDNNs are convolutional networks that share weights along the temporal dimension.[\[44\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-44) They allow speech signals to be processed time-invariantly. In 1990 Hampshire and Waibel introduced a variant that performs a two-dimensional convolution.[\[45\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-Hampshire1990-45) Since these TDNNs operated on spectrograms, the resulting phoneme recognition system was invariant to both time and frequency shifts, as with images processed by a neocognitron.
TDNNs improved the performance of far-distance speech recognition.[\[46\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-Ko2017-46)
### Image recognition with CNNs trained by gradient descent
Denker et al. (1989) designed a 2-D CNN system to recognize hand-written [ZIP Code](https://en.wikipedia.org/wiki/ZIP_Code "ZIP Code") numbers.[\[47\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-47) However, the lack of an efficient training method to determine the [kernel](https://en.wikipedia.org/wiki/Kernel_\(image_processing\) "Kernel (image processing)") coefficients of the involved convolutions meant that all the coefficients had to be laboriously hand-designed.[\[48\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-:2-48)
Following the advances in the training of 1-D CNNs by Waibel et al. (1987), [Yann LeCun](https://en.wikipedia.org/wiki/Yann_LeCun "Yann LeCun") et al. (1989)[\[48\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-:2-48) used back-propagation to learn the convolution kernel coefficients directly from images of hand-written numbers. Learning was thus fully automatic, performed better than manual coefficient design, and was suited to a broader range of image recognition problems and image types. Wei Zhang et al. (1988)[\[14\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-:0-14)[\[15\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-:1-15) used back-propagation to train the convolution kernels of a CNN for alphabet recognition. The model was called a shift-invariant pattern recognition neural network before the name CNN was coined in the early 1990s. Wei Zhang et al. also applied the same CNN without the last fully connected layer for medical image object segmentation (1991)[\[49\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-:wz1991-49) and breast cancer detection in mammograms (1994).[\[50\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-:wz1994-50)
This approach became a foundation of modern [computer vision](https://en.wikipedia.org/wiki/Computer_vision "Computer vision").
In 1990 Yamaguchi et al. introduced the concept of max pooling, a fixed filtering operation that calculates and propagates the maximum value of a given region. They did so by combining TDNNs with max pooling to realize a speaker-independent isolated word recognition system.[\[27\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-Yamaguchi111990-27) In their system they used several TDNNs per word, one for each [syllable](https://en.wikipedia.org/wiki/Syllable "Syllable"). The results of each TDNN over the input signal were combined using max pooling and the outputs of the pooling layers were then passed on to networks performing the actual word classification.
In a variant of the neocognitron called the *cresceptron*, instead of using Fukushima's spatial averaging with inhibition and saturation, J. Weng et al. in 1993 used max pooling, where a downsampling unit computes the maximum of the activations of the units in its patch,[\[51\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-weng1993-51) introducing this method into the vision field.
Max pooling is often used in modern CNNs.[\[52\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-schdeepscholar-52)
LeNet-5, a pioneering 7-level convolutional network by [LeCun](https://en.wikipedia.org/wiki/Yann_LeCun "Yann LeCun") et al. in 1995,[\[53\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-lecun95-53) classifies hand-written numbers on [checks](https://en.wikipedia.org/wiki/Cheque "Cheque") digitized in 32×32 pixel images. Processing higher-resolution images requires larger networks with more convolutional layers, so this technique is constrained by the availability of computing resources.
It was superior to other commercial courtesy amount reading systems (as of 1995). The system was integrated in [NCR](https://en.wikipedia.org/wiki/NCR_Voyix "NCR Voyix")'s check reading systems, and has been fielded in several American banks since June 1996, reading millions of checks per day.[\[54\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-54)
### Shift-invariant neural network
A shift-invariant neural network was proposed by Wei Zhang et al. for image character recognition in 1988.[\[14\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-:0-14)[\[15\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-:1-15) It is a Neocognitron modified to keep only the convolutional interconnections between the image feature layers and the last fully connected layer. The model was trained with back-propagation. The training algorithm was further refined in 1991[\[55\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-55) to improve its generalization ability. The model architecture was modified by removing the last fully connected layer and applied to medical image segmentation (1991)[\[49\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-:wz1991-49) and automatic detection of breast cancer in [mammograms](https://en.wikipedia.org/wiki/Mammography "Mammography") (1994).[\[50\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-:wz1994-50)
A different convolution-based design was proposed in 1988[\[56\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-56) for application to decomposition of one-dimensional [electromyography](https://en.wikipedia.org/wiki/Electromyography "Electromyography") convolved signals via de-convolution. This design was modified in 1989 to other de-convolution-based designs.[\[57\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-57)[\[58\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-58)
### GPU implementations
Although CNNs were invented in the 1980s, their breakthrough in the 2000s required fast implementations on [graphics processing units](https://en.wikipedia.org/wiki/Graphics_processing_unit "Graphics processing unit") (GPUs).
In 2004, it was shown by K. S. Oh and K. Jung that standard neural networks can be greatly accelerated on GPUs. Their implementation was 20 times faster than an equivalent implementation on [CPU](https://en.wikipedia.org/wiki/CPU "CPU").[\[59\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-59) In 2005, another paper also emphasised the value of [GPGPU](https://en.wikipedia.org/wiki/GPGPU "GPGPU") for [machine learning](https://en.wikipedia.org/wiki/Machine_learning "Machine learning").[\[60\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-60)
The first GPU-implementation of a CNN was described in 2006 by K. Chellapilla et al. Their implementation was 4 times faster than an equivalent implementation on CPU.[\[61\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-61) In the same period, GPUs were also used for unsupervised training of [deep belief networks](https://en.wikipedia.org/wiki/Deep_belief_network "Deep belief network").[\[62\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-62)[\[63\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-63)[\[64\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-64)[\[65\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-LSD_1-65)
In 2010, Dan Ciresan et al. at [IDSIA](https://en.wikipedia.org/wiki/IDSIA "IDSIA") trained deep feedforward networks on GPUs.[\[66\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-66) In 2011, they extended this to CNNs, achieving a 60-fold speedup over training on a CPU.[\[25\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-flexible-25) In 2011, the network won an image recognition contest in which it achieved superhuman performance for the first time.[\[67\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-67) They then won more competitions and achieved state of the art on several benchmarks.[\[68\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-68)[\[52\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-schdeepscholar-52)[\[28\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-mcdns-28)
Subsequently, [AlexNet](https://en.wikipedia.org/wiki/AlexNet "AlexNet"), a similar GPU-based CNN by Alex Krizhevsky et al., won the [ImageNet Large Scale Visual Recognition Challenge](https://en.wikipedia.org/wiki/ImageNet_Large_Scale_Visual_Recognition_Challenge "ImageNet Large Scale Visual Recognition Challenge") 2012.[\[69\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-:02-69) It was an early catalytic event for the [AI boom](https://en.wikipedia.org/wiki/AI_boom "AI boom").
Compared to the training of CNNs using [GPUs](https://en.wikipedia.org/wiki/GPU "GPU"), CPU-based approaches have received little attention. Viebke et al. (2019) parallelized CNNs using the thread- and [SIMD](https://en.wikipedia.org/wiki/SIMD "SIMD")\-level parallelism available on the [Intel Xeon Phi](https://en.wikipedia.org/wiki/Xeon_Phi "Xeon Phi").[\[70\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-70)[\[71\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-71)
## Distinguishing features
In the past, traditional [multilayer perceptron](https://en.wikipedia.org/wiki/Multilayer_perceptron "Multilayer perceptron") (MLP) models were used for image recognition.\[*[example needed](https://en.wikipedia.org/wiki/Wikipedia:AUDIENCE "Wikipedia:AUDIENCE")*\] However, the full connectivity between nodes caused the [curse of dimensionality](https://en.wikipedia.org/wiki/Curse_of_dimensionality "Curse of dimensionality"), and was computationally intractable with higher-resolution images. A 1000×1000-pixel image with [RGB color](https://en.wikipedia.org/wiki/RGB_color_model "RGB color model") channels requires 3 million weights per fully connected neuron, which is too many to process efficiently at scale.
CNN layers arranged in 3 dimensions
For example, in [CIFAR-10](https://en.wikipedia.org/wiki/CIFAR-10 "CIFAR-10"), images are only of size 32×32×3 (32 wide, 32 high, 3 color channels), so a single fully connected neuron in the first hidden layer of a regular neural network would have 32\*32\*3 = 3,072 weights. A 200×200 image, however, would lead to neurons that have 200\*200\*3 = 120,000 weights.
Also, such network architecture does not take into account the spatial structure of data, treating input pixels which are far apart in the same way as pixels that are close together. This ignores [locality of reference](https://en.wikipedia.org/wiki/Locality_of_reference "Locality of reference") in data with a grid-topology (such as images), both computationally and semantically. Thus, full connectivity of neurons is wasteful for purposes such as image recognition that are dominated by [spatially local](https://en.wikipedia.org/wiki/Spatial_locality "Spatial locality") input patterns.
Convolutional neural networks are variants of multilayer perceptrons, designed to emulate the behavior of a [visual cortex](https://en.wikipedia.org/wiki/Visual_cortex "Visual cortex"). These models mitigate the challenges posed by the MLP architecture by exploiting the strong spatially local correlation present in natural images. As opposed to MLPs, CNNs have the following distinguishing features:
- 3D volumes of neurons. The layers of a CNN have neurons arranged in [3 dimensions](https://en.wikipedia.org/wiki/Three-dimensional_space "Three-dimensional space"): width, height and depth.[\[72\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-72) Each neuron inside a convolutional layer is connected to only a small region of the layer before it, called a receptive field. Distinct types of layers, both locally and completely connected, are stacked to form a CNN architecture.
- Local connectivity: following the concept of receptive fields, CNNs exploit spatial locality by enforcing a local connectivity pattern between neurons of adjacent layers. The architecture thus ensures that the learned "filters" produce the strongest response to a spatially local input pattern. Stacking many such layers leads to nonlinear filters that become increasingly global (i.e. responsive to a larger region of pixel space) so that the network first creates representations of small parts of the input, then from them assembles representations of larger areas.
- Shared weights: In CNNs, each filter is replicated across the entire visual field. These replicated units share the same parameterization (weight vector and bias) and form a feature map. This means that all the neurons in a given feature map respond to the same feature within their specific receptive field. Replicating units in this way makes the resulting activation map [equivariant](https://en.wikipedia.org/wiki/Equivariant_map "Equivariant map") under shifts of the locations of input features in the visual field, i.e. it grants translational equivariance, provided that the layer has a stride of one.[\[73\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-:5-73)
- Pooling: In a CNN's [pooling layers](https://en.wikipedia.org/wiki/Pooling_layer "Pooling layer"), feature maps are divided into rectangular sub-regions, and the features in each rectangle are independently down-sampled to a single value, commonly by taking their average or maximum value. In addition to reducing the sizes of feature maps, the pooling operation grants a degree of local [translational invariance](https://en.wikipedia.org/wiki/Translational_symmetry "Translational symmetry") to the features contained therein, allowing the CNN to be more robust to variations in their positions.[\[16\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-:6-16)
Together, these properties allow CNNs to achieve better generalization on [vision problems](https://en.wikipedia.org/wiki/Computer_vision "Computer vision"). Weight sharing dramatically reduces the number of [free parameters](https://en.wikipedia.org/wiki/Free_parameter "Free parameter") learned, thus lowering the memory requirements for running the network and allowing the training of larger, more powerful networks.
A CNN architecture is formed by a stack of distinct layers that transform the input volume into an output volume (e.g. holding the class scores) through a differentiable function. A few distinct types of layers are commonly used. These are further discussed below.
Neurons of a convolutional layer (blue), connected to their receptive field (red)
### Convolutional layer
A worked example of performing a convolution. The convolution has stride 1, zero-padding, with kernel size 3-by-3. The convolution kernel is a [discrete Laplacian operator](https://en.wikipedia.org/wiki/Discrete_Laplace_operator "Discrete Laplace operator").
The convolutional layer is the core building block of a CNN. The layer's parameters consist of a set of learnable filters (or [kernels](https://en.wikipedia.org/wiki/Kernel_\(image_processing\) "Kernel (image processing)")), which have a small receptive field, but extend through the full depth of the input volume. During the forward pass, each filter is [convolved](https://en.wikipedia.org/wiki/Convolution "Convolution") across the width and height of the input volume, computing the [dot product](https://en.wikipedia.org/wiki/Dot_product "Dot product") between the filter entries and the input, producing a 2-dimensional [activation map](https://en.wikipedia.org/wiki/Activation_function "Activation function") of that filter. As a result, the network learns filters that activate when it detects some specific type of [feature](https://en.wikipedia.org/wiki/Feature_\(machine_learning\) "Feature (machine learning)") at some spatial position in the input.[\[74\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-G%C3%A9ron_Hands-on_ML_2019-74)[\[nb 1\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-75)
Stacking the activation maps for all filters along the depth dimension forms the full output volume of the convolution layer. Every entry in the output volume can thus also be interpreted as an output of a neuron that looks at a small region in the input. Each entry in an activation map uses the same set of parameters that define the filter.
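The forward pass described above can be sketched in plain Python. This is a minimal illustration, assuming stride 1 and no padding; the names `conv2d_single_filter` and `conv_layer` are hypothetical, and, as in most deep learning code, the filter is not flipped (strictly a cross-correlation).

```python
from typing import List

Volume = List[List[List[float]]]  # indexed as [channel][row][col]


def conv2d_single_filter(x: Volume, w: Volume, bias: float = 0.0) -> List[List[float]]:
    """Slide one filter over x (stride 1, no padding); return its activation map."""
    depth, height, width = len(x), len(x[0]), len(x[0][0])
    k_h, k_w = len(w[0]), len(w[0][0])
    assert len(w) == depth, "the filter must span the full input depth"
    out = []
    for i in range(height - k_h + 1):
        row = []
        for j in range(width - k_w + 1):
            # Dot product between the filter entries and the patch under them.
            acc = bias
            for c in range(depth):
                for di in range(k_h):
                    for dj in range(k_w):
                        acc += w[c][di][dj] * x[c][i + di][j + dj]
            row.append(acc)
        out.append(row)
    return out


def conv_layer(x: Volume, filters: List[Volume], biases: List[float]) -> Volume:
    """Stack one activation map per filter along the depth dimension."""
    return [conv2d_single_filter(x, w, b) for w, b in zip(filters, biases)]


# One-channel 4x4 input whose values are the squared row index.
image = [[[0, 0, 0, 0], [1, 1, 1, 1], [4, 4, 4, 4], [9, 9, 9, 9]]]
laplacian = [[[0, 1, 0], [1, -4, 1], [0, 1, 0]]]  # discrete Laplacian kernel
output = conv_layer(image, [laplacian], [0.0])
print(output)  # a single 2x2 activation map
```

Every entry of the activation map is computed with the same nine weights, which is the parameter sharing the paragraph refers to.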
[Self-supervised learning](https://en.wikipedia.org/wiki/Self-supervised_learning "Self-supervised learning") has been adapted for use in convolutional layers by using sparse patches with a high-mask ratio and a global response normalization layer.\[*[citation needed](https://en.wikipedia.org/wiki/Wikipedia:Citation_needed "Wikipedia:Citation needed")*\]
Typical CNN architecture
When dealing with high-dimensional inputs such as images, it is impractical to connect neurons to all neurons in the previous volume because such a network architecture does not take the spatial structure of the data into account. Convolutional networks exploit spatially local correlation by enforcing a [sparse local connectivity](https://en.wikipedia.org/wiki/Sparse_network "Sparse network") pattern between neurons of adjacent layers: each neuron is connected to only a small region of the input volume.
The extent of this connectivity is a [hyperparameter](https://en.wikipedia.org/wiki/Hyperparameter_optimization "Hyperparameter optimization") called the [receptive field](https://en.wikipedia.org/wiki/Receptive_field "Receptive field") of the neuron. The connections are [local in space](https://en.wikipedia.org/wiki/Spatial_locality "Spatial locality") (along width and height), but always extend along the entire depth of the input volume. Such an architecture ensures that the learned filters produce the strongest response to a spatially local input pattern.[\[75\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-76)
#### Spatial arrangement
Three [hyperparameters](https://en.wikipedia.org/wiki/Hyperparameter_\(machine_learning\) "Hyperparameter (machine learning)") control the size of the output volume of the convolutional layer: the depth, [stride](https://en.wikipedia.org/wiki/Stride_of_an_array "Stride of an array"), and padding size:
- The *depth* of the output volume controls the number of neurons in a layer that connect to the same region of the input volume. These neurons learn to activate for different features in the input. For example, if the first convolutional layer takes the raw image as input, then different neurons along the depth dimension may activate in the presence of various oriented edges, or blobs of color.
- *Stride* controls how depth columns around the width and height are allocated. If the stride is 1, then we move the filters one pixel at a time. This leads to heavily [overlapping](https://en.wikipedia.org/wiki/Intersection_\(set_theory\) "Intersection (set theory)") receptive fields between the columns, and to large output volumes. For any integer $S > 0$, a stride *S* means that the filter is translated *S* units at a time per output. In practice, $S \geq 3$ is rare. A greater stride means smaller overlap of receptive fields and smaller spatial dimensions of the output volume.[\[76\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-77)
- *Padding*: it is sometimes convenient to pad the input with zeros (or other values, such as the average of the region) on the border of the input volume. The size of this padding is a third hyperparameter. Padding provides control of the output volume's spatial size. In particular, it is sometimes desirable to exactly preserve the spatial size of the input volume; this is commonly referred to as "same" padding.
Three example padding conditions. Replication condition means that the pixel outside is padded with the closest pixel inside. The reflection padding is where the pixel outside is padded with the pixel inside, reflected across the boundary of the image. The circular padding is where the pixel outside wraps around to the other side of the image.
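The padding conditions in the caption can be sketched on a single row of pixels. The function `pad_1d` is hypothetical, and the reflection variant shown here mirrors about the border pixel without repeating it, which is one of two common conventions and an assumption of this sketch.

```python
from typing import List


def pad_1d(row: List[float], p: int, mode: str = "zero") -> List[float]:
    """Pad a row with p values on each side (assumes 0 <= p < len(row))."""
    n = len(row)
    out = []
    for i in range(-p, n + p):
        if 0 <= i < n:
            out.append(row[i])                       # inside the image
        elif mode == "zero":
            out.append(0)
        elif mode == "replicate":
            out.append(row[min(max(i, 0), n - 1)])   # closest pixel inside
        elif mode == "reflect":
            j = -i if i < 0 else 2 * (n - 1) - i     # mirror across the border
            out.append(row[j])
        elif mode == "circular":
            out.append(row[i % n])                   # wrap to the other side
        else:
            raise ValueError(f"unknown padding mode: {mode}")
    return out


row = [1, 2, 3, 4]
for mode in ("zero", "replicate", "reflect", "circular"):
    print(mode, pad_1d(row, 2, mode))
```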
The spatial size of the output volume is a function of the input volume size $W$, the kernel field size $K$ of the convolutional layer neurons, the stride $S$, and the amount of zero padding $P$ on the border. The number of neurons that "fit" in a given volume is then:
$$\frac{W - K + 2P}{S} + 1$$
If this number is not an [integer](https://en.wikipedia.org/wiki/Integer "Integer"), then the strides are incorrect and the neurons cannot be tiled to fit across the input volume in a [symmetric](https://en.wikipedia.org/wiki/Symmetry "Symmetry") way. In general, setting zero padding to be $P = (K - 1)/2$ when the stride is $S = 1$ ensures that the input volume and output volume will have the same size spatially. However, it is not always completely necessary to use all of the neurons of the previous layer. For example, a neural network designer may decide to use just a portion of padding.
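As a minimal sketch of this size calculation, write W for the input size, K for the kernel size, S for the stride and P for the zero padding per side, so that the output size is (W − K + 2P)/S + 1. The function names are illustrative, and raising an error on a non-integer result is a design choice of this sketch; some libraries instead round down.

```python
def conv_output_size(w: int, k: int, s: int = 1, p: int = 0) -> int:
    """Spatial output size (W - K + 2P) / S + 1, rejecting non-integer tilings."""
    span = w - k + 2 * p
    if span < 0 or span % s != 0:
        raise ValueError("the filter cannot be tiled evenly across the input")
    return span // s + 1


def same_padding(k: int) -> int:
    """Padding P = (K - 1) / 2 that preserves the size at stride 1 (odd K only)."""
    if k % 2 == 0:
        raise ValueError("symmetric 'same' padding needs an odd kernel size")
    return (k - 1) // 2


print(conv_output_size(32, 5, s=1, p=same_padding(5)))  # size is preserved
print(conv_output_size(7, 3, s=2, p=0))                 # strided, no padding
```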
A parameter sharing scheme is used in convolutional layers to control the number of free parameters. It relies on the assumption that if a patch feature is useful to compute at some spatial position, then it should also be useful to compute at other positions. Denoting a single 2-dimensional slice of depth as a *depth slice*, the neurons in each depth slice are constrained to use the same weights and bias.
Since all neurons in a single depth slice share the same parameters, the forward pass in each depth slice of the convolutional layer can be computed as a [convolution](https://en.wikipedia.org/wiki/Convolution "Convolution") of the neuron's weights with the input volume.[\[nb 2\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-78) Therefore, it is common to refer to the sets of weights as a filter (or a [kernel](https://en.wikipedia.org/wiki/Kernel_\(image_processing\) "Kernel (image processing)")), which is convolved with the input. The result of this convolution is an [activation map](https://en.wikipedia.org/wiki/Activation_function "Activation function"), and the set of activation maps for each different filter are stacked together along the depth dimension to produce the output volume. Parameter sharing contributes to the [translation invariance](https://en.wikipedia.org/wiki/Translational_symmetry "Translational symmetry") of the CNN architecture.[\[16\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-:6-16)
Sometimes, the parameter sharing assumption may not make sense. This is especially the case when the input images to a CNN have some specific centered structure, for which we expect completely different features to be learned at different spatial locations. One practical example is when the inputs are faces that have been centered in the image: we might expect different eye-specific or hair-specific features to be learned in different parts of the image. In that case it is common to relax the parameter sharing scheme, and instead simply call the layer a "locally connected layer". In this layer, the convolutional kernels' parameters are not shared. Instead, the network learns independent weights and biases for each spatial location. This allows each location to have its own feature-learning ability, making it better suited to handle images with distinct central structures or irregular features.
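The effect of parameter sharing on the number of free parameters can be illustrated with a rough count. The layer dimensions below (a 32×32×3 input, sixteen 5×5 filters, a 32×32 output) are assumed for the example and do not come from the text.

```python
def conv_layer_params(k: int, in_depth: int, n_filters: int) -> int:
    """Shared weights: one k x k x in_depth filter plus one bias per depth slice."""
    return n_filters * (k * k * in_depth + 1)


def locally_connected_params(k: int, in_depth: int, n_filters: int,
                             out_h: int, out_w: int) -> int:
    """No sharing: an independent filter and bias at every output position."""
    return out_h * out_w * conv_layer_params(k, in_depth, n_filters)


shared = conv_layer_params(k=5, in_depth=3, n_filters=16)
unshared = locally_connected_params(k=5, in_depth=3, n_filters=16, out_h=32, out_w=32)
print(shared, unshared, unshared // shared)
```

In this example the locally connected layer has as many times more parameters as there are output positions.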
Worked example of 2x2 maxpooling with stride 2
Max pooling with a 2x2 filter and stride = 2
Another important concept of CNNs is pooling, which is used as a form of non-linear [down-sampling](https://en.wikipedia.org/wiki/Downsampling_\(signal_processing\) "Downsampling (signal processing)"). Pooling provides downsampling because it reduces the spatial dimensions (height and width) of the input feature maps while retaining the most important information. There are several non-linear functions to implement pooling, where *max pooling* and *average pooling* are the most common. Pooling aggregates information from small regions of the input creating [partitions](https://en.wikipedia.org/wiki/Partition_of_a_set "Partition of a set") of the input feature map, typically using a fixed-size window (like 2x2) and applying a stride (often 2) to move the window across the input.[\[77\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-79) Note that without using a stride greater than 1, pooling would not perform downsampling, as it would simply move the pooling window across the input one step at a time, without reducing the size of the feature map. In other words, the stride is what actually causes the downsampling by determining how much the pooling window moves over the input.
Intuitively, the exact location of a feature is less important than its rough location relative to other features. This is the idea behind the use of pooling in convolutional neural networks. The pooling layer serves to progressively reduce the spatial size of the representation, to reduce the number of parameters, [memory footprint](https://en.wikipedia.org/wiki/Memory_footprint "Memory footprint") and amount of computation in the network, and hence to also control [overfitting](https://en.wikipedia.org/wiki/Overfitting "Overfitting"). This is known as down-sampling. It is common to periodically insert a pooling layer between successive convolutional layers (each one typically followed by an activation function, such as a [ReLU layer](https://en.wikipedia.org/wiki/Convolutional_neural_network#ReLU_layer)) in a CNN architecture.[\[74\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-G%C3%A9ron_Hands-on_ML_2019-74): 460–461 While pooling layers contribute to local translation invariance, they do not provide global translation invariance in a CNN, unless a form of global pooling is used.[\[16\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-:6-16)[\[73\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-:5-73) The pooling layer commonly operates independently on every depth, or slice, of the input and resizes it spatially. A very common form of max pooling is a layer with filters of size 2×2, applied with a stride of 2, which subsamples every depth slice in the input by 2 along both width and height, discarding 75% of the activations. In this case, every [max operation](https://en.wikipedia.org/wiki/Maximum "Maximum") is over 4 numbers. The depth dimension remains unchanged (this is true for other forms of pooling as well).
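A minimal sketch of max pooling on one depth slice follows. The function name and the sample values are illustrative; a full pooling layer would apply this to every depth slice independently.

```python
from typing import List


def max_pool2d(fmap: List[List[float]], size: int = 2, stride: int = 2) -> List[List[float]]:
    """Take the maximum over each size x size window, moving by `stride`."""
    h, w = len(fmap), len(fmap[0])
    return [
        [
            max(fmap[i + di][j + dj] for di in range(size) for dj in range(size))
            for j in range(0, w - size + 1, stride)
        ]
        for i in range(0, h - size + 1, stride)
    ]


fmap = [
    [1, 3, 2, 4],
    [5, 6, 1, 2],
    [7, 2, 9, 1],
    [3, 4, 5, 6],
]
pooled = max_pool2d(fmap)               # 2x2 windows, stride 2
print(pooled)                           # 4x4 input reduced to 2x2
print(len(max_pool2d(fmap, stride=1)))  # stride 1 shrinks the map far less
```

With a 2×2 window and stride 2, each output keeps one of four activations, which matches the 75% reduction mentioned above.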
In addition to max pooling, pooling units can use other functions, such as [average](https://en.wikipedia.org/wiki/Average "Average") pooling or [ℓ2\-norm](https://en.wikipedia.org/wiki/Euclidean_norm "Euclidean norm") pooling. Average pooling was often used historically but has recently fallen out of favor compared to max pooling, which generally performs better in practice.[\[78\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-Scherer-ICANN-2010-80)
Due to the effects of fast spatial reduction of the size of the representation,\[*[which?](https://en.wikipedia.org/wiki/Wikipedia:Avoid_weasel_words "Wikipedia:Avoid weasel words")*\] there is a recent trend towards using smaller filters[\[79\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-81) or discarding pooling layers altogether.[\[80\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-82)
RoI pooling to size 2x2. In this example region proposal (an input parameter) has size 7x5.
#### Channel max pooling
A channel max pooling (CMP) layer applies the max pooling (MP) operation along the channel dimension, across corresponding positions of consecutive feature maps, in order to eliminate redundant information. CMP gathers the significant features into fewer channels, which is important for fine-grained image classification that needs more discriminating features. Another advantage of the CMP operation is that it reduces the number of channels of the feature maps before they are passed to the first fully connected (FC) layer. As with the MP operation, the input and output feature maps of a CMP layer are denoted $\mathbf{F} \in \mathbb{R}^{C \times M \times N}$ and $\mathbf{C} \in \mathbb{R}^{c \times M \times N}$, respectively, where $C$ and $c$ are the channel numbers of the input and output feature maps, and $M$ and $N$ are the width and height of the feature maps. Note that the CMP operation only changes the channel number of the feature maps. The width and the height of the feature maps are not changed, which is different from the MP operation.[\[81\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-Ma_Chang_Xie_Ding_2019_pp._3224%E2%80%933233-83)
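One possible reading of channel max pooling is sketched below. It assumes that the input channels are split into non-overlapping groups of consecutive channels and that the maximum is taken at each spatial position within a group; the grouping scheme and the name `channel_max_pool` are assumptions of this sketch, and the cited work may define the window and stride differently.

```python
from typing import List

Volume = List[List[List[float]]]  # indexed as [channel][row][col]


def channel_max_pool(fmaps: Volume, group: int) -> Volume:
    """Max over each group of `group` consecutive channels, per spatial position."""
    c_in = len(fmaps)
    if c_in % group != 0:
        raise ValueError("the channel count must be divisible by the group size")
    m, n = len(fmaps[0]), len(fmaps[0][0])
    return [
        [
            [max(fmaps[c][i][j] for c in range(start, start + group)) for j in range(n)]
            for i in range(m)
        ]
        for start in range(0, c_in, group)
    ]


# Four input channels of spatial size 1x2, reduced to two output channels.
fmaps = [[[1, 8]], [[5, 2]], [[0, 3]], [[7, 4]]]
reduced = channel_max_pool(fmaps, group=2)
print(reduced)
```

The spatial size of each map is unchanged; only the number of channels shrinks.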
See [\[82\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-84)[\[83\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-85) for reviews of pooling methods.
ReLU is the abbreviation of [rectified linear unit](https://en.wikipedia.org/wiki/Rectifier_\(neural_networks\) "Rectifier (neural networks)"). It was proposed by [Alston Householder](https://en.wikipedia.org/wiki/Alston_Scott_Householder "Alston Scott Householder") in 1941,[\[84\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-86) and used in CNNs by [Kunihiko Fukushima](https://en.wikipedia.org/wiki/Kunihiko_Fukushima "Kunihiko Fukushima") in 1969.[\[39\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-Fukushima1969-39) ReLU applies the non-saturating [activation function](https://en.wikipedia.org/wiki/Activation_function "Activation function") $f(x) = \max(0, x)$.[\[69\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-:02-69) It effectively removes negative values from an activation map by setting them to zero.[\[85\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-Romanuke4-87) It introduces [nonlinearity](https://en.wikipedia.org/wiki/Nonlinearity_\(disambiguation\) "Nonlinearity (disambiguation)") to the [decision function](https://en.wikipedia.org/wiki/Decision_boundary "Decision boundary") and in the overall network without affecting the receptive fields of the convolution layers. In 2011, Xavier Glorot, Antoine Bordes and [Yoshua Bengio](https://en.wikipedia.org/wiki/Yoshua_Bengio "Yoshua Bengio") found that ReLU enables better training of deeper networks,[\[86\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-glorot2011-88) compared to the activation functions widely used prior to 2011.
Other functions can also be used to increase nonlinearity, for example the saturating [hyperbolic tangent](https://en.wikipedia.org/wiki/Hyperbolic_tangent "Hyperbolic tangent") $f(x) = \tanh(x)$, $f(x) = |\tanh(x)|$, and the [sigmoid function](https://en.wikipedia.org/wiki/Sigmoid_function "Sigmoid function") $\sigma(x) = (1 + e^{-x})^{-1}$. ReLU is often preferred to other functions because it trains the neural network several times faster without a significant penalty to [generalization](https://en.wikipedia.org/wiki/Generalization_\(learning\) "Generalization (learning)") accuracy.[\[87\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-89)
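The activation functions named above can be written out directly. This is a small illustration using only the standard library; `relu_map` is a hypothetical helper that applies ReLU to every entry of one activation map.

```python
import math
from typing import List


def relu(x: float) -> float:
    """Non-saturating rectifier: max(0, x)."""
    return max(0.0, x)


def sigmoid(x: float) -> float:
    """Saturating logistic function 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))


def relu_map(fmap: List[List[float]]) -> List[List[float]]:
    """Apply ReLU elementwise; the shape of the map is unchanged."""
    return [[relu(v) for v in row] for row in fmap]


print(relu_map([[-1.5, 2.0], [0.0, -3.0]]))  # negative entries become zero
print(math.tanh(2.0), abs(math.tanh(-2.0)))  # tanh and |tanh| saturate near 1
print(sigmoid(0.0))
```

Because the function is applied elementwise, it changes values but not the spatial layout or the receptive fields of the layer.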
### Fully connected layer
After several convolutional and max pooling layers, the final classification is done via fully connected layers. Neurons in a fully connected layer have connections to all activations in the previous layer, as seen in regular (non-convolutional) [artificial neural networks](https://en.wikipedia.org/wiki/Artificial_neural_network "Artificial neural network"). Their activations can thus be computed as an [affine transformation](https://en.wikipedia.org/wiki/Affine_transformation "Affine transformation"), with [matrix multiplication](https://en.wikipedia.org/wiki/Matrix_multiplication "Matrix multiplication") followed by a bias offset ([vector addition](https://en.wikipedia.org/wiki/Vector_addition "Vector addition") of a learned or fixed bias term).
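The affine transformation of a fully connected layer can be sketched as a matrix-vector product plus a bias. The helper names are illustrative, and the flattening order (channel, then row, then column) is an assumption of this sketch.

```python
from typing import List


def flatten(volume: List[List[List[float]]]) -> List[float]:
    """Unroll a [channel][row][col] volume into a single vector."""
    return [v for channel in volume for row in channel for v in row]


def fully_connected(x: List[float], weights: List[List[float]],
                    bias: List[float]) -> List[float]:
    """Compute W x + b, with one row of W (and one bias) per output neuron."""
    return [
        sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b
        for row, b in zip(weights, bias)
    ]


x = flatten([[[1, 2, 3]]])  # a 1x1x3 volume becomes a 3-vector
scores = fully_connected(x, [[1, 0, -1], [2, 2, 2]], [0.5, -1])
print(scores)
```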
The "loss layer", or "[loss function](https://en.wikipedia.org/wiki/Loss_function "Loss function")", specifies how [training](https://en.wikipedia.org/wiki/Training "Training") penalizes the deviation between the predicted output of the network and the [true](https://en.wikipedia.org/wiki/Ground_truth "Ground truth") data labels (during supervised learning). Various [loss functions](https://en.wikipedia.org/wiki/Loss_function "Loss function") can be used, depending on the specific task.
The [Softmax](https://en.wikipedia.org/wiki/Softmax_function "Softmax function") loss function is used for predicting a single class of *K* mutually exclusive classes.[\[nb 3\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-90) [Sigmoid](https://en.wikipedia.org/wiki/Sigmoid_function "Sigmoid function") [cross-entropy](https://en.wikipedia.org/wiki/Cross_entropy "Cross entropy") loss is used for predicting *K* independent probability values in $[0,1]$. [Euclidean](https://en.wikipedia.org/wiki/Euclidean_distance "Euclidean distance") loss is used for [regressing](https://en.wikipedia.org/wiki/Regression_\(machine_learning\) "Regression (machine learning)") to [real-valued](https://en.wikipedia.org/wiki/Real_number "Real number") labels in $(-\infty, \infty)$.
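The three losses can be sketched for a single example as follows. The function names are illustrative, and the factor of one half in the Euclidean loss is a common convention assumed here rather than something stated in the text.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Probabilities over K mutually exclusive classes (shifted for stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_cross_entropy(logits: List[float], target: int) -> float:
    """Negative log-probability assigned to the single correct class."""
    return -math.log(softmax(logits)[target])


def sigmoid_cross_entropy(logits: List[float], targets: List[float]) -> float:
    """Sum of K independent binary cross-entropies, targets in [0, 1]."""
    loss = 0.0
    for z, t in zip(logits, targets):
        p = 1.0 / (1.0 + math.exp(-z))
        loss -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return loss


def euclidean_loss(pred: List[float], target: List[float]) -> float:
    """Half the squared Euclidean distance to real-valued labels."""
    return 0.5 * sum((p - t) ** 2 for p, t in zip(pred, target))


print(softmax_cross_entropy([2.0, 1.0, 0.1], target=0))
print(sigmoid_cross_entropy([0.0, 3.0], [1.0, 0.0]))
print(euclidean_loss([1.0, 2.0], [1.0, 4.0]))
```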
Hyperparameters are various settings that are used to control the learning process. CNNs use more [hyperparameters](https://en.wikipedia.org/wiki/Hyperparameter_\(machine_learning\) "Hyperparameter (machine learning)") than a standard multilayer perceptron (MLP).
Padding is the addition of (typically) 0-valued pixels on the borders of an image. This is done so that the border pixels are not undervalued (lost) from the output because they would ordinarily participate in only a single receptive field instance. The padding applied is typically one less than the corresponding kernel dimension. For example, a convolutional layer using 3x3 kernels would receive a 2-pixel pad, that is 1 pixel on each side of the image.\[*[citation needed](https://en.wikipedia.org/wiki/Wikipedia:Citation_needed "Wikipedia:Citation needed")*\]
The stride is the number of pixels that the analysis window moves on each iteration. A stride of 2 means that each kernel is offset by 2 pixels from its predecessor.
Since feature map size decreases with depth, layers near the input layer tend to have fewer filters while higher layers can have more. To equalize computation at each layer, the product of feature values $v_a$ with pixel position is kept roughly constant across layers. Preserving more information about the input would require keeping the total number of activations (number of feature maps times number of pixel positions) non-decreasing from one layer to the next.
The number of feature maps directly controls the capacity and depends on the number of available examples and task complexity.
### Filter (or kernel) size
Common filter sizes found in the literature vary greatly, and are usually chosen based on the data set. Typical filter sizes range from 1x1 to 7x7. As two famous examples, [AlexNet](https://en.wikipedia.org/wiki/AlexNet "AlexNet") used 3x3, 5x5, and 11x11. [Inceptionv3](https://en.wikipedia.org/wiki/Inceptionv3 "Inceptionv3") used 1x1, 3x3, and 5x5.
The challenge is to find the right level of granularity so as to create abstractions at the proper scale, given a particular data set, and without [overfitting](https://en.wikipedia.org/wiki/Overfitting "Overfitting").
### Pooling type and size
[Max pooling](https://en.wikipedia.org/wiki/Max_pooling "Max pooling") is typically used, often with a 2x2 dimension. This implies that the input is drastically [downsampled](https://en.wikipedia.org/wiki/Downsampling_\(signal_processing\) "Downsampling (signal processing)"), reducing processing cost.
Greater pooling [reduces the dimension](https://en.wikipedia.org/wiki/Dimensionality_reduction "Dimensionality reduction") of the signal, and may result in unacceptable [information loss](https://en.wikipedia.org/wiki/Data_loss "Data loss"). Often, non-overlapping pooling windows perform best.[\[78\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-Scherer-ICANN-2010-80)
Dilation involves ignoring pixels within a kernel. This potentially reduces processing memory without significant signal loss. A dilation of 2 on a 3x3 kernel expands the kernel to 5x5, while still processing 9 (evenly spaced) pixels. Specifically, the processed pixels after the dilation are the cells (1,1), (1,3), (1,5), (3,1), (3,3), (3,5), (5,1), (5,3), (5,5), where (i,j) denotes the cell of the i-th row and j-th column in the expanded 5x5 kernel. Accordingly, under the same convention, in which a dilation of *d* places the processed pixels *d* cells apart, a dilation of 3 expands the kernel to 7x7 and a dilation of 4 expands it to 9x9.\[*[citation needed](https://en.wikipedia.org/wiki/Wikipedia:Citation_needed "Wikipedia:Citation needed")*\]
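The cell positions listed above can be reproduced with a short sketch. It assumes the common convention in which a dilation of d places the processed pixels d cells apart, so that a kernel of size k covers d(k − 1) + 1 cells per side; the function names are illustrative.

```python
from typing import List, Tuple


def effective_kernel_size(k: int, dilation: int) -> int:
    """Side length covered by a k x k kernel whose taps are `dilation` cells apart."""
    return dilation * (k - 1) + 1


def dilated_cells(k: int, dilation: int) -> List[Tuple[int, int]]:
    """1-based (row, column) cells of the expanded kernel that are processed."""
    return [(1 + a * dilation, 1 + b * dilation) for a in range(k) for b in range(k)]


print(effective_kernel_size(3, 2))  # side length of the expanded kernel
print(dilated_cells(3, 2))          # the 9 processed cells
```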
## Translation equivariance and aliasing
It is commonly assumed that CNNs are invariant to shifts of the input. Convolution or pooling layers within a CNN that do not have a stride greater than one are indeed [equivariant](https://en.wikipedia.org/wiki/Equivariant_map "Equivariant map") to translations of the input.[\[73\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-:5-73) However, layers with a stride greater than one ignore the [Nyquist–Shannon sampling theorem](https://en.wikipedia.org/wiki/Nyquist%E2%80%93Shannon_sampling_theorem "Nyquist–Shannon sampling theorem") and might lead to [aliasing](https://en.wikipedia.org/wiki/Aliasing "Aliasing") of the input signal.[\[73\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-:5-73) While, in principle, CNNs are capable of implementing anti-aliasing filters, it has been observed that this does not happen in practice,[\[88\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-91) and such layers therefore yield models that are not equivariant to translations.
Furthermore, if a CNN makes use of fully connected layers, translation equivariance does not imply translation invariance, as the fully connected layers are not invariant to shifts of the input.[\[89\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-92)[\[16\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-:6-16) One solution for complete translation invariance is avoiding any down-sampling throughout the network and applying global average pooling at the last layer.[\[73\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-:5-73) Additionally, several other partial solutions have been proposed, such as [anti-aliasing](https://en.wikipedia.org/wiki/Anti-aliasing_filter "Anti-aliasing filter") before downsampling operations,[\[90\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-93) spatial transformer networks,[\[91\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-94) [data augmentation](https://en.wikipedia.org/wiki/Data_augmentation "Data augmentation"), subsampling combined with pooling,[\[16\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-:6-16) and [capsule neural networks](https://en.wikipedia.org/wiki/Capsule_neural_network "Capsule neural network").[\[92\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-95)
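The difference between stride 1 and stride 2 can be seen in a one-dimensional toy example. This sketch assumes circular padding so that shifts wrap around cleanly; the signal, the filter and the function names are all made up for illustration.

```python
from typing import List


def circular_conv1d(x: List[int], w: List[int], stride: int = 1) -> List[int]:
    """1D cross-correlation with wrap-around borders, sampled every `stride` steps."""
    n = len(x)
    return [sum(w[k] * x[(i + k) % n] for k in range(len(w))) for i in range(0, n, stride)]


def shift(x: List[int], s: int) -> List[int]:
    """Circularly shift a list to the right by s positions."""
    s %= len(x)
    return x[-s:] + x[:-s] if s else list(x)


x = [0, 1, 3, 2, 0, 5, 1, 0]
w = [1, 2]

# Stride 1: shifting the input shifts the output by the same amount.
y1 = circular_conv1d(x, w)
y1_shifted = circular_conv1d(shift(x, 1), w)
stride1_equivariant = y1_shifted == shift(y1, 1)

# Stride 2: a one-step shift selects different samples altogether.
y2 = circular_conv1d(x, w, stride=2)
y2_shifted = circular_conv1d(shift(x, 1), w, stride=2)
stride2_same_values = sorted(y2) == sorted(y2_shifted)

# Global average pooling after the stride-1 layer is shift invariant.
gap_invariant = sum(y1) / len(y1) == sum(y1_shifted) / len(y1_shifted)

print(stride1_equivariant, stride2_same_values, gap_invariant)
```

In this example, even a global maximum over the stride-2 output changes when the input is shifted by one step, because the subsampling keeps a different set of positions.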
The accuracy of the final model is typically estimated on a sub-part of the dataset set apart at the start, often called a test set. Alternatively, methods such as [*k*\-fold cross-validation](https://en.wikipedia.org/wiki/Cross-validation_\(statistics\) "Cross-validation (statistics)") are applied. Other strategies include using [conformal prediction](https://en.wikipedia.org/wiki/Conformal_prediction "Conformal prediction").[\[93\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-96)[\[94\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-97)
## Regularization methods
[Regularization](https://en.wikipedia.org/wiki/Regularization_\(mathematics\) "Regularization (mathematics)") is a process of introducing additional information to solve an [ill-posed problem](https://en.wikipedia.org/wiki/Ill-posed_problem "Ill-posed problem") or to prevent [overfitting](https://en.wikipedia.org/wiki/Overfitting "Overfitting"). CNNs use various types of regularization.
Because networks have so many parameters, they are prone to overfitting. One method to reduce overfitting is [dropout](https://en.wikipedia.org/wiki/Dropout_\(neural_networks\) "Dropout (neural networks)"), introduced in 2014.[\[95\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-98) At each training stage, individual nodes are either "dropped out" of the net (ignored) with probability $1 - p$ or kept with probability $p$, so that a reduced network is left; incoming and outgoing edges to a dropped-out node are also removed. Only the reduced network is trained on the data in that stage. The removed nodes are then reinserted into the network with their original weights.
In the training stages, $p$ is usually 0.5; for input nodes, it is typically much higher because information is directly lost when input nodes are ignored.
At testing time after training has finished, we would ideally like to find a sample average of all possible $2^n$ dropped-out networks; unfortunately this is unfeasible for large values of $n$. However, we can find an approximation by using the full network with each node's output weighted by a factor of $p$, so the [expected value](https://en.wikipedia.org/wiki/Expected_value "Expected value") of the output of any node is the same as in the training stages. This is the biggest contribution of the dropout method: although it effectively generates $2^n$ neural nets, and as such allows for model combination, at test time only a single network needs to be tested.
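The training and testing behaviour of dropout can be sketched for one layer's outputs, writing p for the probability of keeping a node. The function names are illustrative, and the sketch follows the test-time weighting described above rather than any particular library's implementation.

```python
import random
from typing import List


def dropout_train(activations: List[float], p: float, rng: random.Random) -> List[float]:
    """Training: keep each node with probability p, otherwise output zero."""
    return [a if rng.random() < p else 0.0 for a in activations]


def dropout_test(activations: List[float], p: float) -> List[float]:
    """Testing: use every node, weighting its output by the keep probability p."""
    return [p * a for a in activations]


rng = random.Random(0)  # seeded only to make the sketch repeatable
train_out = dropout_train([1.0] * 1000, p=0.5, rng=rng)
print(sum(train_out) / len(train_out))  # close to p on average
print(dropout_test([2.0, 4.0], p=0.5))  # matches the training-time expectation
```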
By avoiding training all nodes on all training data, dropout decreases overfitting. The method also significantly improves training speed. This makes the model combination practical, even for [deep neural networks](https://en.wikipedia.org/wiki/Deep_neural_network "Deep neural network"). The technique seems to reduce node interactions, leading them to learn more robust features\[*[clarification needed](https://en.wikipedia.org/wiki/Wikipedia:Please_clarify "Wikipedia:Please clarify")*\] that better generalize to new data.
DropConnect is the generalization of dropout in which each connection, rather than each output unit, can be dropped with probability $1-p$. Each unit thus receives input from a random subset of units in the previous layer.[\[96\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-99)
DropConnect is similar to dropout as it introduces dynamic sparsity within the model, but differs in that the sparsity is on the weights, rather than the output vectors of a layer. In other words, the fully connected layer with DropConnect becomes a sparsely connected layer in which the connections are chosen at random during the training stage.
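As a rough sketch of the difference from dropout, the mask below is applied to individual weights rather than to whole output units. The layer layout (one row of weights per output unit) and the name `p_keep` are assumptions chosen for illustration.

```python
import random

def dropconnect_layer(x, weights, p_keep, rng):
    """Linear layer in which each individual weight is kept with probability p_keep."""
    out = []
    for row in weights:  # one row of weights per output unit
        masked = [w if rng.random() < p_keep else 0.0 for w in row]
        out.append(sum(w * xi for w, xi in zip(masked, x)))
    return out

rng = random.Random(1)
x = [1.0, 1.0, 1.0]
W = [[0.2, 0.4, 0.6],
     [1.0, 1.0, 1.0]]
# Each output unit sees a different random subset of its inputs.
y = dropconnect_layer(x, W, p_keep=0.5, rng=rng)
```

With `p_keep=1.0` the layer reduces to an ordinary fully connected layer, and with `p_keep=0.0` every connection is removed.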
A major drawback to dropout is that it does not have the same benefits for convolutional layers, where the neurons are not fully connected.
Even before dropout, a technique called stochastic pooling was introduced in 2013,[\[97\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-100) in which the conventional [deterministic](https://en.wikipedia.org/wiki/Deterministic_algorithm "Deterministic algorithm") pooling operations are replaced with a stochastic procedure: the activation within each pooling region is picked randomly according to a [multinomial distribution](https://en.wikipedia.org/wiki/Multinomial_distribution "Multinomial distribution") given by the activities within the pooling region. This approach is free of hyperparameters and can be combined with other regularization approaches, such as dropout and [data augmentation](https://en.wikipedia.org/wiki/Data_augmentation "Data augmentation").
An alternate view of stochastic pooling is that it is equivalent to standard max pooling but with many copies of an input image, each having small local [deformations](https://en.wikipedia.org/wiki/Deformation_theory "Deformation theory"). This is similar to explicit [elastic deformations](https://en.wikipedia.org/wiki/Elastic_deformation "Elastic deformation") of the input images,[\[98\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-:3-101) which delivers excellent performance on the [MNIST data set](https://en.wikipedia.org/wiki/MNIST_database "MNIST database").[\[98\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-:3-101) Using stochastic pooling in a multilayer model gives an exponential number of deformations since the selections in higher layers are independent of those below.
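The sampling step of stochastic pooling can be sketched as follows. The sketch assumes non-negative activations (for example, after a ReLU) and normalizes them into selection probabilities; the function name and the example region are hypothetical.

```python
import random

def stochastic_pool(region, rng):
    """Pick one activation from a pooling region, with probability
    proportional to its value. Assumes non-negative activations."""
    total = sum(region)
    if total == 0:
        return 0.0  # all-zero region: nothing to sample
    return rng.choices(region, weights=region, k=1)[0]

rng = random.Random(0)
region = [0.0, 1.0, 3.0, 0.0]  # a flattened 2x2 pooling region
samples = [stochastic_pool(region, rng) for _ in range(10000)]
# The value 3.0 carries 3/4 of the total activity, so it should be
# selected in roughly three quarters of the draws.
frac_three = samples.count(3.0) / len(samples)
```

Unlike max pooling, which would always return 3.0 for this region, the smaller non-zero activation is also selected some of the time.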
Because the degree of model overfitting is determined by both its power and the amount of training it receives, providing a convolutional network with more training examples can reduce overfitting. Because there is often not enough available data to train, especially considering that some part should be spared for later testing, two approaches are to either generate new data from scratch (if possible) or perturb existing data to create new examples. The latter has been used since the mid-1990s.[\[53\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-lecun95-53) For example, input images can be cropped, rotated, or rescaled to create new examples with the same labels as the original training set.[\[99\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-102)
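A minimal sketch of label-preserving perturbations is shown below, using nested lists as a stand-in for image arrays. The helper names and the tiny 3×3 "image" are illustrative assumptions; in practice a cropped image would usually be rescaled back to the network's input size.

```python
def flip_horizontal(img):
    """Mirror each row left to right."""
    return [row[::-1] for row in img]

def rotate90(img):
    """Rotate the image clockwise by 90 degrees."""
    return [list(col) for col in zip(*img[::-1])]

def crop(img, top, left, height, width):
    """Cut out a height x width window starting at (top, left)."""
    return [row[left:left + width] for row in img[top:top + height]]

def augment(img, label):
    """Return the original and its perturbed copies, all sharing one label."""
    variants = [img, flip_horizontal(img), rotate90(img), crop(img, 0, 0, 2, 2)]
    return [(v, label) for v in variants]

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
examples = augment(image, label="digit")  # four training examples from one image
```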
One of the simplest methods to prevent overfitting of a network is to simply stop the training before overfitting has had a chance to occur. It comes with the disadvantage that the learning process is halted.
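One common way to decide when to stop, assumed here rather than stated in the text, is to monitor the loss on a held-out validation set and stop once it has not improved for a fixed number of epochs (the "patience"). The function name and the loss values are hypothetical.

```python
def early_stopping_epoch(val_losses, patience):
    """Return the index of the epoch whose weights would be kept.

    Training stops once the validation loss has failed to improve
    for `patience` consecutive epochs.
    """
    best_epoch, best_loss, waited = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss, waited = epoch, loss, 0
        else:
            waited += 1
            if waited >= patience:
                break  # halt training here
    return best_epoch

# Hypothetical per-epoch validation losses.
val_losses = [0.90, 0.70, 0.60, 0.62, 0.65, 0.70, 0.50]
stop_at = early_stopping_epoch(val_losses, patience=2)
```

With a patience of 2, training halts after epoch 4 and keeps the weights from epoch 2, so the later improvement at epoch 6 is never reached. This illustrates the disadvantage mentioned above.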
#### Number of parameters
Another simple way to prevent overfitting is to limit the number of parameters, typically by limiting the number of hidden units in each layer or limiting network depth. For convolutional networks, the filter size also affects the number of parameters. Limiting the number of parameters restricts the predictive power of the network directly, reducing the complexity of the function that it can perform on the data, and thus limits the amount of overfitting. This is equivalent to a "[zero norm](https://en.wikipedia.org/wiki/Zero_norm "Zero norm")".
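To make the dependence on filter size and layer width concrete, the following sketch counts learnable parameters for a convolutional layer and a fully connected layer. The function names and the choice to count one bias per filter or output unit are assumptions for illustration.

```python
def conv_params(kernel_h, kernel_w, in_channels, out_channels, bias=True):
    """Learnable parameters in a 2-D convolutional layer with shared weights."""
    per_filter = kernel_h * kernel_w * in_channels + (1 if bias else 0)
    return per_filter * out_channels

def dense_params(n_in, n_out, bias=True):
    """Learnable parameters in a fully connected layer."""
    return (n_in + (1 if bias else 0)) * n_out

# A single 5x5 filter on a one-channel image versus one fully connected
# neuron looking at a 100x100 image (biases ignored).
single_filter = conv_params(5, 5, 1, 1, bias=False)
single_neuron = dense_params(100 * 100, 1, bias=False)

# Shrinking the filter from 5x5 to 3x3 reduces the count for a wider layer.
wide_5x5 = conv_params(5, 5, 64, 128)
wide_3x3 = conv_params(3, 3, 64, 128)
```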
A simple form of added regularizer is weight decay, which adds an additional error, proportional to the sum of the absolute values of the weights ([L1 norm](https://en.wikipedia.org/wiki/L1-norm "L1-norm")) or the squared magnitude ([L2 norm](https://en.wikipedia.org/wiki/L2_norm "L2 norm")) of the weight vector, to the error at each node. The level of acceptable model complexity can be reduced by increasing the proportionality constant (the 'alpha' hyperparameter), thus increasing the penalty for large weight vectors.
L2 regularization is the most common form of regularization. It can be implemented by penalizing the squared magnitude of all parameters directly in the objective. The L2 regularization has the intuitive interpretation of heavily penalizing peaky weight vectors and preferring diffuse weight vectors. Due to multiplicative interactions between weights and inputs this has the useful property of encouraging the network to use all of its inputs a little rather than some of its inputs a lot.
L1 regularization is also common. It makes the weight vectors sparse during optimization. In other words, neurons with L1 regularization end up using only a sparse subset of their most important inputs and become nearly invariant to the noisy inputs. L1 and L2 regularization can be combined; this is called [elastic net regularization](https://en.wikipedia.org/wiki/Elastic_net_regularization "Elastic net regularization").
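The penalties discussed above can be compared on two weight vectors that produce the same output for an all-ones input: one "peaky" and one "diffuse". The function names and the `alpha_l1` / `alpha_l2` parameters are illustrative assumptions; setting both to non-zero values gives an elastic-net-style penalty.

```python
def l1_penalty(weights):
    """Sum of absolute values of the weights."""
    return sum(abs(w) for w in weights)

def l2_penalty(weights):
    """Sum of squared weights."""
    return sum(w * w for w in weights)

def regularized_loss(data_loss, weights, alpha_l1=0.0, alpha_l2=0.0):
    """Data loss plus weighted L1 and L2 penalties."""
    return data_loss + alpha_l1 * l1_penalty(weights) + alpha_l2 * l2_penalty(weights)

peaky = [2.0, 0.0, 0.0, 0.0]     # relies on a single input
diffuse = [0.5, 0.5, 0.5, 0.5]   # spreads the same total over all inputs

l1_peaky, l1_diffuse = l1_penalty(peaky), l1_penalty(diffuse)
l2_peaky, l2_diffuse = l2_penalty(peaky), l2_penalty(diffuse)
```

The L1 penalty is identical for both vectors, whereas the L2 penalty is four times larger for the peaky vector, which is why L2 regularization favours diffuse weights.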
#### Max norm constraints
Another form of regularization is to enforce an absolute upper bound on the magnitude of the weight vector for every neuron and use [projected gradient descent](https://en.wikipedia.org/wiki/Sparse_approximation#Projected_Gradient_Descent "Sparse approximation") to enforce the constraint. In practice, this corresponds to performing the parameter update as normal, and then enforcing the constraint by clamping the weight vector $\vec{w}$ of every neuron to satisfy $\|\vec{w}\|_2 < c$. Typical values of $c$ are on the order of 3–4. Some papers report improvements[\[100\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-103) when using this form of regularization.
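The clamping step can be sketched as a projection applied after each ordinary parameter update. The function name and the bound name `c` are assumptions for illustration; the sketch rescales a neuron's weight vector whenever its Euclidean norm exceeds the bound.

```python
import math

def clip_max_norm(w, c):
    """Project weight vector w back onto the ball of radius c (Euclidean norm)."""
    norm = math.sqrt(sum(x * x for x in w))
    if norm <= c:
        return list(w)          # already satisfies the constraint
    scale = c / norm
    return [x * scale for x in w]

# After a normal gradient step, a neuron's weights have norm 5.
w_after_update = [3.0, 4.0]
w_clipped = clip_max_norm(w_after_update, c=3.0)
clipped_norm = math.sqrt(sum(x * x for x in w_clipped))
```

The direction of the weight vector is preserved; only its length is reduced to the bound.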
## Hierarchical coordinate frames
Pooling loses the precise spatial relationships between high-level parts (such as nose and mouth in a face image). These relationships are needed for identity recognition. Overlapping the pools, so that each feature occurs in multiple pools, helps retain the information. Translation alone cannot extrapolate the understanding of geometric relationships to a radically new viewpoint, such as a different orientation or scale. On the other hand, people are very good at extrapolating; after seeing a new shape once they can recognize it from a different viewpoint.[\[101\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-104)
An earlier common way to deal with this problem is to train the network on transformed data in different orientations, scales, lighting, etc. so that the network can cope with these variations. This is computationally intensive for large data-sets. The alternative is to use a hierarchy of coordinate frames and use a group of neurons to represent a conjunction of the shape of the feature and its pose relative to the [retina](https://en.wikipedia.org/wiki/Retina "Retina"). The pose relative to the retina is the relationship between the coordinate frame of the retina and the intrinsic features' coordinate frame.[\[102\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-105)
Thus, one way to represent something is to embed the coordinate frame within it. This allows large features to be recognized by using the consistency of the poses of their parts (e.g. nose and mouth poses make a consistent prediction of the pose of the whole face). This approach ensures that the higher-level entity (e.g. face) is present when the lower-level entities (e.g. nose and mouth) agree on their prediction of its pose. The vectors of neuronal activity that represent pose ("pose vectors") allow spatial transformations to be modeled as linear operations, which makes it easier for the network to learn the hierarchy of visual entities and generalize across viewpoints. This is similar to the way the human [visual system](https://en.wikipedia.org/wiki/Visual_system "Visual system") imposes coordinate frames in order to represent shapes.[\[103\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-106)
CNNs are often used in [image recognition](https://en.wikipedia.org/wiki/Image_recognition "Image recognition") systems. In 2012, an [error rate](https://en.wikipedia.org/wiki/Per-comparison_error_rate "Per-comparison error rate") of 0.23% on the [MNIST database](https://en.wikipedia.org/wiki/MNIST_database "MNIST database") was reported.[\[28\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-mcdns-28) Another paper on using CNN for image classification reported that the learning process was "surprisingly fast"; in the same paper, the best published results as of 2011 were achieved in the MNIST database and the NORB database.[\[25\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-flexible-25) Subsequently, a similar CNN called [AlexNet](https://en.wikipedia.org/wiki/AlexNet "AlexNet")[\[104\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-quartz-107) won the [ImageNet Large Scale Visual Recognition Challenge](https://en.wikipedia.org/wiki/ImageNet_Large_Scale_Visual_Recognition_Challenge "ImageNet Large Scale Visual Recognition Challenge") 2012.
When applied to [facial recognition](https://en.wikipedia.org/wiki/Facial_recognition_system "Facial recognition system"), CNNs achieved a large decrease in error rate.[\[105\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-108) Another paper reported a 97.6% recognition rate on "5,600 still images of more than 10 subjects".[\[21\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-robust_face_detection-21) CNNs were used to assess [video quality](https://en.wikipedia.org/wiki/Video_quality "Video quality") in an objective way after manual training; the resulting system had a very low [root mean square error](https://en.wikipedia.org/wiki/Root_mean_square_error "Root mean square error").[\[106\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-video_quality-109)
The [ImageNet Large Scale Visual Recognition Challenge](https://en.wikipedia.org/wiki/ImageNet_Large_Scale_Visual_Recognition_Challenge "ImageNet Large Scale Visual Recognition Challenge") is a benchmark in object classification and detection, with millions of images and hundreds of object classes. In the ILSVRC 2014,[\[107\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-ILSVRC2014-110) a large-scale visual recognition challenge, almost every highly ranked team used CNN as their basic framework. The winner [GoogLeNet](https://en.wikipedia.org/wiki/GoogLeNet "GoogLeNet")[\[108\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-googlenet-111) (the foundation of [DeepDream](https://en.wikipedia.org/wiki/DeepDream "DeepDream")) increased the mean average [precision](https://en.wikipedia.org/wiki/Precision_and_recall "Precision and recall") of object detection to 0.439329, and reduced classification error to 0.06656, the best result to date. Its network applied more than 30 layers. That performance of convolutional neural networks on the ImageNet tests was close to that of humans.[\[109\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-112) The best algorithms still struggle with objects that are small or thin, such as a small ant on a stem of a flower or a person holding a quill in their hand. They also have trouble with images that have been distorted with filters, an increasingly common phenomenon with modern digital cameras. By contrast, those kinds of images rarely trouble humans. Humans, however, tend to have trouble with other issues. For example, they are not good at classifying objects into fine-grained categories such as the particular breed of dog or species of bird, whereas convolutional neural networks handle this.\[*[citation needed](https://en.wikipedia.org/wiki/Wikipedia:Citation_needed "Wikipedia:Citation needed")*\]
In 2015, a many-layered CNN demonstrated the ability to spot faces from a wide range of angles, including upside down, even when partially occluded, with competitive performance. The network was trained on a database of 200,000 images that included faces at various angles and orientations and a further 20 million images without faces. They used batches of 128 images over 50,000 iterations.[\[110\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-113)
Compared to image data domains, there is relatively little work on applying CNNs to video classification. Video is more complex than images since it has another (temporal) dimension. However, some extensions of CNNs into the video domain have been explored. One approach is to treat space and time as equivalent dimensions of the input and perform convolutions in both time and space.[\[111\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-114)[\[112\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-115) Another way is to fuse the features of two convolutional neural networks, one for the spatial and one for the temporal stream.[\[113\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-116)[\[114\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-117)[\[115\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-118) [Long short-term memory](https://en.wikipedia.org/wiki/Long_short-term_memory "Long short-term memory") (LSTM) [recurrent](https://en.wikipedia.org/wiki/Recurrent_neural_network "Recurrent neural network") units are typically incorporated after the CNN to account for inter-frame or inter-clip dependencies.[\[116\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-Wang_Duan_Zhang_Niu_p=1657-119)[\[117\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-Duan_Wang_Zhai_Zheng_2018_p.-120) [Unsupervised learning](https://en.wikipedia.org/wiki/Unsupervised_learning "Unsupervised learning") schemes for training spatio-temporal features have been introduced, based on Convolutional Gated Restricted [Boltzmann Machines](https://en.wikipedia.org/wiki/Boltzmann_machine "Boltzmann machine")[\[118\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-121) and Independent Subspace Analysis.[\[119\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-122) Their application can be seen in [text-to-video models](https://en.wikipedia.org/wiki/Text-to-video_model "Text-to-video model").\[*[citation needed](https://en.wikipedia.org/wiki/Wikipedia:Citation_needed "Wikipedia:Citation needed")*\]
### Natural language processing
CNNs have also been explored for [natural language processing](https://en.wikipedia.org/wiki/Natural_language_processing "Natural language processing"). CNN models are effective for various NLP problems and have achieved excellent results in [semantic parsing](https://en.wikipedia.org/wiki/Semantic_parsing "Semantic parsing"),[\[120\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-123) search query retrieval,[\[121\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-124) sentence modeling,[\[122\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-125) classification,[\[123\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-126) prediction[\[124\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-127) and other traditional NLP tasks.[\[125\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-128) Compared to traditional language processing methods such as [recurrent neural networks](https://en.wikipedia.org/wiki/Recurrent_neural_networks "Recurrent neural networks"), CNNs can represent different contextual realities of language that do not rely on a series-sequence assumption, while RNNs are better suited when classical time series modeling is required.[\[126\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-129)[\[127\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-130)[\[128\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-131)[\[129\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-132)
### Animal behavior detection
CNNs have been applied in ecological and behavioral research to automatically detect and quantify animal behavior from visual data,[\[130\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-133)[\[131\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-:7-134) enabling identification of animals,[\[132\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-135)[\[133\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-136) tracking of individuals,[\[134\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-137) estimation of pose,[\[135\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-138)[\[136\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-139)[\[137\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-140) and classification of specific actions such as feeding,[\[138\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-:8-141) and social interactions.[\[131\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-:7-134)[\[138\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-:8-141) Combined with multi-object tracking and temporal modeling, these systems can extract behavioral sequences over extended recordings, reducing reliance on manual annotation and increasing throughput for studies of individual variation, social networks, and collective dynamics.
A CNN with 1-D convolutions was used on time series in the frequency domain (spectral residual) by an unsupervised model to detect anomalies in the time domain.[\[139\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-142)
CNNs have been used in [drug discovery](https://en.wikipedia.org/wiki/Drug_discovery "Drug discovery"). Predicting the interaction between molecules and biological [proteins](https://en.wikipedia.org/wiki/Protein "Protein") can identify potential treatments. In 2015, Atomwise introduced AtomNet, the first deep learning neural network for [structure-based drug design](https://en.wikipedia.org/wiki/Structure-based_drug_design "Structure-based drug design").[\[140\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-143) The system trains directly on 3-dimensional representations of chemical interactions. Similar to how image recognition networks learn to compose smaller, spatially proximate features into larger, complex structures,[\[141\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-144) AtomNet discovers chemical features, such as [aromaticity](https://en.wikipedia.org/wiki/Aromaticity "Aromaticity"), [sp3 carbons](https://en.wikipedia.org/wiki/Orbital_hybridisation "Orbital hybridisation"), and [hydrogen bonding](https://en.wikipedia.org/wiki/Hydrogen_bond "Hydrogen bond"). Subsequently, AtomNet was used to predict novel candidate [biomolecules](https://en.wikipedia.org/wiki/Biomolecule "Biomolecule") for multiple disease targets, most notably treatments for the [Ebola virus](https://en.wikipedia.org/wiki/Ebola_virus "Ebola virus")[\[142\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-145) and [multiple sclerosis](https://en.wikipedia.org/wiki/Multiple_sclerosis "Multiple sclerosis").[\[143\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-146)
CNNs have been used in the game of [checkers](https://en.wikipedia.org/wiki/Draughts "Draughts"). From 1999 to 2001, [Fogel](https://en.wikipedia.org/wiki/David_B._Fogel "David B. Fogel") and Chellapilla published papers showing how a convolutional neural network could learn to play checkers using co-evolution. The learning process did not use prior human professional games, but rather focused on a minimal set of information contained in the checkerboard: the location and type of pieces, and the difference in number of pieces between the two sides. Ultimately, the program ([Blondie24](https://en.wikipedia.org/wiki/Blondie24 "Blondie24")) was tested on 165 games against players and ranked in the highest 0.4%.[\[144\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-147)[\[145\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-148) It also earned a win against the program [Chinook](https://en.wikipedia.org/wiki/Chinook_\(draughts_player\) "Chinook (draughts player)") at its "expert" level of play.[\[146\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-149)
CNNs have been used in [computer Go](https://en.wikipedia.org/wiki/Computer_Go "Computer Go"). In December 2014, Clark and [Storkey](https://en.wikipedia.org/wiki/Amos_Storkey "Amos Storkey") published a paper showing that a CNN trained by supervised learning from a database of human professional games could outperform [GNU Go](https://en.wikipedia.org/wiki/GNU_Go "GNU Go") and win some games against [Monte Carlo tree search](https://en.wikipedia.org/wiki/Monte_Carlo_tree_search "Monte Carlo tree search") Fuego 1.1 in a fraction of the time it took Fuego to play.[\[147\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-150) Later it was announced that a large 12-layer convolutional neural network had correctly predicted the professional move in 55% of positions, equalling the accuracy of a [6 dan](https://en.wikipedia.org/wiki/Go_ranks_and_ratings "Go ranks and ratings") human player. When the trained convolutional network was used directly to play games of Go, without any search, it beat the traditional search program GNU Go in 97% of games, and matched the performance of the [Monte Carlo tree search](https://en.wikipedia.org/wiki/Monte_Carlo_tree_search "Monte Carlo tree search") program Fuego simulating ten thousand playouts (about a million positions) per move.[\[148\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-151)
[AlphaGo](https://en.wikipedia.org/wiki/AlphaGo "AlphaGo"), the first program to beat the best human player at the time, used a pair of CNNs to drive MCTS: a "policy network" for choosing moves to try and a "value network" for evaluating positions.[\[149\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-152)
### Time series forecasting
Recurrent neural networks are generally considered the best neural network architectures for time series forecasting (and sequence modeling in general), but recent studies show that convolutional networks can perform comparably or even better.[\[150\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-153)[\[13\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-Tsantekidis_7%E2%80%9312-13) Dilated convolutions[\[151\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-154) might enable one-dimensional convolutional neural networks to effectively learn time series dependences.[\[152\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-155) Convolutions can be implemented more efficiently than RNN-based solutions, and they do not suffer from vanishing (or exploding) gradients.[\[153\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-156) Convolutional networks can provide an improved forecasting performance when there are multiple similar time series to learn from.[\[154\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-157) CNNs can also be applied to further tasks in time series analysis (e.g., time series classification[\[155\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-158) or quantile forecasting[\[156\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-159)).
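A minimal sketch of a one-dimensional dilated convolution on a time series is given below. It assumes a causal formulation (each output depends only on current and past values, with zeros before the start of the series), which is a common choice for forecasting but is not specified in the text; the function names are hypothetical.

```python
def dilated_causal_conv1d(series, kernel, dilation):
    """out[t] = sum over k of kernel[k] * series[t - k * dilation],
    treating values before the start of the series as zero."""
    out = []
    for t in range(len(series)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * series[idx]
        out.append(acc)
    return out

def receptive_field(kernel_size, dilations):
    """Number of past time steps visible after stacking layers with these dilations."""
    return 1 + (kernel_size - 1) * sum(dilations)

series = [1, 2, 3, 4, 5]
# With dilation 2, each output combines the current value and the one two steps back.
out = dilated_causal_conv1d(series, kernel=[1.0, 1.0], dilation=2)

# Doubling the dilation per layer grows the receptive field exponentially with depth.
rf = receptive_field(kernel_size=2, dilations=[1, 2, 4, 8])
```

Stacking four layers with kernel size 2 and dilations 1, 2, 4, 8 lets each output see 16 time steps, which is one way such networks can capture longer-range dependencies without recurrence.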
### Cultural heritage and 3D-datasets
As archaeological findings such as [clay tablets](https://en.wikipedia.org/wiki/Clay_tablet "Clay tablet") with [cuneiform writing](https://en.wikipedia.org/wiki/Cuneiform "Cuneiform") are increasingly acquired using [3D scanners](https://en.wikipedia.org/wiki/3D_scanner "3D scanner"), benchmark datasets are becoming available, including *HeiCuBeDa*[\[157\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-HeiCuBeDa_Hilprecht-160) providing almost 2000 normalized 2-D and 3-D datasets prepared with the [GigaMesh Software Framework](https://en.wikipedia.org/wiki/GigaMesh_Software_Framework "GigaMesh Software Framework").[\[158\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-ICDAR19-161) So [curvature](https://en.wikipedia.org/wiki/Curvature "Curvature")\-based measures are used in conjunction with geometric neural networks (GNNs), e.g. for period classification of those clay tablets being among the oldest documents of human history.[\[159\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-ICFHR20-162)[\[160\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-ICFHR20_Presentation-163)
For many applications, training data is scarce. Convolutional neural networks usually require a large amount of training data in order to avoid [overfitting](https://en.wikipedia.org/wiki/Overfitting "Overfitting"). A common technique is to train the network on a larger data set from a related domain. Once the network parameters have converged, an additional training step is performed using the in-domain data to fine-tune the network weights; this is known as [transfer learning](https://en.wikipedia.org/wiki/Transfer_learning "Transfer learning"). Furthermore, this technique allows convolutional network architectures to be applied successfully to problems with tiny training sets.[\[161\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-164)
## Human interpretable explanations
End-to-end training and prediction are common practice in [computer vision](https://en.wikipedia.org/wiki/Computer_vision "Computer vision"). However, human interpretable explanations are required for [critical systems](https://en.wikipedia.org/wiki/Safety-critical_system "Safety-critical system") such as [self-driving cars](https://en.wikipedia.org/wiki/Self-driving_car "Self-driving car").[\[162\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-Interpretable_ML_Symposium_2017-165) With recent advances in [visual salience](https://en.wikipedia.org/wiki/Salience_\(neuroscience\) "Salience (neuroscience)"), [spatial attention](https://en.wikipedia.org/wiki/Visual_spatial_attention "Visual spatial attention"), and [temporal attention](https://en.wikipedia.org/wiki/Visual_temporal_attention "Visual temporal attention"), the most critical spatial regions/temporal instants could be visualized to justify the CNN predictions.[\[163\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-Zang_Wang_Liu_Zhang_2018_pp._97%E2%80%93108-166)[\[164\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-Wang_Zang_Zhang_Niu_p=1979-167)
A deep Q-network (DQN) is a type of deep learning model that combines a deep neural network with [Q-learning](https://en.wikipedia.org/wiki/Q-learning "Q-learning"), a form of [reinforcement learning](https://en.wikipedia.org/wiki/Reinforcement_learning "Reinforcement learning"). Unlike earlier reinforcement learning agents, DQNs that utilize CNNs can learn directly from high-dimensional sensory inputs via reinforcement learning.[\[165\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-Ong_Chavez_Hong_2015-168)
Preliminary results were presented in 2014, with an accompanying paper in February 2015.[\[166\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-DQN-169) The research described an application to [Atari 2600](https://en.wikipedia.org/wiki/Atari_2600 "Atari 2600") gaming. Other deep reinforcement learning models preceded it.[\[167\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-170)
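The Q-learning component of a deep Q-network can be illustrated by the target value that the network is trained to predict for one transition. In the sketch below, `next_q_values` stands in for the output of a CNN evaluated on the next observation; the function name, the discount factor `gamma`, and the example numbers are assumptions for illustration.

```python
def q_learning_target(reward, next_q_values, gamma, done):
    """One-step Q-learning target: r + gamma * max over next actions of Q(s', a').

    If the episode ended on this transition, there is no future value to add.
    """
    if done:
        return reward
    return reward + gamma * max(next_q_values)

# Hypothetical Q-values for three actions in the next state, as a CNN
# might output for the next frame of a game.
next_q = [0.5, 2.0, 1.0]
target = q_learning_target(reward=1.0, next_q_values=next_q, gamma=0.9, done=False)
terminal_target = q_learning_target(reward=1.0, next_q_values=next_q, gamma=0.9, done=True)
```

Training then adjusts the network so that its predicted value for the action actually taken moves toward `target`.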
### Deep belief networks
[Convolutional deep belief networks](https://en.wikipedia.org/wiki/Convolutional_deep_belief_network "Convolutional deep belief network") (CDBN) have structure very similar to convolutional neural networks and are trained similarly to deep belief networks. Therefore, they exploit the 2D structure of images, like CNNs do, and make use of pre-training like [deep belief networks](https://en.wikipedia.org/wiki/Deep_belief_network "Deep belief network"). They provide a generic structure that can be used in many image and signal processing tasks. Benchmark results on standard image datasets like CIFAR[\[168\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-CDBN-CIFAR-171) have been obtained using CDBNs.[\[169\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-CDBN-172)
### Neural abstraction pyramid
The feed-forward architecture of convolutional neural networks was extended in the neural abstraction pyramid[\[170\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-173) by lateral and feedback connections. The resulting recurrent convolutional network allows for the flexible incorporation of contextual information to iteratively resolve local ambiguities. In contrast to previous models, image-like outputs at the highest resolution were generated, e.g., for semantic segmentation, image reconstruction, and object localization tasks.
- [Caffe](https://en.wikipedia.org/wiki/Caffe_\(software\) "Caffe (software)"): A library for convolutional neural networks. Created by the Berkeley Vision and Learning Center (BVLC). It supports both CPU and GPU. Developed in [C++](https://en.wikipedia.org/wiki/C%2B%2B "C++"), and has [Python](https://en.wikipedia.org/wiki/Python_\(programming_language\) "Python (programming language)") and [MATLAB](https://en.wikipedia.org/wiki/MATLAB "MATLAB") wrappers.
- [Deeplearning4j](https://en.wikipedia.org/wiki/Deeplearning4j "Deeplearning4j"): Deep learning in [Java](https://en.wikipedia.org/wiki/Java_\(programming_language\) "Java (programming language)") and [Scala](https://en.wikipedia.org/wiki/Scala_\(programming_language\) "Scala (programming language)") on multi-GPU-enabled [Spark](https://en.wikipedia.org/wiki/Apache_Spark "Apache Spark"). A general-purpose deep learning library for the JVM production stack running on a C++ scientific computing engine. Allows the creation of custom layers. Integrates with Hadoop and Kafka.
- [Dlib](https://en.wikipedia.org/wiki/Dlib "Dlib"): A toolkit for making real world machine learning and data analysis applications in C++.
- [Microsoft Cognitive Toolkit](https://en.wikipedia.org/wiki/Microsoft_Cognitive_Toolkit "Microsoft Cognitive Toolkit"): A deep learning toolkit written by Microsoft with several unique features enhancing scalability over multiple nodes. It supports full-fledged interfaces for training in C++ and Python, with additional support for model inference in [C\#](https://en.wikipedia.org/wiki/C_Sharp_\(programming_language\) "C Sharp (programming language)") and Java.
- [TensorFlow](https://en.wikipedia.org/wiki/TensorFlow "TensorFlow"): [Apache 2.0](https://en.wikipedia.org/wiki/Apache_License#Version_2.0 "Apache License")\-licensed Theano-like library with support for CPU, GPU, Google's proprietary [tensor processing unit](https://en.wikipedia.org/wiki/Tensor_processing_unit "Tensor processing unit") (TPU),[\[171\]](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_note-174) and mobile devices.
- [Theano](https://en.wikipedia.org/wiki/Theano_\(software\) "Theano (software)"): The reference deep-learning library for Python with an API largely compatible with the popular [NumPy](https://en.wikipedia.org/wiki/NumPy "NumPy") library. Allows the user to write symbolic mathematical expressions, then automatically generates their derivatives, saving the user from having to code gradients or backpropagation. These symbolic expressions are automatically compiled to [CUDA](https://en.wikipedia.org/wiki/CUDA "CUDA") code for a fast, [on-the-GPU](https://en.wikipedia.org/wiki/Compute_kernel "Compute kernel") implementation.
- [Torch](https://en.wikipedia.org/wiki/Torch_\(machine_learning\) "Torch (machine learning)"): A [scientific computing](https://en.wikipedia.org/wiki/Scientific_computing "Scientific computing") framework with wide support for machine learning algorithms, written in [C](https://en.wikipedia.org/wiki/C_\(programming_language\) "C (programming language)") and [Lua](https://en.wikipedia.org/wiki/Lua_\(programming_language\) "Lua (programming language)").
## See also
- [Attention (machine learning)](https://en.wikipedia.org/wiki/Attention_\(machine_learning\) "Attention (machine learning)")
- [Circuit (neural network)](https://en.wikipedia.org/wiki/Circuit_\(neural_network\) "Circuit (neural network)")
- [Convolution](https://en.wikipedia.org/wiki/Convolution "Convolution")
- [Deep learning](https://en.wikipedia.org/wiki/Deep_learning "Deep learning")
- [Natural-language processing](https://en.wikipedia.org/wiki/Natural-language_processing "Natural-language processing")
- [Neocognitron](https://en.wikipedia.org/wiki/Neocognitron "Neocognitron")
- [Scale-invariant feature transform](https://en.wikipedia.org/wiki/Scale-invariant_feature_transform "Scale-invariant feature transform")
- [Time delay neural network](https://en.wikipedia.org/wiki/Time_delay_neural_network "Time delay neural network")
- [Vision processing unit](https://en.wikipedia.org/wiki/Vision_processing_unit "Vision processing unit")
## Notes
1. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-75)** When applied to types of data other than image data, such as sound data, "spatial position" may variously correspond to different points in the [time domain](https://en.wikipedia.org/wiki/Time_domain "Time domain"), [frequency domain](https://en.wikipedia.org/wiki/Frequency_domain "Frequency domain"), or other [mathematical spaces](https://en.wikipedia.org/wiki/Space_\(mathematics\) "Space (mathematics)").
2. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-78)** hence the name "convolutional layer"
3. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-90)** So-called [categorical data](https://en.wikipedia.org/wiki/Categorical_data "Categorical data").
## References
1. ^ [***a***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-LeCun2015_1-0) [***b***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-LeCun2015_1-1)
LeCun, Yann; Bengio, Yoshua; Hinton, Geoffrey (2015-05-28). ["Deep learning"](https://hal.science/hal-04206682). *Nature*. **521** (7553): 436–444\. [Bibcode](https://en.wikipedia.org/wiki/Bibcode_\(identifier\) "Bibcode (identifier)"):[2015Natur.521..436L](https://ui.adsabs.harvard.edu/abs/2015Natur.521..436L). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1038/nature14539](https://doi.org/10.1038%2Fnature14539). [ISSN](https://en.wikipedia.org/wiki/ISSN_\(identifier\) "ISSN (identifier)") [1476-4687](https://search.worldcat.org/issn/1476-4687). [PMID](https://en.wikipedia.org/wiki/PMID_\(identifier\) "PMID (identifier)") [26017442](https://pubmed.ncbi.nlm.nih.gov/26017442).
2. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-2)**
LeCun, Y.; Boser, B.; Denker, J. S.; Henderson, D.; Howard, R. E.; Hubbard, W.; Jackel, L. D. (December 1989). ["Backpropagation Applied to Handwritten Zip Code Recognition"](https://ieeexplore.ieee.org/document/6795724). *Neural Computation*. **1** (4): 541–551\. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1162/neco.1989.1.4.541](https://doi.org/10.1162%2Fneco.1989.1.4.541). [ISSN](https://en.wikipedia.org/wiki/ISSN_\(identifier\) "ISSN (identifier)") [0899-7667](https://search.worldcat.org/issn/0899-7667).
3. ^ [***a***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-auto3_3-0) [***b***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-auto3_3-1)
Venkatesan, Ragav; Li, Baoxin (2017-10-23). [*Convolutional Neural Networks in Visual Computing: A Concise Guide*](https://books.google.com/books?id=bAM7DwAAQBAJ&q=vanishing+gradient). CRC Press. [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-1-351-65032-8](https://en.wikipedia.org/wiki/Special:BookSources/978-1-351-65032-8 "Special:BookSources/978-1-351-65032-8"). [Archived](https://web.archive.org/web/20231016190415/https://books.google.com/books?id=bAM7DwAAQBAJ&q=vanishing+gradient#v=snippet&q=vanishing%20gradient&f=false) from the original on 2023-10-16. Retrieved 2020-12-13.
4. ^ [***a***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-auto2_4-0) [***b***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-auto2_4-1)
Balas, Valentina E.; Kumar, Raghvendra; Srivastava, Rajshree (2019-11-19). [*Recent Trends and Advances in Artificial Intelligence and Internet of Things*](https://books.google.com/books?id=XRS_DwAAQBAJ&q=exploding+gradient). Springer Nature. [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-3-030-32644-9](https://en.wikipedia.org/wiki/Special:BookSources/978-3-030-32644-9 "Special:BookSources/978-3-030-32644-9"). [Archived](https://web.archive.org/web/20231016190414/https://books.google.com/books?id=XRS_DwAAQBAJ&q=exploding+gradient#v=snippet&q=exploding%20gradient&f=false) from the original on 2023-10-16. Retrieved 2020-12-13.
5. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-5)**
Zhang, Yingjie; Soon, Hong Geok; Ye, Dongsen; Fuh, Jerry Ying Hsi; Zhu, Kunpeng (September 2020). "Powder-Bed Fusion Process Monitoring by Machine Vision With Hybrid Convolutional Neural Networks". *IEEE Transactions on Industrial Informatics*. **16** (9): 5769–5779\. [Bibcode](https://en.wikipedia.org/wiki/Bibcode_\(identifier\) "Bibcode (identifier)"):[2020ITII...16.5769Z](https://ui.adsabs.harvard.edu/abs/2020ITII...16.5769Z). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1109/TII.2019.2956078](https://doi.org/10.1109%2FTII.2019.2956078). [ISSN](https://en.wikipedia.org/wiki/ISSN_\(identifier\) "ISSN (identifier)") [1941-0050](https://search.worldcat.org/issn/1941-0050). [S2CID](https://en.wikipedia.org/wiki/S2CID_\(identifier\) "S2CID (identifier)") [213010088](https://api.semanticscholar.org/CorpusID:213010088).
6. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-6)**
Chervyakov, N.I.; Lyakhov, P.A.; Deryabin, M.A.; Nagornov, N.N.; Valueva, M.V.; Valuev, G.V. (September 2020). ["Residue Number System-Based Solution for Reducing the Hardware Cost of a Convolutional Neural Network"](https://linkinghub.elsevier.com/retrieve/pii/S092523122030583X). *Neurocomputing*. **407**: 439–453\. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1016/j.neucom.2020.04.018](https://doi.org/10.1016%2Fj.neucom.2020.04.018). [S2CID](https://en.wikipedia.org/wiki/S2CID_\(identifier\) "S2CID (identifier)") [219470398](https://api.semanticscholar.org/CorpusID:219470398). [Archived](https://web.archive.org/web/20230629155646/https://linkinghub.elsevier.com/retrieve/pii/S092523122030583X) from the original on 2023-06-29. Retrieved 2023-08-12. "Convolutional neural networks represent deep learning architectures that are currently used in a wide range of applications, including computer vision, speech recognition, malware dedection, time series analysis in finance, and many others."
7. ^ [***a***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-auto1_7-0) [***b***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-auto1_7-1)
Aghdam, Hamed Habibi; Heravi, Elnaz Jahani (2017-05-30). *Guide to convolutional neural networks: a practical application to traffic-sign detection and classification*. Cham, Switzerland: Springer. [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-3-319-57549-0](https://en.wikipedia.org/wiki/Special:BookSources/978-3-319-57549-0 "Special:BookSources/978-3-319-57549-0"). [OCLC](https://en.wikipedia.org/wiki/OCLC_\(identifier\) "OCLC (identifier)") [987790957](https://search.worldcat.org/oclc/987790957).
8. ^ [***a***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-homma_8-0) [***b***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-homma_8-1) [***c***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-homma_8-2)
Homma, Toshiteru; Les Atlas; Robert Marks II (1987). ["An Artificial Neural Network for Spatio-Temporal Bipolar Patterns: Application to Phoneme Classification"](https://proceedings.neurips.cc/paper_files/paper/1987/file/853f7b3615411c82a2ae439ab8c4c96e-Paper.pdf) (PDF). *Advances in Neural Information Processing Systems*. **1**: 31–40\. [Archived](https://web.archive.org/web/20220331211142/https://proceedings.neurips.cc/paper/1987/file/98f13708210194c475687be6106a3b84-Paper.pdf) (PDF) from the original on 2022-03-31. Retrieved 2022-03-31. "The notion of convolution or correlation used in the models presented is popular in engineering disciplines and has been applied extensively to designing filters, control systems, etc."
9. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-Valueva_Nagornov_Lyakhov_Valuev_2020_pp._232%E2%80%93243_9-0)**
Valueva, M.V.; Nagornov, N.N.; Lyakhov, P.A.; Valuev, G.V.; Chervyakov, N.I. (2020). "Application of the residue number system to reduce hardware costs of the convolutional neural network implementation". *Mathematics and Computers in Simulation*. **177**. Elsevier BV: 232–243\. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1016/j.matcom.2020.04.031](https://doi.org/10.1016%2Fj.matcom.2020.04.031). [ISSN](https://en.wikipedia.org/wiki/ISSN_\(identifier\) "ISSN (identifier)") [0378-4754](https://search.worldcat.org/issn/0378-4754). [S2CID](https://en.wikipedia.org/wiki/S2CID_\(identifier\) "S2CID (identifier)") [218955622](https://api.semanticscholar.org/CorpusID:218955622). "Convolutional neural networks are a promising tool for solving the problem of pattern recognition."
10. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-10)**
van den Oord, Aaron; Dieleman, Sander; Schrauwen, Benjamin (2013-01-01). Burges, C. J. C.; Bottou, L.; Welling, M.; Ghahramani, Z.; Weinberger, K. Q. (eds.). [*Deep content-based music recommendation*](https://proceedings.neurips.cc/paper/2013/file/b3ba8f1bee1238a2f37603d90b58898d-Paper.pdf) (PDF). Curran Associates, Inc. pp. 2643–2651\. [Archived](https://web.archive.org/web/20220307172303/https://proceedings.neurips.cc/paper/2013/file/b3ba8f1bee1238a2f37603d90b58898d-Paper.pdf) (PDF) from the original on 2022-03-07. Retrieved 2022-03-31.
11. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-11)**
Collobert, Ronan; Weston, Jason (2008-01-01). "A unified architecture for natural language processing". *Proceedings of the 25th international conference on Machine learning - ICML '08*. New York, NY, US: ACM. pp. 160–167\. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1145/1390156.1390177](https://doi.org/10.1145%2F1390156.1390177). [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-1-60558-205-4](https://en.wikipedia.org/wiki/Special:BookSources/978-1-60558-205-4 "Special:BookSources/978-1-60558-205-4"). [S2CID](https://en.wikipedia.org/wiki/S2CID_\(identifier\) "S2CID (identifier)") [2617020](https://api.semanticscholar.org/CorpusID:2617020).
12. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-12)**
Avilov, Oleksii; Rimbert, Sebastien; Popov, Anton; Bougrain, Laurent (July 2020). ["Deep Learning Techniques to Improve Intraoperative Awareness Detection from Electroencephalographic Signals"](https://ieeexplore.ieee.org/document/9176228). [*2020 42nd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC)*](https://hal.inria.fr/hal-02920320/file/Avilov_EMBC2020.pdf) (PDF). Vol. 2020. Montreal, QC, Canada: IEEE. pp. 142–145\. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1109/EMBC44109.2020.9176228](https://doi.org/10.1109%2FEMBC44109.2020.9176228). [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-1-7281-1990-8](https://en.wikipedia.org/wiki/Special:BookSources/978-1-7281-1990-8 "Special:BookSources/978-1-7281-1990-8"). [PMID](https://en.wikipedia.org/wiki/PMID_\(identifier\) "PMID (identifier)") [33017950](https://pubmed.ncbi.nlm.nih.gov/33017950). [S2CID](https://en.wikipedia.org/wiki/S2CID_\(identifier\) "S2CID (identifier)") [221386616](https://api.semanticscholar.org/CorpusID:221386616). [Archived](https://web.archive.org/web/20220519135428/https://hal.inria.fr/hal-02920320/file/Avilov_EMBC2020.pdf) (PDF) from the original on 2022-05-19. Retrieved 2023-07-21.
13. ^ [***a***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-Tsantekidis_7%E2%80%9312_13-0) [***b***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-Tsantekidis_7%E2%80%9312_13-1)
Tsantekidis, Avraam; Passalis, Nikolaos; Tefas, Anastasios; Kanniainen, Juho; Gabbouj, Moncef; Iosifidis, Alexandros (July 2017). "Forecasting Stock Prices from the Limit Order Book Using Convolutional Neural Networks". *2017 IEEE 19th Conference on Business Informatics (CBI)*. Thessaloniki, Greece: IEEE. pp. 7–12\. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1109/CBI.2017.23](https://doi.org/10.1109%2FCBI.2017.23). [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-1-5386-3035-8](https://en.wikipedia.org/wiki/Special:BookSources/978-1-5386-3035-8 "Special:BookSources/978-1-5386-3035-8"). [S2CID](https://en.wikipedia.org/wiki/S2CID_\(identifier\) "S2CID (identifier)") [4950757](https://api.semanticscholar.org/CorpusID:4950757).
14. ^ [***a***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-:0_14-0) [***b***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-:0_14-1) [***c***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-:0_14-2)
Zhang, Wei (1988). ["Shift-invariant pattern recognition neural network and its optical architecture"](https://drive.google.com/file/d/1nN_5odSG_QVae54EsQN_qSz-0ZsX6wA0/view?usp=sharing). *Proceedings of Annual Conference of the Japan Society of Applied Physics*. [Archived](https://web.archive.org/web/20200623051222/https://drive.google.com/file/d/1nN_5odSG_QVae54EsQN_qSz-0ZsX6wA0/view?usp=sharing) from the original on 2020-06-23. Retrieved 2020-06-22.
15. ^ [***a***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-:1_15-0) [***b***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-:1_15-1) [***c***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-:1_15-2)
Zhang, Wei (1990). ["Parallel distributed processing model with local space-invariant interconnections and its optical architecture"](https://drive.google.com/file/d/0B65v6Wo67Tk5ODRzZmhSR29VeDg/view?usp=sharing). *Applied Optics*. **29** (32): 4790–7\. [Bibcode](https://en.wikipedia.org/wiki/Bibcode_\(identifier\) "Bibcode (identifier)"):[1990ApOpt..29.4790Z](https://ui.adsabs.harvard.edu/abs/1990ApOpt..29.4790Z). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1364/AO.29.004790](https://doi.org/10.1364%2FAO.29.004790). [PMID](https://en.wikipedia.org/wiki/PMID_\(identifier\) "PMID (identifier)") [20577468](https://pubmed.ncbi.nlm.nih.gov/20577468). [Archived](https://web.archive.org/web/20170206111407/https://drive.google.com/file/d/0B65v6Wo67Tk5ODRzZmhSR29VeDg/view?usp=sharing) from the original on 2017-02-06. Retrieved 2016-09-22.
16. ^ [***a***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-:6_16-0) [***b***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-:6_16-1) [***c***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-:6_16-2) [***d***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-:6_16-3) [***e***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-:6_16-4) [***f***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-:6_16-5)
Mouton, Coenraad; Myburgh, Johannes C.; Davel, Marelie H. (2020). ["Stride and Translation Invariance in CNNs"](https://link.springer.com/chapter/10.1007%2F978-3-030-66151-9_17). In Gerber, Aurona (ed.). *Artificial Intelligence Research*. Communications in Computer and Information Science. Vol. 1342. Cham: Springer International Publishing. pp. 267–281\. [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[2103\.10097](https://arxiv.org/abs/2103.10097). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1007/978-3-030-66151-9\_17](https://doi.org/10.1007%2F978-3-030-66151-9_17). [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-3-030-66151-9](https://en.wikipedia.org/wiki/Special:BookSources/978-3-030-66151-9 "Special:BookSources/978-3-030-66151-9"). [S2CID](https://en.wikipedia.org/wiki/S2CID_\(identifier\) "S2CID (identifier)") [232269854](https://api.semanticscholar.org/CorpusID:232269854). [Archived](https://web.archive.org/web/20210627074505/https://link.springer.com/chapter/10.1007%2F978-3-030-66151-9_17) from the original on 2021-06-27. Retrieved 2021-03-26.
17. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-17)**
Kurtzman, Thomas (August 20, 2019). ["Hidden bias in the DUD-E dataset leads to misleading performance of deep learning in structure-based virtual screening"](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6701836). *PLOS ONE*. **14** (8) e0220113. [Bibcode](https://en.wikipedia.org/wiki/Bibcode_\(identifier\) "Bibcode (identifier)"):[2019PLoSO..1420113C](https://ui.adsabs.harvard.edu/abs/2019PLoSO..1420113C). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1371/journal.pone.0220113](https://doi.org/10.1371%2Fjournal.pone.0220113). [PMC](https://en.wikipedia.org/wiki/PMC_\(identifier\) "PMC (identifier)") [6701836](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6701836). [PMID](https://en.wikipedia.org/wiki/PMID_\(identifier\) "PMID (identifier)") [31430292](https://pubmed.ncbi.nlm.nih.gov/31430292).
18. ^ [***a***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-fukuneoscholar_18-0) [***b***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-fukuneoscholar_18-1) [***c***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-fukuneoscholar_18-2)
Fukushima, K. (2007). ["Neocognitron"](https://doi.org/10.4249%2Fscholarpedia.1717). *Scholarpedia*. **2** (1): 1717. [Bibcode](https://en.wikipedia.org/wiki/Bibcode_\(identifier\) "Bibcode (identifier)"):[2007SchpJ...2.1717F](https://ui.adsabs.harvard.edu/abs/2007SchpJ...2.1717F). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.4249/scholarpedia.1717](https://doi.org/10.4249%2Fscholarpedia.1717).
19. ^ [***a***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-hubelwiesel1968_19-0) [***b***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-hubelwiesel1968_19-1)
Hubel, D. H.; Wiesel, T. N. (1968-03-01). ["Receptive fields and functional architecture of monkey striate cortex"](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1557912). *The Journal of Physiology*. **195** (1): 215–243\. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1113/jphysiol.1968.sp008455](https://doi.org/10.1113%2Fjphysiol.1968.sp008455). [ISSN](https://en.wikipedia.org/wiki/ISSN_\(identifier\) "ISSN (identifier)") [0022-3751](https://search.worldcat.org/issn/0022-3751). [PMC](https://en.wikipedia.org/wiki/PMC_\(identifier\) "PMC (identifier)") [1557912](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1557912). [PMID](https://en.wikipedia.org/wiki/PMID_\(identifier\) "PMID (identifier)") [4966457](https://pubmed.ncbi.nlm.nih.gov/4966457).
20. ^ [***a***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-intro_20-0) [***b***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-intro_20-1)
Fukushima, Kunihiko (1980). ["Neocognitron: A Self-organizing Neural Network Model for a Mechanism of Pattern Recognition Unaffected by Shift in Position"](https://www.cs.princeton.edu/courses/archive/spr08/cos598B/Readings/Fukushima1980.pdf) (PDF). *Biological Cybernetics*. **36** (4): 193–202\. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1007/BF00344251](https://doi.org/10.1007%2FBF00344251). [PMID](https://en.wikipedia.org/wiki/PMID_\(identifier\) "PMID (identifier)") [7370364](https://pubmed.ncbi.nlm.nih.gov/7370364). [S2CID](https://en.wikipedia.org/wiki/S2CID_\(identifier\) "S2CID (identifier)") [206775608](https://api.semanticscholar.org/CorpusID:206775608). [Archived](https://web.archive.org/web/20140603013137/http://www.cs.princeton.edu/courses/archive/spr08/cos598B/Readings/Fukushima1980.pdf) (PDF) from the original on 3 June 2014. Retrieved 16 November 2013.
21. ^ [***a***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-robust_face_detection_21-0) [***b***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-robust_face_detection_21-1)
Matsugu, Masakazu; Katsuhiko Mori; Yusuke Mitari; Yuji Kaneda (2003). ["Subject independent facial expression recognition with robust face detection using a convolutional neural network"](http://www.iro.umontreal.ca/~pift6080/H09/documents/papers/sparse/matsugo_etal_face_expression_conv_nnet.pdf) (PDF). *Neural Networks*. **16** (5): 555–559\. [Bibcode](https://en.wikipedia.org/wiki/Bibcode_\(identifier\) "Bibcode (identifier)"):[2003NN.....16..555M](https://ui.adsabs.harvard.edu/abs/2003NN.....16..555M). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1016/S0893-6080(03)00115-1](https://doi.org/10.1016%2FS0893-6080%2803%2900115-1). [PMID](https://en.wikipedia.org/wiki/PMID_\(identifier\) "PMID (identifier)") [12850007](https://pubmed.ncbi.nlm.nih.gov/12850007). [Archived](https://web.archive.org/web/20131213022740/http://www.iro.umontreal.ca/~pift6080/H09/documents/papers/sparse/matsugo_etal_face_expression_conv_nnet.pdf) (PDF) from the original on 13 December 2013. Retrieved 17 November 2013.
22. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-22)** "Convolutional Neural Networks Demystified: A Matched Filtering Perspective Based Tutorial". [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[2108\.11663v3](https://arxiv.org/abs/2108.11663v3).
23. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-deeplearning_23-0)**
["Convolutional Neural Networks (LeNet) – DeepLearning 0.1 documentation"](https://web.archive.org/web/20171228091645/http://deeplearning.net/tutorial/lenet.html). *DeepLearning 0.1*. LISA Lab. Archived from [the original](http://deeplearning.net/tutorial/lenet.html) on 28 December 2017. Retrieved 31 August 2013.
24. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-24)**
Chollet, François (2017-04-04). "Xception: Deep Learning with Depthwise Separable Convolutions". [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[1610\.02357](https://arxiv.org/abs/1610.02357) \[[cs.CV](https://arxiv.org/archive/cs.CV)\].
25. ^ [***a***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-flexible_25-0) [***b***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-flexible_25-1) [***c***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-flexible_25-2)
Ciresan, Dan; Ueli Meier; Jonathan Masci; Luca M. Gambardella; Jurgen Schmidhuber (2011). ["Flexible, High Performance Convolutional Neural Networks for Image Classification"](https://people.idsia.ch/~juergen/ijcai2011.pdf) (PDF). *Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence, Volume Two*. **2**: 1237–1242\. [Archived](https://web.archive.org/web/20220405190128/https://people.idsia.ch/~juergen/ijcai2011.pdf) (PDF) from the original on 5 April 2022. Retrieved 17 November 2013.
26. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-26)**
[Krizhevsky](https://en.wikipedia.org/wiki/Alex_Krizhevsky "Alex Krizhevsky"), Alex. ["ImageNet Classification with Deep Convolutional Neural Networks"](https://image-net.org/static_files/files/supervision.pdf) (PDF). [Archived](https://web.archive.org/web/20210425025127/http://www.image-net.org/static_files/files/supervision.pdf) (PDF) from the original on 25 April 2021. Retrieved 17 November 2013.
27. ^ [***a***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-Yamaguchi111990_27-0) [***b***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-Yamaguchi111990_27-1)
Yamaguchi, Kouichi; Sakamoto, Kenji; Akabane, Toshio; Fujimoto, Yoshiji (November 1990). [*A Neural Network for Speaker-Independent Isolated Word Recognition*](https://web.archive.org/web/20210307233750/https://www.isca-speech.org/archive/icslp_1990/i90_1077.html). First International Conference on Spoken Language Processing (ICSLP 90). Kobe, Japan. Archived from [the original](https://www.isca-speech.org/archive/icslp_1990/i90_1077.html) on 2021-03-07. Retrieved 2019-09-04.
28. ^ [***a***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-mcdns_28-0) [***b***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-mcdns_28-1) [***c***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-mcdns_28-2) [***d***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-mcdns_28-3)
Ciresan, Dan; Meier, Ueli; Schmidhuber, Jürgen (June 2012). "Multi-column deep neural networks for image classification". *2012 IEEE Conference on Computer Vision and Pattern Recognition*. New York, NY: [Institute of Electrical and Electronics Engineers](https://en.wikipedia.org/wiki/Institute_of_Electrical_and_Electronics_Engineers "Institute of Electrical and Electronics Engineers") (IEEE). pp. 3642–3649\. [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[1202\.2745](https://arxiv.org/abs/1202.2745). [CiteSeerX](https://en.wikipedia.org/wiki/CiteSeerX_\(identifier\) "CiteSeerX (identifier)") [10\.1.1.300.3283](https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.300.3283). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1109/CVPR.2012.6248110](https://doi.org/10.1109%2FCVPR.2012.6248110). [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-1-4673-1226-4](https://en.wikipedia.org/wiki/Special:BookSources/978-1-4673-1226-4 "Special:BookSources/978-1-4673-1226-4"). [OCLC](https://en.wikipedia.org/wiki/OCLC_\(identifier\) "OCLC (identifier)") [812295155](https://search.worldcat.org/oclc/812295155). [S2CID](https://en.wikipedia.org/wiki/S2CID_\(identifier\) "S2CID (identifier)") [2161592](https://api.semanticscholar.org/CorpusID:2161592).
29. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-29)**
Yu, Fisher; Koltun, Vladlen (2016-04-30). "Multi-Scale Context Aggregation by Dilated Convolutions". [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[1511\.07122](https://arxiv.org/abs/1511.07122) \[[cs.CV](https://arxiv.org/archive/cs.CV)\].
30. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-30)**
Chen, Liang-Chieh; Papandreou, George; Schroff, Florian; Adam, Hartwig (2017-12-05). "Rethinking Atrous Convolution for Semantic Image Segmentation". [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[1706\.05587](https://arxiv.org/abs/1706.05587) \[[cs.CV](https://arxiv.org/archive/cs.CV)\].
31. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-31)**
Duta, Ionut Cosmin; Georgescu, Mariana Iuliana; Ionescu, Radu Tudor (2021-08-16). "Contextual Convolutional Neural Networks". [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[2108\.07387](https://arxiv.org/abs/2108.07387) \[[cs.CV](https://arxiv.org/archive/cs.CV)\].
32. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-LeCun_32-0)**
LeCun, Yann. ["LeNet-5, convolutional neural networks"](http://yann.lecun.com/exdb/lenet/). [Archived](https://web.archive.org/web/20210224225707/http://yann.lecun.com/exdb/lenet/) from the original on 24 February 2021. Retrieved 16 November 2013.
33. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-33)**
Zeiler, Matthew D.; Taylor, Graham W.; Fergus, Rob (November 2011). ["Adaptive deconvolutional networks for mid and high level feature learning"](https://dx.doi.org/10.1109/iccv.2011.6126474). *2011 International Conference on Computer Vision*. IEEE. pp. 2018–2025\. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1109/iccv.2011.6126474](https://doi.org/10.1109%2Ficcv.2011.6126474). [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-1-4577-1102-2](https://en.wikipedia.org/wiki/Special:BookSources/978-1-4577-1102-2 "Special:BookSources/978-1-4577-1102-2").
34. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-34)**
Dumoulin, Vincent; Visin, Francesco (2018-01-11), *A guide to convolution arithmetic for deep learning*, [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[1603\.07285](https://arxiv.org/abs/1603.07285)
35. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-35)**
Odena, Augustus; Dumoulin, Vincent; Olah, Chris (2016-10-17). ["Deconvolution and Checkerboard Artifacts"](https://distill.pub/2016/deconv-checkerboard/). *Distill*. **1** (10) e3. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.23915/distill.00003](https://doi.org/10.23915%2Fdistill.00003). [ISSN](https://en.wikipedia.org/wiki/ISSN_\(identifier\) "ISSN (identifier)") [2476-0757](https://search.worldcat.org/issn/2476-0757).
36. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-36)**
van Dyck, Leonard Elia; Kwitt, Roland; Denzler, Sebastian Jochen; Gruber, Walter Roland (2021). ["Comparing Object Recognition in Humans and Deep Convolutional Neural Networks—An Eye Tracking Study"](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8526843). *Frontiers in Neuroscience*. **15** 750639. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.3389/fnins.2021.750639](https://doi.org/10.3389%2Ffnins.2021.750639). [ISSN](https://en.wikipedia.org/wiki/ISSN_\(identifier\) "ISSN (identifier)") [1662-453X](https://search.worldcat.org/issn/1662-453X). [PMC](https://en.wikipedia.org/wiki/PMC_\(identifier\) "PMC (identifier)") [8526843](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8526843). [PMID](https://en.wikipedia.org/wiki/PMID_\(identifier\) "PMID (identifier)") [34690686](https://pubmed.ncbi.nlm.nih.gov/34690686).
37. ^ [***a***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-:4_37-0) [***b***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-:4_37-1)
Hubel, DH; Wiesel, TN (October 1959). ["Receptive fields of single neurones in the cat's striate cortex"](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1363130). *J. Physiol*. **148** (3): 574–91\. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1113/jphysiol.1959.sp006308](https://doi.org/10.1113%2Fjphysiol.1959.sp006308). [PMC](https://en.wikipedia.org/wiki/PMC_\(identifier\) "PMC (identifier)") [1363130](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1363130). [PMID](https://en.wikipedia.org/wiki/PMID_\(identifier\) "PMID (identifier)") [14403679](https://pubmed.ncbi.nlm.nih.gov/14403679).
38. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-38)**
David H. Hubel and Torsten N. Wiesel (2005). [*Brain and visual perception: the story of a 25-year collaboration*](https://books.google.com/books?id=8YrxWojxUA4C&pg=PA106). Oxford University Press US. p. 106. [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-0-19-517618-6](https://en.wikipedia.org/wiki/Special:BookSources/978-0-19-517618-6 "Special:BookSources/978-0-19-517618-6"). [Archived](https://web.archive.org/web/20231016190414/https://books.google.com/books?id=8YrxWojxUA4C&pg=PA106#v=onepage&q&f=false) from the original on 2023-10-16. Retrieved 2019-01-18.
39. ^ [***a***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-Fukushima1969_39-0) [***b***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-Fukushima1969_39-1)
Fukushima, K. (1969). "Visual feature extraction by a multilayered network of analog threshold elements". *IEEE Transactions on Systems Science and Cybernetics*. **5** (4): 322–333\. [Bibcode](https://en.wikipedia.org/wiki/Bibcode_\(identifier\) "Bibcode (identifier)"):[1969ITSSC...5..322F](https://ui.adsabs.harvard.edu/abs/1969ITSSC...5..322F). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1109/TSSC.1969.300225](https://doi.org/10.1109%2FTSSC.1969.300225).
40. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-DLhistory_40-0)**
[Schmidhuber, Juergen](https://en.wikipedia.org/wiki/Juergen_Schmidhuber "Juergen Schmidhuber") (2022). "Annotated History of Modern AI and Deep Learning". [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[2212\.11279](https://arxiv.org/abs/2212.11279) \[[cs.NE](https://arxiv.org/archive/cs.NE)\].
41. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-41)**
Ramachandran, Prajit; Zoph, Barret; Le, Quoc V. (October 16, 2017). "Searching for Activation Functions". [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[1710\.05941](https://arxiv.org/abs/1710.05941) \[[cs.NE](https://arxiv.org/archive/cs.NE)\].
42. ^ [***a***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-Waibel1987_42-0) [***b***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-Waibel1987_42-1)
Waibel, Alex (18 December 1987). [*Phoneme Recognition Using Time-Delay Neural Networks*](https://isl.iar.kit.edu/downloads/Pheome_Recognition_Using_Time-Delay_Neural_Networks_SP87-100_6.pdf) (PDF). Meeting of the Institute of Electrical, Information and Communication Engineers (IEICE). Tokyo, Japan.
43. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-speechsignal_43-0)** [Alexander Waibel](https://en.wikipedia.org/wiki/Alex_Waibel "Alex Waibel") et al., [*Phoneme Recognition Using Time-Delay Neural Networks*](http://www.inf.ufrgs.br/~engel/data/media/file/cmp121/waibel89_TDNN.pdf). [Archived](https://web.archive.org/web/20210225163001/http://www.inf.ufrgs.br/~engel/data/media/file/cmp121/waibel89_TDNN.pdf) 2021-02-25 at the [Wayback Machine](https://en.wikipedia.org/wiki/Wayback_Machine "Wayback Machine"). *IEEE Transactions on Acoustics, Speech, and Signal Processing*, Volume 37, No. 3, pp. 328–339, March 1989.
44. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-44)**
LeCun, Yann; Bengio, Yoshua (1995). ["Convolutional networks for images, speech, and time series"](https://www.researchgate.net/publication/2453996). In Arbib, Michael A. (ed.). *The handbook of brain theory and neural networks* (Second ed.). The MIT press. pp. 276–278\. [Archived](https://web.archive.org/web/20200728164116/https://www.researchgate.net/publication/2453996_Convolutional_Networks_for_Images_Speech_and_Time-Series) from the original on 2020-07-28. Retrieved 2019-12-03.
45. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-Hampshire1990_45-0)** John B. Hampshire and Alexander Waibel, [*Connectionist Architectures for Multi-Speaker Phoneme Recognition*](https://proceedings.neurips.cc/paper/1989/file/979d472a84804b9f647bc185a877a8b5-Paper.pdf). [Archived](https://web.archive.org/web/20220331225059/https://proceedings.neurips.cc/paper/1989/file/979d472a84804b9f647bc185a877a8b5-Paper.pdf) 2022-03-31 at the [Wayback Machine](https://en.wikipedia.org/wiki/Wayback_Machine "Wayback Machine"). *Advances in Neural Information Processing Systems*, 1990, Morgan Kaufmann.
46. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-Ko2017_46-0)**
Ko, Tom; Peddinti, Vijayaditya; Povey, Daniel; Seltzer, Michael L.; Khudanpur, Sanjeev (March 2018). [*A Study on Data Augmentation of Reverberant Speech for Robust Speech Recognition*](https://www.danielpovey.com/files/2017_icassp_reverberation.pdf) (PDF). The 42nd IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2017). New Orleans, LA, US. [Archived](https://web.archive.org/web/20180708072725/http://danielpovey.com/files/2017_icassp_reverberation.pdf) (PDF) from the original on 2018-07-08. Retrieved 2019-09-04.
47. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-47)** Denker, J S, Gardner, W R, Graf, H. P, Henderson, D, Howard, R E, Hubbard, W, Jackel, L D, Baird, H S, and Guyon (1989) [Neural network recognizer for hand-written zip code digits](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.852.5499&rep=rep1&type=pdf) [Archived](https://web.archive.org/web/20180804013916/http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.852.5499&rep=rep1&type=pdf) 2018-08-04 at the [Wayback Machine](https://en.wikipedia.org/wiki/Wayback_Machine "Wayback Machine"), AT\&T Bell Laboratories
48. ^ [***a***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-:2_48-0) [***b***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-:2_48-1) Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, L. D. Jackel, [Backpropagation Applied to Handwritten Zip Code Recognition](http://yann.lecun.com/exdb/publis/pdf/lecun-89e.pdf) [Archived](https://web.archive.org/web/20200110090230/http://yann.lecun.com/exdb/publis/pdf/lecun-89e.pdf) 2020-01-10 at the [Wayback Machine](https://en.wikipedia.org/wiki/Wayback_Machine "Wayback Machine"); AT\&T Bell Laboratories
49. ^ [***a***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-:wz1991_49-0) [***b***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-:wz1991_49-1)
Zhang, Wei (1991). ["Image processing of human corneal endothelium based on a learning network"](https://drive.google.com/file/d/0B65v6Wo67Tk5cm5DTlNGd0NPUmM/view?usp=sharing). *Applied Optics*. **30** (29): 4211–7\. [Bibcode](https://en.wikipedia.org/wiki/Bibcode_\(identifier\) "Bibcode (identifier)"):[1991ApOpt..30.4211Z](https://ui.adsabs.harvard.edu/abs/1991ApOpt..30.4211Z). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1364/AO.30.004211](https://doi.org/10.1364%2FAO.30.004211). [PMID](https://en.wikipedia.org/wiki/PMID_\(identifier\) "PMID (identifier)") [20706526](https://pubmed.ncbi.nlm.nih.gov/20706526). [Archived](https://web.archive.org/web/20170206122612/https://drive.google.com/file/d/0B65v6Wo67Tk5cm5DTlNGd0NPUmM/view?usp=sharing) from the original on 2017-02-06. Retrieved 2016-09-22.
50. ^ [***a***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-:wz1994_50-0) [***b***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-:wz1994_50-1)
Zhang, Wei (1994). ["Computerized detection of clustered microcalcifications in digital mammograms using a shift-invariant artificial neural network"](https://drive.google.com/file/d/0B65v6Wo67Tk5Ml9qeW5nQ3poVTQ/view?usp=sharing). *Medical Physics*. **21** (4): 517–24\. [Bibcode](https://en.wikipedia.org/wiki/Bibcode_\(identifier\) "Bibcode (identifier)"):[1994MedPh..21..517Z](https://ui.adsabs.harvard.edu/abs/1994MedPh..21..517Z). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1118/1.597177](https://doi.org/10.1118%2F1.597177). [PMID](https://en.wikipedia.org/wiki/PMID_\(identifier\) "PMID (identifier)") [8058017](https://pubmed.ncbi.nlm.nih.gov/8058017). [Archived](https://web.archive.org/web/20170206030321/https://drive.google.com/file/d/0B65v6Wo67Tk5Ml9qeW5nQ3poVTQ/view?usp=sharing) from the original on 2017-02-06. Retrieved 2016-09-22.
51. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-weng1993_51-0)**
Weng, J; Ahuja, N; Huang, TS (1993). "Learning recognition and segmentation of 3-D objects from 2-D images". *1993 (4th) International Conference on Computer Vision*. IEEE. pp. 121–128\. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1109/ICCV.1993.378228](https://doi.org/10.1109%2FICCV.1993.378228). [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [0-8186-3870-2](https://en.wikipedia.org/wiki/Special:BookSources/0-8186-3870-2 "Special:BookSources/0-8186-3870-2"). [S2CID](https://en.wikipedia.org/wiki/S2CID_\(identifier\) "S2CID (identifier)") [8619176](https://api.semanticscholar.org/CorpusID:8619176).
52. ^ [***a***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-schdeepscholar_52-0) [***b***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-schdeepscholar_52-1)
Schmidhuber, Jürgen (2015). ["Deep Learning"](http://www.scholarpedia.org/article/Deep_Learning). *Scholarpedia*. **10** (11) 32832. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.4249/scholarpedia.32832](https://doi.org/10.4249%2Fscholarpedia.32832). [Archived](https://web.archive.org/web/20160419024349/http://www.scholarpedia.org/article/Deep_Learning) from the original on 2016-04-19. Retrieved 2019-01-20.
53. ^ [***a***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-lecun95_53-0) [***b***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-lecun95_53-1)
Lecun, Y.; Jackel, L. D.; Bottou, L.; Cortes, C.; Denker, J. S.; Drucker, H.; Guyon, I.; Muller, U. A.; Sackinger, E.; Simard, P.; Vapnik, V. (August 1995). [*Learning algorithms for classification: A comparison on handwritten digit recognition*](http://yann.lecun.com/exdb/publis/pdf/lecun-95a.pdf) (PDF). World Scientific. pp. 261–276\. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1142/2808](https://doi.org/10.1142%2F2808). [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-981-02-2324-3](https://en.wikipedia.org/wiki/Special:BookSources/978-981-02-2324-3 "Special:BookSources/978-981-02-2324-3"). [Archived](https://web.archive.org/web/20230502220356/http://yann.lecun.com/exdb/publis/pdf/lecun-95a.pdf) (PDF) from the original on 2 May 2023.
54. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-54)**
Lecun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. (November 1998). "Gradient-based learning applied to document recognition". *Proceedings of the IEEE*. **86** (11): 2278–2324\. [Bibcode](https://en.wikipedia.org/wiki/Bibcode_\(identifier\) "Bibcode (identifier)"):[1998IEEEP..86.2278L](https://ui.adsabs.harvard.edu/abs/1998IEEEP..86.2278L). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1109/5.726791](https://doi.org/10.1109%2F5.726791).
55. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-55)**
Zhang, Wei (1991). ["Error Back Propagation with Minimum-Entropy Weights: A Technique for Better Generalization of 2-D Shift-Invariant NNs"](https://drive.google.com/file/d/0B65v6Wo67Tk5dkJTcEMtU2c5Znc/view?usp=sharing). *Proceedings of the International Joint Conference on Neural Networks*. [Archived](https://web.archive.org/web/20170206155801/https://drive.google.com/file/d/0B65v6Wo67Tk5dkJTcEMtU2c5Znc/view?usp=sharing) from the original on 2017-02-06. Retrieved 2016-09-22.
56. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-56)** Daniel Graupe, Ruey Wen Liu, George S Moschytz. "[Applications of neural networks to medical signal processing](https://www.researchgate.net/profile/Daniel_Graupe2/publication/241130197_Applications_of_signal_and_image_processing_to_medicine/links/575eef7e08aec91374b42bd2.pdf)". [Archived](https://web.archive.org/web/20200728164114/https://www.researchgate.net/profile/Daniel_Graupe2/publication/241130197_Applications_of_signal_and_image_processing_to_medicine/links/575eef7e08aec91374b42bd2.pdf) 2020-07-28 at the [Wayback Machine](https://en.wikipedia.org/wiki/Wayback_Machine "Wayback Machine"). In Proc. 27th IEEE Decision and Control Conf., pp. 343–347, 1988.
57. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-57)** Daniel Graupe, Boris Vern, G. Gruener, Aaron Field, and Qiu Huang. "[Decomposition of surface EMG signals into single fiber action potentials by means of neural network](https://ieeexplore.ieee.org/abstract/document/100522/)". [Archived](https://web.archive.org/web/20190904161656/https://ieeexplore.ieee.org/abstract/document/100522/) 2019-09-04 at the [Wayback Machine](https://en.wikipedia.org/wiki/Wayback_Machine "Wayback Machine"). Proc. IEEE International Symp. on Circuits and Systems, pp. 1008–1011, 1989.
58. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-58)** Qiu Huang, Daniel Graupe, Yi Fang Huang, Ruey Wen Liu. "[Identification of firing patterns of neuronal signals](http://www.academia.edu/download/42092095/graupe_huang_q_huang_yf_liu_rw_1989.pdf)\[*[dead link](https://en.wikipedia.org/wiki/Wikipedia:Link_rot "Wikipedia:Link rot")*\]". In Proc. 28th IEEE Decision and Control Conf., pp. 266–271, 1989. <https://ieeexplore.ieee.org/document/70115> [Archived](https://web.archive.org/web/20220331211138/https://ieeexplore.ieee.org/document/70115) 2022-03-31 at the [Wayback Machine](https://en.wikipedia.org/wiki/Wayback_Machine "Wayback Machine").
59. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-59)**
Oh, KS; Jung, K (2004). "GPU implementation of neural networks". *Pattern Recognition*. **37** (6): 1311–1314\. [Bibcode](https://en.wikipedia.org/wiki/Bibcode_\(identifier\) "Bibcode (identifier)"):[2004PatRe..37.1311O](https://ui.adsabs.harvard.edu/abs/2004PatRe..37.1311O). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1016/j.patcog.2004.01.013](https://doi.org/10.1016%2Fj.patcog.2004.01.013).
60. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-60)**
Dave Steinkraus; Patrice Simard; Ian Buck (2005). ["Using GPUs for Machine Learning Algorithms"](https://www.computer.org/csdl/proceedings-article/icdar/2005/24201115/12OmNylKAVX). *12th International Conference on Document Analysis and Recognition (ICDAR 2005)*. pp. 1115–1119\. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1109/ICDAR.2005.251](https://doi.org/10.1109%2FICDAR.2005.251). [Archived](https://web.archive.org/web/20220331211138/https://www.computer.org/csdl/proceedings-article/icdar/2005/24201115/12OmNylKAVX) from the original on 2022-03-31. Retrieved 2022-03-31.
61. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-61)**
Kumar Chellapilla; Sid Puri; Patrice Simard (2006). ["High Performance Convolutional Neural Networks for Document Processing"](https://hal.inria.fr/inria-00112631/document). In Lorette, Guy (ed.). *Tenth International Workshop on Frontiers in Handwriting Recognition*. Suvisoft. [Archived](https://web.archive.org/web/20200518193413/https://hal.inria.fr/inria-00112631/document) from the original on 2020-05-18. Retrieved 2016-03-14.
62. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-62)**
Hinton, GE; Osindero, S; Teh, YW (Jul 2006). "A fast learning algorithm for deep belief nets". *Neural Computation*. **18** (7): 1527–54\. [CiteSeerX](https://en.wikipedia.org/wiki/CiteSeerX_\(identifier\) "CiteSeerX (identifier)") [10\.1.1.76.1541](https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.76.1541). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1162/neco.2006.18.7.1527](https://doi.org/10.1162%2Fneco.2006.18.7.1527). [PMID](https://en.wikipedia.org/wiki/PMID_\(identifier\) "PMID (identifier)") [16764513](https://pubmed.ncbi.nlm.nih.gov/16764513). [S2CID](https://en.wikipedia.org/wiki/S2CID_\(identifier\) "S2CID (identifier)") [2309950](https://api.semanticscholar.org/CorpusID:2309950).
63. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-63)**
Bengio, Yoshua; Lamblin, Pascal; Popovici, Dan; Larochelle, Hugo (2007). ["Greedy Layer-Wise Training of Deep Networks"](https://proceedings.neurips.cc/paper/2006/file/5da713a690c067105aeb2fae32403405-Paper.pdf) (PDF). *Advances in Neural Information Processing Systems*: 153–160\. [Archived](https://web.archive.org/web/20220602144141/https://proceedings.neurips.cc/paper/2006/file/5da713a690c067105aeb2fae32403405-Paper.pdf) (PDF) from the original on 2022-06-02. Retrieved 2022-03-31.
64. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-64)**
Ranzato, MarcAurelio; Poultney, Christopher; Chopra, Sumit; LeCun, Yann (2007). ["Efficient Learning of Sparse Representations with an Energy-Based Model"](http://yann.lecun.com/exdb/publis/pdf/ranzato-06.pdf) (PDF). *Advances in Neural Information Processing Systems*. [Archived](https://web.archive.org/web/20160322112400/http://yann.lecun.com/exdb/publis/pdf/ranzato-06.pdf) (PDF) from the original on 2016-03-22. Retrieved 2014-06-26.
65. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-LSD_1_65-0)**
Raina, R; Madhavan, A; Ng, Andrew (14 June 2009). ["Large-scale deep unsupervised learning using graphics processors"](http://robotics.stanford.edu/~ang/papers/icml09-LargeScaleUnsupervisedDeepLearningGPU.pdf) (PDF). *Proceedings of the 26th Annual International Conference on Machine Learning*. ICML '09. pp. 873–880\. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1145/1553374.1553486](https://doi.org/10.1145%2F1553374.1553486). [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-1-60558-516-1](https://en.wikipedia.org/wiki/Special:BookSources/978-1-60558-516-1 "Special:BookSources/978-1-60558-516-1"). [S2CID](https://en.wikipedia.org/wiki/S2CID_\(identifier\) "S2CID (identifier)") [392458](https://api.semanticscholar.org/CorpusID:392458). [Archived](https://web.archive.org/web/20201208104513/http://robotics.stanford.edu/~ang/papers/icml09-LargeScaleUnsupervisedDeepLearningGPU.pdf) (PDF) from the original on 8 December 2020. Retrieved 22 December 2023.
66. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-66)**
Ciresan, Dan; Meier, Ueli; Gambardella, Luca; Schmidhuber, Jürgen (2010). "Deep big simple neural nets for handwritten digit recognition". *Neural Computation*. **22** (12): 3207–3220\. [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[1003\.0358](https://arxiv.org/abs/1003.0358). [Bibcode](https://en.wikipedia.org/wiki/Bibcode_\(identifier\) "Bibcode (identifier)"):[2010NeCom..22.3207C](https://ui.adsabs.harvard.edu/abs/2010NeCom..22.3207C). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1162/NECO\_a\_00052](https://doi.org/10.1162%2FNECO_a_00052). [PMID](https://en.wikipedia.org/wiki/PMID_\(identifier\) "PMID (identifier)") [20858131](https://pubmed.ncbi.nlm.nih.gov/20858131). [S2CID](https://en.wikipedia.org/wiki/S2CID_\(identifier\) "S2CID (identifier)") [1918673](https://api.semanticscholar.org/CorpusID:1918673).
67. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-67)**
["IJCNN 2011 Competition result table"](https://benchmark.ini.rub.de/gtsrb_results.html). *OFFICIAL IJCNN2011 COMPETITION*. 2010. [Archived](https://web.archive.org/web/20210117024729/https://benchmark.ini.rub.de/gtsrb_results.html) from the original on 2021-01-17. Retrieved 2019-01-14.
68. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-68)**
Schmidhuber, Jürgen (17 March 2017). ["History of computer vision contests won by deep CNNs on GPU"](https://people.idsia.ch/~juergen/computer-vision-contests-won-by-gpu-cnns.html). [Archived](https://web.archive.org/web/20181219224934/http://people.idsia.ch/~juergen/computer-vision-contests-won-by-gpu-cnns.html) from the original on 19 December 2018. Retrieved 14 January 2019.
69. ^ [***a***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-:02_69-0) [***b***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-:02_69-1)
Krizhevsky, Alex; Sutskever, Ilya; Hinton, Geoffrey E. (2017-05-24). ["ImageNet classification with deep convolutional neural networks"](https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf) (PDF). *Communications of the ACM*. **60** (6): 84–90\. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1145/3065386](https://doi.org/10.1145%2F3065386). [ISSN](https://en.wikipedia.org/wiki/ISSN_\(identifier\) "ISSN (identifier)") [0001-0782](https://search.worldcat.org/issn/0001-0782). [S2CID](https://en.wikipedia.org/wiki/S2CID_\(identifier\) "S2CID (identifier)") [195908774](https://api.semanticscholar.org/CorpusID:195908774). [Archived](https://web.archive.org/web/20170516174757/http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf) (PDF) from the original on 2017-05-16. Retrieved 2018-12-04.
70. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-70)**
Viebke, Andre; Memeti, Suejb; Pllana, Sabri; Abraham, Ajith (2019). "CHAOS: a parallelization scheme for training convolutional neural networks on Intel Xeon Phi". *The Journal of Supercomputing*. **75** (1): 197–227\. [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[1702\.07908](https://arxiv.org/abs/1702.07908). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1007/s11227-017-1994-x](https://doi.org/10.1007%2Fs11227-017-1994-x). [S2CID](https://en.wikipedia.org/wiki/S2CID_\(identifier\) "S2CID (identifier)") [14135321](https://api.semanticscholar.org/CorpusID:14135321).
71. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-71)**
Viebke, Andre; Pllana, Sabri (2015). ["The Potential of the Intel (R) Xeon Phi for Supervised Deep Learning"](http://lnu.diva-portal.org/smash/record.jsf?pid=diva2%3A877421&dswid=4277). *2015 IEEE 17th International Conference on High Performance Computing and Communications, 2015 IEEE 7th International Symposium on Cyberspace Safety and Security, and 2015 IEEE 12th International Conference on Embedded Software and Systems*. *IEEE Xplore*. IEEE 2015. pp. 758–765\. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1109/HPCC-CSS-ICESS.2015.45](https://doi.org/10.1109%2FHPCC-CSS-ICESS.2015.45). [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-1-4799-8937-9](https://en.wikipedia.org/wiki/Special:BookSources/978-1-4799-8937-9 "Special:BookSources/978-1-4799-8937-9"). [S2CID](https://en.wikipedia.org/wiki/S2CID_\(identifier\) "S2CID (identifier)") [15411954](https://api.semanticscholar.org/CorpusID:15411954). [Archived](https://web.archive.org/web/20230306003530/http://lnu.diva-portal.org/smash/record.jsf?pid=diva2:877421&dswid=4277) from the original on 2023-03-06. Retrieved 2022-03-31.
72. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-72)**
Krizhevsky, Alex; Sutskever, Ilya; Hinton, Geoffrey E. (2012). ["ImageNet Classification with Deep Convolutional Neural Networks"](https://dl.acm.org/doi/10.5555/2999134.2999257). *NIPS'12: Proceedings of the 25th International Conference on Neural Information Processing Systems - Volume 1*. **1**: 1097–1105\. [Archived](https://web.archive.org/web/20191220014019/https://dl.acm.org/citation.cfm?id=2999134.2999257) from the original on 2019-12-20. Retrieved 2021-03-26 – via ACM.
73. ^ [***a***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-:5_73-0) [***b***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-:5_73-1) [***c***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-:5_73-2) [***d***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-:5_73-3) [***e***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-:5_73-4)
Azulay, Aharon; Weiss, Yair (2019). ["Why do deep convolutional networks generalize so poorly to small image transformations?"](https://jmlr.org/papers/v20/19-519.html). *Journal of Machine Learning Research*. **20** (184): 1–25\. [ISSN](https://en.wikipedia.org/wiki/ISSN_\(identifier\) "ISSN (identifier)") [1533-7928](https://search.worldcat.org/issn/1533-7928). [Archived](https://web.archive.org/web/20220331211138/https://jmlr.org/papers/v20/19-519.html) from the original on 2022-03-31. Retrieved 2022-03-31.
74. ^ [***a***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-G%C3%A9ron_Hands-on_ML_2019_74-0) [***b***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-G%C3%A9ron_Hands-on_ML_2019_74-1)
Géron, Aurélien (2019). *Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow*. Sebastopol, CA: O'Reilly Media. p. 448. [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-1-492-03264-9](https://en.wikipedia.org/wiki/Special:BookSources/978-1-492-03264-9 "Special:BookSources/978-1-492-03264-9").
75. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-76)**
Li, Zewen; Liu, Fan; Yang, Wenjie; Peng, Shouheng; Zhou, Jun (December 2022). "A Survey of Convolutional Neural Networks: Analysis, Applications, and Prospects". *IEEE Transactions on Neural Networks and Learning Systems*. **33** (12): 6999–7019\. [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[2004\.02806](https://arxiv.org/abs/2004.02806). [Bibcode](https://en.wikipedia.org/wiki/Bibcode_\(identifier\) "Bibcode (identifier)"):[2022ITNNL..33.6999L](https://ui.adsabs.harvard.edu/abs/2022ITNNL..33.6999L). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1109/TNNLS.2021.3084827](https://doi.org/10.1109%2FTNNLS.2021.3084827). [hdl](https://en.wikipedia.org/wiki/Hdl_\(identifier\) "Hdl (identifier)"):[10072/405164](https://hdl.handle.net/10072%2F405164). [PMID](https://en.wikipedia.org/wiki/PMID_\(identifier\) "PMID (identifier)") [34111009](https://pubmed.ncbi.nlm.nih.gov/34111009).
76. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-77)**
["CS231n Convolutional Neural Networks for Visual Recognition"](https://cs231n.github.io/convolutional-networks/). *cs231n.github.io*. [Archived](https://web.archive.org/web/20191023031945/https://cs231n.github.io/convolutional-networks/) from the original on 2019-10-23. Retrieved 2017-04-25.
77. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-79)**
Nirthika, Rajendran; Manivannan, Siyamalan; Ramanan, Amirthalingam; Wang, Ruixuan (2022-04-01). ["Pooling in convolutional neural networks for medical image analysis: a survey and an empirical study"](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8804673). *Neural Computing and Applications*. **34** (7): 5321–5347\. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1007/s00521-022-06953-8](https://doi.org/10.1007%2Fs00521-022-06953-8). [ISSN](https://en.wikipedia.org/wiki/ISSN_\(identifier\) "ISSN (identifier)") [1433-3058](https://search.worldcat.org/issn/1433-3058). [PMC](https://en.wikipedia.org/wiki/PMC_\(identifier\) "PMC (identifier)") [8804673](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8804673). [PMID](https://en.wikipedia.org/wiki/PMID_\(identifier\) "PMID (identifier)") [35125669](https://pubmed.ncbi.nlm.nih.gov/35125669).
78. ^ [***a***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-Scherer-ICANN-2010_80-0) [***b***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-Scherer-ICANN-2010_80-1)
Scherer, Dominik; Müller, Andreas C.; Behnke, Sven (2010). ["Evaluation of Pooling Operations in Convolutional Architectures for Object Recognition"](http://ais.uni-bonn.de/papers/icann2010_maxpool.pdf) (PDF). *Artificial Neural Networks (ICANN), 20th International Conference on*. Thessaloniki, Greece: Springer. pp. 92–101\. [Archived](https://web.archive.org/web/20180403185041/http://ais.uni-bonn.de/papers/icann2010_maxpool.pdf) (PDF) from the original on 2018-04-03. Retrieved 2016-12-28.
79. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-81)**
Graham, Benjamin (2014-12-18). "Fractional Max-Pooling". [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[1412\.6071](https://arxiv.org/abs/1412.6071) \[[cs.CV](https://arxiv.org/archive/cs.CV)\].
80. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-82)**
Springenberg, Jost Tobias; Dosovitskiy, Alexey; Brox, Thomas; Riedmiller, Martin (2014-12-21). "Striving for Simplicity: The All Convolutional Net". [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[1412\.6806](https://arxiv.org/abs/1412.6806) \[[cs.LG](https://arxiv.org/archive/cs.LG)\].
81. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-Ma_Chang_Xie_Ding_2019_pp._3224%E2%80%933233_83-0)**
Ma, Zhanyu; Chang, Dongliang; Xie, Jiyang; Ding, Yifeng; Wen, Shaoguo; Li, Xiaoxu; Si, Zhongwei; Guo, Jun (2019). "Fine-Grained Vehicle Classification With Channel Max Pooling Modified CNNs". *IEEE Transactions on Vehicular Technology*. **68** (4). Institute of Electrical and Electronics Engineers (IEEE): 3224–3233\. [Bibcode](https://en.wikipedia.org/wiki/Bibcode_\(identifier\) "Bibcode (identifier)"):[2019ITVT...68.3224M](https://ui.adsabs.harvard.edu/abs/2019ITVT...68.3224M). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1109/tvt.2019.2899972](https://doi.org/10.1109%2Ftvt.2019.2899972). [ISSN](https://en.wikipedia.org/wiki/ISSN_\(identifier\) "ISSN (identifier)") [0018-9545](https://search.worldcat.org/issn/0018-9545). [S2CID](https://en.wikipedia.org/wiki/S2CID_\(identifier\) "S2CID (identifier)") [86674074](https://api.semanticscholar.org/CorpusID:86674074).
82. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-84)**
Zafar, Afia; Aamir, Muhammad; Mohd Nawi, Nazri; Arshad, Ali; Riaz, Saman; Alruban, Abdulrahman; Dutta, Ashit Kumar; Almotairi, Sultan (2022-08-29). ["A Comparison of Pooling Methods for Convolutional Neural Networks"](https://doi.org/10.3390%2Fapp12178643). *Applied Sciences*. **12** (17): 8643. [Bibcode](https://en.wikipedia.org/wiki/Bibcode_\(identifier\) "Bibcode (identifier)"):[2022ApSci..12.8643Z](https://ui.adsabs.harvard.edu/abs/2022ApSci..12.8643Z). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.3390/app12178643](https://doi.org/10.3390%2Fapp12178643). [ISSN](https://en.wikipedia.org/wiki/ISSN_\(identifier\) "ISSN (identifier)") [2076-3417](https://search.worldcat.org/issn/2076-3417).
83. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-85)**
Gholamalinezhad, Hossein; Khosravi, Hossein (2020-09-16), *Pooling Methods in Deep Neural Networks, a Review*, [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[2009\.07485](https://arxiv.org/abs/2009.07485)
84. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-86)**
Householder, Alston S. (June 1941). ["A theory of steady-state activity in nerve-fiber networks: I. Definitions and preliminary lemmas"](http://link.springer.com/10.1007/BF02478220). *The Bulletin of Mathematical Biophysics*. **3** (2): 63–69\. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1007/BF02478220](https://doi.org/10.1007%2FBF02478220). [ISSN](https://en.wikipedia.org/wiki/ISSN_\(identifier\) "ISSN (identifier)") [0007-4985](https://search.worldcat.org/issn/0007-4985).
85. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-Romanuke4_87-0)**
Romanuke, Vadim (2017). ["Appropriate number and allocation of ReLUs in convolutional neural networks"](https://doi.org/10.20535%2F1810-0546.2017.1.88156). *Research Bulletin of NTUU "Kyiv Polytechnic Institute"*. **1** (1): 69–78\. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.20535/1810-0546.2017.1.88156](https://doi.org/10.20535%2F1810-0546.2017.1.88156).
86. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-glorot2011_88-0)**
Xavier Glorot; Antoine Bordes; [Yoshua Bengio](https://en.wikipedia.org/wiki/Yoshua_Bengio "Yoshua Bengio") (2011). [*Deep sparse rectifier neural networks*](https://web.archive.org/web/20161213022121/http://jmlr.org/proceedings/papers/v15/glorot11a/glorot11a.pdf) (PDF). AISTATS. Archived from [the original](http://jmlr.org/proceedings/papers/v15/glorot11a/glorot11a.pdf) (PDF) on 2016-12-13. Retrieved 2023-04-10. "Rectifier and softplus activation functions. The second one is a smooth version of the first."
87. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-89)**
Krizhevsky, A.; Sutskever, I.; Hinton, G. E. (2012). ["Imagenet classification with deep convolutional neural networks"](https://proceedings.neurips.cc/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf) (PDF). *Advances in Neural Information Processing Systems*. **1**: 1097–1105\. [Archived](https://web.archive.org/web/20220331224736/https://proceedings.neurips.cc/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf) (PDF) from the original on 2022-03-31. Retrieved 2022-03-31.
88. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-91)**
Ribeiro, Antonio H.; Schön, Thomas B. (2021). "How Convolutional Neural Networks Deal with Aliasing". *ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)*. pp. 2755–2759\. [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[2102\.07757](https://arxiv.org/abs/2102.07757). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1109/ICASSP39728.2021.9414627](https://doi.org/10.1109%2FICASSP39728.2021.9414627). [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-1-7281-7605-5](https://en.wikipedia.org/wiki/Special:BookSources/978-1-7281-7605-5 "Special:BookSources/978-1-7281-7605-5"). [S2CID](https://en.wikipedia.org/wiki/S2CID_\(identifier\) "S2CID (identifier)") [231925012](https://api.semanticscholar.org/CorpusID:231925012).
89. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-92)**
Myburgh, Johannes C.; Mouton, Coenraad; Davel, Marelie H. (2020). ["Tracking Translation Invariance in CNNs"](https://link.springer.com/chapter/10.1007%2F978-3-030-66151-9_18). In Gerber, Aurona (ed.). *Artificial Intelligence Research*. Communications in Computer and Information Science. Vol. 1342. Cham: Springer International Publishing. pp. 282–295\. [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[2104\.05997](https://arxiv.org/abs/2104.05997). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1007/978-3-030-66151-9\_18](https://doi.org/10.1007%2F978-3-030-66151-9_18). [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-3-030-66151-9](https://en.wikipedia.org/wiki/Special:BookSources/978-3-030-66151-9 "Special:BookSources/978-3-030-66151-9"). [S2CID](https://en.wikipedia.org/wiki/S2CID_\(identifier\) "S2CID (identifier)") [233219976](https://api.semanticscholar.org/CorpusID:233219976). [Archived](https://web.archive.org/web/20220122015258/http://link.springer.com/chapter/10.1007/978-3-030-66151-9_18) from the original on 2022-01-22. Retrieved 2021-03-26.
90. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-93)**
Zhang, Richard (2019-04-25). *Making Convolutional Networks Shift-Invariant Again*. [OCLC](https://en.wikipedia.org/wiki/OCLC_\(identifier\) "OCLC (identifier)") [1106340711](https://search.worldcat.org/oclc/1106340711).
91. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-94)**
Jaderberg, Max; Simonyan, Karen; Zisserman, Andrew; Kavukcuoglu, Koray (2015). ["Spatial Transformer Networks"](https://proceedings.neurips.cc/paper/2015/file/33ceb07bf4eeb3da587e268d663aba1a-Paper.pdf) (PDF). *Advances in Neural Information Processing Systems*. **28**. [Archived](https://web.archive.org/web/20210725115312/https://proceedings.neurips.cc/paper/2015/file/33ceb07bf4eeb3da587e268d663aba1a-Paper.pdf) (PDF) from the original on 2021-07-25. Retrieved 2021-03-26 – via NIPS.
92. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-95)**
Sabour, Sara; Frosst, Nicholas; Hinton, Geoffrey E. (2017-10-26). *Dynamic Routing Between Capsules*. [OCLC](https://en.wikipedia.org/wiki/OCLC_\(identifier\) "OCLC (identifier)") [1106278545](https://search.worldcat.org/oclc/1106278545).
93. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-96)**
Matiz, Sergio; [Barner, Kenneth E.](https://en.wikipedia.org/wiki/Kenneth_E._Barner "Kenneth E. Barner") (2019-06-01). ["Inductive conformal predictor for convolutional neural networks: Applications to active learning for image classification"](https://www.sciencedirect.com/science/article/abs/pii/S003132031930055X). *Pattern Recognition*. **90**: 172–182\. [Bibcode](https://en.wikipedia.org/wiki/Bibcode_\(identifier\) "Bibcode (identifier)"):[2019PatRe..90..172M](https://ui.adsabs.harvard.edu/abs/2019PatRe..90..172M). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1016/j.patcog.2019.01.035](https://doi.org/10.1016%2Fj.patcog.2019.01.035). [ISSN](https://en.wikipedia.org/wiki/ISSN_\(identifier\) "ISSN (identifier)") [0031-3203](https://search.worldcat.org/issn/0031-3203). [S2CID](https://en.wikipedia.org/wiki/S2CID_\(identifier\) "S2CID (identifier)") [127253432](https://api.semanticscholar.org/CorpusID:127253432). [Archived](https://web.archive.org/web/20210929092610/https://www.sciencedirect.com/science/article/abs/pii/S003132031930055X) from the original on 2021-09-29. Retrieved 2021-09-29.
94. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-97)**
Wieslander, Håkan; Harrison, Philip J.; Skogberg, Gabriel; Jackson, Sonya; Fridén, Markus; Karlsson, Johan; Spjuth, Ola; Wählby, Carolina (February 2021). ["Deep Learning With Conformal Prediction for Hierarchical Analysis of Large-Scale Whole-Slide Tissue Images"](https://doi.org/10.1109%2FJBHI.2020.2996300). *IEEE Journal of Biomedical and Health Informatics*. **25** (2): 371–380\. [Bibcode](https://en.wikipedia.org/wiki/Bibcode_\(identifier\) "Bibcode (identifier)"):[2021IJBHI..25..371W](https://ui.adsabs.harvard.edu/abs/2021IJBHI..25..371W). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1109/JBHI.2020.2996300](https://doi.org/10.1109%2FJBHI.2020.2996300). [ISSN](https://en.wikipedia.org/wiki/ISSN_\(identifier\) "ISSN (identifier)") [2168-2208](https://search.worldcat.org/issn/2168-2208). [PMID](https://en.wikipedia.org/wiki/PMID_\(identifier\) "PMID (identifier)") [32750907](https://pubmed.ncbi.nlm.nih.gov/32750907). [S2CID](https://en.wikipedia.org/wiki/S2CID_\(identifier\) "S2CID (identifier)") [219885788](https://api.semanticscholar.org/CorpusID:219885788).
95. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-98)**
Srivastava, Nitish; Hinton, Geoffrey; Krizhevsky, Alex; Sutskever, Ilya; Salakhutdinov, Ruslan (2014). ["Dropout: A Simple Way to Prevent Neural Networks from Overfitting"](http://www.cs.toronto.edu/~rsalakhu/papers/srivastava14a.pdf) (PDF). *Journal of Machine Learning Research*. **15** (1): 1929–1958\. [Archived](https://web.archive.org/web/20160119155849/http://www.cs.toronto.edu/~rsalakhu/papers/srivastava14a.pdf) (PDF) from the original on 2016-01-19. Retrieved 2015-01-03.
96. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-99)**
["Regularization of Neural Networks using DropConnect \| ICML 2013 \| JMLR W\&CP"](http://proceedings.mlr.press/v28/wan13.html). *jmlr.org*: 1058–1066\. 2013-02-13. [Archived](https://web.archive.org/web/20170812080411/http://proceedings.mlr.press/v28/wan13.html) from the original on 2017-08-12. Retrieved 2015-12-17.
97. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-100)**
Zeiler, Matthew D.; Fergus, Rob (2013-01-15). "Stochastic Pooling for Regularization of Deep Convolutional Neural Networks". [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[1301\.3557](https://arxiv.org/abs/1301.3557) \[[cs.LG](https://arxiv.org/archive/cs.LG)\].
98. ^ [***a***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-:3_101-0) [***b***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-:3_101-1)
Platt, John; Steinkraus, Dave; Simard, Patrice Y. (August 2003). ["Best Practices for Convolutional Neural Networks Applied to Visual Document Analysis – Microsoft Research"](https://www.microsoft.com/en-us/research/publication/best-practices-for-convolutional-neural-networks-applied-to-visual-document-analysis/?from=http%3A%2F%2Fresearch.microsoft.com%2Fapps%2Fpubs%2F%3Fid%3D68920). *Microsoft Research*. [Archived](https://web.archive.org/web/20171107112839/https://www.microsoft.com/en-us/research/publication/best-practices-for-convolutional-neural-networks-applied-to-visual-document-analysis/?from=http%3A%2F%2Fresearch.microsoft.com%2Fapps%2Fpubs%2F%3Fid%3D68920) from the original on 2017-11-07. Retrieved 2015-12-17.
99. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-102)**
Hinton, Geoffrey E.; Srivastava, Nitish; Krizhevsky, Alex; Sutskever, Ilya; Salakhutdinov, Ruslan R. (2012). "Improving neural networks by preventing co-adaptation of feature detectors". [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[1207\.0580](https://arxiv.org/abs/1207.0580) \[[cs.NE](https://arxiv.org/archive/cs.NE)\].
100. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-103)**
["Dropout: A Simple Way to Prevent Neural Networks from Overfitting"](https://jmlr.org/papers/v15/srivastava14a.html). *jmlr.org*. [Archived](https://web.archive.org/web/20160305010425/http://jmlr.org/papers/v15/srivastava14a.html) from the original on 2016-03-05. Retrieved 2015-12-17.
101. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-104)**
Hinton, Geoffrey (1979). "Some demonstrations of the effects of structural descriptions in mental imagery". *Cognitive Science*. **3** (3): 231–250\. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1016/s0364-0213(79)80008-7](https://doi.org/10.1016%2Fs0364-0213%2879%2980008-7).
102. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-105)** Rock, Irvin. "The frame of reference." The legacy of Solomon Asch: Essays in cognition and social psychology (1990): 243–268.
103. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-106)** G. Hinton, Coursera lectures on Neural Networks, 2012, URL: <https://www.coursera.org/learn/neural-networks> [Archived](https://web.archive.org/web/20161231174321/https://www.coursera.org/learn/neural-networks) 2016-12-31 at the [Wayback Machine](https://en.wikipedia.org/wiki/Wayback_Machine "Wayback Machine")
104. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-quartz_107-0)**
Dave Gershgorn (18 June 2018). ["The inside story of how AI got good enough to dominate Silicon Valley"](https://qz.com/1307091/the-inside-story-of-how-ai-got-good-enough-to-dominate-silicon-valley/). *[Quartz](https://en.wikipedia.org/wiki/Quartz_\(website\) "Quartz (website)")*. [Archived](https://web.archive.org/web/20191212224842/https://qz.com/1307091/the-inside-story-of-how-ai-got-good-enough-to-dominate-silicon-valley/) from the original on 12 December 2019. Retrieved 5 October 2018.
105. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-108)**
Lawrence, Steve; C. Lee Giles; Ah Chung Tsoi; Andrew D. Back (1997). "Face Recognition: A Convolutional Neural Network Approach". *IEEE Transactions on Neural Networks*. **8** (1): 98–113\. [CiteSeerX](https://en.wikipedia.org/wiki/CiteSeerX_\(identifier\) "CiteSeerX (identifier)") [10\.1.1.92.5813](https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.92.5813). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1109/72.554195](https://doi.org/10.1109%2F72.554195). [PMID](https://en.wikipedia.org/wiki/PMID_\(identifier\) "PMID (identifier)") [18255614](https://pubmed.ncbi.nlm.nih.gov/18255614). [S2CID](https://en.wikipedia.org/wiki/S2CID_\(identifier\) "S2CID (identifier)") [2883848](https://api.semanticscholar.org/CorpusID:2883848).
106. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-video_quality_109-0)**
Le Callet, Patrick; Christian Viard-Gaudin; Dominique Barba (2006). ["A Convolutional Neural Network Approach for Objective Video Quality Assessment"](https://hal.archives-ouvertes.fr/file/index/docid/287426/filename/A_convolutional_neural_network_approach_for_objective_video_quality_assessment_completefinal_manuscript.pdf) (PDF). *IEEE Transactions on Neural Networks*. **17** (5): 1316–1327\. [Bibcode](https://en.wikipedia.org/wiki/Bibcode_\(identifier\) "Bibcode (identifier)"):[2006ITNN...17.1316L](https://ui.adsabs.harvard.edu/abs/2006ITNN...17.1316L). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1109/TNN.2006.879766](https://doi.org/10.1109%2FTNN.2006.879766). [PMID](https://en.wikipedia.org/wiki/PMID_\(identifier\) "PMID (identifier)") [17001990](https://pubmed.ncbi.nlm.nih.gov/17001990). [S2CID](https://en.wikipedia.org/wiki/S2CID_\(identifier\) "S2CID (identifier)") [221185563](https://api.semanticscholar.org/CorpusID:221185563). [Archived](https://web.archive.org/web/20210224123804/https://hal.archives-ouvertes.fr/file/index/docid/287426/filename/A_convolutional_neural_network_approach_for_objective_video_quality_assessment_completefinal_manuscript.pdf) (PDF) from the original on 24 February 2021. Retrieved 17 November 2013.
107. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-ILSVRC2014_110-0)**
["ImageNet Large Scale Visual Recognition Competition 2014 (ILSVRC2014)"](https://image-net.org/challenges/LSVRC/2014/results). [Archived](https://web.archive.org/web/20160205153105/http://www.image-net.org/challenges/LSVRC/2014/results) from the original on 5 February 2016. Retrieved 30 January 2016.
108. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-googlenet_111-0)**
Szegedy, Christian; Liu, Wei; Jia, Yangqing; Sermanet, Pierre; Reed, Scott E.; Anguelov, Dragomir; Erhan, Dumitru; Vanhoucke, Vincent; Rabinovich, Andrew (2015). "Going deeper with convolutions". *IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, USA, June 7–12, 2015*. IEEE Computer Society. pp. 1–9\. [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[1409\.4842](https://arxiv.org/abs/1409.4842). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1109/CVPR.2015.7298594](https://doi.org/10.1109%2FCVPR.2015.7298594). [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-1-4673-6964-0](https://en.wikipedia.org/wiki/Special:BookSources/978-1-4673-6964-0 "Special:BookSources/978-1-4673-6964-0").
109. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-112)**
[Russakovsky, Olga](https://en.wikipedia.org/wiki/Olga_Russakovsky "Olga Russakovsky"); Deng, Jia; Su, Hao; Krause, Jonathan; Satheesh, Sanjeev; Ma, Sean; Huang, Zhiheng; [Karpathy, Andrej](https://en.wikipedia.org/wiki/Andrej_Karpathy "Andrej Karpathy"); Khosla, Aditya; Bernstein, Michael; Berg, Alexander C.; Fei-Fei, Li (2014). "ImageNet Large Scale Visual Recognition Challenge". [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[1409\.0575](https://arxiv.org/abs/1409.0575) \[[cs.CV](https://arxiv.org/archive/cs.CV)\].
110. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-113)**
["The Face Detection Algorithm Set To Revolutionize Image Search"](https://www.technologyreview.com/2015/02/16/169357/the-face-detection-algorithm-set-to-revolutionize-image-search/). *Technology Review*. February 16, 2015. [Archived](https://web.archive.org/web/20200920130711/https://www.technologyreview.com/2015/02/16/169357/the-face-detection-algorithm-set-to-revolutionize-image-search/) from the original on 20 September 2020. Retrieved 27 October 2017.
111. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-114)**
Baccouche, Moez; Mamalet, Franck; Wolf, Christian; Garcia, Christophe; Baskurt, Atilla (2011-11-16). "Sequential Deep Learning for Human Action Recognition". In Salah, Albert Ali; Lepri, Bruno (eds.). *Human Behavior Understanding*. Lecture Notes in Computer Science. Vol. 7065. Springer Berlin Heidelberg. pp. 29–39\. [CiteSeerX](https://en.wikipedia.org/wiki/CiteSeerX_\(identifier\) "CiteSeerX (identifier)") [10\.1.1.385.4740](https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.385.4740). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1007/978-3-642-25446-8\_4](https://doi.org/10.1007%2F978-3-642-25446-8_4). [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-3-642-25445-1](https://en.wikipedia.org/wiki/Special:BookSources/978-3-642-25445-1 "Special:BookSources/978-3-642-25445-1").
112. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-115)**
Ji, Shuiwang; Xu, Wei; Yang, Ming; Yu, Kai (2013-01-01). "3D Convolutional Neural Networks for Human Action Recognition". *IEEE Transactions on Pattern Analysis and Machine Intelligence*. **35** (1): 221–231\. [Bibcode](https://en.wikipedia.org/wiki/Bibcode_\(identifier\) "Bibcode (identifier)"):[2013ITPAM..35..221J](https://ui.adsabs.harvard.edu/abs/2013ITPAM..35..221J). [CiteSeerX](https://en.wikipedia.org/wiki/CiteSeerX_\(identifier\) "CiteSeerX (identifier)") [10\.1.1.169.4046](https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.169.4046). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1109/TPAMI.2012.59](https://doi.org/10.1109%2FTPAMI.2012.59). [ISSN](https://en.wikipedia.org/wiki/ISSN_\(identifier\) "ISSN (identifier)") [0162-8828](https://search.worldcat.org/issn/0162-8828). [PMID](https://en.wikipedia.org/wiki/PMID_\(identifier\) "PMID (identifier)") [22392705](https://pubmed.ncbi.nlm.nih.gov/22392705). [S2CID](https://en.wikipedia.org/wiki/S2CID_\(identifier\) "S2CID (identifier)") [1923924](https://api.semanticscholar.org/CorpusID:1923924).
113. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-116)**
Huang, Jie; Zhou, Wengang; Zhang, Qilin; Li, Houqiang; Li, Weiping (2018). "Video-based Sign Language Recognition without Temporal Segmentation". [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[1801\.10111](https://arxiv.org/abs/1801.10111) \[[cs.CV](https://arxiv.org/archive/cs.CV)\].
114. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-117)** Karpathy, Andrej, et al. "[Large-scale video classification with convolutional neural networks](https://www.cv-foundation.org/openaccess/content_cvpr_2014/papers/Karpathy_Large-scale_Video_Classification_2014_CVPR_paper.pdf) [Archived](https://web.archive.org/web/20190806022753/https://www.cv-foundation.org/openaccess/content_cvpr_2014/papers/Karpathy_Large-scale_Video_Classification_2014_CVPR_paper.pdf) 2019-08-06 at the [Wayback Machine](https://en.wikipedia.org/wiki/Wayback_Machine "Wayback Machine")." IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2014.
115. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-118)**
Simonyan, Karen; Zisserman, Andrew (2014). "Two-Stream Convolutional Networks for Action Recognition in Videos". [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[1406\.2199](https://arxiv.org/abs/1406.2199) \[[cs.CV](https://arxiv.org/archive/cs.CV)\].
116. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-Wang_Duan_Zhang_Niu_p=1657_119-0)**
Wang, Le; Duan, Xuhuan; Zhang, Qilin; Niu, Zhenxing; Hua, Gang; Zheng, Nanning (2018-05-22). ["Segment-Tube: Spatio-Temporal Action Localization in Untrimmed Videos with Per-Frame Segmentation"](https://qilin-zhang.github.io/_pages/pdfs/Segment-Tube_Spatio-Temporal_Action_Localization_in_Untrimmed_Videos_with_Per-Frame_Segmentation.pdf) (PDF). *Sensors*. **18** (5): 1657. [Bibcode](https://en.wikipedia.org/wiki/Bibcode_\(identifier\) "Bibcode (identifier)"):[2018Senso..18.1657W](https://ui.adsabs.harvard.edu/abs/2018Senso..18.1657W). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.3390/s18051657](https://doi.org/10.3390%2Fs18051657). [ISSN](https://en.wikipedia.org/wiki/ISSN_\(identifier\) "ISSN (identifier)") [1424-8220](https://search.worldcat.org/issn/1424-8220). [PMC](https://en.wikipedia.org/wiki/PMC_\(identifier\) "PMC (identifier)") [5982167](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5982167). [PMID](https://en.wikipedia.org/wiki/PMID_\(identifier\) "PMID (identifier)") [29789447](https://pubmed.ncbi.nlm.nih.gov/29789447). [Archived](https://web.archive.org/web/20210301195518/https://qilin-zhang.github.io/_pages/pdfs/Segment-Tube_Spatio-Temporal_Action_Localization_in_Untrimmed_Videos_with_Per-Frame_Segmentation.pdf) (PDF) from the original on 2021-03-01. Retrieved 2018-09-14.
117. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-Duan_Wang_Zhai_Zheng_2018_p._120-0)**
Duan, Xuhuan; Wang, Le; Zhai, Changbo; Zheng, Nanning; Zhang, Qilin; Niu, Zhenxing; Hua, Gang (2018). "Joint Spatio-Temporal Action Localization in Untrimmed Videos with Per-Frame Segmentation". *2018 25th IEEE International Conference on Image Processing (ICIP)*. 25th IEEE International Conference on Image Processing (ICIP). pp. 918–922\. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1109/icip.2018.8451692](https://doi.org/10.1109%2Ficip.2018.8451692). [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-1-4799-7061-2](https://en.wikipedia.org/wiki/Special:BookSources/978-1-4799-7061-2 "Special:BookSources/978-1-4799-7061-2").
118. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-121)**
Taylor, Graham W.; Fergus, Rob; LeCun, Yann; Bregler, Christoph (2010-01-01). [*Convolutional Learning of Spatio-temporal Features*](https://dl.acm.org/doi/10.5555/1888212). Proceedings of the 11th European Conference on Computer Vision: Part VI. ECCV'10. Berlin, Heidelberg: Springer-Verlag. pp. 140–153\. [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-3-642-15566-6](https://en.wikipedia.org/wiki/Special:BookSources/978-3-642-15566-6 "Special:BookSources/978-3-642-15566-6"). [Archived](https://web.archive.org/web/20220331211137/https://dl.acm.org/doi/10.5555/1888212) from the original on 2022-03-31. Retrieved 2022-03-31.
119. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-122)**
Le, Q. V.; Zou, W. Y.; Yeung, S. Y.; Ng, A. Y. (2011-01-01). "Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis". *CVPR 2011*. CVPR '11. Washington, DC, US: IEEE Computer Society. pp. 3361–3368\. [CiteSeerX](https://en.wikipedia.org/wiki/CiteSeerX_\(identifier\) "CiteSeerX (identifier)") [10\.1.1.294.5948](https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.294.5948). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1109/CVPR.2011.5995496](https://doi.org/10.1109%2FCVPR.2011.5995496). [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-1-4577-0394-2](https://en.wikipedia.org/wiki/Special:BookSources/978-1-4577-0394-2 "Special:BookSources/978-1-4577-0394-2"). [S2CID](https://en.wikipedia.org/wiki/S2CID_\(identifier\) "S2CID (identifier)") [6006618](https://api.semanticscholar.org/CorpusID:6006618).
120. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-123)**
Grefenstette, Edward; Blunsom, Phil; de Freitas, Nando; Hermann, Karl Moritz (2014-04-29). "A Deep Architecture for Semantic Parsing". [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[1404\.7296](https://arxiv.org/abs/1404.7296) \[[cs.CL](https://arxiv.org/archive/cs.CL)\].
121. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-124)**
Mesnil, Gregoire; Deng, Li; Gao, Jianfeng; He, Xiaodong; Shen, Yelong (April 2014). ["Learning Semantic Representations Using Convolutional Neural Networks for Web Search – Microsoft Research"](https://www.microsoft.com/en-us/research/publication/learning-semantic-representations-using-convolutional-neural-networks-for-web-search/?from=http%3A%2F%2Fresearch.microsoft.com%2Fapps%2Fpubs%2Fdefault.aspx%3Fid%3D214617). *Microsoft Research*. [Archived](https://web.archive.org/web/20170915160617/https://www.microsoft.com/en-us/research/publication/learning-semantic-representations-using-convolutional-neural-networks-for-web-search/?from=http%3A%2F%2Fresearch.microsoft.com%2Fapps%2Fpubs%2Fdefault.aspx%3Fid%3D214617) from the original on 2017-09-15. Retrieved 2015-12-17.
122. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-125)**
Kalchbrenner, Nal; Grefenstette, Edward; Blunsom, Phil (2014-04-08). "A Convolutional Neural Network for Modelling Sentences". [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[1404\.2188](https://arxiv.org/abs/1404.2188) \[[cs.CL](https://arxiv.org/archive/cs.CL)\].
123. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-126)**
Kim, Yoon (2014-08-25). "Convolutional Neural Networks for Sentence Classification". [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[1408\.5882](https://arxiv.org/abs/1408.5882) \[[cs.CL](https://arxiv.org/archive/cs.CL)\].
124. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-127)** Collobert, Ronan, and Jason Weston. "[A unified architecture for natural language processing: Deep neural networks with multitask learning](https://thetalkingmachines.com/sites/default/files/2018-12/unified_nlp.pdf) [Archived](https://web.archive.org/web/20190904161653/https://thetalkingmachines.com/sites/default/files/2018-12/unified_nlp.pdf) 2019-09-04 at the [Wayback Machine](https://en.wikipedia.org/wiki/Wayback_Machine "Wayback Machine")." Proceedings of the 25th International Conference on Machine Learning. ACM, 2008.
125. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-128)**
Collobert, Ronan; Weston, Jason; Bottou, Leon; Karlen, Michael; Kavukcuoglu, Koray; Kuksa, Pavel (2011-03-02). "Natural Language Processing (almost) from Scratch". [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[1103\.0398](https://arxiv.org/abs/1103.0398) \[[cs.LG](https://arxiv.org/archive/cs.LG)\].
126. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-129)**
Yin, W; Kann, K; Yu, M; Schütze, H (2017-03-02). "Comparative study of CNN and RNN for natural language processing". [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[1702\.01923](https://arxiv.org/abs/1702.01923) \[[cs.LG](https://arxiv.org/archive/cs.LG)\].
127. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-130)**
Bai, S.; Kolter, J.S.; Koltun, V. (2018). "An empirical evaluation of generic convolutional and recurrent networks for sequence modeling". [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[1803\.01271](https://arxiv.org/abs/1803.01271) \[[cs.LG](https://arxiv.org/archive/cs.LG)\].
128. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-131)**
Gruber, N. (2021). "Detecting dynamics of action in text with a recurrent neural network". *Neural Computing and Applications*. **33** (12): 15709–15718\. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1007/S00521-021-06190-5](https://doi.org/10.1007%2FS00521-021-06190-5). [S2CID](https://en.wikipedia.org/wiki/S2CID_\(identifier\) "S2CID (identifier)") [236307579](https://api.semanticscholar.org/CorpusID:236307579).
129. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-132)**
Haotian, J.; Zhong, Li; Qianxiao, Li (2021). "Approximation Theory of Convolutional Architectures for Time Series Modelling". *International Conference on Machine Learning*. [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[2107\.09355](https://arxiv.org/abs/2107.09355).
130. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-133)**
Bohnslav, James P; Wimalasena, Nivanthika K; Clausing, Kelsey J; Dai, Yu Y; Yarmolinsky, David A; Cruz, Tomás; Kashlan, Adam D; Chiappe, M Eugenia; Orefice, Lauren L; Woolf, Clifford J; Harvey, Christopher D (2021-09-02). ["DeepEthogram, a machine learning pipeline for supervised behavior classification from raw pixels"](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8455138). *eLife*. **10** e63377. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.7554/eLife.63377](https://doi.org/10.7554%2FeLife.63377). [ISSN](https://en.wikipedia.org/wiki/ISSN_\(identifier\) "ISSN (identifier)") [2050-084X](https://search.worldcat.org/issn/2050-084X). [PMC](https://en.wikipedia.org/wiki/PMC_\(identifier\) "PMC (identifier)") [8455138](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8455138). [PMID](https://en.wikipedia.org/wiki/PMID_\(identifier\) "PMID (identifier)") [34473051](https://pubmed.ncbi.nlm.nih.gov/34473051).
131. ^ [***a***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-:7_134-0) [***b***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-:7_134-1)
Gernat, Tim; Jagla, Tobias; Jones, Beryl M.; Middendorf, Martin; Robinson, Gene E. (2023-01-27). ["Automated monitoring of honey bees with barcodes and artificial intelligence reveals two distinct social networks from a single affiliative behavior"](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9883485). *Scientific Reports*. **13** (1) 1541. [Bibcode](https://en.wikipedia.org/wiki/Bibcode_\(identifier\) "Bibcode (identifier)"):[2023NatSR..13.1541G](https://ui.adsabs.harvard.edu/abs/2023NatSR..13.1541G). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1038/s41598-022-26825-4](https://doi.org/10.1038%2Fs41598-022-26825-4). [ISSN](https://en.wikipedia.org/wiki/ISSN_\(identifier\) "ISSN (identifier)") [2045-2322](https://search.worldcat.org/issn/2045-2322). [PMC](https://en.wikipedia.org/wiki/PMC_\(identifier\) "PMC (identifier)") [9883485](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9883485). [PMID](https://en.wikipedia.org/wiki/PMID_\(identifier\) "PMID (identifier)") [36707534](https://pubmed.ncbi.nlm.nih.gov/36707534).
132. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-135)**
Norouzzadeh, Mohammad Sadegh; Nguyen, Anh; Kosmala, Margaret; Swanson, Alexandra; Palmer, Meredith S.; Packer, Craig; Clune, Jeff (2018-06-19). ["Automatically identifying, counting, and describing wild animals in camera-trap images with deep learning"](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6016780). *Proceedings of the National Academy of Sciences*. **115** (25): E5716–E5725. [Bibcode](https://en.wikipedia.org/wiki/Bibcode_\(identifier\) "Bibcode (identifier)"):[2018PNAS..115E5716N](https://ui.adsabs.harvard.edu/abs/2018PNAS..115E5716N). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1073/pnas.1719367115](https://doi.org/10.1073%2Fpnas.1719367115). [ISSN](https://en.wikipedia.org/wiki/ISSN_\(identifier\) "ISSN (identifier)") [0027-8424](https://search.worldcat.org/issn/0027-8424). [PMC](https://en.wikipedia.org/wiki/PMC_\(identifier\) "PMC (identifier)") [6016780](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6016780). [PMID](https://en.wikipedia.org/wiki/PMID_\(identifier\) "PMID (identifier)") [29871948](https://pubmed.ncbi.nlm.nih.gov/29871948).
133. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-136)**
Svenning, Asger; Mougeot, Guillaume; Alison, Jamie; Chevalier, Daphne; Molina, Nisa Luise Chavez; Ong, Song-Quan; Bjerge, Kim; Carrillo, Juli; Hoeye, Toke Thomas (2025-04-14). "A General Method for Detection and Segmentation of Terrestrial Arthropods in Images". [bioRxiv](https://en.wikipedia.org/wiki/BioRxiv_\(identifier\) "BioRxiv (identifier)") [10\.1101/2025.04.08.647223](https://doi.org/10.1101%2F2025.04.08.647223).
134. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-137)**
Torrents, Jordi; Costa, Tiago; De Polavieja, Gonzalo G. (2025-06-02). "New idtracker.ai: rethinking multi-animal tracking as a representation learning problem to increase accuracy and reduce tracking times". [bioRxiv](https://en.wikipedia.org/wiki/BioRxiv_\(identifier\) "BioRxiv (identifier)") [10\.1101/2025.05.30.657023](https://doi.org/10.1101%2F2025.05.30.657023).
135. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-138)**
Mathis, Alexander; Mamidanna, Pranav; Cury, Kevin M.; Abe, Taiga; Murthy, Venkatesh N.; Mathis, Mackenzie Weygandt; Bethge, Matthias (September 2018). ["DeepLabCut: markerless pose estimation of user-defined body parts with deep learning"](https://www.nature.com/articles/s41593-018-0209-y). *Nature Neuroscience*. **21** (9): 1281–1289\. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1038/s41593-018-0209-y](https://doi.org/10.1038%2Fs41593-018-0209-y). [ISSN](https://en.wikipedia.org/wiki/ISSN_\(identifier\) "ISSN (identifier)") [1097-6256](https://search.worldcat.org/issn/1097-6256). [PMID](https://en.wikipedia.org/wiki/PMID_\(identifier\) "PMID (identifier)") [30127430](https://pubmed.ncbi.nlm.nih.gov/30127430).
136. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-139)**
Graving, Jacob M; Chae, Daniel; Naik, Hemal; Li, Liang; Koger, Benjamin; Costelloe, Blair R; Couzin, Iain D (2019-10-01). ["DeepPoseKit, a software toolkit for fast and robust animal pose estimation using deep learning"](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6897514). *eLife*. **8** e47994. [Bibcode](https://en.wikipedia.org/wiki/Bibcode_\(identifier\) "Bibcode (identifier)"):[2019eLife...847994G](https://ui.adsabs.harvard.edu/abs/2019eLife...847994G). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.7554/eLife.47994](https://doi.org/10.7554%2FeLife.47994). [ISSN](https://en.wikipedia.org/wiki/ISSN_\(identifier\) "ISSN (identifier)") [2050-084X](https://search.worldcat.org/issn/2050-084X). [PMC](https://en.wikipedia.org/wiki/PMC_\(identifier\) "PMC (identifier)") [6897514](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6897514). [PMID](https://en.wikipedia.org/wiki/PMID_\(identifier\) "PMID (identifier)") [31570119](https://pubmed.ncbi.nlm.nih.gov/31570119).
137. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-140)**
Pereira, Talmo D.; Tabris, Nathaniel; Matsliah, Arie; Turner, David M.; Li, Junyu; Ravindranath, Shruthi; Papadoyannis, Eleni S.; Normand, Edna; Deutsch, David S.; Wang, Z. Yan; McKenzie-Smith, Grace C.; Mitelut, Catalin C.; Castro, Marielisa Diez; D’Uva, John; Kislin, Mikhail (May 2022). ["Publisher Correction: SLEAP: A deep learning system for multi-animal pose tracking"](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9119847). *Nature Methods*. **19** (5): 628. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1038/s41592-022-01495-2](https://doi.org/10.1038%2Fs41592-022-01495-2). [ISSN](https://en.wikipedia.org/wiki/ISSN_\(identifier\) "ISSN (identifier)") [1548-7091](https://search.worldcat.org/issn/1548-7091). [PMC](https://en.wikipedia.org/wiki/PMC_\(identifier\) "PMC (identifier)") [9119847](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9119847). [PMID](https://en.wikipedia.org/wiki/PMID_\(identifier\) "PMID (identifier)") [35468969](https://pubmed.ncbi.nlm.nih.gov/35468969).
138. ^ [***a***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-:8_141-0) [***b***](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-:8_141-1)
Arac, Ahmet; Zhao, Pingping; Dobkin, Bruce H.; Carmichael, S. Thomas; Golshani, Peyman (2019-05-07). ["DeepBehavior: A Deep Learning Toolbox for Automated Analysis of Animal and Human Behavior Imaging Data"](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6513883). *Frontiers in Systems Neuroscience*. **13** 20. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.3389/fnsys.2019.00020](https://doi.org/10.3389%2Ffnsys.2019.00020). [ISSN](https://en.wikipedia.org/wiki/ISSN_\(identifier\) "ISSN (identifier)") [1662-5137](https://search.worldcat.org/issn/1662-5137). [PMC](https://en.wikipedia.org/wiki/PMC_\(identifier\) "PMC (identifier)") [6513883](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6513883). [PMID](https://en.wikipedia.org/wiki/PMID_\(identifier\) "PMID (identifier)") [31133826](https://pubmed.ncbi.nlm.nih.gov/31133826).
139. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-142)**
Ren, Hansheng; Xu, Bixiong; Wang, Yujing; Yi, Chao; Huang, Congrui; Kou, Xiaoyu; Xing, Tony; Yang, Mao; Tong, Jie; Zhang, Qi (2019). "Time-Series Anomaly Detection Service at Microsoft". *Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining*. [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[1906\.03821](https://arxiv.org/abs/1906.03821). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1145/3292500.3330680](https://doi.org/10.1145%2F3292500.3330680). [S2CID](https://en.wikipedia.org/wiki/S2CID_\(identifier\) "S2CID (identifier)") [182952311](https://api.semanticscholar.org/CorpusID:182952311).
140. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-143)**
Wallach, Izhar; Dzamba, Michael; Heifets, Abraham (2015-10-09). "AtomNet: A Deep Convolutional Neural Network for Bioactivity Prediction in Structure-based Drug Discovery". [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[1510\.02855](https://arxiv.org/abs/1510.02855) \[[cs.LG](https://arxiv.org/archive/cs.LG)\].
141. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-144)**
Yosinski, Jason; Clune, Jeff; Nguyen, Anh; Fuchs, Thomas; Lipson, Hod (2015-06-22). "Understanding Neural Networks Through Deep Visualization". [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[1506\.06579](https://arxiv.org/abs/1506.06579) \[[cs.CV](https://arxiv.org/archive/cs.CV)\].
142. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-145)**
["Toronto startup has a faster way to discover effective medicines"](https://www.theglobeandmail.com/report-on-business/small-business/starting-out/toronto-startup-has-a-faster-way-to-discover-effective-medicines/article25660419/). *The Globe and Mail*. [Archived](https://web.archive.org/web/20151020040115/http://www.theglobeandmail.com/report-on-business/small-business/starting-out/toronto-startup-has-a-faster-way-to-discover-effective-medicines/article25660419/) from the original on 2015-10-20. Retrieved 2015-11-09.
143. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-146)**
["Startup Harnesses Supercomputers to Seek Cures"](https://www.kqed.org/futureofyou/3461/startup-harnesses-supercomputers-to-seek-cures). *KQED Future of You*. 2015-05-27. [Archived](https://web.archive.org/web/20181206234956/https://www.kqed.org/futureofyou/3461/startup-harnesses-supercomputers-to-seek-cures) from the original on 2018-12-06. Retrieved 2015-11-09.
144. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-147)**
Chellapilla, K; Fogel, DB (1999). "Evolving neural networks to play checkers without relying on expert knowledge". *IEEE Trans Neural Netw*. **10** (6): 1382–91\. [Bibcode](https://en.wikipedia.org/wiki/Bibcode_\(identifier\) "Bibcode (identifier)"):[1999ITNN...10.1382C](https://ui.adsabs.harvard.edu/abs/1999ITNN...10.1382C). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1109/72.809083](https://doi.org/10.1109%2F72.809083). [PMID](https://en.wikipedia.org/wiki/PMID_\(identifier\) "PMID (identifier)") [18252639](https://pubmed.ncbi.nlm.nih.gov/18252639).
145. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-148)**
Chellapilla, K.; Fogel, D.B. (2001). "Evolving an expert checkers playing program without using human expertise". *IEEE Transactions on Evolutionary Computation*. **5** (4): 422–428\. [Bibcode](https://en.wikipedia.org/wiki/Bibcode_\(identifier\) "Bibcode (identifier)"):[2001ITEC....5..422C](https://ui.adsabs.harvard.edu/abs/2001ITEC....5..422C). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1109/4235.942536](https://doi.org/10.1109%2F4235.942536).
146. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-149)**
[Fogel, David](https://en.wikipedia.org/wiki/David_B._Fogel "David B. Fogel") (2001). *Blondie24: Playing at the Edge of AI*. San Francisco, CA: Morgan Kaufmann. [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-1-55860-783-5](https://en.wikipedia.org/wiki/Special:BookSources/978-1-55860-783-5 "Special:BookSources/978-1-55860-783-5").
147. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-150)**
Clark, Christopher; Storkey, Amos (2014). "Teaching Deep Convolutional Neural Networks to Play Go". [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[1412\.3409](https://arxiv.org/abs/1412.3409) \[[cs.AI](https://arxiv.org/archive/cs.AI)\].
148. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-151)**
Maddison, Chris J.; Huang, Aja; Sutskever, Ilya; Silver, David (2014). "Move Evaluation in Go Using Deep Convolutional Neural Networks". [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[1412\.6564](https://arxiv.org/abs/1412.6564) \[[cs.LG](https://arxiv.org/archive/cs.LG)\].
149. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-152)**
["AlphaGo – Google DeepMind"](https://web.archive.org/web/20160130230207/http://www.deepmind.com/alpha-go.html). Archived from [the original](https://www.deepmind.com/alpha-go.html) on 30 January 2016. Retrieved 30 January 2016.
150. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-153)**
Bai, Shaojie; Kolter, J. Zico; Koltun, Vladlen (2018-04-19). "An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling". [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[1803\.01271](https://arxiv.org/abs/1803.01271) \[[cs.LG](https://arxiv.org/archive/cs.LG)\].
151. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-154)**
Yu, Fisher; Koltun, Vladlen (2016-04-30). "Multi-Scale Context Aggregation by Dilated Convolutions". [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[1511\.07122](https://arxiv.org/abs/1511.07122) \[[cs.CV](https://arxiv.org/archive/cs.CV)\].
152. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-155)**
Borovykh, Anastasia; Bohte, Sander; Oosterlee, Cornelis W. (2018-09-17). "Conditional Time Series Forecasting with Convolutional Neural Networks". [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[1703\.04691](https://arxiv.org/abs/1703.04691) \[[stat.ML](https://arxiv.org/archive/stat.ML)\].
153. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-156)**
Mittelman, Roni (2015-08-03). "Time-series modeling with undecimated fully convolutional neural networks". [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[1508\.00317](https://arxiv.org/abs/1508.00317) \[[stat.ML](https://arxiv.org/archive/stat.ML)\].
154. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-157)**
Chen, Yitian; Kang, Yanfei; Chen, Yixiong; Wang, Zizhuo (2019-06-11). "Probabilistic Forecasting with Temporal Convolutional Neural Network". [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[1906\.04397](https://arxiv.org/abs/1906.04397) \[[stat.ML](https://arxiv.org/archive/stat.ML)\].
155. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-158)**
Zhao, Bendong; Lu, Huanzhang; Chen, Shangfeng; Liu, Junliang; Wu, Dongya (2017-02-01). "Convolutional neural networks for time series classi". *Journal of Systems Engineering and Electronics*. **28** (1): 162–169\. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.21629/JSEE.2017.01.18](https://doi.org/10.21629%2FJSEE.2017.01.18).
156. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-159)**
Petneházi, Gábor (2019-08-21). "QCNN: Quantile Convolutional Neural Network". [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[1908\.07978](https://arxiv.org/abs/1908.07978) \[[cs.LG](https://arxiv.org/archive/cs.LG)\].
157. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-HeiCuBeDa_Hilprecht_160-0)**
[Hubert Mara](https://en.wikipedia.org/wiki/Hubert_Mara "Hubert Mara") (2019-06-07), *HeiCuBeDa Hilprecht – Heidelberg Cuneiform Benchmark Dataset for the Hilprecht Collection* (in German), heiDATA – institutional repository for research data of Heidelberg University, [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.11588/data/IE8CCN](https://doi.org/10.11588%2Fdata%2FIE8CCN)
158. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-ICDAR19_161-0)**
Hubert Mara and Bartosz Bogacz (2019), "Breaking the Code on Broken Tablets: The Learning Challenge for Annotated Cuneiform Script in Normalized 2D and 3D Datasets", *Proceedings of the 15th International Conference on Document Analysis and Recognition (ICDAR)* (in German), Sydney, Australia, pp. 148–153, [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1109/ICDAR.2019.00032](https://doi.org/10.1109%2FICDAR.2019.00032), [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-1-7281-3014-9](https://en.wikipedia.org/wiki/Special:BookSources/978-1-7281-3014-9 "Special:BookSources/978-1-7281-3014-9"), [S2CID](https://en.wikipedia.org/wiki/S2CID_\(identifier\) "S2CID (identifier)") [211026941](https://api.semanticscholar.org/CorpusID:211026941)
159. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-ICFHR20_162-0)**
Bogacz, Bartosz; Mara, Hubert (2020), "Period Classification of 3D Cuneiform Tablets with Geometric Neural Networks", *Proceedings of the 17th International Conference on Frontiers of Handwriting Recognition (ICFHR)*, Dortmund, Germany
160. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-ICFHR20_Presentation_163-0)** [Presentation of the ICFHR paper on Period Classification of 3D Cuneiform Tablets with Geometric Neural Networks](https://www.youtube.com/watch?v=-iFntE51HRw) on [YouTube](https://en.wikipedia.org/wiki/YouTube_video_\(identifier\) "YouTube video (identifier)")
161. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-164)** Durjoy Sen Maitra; Ujjwal Bhattacharya; S.K. Parui, ["CNN based common approach to handwritten character recognition of multiple scripts"](https://ieeexplore.ieee.org/document/7333916) [Archived](https://web.archive.org/web/20231016190918/https://ieeexplore.ieee.org/document/7333916) 2023-10-16 at the [Wayback Machine](https://en.wikipedia.org/wiki/Wayback_Machine "Wayback Machine"), in Document Analysis and Recognition (ICDAR), 2015 13th International Conference on, pp. 1021–1025, 23–26 Aug. 2015
162. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-Interpretable_ML_Symposium_2017_165-0)**
["NIPS 2017"](https://web.archive.org/web/20190907063237/http://interpretable.ml/). *Interpretable ML Symposium*. 2017-10-20. Archived from [the original](http://interpretable.ml/) on 2019-09-07. Retrieved 2018-09-12.
163. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-Zang_Wang_Liu_Zhang_2018_pp._97%E2%80%93108_166-0)**
Zang, Jinliang; Wang, Le; Liu, Ziyi; Zhang, Qilin; Hua, Gang; Zheng, Nanning (2018). "Attention-Based Temporal Weighted Convolutional Neural Network for Action Recognition". *Artificial Intelligence Applications and Innovations*. IFIP Advances in Information and Communication Technology. Vol. 519. Cham: Springer International Publishing. pp. 97–108\. [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[1803\.07179](https://arxiv.org/abs/1803.07179). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1007/978-3-319-92007-8\_9](https://doi.org/10.1007%2F978-3-319-92007-8_9). [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-3-319-92006-1](https://en.wikipedia.org/wiki/Special:BookSources/978-3-319-92006-1 "Special:BookSources/978-3-319-92006-1"). [ISSN](https://en.wikipedia.org/wiki/ISSN_\(identifier\) "ISSN (identifier)") [1868-4238](https://search.worldcat.org/issn/1868-4238). [S2CID](https://en.wikipedia.org/wiki/S2CID_\(identifier\) "S2CID (identifier)") [4058889](https://api.semanticscholar.org/CorpusID:4058889).
164. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-Wang_Zang_Zhang_Niu_p=1979_167-0)**
Wang, Le; Zang, Jinliang; Zhang, Qilin; Niu, Zhenxing; Hua, Gang; Zheng, Nanning (2018-06-21). ["Action Recognition by an Attention-Aware Temporal Weighted Convolutional Neural Network"](https://qilin-zhang.github.io/_pages/pdfs/sensors-18-01979-Action_Recognition_by_an_Attention-Aware_Temporal_Weighted_Convolutional_Neural_Network.pdf) (PDF). *Sensors*. **18** (7): 1979. [Bibcode](https://en.wikipedia.org/wiki/Bibcode_\(identifier\) "Bibcode (identifier)"):[2018Senso..18.1979W](https://ui.adsabs.harvard.edu/abs/2018Senso..18.1979W). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.3390/s18071979](https://doi.org/10.3390%2Fs18071979). [ISSN](https://en.wikipedia.org/wiki/ISSN_\(identifier\) "ISSN (identifier)") [1424-8220](https://search.worldcat.org/issn/1424-8220). [PMC](https://en.wikipedia.org/wiki/PMC_\(identifier\) "PMC (identifier)") [6069475](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6069475). [PMID](https://en.wikipedia.org/wiki/PMID_\(identifier\) "PMID (identifier)") [29933555](https://pubmed.ncbi.nlm.nih.gov/29933555). [Archived](https://web.archive.org/web/20180913040055/https://qilin-zhang.github.io/_pages/pdfs/sensors-18-01979-Action_Recognition_by_an_Attention-Aware_Temporal_Weighted_Convolutional_Neural_Network.pdf) (PDF) from the original on 2018-09-13. Retrieved 2018-09-14.
165. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-Ong_Chavez_Hong_2015_168-0)**
Ong, Hao Yi; Chavez, Kevin; Hong, Augustus (2015-08-18). "Distributed Deep Q-Learning". [arXiv](https://en.wikipedia.org/wiki/ArXiv_\(identifier\) "ArXiv (identifier)"):[1508\.04186v2](https://arxiv.org/abs/1508.04186v2) \[[cs.LG](https://arxiv.org/archive/cs.LG)\].
166. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-DQN_169-0)**
Mnih, Volodymyr; et al. (2015). "Human-level control through deep reinforcement learning". *Nature*. **518** (7540): 529–533\. [Bibcode](https://en.wikipedia.org/wiki/Bibcode_\(identifier\) "Bibcode (identifier)"):[2015Natur.518..529M](https://ui.adsabs.harvard.edu/abs/2015Natur.518..529M). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1038/nature14236](https://doi.org/10.1038%2Fnature14236). [PMID](https://en.wikipedia.org/wiki/PMID_\(identifier\) "PMID (identifier)") [25719670](https://pubmed.ncbi.nlm.nih.gov/25719670). [S2CID](https://en.wikipedia.org/wiki/S2CID_\(identifier\) "S2CID (identifier)") [205242740](https://api.semanticscholar.org/CorpusID:205242740).
167. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-170)**
Sun, R.; Sessions, C. (June 2000). "Self-segmentation of sequences: automatic formation of hierarchies of sequential behaviors". *IEEE Transactions on Systems, Man, and Cybernetics - Part B: Cybernetics*. **30** (3): 403–418\. [Bibcode](https://en.wikipedia.org/wiki/Bibcode_\(identifier\) "Bibcode (identifier)"):[2000ITSMB..30..403S](https://ui.adsabs.harvard.edu/abs/2000ITSMB..30..403S). [CiteSeerX](https://en.wikipedia.org/wiki/CiteSeerX_\(identifier\) "CiteSeerX (identifier)") [10\.1.1.11.226](https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.11.226). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1109/3477.846230](https://doi.org/10.1109%2F3477.846230). [ISSN](https://en.wikipedia.org/wiki/ISSN_\(identifier\) "ISSN (identifier)") [1083-4419](https://search.worldcat.org/issn/1083-4419). [PMID](https://en.wikipedia.org/wiki/PMID_\(identifier\) "PMID (identifier)") [18252373](https://pubmed.ncbi.nlm.nih.gov/18252373).
168. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-CDBN-CIFAR_171-0)**
["Convolutional Deep Belief Networks on CIFAR-10"](http://www.cs.toronto.edu/~kriz/conv-cifar10-aug2010.pdf) (PDF). [Archived](https://web.archive.org/web/20170830060223/http://www.cs.toronto.edu/~kriz/conv-cifar10-aug2010.pdf) (PDF) from the original on 2017-08-30. Retrieved 2017-08-18.
169. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-CDBN_172-0)**
Lee, Honglak; Grosse, Roger; Ranganath, Rajesh; Ng, Andrew Y. (1 January 2009). "Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations". *Proceedings of the 26th Annual International Conference on Machine Learning*. ACM. pp. 609–616\. [CiteSeerX](https://en.wikipedia.org/wiki/CiteSeerX_\(identifier\) "CiteSeerX (identifier)") [10\.1.1.149.6800](https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.149.6800). [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1145/1553374.1553453](https://doi.org/10.1145%2F1553374.1553453). [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-1-60558-516-1](https://en.wikipedia.org/wiki/Special:BookSources/978-1-60558-516-1 "Special:BookSources/978-1-60558-516-1"). [S2CID](https://en.wikipedia.org/wiki/S2CID_\(identifier\) "S2CID (identifier)") [12008458](https://api.semanticscholar.org/CorpusID:12008458).
170. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-173)**
Behnke, Sven (2003). [*Hierarchical Neural Networks for Image Interpretation*](https://www.ais.uni-bonn.de/books/LNCS2766.pdf) (PDF). Lecture Notes in Computer Science. Vol. 2766. Springer. [doi](https://en.wikipedia.org/wiki/Doi_\(identifier\) "Doi (identifier)"):[10\.1007/b11963](https://doi.org/10.1007%2Fb11963). [ISBN](https://en.wikipedia.org/wiki/ISBN_\(identifier\) "ISBN (identifier)") [978-3-540-40722-5](https://en.wikipedia.org/wiki/Special:BookSources/978-3-540-40722-5 "Special:BookSources/978-3-540-40722-5"). [S2CID](https://en.wikipedia.org/wiki/S2CID_\(identifier\) "S2CID (identifier)") [1304548](https://api.semanticscholar.org/CorpusID:1304548). [Archived](https://web.archive.org/web/20170810020001/http://www.ais.uni-bonn.de/books/LNCS2766.pdf) (PDF) from the original on 2017-08-10. Retrieved 2016-12-28.
171. **[^](https://en.wikipedia.org/wiki/Convolutional_neural_network#cite_ref-174)**
Choi, Rene Y.; Coyner, Aaron S.; Kalpathy-Cramer, Jayashree; Chiang, Michael F.; Campbell, J. Peter (February 2020). ["Introduction to Machine Learning, Neural Networks, and Deep Learning"](https://tvst.arvojournals.org/article.aspx?articleid=2762344). *Wired*. [Archived](https://web.archive.org/web/20180113150305/https://www.wired.com/2016/05/google-tpu-custom-chips/) from the original on January 13, 2018. Retrieved March 6, 2017.
- [CS231n: Convolutional Neural Networks for Visual Recognition](https://cs231n.github.io/) — [Andrej Karpathy](https://en.wikipedia.org/wiki/Andrej_Karpathy "Andrej Karpathy")'s [Stanford](https://en.wikipedia.org/wiki/Stanford_University "Stanford University") computer science course on CNNs in computer vision
- [vdumoulin/conv\_arithmetic: A technical report on convolution arithmetic in the context of deep learning](https://github.com/vdumoulin/conv_arithmetic). Animations of convolutions. |
| Shard | 152 (laksa) |
| Root Hash | 17790707453426894952 |
| Unparsed URL | org,wikipedia!en,/wiki/Convolutional_neural_network s443 |