ℹ️ Skipped - page is already crawled
| Filter | Status | Condition | Details |
|---|---|---|---|
| HTTP status | PASS | download_http_code = 200 | HTTP 200 |
| Age cutoff | PASS | download_stamp > now() - 6 MONTH | 0 months ago |
| History drop | PASS | isNull(history_drop_reason) | No drop reason |
| Spam/ban | PASS | fh_dont_index != 1 AND ml_spam_score = 0 | ml_spam_score=0 |
| Canonical | PASS | meta_canonical IS NULL OR = '' OR = src_unparsed | Not set |

| Property | Value |
|---|---|
| URL | https://docs.pytorch.org/docs/stable/notes/multiprocessing.html |
| Last Crawled | 2026-04-05 22:58:18 (1 day ago) |
| First Indexed | 2025-07-07 17:46:42 (9 months ago) |
| HTTP Status Code | 200 |
| Meta Title | Multiprocessing best practices — PyTorch 2.11 documentation |
| Meta Description | null |
| Meta Canonical | null |
| Boilerpipe Text | Created On: Jan 16, 2017 | Last Updated On: Jun 18, 2025
`torch.multiprocessing` is a drop-in replacement for Python’s `multiprocessing` module. It supports the exact same operations, but extends them so that every tensor sent through a `multiprocessing.Queue` has its data moved into shared memory, and only a handle is sent to the other process.
Note
When a `Tensor` is sent to another process, the `Tensor` data is shared. If `torch.Tensor.grad` is not `None`, it is also shared. After a `Tensor` without a `torch.Tensor.grad` field is sent to the other process, it creates a standard process-specific `.grad` `Tensor` that is not automatically shared across all processes, unlike how the `Tensor`’s data has been shared.
This makes it possible to implement various training methods, like Hogwild, A3C, or any others that require asynchronous operation.
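
The idea of placing data in shared memory and passing only a handle can be illustrated with Python’s standard library alone. The sketch below is an analogy, not PyTorch’s implementation: it uses `multiprocessing.shared_memory`, and the helper name `write_then_attach` is hypothetical. A second mapping opened from the block’s name sees and modifies the same bytes, so only the short name would need to travel between processes.

```python
from multiprocessing import shared_memory


def write_then_attach(payload: bytes) -> bytes:
    """Write payload to a shared block, modify it through a second mapping,
    and return what the first mapping sees afterwards."""
    owner = shared_memory.SharedMemory(create=True, size=len(payload))
    try:
        owner.buf[:len(payload)] = payload
        handle = owner.name  # a short string standing in for the "handle"
        view = shared_memory.SharedMemory(name=handle)  # attach, no data copy
        try:
            view.buf[0] = ord("X")  # write through the second mapping
            return bytes(owner.buf[:len(payload)])
        finally:
            view.close()
    finally:
        owner.close()
        owner.unlink()  # free the block once nobody needs it


if __name__ == "__main__":
    print(write_then_attach(b"hello"))  # prints b'Xello'
```

In a real program the name would be sent to another process, which would attach to the block in the same way.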
Poison fork in multiprocessing
When using multiprocessing with accelerators, a known issue called “poison fork” may occur. This happens when the accelerator’s runtime is not fork safe and is initialized before a process forks, leading to runtime errors in child processes.
To prevent such errors:
- Avoid initializing the accelerator in the main process before forking child processes.
- Use an alternative process start method, such as `spawn` or `forkserver`, which ensures a clean initialization of each process.
CUDA in multiprocessing
The CUDA runtime has the limitation described in Poison fork in multiprocessing when using the `fork` start method; either the `spawn` or `forkserver` start method is required to use CUDA in subprocesses.
Note
The start method can be set either by creating a context with `multiprocessing.get_context(...)` or directly with `multiprocessing.set_start_method(...)`.
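
A minimal sketch of the two options in the note, written against the standard `multiprocessing` module (the text describes `torch.multiprocessing` as a drop-in replacement, so the import is the only line that should need to change). The helper `pick_safe_start_method` and the constant `SAFE_METHODS` are hypothetical names introduced for illustration.

```python
import multiprocessing as mp

# Start methods that do not rely on fork() in the parent process.
SAFE_METHODS = ("forkserver", "spawn")


def pick_safe_start_method():
    """Return the first non-fork start method available on this platform."""
    available = mp.get_all_start_methods()
    for method in SAFE_METHODS:
        if method in available:
            return method
    raise RuntimeError("no non-fork start method available")


if __name__ == "__main__":
    method = pick_safe_start_method()

    # Option 1: a context object. The global default stays untouched, and
    # ctx.Process, ctx.Queue, etc. all use the chosen start method.
    ctx = mp.get_context(method)
    print("context start method:", ctx.get_start_method())

    # Option 2: set the process-wide default once, at program start.
    # mp.set_start_method(method)
```

A context keeps the choice local to the code that uses it, which can be convenient in libraries; `set_start_method` is simpler for a single script.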
Unlike with CPU tensors, the sending process is required to keep the original tensor as long as the receiving process retains a copy of the tensor. This is handled under the hood via reference counting, but users must still follow best practices for the program to run correctly. For example, the sending process must stay alive as long as the consumer process has references to the tensor, and the refcounting cannot save you if the consumer process exits abnormally via a fatal signal. See this section.
See also: Use nn.parallel.DistributedDataParallel instead of multiprocessing or nn.DataParallel
Best practices and tips
Avoiding and fighting deadlocks
There are a lot of things that can go wrong when a new process is spawned, with the most common cause of deadlocks being background threads. If there’s any thread that holds a lock or imports a module, and `fork` is called, it’s very likely that the subprocess will be in a corrupted state and will deadlock or fail in a different way. Note that even if you don’t start any threads yourself, Python’s built-in libraries do - no need to look further than `multiprocessing`. `multiprocessing.Queue` is actually a very complex class that spawns multiple threads used to serialize, send and receive objects, and they can cause the aforementioned problems too. If you find yourself in such a situation, try using a `SimpleQueue`, which doesn’t use any additional threads.
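
The difference between the two queue types can be observed from inside a single process. The sketch below uses the standard `multiprocessing` module and a hypothetical helper, `extra_threads_after_put`, which counts the threads that exist after one `put()` on a fresh queue. It assumes Python 3.9 or later, where `SimpleQueue.close()` is available.

```python
import multiprocessing as mp
import threading


def extra_threads_after_put(make_queue):
    """Count threads started in this process by one put() on a fresh queue."""
    before = threading.active_count()
    q = make_queue()
    q.put("payload")
    started = threading.active_count() - before
    q.get()    # drain the item so nothing is left in the pipe
    q.close()  # release the underlying pipe
    if hasattr(q, "join_thread"):
        q.join_thread()  # Queue only: wait for its feeder thread to exit
    return started


if __name__ == "__main__":
    print("Queue:", extra_threads_after_put(mp.Queue))
    print("SimpleQueue:", extra_threads_after_put(mp.SimpleQueue))
```

On CPython, `Queue` is expected to report at least one extra thread (its feeder thread), while `SimpleQueue` reports none.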
We’re trying our best to make it easy for you and ensure these deadlocks don’t happen, but some things are out of our control. If you have any issues you can’t cope with for a while, try reaching out on forums, and we’ll see if it’s an issue we can fix.
Reuse buffers passed through a Queue
Remember that each time you put a `Tensor` into a `multiprocessing.Queue`, it has to be moved into shared memory. If it’s already shared, this is a no-op; otherwise it incurs an additional memory copy that can slow down the whole process. Even if you have a pool of processes sending data to a single one, make it send the buffers back - this is nearly free and will let you avoid a copy when sending the next batch.
Asynchronous multiprocess training (e.g. Hogwild)
Using `torch.multiprocessing`, it is possible to train a model asynchronously, with parameters either shared all the time or periodically synchronized. In the first case, we recommend sending over the whole model object, while in the latter, we advise sending only the `state_dict()`.
We recommend using `multiprocessing.Queue` for passing all kinds of PyTorch objects between processes. It is possible to e.g. inherit the tensors and storages already in shared memory when using the `fork` start method; however, this is very bug prone and should be used with care, and only by advanced users. Queues, even though they’re sometimes a less elegant solution, will work properly in all cases.
Warning
You should be careful about having global statements that are not guarded with an `if __name__ == '__main__'`. If a different start method than `fork` is used, they will be executed in all subprocesses.
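
The warning can be made concrete with a short script. This is a sketch using the standard `multiprocessing` module with the `spawn` start method; the names `SCALE`, `scaled`, `work`, and `main` are hypothetical. Everything at module level is evaluated again when a spawned child imports the file, so only cheap definitions live there and the process-launching code sits behind the guard.

```python
import multiprocessing as mp
import os

# Module-level statements run again in every child started with "spawn",
# because the child re-imports this file. Keep them cheap and side-effect free.
SCALE = 2


def scaled(rank):
    return rank * SCALE


def work(rank):
    print(f"rank {rank} in pid {os.getpid()} -> {scaled(rank)}")


def main():
    ctx = mp.get_context("spawn")
    procs = [ctx.Process(target=work, args=(rank,)) for rank in range(2)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()


if __name__ == "__main__":
    # Without this guard, each spawned child would try to run main() again
    # while importing this file.
    main()
```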
Hogwild
A concrete Hogwild implementation can be found in the examples repository, but to showcase the overall structure of the code, there’s also a minimal example below:

    import torch.multiprocessing as mp
    from model import MyModel

    def train(model):
        # Construct data_loader, optimizer, etc.
        for data, labels in data_loader:
            optimizer.zero_grad()
            loss_fn(model(data), labels).backward()
            optimizer.step()  # This will update the shared parameters

    if __name__ == '__main__':
        num_processes = 4
        model = MyModel()
        # NOTE: this is required for the ``fork`` method to work
        model.share_memory()
        processes = []
        for rank in range(num_processes):
            p = mp.Process(target=train, args=(model,))
            p.start()
            processes.append(p)
        for p in processes:
            p.join()

CPU in multiprocessing
Inappropriate multiprocessing can lead to CPU oversubscription, causing different processes to compete for CPU resources and resulting in low efficiency. This section explains what CPU oversubscription is and how to avoid it.
CPU oversubscription
CPU oversubscription is a technical term that refers to a situation where the total number of vCPUs allocated to a system exceeds the total number of vCPUs available on the hardware. This leads to severe contention for CPU resources. In such cases, there is frequent switching between processes, which increases process-switching overhead and decreases overall system efficiency.
CPU oversubscription can be observed with the code examples in the Hogwild implementation found in the example repository.
Consider running the training example on CPU with 4 processes, using the following command:

    python main.py --num-processes 4

Assuming there are N vCPUs available on the machine, executing the above command will generate 4 subprocesses. Each subprocess will allocate N vCPUs for itself, resulting in a requirement of 4*N vCPUs. However, the machine only has N vCPUs available. Consequently, the different processes will compete for resources, leading to frequent process switching.
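
The arithmetic above can be written out as a small helper. This is only an illustration of the 4*N versus N comparison; the function name `oversubscription_factor` and the value N = 8 are assumptions made for the example.

```python
def oversubscription_factor(num_processes, threads_per_process, available_vcpus):
    """Ratio of requested compute threads to available vCPUs.

    A value above 1 means the machine is oversubscribed.
    """
    return (num_processes * threads_per_process) / available_vcpus


if __name__ == "__main__":
    n = 8  # hypothetical machine with N = 8 vCPUs
    # Each of the 4 processes asks for N threads, as described in the text.
    print(oversubscription_factor(4, n, n))  # prints 4.0
```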
The following observations indicate the presence of CPU oversubscription:
- High CPU Utilization: By using the `htop` command, you can observe that the CPU utilization is consistently high, often reaching or exceeding its maximum capacity. This indicates that the demand for CPU resources exceeds the available physical cores, causing contention and competition among processes for CPU time.
- Frequent Context Switching with Low System Efficiency: In an oversubscribed CPU scenario, processes compete for CPU time, and the operating system needs to rapidly switch between different processes to allocate resources fairly. This frequent context switching adds overhead and reduces the overall system efficiency.
Avoid CPU oversubscription
A good way to avoid CPU oversubscription is proper resource allocation. Ensure that the number of processes or threads running concurrently does not exceed the available CPU resources.
In this case, a solution would be to specify the appropriate number of threads in the subprocesses. This can be achieved by setting the number of threads for each process using the `torch.set_num_threads(int)` function in each subprocess.
Assuming there are N vCPUs on the machine and M processes will be generated, the maximum `num_threads` value used by each process would be `floor(N/M)`. To avoid CPU oversubscription in the mnist_hogwild example, the following changes are needed for the file `train.py` in the example repository.

    def train(rank, args, model, device, dataset, dataloader_kwargs):
        torch.manual_seed(args.seed + rank)

        #### define the num threads used in current sub-processes
        # N and M are placeholders: N = vCPUs on the machine, M = number of processes.
        torch.set_num_threads(floor(N/M))

        train_loader = torch.utils.data.DataLoader(dataset, **dataloader_kwargs)

        optimizer = optim.SGD(model.parameters(), lr=args.lr, momentum=args.momentum)
        for epoch in range(1, args.epochs + 1):
            train_epoch(epoch, args, model, device, train_loader, optimizer)

Set `num_threads` for each process using `torch.set_num_threads(floor(N/M))`, where you replace N with the number of vCPUs available and M with the chosen number of processes. The appropriate `num_threads` value will vary depending on the specific task at hand. However, as a general guideline, the maximum value for `num_threads` should be `floor(N/M)` to avoid CPU oversubscription.
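
One way to compute that bound is sketched below. The helper name `threads_per_process` is hypothetical, and clamping the result to at least 1 is an addition not stated in the text: it covers the case M > N, where `floor(N/M)` would be 0. With more processes than vCPUs the machine is still oversubscribed even at one thread each.

```python
import os


def threads_per_process(num_processes, available_vcpus=None):
    """Upper bound floor(N / M) on per-process threads, never below 1."""
    if num_processes < 1:
        raise ValueError("num_processes must be at least 1")
    if available_vcpus is None:
        # os.cpu_count() may return None; fall back to 1 in that case.
        available_vcpus = os.cpu_count() or 1
    return max(1, available_vcpus // num_processes)


if __name__ == "__main__":
    # In each subprocess one would then call, for example:
    #   torch.set_num_threads(threads_per_process(num_processes=4))
    print(threads_per_process(4, available_vcpus=16))  # prints 4
    print(threads_per_process(8, available_vcpus=4))   # prints 1 (clamped)
```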
In the `mnist_hogwild` training example, after avoiding CPU oversubscription, you can achieve a 30x performance boost. |
| Markdown | 
v2.11.0 (stable)
- [PyTorch Custom Operators Landing Page](https://docs.pytorch.org/docs/stable/notes/custom_operators.html)
- [Distributed Data Parallel](https://docs.pytorch.org/docs/stable/notes/ddp.html)
- [Extending PyTorch](https://docs.pytorch.org/docs/stable/notes/extending.html)
- [Extending torch.func with autograd.Function](https://docs.pytorch.org/docs/stable/notes/extending.func.html)
- [Frequently Asked Questions](https://docs.pytorch.org/docs/stable/notes/faq.html)
- [Getting Started on Intel GPU](https://docs.pytorch.org/docs/stable/notes/get_start_xpu.html)
- [Gradcheck mechanics](https://docs.pytorch.org/docs/stable/notes/gradcheck.html)
- [HIP (ROCm) semantics](https://docs.pytorch.org/docs/stable/notes/hip.html)
- [Features for large-scale deployments](https://docs.pytorch.org/docs/stable/notes/large_scale_deployments.html)
- [LibTorch Stable ABI](https://docs.pytorch.org/docs/stable/notes/libtorch_stable_abi.html)
- [LocalTensor Tutorial: Single-Process SPMD Debugging](https://docs.pytorch.org/docs/stable/notes/local_tensor_tutorial.html)
- [MKLDNN backend](https://docs.pytorch.org/docs/stable/notes/mkldnn.html)
- [Modules](https://docs.pytorch.org/docs/stable/notes/modules.html)
- [MPS backend](https://docs.pytorch.org/docs/stable/notes/mps.html)
- [Multiprocessing best practices](https://docs.pytorch.org/docs/stable/notes/multiprocessing.html)
- [Numerical accuracy](https://docs.pytorch.org/docs/stable/notes/numerical_accuracy.html)
- [Out Notes](https://docs.pytorch.org/docs/stable/notes/out.html)
- [Reproducibility](https://docs.pytorch.org/docs/stable/notes/randomness.html)
- [Serialization semantics](https://docs.pytorch.org/docs/stable/notes/serialization.html)
- [Windows FAQ](https://docs.pytorch.org/docs/stable/notes/windows.html)
- [Community](https://docs.pytorch.org/docs/stable/community/index.html)
- [PyTorch Governance \| Build + CI](https://docs.pytorch.org/docs/stable/community/build_ci_governance.html)
- [PyTorch Contribution Guide](https://docs.pytorch.org/docs/stable/community/contribution_guide.html)
- [PyTorch Design Philosophy](https://docs.pytorch.org/docs/stable/community/design.html)
- [PyTorch Governance \| Mechanics](https://docs.pytorch.org/docs/stable/community/governance.html)
- [PyTorch Governance \| Maintainers](https://docs.pytorch.org/docs/stable/community/persons_of_interest.html)
- [Tutorials](https://docs.pytorch.org/tutorials/)
[Go to pytorch.org](https://pytorch.org/)
- [X](https://x.com/PyTorch)
- [GitHub](https://github.com/pytorch/pytorch)
- [PyTorch Forum](https://discuss.pytorch.org/)
- [PyPi](https://pypi.org/project/torch/)
Section Navigation
Introduction
- [Pytorch Overview](https://docs.pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html)
- [Get Started](https://pytorch.org/get-started/locally/)
- [Learn the Basics](https://docs.pytorch.org/tutorials/beginner/basics/intro.html)
Core Concepts
- [PyTorch Main Components](https://docs.pytorch.org/docs/stable/user_guide/pytorch_main_components.html)
Torch Compile
- [Torch.compile](https://docs.pytorch.org/docs/stable/user_guide/torch_compiler/torch.compiler.html)
- [Getting Started](https://docs.pytorch.org/docs/stable/user_guide/torch_compiler/torch.compiler_get_started.html)
- [Core Concepts](https://docs.pytorch.org/docs/stable/user_guide/torch_compiler/core_concepts.html)
- [torch.compile Programming Model](https://docs.pytorch.org/docs/stable/user_guide/torch_compiler/compile/programming_model.html)
- [Dynamo Core Concepts](https://docs.pytorch.org/docs/stable/user_guide/torch_compiler/compile/programming_model.dynamo_core_concepts.html)
- [Working with Graph Breaks](https://docs.pytorch.org/docs/stable/user_guide/torch_compiler/compile/programming_model.graph_breaks_index.html)
- [Non-strict Tracing Programming Model](https://docs.pytorch.org/docs/stable/user_guide/torch_compiler/compile/programming_model.non_strict_tracing_model.html)
- [Dealing with Recompilations](https://docs.pytorch.org/docs/stable/user_guide/torch_compiler/compile/programming_model.recompilation.html)
- [tlparse / TORCH\_TRACE](https://docs.pytorch.org/docs/stable/user_guide/torch_compiler/compile/programming_model.observability.html)
- [Reporting Issues](https://docs.pytorch.org/docs/stable/user_guide/torch_compiler/compile/programming_model.reporting_issues.html)
- [Dynamo Overview](https://docs.pytorch.org/docs/stable/user_guide/torch_compiler/torch.compiler_dynamo_overview.html)
- [PyTorch 2.0 NNModule Support](https://docs.pytorch.org/docs/stable/user_guide/torch_compiler/torch.compiler_nn_module.html)
- [`torch.compile` has different autograd semantics](https://docs.pytorch.org/docs/stable/user_guide/torch_compiler/torch.compiler_backward.html)
- [Performance](https://docs.pytorch.org/docs/stable/user_guide/torch_compiler/performance.html)
- [PyTorch 2.0 Performance Dashboard](https://docs.pytorch.org/docs/stable/user_guide/torch_compiler/torch.compiler_performance_dashboard.html)
- [TorchInductor GPU Profiling](https://docs.pytorch.org/docs/stable/user_guide/torch_compiler/torch.compiler_inductor_profiling.html)
- [Profiling to understand torch.compile performance](https://docs.pytorch.org/docs/stable/user_guide/torch_compiler/torch.compiler_profiling_torch_compile.html)
- [CUDAGraph Trees](https://docs.pytorch.org/docs/stable/user_guide/torch_compiler/torch.compiler_cudagraph_trees.html)
- [Advanced](https://docs.pytorch.org/docs/stable/user_guide/torch_compiler/advanced.html)
- [Dynamo Deep-Dive](https://docs.pytorch.org/docs/stable/user_guide/torch_compiler/torch.compiler_dynamo_deepdive.html)
- [Writing Graph Transformations on ATen IR](https://docs.pytorch.org/docs/stable/user_guide/torch_compiler/torch.compiler_transformations.html)
- [Fake tensor](https://docs.pytorch.org/docs/stable/user_guide/torch_compiler/torch.compiler_fake_tensor.html)
- [Custom Backends](https://docs.pytorch.org/docs/stable/user_guide/torch_compiler/torch.compiler_custom_backends.html)
- [Dynamic Shapes](https://docs.pytorch.org/docs/stable/user_guide/torch_compiler/torch.compiler_dynamic_shapes.html)
- [Dynamic Shapes Core Concepts](https://docs.pytorch.org/docs/stable/user_guide/torch_compiler/compile/dynamic_shapes_core_concepts.html)
- [Troubleshooting Dynamic Shapes](https://docs.pytorch.org/docs/stable/user_guide/torch_compiler/compile/dynamic_shapes_troubleshooting.html)
- [Advanced Options to Control Dynamic Behavior](https://docs.pytorch.org/docs/stable/user_guide/torch_compiler/compile/dynamic_shapes_advanced_control_options.html)
- [Beyond the Basics](https://docs.pytorch.org/docs/stable/user_guide/torch_compiler/compile/dynamic_shapes_beyond_the_basics.html)
- [Troubleshooting FAQs](https://docs.pytorch.org/docs/stable/user_guide/torch_compiler/troubleshooting_faqs.html)
- [tlparse / TORCH\_TRACE](https://docs.pytorch.org/docs/stable/user_guide/torch_compiler/compile/programming_model.observability.html)
- [Reporting Issues](https://docs.pytorch.org/docs/stable/user_guide/torch_compiler/compile/programming_model.reporting_issues.html)
- [torch.compile Troubleshooting](https://docs.pytorch.org/docs/stable/user_guide/torch_compiler/torch.compiler_troubleshooting.html)
- [Frequently Asked Questions](https://docs.pytorch.org/docs/stable/user_guide/torch_compiler/torch.compiler_faq.html)
- [Reference/API](https://docs.pytorch.org/docs/stable/user_guide/torch_compiler/api_reference.html)
- [torch.compiler API reference](https://docs.pytorch.org/docs/stable/torch.compiler_api.html)
- [torch.compiler.compile](https://docs.pytorch.org/docs/stable/generated/torch.compiler.compile.html)
- [torch.compiler.reset](https://docs.pytorch.org/docs/stable/generated/torch.compiler.reset.html)
- [torch.compiler.allow\_in\_graph](https://docs.pytorch.org/docs/stable/generated/torch.compiler.allow_in_graph.html)
- [torch.compiler.substitute\_in\_graph](https://docs.pytorch.org/docs/stable/generated/torch.compiler.substitute_in_graph.html)
- [torch.compiler.assume\_constant\_result](https://docs.pytorch.org/docs/stable/generated/torch.compiler.assume_constant_result.html)
- [torch.compiler.list\_backends](https://docs.pytorch.org/docs/stable/generated/torch.compiler.list_backends.html)
- [torch.compiler.disable](https://docs.pytorch.org/docs/stable/generated/torch.compiler.disable.html)
- [torch.compiler.set\_stance](https://docs.pytorch.org/docs/stable/generated/torch.compiler.set_stance.html)
- [torch.compiler.set\_enable\_guard\_collectives](https://docs.pytorch.org/docs/stable/generated/torch.compiler.set_enable_guard_collectives.html)
- [torch.compiler.cudagraph\_mark\_step\_begin](https://docs.pytorch.org/docs/stable/generated/torch.compiler.cudagraph_mark_step_begin.html)
- [torch.compiler.is\_compiling](https://docs.pytorch.org/docs/stable/generated/torch.compiler.is_compiling.html)
- [torch.compiler.is\_dynamo\_compiling](https://docs.pytorch.org/docs/stable/generated/torch.compiler.is_dynamo_compiling.html)
- [torch.compiler.is\_exporting](https://docs.pytorch.org/docs/stable/generated/torch.compiler.is_exporting.html)
- [torch.compiler.keep\_portable\_guards\_unsafe](https://docs.pytorch.org/docs/stable/generated/torch.compiler.keep_portable_guards_unsafe.html)
- [torch.compiler.skip\_guard\_on\_inbuilt\_nn\_modules\_unsafe](https://docs.pytorch.org/docs/stable/generated/torch.compiler.skip_guard_on_inbuilt_nn_modules_unsafe.html)
- [torch.compiler.skip\_guard\_on\_all\_nn\_modules\_unsafe](https://docs.pytorch.org/docs/stable/generated/torch.compiler.skip_guard_on_all_nn_modules_unsafe.html)
- [torch.compiler.keep\_tensor\_guards\_unsafe](https://docs.pytorch.org/docs/stable/generated/torch.compiler.keep_tensor_guards_unsafe.html)
- [torch.compiler.skip\_guard\_on\_globals\_unsafe](https://docs.pytorch.org/docs/stable/generated/torch.compiler.skip_guard_on_globals_unsafe.html)
- [torch.compiler.skip\_all\_guards\_unsafe](https://docs.pytorch.org/docs/stable/generated/torch.compiler.skip_all_guards_unsafe.html)
- [torch.compiler.nested\_compile\_region](https://docs.pytorch.org/docs/stable/generated/torch.compiler.nested_compile_region.html)
- [torch.compiler.config](https://docs.pytorch.org/docs/stable/user_guide/torch_compiler/torch.compiler.config.html)
- [TorchDynamo APIs for fine-grained tracing](https://docs.pytorch.org/docs/stable/user_guide/torch_compiler/torch.compiler_fine_grain_apis.html)
- [TorchInductor and AOTInductor Provenance Tracking](https://docs.pytorch.org/docs/stable/user_guide/torch_compiler/torch.compiler_inductor_provenance.html)
- [Torch.export](https://docs.pytorch.org/docs/stable/user_guide/torch_compiler/export.html)
- [torch.export API Reference](https://docs.pytorch.org/docs/stable/user_guide/torch_compiler/export/api_reference.html)
- [torch.export Programming Model](https://docs.pytorch.org/docs/stable/user_guide/torch_compiler/export/programming_model.html)
- [torch.export IR Specification](https://docs.pytorch.org/docs/stable/user_guide/torch_compiler/export/ir_spec.html)
- [PT2 Archive Spec](https://docs.pytorch.org/docs/stable/user_guide/torch_compiler/export/pt2_archive.html)
- [Draft Export](https://docs.pytorch.org/docs/stable/user_guide/torch_compiler/export/draft_export.html)
- [Joint with descriptors](https://docs.pytorch.org/docs/stable/user_guide/torch_compiler/export/joint_with_descriptors.html)
- [Control Flow Operators](https://docs.pytorch.org/docs/stable/higher_order_ops/index.html)
- [Control Flow - Cond](https://docs.pytorch.org/docs/stable/higher_order_ops/cond.html)
- [Control Flow - While Loop](https://docs.pytorch.org/docs/stable/higher_order_ops/while_loop.html)
- [Control Flow - Scan](https://docs.pytorch.org/docs/stable/higher_order_ops/scan.html)
- [Control Flow - Associative Scan](https://docs.pytorch.org/docs/stable/higher_order_ops/associative_scan.html)
- [Control Flow - Map](https://docs.pytorch.org/docs/stable/higher_order_ops/map.html)
- [ExportDB](https://docs.pytorch.org/docs/stable/generated/exportdb/index.html)
- [torch.escape-hatch](https://docs.pytorch.org/docs/stable/generated/exportdb/torch.escape-hatch.html)
- [torch.dynamic-shape](https://docs.pytorch.org/docs/stable/generated/exportdb/torch.dynamic-shape.html)
- [torch.cond](https://docs.pytorch.org/docs/stable/generated/exportdb/torch.cond.html)
- [python.closure](https://docs.pytorch.org/docs/stable/generated/exportdb/python.closure.html)
- [torch.dynamic-value](https://docs.pytorch.org/docs/stable/generated/exportdb/torch.dynamic-value.html)
- [python.data-structure](https://docs.pytorch.org/docs/stable/generated/exportdb/python.data-structure.html)
- [python.assert](https://docs.pytorch.org/docs/stable/generated/exportdb/python.assert.html)
- [python.control-flow](https://docs.pytorch.org/docs/stable/generated/exportdb/python.control-flow.html)
- [torch.map](https://docs.pytorch.org/docs/stable/generated/exportdb/torch.map.html)
- [python.builtin](https://docs.pytorch.org/docs/stable/generated/exportdb/python.builtin.html)
- [python.object-model](https://docs.pytorch.org/docs/stable/generated/exportdb/python.object-model.html)
- [python.context-manager](https://docs.pytorch.org/docs/stable/generated/exportdb/python.context-manager.html)
- [torch.operator](https://docs.pytorch.org/docs/stable/generated/exportdb/torch.operator.html)
- [torch.mutation](https://docs.pytorch.org/docs/stable/generated/exportdb/torch.mutation.html)
- [AOTInductor: Ahead-Of-Time Compilation for Torch.Export-ed Models](https://docs.pytorch.org/docs/stable/user_guide/torch_compiler/torch.compiler_aot_inductor.html)
- [torch.\_logging](https://docs.pytorch.org/docs/stable/logging.html)
- [torch.\_logging.set\_logs](https://docs.pytorch.org/docs/stable/generated/torch._logging.set_logs.html)
- [AOTInductor Minifier](https://docs.pytorch.org/docs/stable/user_guide/torch_compiler/torch.compiler_aot_inductor_minifier.html)
- [AOTInductor Debugging Guide](https://docs.pytorch.org/docs/stable/user_guide/torch_compiler/torch.compiler_aot_inductor_debugging_guide.html)
- [IRs](https://docs.pytorch.org/docs/stable/user_guide/torch_compiler/torch.compiler_ir.html)
- [Dynamic Shapes](https://docs.pytorch.org/docs/stable/user_guide/torch_compiler/torch.compiler_dynamic_shapes.html)
- [Dynamic Shapes Core Concepts](https://docs.pytorch.org/docs/stable/user_guide/torch_compiler/compile/dynamic_shapes_core_concepts.html)
- [Troubleshooting Dynamic Shapes](https://docs.pytorch.org/docs/stable/user_guide/torch_compiler/compile/dynamic_shapes_troubleshooting.html)
- [Debugging with `tlparse` and `TORCH_LOGS=dynamic`](https://docs.pytorch.org/docs/stable/user_guide/torch_compiler/compile/dynamic_shapes_debugging_tlparse_torch_logs.html)
- [Troubleshooting GuardOnDataDependentSymNode Errors](https://docs.pytorch.org/docs/stable/user_guide/torch_compiler/compile/dynamic_shapes_troubleshooting_guardon_errors.html)
- [Advanced Options to Control Dynamic Behavior](https://docs.pytorch.org/docs/stable/user_guide/torch_compiler/compile/dynamic_shapes_advanced_control_options.html)
- [Beyond the Basics](https://docs.pytorch.org/docs/stable/user_guide/torch_compiler/compile/dynamic_shapes_beyond_the_basics.html)
- [The Zero-One Specialization Problem](https://docs.pytorch.org/docs/stable/user_guide/torch_compiler/compile/dynamic_shapes_zero_one_specialization.html)
- [Backed vs Unbacked Symints](https://docs.pytorch.org/docs/stable/user_guide/torch_compiler/compile/dynamic_shapes_backed_unbacked.html)
- [Fake tensor](https://docs.pytorch.org/docs/stable/user_guide/torch_compiler/torch.compiler_fake_tensor.html)
- [Writing Graph Transformations on ATen IR](https://docs.pytorch.org/docs/stable/user_guide/torch_compiler/torch.compiler_transformations.html)
Developer Notes
- [Developer Notes](https://docs.pytorch.org/docs/stable/notes.html)
- [Automatic Mixed Precision examples](https://docs.pytorch.org/docs/stable/notes/amp_examples.html)
- [Autograd mechanics](https://docs.pytorch.org/docs/stable/notes/autograd.html)
- [Broadcasting semantics](https://docs.pytorch.org/docs/stable/notes/broadcasting.html)
- [CPU threading and TorchScript inference](https://docs.pytorch.org/docs/stable/notes/cpu_threading_torchscript_inference.html)
- [CUDA semantics](https://docs.pytorch.org/docs/stable/notes/cuda.html)
- [PyTorch Custom Operators Landing Page](https://docs.pytorch.org/docs/stable/notes/custom_operators.html)
- [Distributed Data Parallel](https://docs.pytorch.org/docs/stable/notes/ddp.html)
- [Extending PyTorch](https://docs.pytorch.org/docs/stable/notes/extending.html)
- [Extending torch.func with autograd.Function](https://docs.pytorch.org/docs/stable/notes/extending.func.html)
- [Frequently Asked Questions](https://docs.pytorch.org/docs/stable/notes/faq.html)
- [Getting Started on Intel GPU](https://docs.pytorch.org/docs/stable/notes/get_start_xpu.html)
- [Gradcheck mechanics](https://docs.pytorch.org/docs/stable/notes/gradcheck.html)
- [HIP (ROCm) semantics](https://docs.pytorch.org/docs/stable/notes/hip.html)
- [Features for large-scale deployments](https://docs.pytorch.org/docs/stable/notes/large_scale_deployments.html)
- [LibTorch Stable ABI](https://docs.pytorch.org/docs/stable/notes/libtorch_stable_abi.html)
- [LocalTensor Tutorial: Single-Process SPMD Debugging](https://docs.pytorch.org/docs/stable/notes/local_tensor_tutorial.html)
- [MKLDNN backend](https://docs.pytorch.org/docs/stable/notes/mkldnn.html)
- [Modules](https://docs.pytorch.org/docs/stable/notes/modules.html)
- [MPS backend](https://docs.pytorch.org/docs/stable/notes/mps.html)
- [Multiprocessing best practices](https://docs.pytorch.org/docs/stable/notes/multiprocessing.html)
- [Numerical accuracy](https://docs.pytorch.org/docs/stable/notes/numerical_accuracy.html)
- [Out Notes](https://docs.pytorch.org/docs/stable/notes/out.html)
- [Reproducibility](https://docs.pytorch.org/docs/stable/notes/randomness.html)
- [Serialization semantics](https://docs.pytorch.org/docs/stable/notes/serialization.html)
- [Windows FAQ](https://docs.pytorch.org/docs/stable/notes/windows.html)
Accelerator Integration
- [Accelerator Integration](https://docs.pytorch.org/docs/stable/accelerator/index.html)
- [Device Management](https://docs.pytorch.org/docs/stable/accelerator/device.html)
- [Accelerator Hooks](https://docs.pytorch.org/docs/stable/accelerator/hooks.html)
- [Guard](https://docs.pytorch.org/docs/stable/accelerator/guard.html)
- [Autoload Mechanism](https://docs.pytorch.org/docs/stable/accelerator/autoload.html)
- [Operator Registration](https://docs.pytorch.org/docs/stable/accelerator/operators.html)
- [Automatic Mixed Precision](https://docs.pytorch.org/docs/stable/accelerator/amp.html)
- [Profiler Integration](https://docs.pytorch.org/docs/stable/accelerator/profiler.html)
- [User Guide](https://docs.pytorch.org/docs/stable/user_guide/index.html)
- [Developer Notes](https://docs.pytorch.org/docs/stable/notes.html)
- Multiprocess...
Rate this Page
★ ★ ★ ★ ★
# Multiprocessing best practices[\#](https://docs.pytorch.org/docs/stable/notes/multiprocessing.html#multiprocessing-best-practices "Link to this heading")
Created On: Jan 16, 2017 \| Last Updated On: Jun 18, 2025
[`torch.multiprocessing`](https://docs.pytorch.org/docs/stable/multiprocessing.html#module-torch.multiprocessing "torch.multiprocessing") is a drop-in replacement for Python’s [`multiprocessing`](https://docs.python.org/3/library/multiprocessing.html#module-multiprocessing "(in Python v3.14)") module. It supports exactly the same operations but extends them, so that any tensor sent through a [`multiprocessing.Queue`](https://docs.python.org/3/library/multiprocessing.html#multiprocessing.Queue "(in Python v3.14)") has its data moved into shared memory, and only a handle is sent to the other process.
Note
When a [`Tensor`](https://docs.pytorch.org/docs/stable/tensors.html#torch.Tensor "torch.Tensor") is sent to another process, the [`Tensor`](https://docs.pytorch.org/docs/stable/tensors.html#torch.Tensor "torch.Tensor") data is shared. If [`torch.Tensor.grad`](https://docs.pytorch.org/docs/stable/generated/torch.Tensor.grad.html#torch.Tensor.grad "torch.Tensor.grad") is not `None`, it is also shared. After a [`Tensor`](https://docs.pytorch.org/docs/stable/tensors.html#torch.Tensor "torch.Tensor") without a [`torch.Tensor.grad`](https://docs.pytorch.org/docs/stable/generated/torch.Tensor.grad.html#torch.Tensor.grad "torch.Tensor.grad") field is sent to the other process, it creates a standard process-specific `.grad` [`Tensor`](https://docs.pytorch.org/docs/stable/tensors.html#torch.Tensor "torch.Tensor") that is not automatically shared across all processes, unlike how the [`Tensor`](https://docs.pytorch.org/docs/stable/tensors.html#torch.Tensor "torch.Tensor")’s data has been shared.
This makes it possible to implement various training methods, such as Hogwild, A3C, or any others that require asynchronous operation.
## Poison fork in multiprocessing[\#](https://docs.pytorch.org/docs/stable/notes/multiprocessing.html#poison-fork-in-multiprocessing "Link to this heading")
When using multiprocessing with [accelerators](https://docs.pytorch.org/docs/stable/torch.html#accelerators), a known issue called “poison fork” may occur. This happens when the accelerator’s runtime is not fork-safe and is initialized before a process forks, leading to runtime errors in child processes.
To prevent such errors:
- Avoid initializing the accelerator in the main process before forking child processes.
- Use an alternative process start method, such as `spawn` or `forkserver`, which ensures a clean initialization of each process.
## CUDA in multiprocessing[\#](https://docs.pytorch.org/docs/stable/notes/multiprocessing.html#cuda-in-multiprocessing "Link to this heading")
The CUDA runtime has the limitation described in [Poison fork in multiprocessing](https://docs.pytorch.org/docs/stable/notes/multiprocessing.html#multiprocessing-poison-fork-note) when using the `fork` start method; either the `spawn` or the `forkserver` start method is required to use CUDA in subprocesses.
Note
The start method can be set via either creating a context with `multiprocessing.get_context(...)` or directly using `multiprocessing.set_start_method(...)`.
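As a minimal sketch of the first option, a context object can be created once and used to construct processes, which avoids changing the global start method. The helper `make_context` and the `worker` function are illustrative names, not part of any library. The sketch uses the standard-library module; since `torch.multiprocessing` is described above as a drop-in replacement, the same calls should be available there.

```python
import multiprocessing as mp


def make_context(method="spawn"):
    """Return a multiprocessing context for `method` (hypothetical helper)."""
    if method not in mp.get_all_start_methods():
        raise ValueError(f"start method {method!r} is not available on this platform")
    return mp.get_context(method)


def worker(rank):
    # Placeholder for work that would initialize and use the accelerator.
    print(f"worker {rank} started")


if __name__ == "__main__":
    ctx = make_context("spawn")
    # Processes created from the context use its start method.
    procs = [ctx.Process(target=worker, args=(rank,)) for rank in range(2)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```

Using a context keeps the choice local to the code that creates the processes, whereas `set_start_method(...)` changes the default for the whole program.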
Unlike with CPU tensors, the sending process must keep the original tensor alive for as long as the receiving process retains a copy of it. The reference counting is implemented under the hood, but users must follow the best practices for the program to run correctly. For example, the sending process must stay alive as long as the consumer process has references to the tensor, and the reference counting cannot help if the consumer process exits abnormally via a fatal signal. See [this section](https://docs.pytorch.org/docs/stable/multiprocessing.html#multiprocessing-cuda-sharing-details).
See also: [Use nn.parallel.DistributedDataParallel instead of multiprocessing or nn.DataParallel](https://docs.pytorch.org/docs/stable/notes/cuda.html#cuda-nn-ddp-instead)
## Best practices and tips[\#](https://docs.pytorch.org/docs/stable/notes/multiprocessing.html#best-practices-and-tips "Link to this heading")
### Avoiding and fighting deadlocks[\#](https://docs.pytorch.org/docs/stable/notes/multiprocessing.html#avoiding-and-fighting-deadlocks "Link to this heading")
There are a lot of things that can go wrong when a new process is spawned, with the most common cause of deadlocks being background threads. If any thread holds a lock or is importing a module when `fork` is called, it’s very likely that the subprocess will be in a corrupted state and will deadlock or fail in some other way. Note that even if you don’t use threads yourself, Python’s built-in libraries do; there is no need to look further than [`multiprocessing`](https://docs.python.org/3/library/multiprocessing.html#module-multiprocessing "(in Python v3.14)"). [`multiprocessing.Queue`](https://docs.python.org/3/library/multiprocessing.html#multiprocessing.Queue "(in Python v3.14)") is actually a very complex class that spawns multiple threads used to serialize, send and receive objects, and these threads can cause the aforementioned problems too. If you find yourself in such a situation, try using a `SimpleQueue`, which doesn’t use any additional threads.
We’re trying our best to make it easy for you and ensure these deadlocks don’t happen but some things are out of our control. If you have any issues you can’t cope with for a while, try reaching out on forums, and we’ll see if it’s an issue we can fix.
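The `SimpleQueue` suggestion above can be sketched as follows, using the standard-library `multiprocessing.SimpleQueue`. The function names and the `None` sentinel are illustrative choices, not a PyTorch convention.

```python
import multiprocessing as mp


def producer(queue, n):
    """Put n squares on the queue, followed by a None sentinel."""
    for i in range(n):
        queue.put(i * i)
    queue.put(None)


def drain(queue):
    """Read items until the None sentinel and return them as a list."""
    items = []
    while True:
        item = queue.get()
        if item is None:
            return items
        items.append(item)


if __name__ == "__main__":
    ctx = mp.get_context("spawn")
    queue = ctx.SimpleQueue()  # a pipe plus locks, no background feeder thread
    p = ctx.Process(target=producer, args=(queue, 5))
    p.start()
    # Joining before reading is fine only because this payload is tiny and
    # fits in the pipe buffer; drain first when sending larger objects.
    p.join()
    if p.exitcode == 0:
        print(drain(queue))
```

`SimpleQueue` offers a smaller interface than `Queue` (for example, `get()` takes no timeout), so this is a trade of convenience for fewer moving parts.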
### Reuse buffers passed through a Queue[\#](https://docs.pytorch.org/docs/stable/notes/multiprocessing.html#reuse-buffers-passed-through-a-queue "Link to this heading")
Remember that each time you put a [`Tensor`](https://docs.pytorch.org/docs/stable/tensors.html#torch.Tensor "torch.Tensor") into a [`multiprocessing.Queue`](https://docs.python.org/3/library/multiprocessing.html#multiprocessing.Queue "(in Python v3.14)"), it has to be moved into shared memory. If it’s already shared, this is a no-op; otherwise, it incurs an additional memory copy that can slow down the whole process. Even if you have a pool of processes sending data to a single one, make it send the buffers back. This is nearly free and lets you avoid a copy when sending the next batch.
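A small single-process sketch of this point, assuming a CPU tensor and using `Tensor.share_memory_()` and `Tensor.is_shared()`; the helper `refill` is a hypothetical name. Once a buffer lives in shared memory, writing new data into it in place keeps the same storage, so the buffer can be sent again without another move.

```python
import torch


def refill(buffer, batch):
    """Overwrite `buffer` in place so its (shared) storage is reused."""
    buffer.copy_(batch)
    return buffer


buffer = torch.zeros(4)
assert not buffer.is_shared()    # ordinary CPU tensors start out unshared

buffer.share_memory_()           # one-time move into shared memory
ptr = buffer.data_ptr()

refill(buffer, torch.arange(4.0))
assert buffer.is_shared()        # still shared after the in-place write
assert buffer.data_ptr() == ptr  # same storage, no new allocation
```

Assigning a freshly created tensor to the same variable instead of writing in place would produce an unshared tensor again, which is the extra copy the paragraph above warns about.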
### Asynchronous multiprocess training (e.g. Hogwild)[\#](https://docs.pytorch.org/docs/stable/notes/multiprocessing.html#asynchronous-multiprocess-training-e-g-hogwild "Link to this heading")
Using [`torch.multiprocessing`](https://docs.pytorch.org/docs/stable/multiprocessing.html#module-torch.multiprocessing "torch.multiprocessing"), it is possible to train a model asynchronously, with parameters either shared all the time or periodically synchronized. In the first case, we recommend sending over the whole model object, while in the latter, we advise sending only the [`state_dict()`](https://docs.pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.state_dict "torch.nn.Module.state_dict").
We recommend using [`multiprocessing.Queue`](https://docs.python.org/3/library/multiprocessing.html#multiprocessing.Queue "(in Python v3.14)") for passing all kinds of PyTorch objects between processes. It is possible, for example, to inherit tensors and storages that are already in shared memory when using the `fork` start method; however, this is very bug-prone and should be used with care, and only by advanced users. Queues, even though they’re sometimes a less elegant solution, will work properly in all cases.
Warning
Be careful with module-level statements that are not guarded by `if __name__ == '__main__'`. If a start method other than `fork` is used, they will be executed in all subprocesses.
#### Hogwild[\#](https://docs.pytorch.org/docs/stable/notes/multiprocessing.html#hogwild "Link to this heading")
A concrete Hogwild implementation can be found in the [examples repository](https://github.com/pytorch/examples/tree/master/mnist_hogwild), but to showcase the overall structure of the code, there’s also a minimal example below:
```
import torch.multiprocessing as mp
from model import MyModel

def train(model):
    # Construct data_loader, optimizer, etc.
    for data, labels in data_loader:
        optimizer.zero_grad()
        loss_fn(model(data), labels).backward()
        optimizer.step()  # This will update the shared parameters

if __name__ == '__main__':
    num_processes = 4
    model = MyModel()
    # NOTE: this is required for the ``fork`` method to work
    model.share_memory()
    processes = []
    for rank in range(num_processes):
        p = mp.Process(target=train, args=(model,))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()
```
## CPU in multiprocessing[\#](https://docs.pytorch.org/docs/stable/notes/multiprocessing.html#cpu-in-multiprocessing "Link to this heading")
Inappropriate multiprocessing can lead to CPU oversubscription, causing different processes to compete for CPU resources, resulting in low efficiency.
This section explains what CPU oversubscription is and how to avoid it.
### CPU oversubscription[\#](https://docs.pytorch.org/docs/stable/notes/multiprocessing.html#cpu-oversubscription "Link to this heading")
CPU oversubscription is a technical term that refers to a situation where the total number of vCPUs allocated to a system exceeds the total number of vCPUs available on the hardware.
This leads to severe contention for CPU resources. In such cases, the operating system switches between processes frequently, which increases process-switching overhead and decreases overall system efficiency.
See CPU oversubscription with the code examples in the Hogwild implementation found in the [example repository](https://github.com/pytorch/examples/tree/main/mnist_hogwild).
When running the training example with the following command on CPU using 4 processes:
```
python main.py --num-processes 4
```
Assuming there are N vCPUs available on the machine, executing the above command will generate 4 subprocesses. Each subprocess will allocate N vCPUs for itself, resulting in a requirement of 4\*N vCPUs. However, the machine only has N vCPUs available. Consequently, the different processes will compete for resources, leading to frequent process switching.
The following observations indicate the presence of CPU oversubscription:
1. High CPU Utilization: By using the `htop` command, you can observe that the CPU utilization is consistently high, often reaching or exceeding its maximum capacity. This indicates that the demand for CPU resources exceeds the available physical cores, causing contention and competition among processes for CPU time.
2. Frequent Context Switching with Low System Efficiency: In an oversubscribed CPU scenario, processes compete for CPU time, and the operating system needs to rapidly switch between different processes to allocate resources fairly. This frequent context switching adds overhead and reduces the overall system efficiency.
### Avoid CPU oversubscription[\#](https://docs.pytorch.org/docs/stable/notes/multiprocessing.html#avoid-cpu-oversubscription "Link to this heading")
A good way to avoid CPU oversubscription is proper resource allocation. Ensure that the number of processes or threads running concurrently does not exceed the available CPU resources.
In this case, a solution would be to specify the appropriate number of threads in the subprocesses. This can be achieved by setting the number of threads for each process using the `torch.set_num_threads(int)` function in each subprocess.
Assuming there are N vCPUs on the machine and M processes will be generated, the maximum `num_threads` value used by each process would be `floor(N/M)`. To avoid CPU oversubscription in the mnist\_hogwild example, the following changes are needed for the file `train.py` in the [example repository](https://github.com/pytorch/examples/tree/main/mnist_hogwild).
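```
import os
import torch
import torch.optim as optim

def train(rank, args, model, device, dataset, dataloader_kwargs):
    torch.manual_seed(args.seed + rank)
    # Define the number of threads used in the current subprocess as floor(N/M).
    # Assumption: N comes from os.cpu_count() and M from args.num_processes.
    N = os.cpu_count() or 1
    M = args.num_processes
    torch.set_num_threads(max(1, N // M))  # never request fewer than 1 thread
    train_loader = torch.utils.data.DataLoader(dataset, **dataloader_kwargs)
    optimizer = optim.SGD(model.parameters(), lr=args.lr, momentum=args.momentum)
    for epoch in range(1, args.epochs + 1):
        train_epoch(epoch, args, model, device, train_loader, optimizer)
```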
Set `num_threads` for each process using `torch.set_num_threads(floor(N/M))`, where you replace N with the number of vCPUs available and M with the chosen number of processes. The appropriate `num_threads` value will vary depending on the specific task at hand. However, as a general guideline, the maximum value for `num_threads` should be `floor(N/M)` to avoid CPU oversubscription. In the [mnist\_hogwild](https://github.com/pytorch/examples/tree/main/mnist_hogwild) training example, after avoiding CPU oversubscription, you can achieve a 30x performance boost.
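The `floor(N/M)` guideline can be wrapped in a small helper. The sketch below is one possible way to do it, not part of the example repository: the names `available_cpus` and `threads_per_process` are hypothetical, and using `os.sched_getaffinity` (where the platform provides it) to count usable vCPUs is an assumption that may matter in containers or pinned jobs.

```python
import os


def available_cpus() -> int:
    """Number of vCPUs this process may run on (respects CPU affinity where supported)."""
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


def threads_per_process(num_cpus: int, num_processes: int) -> int:
    """Upper bound floor(N/M) on threads per process, but never less than 1."""
    if num_processes < 1:
        raise ValueError("num_processes must be at least 1")
    return max(1, num_cpus // num_processes)


if __name__ == "__main__":
    print(threads_per_process(available_cpus(), 4))
```

Each subprocess could then call `torch.set_num_threads(threads_per_process(available_cpus(), M))` at the start of its training function, with M being the number of processes.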
© Copyright PyTorch Contributors.
Created using [Sphinx](https://www.sphinx-doc.org/) 7.2.6.
Built with the [PyData Sphinx Theme](https://pydata-sphinx-theme.readthedocs.io/en/stable/index.html) 0.15.4. |
| Readable Markdown | Created On: Jan 16, 2017 \| Last Updated On: Jun 18, 2025
[`torch.multiprocessing`](https://docs.pytorch.org/docs/stable/multiprocessing.html#module-torch.multiprocessing "torch.multiprocessing") is a drop-in replacement for Python’s [`multiprocessing`](https://docs.python.org/3/library/multiprocessing.html#module-multiprocessing "(in Python v3.14)") module. It supports the exact same operations, but extends them so that every tensor sent through a [`multiprocessing.Queue`](https://docs.python.org/3/library/multiprocessing.html#multiprocessing.Queue "(in Python v3.14)") has its data moved into shared memory, and only a handle is sent to the other process.
Note
When a [`Tensor`](https://docs.pytorch.org/docs/stable/tensors.html#torch.Tensor "torch.Tensor") is sent to another process, the [`Tensor`](https://docs.pytorch.org/docs/stable/tensors.html#torch.Tensor "torch.Tensor") data is shared. If [`torch.Tensor.grad`](https://docs.pytorch.org/docs/stable/generated/torch.Tensor.grad.html#torch.Tensor.grad "torch.Tensor.grad") is not `None`, it is also shared. After a [`Tensor`](https://docs.pytorch.org/docs/stable/tensors.html#torch.Tensor "torch.Tensor") without a [`torch.Tensor.grad`](https://docs.pytorch.org/docs/stable/generated/torch.Tensor.grad.html#torch.Tensor.grad "torch.Tensor.grad") field is sent to the other process, it creates a standard process-specific `.grad` [`Tensor`](https://docs.pytorch.org/docs/stable/tensors.html#torch.Tensor "torch.Tensor") that is not automatically shared across all processes, unlike how the [`Tensor`](https://docs.pytorch.org/docs/stable/tensors.html#torch.Tensor "torch.Tensor")’s data has been shared.
This makes it possible to implement various training methods, like Hogwild, A3C, or any others that require asynchronous operation.
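The sharing behavior described above can be tried with a short sketch. It assumes the `spawn` start method and a hypothetical worker named `add_one`; the child modifies the received tensor in place, and because the data lives in shared memory the parent is expected to observe the change.

```python
import torch
import torch.multiprocessing as mp


def add_one(queue):
    """Receive a tensor handle and modify the underlying data in place."""
    tensor = queue.get()
    tensor.add_(1)


if __name__ == "__main__":
    ctx = mp.get_context("spawn")
    queue = ctx.Queue()
    tensor = torch.zeros(3)
    worker = ctx.Process(target=add_one, args=(queue,))
    worker.start()
    queue.put(tensor)  # data is moved to shared memory; only a handle is sent
    worker.join()
    # The parent keeps its own reference and reads the same shared storage.
    print(tensor.is_shared(), tensor)
```

Note that the parent stays alive and keeps `tensor` referenced until the worker has finished with it.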
## Poison fork in multiprocessing[\#](https://docs.pytorch.org/docs/stable/notes/multiprocessing.html#poison-fork-in-multiprocessing "Link to this heading")
When using multiprocessing with [accelerators](https://docs.pytorch.org/docs/stable/torch.html#accelerators), a known issue called “poison fork” may occur. This happens when the accelerator’s runtime is not fork-safe and is initialized before a process forks, leading to runtime errors in child processes.
To prevent such errors:
- Avoid initializing the accelerator in the main process before forking child processes.
- Use an alternative process start method, such as `spawn` or `forkserver`, which ensures a clean initialization of each process.
## CUDA in multiprocessing[\#](https://docs.pytorch.org/docs/stable/notes/multiprocessing.html#cuda-in-multiprocessing "Link to this heading")
The CUDA runtime has the limitation described in [Poison fork in multiprocessing](https://docs.pytorch.org/docs/stable/notes/multiprocessing.html#multiprocessing-poison-fork-note) when using the `fork` start method; either the `spawn` or `forkserver` start method is required to use CUDA in subprocesses.
Note
The start method can be set via either creating a context with `multiprocessing.get_context(...)` or directly using `multiprocessing.set_start_method(...)`.
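As a rough illustration of the first option, the sketch below creates a `spawn` context and starts workers from it. It uses the standard `multiprocessing` module, whose API `torch.multiprocessing` mirrors; the function names `square` and `run_workers` are hypothetical.

```python
import multiprocessing as mp


def square(rank, results):
    # Any accelerator initialization would go here, inside the child process.
    results.put((rank, rank * rank))


def run_workers(num_workers=2, method="spawn"):
    # The context is local to this call; the global default start method is unchanged.
    ctx = mp.get_context(method)
    results = ctx.Queue()
    workers = [ctx.Process(target=square, args=(r, results)) for r in range(num_workers)]
    for w in workers:
        w.start()
    collected = sorted(results.get() for _ in workers)  # drain before joining
    for w in workers:
        w.join()
    return collected


if __name__ == "__main__":
    print(run_workers())
```

The alternative is a single call to `mp.set_start_method("spawn")` inside the `if __name__ == "__main__":` block, which changes the default for the whole program.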
Unlike with CPU tensors, when sharing CUDA tensors the sending process is required to keep the original tensor as long as the receiving process retains a copy of it. The reference counting is implemented under the hood, but users must follow the best practices for the program to run correctly. For example, the sending process must stay alive as long as the consumer process has references to the tensor, and the refcounting cannot save you if the consumer process exits abnormally via a fatal signal. See [this section](https://docs.pytorch.org/docs/stable/multiprocessing.html#multiprocessing-cuda-sharing-details).
See also: [Use nn.parallel.DistributedDataParallel instead of multiprocessing or nn.DataParallel](https://docs.pytorch.org/docs/stable/notes/cuda.html#cuda-nn-ddp-instead)
## Best practices and tips[\#](https://docs.pytorch.org/docs/stable/notes/multiprocessing.html#best-practices-and-tips "Link to this heading")
### Avoiding and fighting deadlocks[\#](https://docs.pytorch.org/docs/stable/notes/multiprocessing.html#avoiding-and-fighting-deadlocks "Link to this heading")
There are a lot of things that can go wrong when a new process is spawned, with the most common cause of deadlocks being background threads. If there’s any thread that holds a lock or imports a module, and `fork` is called, it’s very likely that the subprocess will be in a corrupted state and will deadlock or fail in a different way. Note that even if you don’t start such threads yourself, Python’s built-in libraries do, and there is no need to look further than [`multiprocessing`](https://docs.python.org/3/library/multiprocessing.html#module-multiprocessing "(in Python v3.14)"). [`multiprocessing.Queue`](https://docs.python.org/3/library/multiprocessing.html#multiprocessing.Queue "(in Python v3.14)") is actually a very complex class that spawns multiple threads used to serialize, send and receive objects, and they can cause the aforementioned problems too. If you find yourself in such a situation, try using a `SimpleQueue`, which doesn’t use any additional threads.
We’re trying our best to make it easy for you and ensure these deadlocks don’t happen but some things are out of our control. If you have any issues you can’t cope with for a while, try reaching out on forums, and we’ll see if it’s an issue we can fix.
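The `SimpleQueue` suggestion above might look like the following in practice. This is a minimal sketch using the standard `multiprocessing` module; the `produce` and `consume` helpers and the `None` sentinel are illustrative choices, not part of any PyTorch API.

```python
import multiprocessing as mp


def produce(queue, n):
    """Send n items followed by a sentinel marking the end of the stream."""
    for i in range(n):
        queue.put(i * i)
    queue.put(None)


def consume(queue):
    """Read items until the sentinel arrives."""
    items = []
    while (item := queue.get()) is not None:
        items.append(item)
    return items


if __name__ == "__main__":
    ctx = mp.get_context("spawn")
    queue = ctx.SimpleQueue()  # a locked pipe, without a background feeder thread
    producer = ctx.Process(target=produce, args=(queue, 5))
    producer.start()
    print(consume(queue))
    producer.join()
```

Because `SimpleQueue.put` writes to the pipe directly, the consumer should be reading while the producer writes, as it does here.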
### Reuse buffers passed through a Queue[\#](https://docs.pytorch.org/docs/stable/notes/multiprocessing.html#reuse-buffers-passed-through-a-queue "Link to this heading")
Remember that each time you put a [`Tensor`](https://docs.pytorch.org/docs/stable/tensors.html#torch.Tensor "torch.Tensor") into a [`multiprocessing.Queue`](https://docs.python.org/3/library/multiprocessing.html#multiprocessing.Queue "(in Python v3.14)"), it has to be moved into shared memory. If it’s already shared, it is a no-op, otherwise it will incur an additional memory copy that can slow down the whole process. Even if you have a pool of processes sending data to a single one, make it send the buffers back - this is nearly free and will let you avoid a copy when sending the next batch.
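One way to arrange such buffer recycling is with a pair of queues, one carrying free buffers to the worker and one carrying filled buffers back. The sketch below is an assumption-laden illustration: the names `fill_buffers`, `free_q` and `full_q`, the buffer size, and the `None` shutdown sentinel are all hypothetical.

```python
import torch
import torch.multiprocessing as mp


def fill_buffers(free_q, full_q, num_batches):
    """Worker: take an already-shared buffer, fill it in place, and send it on."""
    for step in range(num_batches):
        buf = free_q.get()
        buf.fill_(float(step))  # stand-in for loading a real batch
        full_q.put(buf)         # already in shared memory, so no extra copy
    # Stay alive until the consumer signals that it has received everything.
    while free_q.get() is not None:
        pass


if __name__ == "__main__":
    ctx = mp.get_context("spawn")
    free_q, full_q = ctx.Queue(), ctx.Queue()
    num_batches = 6
    for _ in range(2):  # a small pool of reusable shared buffers
        free_q.put(torch.empty(4).share_memory_())
    worker = ctx.Process(target=fill_buffers, args=(free_q, full_q, num_batches))
    worker.start()
    for _ in range(num_batches):
        batch = full_q.get()
        print(batch.sum().item())
        free_q.put(batch)  # hand the buffer back for reuse
    free_q.put(None)       # tell the worker it may exit
    worker.join()
```

The worker waits for the sentinel before exiting so that it is still alive while the consumer receives the last buffers.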
### Asynchronous multiprocess training (e.g. Hogwild)[\#](https://docs.pytorch.org/docs/stable/notes/multiprocessing.html#asynchronous-multiprocess-training-e-g-hogwild "Link to this heading")
Using [`torch.multiprocessing`](https://docs.pytorch.org/docs/stable/multiprocessing.html#module-torch.multiprocessing "torch.multiprocessing"), it is possible to train a model asynchronously, with parameters either shared all the time, or being periodically synchronized. In the first case, we recommend sending over the whole model object, while in the latter, we advise sending only the [`state_dict()`](https://docs.pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.state_dict "torch.nn.Module.state_dict").
We recommend using [`multiprocessing.Queue`](https://docs.python.org/3/library/multiprocessing.html#multiprocessing.Queue "(in Python v3.14)") for passing all kinds of PyTorch objects between processes. It is possible, for example, to inherit the tensors and storages already in shared memory when using the `fork` start method; however, it is very bug-prone and should be used with care, and only by advanced users. Queues, even though they’re sometimes a less elegant solution, will work properly in all cases.
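For the periodically synchronized case, passing a `state_dict()` rather than the model object could be sketched as follows. The helper names `publish_weights` and `pull_weights` are hypothetical, and the plain `queue.Queue` used here for brevity would be replaced by a `torch.multiprocessing` queue when the two models live in different processes.

```python
import queue

import torch
import torch.nn as nn


def publish_weights(model, out_queue):
    """Put a detached CPU copy of the model's state on the queue."""
    state = {name: t.detach().cpu().clone() for name, t in model.state_dict().items()}
    out_queue.put(state)


def pull_weights(model, in_queue):
    """Load the next published state into a local replica."""
    model.load_state_dict(in_queue.get())


if __name__ == "__main__":
    q = queue.Queue()
    source, replica = nn.Linear(4, 2), nn.Linear(4, 2)
    publish_weights(source, q)
    pull_weights(replica, q)
    print(torch.equal(source.weight, replica.weight))
```

Cloning the tensors before sending means later updates to the source model do not leak into a snapshot that has already been published.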
Warning
You should be careful about having global statements that are not guarded with an `if __name__ == '__main__'`. If a start method other than `fork` is used, they will be executed in all subprocesses.
#### Hogwild[\#](https://docs.pytorch.org/docs/stable/notes/multiprocessing.html#hogwild "Link to this heading")
A concrete Hogwild implementation can be found in the [examples repository](https://github.com/pytorch/examples/tree/master/mnist_hogwild), but to showcase the overall structure of the code, there’s also a minimal example below:
```
import torch.multiprocessing as mp
from model import MyModel

def train(model):
    # Construct data_loader, optimizer, etc.
    for data, labels in data_loader:
        optimizer.zero_grad()
        loss_fn(model(data), labels).backward()
        optimizer.step()  # This will update the shared parameters

if __name__ == '__main__':
    num_processes = 4
    model = MyModel()
    # NOTE: this is required for the ``fork`` method to work
    model.share_memory()
    processes = []
    for rank in range(num_processes):
        p = mp.Process(target=train, args=(model,))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()
```
## CPU in multiprocessing[\#](https://docs.pytorch.org/docs/stable/notes/multiprocessing.html#cpu-in-multiprocessing "Link to this heading")
Inappropriate multiprocessing can lead to CPU oversubscription, causing different processes to compete for CPU resources, resulting in low efficiency.
This tutorial will explain what CPU oversubscription is and how to avoid it.
### CPU oversubscription[\#](https://docs.pytorch.org/docs/stable/notes/multiprocessing.html#cpu-oversubscription "Link to this heading")
CPU oversubscription is a technical term that refers to a situation where the total number of vCPUs allocated to a system exceeds the total number of vCPUs available on the hardware.
This leads to severe contention for CPU resources. In such cases, there is frequent switching between processes, which increases process-switching overhead and decreases overall system efficiency.
You can observe CPU oversubscription with the Hogwild implementation found in the [example repository](https://github.com/pytorch/examples/tree/main/mnist_hogwild).
Consider running the training example on CPU with 4 processes using the following command:
```
python main.py --num-processes 4
```
Assuming there are N vCPUs available on the machine, executing the above command will generate 4 subprocesses. By default, each subprocess will try to use all N vCPUs for itself, resulting in a total demand of 4\*N vCPUs. However, the machine only has N vCPUs available. Consequently, the processes will compete for resources, leading to frequent process switching.
The following observations indicate the presence of CPU oversubscription:
1. High CPU Utilization: By using the `htop` command, you can observe that the CPU utilization is consistently high, often reaching or exceeding its maximum capacity. This indicates that the demand for CPU resources exceeds the available physical cores, causing contention and competition among processes for CPU time.
2. Frequent Context Switching with Low System Efficiency: In an oversubscribed CPU scenario, processes compete for CPU time, and the operating system needs to rapidly switch between different processes to allocate resources fairly. This frequent context switching adds overhead and reduces the overall system efficiency.
### Avoid CPU oversubscription[\#](https://docs.pytorch.org/docs/stable/notes/multiprocessing.html#avoid-cpu-oversubscription "Link to this heading")
A good way to avoid CPU oversubscription is proper resource allocation. Ensure that the number of processes or threads running concurrently does not exceed the available CPU resources.
In this case, a solution would be to specify the appropriate number of threads in the subprocesses. This can be achieved by setting the number of threads for each process using the `torch.set_num_threads(int)` function in each subprocess.
Assuming there are N vCPUs on the machine and M processes will be generated, the maximum `num_threads` value used by each process would be `floor(N/M)`. To avoid CPU oversubscription in the mnist\_hogwild example, the following changes are needed for the file `train.py` in the [example repository](https://github.com/pytorch/examples/tree/main/mnist_hogwild).
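```
import os
import torch
import torch.optim as optim

def train(rank, args, model, device, dataset, dataloader_kwargs):
    torch.manual_seed(args.seed + rank)
    # Define the number of threads used in the current subprocess as floor(N/M).
    # Assumption: N comes from os.cpu_count() and M from args.num_processes.
    N = os.cpu_count() or 1
    M = args.num_processes
    torch.set_num_threads(max(1, N // M))  # never request fewer than 1 thread
    train_loader = torch.utils.data.DataLoader(dataset, **dataloader_kwargs)
    optimizer = optim.SGD(model.parameters(), lr=args.lr, momentum=args.momentum)
    for epoch in range(1, args.epochs + 1):
        train_epoch(epoch, args, model, device, train_loader, optimizer)
```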
Set `num_threads` for each process using `torch.set_num_threads(floor(N/M))`, where you replace N with the number of vCPUs available and M with the chosen number of processes. The appropriate `num_threads` value will vary depending on the specific task at hand. However, as a general guideline, the maximum value for `num_threads` should be `floor(N/M)` to avoid CPU oversubscription. In the [mnist\_hogwild](https://github.com/pytorch/examples/tree/main/mnist_hogwild) training example, after avoiding CPU oversubscription, you can achieve a 30x performance boost. |
| Shard | 114 (laksa) |
| Root Hash | 14416670112284949514 |
| Unparsed URL | org,pytorch!docs,/docs/stable/notes/multiprocessing.html s443 |