βΉοΈ Skipped - page is already crawled
| Filter | Status | Condition | Details |
|---|---|---|---|
| HTTP status | PASS | download_http_code = 200 | HTTP 200 |
| Age cutoff | PASS | download_stamp > now() - 6 MONTH | 0 months ago |
| History drop | PASS | isNull(history_drop_reason) | No drop reason |
| Spam/ban | PASS | fh_dont_index != 1 AND ml_spam_score = 0 | ml_spam_score=0 |
| Canonical | PASS | meta_canonical IS NULL OR = '' OR = src_unparsed | Not set |
| Property | Value |
|---|---|
| URL | https://testdriven.io/blog/developing-an-asynchronous-task-queue-in-python/ |
| Last Crawled | 2026-04-15 09:46:40 (3 hours ago) |
| First Indexed | 2019-01-02 11:02:41 (7 years ago) |
| HTTP Status Code | 200 |
| Meta Title | Developing an Asynchronous Task Queue in Python | TestDriven.io |
| Meta Description | This tutorial looks at how to implement several asynchronous task queues using the Python multiprocessing library and Redis. |
| Meta Canonical | null |
| Boilerpipe Text | This tutorial looks at how to implement several asynchronous task queues using Python's
multiprocessing
library and
Redis
.
Queue Data Structures
Task
Following along?
Multiprocessing Pool
Multiprocessing Queue
Logging
Redis
Conclusion
Queue Data Structures
A
queue
is a
First-In-First-Out
(
FIFO
) data structure.
an item is added at the tail (
enqueue
)
an item is removed at the head (
dequeue
)
You'll see this in practice as you code out the examples in this tutorial.
Let's start by creating a basic task:
# tasks.py
import
collections
import
json
import
os
import
sys
import
uuid
from
pathlib
import
Path
from
nltk.corpus
import
stopwords
COMMON_WORDS
=
set
(
stopwords
.
words
(
"english"
))
BASE_DIR
=
Path
(
__file__
)
.
resolve
(
strict
=
True
)
.
parent
DATA_DIR
=
Path
(
BASE_DIR
)
.
joinpath
(
"data"
)
OUTPUT_DIR
=
Path
(
BASE_DIR
)
.
joinpath
(
"output"
)
def
save_file
(
filename
,
data
):
random_str
=
uuid
.
uuid4
()
.
hex
outfile
=
f
"
{
filename
}
_
{
random_str
}
.txt"
with
open
(
Path
(
OUTPUT_DIR
)
.
joinpath
(
outfile
),
"w"
)
as
outfile
:
outfile
.
write
(
data
)
def
get_word_counts
(
filename
):
wordcount
=
collections
.
Counter
()
# get counts
with
open
(
Path
(
DATA_DIR
)
.
joinpath
(
filename
),
"r"
)
as
f
:
for
line
in
f
:
wordcount
.
update
(
line
.
split
())
for
word
in
set
(
COMMON_WORDS
):
del
wordcount
[
word
]
# save file
save_file
(
filename
,
json
.
dumps
(
dict
(
wordcount
.
most_common
(
20
))))
proc
=
os
.
getpid
()
print
(
f
"Processed
{
filename
}
with process id:
{
proc
}
"
)
if
__name__
==
"__main__"
:
get_word_counts
(
sys
.
argv
[
1
])
So,
get_word_counts
finds the twenty most frequent words from a given text file and saves them to an output file. It also prints the current process identifier (or pid) using Python's
os
library.
Following along?
Create a project directory along with a virtual environment. Then, use pip to install
NLTK
:
(
env
)
$ pip install
nltk
==
3
.8.1
Once installed, invoke the Python shell and download the
stopwords
corpus
:
>>> import nltk
>>> nltk.download
(
"stopwords"
)
[
nltk_data
]
Downloading package stopwords to
[
nltk_data
]
/Users/michael/nltk_data...
[
nltk_data
]
Unzipping corpora/stopwords.zip.
True
If you experience an SSL error refer to
this
article.
Example fix:
>>> import nltk
>>> nltk.download
(
'stopwords'
)
[
nltk_data
]
Error loading stopwords: <urlopen error
[
SSL:
[
nltk_data
]
CERTIFICATE_VERIFY_FAILED
]
certificate verify failed:
[
nltk_data
]
unable to get
local
issuer certificate
(
_ssl.c:1056
)
>
False
>>> import ssl
>>> try:
...
_create_unverified_https_context
=
ssl._create_unverified_context
... except AttributeError:
... pass
...
else
:
... ssl._create_default_https_context
=
_create_unverified_https_context
...
>>> nltk.download
(
'stopwords'
)
[
nltk_data
]
Downloading package stopwords to
[
nltk_data
]
/Users/michael.herman/nltk_data...
[
nltk_data
]
Unzipping corpora/stopwords.zip.
True
Add the above
tasks.py
file to your project directory but don't run it quite yet.
Multiprocessing Pool
We can run this task in parallel using the
multiprocessing
library:
# simple_pool.py
import
multiprocessing
import
time
from
tasks
import
get_word_counts
PROCESSES
=
multiprocessing
.
cpu_count
()
-
1
def
run
():
print
(
f
"Running with
{
PROCESSES
}
processes!"
)
start
=
time
.
time
()
with
multiprocessing
.
Pool
(
PROCESSES
)
as
p
:
p
.
map_async
(
get_word_counts
,
[
"pride-and-prejudice.txt"
,
"heart-of-darkness.txt"
,
"frankenstein.txt"
,
"dracula.txt"
,
],
)
# clean up
p
.
close
()
p
.
join
()
print
(
f
"Time taken =
{
time
.
time
()
-
start
:
.10f
}
"
)
if
__name__
==
"__main__"
:
run
()
Here, using the
Pool
class, we processed four tasks with two processes.
Did you notice the
map_async
method? There are essentially four different methods available for mapping tasks to processes. When choosing one, you have to take multi-args, concurrency, blocking, and ordering into account:
Method
Multi-args
Concurrency
Blocking
Ordered-results
map
No
Yes
Yes
Yes
map_async
No
No
No
Yes
apply
Yes
No
Yes
No
apply_async
Yes
Yes
No
No
Without both
close
and
join
, garbage collection may not occur, which could lead to a memory leak.
close
tells the pool not to accept any new tasks
join
tells the pool to exit after all tasks have completed
Following along?
Grab the
Project Gutenberg
sample text files from the "data" directory in the
simple-task-queue
repo, and then add an "output" directory.
Your project directory should look like this:
βββ data
βΒ Β βββ dracula.txt
βΒ Β βββ frankenstein.txt
βΒ Β βββ heart-of-darkness.txt
βΒ Β βββ pride-and-prejudice.txt
βββ output
βββ simple_pool.py
βββ tasks.py
It should take less than a second to run:
(
env
)
$ python simple_pool.py
Running with
15
processes!
Processed heart-of-darkness.txt with process id:
50510
Processed frankenstein.txt with process id:
50515
Processed pride-and-prejudice.txt with process id:
50511
Processed dracula.txt with process id:
50512
Time
taken
=
0
.6383581161
This script ran on a i9 Macbook Pro with 16 cores.
So, the multiprocessing
Pool
class handles the queuing logic for us. It's perfect for running CPU-bound tasks or really any job that can be broken up and distributed independently. If you need more control over the queue or need to share data between multiple processes, you may want to look at the
Queue
class.
For more on this along with the difference between parallelism (multiprocessing) and concurrency (multithreading), review the
Speeding Up Python with Concurrency, Parallelism, and asyncio
article.
Multiprocessing Queue
Let's look at a simple example:
# simple_queue.py
import
multiprocessing
def
run
():
books
=
[
"pride-and-prejudice.txt"
,
"heart-of-darkness.txt"
,
"frankenstein.txt"
,
"dracula.txt"
,
]
queue
=
multiprocessing
.
Queue
()
print
(
"Enqueuing..."
)
for
book
in
books
:
print
(
book
)
queue
.
put
(
book
)
print
(
"
\n
Dequeuing..."
)
while
not
queue
.
empty
():
print
(
queue
.
get
())
if
__name__
==
"__main__"
:
run
()
The
Queue
class, also from the multiprocessing library, is a basic FIFO (first in, first out) data structure. It's similar to the
queue.Queue
class, but designed for interprocess communication. We used
put
to enqueue an item to the queue and
get
to dequeue an item.
Check out the
Queue
source code
for a better understanding of the mechanics of this class.
Now, let's look at more advanced example:
# simple_task_queue.py
import
multiprocessing
import
time
from
tasks
import
get_word_counts
PROCESSES
=
multiprocessing
.
cpu_count
()
-
1
NUMBER_OF_TASKS
=
10
def
process_tasks
(
task_queue
):
while
not
task_queue
.
empty
():
book
=
task_queue
.
get
()
get_word_counts
(
book
)
return
True
def
add_tasks
(
task_queue
,
number_of_tasks
):
for
num
in
range
(
number_of_tasks
):
task_queue
.
put
(
"pride-and-prejudice.txt"
)
task_queue
.
put
(
"heart-of-darkness.txt"
)
task_queue
.
put
(
"frankenstein.txt"
)
task_queue
.
put
(
"dracula.txt"
)
return
task_queue
def
run
():
empty_task_queue
=
multiprocessing
.
Queue
()
full_task_queue
=
add_tasks
(
empty_task_queue
,
NUMBER_OF_TASKS
)
processes
=
[]
print
(
f
"Running with
{
PROCESSES
}
processes!"
)
start
=
time
.
time
()
for
n
in
range
(
PROCESSES
):
p
=
multiprocessing
.
Process
(
target
=
process_tasks
,
args
=
(
full_task_queue
,))
processes
.
append
(
p
)
p
.
start
()
for
p
in
processes
:
p
.
join
()
print
(
f
"Time taken =
{
time
.
time
()
-
start
:
.10f
}
"
)
if
__name__
==
"__main__"
:
run
()
Here, we enqueued 40 tasks (ten for each text file) to the queue, created separate processes via the
Process
class, used
start
to start running the processes, and, finally, used
join
to complete the processes.
It should still take less than a second to run.
Challenge
: Check your understanding by adding another queue to hold completed tasks. You can enqueue them within the
process_tasks
function.
Logging
The multiprocessing library provides support for logging as well:
# simple_task_queue_logging.py
import
logging
import
multiprocessing
import
os
import
time
from
tasks
import
get_word_counts
PROCESSES
=
multiprocessing
.
cpu_count
()
-
1
NUMBER_OF_TASKS
=
10
def
process_tasks
(
task_queue
):
logger
=
multiprocessing
.
get_logger
()
proc
=
os
.
getpid
()
while
not
task_queue
.
empty
():
try
:
book
=
task_queue
.
get
()
get_word_counts
(
book
)
except
Exception
as
e
:
logger
.
error
(
e
)
logger
.
info
(
f
"Process
{
proc
}
completed successfully"
)
return
True
def
add_tasks
(
task_queue
,
number_of_tasks
):
for
num
in
range
(
number_of_tasks
):
task_queue
.
put
(
"pride-and-prejudice.txt"
)
task_queue
.
put
(
"heart-of-darkness.txt"
)
task_queue
.
put
(
"frankenstein.txt"
)
task_queue
.
put
(
"dracula.txt"
)
return
task_queue
def
run
():
empty_task_queue
=
multiprocessing
.
Queue
()
full_task_queue
=
add_tasks
(
empty_task_queue
,
NUMBER_OF_TASKS
)
processes
=
[]
print
(
f
"Running with
{
PROCESSES
}
processes!"
)
start
=
time
.
time
()
for
w
in
range
(
PROCESSES
):
p
=
multiprocessing
.
Process
(
target
=
process_tasks
,
args
=
(
full_task_queue
,))
processes
.
append
(
p
)
p
.
start
()
for
p
in
processes
:
p
.
join
()
print
(
f
"Time taken =
{
time
.
time
()
-
start
:
.10f
}
"
)
if
__name__
==
"__main__"
:
multiprocessing
.
log_to_stderr
(
logging
.
ERROR
)
run
()
To test, change
task_queue.put("dracula.txt")
to
task_queue.put("drakula.txt")
. You should see the following error outputted ten times in the terminal:
[
ERROR/Process-4
]
[
Errno
2
]
No such file or directory:
'simple-task-queue/data/drakula.txt'
Want to log to disc?
# simple_task_queue_logging.py
import
logging
import
multiprocessing
import
os
import
time
from
tasks
import
get_word_counts
PROCESSES
=
multiprocessing
.
cpu_count
()
-
1
NUMBER_OF_TASKS
=
10
def
create_logger
():
logger
=
multiprocessing
.
get_logger
()
logger
.
setLevel
(
logging
.
INFO
)
fh
=
logging
.
FileHandler
(
"process.log"
)
fmt
=
"
%(asctime)s
-
%(levelname)s
-
%(message)s
"
formatter
=
logging
.
Formatter
(
fmt
)
fh
.
setFormatter
(
formatter
)
logger
.
addHandler
(
fh
)
return
logger
def
process_tasks
(
task_queue
):
logger
=
create_logger
()
proc
=
os
.
getpid
()
while
not
task_queue
.
empty
():
try
:
book
=
task_queue
.
get
()
get_word_counts
(
book
)
except
Exception
as
e
:
logger
.
error
(
e
)
logger
.
info
(
f
"Process
{
proc
}
completed successfully"
)
return
True
def
add_tasks
(
task_queue
,
number_of_tasks
):
for
num
in
range
(
number_of_tasks
):
task_queue
.
put
(
"pride-and-prejudice.txt"
)
task_queue
.
put
(
"heart-of-darkness.txt"
)
task_queue
.
put
(
"frankenstein.txt"
)
task_queue
.
put
(
"dracula.txt"
)
return
task_queue
def
run
():
empty_task_queue
=
multiprocessing
.
Queue
()
full_task_queue
=
add_tasks
(
empty_task_queue
,
NUMBER_OF_TASKS
)
processes
=
[]
print
(
f
"Running with
{
PROCESSES
}
processes!"
)
start
=
time
.
time
()
for
w
in
range
(
PROCESSES
):
p
=
multiprocessing
.
Process
(
target
=
process_tasks
,
args
=
(
full_task_queue
,))
processes
.
append
(
p
)
p
.
start
()
for
p
in
processes
:
p
.
join
()
print
(
f
"Time taken =
{
time
.
time
()
-
start
:
.10f
}
"
)
if
__name__
==
"__main__"
:
run
()
Again, cause an error by altering one of the file names, and then run it. Take a look at
process.log
. It's not quite as organized as it should be since the Python logging library does not use shared locks between processes. To get around this, let's have each process write to its own file. To keep things organized, add a logs directory to your project folder:
# simple_task_queue_logging_separate_files.py
import
logging
import
multiprocessing
import
os
import
time
from
tasks
import
get_word_counts
PROCESSES
=
multiprocessing
.
cpu_count
()
-
1
NUMBER_OF_TASKS
=
10
def
create_logger
(
pid
):
logger
=
multiprocessing
.
get_logger
()
logger
.
setLevel
(
logging
.
INFO
)
fh
=
logging
.
FileHandler
(
f
"logs/process_
{
pid
}
.log"
)
fmt
=
"
%(asctime)s
-
%(levelname)s
-
%(message)s
"
formatter
=
logging
.
Formatter
(
fmt
)
fh
.
setFormatter
(
formatter
)
logger
.
addHandler
(
fh
)
return
logger
def
process_tasks
(
task_queue
):
proc
=
os
.
getpid
()
logger
=
create_logger
(
proc
)
while
not
task_queue
.
empty
():
try
:
book
=
task_queue
.
get
()
get_word_counts
(
book
)
except
Exception
as
e
:
logger
.
error
(
e
)
logger
.
info
(
f
"Process
{
proc
}
completed successfully"
)
return
True
def
add_tasks
(
task_queue
,
number_of_tasks
):
for
num
in
range
(
number_of_tasks
):
task_queue
.
put
(
"pride-and-prejudice.txt"
)
task_queue
.
put
(
"heart-of-darkness.txt"
)
task_queue
.
put
(
"frankenstein.txt"
)
task_queue
.
put
(
"dracula.txt"
)
return
task_queue
def
run
():
empty_task_queue
=
multiprocessing
.
Queue
()
full_task_queue
=
add_tasks
(
empty_task_queue
,
NUMBER_OF_TASKS
)
processes
=
[]
print
(
f
"Running with
{
PROCESSES
}
processes!"
)
start
=
time
.
time
()
for
w
in
range
(
PROCESSES
):
p
=
multiprocessing
.
Process
(
target
=
process_tasks
,
args
=
(
full_task_queue
,))
processes
.
append
(
p
)
p
.
start
()
for
p
in
processes
:
p
.
join
()
print
(
f
"Time taken =
{
time
.
time
()
-
start
:
.10f
}
"
)
if
__name__
==
"__main__"
:
run
()
Redis
Moving right along, instead of using an in-memory queue, let's add
Redis
into the mix.
Following along?
Download
and install Redis if you do not already have it installed. Then, install the Python
interface
:
(env)$ pip install
redis
==
4
.5.5
We'll break the logic up into four files:
redis_queue.py
creates new queues and tasks via the
SimpleQueue
and
SimpleTask
classes, respectively.
redis_queue_client
enqueues new tasks.
redis_queue_worker
dequeues and processes tasks.
redis_queue_server
spawns worker processes.
# redis_queue.py
import
pickle
import
uuid
class
SimpleQueue
(
object
):
def
__init__
(
self
,
conn
,
name
):
self
.
conn
=
conn
self
.
name
=
name
def
enqueue
(
self
,
func
,
*
args
):
task
=
SimpleTask
(
func
,
*
args
)
serialized_task
=
pickle
.
dumps
(
task
,
protocol
=
pickle
.
HIGHEST_PROTOCOL
)
self
.
conn
.
lpush
(
self
.
name
,
serialized_task
)
return
task
.
id
def
dequeue
(
self
):
_
,
serialized_task
=
self
.
conn
.
brpop
(
self
.
name
)
task
=
pickle
.
loads
(
serialized_task
)
task
.
process_task
()
return
task
def
get_length
(
self
):
return
self
.
conn
.
llen
(
self
.
name
)
class
SimpleTask
(
object
):
def
__init__
(
self
,
func
,
*
args
):
self
.
id
=
str
(
uuid
.
uuid4
())
self
.
func
=
func
self
.
args
=
args
def
process_task
(
self
):
self
.
func
(
*
self
.
args
)
Here, we defined two classes,
SimpleQueue
and
SimpleTask
:
SimpleQueue
creates a new queue and enqueues, dequeues, and gets the length of the queue.
SimpleTask
creates new tasks, which are used by the instance of the
SimpleQueue
class to enqueue new tasks, and processes new tasks.
Curious about
lpush()
,
brpop()
, and
llen()
? Refer to the
Command reference
page. (
The brpop()
function is particularly cool because it blocks the connection until a value exists to be popped!)
# redis_queue_client.py
import
redis
from
redis_queue
import
SimpleQueue
from
tasks
import
get_word_counts
NUMBER_OF_TASKS
=
10
if
__name__
==
"__main__"
:
r
=
redis
.
Redis
()
queue
=
SimpleQueue
(
r
,
"sample"
)
count
=
0
for
num
in
range
(
NUMBER_OF_TASKS
):
queue
.
enqueue
(
get_word_counts
,
"pride-and-prejudice.txt"
)
queue
.
enqueue
(
get_word_counts
,
"heart-of-darkness.txt"
)
queue
.
enqueue
(
get_word_counts
,
"frankenstein.txt"
)
queue
.
enqueue
(
get_word_counts
,
"dracula.txt"
)
count
+=
4
print
(
f
"Enqueued
{
count
}
tasks!"
)
This module will create a new instance of Redis and the
SimpleQueue
class. It will then enqueue 40 tasks.
# redis_queue_worker.py
import
redis
from
redis_queue
import
SimpleQueue
def
worker
():
r
=
redis
.
Redis
()
queue
=
SimpleQueue
(
r
,
"sample"
)
if
queue
.
get_length
()
>
0
:
queue
.
dequeue
()
else
:
print
(
"No tasks in the queue"
)
if
__name__
==
"__main__"
:
worker
()
If a task is available, the
dequeue
method is called, which then de-serializes the task and calls the
process_task
method (in
redis_queue.py
).
# redis_queue_server.py
import
multiprocessing
from
redis_queue_worker
import
worker
PROCESSES
=
4
def
run
():
processes
=
[]
print
(
f
"Running with
{
PROCESSES
}
processes!"
)
while
True
:
for
w
in
range
(
PROCESSES
):
p
=
multiprocessing
.
Process
(
target
=
worker
)
processes
.
append
(
p
)
p
.
start
()
for
p
in
processes
:
p
.
join
()
if
__name__
==
"__main__"
:
run
()
The
run
method spawns four new worker processes.
You probably donβt want four processes running at once all the time, but there may be times that you will need four or more processes. Think about how you could programmatically spin up and down additional workers based on demand.
To test, run
redis_queue_server.py
and
redis_queue_client.py
in separate terminal windows:
Check your understanding again by adding logging to the above application.
Conclusion
In this tutorial, we looked at a number of asynchronous task queue implementations in Python. If the requirements are simple enough, it may be easier to develop a queue in this manner. That said, if you're looking for more advanced features -- like task scheduling, batch processing, job prioritization, and retrying of failed tasks -- you should look into a full-blown solution. Check out
Celery
,
RQ
, or
Huey
.
Grab the final code from the
simple-task-queue
repo. |
| Markdown | 
[](https://testdriven.io/)
- [Courses](https://testdriven.io/courses/)
- [Bundles](https://testdriven.io/bundles/)
- [Blog](https://testdriven.io/blog/)
- [Guides](https://testdriven.io/blog/developing-an-asynchronous-task-queue-in-python/)
[Complete Python](https://testdriven.io/guides/complete-python/) [Django and Celery](https://testdriven.io/guides/django-celery/) [Deep Dive Into Flask](https://testdriven.io/guides/flask-deep-dive/)
- [More](https://testdriven.io/blog/developing-an-asynchronous-task-queue-in-python/)
[Support and Consulting](https://testdriven.io/support/) [What is Test-Driven Development?](https://testdriven.io/test-driven-development/) [Testimonials](https://testdriven.io/testimonials/) [Open Source Donations](https://testdriven.io/opensource/) [About Us](https://testdriven.io/about/) [Meet the Authors](https://testdriven.io/authors/) [Tips and Tricks](https://testdriven.io/tips/)
- [Sign In](https://testdriven.io/accounts/login/?next=/blog/developing-an-asynchronous-task-queue-in-python/) [Sign Up](https://testdriven.io/accounts/signup/?next=/blog/developing-an-asynchronous-task-queue-in-python/)
- [Sign In](https://testdriven.io/accounts/login/)
- [Sign Up](https://testdriven.io/accounts/signup/)
# Developing an Asynchronous Task Queue in Python
Posted by [ Michael Herman](https://testdriven.io/authors/herman/) Last updated June 21st, 2023
## Share this tutorial
- [Twitter](https://twitter.com/intent/tweet/?text=Developing%20an%20Asynchronous%20Task%20Queue%20in%20Python%20from%20%40TestDrivenio%20by%20%40mikeherman&&url=https%3A//testdriven.io/blog/developing-an-asynchronous-task-queue-in-python/)
- [Reddit](https://reddit.com/submit/?url=https%3A//testdriven.io/blog/developing-an-asynchronous-task-queue-in-python/&resubmit=true&title=Developing%20an%20Asynchronous%20Task%20Queue%20in%20Python)
- [Hacker News](https://news.ycombinator.com/submitlink?u=https%3A//testdriven.io/blog/developing-an-asynchronous-task-queue-in-python/&t=Developing%20an%20Asynchronous%20Task%20Queue%20in%20Python)
- [Facebook](https://facebook.com/sharer/sharer.php?u=https%3A//testdriven.io/blog/developing-an-asynchronous-task-queue-in-python/)
This tutorial looks at how to implement several asynchronous task queues using Python's [multiprocessing](https://docs.python.org/3/library/multiprocessing.html) library and [Redis](https://redis.io/).
## Contents
- [Queue Data Structures](https://testdriven.io/blog/developing-an-asynchronous-task-queue-in-python/#queue-data-structures)
- [Task](https://testdriven.io/blog/developing-an-asynchronous-task-queue-in-python/#task)
- [Following along?](https://testdriven.io/blog/developing-an-asynchronous-task-queue-in-python/#following-along)
- [Multiprocessing Pool](https://testdriven.io/blog/developing-an-asynchronous-task-queue-in-python/#multiprocessing-pool)
- [Multiprocessing Queue](https://testdriven.io/blog/developing-an-asynchronous-task-queue-in-python/#multiprocessing-queue)
- [Logging](https://testdriven.io/blog/developing-an-asynchronous-task-queue-in-python/#logging)
- [Redis](https://testdriven.io/blog/developing-an-asynchronous-task-queue-in-python/#redis)
- [Conclusion](https://testdriven.io/blog/developing-an-asynchronous-task-queue-in-python/#conclusion)
## Queue Data Structures
A [queue](https://en.wikipedia.org/wiki/Queue_\(abstract_data_type\)) is a [First-In-First-Out](https://en.wikipedia.org/wiki/FIFO_\(computing_and_electronics\)) (**FIFO**) data structure.
1. an item is added at the tail (**enqueue**)
2. an item is removed at the head (**dequeue**)
![queue]()
You'll see this in practice as you code out the examples in this tutorial.
## Task
Let's start by creating a basic task:
```
```
So, `get_word_counts` finds the twenty most frequent words from a given text file and saves them to an output file. It also prints the current process identifier (or pid) using Python's [os](https://docs.python.org/3/library/os.html) library.
### Following along?
Create a project directory along with a virtual environment. Then, use pip to install [NLTK](https://pypi.org/project/nltk/):
```
(env)$ pip install nltk==3.8.1
```
Once installed, invoke the Python shell and download the `stopwords` [corpus](https://www.nltk.org/data.html):
```
```
> If you experience an SSL error refer to [this](https://stackoverflow.com/questions/41348621/ssl-error-downloading-nltk-data) article.
>
> Example fix:
> ```
> >>> import nltk
>>> nltk.download('stopwords')
[nltk_data] Error loading stopwords: <urlopen error [SSL:
[nltk_data] CERTIFICATE_VERIFY_FAILED] certificate verify failed:
[nltk_data] unable to get local issuer certificate (_ssl.c:1056)>
False
>>> import ssl
>>> try:
... _create_unverified_https_context = ssl._create_unverified_context
... except AttributeError:
... pass
... else:
... ssl._create_default_https_context = _create_unverified_https_context
...
>>> nltk.download('stopwords')
[nltk_data] Downloading package stopwords to
[nltk_data] /Users/michael.herman/nltk_data...
[nltk_data] Unzipping corpora/stopwords.zip.
True
> ```
Add the above *tasks.py* file to your project directory but don't run it quite yet.
## Multiprocessing Pool
We can run this task in parallel using the [multiprocessing](https://docs.python.org/3/library/multiprocessing.html) library:
```
```
Here, using the [Pool](https://docs.python.org/3/library/multiprocessing.html#multiprocessing.pool.Pool) class, we processed four tasks with two processes.
Did you notice the `map_async` method? There are essentially four different methods available for mapping tasks to processes. When choosing one, you have to take multi-args, concurrency, blocking, and ordering into account:
| Method | Multi-args | Concurrency | Blocking | Ordered-results |
|---|---|---|---|---|
| `map` | No | Yes | Yes | Yes |
| `map_async` | No | No | No | Yes |
| `apply` | Yes | No | Yes | No |
| `apply_async` | Yes | Yes | No | No |
Without both `close` and `join`, garbage collection may not occur, which could lead to a memory leak.
1. `close` tells the pool not to accept any new tasks
2. `join` tells the pool to exit after all tasks have completed
> **Following along?** Grab the [Project Gutenberg](http://www.gutenberg.org/) sample text files from the "data" directory in the [simple-task-queue](https://github.com/testdrivenio/simple-task-queue) repo, and then add an "output" directory.
>
> Your project directory should look like this:
> ```
> βββ data
βΒ Β βββ dracula.txt
βΒ Β βββ frankenstein.txt
βΒ Β βββ heart-of-darkness.txt
βΒ Β βββ pride-and-prejudice.txt
βββ output
βββ simple_pool.py
βββ tasks.py
> ```
It should take less than a second to run:
```
```
> This script ran on a i9 Macbook Pro with 16 cores.
So, the multiprocessing `Pool` class handles the queuing logic for us. It's perfect for running CPU-bound tasks or really any job that can be broken up and distributed independently. If you need more control over the queue or need to share data between multiple processes, you may want to look at the `Queue` class.
> For more on this along with the difference between parallelism (multiprocessing) and concurrency (multithreading), review the [Speeding Up Python with Concurrency, Parallelism, and asyncio](https://testdriven.io/blog/concurrency-parallelism-asyncio/) article.
## Multiprocessing Queue
Let's look at a simple example:
```
```
The [Queue](https://docs.python.org/3/library/multiprocessing.html#exchanging-objects-between-processes) class, also from the multiprocessing library, is a basic FIFO (first in, first out) data structure. It's similar to the [queue.Queue](https://docs.python.org/3/library/queue.html#queue.Queue) class, but designed for interprocess communication. We used `put` to enqueue an item to the queue and `get` to dequeue an item.
> Check out the `Queue` [source code](https://github.com/python/cpython/blob/master/Lib/multiprocessing/queues.py) for a better understanding of the mechanics of this class.
Now, let's look at more advanced example:
```
```
Here, we enqueued 40 tasks (ten for each text file) to the queue, created separate processes via the `Process` class, used `start` to start running the processes, and, finally, used `join` to complete the processes.
It should still take less than a second to run.
> **Challenge**: Check your understanding by adding another queue to hold completed tasks. You can enqueue them within the `process_tasks` function.
## Logging
The multiprocessing library provides support for logging as well:
```
```
To test, change `task_queue.put("dracula.txt")` to `task_queue.put("drakula.txt")`. You should see the following error outputted ten times in the terminal:
```
```
Want to log to disc?
```
```
Again, cause an error by altering one of the file names, and then run it. Take a look at *process.log*. It's not quite as organized as it should be since the Python logging library does not use shared locks between processes. To get around this, let's have each process write to its own file. To keep things organized, add a logs directory to your project folder:
```
```
## Redis
Moving right along, instead of using an in-memory queue, let's add [Redis](https://redis.io/) into the mix.
> **Following along?** [Download](https://redis.io/download) and install Redis if you do not already have it installed. Then, install the Python [interface](https://pypi.org/project/redis/):
> ```
> (env)$ pip install redis==4.5.5
> ```
We'll break the logic up into four files:
1. *redis\_queue.py* creates new queues and tasks via the `SimpleQueue` and `SimpleTask` classes, respectively.
2. *redis\_queue\_client* enqueues new tasks.
3. *redis\_queue\_worker* dequeues and processes tasks.
4. *redis\_queue\_server* spawns worker processes.
```
```
Here, we defined two classes, `SimpleQueue` and `SimpleTask`:
1. `SimpleQueue` creates a new queue and enqueues, dequeues, and gets the length of the queue.
2. `SimpleTask` creates new tasks, which are used by the instance of the `SimpleQueue` class to enqueue new tasks, and processes new tasks.
> Curious about `lpush()`, `brpop()`, and `llen()`? Refer to the [Command reference](https://redis.io/commands) page. (`The brpop()` function is particularly cool because it blocks the connection until a value exists to be popped!)
```
```
This module will create a new instance of Redis and the `SimpleQueue` class. It will then enqueue 40 tasks.
```
```
If a task is available, the `dequeue` method is called, which then de-serializes the task and calls the `process_task` method (in *redis\_queue.py*).
```
```
The `run` method spawns four new worker processes.
> You probably donβt want four processes running at once all the time, but there may be times that you will need four or more processes. Think about how you could programmatically spin up and down additional workers based on demand.
To test, run *redis\_queue\_server.py* and *redis\_queue\_client.py* in separate terminal windows:
[![example]()](https://testdriven.io/static/images/blog/simple-task-queue/example.png)
[![example]()](https://testdriven.io/static/images/gifs/blog/simple-task-queue/example.gif)
> Check your understanding again by adding logging to the above application.
## Conclusion
In this tutorial, we looked at a number of asynchronous task queue implementations in Python. If the requirements are simple enough, it may be easier to develop a queue in this manner. That said, if you're looking for more advanced features -- like task scheduling, batch processing, job prioritization, and retrying of failed tasks -- you should look into a full-blown solution. Check out [Celery](https://docs.celeryq.dev/en/stable/), [RQ](http://python-rq.org/), or [Huey](http://huey.readthedocs.io/).
Grab the final code from the [simple-task-queue](https://github.com/testdrivenio/simple-task-queue) repo.
[Python](https://testdriven.io/blog/topics/python/) [Task Queue](https://testdriven.io/blog/topics/task-queue/)
## [Michael Herman](https://testdriven.io/authors/herman/)
[](https://testdriven.io/authors/herman/)
Michael is a software engineer and educator who lives and works in the Denver/Boulder area. He is the co-founder/author of [Real Python](https://realpython.com/). Besides development, he enjoys building financial models, tech writing, content marketing, and teaching.
## Share this tutorial
## Share this tutorial
- [Twitter](https://twitter.com/intent/tweet/?text=Developing%20an%20Asynchronous%20Task%20Queue%20in%20Python%20from%20%40TestDrivenio%20by%20%40mikeherman&&url=https%3A//testdriven.io/blog/developing-an-asynchronous-task-queue-in-python/)
- [Reddit](https://reddit.com/submit/?url=https%3A//testdriven.io/blog/developing-an-asynchronous-task-queue-in-python/&resubmit=true&title=Developing%20an%20Asynchronous%20Task%20Queue%20in%20Python)
- [Hacker News](https://news.ycombinator.com/submitlink?u=https%3A//testdriven.io/blog/developing-an-asynchronous-task-queue-in-python/&t=Developing%20an%20Asynchronous%20Task%20Queue%20in%20Python)
- [Facebook](https://facebook.com/sharer/sharer.php?u=https%3A//testdriven.io/blog/developing-an-asynchronous-task-queue-in-python/)
Featured Course
## [Creating an HTTP Load Balancer in Python](https://testdriven.io/courses/http-load-balancer/)
In this course, you'll learn how to implement a load balancer in Python using Test-Driven Development.
[Buy Now \$20](https://testdriven.io/payments/http-load-balancer/) [View Course](https://testdriven.io/courses/http-load-balancer/)
### Tutorial Topics
[API](https://testdriven.io/blog/topics/api/) [Architecture](https://testdriven.io/blog/topics/architecture/) [AWS](https://testdriven.io/blog/topics/aws/) [DevOps](https://testdriven.io/blog/topics/devops/) [Django](https://testdriven.io/blog/topics/django/) [Django REST Framework](https://testdriven.io/blog/topics/django-rest-framework/) [Docker](https://testdriven.io/blog/topics/docker/) [FastAPI](https://testdriven.io/blog/topics/fastapi/) [Flask](https://testdriven.io/blog/topics/flask/) [Front-end](https://testdriven.io/blog/topics/front-end/) [Heroku](https://testdriven.io/blog/topics/heroku/) [Kubernetes](https://testdriven.io/blog/topics/kubernetes/) [Machine Learning](https://testdriven.io/blog/topics/machine-learning/) [Python](https://testdriven.io/blog/topics/python/) [React](https://testdriven.io/blog/topics/react/) [Task Queue](https://testdriven.io/blog/topics/task-queue/) [Testing](https://testdriven.io/blog/topics/testing/) [Vue](https://testdriven.io/blog/topics/vue/) [Web Scraping](https://testdriven.io/blog/topics/web-scraping/)
### Table of Contents
Featured Course
## [Creating an HTTP Load Balancer in Python](https://testdriven.io/courses/http-load-balancer/)
In this course, you'll learn how to implement a load balancer in Python using Test-Driven Development.
[Buy Now \$20](https://testdriven.io/payments/http-load-balancer/) [View Course](https://testdriven.io/courses/http-load-balancer/)
## Recommended Tutorials
Loading...
## Stay Sharp with Course Updates
Join our mailing list to be notified about updates and new releases.
- Learn
- [Courses](https://testdriven.io/courses/)
- [Bundles](https://testdriven.io/bundles/)
- [Blog](https://testdriven.io/blog/)
- Guides
- [Complete Python](https://testdriven.io/guides/complete-python/)
- [Django and Celery](https://testdriven.io/guides/django-celery/)
- [Deep Dive Into Flask](https://testdriven.io/guides/flask-deep-dive/)
- About TestDriven.io
- [Support and Consulting](https://testdriven.io/support/)
- [What is Test-Driven Development?](https://testdriven.io/test-driven-development/)
- [Testimonials](https://testdriven.io/testimonials/)
- [Open Source Donations](https://testdriven.io/opensource/)
- [About Us](https://testdriven.io/about/)
- [Meet the Authors](https://testdriven.io/authors/)
- [Tips and Tricks](https://testdriven.io/tips/)
### TestDriven.io is a proud supporter of open source
**10% of profits** from each of our [FastAPI](https://testdriven.io/courses/topics/fastapi/) courses and our [Flask Web Development](https://testdriven.io/courses/learn-flask/) course will be donated to the FastAPI and Flask teams, respectively.
[Follow our contributions](https://testdriven.io/opensource/)
Β© Copyright 2017 - 2026 TestDriven Labs.
Developed by [Michael Herman](http://mherman.org/).
[Follow @testdrivenio](https://twitter.com/testdrivenio?ref_src=twsrc%5Etfw)

Feedback
##### Send Us Feedback
Γ
####
Close |
| Readable Markdown | This tutorial looks at how to implement several asynchronous task queues using Python's [multiprocessing](https://docs.python.org/3/library/multiprocessing.html) library and [Redis](https://redis.io/).
- [Queue Data Structures](https://testdriven.io/blog/developing-an-asynchronous-task-queue-in-python/#queue-data-structures)
- [Task](https://testdriven.io/blog/developing-an-asynchronous-task-queue-in-python/#task)
- [Following along?](https://testdriven.io/blog/developing-an-asynchronous-task-queue-in-python/#following-along)
- [Multiprocessing Pool](https://testdriven.io/blog/developing-an-asynchronous-task-queue-in-python/#multiprocessing-pool)
- [Multiprocessing Queue](https://testdriven.io/blog/developing-an-asynchronous-task-queue-in-python/#multiprocessing-queue)
- [Logging](https://testdriven.io/blog/developing-an-asynchronous-task-queue-in-python/#logging)
- [Redis](https://testdriven.io/blog/developing-an-asynchronous-task-queue-in-python/#redis)
- [Conclusion](https://testdriven.io/blog/developing-an-asynchronous-task-queue-in-python/#conclusion)
## Queue Data Structures
A [queue](https://en.wikipedia.org/wiki/Queue_\(abstract_data_type\)) is a [First-In-First-Out](https://en.wikipedia.org/wiki/FIFO_\(computing_and_electronics\)) (**FIFO**) data structure.
1. an item is added at the tail (**enqueue**)
2. an item is removed at the head (**dequeue**)

You'll see this in practice as you code out the examples in this tutorial.
Let's start by creating a basic task:
```
```
So, `get_word_counts` finds the twenty most frequent words from a given text file and saves them to an output file. It also prints the current process identifier (or pid) using Python's [os](https://docs.python.org/3/library/os.html) library.
### Following along?
Create a project directory along with a virtual environment. Then, use pip to install [NLTK](https://pypi.org/project/nltk/):
```
(env)$ pip install nltk==3.8.1
```
Once installed, invoke the Python shell and download the `stopwords` [corpus](https://www.nltk.org/data.html):
```
```
> If you experience an SSL error refer to [this](https://stackoverflow.com/questions/41348621/ssl-error-downloading-nltk-data) article.
>
> Example fix:
> ```
> >>> import nltk
>>> nltk.download('stopwords')
[nltk_data] Error loading stopwords: <urlopen error [SSL:
[nltk_data] CERTIFICATE_VERIFY_FAILED] certificate verify failed:
[nltk_data] unable to get local issuer certificate (_ssl.c:1056)>
False
>>> import ssl
>>> try:
... _create_unverified_https_context = ssl._create_unverified_context
... except AttributeError:
... pass
... else:
... ssl._create_default_https_context = _create_unverified_https_context
...
>>> nltk.download('stopwords')
[nltk_data] Downloading package stopwords to
[nltk_data] /Users/michael.herman/nltk_data...
[nltk_data] Unzipping corpora/stopwords.zip.
True
> ```
Add the above *tasks.py* file to your project directory but don't run it quite yet.
## Multiprocessing Pool
We can run this task in parallel using the [multiprocessing](https://docs.python.org/3/library/multiprocessing.html) library:
```
```
Here, using the [Pool](https://docs.python.org/3/library/multiprocessing.html#multiprocessing.pool.Pool) class, we processed four tasks with two processes.
Did you notice the `map_async` method? There are essentially four different methods available for mapping tasks to processes. When choosing one, you have to take multi-args, concurrency, blocking, and ordering into account:
| Method | Multi-args | Concurrency | Blocking | Ordered-results |
|---|---|---|---|---|
| `map` | No | Yes | Yes | Yes |
| `map_async` | No | No | No | Yes |
| `apply` | Yes | No | Yes | No |
| `apply_async` | Yes | Yes | No | No |
Without both `close` and `join`, garbage collection may not occur, which could lead to a memory leak.
1. `close` tells the pool not to accept any new tasks
2. `join` tells the pool to exit after all tasks have completed
> **Following along?** Grab the [Project Gutenberg](http://www.gutenberg.org/) sample text files from the "data" directory in the [simple-task-queue](https://github.com/testdrivenio/simple-task-queue) repo, and then add an "output" directory.
>
> Your project directory should look like this:
> ```
> βββ data
βΒ Β βββ dracula.txt
βΒ Β βββ frankenstein.txt
βΒ Β βββ heart-of-darkness.txt
βΒ Β βββ pride-and-prejudice.txt
βββ output
βββ simple_pool.py
βββ tasks.py
> ```
It should take less than a second to run:
```
```
> This script ran on a i9 Macbook Pro with 16 cores.
So, the multiprocessing `Pool` class handles the queuing logic for us. It's perfect for running CPU-bound tasks or really any job that can be broken up and distributed independently. If you need more control over the queue or need to share data between multiple processes, you may want to look at the `Queue` class.
> For more on this along with the difference between parallelism (multiprocessing) and concurrency (multithreading), review the [Speeding Up Python with Concurrency, Parallelism, and asyncio](https://testdriven.io/blog/concurrency-parallelism-asyncio/) article.
## Multiprocessing Queue
Let's look at a simple example:
```
```
The [Queue](https://docs.python.org/3/library/multiprocessing.html#exchanging-objects-between-processes) class, also from the multiprocessing library, is a basic FIFO (first in, first out) data structure. It's similar to the [queue.Queue](https://docs.python.org/3/library/queue.html#queue.Queue) class, but designed for interprocess communication. We used `put` to enqueue an item to the queue and `get` to dequeue an item.
> Check out the `Queue` [source code](https://github.com/python/cpython/blob/master/Lib/multiprocessing/queues.py) for a better understanding of the mechanics of this class.
Now, let's look at more advanced example:
```
```
Here, we enqueued 40 tasks (ten for each text file) to the queue, created separate processes via the `Process` class, used `start` to start running the processes, and, finally, used `join` to complete the processes.
It should still take less than a second to run.
> **Challenge**: Check your understanding by adding another queue to hold completed tasks. You can enqueue them within the `process_tasks` function.
## Logging
The multiprocessing library provides support for logging as well:
```
```
To test, change `task_queue.put("dracula.txt")` to `task_queue.put("drakula.txt")`. You should see the following error outputted ten times in the terminal:
```
```
Want to log to disc?
```
```
Again, cause an error by altering one of the file names, and then run it. Take a look at *process.log*. It's not quite as organized as it should be since the Python logging library does not use shared locks between processes. To get around this, let's have each process write to its own file. To keep things organized, add a logs directory to your project folder:
```
```
## Redis
Moving right along, instead of using an in-memory queue, let's add [Redis](https://redis.io/) into the mix.
> **Following along?** [Download](https://redis.io/download) and install Redis if you do not already have it installed. Then, install the Python [interface](https://pypi.org/project/redis/):
> ```
> (env)$ pip install redis==4.5.5
> ```
We'll break the logic up into four files:
1. *redis\_queue.py* creates new queues and tasks via the `SimpleQueue` and `SimpleTask` classes, respectively.
2. *redis\_queue\_client* enqueues new tasks.
3. *redis\_queue\_worker* dequeues and processes tasks.
4. *redis\_queue\_server* spawns worker processes.
```
```
Here, we defined two classes, `SimpleQueue` and `SimpleTask`:
1. `SimpleQueue` creates a new queue and enqueues, dequeues, and gets the length of the queue.
2. `SimpleTask` creates new tasks, which are used by the instance of the `SimpleQueue` class to enqueue new tasks, and processes new tasks.
> Curious about `lpush()`, `brpop()`, and `llen()`? Refer to the [Command reference](https://redis.io/commands) page. (`The brpop()` function is particularly cool because it blocks the connection until a value exists to be popped!)
```
```
This module will create a new instance of Redis and the `SimpleQueue` class. It will then enqueue 40 tasks.
```
```
If a task is available, the `dequeue` method is called, which then de-serializes the task and calls the `process_task` method (in *redis\_queue.py*).
```
```
The `run` method spawns four new worker processes.
> You probably donβt want four processes running at once all the time, but there may be times that you will need four or more processes. Think about how you could programmatically spin up and down additional workers based on demand.
To test, run *redis\_queue\_server.py* and *redis\_queue\_client.py* in separate terminal windows:
[](https://testdriven.io/static/images/blog/simple-task-queue/example.png)
[![example]()](https://testdriven.io/static/images/gifs/blog/simple-task-queue/example.gif)
> Check your understanding again by adding logging to the above application.
## Conclusion
In this tutorial, we looked at a number of asynchronous task queue implementations in Python. If the requirements are simple enough, it may be easier to develop a queue in this manner. That said, if you're looking for more advanced features -- like task scheduling, batch processing, job prioritization, and retrying of failed tasks -- you should look into a full-blown solution. Check out [Celery](https://docs.celeryq.dev/en/stable/), [RQ](http://python-rq.org/), or [Huey](http://huey.readthedocs.io/).
Grab the final code from the [simple-task-queue](https://github.com/testdrivenio/simple-task-queue) repo. |
| Shard | 145 (laksa) |
| Root Hash | 3012241415413282145 |
| Unparsed URL | io,testdriven!/blog/developing-an-asynchronous-task-queue-in-python/ s443 |