πŸ•·οΈ Crawler Inspector

URL Lookup

Direct Parameter Lookup

Raw Queries and Responses

1. Shard Calculation

Query:
Response:
Calculated Shard: 145 (from laksa055)
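The response does not show how the shard number is derived from the host. A common scheme (purely an assumption here, including the function name and the shard count) is a stable hash of the hostname reduced modulo the number of shards:

```python
import hashlib

NUM_SHARDS = 512  # hypothetical shard count -- the real value isn't shown above


def shard_for_host(host: str) -> int:
    # Use a fixed digest rather than Python's built-in hash(), which is
    # salted per process and therefore not stable across runs.
    digest = hashlib.md5(host.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


print(shard_for_host("testdriven.io"))
```

The point is only that the mapping must be deterministic, so every crawler node routes the same host to the same shard.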

2. Crawled Status Check

Query:
Response:

3. Robots.txt Check

Query:
Response:

4. Spam/Ban Check

Query:
Response:

5. Seen Status Check

ℹ️ Skipped - page is already crawled

- πŸ“„ INDEXABLE
- βœ… CRAWLED (3 hours ago)
- πŸ€– ROBOTS ALLOWED

Page Info Filters

| Filter | Status | Condition | Details |
|---|---|---|---|
| HTTP status | PASS | `download_http_code = 200` | HTTP 200 |
| Age cutoff | PASS | `download_stamp > now() - 6 MONTH` | 0 months ago |
| History drop | PASS | `isNull(history_drop_reason)` | No drop reason |
| Spam/ban | PASS | `fh_dont_index != 1 AND ml_spam_score = 0` | ml_spam_score=0 |
| Canonical | PASS | `meta_canonical IS NULL OR = '' OR = src_unparsed` | Not set |
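The table compresses five SQL-ish conditions into one verdict. Expressed as a plain Python predicate (field names come from the table; the exact six-month cutoff arithmetic and NULL handling are assumptions):

```python
from datetime import datetime, timedelta


def passes_page_filters(page: dict, now: datetime) -> bool:
    # One boolean per filter row above. The ~6-month cutoff (183 days)
    # and treating a missing field as NULL are assumptions.
    return (
        page["download_http_code"] == 200                       # HTTP status
        and page["download_stamp"] > now - timedelta(days=183)  # age cutoff
        and page.get("history_drop_reason") is None             # history drop
        and page["fh_dont_index"] != 1                          # spam/ban
        and page["ml_spam_score"] == 0
        and page.get("meta_canonical")                          # canonical
        in (None, "", page["src_unparsed"])
    )


# The record from the Page Details section below passes all five rows.
page = {
    "download_http_code": 200,
    "download_stamp": datetime(2026, 4, 15, 9, 46, 40),
    "history_drop_reason": None,
    "fh_dont_index": 0,
    "ml_spam_score": 0,
    "meta_canonical": None,
    "src_unparsed": "https://testdriven.io/blog/developing-an-asynchronous-task-queue-in-python/",
}
print(passes_page_filters(page, datetime(2026, 4, 15, 12, 46)))
```

A page fails as soon as any single row fails, which matches the table's row-by-row PASS/FAIL presentation.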

Page Details

| Property | Value |
|---|---|
| URL | https://testdriven.io/blog/developing-an-asynchronous-task-queue-in-python/ |
| Last Crawled | 2026-04-15 09:46:40 (3 hours ago) |
| First Indexed | 2019-01-02 11:02:41 (7 years ago) |
| HTTP Status Code | 200 |
| Meta Title | Developing an Asynchronous Task Queue in Python \| TestDriven.io |
| Meta Description | This tutorial looks at how to implement several asynchronous task queues using the Python multiprocessing library and Redis. |
| Meta Canonical | null |
Boilerpipe Text
This tutorial looks at how to implement several asynchronous task queues using Python's multiprocessing library and Redis.

Contents: Queue Data Structures, Task, Following along?, Multiprocessing Pool, Multiprocessing Queue, Logging, Redis, Conclusion

Queue Data Structures

A queue is a First-In-First-Out (FIFO) data structure:

1. an item is added at the tail (enqueue)
2. an item is removed at the head (dequeue)

You'll see this in practice as you code out the examples in this tutorial.

Task

Let's start by creating a basic task:

```python
# tasks.py
import collections
import json
import os
import sys
import uuid
from pathlib import Path

from nltk.corpus import stopwords

COMMON_WORDS = set(stopwords.words("english"))
BASE_DIR = Path(__file__).resolve(strict=True).parent
DATA_DIR = Path(BASE_DIR).joinpath("data")
OUTPUT_DIR = Path(BASE_DIR).joinpath("output")


def save_file(filename, data):
    random_str = uuid.uuid4().hex
    outfile = f"{filename}_{random_str}.txt"
    with open(Path(OUTPUT_DIR).joinpath(outfile), "w") as outfile:
        outfile.write(data)


def get_word_counts(filename):
    wordcount = collections.Counter()
    # get counts
    with open(Path(DATA_DIR).joinpath(filename), "r") as f:
        for line in f:
            wordcount.update(line.split())
    for word in set(COMMON_WORDS):
        del wordcount[word]
    # save file
    save_file(filename, json.dumps(dict(wordcount.most_common(20))))
    proc = os.getpid()
    print(f"Processed {filename} with process id: {proc}")


if __name__ == "__main__":
    get_word_counts(sys.argv[1])
```

So, `get_word_counts` finds the twenty most frequent words in a given text file and saves them to an output file. It also prints the current process identifier (or pid) using Python's os library.

Following along? Create a project directory along with a virtual environment.
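Stripped of file I/O and NLTK, the counting step inside `get_word_counts` is just a `Counter` with stopwords deleted. A minimal sketch, using a tiny stand-in stopword set instead of NLTK's corpus:

```python
import collections

COMMON_WORDS = {"the", "over"}  # stand-in for NLTK's English stopword set

text = "the quick brown fox jumps over the lazy dog the fox"
wordcount = collections.Counter()
wordcount.update(text.split())

# Counter.__delitem__ doesn't raise for missing keys, so this is safe
# even when a stopword never appeared in the text.
for word in COMMON_WORDS:
    del wordcount[word]

print(wordcount.most_common(2))  # "fox" appears twice, everything else once
```

`most_common(n)` sorts by count descending, which is exactly what the task uses to pull the top twenty words per book.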
Then, use pip to install NLTK:

```
(env)$ pip install nltk==3.8.1
```

Once installed, invoke the Python shell and download the `stopwords` corpus:

```
>>> import nltk
>>> nltk.download("stopwords")
[nltk_data] Downloading package stopwords to
[nltk_data]     /Users/michael/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.
True
```

If you experience an SSL error refer to [this](https://stackoverflow.com/questions/41348621/ssl-error-downloading-nltk-data) article. Example fix:

```
>>> import nltk
>>> nltk.download('stopwords')
[nltk_data] Error loading stopwords: <urlopen error [SSL:
[nltk_data]     CERTIFICATE_VERIFY_FAILED] certificate verify failed:
[nltk_data]     unable to get local issuer certificate (_ssl.c:1056)>
False
>>> import ssl
>>> try:
...     _create_unverified_https_context = ssl._create_unverified_context
... except AttributeError:
...     pass
... else:
...     ssl._create_default_https_context = _create_unverified_https_context
...
>>> nltk.download('stopwords')
[nltk_data] Downloading package stopwords to
[nltk_data]     /Users/michael.herman/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.
True
```

Add the above tasks.py file to your project directory but don't run it quite yet.

Multiprocessing Pool

We can run this task in parallel using the multiprocessing library:

```python
# simple_pool.py
import multiprocessing
import time

from tasks import get_word_counts

PROCESSES = multiprocessing.cpu_count() - 1


def run():
    print(f"Running with {PROCESSES} processes!")
    start = time.time()
    with multiprocessing.Pool(PROCESSES) as p:
        p.map_async(
            get_word_counts,
            [
                "pride-and-prejudice.txt",
                "heart-of-darkness.txt",
                "frankenstein.txt",
                "dracula.txt",
            ],
        )
        # clean up
        p.close()
        p.join()
    print(f"Time taken = {time.time() - start:.10f}")


if __name__ == "__main__":
    run()
```

Here, using the `Pool` class, we processed the four tasks across a pool of worker processes (one fewer than the CPU count). Did you notice the `map_async` method? There are essentially four different methods available for mapping tasks to processes. When choosing one, you have to take multi-args, concurrency, blocking, and ordering into account:

| Method | Multi-args | Concurrency | Blocking | Ordered-results |
|---|---|---|---|---|
| `map` | No | Yes | Yes | Yes |
| `map_async` | No | No | No | Yes |
| `apply` | Yes | No | Yes | No |
| `apply_async` | Yes | Yes | No | No |

Without both `close` and `join`, garbage collection may not occur, which could lead to a memory leak:

1. `close` tells the pool not to accept any new tasks
2. `join` tells the pool to exit after all tasks have completed

Following along? Grab the Project Gutenberg sample text files from the "data" directory in the simple-task-queue repo, and then add an "output" directory. Your project directory should look like this:

```
β”œβ”€β”€ data
β”‚Β Β  β”œβ”€β”€ dracula.txt
β”‚Β Β  β”œβ”€β”€ frankenstein.txt
β”‚Β Β  β”œβ”€β”€ heart-of-darkness.txt
β”‚Β Β  └── pride-and-prejudice.txt
β”œβ”€β”€ output
β”œβ”€β”€ simple_pool.py
└── tasks.py
```

It should take less than a second to run:

```
(env)$ python simple_pool.py
Running with 15 processes!
Processed heart-of-darkness.txt with process id: 50510
Processed frankenstein.txt with process id: 50515
Processed pride-and-prejudice.txt with process id: 50511
Processed dracula.txt with process id: 50512
Time taken = 0.6383581161
```

This script ran on an i9 MacBook Pro with 16 cores.

So, the multiprocessing `Pool` class handles the queuing logic for us. It's perfect for running CPU-bound tasks or really any job that can be broken up and distributed independently. If you need more control over the queue or need to share data between multiple processes, you may want to look at the `Queue` class.

For more on this, along with the difference between parallelism (multiprocessing) and concurrency (multithreading), review the Speeding Up Python with Concurrency, Parallelism, and asyncio article.

Multiprocessing Queue

Let's look at a simple example:

```python
# simple_queue.py
import multiprocessing


def run():
    books = [
        "pride-and-prejudice.txt",
        "heart-of-darkness.txt",
        "frankenstein.txt",
        "dracula.txt",
    ]
    queue = multiprocessing.Queue()

    print("Enqueuing...")
    for book in books:
        print(book)
        queue.put(book)

    print("\nDequeuing...")
    while not queue.empty():
        print(queue.get())


if __name__ == "__main__":
    run()
```

The `Queue` class, also from the multiprocessing library, is a basic FIFO (first in, first out) data structure. It's similar to the `queue.Queue` class, but designed for interprocess communication. We used `put` to enqueue an item and `get` to dequeue an item.

Check out the `Queue` source code for a better understanding of the mechanics of this class.

Now, let's look at a more advanced example:

```python
# simple_task_queue.py
import multiprocessing
import time

from tasks import get_word_counts

PROCESSES = multiprocessing.cpu_count() - 1
NUMBER_OF_TASKS = 10


def process_tasks(task_queue):
    while not task_queue.empty():
        book = task_queue.get()
        get_word_counts(book)
    return True


def add_tasks(task_queue, number_of_tasks):
    for num in range(number_of_tasks):
        task_queue.put("pride-and-prejudice.txt")
        task_queue.put("heart-of-darkness.txt")
        task_queue.put("frankenstein.txt")
        task_queue.put("dracula.txt")
    return task_queue


def run():
    empty_task_queue = multiprocessing.Queue()
    full_task_queue = add_tasks(empty_task_queue, NUMBER_OF_TASKS)
    processes = []
    print(f"Running with {PROCESSES} processes!")
    start = time.time()
    for n in range(PROCESSES):
        p = multiprocessing.Process(target=process_tasks, args=(full_task_queue,))
        processes.append(p)
        p.start()
    for p in processes:
        p.join()
    print(f"Time taken = {time.time() - start:.10f}")


if __name__ == "__main__":
    run()
```

Here, we enqueued 40 tasks (ten for each text file), created separate processes via the `Process` class, used `start` to start running the processes, and, finally, used `join` to complete the processes. It should still take less than a second to run.
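One caveat worth flagging: `empty()` is only advisory across processes. Another worker can drain the queue between the `empty()` check and the `get()`, leaving a worker blocked forever. A defensive variant of the dequeue loop (a sketch, not from the original code) uses a timeout and catches `queue.Empty`:

```python
import multiprocessing
import queue  # provides the Empty exception raised on timeout


def process_tasks_safely(task_queue):
    # get() with a timeout instead of an empty()/get() pair: if another
    # process grabbed the last item first, we time out and exit cleanly
    # rather than blocking forever.
    books = []
    while True:
        try:
            book = task_queue.get(timeout=0.5)
        except queue.Empty:
            break
        books.append(book)
    return books


if __name__ == "__main__":
    q = multiprocessing.Queue()
    q.put("dracula.txt")
    q.put("frankenstein.txt")
    print(process_tasks_safely(q))
```

With short-lived batch workers like these the race rarely bites, but the timeout pattern costs nothing and removes the hang entirely.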
Challenge: Check your understanding by adding another queue to hold completed tasks. You can enqueue them within the `process_tasks` function.

Logging

The multiprocessing library provides support for logging as well:

```python
# simple_task_queue_logging.py
import logging
import multiprocessing
import os
import time

from tasks import get_word_counts

PROCESSES = multiprocessing.cpu_count() - 1
NUMBER_OF_TASKS = 10


def process_tasks(task_queue):
    logger = multiprocessing.get_logger()
    proc = os.getpid()
    while not task_queue.empty():
        try:
            book = task_queue.get()
            get_word_counts(book)
        except Exception as e:
            logger.error(e)
    logger.info(f"Process {proc} completed successfully")
    return True


def add_tasks(task_queue, number_of_tasks):
    for num in range(number_of_tasks):
        task_queue.put("pride-and-prejudice.txt")
        task_queue.put("heart-of-darkness.txt")
        task_queue.put("frankenstein.txt")
        task_queue.put("dracula.txt")
    return task_queue


def run():
    empty_task_queue = multiprocessing.Queue()
    full_task_queue = add_tasks(empty_task_queue, NUMBER_OF_TASKS)
    processes = []
    print(f"Running with {PROCESSES} processes!")
    start = time.time()
    for w in range(PROCESSES):
        p = multiprocessing.Process(target=process_tasks, args=(full_task_queue,))
        processes.append(p)
        p.start()
    for p in processes:
        p.join()
    print(f"Time taken = {time.time() - start:.10f}")


if __name__ == "__main__":
    multiprocessing.log_to_stderr(logging.ERROR)
    run()
```

To test, change `task_queue.put("dracula.txt")` to `task_queue.put("drakula.txt")`. You should see the following error outputted ten times in the terminal:

```
[ERROR/Process-4] [Errno 2] No such file or directory: 'simple-task-queue/data/drakula.txt'
```

Want to log to disk?

```python
# simple_task_queue_logging.py
import logging
import multiprocessing
import os
import time

from tasks import get_word_counts

PROCESSES = multiprocessing.cpu_count() - 1
NUMBER_OF_TASKS = 10


def create_logger():
    logger = multiprocessing.get_logger()
    logger.setLevel(logging.INFO)
    fh = logging.FileHandler("process.log")
    fmt = "%(asctime)s - %(levelname)s - %(message)s"
    formatter = logging.Formatter(fmt)
    fh.setFormatter(formatter)
    logger.addHandler(fh)
    return logger


def process_tasks(task_queue):
    logger = create_logger()
    proc = os.getpid()
    while not task_queue.empty():
        try:
            book = task_queue.get()
            get_word_counts(book)
        except Exception as e:
            logger.error(e)
    logger.info(f"Process {proc} completed successfully")
    return True


def add_tasks(task_queue, number_of_tasks):
    for num in range(number_of_tasks):
        task_queue.put("pride-and-prejudice.txt")
        task_queue.put("heart-of-darkness.txt")
        task_queue.put("frankenstein.txt")
        task_queue.put("dracula.txt")
    return task_queue


def run():
    empty_task_queue = multiprocessing.Queue()
    full_task_queue = add_tasks(empty_task_queue, NUMBER_OF_TASKS)
    processes = []
    print(f"Running with {PROCESSES} processes!")
    start = time.time()
    for w in range(PROCESSES):
        p = multiprocessing.Process(target=process_tasks, args=(full_task_queue,))
        processes.append(p)
        p.start()
    for p in processes:
        p.join()
    print(f"Time taken = {time.time() - start:.10f}")


if __name__ == "__main__":
    run()
```

Again, cause an error by altering one of the file names, and then run it. Take a look at process.log. It's not quite as organized as it should be since the Python logging library does not use shared locks between processes. To get around this, let's have each process write to its own file. To keep things organized, add a logs directory to your project folder:

```python
# simple_task_queue_logging_separate_files.py
import logging
import multiprocessing
import os
import time

from tasks import get_word_counts

PROCESSES = multiprocessing.cpu_count() - 1
NUMBER_OF_TASKS = 10


def create_logger(pid):
    logger = multiprocessing.get_logger()
    logger.setLevel(logging.INFO)
    fh = logging.FileHandler(f"logs/process_{pid}.log")
    fmt = "%(asctime)s - %(levelname)s - %(message)s"
    formatter = logging.Formatter(fmt)
    fh.setFormatter(formatter)
    logger.addHandler(fh)
    return logger


def process_tasks(task_queue):
    proc = os.getpid()
    logger = create_logger(proc)
    while not task_queue.empty():
        try:
            book = task_queue.get()
            get_word_counts(book)
        except Exception as e:
            logger.error(e)
    logger.info(f"Process {proc} completed successfully")
    return True


def add_tasks(task_queue, number_of_tasks):
    for num in range(number_of_tasks):
        task_queue.put("pride-and-prejudice.txt")
        task_queue.put("heart-of-darkness.txt")
        task_queue.put("frankenstein.txt")
        task_queue.put("dracula.txt")
    return task_queue


def run():
    empty_task_queue = multiprocessing.Queue()
    full_task_queue = add_tasks(empty_task_queue, NUMBER_OF_TASKS)
    processes = []
    print(f"Running with {PROCESSES} processes!")
    start = time.time()
    for w in range(PROCESSES):
        p = multiprocessing.Process(target=process_tasks, args=(full_task_queue,))
        processes.append(p)
        p.start()
    for p in processes:
        p.join()
    print(f"Time taken = {time.time() - start:.10f}")


if __name__ == "__main__":
    run()
```

Redis

Moving right along, instead of using an in-memory queue, let's add Redis into the mix.

Following along? Download and install Redis if you do not already have it installed. Then, install the Python interface:

```
(env)$ pip install redis==4.5.5
```

We'll break the logic up into four files:

1. redis_queue.py creates new queues and tasks via the `SimpleQueue` and `SimpleTask` classes, respectively.
2. redis_queue_client.py enqueues new tasks.
3. redis_queue_worker.py dequeues and processes tasks.
4. redis_queue_server.py spawns worker processes.
```python
# redis_queue.py
import pickle
import uuid


class SimpleQueue(object):
    def __init__(self, conn, name):
        self.conn = conn
        self.name = name

    def enqueue(self, func, *args):
        task = SimpleTask(func, *args)
        serialized_task = pickle.dumps(task, protocol=pickle.HIGHEST_PROTOCOL)
        self.conn.lpush(self.name, serialized_task)
        return task.id

    def dequeue(self):
        _, serialized_task = self.conn.brpop(self.name)
        task = pickle.loads(serialized_task)
        task.process_task()
        return task

    def get_length(self):
        return self.conn.llen(self.name)


class SimpleTask(object):
    def __init__(self, func, *args):
        self.id = str(uuid.uuid4())
        self.func = func
        self.args = args

    def process_task(self):
        self.func(*self.args)
```

Here, we defined two classes, `SimpleQueue` and `SimpleTask`:

1. `SimpleQueue` creates a new queue and enqueues, dequeues, and gets the length of the queue.
2. `SimpleTask` creates new tasks, which are used by the instance of the `SimpleQueue` class to enqueue new tasks, and processes new tasks.

Curious about `lpush()`, `brpop()`, and `llen()`? Refer to the Command reference page. (The `brpop()` function is particularly cool because it blocks the connection until a value exists to be popped!)

```python
# redis_queue_client.py
import redis

from redis_queue import SimpleQueue
from tasks import get_word_counts

NUMBER_OF_TASKS = 10


if __name__ == "__main__":
    r = redis.Redis()
    queue = SimpleQueue(r, "sample")
    count = 0
    for num in range(NUMBER_OF_TASKS):
        queue.enqueue(get_word_counts, "pride-and-prejudice.txt")
        queue.enqueue(get_word_counts, "heart-of-darkness.txt")
        queue.enqueue(get_word_counts, "frankenstein.txt")
        queue.enqueue(get_word_counts, "dracula.txt")
        count += 4
    print(f"Enqueued {count} tasks!")
```

This module will create a new instance of Redis and the `SimpleQueue` class. It will then enqueue 40 tasks.
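Under the hood, the enqueue/dequeue round trip is just pickling a callable plus its arguments and pushing the bytes through a Redis list. The mechanics can be seen without a running server by faking the two list commands (the in-memory stand-in below is a hypothetical test double, not part of the article's code):

```python
import pickle
from collections import deque

# In-memory stand-in for the two Redis list commands SimpleQueue relies on.
store = {}


def lpush(name, value):
    store.setdefault(name, deque()).appendleft(value)


def brpop(name):
    # Unlike the real BRPOP, this doesn't block; it assumes a value exists.
    return name, store[name].pop()


results = []


def task(word):
    results.append(word.upper())


# enqueue: serialize (func, args) and push onto the left of the list
lpush("sample", pickle.dumps((task, ("dracula",)), protocol=pickle.HIGHEST_PROTOCOL))

# dequeue: pop from the right, deserialize, and call
_, payload = brpop("sample")
func, args = pickle.loads(payload)
func(*args)

print(results)
```

Note that pickle stores module-level functions by reference (module plus qualified name), which is why the worker processes must be able to import the same `tasks` module the client used.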
```python
# redis_queue_worker.py
import redis

from redis_queue import SimpleQueue


def worker():
    r = redis.Redis()
    queue = SimpleQueue(r, "sample")
    if queue.get_length() > 0:
        queue.dequeue()
    else:
        print("No tasks in the queue")


if __name__ == "__main__":
    worker()
```

If a task is available, the `dequeue` method is called, which then de-serializes the task and calls the `process_task` method (in redis_queue.py).

```python
# redis_queue_server.py
import multiprocessing

from redis_queue_worker import worker

PROCESSES = 4


def run():
    processes = []
    print(f"Running with {PROCESSES} processes!")
    while True:
        for w in range(PROCESSES):
            p = multiprocessing.Process(target=worker)
            processes.append(p)
            p.start()
        for p in processes:
            p.join()


if __name__ == "__main__":
    run()
```

The `run` method spawns four new worker processes.

You probably don't want four processes running at once all the time, but there may be times that you will need four or more processes. Think about how you could programmatically spin up and down additional workers based on demand.

To test, run redis_queue_server.py and redis_queue_client.py in separate terminal windows.

Check your understanding again by adding logging to the above application.

Conclusion

In this tutorial, we looked at a number of asynchronous task queue implementations in Python. If the requirements are simple enough, it may be easier to develop a queue in this manner. That said, if you're looking for more advanced features -- like task scheduling, batch processing, job prioritization, and retrying of failed tasks -- you should look into a full-blown solution. Check out Celery, RQ, or Huey.

Grab the final code from the simple-task-queue repo.
Markdown
# Developing an Asynchronous Task Queue in Python

Posted by [Michael Herman](https://testdriven.io/authors/herman/), last updated June 21st, 2023.
Readable Markdown
This tutorial looks at how to implement several asynchronous task queues using Python's [multiprocessing](https://docs.python.org/3/library/multiprocessing.html) library and [Redis](https://redis.io/).

- [Queue Data Structures](https://testdriven.io/blog/developing-an-asynchronous-task-queue-in-python/#queue-data-structures)
- [Task](https://testdriven.io/blog/developing-an-asynchronous-task-queue-in-python/#task)
- [Following along?](https://testdriven.io/blog/developing-an-asynchronous-task-queue-in-python/#following-along)
- [Multiprocessing Pool](https://testdriven.io/blog/developing-an-asynchronous-task-queue-in-python/#multiprocessing-pool)
- [Multiprocessing Queue](https://testdriven.io/blog/developing-an-asynchronous-task-queue-in-python/#multiprocessing-queue)
- [Logging](https://testdriven.io/blog/developing-an-asynchronous-task-queue-in-python/#logging)
- [Redis](https://testdriven.io/blog/developing-an-asynchronous-task-queue-in-python/#redis)
- [Conclusion](https://testdriven.io/blog/developing-an-asynchronous-task-queue-in-python/#conclusion)

## Queue Data Structures

A [queue](https://en.wikipedia.org/wiki/Queue_\(abstract_data_type\)) is a [First-In-First-Out](https://en.wikipedia.org/wiki/FIFO_\(computing_and_electronics\)) (**FIFO**) data structure:

1. an item is added at the tail (**enqueue**)
2. an item is removed at the head (**dequeue**)

![queue](https://testdriven.io/static/images/blog/simple-task-queue/queue.png)

You'll see this in practice as you code out the examples in this tutorial.

## Task

Let's start by creating a basic task:

```
# tasks.py

import collections
import json
import os
import sys
import uuid
from pathlib import Path

from nltk.corpus import stopwords

COMMON_WORDS = set(stopwords.words("english"))
BASE_DIR = Path(__file__).resolve(strict=True).parent
DATA_DIR = Path(BASE_DIR).joinpath("data")
OUTPUT_DIR = Path(BASE_DIR).joinpath("output")


def save_file(filename, data):
    random_str = uuid.uuid4().hex
    outfile = f"{filename}_{random_str}.txt"
    with open(Path(OUTPUT_DIR).joinpath(outfile), "w") as outfile:
        outfile.write(data)


def get_word_counts(filename):
    wordcount = collections.Counter()

    # get counts
    with open(Path(DATA_DIR).joinpath(filename), "r") as f:
        for line in f:
            wordcount.update(line.split())

    for word in set(COMMON_WORDS):
        del wordcount[word]

    # save file
    save_file(filename, json.dumps(dict(wordcount.most_common(20))))

    proc = os.getpid()
    print(f"Processed {filename} with process id: {proc}")


if __name__ == "__main__":
    get_word_counts(sys.argv[1])
```

So, `get_word_counts` finds the twenty most frequent words from a given text file and saves them to an output file. It also prints the current process identifier (or pid) using Python's [os](https://docs.python.org/3/library/os.html) library.

### Following along?

Create a project directory along with a virtual environment.
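For example, something along these lines works on macOS or Linux (the directory and environment names here are just placeholders):

```shell
$ mkdir simple-task-queue && cd simple-task-queue
$ python3 -m venv env
$ source env/bin/activate
```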
Then, use pip to install [NLTK](https://pypi.org/project/nltk/):

```
(env)$ pip install nltk==3.8.1
```

Once installed, invoke the Python shell and download the `stopwords` [corpus](https://www.nltk.org/data.html):

```
(env)$ python
>>> import nltk
>>> nltk.download("stopwords")
```

> If you experience an SSL error, refer to [this](https://stackoverflow.com/questions/41348621/ssl-error-downloading-nltk-data) article.
>
> Example fix:
>
> ```
> >>> import nltk
> >>> nltk.download('stopwords')
> [nltk_data] Error loading stopwords: <urlopen error [SSL:
> [nltk_data]     CERTIFICATE_VERIFY_FAILED] certificate verify failed:
> [nltk_data]     unable to get local issuer certificate (_ssl.c:1056)>
> False
> >>> import ssl
> >>> try:
> ...     _create_unverified_https_context = ssl._create_unverified_context
> ... except AttributeError:
> ...     pass
> ... else:
> ...     ssl._create_default_https_context = _create_unverified_https_context
> ...
> >>> nltk.download('stopwords')
> [nltk_data] Downloading package stopwords to
> [nltk_data]     /Users/michael.herman/nltk_data...
> [nltk_data]   Unzipping corpora/stopwords.zip.
> True
> ```

Add the above *tasks.py* file to your project directory but don't run it quite yet.

## Multiprocessing Pool

We can run this task in parallel using the [multiprocessing](https://docs.python.org/3/library/multiprocessing.html) library:

```
# simple_pool.py
# (sketch: four tasks on two processes via map_async, per the description below)

import multiprocessing
import time

from tasks import get_word_counts

PROCESSES = 2


def run():
    print(f"Running with {PROCESSES} processes!")
    start = time.time()

    with multiprocessing.Pool(PROCESSES) as p:
        p.map_async(
            get_word_counts,
            [
                "dracula.txt",
                "frankenstein.txt",
                "heart-of-darkness.txt",
                "pride-and-prejudice.txt",
            ],
        )
        # `close` and `join` ensure all tasks finish before the pool exits
        p.close()
        p.join()

    print(f"Time taken = {time.time() - start:.10f}")


if __name__ == "__main__":
    run()
```

Here, using the [Pool](https://docs.python.org/3/library/multiprocessing.html#multiprocessing.pool.Pool) class, we processed four tasks with two processes.

Did you notice the `map_async` method? There are essentially four different methods available for mapping tasks to processes. When choosing one, you have to take multi-args, concurrency, blocking, and ordering into account:

| Method | Multi-args | Concurrency | Blocking | Ordered-results |
|---|---|---|---|---|
| `map` | No | Yes | Yes | Yes |
| `map_async` | No | Yes | No | Yes |
| `apply` | Yes | No | Yes | No |
| `apply_async` | Yes | Yes | No | No |

Without both `close` and `join`, garbage collection may not occur, which could lead to a memory leak:

1.
`close` tells the pool not to accept any new tasks
2. `join` tells the pool to exit after all tasks have completed

> **Following along?** Grab the [Project Gutenberg](http://www.gutenberg.org/) sample text files from the "data" directory in the [simple-task-queue](https://github.com/testdrivenio/simple-task-queue) repo, and then add an "output" directory.
>
> Your project directory should look like this:
>
> ```
> β”œβ”€β”€ data
> β”‚Β Β  β”œβ”€β”€ dracula.txt
> β”‚Β Β  β”œβ”€β”€ frankenstein.txt
> β”‚Β Β  β”œβ”€β”€ heart-of-darkness.txt
> β”‚Β Β  └── pride-and-prejudice.txt
> β”œβ”€β”€ output
> β”œβ”€β”€ simple_pool.py
> └── tasks.py
> ```

It should take less than a second to run.

> This script ran on an i9 MacBook Pro with 16 cores.

So, the multiprocessing `Pool` class handles the queuing logic for us. It's perfect for running CPU-bound tasks or really any job that can be broken up and distributed independently. If you need more control over the queue or need to share data between multiple processes, you may want to look at the `Queue` class.

> For more on this, along with the difference between parallelism (multiprocessing) and concurrency (multithreading), review the [Speeding Up Python with Concurrency, Parallelism, and asyncio](https://testdriven.io/blog/concurrency-parallelism-asyncio/) article.

## Multiprocessing Queue

Let's look at a simple example:

```
# simple_queue.py
# (sketch: enqueue the four book files, then dequeue and print them)

import multiprocessing


def run():
    books = [
        "dracula.txt",
        "frankenstein.txt",
        "heart-of-darkness.txt",
        "pride-and-prejudice.txt",
    ]
    queue = multiprocessing.Queue()

    print("Enqueuing...")
    for book in books:
        print(f"  {book}")
        queue.put(book)

    print("\nDequeuing...")
    while not queue.empty():
        print(f"  {queue.get()}")


if __name__ == "__main__":
    run()
```

The [Queue](https://docs.python.org/3/library/multiprocessing.html#exchanging-objects-between-processes) class, also from the multiprocessing library, is a basic FIFO (first in, first out) data structure. It's similar to the [queue.Queue](https://docs.python.org/3/library/queue.html#queue.Queue) class, but designed for interprocess communication. We used `put` to enqueue an item to the queue and `get` to dequeue an item.

> Check out the `Queue` [source code](https://github.com/python/cpython/blob/master/Lib/multiprocessing/queues.py) for a better understanding of the mechanics of this class.
Now, let's look at a more advanced example:

```
# simple_task_queue.py
# (sketch: 40 tasks, ten per text file, processed by separate Process workers)

import multiprocessing
import time

from tasks import get_word_counts

PROCESSES = multiprocessing.cpu_count() - 1
NUMBER_OF_TASKS = 10


def process_tasks(task_queue):
    while not task_queue.empty():
        book = task_queue.get()
        get_word_counts(book)
    return True


def add_tasks(task_queue, number_of_tasks):
    for num in range(number_of_tasks):
        task_queue.put("dracula.txt")
        task_queue.put("frankenstein.txt")
        task_queue.put("heart-of-darkness.txt")
        task_queue.put("pride-and-prejudice.txt")
    return task_queue


def run():
    empty_task_queue = multiprocessing.Queue()
    full_task_queue = add_tasks(empty_task_queue, NUMBER_OF_TASKS)
    processes = []
    print(f"Running with {PROCESSES} processes!")
    start = time.time()
    for _ in range(PROCESSES):
        p = multiprocessing.Process(target=process_tasks, args=(full_task_queue,))
        processes.append(p)
        p.start()
    for p in processes:
        p.join()
    print(f"Time taken = {time.time() - start:.10f}")


if __name__ == "__main__":
    run()
```

Here, we enqueued 40 tasks (ten for each text file) to the queue, created separate processes via the `Process` class, used `start` to start running the processes, and, finally, used `join` to complete the processes. It should still take less than a second to run.

> **Challenge**: Check your understanding by adding another queue to hold completed tasks. You can enqueue them within the `process_tasks` function.

## Logging

The multiprocessing library provides support for logging as well:

```
# simple_task_queue_logging.py
# (sketch: same as simple_task_queue.py, but errors are logged to stderr)

import logging
import multiprocessing

from tasks import get_word_counts

# log errors from all processes to stderr
logger = multiprocessing.log_to_stderr(logging.ERROR)


def process_tasks(task_queue):
    while not task_queue.empty():
        try:
            book = task_queue.get()
            get_word_counts(book)
        except Exception as e:
            logger.error(e)
    return True


# add_tasks and run are unchanged from simple_task_queue.py
```

To test, change `task_queue.put("dracula.txt")` to `task_queue.put("drakula.txt")`. You should see the `[Errno 2] No such file or directory` error outputted ten times in the terminal.

Want to log to disk?

```
# swap the stderr logger for a FileHandler that writes to process.log
# (sketch; the format string mirrors multiprocessing's default)

import logging
import multiprocessing

logger = multiprocessing.get_logger()
logger.setLevel(logging.ERROR)

file_handler = logging.FileHandler("process.log")
file_handler.setFormatter(
    logging.Formatter("[%(levelname)s/%(processName)s] %(message)s")
)
logger.addHandler(file_handler)
```

Again, cause an error by altering one of the file names, and then run it. Take a look at *process.log*. It's not quite as organized as it should be since the Python logging library does not use shared locks between processes. To get around this, let's have each process write to its own file. To keep things organized, add a logs directory to your project folder:

```
# (sketch: each process creates its own logger that writes to logs/process_<pid>.log)

import logging
import os
from pathlib import Path


def create_logger():
    pid = os.getpid()
    logger = logging.getLogger()
    logger.setLevel(logging.ERROR)
    file_handler = logging.FileHandler(Path("logs").joinpath(f"process_{pid}.log"))
    file_handler.setFormatter(
        logging.Formatter("[%(levelname)s/%(processName)s] %(message)s")
    )
    logger.addHandler(file_handler)
    return logger
```

## Redis

Moving right along, instead of using an in-memory queue, let's add [Redis](https://redis.io/) into the mix.

> **Following along?** [Download](https://redis.io/download) and install Redis if you do not already have it installed. Then, install the Python [interface](https://pypi.org/project/redis/):
>
> ```
> (env)$ pip install redis==4.5.5
> ```

We'll break the logic up into four files:

1. *redis\_queue.py* creates new queues and tasks via the `SimpleQueue` and `SimpleTask` classes, respectively.
2. *redis\_queue\_client* enqueues new tasks.
3. *redis\_queue\_worker* dequeues and processes tasks.
4. *redis\_queue\_server* spawns worker processes.

```
# redis_queue.py
# (sketch: tasks are pickled onto a Redis list via lpush and popped via brpop)

import pickle
import uuid


class SimpleQueue:
    def __init__(self, conn, name):
        self.conn = conn
        self.name = name

    def enqueue(self, func, *args):
        task = SimpleTask(func, *args)
        serialized_task = pickle.dumps(task, protocol=pickle.HIGHEST_PROTOCOL)
        self.conn.lpush(self.name, serialized_task)
        return task.id

    def dequeue(self):
        _, serialized_task = self.conn.brpop(self.name)
        task = pickle.loads(serialized_task)
        task.process_task()
        return task

    def get_length(self):
        return self.conn.llen(self.name)


class SimpleTask:
    def __init__(self, func, *args):
        self.id = str(uuid.uuid4())
        self.func = func
        self.args = args

    def process_task(self):
        self.func(*self.args)
```

Here, we defined two classes, `SimpleQueue` and `SimpleTask`:

1. `SimpleQueue` creates a new queue and enqueues, dequeues, and gets the length of the queue.
2.
`SimpleTask` creates new tasks, which are used by the instance of the `SimpleQueue` class to enqueue new tasks, and processes new tasks.

> Curious about `lpush()`, `brpop()`, and `llen()`? Refer to the [Command reference](https://redis.io/commands) page. (The `brpop()` function is particularly cool because it blocks the connection until a value exists to be popped!)

```
# redis_queue_client.py
# (sketch; the queue name "sample" is illustrative)

import redis

from redis_queue import SimpleQueue
from tasks import get_word_counts

NUMBER_OF_TASKS = 10


if __name__ == "__main__":
    r = redis.Redis()
    queue = SimpleQueue(r, "sample")
    count = 0
    for num in range(NUMBER_OF_TASKS):
        queue.enqueue(get_word_counts, "dracula.txt")
        queue.enqueue(get_word_counts, "frankenstein.txt")
        queue.enqueue(get_word_counts, "heart-of-darkness.txt")
        queue.enqueue(get_word_counts, "pride-and-prejudice.txt")
        count += 4
    print(f"Enqueued {count} tasks!")
```

This module will create a new instance of Redis and the `SimpleQueue` class. It will then enqueue 40 tasks.

```
# redis_queue_worker.py
# (sketch: each worker opens its own Redis connection and drains the queue)

import redis

from redis_queue import SimpleQueue


def worker():
    r = redis.Redis()
    queue = SimpleQueue(r, "sample")
    if queue.get_length() > 0:
        queue.dequeue()
    else:
        print("No tasks in the queue")
```

If a task is available, the `dequeue` method is called, which then de-serializes the task and calls the `process_task` method (in *redis\_queue.py*).

```
# redis_queue_server.py
# (sketch: spawns four worker processes)

import multiprocessing

from redis_queue_worker import worker

PROCESSES = 4


def run():
    processes = []
    print(f"Running with {PROCESSES} processes!")
    for _ in range(PROCESSES):
        p = multiprocessing.Process(target=worker)
        processes.append(p)
        p.start()
    for p in processes:
        p.join()


if __name__ == "__main__":
    run()
```

The `run` method spawns four new worker processes.

> You probably don't want four processes running at once all the time, but there may be times that you will need four or more processes. Think about how you could programmatically spin up and down additional workers based on demand.

To test, run *redis\_queue\_server.py* and *redis\_queue\_client.py* in separate terminal windows:

[![example](https://testdriven.io/static/images/blog/simple-task-queue/example.png)](https://testdriven.io/static/images/blog/simple-task-queue/example.png)

[![example](https://testdriven.io/static/images/gifs/blog/simple-task-queue/example.gif)](https://testdriven.io/static/images/gifs/blog/simple-task-queue/example.gif)

> Check your understanding again by adding logging to the above application.

## Conclusion

In this tutorial, we looked at a number of asynchronous task queue implementations in Python. If the requirements are simple enough, it may be easier to develop a queue in this manner. That said, if you're looking for more advanced features -- like task scheduling, batch processing, job prioritization, and retrying of failed tasks -- you should look into a full-blown solution. Check out [Celery](https://docs.celeryq.dev/en/stable/), [RQ](http://python-rq.org/), or [Huey](http://huey.readthedocs.io/).

Grab the final code from the [simple-task-queue](https://github.com/testdrivenio/simple-task-queue) repo.