⚠️ Skipped - page is already crawled
| Filter | Status | Condition | Details |
|---|---|---|---|
| HTTP status | PASS | download_http_code = 200 | HTTP 200 |
| Age cutoff | PASS | download_stamp > now() - 6 MONTH | 0.2 months ago |
| History drop | PASS | isNull(history_drop_reason) | No drop reason |
| Spam/ban | PASS | fh_dont_index != 1 AND ml_spam_score = 0 | ml_spam_score=0 |
| Canonical | PASS | meta_canonical IS NULL OR = '' OR = src_unparsed | Not set |
| Property | Value |
|---|---|
| URL | https://morestina.net/1400/minimalistic-blocking-bounded-queue-for-c |
| Last Crawled | 2026-04-04 18:24:54 (5 days ago) |
| First Indexed | 2025-08-06 14:58:16 (8 months ago) |
| HTTP Status Code | 200 |
| Meta Title | Minimalistic blocking bounded queue in C++ \| More Stina Blog! |
| Meta Description | null |
| Meta Canonical | null |
| Markdown | # [More Stina Blog\!](https://morestina.net/)
[Tech](https://morestina.net/category/tech)
# Minimalistic blocking bounded queue in C++
[January 19, 2020](https://morestina.net/1400/minimalistic-blocking-bounded-queue-for-c)
[Hrvoje](https://morestina.net/author/hrvoje) [4 Comments](https://morestina.net/1400/minimalistic-blocking-bounded-queue-for-c#comments)
While working on speeding up a Python extension written in C++, I needed a queue to distribute work among threads, and possibly to gather their results. What I had in mind was something akin to Python's [`queue`](https://docs.python.org/3/library/queue.html) module — more specifically:
- a **blocking** [MPMC](http://www.1024cores.net/home/lock-free-algorithms/queues) queue with **fixed capacity**;
- that supports `push`, `try_push`, `pop`, and `try_pop`;
- that depends only on the **C++11 standard library**;
- and has a simple and robust **header-only** implementation.
Since I could choose the size of the unit of work arbitrarily, the performance of the queue as measured by microbenchmarks wasn't a priority, as long as it wasn't abysmal. Likewise, the queue capacity would be fairly small, on the order of the number of cores on the system, so the queue wouldn't need to be particularly efficient at allocation — while a ring buffer would be optimal for storage, a linked list or deque would work as well. Other non-requirements included:
- strong exception safety — while certainly an indicator of quality, the work queue did not require it because its payload would consist of smart pointers, either `std::unique_ptr<T>` or `std::shared_ptr<T>`, neither of which throws on move/copy.
- lock-free/wait-free mutation — again, most definitely a plus (so long as the queue can fall back to blocking when necessary), but not a requirement because the workers would spend the majority of time doing actual work.
- timed versions of blocking operations — useful in general, but I didn't need them.
Googling "C++ bounded blocking MPMC queue" gives quite a few results, but surprisingly, most of them don't really meet the requirements set out at the beginning.
For example, [this queue](http://www.1024cores.net/home/lock-free-algorithms/queues/bounded-mpmc-queue) is written by [Dmitry Vyukov](http://www.1024cores.net/home/about-me) and [endorsed by Martin Thompson](https://groups.google.com/d/msg/mechanical-sympathy/yR2imPNvlr0/0QsDoF-3BQAJ), so it's definitely of the highest quality, and way beyond something I (or most people) would be able to come up with on my own. However, it's a lock-free queue that doesn't expose any kind of blocking operation by design. According to the author, waiting for events such as the queue no longer being empty or full is a concern that should be handled outside the queue. This position has a lot of merit, especially in more complex scenarios where you might need to wait on multiple queues at once, but it complicates the use in many simple cases, such as the work distribution I was implementing. In these use cases a thread communicates with only one queue at a time. When the queue is unavailable due to being empty or full, that is a signal of backpressure, and there is nothing else the thread can do but sleep until the situation changes. If done right, this sleep should neither hog the CPU nor introduce unnecessary latency, so it is at least convenient for it to be implemented as part of the queue.
The next implementation that looks really promising is [Erik Rigtorp's MPMCQueue](https://github.com/rigtorp/MPMCQueue). It has a small implementation, supports both blocking and non-blocking variants of enqueue and dequeue operations, and covers inserting elements by move, copy, and emplace. The author claims that it's battle-tested in several high-profile EA games, as well as in a low-latency trading platform. However, a closer look at `push()` and `pop()` betrays a problem with the blocking operations. For example, `pop()` contains the following loop:
```
while (turn(tail) * 2 + 1 != slot.turn.load(std::memory_order_acquire))
;
```
Waiting is implemented with a busy loop: the waiting thread is not put to sleep when it cannot pop, but instead spins until it can. In practice that means that if the producers stall for any reason, such as reading new data from a slow network disk, consumers will angrily pounce on the CPU waiting for new items to appear in the queue. And if the producers are faster than the consumers, then it's the producers who will spend CPU waiting for free space to appear in the queue. In the latter case, the busy-looping producers will actually take CPU cycles away from the workers doing useful work, prolonging the wait. This is not to say that Erik's queue is not of high quality, just that it doesn't fit this use case. I suspect that the applications the queue was designed for very rarely invoke the blocking versions of these operations, and when they do, it is in scenarios where they are confident that the blockage is about to clear up.
Then there are a lot of queue implementations on Stack Overflow and various personal blogs that seem like they could be used, but don't stand up to scrutiny. For example, [this queue](https://vorbrodt.blog/2019/02/03/blocking-queue/) comes up pretty high in search results, definitely supports blocking, and appears to meet the stated requirements. Compiling it does produce a nasty warning, though:
```
warning: operation on 'm_popIndex' may be undefined [-Wsequence-point]
```
Looking at the source, the assignment `m_popIndex = ++m_popIndex % m_size` is an infamous case of [undefined behavior](https://stackoverflow.com/questions/4176328/undefined-behavior-and-sequence-points) in C and pre-C++11 C++. The author probably thought he had found a way to avoid the parentheses in `m_popIndex = (m_popIndex + 1) % m_size`, but C++ doesn't work like that. The problem is easily fixed, but it leaves a bad overall impression. Another issue is that it requires a semaphore implementation, using Boost's inter-process semaphore by default. The page does provide a replacement semaphore, but the two in combination become unreasonably heavyweight: the queue contains no less than three mutexes, two condition variables, and two counters. Simple queues implemented with mutexes and condition variables tend to have a single mutex and one or two condition variables, depending on whether the queue is bounded.
It was at this point that it occurred to me that it would actually be *more* productive to just take a simple and well-tested classic queue and port it to C++ than to review and correct examples found by googling. I started from Python's queue [implementation](https://github.com/python/cpython/blob/cd7db76a636c218b2d81d3526eb435cfae61f212/Lib/queue.py#L27) and ended up with the following:
```
template<typename T>
class queue {
std::deque<T> content;
size_t capacity;
std::mutex mutex;
std::condition_variable not_empty;
std::condition_variable not_full;
queue(const queue &) = delete;
queue(queue &&) = delete;
queue &operator = (const queue &) = delete;
queue &operator = (queue &&) = delete;
public:
queue(size_t capacity): capacity(capacity) {}
void push(T &&item) {
{
std::unique_lock<std::mutex> lk(mutex);
not_full.wait(lk, [this]() { return content.size() < capacity; });
content.push_back(std::move(item));
}
not_empty.notify_one();
}
bool try_push(T &&item) {
{
std::unique_lock<std::mutex> lk(mutex);
if (content.size() == capacity)
return false;
content.push_back(std::move(item));
}
not_empty.notify_one();
return true;
}
void pop(T &item) {
{
std::unique_lock<std::mutex> lk(mutex);
not_empty.wait(lk, [this]() { return !content.empty(); });
item = std::move(content.front());
content.pop_front();
}
not_full.notify_one();
}
bool try_pop(T &item) {
{
std::unique_lock<std::mutex> lk(mutex);
if (content.empty())
return false;
item = std::move(content.front());
content.pop_front();
}
not_full.notify_one();
return true;
}
};
```
Notes on the implementation:
- `push()` and `try_push()` only accept rvalue references, so elements get *moved* into the queue, and if you have them in a variable, you might need to use `queue.push(std::move(el))`. This property allows the use of `std::unique_ptr<T>` as the queue element. If you have copyable types that you want to copy into the queue, you can always use `queue.push(T(el))`.
- `pop()` and `try_pop()` accept a reference to the item rather than returning the item. This provides the strong exception guarantee — if moving the element from the front of the queue to `item` throws, the queue remains unchanged. Yes, I know I said I didn't care for exception guarantees in my use case, but this design is shared by most C++ queues, and the exception guarantee follows from it quite naturally, so it's a good idea to embrace it. I would have preferred `pop()` to just *return* an item, but that would require the C++17 `std::optional` (or equivalent) for the return type of `try_pop()`, and I couldn't use C++17 in the project. The downside of accepting a reference is that to pop an item you must first be able to construct an empty one. Of course, that's not a problem if you use smart pointers, which default-construct to hold `nullptr`, but it's worth mentioning. Again, the same design is used by other C++ queues, so it's apparently an acceptable choice.
- The use of a mutex and condition variables makes this queue utterly boring to people excited about parallel and lock-free programming. Still, those primitives are implemented very efficiently in modern C++, without the need for system calls in the uncontended case. In benchmarks against fancier queues I was never able to measure a difference on the workload of the application I was using.
Is this queue as battle-tested as the ones by Mr Vyukov and Mr Rigtorp? Certainly not! But it does work fine in production, it has passed code review by in-house experts, and most importantly, the code is easy to follow and understand.
The remaining question is — is this a classic example of [NIH](https://en.wikipedia.org/wiki/Not_invented_here)? Is there really no other way than to implement your own queue? How do others do it? To shed some light on this, I invite the reader to read Dmitry Vyukov's [classification of queues](http://www.1024cores.net/home/lock-free-algorithms/queues). In short, he categorizes queues across several dimensions: by whether they support multiple producers, multiple consumers, or both; by the underlying data structure; by maximum size, overflow behavior, support for priorities, ordering guarantees, and many, many others! Differences in these choices vastly affect the implementation, and there is **no** one class that fits all use cases. If you need extremely low latency, definitely look into lock-free queues like the ones from Vyukov or Rigtorp. If your needs are matched by the "classic" blocking queues, then a simple implementation like the one shown in this article might be the right choice. I would still prefer to have one good implementation written by experts that tries to cover the middle ground, for example a C++ equivalent of Rust's [crossbeam channels](https://docs.rs/crossbeam/latest/crossbeam/). ~~But that kind of thing doesn't appear to exist for C++ yet, at least not as a free-standing class.~~
**EDIT**
As pointed out on [reddit](https://www.reddit.com/r/cpp/comments/equle2/account_of_search_for_a_minimalistic_bounded/), the last paragraph is not completely correct. There is a queue implementation that satisfies the requirements and has most of the crossbeam channels functionality: the [moodycamel queue](https://github.com/cameron314/concurrentqueue). I did stumble on that queue when originally researching the topic, but missed it when writing the article later, since it didn't come up on the first page of results when googling for C++ bounded blocking MPMC queue. This queue was too complex to drop into the project I was working on, since its size would have dwarfed the rest of the code. But if size doesn't scare you and you're looking for a full-featured, high-quality queue implementation, do consider that one.
## 4 thoughts on "Minimalistic blocking bounded queue in C++"
1. Pingback: [Ring buffer usage and cache locality: boost lockfree spsc\_queue cache memory access â HPC](https://hpc170063702.wordpress.com/2018/07/04/boost-lockfree-spsc_queue-cache-memory-access/)
2.  **preparer** says:
[January 26, 2021 at 22:01](https://morestina.net/1400/minimalistic-blocking-bounded-queue-for-c#comment-211)
I think you should use notify\_all. For push scenario, notify\_one can notify a thread that is waiting on push while we want to notify a thread on pop. Likewise think about the pop scenario where we want to notify a thread that waiting on push. To achieve this you need to use notify\_all.
1.  **preparer** says:
[January 26, 2021 at 22:04](https://morestina.net/1400/minimalistic-blocking-bounded-queue-for-c#comment-212)
ignore my comment .. I realized you are using one condition\_variable for not\_empty and one for not\_full.
3.  **Hao** says:
[March 12, 2023 at 22:26](https://morestina.net/1400/minimalistic-blocking-bounded-queue-for-c#comment-246)
Thanks for the post! Just posting another version similar to yours using ring buffer (use int as an example but could be easily changed to T)
```
void enqueue(int element) {
{
std::unique_lock<std::mutex> lock(guard_);
not_full_.wait(lock, [this]{
return !this->isFull();
});
data_[back_] = element;
back_ = next(back_);
}
not_empty_.notify_one();
}
int dequeue() {
int payload = 0;
{
std::unique_lock<std::mutex> lock(guard_);
not_empty_.wait(lock, [this]{
return !this->isEmpty();
});
payload = data_[front_];
front_ = next(front_);
}
not_full_.notify_one();
return payload;
}
int size() const noexcept {
if (back_ >= front_) {
return back_ - front_;
} else {
return static_cast<int>(data_.size()) - (front_ - back_);
}
}
private:
bool isFull() const noexcept {
return next(back_) == front_;
}
bool isEmpty() const noexcept {
return back_ == front_;
}
int next(int index) const noexcept {
int next = index + 1;
if (next == static_cast<int>(data_.size())) {
next = 0;
}
return next;
}
```
|
| Readable Markdown | While working on speeding up a Python extension written in C++, I needed a queue to distribute work among threads, and possibly to gather their results. What I had in mind was something akin to Python's [`queue`](https://docs.python.org/3/library/queue.html) module — more specifically:
- a **blocking** [MPMC](http://www.1024cores.net/home/lock-free-algorithms/queues) queue with **fixed capacity**;
- that supports `push`, `try_push`, `pop`, and `try_pop`;
- that depends only on the **C++11 standard library**;
- and has a simple and robust **header-only** implementation.
Since I could choose the size of the unit of work arbitrarily, the performance of the queue as measured by microbenchmarks wasnât a priority, as long as it wasnât abysmal. Likewise, the queue capacity would be fairly small, on the order of the number of cores on the system, so the queue wouldnât need to be particularly efficient at allocation â while a ring buffer would be optimal for storage, a linked list or deque would work as well. Other non-requirements included:
- strong exception safety — while certainly an indicator of quality, the work queue did not require it because its payload would consist of smart pointers, either `std::unique_ptr<T>` or `std::shared_ptr<T>`, neither of which throws on move/copy.
- lock-free/wait-free mutation — again, most definitely a plus (so long as the queue can fall back to blocking when necessary), but not a requirement because the workers would spend the majority of their time doing actual work.
- timed versions of blocking operations — useful in general, but I didn't need them.
Googling *C++ bounded blocking MPMC queue* gives quite a few results, but surprisingly, most of them don't really meet the requirements set out at the beginning.
For example, [this queue](http://www.1024cores.net/home/lock-free-algorithms/queues/bounded-mpmc-queue) is written by [Dmitry Vyukov](http://www.1024cores.net/home/about-me) and [endorsed by Martin Thompson](https://groups.google.com/d/msg/mechanical-sympathy/yR2imPNvlr0/0QsDoF-3BQAJ), so it's definitely of the highest quality, and way beyond something I (or most people) would be able to come up with on my own. However, it's a lock-free queue that doesn't expose any kind of blocking operation by design. According to the author, waiting for events such as the queue no longer being empty or full is a concern that should be handled outside the queue. This position has a lot of merit, especially in more complex scenarios where you might need to wait on multiple queues at once, but it complicates the use in many simple cases, such as the work distribution I was implementing. In these use cases a thread communicates with only one queue at a time. When the queue is unavailable due to being empty or full, that is a signal of backpressure, and there is nothing else the thread can do but sleep until the situation changes. If done right, this sleep should neither hog the CPU nor introduce unnecessary latency, so it is at least convenient to implement it as part of the queue.
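To illustrate what "handling blocking outside the queue" looks like in practice, here is a minimal sketch. The `nonblocking_queue` class and `pop_waiting` helper are hypothetical stand-ins, not Vyukov's code: a queue exposing only `try_push`/`try_pop`, wrapped by a caller-side polling loop that sleeps between attempts. The sleep interval is exactly the latency-versus-CPU trade-off that a built-in blocking `pop` would spare the caller from tuning:

```cpp
#include <chrono>
#include <mutex>
#include <queue>
#include <thread>
#include <cassert>

// Hypothetical non-blocking bounded queue: only try_push/try_pop, no waiting.
template<typename T>
class nonblocking_queue {
    std::queue<T> items;
    std::mutex mutex;
    size_t capacity;
public:
    explicit nonblocking_queue(size_t capacity): capacity(capacity) {}
    bool try_push(T item) {
        std::lock_guard<std::mutex> lk(mutex);
        if (items.size() == capacity)
            return false;
        items.push(std::move(item));
        return true;
    }
    bool try_pop(T &item) {
        std::lock_guard<std::mutex> lk(mutex);
        if (items.empty())
            return false;
        item = std::move(items.front());
        items.pop();
        return true;
    }
};

// Blocking implemented *outside* the queue: poll with a small sleep between
// attempts. Sleeping too little wastes CPU; sleeping too much adds latency.
template<typename T>
T pop_waiting(nonblocking_queue<T> &q) {
    T item;
    while (!q.try_pop(item))
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    return item;
}
```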
The next implementation that looks really promising is [Erik Rigtorp's MPMCQueue](https://github.com/rigtorp/MPMCQueue). It has a small implementation, supports both blocking and non-blocking variants of enqueue and dequeue operations, and covers inserting elements by move, copy, and emplace. The author claims that it's battle-tested in several high-profile EA games, as well as in a low-latency trading platform. However, a closer look at `push()` and `pop()` betrays a problem with the blocking operations. For example, `pop()` contains the following loop:
```cpp
while (turn(tail) * 2 + 1 != slot.turn.load(std::memory_order_acquire))
;
```
Waiting is implemented with a busy loop: the waiting thread is not put to sleep when unable to pop, but instead loops until popping becomes possible. In practice that means that if the producers stall for any reason, such as reading new data from a slow network disk, consumers will angrily pounce on the CPU waiting for new items to appear in the queue. And if the producers are faster than the consumers, then they are the ones who will burn CPU waiting for free space to appear in the queue. In the latter case, the busy-looping producers will actually take CPU cycles away from the workers doing useful work, prolonging the wait. This is not to say that Erik's queue is not of high quality, just that it doesn't fit this use case. I suspect that the applications the queue was designed for very rarely invoke the blocking versions of these operations, and when they do, they do so in scenarios where they are confident that the blockage is about to clear up.
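The alternative to busy-waiting is a condition variable, where the waiting thread sleeps inside the kernel until the other side wakes it. A toy sketch (a single-use "ready" latch, not part of any of the queues discussed) shows the pattern that the full queue below builds on:

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>
#include <cassert>

// A toy "ready" latch: the waiting thread sleeps inside cv.wait() instead of
// spinning, so a stalled producer costs the consumer no CPU time.
class ready_flag {
    std::mutex mutex;
    std::condition_variable cv;
    bool ready = false;
public:
    void set() {
        {
            std::lock_guard<std::mutex> lk(mutex);
            ready = true;
        }
        cv.notify_one();
    }
    void wait_until_set() {
        std::unique_lock<std::mutex> lk(mutex);
        // The predicate form handles spurious wakeups and a set() that
        // happened before we started waiting.
        cv.wait(lk, [this] { return ready; });
    }
};
```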
Then there are a lot of queue implementations on stackoverflow and various personal blogs that seem like they could be used, but don't stand up to scrutiny. For example, [this queue](https://vorbrodt.blog/2019/02/03/blocking-queue/) comes pretty high in search results, definitely supports blocking, and appears to achieve the stated requirements. Compiling it does produce some nasty warnings, though:
```
warning: operation on 'm_popIndex' may be undefined [-Wsequence-point]
```
Looking at the source, the assignment `m_popIndex = ++m_popIndex % m_size` is an infamous case of [undefined behavior](https://stackoverflow.com/questions/4176328/undefined-behavior-and-sequence-points) in C and C++. The author probably thought he had found a way to avoid the parentheses in `m_popIndex = (m_popIndex + 1) % m_size`, but C++ doesn't work like that. The problem is easily fixed, but it leaves a bad overall impression. Another issue is that the queue requires a semaphore implementation, using Boost's inter-process semaphore by default. The page does provide a replacement semaphore, but the two in combination become unreasonably heavyweight: the queue contains no less than three mutexes, two condition variables, and two counters. Simple queues implemented with mutexes and condition variables tend to have a single mutex and one or two condition variables, depending on whether the queue is bounded.
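The fix is to compute the new index in a single expression that modifies the variable only once per statement. A sketch (the `advance` helper name is mine, not from the original code):

```cpp
#include <cstddef>
#include <cassert>

// Well-defined ring-buffer advance: the index is read and written exactly
// once, unlike `index = ++index % size`, which modifies it twice in one
// expression without sequencing (undefined behavior before C++17).
inline std::size_t advance(std::size_t index, std::size_t size) {
    return (index + 1) % size;
}
```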
It was at this point that it occurred to me that it would actually be *more* productive to just take a simple and well-tested classic queue and port it to C++ than to review and correct examples found by googling. I started from Python's queue [implementation](https://github.com/python/cpython/blob/cd7db76a636c218b2d81d3526eb435cfae61f212/Lib/queue.py#L27) and ended up with the following:
```cpp
template<typename T>
class queue {
    std::deque<T> content;
    size_t capacity;

    std::mutex mutex;
    std::condition_variable not_empty;
    std::condition_variable not_full;

    queue(const queue &) = delete;
    queue(queue &&) = delete;
    queue &operator = (const queue &) = delete;
    queue &operator = (queue &&) = delete;

public:
    queue(size_t capacity): capacity(capacity) {}

    void push(T &&item) {
        {
            std::unique_lock<std::mutex> lk(mutex);
            not_full.wait(lk, [this]() { return content.size() < capacity; });
            content.push_back(std::move(item));
        }
        not_empty.notify_one();
    }

    bool try_push(T &&item) {
        {
            std::unique_lock<std::mutex> lk(mutex);
            if (content.size() == capacity)
                return false;
            content.push_back(std::move(item));
        }
        not_empty.notify_one();
        return true;
    }

    void pop(T &item) {
        {
            std::unique_lock<std::mutex> lk(mutex);
            not_empty.wait(lk, [this]() { return !content.empty(); });
            item = std::move(content.front());
            content.pop_front();
        }
        not_full.notify_one();
    }

    bool try_pop(T &item) {
        {
            std::unique_lock<std::mutex> lk(mutex);
            if (content.empty())
                return false;
            item = std::move(content.front());
            content.pop_front();
        }
        not_full.notify_one();
        return true;
    }
};
```
Notes on the implementation:
- `push()` and `try_push()` only accept rvalue references, so elements get *moved* into the queue, and if you have them in a variable, you might need to use `queue.push(std::move(el))`. This property allows the use of `std::unique_ptr<T>` as the queue element. If you have copyable types that you want to copy into the queue, you can always use `queue.push(T(el))`.
- `pop()` and `try_pop()` accept a reference to the item rather than returning the item. This provides the strong exception guarantee — if moving the element from the front of the queue to `item` throws, the queue remains unchanged. Yes, I know I said I didn't care for exception guarantees in my use case, but this design is shared by most C++ queues, and the exception guarantee follows from it quite naturally, so it's a good idea to embrace it. I would have preferred `pop()` to just *return* an item, but that would require the C++17 `std::optional` (or equivalent) for the return type of `try_pop()`, and I couldn't use C++17 in the project. The downside of accepting a reference is that to pop an item you must first be able to construct an empty one. Of course, that's not a problem if you use the smart pointers which default-construct to hold `nullptr`, but it's worth mentioning. Again, the same design is used by other C++ queues, so it's apparently an acceptable choice.
- The use of a mutex and condition variables makes this queue utterly boring to people excited about parallel and lock-free programming. Still, those primitives are implemented very efficiently in modern C++, without the need for system calls in the non-contended case. In benchmarks against fancier queues I was never able to measure a difference on the workload of the application I was using.
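To make the move-only usage in the notes concrete, here is a small producer/consumer sketch. It repeats a condensed copy of the queue (just `push` and `pop`) so the example compiles standalone, and sticks to C++11 (`new` rather than `std::make_unique`); the `sum_through_queue` helper is mine, for illustration only:

```cpp
#include <condition_variable>
#include <deque>
#include <memory>
#include <mutex>
#include <thread>
#include <cassert>

// Condensed copy of the queue above: push/pop only, so the example is standalone.
template<typename T>
class queue {
    std::deque<T> content;
    size_t capacity;
    std::mutex mutex;
    std::condition_variable not_empty;
    std::condition_variable not_full;
public:
    explicit queue(size_t capacity): capacity(capacity) {}
    void push(T &&item) {
        {
            std::unique_lock<std::mutex> lk(mutex);
            not_full.wait(lk, [this]() { return content.size() < capacity; });
            content.push_back(std::move(item));
        }
        not_empty.notify_one();
    }
    void pop(T &item) {
        {
            std::unique_lock<std::mutex> lk(mutex);
            not_empty.wait(lk, [this]() { return !content.empty(); });
            item = std::move(content.front());
            content.pop_front();
        }
        not_full.notify_one();
    }
};

// Producer moves unique_ptrs into the queue; the consumer pops into a
// default-constructed (nullptr) unique_ptr, as the notes describe.
inline int sum_through_queue(int n) {
    queue<std::unique_ptr<int>> q(2);   // small capacity exercises blocking
    std::thread producer([&q, n] {
        for (int i = 1; i <= n; ++i)
            q.push(std::unique_ptr<int>(new int(i)));  // rvalue: moved in
    });
    int total = 0;
    for (int i = 0; i < n; ++i) {
        std::unique_ptr<int> item;      // empty element to pop into
        q.pop(item);
        total += *item;
    }
    producer.join();
    return total;
}
```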
Is this queue as battle-tested as the ones by Mr Vyukov and Mr Rigtorp? Certainly not! But it does work fine in production, it has passed code review by in-house experts, and most importantly, the code is easy to follow and understand.
The remaining question is — is this a classic example of [NIH](https://en.wikipedia.org/wiki/Not_invented_here)? Is there really no other way than to implement your own queue? How do others do it? To shed some light on this, I invite the reader to read Dmitry Vyukov's [classification of queues](http://www.1024cores.net/home/lock-free-algorithms/queues). In short, he categorizes queues across several dimensions: by whether they support multiple producers, multiple consumers, or both; by the underlying data structure; by maximum size, overflow behavior, support for priorities, ordering guarantees, and many, many others! Differences in these choices vastly affect the implementation, and there is **no** one class that fits all use cases. If you need extremely low latency, definitely look into lock-free queues like the ones from Vyukov or Rigtorp. If your needs are met by the "classic" blocking queues, then a simple implementation like the one shown in this article might be the right choice. I would still prefer to have one good implementation written by experts that covers the middle ground, for example a C++ equivalent of Rust's [crossbeam channels](https://docs.rs/crossbeam/latest/crossbeam/). ~~But that kind of thing doesn't appear to exist for C++ yet, at least not as a free-standing class.~~
**EDIT**
As pointed out on [reddit](https://www.reddit.com/r/cpp/comments/equle2/account_of_search_for_a_minimalistic_bounded/), the last paragraph is not completely correct. There is a queue implementation that satisfies the requirements and has most of the crossbeam channels functionality: the [moodycamel queue](https://github.com/cameron314/concurrentqueue). I did stumble on that queue when originally researching queues, but missed it when writing the article later, since it didn't come up on the first page of results when googling for *C++ bounded blocking MPMC queue*. This queue was too complex to drop into the project I was working on, since its size would have dwarfed the rest of the code. But if size doesn't scare you and you're looking for a full-featured, high-quality queue implementation, do consider that one.