| Filter | Status | Condition | Details |
|---|---|---|---|
| HTTP status | PASS | download_http_code = 200 | HTTP 200 |
| Age cutoff | PASS | download_stamp > now() - 6 MONTH | 1.1 months ago |
| History drop | PASS | isNull(history_drop_reason) | No drop reason |
| Spam/ban | PASS | fh_dont_index != 1 AND ml_spam_score = 0 | ml_spam_score=0 |
| Canonical | PASS | meta_canonical IS NULL OR = '' OR = src_unparsed | Not set |
| Property | Value |
|---|---|
| URL | https://enccs.github.io/sycl-workshop/queues-cgs-kernels/ |
| Last Crawled | 2026-03-22 02:00:40 (1 month ago) |
| First Indexed | not set |
| HTTP Status Code | 200 |
| Content | |
| Meta Title | Queues, command groups, and kernels — Heterogeneous programming with SYCL documentation |
| Meta Description | null |
| Meta Canonical | null |
| Markdown |
# Queues, command groups, and kernels
Questions
- How do we organize work in a SYCL application?
Objectives
- Learn how queues describe the ordering of operations.
- Learn how command groups give finer control over how work is submitted.
- Understand that kernels are units of parallelism in SYCL.
SYCL `queue` objects are the abstraction connecting a host program to a single device. The [queue](https://enccs.github.io/sycl-workshop/quick-reference/#term-queue) is a central abstraction in [SYCL](https://www.khronos.org/sycl/): all device code is **submitted** to a queue as *actions*. The runtime **schedules** the actions and executes them **asynchronously**. While scheduling, the runtime keeps track of the prerequisites of each action, for example, the availability of data. Tracking actions and their dependencies is arguably the essence of [SYCL](https://www.khronos.org/sycl/). The SYCL standard models our program as a **task graph**, a set of *nodes* connected by *edges*:
- **Nodes** are actions to be performed on a device, such as the invocation of a kernel or explicit data movements.
- **Edges** are dependencies between the actions and express when it’s legal for a node to execute. Edges arise most often because of data dependencies between nodes.
The task graph is a *directed acyclic graph (DAG)*: it has a well-defined start-to-finish direction and contains no cycles, so no node can depend, directly or indirectly, on itself. The SYCL runtime can resolve dependencies and thus **generate** the task graph. Furthermore, it can schedule how to execute the nodes, *i.e.* the **traversal** of the task graph, completely asynchronously with respect to the host code. We will see in [The task graph: data, dependencies, synchronization](https://enccs.github.io/sycl-workshop/task-graphs-synchronization/#task-graphs-synchronization) how to manually modify the task graph.
Two kinds of actions can be part of the task graph:
Execution of device code
These actions add nodes to the graph that will, eventually, execute device code. They accept kernel code and its execution space as arguments, and you invoke them as methods either directly on the `queue` class or on the `handler` class. They come in three flavors, which represent different abstractions for work distribution in SYCL:
- `single_task`: as the name says, this will execute one single instance of the kernel code.
- `parallel_for`: this will launch a kernel with given work-size specification in a single instruction, multiple threads (SIMT) fashion.
- `parallel_for_work_group`: launches a kernel with hierarchical parallelism. This is only available on the `handler` class.
Explicit memory operations
These actions add nodes to the graph that will, eventually, perform data migrations. You invoke them as methods on the `queue` class directly or on the `handler` class:
- `copy`: copies data.
- `update_host`: updates the host-side copy of the data held in a buffer.
- `fill`: initializes data in a buffer to the given value.
We have given a high-level overview of the abstractions in the execution model: from the queue, through the submission of work described as a data-parallel kernel, to its execution on a device.
But how do we write a kernel?
## Kernels
Kernels are the fundamental building blocks for performing work in a SYCL program. We will only consider two ways of writing kernels in SYCL:
[lambda expressions](https://en.cppreference.com/w/cpp/language/lambda)
Kernels as lambdas are very concise, thanks especially to the *capture* syntax. On the other hand, they cannot be explicitly templated (before C++20) and might be cumbersome to reuse. In some cases, lambdas can be too terse.
\+1 as a lambda
```
[=](id<1> idx) -> void {
  data_acc[idx] += 1;
}
```
[function objects](https://en.cppreference.com/w/cpp/utility/functional)
A kernel is a class that overloads the function call operator, `operator()`. Function objects can be templated, are easily reused, and give full control over what data is passed in and out. On the other hand, they are more verbose.
\+1 as a function object
```
class PlusOne {
public:
  PlusOne(accessor<int> acc) : data_acc_(acc) {}
  // SYCL 2020 requires the call operator of a kernel function object to be const
  void operator()(id<1> idx) const {
    data_acc_[idx] += 1;
  }
private:
  accessor<int> data_acc_;
};
```
There are no technical reasons to prefer one style over the other: the choice ultimately boils down to personal preference. Regardless of the chosen style, kernel code has some restrictions:
- It must have `void` as its return type.
- It cannot use [runtime type identification (RTTI)](https://en.m.wikibooks.org/wiki/C%2B%2B_Programming/RTTI).
- It cannot allocate memory dynamically.
## Queues
One queue maps to one device: the mapping happens upon construction of a `queue` object and cannot be changed subsequently. It is not possible to use a single `queue` object to:
- manage more than one device. The runtime would face ambiguities in deciding which device should actually do the work\!
- spread enqueued work over multiple devices.
While these might appear to be limitations, we are free to declare as many `queue` objects as we like in our programs. It is also valid to create multiple queues targeting the *same* device. Thus, the relation between queues and devices is **many-to-one**.
Work on a device can be enqueued with the shortcut methods described above. For example, we can launch a data-parallel kernel with `parallel_for` invoked on the desired queue object:
Creating work on a device using `queue` shortcuts.
```
auto Q = queue{my_selector{}};
Q.parallel_for(range<1>{sz}, [=](id<1> idx){
  /* kernel code */
});
```
## Command groups
A command group handler gives more control over how code is submitted to the queue. Submission is slightly more verbose, but we get access to features such as hierarchical parallelism. The abstraction for command groups is the class `handler`: these objects are constructed for us by the SYCL runtime. As such, we will meet them only as arguments of the lambda functions passed to the `submit` method of our queues. A command group contains:
- host code, to set up the dependencies of the corresponding node in the task graph. Host code is executed immediately upon submission.
- **exactly one** action of the ones described above. The action executes asynchronously on the device. Parallel work actions will, furthermore, need an execution range and a kernel function.
Creating work on a device using a command group `handler`.
```
auto Q = queue{my_selector{}};
Q.submit([&](handler &cgh){
  /* host code: sets up the dependencies of this node. It executes immediately! */
  accessor acc{B, cgh}; // B is a buffer created beforehand
  /* exactly one of the available actions. It executes asynchronously */
  cgh.parallel_for(range<1>{sz}, [=](id<1> idx){
    /* kernel code */
  });
});
```
`single_task` and streams
We’ll walk through the use of the `single_task` method to create work on a device. As the name suggests, this will create a task for sequential execution: probably not a method you will use often, but definitely something to be aware of! The task we would like to perform is a print-out on the device. If you are familiar with CUDA/HIP, you probably know that `printf` can be used in device code. In keeping with C++, the SYCL standard defines a `stream` class, which works similarly to the standard streams. A SYCL stream needs a `handler` object on construction:
```
auto out = stream(1024, /* maximum size of output per kernel invocation */
                  256,  /* maximum size before flushing the stream */
                  cgh);
```
SYCL streams behave much like standard C++ streams. We can write to a stream using `operator<<`:
```
out << "my message" << sycl::endl; // sycl::stream uses its own manipulators, not std::endl
```
You can find a scaffold for the code in the `content/code/day-1/04_single-task/single-task.cpp` file, alongside the CMake script to build the executable. You will have to complete the source code to compile and run correctly: follow the hints in the source file. A working solution is in the `solution` subfolder.
1. Create a queue object. You’re free to use any of the device selection strategies we have encountered in the previous episode.
2. Submit work to the queue using a command group handler.
3. Create a `stream` object.
4. Create a single task on the `handler` that prints a string to the stream. `single_task` only accepts a kernel that takes no arguments:
```
cgh.single_task([=](){
  /* task code */
});
```
Keypoints
- One queue maps to one device, so there is no ambiguity about which device executes the enqueued work.
- A program can have as many queues as desired. Multiple queues can use the same device: the queue-device mapping is many-to-one.
- Actions can be enqueued by submitting **command groups** using the `handler` class.
- You can also enqueue actions with *shortcut* methods on the `queue` class.
- Work can be enqueued with a command group handler. This gives more flexibility over the definition of the corresponding node in the task graph.
- Kernels are [callables](https://en.cppreference.com/w/cpp/named_req/Callable): either lambda functions or function objects.
- Kernel code can use neither RTTI nor dynamic memory allocation.
***
© Copyright 2021, Roberto Di Remigio and individual contributors. |
| ML Classification | |
| ML Categories | null |
| ML Page Types | null |
| ML Intent Types | null |
| Content Metadata | |
| Language | en |
| Author | null |
| Publish Time | not set |
| Original Publish Time | 2021-12-03 15:36:50 (4 years ago) |
| Republished | No |
| Word Count (Total) | 1,571 |
| Word Count (Content) | 1,441 |
| Links | |
| External Links | 10 |
| Internal Links | 17 |
| Technical SEO | |
| Meta Nofollow | No |
| Meta Noarchive | No |
| JS Rendered | No |
| Redirect Target | null |
| Performance | |
| Download Time (ms) | 49 |
| TTFB (ms) | 49 |
| Download Size (bytes) | 6,986 |
| Shard | 143 (laksa) |
| Root Hash | 2566890010099092343 |
| Unparsed URL | io,github!enccs,/sycl-workshop/queues-cgs-kernels/ s443 |