Introduction
Graphs are fundamental data structures used to represent relationships between objects. In TNL, the TNL::Graphs::Graph class provides a flexible and efficient implementation of graphs that can run on both CPUs and GPUs.
A graph in TNL consists of vertices (also called nodes) and edges connecting pairs of vertices. Edges can be:
- Directed: An edge from vertex A to vertex B is different from an edge from B to A
- Undirected: An edge between A and B can be traversed in both directions
- Weighted: Each edge has an associated weight (numerical value)
Internally, TNL represents graphs using an adjacency matrix, which stores information about which vertices are connected by edges and what weights those edges have.
Graph Orientation: Directed vs. Undirected Graphs
TNL supports both directed and undirected graphs through the type tags TNL::Graphs::DirectedGraph and TNL::Graphs::UndirectedGraph:
- Directed graphs: Each edge has a direction. An edge from vertex u to vertex v does not imply an edge from v to u.
- Undirected graphs: Edges have no direction. If there's an edge between vertices u and v, it can be traversed in both directions.
Sparse vs. Dense Adjacency Matrices
TNL graphs can use either sparse or dense adjacency matrices to store edge information. The choice significantly impacts memory usage and performance.
Sparse Adjacency Matrices And Sparse Graphs (Default)
By default, graphs use sparse matrix representations (typically CSR format). A sparse graph only stores information about edges that actually exist:
The memory usage is proportional to the number of edges in the graph. By default, the sparse adjacency matrix is stored in the CSR format, but any other segments format defined in TNL::Algorithms::Segments can be used instead. For unweighted graphs, using ValueType = bool avoids storing numerical edge weights and reduces memory overhead, as only the presence or absence of edges is represented.
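To make the O(V + E) claim concrete, here is a minimal CSR-style adjacency sketch in plain C++. This is an illustrative stand-in, not TNL code; the CsrGraph type and makeSample helper are hypothetical:

```cpp
#include <vector>

// Minimal CSR-style adjacency sketch: rowOffsets has V+1 entries, while
// targets and weights have one entry per stored edge, so memory is O(V + E).
struct CsrGraph
{
   std::vector< int > rowOffsets;   // size V+1
   std::vector< int > targets;      // size E
   std::vector< double > weights;   // size E

   // Edge lookup scans one row, i.e. it costs O(degree of u).
   bool hasEdge( int u, int v ) const
   {
      for( int i = rowOffsets[ u ]; i < rowOffsets[ u + 1 ]; i++ )
         if( targets[ i ] == v )
            return true;
      return false;
   }
};

// Hypothetical sample with edges (0,1), (0,2) and (2,0):
// vertex 0 owns entries [0,2), vertex 1 owns none, vertex 2 owns entry [2,3).
CsrGraph makeSample()
{
   return CsrGraph{ { 0, 2, 2, 3 }, { 1, 2, 0 }, { 1.0, 2.0, 3.0 } };
}
```

Only existing edges occupy storage, which is exactly why sparse graphs are the default for graphs with few edges per vertex.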
Dense Adjacency Matrices And Dense Graphs
Dense graphs store information about all possible edges between all vertex pairs:
Memory usage is proportional to the square of the number of vertices (O(V²)). This representation is best suited for complete or nearly complete graphs, where most vertex pairs are connected. Missing edges must be represented explicitly by the user (e.g., via a chosen sentinel value or encoding), since the matrix representation assumes that all vertex pairs are present, including diagonal entries representing self-loops. It is therefore the user’s responsibility to omit such edges when they are not desired.
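The trade-off can be illustrated with a minimal dense-adjacency sketch in plain C++. This is illustrative only, not TNL code; the DenseGraph type and the choice of infinity as the missing-edge sentinel are assumptions of this sketch:

```cpp
#include <limits>
#include <vector>

// Dense adjacency sketch: a V x V weight matrix stored row by row, so memory
// is O(V^2) and edge lookup is a single array access, i.e. O(1).
// Missing edges must be encoded explicitly; here infinity is the user-chosen
// sentinel value.
struct DenseGraph
{
   int vertexCount;
   std::vector< double > weights;   // row-major, V*V entries

   static constexpr double missing()
   {
      return std::numeric_limits< double >::infinity();
   }

   explicit DenseGraph( int v ) : vertexCount( v ), weights( v * v, missing() ) {}

   void setEdge( int u, int v, double w )
   {
      weights[ u * vertexCount + v ] = w;
   }

   bool hasEdge( int u, int v ) const
   {
      return weights[ u * vertexCount + v ] != missing();
   }
};
```

Every vertex pair, including the diagonal, has a slot whether or not the edge exists, which mirrors the responsibility noted above: the user must decide how absent edges are encoded.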
Comparison Table
| Aspect | Sparse Graph | Dense Graph |
| Memory | O(V + E) | O(V²) |
| Edge lookup | O(degree) | O(1) |
| Best for | Sparse connectivity | Dense connectivity |
In the following examples, we assume the following graph types:
- Sparse directed graph:
- Sparse undirected graph:
- Dense directed graph:
Constructing Graphs
TNL provides several ways to construct graphs.
Default Constructor
To create an empty graph, the default constructor can be used:
std::cout << "Example 1: Default constructor\n";
GraphType graph1;
std::cout << "Empty graph - vertices: " << graph1.getVertexCount() << ", edges: " << graph1.getEdgeCount() << "\n\n";
Constructor with Vertex Count
Constructor with vertex count creates a graph with specified number of vertices but no edges:
std::cout << "Example 2: Constructor with vertex count\n";
GraphType graph2( 5 );
std::cout << "Graph with 5 vertices - vertices: " << graph2.getVertexCount() << ", edges: " << graph2.getEdgeCount() << "\n\n";
Constructor with Initializer List for Sparse and Dense Graphs
Edges in both sparse and dense graphs can be specified as (sourceIdx, targetIdx, weight) tuples in an initializer list, together with the number of vertices:
std::cout << "Example 3a: Constructor with initializer list (sparse graph)\n";
GraphType graph3a( 5,
                   { { 0, 1, 10.0 }, { 0, 2, 20.0 },
                     { 1, 2, 30.0 }, { 1, 3, 40.0 },
                     { 2, 3, 50.0 },
                     { 3, 0, 60.0 }, { 3, 4, 70.0 } } );
std::cout << "Sparse graph:\n" << graph3a << '\n';
In a sparse graph, unspecified edges are considered missing. In a dense graph, unspecified edges are represented by zero weights.
Constructor with Initializer List for Sparse Undirected Graphs
Edges of sparse undirected graphs can be specified in the same way, but each edge needs to be listed in only one direction:
std::cout << "Example 3b: Undirected graph constructor\n";
UndirectedGraphType graph3b( 4,
                             { { 0, 1, 1.0 }, { 0, 2, 2.0 },
                               { 1, 2, 3.0 },
                               { 2, 3, 4.0 } } );  // last edge reconstructed from the output below
std::cout << "Undirected graph:\n" << graph3b << '\n';
The last parameter specifies the encoding of the edges (see also TNL::Matrices::MatrixElementsEncoding):
- Complete – all edges must be specified explicitly, and symmetry is required.
- SymmetricLower – only the lower triangular part of the adjacency matrix is specified, i.e. edges (u, v) with u > v.
- SymmetricUpper – only the upper triangular part of the adjacency matrix is specified, i.e. edges (u, v) with u < v.
- SymmetricMixed – elements from both the lower and upper triangular parts of the adjacency matrix may be specified; the corresponding symmetric elements are filled automatically. This is the default setting for undirected graphs.
By default, undirected graphs are represented by a symmetric sparse matrix, i.e. a matrix in which only the lower triangular part is stored. This behavior can be changed by explicitly selecting a general adjacency matrix type (see also TNL::Matrices::SparseMatrix).
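The effect of the SymmetricMixed encoding can be sketched in plain C++ (illustrative only, not TNL's implementation; toLowerTriangle is a hypothetical helper): edges given in either orientation end up stored once, in the lower triangle, as a symmetric sparse matrix would keep them:

```cpp
#include <map>
#include <utility>

// Edge keys are (u, v) pairs; weights are the mapped values.
using EdgeMap = std::map< std::pair< int, int >, double >;

// Normalize mixed entries to the lower triangle (u > v). Entries given as
// (u, v) with u < v are flipped; entries already in the lower triangle are
// kept as-is.
EdgeMap toLowerTriangle( const EdgeMap& mixed )
{
   EdgeMap lower;
   for( const auto& [ edge, weight ] : mixed ) {
      auto [ u, v ] = edge;
      if( u < v )
         std::swap( u, v );
      lower[ { u, v } ] = weight;
   }
   return lower;
}
```

This is why only one direction per edge needs to be listed: the symmetric counterpart is implied by the storage scheme.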
Constructor with Initializer List for Dense Graphs
Edges of dense graphs can be specified by providing all edge weights in a two-dimensional structure:
std::cout << "\nExample 3c: Constructor with initializer list (dense adjacency matrix)\n";
DenseGraphType graph3c( { {  0.0, 10.0, 20.0,  0.0,  0.0 },
                          {  0.0,  0.0, 30.0, 40.0,  0.0 },
                          {  0.0,  0.0,  0.0, 50.0,  0.0 },
                          { 60.0,  0.0,  0.0,  0.0, 70.0 },
                          {  0.0,  0.0,  0.0,  0.0,  0.0 } } );
std::cout << "Dense graph (complete graph with weighted edges):\n" << graph3c << '\n';
Constructor with std::map
This constructor builds graphs from a map of edge pairs to weights:
std::cout << "\nExample 4: Constructor with std::map\n";
std::map< std::pair< int, int >, float > edgeMap;  // declaration elided in the source; reconstructed here
edgeMap[ { 0, 1 } ] = 1.5;
edgeMap[ { 0, 2 } ] = 2.5;
edgeMap[ { 1, 2 } ] = 3.5;
edgeMap[ { 2, 3 } ] = 4.5;
GraphType graph4( 4, edgeMap );
std::cout << "Graph from map:\n" << graph4 << '\n';
If the graph is dense, unspecified edges are assigned zero weight. If the graph is undirected, the SymmetricMixed encoding of adjacency matrix elements is required. The encoding can be changed using the third constructor parameter, in the same way as for the constructor that takes an initializer list.
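A small plain-C++ sketch of why a std::map is convenient here (illustrative, not TNL code; buildEdges is a hypothetical helper): the (source, target) pair is the key, so repeated assignments to the same edge overwrite the weight instead of creating a duplicate edge:

```cpp
#include <map>
#include <utility>

// Edge keys are (source, target) pairs; weights are the mapped values.
using EdgeMap = std::map< std::pair< int, int >, double >;

EdgeMap buildEdges()
{
   EdgeMap edges;
   edges[ { 0, 1 } ] = 1.5;
   edges[ { 0, 1 } ] = 2.0;   // overwrites the previous weight, no duplicate edge
   edges[ { 1, 2 } ] = 3.5;
   return edges;
}
```

The map therefore always describes a well-formed edge set, which the graph constructor can consume directly.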
Copy Constructor
The copy constructor creates a copy of another graph:
std::cout << "Example 5: Copy constructor\n";
GraphType graph5( graph3a );
std::cout << "Copied graph - vertices: " << graph5.getVertexCount() << ", edges: " << graph5.getEdgeCount() << "\n\n";
Constructor from Adjacency Matrix
This constructor creates a graph from an existing adjacency matrix:
std::cout << "Example 6: Constructor from adjacency matrix\n";
using MatrixType = typename GraphType::AdjacencyMatrixType;
MatrixType matrix( 3, 3 );
matrix.setElements( { { 0, 1, 1.0 }, { 0, 2, 2.0 },
                      { 1, 2, 3.0 },
                      { 2, 0, 4.0 } } );
GraphType graph6( matrix );
std::cout << "Graph from adjacency matrix:\n" << graph6 << '\n';
The whole example reads as follows:
#include <iostream>
#include <map>
#include <TNL/Graphs/Graph.h>
#include <TNL/Matrices/DenseMatrix.h>
#include <TNL/Devices/Host.h>
#include <TNL/Devices/Cuda.h>
#include <TNL/Devices/Hip.h>
#include <TNL/Containers/Vector.h>

template< typename Device >
void
constructorsExample()
{
   // The definitions of GraphType, UndirectedGraphType and DenseGraphType
   // are elided in the source of this page.

   std::cout << "Example 1: Default constructor\n";
   GraphType graph1;
   std::cout << "Empty graph - vertices: " << graph1.getVertexCount() << ", edges: " << graph1.getEdgeCount() << "\n\n";

   std::cout << "Example 2: Constructor with vertex count\n";
   GraphType graph2( 5 );
   std::cout << "Graph with 5 vertices - vertices: " << graph2.getVertexCount() << ", edges: " << graph2.getEdgeCount() << "\n\n";

   std::cout << "Example 3a: Constructor with initializer list (sparse graph)\n";
   GraphType graph3a( 5,
                      { { 0, 1, 10.0 }, { 0, 2, 20.0 },
                        { 1, 2, 30.0 }, { 1, 3, 40.0 },
                        { 2, 3, 50.0 },
                        { 3, 0, 60.0 }, { 3, 4, 70.0 } } );
   std::cout << "Sparse graph:\n" << graph3a << '\n';

   std::cout << "Example 3b: Undirected graph constructor\n";
   UndirectedGraphType graph3b( 4,
                                { { 0, 1, 1.0 }, { 0, 2, 2.0 },
                                  { 1, 2, 3.0 },
                                  { 2, 3, 4.0 } } );  // last edge reconstructed from the output below
   std::cout << "Undirected graph:\n" << graph3b << '\n';

   std::cout << "\nExample 3c: Constructor with initializer list (dense adjacency matrix)\n";
   DenseGraphType graph3c( { {  0.0, 10.0, 20.0,  0.0,  0.0 },
                             {  0.0,  0.0, 30.0, 40.0,  0.0 },
                             {  0.0,  0.0,  0.0, 50.0,  0.0 },
                             { 60.0,  0.0,  0.0,  0.0, 70.0 },
                             {  0.0,  0.0,  0.0,  0.0,  0.0 } } );
   std::cout << "Dense graph (complete graph with weighted edges):\n" << graph3c << '\n';

   std::cout << "\nExample 4: Constructor with std::map\n";
   std::map< std::pair< int, int >, float > edgeMap;  // declaration elided in the source; reconstructed here
   edgeMap[ { 0, 1 } ] = 1.5;
   edgeMap[ { 0, 2 } ] = 2.5;
   edgeMap[ { 1, 2 } ] = 3.5;
   edgeMap[ { 2, 3 } ] = 4.5;
   GraphType graph4( 4, edgeMap );
   std::cout << "Graph from map:\n" << graph4 << '\n';

   std::cout << "Example 5: Copy constructor\n";
   GraphType graph5( graph3a );
   std::cout << "Copied graph - vertices: " << graph5.getVertexCount() << ", edges: " << graph5.getEdgeCount() << "\n\n";

   std::cout << "Example 6: Constructor from adjacency matrix\n";
   using MatrixType = typename GraphType::AdjacencyMatrixType;
   MatrixType matrix( 3, 3 );
   matrix.setElements( { { 0, 1, 1.0 }, { 0, 2, 2.0 },
                         { 1, 2, 3.0 },
                         { 2, 0, 4.0 } } );
   GraphType graph6( matrix );
   std::cout << "Graph from adjacency matrix:\n" << graph6 << '\n';
}

int
main( int argc, char* argv[] )
{
   std::cout << "Running on host:\n";
   constructorsExample< TNL::Devices::Host >();

#ifdef __CUDACC__
   std::cout << "Running on CUDA device:\n";
   constructorsExample< TNL::Devices::Cuda >();
#endif

#ifdef __HIP__
   constructorsExample< TNL::Devices::Hip >();
#endif

   return EXIT_SUCCESS;
}
And the output looks as follows:
Running on host:
Example 1: Default constructor
Empty graph - vertices: 0, edges: 0
Example 2: Constructor with vertex count
Graph with 5 vertices - vertices: 5, edges: 0
Example 3a: Constructor with initializer list (sparse graph)
Sparse graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40
Row: 2 -> 3:50
Row: 3 -> 0:60 4:70
Row: 4 ->
Example 3b: Undirected graph constructor
Undirected graph:
Row: 0 -> 1:1 2:2
Row: 1 -> 0:1 2:3
Row: 2 -> 0:2 1:3 3:4
Row: 3 -> 2:4
Example 3c: Constructor with initializer list (dense adjacency matrix)
Dense graph (complete graph with weighted edges):
Row: 0 -> 0:0 1:10 2:20 3:0 4:0
Row: 1 -> 0:0 1:0 2:30 3:40 4:0
Row: 2 -> 0:0 1:0 2:0 3:50 4:0
Row: 3 -> 0:60 1:0 2:0 3:0 4:70
Row: 4 -> 0:0 1:0 2:0 3:0 4:0
Example 4: Constructor with std::map
Graph from map:
Row: 0 -> 1:1.5 2:2.5
Row: 1 -> 2:3.5
Row: 2 -> 3:4.5
Row: 3 ->
Example 5: Copy constructor
Copied graph - vertices: 5, edges: 7
Example 6: Constructor from adjacency matrix
Graph from adjacency matrix:
Row: 0 -> 1:1 2:2
Row: 1 -> 2:3
Row: 2 -> 0:4
Running on CUDA device:
Example 1: Default constructor
Empty graph - vertices: 0, edges: 0
Example 2: Constructor with vertex count
Graph with 5 vertices - vertices: 5, edges: 0
Example 3a: Constructor with initializer list (sparse graph)
Sparse graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40
Row: 2 -> 3:50
Row: 3 -> 0:60 4:70
Row: 4 ->
Example 3b: Undirected graph constructor
Undirected graph:
Row: 0 -> 1:1 2:2
Row: 1 -> 0:1 2:3
Row: 2 -> 0:2 1:3 3:4
Row: 3 -> 2:4
Example 3c: Constructor with initializer list (dense adjacency matrix)
Dense graph (complete graph with weighted edges):
Row: 0 -> 0:0 1:10 2:20 3:0 4:0
Row: 1 -> 0:0 1:0 2:30 3:40 4:0
Row: 2 -> 0:0 1:0 2:0 3:50 4:0
Row: 3 -> 0:60 1:0 2:0 3:0 4:70
Row: 4 -> 0:0 1:0 2:0 3:0 4:0
Example 4: Constructor with std::map
Graph from map:
Row: 0 -> 1:1.5 2:2.5
Row: 1 -> 2:3.5
Row: 2 -> 3:4.5
Row: 3 ->
Example 5: Copy constructor
Copied graph - vertices: 5, edges: 7
Example 6: Constructor from adjacency matrix
Graph from adjacency matrix:
Row: 0 -> 1:1 2:2
Row: 1 -> 2:3
Row: 2 -> 0:4
Setting Edges
After creating a graph, you can set or modify its edges using the setEdges and setDenseEdges methods. These methods behave similarly to the corresponding constructors.
Setting Edges with Initializer List (Sparse)
For sparse and dense graphs:
std::cout << "Example 1: setEdges with initializer list (sparse graph)\n";
GraphType graph1;
graph1.setVertexCount( 5 );
graph1.setEdges( { { 0, 1, 10.0 }, { 0, 2, 20.0 },
                   { 1, 2, 30.0 }, { 1, 3, 40.0 }, { 1, 4, 50.0 },
                   { 2, 3, 60.0 },
                   { 3, 0, 70.0 }, { 3, 4, 80.0 } } );
std::cout << "Sparse graph after setEdges:\n" << graph1 << '\n';
The method TNL::Graphs::Graph::setEdges also accepts a matrix elements encoding (see TNL::Matrices::MatrixElementsEncoding) as the second parameter, following the initializer list. This is particularly useful when setting undirected graphs.
Setting Edges with Initializer List (Dense)
For dense graphs, the method TNL::Graphs::Graph::setDenseEdges sets the graph edges based on the complete adjacency matrix:
std::cout << "\nExample 1b: setEdges with initializer list (dense adjacency matrix)\n";
// The definition of DenseGraphType is elided in the source.
DenseGraphType graph1b;
graph1b.setVertexCount( 5 );
graph1b.setDenseEdges( { {  0.0, 10.0, 20.0,  0.0,  0.0 },
                         {  0.0,  0.0, 30.0, 40.0, 50.0 },
                         {  0.0,  0.0,  0.0, 60.0,  0.0 },
                         { 70.0,  0.0,  0.0,  0.0, 80.0 },
                         {  0.0,  0.0,  0.0,  0.0,  0.0 } } );
std::cout << "Dense graph after setDenseEdges:\n" << graph1b << '\n';
Setting Edges with std::map
The graph edges can also be set using a std::map:
std::cout << "Example 2: setEdges with std::map\n";
GraphType graph2;
graph2.setVertexCount( 4 );
std::map< std::pair< int, int >, float > edgeMap;  // declaration elided in the source; reconstructed here
edgeMap[ { 0, 1 } ] = 1.5;
edgeMap[ { 0, 2 } ] = 2.5;
edgeMap[ { 1, 2 } ] = 3.5;
edgeMap[ { 1, 3 } ] = 4.5;
edgeMap[ { 2, 3 } ] = 5.5;
edgeMap[ { 3, 0 } ] = 6.5;
graph2.setEdges( edgeMap );
std::cout << "Graph from map:\n" << graph2 << '\n';
Graph Traversal
TNL provides powerful parallel traversal capabilities through the functions TNL::Graphs::forVertices, TNL::Graphs::forEdges, and related utilities. A graph can be traversed either by vertices or by edges. Vertex-based traversal is performed using a vertex view.
The following provides an overview of graph traversal functions:
- TNL distinguishes between vertex-based and edge-based traversal. Vertex-based traversal assigns at most one thread to each vertex, while edge-based traversal enables finer-grained parallelism by distributing work across edges.
| Category | Operates On | Lambda Parameter | Use Case |
| Edge-wise (forEdges, forAllEdges) | Individual edges | Edge indices & weights | Operate on each edge separately |
| Vertex-wise (forVertices, forAllVertices) | Individual vertices | VertexView object | Operate on vertices |
- The scope of vertex traversal can be defined in several ways: all vertices, a vertex range, an explicit list of vertex indices, or a condition on the vertex index.
| Scope | Vertices Processed | Parameters |
| All | All vertices | No range/array parameters |
| Range | Vertices in [begin, end) | begin and end indices |
| Array | Specific vertices | Array of vertex indices |
| If | Vertices satisfying a condition | Condition lambda on the vertex index |
- All functions can be called for both constant and non-constant graphs.
| Category | Graph Modifiable? | Use Case |
| Non-const | Yes | Can modify graph edges and structure |
| Const | No | Read-only access to graph vertices and edges |
See also Overview of Graph Traversal Functions for more details.
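The vertex-wise vs. edge-wise distinction can be illustrated with a serial plain-C++ stand-in for the parallel TNL functions (the Csr type and both functions are hypothetical sketches): the same reduction is driven either per vertex, with cost proportional to each vertex's degree, or per edge, with uniform cost per iteration:

```cpp
#include <numeric>
#include <vector>

// Minimal CSR-like edge storage: rowOffsets has V+1 entries, weights has one
// entry per stored edge.
struct Csr
{
   std::vector< int > rowOffsets;
   std::vector< double > weights;
};

// "One thread per vertex": each outer iteration does as much work as the
// vertex's degree, so skewed degree distributions lead to load imbalance.
double sumByVertices( const Csr& g )
{
   double sum = 0;
   const int vertexCount = static_cast< int >( g.rowOffsets.size() ) - 1;
   for( int u = 0; u < vertexCount; u++ )
      for( int i = g.rowOffsets[ u ]; i < g.rowOffsets[ u + 1 ]; i++ )
         sum += g.weights[ i ];
   return sum;
}

// "One thread per edge": every iteration costs the same, so work is balanced
// even when a few vertices have very high degrees.
double sumByEdges( const Csr& g )
{
   return std::accumulate( g.weights.begin(), g.weights.end(), 0.0 );
}
```

Both drivers compute the same result; the difference is only in how the work is distributed, which is exactly what separates forVertices-style from forEdges-style traversal.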
Traversing By Vertices
A TNL::Graphs::GraphVertexView provides access to a single vertex and its outgoing edges. Vertex views are the primary way to interact with graph structure in parallel GPU code.
Traversing All Vertices
The function TNL::Graphs::forAllVertices iterates over all vertices in parallel:
auto processVertex = [] __cuda_callable__ ( typename GraphType::VertexView vertex ) mutable
{
   for( int i = 0; i < vertex.getDegree(); i++ )
      vertex.setEdge( i, ( vertex.getTargetIndex( i ) + 1 ) % 5, vertex.getEdgeWeight( i ) + 5 );
};
If the function is called with a constant graph, typename GraphType::VertexView needs to be replaced with typename GraphType::ConstVertexView.
The whole example reads as:
#include <iostream>
#include <TNL/Graphs/Graph.h>
#include <TNL/Graphs/traverse.h>
#include <TNL/Devices/Host.h>
#include <TNL/Devices/Cuda.h>

template< typename Device >
void
forAllVerticesExample()
{
   // The definition of GraphType is elided in the source of this page.

   GraphType graph( 5,
                    { { 0, 1, 10.0 }, { 0, 2, 20.0 },
                      { 1, 2, 30.0 }, { 1, 3, 40.0 }, { 1, 4, 50.0 },
                      { 2, 3, 60.0 },
                      { 3, 0, 70.0 }, { 3, 4, 80.0 } } );
   std::cout << "Graph:\n" << graph << '\n';

   auto processVertex = [] __cuda_callable__ ( typename GraphType::VertexView vertex ) mutable
   {
      for( int i = 0; i < vertex.getDegree(); i++ )
         vertex.setEdge( i, ( vertex.getTargetIndex( i ) + 1 ) % 5, vertex.getEdgeWeight( i ) + 5 );
   };
   TNL::Graphs::forAllVertices( graph, processVertex );

   std::cout << "Modified graph:\n" << graph << '\n';
}

int
main( int argc, char* argv[] )
{
   std::cout << "Running on host:\n";
   forAllVerticesExample< TNL::Devices::Host >();

#ifdef __CUDACC__
   std::cout << "Running on CUDA device:\n";
   forAllVerticesExample< TNL::Devices::Cuda >();
#endif

   return EXIT_SUCCESS;
}
and the output reads as:
Running on host:
Graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Modified graph:
Row: 0 -> 2:15 3:25
Row: 1 -> 3:35 4:45 0:55
Row: 2 -> 4:65
Row: 3 -> 1:75 0:85
Row: 4 ->
Running on CUDA device:
Graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Modified graph:
Row: 0 -> 2:15 3:25
Row: 1 -> 3:35 4:45 0:55
Row: 2 -> 4:65
Row: 3 -> 1:75 0:85
Row: 4 ->
Traversing a Range of Vertices
Traversing vertices in a specific range [begin, end) can be done using the function TNL::Graphs::forVertices:
auto processVertex = [] __cuda_callable__ ( typename GraphType::VertexView vertex ) mutable
{
   for( int i = 0; i < vertex.getDegree(); i++ )
      vertex.setEdge( i, ( vertex.getTargetIndex( i ) + 1 ) % 5, vertex.getEdgeWeight( i ) + 5 );
};
If the function is called with a constant graph, typename GraphType::VertexView needs to be replaced with typename GraphType::ConstVertexView.
The whole example reads as:
#include <iostream>
#include <TNL/Graphs/Graph.h>
#include <TNL/Graphs/traverse.h>
#include <TNL/Devices/Host.h>
#include <TNL/Devices/Cuda.h>

template< typename Device >
void
forVerticesExample()
{
   // The definition of GraphType is elided in the source of this page.

   GraphType graph( 5,
                    { { 0, 1, 10.0 }, { 0, 2, 20.0 },
                      { 1, 2, 30.0 }, { 1, 3, 40.0 }, { 1, 4, 50.0 },
                      { 2, 3, 60.0 },
                      { 3, 0, 70.0 }, { 3, 4, 80.0 } } );
   std::cout << "Graph:\n" << graph << '\n';

   auto processVertex = [] __cuda_callable__ ( typename GraphType::VertexView vertex ) mutable
   {
      for( int i = 0; i < vertex.getDegree(); i++ )
         vertex.setEdge( i, ( vertex.getTargetIndex( i ) + 1 ) % 5, vertex.getEdgeWeight( i ) + 5 );
   };
   // The exact range used in the source is elided; the output below shows
   // that vertex 0 is skipped.
   TNL::Graphs::forVertices( graph, 1, 4, processVertex );

   std::cout << "Modified graph:\n" << graph << '\n';
}

int
main( int argc, char* argv[] )
{
   std::cout << "Running on host:\n";
   forVerticesExample< TNL::Devices::Host >();

#ifdef __CUDACC__
   std::cout << "Running on CUDA device:\n";
   forVerticesExample< TNL::Devices::Cuda >();
#endif

   return EXIT_SUCCESS;
}
and the output reads as:
Running on host:
Graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Modified graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 3:35 4:45 0:55
Row: 2 -> 4:65
Row: 3 -> 1:75 0:85
Row: 4 ->
Running on CUDA device:
Graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Modified graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 3:35 4:45 0:55
Row: 2 -> 4:65
Row: 3 -> 1:75 0:85
Row: 4 ->
Traversing Vertices with Given Indices
The function TNL::Graphs::forVertices also allows traversing vertices with explicitly specified indices, supplied as an array. The vertices with these indices can be traversed as follows:
auto processVertex = [] __cuda_callable__ ( typename GraphType::VertexView vertex ) mutable
{
   for( int i = 0; i < vertex.getDegree(); i++ )
      vertex.setEdge( i, ( vertex.getTargetIndex( i ) + 1 ) % 5, vertex.getEdgeWeight( i ) + 5 );
};
If the function is called with a constant graph, typename GraphType::VertexView needs to be replaced with typename GraphType::ConstVertexView.
The whole example reads as:
#include <iostream>
#include <TNL/Graphs/Graph.h>
#include <TNL/Graphs/traverse.h>
#include <TNL/Devices/Host.h>
#include <TNL/Devices/Cuda.h>
#include <TNL/Containers/Vector.h>

template< typename Device >
void
forVerticesWithIndexesExample()
{
   // The definition of GraphType is elided in the source of this page.

   GraphType graph( 5,
                    { { 0, 1, 10.0 }, { 0, 2, 20.0 },
                      { 1, 2, 30.0 }, { 1, 3, 40.0 }, { 1, 4, 50.0 },
                      { 2, 3, 60.0 },
                      { 3, 0, 70.0 }, { 3, 4, 80.0 } } );
   std::cout << "Graph:\n" << graph << '\n';

   // The definition of the index array is elided in the source; it is
   // reconstructed here from the output below, where vertices 0 and 2 are modified.
   TNL::Containers::Vector< int, Device > vertexIndexes{ 0, 2 };

   auto processVertex = [] __cuda_callable__ ( typename GraphType::VertexView vertex ) mutable
   {
      for( int i = 0; i < vertex.getDegree(); i++ )
         vertex.setEdge( i, ( vertex.getTargetIndex( i ) + 1 ) % 5, vertex.getEdgeWeight( i ) + 5 );
   };
   TNL::Graphs::forVertices( graph, vertexIndexes, processVertex );

   std::cout << "Modified graph:\n" << graph << '\n';
}

int
main( int argc, char* argv[] )
{
   std::cout << "Running on host:\n";
   forVerticesWithIndexesExample< TNL::Devices::Host >();

#ifdef __CUDACC__
   std::cout << "Running on CUDA device:\n";
   forVerticesWithIndexesExample< TNL::Devices::Cuda >();
#endif

   return EXIT_SUCCESS;
}
and the output reads as:
Running on host:
Graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Modified graph:
Row: 0 -> 2:15 3:25
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 4:65
Row: 3 -> 0:70 4:80
Row: 4 ->
Running on CUDA device:
Graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Modified graph:
Row: 0 -> 2:15 3:25
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 4:65
Row: 3 -> 0:70 4:80
Row: 4 ->
Traversing Vertices with a Condition
Use forAllVerticesIf to process only vertices that meet criteria specified by the lambda function condition:
// The capture and header of this lambda are elided in the source.
auto condition = [=] __cuda_callable__ ( int vertexIdx ) -> bool
{
   return graph.getVertexDegree( vertexIdx ) > 1;
};
The lambda function takes a vertex index and returns true for vertices that should be traversed; all other vertices are skipped. The function forAllVerticesIf is invoked as follows:
auto processVertex = [] __cuda_callable__ ( typename GraphType::VertexView vertex ) mutable
{
   for( int i = 0; i < vertex.getDegree(); i++ )
      vertex.setEdge( i, ( vertex.getTargetIndex( i ) + 1 ) % 5, vertex.getEdgeWeight( i ) + 5 );
};
There is also an alternative function, TNL::Graphs::forVerticesIf, which accepts a range of vertex indices [begin, end). Only vertices within this range are considered for traversal.
If the function is called with a constant graph, typename GraphType::VertexView needs to be replaced with typename GraphType::ConstVertexView.
The complete example reads as:
#include <iostream>
#include <TNL/Graphs/Graph.h>
#include <TNL/Graphs/traverse.h>
#include <TNL/Devices/Host.h>
#include <TNL/Devices/Cuda.h>

template< typename Device >
void
forAllVerticesIfExample()
{
   // The definition of GraphType is elided in the source of this page.

   GraphType graph( 5,
                    { { 0, 1, 10.0 }, { 0, 2, 20.0 },
                      { 1, 2, 30.0 }, { 1, 3, 40.0 }, { 1, 4, 50.0 },
                      { 2, 3, 60.0 },
                      { 3, 0, 70.0 }, { 3, 4, 80.0 } } );
   std::cout << "Graph:\n" << graph << '\n';

   // The capture and header of this lambda are elided in the source.
   auto condition = [=] __cuda_callable__ ( int vertexIdx ) -> bool
   {
      return graph.getVertexDegree( vertexIdx ) > 1;
   };

   auto processVertex = [] __cuda_callable__ ( typename GraphType::VertexView vertex ) mutable
   {
      for( int i = 0; i < vertex.getDegree(); i++ )
         vertex.setEdge( i, ( vertex.getTargetIndex( i ) + 1 ) % 5, vertex.getEdgeWeight( i ) + 5 );
   };
   TNL::Graphs::forAllVerticesIf( graph, condition, processVertex );

   std::cout << "Modified graph:\n" << graph << '\n';
}

int
main( int argc, char* argv[] )
{
   std::cout << "Running on host:\n";
   forAllVerticesIfExample< TNL::Devices::Host >();

#ifdef __CUDACC__
   std::cout << "Running on CUDA device:\n";
   forAllVerticesIfExample< TNL::Devices::Cuda >();
#endif

   return EXIT_SUCCESS;
}
The output looks as follows:
Running on host:
Graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Modified graph:
Row: 0 -> 2:15 3:25
Row: 1 -> 3:35 4:45 0:55
Row: 2 -> 3:60
Row: 3 -> 1:75 0:85
Row: 4 ->
Running on CUDA device:
Graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Modified graph:
Row: 0 -> 2:15 3:25
Row: 1 -> 3:35 4:45 0:55
Row: 2 -> 3:60
Row: 3 -> 1:75 0:85
Row: 4 ->
Traversing By Edges
Edges can also be accessed through vertex views, as shown above; however, in that mode each vertex is processed by at most one thread. For a higher degree of parallelism, the specialized edge-wise traversal functions must be used.
Traversing All Edges
The TNL::Graphs::forAllEdges function iterates over all edges in the graph:
auto modifyEdge = [] __cuda_callable__ ( int sourceIdx, int localIdx, int& targetIdx, float& weight ) mutable
{
   targetIdx = ( targetIdx + 1 ) % 5;
   weight += 5;
};
The localIdx index specifies the rank of an edge within the set of all edges incident to the vertex sourceIdx. Note that for sparse, non-constant graphs, both targetIdx and weight can be modified. For dense graphs, only weight can be modified, and the localIdx index is equal to targetIdx.
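As an aside, the relationship between sourceIdx, localIdx and targetIdx for a CSR-like layout can be sketched in plain C++ (illustrative only, not TNL code; CsrRow is a hypothetical type):

```cpp
#include <vector>

// localIdx is the position of an edge within the edge list of its source
// vertex; targetIdx is the global index of the vertex the edge points to.
struct CsrRow
{
   std::vector< int > rowOffsets;   // size V+1
   std::vector< int > targets;      // size E

   int targetOf( int sourceIdx, int localIdx ) const
   {
      return targets[ rowOffsets[ sourceIdx ] + localIdx ];
   }
};
```

So for a vertex of degree d, localIdx runs over 0..d-1 regardless of which vertices the edges actually point to.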
The whole example reads as follows:
#include <iostream>
#include <TNL/Graphs/Graph.h>
#include <TNL/Graphs/traverse.h>
#include <TNL/Devices/Host.h>
#include <TNL/Devices/Cuda.h>

template< typename Device >
void
forAllEdgesExample()
{
   // The definition of GraphType is elided in the source of this page.

   GraphType graph( 5,
                    { { 0, 1, 10.0 }, { 0, 2, 20.0 },
                      { 1, 2, 30.0 }, { 1, 3, 40.0 }, { 1, 4, 50.0 },
                      { 2, 3, 60.0 },
                      { 3, 0, 70.0 }, { 3, 4, 80.0 } } );
   std::cout << "Graph:\n" << graph << '\n';

   auto modifyEdge = [] __cuda_callable__ ( int sourceIdx, int localIdx, int& targetIdx, float& weight ) mutable
   {
      targetIdx = ( targetIdx + 1 ) % 5;
      weight += 5;
   };
   TNL::Graphs::forAllEdges( graph, modifyEdge );

   std::cout << "Modified graph:\n" << graph << '\n';
}

int
main( int argc, char* argv[] )
{
   std::cout << "Running on host:\n";
   forAllEdgesExample< TNL::Devices::Host >();

#ifdef __CUDACC__
   std::cout << "Running on CUDA device:\n";
   forAllEdgesExample< TNL::Devices::Cuda >();
#endif

   return EXIT_SUCCESS;
}
The output looks as follows:
Running on host:
Graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Modified graph:
Row: 0 -> 2:15 3:25
Row: 1 -> 3:35 4:45 0:55
Row: 2 -> 4:65
Row: 3 -> 1:75 0:85
Row: 4 ->
Running on CUDA device:
Graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Modified graph:
Row: 0 -> 2:15 3:25
Row: 1 -> 3:35 4:45 0:55
Row: 2 -> 4:65
Row: 3 -> 1:75 0:85
Row: 4 ->
Traversing Edges in a Range
Traversing only edges connected to vertices in a specific range [begin, end) can be done as follows:
auto modifyEdge = [] __cuda_callable__ ( int sourceIdx, int localIdx, int& targetIdx, float& weight ) mutable
{
   targetIdx = ( targetIdx + 1 ) % 5;
   weight += 5;
};
The whole example reads as follows:
#include <iostream>
#include <TNL/Graphs/Graph.h>
#include <TNL/Graphs/traverse.h>
#include <TNL/Devices/Host.h>
#include <TNL/Devices/Cuda.h>

template< typename Device >
void
forAllEdgesExample()
{
   // The definition of GraphType is elided in the source of this page.

   GraphType graph( 5,
                    { { 0, 1, 10.0 }, { 0, 2, 20.0 },
                      { 1, 2, 30.0 }, { 1, 3, 40.0 }, { 1, 4, 50.0 },
                      { 2, 3, 60.0 },
                      { 3, 0, 70.0 }, { 3, 4, 80.0 } } );
   std::cout << "Graph:\n" << graph << '\n';

   auto modifyEdge = [] __cuda_callable__ ( int sourceIdx, int localIdx, int& targetIdx, float& weight ) mutable
   {
      targetIdx = ( targetIdx + 1 ) % 5;
      weight += 5;
   };
   // The exact range used in the source is elided; the output below shows
   // that the edges of vertex 0 are skipped.
   TNL::Graphs::forEdges( graph, 1, 4, modifyEdge );

   std::cout << "Modified graph:\n" << graph << '\n';
}

int
main( int argc, char* argv[] )
{
   std::cout << "Running on host:\n";
   forAllEdgesExample< TNL::Devices::Host >();

#ifdef __CUDACC__
   std::cout << "Running on CUDA device:\n";
   forAllEdgesExample< TNL::Devices::Cuda >();
#endif

   return EXIT_SUCCESS;
}
The output looks as follows:
Running on host:
Graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Modified graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 3:35 4:45 0:55
Row: 2 -> 4:65
Row: 3 -> 1:75 0:85
Row: 4 ->
Running on CUDA device:
Graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Modified graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 3:35 4:45 0:55
Row: 2 -> 4:65
Row: 3 -> 1:75 0:85
Row: 4 ->
Traversing Edges for Specific Vertices
The function TNL::Graphs::forEdges can traverse only edges of vertices with explicitly specified indices, similarly to TNL::Graphs::forVertices.
The vertex indices are supplied as an array, and the traversal itself is performed as follows:
auto modifyEdge = [] __cuda_callable__ ( int sourceIdx, int localIdx, int targetIdx, float weight ) mutable
{
   targetIdx = ( targetIdx + 1 ) % 5;
   weight += 5;
};
Note that targetIdx and weight are taken by value here, so the assignments have no effect on the graph; the output below therefore shows an unchanged graph.
The whole example reads as follows:
#include <iostream>
#include <TNL/Graphs/Graph.h>
#include <TNL/Graphs/traverse.h>
#include <TNL/Devices/Host.h>
#include <TNL/Devices/Cuda.h>
#include <TNL/Containers/Vector.h>

template< typename Device >
void
forEdgesWithIndexesExample()
{
   // The definition of GraphType is elided in the source of this page.

   GraphType graph( 5,
                    { { 0, 1, 10.0 }, { 0, 2, 20.0 },
                      { 1, 2, 30.0 }, { 1, 3, 40.0 }, { 1, 4, 50.0 },
                      { 2, 3, 60.0 },
                      { 3, 0, 70.0 }, { 3, 4, 80.0 } } );
   std::cout << "Graph:\n" << graph << '\n';

   // The definition of the index array is elided in the source.
   TNL::Containers::Vector< int, Device > vertexIndexes;

   auto modifyEdge = [] __cuda_callable__ ( int sourceIdx, int localIdx, int targetIdx, float weight ) mutable
   {
      targetIdx = ( targetIdx + 1 ) % 5;
      weight += 5;
   };
   // Call reconstructed; targetIdx and weight are taken by value, so the
   // graph stays unchanged in the output below.
   TNL::Graphs::forEdges( graph, vertexIndexes, modifyEdge );

   std::cout << "Modified graph:\n" << graph << '\n';
}

int
main( int argc, char* argv[] )
{
   std::cout << "Running on host:\n";
   forEdgesWithIndexesExample< TNL::Devices::Host >();

#ifdef __CUDACC__
   std::cout << "Running on CUDA device:\n";
   forEdgesWithIndexesExample< TNL::Devices::Cuda >();
#endif

   return EXIT_SUCCESS;
}
The output looks as follows:
Running on host:
Graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Modified graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Running on CUDA device:
Graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Modified graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Conditional Edge Traversal
For conditional traversal of edges, the function TNL::Graphs::forAllEdgesIf can be used. The condition is expressed by a lambda function condition:
auto condition = [] __cuda_callable__( int vertexIdx ) -> bool
{
   return vertexIdx % 2 == 0;
};
The lambda function takes a vertex index and returns true if the edges incident to that vertex should be traversed; otherwise, it returns false. The traversal is executed as follows:
auto modifyEdge = [] __cuda_callable__( int sourceIdx, int localIdx, int targetIdx, float weight ) mutable
{
   targetIdx = ( targetIdx + 1 ) % 5;
   weight += 5;
};
The function TNL::Graphs::forEdgesIf behaves in the same way and additionally accepts a range of vertex indices to be traversed.
The whole example reads as follows:
#include <iostream>
#include <TNL/Graphs/Graph.h>
#include <TNL/Graphs/traverse.h>
#include <TNL/Devices/Host.h>
#include <TNL/Devices/Cuda.h>

template< typename Device >
void
forAllEdgesIfExample()
{
   // Graph type alias (the exact template parameters are assumed here).
   using GraphType = TNL::Graphs::Graph< float, Device >;

   GraphType graph( 5,
      {
         { 0, 1, 10.0 }, { 0, 2, 20.0 },
         { 1, 2, 30.0 }, { 1, 3, 40.0 }, { 1, 4, 50.0 },
         { 2, 3, 60.0 },
         { 3, 0, 70.0 }, { 3, 4, 80.0 } } );

   std::cout << "Graph:\n" << graph << '\n';

   // Only edges of vertices satisfying the condition are traversed.
   auto condition = [] __cuda_callable__( int vertexIdx ) -> bool
   {
      return vertexIdx % 2 == 0;
   };

   // The edge parameters are taken by value here, so the graph itself
   // is not changed by the traversal.
   auto modifyEdge = [] __cuda_callable__( int sourceIdx, int localIdx, int targetIdx, float weight ) mutable
   {
      targetIdx = ( targetIdx + 1 ) % 5;
      weight += 5;
   };

   TNL::Graphs::forAllEdgesIf( graph, condition, modifyEdge );

   std::cout << "Modified graph:\n" << graph << '\n';
}

int
main( int argc, char* argv[] )
{
   std::cout << "Running on host:\n";
   forAllEdgesIfExample< TNL::Devices::Host >();

#ifdef __CUDACC__
   std::cout << "Running on CUDA device:\n";
   forAllEdgesIfExample< TNL::Devices::Cuda >();
#endif

   return EXIT_SUCCESS;
}
The output looks as follows:
Running on host:
Graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Modified graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Running on CUDA device:
Graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Modified graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Const vs. Non-Const Edge Traversal
For constant graphs, edge data is read-only: the traversal lambdas receive the edge weight by value or by const reference and cannot modify the graph.
Reductions on Graphs
Graph reductions allow efficient computation of aggregate values over edges or vertices. Such computations can also be implemented using vertex traversal functions (TNL::Graphs::forAllVertices, TNL::Graphs::forVertices, TNL::Graphs::forAllVerticesIf, TNL::Graphs::forVerticesIf). However, to achieve a higher level of parallelism, TNL provides specialized reduction operations for graph data structures.
The following provides an overview of graph reduction functions:
- Reductions can be performed either in a basic mode or with arguments, which additionally return the position of the edge being searched for.

| Category | Tracks Position? | Use Case |
| --- | --- | --- |
| Basic | No | Only the reduced weight is needed (e.g., vertex sum, vertex max) |
| WithArgument | Yes | Both the weight and the target vertex index are needed (e.g., max weight and where it occurs) |

- The scope of reduction can be defined in several ways: all vertices, a vertex range, an explicit list of vertex indices, or a condition on the vertex index.

| Scope | Vertices Processed | Parameters |
| --- | --- | --- |
| All | All vertices | No range/array parameters |
| Range | Vertices [begin, end) | begin and end indices |
| Array | Specific vertices | Array of vertex indices |
| If | Vertices filtered by a condition | Condition on vertex-level properties |

- All functions can be called for both constant and non-constant graphs.

| Category | Graph Modifiable? | Use Case |
| --- | --- | --- |
| Non-const | Yes | Can modify graph edges during the reduction |
| Const | No | Read-only access to graph edges |
See also Overview of Graph Reduction Functions for more details.
Reducing Over Edges of a Vertex
The function TNL::Graphs::reduceVertices computes a reduction over edges adjacent to the specified vertices. First, we define a vector vertexMaxWeights (together with the corresponding vector view vertexMaxWeights_view ) to store the results of the reduction:
TNL::Containers::Vector< float, Device > vertexMaxWeights( 5, -1.0 );
auto vertexMaxWeights_view = vertexMaxWeights.getView();
Next, we define the fetch lambda function responsible for reading the required data:
auto fetch = [] __cuda_callable__( int sourceIdx, int targetIdx, const float& weight ) -> float
{
   return weight;
};
The lambda function fetch takes the indices sourceIdx and targetIdx of the source and target vertices of an edge, respectively. The third parameter is the edge weight. For non-constant graphs, the weight parameter may be passed by reference and can be modified during the fetch operation. This allows the reduction and graph modification to be performed simultaneously. For sparse graphs, the targetIdx index can also be modified.
Next, we define the store lambda function, which is responsible for storing the reduction results for individual vertices:
auto store = [ = ] __cuda_callable__( int vertexIdx, const float& maxWeight ) mutable
{
   vertexMaxWeights_view[ vertexIdx ] = maxWeight;
};
Finally, the reduction is executed as follows:
TNL::Graphs::reduceVertices(
   graph, 1, 4, fetch, TNL::Max{}, store, std::numeric_limits< float >::lowest() );
Here, we use TNL::Max for the reduction (see also Function objects for reduction operations for other functionals). Another variant of TNL::Graphs::reduceVertices allows defining the reduction operation via a lambda function.
Note that the function TNL::Graphs::reduceAllVertices is also available for performing reductions over all vertices.
The whole example reads as follows:
#include <iostream>
#include <limits>
#include <TNL/Graphs/Graph.h>
#include <TNL/Graphs/reduce.h>
#include <TNL/Devices/Host.h>
#include <TNL/Devices/Cuda.h>
#include <TNL/Containers/Vector.h>

template< typename Device >
void
reduceVerticesExample()
{
   // Graph type alias (the exact template parameters are assumed here).
   using GraphType = TNL::Graphs::Graph< float, Device >;

   GraphType graph( 5,
      {
         { 0, 1, 10.0 }, { 0, 2, 20.0 },
         { 1, 2, 30.0 }, { 1, 3, 40.0 }, { 1, 4, 50.0 },
         { 2, 3, 60.0 },
         { 3, 0, 70.0 }, { 3, 4, 80.0 } } );

   std::cout << "Graph:\n" << graph << '\n';

   // Result vector, initialized to -1 for vertices that are not processed.
   TNL::Containers::Vector< float, Device > vertexMaxWeights( 5, -1.0 );
   auto vertexMaxWeights_view = vertexMaxWeights.getView();

   auto fetch = [] __cuda_callable__( int sourceIdx, int targetIdx, const float& weight ) -> float
   {
      return weight;
   };

   auto store = [ = ] __cuda_callable__( int vertexIdx, const float& maxWeight ) mutable
   {
      vertexMaxWeights_view[ vertexIdx ] = maxWeight;
   };

   // Maximum edge weight over the edges of vertices 1, 2 and 3.
   TNL::Graphs::reduceVertices(
      graph, 1, 4, fetch, TNL::Max{}, store, std::numeric_limits< float >::lowest() );

   std::cout << "Maximum edge weight for vertices 1-3:" << vertexMaxWeights << '\n';
}

int
main( int argc, char* argv[] )
{
   std::cout << "Running on host:\n";
   reduceVerticesExample< TNL::Devices::Host >();

#ifdef __CUDACC__
   std::cout << "Running on CUDA device:\n";
   reduceVerticesExample< TNL::Devices::Cuda >();
#endif

   return EXIT_SUCCESS;
}
The output looks as follows:
Running on host:
Graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Maximum edge weight for vertices 1-3:[ -1, 50, 60, 80, -1 ]
Running on CUDA device:
Graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Maximum edge weight for vertices 1-3:[ -1, 50, 60, 80, -1 ]
Reducing Over Edges of Specific Vertices and Conditional Reduction
Reductions over a specified set of vertices can be performed using TNL::Graphs::reduceVertices. Conditional reductions are provided by TNL::Graphs::reduceAllVerticesIf and TNL::Graphs::reduceVerticesIf. Their behavior is analogous to the corresponding vertex traversal functions; however, they differ in how the results of the reduction are stored, as demonstrated in the following code snippet.
Assume that we perform a reduction over the vertices 0, 2, and 3, storing the results both in scattered and in compressed form:
TNL::Containers::Vector< int, Device > vertexIndices{ 0, 2, 3 };
TNL::Containers::Vector< float, Device > vertexSums( 5, -1.0 ), compressedVertexSums( 5, -1.0 );
auto vertexSums_view = vertexSums.getView();
auto compressedVertexSums_view = compressedVertexSums.getView();
auto fetch = [] __cuda_callable__( int sourceIdx, int targetIdx, const float& weight ) -> float
{
   return weight;
};
auto store = [ = ] __cuda_callable__( int indexOfVertexIdx, int vertexIdx, const float& sum ) mutable
{
   compressedVertexSums_view[ indexOfVertexIdx ] = sum;
   vertexSums_view[ vertexIdx ] = sum;
};
The store lambda function takes the following parameters:
- indexOfVertexIdx – the rank of the vertex within the set of vertices being processed. In this example, the reduction is performed only for vertices 0, 2, and 3. Their corresponding indexOfVertexIdx values are 0, 1, and 2, respectively.
- vertexIdx – the index of the vertex whose reduction result is currently being stored.
- sum – the result of the reduction, i.e. the sum of the weights of all edges connected to the vertex.
The whole example reads as follows:
#include <iostream>
#include <TNL/Graphs/Graph.h>
#include <TNL/Graphs/reduce.h>
#include <TNL/Devices/Host.h>
#include <TNL/Devices/Cuda.h>
#include <TNL/Containers/Vector.h>

template< typename Device >
void
reduceVerticesWithIndexesExample()
{
   // Graph type alias (the exact template parameters are assumed here).
   using GraphType = TNL::Graphs::Graph< float, Device >;

   GraphType graph( 5,
      {
         { 0, 1, 10.0 }, { 0, 2, 20.0 },
         { 1, 2, 30.0 }, { 1, 3, 40.0 }, { 1, 4, 50.0 },
         { 2, 3, 60.0 },
         { 3, 0, 70.0 }, { 3, 4, 80.0 } } );

   std::cout << "Graph:\n" << graph << '\n';

   // Vertices to reduce and vectors for the scattered and compressed results.
   TNL::Containers::Vector< int, Device > vertexIndices{ 0, 2, 3 };
   TNL::Containers::Vector< float, Device > vertexSums( 5, -1.0 ), compressedVertexSums( 5, -1.0 );
   auto vertexSums_view = vertexSums.getView();
   auto compressedVertexSums_view = compressedVertexSums.getView();

   auto fetch = [] __cuda_callable__( int sourceIdx, int targetIdx, const float& weight ) -> float
   {
      return weight;
   };

   auto store = [ = ] __cuda_callable__( int indexOfVertexIdx, int vertexIdx, const float& sum ) mutable
   {
      compressedVertexSums_view[ indexOfVertexIdx ] = sum;
      vertexSums_view[ vertexIdx ] = sum;
   };

   // Sum the edge weights of the selected vertices (the overload of
   // reduceVertices taking an index array is assumed here).
   TNL::Graphs::reduceVertices( graph, vertexIndices, fetch, TNL::Plus{}, store, 0.0f );

   std::cout << "Sum of edge weights for specific vertices:" << vertexSums << '\n';
   std::cout << "Compressed sums:" << compressedVertexSums.getView( 0, vertexIndices.getSize() ) << '\n';
}

int
main( int argc, char* argv[] )
{
   std::cout << "Running on host:\n";
   reduceVerticesWithIndexesExample< TNL::Devices::Host >();

#ifdef __CUDACC__
   std::cout << "Running on CUDA device:\n";
   reduceVerticesWithIndexesExample< TNL::Devices::Cuda >();
#endif

   return EXIT_SUCCESS;
}
The output looks as follows:
Running on host:
Graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Sum of edge weights for specific vertices:[ 30, -1, 60, 150, -1 ]
Compressed sums:[ 30, 60, 150 ]
Running on CUDA device:
Graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Sum of edge weights for specific vertices:[ 30, -1, 60, 150, -1 ]
Compressed sums:[ 30, 60, 150 ]
The store lambda function behaves in the same way for conditional reductions, as demonstrated by the following code snippet:
auto vertexMinWeights_view = vertexMinWeights.getView();
auto compressedVertexMinWeights_view = compressedVertexMinWeights.getView();
auto condition = [ graph = graph.getConstView() ] __cuda_callable__( int vertexIdx ) -> bool
{
   return graph.getVertexDegree( vertexIdx ) >= 2;
};
auto fetch = [] __cuda_callable__( int sourceIdx, int targetIdx, const float& weight ) -> float
{
   return weight;
};
auto store = [ = ] __cuda_callable__( int indexOfVertexIdx, int vertexIdx, const float& minWeight ) mutable
{
   compressedVertexMinWeights_view[ indexOfVertexIdx ] = minWeight;
   vertexMinWeights_view[ vertexIdx ] = minWeight;
};
The function TNL::Graphs::reduceVerticesIf also returns the number of vertices (reducedVertexCount) for which the reduction has been performed. This is particularly useful for subsequent processing of data stored in arrays with compressed results (for example, compressedVertexMinWeights_view in this example).
The whole example reads as:
#include <iostream>
#include <limits>
#include <TNL/Graphs/Graph.h>
#include <TNL/Graphs/reduce.h>
#include <TNL/Devices/Host.h>
#include <TNL/Devices/Cuda.h>
#include <TNL/Containers/Vector.h>

template< typename Device >
void
reduceVerticesIfExample()
{
   // Graph type alias (the exact template parameters are assumed here).
   using GraphType = TNL::Graphs::Graph< float, Device >;

   GraphType graph( 5,
      {
         { 0, 1, 10.0 }, { 0, 2, 20.0 },
         { 1, 2, 30.0 }, { 1, 3, 40.0 }, { 1, 4, 50.0 },
         { 2, 3, 60.0 },
         { 3, 0, 70.0 }, { 3, 4, 80.0 } } );

   std::cout << "Graph:\n" << graph << '\n';

   TNL::Containers::Vector< float, Device > vertexMinWeights( 5, -1.0 ), compressedVertexMinWeights( 5, -1.0 );
   auto vertexMinWeights_view = vertexMinWeights.getView();
   auto compressedVertexMinWeights_view = compressedVertexMinWeights.getView();

   auto condition = [ graph = graph.getConstView() ] __cuda_callable__( int vertexIdx ) -> bool
   {
      return graph.getVertexDegree( vertexIdx ) >= 2;
   };

   auto fetch = [] __cuda_callable__( int sourceIdx, int targetIdx, const float& weight ) -> float
   {
      return weight;
   };

   auto store = [ = ] __cuda_callable__( int indexOfVertexIdx, int vertexIdx, const float& minWeight ) mutable
   {
      compressedVertexMinWeights_view[ indexOfVertexIdx ] = minWeight;
      vertexMinWeights_view[ vertexIdx ] = minWeight;
   };

   // Minimum edge weight over the vertices 1-3 whose degree is at least 2.
   int reducedVertexCount = TNL::Graphs::reduceVerticesIf(
      graph, 1, 4, condition, fetch, TNL::Min{}, store, std::numeric_limits< float >::max() );

   std::cout << "Number of reduced vertices: " << reducedVertexCount << '\n';
   std::cout << "Minimum edge weight for vertices 1-3 with degree >= 2:" << vertexMinWeights << '\n';
   std::cout << "Compressed minimum weights:" << compressedVertexMinWeights.getView( 0, reducedVertexCount ) << '\n';
}

int
main( int argc, char* argv[] )
{
   std::cout << "Running on host:\n";
   reduceVerticesIfExample< TNL::Devices::Host >();

#ifdef __CUDACC__
   std::cout << "Running on CUDA device:\n";
   reduceVerticesIfExample< TNL::Devices::Cuda >();
#endif

   return EXIT_SUCCESS;
}
The output looks as follows:
Running on host:
Graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Number of reduced vertices: 2
Minimum edge weight for vertices 1-3 with degree >= 2:[ -1, 30, -1, 70, -1 ]
Compressed minimum weights:[ 30, 70 ]
Running on CUDA device:
Graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Number of reduced vertices: 2
Minimum edge weight for vertices 1-3 with degree >= 2:[ -1, 30, -1, 70, -1 ]
Compressed minimum weights:[ 30, 70 ]
Reducing Over Edges of a Vertex with Argument
Reductions with arguments allow tracking graph edges during the computation. This is useful, for example, when searching for the edge with the maximal (or minimal) weight among edges adjacent to given vertices, or for similar operations. Such functionality is provided by TNL::Graphs::reduceVerticesWithArgument and related functions (TNL::Graphs::reduceAllVerticesWithArgument, TNL::Graphs::reduceVerticesWithArgumentIf, TNL::Graphs::reduceAllVerticesWithArgumentIf).
In the following example, for each vertex in a given range, we search for the edge with the minimal weight. We begin by defining vectors and corresponding vector views to store the results of the reduction:
TNL::Containers::Vector< float, Device > minWeights( 5, -1.0 );
TNL::Containers::Vector< int, Device > minTargets( 5, -1 );
auto minWeights_view = minWeights.getView();
auto minTargets_view = minTargets.getView();
The fetch lambda function behaves in the same way as in a standard reduction:
auto fetch = [] __cuda_callable__( int sourceIdx, int targetIdx, const float& weight ) -> float
{
   return weight;
};
The store lambda function is defined as follows:
auto store = [ = ] __cuda_callable__( int vertexIdx, int localIdx, int targetIdx, float result, bool isolatedVertex ) mutable
{
   minWeights_view[ vertexIdx ] = result;
   if( ! isolatedVertex )
      minTargets_view[ vertexIdx ] = targetIdx;
};
In addition to the vertex index and the reduction result, it receives the parameters localIdx and targetIdx and the boolean flag isolatedVertex:
- vertexIdx is the index of the vertex for which the result is being stored.
- targetIdx is the index of the target vertex of the edge with the minimal weight.
- localIdx is the position of the edge with the minimal weight within the set of all edges adjacent to the vertex vertexIdx. If the graph is dense (i.e., the adjacency matrix is dense), localIdx has the same value as targetIdx and may be omitted.
- result is the value of the minimal edge weight.
- isolatedVertex is a boolean flag. If it is set to true, the vertex with index vertexIdx is isolated and has no adjacent edges. In this case, result is equal to the identity value of the reduction operation, and both localIdx and targetIdx are undefined and must be ignored.
The reduction is performed as follows:
TNL::Graphs::reduceVerticesWithArgument(
   graph, 1, 5, fetch, TNL::MinWithArg{}, store, std::numeric_limits< float >::max() );
We use TNL::MinWithArg for the reduction (see also Function objects for reduction operations with argument for other functionals). Another variant of TNL::Graphs::reduceVerticesWithArgument allows defining the reduction operation via a lambda function.
The whole example reads as follows:
#include <iostream>
#include <limits>
#include <TNL/Graphs/Graph.h>
#include <TNL/Graphs/reduce.h>
#include <TNL/Devices/Host.h>
#include <TNL/Devices/Cuda.h>
#include <TNL/Containers/Vector.h>

template< typename Device >
void
reduceVerticesWithArgumentExample()
{
   // Graph type alias (the exact template parameters are assumed here).
   using GraphType = TNL::Graphs::Graph< float, Device >;

   GraphType graph( 5,
      {
         { 0, 1, 10.0 }, { 0, 2, 20.0 },
         { 1, 2, 30.0 }, { 1, 3, 40.0 }, { 1, 4, 50.0 },
         { 2, 3, 60.0 },
         { 3, 0, 70.0 }, { 3, 4, 80.0 } } );

   std::cout << "Graph:\n" << graph << '\n';

   TNL::Containers::Vector< float, Device > minWeights( 5, -1.0 );
   TNL::Containers::Vector< int, Device > minTargets( 5, -1 );
   auto minWeights_view = minWeights.getView();
   auto minTargets_view = minTargets.getView();

   auto fetch = [] __cuda_callable__( int sourceIdx, int targetIdx, const float& weight ) -> float
   {
      return weight;
   };

   auto store = [ = ] __cuda_callable__( int vertexIdx, int localIdx, int targetIdx, float result, bool isolatedVertex ) mutable
   {
      minWeights_view[ vertexIdx ] = result;
      if( ! isolatedVertex )
         minTargets_view[ vertexIdx ] = targetIdx;
   };

   // Find the edge with the minimal weight for each of the vertices 1-4.
   TNL::Graphs::reduceVerticesWithArgument(
      graph, 1, 5, fetch, TNL::MinWithArg{}, store, std::numeric_limits< float >::max() );

   std::cout << "Minimum edge weight for vertices 1-4:" << minWeights << '\n';
   std::cout << "Target vertex for minimum edge:" << minTargets << '\n';
}

int
main( int argc, char* argv[] )
{
   std::cout << "Running on host:\n";
   reduceVerticesWithArgumentExample< TNL::Devices::Host >();

#ifdef __CUDACC__
   std::cout << "Running on CUDA device:\n";
   reduceVerticesWithArgumentExample< TNL::Devices::Cuda >();
#endif

   return EXIT_SUCCESS;
}
The output looks as follows:
Running on host:
Graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Minimum edge weight for vertices 1-4:[ -1, 30, 60, 70, 3.40282e+38 ]
Target vertex for minimum edge:[ -1, 2, 3, 0, -1 ]
Running on CUDA device:
Graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Minimum edge weight for vertices 1-4:[ -1, 30, 60, 70, 3.40282e+38 ]
Target vertex for minimum edge:[ -1, 2, 3, 0, -1 ]
Reducing Over Edges of Specific Vertices and Conditional Reduction With Argument
The following example demonstrates a reduction with arguments over a specific set of vertices, specified either by their indices or by a condition:
auto vertexMinWeights_view = vertexMinWeights.getView();
auto compressedVertexMinWeights_view = compressedVertexMinWeights.getView();
auto vertexMinTargets_view = vertexMinTargets.getView();
auto compressedVertexMinTargets_view = compressedVertexMinTargets.getView();
auto condition = [ graph = graph.getConstView() ] __cuda_callable__( int vertexIdx ) -> bool
{
   return graph.getVertexDegree( vertexIdx ) >= 2;
};
auto fetch = [] __cuda_callable__( int sourceIdx, int targetIdx, const float& weight ) -> float
{
   return weight;
};
auto store = [ = ] __cuda_callable__(
   int indexOfVertexIdx, int vertexIdx, int localIdx, int targetIdx, float minWeight, bool isolatedVertex ) mutable
{
   compressedVertexMinWeights_view[ indexOfVertexIdx ] = minWeight;
   vertexMinWeights_view[ vertexIdx ] = minWeight;
   if( ! isolatedVertex ) {
      compressedVertexMinTargets_view[ indexOfVertexIdx ] = targetIdx;
      vertexMinTargets_view[ vertexIdx ] = targetIdx;
   }
};
int traversedVertexCount = TNL::Graphs::reduceVerticesWithArgumentIf(
   graph, 1, 4, condition, fetch, TNL::MinWithArg{}, store, std::numeric_limits< float >::max() );
The lambda functions condition and fetch remain unchanged. The store lambda function accepts the following parameters:
- indexOfVertexIdx – the position of the vertex whose result is being stored within the set of all vertices reduced during the operation. This index can be used to store results in a compressed form.
- vertexIdx – the index of the vertex whose reduction result is being stored.
- localIdx – the position of the edge with the minimal weight within the set of all edges adjacent to the vertex vertexIdx.
- targetIdx – the index of the target vertex of the edge with the minimal weight.
- minWeight – the weight value of the edge with the minimal weight.
- isolatedVertex – indicates (if set to true) that the vertex with index vertexIdx is isolated and has no adjacent edges over which the reduction can be performed.
In this case, the reduction result (minWeight in this example) is set to the identity value of the reduction operation, and the variables localIdx and targetIdx are undefined and must not be used.
The whole example reads as follows:
#include <iostream>
#include <limits>
#include <TNL/Graphs/Graph.h>
#include <TNL/Graphs/reduce.h>
#include <TNL/Devices/Host.h>
#include <TNL/Devices/Cuda.h>
#include <TNL/Containers/Vector.h>

template< typename Device >
void
reduceVerticesWithArgumentIfExample()
{
   // Graph type alias (the exact template parameters are assumed here).
   using GraphType = TNL::Graphs::Graph< float, Device >;

   GraphType graph( 5,
      {
         { 0, 1, 10.0 }, { 0, 2, 20.0 },
         { 1, 2, 30.0 }, { 1, 3, 40.0 }, { 1, 4, 50.0 },
         { 2, 3, 60.0 },
         { 3, 0, 70.0 }, { 3, 4, 80.0 } } );

   std::cout << "Graph:\n" << graph << '\n';

   TNL::Containers::Vector< float, Device > vertexMinWeights( 5, -1.0 ), compressedVertexMinWeights( 5, -1.0 );
   TNL::Containers::Vector< int, Device > vertexMinTargets( 5, -1 ), compressedVertexMinTargets( 5, -1 );
   auto vertexMinWeights_view = vertexMinWeights.getView();
   auto compressedVertexMinWeights_view = compressedVertexMinWeights.getView();
   auto vertexMinTargets_view = vertexMinTargets.getView();
   auto compressedVertexMinTargets_view = compressedVertexMinTargets.getView();

   auto condition = [ graph = graph.getConstView() ] __cuda_callable__( int vertexIdx ) -> bool
   {
      return graph.getVertexDegree( vertexIdx ) >= 2;
   };

   auto fetch = [] __cuda_callable__( int sourceIdx, int targetIdx, const float& weight ) -> float
   {
      return weight;
   };

   auto store = [ = ] __cuda_callable__(
      int indexOfVertexIdx, int vertexIdx, int localIdx, int targetIdx, float minWeight, bool isolatedVertex ) mutable
   {
      compressedVertexMinWeights_view[ indexOfVertexIdx ] = minWeight;
      vertexMinWeights_view[ vertexIdx ] = minWeight;
      if( ! isolatedVertex ) {
         compressedVertexMinTargets_view[ indexOfVertexIdx ] = targetIdx;
         vertexMinTargets_view[ vertexIdx ] = targetIdx;
      }
   };

   // Minimal edge weight and its target over the vertices 1-3 with degree >= 2.
   int traversedVertexCount = TNL::Graphs::reduceVerticesWithArgumentIf(
      graph, 1, 4, condition, fetch, TNL::MinWithArg{}, store, std::numeric_limits< float >::max() );

   std::cout << "Number of traversed vertices: " << traversedVertexCount << '\n';
   std::cout << "Minimum edge weight for vertices 1-3 with degree >= 2:" << vertexMinWeights << '\n';
   std::cout << "Target vertex for minimum edge:" << vertexMinTargets << '\n';
   std::cout << "Compressed minimum weights:" << compressedVertexMinWeights.getView( 0, traversedVertexCount ) << '\n';
   std::cout << "Compressed minimum targets:" << compressedVertexMinTargets.getView( 0, traversedVertexCount ) << '\n';
}

int
main( int argc, char* argv[] )
{
   std::cout << "Running on host:\n";
   reduceVerticesWithArgumentIfExample< TNL::Devices::Host >();

#ifdef __CUDACC__
   std::cout << "Running on CUDA device:\n";
   reduceVerticesWithArgumentIfExample< TNL::Devices::Cuda >();
#endif

   return EXIT_SUCCESS;
}
The output looks as follows:
Running on host:
Graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Number of traversed vertices: 2
Minimum edge weight for vertices 1-3 with degree >= 2:[ -1, 30, -1, 70, -1 ]
Target vertex for minimum edge:[ -1, 2, -1, 0, -1 ]
Compressed minimum weights:[ 30, 70 ]
Compressed minimum targets:[ 2, 0 ]
Running on CUDA device:
Graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Number of traversed vertices: 2
Minimum edge weight for vertices 1-3 with degree >= 2:[ -1, 30, -1, 70, -1 ]
Target vertex for minimum edge:[ -1, 2, -1, 0, -1 ]
Compressed minimum weights:[ 30, 70 ]
Compressed minimum targets:[ 2, 0 ]
Graph Views
Similarly to matrix views, graph views (TNL::Graphs::GraphView) are lightweight reference objects that allow passing graphs to GPU kernels. A graph view can be obtained from a graph using the methods TNL::Graphs::Graph::getView and TNL::Graphs::Graph::getConstView.
Graph views are particularly useful when:
- Passing graphs to custom GPU kernels
- Working with graphs in lambda functions
- Creating shallow copies for parallel algorithms
Performance Considerations
Choosing the Right Matrix Format
For sparse graphs, different segment formats offer different performance characteristics. The user may choose any of the segment types defined in TNL::Algorithms::Segments, for example the Ellpack format, by passing the segments type as a template parameter of the adjacency matrix.
Data structure for Ellpack segments.
Definition Ellpack.h:42
Further Reading