Template Numerical Library (version main:4e6e2c1)
Graphs

Introduction

Graphs are fundamental data structures used to represent relationships between objects. In TNL, the TNL::Graphs::Graph class provides a flexible and efficient implementation of graphs that can run on both CPUs and GPUs.

A graph in TNL consists of vertices (also called nodes) and edges connecting pairs of vertices. Edges can be:

  • Directed: An edge from vertex A to vertex B is different from an edge from B to A
  • Undirected: An edge between A and B can be traversed in both directions
  • Weighted: Each edge has an associated weight (numerical value)

Internally, TNL represents graphs using an adjacency matrix, which stores information about which vertices are connected by edges and what weights those edges have.

Graph Orientation: Directed vs. Undirected Graphs

TNL supports both directed and undirected graphs through the type tags TNL::Graphs::DirectedGraph and TNL::Graphs::UndirectedGraph:
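
A minimal sketch of two graph types differing only in their orientation tag (the position of the orientation tag among the template parameters is an assumption here; the first four parameters follow the listings below):

using DirectedGraphType   = TNL::Graphs::Graph< float, Device, int,
                                                TNL::Algorithms::Segments::CSR,
                                                TNL::Graphs::DirectedGraph >;
using UndirectedGraphType = TNL::Graphs::Graph< float, Device, int,
                                                TNL::Algorithms::Segments::CSR,
                                                TNL::Graphs::UndirectedGraph >;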

Sparse vs. Dense Adjacency Matrices

TNL graphs can use either sparse or dense adjacency matrices to store edge information. The choice significantly impacts memory usage and performance.

Sparse Adjacency Matrices And Sparse Graphs (Default)

By default, graphs use sparse matrix representations (typically CSR format). A sparse graph only stores information about edges that actually exist:
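
A minimal sketch, assuming the default template parameters provide a CSR-based sparse adjacency matrix (Device as in the examples below):

// Only the three listed edges are stored; absent vertex pairs occupy no memory.
using GraphType = TNL::Graphs::Graph< float, Device, int >;
GraphType graph( 5, { { 0, 1, 10.0 }, { 1, 2, 30.0 }, { 3, 4, 70.0 } } );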

The memory usage is proportional to the number of edges in the graph. By default, the sparse adjacency matrix is stored in the CSR segments format, but any other segments format defined in TNL::Algorithms::Segments can be used instead. For unweighted graphs, using ValueType = bool avoids storing numerical edge weights and reduces memory overhead, as only the presence or absence of edges is represented.

Dense Adjacency Matrices And Dense Graphs

Dense graphs store information about all possible edges between all vertex pairs:

using DenseGraphType = TNL::Graphs::Graph<
   float,
   Device,
   int,
   TNL::Algorithms::Segments::CSR,  // Type of segments is ignored for dense matrices
   ... >;                           // ( remaining template arguments elided in the original listing )

Memory usage is proportional to the square of the number of vertices (O(V²)), so this representation is best suited for complete or nearly complete graphs, where most vertex pairs are connected. Because the dense matrix stores an entry for every vertex pair, including diagonal entries representing self-loops, missing edges must be encoded explicitly by the user (e.g., via a chosen sentinel value or encoding), and it is the user's responsibility to omit such edges when they are not desired.

Comparison Table

Aspect      | Sparse Graph        | Dense Graph
Memory      | O(V + E)            | O(V²)
Edge lookup | O(degree)           | O(1)
Best for    | Sparse connectivity | Dense connectivity

In the following examples, we assume the following graph types:
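
The type definitions themselves were elided in the extracted page; a plausible sketch (template arguments beyond the first four are assumptions) is:

using GraphType           = TNL::Graphs::Graph< float, Device, int >;  // sparse, directed by default (assumed)
using UndirectedGraphType = TNL::Graphs::Graph< float, Device, int,
                                                TNL::Algorithms::Segments::CSR,
                                                TNL::Graphs::UndirectedGraph >;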

Constructing Graphs

TNL provides several ways to construct graphs.

Default Constructor

To create an empty graph, the default constructor can be used:

/***
* Example 1: Default constructor
*/
std::cout << "Example 1: Default constructor\n";
GraphType graph1;
std::cout << "Empty graph - vertices: " << graph1.getVertexCount() << ", edges: " << graph1.getEdgeCount() << "\n\n";

Constructor with Vertex Count

Constructor with vertex count creates a graph with specified number of vertices but no edges:

/***
* Example 2: Constructor with number of vertices
*/
std::cout << "Example 2: Constructor with vertex count\n";
GraphType graph2( 5 ); // 5 vertices, no edges
std::cout << "Graph with 5 vertices - vertices: " << graph2.getVertexCount() << ", edges: " << graph2.getEdgeCount()
<< "\n\n";

Constructor with Initializer List for Sparse and Dense Graphs

Edges in both sparse and dense graphs can be specified as (sourceIdx, targetIdx, weight) tuples in an initializer list, together with the number of vertices:

/***
* Example 3: Constructor with vertices and edges (initializer list - sparse graph)
*/
std::cout << "Example 3a: Constructor with initializer list (sparse graph)\n";
// clang-format off
GraphType graph3a( 5, // number of vertices
                   { // definition of edges with weights
                     { 0, 1, 10.0 }, { 0, 2, 20.0 },
                     { 1, 2, 30.0 }, { 1, 3, 40.0 },
                     { 2, 3, 50.0 },
                     { 3, 0, 60.0 }, { 3, 4, 70.0 } } );
// clang-format on
std::cout << "Sparse graph:\n" << graph3a << '\n';

In a sparse graph, unspecified edges are considered missing. In a dense graph, unspecified edges are represented by zero weights.

Constructor with Initializer List for Sparse Undirected Graphs

Edges of sparse undirected graphs can be specified in the same way, but only half of them are required:

/***
* Example 3b: Constructor for undirected graph
*/
std::cout << "Example 3b: Undirected graph constructor\n";
// clang-format off
UndirectedGraphType graph3b( 4,
                             { // only need to specify edges once for undirected graph
                               { 0, 1, 1.0 }, { 0, 2, 2.0 },
                               { 1, 2, 3.0 },
                               { 2, 3, 4.0 } } );  // ( closing edge and parenthesis reconstructed from the output below )
// clang-format on
std::cout << "Undirected graph:\n" << graph3b << '\n';

An optional last constructor parameter specifies the encoding of the edges (see also TNL::Matrices::MatrixElementsEncoding):

  • Complete – all edges must be specified explicitly, and symmetry is required.
  • SymmetricLower – only the lower triangular part of the adjacency matrix is specified, i.e. edges (u, v) with u > v.
  • SymmetricUpper – only the upper triangular part of the adjacency matrix is specified, i.e. edges (u, v) with u < v.
  • SymmetricMixed – elements from both the lower and upper triangular parts of the adjacency matrix may be specified; the corresponding symmetric elements are filled automatically. This is the default setting for undirected graphs.

By default, undirected graphs are represented by a symmetric sparse matrix, i.e. a matrix in which only the lower triangular part is stored. This behavior can be changed by explicitly selecting a general adjacency matrix type (see also TNL::Matrices::SparseMatrix).
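
As a sketch (assuming the encoding enumerator is passed as the last constructor argument, as described above), the same graph could be specified via its upper triangular part only:

UndirectedGraphType graphUpper( 4,
                                { { 0, 1, 1.0 }, { 0, 2, 2.0 },
                                  { 1, 2, 3.0 },
                                  { 2, 3, 4.0 } },
                                TNL::Matrices::MatrixElementsEncoding::SymmetricUpper );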

Constructor with Initializer List for Dense Graphs

Edges of dense graphs can be specified by providing all edge weights in a two-dimensional structure:

/***
* Example 3c: Constructor with initializer list only for dense adjacency matrix.
* For dense matrix graphs, one can also use nested initializer lists {{row1}, {row2}, ...}
*/
std::cout << "\nExample 3c: Constructor with initializer list (dense adjacency matrix)\n";
// clang-format off
DenseGraphType graph3c( { {  0.0, 10.0, 20.0,  0.0,  0.0 },     // edges from vertex 0
                          {  0.0,  0.0, 30.0, 40.0,  0.0 },     // edges from vertex 1
                          {  0.0,  0.0,  0.0, 50.0,  0.0 },     // edges from vertex 2
                          { 60.0,  0.0,  0.0,  0.0, 70.0 },     // edges from vertex 3
                          {  0.0,  0.0,  0.0,  0.0,  0.0 } } ); // edges from vertex 4
// clang-format on
std::cout << "Dense graph (complete graph with weighted edges):\n" << graph3c << '\n';

Constructor with std::map

This constructor builds graphs from a map of edge pairs to weights:

/***
* Example 4: Constructor with vertices and edges (map)
*/
std::cout << "\nExample 4: Constructor with std::map\n";
edgeMap[ { 0, 1 } ] = 1.5;
edgeMap[ { 0, 2 } ] = 2.5;
edgeMap[ { 1, 2 } ] = 3.5;
edgeMap[ { 2, 3 } ] = 4.5;
GraphType graph4( 4, edgeMap );
std::cout << "Graph from map:\n" << graph4 << '\n';

If the graph is dense, unspecified edges are assigned zero weight. If the graph is undirected, the SymmetricMixed encoding of adjacency matrix elements is required. The encoding can be changed using the third constructor parameter, in the same way as for the constructor that takes an initializer list.
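
For example (a sketch reusing edgeMap from above; the enumerator name follows the list in the previous section):

UndirectedGraphType graph4b( 4, edgeMap, TNL::Matrices::MatrixElementsEncoding::SymmetricMixed );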

Copy Constructor

The copy constructor creates a copy of another graph.

/***
* Example 5: Copy constructor
*/
std::cout << "Example 5: Copy constructor\n";
GraphType graph5( graph3a ); // NOLINT(performance-unnecessary-copy-initialization)
std::cout << "Copied graph - vertices: " << graph5.getVertexCount() << ", edges: " << graph5.getEdgeCount() << "\n\n";

Constructor from Adjacency Matrix

This constructor creates a graph from an existing adjacency matrix:

/***
* Example 6: Constructor from adjacency matrix
*/
std::cout << "Example 6: Constructor from adjacency matrix\n";
using MatrixType = typename GraphType::AdjacencyMatrixType;
MatrixType matrix( 3, 3 );
// clang-format off
matrix.setElements( { { 0, 1, 1.0 }, { 0, 2, 2.0 },
                      { 1, 2, 3.0 },
                      { 2, 0, 4.0 } } );
// clang-format on
GraphType graph6( matrix );
std::cout << "Graph from adjacency matrix:\n" << graph6 << '\n';

The whole example reads as follows:

#include <iostream>
#include <map>
#include <TNL/Graphs/Graph.h>
#include <TNL/Matrices/DenseMatrix.h>
#include <TNL/Devices/Host.h>
#include <TNL/Devices/Cuda.h>
#include <TNL/Devices/Hip.h>
#include <TNL/Containers/Vector.h>

template< typename Device >
void
constructorsExample()
{
   // ( definitions of GraphType and UndirectedGraphType elided in the original listing )

   using DenseGraphType = TNL::Graphs::Graph<
      float,
      Device,
      int,
      TNL::Algorithms::Segments::CSR,  // Type of segments is ignored for dense matrices
      ... >;                           // ( remaining template arguments elided in the original listing )

   /***
    * Example 1: Default constructor
    */
   std::cout << "Example 1: Default constructor\n";
   GraphType graph1;
   std::cout << "Empty graph - vertices: " << graph1.getVertexCount() << ", edges: " << graph1.getEdgeCount() << "\n\n";

   /***
    * Example 2: Constructor with number of vertices
    */
   std::cout << "Example 2: Constructor with vertex count\n";
   GraphType graph2( 5 );  // 5 vertices, no edges
   std::cout << "Graph with 5 vertices - vertices: " << graph2.getVertexCount() << ", edges: " << graph2.getEdgeCount()
             << "\n\n";

   /***
    * Example 3: Constructor with vertices and edges (initializer list - sparse graph)
    */
   std::cout << "Example 3a: Constructor with initializer list (sparse graph)\n";
   // clang-format off
   GraphType graph3a( 5, // number of vertices
                      { // definition of edges with weights
                        { 0, 1, 10.0 }, { 0, 2, 20.0 },
                        { 1, 2, 30.0 }, { 1, 3, 40.0 },
                        { 2, 3, 50.0 },
                        { 3, 0, 60.0 }, { 3, 4, 70.0 } } );
   // clang-format on
   std::cout << "Sparse graph:\n" << graph3a << '\n';

   /***
    * Example 3b: Constructor for undirected graph
    */
   std::cout << "Example 3b: Undirected graph constructor\n";
   // clang-format off
   UndirectedGraphType graph3b( 4,
                                { // only need to specify edges once for undirected graph
                                  { 0, 1, 1.0 }, { 0, 2, 2.0 },
                                  { 1, 2, 3.0 },
                                  { 2, 3, 4.0 } } );  // ( closing edge reconstructed from the output below )
   // clang-format on
   std::cout << "Undirected graph:\n" << graph3b << '\n';

   /***
    * Example 3c: Constructor with initializer list only for dense adjacency matrix.
    * For dense matrix graphs, one can also use nested initializer lists {{row1}, {row2}, ...}
    */
   std::cout << "\nExample 3c: Constructor with initializer list (dense adjacency matrix)\n";
   // clang-format off
   DenseGraphType graph3c( { {  0.0, 10.0, 20.0,  0.0,  0.0 },     // edges from vertex 0
                             {  0.0,  0.0, 30.0, 40.0,  0.0 },     // edges from vertex 1
                             {  0.0,  0.0,  0.0, 50.0,  0.0 },     // edges from vertex 2
                             { 60.0,  0.0,  0.0,  0.0, 70.0 },     // edges from vertex 3
                             {  0.0,  0.0,  0.0,  0.0,  0.0 } } ); // edges from vertex 4
   // clang-format on
   std::cout << "Dense graph (complete graph with weighted edges):\n" << graph3c << '\n';

   /***
    * Example 4: Constructor with vertices and edges (map)
    */
   std::cout << "\nExample 4: Constructor with std::map\n";
   std::map< std::pair< int, int >, float > edgeMap;
   edgeMap[ { 0, 1 } ] = 1.5;
   edgeMap[ { 0, 2 } ] = 2.5;
   edgeMap[ { 1, 2 } ] = 3.5;
   edgeMap[ { 2, 3 } ] = 4.5;

   GraphType graph4( 4, edgeMap );
   std::cout << "Graph from map:\n" << graph4 << '\n';

   /***
    * Example 5: Copy constructor
    */
   std::cout << "Example 5: Copy constructor\n";
   GraphType graph5( graph3a );  // NOLINT(performance-unnecessary-copy-initialization)
   std::cout << "Copied graph - vertices: " << graph5.getVertexCount() << ", edges: " << graph5.getEdgeCount() << "\n\n";

   /***
    * Example 6: Constructor from adjacency matrix
    */
   std::cout << "Example 6: Constructor from adjacency matrix\n";
   using MatrixType = typename GraphType::AdjacencyMatrixType;
   MatrixType matrix( 3, 3 );
   // clang-format off
   matrix.setElements( { { 0, 1, 1.0 }, { 0, 2, 2.0 },
                         { 1, 2, 3.0 },
                         { 2, 0, 4.0 } } );
   // clang-format on

   GraphType graph6( matrix );
   std::cout << "Graph from adjacency matrix:\n" << graph6 << '\n';
}

int
main( int argc, char* argv[] )
{
   std::cout << "Running on host:\n";
   constructorsExample< TNL::Devices::Host >();

#ifdef __CUDACC__
   std::cout << "Running on CUDA device:\n";
   constructorsExample< TNL::Devices::Cuda >();
#endif

#ifdef __HIP__
   std::cout << "Running on HIP device:\n";
   constructorsExample< TNL::Devices::Hip >();
#endif

   return EXIT_SUCCESS;
}

And the output looks as follows:

Running on host:
Example 1: Default constructor
Empty graph - vertices: 0, edges: 0
Example 2: Constructor with vertex count
Graph with 5 vertices - vertices: 5, edges: 0
Example 3a: Constructor with initializer list (sparse graph)
Sparse graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40
Row: 2 -> 3:50
Row: 3 -> 0:60 4:70
Row: 4 ->
Example 3b: Undirected graph constructor
Undirected graph:
Row: 0 -> 1:1 2:2
Row: 1 -> 0:1 2:3
Row: 2 -> 0:2 1:3 3:4
Row: 3 -> 2:4
Example 3c: Constructor with initializer list (dense adjacency matrix)
Dense graph (complete graph with weighted edges):
Row: 0 -> 0:0 1:10 2:20 3:0 4:0
Row: 1 -> 0:0 1:0 2:30 3:40 4:0
Row: 2 -> 0:0 1:0 2:0 3:50 4:0
Row: 3 -> 0:60 1:0 2:0 3:0 4:70
Row: 4 -> 0:0 1:0 2:0 3:0 4:0
Example 4: Constructor with std::map
Graph from map:
Row: 0 -> 1:1.5 2:2.5
Row: 1 -> 2:3.5
Row: 2 -> 3:4.5
Row: 3 ->
Example 5: Copy constructor
Copied graph - vertices: 5, edges: 7
Example 6: Constructor from adjacency matrix
Graph from adjacency matrix:
Row: 0 -> 1:1 2:2
Row: 1 -> 2:3
Row: 2 -> 0:4
Running on CUDA device:
Example 1: Default constructor
Empty graph - vertices: 0, edges: 0
Example 2: Constructor with vertex count
Graph with 5 vertices - vertices: 5, edges: 0
Example 3a: Constructor with initializer list (sparse graph)
Sparse graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40
Row: 2 -> 3:50
Row: 3 -> 0:60 4:70
Row: 4 ->
Example 3b: Undirected graph constructor
Undirected graph:
Row: 0 -> 1:1 2:2
Row: 1 -> 0:1 2:3
Row: 2 -> 0:2 1:3 3:4
Row: 3 -> 2:4
Example 3c: Constructor with initializer list (dense adjacency matrix)
Dense graph (complete graph with weighted edges):
Row: 0 -> 0:0 1:10 2:20 3:0 4:0
Row: 1 -> 0:0 1:0 2:30 3:40 4:0
Row: 2 -> 0:0 1:0 2:0 3:50 4:0
Row: 3 -> 0:60 1:0 2:0 3:0 4:70
Row: 4 -> 0:0 1:0 2:0 3:0 4:0
Example 4: Constructor with std::map
Graph from map:
Row: 0 -> 1:1.5 2:2.5
Row: 1 -> 2:3.5
Row: 2 -> 3:4.5
Row: 3 ->
Example 5: Copy constructor
Copied graph - vertices: 5, edges: 7
Example 6: Constructor from adjacency matrix
Graph from adjacency matrix:
Row: 0 -> 1:1 2:2
Row: 1 -> 2:3
Row: 2 -> 0:4

Setting Edges

After creating a graph, you can set or modify its edges using the setEdges and setDenseEdges methods. These methods behave similarly to the corresponding constructors.

Setting Edges with Initializer List (Sparse)

For sparse and dense graphs:

/***
* Example 1: setEdges with initializer list (sparse graph)
*/
std::cout << "Example 1: setEdges with initializer list (sparse graph)\n";
GraphType graph1;
graph1.setVertexCount( 5 );
graph1.setEdgeCounts( TNL::Containers::Vector< int, Device >( { 2, 3, 1, 2, 0 } ) );
// clang-format off
graph1.setEdges( {
   { 0, 1, 10.0 }, { 0, 2, 20.0 },
   { 1, 2, 30.0 }, { 1, 3, 40.0 }, { 1, 4, 50.0 },
   { 2, 3, 60.0 },
   { 3, 0, 70.0 }, { 3, 4, 80.0 } } );
// clang-format on
std::cout << "Sparse graph after setEdges:\n" << graph1 << '\n';

The method TNL::Graphs::Graph::setEdges also accepts a matrix elements encoding (see TNL::Matrices::MatrixElementsEncoding) as the second parameter, following the initializer list. This is particularly useful when setting undirected graphs.
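
For example (a sketch; undirectedGraph stands for any UndirectedGraphType instance, and the enumerator name follows the list in the constructor section):

undirectedGraph.setEdges( { { 0, 1, 1.0 }, { 0, 2, 2.0 }, { 1, 2, 3.0 } },
                          TNL::Matrices::MatrixElementsEncoding::SymmetricMixed );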

Setting Edges with Initializer List (Dense)

For dense graphs, the method TNL::Graphs::Graph::setDenseEdges sets the graph edges based on the complete adjacency matrix:

/***
* Example 1b: setEdges with initializer list (dense adjacency matrix)
* For dense matrix graphs, use nested initializer lists {{row1}, {row2}, ...}
*/
std::cout << "\nExample 1b: setEdges with initializer list (dense adjacency matrix)\n";
using DenseGraphType = TNL::Graphs::Graph<
   float,
   Device,
   int,
   TNL::Algorithms::Segments::CSR,  // Type of segments is ignored for dense matrices
   ... >;                           // ( remaining template arguments elided in the original listing )
DenseGraphType graph1b;
graph1b.setVertexCount( 5 );  // matches the 5x5 dense matrix below
// clang-format off
graph1b.setDenseEdges( { {  0.0, 10.0, 20.0,  0.0,  0.0 },     // edges from vertex 0
                         {  0.0,  0.0, 30.0, 40.0, 50.0 },     // edges from vertex 1
                         {  0.0,  0.0,  0.0, 60.0,  0.0 },     // edges from vertex 2
                         { 70.0,  0.0,  0.0,  0.0, 80.0 },     // edges from vertex 3
                         {  0.0,  0.0,  0.0,  0.0,  0.0 } } ); // edges from vertex 4
// clang-format on
std::cout << "Dense graph after setDenseEdges:\n" << graph1b << '\n';

Setting Edges with std::map

The graph edges can also be set using a std::map:

/***
* Example 2: setEdges with std::map
*/
std::cout << "Example 2: setEdges with std::map\n";
GraphType graph2;
graph2.setVertexCount( 4 );
graph2.setEdgeCounts( TNL::Containers::Vector< int, Device >( { 2, 2, 1, 1 } ) );
std::map< std::pair< int, int >, float > edgeMap;
edgeMap[ { 0, 1 } ] = 1.5;
edgeMap[ { 0, 2 } ] = 2.5;
edgeMap[ { 1, 2 } ] = 3.5;
edgeMap[ { 1, 3 } ] = 4.5;
edgeMap[ { 2, 3 } ] = 5.5;
edgeMap[ { 3, 0 } ] = 6.5;
graph2.setEdges( edgeMap );
std::cout << "Graph from map:\n" << graph2 << '\n';

Graph Traversal

TNL provides powerful parallel traversal capabilities through the functions TNL::Graphs::forVertices, TNL::Graphs::forEdges, and related utilities. A graph can be traversed either by vertices or by edges. Vertex-based traversal is performed using a vertex view.

The following provides an overview of graph traversal functions:

  1. TNL distinguishes between vertex-based and edge-based traversal. Vertex-based traversal assigns at most one thread to each vertex, while edge-based traversal enables finer-grained parallelism by distributing work across edges.
     Category                                  | Operates On         | Lambda Parameter       | Use Case
     Edge-wise (forEdges, forAllEdges)         | Individual edges    | Edge indices & weights | Operate on each edge separately
     Vertex-wise (forVertices, forAllVertices) | Individual vertices | VertexView object      | Operate on vertices
  2. The scope of vertex traversal can be defined in several ways: all vertices, a vertex range, an explicit list of vertex indices, or a condition on the vertex index.
     Scope | Vertices Processed               | Parameters
     All   | All vertices                     | No range/array parameters
     Range | Vertices in [begin, end)         | begin and end indices
     Array | Specific vertices                | Array of vertex indices
     If    | Vertices filtered by a condition | Condition lambda on vertex-level properties
  3. All functions can be called for both constant and non-constant graphs.
     Category  | Graph Modifiable? | Use Case
     Non-const | Yes               | Can modify graph edges and structure
     Const     | No                | Read-only access to graph vertices and edges

See also Overview of Graph Traversal Functions for more details.

Traversing By Vertices

A TNL::Graphs::GraphVertexView provides access to a single vertex and its outgoing edges. Vertex views are the primary way to interact with graph structure in parallel GPU code.

Traversing All Vertices

The function TNL::Graphs::forAllVertices iterates over all vertices in parallel:

/***
* Traverse all vertices and modify their edges.
*/
auto processVertex = [] __cuda_callable__( typename GraphType::VertexView vertex ) mutable
{
   for( int i = 0; i < vertex.getDegree(); i++ )
      vertex.setEdge( i, ( vertex.getTargetIndex( i ) + 1 ) % 5, vertex.getEdgeWeight( i ) + 5 );
};
TNL::Graphs::forAllVertices( graph, processVertex );

If the function is called with a constant graph, typename GraphType::VertexView needs to be replaced with typename GraphType::ConstVertexView.

The whole example reads as:

#include <iostream>
#include <TNL/Graphs/Graph.h>
#include <TNL/Graphs/traverse.h>
#include <TNL/Devices/Host.h>
#include <TNL/Devices/Cuda.h>

template< typename Device >
void
forAllVerticesExample()
{
   /***
    * Create a directed graph with 5 vertices.
    */
   // ( definition of GraphType elided in the original listing )
   // clang-format off
   GraphType graph( 5, // number of vertices
                    { // definition of edges with weights
                      { 0, 1, 10.0 }, { 0, 2, 20.0 },
                      { 1, 2, 30.0 }, { 1, 3, 40.0 }, { 1, 4, 50.0 },
                      { 2, 3, 60.0 },
                      { 3, 0, 70.0 }, { 3, 4, 80.0 } } );
   // clang-format on

   /***
    * Print the graph.
    */
   std::cout << "Graph:\n" << graph << '\n';

   /***
    * Traverse all vertices and modify their edges.
    */
   auto processVertex = [] __cuda_callable__( typename GraphType::VertexView vertex ) mutable
   {
      for( int i = 0; i < vertex.getDegree(); i++ )
         vertex.setEdge( i, ( vertex.getTargetIndex( i ) + 1 ) % 5, vertex.getEdgeWeight( i ) + 5 );
   };
   TNL::Graphs::forAllVertices( graph, processVertex );

   /***
    * Print the modified graph.
    */
   std::cout << "Modified graph:\n" << graph << '\n';
}

int
main( int argc, char* argv[] )
{
   std::cout << "Running on host:\n";
   forAllVerticesExample< TNL::Devices::Host >();

#ifdef __CUDACC__
   std::cout << "Running on CUDA device:\n";
   forAllVerticesExample< TNL::Devices::Cuda >();
#endif

   return EXIT_SUCCESS;
}

and the output reads as:

Running on host:
Graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Modified graph:
Row: 0 -> 2:15 3:25
Row: 1 -> 3:35 4:45 0:55
Row: 2 -> 4:65
Row: 3 -> 1:75 0:85
Row: 4 ->
Running on CUDA device:
Graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Modified graph:
Row: 0 -> 2:15 3:25
Row: 1 -> 3:35 4:45 0:55
Row: 2 -> 4:65
Row: 3 -> 1:75 0:85
Row: 4 ->

Traversing a Range of Vertices

Traversing vertices in a specific range [begin, end) can be done using the function TNL::Graphs::forVertices:

/***
* Traverse vertices in range [1, 4) and modify their edges.
*/
auto processVertex = [] __cuda_callable__( typename GraphType::VertexView vertex ) mutable
{
   for( int i = 0; i < vertex.getDegree(); i++ )
      vertex.setEdge( i, ( vertex.getTargetIndex( i ) + 1 ) % 5, vertex.getEdgeWeight( i ) + 5 );
};
TNL::Graphs::forVertices( graph, 1, 4, processVertex );

If the function is called with a constant graph, typename GraphType::VertexView needs to be replaced with typename GraphType::ConstVertexView.

The whole example reads as:

#include <iostream>
#include <TNL/Graphs/Graph.h>
#include <TNL/Graphs/traverse.h>
#include <TNL/Devices/Host.h>
#include <TNL/Devices/Cuda.h>

template< typename Device >
void
forVerticesExample()
{
   /***
    * Create a directed graph with 5 vertices.
    */
   // ( definition of GraphType elided in the original listing )
   // clang-format off
   GraphType graph( 5, // number of vertices
                    { // definition of edges with weights
                      { 0, 1, 10.0 }, { 0, 2, 20.0 },
                      { 1, 2, 30.0 }, { 1, 3, 40.0 }, { 1, 4, 50.0 },
                      { 2, 3, 60.0 },
                      { 3, 0, 70.0 }, { 3, 4, 80.0 } } );
   // clang-format on

   /***
    * Print the graph.
    */
   std::cout << "Graph:\n" << graph << '\n';

   /***
    * Traverse vertices in range [1, 4) and modify their edges.
    */
   auto processVertex = [] __cuda_callable__( typename GraphType::VertexView vertex ) mutable
   {
      for( int i = 0; i < vertex.getDegree(); i++ )
         vertex.setEdge( i, ( vertex.getTargetIndex( i ) + 1 ) % 5, vertex.getEdgeWeight( i ) + 5 );
   };
   TNL::Graphs::forVertices( graph, 1, 4, processVertex );

   /***
    * Print the modified graph.
    */
   std::cout << "Modified graph:\n" << graph << '\n';
}

int
main( int argc, char* argv[] )
{
   std::cout << "Running on host:\n";
   forVerticesExample< TNL::Devices::Host >();

#ifdef __CUDACC__
   std::cout << "Running on CUDA device:\n";
   forVerticesExample< TNL::Devices::Cuda >();
#endif

   return EXIT_SUCCESS;
}

and the output reads as:

Running on host:
Graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Modified graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 3:35 4:45 0:55
Row: 2 -> 4:65
Row: 3 -> 1:75 0:85
Row: 4 ->
Running on CUDA device:
Graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Modified graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 3:35 4:45 0:55
Row: 2 -> 4:65
Row: 3 -> 1:75 0:85
Row: 4 ->

Traversing Vertices with Given Indices

The function TNL::Graphs::forVertices also allows traversing vertices with explicitly specified indices. The indices are provided as an array:

/***
* Create an array of vertex indices.
*/
TNL::Containers::Vector< int, Device > vertices( { 0, 2, 4 } );

The vertices with these indices can be traversed as follows:

/***
* Traverse only the specified vertices.
*/
auto processVertex = [] __cuda_callable__( typename GraphType::VertexView vertex ) mutable
{
   for( int i = 0; i < vertex.getDegree(); i++ )
      vertex.setEdge( i, ( vertex.getTargetIndex( i ) + 1 ) % 5, vertex.getEdgeWeight( i ) + 5 );
};
TNL::Graphs::forVertices( graph, vertices, processVertex );

If the function is called with a constant graph, typename GraphType::VertexView needs to be replaced with typename GraphType::ConstVertexView.

The whole example reads as:

#include <iostream>
#include <TNL/Graphs/Graph.h>
#include <TNL/Graphs/traverse.h>
#include <TNL/Devices/Host.h>
#include <TNL/Devices/Cuda.h>
#include <TNL/Containers/Vector.h>

template< typename Device >
void
forVerticesWithIndexesExample()
{
   /***
    * Create a directed graph with 5 vertices.
    */
   // ( definition of GraphType elided in the original listing )
   // clang-format off
   GraphType graph( 5, // number of vertices
                    { // definition of edges with weights
                      { 0, 1, 10.0 }, { 0, 2, 20.0 },
                      { 1, 2, 30.0 }, { 1, 3, 40.0 }, { 1, 4, 50.0 },
                      { 2, 3, 60.0 },
                      { 3, 0, 70.0 }, { 3, 4, 80.0 } } );
   // clang-format on

   /***
    * Print the graph.
    */
   std::cout << "Graph:\n" << graph << '\n';

   /***
    * Create an array of vertex indices.
    */
   TNL::Containers::Vector< int, Device > vertices( { 0, 2, 4 } );

   /***
    * Traverse only the specified vertices.
    */
   auto processVertex = [] __cuda_callable__( typename GraphType::VertexView vertex ) mutable
   {
      for( int i = 0; i < vertex.getDegree(); i++ )
         vertex.setEdge( i, ( vertex.getTargetIndex( i ) + 1 ) % 5, vertex.getEdgeWeight( i ) + 5 );
   };
   TNL::Graphs::forVertices( graph, vertices, processVertex );

   /***
    * Print the modified graph.
    */
   std::cout << "Modified graph:\n" << graph << '\n';
}

int
main( int argc, char* argv[] )
{
   std::cout << "Running on host:\n";
   forVerticesWithIndexesExample< TNL::Devices::Host >();

#ifdef __CUDACC__
   std::cout << "Running on CUDA device:\n";
   forVerticesWithIndexesExample< TNL::Devices::Cuda >();
#endif

   return EXIT_SUCCESS;
}

and the output reads as:

Running on host:
Graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Modified graph:
Row: 0 -> 2:15 3:25
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 4:65
Row: 3 -> 0:70 4:80
Row: 4 ->
Running on CUDA device:
Graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Modified graph:
Row: 0 -> 2:15 3:25
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 4:65
Row: 3 -> 0:70 4:80
Row: 4 ->

Traversing Vertices with a Condition

Use TNL::Graphs::forAllVerticesIf to process only the vertices that meet criteria specified by a condition lambda function:

/***
* Define a condition: process only vertices with more than one edge.
*/
auto condition = [ = ] __cuda_callable__( int vertexIdx ) -> bool
{
   return graph.getVertexDegree( vertexIdx ) > 1;
};

The lambda function takes a vertex index and returns true for vertices that should be traversed; all other vertices are skipped. The function forAllVerticesIf is invoked as follows:

/***
* Traverse vertices that satisfy the condition.
*/
auto processVertex = [] __cuda_callable__( typename GraphType::VertexView vertex ) mutable
{
   for( int i = 0; i < vertex.getDegree(); i++ )
      vertex.setEdge( i, ( vertex.getTargetIndex( i ) + 1 ) % 5, vertex.getEdgeWeight( i ) + 5 );
};
TNL::Graphs::forAllVerticesIf( graph, condition, processVertex );

There is also an alternative function, TNL::Graphs::forVerticesIf, which accepts a range of vertex indices [begin, end). Only vertices within this range are considered for traversal.
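
For example (a sketch reusing the condition and processVertex lambdas above; the parameter order mirrors TNL::Graphs::forVertices):

TNL::Graphs::forVerticesIf( graph, 1, 4, condition, processVertex );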

If the function is called with a constant graph, typename GraphType::VertexView needs to be replaced with typename GraphType::ConstVertexView.

The complete example reads as:

#include <iostream>
#include <TNL/Graphs/Graph.h>
#include <TNL/Graphs/traverse.h>
#include <TNL/Devices/Host.h>
#include <TNL/Devices/Cuda.h>

template< typename Device >
void
forAllVerticesIfExample()
{
   /***
    * Create a directed graph with 5 vertices.
    */
   // ( definition of GraphType elided in the original listing )
   // clang-format off
   GraphType graph( 5, // number of vertices
                    { // definition of edges with weights
                      { 0, 1, 10.0 }, { 0, 2, 20.0 },
                      { 1, 2, 30.0 }, { 1, 3, 40.0 }, { 1, 4, 50.0 },
                      { 2, 3, 60.0 },
                      { 3, 0, 70.0 }, { 3, 4, 80.0 } } );
   // clang-format on

   /***
    * Print the graph.
    */
   std::cout << "Graph:\n" << graph << '\n';

   /***
    * Define a condition: process only vertices with more than one edge.
    */
   auto condition = [ = ] __cuda_callable__( int vertexIdx ) -> bool
   {
      return graph.getVertexDegree( vertexIdx ) > 1;
   };

   /***
    * Traverse vertices that satisfy the condition.
    */
   auto processVertex = [] __cuda_callable__( typename GraphType::VertexView vertex ) mutable
   {
      for( int i = 0; i < vertex.getDegree(); i++ )
         vertex.setEdge( i, ( vertex.getTargetIndex( i ) + 1 ) % 5, vertex.getEdgeWeight( i ) + 5 );
   };
   TNL::Graphs::forAllVerticesIf( graph, condition, processVertex );

   /***
    * Print the modified graph.
    */
   std::cout << "Modified graph:\n" << graph << '\n';
}

int
main( int argc, char* argv[] )
{
   std::cout << "Running on host:\n";
   forAllVerticesIfExample< TNL::Devices::Host >();

#ifdef __CUDACC__
   std::cout << "Running on CUDA device:\n";
   forAllVerticesIfExample< TNL::Devices::Cuda >();
#endif

   return EXIT_SUCCESS;
}

The output looks as follows:

Running on host:
Graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Modified graph:
Row: 0 -> 2:15 3:25
Row: 1 -> 3:35 4:45 0:55
Row: 2 -> 3:60
Row: 3 -> 1:75 0:85
Row: 4 ->
Running on CUDA device:
Graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Modified graph:
Row: 0 -> 2:15 3:25
Row: 1 -> 3:35 4:45 0:55
Row: 2 -> 3:60
Row: 3 -> 1:75 0:85
Row: 4 ->

Traversing By Edges

Accessing edges from a vertex is possible; however, in this mode each vertex is processed by at most one thread. For higher degrees of parallelism, specialized traversal functions must be used.

Traversing All Edges

The TNL::Graphs::forAllEdges function iterates over all edges in the graph:

/***
* Traverse all edges and modify them.
*/
auto modifyEdge = [] __cuda_callable__( int sourceIdx, int localIdx, int& targetIdx, float& weight ) mutable
{
   targetIdx = ( targetIdx + 1 ) % 5;
   weight += 5;
};
TNL::Graphs::forAllEdges( graph, modifyEdge );

The localIdx index specifies the rank of an edge within the set of all edges incident to the vertex sourceIdx. Note that for sparse, non-constant graphs, both targetIdx and weight can be modified. For dense graphs, only weight can be modified, and the localIdx index is equal to targetIdx.

The whole example reads as follows:

#include <iostream>
#include <TNL/Graphs/Graph.h>
#include <TNL/Graphs/traverse.h>
#include <TNL/Devices/Host.h>
#include <TNL/Devices/Cuda.h>

template< typename Device >
void
forAllEdgesExample()
{
   /***
    * Create a directed graph with 5 vertices.
    */
   // ( definition of GraphType elided in the original listing )
   // clang-format off
   GraphType graph( 5, // number of vertices
                    { // definition of edges with weights
                      { 0, 1, 10.0 }, { 0, 2, 20.0 },
                      { 1, 2, 30.0 }, { 1, 3, 40.0 }, { 1, 4, 50.0 },
                      { 2, 3, 60.0 },
                      { 3, 0, 70.0 }, { 3, 4, 80.0 } } );
   // clang-format on

   /***
    * Print the graph.
    */
   std::cout << "Graph:\n" << graph << '\n';

   /***
    * Traverse all edges and modify them.
    */
   auto modifyEdge = [] __cuda_callable__( int sourceIdx, int localIdx, int& targetIdx, float& weight ) mutable
   {
      targetIdx = ( targetIdx + 1 ) % 5;
      weight += 5;
   };
   TNL::Graphs::forAllEdges( graph, modifyEdge );

   /***
    * Print the modified graph.
    */
   std::cout << "Modified graph:\n" << graph << '\n';
}

int
main( int argc, char* argv[] )
{
   std::cout << "Running on host:\n";
   forAllEdgesExample< TNL::Devices::Host >();

#ifdef __CUDACC__
   std::cout << "Running on CUDA device:\n";
   forAllEdgesExample< TNL::Devices::Cuda >();
#endif

   return EXIT_SUCCESS;
}

The output looks as follows:

Running on host:
Graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Modified graph:
Row: 0 -> 2:15 3:25
Row: 1 -> 3:35 4:45 0:55
Row: 2 -> 4:65
Row: 3 -> 1:75 0:85
Row: 4 ->
Running on CUDA device:
Graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Modified graph:
Row: 0 -> 2:15 3:25
Row: 1 -> 3:35 4:45 0:55
Row: 2 -> 4:65
Row: 3 -> 1:75 0:85
Row: 4 ->

Traversing Edges in a Range

Traversing only edges connected to vertices in a specific range [begin, end) can be done as follows:

/***
* Traverse edges in range [1, 4) and modify them.
*/
auto modifyEdge = [] __cuda_callable__( int sourceIdx, int localIdx, int& targetIdx, float& weight ) mutable
{
   targetIdx = ( targetIdx + 1 ) % 5;
   weight += 5;
};
TNL::Graphs::forEdges( graph, 1, 4, modifyEdge );

The whole example reads as follows:

#include <iostream>
#include <TNL/Graphs/Graph.h>
#include <TNL/Graphs/traverse.h>
#include <TNL/Devices/Host.h>
#include <TNL/Devices/Cuda.h>

template< typename Device >
void
forAllEdgesExample()
{
   /***
    * Create a directed graph with 5 vertices.
    */
   // ( definition of GraphType elided in the original listing )
   // clang-format off
   GraphType graph( 5, // number of vertices
                    { // definition of edges with weights
                      { 0, 1, 10.0 }, { 0, 2, 20.0 },
                      { 1, 2, 30.0 }, { 1, 3, 40.0 }, { 1, 4, 50.0 },
                      { 2, 3, 60.0 },
                      { 3, 0, 70.0 }, { 3, 4, 80.0 } } );
   // clang-format on

   /***
    * Print the graph.
    */
   std::cout << "Graph:\n" << graph << '\n';

   /***
    * Traverse edges in range [1, 4) and modify them.
    */
   auto modifyEdge = [] __cuda_callable__( int sourceIdx, int localIdx, int& targetIdx, float& weight ) mutable
   {
      targetIdx = ( targetIdx + 1 ) % 5;
      weight += 5;
   };
   TNL::Graphs::forEdges( graph, 1, 4, modifyEdge );

   /***
    * Print the modified graph.
    */
   std::cout << "Modified graph:\n" << graph << '\n';
}

int
main( int argc, char* argv[] )
{
   std::cout << "Running on host:\n";
   forAllEdgesExample< TNL::Devices::Host >();

#ifdef __CUDACC__
   std::cout << "Running on CUDA device:\n";
   forAllEdgesExample< TNL::Devices::Cuda >();
#endif

   return EXIT_SUCCESS;
}

The output looks as follows:

Running on host:
Graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Modified graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 3:35 4:45 0:55
Row: 2 -> 4:65
Row: 3 -> 1:75 0:85
Row: 4 ->
Running on CUDA device:
Graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Modified graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 3:35 4:45 0:55
Row: 2 -> 4:65
Row: 3 -> 1:75 0:85
Row: 4 ->

Traversing Edges for Specific Vertices

The function TNL::Graphs::forEdges can traverse only edges of vertices with explicitly specified indices, similarly to TNL::Graphs::forVertices.

The indices of the vertices to be traversed are specified as follows:

/***
* Create an array of vertex indices to process.
*/
TNL::Containers::Vector< int, Device > vertices( { 0, 2, 3 } );

The traversal itself is performed as follows:

/***
* Traverse edges only from the specified vertices.
*/
auto modifyEdge = [] __cuda_callable__( int sourceIdx, int localIdx, int targetIdx, float weight ) mutable
{
   // Note: targetIdx and weight are taken by value here, so these assignments are
   // local to the lambda and the graph itself stays unchanged (see the output below).
   targetIdx = ( targetIdx + 1 ) % 5;
   weight += 5;
};
TNL::Graphs::forEdges( graph, vertices, modifyEdge );

The whole example reads as follows:

#include <iostream>
#include <TNL/Graphs/Graph.h>
#include <TNL/Graphs/traverse.h>
#include <TNL/Devices/Host.h>
#include <TNL/Devices/Cuda.h>
#include <TNL/Containers/Vector.h>

template< typename Device >
void
forEdgesWithIndexesExample()
{
   /***
    * Create a directed graph with 5 vertices.
    */
   // ( definition of GraphType elided in the original listing )
   // clang-format off
   GraphType graph( 5, // number of vertices
                    { // definition of edges with weights
                      { 0, 1, 10.0 }, { 0, 2, 20.0 },
                      { 1, 2, 30.0 }, { 1, 3, 40.0 }, { 1, 4, 50.0 },
                      { 2, 3, 60.0 },
                      { 3, 0, 70.0 }, { 3, 4, 80.0 } } );
   // clang-format on

   /***
    * Print the graph.
    */
   std::cout << "Graph:\n" << graph << '\n';

   /***
    * Create an array of vertex indices to process.
    */
   TNL::Containers::Vector< int, Device > vertices( { 0, 2, 3 } );

   /***
    * Traverse edges only from the specified vertices.
    */
   auto modifyEdge = [] __cuda_callable__( int sourceIdx, int localIdx, int targetIdx, float weight ) mutable
   {
      // Note: by-value parameters, so the graph stays unchanged (see the output below).
      targetIdx = ( targetIdx + 1 ) % 5;
      weight += 5;
   };
   TNL::Graphs::forEdges( graph, vertices, modifyEdge );

   /***
    * Print the modified graph.
    */
   std::cout << "Modified graph:\n" << graph << '\n';
}

int
main( int argc, char* argv[] )
{
   std::cout << "Running on host:\n";
   forEdgesWithIndexesExample< TNL::Devices::Host >();

#ifdef __CUDACC__
   std::cout << "Running on CUDA device:\n";
   forEdgesWithIndexesExample< TNL::Devices::Cuda >();
#endif

   return EXIT_SUCCESS;
}

The output looks as follows:

Running on host:
Graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Modified graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Running on CUDA device:
Graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Modified graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->

Conditional Edge Traversal

For conditional traversal of edges, the function TNL::Graphs::forAllEdgesIf can be used. The condition is expressed by a lambda function condition:

/***
* Define a condition: process only vertices with even indices.
*/
auto condition = [] __cuda_callable__( int vertexIdx ) -> bool
{
   return vertexIdx % 2 == 0;
};

The lambda function takes a vertex index and returns true if the edges incident to that vertex should be traversed; otherwise, it returns false. The traversal is executed as follows:

/***
* Traverse edges only from vertices that satisfy the condition.
*/
auto modifyEdge = [] __cuda_callable__( int sourceIdx, int localIdx, int targetIdx, float weight ) mutable
{
   // Note: targetIdx and weight are taken by value here, so these assignments are
   // local to the lambda and the graph itself stays unchanged (see the output below).
   targetIdx = ( targetIdx + 1 ) % 5;
   weight += 5;
};
TNL::Graphs::forAllEdgesIf( graph, condition, modifyEdge );

The function TNL::Graphs::forEdgesIf behaves in the same way and additionally accepts a range of vertex indices to be traversed.
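
For example (a sketch reusing the lambdas above; the range parameters are assumed to precede the condition, mirroring TNL::Graphs::forEdges):

TNL::Graphs::forEdgesIf( graph, 1, 4, condition, modifyEdge );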

The whole example reads as follows:

#include <iostream>
#include <TNL/Graphs/Graph.h>
#include <TNL/Graphs/traverse.h>
#include <TNL/Devices/Host.h>
#include <TNL/Devices/Cuda.h>

template< typename Device >
void
forAllEdgesIfExample()
{
   /***
    * Create a directed graph with 5 vertices.
    */
   // ( definition of GraphType elided in the original listing )
   // clang-format off
   GraphType graph( 5, // number of vertices
                    { // definition of edges with weights
                      { 0, 1, 10.0 }, { 0, 2, 20.0 },
                      { 1, 2, 30.0 }, { 1, 3, 40.0 }, { 1, 4, 50.0 },
                      { 2, 3, 60.0 },
                      { 3, 0, 70.0 }, { 3, 4, 80.0 } } );
   // clang-format on

   /***
    * Print the graph.
    */
   std::cout << "Graph:\n" << graph << '\n';

   /***
    * Define a condition: process only vertices with even indices.
    */
   auto condition = [] __cuda_callable__( int vertexIdx ) -> bool
   {
      return vertexIdx % 2 == 0;
   };

   /***
    * Traverse edges only from vertices that satisfy the condition.
    */
   auto modifyEdge = [] __cuda_callable__( int sourceIdx, int localIdx, int targetIdx, float weight ) mutable
   {
      // Note: by-value parameters, so the graph stays unchanged (see the output below).
      targetIdx = ( targetIdx + 1 ) % 5;
      weight += 5;
   };
   TNL::Graphs::forAllEdgesIf( graph, condition, modifyEdge );

   /***
    * Print the modified graph.
    */
   std::cout << "Modified graph:\n" << graph << '\n';
}

int
main( int argc, char* argv[] )
{
   std::cout << "Running on host:\n";
   forAllEdgesIfExample< TNL::Devices::Host >();

#ifdef __CUDACC__
   std::cout << "Running on CUDA device:\n";
   forAllEdgesIfExample< TNL::Devices::Cuda >();
#endif

   return EXIT_SUCCESS;
}

The output looks as follows:

Running on host:
Graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Modified graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Running on CUDA device:
Graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Modified graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->

Const vs. Non-Const Edge Traversal

For constant graphs, edge data is read-only. A sketch (the enclosing call was truncated in the original; TNL::Graphs::forAllEdges over a const graph is assumed here):

TNL::Graphs::forAllEdges(
   constGraph,
   [] __cuda_callable__ ( int source, int local, int target, const float& weight )
   {
      // target and weight are read-only
      // No 'mutable' needed
   } );

Reductions on Graphs

Graph reductions allow efficient computation of aggregate values over edges or vertices. Such computations can also be implemented using vertex traversal functions (TNL::Graphs::forAllVertices, TNL::Graphs::forVertices, TNL::Graphs::forAllVerticesIf, TNL::Graphs::forVerticesIf). However, to achieve a higher level of parallelism, TNL provides specialized reduction operations for graph data structures.

The following provides an overview of graph reduction functions:

  1. Reductions can be performed either in a basic mode or with arguments, which additionally return the position of the edge being searched for.
     Category     | Tracks Position? | Use Case
     Basic        | No               | Only the reduced weight is needed (e.g., vertex sum, vertex max)
     WithArgument | Yes              | Need weight and target vertex index (e.g., max weight and where it occurs)
  2. The scope of reduction can be defined in several ways: all vertices, a vertex range, an explicit list of vertex indices, or a condition on the vertex index.
     Scope | Vertices Processed               | Parameters
     All   | All vertices                     | No range/array parameters
     Range | Vertices in [begin, end)         | begin and end indices
     Array | Specific vertices                | Array of vertex indices
     If    | Vertices filtered by a condition | Condition lambda on vertex-level properties
  3. All functions can be called for both constant and non-constant graphs.
     Category  | Graph Modifiable? | Use Case
     Non-const | Yes               | Can modify graph edges during reduction
     Const     | No                | Read-only access to graph edges

See also Overview of Graph Reduction Functions for more details.

Reducing Over Edges of a Vertex

The function TNL::Graphs::reduceVertices computes a reduction over edges adjacent to the specified vertices. First, we define a vector vertexMaxWeights (together with the corresponding vector view vertexMaxWeights_view ) to store the results of the reduction:

TNL::Containers::Vector< float, Device > vertexMaxWeights( 5, -1 );
auto vertexMaxWeights_view = vertexMaxWeights.getView();

Next, we define the fetch lambda function responsible for reading the required data:

auto fetch = [] __cuda_callable__( int sourceIdx, int targetIdx, const float& weight ) -> float
{
   return weight;
};

The lambda function fetch takes the indices sourceIdx and targetIdx of the source and target vertices of an edge, respectively. The third parameter is the edge weight. For non-constant graphs, the weight parameter may be passed by reference and can be modified during the fetch operation. This allows the reduction and graph modification to be performed simultaneously. For sparse graphs, the targetIdx index can also be modified.
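
A sketch of such a combined operation (doubling each weight while reducing; the non-const reference assumes a non-constant graph):

auto scalingFetch = [] __cuda_callable__( int sourceIdx, int targetIdx, float& weight ) -> float
{
   weight *= 2;    // modifies the graph during the reduction
   return weight;  // value entering the reduction
};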

Next, we define the store lambda function, which is responsible for storing the reduction results for individual vertices:

auto store = [ = ] __cuda_callable__( int vertexIdx, const float& maxWeight ) mutable
{
   vertexMaxWeights_view[ vertexIdx ] = maxWeight;
};

Finally, the reduction is executed as follows:

TNL::Graphs::reduceVertices( graph, 1, 4, fetch, TNL::Max{}, store );

Here, we use TNL::Max for the reduction (see also Function objects for reduction operations for other functionals). Another variant of TNL::Graphs::reduceVertices allows defining the reduction operation via a lambda function.

Note that the function TNL::Graphs::reduceAllVertices is also available for performing reductions over all vertices.
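
For example (a sketch reusing the fetch and store lambdas above, without the range parameters):

TNL::Graphs::reduceAllVertices( graph, fetch, TNL::Max{}, store );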

The whole example reads as follows:

#include <iostream>
#include <TNL/Graphs/Graph.h>
#include <TNL/Graphs/reduce.h>
#include <TNL/Devices/Host.h>
#include <TNL/Devices/Cuda.h>
#include <TNL/Containers/Vector.h>

template< typename Device >
void
reduceVerticesExample()
{
   /***
    * Create a directed graph with 5 vertices.
    */
   // ( definition of GraphType elided in the original listing )
   // clang-format off
   GraphType graph( 5, // number of vertices
                    { // definition of edges with weights
                      { 0, 1, 10.0 }, { 0, 2, 20.0 },
                      { 1, 2, 30.0 }, { 1, 3, 40.0 }, { 1, 4, 50.0 },
                      { 2, 3, 60.0 },
                      { 3, 0, 70.0 }, { 3, 4, 80.0 } } );
   // clang-format on

   /***
    * Print the graph.
    */
   std::cout << "Graph:\n" << graph << '\n';

   /***
    * Compute maximum edge weight for vertices in range [1, 4).
    */
   TNL::Containers::Vector< float, Device > vertexMaxWeights( 5, -1 );
   auto vertexMaxWeights_view = vertexMaxWeights.getView();

   auto fetch = [] __cuda_callable__( int sourceIdx, int targetIdx, const float& weight ) -> float
   {
      return weight;
   };

   auto store = [ = ] __cuda_callable__( int vertexIdx, const float& maxWeight ) mutable
   {
      vertexMaxWeights_view[ vertexIdx ] = maxWeight;
   };

   TNL::Graphs::reduceVertices( graph, 1, 4, fetch, TNL::Max{}, store );

   /***
    * Print results.
    */
   std::cout << "Maximum edge weight for vertices 1-3:" << vertexMaxWeights << '\n';
}

int
main( int argc, char* argv[] )
{
   std::cout << "Running on host:\n";
   reduceVerticesExample< TNL::Devices::Host >();

#ifdef __CUDACC__
   std::cout << "Running on CUDA device:\n";
   reduceVerticesExample< TNL::Devices::Cuda >();
#endif

   return EXIT_SUCCESS;
}

The output looks as follows:

Running on host:
Graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Maximum edge weight for vertices 1-3:[ -1, 50, 60, 80, -1 ]
Running on CUDA device:
Graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Maximum edge weight for vertices 1-3:[ -1, 50, 60, 80, -1 ]

Reducing Over Edges of Specific Vertices and Conditional Reduction

Reductions over a specified set of vertices can be performed using TNL::Graphs::reduceVertices. Conditional reductions are provided by TNL::Graphs::reduceAllVerticesIf and TNL::Graphs::reduceVerticesIf. Their behavior is analogous to the corresponding vertex traversal functions; however, they differ in how the results of the reduction are stored, as demonstrated in the following code snippet.

Assume that we perform a reduction over a given set of vertices:

/***
* Compute sum of edge weights for specific vertices (0, 2, 3).
*/
TNL::Containers::Vector< int, Device > vertexIndices( { 0, 2, 3 } );
TNL::Containers::Vector< float, Device > vertexSums( 5, -1 );  // ( declaration elided in the original; reconstructed from the output below )
TNL::Containers::Vector< float, Device > compressedVertexSums( 3, -1 );
auto vertexSums_view = vertexSums.getView();
auto compressedVertexSums_view = compressedVertexSums.getView();
auto fetch = [] __cuda_callable__( int sourceIdx, int targetIdx, const float& weight ) -> float
{
   return weight;
};
auto store = [ = ] __cuda_callable__( int indexOfVertexIdx, int vertexIdx, const float& sum ) mutable
{
   compressedVertexSums_view[ indexOfVertexIdx ] = sum;
   vertexSums_view[ vertexIdx ] = sum;
};
TNL::Graphs::reduceVertices( graph, vertexIndices, fetch, TNL::Plus{}, store );

The store lambda function takes the following parameters:

  • indexOfVertexIdx – the rank of the vertex within the set of vertices being processed. In this example, the reduction is performed only for vertices 0, 2, and 3. Their corresponding indexOfVertexIdx values are 0, 1, and 2, respectively.
  • vertexIdx – the index of the vertex whose reduction result is currently being stored.
  • sum – the result of the reduction, i.e. the sum of the weights of all edges connected to the vertex.

The whole example reads as follows:

#include <iostream>
#include <TNL/Graphs/Graph.h>
#include <TNL/Graphs/reduce.h>
#include <TNL/Devices/Host.h>
#include <TNL/Devices/Cuda.h>
#include <TNL/Containers/Vector.h>

template< typename Device >
void
reduceVerticesWithIndexesExample()
{
   /***
    * Create a directed graph with 5 vertices.
    */
   // ( definition of GraphType elided in the original listing )
   // clang-format off
   GraphType graph( 5, // number of vertices
                    { // definition of edges with weights
                      { 0, 1, 10.0 }, { 0, 2, 20.0 },
                      { 1, 2, 30.0 }, { 1, 3, 40.0 }, { 1, 4, 50.0 },
                      { 2, 3, 60.0 },
                      { 3, 0, 70.0 }, { 3, 4, 80.0 } } );
   // clang-format on

   /***
    * Print the graph.
    */
   std::cout << "Graph:\n" << graph << '\n';

   /***
    * Compute sum of edge weights for specific vertices (0, 2, 3).
    */
   TNL::Containers::Vector< int, Device > vertexIndices( { 0, 2, 3 } );
   TNL::Containers::Vector< float, Device > vertexSums( 5, -1 );  // ( declaration elided in the original; reconstructed from the output below )
   TNL::Containers::Vector< float, Device > compressedVertexSums( 3, -1 );
   auto vertexSums_view = vertexSums.getView();
   auto compressedVertexSums_view = compressedVertexSums.getView();

   auto fetch = [] __cuda_callable__( int sourceIdx, int targetIdx, const float& weight ) -> float
   {
      return weight;
   };

   auto store = [ = ] __cuda_callable__( int indexOfVertexIdx, int vertexIdx, const float& sum ) mutable
   {
      compressedVertexSums_view[ indexOfVertexIdx ] = sum;
      vertexSums_view[ vertexIdx ] = sum;
   };

   TNL::Graphs::reduceVertices( graph, vertexIndices, fetch, TNL::Plus{}, store );

   /***
    * Print results.
    */
   std::cout << "Sum of edge weights for specific vertices:" << vertexSums << '\n';
   std::cout << "Compressed sums:" << compressedVertexSums.getView( 0, vertexIndices.getSize() ) << '\n';
}

int
main( int argc, char* argv[] )
{
   std::cout << "Running on host:\n";
   reduceVerticesWithIndexesExample< TNL::Devices::Host >();

#ifdef __CUDACC__
   std::cout << "Running on CUDA device:\n";
   reduceVerticesWithIndexesExample< TNL::Devices::Cuda >();
#endif

   return EXIT_SUCCESS;
}

The output looks as follows:

Running on host:
Graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Sum of edge weights for specific vertices:[ 30, -1, 60, 150, -1 ]
Compressed sums:[ 30, 60, 150 ]
Running on CUDA device:
Graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Sum of edge weights for specific vertices:[ 30, -1, 60, 150, -1 ]
Compressed sums:[ 30, 60, 150 ]

The store lambda function behaves in the same way for conditional reductions, as demonstrated by the following code snippet:

/***
* Compute minimum edge weight for vertices in range [1, 4) with degree >= 2.
*/
TNL::Containers::Vector< float, Device > vertexMinWeights( 5, -1 );
TNL::Containers::Vector< float, Device > compressedVertexMinWeights( 5 );
auto vertexMinWeights_view = vertexMinWeights.getView();
auto compressedVertexMinWeights_view = compressedVertexMinWeights.getView();
auto condition = [ = ] __cuda_callable__( int vertexIdx ) -> bool
{
   return graph.getVertexDegree( vertexIdx ) >= 2;
};
auto fetch = [] __cuda_callable__( int sourceIdx, int targetIdx, const float& weight ) -> float
{
   return weight;
};
auto store = [ = ] __cuda_callable__( int indexOfVertexIdx, int vertexIdx, const float& minWeight ) mutable
{
   compressedVertexMinWeights_view[ indexOfVertexIdx ] = minWeight;
   vertexMinWeights_view[ vertexIdx ] = minWeight;
};
int reducedVertexCount = TNL::Graphs::reduceVerticesIf( graph, 1, 4, condition, fetch, TNL::Min{}, store );

The function TNL::Graphs::reduceVerticesIf also returns the number of vertices (reducedVertexCount) for which the reduction has been performed. This is particularly useful for subsequent processing of data stored in arrays with compressed results (for example, compressedVertexMinWeights_view in this example).

The whole example reads as:

#include <iostream>
#include <TNL/Graphs/Graph.h>
#include <TNL/Graphs/reduce.h>
#include <TNL/Devices/Host.h>
#include <TNL/Devices/Cuda.h>
#include <TNL/Containers/Vector.h>

template< typename Device >
void
reduceVerticesIfExample()
{
   /***
    * Create a directed graph with 5 vertices.
    */
   // ( definition of GraphType elided in the original listing )
   // clang-format off
   GraphType graph( 5, // number of vertices
                    { // definition of edges with weights
                      { 0, 1, 10.0 }, { 0, 2, 20.0 },
                      { 1, 2, 30.0 }, { 1, 3, 40.0 }, { 1, 4, 50.0 },
                      { 2, 3, 60.0 },
                      { 3, 0, 70.0 }, { 3, 4, 80.0 } } );
   // clang-format on

   /***
    * Print the graph.
    */
   std::cout << "Graph:\n" << graph << '\n';

   /***
    * Compute minimum edge weight for vertices in range [1, 4) with degree >= 2.
    */
   TNL::Containers::Vector< float, Device > vertexMinWeights( 5, -1 );
   TNL::Containers::Vector< float, Device > compressedVertexMinWeights( 5 );
   auto vertexMinWeights_view = vertexMinWeights.getView();
   auto compressedVertexMinWeights_view = compressedVertexMinWeights.getView();

   auto condition = [ = ] __cuda_callable__( int vertexIdx ) -> bool
   {
      return graph.getVertexDegree( vertexIdx ) >= 2;
   };

   auto fetch = [] __cuda_callable__( int sourceIdx, int targetIdx, const float& weight ) -> float
   {
      return weight;
   };

   auto store = [ = ] __cuda_callable__( int indexOfVertexIdx, int vertexIdx, const float& minWeight ) mutable
   {
      compressedVertexMinWeights_view[ indexOfVertexIdx ] = minWeight;
      vertexMinWeights_view[ vertexIdx ] = minWeight;
   };

   int reducedVertexCount = TNL::Graphs::reduceVerticesIf( graph, 1, 4, condition, fetch, TNL::Min{}, store );

   /***
    * Print results.
    */
   std::cout << "Number of reduced vertices: " << reducedVertexCount << '\n';
   std::cout << "Minimum edge weight for vertices 1-3 with degree >= 2:" << vertexMinWeights << '\n';
   std::cout << "Compressed minimum weights:" << compressedVertexMinWeights.getView( 0, reducedVertexCount ) << '\n';
}

int
main( int argc, char* argv[] )
{
   std::cout << "Running on host:\n";
   reduceVerticesIfExample< TNL::Devices::Host >();

#ifdef __CUDACC__
   std::cout << "Running on CUDA device:\n";
   reduceVerticesIfExample< TNL::Devices::Cuda >();
#endif

   return EXIT_SUCCESS;
}

The output looks as follows:

Running on host:
Graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Number of reduced vertices: 2
Minimum edge weight for vertices 1-3 with degree >= 2:[ -1, 30, -1, 70, -1 ]
Compressed minimum weights:[ 30, 70 ]
Running on CUDA device:
Graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Number of reduced vertices: 2
Minimum edge weight for vertices 1-3 with degree >= 2:[ -1, 30, -1, 70, -1 ]
Compressed minimum weights:[ 30, 70 ]

Reducing Over Edges of a Vertex with Argument

Reductions with arguments allow tracking graph edges during the computation. This is useful, for example, when searching for the edge with the maximal (or minimal) weight among edges adjacent to given vertices, or for similar operations. Such functionality is provided by TNL::Graphs::reduceVerticesWithArgument and related functions (TNL::Graphs::reduceAllVerticesWithArgument, TNL::Graphs::reduceVerticesWithArgumentIf, TNL::Graphs::reduceAllVerticesWithArgumentIf).

In the following example, for each vertex in a given range, we search for the edge with the minimal weight. We begin by defining vectors and the corresponding vector views to store the results of the reduction:

TNL::Containers::Vector< float, Device > minWeights( 5, -1 );
TNL::Containers::Vector< int, Device > minTargets( 5, -1 );
auto minWeights_view = minWeights.getView();
auto minTargets_view = minTargets.getView();

The fetch lambda function behaves in the same way as in a standard reduction:

auto fetch = [] __cuda_callable__( int sourceIdx, int targetIdx, const float& weight ) -> float
{
   return weight;
};

The store lambda function is defined as follows:

auto store = [ = ] __cuda_callable__( int vertexIdx, int localIdx, int targetIdx, float result, bool isolatedVertex ) mutable
{
   minWeights_view[ vertexIdx ] = result;
   if( ! isolatedVertex )
      minTargets_view[ vertexIdx ] = targetIdx;
};

In addition to the reduction result (result) and the vertex index (vertexIdx), it receives the parameters localIdx and targetIdx and the boolean flag isolatedVertex:

  • vertexIdx is the index of the vertex for which the result is being stored.
  • localIdx is the position of the edge with the minimal weight within the set of all edges adjacent to the vertex vertexIdx. If the graph is dense (i.e., the adjacency matrix is dense), localIdx has the same value as targetIdx and may be omitted.
  • targetIdx is the index of the target vertex of the edge with the minimal weight.
  • result is the value of the minimal edge weight.
  • isolatedVertex is a boolean flag. If it is set to true, the vertex with index vertexIdx is isolated and has no adjacent edges. In this case, result is equal to the identity value of the reduction operation, and both localIdx and targetIdx are undefined and must be ignored.

The reduction is performed as follows:

TNL::Graphs::reduceVerticesWithArgument( graph, 1, 5, fetch, TNL::MinWithArg{}, store );

We use TNL::MinWithArg for the reduction (see also Function objects for reduction operations with argument for other functionals). Another variant of TNL::Graphs::reduceVerticesWithArgument allows the reduction operation to be defined via a lambda function.
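For illustration, such a lambda could mimic TNL::MinWithArg. The following is only a sketch: it assumes the same in-place value/index convention as TNL::Algorithms::reduceWithArgument and that the identity value must then be passed explicitly, since it can no longer be deduced from the functional (consult the API reference for the exact signature):

// Hypothetical lambda equivalent of TNL::MinWithArg: it updates the running
// value/index pair ( a, aIdx ) in place whenever ( b, bIdx ) is smaller.
auto argMin = [] __cuda_callable__( float& a, const float& b, int& aIdx, const int& bIdx )
{
   if( b < a ) {
      a = b;
      aIdx = bIdx;
   }
};
TNL::Graphs::reduceVerticesWithArgument(
   graph, 1, 5, fetch, argMin, store, std::numeric_limits< float >::max() );  // requires <limits>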

The whole example reads as follows:

#include <iostream>
#include <TNL/Graphs/Graph.h>
#include <TNL/Graphs/reduce.h>
#include <TNL/Devices/Host.h>
#include <TNL/Devices/Cuda.h>
#include <TNL/Containers/Vector.h>

template< typename Device >
void
reduceVerticesWithArgumentExample()
{
   /***
    * Create a directed graph with 5 vertices.
    */
   using GraphType = TNL::Graphs::Graph< float, Device, int >;  // assuming the default sparse directed graph type
   // clang-format off
   GraphType graph( 5,  // number of vertices
                    {   // definition of edges with weights
                      { 0, 1, 10.0 }, { 0, 2, 20.0 },
                      { 1, 2, 30.0 }, { 1, 3, 40.0 }, { 1, 4, 50.0 },
                      { 2, 3, 60.0 },
                      { 3, 0, 70.0 }, { 3, 4, 80.0 } } );
   // clang-format on

   /***
    * Print the graph.
    */
   std::cout << "Graph:\n" << graph << '\n';

   /***
    * Find minimum edge weight and target vertex for vertices in range [1, 5).
    */
   TNL::Containers::Vector< float, Device > minWeights( 5, -1 );
   TNL::Containers::Vector< int, Device > minTargets( 5, -1 );
   auto minWeights_view = minWeights.getView();
   auto minTargets_view = minTargets.getView();

   auto fetch = [] __cuda_callable__( int sourceIdx, int targetIdx, const float& weight ) -> float
   {
      return weight;
   };

   auto store = [ = ] __cuda_callable__( int vertexIdx, int localIdx, int targetIdx, float result, bool isolatedVertex ) mutable
   {
      minWeights_view[ vertexIdx ] = result;
      if( ! isolatedVertex )
         minTargets_view[ vertexIdx ] = targetIdx;
   };

   TNL::Graphs::reduceVerticesWithArgument( graph, 1, 5, fetch, TNL::MinWithArg{}, store );

   /***
    * Print results.
    */
   std::cout << "Minimum edge weight for vertices 1-4:" << minWeights << '\n';
   std::cout << "Target vertex for minimum edge:" << minTargets << '\n';
}

int
main( int argc, char* argv[] )
{
   std::cout << "Running on host:\n";
   reduceVerticesWithArgumentExample< TNL::Devices::Host >();

#ifdef __CUDACC__
   std::cout << "Running on CUDA device:\n";
   reduceVerticesWithArgumentExample< TNL::Devices::Cuda >();
#endif

   return EXIT_SUCCESS;
}

The output looks as follows:

Running on host:
Graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Minimum edge weight for vertices 1-4:[ -1, 30, 60, 70, 3.40282e+38 ]
Target vertex for minimum edge:[ -1, 2, 3, 0, -1 ]
Running on CUDA device:
Graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Minimum edge weight for vertices 1-4:[ -1, 30, 60, 70, 3.40282e+38 ]
Target vertex for minimum edge:[ -1, 2, 3, 0, -1 ]

Reducing Over Edges of Specific Vertices and Conditional Reduction With Argument

The following example demonstrates a reduction with argument over a specific set of vertices, here selected from a given index range by a condition:

/***
 * Find minimum edge weight and target vertex for vertices in range [1, 4) with degree >= 2.
 */
TNL::Containers::Vector< float, Device > vertexMinWeights( 5, -1 );
TNL::Containers::Vector< float, Device > compressedVertexMinWeights( 5 );
TNL::Containers::Vector< int, Device > vertexMinTargets( 5, -1 );
TNL::Containers::Vector< int, Device > compressedVertexMinTargets( 5 );
auto vertexMinWeights_view = vertexMinWeights.getView();
auto compressedVertexMinWeights_view = compressedVertexMinWeights.getView();
auto vertexMinTargets_view = vertexMinTargets.getView();
auto compressedVertexMinTargets_view = compressedVertexMinTargets.getView();
auto condition = [ = ] __cuda_callable__( int vertexIdx ) -> bool
{
   return graph.getVertexDegree( vertexIdx ) >= 2;
};
auto fetch = [] __cuda_callable__( int sourceIdx, int targetIdx, const float& weight ) -> float
{
   return weight;
};
auto store = [ = ] __cuda_callable__(
   int indexOfVertexIdx, int vertexIdx, int localIdx, int targetIdx, float minWeight, bool isolatedVertex ) mutable
{
   compressedVertexMinWeights_view[ indexOfVertexIdx ] = minWeight;
   vertexMinWeights_view[ vertexIdx ] = minWeight;
   if( ! isolatedVertex ) {
      compressedVertexMinTargets_view[ indexOfVertexIdx ] = targetIdx;
      vertexMinTargets_view[ vertexIdx ] = targetIdx;
   }
};
int traversedVertexCount =
   TNL::Graphs::reduceVerticesWithArgumentIf( graph, 1, 4, condition, fetch, TNL::MinWithArg{}, store );

The lambda functions condition and fetch remain unchanged. The store lambda function accepts the following parameters:

  • indexOfVertexIdx – the position of the vertex whose result is being stored within the set of all vertices reduced during the operation. This index can be used to store results in a compressed form.
  • vertexIdx – the index of the vertex whose reduction result is being stored.
  • localIdx – the position of the edge with the minimal weight within the set of all edges adjacent to the vertex vertexIdx.
  • targetIdx – the index of the target vertex of the edge with the minimal weight.
  • minWeight – the weight value of the edge with the minimal weight.
  • isolatedVertex – indicates (if set to true) that the vertex with index vertexIdx is isolated and has no adjacent edges over which the reduction can be performed.

If isolatedVertex is set to true, the reduction result (minWeight in this example) equals the identity value of the reduction operation, and the variables localIdx and targetIdx are undefined and must not be used.

The whole example reads as follows:

#include <iostream>
#include <TNL/Graphs/Graph.h>
#include <TNL/Graphs/reduce.h>
#include <TNL/Devices/Host.h>
#include <TNL/Devices/Cuda.h>
#include <TNL/Containers/Vector.h>

template< typename Device >
void
reduceVerticesWithArgumentIfExample()
{
   /***
    * Create a directed graph with 5 vertices.
    */
   using GraphType = TNL::Graphs::Graph< float, Device, int >;  // assuming the default sparse directed graph type
   // clang-format off
   GraphType graph( 5,  // number of vertices
                    {   // definition of edges with weights
                      { 0, 1, 10.0 }, { 0, 2, 20.0 },
                      { 1, 2, 30.0 }, { 1, 3, 40.0 }, { 1, 4, 50.0 },
                      { 2, 3, 60.0 },
                      { 3, 0, 70.0 }, { 3, 4, 80.0 } } );
   // clang-format on

   /***
    * Print the graph.
    */
   std::cout << "Graph:\n" << graph << '\n';

   /***
    * Find minimum edge weight and target vertex for vertices in range [1, 4) with degree >= 2.
    */
   TNL::Containers::Vector< float, Device > vertexMinWeights( 5, -1 );
   TNL::Containers::Vector< float, Device > compressedVertexMinWeights( 5 );
   TNL::Containers::Vector< int, Device > vertexMinTargets( 5, -1 );
   TNL::Containers::Vector< int, Device > compressedVertexMinTargets( 5 );
   auto vertexMinWeights_view = vertexMinWeights.getView();
   auto compressedVertexMinWeights_view = compressedVertexMinWeights.getView();
   auto vertexMinTargets_view = vertexMinTargets.getView();
   auto compressedVertexMinTargets_view = compressedVertexMinTargets.getView();

   auto condition = [ = ] __cuda_callable__( int vertexIdx ) -> bool
   {
      return graph.getVertexDegree( vertexIdx ) >= 2;
   };

   auto fetch = [] __cuda_callable__( int sourceIdx, int targetIdx, const float& weight ) -> float
   {
      return weight;
   };

   auto store = [ = ] __cuda_callable__(
      int indexOfVertexIdx, int vertexIdx, int localIdx, int targetIdx, float minWeight, bool isolatedVertex ) mutable
   {
      compressedVertexMinWeights_view[ indexOfVertexIdx ] = minWeight;
      vertexMinWeights_view[ vertexIdx ] = minWeight;
      if( ! isolatedVertex ) {
         compressedVertexMinTargets_view[ indexOfVertexIdx ] = targetIdx;
         vertexMinTargets_view[ vertexIdx ] = targetIdx;
      }
   };

   int traversedVertexCount =
      TNL::Graphs::reduceVerticesWithArgumentIf( graph, 1, 4, condition, fetch, TNL::MinWithArg{}, store );

   /***
    * Print results.
    */
   std::cout << "Number of traversed vertices: " << traversedVertexCount << '\n';
   std::cout << "Minimum edge weight for vertices 1-3 with degree >= 2:" << vertexMinWeights << '\n';
   std::cout << "Target vertex for minimum edge:" << vertexMinTargets << '\n';
   std::cout << "Compressed minimum weights:" << compressedVertexMinWeights.getView( 0, traversedVertexCount ) << '\n';
   std::cout << "Compressed minimum targets:" << compressedVertexMinTargets.getView( 0, traversedVertexCount ) << '\n';
}

int
main( int argc, char* argv[] )
{
   std::cout << "Running on host:\n";
   reduceVerticesWithArgumentIfExample< TNL::Devices::Host >();

#ifdef __CUDACC__
   std::cout << "Running on CUDA device:\n";
   reduceVerticesWithArgumentIfExample< TNL::Devices::Cuda >();
#endif

   return EXIT_SUCCESS;
}

The output looks as follows:

Running on host:
Graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Number of traversed vertices: 2
Minimum edge weight for vertices 1-3 with degree >= 2:[ -1, 30, -1, 70, -1 ]
Target vertex for minimum edge:[ -1, 2, -1, 0, -1 ]
Compressed minimum weights:[ 30, 70 ]
Compressed minimum targets:[ 2, 0 ]
Running on CUDA device:
Graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Number of traversed vertices: 2
Minimum edge weight for vertices 1-3 with degree >= 2:[ -1, 30, -1, 70, -1 ]
Target vertex for minimum edge:[ -1, 2, -1, 0, -1 ]
Compressed minimum weights:[ 30, 70 ]
Compressed minimum targets:[ 2, 0 ]

Graph Views

Similar to matrix views, graph views (TNL::Graphs::GraphView) are lightweight reference objects that allow graphs to be passed to GPU kernels. A graph view can be obtained from a graph using the methods TNL::Graphs::Graph::getView or TNL::Graphs::Graph::getConstView.

Graph views are particularly useful when:

  • Passing graphs to custom GPU kernels
  • Working with graphs in lambda functions
  • Creating shallow copies for parallel algorithms
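As a minimal sketch of the lambda use case (assuming a graph view exposes the same vertex queries as the graph itself, e.g. getVertexDegree, and using TNL::Algorithms::parallelFor from <TNL/Algorithms/parallelFor.h>), the degrees of all vertices of a graph such as the one above could be computed as follows:

// Compute the degree of every vertex on the device through a graph view.
// The view is a shallow reference, so capturing it by value in the
// __cuda_callable__ lambda does not copy the underlying graph data.
auto graphView = graph.getConstView();
TNL::Containers::Vector< int, Device > degrees( graph.getVertexCount() );
auto degrees_view = degrees.getView();
TNL::Algorithms::parallelFor< Device >( 0, graph.getVertexCount(),
   [ = ] __cuda_callable__( int vertexIdx ) mutable
   {
      degrees_view[ vertexIdx ] = graphView.getVertexDegree( vertexIdx );
   } );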

Performance Considerations

Choosing the Right Matrix Format

For sparse graphs, different segment formats offer different performance characteristics. Any type of segments defined in TNL::Algorithms::Segments may be used.

For example, a graph with the Ellpack format can be defined as follows:

using EllpackGraph = TNL::Graphs::Graph< float, Device, int,
                                         TNL::Algorithms::Segments::Ellpack >;
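Analogously, other formats can be substituted; for instance, a sketch with SlicedEllpack segments (assuming the same template parameter order):

using SlicedEllpackGraph = TNL::Graphs::Graph< float, Device, int,
                                               TNL::Algorithms::Segments::SlicedEllpack >;

Ellpack pads the edge lists of all vertices to the maximum degree and therefore works best when vertex degrees are nearly uniform, while SlicedEllpack pads only within slices of vertices and usually tolerates varying degrees better.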

Further Reading