Introduction
Graphs are fundamental data structures used to represent relationships between objects. In TNL, the TNL::Graphs::Graph class provides a flexible and efficient implementation of graphs that can run on both CPUs and GPUs.
A graph in TNL consists of vertices (also called nodes) and edges connecting pairs of vertices. Edges can be:
- Directed: An edge from vertex A to vertex B is different from an edge from B to A
- Undirected: An edge between A and B can be traversed in both directions
- Weighted: Each edge has an associated weight (numerical value)
Internally, TNL represents graphs using an adjacency matrix, which stores information about which vertices are connected by edges and what weights those edges have.
Graph Orientation: Directed vs. Undirected Graphs
TNL supports both directed and undirected graphs through the type tags TNL::Graphs::DirectedGraph and TNL::Graphs::UndirectedGraph:
- Directed graphs: Each edge has a direction. An edge from vertex u to vertex v does not imply an edge from v to u.
- Undirected graphs: Edges have no direction. If there's an edge between vertices u and v, it can be traversed in both directions.
Sparse vs. Dense Adjacency Matrices
TNL graphs can use either sparse or dense adjacency matrices to store edge information. The choice significantly impacts memory usage and performance.
Sparse Adjacency Matrices And Sparse Graphs (Default)
By default, graphs use sparse matrix representations (typically CSR format). A sparse graph only stores information about edges that actually exist:
The memory usage is proportional to the number of edges in the graph. By default, the sparse adjacency matrix is stored in the CSR format, but any other segments format defined in TNL::Algorithms::Segments can be used instead. For unweighted graphs, using ValueType = bool avoids storing numerical edge weights and reduces memory overhead, as only the presence or absence of edges is represented.
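To make the O(V + E) claim concrete, here is a minimal CSR-style adjacency sketch in plain C++. This is an illustrative stand-in, not TNL code; the CsrGraph type and makeSample helper are hypothetical:

```cpp
#include <vector>

// Minimal CSR-style adjacency sketch: rowOffsets has V+1 entries, while
// targets and weights have one entry per stored edge, so memory is O(V + E).
struct CsrGraph
{
   std::vector< int > rowOffsets;   // size V+1
   std::vector< int > targets;      // size E
   std::vector< double > weights;   // size E

   // Edge lookup scans one row, i.e. it costs O(degree of u).
   bool hasEdge( int u, int v ) const
   {
      for( int i = rowOffsets[ u ]; i < rowOffsets[ u + 1 ]; i++ )
         if( targets[ i ] == v )
            return true;
      return false;
   }
};

// Hypothetical sample with edges (0,1), (0,2) and (2,0):
// vertex 0 owns entries [0,2), vertex 1 owns none, vertex 2 owns entry [2,3).
CsrGraph makeSample()
{
   return CsrGraph{ { 0, 2, 2, 3 }, { 1, 2, 0 }, { 1.0, 2.0, 3.0 } };
}
```

Only existing edges occupy storage, which is exactly why sparse graphs are the default for graphs with few edges per vertex.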
Dense Adjacency Matrices And Dense Graphs
Dense graphs store information about all possible edges between all vertex pairs:
Memory usage is proportional to the square of the number of vertices (O(V²)). This representation is best suited for complete or nearly complete graphs, where most vertex pairs are connected. Missing edges must be represented explicitly by the user (e.g., via a chosen sentinel value or encoding), since the matrix representation assumes that all vertex pairs are present, including diagonal entries representing self-loops. It is therefore the user’s responsibility to omit such edges when they are not desired.
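The trade-off can be illustrated with a minimal dense-adjacency sketch in plain C++. This is illustrative only, not TNL code; the DenseGraph type and the choice of infinity as the missing-edge sentinel are assumptions of this sketch:

```cpp
#include <limits>
#include <vector>

// Dense adjacency sketch: a V x V weight matrix stored row by row, so memory
// is O(V^2) and edge lookup is a single array access, i.e. O(1).
// Missing edges must be encoded explicitly; here infinity is the user-chosen
// sentinel value.
struct DenseGraph
{
   int vertexCount;
   std::vector< double > weights;   // row-major, V*V entries

   static constexpr double missing()
   {
      return std::numeric_limits< double >::infinity();
   }

   explicit DenseGraph( int v ) : vertexCount( v ), weights( v * v, missing() ) {}

   void setEdge( int u, int v, double w )
   {
      weights[ u * vertexCount + v ] = w;
   }

   bool hasEdge( int u, int v ) const
   {
      return weights[ u * vertexCount + v ] != missing();
   }
};
```

Every vertex pair, including the diagonal, has a slot whether or not the edge exists, which mirrors the responsibility noted above: the user must decide how absent edges are encoded.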
Comparison Table
| Aspect | Sparse Graph | Dense Graph |
| Memory | O(V + E) | O(V²) |
| Edge lookup | O(degree) | O(1) |
| Best for | Sparse connectivity | Dense connectivity |
In the following examples, we assume the following graph types:
- Sparse directed graph:
- Sparse undirected graph:
- Dense directed graph:
Constructing Graphs
TNL provides several ways to construct graphs.
Default Constructor
To create an empty graph, the default constructor can be used:
std::cout << "Example 1: Default constructor\n";
GraphType graph1;
std::cout << "Empty graph - vertices: " << graph1.getVertexCount() << ", edges: " << graph1.getEdgeCount() << "\n\n";
Constructor with Vertex Count
Constructor with vertex count creates a graph with specified number of vertices but no edges:
std::cout << "Example 2: Constructor with vertex count\n";
GraphType graph2( 5 );
std::cout << "Graph with 5 vertices - vertices: " << graph2.getVertexCount() << ", edges: " << graph2.getEdgeCount() << "\n\n";
Constructor with Initializer List for Sparse and Dense Graphs
Edges in both sparse and dense graphs can be specified as (sourceIdx, targetIdx, weight) tuples in an initializer list, together with the number of vertices:
std::cout << "Example 3a: Constructor with initializer list (sparse graph)\n";
GraphType graph3a( 5,
                   { { 0, 1, 10.0 }, { 0, 2, 20.0 },
                     { 1, 2, 30.0 }, { 1, 3, 40.0 },
                     { 2, 3, 50.0 },
                     { 3, 0, 60.0 }, { 3, 4, 70.0 } } );
std::cout << "Sparse graph:\n" << graph3a << '\n';
In a sparse graph, unspecified edges are considered missing. In a dense graph, unspecified edges are represented by zero weights.
Constructor with Initializer List for Sparse Undirected Graphs
Edges of sparse undirected graphs can be specified in the same way, but each edge needs to be listed in only one direction:
std::cout << "Example 3b: Undirected graph constructor\n";
UndirectedGraphType graph3b( 4,
                             { { 0, 1, 1.0 }, { 0, 2, 2.0 },
                               { 1, 2, 3.0 },
                               { 2, 3, 4.0 } } );  // last edge reconstructed from the output below
std::cout << "Undirected graph:\n" << graph3b << '\n';
The last parameter specifies the encoding of the edges (see also TNL::Matrices::MatrixElementsEncoding):
- Complete – all edges must be specified explicitly, and symmetry is required.
- SymmetricLower – only the lower triangular part of the adjacency matrix is specified, i.e. edges (u, v) with u > v.
- SymmetricUpper – only the upper triangular part of the adjacency matrix is specified, i.e. edges (u, v) with u < v.
- SymmetricMixed – elements from both the lower and upper triangular parts of the adjacency matrix may be specified; the corresponding symmetric elements are filled automatically. This is the default setting for undirected graphs.
By default, undirected graphs are represented by a symmetric sparse matrix, i.e. a matrix in which only the lower triangular part is stored. This behavior can be changed by explicitly selecting a general adjacency matrix type (see also TNL::Matrices::SparseMatrix).
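The effect of the SymmetricMixed encoding can be sketched in plain C++ (illustrative only, not TNL's implementation; toLowerTriangle is a hypothetical helper): edges given in either orientation end up stored once, in the lower triangle, as a symmetric sparse matrix would keep them:

```cpp
#include <map>
#include <utility>

// Edge keys are (u, v) pairs; weights are the mapped values.
using EdgeMap = std::map< std::pair< int, int >, double >;

// Normalize mixed entries to the lower triangle (u > v). Entries given as
// (u, v) with u < v are flipped; entries already in the lower triangle are
// kept as-is.
EdgeMap toLowerTriangle( const EdgeMap& mixed )
{
   EdgeMap lower;
   for( const auto& [ edge, weight ] : mixed ) {
      auto [ u, v ] = edge;
      if( u < v )
         std::swap( u, v );
      lower[ { u, v } ] = weight;
   }
   return lower;
}
```

This is why only one direction per edge needs to be listed: the symmetric counterpart is implied by the storage scheme.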
Constructor with Initializer List for Dense Graphs
Edges of dense graphs can be specified by providing all edge weights in a two-dimensional structure:
std::cout << "\nExample 3c: Constructor with initializer list (dense adjacency matrix)\n";
DenseGraphType graph3c( { {  0.0, 10.0, 20.0,  0.0,  0.0 },
                          {  0.0,  0.0, 30.0, 40.0,  0.0 },
                          {  0.0,  0.0,  0.0, 50.0,  0.0 },
                          { 60.0,  0.0,  0.0,  0.0, 70.0 },
                          {  0.0,  0.0,  0.0,  0.0,  0.0 } } );
std::cout << "Dense graph (complete graph with weighted edges):\n" << graph3c << '\n';
Constructor with std::map
This constructor builds graphs from a map of edge pairs to weights:
std::cout << "\nExample 4: Constructor with std::map\n";
std::map< std::pair< int, int >, float > edgeMap;  // declaration elided in the source; reconstructed here
edgeMap[ { 0, 1 } ] = 1.5;
edgeMap[ { 0, 2 } ] = 2.5;
edgeMap[ { 1, 2 } ] = 3.5;
edgeMap[ { 2, 3 } ] = 4.5;
GraphType graph4( 4, edgeMap );
std::cout << "Graph from map:\n" << graph4 << '\n';
If the graph is dense, unspecified edges are assigned zero weight. If the graph is undirected, the SymmetricMixed encoding of adjacency matrix elements is required. The encoding can be changed using the third constructor parameter, in the same way as for the constructor that takes an initializer list.
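A small plain-C++ sketch of why a std::map is convenient here (illustrative, not TNL code; buildEdges is a hypothetical helper): the (source, target) pair is the key, so repeated assignments to the same edge overwrite the weight instead of creating a duplicate edge:

```cpp
#include <map>
#include <utility>

// Edge keys are (source, target) pairs; weights are the mapped values.
using EdgeMap = std::map< std::pair< int, int >, double >;

EdgeMap buildEdges()
{
   EdgeMap edges;
   edges[ { 0, 1 } ] = 1.5;
   edges[ { 0, 1 } ] = 2.0;   // overwrites the previous weight, no duplicate edge
   edges[ { 1, 2 } ] = 3.5;
   return edges;
}
```

The map therefore always describes a well-formed edge set, which the graph constructor can consume directly.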
Copy Constructor
The copy constructor creates a copy of another graph:
std::cout << "Example 5: Copy constructor\n";
GraphType graph5( graph3a );
std::cout << "Copied graph - vertices: " << graph5.getVertexCount() << ", edges: " << graph5.getEdgeCount() << "\n\n";
Constructor from Adjacency Matrix
This constructor creates a graph from an existing adjacency matrix:
std::cout << "Example 6: Constructor from adjacency matrix\n";
using MatrixType = typename GraphType::AdjacencyMatrixType;
MatrixType matrix( 3, 3 );
matrix.setElements( { { 0, 1, 1.0 }, { 0, 2, 2.0 },
                      { 1, 2, 3.0 },
                      { 2, 0, 4.0 } } );
GraphType graph6( matrix );
std::cout << "Graph from adjacency matrix:\n" << graph6 << '\n';
The whole example reads as follows:
#include <iostream>
#include <map>
#include <TNL/Graphs/Graph.h>
#include <TNL/Matrices/DenseMatrix.h>
#include <TNL/Devices/Host.h>
#include <TNL/Devices/Cuda.h>
#include <TNL/Devices/Hip.h>
#include <TNL/Containers/Vector.h>

template< typename Device >
void
constructorsExample()
{
   // The definitions of GraphType, UndirectedGraphType and DenseGraphType
   // are elided in the source of this page.

   std::cout << "Example 1: Default constructor\n";
   GraphType graph1;
   std::cout << "Empty graph - vertices: " << graph1.getVertexCount() << ", edges: " << graph1.getEdgeCount() << "\n\n";

   std::cout << "Example 2: Constructor with vertex count\n";
   GraphType graph2( 5 );
   std::cout << "Graph with 5 vertices - vertices: " << graph2.getVertexCount() << ", edges: " << graph2.getEdgeCount() << "\n\n";

   std::cout << "Example 3a: Constructor with initializer list (sparse graph)\n";
   GraphType graph3a( 5,
                      { { 0, 1, 10.0 }, { 0, 2, 20.0 },
                        { 1, 2, 30.0 }, { 1, 3, 40.0 },
                        { 2, 3, 50.0 },
                        { 3, 0, 60.0 }, { 3, 4, 70.0 } } );
   std::cout << "Sparse graph:\n" << graph3a << '\n';

   std::cout << "Example 3b: Undirected graph constructor\n";
   UndirectedGraphType graph3b( 4,
                                { { 0, 1, 1.0 }, { 0, 2, 2.0 },
                                  { 1, 2, 3.0 },
                                  { 2, 3, 4.0 } } );  // last edge reconstructed from the output below
   std::cout << "Undirected graph:\n" << graph3b << '\n';

   std::cout << "\nExample 3c: Constructor with initializer list (dense adjacency matrix)\n";
   DenseGraphType graph3c( { {  0.0, 10.0, 20.0,  0.0,  0.0 },
                             {  0.0,  0.0, 30.0, 40.0,  0.0 },
                             {  0.0,  0.0,  0.0, 50.0,  0.0 },
                             { 60.0,  0.0,  0.0,  0.0, 70.0 },
                             {  0.0,  0.0,  0.0,  0.0,  0.0 } } );
   std::cout << "Dense graph (complete graph with weighted edges):\n" << graph3c << '\n';

   std::cout << "\nExample 4: Constructor with std::map\n";
   std::map< std::pair< int, int >, float > edgeMap;  // declaration elided in the source; reconstructed here
   edgeMap[ { 0, 1 } ] = 1.5;
   edgeMap[ { 0, 2 } ] = 2.5;
   edgeMap[ { 1, 2 } ] = 3.5;
   edgeMap[ { 2, 3 } ] = 4.5;
   GraphType graph4( 4, edgeMap );
   std::cout << "Graph from map:\n" << graph4 << '\n';

   std::cout << "Example 5: Copy constructor\n";
   GraphType graph5( graph3a );
   std::cout << "Copied graph - vertices: " << graph5.getVertexCount() << ", edges: " << graph5.getEdgeCount() << "\n\n";

   std::cout << "Example 6: Constructor from adjacency matrix\n";
   using MatrixType = typename GraphType::AdjacencyMatrixType;
   MatrixType matrix( 3, 3 );
   matrix.setElements( { { 0, 1, 1.0 }, { 0, 2, 2.0 },
                         { 1, 2, 3.0 },
                         { 2, 0, 4.0 } } );
   GraphType graph6( matrix );
   std::cout << "Graph from adjacency matrix:\n" << graph6 << '\n';
}

int
main( int argc, char* argv[] )
{
   std::cout << "Running on host:\n";
   constructorsExample< TNL::Devices::Host >();

#ifdef __CUDACC__
   std::cout << "Running on CUDA device:\n";
   constructorsExample< TNL::Devices::Cuda >();
#endif

#ifdef __HIP__
   constructorsExample< TNL::Devices::Hip >();
#endif

   return EXIT_SUCCESS;
}
And the output looks as follows:
Running on host:
Example 1: Default constructor
Empty graph - vertices: 0, edges: 0
Example 2: Constructor with vertex count
Graph with 5 vertices - vertices: 5, edges: 0
Example 3a: Constructor with initializer list (sparse graph)
Sparse graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40
Row: 2 -> 3:50
Row: 3 -> 0:60 4:70
Row: 4 ->
Example 3b: Undirected graph constructor
Undirected graph:
Row: 0 -> 1:1 2:2
Row: 1 -> 0:1 2:3
Row: 2 -> 0:2 1:3 3:4
Row: 3 -> 2:4
Example 3c: Constructor with initializer list (dense adjacency matrix)
Dense graph (complete graph with weighted edges):
Row: 0 -> 0:0 1:10 2:20 3:0 4:0
Row: 1 -> 0:0 1:0 2:30 3:40 4:0
Row: 2 -> 0:0 1:0 2:0 3:50 4:0
Row: 3 -> 0:60 1:0 2:0 3:0 4:70
Row: 4 -> 0:0 1:0 2:0 3:0 4:0
Example 4: Constructor with std::map
Graph from map:
Row: 0 -> 1:1.5 2:2.5
Row: 1 -> 2:3.5
Row: 2 -> 3:4.5
Row: 3 ->
Example 5: Copy constructor
Copied graph - vertices: 5, edges: 7
Example 6: Constructor from adjacency matrix
Graph from adjacency matrix:
Row: 0 -> 1:1 2:2
Row: 1 -> 2:3
Row: 2 -> 0:4
Running on CUDA device:
Example 1: Default constructor
Empty graph - vertices: 0, edges: 0
Example 2: Constructor with vertex count
Graph with 5 vertices - vertices: 5, edges: 0
Example 3a: Constructor with initializer list (sparse graph)
Sparse graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40
Row: 2 -> 3:50
Row: 3 -> 0:60 4:70
Row: 4 ->
Example 3b: Undirected graph constructor
Undirected graph:
Row: 0 -> 1:1 2:2
Row: 1 -> 0:1 2:3
Row: 2 -> 0:2 1:3 3:4
Row: 3 -> 2:4
Example 3c: Constructor with initializer list (dense adjacency matrix)
Dense graph (complete graph with weighted edges):
Row: 0 -> 0:0 1:10 2:20 3:0 4:0
Row: 1 -> 0:0 1:0 2:30 3:40 4:0
Row: 2 -> 0:0 1:0 2:0 3:50 4:0
Row: 3 -> 0:60 1:0 2:0 3:0 4:70
Row: 4 -> 0:0 1:0 2:0 3:0 4:0
Example 4: Constructor with std::map
Graph from map:
Row: 0 -> 1:1.5 2:2.5
Row: 1 -> 2:3.5
Row: 2 -> 3:4.5
Row: 3 ->
Example 5: Copy constructor
Copied graph - vertices: 5, edges: 7
Example 6: Constructor from adjacency matrix
Graph from adjacency matrix:
Row: 0 -> 1:1 2:2
Row: 1 -> 2:3
Row: 2 -> 0:4
Setting Edges
After creating a graph, you can set or modify its edges using the setEdges and setDenseEdges methods. These methods behave similarly to the corresponding constructors.
Setting Edges with Initializer List (Sparse)
For sparse and dense graphs:
std::cout << "Example 1: setEdges with initializer list (sparse graph)\n";
GraphType graph1;
graph1.setVertexCount( 5 );
graph1.setEdges( { { 0, 1, 10.0 }, { 0, 2, 20.0 },
                   { 1, 2, 30.0 }, { 1, 3, 40.0 }, { 1, 4, 50.0 },
                   { 2, 3, 60.0 },
                   { 3, 0, 70.0 }, { 3, 4, 80.0 } } );
std::cout << "Sparse graph after setEdges:\n" << graph1 << '\n';
The method TNL::Graphs::Graph::setEdges also accepts a matrix elements encoding (see TNL::Matrices::MatrixElementsEncoding) as the second parameter, following the initializer list. This is particularly useful when setting undirected graphs.
Setting Edges with Initializer List (Dense)
For dense graphs, the method TNL::Graphs::Graph::setDenseEdges sets the graph edges based on the complete adjacency matrix:
std::cout << "\nExample 1b: setEdges with initializer list (dense adjacency matrix)\n";
// The definition of DenseGraphType is elided in the source.
DenseGraphType graph1b;
graph1b.setVertexCount( 5 );
graph1b.setDenseEdges( { {  0.0, 10.0, 20.0,  0.0,  0.0 },
                         {  0.0,  0.0, 30.0, 40.0, 50.0 },
                         {  0.0,  0.0,  0.0, 60.0,  0.0 },
                         { 70.0,  0.0,  0.0,  0.0, 80.0 },
                         {  0.0,  0.0,  0.0,  0.0,  0.0 } } );
std::cout << "Dense graph after setDenseEdges:\n" << graph1b << '\n';
Setting Edges with std::map
The graph edges can also be set using a std::map:
std::cout << "Example 2: setEdges with std::map\n";
GraphType graph2;
graph2.setVertexCount( 4 );
std::map< std::pair< int, int >, float > edgeMap;  // declaration elided in the source; reconstructed here
edgeMap[ { 0, 1 } ] = 1.5;
edgeMap[ { 0, 2 } ] = 2.5;
edgeMap[ { 1, 2 } ] = 3.5;
edgeMap[ { 1, 3 } ] = 4.5;
edgeMap[ { 2, 3 } ] = 5.5;
edgeMap[ { 3, 0 } ] = 6.5;
graph2.setEdges( edgeMap );
std::cout << "Graph from map:\n" << graph2 << '\n';
Graph Traversal
TNL provides powerful parallel traversal capabilities through the functions TNL::Graphs::forVertices, TNL::Graphs::forEdges, and related utilities. A graph can be traversed either by vertices or by edges. Vertex-based traversal is performed using a vertex view.
The following provides an overview of graph traversal functions:
- TNL distinguishes between vertex-based and edge-based traversal. Vertex-based traversal assigns at most one thread to each vertex, while edge-based traversal enables finer-grained parallelism by distributing work across edges.
| Category | Operates On | Lambda Parameter | Use Case |
| Edge-wise (forEdges, forAllEdges) | Individual edges | Edge indices & weights | Operate on each edge separately |
| Vertex-wise (forVertices, forAllVertices) | Individual vertices | VertexView object | Operate on vertices |
- The scope of vertex traversal can be defined in several ways: all vertices, a vertex range, an explicit list of vertex indices, or a condition on the vertex index.
| Scope | Vertices Processed | Parameters |
| All | All vertices | No range/array parameters |
| Range | Vertices in [begin, end) | begin and end indices |
| Array | Specific vertices | Array of vertex indices |
| If | Vertices satisfying a condition | Condition lambda on the vertex index |
- All functions can be called for both constant and non-constant graphs.
| Category | Graph Modifiable? | Use Case |
| Non-const | Yes | Can modify graph edges and structure |
| Const | No | Read-only access to graph vertices and edges |
See also Overview of Graph Traversal Functions for more details.
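The vertex-wise vs. edge-wise distinction can be illustrated with a serial plain-C++ stand-in for the parallel TNL functions (the Csr type and both functions are hypothetical sketches): the same reduction is driven either per vertex, with cost proportional to each vertex's degree, or per edge, with uniform cost per iteration:

```cpp
#include <numeric>
#include <vector>

// Minimal CSR-like edge storage: rowOffsets has V+1 entries, weights has one
// entry per stored edge.
struct Csr
{
   std::vector< int > rowOffsets;
   std::vector< double > weights;
};

// "One thread per vertex": each outer iteration does as much work as the
// vertex's degree, so skewed degree distributions lead to load imbalance.
double sumByVertices( const Csr& g )
{
   double sum = 0;
   const int vertexCount = static_cast< int >( g.rowOffsets.size() ) - 1;
   for( int u = 0; u < vertexCount; u++ )
      for( int i = g.rowOffsets[ u ]; i < g.rowOffsets[ u + 1 ]; i++ )
         sum += g.weights[ i ];
   return sum;
}

// "One thread per edge": every iteration costs the same, so work is balanced
// even when a few vertices have very high degrees.
double sumByEdges( const Csr& g )
{
   return std::accumulate( g.weights.begin(), g.weights.end(), 0.0 );
}
```

Both drivers compute the same result; the difference is only in how the work is distributed, which is exactly what separates forVertices-style from forEdges-style traversal.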
Traversing By Vertices
A TNL::Graphs::GraphVertexView provides access to a single vertex and its outgoing edges. Vertex views are the primary way to interact with graph structure in parallel GPU code.
Traversing All Vertices
The function TNL::Graphs::forAllVertices iterates over all vertices in parallel:
auto processVertex = [] __cuda_callable__ ( typename GraphType::VertexView vertex ) mutable
{
   for( int i = 0; i < vertex.getDegree(); i++ )
      vertex.setEdge( i, ( vertex.getTargetIndex( i ) + 1 ) % 5, vertex.getEdgeWeight( i ) + 5 );
};
If the function is called with a constant graph, typename GraphType::VertexView needs to be replaced with typename GraphType::ConstVertexView.
The whole example reads as:
#include <iostream>
#include <TNL/Graphs/Graph.h>
#include <TNL/Graphs/traverse.h>
#include <TNL/Devices/Host.h>
#include <TNL/Devices/Cuda.h>

template< typename Device >
void
forAllVerticesExample()
{
   // The definition of GraphType is elided in the source of this page.

   GraphType graph( 5,
                    { { 0, 1, 10.0 }, { 0, 2, 20.0 },
                      { 1, 2, 30.0 }, { 1, 3, 40.0 }, { 1, 4, 50.0 },
                      { 2, 3, 60.0 },
                      { 3, 0, 70.0 }, { 3, 4, 80.0 } } );
   std::cout << "Graph:\n" << graph << '\n';

   auto processVertex = [] __cuda_callable__ ( typename GraphType::VertexView vertex ) mutable
   {
      for( int i = 0; i < vertex.getDegree(); i++ )
         vertex.setEdge( i, ( vertex.getTargetIndex( i ) + 1 ) % 5, vertex.getEdgeWeight( i ) + 5 );
   };
   TNL::Graphs::forAllVertices( graph, processVertex );

   std::cout << "Modified graph:\n" << graph << '\n';
}

int
main( int argc, char* argv[] )
{
   std::cout << "Running on host:\n";
   forAllVerticesExample< TNL::Devices::Host >();

#ifdef __CUDACC__
   std::cout << "Running on CUDA device:\n";
   forAllVerticesExample< TNL::Devices::Cuda >();
#endif

   return EXIT_SUCCESS;
}
and the output reads as:
Running on host:
Graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Modified graph:
Row: 0 -> 2:15 3:25
Row: 1 -> 3:35 4:45 0:55
Row: 2 -> 4:65
Row: 3 -> 1:75 0:85
Row: 4 ->
Running on CUDA device:
Graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Modified graph:
Row: 0 -> 2:15 3:25
Row: 1 -> 3:35 4:45 0:55
Row: 2 -> 4:65
Row: 3 -> 1:75 0:85
Row: 4 ->
Traversing a Range of Vertices
Traversing vertices in a specific range [begin, end) can be done using the function TNL::Graphs::forVertices:
auto processVertex = [] __cuda_callable__ ( typename GraphType::VertexView vertex ) mutable
{
   for( int i = 0; i < vertex.getDegree(); i++ )
      vertex.setEdge( i, ( vertex.getTargetIndex( i ) + 1 ) % 5, vertex.getEdgeWeight( i ) + 5 );
};
If the function is called with a constant graph, typename GraphType::VertexView needs to be replaced with typename GraphType::ConstVertexView.
The whole example reads as:
#include <iostream>
#include <TNL/Graphs/Graph.h>
#include <TNL/Graphs/traverse.h>
#include <TNL/Devices/Host.h>
#include <TNL/Devices/Cuda.h>

template< typename Device >
void
forVerticesExample()
{
   // The definition of GraphType is elided in the source of this page.

   GraphType graph( 5,
                    { { 0, 1, 10.0 }, { 0, 2, 20.0 },
                      { 1, 2, 30.0 }, { 1, 3, 40.0 }, { 1, 4, 50.0 },
                      { 2, 3, 60.0 },
                      { 3, 0, 70.0 }, { 3, 4, 80.0 } } );
   std::cout << "Graph:\n" << graph << '\n';

   auto processVertex = [] __cuda_callable__ ( typename GraphType::VertexView vertex ) mutable
   {
      for( int i = 0; i < vertex.getDegree(); i++ )
         vertex.setEdge( i, ( vertex.getTargetIndex( i ) + 1 ) % 5, vertex.getEdgeWeight( i ) + 5 );
   };
   // The exact range used in the source is elided; the output below shows
   // that vertex 0 is skipped.
   TNL::Graphs::forVertices( graph, 1, 4, processVertex );

   std::cout << "Modified graph:\n" << graph << '\n';
}

int
main( int argc, char* argv[] )
{
   std::cout << "Running on host:\n";
   forVerticesExample< TNL::Devices::Host >();

#ifdef __CUDACC__
   std::cout << "Running on CUDA device:\n";
   forVerticesExample< TNL::Devices::Cuda >();
#endif

   return EXIT_SUCCESS;
}
and the output reads as:
Running on host:
Graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Modified graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 3:35 4:45 0:55
Row: 2 -> 4:65
Row: 3 -> 1:75 0:85
Row: 4 ->
Running on CUDA device:
Graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Modified graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 3:35 4:45 0:55
Row: 2 -> 4:65
Row: 3 -> 1:75 0:85
Row: 4 ->
Traversing Vertices with Given Indices
The function TNL::Graphs::forVertices also allows traversing vertices with explicitly specified indices, supplied as an array. The vertices with these indices can be traversed as follows:
auto processVertex = [] __cuda_callable__ ( typename GraphType::VertexView vertex ) mutable
{
   for( int i = 0; i < vertex.getDegree(); i++ )
      vertex.setEdge( i, ( vertex.getTargetIndex( i ) + 1 ) % 5, vertex.getEdgeWeight( i ) + 5 );
};
If the function is called with a constant graph, typename GraphType::VertexView needs to be replaced with typename GraphType::ConstVertexView.
The whole example reads as:
#include <iostream>
#include <TNL/Graphs/Graph.h>
#include <TNL/Graphs/traverse.h>
#include <TNL/Devices/Host.h>
#include <TNL/Devices/Cuda.h>
#include <TNL/Containers/Vector.h>

template< typename Device >
void
forVerticesWithIndexesExample()
{
   // The definition of GraphType is elided in the source of this page.

   GraphType graph( 5,
                    { { 0, 1, 10.0 }, { 0, 2, 20.0 },
                      { 1, 2, 30.0 }, { 1, 3, 40.0 }, { 1, 4, 50.0 },
                      { 2, 3, 60.0 },
                      { 3, 0, 70.0 }, { 3, 4, 80.0 } } );
   std::cout << "Graph:\n" << graph << '\n';

   // The definition of the index array is elided in the source; it is
   // reconstructed here from the output below, where vertices 0 and 2 are modified.
   TNL::Containers::Vector< int, Device > vertexIndexes{ 0, 2 };

   auto processVertex = [] __cuda_callable__ ( typename GraphType::VertexView vertex ) mutable
   {
      for( int i = 0; i < vertex.getDegree(); i++ )
         vertex.setEdge( i, ( vertex.getTargetIndex( i ) + 1 ) % 5, vertex.getEdgeWeight( i ) + 5 );
   };
   TNL::Graphs::forVertices( graph, vertexIndexes, processVertex );

   std::cout << "Modified graph:\n" << graph << '\n';
}

int
main( int argc, char* argv[] )
{
   std::cout << "Running on host:\n";
   forVerticesWithIndexesExample< TNL::Devices::Host >();

#ifdef __CUDACC__
   std::cout << "Running on CUDA device:\n";
   forVerticesWithIndexesExample< TNL::Devices::Cuda >();
#endif

   return EXIT_SUCCESS;
}
and the output reads as:
Running on host:
Graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Modified graph:
Row: 0 -> 2:15 3:25
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 4:65
Row: 3 -> 0:70 4:80
Row: 4 ->
Running on CUDA device:
Graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Modified graph:
Row: 0 -> 2:15 3:25
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 4:65
Row: 3 -> 0:70 4:80
Row: 4 ->
Traversing Vertices with a Condition
Use forAllVerticesIf to process only vertices that meet criteria specified by the lambda function condition:
// The capture and header of this lambda are elided in the source.
auto condition = [=] __cuda_callable__ ( int vertexIdx ) -> bool
{
   return graph.getVertexDegree( vertexIdx ) > 1;
};
The lambda function takes a vertex index and returns true for vertices that should be traversed; all other vertices are skipped. The function forAllVerticesIf is invoked as follows:
auto processVertex = [] __cuda_callable__ ( typename GraphType::VertexView vertex ) mutable
{
   for( int i = 0; i < vertex.getDegree(); i++ )
      vertex.setEdge( i, ( vertex.getTargetIndex( i ) + 1 ) % 5, vertex.getEdgeWeight( i ) + 5 );
};
There is also an alternative function, TNL::Graphs::forVerticesIf, which accepts a range of vertex indices [begin, end). Only vertices within this range are considered for traversal.
If the function is called with a constant graph, typename GraphType::VertexView needs to be replaced with typename GraphType::ConstVertexView.
The complete example reads as:
#include <iostream>
#include <TNL/Graphs/Graph.h>
#include <TNL/Graphs/traverse.h>
#include <TNL/Devices/Host.h>
#include <TNL/Devices/Cuda.h>

template< typename Device >
void
forAllVerticesIfExample()
{
   // The definition of GraphType is elided in the source of this page.

   GraphType graph( 5,
                    { { 0, 1, 10.0 }, { 0, 2, 20.0 },
                      { 1, 2, 30.0 }, { 1, 3, 40.0 }, { 1, 4, 50.0 },
                      { 2, 3, 60.0 },
                      { 3, 0, 70.0 }, { 3, 4, 80.0 } } );
   std::cout << "Graph:\n" << graph << '\n';

   // The capture and header of this lambda are elided in the source.
   auto condition = [=] __cuda_callable__ ( int vertexIdx ) -> bool
   {
      return graph.getVertexDegree( vertexIdx ) > 1;
   };

   auto processVertex = [] __cuda_callable__ ( typename GraphType::VertexView vertex ) mutable
   {
      for( int i = 0; i < vertex.getDegree(); i++ )
         vertex.setEdge( i, ( vertex.getTargetIndex( i ) + 1 ) % 5, vertex.getEdgeWeight( i ) + 5 );
   };
   TNL::Graphs::forAllVerticesIf( graph, condition, processVertex );

   std::cout << "Modified graph:\n" << graph << '\n';
}

int
main( int argc, char* argv[] )
{
   std::cout << "Running on host:\n";
   forAllVerticesIfExample< TNL::Devices::Host >();

#ifdef __CUDACC__
   std::cout << "Running on CUDA device:\n";
   forAllVerticesIfExample< TNL::Devices::Cuda >();
#endif

   return EXIT_SUCCESS;
}
The output looks as follows:
Running on host:
Graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Modified graph:
Row: 0 -> 2:15 3:25
Row: 1 -> 3:35 4:45 0:55
Row: 2 -> 3:60
Row: 3 -> 1:75 0:85
Row: 4 ->
Running on CUDA device:
Graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Modified graph:
Row: 0 -> 2:15 3:25
Row: 1 -> 3:35 4:45 0:55
Row: 2 -> 3:60
Row: 3 -> 1:75 0:85
Row: 4 ->
Traversing By Edges
Edges can also be accessed through vertex views, as shown above; however, in that mode each vertex is processed by at most one thread. For a higher degree of parallelism, the specialized edge-wise traversal functions must be used.
Traversing All Edges
The TNL::Graphs::forAllEdges function iterates over all edges in the graph:
auto modifyEdge = [] __cuda_callable__ ( int sourceIdx, int localIdx, int& targetIdx, float& weight ) mutable
{
   targetIdx = ( targetIdx + 1 ) % 5;
   weight += 5;
};
The localIdx index specifies the rank of an edge within the set of all edges incident to the vertex sourceIdx. Note that for sparse, non-constant graphs, both targetIdx and weight can be modified. For dense graphs, only weight can be modified, and the localIdx index is equal to targetIdx.
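As an aside, the relationship between sourceIdx, localIdx and targetIdx for a CSR-like layout can be sketched in plain C++ (illustrative only, not TNL code; CsrRow is a hypothetical type):

```cpp
#include <vector>

// localIdx is the position of an edge within the edge list of its source
// vertex; targetIdx is the global index of the vertex the edge points to.
struct CsrRow
{
   std::vector< int > rowOffsets;   // size V+1
   std::vector< int > targets;      // size E

   int targetOf( int sourceIdx, int localIdx ) const
   {
      return targets[ rowOffsets[ sourceIdx ] + localIdx ];
   }
};
```

So for a vertex of degree d, localIdx runs over 0..d-1 regardless of which vertices the edges actually point to.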
The whole example reads as follows:
#include <iostream>
#include <TNL/Graphs/Graph.h>
#include <TNL/Graphs/traverse.h>
#include <TNL/Devices/Host.h>
#include <TNL/Devices/Cuda.h>

template< typename Device >
void
forAllEdgesExample()
{
   // The definition of GraphType is elided in the source of this page.

   GraphType graph( 5,
                    { { 0, 1, 10.0 }, { 0, 2, 20.0 },
                      { 1, 2, 30.0 }, { 1, 3, 40.0 }, { 1, 4, 50.0 },
                      { 2, 3, 60.0 },
                      { 3, 0, 70.0 }, { 3, 4, 80.0 } } );
   std::cout << "Graph:\n" << graph << '\n';

   auto modifyEdge = [] __cuda_callable__ ( int sourceIdx, int localIdx, int& targetIdx, float& weight ) mutable
   {
      targetIdx = ( targetIdx + 1 ) % 5;
      weight += 5;
   };
   TNL::Graphs::forAllEdges( graph, modifyEdge );

   std::cout << "Modified graph:\n" << graph << '\n';
}

int
main( int argc, char* argv[] )
{
   std::cout << "Running on host:\n";
   forAllEdgesExample< TNL::Devices::Host >();

#ifdef __CUDACC__
   std::cout << "Running on CUDA device:\n";
   forAllEdgesExample< TNL::Devices::Cuda >();
#endif

   return EXIT_SUCCESS;
}
The output looks as follows:
Running on host:
Graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Modified graph:
Row: 0 -> 2:15 3:25
Row: 1 -> 3:35 4:45 0:55
Row: 2 -> 4:65
Row: 3 -> 1:75 0:85
Row: 4 ->
Running on CUDA device:
Graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Modified graph:
Row: 0 -> 2:15 3:25
Row: 1 -> 3:35 4:45 0:55
Row: 2 -> 4:65
Row: 3 -> 1:75 0:85
Row: 4 ->
Traversing Edges in a Range
Traversing only edges connected to vertices in a specific range [begin, end) can be done as follows:
auto modifyEdge = [] __cuda_callable__ ( int sourceIdx, int localIdx, int& targetIdx, float& weight ) mutable
{
   targetIdx = ( targetIdx + 1 ) % 5;
   weight += 5;
};
The whole example reads as follows:
#include <iostream>
#include <TNL/Graphs/Graph.h>
#include <TNL/Graphs/traverse.h>
#include <TNL/Devices/Host.h>
#include <TNL/Devices/Cuda.h>

template< typename Device >
void
forAllEdgesExample()
{
   // The definition of GraphType is elided in the source of this page.

   GraphType graph( 5,
                    { { 0, 1, 10.0 }, { 0, 2, 20.0 },
                      { 1, 2, 30.0 }, { 1, 3, 40.0 }, { 1, 4, 50.0 },
                      { 2, 3, 60.0 },
                      { 3, 0, 70.0 }, { 3, 4, 80.0 } } );
   std::cout << "Graph:\n" << graph << '\n';

   auto modifyEdge = [] __cuda_callable__ ( int sourceIdx, int localIdx, int& targetIdx, float& weight ) mutable
   {
      targetIdx = ( targetIdx + 1 ) % 5;
      weight += 5;
   };
   // The exact range used in the source is elided; the output below shows
   // that the edges of vertex 0 are skipped.
   TNL::Graphs::forEdges( graph, 1, 4, modifyEdge );

   std::cout << "Modified graph:\n" << graph << '\n';
}

int
main( int argc, char* argv[] )
{
   std::cout << "Running on host:\n";
   forAllEdgesExample< TNL::Devices::Host >();

#ifdef __CUDACC__
   std::cout << "Running on CUDA device:\n";
   forAllEdgesExample< TNL::Devices::Cuda >();
#endif

   return EXIT_SUCCESS;
}
The output looks as follows:
Running on host:
Graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Modified graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 3:35 4:45 0:55
Row: 2 -> 4:65
Row: 3 -> 1:75 0:85
Row: 4 ->
Running on CUDA device:
Graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Modified graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 3:35 4:45 0:55
Row: 2 -> 4:65
Row: 3 -> 1:75 0:85
Row: 4 ->
Traversing Edges for Specific Vertices
The function TNL::Graphs::forEdges can traverse only edges of vertices with explicitly specified indices, similarly to TNL::Graphs::forVertices.
The vertex indices are supplied as an array, and the traversal itself is performed as follows:
auto modifyEdge = [] __cuda_callable__ ( int sourceIdx, int localIdx, int targetIdx, float weight ) mutable
{
   targetIdx = ( targetIdx + 1 ) % 5;
   weight += 5;
};
Note that targetIdx and weight are taken by value here, so the assignments have no effect on the graph; the output below therefore shows an unchanged graph.
The whole example reads as follows:
#include <iostream>
#include <TNL/Graphs/Graph.h>
#include <TNL/Graphs/traverse.h>
#include <TNL/Devices/Host.h>
#include <TNL/Devices/Cuda.h>
#include <TNL/Containers/Vector.h>

template< typename Device >
void
forEdgesWithIndexesExample()
{
   // The definition of GraphType is elided in the source of this page.

   GraphType graph( 5,
                    { { 0, 1, 10.0 }, { 0, 2, 20.0 },
                      { 1, 2, 30.0 }, { 1, 3, 40.0 }, { 1, 4, 50.0 },
                      { 2, 3, 60.0 },
                      { 3, 0, 70.0 }, { 3, 4, 80.0 } } );
   std::cout << "Graph:\n" << graph << '\n';

   // The definition of the index array is elided in the source.
   TNL::Containers::Vector< int, Device > vertexIndexes;

   auto modifyEdge = [] __cuda_callable__ ( int sourceIdx, int localIdx, int targetIdx, float weight ) mutable
   {
      targetIdx = ( targetIdx + 1 ) % 5;
      weight += 5;
   };
   // Call reconstructed; targetIdx and weight are taken by value, so the
   // graph stays unchanged in the output below.
   TNL::Graphs::forEdges( graph, vertexIndexes, modifyEdge );

   std::cout << "Modified graph:\n" << graph << '\n';
}

int
main( int argc, char* argv[] )
{
   std::cout << "Running on host:\n";
   forEdgesWithIndexesExample< TNL::Devices::Host >();

#ifdef __CUDACC__
   std::cout << "Running on CUDA device:\n";
   forEdgesWithIndexesExample< TNL::Devices::Cuda >();
#endif

   return EXIT_SUCCESS;
}
The output looks as follows:
Running on host:
Graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Modified graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Running on CUDA device:
Graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Modified graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Conditional Edge Traversal
For conditional traversal of edges, the function TNL::Graphs::forAllEdgesIf can be used. The condition is expressed by a lambda function condition:
auto condition = [] __cuda_callable__( int vertexIdx ) -> bool
{
   return vertexIdx % 2 == 0;
};
The lambda function takes a vertex index and returns true if the edges incident to that vertex should be traversed; otherwise, it returns false. The traversal is executed as follows:
auto modifyEdge = [] __cuda_callable__( int sourceIdx, int localIdx, int targetIdx, float weight ) mutable
{
   targetIdx = ( targetIdx + 1 ) % 5;
   weight += 5;
};
The function TNL::Graphs::forEdgesIf behaves in the same way and additionally accepts a range of vertex indices to be traversed.
The whole example reads as follows:
#include <iostream>
#include <TNL/Graphs/Graph.h>
#include <TNL/Graphs/traverse.h>
#include <TNL/Devices/Host.h>
#include <TNL/Devices/Cuda.h>

template< typename Device >
void
forAllEdgesIfExample()
{
   // Graph type alias (the exact template parameters are assumed here).
   using GraphType = TNL::Graphs::Graph< float, Device >;

   GraphType graph( 5,
      {
         { 0, 1, 10.0 }, { 0, 2, 20.0 },
         { 1, 2, 30.0 }, { 1, 3, 40.0 }, { 1, 4, 50.0 },
         { 2, 3, 60.0 },
         { 3, 0, 70.0 }, { 3, 4, 80.0 } } );

   std::cout << "Graph:\n" << graph << '\n';

   // Only edges of vertices satisfying the condition are traversed.
   auto condition = [] __cuda_callable__( int vertexIdx ) -> bool
   {
      return vertexIdx % 2 == 0;
   };

   // The edge parameters are taken by value here, so the graph itself
   // is not changed by the traversal.
   auto modifyEdge = [] __cuda_callable__( int sourceIdx, int localIdx, int targetIdx, float weight ) mutable
   {
      targetIdx = ( targetIdx + 1 ) % 5;
      weight += 5;
   };

   TNL::Graphs::forAllEdgesIf( graph, condition, modifyEdge );

   std::cout << "Modified graph:\n" << graph << '\n';
}

int
main( int argc, char* argv[] )
{
   std::cout << "Running on host:\n";
   forAllEdgesIfExample< TNL::Devices::Host >();

#ifdef __CUDACC__
   std::cout << "Running on CUDA device:\n";
   forAllEdgesIfExample< TNL::Devices::Cuda >();
#endif

   return EXIT_SUCCESS;
}
The output looks as follows:
Running on host:
Graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Modified graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Running on CUDA device:
Graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Modified graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Const vs. Non-Const Edge Traversal
For constant graphs, edge data is read-only: the traversal lambdas receive the edge weight by value or by const reference and cannot modify the graph.
Reductions on Graphs
Graph reductions allow efficient computation of aggregate values over edges or vertices. Such computations can also be implemented using vertex traversal functions (TNL::Graphs::forAllVertices, TNL::Graphs::forVertices, TNL::Graphs::forAllVerticesIf, TNL::Graphs::forVerticesIf). However, to achieve a higher level of parallelism, TNL provides specialized reduction operations for graph data structures.
The following provides an overview of graph reduction functions:
- Reductions can be performed either in a basic mode or with arguments, which additionally return the position of the edge being searched for.

| Category | Tracks Position? | Use Case |
| --- | --- | --- |
| Basic | No | Only the reduced weight is needed (e.g., vertex sum, vertex max) |
| WithArgument | Yes | Both the weight and the target vertex index are needed (e.g., max weight and where it occurs) |

- The scope of reduction can be defined in several ways: all vertices, a vertex range, an explicit list of vertex indices, or a condition on the vertex index.

| Scope | Vertices Processed | Parameters |
| --- | --- | --- |
| All | All vertices | No range/array parameters |
| Range | Vertices [begin, end) | begin and end indices |
| Array | Specific vertices | Array of vertex indices |
| If | Vertices filtered by a condition | Condition on vertex-level properties |

- All functions can be called for both constant and non-constant graphs.

| Category | Graph Modifiable? | Use Case |
| --- | --- | --- |
| Non-const | Yes | Can modify graph edges during the reduction |
| Const | No | Read-only access to graph edges |
See also Overview of Graph Reduction Functions for more details.
Reducing Over Edges of a Vertex
The function TNL::Graphs::reduceVertices computes a reduction over edges adjacent to the specified vertices. First, we define a vector vertexMaxWeights (together with the corresponding vector view vertexMaxWeights_view ) to store the results of the reduction:
TNL::Containers::Vector< float, Device > vertexMaxWeights( 5, -1.0 );
auto vertexMaxWeights_view = vertexMaxWeights.getView();
Next, we define the fetch lambda function responsible for reading the required data:
auto fetch = [] __cuda_callable__( int sourceIdx, int targetIdx, const float& weight ) -> float
{
   return weight;
};
The lambda function fetch takes the indices sourceIdx and targetIdx of the source and target vertices of an edge, respectively. The third parameter is the edge weight. For non-constant graphs, the weight parameter may be passed by reference and can be modified during the fetch operation. This allows the reduction and graph modification to be performed simultaneously. For sparse graphs, the targetIdx index can also be modified.
Next, we define the store lambda function, which is responsible for storing the reduction results for individual vertices:
auto store = [ = ] __cuda_callable__( int vertexIdx, const float& maxWeight ) mutable
{
   vertexMaxWeights_view[ vertexIdx ] = maxWeight;
};
Finally, the reduction is executed as follows:
TNL::Graphs::reduceVertices(
   graph, 1, 4, fetch, TNL::Max{}, store, std::numeric_limits< float >::lowest() );
Here, we use TNL::Max for the reduction (see also Function objects for reduction operations for other functionals). Another variant of TNL::Graphs::reduceVertices allows defining the reduction operation via a lambda function.
Note that the function TNL::Graphs::reduceAllVertices is also available for performing reductions over all vertices.
The whole example reads as follows:
#include <iostream>
#include <limits>
#include <TNL/Graphs/Graph.h>
#include <TNL/Graphs/reduce.h>
#include <TNL/Devices/Host.h>
#include <TNL/Devices/Cuda.h>
#include <TNL/Containers/Vector.h>

template< typename Device >
void
reduceVerticesExample()
{
   // Graph type alias (the exact template parameters are assumed here).
   using GraphType = TNL::Graphs::Graph< float, Device >;

   GraphType graph( 5,
      {
         { 0, 1, 10.0 }, { 0, 2, 20.0 },
         { 1, 2, 30.0 }, { 1, 3, 40.0 }, { 1, 4, 50.0 },
         { 2, 3, 60.0 },
         { 3, 0, 70.0 }, { 3, 4, 80.0 } } );

   std::cout << "Graph:\n" << graph << '\n';

   // Result vector, initialized to -1 for vertices that are not processed.
   TNL::Containers::Vector< float, Device > vertexMaxWeights( 5, -1.0 );
   auto vertexMaxWeights_view = vertexMaxWeights.getView();

   auto fetch = [] __cuda_callable__( int sourceIdx, int targetIdx, const float& weight ) -> float
   {
      return weight;
   };

   auto store = [ = ] __cuda_callable__( int vertexIdx, const float& maxWeight ) mutable
   {
      vertexMaxWeights_view[ vertexIdx ] = maxWeight;
   };

   // Maximum edge weight over the edges of vertices 1, 2 and 3.
   TNL::Graphs::reduceVertices(
      graph, 1, 4, fetch, TNL::Max{}, store, std::numeric_limits< float >::lowest() );

   std::cout << "Maximum edge weight for vertices 1-3:" << vertexMaxWeights << '\n';
}

int
main( int argc, char* argv[] )
{
   std::cout << "Running on host:\n";
   reduceVerticesExample< TNL::Devices::Host >();

#ifdef __CUDACC__
   std::cout << "Running on CUDA device:\n";
   reduceVerticesExample< TNL::Devices::Cuda >();
#endif

   return EXIT_SUCCESS;
}
The output looks as follows:
Running on host:
Graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Maximum edge weight for vertices 1-3:[ -1, 50, 60, 80, -1 ]
Running on CUDA device:
Graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Maximum edge weight for vertices 1-3:[ -1, 50, 60, 80, -1 ]
Reducing Over Edges of Specific Vertices and Conditional Reduction
Reductions over a specified set of vertices can be performed using TNL::Graphs::reduceVertices. Conditional reductions are provided by TNL::Graphs::reduceAllVerticesIf and TNL::Graphs::reduceVerticesIf. Their behavior is analogous to the corresponding vertex traversal functions; however, they differ in how the results of the reduction are stored, as demonstrated in the following code snippet.
Assume that we perform a reduction over the vertices 0, 2, and 3, storing the results both in scattered and in compressed form:
TNL::Containers::Vector< int, Device > vertexIndices{ 0, 2, 3 };
TNL::Containers::Vector< float, Device > vertexSums( 5, -1.0 ), compressedVertexSums( 5, -1.0 );
auto vertexSums_view = vertexSums.getView();
auto compressedVertexSums_view = compressedVertexSums.getView();
auto fetch = [] __cuda_callable__( int sourceIdx, int targetIdx, const float& weight ) -> float
{
   return weight;
};
auto store = [ = ] __cuda_callable__( int indexOfVertexIdx, int vertexIdx, const float& sum ) mutable
{
   compressedVertexSums_view[ indexOfVertexIdx ] = sum;
   vertexSums_view[ vertexIdx ] = sum;
};
The store lambda function takes the following parameters:
- indexOfVertexIdx – the rank of the vertex within the set of vertices being processed. In this example, the reduction is performed only for vertices 0, 2, and 3. Their corresponding indexOfVertexIdx values are 0, 1, and 2, respectively.
- vertexIdx – the index of the vertex whose reduction result is currently being stored.
- sum – the result of the reduction, i.e. the sum of the weights of all edges connected to the vertex.
The whole example reads as follows:
#include <iostream>
#include <TNL/Graphs/Graph.h>
#include <TNL/Graphs/reduce.h>
#include <TNL/Devices/Host.h>
#include <TNL/Devices/Cuda.h>
#include <TNL/Containers/Vector.h>

template< typename Device >
void
reduceVerticesWithIndexesExample()
{
   // Graph type alias (the exact template parameters are assumed here).
   using GraphType = TNL::Graphs::Graph< float, Device >;

   GraphType graph( 5,
      {
         { 0, 1, 10.0 }, { 0, 2, 20.0 },
         { 1, 2, 30.0 }, { 1, 3, 40.0 }, { 1, 4, 50.0 },
         { 2, 3, 60.0 },
         { 3, 0, 70.0 }, { 3, 4, 80.0 } } );

   std::cout << "Graph:\n" << graph << '\n';

   // Vertices to reduce and vectors for the scattered and compressed results.
   TNL::Containers::Vector< int, Device > vertexIndices{ 0, 2, 3 };
   TNL::Containers::Vector< float, Device > vertexSums( 5, -1.0 ), compressedVertexSums( 5, -1.0 );
   auto vertexSums_view = vertexSums.getView();
   auto compressedVertexSums_view = compressedVertexSums.getView();

   auto fetch = [] __cuda_callable__( int sourceIdx, int targetIdx, const float& weight ) -> float
   {
      return weight;
   };

   auto store = [ = ] __cuda_callable__( int indexOfVertexIdx, int vertexIdx, const float& sum ) mutable
   {
      compressedVertexSums_view[ indexOfVertexIdx ] = sum;
      vertexSums_view[ vertexIdx ] = sum;
   };

   // Sum the edge weights of the selected vertices (the overload of
   // reduceVertices taking an index array is assumed here).
   TNL::Graphs::reduceVertices( graph, vertexIndices, fetch, TNL::Plus{}, store, 0.0f );

   std::cout << "Sum of edge weights for specific vertices:" << vertexSums << '\n';
   std::cout << "Compressed sums:" << compressedVertexSums.getView( 0, vertexIndices.getSize() ) << '\n';
}

int
main( int argc, char* argv[] )
{
   std::cout << "Running on host:\n";
   reduceVerticesWithIndexesExample< TNL::Devices::Host >();

#ifdef __CUDACC__
   std::cout << "Running on CUDA device:\n";
   reduceVerticesWithIndexesExample< TNL::Devices::Cuda >();
#endif

   return EXIT_SUCCESS;
}
The output looks as follows:
Running on host:
Graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Sum of edge weights for specific vertices:[ 30, -1, 60, 150, -1 ]
Compressed sums:[ 30, 60, 150 ]
Running on CUDA device:
Graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Sum of edge weights for specific vertices:[ 30, -1, 60, 150, -1 ]
Compressed sums:[ 30, 60, 150 ]
The store lambda function behaves in the same way for conditional reductions, as demonstrated by the following code snippet:
auto vertexMinWeights_view = vertexMinWeights.getView();
auto compressedVertexMinWeights_view = compressedVertexMinWeights.getView();
auto condition = [ graph = graph.getConstView() ] __cuda_callable__( int vertexIdx ) -> bool
{
   return graph.getVertexDegree( vertexIdx ) >= 2;
};
auto fetch = [] __cuda_callable__( int sourceIdx, int targetIdx, const float& weight ) -> float
{
   return weight;
};
auto store = [ = ] __cuda_callable__( int indexOfVertexIdx, int vertexIdx, const float& minWeight ) mutable
{
   compressedVertexMinWeights_view[ indexOfVertexIdx ] = minWeight;
   vertexMinWeights_view[ vertexIdx ] = minWeight;
};
The function TNL::Graphs::reduceVerticesIf also returns the number of vertices (reducedVertexCount) for which the reduction has been performed. This is particularly useful for subsequent processing of data stored in arrays with compressed results (for example, compressedVertexMinWeights_view in this example).
The whole example reads as:
#include <iostream>
#include <limits>
#include <TNL/Graphs/Graph.h>
#include <TNL/Graphs/reduce.h>
#include <TNL/Devices/Host.h>
#include <TNL/Devices/Cuda.h>
#include <TNL/Containers/Vector.h>

template< typename Device >
void
reduceVerticesIfExample()
{
   // Graph type alias (the exact template parameters are assumed here).
   using GraphType = TNL::Graphs::Graph< float, Device >;

   GraphType graph( 5,
      {
         { 0, 1, 10.0 }, { 0, 2, 20.0 },
         { 1, 2, 30.0 }, { 1, 3, 40.0 }, { 1, 4, 50.0 },
         { 2, 3, 60.0 },
         { 3, 0, 70.0 }, { 3, 4, 80.0 } } );

   std::cout << "Graph:\n" << graph << '\n';

   TNL::Containers::Vector< float, Device > vertexMinWeights( 5, -1.0 ), compressedVertexMinWeights( 5, -1.0 );
   auto vertexMinWeights_view = vertexMinWeights.getView();
   auto compressedVertexMinWeights_view = compressedVertexMinWeights.getView();

   auto condition = [ graph = graph.getConstView() ] __cuda_callable__( int vertexIdx ) -> bool
   {
      return graph.getVertexDegree( vertexIdx ) >= 2;
   };

   auto fetch = [] __cuda_callable__( int sourceIdx, int targetIdx, const float& weight ) -> float
   {
      return weight;
   };

   auto store = [ = ] __cuda_callable__( int indexOfVertexIdx, int vertexIdx, const float& minWeight ) mutable
   {
      compressedVertexMinWeights_view[ indexOfVertexIdx ] = minWeight;
      vertexMinWeights_view[ vertexIdx ] = minWeight;
   };

   // Minimum edge weight over the vertices 1-3 whose degree is at least 2.
   int reducedVertexCount = TNL::Graphs::reduceVerticesIf(
      graph, 1, 4, condition, fetch, TNL::Min{}, store, std::numeric_limits< float >::max() );

   std::cout << "Number of reduced vertices: " << reducedVertexCount << '\n';
   std::cout << "Minimum edge weight for vertices 1-3 with degree >= 2:" << vertexMinWeights << '\n';
   std::cout << "Compressed minimum weights:" << compressedVertexMinWeights.getView( 0, reducedVertexCount ) << '\n';
}

int
main( int argc, char* argv[] )
{
   std::cout << "Running on host:\n";
   reduceVerticesIfExample< TNL::Devices::Host >();

#ifdef __CUDACC__
   std::cout << "Running on CUDA device:\n";
   reduceVerticesIfExample< TNL::Devices::Cuda >();
#endif

   return EXIT_SUCCESS;
}
The output looks as follows:
Running on host:
Graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Number of reduced vertices: 2
Minimum edge weight for vertices 1-3 with degree >= 2:[ -1, 30, -1, 70, -1 ]
Compressed minimum weights:[ 30, 70 ]
Running on CUDA device:
Graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Number of reduced vertices: 2
Minimum edge weight for vertices 1-3 with degree >= 2:[ -1, 30, -1, 70, -1 ]
Compressed minimum weights:[ 30, 70 ]
Reducing Over Edges of a Vertex with Argument
Reductions with arguments allow tracking graph edges during the computation. This is useful, for example, when searching for the edge with the maximal (or minimal) weight among edges adjacent to given vertices, or for similar operations. Such functionality is provided by TNL::Graphs::reduceVerticesWithArgument and related functions (TNL::Graphs::reduceAllVerticesWithArgument, TNL::Graphs::reduceVerticesWithArgumentIf, TNL::Graphs::reduceAllVerticesWithArgumentIf).
In the following example, for each vertex in a given range, we search for the edge with the minimal weight. We begin by defining vectors and corresponding vector views to store the results of the reduction:
TNL::Containers::Vector< float, Device > minWeights( 5, -1.0 );
TNL::Containers::Vector< int, Device > minTargets( 5, -1 );
auto minWeights_view = minWeights.getView();
auto minTargets_view = minTargets.getView();
The fetch lambda function behaves in the same way as in a standard reduction:
auto fetch = [] __cuda_callable__( int sourceIdx, int targetIdx, const float& weight ) -> float
{
   return weight;
};
The store lambda function is defined as follows:
auto store = [ = ] __cuda_callable__( int vertexIdx, int localIdx, int targetIdx, float result, bool isolatedVertex ) mutable
{
   minWeights_view[ vertexIdx ] = result;
   if( ! isolatedVertex )
      minTargets_view[ vertexIdx ] = targetIdx;
};
In addition to the vertex index and the reduction result, it receives the parameters localIdx and targetIdx and the boolean flag isolatedVertex:
- vertexIdx is the index of the vertex for which the result is being stored.
- targetIdx is the index of the target vertex of the edge with the minimal weight.
- localIdx is the position of the edge with the minimal weight within the set of all edges adjacent to the vertex vertexIdx. If the graph is dense (i.e., the adjacency matrix is dense), localIdx has the same value as targetIdx and may be omitted.
- result is the value of the minimal edge weight.
- isolatedVertex is a boolean flag. If it is set to true, the vertex with index vertexIdx is isolated and has no adjacent edges. In this case, result is equal to the identity value of the reduction operation, and both localIdx and targetIdx are undefined and must be ignored.
The reduction is performed as follows:
TNL::Graphs::reduceVerticesWithArgument(
   graph, 1, 5, fetch, TNL::MinWithArg{}, store, std::numeric_limits< float >::max() );
We use TNL::MinWithArg for the reduction (see also Function objects for reduction operations with argument for other functionals). Another variant of TNL::Graphs::reduceVerticesWithArgument allows defining the reduction operation via a lambda function.
The whole example reads as follows:
#include <iostream>
#include <limits>
#include <TNL/Graphs/Graph.h>
#include <TNL/Graphs/reduce.h>
#include <TNL/Devices/Host.h>
#include <TNL/Devices/Cuda.h>
#include <TNL/Containers/Vector.h>

template< typename Device >
void
reduceVerticesWithArgumentExample()
{
   // Graph type alias (the exact template parameters are assumed here).
   using GraphType = TNL::Graphs::Graph< float, Device >;

   GraphType graph( 5,
      {
         { 0, 1, 10.0 }, { 0, 2, 20.0 },
         { 1, 2, 30.0 }, { 1, 3, 40.0 }, { 1, 4, 50.0 },
         { 2, 3, 60.0 },
         { 3, 0, 70.0 }, { 3, 4, 80.0 } } );

   std::cout << "Graph:\n" << graph << '\n';

   TNL::Containers::Vector< float, Device > minWeights( 5, -1.0 );
   TNL::Containers::Vector< int, Device > minTargets( 5, -1 );
   auto minWeights_view = minWeights.getView();
   auto minTargets_view = minTargets.getView();

   auto fetch = [] __cuda_callable__( int sourceIdx, int targetIdx, const float& weight ) -> float
   {
      return weight;
   };

   auto store = [ = ] __cuda_callable__( int vertexIdx, int localIdx, int targetIdx, float result, bool isolatedVertex ) mutable
   {
      minWeights_view[ vertexIdx ] = result;
      if( ! isolatedVertex )
         minTargets_view[ vertexIdx ] = targetIdx;
   };

   // Find the edge with the minimal weight for each of the vertices 1-4.
   TNL::Graphs::reduceVerticesWithArgument(
      graph, 1, 5, fetch, TNL::MinWithArg{}, store, std::numeric_limits< float >::max() );

   std::cout << "Minimum edge weight for vertices 1-4:" << minWeights << '\n';
   std::cout << "Target vertex for minimum edge:" << minTargets << '\n';
}

int
main( int argc, char* argv[] )
{
   std::cout << "Running on host:\n";
   reduceVerticesWithArgumentExample< TNL::Devices::Host >();

#ifdef __CUDACC__
   std::cout << "Running on CUDA device:\n";
   reduceVerticesWithArgumentExample< TNL::Devices::Cuda >();
#endif

   return EXIT_SUCCESS;
}
The output looks as follows:
Running on host:
Graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Minimum edge weight for vertices 1-4:[ -1, 30, 60, 70, 3.40282e+38 ]
Target vertex for minimum edge:[ -1, 2, 3, 0, -1 ]
Running on CUDA device:
Graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Minimum edge weight for vertices 1-4:[ -1, 30, 60, 70, 3.40282e+38 ]
Target vertex for minimum edge:[ -1, 2, 3, 0, -1 ]
Reducing Over Edges of Specific Vertices and Conditional Reduction With Argument
The following example demonstrates a reduction with arguments over a specific set of vertices, specified either by their indices or by a condition:
auto vertexMinWeights_view = vertexMinWeights.getView();
auto compressedVertexMinWeights_view = compressedVertexMinWeights.getView();
auto vertexMinTargets_view = vertexMinTargets.getView();
auto compressedVertexMinTargets_view = compressedVertexMinTargets.getView();
auto condition = [ graph = graph.getConstView() ] __cuda_callable__( int vertexIdx ) -> bool
{
   return graph.getVertexDegree( vertexIdx ) >= 2;
};
auto fetch = [] __cuda_callable__( int sourceIdx, int targetIdx, const float& weight ) -> float
{
   return weight;
};
auto store = [ = ] __cuda_callable__(
   int indexOfVertexIdx, int vertexIdx, int localIdx, int targetIdx, float minWeight, bool isolatedVertex ) mutable
{
   compressedVertexMinWeights_view[ indexOfVertexIdx ] = minWeight;
   vertexMinWeights_view[ vertexIdx ] = minWeight;
   if( ! isolatedVertex ) {
      compressedVertexMinTargets_view[ indexOfVertexIdx ] = targetIdx;
      vertexMinTargets_view[ vertexIdx ] = targetIdx;
   }
};
int traversedVertexCount = TNL::Graphs::reduceVerticesWithArgumentIf(
   graph, 1, 4, condition, fetch, TNL::MinWithArg{}, store, std::numeric_limits< float >::max() );
The lambda functions condition and fetch remain unchanged. The store lambda function accepts the following parameters:
- indexOfVertexIdx – the position of the vertex whose result is being stored within the set of all vertices reduced during the operation. This index can be used to store results in a compressed form.
- vertexIdx – the index of the vertex whose reduction result is being stored.
- localIdx – the position of the edge with the minimal weight within the set of all edges adjacent to the vertex vertexIdx.
- targetIdx – the index of the target vertex of the edge with the minimal weight.
- minWeight – the weight value of the edge with the minimal weight.
- isolatedVertex – indicates (if set to true) that the vertex with index vertexIdx is isolated and has no adjacent edges over which the reduction can be performed.
In this case, the reduction result (minWeight in this example) is set to the identity value of the reduction operation, and the variables localIdx and targetIdx are undefined and must not be used.
The whole example reads as follows:
#include <iostream>
#include <limits>
#include <TNL/Graphs/Graph.h>
#include <TNL/Graphs/reduce.h>
#include <TNL/Devices/Host.h>
#include <TNL/Devices/Cuda.h>
#include <TNL/Containers/Vector.h>

template< typename Device >
void
reduceVerticesWithArgumentIfExample()
{
   // Graph type alias (the exact template parameters are assumed here).
   using GraphType = TNL::Graphs::Graph< float, Device >;

   GraphType graph( 5,
      {
         { 0, 1, 10.0 }, { 0, 2, 20.0 },
         { 1, 2, 30.0 }, { 1, 3, 40.0 }, { 1, 4, 50.0 },
         { 2, 3, 60.0 },
         { 3, 0, 70.0 }, { 3, 4, 80.0 } } );

   std::cout << "Graph:\n" << graph << '\n';

   TNL::Containers::Vector< float, Device > vertexMinWeights( 5, -1.0 ), compressedVertexMinWeights( 5, -1.0 );
   TNL::Containers::Vector< int, Device > vertexMinTargets( 5, -1 ), compressedVertexMinTargets( 5, -1 );
   auto vertexMinWeights_view = vertexMinWeights.getView();
   auto compressedVertexMinWeights_view = compressedVertexMinWeights.getView();
   auto vertexMinTargets_view = vertexMinTargets.getView();
   auto compressedVertexMinTargets_view = compressedVertexMinTargets.getView();

   auto condition = [ graph = graph.getConstView() ] __cuda_callable__( int vertexIdx ) -> bool
   {
      return graph.getVertexDegree( vertexIdx ) >= 2;
   };

   auto fetch = [] __cuda_callable__( int sourceIdx, int targetIdx, const float& weight ) -> float
   {
      return weight;
   };

   auto store = [ = ] __cuda_callable__(
      int indexOfVertexIdx, int vertexIdx, int localIdx, int targetIdx, float minWeight, bool isolatedVertex ) mutable
   {
      compressedVertexMinWeights_view[ indexOfVertexIdx ] = minWeight;
      vertexMinWeights_view[ vertexIdx ] = minWeight;
      if( ! isolatedVertex ) {
         compressedVertexMinTargets_view[ indexOfVertexIdx ] = targetIdx;
         vertexMinTargets_view[ vertexIdx ] = targetIdx;
      }
   };

   // Minimal edge weight and its target over the vertices 1-3 with degree >= 2.
   int traversedVertexCount = TNL::Graphs::reduceVerticesWithArgumentIf(
      graph, 1, 4, condition, fetch, TNL::MinWithArg{}, store, std::numeric_limits< float >::max() );

   std::cout << "Number of traversed vertices: " << traversedVertexCount << '\n';
   std::cout << "Minimum edge weight for vertices 1-3 with degree >= 2:" << vertexMinWeights << '\n';
   std::cout << "Target vertex for minimum edge:" << vertexMinTargets << '\n';
   std::cout << "Compressed minimum weights:" << compressedVertexMinWeights.getView( 0, traversedVertexCount ) << '\n';
   std::cout << "Compressed minimum targets:" << compressedVertexMinTargets.getView( 0, traversedVertexCount ) << '\n';
}

int
main( int argc, char* argv[] )
{
   std::cout << "Running on host:\n";
   reduceVerticesWithArgumentIfExample< TNL::Devices::Host >();

#ifdef __CUDACC__
   std::cout << "Running on CUDA device:\n";
   reduceVerticesWithArgumentIfExample< TNL::Devices::Cuda >();
#endif

   return EXIT_SUCCESS;
}
The output looks as follows:
Running on host:
Graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Number of traversed vertices: 2
Minimum edge weight for vertices 1-3 with degree >= 2:[ -1, 30, -1, 70, -1 ]
Target vertex for minimum edge:[ -1, 2, -1, 0, -1 ]
Compressed minimum weights:[ 30, 70 ]
Compressed minimum targets:[ 2, 0 ]
Running on CUDA device:
Graph:
Row: 0 -> 1:10 2:20
Row: 1 -> 2:30 3:40 4:50
Row: 2 -> 3:60
Row: 3 -> 0:70 4:80
Row: 4 ->
Number of traversed vertices: 2
Minimum edge weight for vertices 1-3 with degree >= 2:[ -1, 30, -1, 70, -1 ]
Target vertex for minimum edge:[ -1, 2, -1, 0, -1 ]
Compressed minimum weights:[ 30, 70 ]
Compressed minimum targets:[ 2, 0 ]
Graph Views
Similarly to matrix views, graph views (TNL::Graphs::GraphView) are lightweight reference objects that allow passing graphs to GPU kernels. A graph view can be obtained from a graph using the methods TNL::Graphs::Graph::getView and TNL::Graphs::Graph::getConstView.
Graph views are particularly useful when:
- Passing graphs to custom GPU kernels
- Working with graphs in lambda functions
- Creating shallow copies for parallel algorithms
Performance Considerations
Choosing the Right Matrix Format
For sparse graphs, different segment formats offer different performance characteristics. The user may choose any of the segment types defined in TNL::Algorithms::Segments, for example the Ellpack format, by passing the segments type as a template parameter of the adjacency matrix.
Data structure for Ellpack segments.
Definition Ellpack.h:42
Further Reading