Template Numerical Library version main:1655e92
TNL::Algorithms::ParallelFor< Device, Mode > Struct Template Reference

Parallel for loop for one dimensional interval of indices. More...

#include <TNL/Algorithms/ParallelFor.h>

Static Public Member Functions

template<typename Index , typename Function , typename... FunctionArgs>
static void exec (Index start, Index end, Function f, FunctionArgs... args)
 Static method for the execution of the loop. More...
 

Detailed Description

template<typename Device = Devices::Sequential, ParallelForMode Mode = SynchronousMode>
struct TNL::Algorithms::ParallelFor< Device, Mode >

Parallel for loop for one dimensional interval of indices.

Template Parameters
Device: specifies the device where the for-loop will be executed. It can be TNL::Devices::Host, TNL::Devices::Cuda or TNL::Devices::Sequential.
Mode: defines the synchronous/asynchronous mode on parallel devices.
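For illustration, a minimal sketch of selecting both template parameters explicitly. It is not part of the original documentation; it assumes that AsynchronousMode is the asynchronous counterpart of SynchronousMode in the ParallelForMode enumeration, that CUDA support is enabled, and the vector name and size are hypothetical.

#include <cstdlib>
#include <TNL/Containers/Vector.h>
#include <TNL/Algorithms/ParallelFor.h>

using namespace TNL;
using namespace TNL::Containers;
using namespace TNL::Algorithms;

int main( int argc, char* argv[] )
{
#ifdef HAVE_CUDA
   Vector< double, Devices::Cuda > v( 1000 );
   auto view = v.getView();
   auto kernel = [=] __cuda_callable__ ( int i ) mutable
   {
      view[ i ] = 2.0 * i;
   };
   // AsynchronousMode (assumed name) would launch the CUDA kernel without
   // synchronizing the device afterwards, so the caller synchronizes
   // explicitly before reading the results.
   ParallelFor< Devices::Cuda, AsynchronousMode >::exec( 0, v.getSize(), kernel );
   cudaDeviceSynchronize();
#endif
   return EXIT_SUCCESS;
}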

Member Function Documentation

◆ exec()

template<typename Device = Devices::Sequential, ParallelForMode Mode = SynchronousMode>
template<typename Index , typename Function , typename... FunctionArgs>
static void TNL::Algorithms::ParallelFor< Device, Mode >::exec( Index start, Index end, Function f, FunctionArgs... args )
inline static

Static method for the execution of the loop.

Template Parameters
Index: is the type of the loop indices.
Function: is the type of the functor to be called in each iteration (it is usually deduced from the argument used in the function call).
FunctionArgs: is a variadic pack of types for additional parameters that are forwarded to the functor in every iteration.
Parameters
start: is the left bound of the iteration range [start, end).
end: is the right bound of the iteration range [start, end).
f: is the function to be called in each iteration.
args: are additional parameters to be passed to the function f.
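The following sketch illustrates the forwarding of additional arguments. It is not part of the original documentation; it assumes that the functor is invoked as f( i, args... ), i.e. with the loop index first and the forwarded arguments after it, and the names scale and factor are hypothetical.

#include <iostream>
#include <cstdlib>
#include <TNL/Containers/Vector.h>
#include <TNL/Algorithms/ParallelFor.h>

using namespace TNL;
using namespace TNL::Containers;
using namespace TNL::Algorithms;

int main( int argc, char* argv[] )
{
   Vector< double, Devices::Host > v( 10 );
   auto view = v.getView();

   // The functor takes the loop index first, then the extra parameter.
   auto scale = [=] __cuda_callable__ ( int i, double factor ) mutable
   {
      view[ i ] = factor * i;
   };

   // The trailing 2.0 is forwarded to 'scale' as 'factor' in every iteration.
   ParallelFor< Devices::Host >::exec( 0, v.getSize(), scale, 2.0 );

   std::cout << "v = " << v << std::endl;
   return EXIT_SUCCESS;
}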
Example
#include <iostream>
#include <TNL/Containers/Vector.h>
#include <TNL/Algorithms/ParallelFor.h>

using namespace TNL;
using namespace TNL::Containers;
using namespace TNL::Algorithms;

/****
 * Set all elements of the vector v to the constant c.
 */
template< typename Device >
void initVector( Vector< double, Device >& v,
                 const double& c )
{
   auto view = v.getView();
   auto init = [=] __cuda_callable__ ( int i ) mutable
   {
      view[ i ] = c;
   };
   ParallelFor< Device >::exec( 0, v.getSize(), init );
}

int main( int argc, char* argv[] )
{
   /***
    * Firstly, test the vector initialization on CPU.
    */
   Vector< double, Devices::Host > host_v( 10 );
   initVector( host_v, 1.0 );
   std::cout << "host_v = " << host_v << std::endl;

   /***
    * And then also on GPU.
    */
#ifdef HAVE_CUDA
   Vector< double, Devices::Cuda > cuda_v( 10 );
   initVector( cuda_v, 1.0 );
   std::cout << "cuda_v = " << cuda_v << std::endl;
#endif
   return EXIT_SUCCESS;
}
Output
host_v = [ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 ]
cuda_v = [ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 ]

The documentation for this struct was generated from the following file:
TNL/Algorithms/ParallelFor.h