Template Numerical Library version main:94209208
TNL::Algorithms::SegmentedScan< Devices::Host, Type > Struct Template Reference

Static Public Member Functions

template<typename Vector , typename Reduction , typename Flags >
static void perform (Vector &v, Flags &flags, typename Vector::IndexType begin, typename Vector::IndexType end, const Reduction &reduction, typename Vector::ValueType identity)
 Computes segmented scan (prefix sum) using OpenMP.
 

Member Function Documentation

◆ perform()

template<detail::ScanType Type>
template<typename Vector , typename Reduction , typename Flags >
void TNL::Algorithms::SegmentedScan< Devices::Host, Type >::perform ( Vector & v,
Flags & flags,
typename Vector::IndexType begin,
typename Vector::IndexType end,
const Reduction & reduction,
typename Vector::ValueType identity )
static

Computes segmented scan (prefix sum) using OpenMP.

Template Parameters
Vector: type of the vector being used for the scan
Reduction: lambda function defining the reduction operation
Flags: array type containing zeros and ones that mark where each segment begins
Parameters
v: input vector; the result of the scan is stored in the same vector
flags: array of zeros and ones that mark where each segment begins
begin: the first element in the array to be scanned
end: the end of the interval to be scanned (the example passes v.getSize(), i.e. one past the last element)
reduction: lambda function implementing the reduction operation
identity: the identity element for the reduction operation, i.e. the element which does not change the result of the reduction

The reduction lambda function takes the two values to be reduced and returns the result:

auto reduction = [] __cuda_callable__ ( const Result& a, const Result& b ) { return ... };
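As a rough illustration of how the reduction and its identity element fit together, the following plain C++ sketch pairs three common reductions with their identities. It does not use TNL and omits `__cuda_callable__`. The helper `isNeutral` and the function `checkIdentities` are hypothetical names introduced only for this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>

// Hypothetical helper (not part of TNL): returns true when `identity`
// leaves the sample value `x` unchanged on both sides of `reduction`.
template< typename Reduction, typename T >
bool
isNeutral( const Reduction& reduction, T identity, T x )
{
   return reduction( identity, x ) == x && reduction( x, identity ) == x;
}

// Checks a few reduction/identity pairs on one sample value.
inline void
checkIdentities()
{
   auto sum = []( const double& a, const double& b ) { return a + b; };
   auto product = []( const double& a, const double& b ) { return a * b; };
   auto maximum = []( const double& a, const double& b ) { return std::max( a, b ); };

   const double x = 7.5;
   assert( isNeutral( sum, 0.0, x ) );      // 0 is neutral for addition
   assert( isNeutral( product, 1.0, x ) );  // 1 is neutral for multiplication
   assert( isNeutral( maximum, std::numeric_limits< double >::lowest(), x ) );
   assert( ! isNeutral( product, 0.0, x ) );  // 0 is not neutral for multiplication
}
```

Passing an identity that does not match the reduction (for example 0 with a product) would change the scanned values, which is why the two arguments are chosen together.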
Example
#include <cstdlib>
#include <iostream>
#include <TNL/Containers/Array.h>
#include <TNL/Algorithms/SegmentedScan.h>

using namespace TNL;
using namespace TNL::Containers;
using namespace TNL::Algorithms;

template< typename Device >
void
segmentedScan( Array< double, Device >& v, Array< bool, Device >& flags )
{
   /***
    * Reduction is sum of two numbers.
    */
   auto reduce = [] __cuda_callable__( const double& a, const double& b )
   {
      return a + b;
   };

   /***
    * As parameters, we pass the array on which the scan is to be performed, the
    * flags marking the segments, the interval where the scan is performed, the
    * lambda function used by the scan and zero as the identity element of the
    * 'sum' operation.
    */
   SegmentedScan< Device >::perform( v, flags, 0, v.getSize(), reduce, 0.0 );
}

int
main( int argc, char* argv[] )
{
   /***
    * Firstly, test the segmented prefix sum with arrays allocated on CPU.
    */
   Array< bool, Devices::Host > host_flags{ 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0 };
   Array< double, Devices::Host > host_v{ 1, 3, 5, 2, 4, 6, 9, 3, 5, 3, 6, 9, 12, 15 };
   std::cout << "host_flags = " << host_flags << std::endl;
   std::cout << "host_v = " << host_v << std::endl;
   segmentedScan( host_v, host_flags );
   std::cout << "The segmented prefix sum of the host array is " << host_v << "." << std::endl;

   /***
    * And then also on GPU.
    */
#ifdef __CUDACC__
   //Array< bool, Devices::Cuda > cuda_flags{ 1,0,0,1,0,0,0,1,0,1,0,0, 0, 0 };
   //Array< double, Devices::Cuda > cuda_v { 1,3,5,2,4,6,9,3,5,3,6,9,12,15 };
   //std::cout << "cuda_flags = " << cuda_flags << std::endl;
   //std::cout << "cuda_v = " << cuda_v << std::endl;
   //segmentedScan( cuda_v, cuda_flags );
   //std::cout << "The segmented prefix sum of the CUDA array is " << cuda_v << "." << std::endl;
#endif
   return EXIT_SUCCESS;
}
Output
host_flags = [ 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0 ]
host_v = [ 1, 3, 5, 2, 4, 6, 9, 3, 5, 3, 6, 9, 12, 15 ]
The segmented prefix sum of the host array is [ 1, 4, 9, 2, 6, 12, 21, 3, 8, 3, 9, 18, 30, 45 ].
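
To see where the numbers in the output come from, the following is a minimal sequential sketch of an inclusive segmented scan in plain C++. It is not the TNL or OpenMP implementation. It assumes that a set flag restarts the accumulation at that position and that the interval is half-open, as the call with `v.getSize()` suggests. The names `segmentedInclusiveScan` and `hostExample` are hypothetical and exist only in this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sequential reference sketch (hypothetical helper, not the TNL implementation).
// Scans v[begin, end) in place; flags[i] == true starts a new segment at i.
template< typename T, typename Reduction >
void
segmentedInclusiveScan( std::vector< T >& v,
                        const std::vector< bool >& flags,
                        std::size_t begin,
                        std::size_t end,
                        const Reduction& reduction,
                        T identity )
{
   assert( flags.size() == v.size() && begin <= end && end <= v.size() );
   T running = identity;
   for( std::size_t i = begin; i < end; i++ ) {
      if( flags[ i ] )
         running = identity;  // a set flag restarts the accumulation
      running = reduction( running, v[ i ] );
      v[ i ] = running;  // inclusive: element i contains its own value
   }
}

// Runs the sketch on the host data used in the example on this page.
inline std::vector< double >
hostExample()
{
   std::vector< bool > flags{ true, false, false, true, false, false, false,
                              true, false, true, false, false, false, false };
   std::vector< double > v{ 1, 3, 5, 2, 4, 6, 9, 3, 5, 3, 6, 9, 12, 15 };
   auto sum = []( const double& a, const double& b ) { return a + b; };
   segmentedInclusiveScan( v, flags, 0, v.size(), sum, 0.0 );
   return v;  // { 1, 4, 9, 2, 6, 12, 21, 3, 8, 3, 9, 18, 30, 45 }
}
```

The four set flags split the input into the segments {1, 3, 5}, {2, 4, 6, 9}, {3, 5} and {3, 6, 9, 12, 15}, and each segment is summed independently of the others.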

The documentation for this struct was generated from the following files: