Template Numerical Library version main:f17d0c8
TNL::Algorithms::SegmentedScan< Devices::Host, Type > Struct Template Reference

## Static Public Member Functions

template<typename Vector , typename Reduction , typename Flags >
static void perform (Vector &v, Flags &flags, typename Vector::IndexType begin, typename Vector::IndexType end, const Reduction &reduction, typename Vector::ValueType identity)
Computes segmented scan (prefix sum) using OpenMP.

## ◆ perform()

template<detail::ScanType Type>
template<typename Vector , typename Reduction , typename Flags >
 void TNL::Algorithms::SegmentedScan< Devices::Host, Type >::perform ( Vector & v, Flags & flags, typename Vector::IndexType begin, typename Vector::IndexType end, const Reduction & reduction, typename Vector::ValueType identity )
static

Computes segmented scan (prefix sum) using OpenMP.

Template Parameters
• Vector: type of the vector being used for the scan.
• Reduction: lambda function defining the reduction operation.
• Flags: array type containing zeros and ones that mark where each segment begins.
Parameters
• v: input vector; the result of the scan is stored in the same vector.
• flags: array of zeros and ones marking where each segment begins.
• begin: the first element in the array to be scanned.
• end: the last element in the array to be scanned.
• reduction: lambda function implementing the reduction operation.
• identity: the identity element of the reduction operation, i.e. the element that does not change the result of the reduction.
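To make the role of `flags` concrete, here is a minimal sketch in plain standard C++ (it does not use TNL) that builds a flags array from a list of segment start indices. The helper name `makeSegmentFlags` and the use of `std::vector` are assumptions made for illustration only; TNL itself only requires an array of zeros and ones.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of TNL): builds a 0/1 flags array of length
// `size` in which flags[ i ] == 1 marks the first element of a segment.
std::vector< int >
makeSegmentFlags( std::size_t size, const std::vector< std::size_t >& segmentStarts )
{
   std::vector< int > flags( size, 0 );
   for( std::size_t start : segmentStarts ) {
      assert( start < size );  // every segment must begin inside the array
      flags[ start ] = 1;
   }
   return flags;
}
```

Calling `makeSegmentFlags( 14, { 0, 3, 7, 9 } )` yields the same pattern as `host_flags` in the example on this page: four segments starting at indices 0, 3, 7 and 9.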

The reduction lambda function takes the two values to be reduced and returns their combination:

auto reduction = [] __cuda_callable__ ( const Result& a, const Result& b ) { return ... };
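The `identity` argument must match the chosen reduction. As a hedged illustration in plain standard C++ (no TNL dependency), the sketch below spot-checks the defining property of an identity element for three common reductions. The helper `isIdentityFor` and the lambda names are hypothetical, and the lambdas omit `__cuda_callable__` because they are only meant to run on the host here.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>

// Hypothetical helper: checks the identity property for one sample value x,
// i.e. reducing x with the identity on either side must give back x.
template< typename Value, typename Reduction >
bool
isIdentityFor( const Reduction& reduction, const Value& identity, const Value& x )
{
   return reduction( identity, x ) == x && reduction( x, identity ) == x;
}

// Plain host lambdas with the same two-argument shape as the reduction above.
const auto sum = []( const double& a, const double& b ) { return a + b; };
const auto product = []( const double& a, const double& b ) { return a * b; };
const auto maximum = []( const double& a, const double& b ) { return std::max( a, b ); };
```

Under these assumptions the matching identities are `0.0` for `sum`, `1.0` for `product`, and `std::numeric_limits< double >::lowest()` for `maximum`. Passing a mismatched value, such as `0.0` with `product`, fails the check.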
Example
#include <cstdlib>
#include <iostream>
#include <TNL/Containers/Array.h>
#include <TNL/Algorithms/SegmentedScan.h>

using namespace TNL;
using namespace TNL::Containers;
using namespace TNL::Algorithms;

template< typename Device >
void
segmentedScan( Array< double, Device >& v, Array< bool, Device >& flags )
{
   /***
    * Reduction is sum of two numbers.
    */
   auto reduce = [] __cuda_callable__( const double& a, const double& b )
   {
      return a + b;
   };

   /***
    * As parameters, we pass the array on which the scan is to be performed, the
    * flags marking the segment beginnings, the interval where the scan is
    * performed, the lambda function used by the scan, and zero as the identity
    * element of the 'sum' operation.
    */
   SegmentedScan< Device >::perform( v, flags, 0, v.getSize(), reduce, 0.0 );
}

int
main( int argc, char* argv[] )
{
   /***
    * First, test the segmented prefix sum with arrays allocated on the CPU.
    */
   Array< bool, Devices::Host > host_flags{ 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0 };
   Array< double, Devices::Host > host_v{ 1, 3, 5, 2, 4, 6, 9, 3, 5, 3, 6, 9, 12, 15 };
   std::cout << "host_flags = " << host_flags << std::endl;
   std::cout << "host_v = " << host_v << std::endl;
   segmentedScan( host_v, host_flags );
   std::cout << "The segmented prefix sum of the host array is " << host_v << "." << std::endl;

   /***
    * And then also on the GPU (this part is commented out in the example).
    */
#ifdef __CUDACC__
   //Array< bool, Devices::Cuda > cuda_flags{ 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0 };
   //Array< double, Devices::Cuda > cuda_v{ 1, 3, 5, 2, 4, 6, 9, 3, 5, 3, 6, 9, 12, 15 };
   //std::cout << "cuda_flags = " << cuda_flags << std::endl;
   //std::cout << "cuda_v = " << cuda_v << std::endl;
   //segmentedScan( cuda_v, cuda_flags );
   //std::cout << "The segmented prefix sum of the CUDA array is " << cuda_v << "." << std::endl;
#endif
   return EXIT_SUCCESS;
}
Output
host_flags = [ 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0 ]
host_v = [ 1, 3, 5, 2, 4, 6, 9, 3, 5, 3, 6, 9, 12, 15 ]
The segmented prefix sum of the host array is [ 1, 4, 9, 2, 6, 12, 21, 3, 8, 3, 9, 18, 30, 45 ].
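
The output above can be cross-checked against a sequential reference. The following is a minimal sketch in plain standard C++, not TNL's OpenMP implementation. It assumes an inclusive scan over the half-open range `[begin, end)`, which is what the example call with `v.getSize()` and the printed result suggest. The function name `segmentedInclusiveScanReference` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sequential reference for an inclusive segmented scan over [begin, end).
// Illustrative only: this is not TNL's implementation.
template< typename Value, typename Flag, typename Reduction >
void
segmentedInclusiveScanReference( std::vector< Value >& v,
                                 const std::vector< Flag >& flags,
                                 std::size_t begin,
                                 std::size_t end,
                                 const Reduction& reduction,
                                 Value identity )
{
   assert( v.size() == flags.size() && begin <= end && end <= v.size() );
   Value running = identity;
   for( std::size_t i = begin; i < end; i++ ) {
      if( flags[ i ] )
         running = identity;  // a set flag starts a new segment
      running = reduction( running, v[ i ] );
      v[ i ] = running;  // the result overwrites the input, as in perform()
   }
}
```

Applied to the `host_v` and `host_flags` values from the example with a sum reduction and identity `0.0`, this sketch reproduces the sequence printed above.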

The documentation for this struct was generated from the following files:
• src/TNL/Algorithms/SegmentedScan.h
• src/TNL/Algorithms/SegmentedScan.hpp