Vectors

Introduction

This chapter introduces vectors in TNL. Vector extends Array with basic operations from linear algebra. The reader will mainly learn how to perform BLAS level 1 operations in TNL. Thanks to the design of TNL, they are easy to implement, transparent with respect to the hardware architecture, and in some cases even faster than BLAS or cuBLAS implementations.

Dynamic vectors

Vector is, similarly to Array, a templated class defined in the namespace TNL::Containers with three template parameters:

  • Real is the type of the data stored in the vector.
  • Device is the device where the vector is allocated. Currently, it can be either Devices::Host for CPU or Devices::Cuda for GPUs supporting CUDA.
  • Index is the type used for indexing the vector elements.

Vector, unlike Array, requires that the Real type is numeric or a type for which basic algebraic operations are defined. Which algebraic operations are required depends on which vector operations the user calls. Vector is derived from Array, so it inherits all of its methods. Just as Array has its counterpart ArrayView, Vector has VectorView, which is derived from ArrayView. See Arrays for more details.
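
For illustration, here is a minimal sketch of creating a vector and binding a view to it; the scalar assignment and compound assignment rely on the expression-template support described in the following sections:

#include <iostream>
#include <TNL/Containers/Vector.h>
#include <TNL/Containers/VectorView.h>

using namespace TNL;
using namespace TNL::Containers;

int main()
{
   // A vector of ten floats allocated on the host; Index defaults to int.
   Vector< float, Devices::Host > v( 10 );
   v = 1.0;  // scalar assignment sets all elements

   // The view shares the data of v and offers the same algebraic operations.
   VectorView< float, Devices::Host > view = v.getView();
   view *= 2.0;  // modifies the elements of v through the view

   std::cout << "v = " << v << std::endl;
}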

Horizontal operations

By horizontal operations we mean vector expressions with one or more vectors as input and a vector as output. In TNL, such operations are performed by expression templates. They make algebraic operations with vectors easy to write and very efficient at the same time. In some cases, one gets even more efficient code than with BLAS or cuBLAS. See the following example.

#include <iostream>
#include <TNL/Containers/Vector.h>
#include <TNL/Containers/VectorView.h>

using namespace TNL;
using namespace TNL::Containers;

template< typename Device >
void expressions()
{
   using RealType = float;
   using VectorType = Vector< RealType, Device >;

   /****
    * Create vectors
    */
   const int size = 11;
   VectorType a( size ), b( size ), c( size );
   a.forAllElements( [] __cuda_callable__ ( int i, RealType& value ) { value = 3.14 * ( i - 5.0 ) / 5.0; } );
   b = a * a;
   c = 3 * a + sign( a ) * sin( a );
   std::cout << "a = " << a << std::endl;
   std::cout << "sin( a ) = " << sin( a ) << std::endl;
   std::cout << "abs( sin( a ) ) = " << abs( sin( a ) ) << std::endl;
   std::cout << "b = " << b << std::endl;
   std::cout << "c = " << c << std::endl;
}

int main( int argc, char* argv[] )
{
   /****
    * Perform test on CPU
    */
   std::cout << "Expressions on CPU ..." << std::endl;
   expressions< Devices::Host >();

   /****
    * Perform test on GPU
    */
#ifdef __CUDACC__
   std::cout << "Expressions on GPU ..." << std::endl;
   expressions< Devices::Cuda >();
#endif
}

Output is:

Expressions on CPU ...
a = [ -3.14, -2.512, -1.884, -1.256, -0.628, 0, 0.628, 1.256, 1.884, 2.512, 3.14 ]
sin( a ) = [ -0.00159255, -0.588816, -0.951351, -0.950859, -0.587528, 0, 0.587528, 0.950859, 0.951351, 0.588816, 0.00159255 ]
abs( sin( a ) ) = [ 0.00159255, 0.588816, 0.951351, 0.950859, 0.587528, 0, 0.587528, 0.950859, 0.951351, 0.588816, 0.00159255 ]
b = [ 9.8596, 6.31014, 3.54946, 1.57754, 0.394384, 0, 0.394384, 1.57754, 3.54946, 6.31014, 9.8596 ]
c = [ -9.41841, -6.94718, -4.70065, -2.81714, -1.29647, 0, 2.47153, 4.71886, 6.60335, 8.12482, 9.42159 ]
Expressions on GPU ...
a = [ -3.14, -2.512, -1.884, -1.256, -0.628, 0, 0.628, 1.256, 1.884, 2.512, 3.14 ]
sin( a ) = [ -0.00159255, -0.588816, -0.951351, -0.950859, -0.587528, 0, 0.587528, 0.950859, 0.951351, 0.588816, 0.00159255 ]
abs( sin( a ) ) = [ 0.00159255, 0.588816, 0.951351, 0.950859, 0.587528, 0, 0.587528, 0.950859, 0.951351, 0.588816, 0.00159255 ]
b = [ 9.8596, 6.31014, 3.54946, 1.57754, 0.394384, 0, 0.394384, 1.57754, 3.54946, 6.31014, 9.8596 ]
c = [ -9.41841, -6.94718, -4.70065, -2.81714, -1.29647, 0, 2.47153, 4.71886, 6.60335, 8.12482, 9.42159 ]

The expression is evaluated on the same device where the vectors are allocated; this happens automatically. One cannot, however, mix vectors from different devices in one expression. A vector expression may contain any of the common functions listed in the following table:

Expression Meaning
v = TNL::logicalAnd( expr1, expr2 ) v[ i ] = expr1[ i ] && expr2[ i ]
v = TNL::logicalOr( expr1, expr2 ) v[ i ] = expr1[ i ] || expr2[ i ]
v = TNL::bitwiseAnd( expr1, expr2 ) v[ i ] = expr1[ i ] & expr2[ i ]
v = TNL::bitwiseOr( expr1, expr2 ) v[ i ] = expr1[ i ] | expr2[ i ]
v = TNL::bitwiseXor( expr1, expr2 ) v[ i ] = expr1[ i ] ^ expr2[ i ]
v = TNL::equalTo( expr1, expr2 ) v[ i ] = expr1[ i ] == expr2[ i ]
v = TNL::notEqualTo( expr1, expr2 ) v[ i ] = expr1[ i ] != expr2[ i ]
v = TNL::greater( expr1, expr2 ) v[ i ] = expr1[ i ] > expr2[ i ]
v = TNL::greaterEqual( expr1, expr2 ) v[ i ] = expr1[ i ] >= expr2[ i ]
v = TNL::less( expr1, expr2 ) v[ i ] = expr1[ i ] < expr2[ i ]
v = TNL::lessEqual( expr1, expr2 ) v[ i ] = expr1[ i ] <= expr2[ i ]
v = TNL::minimum( expr1, expr2 ) v[ i ] = min( expr1[ i ], expr2[ i ] )
v = TNL::maximum( expr1, expr2 ) v[ i ] = max( expr1[ i ], expr2[ i ] )
v = TNL::abs( expr ) v[ i ] = abs( expr[ i ] )
v = TNL::sin( expr ) v[ i ] = sin( expr[ i ] )
v = TNL::cos( expr ) v[ i ] = cos( expr[ i ] )
v = TNL::tan( expr ) v[ i ] = tan( expr[ i ] )
v = TNL::asin( expr ) v[ i ] = asin( expr[ i ] )
v = TNL::acos( expr ) v[ i ] = acos( expr[ i ] )
v = TNL::atan( expr ) v[ i ] = atan( expr[ i ] )
v = TNL::sinh( expr ) v[ i ] = sinh( expr[ i ] )
v = TNL::cosh( expr ) v[ i ] = cosh( expr[ i ] )
v = TNL::tanh( expr ) v[ i ] = tanh( expr[ i ] )
v = TNL::asinh( expr ) v[ i ] = asinh( expr[ i ] )
v = TNL::acosh( expr ) v[ i ] = acosh( expr[ i ] )
v = TNL::atanh( expr ) v[ i ] = atanh( expr[ i ] )
v = TNL::exp( expr ) v[ i ] = exp( expr[ i ] )
v = TNL::log( expr ) v[ i ] = log( expr[ i ] )
v = TNL::log10( expr ) v[ i ] = log10( expr[ i ] )
v = TNL::log2( expr ) v[ i ] = log2( expr[ i ] )
v = TNL::sqrt( expr ) v[ i ] = sqrt( expr[ i ] )
v = TNL::cbrt( expr ) v[ i ] = cbrt( expr[ i ] )
v = TNL::pow( expr, exp ) v[ i ] = pow( expr[ i ], exp )
v = TNL::floor( expr ) v[ i ] = floor( expr[ i ] )
v = TNL::ceil( expr ) v[ i ] = ceil( expr[ i ] )
v = TNL::sign( expr ) v[ i ] = sign( expr[ i ] )

Here v is the result vector and expr, expr1, and expr2 are vector expressions. Vector expressions can also be combined with vector views (TNL::Containers::VectorView).
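
For instance, a brief sketch (with hypothetical vectors u, v, and w of equal size, allocated on the host) combining functions from this table could look as follows:

#include <iostream>
#include <TNL/Containers/Vector.h>

using namespace TNL;
using namespace TNL::Containers;

int main()
{
   Vector< double, Devices::Host > u( 5 ), v( 5 ), w( 5 );
   u.forAllElements( [] __cuda_callable__ ( int i, double& value ) { value = i - 2.0; } );  // u = [ -2, -1, 0, 1, 2 ]
   v.forAllElements( [] __cuda_callable__ ( int i, double& value ) { value = 2.0 - i; } );  // v = [ 2, 1, 0, -1, -2 ]

   w = maximum( u, v );  // w[ i ] = max( u[ i ], v[ i ] )
   std::cout << "maximum( u, v ) = " << w << std::endl;

   w = abs( u ) * sign( v );  // the functions can be nested and combined freely
   std::cout << "abs( u ) * sign( v ) = " << w << std::endl;
}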

Vertical operations

By vertical operations we mean (parallel) reduction-based operations with one vector expression as input and one value as output. For example, computing a scalar product, a vector norm, or finding the minimum or maximum of vector elements is based on reduction. See the following example.

#include <iostream>
#include <TNL/Containers/Vector.h>
#include <TNL/Containers/VectorView.h>

using namespace TNL;
using namespace TNL::Containers;

template< typename Device >
void expressions()
{
   using RealType = float;
   using VectorType = Vector< RealType, Device >;
   using ViewType = VectorView< RealType, Device >;

   /****
    * Create vectors
    */
   const int size = 11;
   VectorType a_v( size ), b_v( size ), c_v( size );
   ViewType a = a_v.getView();
   ViewType b = b_v.getView();
   ViewType c = c_v.getView();
   a.forAllElements( [] __cuda_callable__ ( int i, RealType& value ) { value = i; } );
   b.forAllElements( [] __cuda_callable__ ( int i, RealType& value ) { value = i - 5.0; } );
   c = -5;

   std::cout << "a == " << a << std::endl;
   std::cout << "b == " << b << std::endl;
   std::cout << "c == " << c << std::endl;
   auto arg_min_a = argMin( a );
   auto arg_max_a = argMax( a );
   auto arg_min_b = argMin( b );
   auto arg_max_b = argMax( b );
   std::cout << "min( a ) == " << arg_min_a.first << " at " << arg_min_a.second << std::endl;
   std::cout << "max( a ) == " << arg_max_a.first << " at " << arg_max_a.second << std::endl;
   std::cout << "min( b ) == " << arg_min_b.first << " at " << arg_min_b.second << std::endl;
   std::cout << "max( b ) == " << arg_max_b.first << " at " << arg_max_b.second << std::endl;
   std::cout << "min( abs( b ) ) == " << min( abs( b ) ) << std::endl;
   std::cout << "sum( b ) == " << sum( b ) << std::endl;
   std::cout << "sum( abs( b ) ) == " << sum( abs( b ) ) << std::endl;
   std::cout << "Scalar product: ( a, b ) == " << ( a, b ) << std::endl;
   std::cout << "Scalar product: ( a + 3, abs( b ) / 2 ) == " << ( a + 3, abs( b ) / 2 ) << std::endl;
   const bool cmp = all( lessEqual( abs( a + b ), abs( a ) + abs( b ) ) );
   std::cout << "all( lessEqual( abs( a + b ), abs( a ) + abs( b ) ) ) == " << ( cmp ? "true" : "false" ) << std::endl;
}

int main( int argc, char* argv[] )
{
   /****
    * Perform test on CPU
    */
   std::cout << "Expressions on CPU ..." << std::endl;
   expressions< Devices::Host >();

   /****
    * Perform test on GPU
    */
#ifdef __CUDACC__
   std::cout << "Expressions on GPU ..." << std::endl;
   expressions< Devices::Cuda >();
#endif
}

Output is:

Expressions on CPU ...
a == [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 ]
b == [ -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5 ]
c == [ -5, -5, -5, -5, -5, -5, -5, -5, -5, -5, -5 ]
min( a ) == 0 at 0
max( a ) == 10 at 10
min( b ) == -5 at 0
max( b ) == 5 at 10
min( abs( b ) ) == 0
sum( b ) == 0
sum( abs( b ) ) == 30
Scalar product: ( a, b ) == 110
Scalar product: ( a + 3, abs( b ) / 2 ) == 120
all( lessEqual( abs( a + b ), abs( a ) + abs( b ) ) ) == true
Expressions on GPU ...
a == [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 ]
b == [ -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5 ]
c == [ -5, -5, -5, -5, -5, -5, -5, -5, -5, -5, -5 ]
min( a ) == 0 at 0
max( a ) == 10 at 10
min( b ) == -5 at 0
max( b ) == 5 at 10
min( abs( b ) ) == 0
sum( b ) == 0
sum( abs( b ) ) == 30
Scalar product: ( a, b ) == 110
Scalar product: ( a + 3, abs( b ) / 2 ) == 120
all( lessEqual( abs( a + b ), abs( a ) + abs( b ) ) ) == true

The following table shows vertical operations that can be used on vector expressions:

Expression Meaning
v = TNL::min( expr ) v is the minimum of expr[ 0 ], expr[ 1 ], ..., expr[ n-1 ].
auto [ v, i ] = TNL::argMin( expr ) v is the minimum of expr[ 0 ], expr[ 1 ], ..., expr[ n-1 ], i is the index of the smallest element.
v = TNL::max( expr ) v is the maximum of expr[ 0 ], expr[ 1 ], ..., expr[ n-1 ].
auto [ v, i ] = TNL::argMax( expr ) v is the maximum of expr[ 0 ], expr[ 1 ], ..., expr[ n-1 ], i is the index of the largest element.
v = TNL::sum( expr ) v is the sum of expr[ 0 ], expr[ 1 ], ..., expr[ n-1 ].
v = TNL::maxNorm( expr ) v is the maximum norm of expr[ 0 ], expr[ 1 ], ..., expr[ n-1 ].
v = TNL::l1Norm( expr ) v is the l1 norm of expr[ 0 ], expr[ 1 ], ..., expr[ n-1 ].
v = TNL::l2Norm( expr ) v is the l2 norm of expr[ 0 ], expr[ 1 ], ..., expr[ n-1 ].
v = TNL::lpNorm( expr, p ) v is the lp norm of expr[ 0 ], expr[ 1 ], ..., expr[ n-1 ].
v = TNL::product( expr ) v is product of expr[ 0 ], expr[ 1 ], ..., expr[ n-1 ].
v = TNL::all( expr ) v is the result of expr[ 0 ] && expr[ 1 ] && ... && expr[ n-1 ].
v = TNL::any( expr ) v is the result of expr[ 0 ] || expr[ 1 ] || ... || expr[ n-1 ].
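
As a brief sketch of the norms and the product from this table (on the host; the expected values are noted in the comments):

#include <iostream>
#include <TNL/Containers/Vector.h>

using namespace TNL;
using namespace TNL::Containers;

int main()
{
   Vector< double, Devices::Host > v( 4 );
   v.forAllElements( [] __cuda_callable__ ( int i, double& value ) { value = ( i % 2 ? -1.0 : 1.0 ) * ( i + 1 ); } );
   // v = [ 1, -2, 3, -4 ]

   std::cout << "maxNorm( v ) = " << maxNorm( v ) << std::endl;          // 4
   std::cout << "l1Norm( v ) = " << l1Norm( v ) << std::endl;            // 10
   std::cout << "l2Norm( v ) = " << l2Norm( v ) << std::endl;            // sqrt( 30 ) ~ 5.47723
   std::cout << "lpNorm( v, 3.0 ) = " << lpNorm( v, 3.0 ) << std::endl;  // cbrt( 100 ) ~ 4.64159
   std::cout << "product( v ) = " << product( v ) << std::endl;          // 24
}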

Static vectors

Static vectors are derived from static arrays and so they are allocated on the stack and can be created in CUDA kernels as well. Their size is fixed and given by a template parameter. The StaticVector class template is defined in the TNL::Containers namespace and has two template parameters:

  1. Size is the vector size.
  2. Real is type of elements stored in the vector.

The interface of StaticVector is similar to that of Vector. StaticVector also supports expression templates, which make the use of static vectors simple and efficient at the same time. Additionally, the comparison operators <, <=, >, and >= are defined for StaticVector. The comparison follows the lexicographic order and is performed by an algorithm equivalent to std::lexicographical_compare.

Example:

#include <iostream>
#include <TNL/Containers/StaticVector.h>

using namespace TNL;
using namespace TNL::Containers;

int main( int argc, char* argv[] )
{
   using Vector = StaticVector< 3, double >;
   Vector v1( 1.0 ), v2( 1.0, 2.0, 3.0 ), v3( v1 - v2 / 2.0 );
   Vector v4( 0.0 ), v5( 1.0 );
   v4 += v2;
   v5 *= v3;
   std::cout << "v1 = " << v1 << std::endl;
   std::cout << "v2 = " << v2 << std::endl;
   std::cout << "v3 = " << v3 << std::endl;
   std::cout << "v4 = " << v4 << std::endl;
   std::cout << "v5 = " << v5 << std::endl;
   std::cout << "abs( v3 - 2.0 ) = " << abs( v3 - 2.0 ) << std::endl;
   std::cout << "v2 * v2 = " << v2 * v2 << std::endl;
}

The output looks as follows:

v1 = [ 1, 1, 1 ]
v2 = [ 1, 2, 3 ]
v3 = [ 0.5, 0, -0.5 ]
v4 = [ 1, 2, 3 ]
v5 = [ 0.5, 0, -0.5 ]
abs( v3 - 2.0 ) = [ 1.5, 2, 2.5 ]
v2 * v2 = [ 1, 4, 9 ]
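
The lexicographic comparison described above can be illustrated with a short sketch (u and v are hypothetical static vectors differing in the last element):

#include <iostream>
#include <TNL/Containers/StaticVector.h>

using namespace TNL;
using namespace TNL::Containers;

int main()
{
   StaticVector< 3, double > u( 1.0, 2.0, 3.0 ), v( 1.0, 2.0, 4.0 );

   // The first differing element (3 vs. 4) decides the lexicographic order.
   std::cout << std::boolalpha;
   std::cout << "u < v : " << ( u < v ) << std::endl;    // true
   std::cout << "u <= u : " << ( u <= u ) << std::endl;  // true
   std::cout << "v > u : " << ( v > u ) << std::endl;    // true
}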

Distributed vectors

DistributedVector extends DistributedArray with algebraic operations, similarly to how Vector extends Array. DistributedVector also supports expression templates and the other operations present in Vector.

Example:

#include <iostream>
#include <TNL/Containers/Partitioner.h>
#include <TNL/Containers/DistributedVector.h>
#include <TNL/MPI/ScopedInitializer.h>

/***
 * The following works for any device (CPU, GPU ...).
 */
template< typename Device >
void distributedVectorExample()
{
   using VectorType = TNL::Containers::DistributedVector< int, Device >;
   using IndexType = typename VectorType::IndexType;
   using LocalRangeType = typename VectorType::LocalRangeType;

   const TNL::MPI::Comm communicator = MPI_COMM_WORLD;

   // We set the global vector size to a prime number to force non-uniform distribution.
   const int size = 97;
   const int ghosts = ( communicator.size() > 1 ) ? 4 : 0;

   const LocalRangeType localRange = TNL::Containers::Partitioner< IndexType >::splitRange( size, communicator );
   VectorType v( localRange, ghosts, size, communicator );
   v.forElements( 0, size, [] __cuda_callable__ ( int idx, int& value ) { value = idx; } );

   std::cout << "Rank " << communicator.rank() << " has subrange " << localRange << std::endl;

   const int sum = TNL::sum( v );
   if( communicator.rank() == 0 )
      std::cout << "Global sum is " << sum << std::endl;
}

int main( int argc, char* argv[] )
{
   TNL::MPI::ScopedInitializer mpi( argc, argv );

   if( TNL::MPI::GetRank() == 0 )
      std::cout << "The first test runs on CPU ..." << std::endl;
   distributedVectorExample< TNL::Devices::Host >();

#ifdef __CUDACC__
   TNL::MPI::Barrier();
   if( TNL::MPI::GetRank() == 0 )
      std::cout << "The second test runs on GPU ..." << std::endl;
   distributedVectorExample< TNL::Devices::Cuda >();
#endif
}

The output looks as follows:

Rank 0: rank on node is 0, using GPU id 0 of 2, CUDA_VISIBLE_DEVICES=
The first test runs on CPU ...
Rank 0 has subrange TNL::Containers::Subrange<int>( 0, 25 )
Rank 2: rank on node is 2, using GPU id 0 of 2, CUDA_VISIBLE_DEVICES=
Rank 2 has subrange TNL::Containers::Subrange<int>( 49, 73 )
Rank 3: rank on node is 3, using GPU id 1 of 2, CUDA_VISIBLE_DEVICES=
Rank 3 has subrange TNL::Containers::Subrange<int>( 73, 97 )
Rank 1: rank on node is 1, using GPU id 1 of 2, CUDA_VISIBLE_DEVICES=
Rank 1 has subrange TNL::Containers::Subrange<int>( 25, 49 )
Global sum is 4656
The second test runs on GPU ...
Rank 0 has subrange TNL::Containers::Subrange<int>( 0, 25 )
Rank 2 has subrange TNL::Containers::Subrange<int>( 49, 73 )
Rank 1 has subrange TNL::Containers::Subrange<int>( 25, 49 )
Rank 3 has subrange TNL::Containers::Subrange<int>( 73, 97 )
Global sum is 4656