Introduction
This chapter introduces vectors in TNL. Vector extends Array with basic operations from linear algebra. The reader will mainly learn how to perform BLAS level-1 operations in TNL. Thanks to the design of TNL, these operations are easy to implement, independent of the hardware architecture, and in some cases even faster than their BLAS or cuBLAS counterparts.
Dynamic vectors
Vector is a class template similar to Array. It is defined in the TNL::Containers namespace and has four template parameters:
- Real is the type of the data to be stored in the vector.
- Device is the device to be used for the execution of vector operations. It can be any class defined in the TNL::Devices namespace.
- Index is the type to be used for indexing the vector elements.
- Allocator is the type of the allocator used for the allocation and deallocation of memory used by the vector. By default, an appropriate allocator for the specified Device is selected with TNL::Allocators::Default.
Unlike Array, Vector requires that Real is a numeric type, or a type for which basic algebraic operations are defined. Which operations are required depends on the vector operations the user calls. Vector is derived from Array, so it inherits all of its methods. Just as Array has its counterpart ArrayView, Vector has VectorView, which is derived from ArrayView. See Arrays for more details.
In addition to the comparison operators == and != defined for Array and ArrayView, the comparison operators <, <=, >, and >= are available for Vector and VectorView. The comparison follows the lexicographic order and it is performed by an algorithm equivalent to std::lexicographical_compare.
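For illustration, here is a minimal snippet that constructs two vectors with all template parameters spelled out and compares them lexicographically (the values are arbitrary):

#include <iostream>
#include <TNL/Containers/Vector.h>

using namespace TNL;
using namespace TNL::Containers;

int
main( int argc, char* argv[] )
{
   // Real = double, Device = Devices::Host, Index = int; the Allocator is defaulted
   Vector< double, Devices::Host, int > u{ 1.0, 2.0, 3.0 };
   Vector< double, Devices::Host, int > v{ 1.0, 2.5, 0.0 };

   // lexicographic order: u < v because u[ 1 ] < v[ 1 ]
   std::cout << std::boolalpha << ( u < v ) << std::endl;   // prints: true
}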
Horizontal operations
By horizontal operations we mean vector expressions that take one or more vectors as input and produce a vector as output. Horizontal operations in TNL are implemented with expression templates, which make algebraic operations with vectors easy to write and very efficient at the same time. In some cases, the resulting code is even more efficient than BLAS or cuBLAS. See the following example.
#include <iostream>
#include <TNL/Containers/Vector.h>
#include <TNL/Containers/VectorView.h>

using namespace TNL;
using namespace TNL::Containers;

template< typename Device >
void
expressions()
{
   using RealType = float;
   using VectorType = Vector< RealType, Device >;

   // initialize a with values spread over [ -3.14, 3.14 ]
   const int size = 11;
   VectorType a( size ), b( size ), c( size );
   a.forAllElements(
      [] __cuda_callable__ ( int i, RealType& value )
      {
         value = 3.14 * ( i - 5.0 ) / 5.0;
      } );
   // each expression is evaluated in a single loop (or single CUDA kernel)
   b = a * a;
   c = 3 * a + sign( a ) * sin( a );

   std::cout << "a = " << a << std::endl;
   std::cout << "sin( a ) = " << sin( a ) << std::endl;
   std::cout << "abs( sin( a ) ) = " << abs( sin( a ) ) << std::endl;
   std::cout << "b = " << b << std::endl;
   std::cout << "c = " << c << std::endl;
}

int
main( int argc, char* argv[] )
{
   std::cout << "Expressions on CPU ..." << std::endl;
   expressions< Devices::Host >();

#ifdef __CUDACC__
   std::cout << "Expressions on GPU ..." << std::endl;
   expressions< Devices::Cuda >();
#endif
}
Output is:
Expressions on CPU ...
a = [ -3.14, -2.512, -1.884, -1.256, -0.628, 0, 0.628, 1.256, 1.884, 2.512, 3.14 ]
sin( a ) = [ -0.00159255, -0.588816, -0.951351, -0.950859, -0.587528, 0, 0.587528, 0.950859, 0.951351, 0.588816, 0.00159255 ]
abs( sin( a ) ) = [ 0.00159255, 0.588816, 0.951351, 0.950859, 0.587528, 0, 0.587528, 0.950859, 0.951351, 0.588816, 0.00159255 ]
b = [ 9.8596, 6.31014, 3.54946, 1.57754, 0.394384, 0, 0.394384, 1.57754, 3.54946, 6.31014, 9.8596 ]
c = [ -9.41841, -6.94718, -4.70065, -2.81714, -1.29647, 0, 2.47153, 4.71886, 6.60335, 8.12482, 9.42159 ]
Expressions on GPU ...
a = [ -3.14, -2.512, -1.884, -1.256, -0.628, 0, 0.628, 1.256, 1.884, 2.512, 3.14 ]
sin( a ) = [ -0.00159255, -0.588816, -0.951351, -0.950859, -0.587528, 0, 0.587528, 0.950859, 0.951351, 0.588816, 0.00159255 ]
abs( sin( a ) ) = [ 0.00159255, 0.588816, 0.951351, 0.950859, 0.587528, 0, 0.587528, 0.950859, 0.951351, 0.588816, 0.00159255 ]
b = [ 9.8596, 6.31014, 3.54946, 1.57754, 0.394384, 0, 0.394384, 1.57754, 3.54946, 6.31014, 9.8596 ]
c = [ -9.41841, -6.94718, -4.70065, -2.81714, -1.29647, 0, 2.47153, 4.71886, 6.60335, 8.12482, 9.42159 ]
The expression is evaluated on the same device where the vectors are allocated; this happens automatically. One cannot, however, mix vectors allocated on different devices in one expression.
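If data does live on different devices, it has to be copied over first; assignment between vectors allocated on different devices performs the transfer. A minimal sketch, assuming a CUDA-enabled build and the same headers and using-declarations as in the example above:

Vector< float, Devices::Host > h( 5 );
Vector< float, Devices::Cuda > d( 5 ), r( 5 );
h = 1.0f;
d = h;           // host-to-device copy via cross-device assignment
r = 2.0f * d;    // this expression is evaluated on the GPU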
Vector expressions consist of operands, operators, and functions. Operands may be vectors, vector views, or other vector expressions. Operators available for vector expressions are listed in the following table, where v is a result vector and expr, expr1, expr2 are vector expressions:
| Expression | Meaning |
|---|---|
| v = expr1 + expr2 | v[ i ] = expr1[ i ] + expr2[ i ] |
| v = expr1 - expr2 | v[ i ] = expr1[ i ] - expr2[ i ] |
| v = expr1 * expr2 | v[ i ] = expr1[ i ] * expr2[ i ] |
| v = expr1 / expr2 | v[ i ] = expr1[ i ] / expr2[ i ] |
| v = expr1 % expr2 | v[ i ] = expr1[ i ] % expr2[ i ] |
| v = expr1 && expr2 | v[ i ] = expr1[ i ] && expr2[ i ] |
| v = expr1 \|\| expr2 | v[ i ] = expr1[ i ] \|\| expr2[ i ] |
| v = expr1 & expr2 | v[ i ] = expr1[ i ] & expr2[ i ] |
| v = expr1 \| expr2 | v[ i ] = expr1[ i ] \| expr2[ i ] |
| v = +expr1 | v[ i ] = +expr1[ i ] |
| v = -expr1 | v[ i ] = -expr1[ i ] |
| v = !expr1 | v[ i ] = !expr1[ i ] |
| v = ~expr1 | v[ i ] = ~expr1[ i ] |
Additionally, vector expressions may contain any function listed in the following table, where v is a result vector and expr, expr1, expr2 are vector expressions:
| Expression | Meaning |
|---|---|
| v = TNL::equalTo( expr1, expr2 ) | v[ i ] = expr1[ i ] == expr2[ i ] |
| v = TNL::notEqualTo( expr1, expr2 ) | v[ i ] = expr1[ i ] != expr2[ i ] |
| v = TNL::greater( expr1, expr2 ) | v[ i ] = expr1[ i ] > expr2[ i ] |
| v = TNL::greaterEqual( expr1, expr2 ) | v[ i ] = expr1[ i ] >= expr2[ i ] |
| v = TNL::less( expr1, expr2 ) | v[ i ] = expr1[ i ] < expr2[ i ] |
| v = TNL::lessEqual( expr1, expr2 ) | v[ i ] = expr1[ i ] <= expr2[ i ] |
| v = TNL::minimum( expr1, expr2 ) | v[ i ] = min( expr1[ i ], expr2[ i ] ) |
| v = TNL::maximum( expr1, expr2 ) | v[ i ] = max( expr1[ i ], expr2[ i ] ) |
| v = TNL::abs( expr ) | v[ i ] = abs( expr[ i ] ) |
| v = TNL::sin( expr ) | v[ i ] = sin( expr[ i ] ) |
| v = TNL::cos( expr ) | v[ i ] = cos( expr[ i ] ) |
| v = TNL::tan( expr ) | v[ i ] = tan( expr[ i ] ) |
| v = TNL::asin( expr ) | v[ i ] = asin( expr[ i ] ) |
| v = TNL::acos( expr ) | v[ i ] = acos( expr[ i ] ) |
| v = TNL::atan( expr ) | v[ i ] = atan( expr[ i ] ) |
| v = TNL::sinh( expr ) | v[ i ] = sinh( expr[ i ] ) |
| v = TNL::cosh( expr ) | v[ i ] = cosh( expr[ i ] ) |
| v = TNL::tanh( expr ) | v[ i ] = tanh( expr[ i ] ) |
| v = TNL::asinh( expr ) | v[ i ] = asinh( expr[ i ] ) |
| v = TNL::acosh( expr ) | v[ i ] = acosh( expr[ i ] ) |
| v = TNL::atanh( expr ) | v[ i ] = atanh( expr[ i ] ) |
| v = TNL::exp( expr ) | v[ i ] = exp( expr[ i ] ) |
| v = TNL::log( expr ) | v[ i ] = log( expr[ i ] ) |
| v = TNL::log10( expr ) | v[ i ] = log10( expr[ i ] ) |
| v = TNL::log2( expr ) | v[ i ] = log2( expr[ i ] ) |
| v = TNL::sqrt( expr ) | v[ i ] = sqrt( expr[ i ] ) |
| v = TNL::cbrt( expr ) | v[ i ] = cbrt( expr[ i ] ) |
| v = TNL::pow( expr, exp ) | v[ i ] = pow( expr[ i ], exp ) |
| v = TNL::floor( expr ) | v[ i ] = floor( expr[ i ] ) |
| v = TNL::ceil( expr ) | v[ i ] = ceil( expr[ i ] ) |
| v = TNL::sign( expr ) | v[ i ] = sign( expr[ i ] ) |
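These functions compose freely with the operators above, so, for instance, clamping the elements of a vector to an interval is a single expression. A small sketch (the vector contents are illustrative; headers and using-declarations as in the first example):

Vector< double, Devices::Host > x{ -2.0, -0.5, 0.5, 2.0 };
Vector< double, Devices::Host > lo( 4 ), hi( 4 ), y( 4 );
lo = -1.0;
hi = 1.0;
// clamp x to [ -1, 1 ] elementwise; the whole expression is evaluated in one pass
y = TNL::maximum( lo, TNL::minimum( x, hi ) );
std::cout << y << std::endl;   // [ -1, -0.5, 0.5, 1 ]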
Vertical operations
By vertical operations we mean (parallel) reduction-based operations that take one vector expression as input and produce a single value as output. For example, computing a scalar product or a vector norm, or finding the minimum or maximum of the vector elements, is based on reduction. See the following example.
#include <iostream>
#include <TNL/Containers/Vector.h>
#include <TNL/Containers/VectorView.h>

using namespace TNL;
using namespace TNL::Containers;

template< typename Device >
void
expressions()
{
   using RealType = float;
   using VectorType = Vector< RealType, Device >;
   using ViewType = VectorView< RealType, Device >;

   const int size = 11;
   VectorType a_v( size ), b_v( size ), c_v( size );
   ViewType a = a_v.getView();
   ViewType b = b_v.getView();
   ViewType c = c_v.getView();
   a.forAllElements(
      [] __cuda_callable__ ( int i, RealType& value )
      {
         value = i;
      } );
   b.forAllElements(
      [] __cuda_callable__ ( int i, RealType& value )
      {
         value = i - 5.0;
      } );
   c = -5;

   std::cout << "a == " << a << std::endl;
   std::cout << "b == " << b << std::endl;
   std::cout << "c == " << c << std::endl;
   // reductions returning both the value and its position
   auto [ min_a_val, min_a_pos ] = argMin( a );
   auto [ max_a_val, max_a_pos ] = argMax( a );
   auto [ min_b_val, min_b_pos ] = argMin( b );
   auto [ max_b_val, max_b_pos ] = argMax( b );
   std::cout << "min( a ) == " << min_a_val << " at " << min_a_pos << std::endl;
   std::cout << "max( a ) == " << max_a_val << " at " << max_a_pos << std::endl;
   std::cout << "min( b ) == " << min_b_val << " at " << min_b_pos << std::endl;
   std::cout << "max( b ) == " << max_b_val << " at " << max_b_pos << std::endl;
   // reductions applied to whole vector expressions
   std::cout << "min( abs( b ) ) == " << min( abs( b ) ) << std::endl;
   std::cout << "sum( b ) == " << sum( b ) << std::endl;
   std::cout << "sum( abs( b ) ) == " << sum( abs( b ) ) << std::endl;
   std::cout << "Scalar product: ( a, b ) == " << dot( a, b ) << std::endl;
   std::cout << "Scalar product: ( a + 3, abs( b ) / 2 ) == " << dot( a + 3, abs( b ) / 2 ) << std::endl;
   const bool cmp = all( lessEqual( abs( a + b ), abs( a ) + abs( b ) ) );
   std::cout << "all( lessEqual( abs( a + b ), abs( a ) + abs( b ) ) ) == " << ( cmp ? "true" : "false" ) << std::endl;
   auto [ equal_val, equal_pos ] = argAny( equalTo( a, 5 ) );
   if( equal_val )
      std::cout << equal_pos << "-th element of a is equal to 5" << std::endl;
   else
      std::cout << "No element of a is equal to 5" << std::endl;
}

int
main( int argc, char* argv[] )
{
   std::cout << "Expressions on CPU ..." << std::endl;
   expressions< Devices::Host >();

#ifdef __CUDACC__
   std::cout << "Expressions on GPU ..." << std::endl;
   expressions< Devices::Cuda >();
#endif
}
Output is:
Expressions on CPU ...
a == [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 ]
b == [ -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5 ]
c == [ -5, -5, -5, -5, -5, -5, -5, -5, -5, -5, -5 ]
min( a ) == 0 at 0
max( a ) == 10 at 10
min( b ) == -5 at 0
max( b ) == 5 at 10
min( abs( b ) ) == 0
sum( b ) == 0
sum( abs( b ) ) == 30
Scalar product: ( a, b ) == 110
Scalar product: ( a + 3, abs( b ) / 2 ) == 120
all( lessEqual( abs( a + b ), abs( a ) + abs( b ) ) ) == true
5-th element of a is equal to 5
Expressions on GPU ...
a == [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 ]
b == [ -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5 ]
c == [ -5, -5, -5, -5, -5, -5, -5, -5, -5, -5, -5 ]
min( a ) == 0 at 0
max( a ) == 10 at 10
min( b ) == -5 at 0
max( b ) == 5 at 10
min( abs( b ) ) == 0
sum( b ) == 0
sum( abs( b ) ) == 30
Scalar product: ( a, b ) == 110
Scalar product: ( a + 3, abs( b ) / 2 ) == 120
all( lessEqual( abs( a + b ), abs( a ) + abs( b ) ) ) == true
5-th element of a is equal to 5
The following table shows vertical operations that can be used on vector expressions:
| Expression | Meaning |
|---|---|
| v = TNL::min( expr ) | v is the minimum of expr[ 0 ], expr[ 1 ], ..., expr[ n-1 ]. |
| auto [ v, i ] = TNL::argMin( expr ) | v is the minimum of expr[ 0 ], expr[ 1 ], ..., expr[ n-1 ] and i is the index of the smallest element. |
| v = TNL::max( expr ) | v is the maximum of expr[ 0 ], expr[ 1 ], ..., expr[ n-1 ]. |
| auto [ v, i ] = TNL::argMax( expr ) | v is the maximum of expr[ 0 ], expr[ 1 ], ..., expr[ n-1 ] and i is the index of the largest element. |
| v = TNL::sum( expr ) | v is the sum of expr[ 0 ], expr[ 1 ], ..., expr[ n-1 ]. |
| v = TNL::maxNorm( expr ) | v is the maximum norm of expr[ 0 ], expr[ 1 ], ..., expr[ n-1 ]. |
| v = TNL::l1Norm( expr ) | v is the l1 norm of expr[ 0 ], expr[ 1 ], ..., expr[ n-1 ]. |
| v = TNL::l2Norm( expr ) | v is the l2 norm of expr[ 0 ], expr[ 1 ], ..., expr[ n-1 ]. |
| v = TNL::lpNorm( expr, p ) | v is the lp norm of expr[ 0 ], expr[ 1 ], ..., expr[ n-1 ]. |
| v = TNL::product( expr ) | v is the product of expr[ 0 ], expr[ 1 ], ..., expr[ n-1 ]. |
| v = TNL::all( expr ) | v is the result of expr[ 0 ] && expr[ 1 ] && ... && expr[ n-1 ]. |
| v = TNL::any( expr ) | v is the result of expr[ 0 ] \|\| expr[ 1 ] \|\| ... \|\| expr[ n-1 ]. |
| auto [ v, i ] = TNL::argAny( expr ) | v is the result of expr[ 0 ] \|\| expr[ 1 ] \|\| ... \|\| expr[ n-1 ] and i is the index of the first true element. |
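As a quick sketch of how these reductions compose with vector expressions (the values are illustrative; headers and using-declarations as in the first example):

Vector< double, Devices::Host > u{ 3.0, -4.0, 0.0 };
std::cout << TNL::maxNorm( u ) << std::endl;        // 4
std::cout << TNL::sum( u ) << std::endl;            // -1
std::cout << TNL::l2Norm( u ) << std::endl;         // 5
std::cout << TNL::l1Norm( 2.0 * u ) << std::endl;   // 14 -- the argument may be any vector expression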
Static vectors
Static vectors are derived from static arrays and so they are allocated on the stack and can be created in CUDA kernels as well. Their size is fixed and given by a template parameter. The StaticVector class template is defined in the TNL::Containers namespace and has two template parameters:
- Size is the vector size.
- Real is the type of the elements stored in the vector.
The interface of StaticVector is similar to that of Vector. StaticVector also supports expression templates, which make the use of static vectors simple and efficient at the same time, as well as lexicographic comparison via the operators <, <=, > and >=.
Example:
#include <iostream>
#include <TNL/Containers/StaticVector.h>

using namespace TNL;
using namespace TNL::Containers;

int
main( int argc, char* argv[] )
{
   using Vector = StaticVector< 3, double >;
   Vector v1( 1.0 ), v2( 1.0, 2.0, 3.0 ), v3( v1 - v2 / 2.0 );
   Vector v4( 0.0 ), v5( 1.0 );
   v4 += v2;
   v5 *= v3;
   std::cout << "v1 = " << v1 << "\nv2 = " << v2 << "\nv3 = " << v3 << std::endl;
   std::cout << "v4 = " << v4 << "\nv5 = " << v5 << std::endl;
   std::cout << "abs( v3 - 2.0 ) = " << abs( v3 - 2.0 ) << std::endl;
   std::cout << "v2 * v2 = " << v2 * v2 << std::endl;
}
The output looks as follows:
v1 = [ 1, 1, 1 ]
v2 = [ 1, 2, 3 ]
v3 = [ 0.5, 0, -0.5 ]
v4 = [ 1, 2, 3 ]
v5 = [ 0.5, 0, -0.5 ]
abs( v3 - 2.0 ) = [ 1.5, 2, 2.5 ]
v2 * v2 = [ 1, 4, 9 ]
Distributed vectors
DistributedVector extends DistributedArray with algebraic operations. The functionality is similar to how Vector extends Array. DistributedVector also supports expression templates and other operations present in Vector.
Example:
#include <iostream>
#include <TNL/Containers/BlockPartitioning.h>
#include <TNL/Containers/DistributedVector.h>
#include <TNL/MPI/ScopedInitializer.h>

using namespace TNL;
using namespace TNL::Containers;

template< typename Device >
void
distributedVectorExample()
{
   using VectorType = DistributedVector< int, Device >;
   using IndexType = typename VectorType::IndexType;
   using LocalRangeType = typename VectorType::LocalRangeType;

   const MPI::Comm communicator = MPI_COMM_WORLD;

   // decompose the global range [ 0, size ) into one subrange per MPI rank
   const int size = 97;
   const int ghosts = ( communicator.size() > 1 ) ? 4 : 0;
   const LocalRangeType localRange = splitRange( 0, size, communicator.rank(), communicator.size() );
   std::cout << "Rank " << communicator.rank() << " has subrange " << localRange << std::endl;

   VectorType v( localRange, ghosts, size, communicator );
   v.forElements( 0,
                  size,
                  [] __cuda_callable__ ( IndexType idx, int& value )
                  {
                     value = idx;
                  } );

   // the reduction is performed across all MPI ranks
   const int sum = TNL::sum( v );
   if( communicator.rank() == 0 )
      std::cout << "Global sum is " << sum << std::endl;
}

int
main( int argc, char* argv[] )
{
   TNL::MPI::ScopedInitializer mpi( argc, argv );

   if( TNL::MPI::GetRank() == 0 )
      std::cout << "The first test runs on CPU ..." << std::endl;
   distributedVectorExample< TNL::Devices::Host >();

#ifdef __CUDACC__
   TNL::MPI::Barrier();
   if( TNL::MPI::GetRank() == 0 )
      std::cout << "The second test runs on GPU ..." << std::endl;
   distributedVectorExample< TNL::Devices::Cuda >();
#endif
}
The output looks as follows:
Rank 3: rank on node is 3, using GPU id 1 of 2
Environment:
CUDA_VISIBLE_DEVICES=
Rank 1: rank on node is 1, using GPU id 1 of 2
Environment:
CUDA_VISIBLE_DEVICES=
Rank 0: rank on node is 0, using GPU id 0 of 2
Environment:
CUDA_VISIBLE_DEVICES=
Rank 2: rank on node is 2, using GPU id 0 of 2
Environment:
CUDA_VISIBLE_DEVICES=
Rank 1 has subrange TNL::Containers::Subrange<int>( 25, 49 )
Rank 3 has subrange TNL::Containers::Subrange<int>( 73, 97 )
The first test runs on CPU ...
Rank 0 has subrange TNL::Containers::Subrange<int>( 0, 25 )
Rank 2 has subrange TNL::Containers::Subrange<int>( 49, 73 )
Global sum is 4656
The second test runs on GPU ...
Rank 3 has subrange TNL::Containers::Subrange<int>( 73, 97 )
Rank 2 has subrange TNL::Containers::Subrange<int>( 49, 73 )
Rank 0 has subrange TNL::Containers::Subrange<int>( 0, 25 )
Rank 1 has subrange TNL::Containers::Subrange<int>( 25, 49 )
Global sum is 4656