Template Numerical Library version main:4e58ea6
TNL::Matrices::DenseMatrixBase< Real, Device, Index, Organization > Class Template Reference

Implementation of dense matrix view.

#include <TNL/Matrices/DenseMatrixBase.h>


Public Types

using ConstRowView = typename RowView::ConstRowView
 Type for accessing immutable matrix row.
 
using DeviceType = Device
 The device where the matrix is allocated.
 
using IndexType = Index
 The type used for matrix elements indexing.
 
using RealType = Real
 The type of matrix elements.
 
using RowView = DenseMatrixRowView< SegmentViewType, typename Base::ValuesViewType >
 Type for accessing matrix row.
 
- Public Types inherited from TNL::Matrices::MatrixBase< Real, Device, Index, GeneralMatrix, Organization >
using ConstValuesViewType
 Type of constant vector view holding values of matrix elements.
 
using DeviceType
 The device where the matrix is allocated.
 
using IndexType
 The type used for matrix elements indexing.
 
using RealType
 The type of matrix elements.
 
using RowCapacitiesType
 
using ValuesViewType
 Type of vector view holding values of matrix elements.
 

Public Member Functions

__cuda_callable__ DenseMatrixBase ()=default
 Constructor without parameters.
 
__cuda_callable__ DenseMatrixBase (const DenseMatrixBase &matrix)=default
 Copy constructor.
 
__cuda_callable__ DenseMatrixBase (DenseMatrixBase &&matrix) noexcept=default
 Move constructor.
 
__cuda_callable__ DenseMatrixBase (IndexType rows, IndexType columns, typename Base::ValuesViewType values)
 Constructor with matrix dimensions and values.
 
__cuda_callable__ void addElement (IndexType row, IndexType column, const RealType &value, const RealType &thisElementMultiplicator=1.0)
 Adds given value to the matrix element at given row and column.
 
template<typename Matrix >
void addMatrix (const Matrix &matrix, const RealType &matrixMultiplicator=1.0, const RealType &thisMatrixMultiplicator=1.0, TransposeState transpose=TransposeState::None)
 Computes matrix addition.
 
template<typename Function >
void forAllElements (Function &&function)
 This method calls forElements for all matrix rows.
 
template<typename Function >
void forAllElements (Function &&function) const
 This method calls forElements for all matrix rows.
 
template<typename Function >
void forAllRows (Function &&function)
 Method for parallel iteration over all matrix rows.
 
template<typename Function >
void forAllRows (Function &&function) const
 Method for parallel iteration over all matrix rows for constant instances.
 
template<typename Function >
void forElements (IndexType begin, IndexType end, Function &&function)
 Method for iteration over elements of matrix rows from interval [begin, end) for non-constant instances.
 
template<typename Function >
void forElements (IndexType begin, IndexType end, Function &&function) const
 Method for iteration over elements of matrix rows from interval [begin, end) for constant instances.
 
template<typename Function >
void forRows (IndexType begin, IndexType end, Function &&function)
 Method for parallel iteration over matrix rows from interval [begin, end).
 
template<typename Function >
void forRows (IndexType begin, IndexType end, Function &&function) const
 Method for parallel iteration over matrix rows from interval [begin, end) for constant instances.
 
template<typename Vector >
void getCompressedRowLengths (Vector &rowLengths) const
 Computes number of non-zeros in each row.
 
__cuda_callable__ Real getElement (IndexType row, IndexType column) const
 Returns value of matrix element at position given by its row and column index.
 
__cuda_callable__ RowView getRow (IndexType rowIdx)
 Non-constant getter of simple structure for accessing given matrix row.
 
__cuda_callable__ ConstRowView getRow (IndexType rowIdx) const
 Constant getter of simple structure for accessing given matrix row.
 
template<typename Vector >
void getRowCapacities (Vector &rowCapacities) const
 Compute capacities of all rows.
 
__cuda_callable__ IndexType getRowCapacity (IndexType row) const
 Returns capacity of given matrix row.
 
template<typename Real_ , typename Device_ , typename Index_ >
bool operator!= (const DenseMatrixBase< Real_, Device_, Index_, Organization > &matrix) const
 Comparison operator with another dense matrix view.
 
__cuda_callable__ Real & operator() (IndexType row, IndexType column)
 Returns non-constant reference to element at row row and column column.
 
__cuda_callable__ const Real & operator() (IndexType row, IndexType column) const
 Returns constant reference to element at row row and column column.
 
DenseMatrixBase & operator= (const DenseMatrixBase &)=delete
 Copy-assignment operator.
 
DenseMatrixBase & operator= (DenseMatrixBase &&)=delete
 Move-assignment operator.
 
template<typename Real_ , typename Device_ , typename Index_ >
bool operator== (const DenseMatrixBase< Real_, Device_, Index_, Organization > &matrix) const
 Comparison operator with another dense matrix view.
 
void print (std::ostream &str) const
 Method for printing the matrix to output stream.
 
template<typename Fetch , typename Reduce , typename Keep , typename FetchReal >
void reduceAllRows (Fetch &&fetch, const Reduce &reduce, Keep &&keep, const FetchReal &identity) const
 Method for performing general reduction on ALL matrix rows for constant instances.
 
template<typename Fetch , typename Reduce , typename Keep , typename FetchReal >
void reduceRows (IndexType begin, IndexType end, Fetch &&fetch, const Reduce &reduce, Keep &&keep, const FetchReal &identity) const
 Method for performing general reduction on matrix rows for constant instances.
 
template<typename Fetch , typename Reduce , typename Keep , typename FetchValue >
void reduceRows (IndexType begin, IndexType end, Fetch &&fetch, const Reduce &reduce, Keep &&keep, const FetchValue &identity) const
 
template<typename Function >
void sequentialForAllRows (Function &&function)
 This method calls sequentialForRows for all matrix rows.
 
template<typename Function >
void sequentialForAllRows (Function &&function) const
 This method calls sequentialForRows for all matrix rows (for constant instances).
 
template<typename Function >
void sequentialForRows (IndexType begin, IndexType end, Function &&function)
 Method for sequential iteration over matrix rows from interval [begin, end) for non-constant instances.
 
template<typename Function >
void sequentialForRows (IndexType begin, IndexType end, Function &&function) const
 Method for sequential iteration over matrix rows from interval [begin, end) for constant instances.
 
__cuda_callable__ void setElement (IndexType row, IndexType column, const RealType &value)
 Sets element at given row and column to given value.
 
void setValue (const RealType &v)
 Sets all matrix elements to value v.
 
template<typename InVector , typename OutVector >
void vectorProduct (const InVector &inVector, OutVector &outVector, const RealType &matrixMultiplicator=1.0, const RealType &outVectorMultiplicator=0.0, IndexType begin=0, IndexType end=0) const
 Computes product of matrix and vector.
 
- Public Member Functions inherited from TNL::Matrices::MatrixBase< Real, Device, Index, GeneralMatrix, Organization >
__cuda_callable__ MatrixBase ()=default
 Basic constructor with no parameters.
 
__cuda_callable__ MatrixBase (const MatrixBase &view)=default
 Shallow copy constructor.
 
__cuda_callable__ MatrixBase (IndexType rows, IndexType columns, ValuesViewType values)
 Constructor with matrix dimensions and matrix elements values.
 
__cuda_callable__ MatrixBase (MatrixBase &&view) noexcept=default
 Move constructor.
 
IndexType getAllocatedElementsCount () const
 Tells the number of allocated matrix elements.
 
__cuda_callable__ IndexType getColumns () const
 Returns number of matrix columns.
 
virtual IndexType getNonzeroElementsCount () const
 Computes a current number of nonzero matrix elements.
 
__cuda_callable__ IndexType getRows () const
 Returns number of matrix rows.
 
__cuda_callable__ ValuesViewType & getValues ()
 Returns a reference to a vector with the matrix elements values.
 
__cuda_callable__ const ValuesViewType & getValues () const
 Returns a constant reference to a vector with the matrix elements values.
 
bool operator!= (const Matrix &matrix) const
 Comparison operator with another arbitrary matrix view type.
 
bool operator!= (const MatrixT &matrix) const
 
__cuda_callable__ MatrixBase & operator= (const MatrixBase &)=delete
 Copy-assignment operator.
 
__cuda_callable__ MatrixBase & operator= (MatrixBase &&)=delete
 Move-assignment operator.
 
bool operator== (const Matrix &matrix) const
 Comparison operator with another arbitrary matrix view type.
 
bool operator== (const MatrixT &matrix) const
 

Static Public Member Functions

static std::string getSerializationType ()
 Returns string with serialization type.
 
- Static Public Member Functions inherited from TNL::Matrices::MatrixBase< Real, Device, Index, GeneralMatrix, Organization >
static constexpr ElementsOrganization getOrganization ()
 Matrix elements organization getter.
 
static constexpr bool isBinary ()
 Test of binary matrix type.
 
static constexpr bool isSymmetric ()
 Test of symmetric matrix type.
 

Protected Types

using Base = MatrixBase< Real, Device, Index, GeneralMatrix, Organization >
 
using SegmentsReductionKernel = Algorithms::SegmentsReductionKernels::EllpackKernel< Index, Device >
 
using SegmentsType
 
using SegmentsViewType = typename SegmentsType::ViewType
 
using SegmentViewType = typename SegmentsType::SegmentViewType
 

Protected Member Functions

__cuda_callable__ void bind (IndexType rows, IndexType columns, typename Base::ValuesViewType values, SegmentsViewType segments)
 Re-initializes the internal attributes of the base class.
 
__cuda_callable__ IndexType getElementIndex (IndexType row, IndexType column) const
 
- Protected Member Functions inherited from TNL::Matrices::MatrixBase< Real, Device, Index, GeneralMatrix, Organization >
__cuda_callable__ void bind (IndexType rows, IndexType columns, ValuesViewType values)
 Re-initializes the internal attributes of the base class.
 

Protected Attributes

SegmentsViewType segments
 
- Protected Attributes inherited from TNL::Matrices::MatrixBase< Real, Device, Index, GeneralMatrix, Organization >
IndexType columns
 
IndexType rows
 
ValuesViewType values
 

Detailed Description

template<typename Real, typename Device, typename Index, ElementsOrganization Organization>
class TNL::Matrices::DenseMatrixBase< Real, Device, Index, Organization >

Implementation of dense matrix view.

It serves as an accessor to DenseMatrix, for example when passing the matrix to lambda functions. A dense matrix view can also be created in CUDA kernels.

Template Parameters
Real is a type of matrix elements.
Device is a device where the matrix is allocated.
Index is a type for indexing of the matrix elements.
Organization tells the ordering of matrix elements in memory. It is either TNL::Algorithms::Segments::RowMajorOrder or TNL::Algorithms::Segments::ColumnMajorOrder.

See DenseMatrix.
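
The effect of the element ordering on addressing can be sketched in plain C++. This is an illustration only, not TNL code: the Layout enum and the elementIndex helper are hypothetical names, and the formulas assume the conventional unpadded row-major and column-major layouts.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the Organization template parameter.
enum class Layout { RowMajor, ColumnMajor };

// Offset of element (row, column) in a flat array holding a rows x columns
// dense matrix, assuming no padding between rows or columns.
inline std::size_t
elementIndex( Layout layout, std::size_t rows, std::size_t columns, std::size_t row, std::size_t column )
{
   assert( row < rows && column < columns );
   return layout == Layout::RowMajor ? row * columns + column   // rows are contiguous
                                     : column * rows + row;     // columns are contiguous
}
```

In a 3x4 matrix, element (1, 2) would then sit at offset 6 in row-major order and at offset 7 in column-major order.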

Member Typedef Documentation

◆ SegmentsType

template<typename Real , typename Device , typename Index , ElementsOrganization Organization>
using TNL::Matrices::DenseMatrixBase< Real, Device, Index, Organization >::SegmentsType
protected
Initial value:
Algorithms::Segments::
Ellpack< Device, Index, typename Allocators::Default< Device >::template Allocator< Index >, Organization, 1 >

Constructor & Destructor Documentation

◆ DenseMatrixBase() [1/3]

template<typename Real , typename Device , typename Index , ElementsOrganization Organization>
__cuda_callable__ TNL::Matrices::DenseMatrixBase< Real, Device, Index, Organization >::DenseMatrixBase ( IndexType rows,
IndexType columns,
typename Base::ValuesViewType values )

Constructor with matrix dimensions and values.

Parameters
rows is the number of matrix rows.
columns is the number of matrix columns.
values is vector view with matrix elements values.

◆ DenseMatrixBase() [2/3]

template<typename Real , typename Device , typename Index , ElementsOrganization Organization>
__cuda_callable__ TNL::Matrices::DenseMatrixBase< Real, Device, Index, Organization >::DenseMatrixBase ( const DenseMatrixBase< Real, Device, Index, Organization > & matrix)
default

Copy constructor.

Parameters
matrix is the source matrix view.

◆ DenseMatrixBase() [3/3]

template<typename Real , typename Device , typename Index , ElementsOrganization Organization>
__cuda_callable__ TNL::Matrices::DenseMatrixBase< Real, Device, Index, Organization >::DenseMatrixBase ( DenseMatrixBase< Real, Device, Index, Organization > && matrix)
default noexcept

Move constructor.

Parameters
matrix is the source matrix view.

Member Function Documentation

◆ addElement()

template<typename Real , typename Device , typename Index , ElementsOrganization Organization>
__cuda_callable__ void TNL::Matrices::DenseMatrixBase< Real, Device, Index, Organization >::addElement ( IndexType row,
IndexType column,
const RealType & value,
const RealType & thisElementMultiplicator = 1.0 )

Adds given value to the matrix element at given row and column.

This method can be called from the host system (CPU) no matter where the matrix is allocated. If the matrix is allocated on a GPU, this method can also be called from device kernels. If the matrix is allocated on a GPU and this method is called from the CPU, each matrix element is transferred separately, so performance is very low. For higher performance, see DenseMatrix::getRow, DenseMatrix::forElements and DenseMatrix::forAllElements.

Parameters
row is row index of the element.
column is column index of the element.
value is the value to be added to the element.
thisElementMultiplicator is the multiplicator by which the original matrix element value is multiplied before the given value is added.
Example
#include <iostream>
#include <TNL/Matrices/DenseMatrix.h>
#include <TNL/Devices/Host.h>
#include <TNL/Devices/Cuda.h>

template< typename Device >
void
addElements()
{
   // 5x5 dense matrix that the view below refers to.
   TNL::Matrices::DenseMatrix< double, Device > matrix( 5, 5 );
   auto matrixView = matrix.getView();

   // Put the row index on the diagonal.
   for( int i = 0; i < 5; i++ )
      matrixView.setElement( i, i, i );  // or matrix.setElement

   std::cout << "Initial matrix is: " << std::endl << matrix << std::endl;

   // Multiply each element by 5.0 and then add 1.0 to it.
   for( int i = 0; i < 5; i++ )
      for( int j = 0; j < 5; j++ )
         matrixView.addElement( i, j, 1.0, 5.0 );  // or matrix.addElement

   std::cout << "Matrix after addition is: " << std::endl << matrix << std::endl;
}

int
main( int argc, char* argv[] )
{
   std::cout << "Add elements on host:" << std::endl;
   addElements< TNL::Devices::Host >();

#ifdef __CUDACC__
   std::cout << "Add elements on CUDA device:" << std::endl;
   addElements< TNL::Devices::Cuda >();
#endif
}
Output
Add elements on host:
Initial matrix is:
Row: 0 -> 0:0 1:0 2:0 3:0 4:0
Row: 1 -> 0:0 1:1 2:0 3:0 4:0
Row: 2 -> 0:0 1:0 2:2 3:0 4:0
Row: 3 -> 0:0 1:0 2:0 3:3 4:0
Row: 4 -> 0:0 1:0 2:0 3:0 4:4
Matrix after addition is:
Row: 0 -> 0:1 1:1 2:1 3:1 4:1
Row: 1 -> 0:1 1:6 2:1 3:1 4:1
Row: 2 -> 0:1 1:1 2:11 3:1 4:1
Row: 3 -> 0:1 1:1 2:1 3:16 4:1
Row: 4 -> 0:1 1:1 2:1 3:1 4:21
Add elements on CUDA device:
Initial matrix is:
Row: 0 -> 0:0 1:0 2:0 3:0 4:0
Row: 1 -> 0:0 1:1 2:0 3:0 4:0
Row: 2 -> 0:0 1:0 2:2 3:0 4:0
Row: 3 -> 0:0 1:0 2:0 3:3 4:0
Row: 4 -> 0:0 1:0 2:0 3:0 4:4
Matrix after addition is:
Row: 0 -> 0:1 1:1 2:1 3:1 4:1
Row: 1 -> 0:1 1:6 2:1 3:1 4:1
Row: 2 -> 0:1 1:1 2:11 3:1 4:1
Row: 3 -> 0:1 1:1 2:1 3:16 4:1
Row: 4 -> 0:1 1:1 2:1 3:1 4:21
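
The printed result is consistent with the update rule element = thisElementMultiplicator * element + value: a diagonal entry i becomes 5 * i + 1 and every other entry becomes 1. The sketch below reproduces that arithmetic in standard C++ only. DenseSketch and addElementsSketch are hypothetical stand-ins, not TNL types, and the row-major storage is an assumption made for the illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a dense matrix; row-major storage is assumed.
struct DenseSketch
{
   int rows;
   int columns;
   std::vector< double > values;

   DenseSketch( int r, int c ) : rows( r ), columns( c ), values( r * c, 0.0 ) {}

   double&
   at( int row, int column )
   {
      assert( row >= 0 && row < rows && column >= 0 && column < columns );
      return values[ row * columns + column ];
   }

   // element = thisElementMultiplicator * element + value
   void
   addElement( int row, int column, double value, double thisElementMultiplicator = 1.0 )
   {
      double& element = at( row, column );
      element = thisElementMultiplicator * element + value;
   }
};

// Mirrors the example above: diagonal set to i, then addElement( i, j, 1.0, 5.0 ) on every element.
inline DenseSketch
addElementsSketch()
{
   DenseSketch m( 5, 5 );
   for( int i = 0; i < 5; i++ )
      m.at( i, i ) = i;
   for( int i = 0; i < 5; i++ )
      for( int j = 0; j < 5; j++ )
         m.addElement( i, j, 1.0, 5.0 );
   return m;
}
```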

◆ addMatrix()

template<typename Real , typename Device , typename Index , ElementsOrganization Organization>
template<typename Matrix >
void TNL::Matrices::DenseMatrixBase< Real, Device, Index, Organization >::addMatrix ( const Matrix & matrix,
const RealType & matrixMultiplicator = 1.0,
const RealType & thisMatrixMultiplicator = 1.0,
TransposeState transpose = TransposeState::None )

Computes matrix addition.

Template Parameters
Matrix is type of the matrix to be added. It can be DenseMatrix or DenseMatrixView.
Parameters
matrix is the matrix to be added.
matrixMultiplicator is a factor by which the matrix is multiplied. It is one by default.
thisMatrixMultiplicator is a factor by which this matrix is multiplied. It is one by default.
transpose indicates if the matrix is added as transposed. It is None by default.
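
Read together, the parameters suggest the update this = thisMatrixMultiplicator * this + matrixMultiplicator * matrix, with the added matrix optionally transposed. The following standard C++ sketch spells that out under this assumption. It is not TNL code: addMatrixSketch is a hypothetical helper, both matrices are assumed to be stored row-major in flat vectors, and a plain bool stands in for TransposeState.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: a is a rows x columns row-major matrix that is updated in place.
// If transposed is true, b is read as a columns x rows row-major matrix and added transposed.
inline void
addMatrixSketch( std::vector< double >& a,
                 const std::vector< double >& b,
                 std::size_t rows,
                 std::size_t columns,
                 double matrixMultiplicator = 1.0,
                 double thisMatrixMultiplicator = 1.0,
                 bool transposed = false )
{
   assert( a.size() == rows * columns && b.size() == rows * columns );
   for( std::size_t i = 0; i < rows; i++ )
      for( std::size_t j = 0; j < columns; j++ ) {
         // Element (i, j) of b, or element (j, i) when b is added as transposed.
         const double bValue = transposed ? b[ j * rows + i ] : b[ i * columns + j ];
         a[ i * columns + j ] = thisMatrixMultiplicator * a[ i * columns + j ] + matrixMultiplicator * bValue;
      }
}
```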

◆ bind()

template<typename Real , typename Device , typename Index , ElementsOrganization Organization>
__cuda_callable__ void TNL::Matrices::DenseMatrixBase< Real, Device, Index, Organization >::bind ( IndexType rows,
IndexType columns,
typename Base::ValuesViewType values,
SegmentsViewType segments )
protected

Re-initializes the internal attributes of the base class.

Note that this function is protected to ensure that the user cannot modify the base class of a matrix. For the same reason, in future code development we also need to make sure that all non-const functions in the base class return by value and not by reference.

◆ forAllElements() [1/2]

template<typename Real , typename Device , typename Index , ElementsOrganization Organization>
template<typename Function >
void TNL::Matrices::DenseMatrixBase< Real, Device, Index, Organization >::forAllElements ( Function && function)

This method calls forElements for all matrix rows.

See DenseMatrix::forAllElements.

Template Parameters
Function is a type of lambda function that will operate on matrix elements.
Parameters
function is an instance of the lambda function to be called in each row.
Example
#include <iostream>
#include <TNL/Matrices/DenseMatrix.h>
#include <TNL/Devices/Host.h>
#include <TNL/Devices/Cuda.h>

template< typename Device >
void
forAllElementsExample()
{
   // 5x5 dense matrix that the view below refers to.
   TNL::Matrices::DenseMatrix< double, Device > matrix( 5, 5 );
   auto matrixView = matrix.getView();

   // Fill the lower triangle, including the diagonal.
   auto f = [] __cuda_callable__( int rowIdx, int columnIdx, int globalIdx, double& value )
   {
      if( rowIdx >= columnIdx )
         value = rowIdx + columnIdx;
   };
   matrixView.forAllElements( f );  // or matrix.forAllElements
   std::cout << matrix << std::endl;
}

int
main( int argc, char* argv[] )
{
   std::cout << "Creating matrix on host: " << std::endl;
   forAllElementsExample< TNL::Devices::Host >();

#ifdef __CUDACC__
   std::cout << "Creating matrix on CUDA device: " << std::endl;
   forAllElementsExample< TNL::Devices::Cuda >();
#endif
}
Output
Creating matrix on host:
Row: 0 -> 0:0 1:0 2:0 3:0 4:0
Row: 1 -> 0:1 1:2 2:0 3:0 4:0
Row: 2 -> 0:2 1:3 2:4 3:0 4:0
Row: 3 -> 0:3 1:4 2:5 3:6 4:0
Row: 4 -> 0:4 1:5 2:6 3:7 4:8
Creating matrix on CUDA device:
Row: 0 -> 0:0 1:0 2:0 3:0 4:0
Row: 1 -> 0:1 1:2 2:0 3:0 4:0
Row: 2 -> 0:2 1:3 2:4 3:0 4:0
Row: 3 -> 0:3 1:4 2:5 3:6 4:0
Row: 4 -> 0:4 1:5 2:6 3:7 4:8

◆ forAllElements() [2/2]

template<typename Real , typename Device , typename Index , ElementsOrganization Organization>
template<typename Function >
void TNL::Matrices::DenseMatrixBase< Real, Device, Index, Organization >::forAllElements ( Function && function) const

This method calls forElements for all matrix rows.

See DenseMatrix::forElements.

Template Parameters
Function is a type of lambda function that will operate on matrix elements.
Parameters
function is an instance of the lambda function to be called in each row.

◆ forAllRows() [1/2]

template<typename Real , typename Device , typename Index , ElementsOrganization Organization>
template<typename Function >
void TNL::Matrices::DenseMatrixBase< Real, Device, Index, Organization >::forAllRows ( Function && function)

Method for parallel iteration over all matrix rows.

The given lambda function is called for each row. Each row is processed by at most one thread, unlike the method DenseMatrixBase::forAllElements, where more than one thread can be mapped to each row.

Template Parameters
Function is type of the lambda function.
Parameters
function is an instance of the lambda function to be called for each row. It should have the form
auto function = [] __cuda_callable__ ( RowView& row ) { ... };

RowView represents matrix row - see TNL::Matrices::DenseMatrixBase::RowView.

Example
#include <iostream>
#include <limits>
#include <TNL/Matrices/DenseMatrix.h>
#include <TNL/Devices/Host.h>
#include <TNL/Devices/Cuda.h>

template< typename Device >
void
forRowsExample()
{
   using MatrixType = TNL::Matrices::DenseMatrix< double, Device >;
   using RowView = typename MatrixType::RowView;
   const int size = 5;
   MatrixType matrix( size, size );
   auto view = matrix.getView();

   /***
    * Set the matrix elements.
    */
   auto f = [] __cuda_callable__( RowView & row )
   {
      const int& rowIdx = row.getRowIndex();
      if( rowIdx > 0 )
         row.setValue( rowIdx - 1, -1.0 );
      row.setValue( rowIdx, rowIdx + 1.0 );
      if( rowIdx < size - 1 )
         row.setValue( rowIdx + 1, -1.0 );
   };
   view.forAllRows( f );  // or matrix.forAllRows
   std::cout << matrix << std::endl;

   /***
    * Now divide each matrix row by its largest element - with the use of iterators.
    */
   view.forAllRows(
      [] __cuda_callable__( RowView & row )
      {
         // Start from the lowest representable value so that any element can replace it.
         auto largest = std::numeric_limits< double >::lowest();
         for( auto element : row )
            largest = TNL::max( largest, element.value() );
         for( auto element : row )
            element.value() /= largest;
      } );
   std::cout << "Divide each matrix row by its largest element... " << std::endl;
   std::cout << matrix << std::endl;
}

int
main( int argc, char* argv[] )
{
   std::cout << "Getting matrix rows on host: " << std::endl;
   forRowsExample< TNL::Devices::Host >();

#ifdef __CUDACC__
   std::cout << "Getting matrix rows on CUDA device: " << std::endl;
   forRowsExample< TNL::Devices::Cuda >();
#endif
}
Output
Getting matrix rows on host:
Row: 0 -> 0:1 1:-1 2:0 3:0 4:0
Row: 1 -> 0:-1 1:2 2:-1 3:0 4:0
Row: 2 -> 0:0 1:-1 2:3 3:-1 4:0
Row: 3 -> 0:0 1:0 2:-1 3:4 4:-1
Row: 4 -> 0:0 1:0 2:0 3:-1 4:5
Divide each matrix row by its largest element...
Row: 0 -> 0:1 1:-1 2:0 3:0 4:0
Row: 1 -> 0:-0.5 1:1 2:-0.5 3:0 4:0
Row: 2 -> 0:0 1:-0.333333 2:1 3:-0.333333 4:0
Row: 3 -> 0:0 1:0 2:-0.25 3:1 4:-0.25
Row: 4 -> 0:0 1:0 2:0 3:-0.2 4:1
Getting matrix rows on CUDA device:
Row: 0 -> 0:1 1:-1 2:0 3:0 4:0
Row: 1 -> 0:-1 1:2 2:-1 3:0 4:0
Row: 2 -> 0:0 1:-1 2:3 3:-1 4:0
Row: 3 -> 0:0 1:0 2:-1 3:4 4:-1
Row: 4 -> 0:0 1:0 2:0 3:-1 4:5
Divide each matrix row by its largest element...
Row: 0 -> 0:1 1:-1 2:0 3:0 4:0
Row: 1 -> 0:-0.5 1:1 2:-0.5 3:0 4:0
Row: 2 -> 0:0 1:-0.333333 2:1 3:-0.333333 4:0
Row: 3 -> 0:0 1:0 2:-0.25 3:1 4:-0.25
Row: 4 -> 0:0 1:0 2:0 3:-0.2 4:1
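
The row-wise pattern in the example above (one unit of work per row, each touching only its own row) can be mimicked sequentially in standard C++. The sketch below is not TNL code and runs no threads. normalizedTridiagonalSketch is a hypothetical helper that builds the same tridiagonal matrix and then divides each row by its largest element.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

using Row = std::vector< double >;

// Hypothetical helper: builds the size x size tridiagonal matrix from the example
// and scales every row by its largest element. Each loop body touches only one row,
// which is what makes the per-row work independent.
inline std::vector< Row >
normalizedTridiagonalSketch( int size )
{
   assert( size > 0 );
   std::vector< Row > rows( size, Row( size, 0.0 ) );

   // Set the matrix elements row by row.
   for( int i = 0; i < size; i++ ) {
      if( i > 0 )
         rows[ i ][ i - 1 ] = -1.0;
      rows[ i ][ i ] = i + 1.0;
      if( i < size - 1 )
         rows[ i ][ i + 1 ] = -1.0;
   }

   // Divide each row by its largest element.
   for( Row& row : rows ) {
      const double largest = *std::max_element( row.begin(), row.end() );
      for( double& value : row )
         value /= largest;
   }
   return rows;
}
```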

◆ forAllRows() [2/2]

template<typename Real , typename Device , typename Index , ElementsOrganization Organization>
template<typename Function >
void TNL::Matrices::DenseMatrixBase< Real, Device, Index, Organization >::forAllRows ( Function && function) const

Method for parallel iteration over all matrix rows for constant instances.

The given lambda function is called for each row. Each row is processed by at most one thread, unlike the method DenseMatrixBase::forAllElements, where more than one thread can be mapped to each row.

Template Parameters
Function is type of the lambda function.
Parameters
function is an instance of the lambda function to be called for each row. It should have the form
auto function = [] __cuda_callable__ ( const ConstRowView& row ) { ... };

ConstRowView represents matrix row - see TNL::Matrices::DenseMatrixBase::ConstRowView.

◆ forElements() [1/2]

template<typename Real , typename Device , typename Index , ElementsOrganization Organization>
template<typename Function >
void TNL::Matrices::DenseMatrixBase< Real, Device, Index, Organization >::forElements ( IndexType begin,
IndexType end,
Function && function )

Method for iteration over elements of matrix rows from interval [begin, end) for non-constant instances.

Template Parameters
Function is type of lambda function that will operate on matrix elements. It should have form like
auto function = [=] __cuda_callable__ ( IndexType rowIdx, IndexType localIdx, IndexType columnIdx, RealType& value ) { ... };

For dense matrices, localIdx and columnIdx both hold the column index. It is passed twice only for compatibility with sparse matrices.

Parameters
begin defines beginning of the range [begin,end) of rows to be processed.
end defines ending of the range [begin,end) of rows to be processed.
function is an instance of the lambda function to be called in each row.
Example
#include <iostream>
#include <TNL/Matrices/DenseMatrix.h>
#include <TNL/Devices/Host.h>
#include <TNL/Devices/Cuda.h>

template< typename Device >
void
forElementsExample()
{
   // 5x5 dense matrix that the view below refers to.
   TNL::Matrices::DenseMatrix< double, Device > matrix( 5, 5 );
   auto matrixView = matrix.getView();

   // Fill the lower triangle, including the diagonal.
   auto f = [] __cuda_callable__( int rowIdx, int columnIdx, int globalIdx, double& value )
   {
      if( columnIdx <= rowIdx )
         value = rowIdx + columnIdx;
   };
   matrixView.forElements( 0, matrix.getRows(), f );  // or matrix.forElements
   std::cout << matrix << std::endl;
}

int
main( int argc, char* argv[] )
{
   std::cout << "Creating matrix on host: " << std::endl;
   forElementsExample< TNL::Devices::Host >();

#ifdef __CUDACC__
   std::cout << "Creating matrix on CUDA device: " << std::endl;
   forElementsExample< TNL::Devices::Cuda >();
#endif
}
Output
Creating matrix on host:
Row: 0 -> 0:0 1:0 2:0 3:0 4:0
Row: 1 -> 0:1 1:2 2:0 3:0 4:0
Row: 2 -> 0:2 1:3 2:4 3:0 4:0
Row: 3 -> 0:3 1:4 2:5 3:6 4:0
Row: 4 -> 0:4 1:5 2:6 3:7 4:8
Creating matrix on CUDA device:
Row: 0 -> 0:0 1:0 2:0 3:0 4:0
Row: 1 -> 0:1 1:2 2:0 3:0 4:0
Row: 2 -> 0:2 1:3 2:4 3:0 4:0
Row: 3 -> 0:3 1:4 2:5 3:6 4:0
Row: 4 -> 0:4 1:5 2:6 3:7 4:8
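
The calling convention can be illustrated with a sequential stand-in written in standard C++. This is not the TNL implementation, which may visit elements in parallel. forElementsSketch and lowerTriangleSketch are hypothetical helpers, and row-major storage in a flat vector is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: visits every element of rows [begin, end) of a row-major
// matrix and passes the column index twice, in the second and third argument.
template< typename Function >
void
forElementsSketch( std::vector< double >& values, int columns, int begin, int end, Function&& function )
{
   assert( columns > 0 && begin >= 0 && begin <= end );
   assert( static_cast< std::size_t >( end ) * columns <= values.size() );
   for( int row = begin; row < end; row++ )
      for( int column = 0; column < columns; column++ )
         function( row, column, column, values[ row * columns + column ] );
}

// Mirrors the example above on a 5x5 matrix: fills the lower triangle with rowIdx + columnIdx.
inline std::vector< double >
lowerTriangleSketch()
{
   std::vector< double > values( 25, 0.0 );
   forElementsSketch( values, 5, 0, 5,
                      []( int rowIdx, int /* localIdx */, int columnIdx, double& value )
                      {
                         if( columnIdx <= rowIdx )
                            value = rowIdx + columnIdx;
                      } );
   return values;
}
```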

◆ forElements() [2/2]

template<typename Real , typename Device , typename Index , ElementsOrganization Organization>
template<typename Function >
void TNL::Matrices::DenseMatrixBase< Real, Device, Index, Organization >::forElements ( IndexType begin,
IndexType end,
Function && function ) const

Method for iteration over elements of matrix rows from interval [begin, end) for constant instances.

Template Parameters
Function is type of lambda function that will operate on matrix elements. It should have form like
auto function = [] __cuda_callable__ ( IndexType rowIdx, IndexType localIdx, IndexType columnIdx, const RealType& value ) { ... };

For dense matrices, localIdx and columnIdx both hold the column index. It is passed twice only for compatibility with sparse matrices.

Parameters
begin defines beginning of the range [begin,end) of rows to be processed.
end defines ending of the range [begin,end) of rows to be processed.
function is an instance of the lambda function to be called in each row.

◆ forRows() [1/2]

template<typename Real , typename Device , typename Index , ElementsOrganization Organization>
template<typename Function >
void TNL::Matrices::DenseMatrixBase< Real, Device, Index, Organization >::forRows ( IndexType begin,
IndexType end,
Function && function )

Method for parallel iteration over matrix rows from interval [begin, end).

The given lambda function is called for each row. Each row is processed by at most one thread, unlike the method DenseMatrix::forElements, where more than one thread can be mapped to each row.

Template Parameters
Function is type of the lambda function.
Parameters
begin defines beginning of the range [begin, end) of rows to be processed.
end defines ending of the range [begin, end) of rows to be processed.
function is an instance of the lambda function to be called for each row. It should have the form
auto function = [] __cuda_callable__ ( RowView& row ) { ... };

RowView represents matrix row - see TNL::Matrices::DenseMatrix::RowView.

Example
#include <iostream>
#include <limits>
#include <TNL/Matrices/DenseMatrix.h>
#include <TNL/Devices/Host.h>
#include <TNL/Devices/Cuda.h>

template< typename Device >
void
forRowsExample()
{
   using MatrixType = TNL::Matrices::DenseMatrix< double, Device >;
   using RowView = typename MatrixType::RowView;
   const int size = 5;
   MatrixType matrix( size, size );
   auto view = matrix.getView();

   /***
    * Set the matrix elements.
    */
   auto f = [] __cuda_callable__( RowView & row )
   {
      const int& rowIdx = row.getRowIndex();
      if( rowIdx > 0 )
         row.setValue( rowIdx - 1, -1.0 );
      row.setValue( rowIdx, rowIdx + 1.0 );
      if( rowIdx < size - 1 )
         row.setValue( rowIdx + 1, -1.0 );
   };
   view.forAllRows( f );  // or matrix.forAllRows
   std::cout << matrix << std::endl;

   /***
    * Now divide each matrix row by its largest element - with the use of iterators.
    */
   view.forAllRows(
      [] __cuda_callable__( RowView & row )
      {
         // Start from the lowest representable value so that any element can replace it.
         auto largest = std::numeric_limits< double >::lowest();
         for( auto element : row )
            largest = TNL::max( largest, element.value() );
         for( auto element : row )
            element.value() /= largest;
      } );
   std::cout << "Divide each matrix row by its largest element... " << std::endl;
   std::cout << matrix << std::endl;
}

int
main( int argc, char* argv[] )
{
   std::cout << "Getting matrix rows on host: " << std::endl;
   forRowsExample< TNL::Devices::Host >();

#ifdef __CUDACC__
   std::cout << "Getting matrix rows on CUDA device: " << std::endl;
   forRowsExample< TNL::Devices::Cuda >();
#endif
}
Output
Getting matrix rows on host:
Row: 0 -> 0:1 1:-1 2:0 3:0 4:0
Row: 1 -> 0:-1 1:2 2:-1 3:0 4:0
Row: 2 -> 0:0 1:-1 2:3 3:-1 4:0
Row: 3 -> 0:0 1:0 2:-1 3:4 4:-1
Row: 4 -> 0:0 1:0 2:0 3:-1 4:5
Divide each matrix row by its largest element...
Row: 0 -> 0:1 1:-1 2:0 3:0 4:0
Row: 1 -> 0:-0.5 1:1 2:-0.5 3:0 4:0
Row: 2 -> 0:0 1:-0.333333 2:1 3:-0.333333 4:0
Row: 3 -> 0:0 1:0 2:-0.25 3:1 4:-0.25
Row: 4 -> 0:0 1:0 2:0 3:-0.2 4:1
Getting matrix rows on CUDA device:
Row: 0 -> 0:1 1:-1 2:0 3:0 4:0
Row: 1 -> 0:-1 1:2 2:-1 3:0 4:0
Row: 2 -> 0:0 1:-1 2:3 3:-1 4:0
Row: 3 -> 0:0 1:0 2:-1 3:4 4:-1
Row: 4 -> 0:0 1:0 2:0 3:-1 4:5
Divide each matrix row by its largest element...
Row: 0 -> 0:1 1:-1 2:0 3:0 4:0
Row: 1 -> 0:-0.5 1:1 2:-0.5 3:0 4:0
Row: 2 -> 0:0 1:-0.333333 2:1 3:-0.333333 4:0
Row: 3 -> 0:0 1:0 2:-0.25 3:1 4:-0.25
Row: 4 -> 0:0 1:0 2:0 3:-0.2 4:1

◆ forRows() [2/2]

template<typename Real , typename Device , typename Index , ElementsOrganization Organization>
template<typename Function >
void TNL::Matrices::DenseMatrixBase< Real, Device, Index, Organization >::forRows ( IndexType begin,
IndexType end,
Function && function ) const

Method for parallel iteration over matrix rows from interval [begin, end) for constant instances.

The given lambda function is called for each row. Each row is processed by at most one thread, unlike the method DenseMatrixBase::forElements, where more than one thread can be mapped to each row.

Template Parameters
Function is type of the lambda function.
Parameters
begin defines beginning of the range [begin, end) of rows to be processed.
end defines ending of the range [begin, end) of rows to be processed.
function is an instance of the lambda function to be called for each row. It should have the form
auto function = [] __cuda_callable__ ( const ConstRowView& row ) { ... };

ConstRowView represents matrix row - see TNL::Matrices::DenseMatrixBase::ConstRowView.

◆ getCompressedRowLengths()

template<typename Real , typename Device , typename Index , ElementsOrganization Organization>
template<typename Vector >
void TNL::Matrices::DenseMatrixBase< Real, Device, Index, Organization >::getCompressedRowLengths ( Vector & rowLengths) const

Computes number of non-zeros in each row.

Parameters
rowLengths is a vector into which the number of non-zeros in each row will be stored.
Example
#include <iostream>
#include <TNL/Containers/Vector.h>
#include <TNL/Matrices/DenseMatrix.h>
#include <TNL/Devices/Host.h>
#include <TNL/Devices/Cuda.h>

template< typename Device >
void
getCompressedRowLengthsExample()
{
   // Lower-triangular 5x5 matrix; elements omitted from a row are zero.
   // clang-format off
   TNL::Matrices::DenseMatrix< double, Device > denseMatrix {
      { 1 },
      { 2, 3 },
      { 4, 5, 6 },
      { 7, 8, 9, 10 },
      { 11, 12, 13, 14, 15 }
   // clang-format on
   };
   auto denseMatrixView = denseMatrix.getConstView();
   std::cout << denseMatrixView << std::endl;

   // Vector that receives the number of non-zeros in each row.
   TNL::Containers::Vector< int, Device > rowLengths;
   denseMatrixView.getCompressedRowLengths( rowLengths );
   std::cout << "Compressed row lengths are: " << rowLengths << std::endl;
}

int
main( int argc, char* argv[] )
{
   std::cout << "Getting compressed row lengths on host: " << std::endl;
   getCompressedRowLengthsExample< TNL::Devices::Host >();

#ifdef __CUDACC__
   std::cout << "Getting compressed row lengths on CUDA device: " << std::endl;
   getCompressedRowLengthsExample< TNL::Devices::Cuda >();
#endif
}
Output
Getting compressed row lengths on host:
Row: 0 -> 0:1 1:0 2:0 3:0 4:0
Row: 1 -> 0:2 1:3 2:0 3:0 4:0
Row: 2 -> 0:4 1:5 2:6 3:0 4:0
Row: 3 -> 0:7 1:8 2:9 3:10 4:0
Row: 4 -> 0:11 1:12 2:13 3:14 4:15
Compressed row lengths are: [ 1, 2, 3, 4, 5 ]
Getting compressed row lengths on CUDA device:
Row: 0 -> 0:1 1:0 2:0 3:0 4:0
Row: 1 -> 0:2 1:3 2:0 3:0 4:0
Row: 2 -> 0:4 1:5 2:6 3:0 4:0
Row: 3 -> 0:7 1:8 2:9 3:10 4:0
Row: 4 -> 0:11 1:12 2:13 3:14 4:15
Compressed row lengths are: [ 1, 2, 3, 4, 5 ]
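
The counting that produces the output above can be sketched in plain C++, independently of TNL. This is only an illustration under stated assumptions: the helper name countNonZerosPerRow and the flat row-by-row storage are hypothetical and are not part of the TNL API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of TNL): counts the non-zero entries in each
// row of a dense matrix stored row by row in one flat array.
std::vector< int >
countNonZerosPerRow( const std::vector< double >& values, std::size_t rows, std::size_t columns )
{
   assert( values.size() == rows * columns );
   std::vector< int > rowLengths( rows, 0 );
   for( std::size_t row = 0; row < rows; row++ )
      for( std::size_t column = 0; column < columns; column++ )
         if( values[ row * columns + column ] != 0.0 )
            rowLengths[ row ]++;  // only explicit non-zero values are counted
   return rowLengths;
}
```

For a lower-triangular matrix such as the one in the example, row i holds i + 1 non-zeros, which matches the printed vector.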

◆ getElement()

template<typename Real , typename Device , typename Index , ElementsOrganization Organization>
__cuda_callable__ Real TNL::Matrices::DenseMatrixBase< Real, Device, Index, Organization >::getElement ( IndexType row,
IndexType column ) const
nodiscard

Returns value of matrix element at position given by its row and column index.

This method can be called from the host system (CPU) no matter where the matrix is allocated. If the matrix is allocated on the GPU, this method can also be called from device kernels. If the matrix is allocated on the GPU and this method is called from the CPU, it transfers each matrix element separately, so the performance is very low. For higher performance, see DenseMatrix::getRow, DenseMatrix::forElements, and DenseMatrix::forAllElements.

Parameters
row is the row index of the matrix element.
column is the column index of the matrix element.
Returns
value of given matrix element.
Example
#include <iostream>
#include <iomanip>
#include <TNL/Matrices/DenseMatrix.h>
#include <TNL/Devices/Host.h>
#include <TNL/Devices/Cuda.h>
template< typename Device >
void
getElements()
{
TNL::Matrices::DenseMatrix< double, Device > matrix {
// clang-format off
{ 1, 0, 0, 0, 0 },
{ -1, 2, -1, 0, 0 },
{ 0, -1, 2, -1, 0 },
{ 0, 0, -1, 2, -1 },
{ 0, 0, 0, 0, 1 }
// clang-format on
};
auto matrixView = matrix.getConstView();
for( int i = 0; i < 5; i++ ) {
for( int j = 0; j < 5; j++ )
// std::right is the stream manipulator; the element is read at ( i, j ).
std::cout << std::setw( 5 ) << std::right << matrixView.getElement( i, j ); // or matrix.getElement
std::cout << std::endl;
}
}
int
main( int argc, char* argv[] )
{
std::cout << "Get elements on host:" << std::endl;
getElements< TNL::Devices::Host >();
#ifdef __CUDACC__
std::cout << "Get elements on CUDA device:" << std::endl;
getElements< TNL::Devices::Cuda >();
#endif
}
Output
Get elements on host:
    1    0    0    0    0
   -1    2   -1    0    0
    0   -1    2   -1    0
    0    0   -1    2   -1
    0    0    0    0    1
Get elements on CUDA device:
    1    0    0    0    0
   -1    2   -1    0    0
    0   -1    2   -1    0
    0    0   -1    2   -1
    0    0    0    0    1

◆ getRow() [1/2]

template<typename Real , typename Device , typename Index , ElementsOrganization Organization>
__cuda_callable__ auto TNL::Matrices::DenseMatrixBase< Real, Device, Index, Organization >::getRow ( IndexType rowIdx)
nodiscard

Non-constant getter of simple structure for accessing given matrix row.

Parameters
rowIdx is the matrix row index.
Returns
RowView for accessing given matrix row.
Example
#include <iostream>
#include <TNL/Algorithms/parallelFor.h>
#include <TNL/Matrices/DenseMatrix.h>
#include <TNL/Devices/Host.h>
#include <TNL/Devices/Cuda.h>
template< typename Device >
void
getRowExample()
{
const int size = 5;
TNL::Matrices::DenseMatrix< double, Device > matrix( size, size );
/***
* Create dense matrix view which can be captured by the following lambda
* function.
*/
auto matrixView = matrix.getView();
auto f = [ = ] __cuda_callable__( int rowIdx ) mutable
{
auto row = matrixView.getRow( rowIdx );
if( rowIdx > 0 )
row.setValue( rowIdx - 1, -1.0 );
row.setValue( rowIdx, rowIdx + 1.0 );
if( rowIdx < size - 1 )
row.setValue( rowIdx + 1, -1.0 );
};
/***
* Set the matrix elements.
*/
TNL::Algorithms::parallelFor< Device >( 0, matrix.getRows(), f );
std::cout << matrix << std::endl;
}
int
main( int argc, char* argv[] )
{
std::cout << "Getting matrix rows on host: " << std::endl;
getRowExample< TNL::Devices::Host >();
#ifdef __CUDACC__
std::cout << "Getting matrix rows on CUDA device: " << std::endl;
getRowExample< TNL::Devices::Cuda >();
#endif
}
Output
Getting matrix rows on host:
Row: 0 -> 0:1 1:-1 2:0 3:0 4:0
Row: 1 -> 0:-1 1:2 2:-1 3:0 4:0
Row: 2 -> 0:0 1:-1 2:3 3:-1 4:0
Row: 3 -> 0:0 1:0 2:-1 3:4 4:-1
Row: 4 -> 0:0 1:0 2:0 3:-1 4:5
Getting matrix rows on CUDA device:
Row: 0 -> 0:1 1:-1 2:0 3:0 4:0
Row: 1 -> 0:-1 1:2 2:-1 3:0 4:0
Row: 2 -> 0:0 1:-1 2:3 3:-1 4:0
Row: 3 -> 0:0 1:0 2:-1 3:4 4:-1
Row: 4 -> 0:0 1:0 2:0 3:-1 4:5

See DenseMatrixRowView.

◆ getRow() [2/2]

template<typename Real , typename Device , typename Index , ElementsOrganization Organization>
__cuda_callable__ auto TNL::Matrices::DenseMatrixBase< Real, Device, Index, Organization >::getRow ( IndexType rowIdx) const
nodiscard

Constant getter of simple structure for accessing given matrix row.

Parameters
rowIdx is the matrix row index.
Returns
RowView for accessing given matrix row.
Example
#include <iostream>
#include <functional>
#include <TNL/Algorithms/reduce.h>
#include <TNL/Matrices/DenseMatrix.h>
#include <TNL/Devices/Host.h>
#include <TNL/Devices/Cuda.h>
template< typename Device >
void
getRowExample()
{
TNL::Matrices::DenseMatrix< double, Device > matrix {
// clang-format off
{ 1, 0, 0, 0, 0 },
{ 1, 2, 0, 0, 0 },
{ 1, 2, 3, 0, 0 },
{ 1, 2, 3, 4, 0 },
{ 1, 2, 3, 4, 5 }
// clang-format on
};
/***
* We need a matrix view to pass the matrix to lambda function even on CUDA device.
*/
const auto matrixView = matrix.getConstView();
/***
* Fetch lambda function returns diagonal element in each row.
*/
auto fetch = [ = ] __cuda_callable__( int rowIdx ) -> double
{
auto row = matrixView.getRow( rowIdx );
return row.getValue( rowIdx );
};
int trace = TNL::Algorithms::reduce< Device >( 0, matrix.getRows(), fetch, std::plus<>{}, 0 );
std::cout << "Matrix trace is " << trace << "." << std::endl;
}
int
main( int argc, char* argv[] )
{
std::cout << "Getting matrix rows on host: " << std::endl;
getRowExample< TNL::Devices::Host >();
#ifdef __CUDACC__
std::cout << "Getting matrix rows on CUDA device: " << std::endl;
getRowExample< TNL::Devices::Cuda >();
#endif
}
Output
Getting matrix rows on host:
Matrix trace is 15.
Getting matrix rows on CUDA device:
Matrix trace is 15.

See DenseMatrixRowView.

◆ getRowCapacities()

template<typename Real , typename Device , typename Index , ElementsOrganization Organization>
template<typename Vector >
void TNL::Matrices::DenseMatrixBase< Real, Device, Index, Organization >::getRowCapacities ( Vector & rowCapacities) const

Compute capacities of all rows.

The row capacities are not stored explicitly and must be computed.

Parameters
rowCapacities is a vector where the row capacities will be stored.

◆ getRowCapacity()

template<typename Real , typename Device , typename Index , ElementsOrganization Organization>
__cuda_callable__ Index TNL::Matrices::DenseMatrixBase< Real, Device, Index, Organization >::getRowCapacity ( IndexType row) const
nodiscard

Returns capacity of given matrix row.

Parameters
row is the index of the matrix row.
Returns
number of matrix elements allocated for the row.

◆ getSerializationType()

template<typename Real , typename Device , typename Index , ElementsOrganization Organization>
std::string TNL::Matrices::DenseMatrixBase< Real, Device, Index, Organization >::getSerializationType ( )
staticnodiscard

Returns string with serialization type.

The string has the form `Matrices::DenseMatrix< RealType, [any_device], IndexType, [any_allocator], true/false >`.

Returns
String with the serialization type.

◆ operator!=()

template<typename Real , typename Device , typename Index , ElementsOrganization Organization>
template<typename Real_ , typename Device_ , typename Index_ >
bool TNL::Matrices::DenseMatrixBase< Real, Device, Index, Organization >::operator!= ( const DenseMatrixBase< Real_, Device_, Index_, Organization > & matrix) const
nodiscard

Comparison operator with another dense matrix view.

Parameters
matrix is the right-hand side matrix view.
Returns
false if the RHS matrix view is equal, true otherwise.

◆ operator()() [1/2]

template<typename Real , typename Device , typename Index , ElementsOrganization Organization>
__cuda_callable__ Real & TNL::Matrices::DenseMatrixBase< Real, Device, Index, Organization >::operator() ( IndexType row,
IndexType column )
nodiscard

Returns non-constant reference to element at row row and column column.

Since this method returns a reference to the element, it cannot be called across different address spaces. That is, it can be called only from the CPU if the matrix is allocated on the CPU, or only from GPU kernels if the matrix is allocated on the GPU.

Parameters
row is the row index of the element.
column is the column index of the element.
Returns
reference to given matrix element.

◆ operator()() [2/2]

template<typename Real , typename Device , typename Index , ElementsOrganization Organization>
__cuda_callable__ const Real & TNL::Matrices::DenseMatrixBase< Real, Device, Index, Organization >::operator() ( IndexType row,
IndexType column ) const
nodiscard

Returns constant reference to element at row row and column column.

Since this method returns a reference to the element, it cannot be called across different address spaces. That is, it can be called only from the CPU if the matrix is allocated on the CPU, or only from GPU kernels if the matrix is allocated on the GPU.

Parameters
row is the row index of the element.
column is the column index of the element.
Returns
reference to given matrix element.
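
A reference can be returned here because a dense matrix stores every element explicitly. The following plain C++ sketch shows how a (row, column) pair might map to a position in one contiguous array for the two common element organizations. The enum Layout and the function elementOffset are hypothetical names introduced for illustration; TNL's actual storage (for example any padding) may differ.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the Organization template parameter.
enum class Layout { RowMajor, ColumnMajor };

// Linear offset of element ( row, column ) in a rows x columns dense matrix
// stored in one contiguous array without padding (an assumption of this sketch).
inline std::size_t
elementOffset( std::size_t row, std::size_t column, std::size_t rows, std::size_t columns, Layout layout )
{
   assert( row < rows && column < columns );
   if( layout == Layout::RowMajor )
      return row * columns + column;  // elements of one row are adjacent
   return column * rows + row;        // elements of one column are adjacent
}
```

With such a mapping, element access is a single address computation, which is why the returned reference is valid only in the address space where the array lives.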

◆ operator=()

template<typename Real , typename Device , typename Index , ElementsOrganization Organization>
DenseMatrixBase & TNL::Matrices::DenseMatrixBase< Real, Device, Index, Organization >::operator= ( const DenseMatrixBase< Real, Device, Index, Organization > & )
delete

Copy-assignment operator.

It is a deleted function, because matrix assignment in general requires reallocation.

◆ operator==()

template<typename Real , typename Device , typename Index , ElementsOrganization Organization>
template<typename Real_ , typename Device_ , typename Index_ >
bool TNL::Matrices::DenseMatrixBase< Real, Device, Index, Organization >::operator== ( const DenseMatrixBase< Real_, Device_, Index_, Organization > & matrix) const
nodiscard

Comparison operator with another dense matrix view.

Parameters
matrix is the right-hand side matrix view.
Returns
true if the RHS matrix view is equal, false otherwise.

◆ print()

template<typename Real , typename Device , typename Index , ElementsOrganization Organization>
void TNL::Matrices::DenseMatrixBase< Real, Device, Index, Organization >::print ( std::ostream & str) const

Method for printing the matrix to output stream.

Parameters
str is the output stream.

◆ reduceAllRows()

template<typename Real , typename Device , typename Index , ElementsOrganization Organization>
template<typename Fetch , typename Reduce , typename Keep , typename FetchReal >
void TNL::Matrices::DenseMatrixBase< Real, Device, Index, Organization >::reduceAllRows ( Fetch && fetch,
const Reduce & reduce,
Keep && keep,
const FetchReal & identity ) const

Method for performing general reduction on ALL matrix rows for constant instances.

Template Parameters
Fetch is the type of the lambda function for data fetch, declared as
auto fetch = [] __cuda_callable__ ( IndexType rowIdx, IndexType columnIdx, RealType elementValue ) -> FetchValue { ... };

The return type of this lambda can be any non-void type.

Template Parameters
Reduce is the type of the lambda function for reduction, declared as
auto reduce = [] __cuda_callable__ ( const FetchValue& v1, const FetchValue& v2 ) -> FetchValue { ... };
Template Parameters
Keep is the type of the lambda function for storing the result of the reduction in each row. It is declared as
auto keep = [=] __cuda_callable__ ( IndexType rowIdx, const RealType& value ) { ... };
Template Parameters
FetchValue is the type returned by the Fetch lambda function.
Parameters
fetch is an instance of the lambda function for data fetch.
reduce is an instance of the lambda function for reduction.
keep is an instance of the lambda function for storing results.
identity is the identity element for the reduction operation, i.e. the element which does not change the result of the reduction.
Example
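#include <iostream>
#include <iomanip>
#include <functional>
#include <limits>
#include <TNL/Matrices/DenseMatrix.h>
#include <TNL/Devices/Host.h>
#include <TNL/Devices/Cuda.h>
template< typename Device >
void
reduceAllRows()
{
TNL::Matrices::DenseMatrix< double, Device > matrix {
// clang-format off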
{ 1, 0, 0, 0, 0 },
{ 1, 2, 0, 0, 0 },
{ 0, 1, 8, 0, 0 },
{ 0, 0, 1, 9, 0 },
{ 0, 0, 0, 0, 1 }
// clang-format on
};
auto matrixView = matrix.getView();
/***
* Find largest element in each row.
*/
TNL::Containers::Vector< double, Device > rowMax( matrix.getRows() );
/***
* Prepare vector view and matrix view for lambdas.
*/
auto rowMaxView = rowMax.getView();
/***
* Fetch lambda just returns absolute value of matrix elements.
*/
auto fetch = [] __cuda_callable__( int rowIdx, int columnIdx, const double& value ) -> double
{
return TNL::abs( value );
};
/***
* Reduce lambda returns the maximum of the given values.
*/
auto reduce = [] __cuda_callable__( const double& a, const double& b ) -> double
{
return TNL::max( a, b );
};
/***
* Keep lambda stores the largest value in each row in the vector rowMax.
*/
auto keep = [ = ] __cuda_callable__( int rowIdx, const double& value ) mutable
{
rowMaxView[ rowIdx ] = value;
};
/***
* Compute the largest values in each row.
*/
matrixView.reduceAllRows( fetch, reduce, keep, std::numeric_limits< double >::lowest() ); // or matrix.reduceAllRows
std::cout << "Max. elements in rows are: " << rowMax << std::endl;
}
int
main( int argc, char* argv[] )
{
std::cout << "All rows reduction on host:" << std::endl;
reduceAllRows< TNL::Devices::Host >();
#ifdef __CUDACC__
std::cout << "All rows reduction on CUDA device:" << std::endl;
reduceAllRows< TNL::Devices::Cuda >();
#endif
}
Output
All rows reduction on host:
Max. elements in rows are: [ 1, 2, 8, 9, 1 ]
All rows reduction on CUDA device:
Max. elements in rows are: [ 1, 2, 8, 9, 1 ]

◆ reduceRows()

template<typename Real , typename Device , typename Index , ElementsOrganization Organization>
template<typename Fetch , typename Reduce , typename Keep , typename FetchReal >
void TNL::Matrices::DenseMatrixBase< Real, Device, Index, Organization >::reduceRows ( IndexType begin,
IndexType end,
Fetch && fetch,
const Reduce & reduce,
Keep && keep,
const FetchReal & identity ) const

Method for performing general reduction on matrix rows for constant instances.

Template Parameters
Fetch is the type of the lambda function for data fetch, declared as
auto fetch = [] __cuda_callable__ ( IndexType rowIdx, IndexType columnIdx, RealType elementValue ) -> FetchValue { ... };

The return type of this lambda can be any non-void type.

Template Parameters
Reduce is the type of the lambda function for reduction, declared as
auto reduce = [] __cuda_callable__ ( const FetchValue& v1, const FetchValue& v2 ) -> FetchValue { ... };
Template Parameters
Keep is the type of the lambda function for storing the result of the reduction in each row. It is declared as
auto keep = [=] __cuda_callable__ ( IndexType rowIdx, const RealType& value ) { ... };
Template Parameters
FetchValue is the type returned by the Fetch lambda function.
Parameters
begin defines the beginning of the range [begin, end) of rows to be processed.
end defines the end of the range [begin, end) of rows to be processed.
fetch is an instance of the lambda function for data fetch.
reduce is an instance of the lambda function for reduction.
keep is an instance of the lambda function for storing results.
identity is the identity element for the reduction operation, i.e. the element which does not change the result of the reduction.
Example
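#include <iostream>
#include <iomanip>
#include <functional>
#include <limits>
#include <TNL/Matrices/DenseMatrix.h>
#include <TNL/Devices/Host.h>
#include <TNL/Devices/Cuda.h>
template< typename Device >
void
reduceRows()
{
TNL::Matrices::DenseMatrix< double, Device > matrix {
// clang-format off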
{ 1, 0, 0, 0, 0 },
{ 1, 2, 0, 0, 0 },
{ 0, 1, 8, 0, 0 },
{ 0, 0, 1, 9, 0 },
{ 0, 0, 0, 0, 1 }
// clang-format on
};
auto matrixView = matrix.getView();
/***
* Find largest element in each row.
*/
TNL::Containers::Vector< double, Device > rowMax( matrix.getRows() );
/***
* Prepare vector view for lambdas.
*/
auto rowMaxView = rowMax.getView();
/***
* Fetch lambda just returns absolute value of matrix elements.
*/
auto fetch = [] __cuda_callable__( int rowIdx, int columnIdx, const double& value ) -> double
{
return TNL::abs( value );
};
/***
* Reduce lambda returns the maximum of the given values.
*/
auto reduce = [] __cuda_callable__( const double& a, const double& b ) -> double
{
return TNL::max( a, b );
};
/***
* Keep lambda stores the largest value in each row in the vector rowMax.
*/
auto keep = [ = ] __cuda_callable__( int rowIdx, const double& value ) mutable
{
rowMaxView[ rowIdx ] = value;
};
/***
* Compute the largest values in each row.
*/
matrixView.reduceRows(
0, matrix.getRows(), fetch, reduce, keep, std::numeric_limits< double >::lowest() ); // or matrix.reduceRows
std::cout << "Max. elements in rows are: " << rowMax << std::endl;
}
int
main( int argc, char* argv[] )
{
std::cout << "Rows reduction on host:" << std::endl;
reduceRows< TNL::Devices::Host >();
#ifdef __CUDACC__
std::cout << "Rows reduction on CUDA device:" << std::endl;
reduceRows< TNL::Devices::Cuda >();
#endif
}
Output
Rows reduction on host:
Max. elements in rows are: [ 1, 2, 8, 9, 1 ]
Rows reduction on CUDA device:
Max. elements in rows are: [ 1, 2, 8, 9, 1 ]
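
The roles of the fetch, reduce, and keep lambdas can be made concrete with a sequential plain C++ sketch that does not depend on TNL. The names reduceRowsSketch and rowMaxAbs and the flat row-by-row storage are assumptions made for illustration only; the real method runs in parallel on the matrix's device.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Sequential sketch of the fetch / reduce / keep pattern over rows [begin, end)
// of a dense matrix stored row by row. All names are illustrative, not TNL's.
template< typename Fetch, typename Reduce, typename Keep, typename Value >
void
reduceRowsSketch( const std::vector< double >& values,
                  std::size_t columns,
                  std::size_t begin,
                  std::size_t end,
                  Fetch fetch,
                  Reduce reduce,
                  Keep keep,
                  Value identity )
{
   assert( begin <= end && end * columns <= values.size() );
   for( std::size_t row = begin; row < end; row++ ) {
      Value result = identity;  // every row starts from the identity element
      for( std::size_t column = 0; column < columns; column++ )
         result = reduce( result, fetch( row, column, values[ row * columns + column ] ) );
      keep( row, result );  // hand the reduced value of this row to the caller
   }
}

// Usage: the largest absolute value in each row, as in the example above.
std::vector< double >
rowMaxAbs( const std::vector< double >& values, std::size_t rows, std::size_t columns )
{
   std::vector< double > rowMax( rows, 0.0 );
   reduceRowsSketch(
      values, columns, 0, rows,
      []( std::size_t, std::size_t, double value ) { return std::fabs( value ); },
      []( double a, double b ) { return std::max( a, b ); },
      [ &rowMax ]( std::size_t row, double value ) { rowMax[ row ] = value; },
      std::numeric_limits< double >::lowest() );
   return rowMax;
}
```

Passing the lowest representable double as the identity guarantees that the first fetched value always replaces it under the maximum reduction.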

◆ sequentialForAllRows() [1/2]

template<typename Real , typename Device , typename Index , ElementsOrganization Organization>
template<typename Function >
void TNL::Matrices::DenseMatrixBase< Real, Device, Index, Organization >::sequentialForAllRows ( Function && function)

This method calls sequentialForRows for all matrix rows.

See DenseMatrixBase::sequentialForRows.

Template Parameters
Function is the type of the lambda function that will operate on matrix elements.
Parameters
function is an instance of the lambda function to be called in each row.

◆ sequentialForAllRows() [2/2]

template<typename Real , typename Device , typename Index , ElementsOrganization Organization>
template<typename Function >
void TNL::Matrices::DenseMatrixBase< Real, Device, Index, Organization >::sequentialForAllRows ( Function && function) const

This method calls sequentialForRows for all matrix rows (for constant instances).

See DenseMatrixBase::sequentialForRows.

Template Parameters
Function is the type of the lambda function that will operate on matrix elements.
Parameters
function is an instance of the lambda function to be called in each row.

◆ sequentialForRows() [1/2]

template<typename Real , typename Device , typename Index , ElementsOrganization Organization>
template<typename Function >
void TNL::Matrices::DenseMatrixBase< Real, Device, Index, Organization >::sequentialForRows ( IndexType begin,
IndexType end,
Function && function )

Method for sequential iteration over matrix rows from the interval [begin, end) for non-constant instances.

Template Parameters
Function is the type of the lambda function that will operate on matrix elements. It should have the form
auto function = [] __cuda_callable__ ( RowView& row ) { ... };

RowView represents matrix row - see TNL::Matrices::DenseMatrixBase::RowView.

Parameters
begin defines the beginning of the range [begin, end) of rows to be processed.
end defines the end of the range [begin, end) of rows to be processed.
function is an instance of the lambda function to be called in each row.

◆ sequentialForRows() [2/2]

template<typename Real , typename Device , typename Index , ElementsOrganization Organization>
template<typename Function >
void TNL::Matrices::DenseMatrixBase< Real, Device, Index, Organization >::sequentialForRows ( IndexType begin,
IndexType end,
Function && function ) const

Method for sequential iteration over matrix rows from the interval [begin, end) for constant instances.

Template Parameters
Function is the type of the lambda function that will operate on matrix elements. It should have the form
auto function = [] __cuda_callable__ ( const ConstRowView& row ) { ... };

ConstRowView represents matrix row - see TNL::Matrices::DenseMatrixBase::ConstRowView.

Parameters
begin defines the beginning of the range [begin, end) of rows to be processed.
end defines the end of the range [begin, end) of rows to be processed.
function is an instance of the lambda function to be called in each row.

◆ setElement()

template<typename Real , typename Device , typename Index , ElementsOrganization Organization>
__cuda_callable__ void TNL::Matrices::DenseMatrixBase< Real, Device, Index, Organization >::setElement ( IndexType row,
IndexType column,
const RealType & value )

Sets element at given row and column to given value.

This method can be called from the host system (CPU) no matter where the matrix is allocated. If the matrix is allocated on the GPU, this method can also be called from device kernels. If the matrix is allocated on the GPU and this method is called from the CPU, it transfers each matrix element separately, so the performance is very low. For higher performance, see DenseMatrix::getRow, DenseMatrix::forElements, and DenseMatrix::forAllElements.

Parameters
row is the row index of the element.
column is the column index of the element.
value is the value the element will be set to.
Example
#include <iostream>
#include <TNL/Algorithms/parallelFor.h>
#include <TNL/Containers/StaticArray.h>
#include <TNL/Matrices/DenseMatrix.h>
#include <TNL/Devices/Host.h>
template< typename Device >
void
setElements()
{
TNL::Matrices::DenseMatrix< double, Device > matrix( 5, 5 );
auto matrixView = matrix.getView();
for( int i = 0; i < 5; i++ )
matrixView.setElement( i, i, i ); // or matrix.setElement
std::cout << "Matrix set from the host:" << std::endl;
std::cout << matrix << std::endl;
auto f = [ = ] __cuda_callable__( const TNL::Containers::StaticArray< 2, int >& i ) mutable
{
matrixView.addElement( i[ 0 ], i[ 1 ], 5.0 );
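};
// Range of ( row, column ) multi-indices covering the whole 5 x 5 matrix.
const TNL::Containers::StaticArray< 2, int > begin = { 0, 0 };
const TNL::Containers::StaticArray< 2, int > end = { 5, 5 };
TNL::Algorithms::parallelFor< Device >( begin, end, f );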
std::cout << "Matrix set from its native device:" << std::endl;
std::cout << matrix << std::endl;
}
int
main( int argc, char* argv[] )
{
std::cout << "Set elements on host:" << std::endl;
setElements< TNL::Devices::Host >();
#ifdef __CUDACC__
std::cout << "Set elements on CUDA device:" << std::endl;
setElements< TNL::Devices::Cuda >();
#endif
}
Output
Set elements on host:
Matrix set from the host:
Row: 0 -> 0:0 1:0 2:0 3:0 4:0
Row: 1 -> 0:0 1:1 2:0 3:0 4:0
Row: 2 -> 0:0 1:0 2:2 3:0 4:0
Row: 3 -> 0:0 1:0 2:0 3:3 4:0
Row: 4 -> 0:0 1:0 2:0 3:0 4:4
Matrix set from its native device:
Row: 0 -> 0:5 1:5 2:5 3:5 4:5
Row: 1 -> 0:5 1:6 2:5 3:5 4:5
Row: 2 -> 0:5 1:5 2:7 3:5 4:5
Row: 3 -> 0:5 1:5 2:5 3:8 4:5
Row: 4 -> 0:5 1:5 2:5 3:5 4:9
Set elements on CUDA device:
Matrix set from the host:
Row: 0 -> 0:0 1:0 2:0 3:0 4:0
Row: 1 -> 0:0 1:1 2:0 3:0 4:0
Row: 2 -> 0:0 1:0 2:2 3:0 4:0
Row: 3 -> 0:0 1:0 2:0 3:3 4:0
Row: 4 -> 0:0 1:0 2:0 3:0 4:4
Matrix set from its native device:
Row: 0 -> 0:5 1:5 2:5 3:5 4:5
Row: 1 -> 0:5 1:6 2:5 3:5 4:5
Row: 2 -> 0:5 1:5 2:7 3:5 4:5
Row: 3 -> 0:5 1:5 2:5 3:8 4:5
Row: 4 -> 0:5 1:5 2:5 3:5 4:9

◆ setValue()

template<typename Real , typename Device , typename Index , ElementsOrganization Organization>
void TNL::Matrices::DenseMatrixBase< Real, Device, Index, Organization >::setValue ( const RealType & v)

Sets all matrix elements to value v.

Parameters
v is the value all matrix elements will be set to.

◆ vectorProduct()

template<typename Real , typename Device , typename Index , ElementsOrganization Organization>
template<typename InVector , typename OutVector >
void TNL::Matrices::DenseMatrixBase< Real, Device, Index, Organization >::vectorProduct ( const InVector & inVector,
OutVector & outVector,
const RealType & matrixMultiplicator = 1.0,
const RealType & outVectorMultiplicator = 0.0,
IndexType begin = 0,
IndexType end = 0 ) const

Computes product of matrix and vector.

More precisely, it computes:

outVector = matrixMultiplicator * ( *this ) * inVector + outVectorMultiplicator * outVector
Template Parameters
InVector is the type of the input vector. It can be TNL::Containers::Vector, TNL::Containers::VectorView, TNL::Containers::Array, TNL::Containers::ArrayView, or a similar container.
OutVector is the type of the output vector. It can be TNL::Containers::Vector, TNL::Containers::VectorView, TNL::Containers::Array, TNL::Containers::ArrayView, or a similar container.
Parameters
inVector is the input vector.
outVector is the output vector.
matrixMultiplicator is a factor by which the matrix is multiplied. It is one by default.
outVectorMultiplicator is a factor by which outVector is multiplied before it is added to the result of the matrix-vector product. It is zero by default.
begin is the beginning of the range of rows for which the vector product is computed. It is zero by default.
end is the end of the range of rows for which the vector product is computed. It is the number of matrix rows by default.

Note that the output vector dimension must be the same as the number of matrix rows no matter how the begin and end parameters are set. These parameters only say that some matrix rows and the corresponding output vector elements are omitted.
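
The formula and the role of begin and end can be illustrated with a sequential plain C++ sketch that does not use TNL. The function name vectorProductSketch and the flat row-by-row storage are hypothetical, and two behaviours are assumptions of this sketch rather than documented facts: end == 0 is treated as "all rows", and outVector is not read when outVectorMultiplicator is zero.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of outVector = matrixMultiplicator * A * inVector
//                     + outVectorMultiplicator * outVector
// restricted to rows [begin, end). Entries of outVector outside the range
// are left untouched. Not the TNL implementation.
void
vectorProductSketch( const std::vector< double >& matrix,
                     std::size_t rows,
                     std::size_t columns,
                     const std::vector< double >& inVector,
                     std::vector< double >& outVector,
                     double matrixMultiplicator = 1.0,
                     double outVectorMultiplicator = 0.0,
                     std::size_t begin = 0,
                     std::size_t end = 0 )
{
   assert( matrix.size() == rows * columns );
   assert( inVector.size() == columns );
   assert( outVector.size() == rows );  // full size regardless of begin and end
   if( end == 0 )
      end = rows;  // assumption: zero stands for "all rows"
   assert( begin <= end && end <= rows );
   for( std::size_t row = begin; row < end; row++ ) {
      double sum = 0.0;
      for( std::size_t column = 0; column < columns; column++ )
         sum += matrix[ row * columns + column ] * inVector[ column ];
      if( outVectorMultiplicator == 0.0 )
         outVector[ row ] = matrixMultiplicator * sum;  // old value is ignored
      else
         outVector[ row ] = matrixMultiplicator * sum + outVectorMultiplicator * outVector[ row ];
   }
}
```

Calling the sketch with begin = 1 and end = 2 on a matrix with two rows updates only the second entry of outVector, which is the sense in which rows are omitted.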


The documentation for this class was generated from the following files: