.. |br| raw:: html

   <br />


.. _TUT-PAR:

Distributed and shared memory parallelism
*****************************************

FleCSI provides two levels of parallelism: distributed memory
parallelism and shared memory parallelism.

Distributed memory parallelism is provided through topology coloring
and distribution of the data between different processes (shards).
Shared memory parallelism is provided through the *forall* and
*reduceall* macros.
Currently, it uses the Kokkos programming model.

----

Shared memory
*************

Example 1: forall macro / parallel_for interface
++++++++++++++++++++++++++++++++++++++++++++++++

This example extends the data-dense tutorial example.
The only difference is the addition of the "modify1" and "modify2"
tasks, which use the *forall* macro / *parallel_for* interface.
Both "modify" tasks are executed on the FleCSI default accelerator.
The second template parameter of the ``execute`` function is a
*processor_type*, which defaults to *loc* (latency-optimized core).
*default_accelerator* is a processor type that corresponds to the
Kokkos default execution space.
For example, if Kokkos is built with both Cuda and Serial, Cuda is the
default execution space, which corresponds to the *toc*
(throughput-optimized core) *processor type* in FleCSI.

.. note::

  With the Legion backend, OpenMP task execution can be improved with
  the ``omp`` processor type.
  Legion knows to assign an entire node to such a task.

.. warning::

  With the MPI backend, running one process per node with ``toc`` tasks
  or one process per core with ``omp`` tasks likely leads to poor
  performance.

.. literalinclude:: ../../../../tutorial/5-parallel/1-forall.cc
  :language: cpp

Example 2: reduceall macro / parallel_reduce interface
++++++++++++++++++++++++++++++++++++++++++++++++++++++

This example extends the data-dense tutorial example.
The only difference is the addition of the "reduce1" and "reduce2"
tasks, which use the *reduceall* macro / *parallel_reduce* interface.
Both "reduce" tasks are executed on the FleCSI default accelerator.

.. literalinclude:: ../../../../tutorial/5-parallel/2-reduceall.cc
  :language: cpp

.. vim: set tabstop=2 shiftwidth=2 expandtab fo=cqt tw=72 :
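
The two loop forms used in the examples differ in what each iteration is
allowed to touch.
The following is a plain C++ sketch of those semantics, offered only as
an illustration: it is not FleCSI or Kokkos code, it runs sequentially,
and the names ``for_all_sketch``, ``reduce_all_sketch`` and
``fill_and_sum`` are hypothetical helpers invented for this sketch.

.. code-block:: cpp

  #include <cstddef>
  #include <functional>
  #include <vector>

  // Hypothetical stand-in for a forall-style loop (not the FleCSI macro).
  // The body is applied to every index independently, so no iteration
  // may rely on side effects of another. A parallel backend may run the
  // iterations in any order; this sketch runs them sequentially.
  template<class Body>
  void for_all_sketch(std::size_t n, Body body) {
    for(std::size_t i = 0; i < n; ++i)
      body(i);
  }

  // Hypothetical stand-in for a reduceall-style loop (not the FleCSI
  // macro). Each iteration writes its contribution into a private
  // accumulator; the contributions are merged with `combine`, which
  // should be associative so that any grouping gives the same result.
  template<class T, class Body, class Combine>
  T reduce_all_sketch(std::size_t n, T identity, Body body, Combine combine) {
    T result = identity;
    for(std::size_t i = 0; i < n; ++i) {
      T local = identity; // per-iteration accumulator
      body(i, local);
      result = combine(result, local);
    }
    return result;
  }

  // Fill a field with 2*i using the forall-style helper, then sum the
  // field using the reduceall-style helper.
  double fill_and_sum(std::size_t n) {
    std::vector<double> field(n);
    for_all_sketch(n, [&](std::size_t i) {
      field[i] = 2.0 * static_cast<double>(i);
    });
    return reduce_all_sketch(
      n,
      0.0,
      [&](std::size_t i, double & acc) { acc += field[i]; },
      std::plus<double>{});
  }

Calling ``fill_and_sum(4)`` fills the field with 0, 2, 4 and 6 and
returns their sum.
The sketch passes the identity value and the combining operation
explicitly to show why a reduction needs both: a parallel backend can
then start every partial accumulator from the identity and merge the
partial results in any order.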