Execution Model

FleCSI has two mechanisms for expressing work:

Tasks

Tasks operate on data distributed to one or more address spaces and use data privileges to maintain memory consistency. FleCSI tasks are like a more flexible version of MPI: they do not require the user to explicitly update dependencies between different ranks, and they do not use static process mappings; i.e., they provide relocatable, distributed-memory data parallelism.
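
Data privileges are expressed through the accessor types that tasks take as parameters. The following is a minimal sketch, not part of the tutorial source: it assumes a field definition pressure registered on a topology slot mesh, as in the data-model section of the tutorial, and only illustrates how a privilege (here wo, write-only) appears in a task signature.

// Sketch only: 'pressure' and 'mesh' are assumed to be defined as in the
// data-model section of the tutorial; the point here is the 'wo' privilege
// in the accessor type.

void
init_pressure(field<double>::accessor<wo> p) {
  for(auto & x : p.span()) {
    x = 0.0;
  } // for
} // init_pressure

// At the call site, the 'wo' privilege tells the runtime that no previous
// version of the field data needs to be made consistent for this task:
//
//   execute<init_pressure>(pressure(mesh));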

Kernels

Kernels operate on data in a single address space but require explicit barriers to ensure consistency. This is generally referred to as a relaxed-consistency memory model. The kernel interface in FleCSI is defined by two parallel operations: forall and reduceall. Each of these is a fine-grained, data-parallel operation. The use of the kernel nomenclature is derived from CUDA and OpenCL and is conceptually consistent with those models. Please see the example of using forall kernels in the parallel section of the tutorial.
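
For orientation only, the sketch below shows the general shape of a forall kernel; the exact interface and working examples are in the parallel section of the tutorial, and the accessor p and the range expression here are assumptions.

// Sketch only: assumes 'p' is a field accessor available inside an
// executing task. The kernel body is applied to each index point of the
// given range within a single address space.

forall(i, p.span(), "scale") {
  p[i] *= 2.0;
}; // forall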


Tasks

Example 1: Single Tasks

A single task launches on a single process, i.e., only one instance of the task is executed. This is in contrast to an index launch, which executes a task as a data-parallel operation, potentially across many processes. FleCSI uses information about the arguments passed to a task to decide how to launch the task: if no parameter is passed that defines a launch domain, e.g., an explicit launch domain, a topology instance, or a future map, FleCSI will launch the task as a single task.

The trivial task is an example of a single task. Consider the following from tutorial/3-execution/1-single-task.cc:

// Trivial task (no arguments, no return).

void
trivial() {
  flog(info) << "Hello World" << std::endl;
}

Execution of the task is trivial:

  // Execute a trivial task.

  execute<trivial>();

A single task can return a value:

// Task with return value.

int
with_return() {
  int value{100};
  flog(info) << "Returning value " << value << std::endl;
  return value;
} // with_return

The return value can be retrieved with a future:

  {
    // A future is a mechanism to access the result of an asynchronous
    // operation.

    auto future = execute<with_return>();

    // The 'wait()' method waits for the result to become available.

    future.wait();

    // The 'get()' method returns the result. Note that calling 'get()' by
    // itself will wait for the result to become available. The call to 'wait()'
    // in this example is illustrative.

    flog(info) << "Got value " << future.get() << std::endl;
  } // scope

FleCSI tasks can take any valid C++ type as an argument by-value, e.g., a std::vector:

Caution

FleCSI tasks can take any valid C++ type by value. However, because task data must be relocatable, you cannot pass pointer arguments or arguments that contain pointers. Modifications made to by-value data are local to the task and will not be reflected at the call site.

// Task with by-value argument.

int
with_by_value_argument(std::vector<size_t> v) {
  std::stringstream ss;
  int retval{0};
  ss << "Parameter values: ";
  for(auto i : v) {
    retval += i;
    ss << i << " ";
  } // for
  flog(info) << ss.str() << std::endl;

  return retval;
} // with_by_value_argument

Execution of such a task is what you would expect:

  // Execute a task that takes an argument by-value. FleCSI tasks can take any
  // valid C++ type by value. However, because task data must be relocatable,
  // you cannot pass pointer arguments, or arguments that contain pointers.
  // Modifications made to by-value data are local to the task and will not be
  // reflected at the call site.

  {
    std::vector<size_t> v = {0, 1, 1, 2, 3, 5, 8, 13, 21, 34};
    auto future = execute<with_by_value_argument>(v);
    flog(info) << "Sum is " << future.get() << std::endl;
  } // scope

FleCSI tasks can also be templated:

template<typename Type>
Type
templated_task(Type t) {
  Type retval{t + Type(10)};
  flog(info) << "Returning value " << retval << " with type "
             << typeid(t).name() << std::endl;
  return retval;
} // template

Again, execution is straightforward:

  // Execute a templated task.

  {
    double value{32.0};
    auto future = execute<templated_task<double>>(value);
    flog(info) << "Got templated value " << future.get() << std::endl;
  } // scope

Example 2: Index Tasks

An index task is a task that is executed by several processes. It is often used to operate asynchronously on different parts of the input data (e.g., a partitioned mesh).

In this example, we explicitly ask to execute the task on four processes via the launch_domain argument.

// Task with no arguments.

void
task(exec::launch_domain) {
  flog(info) << "Hello World from color " << color() << " of " << colors()
             << std::endl;
}

// Advance control point.

void
advance(control_policy &) {
  exec::launch_domain ld{4};

  execute<task>(ld);
} // advance()

Launch Domains

The launch domain (exec::launch_domain) defines how many index points an index task has. If no launch_domain is passed to the execute method, a default is used: if a topology instance is passed, the default is the number of colors of that instance; otherwise, the default is to launch a single task.
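
Schematically, and using hypothetical task and parameter names rather than tutorial code, the three cases look like this:

// Hypothetical names: 'single_task', 'index_task', 'field_task', 'f', and
// 'mesh' are placeholders, not part of the tutorial source.

execute<single_task>();       // no launch information: single launch
execute<index_task>(ld);      // explicit exec::launch_domain 'ld': one point task per index point
execute<field_task>(f(mesh)); // topology instance: one point task per color of 'mesh'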

Example 3: MPI Tasks

An MPI task is an index task whose launch domain size equals the number of MPI ranks, with each index point mapped to the corresponding MPI rank. Executing an MPI task adds synchronization between Legion and MPI and should therefore be used only when it is necessary to call an MPI library. To execute an MPI task, pass mpi as the second template argument to the execute method.

// Task with no arguments.

void
task() {
  flog(info) << "Hello World from process: " << process() << std::endl;
}

// Advance control point.

void
advance(control_policy &) {
  execute<task, mpi>();
} // advance()
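
Because the mpi launch type exists so that tasks can call into MPI, such a task will typically use the MPI API directly. The sketch below is illustrative only and assumes that <mpi.h> is available; it is not part of the tutorial source.

#include <mpi.h>

// Sketch: an MPI task that calls the MPI library directly. Because the task
// is launched with the 'mpi' launch type, each index point runs on its
// corresponding MPI rank, so MPI calls are safe here.

void
mpi_task() {
  int rank, size;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  flog(info) << "Rank " << rank << " of " << size << std::endl;
} // mpi_task

// Executed like the example above:
//
//   execute<mpi_task, mpi>();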