Execution Model
FleCSI has two mechanisms for expressing work:
- Tasks
Tasks operate on data distributed to one or more address spaces and use data privileges to maintain memory consistency. FleCSI tasks resemble a more flexible version of MPI: the user is not required to explicitly update dependencies between ranks, and process mappings are not static, i.e., tasks provide relocatable, distributed-memory data parallelism.
- Kernels
Kernels operate on data in a single address space but require explicit barriers to ensure consistency. This is generally referred to as a relaxed-consistency memory model. The kernel interface in FleCSI is defined by two parallel operations: forall and reduceall. Each of these is a fine-grained, data-parallel operation. The use of the kernel nomenclature is derived from CUDA and OpenCL and is conceptually consistent with those models. Please see the example of using forall kernels in the parallel section of the tutorial.
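For illustration, here is a minimal sketch of a forall kernel in the style of the tutorial's parallel examples. The mesh accessor m, the field accessor p, and their types are assumptions standing in for what a specialization would define; only the forall macro shape (iterator name, range, kernel label) is the point:
// A sketch, not from the tutorial: 'm' and 'p' stand in for a mesh
// accessor and a field accessor defined by a specialization. Iterations
// of the kernel may execute in parallel within a single address space.
void
init_pressure(mesh::accessor<ro> m, field<double>::accessor<wo, na> p) {
  forall(c, m.cells(), "init_pressure") {
    p[c] = 1.0;
  }; // forall
} // init_pressure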
Tasks are launched by schedulers.
Example 1: Single Tasks
A single task launch calls a given function just once (across all processes). This is in contrast to an index launch, which executes a task as a data-parallel operation, potentially across many processes.
The trivial task is an example of a single task. Consider the following from tutorial/3-execution/1-single-task.cc:
// Trivial task (no arguments, no return).
void
trivial() noexcept {
flog(info) << "Hello World" << std::endl;
}
Since they are not invoked directly, tasks cannot throw exceptions and must be declared noexcept.
Execution of the task is a trivial use of the scheduler provided to the action:
// Execute a trivial task.
s.execute<trivial>();
A single task can return a value:
// Task with return value.
int
with_return() noexcept {
  int value{100};
  return value;
} // with_return
The return value can be retrieved with a future:
{
  // A future is a mechanism to access the result of an asynchronous
  // operation.
  auto future = s.execute<with_return>();
  // The 'wait()' method waits for the result to become available.
  future.wait();
  // The 'get()' method returns the result. Note that calling 'get()' by
  // itself will wait for the result to become available. The call to
  // 'wait()' in this example is illustrative.
  flog(info) << "Got value " << future.get() << std::endl;
} // scope
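Because get() itself blocks until the result is available, the wait() call above can be omitted entirely; a minimal variation of the same pattern:
{
  auto future = s.execute<with_return>();
  // 'get()' blocks as needed; no explicit 'wait()' is required.
  flog(info) << "Got value " << future.get() << std::endl;
} // scope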
Tasks can take many non-trivial C++ types as parameters, e.g., a std::vector:
Caution
Because they run asynchronously and not necessarily the same number of times as their callers, normal tasks cannot accept pointers or references to non-const types.
// Task with non-trivial parameter.
int
nontrivial_parameter(const std::vector<size_t> & v) noexcept {
  std::stringstream ss;
  int retval{0};
  ss << "Parameter values: ";
  for(auto i : v) {
    retval += i;
    ss << i << " ";
  } // for
  flog(info) << ss.str() << std::endl;
  return retval;
} // nontrivial_parameter
Execution of such a task is what you would expect:
// Execute a task with a non-trivial argument.
// Pointers/references must be to const.
{
  std::vector<size_t> v = {0, 1, 1, 2, 3, 5, 8, 13, 21, 34};
  auto future = s.execute<nontrivial_parameter>(v);
  flog(info) << "Sum is " << future.get() << std::endl;
} // scope
FleCSI tasks can also be templated:
template<typename Type>
Type
templated_task(Type t) noexcept {
  Type retval{t + Type(10)};
  flog(info) << "Returning value " << retval << " with type "
             << typeid(t).name() << std::endl;
  return retval;
} // template
Again, execution is straightforward:
// Execute a templated task.
{
  double value{32.0};
  auto future = s.execute<templated_task<double>>(value);
  flog(info) << "Got templated value " << future.get() << std::endl;
} // scope
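The same template can be instantiated for other types. For example, a sketch following the pattern above (not part of the tutorial source):
// Execute the templated task with a different instantiation.
{
  int value{32};
  auto future = s.execute<templated_task<int>>(value);
  flog(info) << "Got templated value " << future.get() << std::endl;
} // scope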
Example 2: Index Tasks
An index task launch calls a given function a number of times asynchronously, typically distributed over multiple processes; each is called a point task. The usual purpose is operating on different parts of a distributed data structure (different colors of a topology) in parallel.
In this example we explicitly ask to call task 4 times via the launch_domain argument; the task must declare a parameter for it, but it need not be named or used.
To receive information about the task launch, a task can declare an execution space parameter; the task launch provides the dummy value exec::on to initialize it.
An execution space parameter also controls where the task runs; exec::cpu is the default, but others will be used later.
// Task with special arguments.
void
task(exec::cpu s, exec::launch_domain) noexcept {
  flog(info) << "Hello World from point task " << s.launch().index << " of "
             << s.launch().size << std::endl;
}
// Advance control point.
void
advance(control_policy & p) {
  exec::launch_domain ld{4};
  p.scheduler().execute<task>(exec::on, ld);
} // advance()
Launch Domains
Launch domain (exec::launch_domain) is used to define how many index points an index task should have. If no launch_domain is passed to the execute method, the default will be used: if the task uses a field or topology accessor, the default is the number of colors of the topology used; if no argument indicates a number, the default is to launch a single task.
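A hedged sketch of these defaults, reusing the trivial and index tasks defined earlier in this section:
// No accessor and no launch_domain: a single launch by default.
s.execute<trivial>();

// An explicit launch_domain makes the same call an index launch.
s.execute<task>(exec::on, exec::launch_domain{4}); // four point tasks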
Example 3: MPI Tasks
An MPI task is an index task whose launch domain size equals the number of MPI ranks, with each index point mapped to the corresponding MPI rank. Executing an MPI task adds synchronization between Legion and MPI and should therefore be used only when calling an MPI library is necessary.
To execute an MPI task, flecsi::execute must be used, with its second template argument set to mpi. The launch information provided is equivalent to process and processes.
// Task with no arguments.
void
task(exec::cpu s) {
  flog(info) << "Hello World from process: " << s.launch().index << std::endl;
}
// Advance control point.
void
advance(control_policy &) {
  execute<task, mpi>(exec::on);
} // advance()
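Since the usual reason for an MPI task is to call into an MPI-based library, here is a minimal sketch of doing so. The task name mpi_work is hypothetical, and the example assumes <mpi.h> has been included:
// Hypothetical MPI task: each point task runs on its corresponding rank,
// so MPI routines may be called directly.
void
mpi_work(exec::cpu) {
  int rank, size;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  flog(info) << "rank " << rank << " of " << size << std::endl;
}

// Launched like any other MPI task:
// execute<mpi_work, mpi>(exec::on);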