Performance Effect of `get()` Outside of a Task

The plot below shows weak scaling of Red-Black Gauss-Seidel iteration for Poisson’s equation in 2D. The green squares are the results when get() is called outside of a task. Already at fifty nodes there is a 30X performance difference between the two versions. The solution time of the blocking version scales as \(\left(\texttt{nodes}\right)^{0.7}\), so the gap only widens as the node count grows.

[Figure: blocked_vs_nonblocking.png, weak scaling of the blocking and non-blocking versions]
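
To get a feel for what the 0.7 exponent implies, the following sketch extrapolates the gap from the fifty-node data point. It rests on an assumption that is not stated in the text: that the non-blocking time stays flat under weak scaling, so the ratio between the two versions grows like (nodes/50)^0.7. The function name projected_slowdown is hypothetical, and the values it returns are extrapolations, not measurements.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: extrapolates the blocking/non-blocking time ratio.
// Assumptions (not measurements): the ratio is 30 at 50 nodes, the blocking
// time grows as nodes^0.7, and the non-blocking time stays flat.
double projected_slowdown(double nodes) {
  assert(nodes > 0.0);                // node counts must be positive
  constexpr double ref_nodes = 50.0;  // node count of the quoted data point
  constexpr double ref_ratio = 30.0;  // quoted gap at ref_nodes
  constexpr double exponent = 0.7;    // quoted scaling exponent
  return ref_ratio * std::pow(nodes / ref_nodes, exponent);
}
```

Under these assumptions, projected_slowdown(500.0) is roughly 150: a tenfold increase in node count widens the gap about fivefold.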

Forcing Bulk-Synchronicity in Your Code

Certain operations, such as future::get, block the caller until the result is available. While an action is blocked, it cannot launch further tasks, and execution resources may sit idle.

Listing 3 Forced bulk-synchronicity anti-pattern

using namespace flecsi;

std::size_t sub{3};
std::size_t ita{0};

static exec::trace t;         // trace object
t.skip();                     // skip tracing first time through loop

do {
  auto g = t.make_guard();    // turn tracing on for enclosing do loop
  for(std::size_t i{0}; i < sub; ++i) {
    s.execute<task::red>(m, ud(m), fd(m));
    s.execute<task::black>(m, ud(m), fd(m));
  }
  ita += sub;

  s.execute<task::discrete_operator>(m, ud(m), Aud(m));
  auto residual = s.reduce<task::diff, exec::fold::sum>(m, fd(m), Aud(m));
  const double err = std::sqrt(residual.get()); // blocks until the reduction completes
  flog(info) << "residual: " << err << " (" << ita << " iterations)"
             << std::endl;

} while(ita < max_iterations.value());

Removing Bulk-Synchronicity from Your Code

Here we pass the future to the print_residual task instead of calling get() in the action. Calling get() inside a task is correct because only that task waits on the reduction; the runtime can continue with other tasks in the meantime.

Listing 4 Red-Black Gauss-Seidel non-blocking, no tracing

using namespace flecsi;

std::size_t sub{3};
std::size_t ita{0};

static exec::trace t;         // trace object
t.skip();                     // skip tracing first time through loop

do {
  auto g = t.make_guard();    // turn tracing on for enclosing do loop
  for(std::size_t i{0}; i < sub; ++i) {
    s.execute<task::red>(m, ud(m), fd(m));
    s.execute<task::black>(m, ud(m), fd(m));
  }
  ita += sub;

  s.execute<task::discrete_operator>(m, ud(m), Aud(m));
  auto residual = s.reduce<task::diff, exec::fold::sum>(m, fd(m), Aud(m));
  s.execute<task::print_residual>(residual, ita);

} while(ita < max_iterations.value());

Note

Residual tolerance termination conditions are usually employed for solvers, but FleCSI does not yet support futures in this way.

Listing 5 print_residual task

void task::print_residual(future<double> residual, std::size_t ita) {
  double err = std::sqrt(residual.get());
  std::cout << "residual: " << err << " (" << ita << " iterations)"
            << std::endl; // std::endl already flushes the stream
}
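
The same idea can be sketched with standard C++ futures for readers who want to experiment without a FleCSI build. This is an analogy only: std::async stands in for the FleCSI runtime, and the names sum_sq_diff, residual_norm, and launch_residual_norm are hypothetical. The point it illustrates is that the future is handed to a consumer task, so the caller never calls get() on the reduction itself.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <future>
#include <utility>
#include <vector>

// Producer: stands in for the reduction task (sum of squared differences).
double sum_sq_diff(const std::vector<double>& a, const std::vector<double>& b) {
  assert(a.size() == b.size());
  double sum = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    const double d = a[i] - b[i];
    sum += d * d;
  }
  return sum;
}

// Consumer: stands in for print_residual. The get() call blocks only the
// thread running this task, not the code that launched it.
double residual_norm(std::future<double> residual) {
  return std::sqrt(residual.get());
}

// Launches producer and consumer, then returns without waiting on either,
// so the caller is free to issue more work.
std::future<double> launch_residual_norm(std::vector<double> a,
                                         std::vector<double> b) {
  std::future<double> residual =
    std::async(std::launch::async, sum_sq_diff, std::move(a), std::move(b));
  return std::async(std::launch::async, residual_norm, std::move(residual));
}
```

A caller would keep the returned future and continue launching work, for example auto norm = launch_residual_norm(x, y);, and read norm.get() only once, at the very end. Unlike a FleCSI task, std::async with std::launch::async typically starts a new thread per call, so this sketch shows the dependency structure rather than the performance behavior.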

When to Call `get()` Outside of a Task

As a rule, never call get() outside of a task. A single call during initialization is unlikely to hurt much, but a call inside the solver loop blocks on every pass.
