7.5. Barriers¶

Semaphores provide a general mechanism that allows one thread to indicate that a particular event has happened. One type of event that arises quite commonly in concurrent applications is the need to make sure all threads reach some common point before any of them continues on. That is, multiple threads may perform some initial computation then pause; at that point some manager thread checks these initial results and allows the threads to continue once their results have been approved.

As an example, consider the case of an autonomous car or flying drone. These types of systems require accurate measurements of real-world phenomena like speed, position, and acceleration. However, these measurements involve floating-point calculations that may encounter rounding errors. To improve the accuracy of the vehicle’s autonomous controls, multiple threads may be simultaneously and independently performing these calculations. The manager control thread will occasionally require these threads to reach a common point in time where each thread reports on its best guess of the measurements. The manager may then correct errors in some threads, then allow them to continue.

Achieving this kind of synchronization with semaphores is complex and error-prone. One approach would be to use two semaphores: the manager thread would repeatedly wait on one of them as the threads were finishing their calculations; the manager would then signal on a different semaphore that each thread was waiting on. However, if all threads agree but one crashes, then the entire system would fail; the manager thread would continue to wait on that last thread to finish (which will never happen).

As an alternative, synchronization barriers allow programs to specify a minimum number of threads that must reach a common point before any can continue. For instance, if the autonomous vehicle manager thread receives reports from 7 out of 10 threads about the current position, then the system may be designed to consider this measurement good enough and let the threads continue rather than requiring the last three threads to finish.

The base functionality of barriers is specified with thread functions. The barrier is initialized with pthread_barrier_init(), with the last parameter indicating the minimum number of threads that must reach the barrier before any can be allowed to continue. Threads use the pthread_barrier_wait() function to indicate that they have reached the common point. If the minimum number of threads have not yet reached the barrier, then the current thread will become blocked. Once the minimum number has been reached, all threads waiting at the barrier become unblocked; any future thread that calls pthread_barrier_wait() will pass right through the barrier without any delay. Finally, pthread_barrier_destroy() is used to clean up any resources associated with the barrier.

C library functions – <pthread.h>

int pthread_barrier_init (pthread_barrier_t *barrier, const pthread_barrierattr_t *attr, unsigned count);: Initialize a synchronization barrier with the specified attributes.
int pthread_barrier_destroy (pthread_barrier_t *barrier);: Delete a synchronization barrier.
int pthread_barrier_wait (pthread_barrier_t *barrier);: Make a thread wait until enough have reached the barrier.

7.5.1. Concurrent Calculations with Barriers¶

To show how barriers work, consider using two threads to determine which function grows faster: the Fibonnacci sequence or the exponential function. That is, is the 100th Fibonacci number greater than $c^{100}$ for some constant $c$ ? [1] It can be shown that each Fibonacci number is slightly less than double the previous value, so the answer would be no for any constant integer greater than 1. But what if the constant is 1.6? [2] Then the answer is not obvious, but it is easy enough to calculate.

Code Listing 7.13 shows how barriers can address this problem. The fibonacci() and exponential() threads take a struct that contains a shared pthread_barrier_t, a digit that specifies how far into the sequence and what exponent to calculate, and a base value that specifies the base of the exponent. That is, if digit is specified as 30 and base is 1.6, fibonacci() will calculate the 30th Fibonacci number, while exponent() will calculate (approximately) $1.6^{30}$ .

/* Code Listing 7.13:
   Using a barrier to pause multiple threads
 */

/* Use a common struct with the barrier and parameters */
struct args {
  pthread_barrier_t *barrier;
  unsigned int digit;
  double base;
  unsigned long fib;
  unsigned long exp;
};

/* Iterative version of the Fibonacci calculation */
void *
fibonacci (void *_args)
{
  struct args *args = (struct args *) _args;
  unsigned long previous = 1;
  unsigned long current = 1;
  unsigned int i;

  for (i = 2; i < args->digit; i++)
    {
      current += previous;
      previous = current - previous;
    }
  /* current is the i-th Fibonacci number */
  args->fib = current;

  /* Wait at the barrier and compare results */
  pthread_barrier_wait (args->barrier);
  if (args->fib > args->exp)
    printf ("Fibonacci wins: %lu > %lu\n", args->fib, args->exp);

  pthread_exit (NULL);
}

/* Calculate base^digit using repeated multiplication */
void *
exponential (void *_args)
{
  struct args *args = (struct args *) _args;
  double result = 1;
  unsigned int i;

  /* WARNING: Don't actually do this in real code that needs
     Accurate calculations. Repeated floating point arithmetic
     inherently has rounding errors that will compound. */

  /* Calculate base^digit; cast to unsigned long for comparison */
  for (i = 1; i < args->digit; i++)
    result *= args->base;
  args->exp = (unsigned long) result;

  /* Wait on threads to reach the barrier and check the result */
  pthread_barrier_wait (args->barrier);
  if (args->exp > args->fib)
    printf ("Exponential wins: %lu > %lu\n", args->exp, args->fib);

  pthread_exit (NULL);
}

Bug Warning

As indicated in the code, you should not try to calculate the exponential value as shown in this code. For one thing, this code is (intentionally) inefficient and could be easily optimized. More importantly, though, the result is guaranteed to be inaccurate. Floating-point arithmetic is susceptible to rounding errors, particularly with repeated calculations and large values. Consequently, this example should primarily be considered for its use of barriers rather than its accuracy regarding the comparison of Fibonacci and exponential growth.

The key lines to note regarding these threads are the calls to pthread_barrier_wait() and the references to the fields fib and exp, which store the results of each calculation. Specifically, note that there is no race condition regarding these fields because the threads only read the values after the barrier. That is, nothing changes after the barrier, so there is no concern that the values checked are inaccurate. This guarantee arises from the fact that both threads must reach the barrier (i.e., the pthread_barrier_wait() call) before either can proceed.

One question that cannot be answered with this code is which thread runs faster. With barriers, all we can say for sure is that both threads reach the barrier before either proceeds. We have no information about which one arrived first. From the perspective of the barrier, such a question is irrelevant. The only thing that matters is that both finish before they each check the results.

Code Listing 7.14 shows the initialization required. The main thread initializes the barrier so that 2 threads are required, then creates the threads. Notice that both threads receive pointers to the same struct instance, so they share the same barrier, fib, and exp fields. The main thread joins both of the helper threads, but the join must occur after the threads have called pthread_exit(). Since both threads call pthread_barrier_wait() before that, the main thread is guaranteed to clean up the barrier only after it is no longer needed.

/* Code Listing 7.14:
   Initializing the barrier and data structures for Code Listing 7.13 
 */

/* Initialize the barrier so that both threads must wait */
pthread_barrier_t barrier;
pthread_barrier_init (&barrier, NULL, 2);

/* Set up parameters to get the 30th Fibonacci and 1.6^30 */
struct args args;
args.barrier = &barrier;
args.digit = 30;
args.base = 1.6;
args.fib = 0;
args.exp = 0;

/* Create and join the threads, then destroy the barrier */
assert (pthread_create (&threads[0], NULL, fibonacci, &args) == 0);
assert (pthread_create (&threads[1], NULL, exponential, &args) == 0);

pthread_join (threads[0], NULL);
pthread_join (threads[1], NULL);

pthread_barrier_destroy (&barrier);

[1]	To be clear, the `exponential()` function is truly the exponential $c^n$ rather than the polynomial $n^c$ . In both `exponential()` and `fibonacci(),` we are using $n = 100$ .

[2]	The selection of 1.6 as the value was not an arbitrary choice. There is a closed-form solution known as Binet’s formula, and calculations based on this formula can be done to show that each Fibonacci number is approximately 1.6 times the value of the previous number.