6.8. Extended Example: Concurrent Prime Number Search¶

This second example shows a different style of multithreaded programming. Instead of assigning a unique task to a particular thread, this example creates multiple threads that all execute the same code, but on different input parameters. For systems with multiprocessing capabilities, this style of programming forms the foundation of parallel execution. Since each thread’s calculations are independent of all the others’ calculations, they can run in parallel and compute the same answers.

#include <stdio.h>
#include <stdbool.h>
#include <pthread.h>
#include <stdlib.h>
#include <assert.h>

/* Specify the number of threads to run in concurrently */
#define NUM_THREADS 4

/* Each thread gets a start and end number and returns the number
   Of primes in that range */
struct range {
  unsigned long start;
  unsigned long end;
  unsigned long count;
};

/* Thread function for counting primes */
void *
prime_check (void *_args)
{
  /* Cast the args to a usable struct type */
  struct range *args = (struct range*) _args;
  unsigned long iter = 2;
  unsigned long value;

  /* Skip over any numbers < 2, which is the smallest prime */
  if (args->start < 2) args->start = 2;

  /* Loop from this thread's start to this thread's end */
  args->count = 0;
  for (value = args->start; value <= args->end; value++)
    {
      /* Trivial and intentionally slow algorithm:
         Start with iter = 2; see if iter divides the number evenly.
         If it does, it's not prime.
         Stop when iter exceeds the square root of value */
      bool is_prime = true;
      for (iter = 2; iter * iter <= value && is_prime; iter++)
        if (value % iter == 0) is_prime = false;

      if (is_prime) args->count++;
    }

  /* All values in the range have been counted, so exit */
  pthread_exit (NULL);
}

int
main (int argc, char **argv)
{
  pthread_t threads[NUM_THREADS];
  struct range *args[NUM_THREADS];
  int thread;

  /* Count the number of primes smaller than this value: */
  unsigned long number_count = 100000000L;

  /* Specify start and end values, then split based on number of
     threads */
  unsigned long start = 0;
  unsigned long end = start + number_count;
  unsigned long number_per_thread = number_count / NUM_THREADS;

  /* Simplification: make range divide evenly among the threads */
  assert (number_count % NUM_THREADS == 0);

  /* Assign a start/end value for each thread, then create it */
  for (thread = 0; thread < NUM_THREADS; thread++)
    {
      args[thread] = calloc (sizeof (struct range), 1);
      args[thread]->start = start + (thread * number_per_thread);
      args[thread]->end =
        args[thread]->start + number_per_thread - 1;
      assert (pthread_create (&threads[thread], NULL, prime_check,
                              args[thread]) == 0);
    }

  /* All threads are running. Join all to collect the results. */
  unsigned long total_number = 0;
  for (thread = 0; thread < NUM_THREADS; thread++)
    {
      pthread_join (threads[thread], NULL);
      printf ("From %ld to %ld: %ld\n", args[thread]->start,
              args[thread]->end, args[thread]->count);
      total_number += args[thread]->count;
      free (args[thread]);
    }

  /* Display the total number of primes in the specified range. */
  printf ("===============================================\n");
  printf ("Total number of primes less than %ld: %ld\n", end, 
          total_number);

  return 0;
}

The time command-line utility can be used to measure the amount of time it takes to run a program. However, time gives you multiple pieces of information to interpret. The real time is how much total time elapsed on the system clock from the start to finish. The user time is how much total CPU time was spent on all of the processors combined. For instance, if a program runs in perfect parallelism on 2 cores for 15 minutes, the real time would be 15 minutes, but the user time would be 30 minutes.

The following output shows the results for running the prime number program above twice. The first set of results (approximately 7.5 minutes for both real and user time) used only a single thread. The second set used 4 threads (best choice for the quad-core system used).

$ time ./prime
From 2 to 99999999: 5761455
===============================================
Total number of primes less than 100000000: 5761455

real	7m39.361s
user	7m38.650s
sys	0m0.325s

[... recompile to use 4 threads ...]

$ time ./prime 
From 2 to 24999999: 1565927
From 25000000 to 49999999: 1435207
From 50000000 to 74999999: 1393170
From 75000000 to 99999999: 1367151
===============================================
Total number of primes less than 100000000: 5761455

real	3m24.030s
user	9m49.405s
sys	0m0.438s

First, note that the parallel version took more total CPU time (almost 10 minutes); this kind of overhead is typical, as there are overhead performance penalties for using multiple threads. Second, the real time was only about 1/3 of the user time, not 1/4. In practice, no system ever achieves perfect parallel execution. Chapter 9 will explain the limits of parallelism in more detail.