Posts in category openmp

openmp on the altix

The following are some timing results on the Altix. The test case is a matrix multiplication, so the run time should scale like N**3, where N is the side length of the matrix. The numbers are averaged over 5 runs.

N	Threads	Speedup	Scale factor	Raw time (s)
1000	1	1.00	1.00	9.99
1000	2	1.69	1.69	5.93
1000	4	2.86	1.69	3.23
1000	8	4.66	1.67	2.09
1000	16	8.05	1.68	1.17
1000	32	8.08	1.52	1.05
1000	64	9.53	1.46	0.95
2000	1	1.00	1.00	134.38
2000	2	1.79	1.79	75.21
2000	4	3.26	1.81	38.96
2000	8	5.85	1.80	22.89
2000	16	10.95	1.82	12.28
2000	32	13.35	1.68	8.03
2000	64	18.32	1.62	6.81
4000	1	1.00	1.00	1139.32
4000	2	1.77	1.77	640.28
4000	4	3.24	1.80	348.68
4000	8	5.90	1.81	190.78
4000	16	11.03	1.82	102.81
4000	32	16.39	1.75	63.89
4000	64	22.61	1.68	44.12
500	1	1.00	1.00	0.32
500	2	1.49	1.49	0.21
500	4	2.91	1.71	0.11
500	8	2.44	1.35	0.11
500	16	2.91	1.31	0.11
500	32	1.46	1.08	0.21
500	64	1.46	1.07	0.21
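
The post does not define the scale factor column, but the tabulated values are consistent with the speedup raised to the power 1/log2(threads), that is, the average gain per doubling of the thread count. The sketch below reproduces the column under that assumption; the function names are made up for illustration, and the root is computed by hand only so the snippet needs no math library.

```c
#include <assert.h>

/* k-th root of x (x > 0, k >= 1) by Newton iteration on r^k - x.
   Written out by hand only so the sketch needs no math library. */
static double kth_root(double x, int k)
{
    assert(x > 0.0 && k >= 1);
    double r = (x > 1.0) ? x : 1.0;      /* start at or above the root */
    for (int iter = 0; iter < 500; iter++) {
        double p = 1.0;                  /* p = r^(k-1) */
        for (int i = 0; i < k - 1; i++)
            p *= r;
        double next = ((k - 1) * r + x / p) / k;
        double step = r - next;
        r = next;
        if (step < 1e-13 && step > -1e-13)
            break;
    }
    return r;
}

/* Assumed definition: geometric-mean speedup per doubling of threads.
   threads must be a power of two (1, 2, 4, ...). */
double scale_factor(double speedup, int threads)
{
    assert(threads >= 1 && threads <= (1 << 30));
    int doublings = 0;
    while ((1 << doublings) < threads)
        doublings++;
    assert((1 << doublings) == threads);   /* power of two only */
    return (doublings == 0) ? 1.0 : kth_root(speedup, doublings);
}
```

For example, scale_factor(2.86, 4) is the square root of 2.86, which rounds to the 1.69 listed for N=1000 on 4 threads.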

The important consideration is the scale factor and how well it holds up as the number of threads increases. As a point of reference, I found this quote in an article: "More typical code will have a lower limit; 1.7x-1.8x are generally considered very good speedup numbers for code run on two threads" (http://cache-www.intel.com/cd/00/00/31/64/316421_316421.pdf).

The C code used on the Altix is listed at the end of this post. Still having problems with fipy as distutils is missing.

There is a startup time associated with threading, so it is recommended that threads be kept alive for the duration of a running program. This may be difficult to achieve with weave.
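
To make the point about thread startup concrete, here is a minimal sketch (not the Altix code, and the function names are hypothetical) of the two ways to structure repeated parallel work. The first variant opens a new parallel region on every repeat; the second opens one region and shares each repeat among the threads that already exist. Many OpenMP runtimes keep a pool of threads between regions, so the difference may be small in practice and would need measuring.

```c
#include <assert.h>
#include <stddef.h>

/* Variant 1: a new parallel region is opened on every repeat. */
void scale_respawn(double *x, int n, double factor, int repeats)
{
    assert(x != NULL && n >= 0);
    for (int r = 0; r < repeats; r++) {
        #pragma omp parallel for
        for (int i = 0; i < n; i++)
            x[i] *= factor;
    }
}

/* Variant 2: one parallel region for the whole computation.
   Every thread runs the repeat loop; the "omp for" inside it splits
   the i iterations among the threads that are already running. */
void scale_persistent(double *x, int n, double factor, int repeats)
{
    assert(x != NULL && n >= 0);
    #pragma omp parallel
    {
        for (int r = 0; r < repeats; r++) {
            #pragma omp for
            for (int i = 0; i < n; i++)
                x[i] *= factor;
            /* implicit barrier at the end of "omp for" keeps repeats in step */
        }
    }
}
```

Both variants give the same result; built without OpenMP support the pragmas are ignored and both run serially. The difficulty with weave is that each inline call is closer to the first variant.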

Note:

The results need to be tested against unthreaded code.
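
One way to set up that comparison is sketched below: a plain serial multiply serves as the unthreaded reference, next to an OpenMP version of the same loop, with a small timer helper. The names are hypothetical and this is not the code that produced the table. For a truly unthreaded number the serial version would ideally be built without the OpenMP flag as well.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Wall-clock seconds; falls back to CPU time when built without OpenMP. */
double now_seconds(void)
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    return (double)clock() / CLOCKS_PER_SEC;
#endif
}

/* Unthreaded reference: c = a * b for n x n row-major matrices. */
void matmul_serial(const double *a, const double *b, double *c, int n)
{
    assert(a != NULL && b != NULL && c != NULL && n >= 0);
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
}

/* Same loop nest with the rows shared among OpenMP threads. */
void matmul_omp(const double *a, const double *b, double *c, int n)
{
    assert(a != NULL && b != NULL && c != NULL && n >= 0);
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
}
```

Calling now_seconds() before and after each function gives the two times; the ratio of the serial time to the one-thread OpenMP time shows how much overhead the threaded build carries before any speedup is counted.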

/******************************************************************************
* FILE: omp_mm.c
* DESCRIPTION:
*   OpenMp Example - Matrix Multiply - C Version
*   Demonstrates a matrix multiply using OpenMP. Threads share row iterations
*   according to a predefined chunk size.
* AUTHOR: Blaise Barney
* LAST REVISED: 06/28/05
******************************************************************************/
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>
int main (int argc, char *argv[])
{
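  int   tid, nthreads, i, j, k, chunk;
  int N=4000;
  //  int loop;
  //  int loops=1;
  /* each matrix holds N*N doubles, so size by sizeof(double), not sizeof(double *) */
  double *a;
  a = malloc((size_t)N * N * sizeof(double));
  double *b;
  b = malloc((size_t)N * N * sizeof(double));
  double *c;
  c = malloc((size_t)N * N * sizeof(double));
  if (a == NULL || b == NULL || c == NULL)
    {
    fprintf(stderr, "allocation failed\n");
    return 1;
    }
  printf("finished allocating\n");
  chunk = 10;                    /* loop iteration chunk size (only used by the commented-out static schedules) */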
  /*** Spawn a parallel region explicitly scoping all variables ***/
#pragma omp parallel shared(a,b,c,nthreads,chunk) private(tid,i,j,k)
  {
  tid = omp_get_thread_num();
  if (tid == 0)
    {
    nthreads = omp_get_num_threads();
    printf("Starting matrix multiply example with %d threads\n",nthreads);
    //printf("Initializing matrices...\n");
    }
  /*** Initialize matrices ***/
#pragma omp for schedule (runtime)
  //  #pragma omp for schedule (static, chunk)
  for (i=0; i<N; i++)
    for (j=0; j<N; j++)
      a[i * N + j]= i+j;
  //#pragma omp for schedule (static, chunk)
#pragma omp for schedule (runtime)
  for (i=0; i<N; i++)
    for (j=0; j<N; j++)
      b[i * N + j]= i*j;
#pragma omp for schedule (runtime)
  //#pragma omp for schedule (static, chunk)
  for (i=0; i<N; i++)
    for (j=0; j<N; j++)
      c[i * N + j]= 0;
  /*** Do matrix multiply sharing iterations on outer loop ***/
  /*** Display who does which iterations for demonstration purposes ***/
  //  printf("Thread %d starting matrix multiply...\n",tid);
#pragma omp for schedule (runtime)
  //  #pragma omp for schedule (static, chunk)
  //  for(loop=0; loop<loops; loop++)
    for(i=0; i<N; i++)
      for(j=0; j<N; j++)
        for (k=0; k<N; k++)
          c[i * N + j] += a[i * N + k] * b[k * N + j];
  }   /*** End of parallel region ***/
/*** Print results ***/
//
//printf("******************************************************\n");
//printf("Result Matrix:\n");
//for (i=0; i<NRA; i++)
//  {
//  for (j=0; j<NCB; j++)
//    printf("%6.2f   ", c[i][j]);
//  printf("\n");
//  }
//printf("******************************************************\n");
//printf ("Done.\n");
}
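
Note that the listing uses schedule (runtime), so the loop schedule is not fixed in the code: OpenMP takes it from the OMP_SCHEDULE environment variable (for example OMP_SCHEDULE="static,10") or from omp_set_schedule, and the default when neither is set is up to the implementation. The post does not say which setting was used for the timings. A small helper like the following (hypothetical name, a sketch only) can report what the runtime will actually use.

```c
#include <assert.h>
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Returns the name of the schedule that schedule(runtime) loops will use
   and stores the chunk size in *chunk_out. */
const char *runtime_schedule_name(int *chunk_out)
{
    assert(chunk_out != NULL);
#ifdef _OPENMP
    omp_sched_t kind;
    omp_get_schedule(&kind, chunk_out);
    /* Mask off the top bit, which newer OpenMP versions use as a
       "monotonic" modifier flag on the schedule kind. */
    unsigned base = (unsigned)kind & 0x7fffffffu;
    if (base == (unsigned)omp_sched_static)  return "static";
    if (base == (unsigned)omp_sched_dynamic) return "dynamic";
    if (base == (unsigned)omp_sched_guided)  return "guided";
    if (base == (unsigned)omp_sched_auto)    return "auto";
    return "unknown";
#else
    *chunk_out = 0;
    return "none (built without OpenMP)";
#endif
}
```

Printing this once at startup, next to the thread count, would make each timing run self-describing.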

Posted: 2008-03-28 19:30 (Updated: 2008-03-28 21:02)
Author: wd15
Categories: openmp altix fipy

openmp/weave timings.

A matrix multiplication in weave scales really well with openmp. The code is here. The observed speedup with two threads is almost perfect.

This code, involving large array multiplications of size N, has the following speedups with two threads.

N	Speedup
1E7	1.51
1E6	1.37
1E5	1.39
1E4	1.0

Note that the number of loops in python was increased in inverse proportion to the size of the array.
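
For readers without the linked code, the benchmark can be pictured roughly as follows. This is a plain C stand-in, not the weave code: it assumes the array multiplication is elementwise and that the loop count is chosen to keep the total number of multiplied elements fixed, and both function names are invented for the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Elementwise product out[i] = a[i] * b[i], shared among OpenMP threads. */
void multiply_arrays(const double *a, const double *b, double *out, long n)
{
    assert(a != NULL && b != NULL && out != NULL && n >= 0);
    #pragma omp parallel for
    for (long i = 0; i < n; i++)
        out[i] = a[i] * b[i];
}

/* Number of outer loops for arrays of size n so that the total work
   stays at total_elements (loops scale inversely with n). */
long loops_for_size(long n, long total_elements)
{
    assert(n > 0 && total_elements >= n);
    return total_elements / n;
}
```

Under these assumptions an array of size 1E4 is multiplied 1000 times for every single pass over an array of size 1E7, so the small-array case pays the per-call threading overhead far more often, which would fit the speedup dropping to 1.0 there.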

The question remains whether we can get speedups for the smaller arrays typically used in FiPy.

Posted: 2008-03-24 20:53 (Updated: 2008-03-24 22:23)
Author: wd15
Categories: reactiveWetting openmp fipy weave


Daniel Wheeler Trac Site
