openmp on the altix
The following are some timing results on the Altix. The test case is a matrix multiplication; the runtime scales as N**3, where N is the side length of the matrices. The numbers are averaged over 5 runs.
N | Threads | Speedup | Scale factor | Raw time (s) |
500 | 1 | 1.00 | 1.00 | 0.32 |
500 | 2 | 1.49 | 1.49 | 0.21 |
500 | 4 | 2.91 | 1.71 | 0.11 |
500 | 8 | 2.44 | 1.35 | 0.11 |
500 | 16 | 2.91 | 1.31 | 0.11 |
500 | 32 | 1.46 | 1.08 | 0.21 |
500 | 64 | 1.46 | 1.07 | 0.21 |
1000 | 1 | 1.00 | 1.00 | 9.99 |
1000 | 2 | 1.69 | 1.69 | 5.93 |
1000 | 4 | 2.86 | 1.69 | 3.23 |
1000 | 8 | 4.66 | 1.67 | 2.09 |
1000 | 16 | 8.05 | 1.68 | 1.17 |
1000 | 32 | 8.08 | 1.52 | 1.05 |
1000 | 64 | 9.53 | 1.46 | 0.95 |
2000 | 1 | 1.00 | 1.00 | 134.38 |
2000 | 2 | 1.79 | 1.79 | 75.21 |
2000 | 4 | 3.26 | 1.81 | 38.96 |
2000 | 8 | 5.85 | 1.80 | 22.89 |
2000 | 16 | 10.95 | 1.82 | 12.28 |
2000 | 32 | 13.35 | 1.68 | 8.03 |
2000 | 64 | 18.32 | 1.62 | 6.81 |
4000 | 1 | 1.00 | 1.00 | 1139.32 |
4000 | 2 | 1.77 | 1.77 | 640.28 |
4000 | 4 | 3.24 | 1.80 | 348.68 |
4000 | 8 | 5.90 | 1.81 | 190.78 |
4000 | 16 | 11.03 | 1.82 | 102.81 |
4000 | 32 | 16.39 | 1.75 | 63.89 |
4000 | 64 | 22.61 | 1.68 | 44.12 |
The important consideration is the scale factor and how well it holds up as the number of threads increases. Here the scale factor is the effective speedup per doubling of the thread count, i.e. speedup**(1/log2(threads)), as the snippet below illustrates. As a point of reference, an Intel article offers this quote: "More typical code will have a lower limit; 1.7x-1.8x are generally considered very good speedup numbers for code run on two threads" (http://cache-www.intel.com/cd/00/00/31/64/316421_316421.pdf). The C code used on the Altix is further below. We are still having problems with FiPy on the Altix because distutils is missing. There is a start-up cost associated with spawning threads, so it is recommended that threads be kept alive for the duration of a program's run; this may be difficult to achieve with weave.
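A small Python illustration of the scale-factor computation (not part of the timing harness, just the formula spelled out):

from math import log

def scale_factor(speedup, threads):
    # effective speedup per doubling of the thread count (threads > 1;
    # the single-thread rows are 1.00 by definition)
    return speedup ** (1.0 / log(threads, 2))

print scale_factor(10.95, 16)  # ~1.82, matching the N=2000, 16-thread row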
Note: these results still need to be checked against completely unthreaded code (a build without OpenMP) to gauge the overhead introduced by threading.
/******************************************************************************
 * FILE: omp_mm.c
 * DESCRIPTION:
 *   OpenMP Example - Matrix Multiply - C Version
 *   Demonstrates a matrix multiply using OpenMP. Threads share row
 *   iterations according to the schedule selected at run time; a fixed
 *   schedule (static, chunk) was also tried.
 * AUTHOR: Blaise Barney
 * LAST REVISED: 06/28/05
 ******************************************************************************/
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

int main (int argc, char *argv[])
{
    int tid, nthreads, i, j, k;
    int N = 4000;

    /* Matrices are stored as flat N*N arrays indexed as [i * N + j]. */
    double *a = malloc(N * N * sizeof(double));
    double *b = malloc(N * N * sizeof(double));
    double *c = malloc(N * N * sizeof(double));
    printf("finished allocating\n");

    /*** Spawn a parallel region explicitly scoping all variables ***/
    #pragma omp parallel shared(a,b,c,nthreads) private(tid,i,j,k)
    {
        tid = omp_get_thread_num();
        if (tid == 0)
        {
            nthreads = omp_get_num_threads();
            printf("Starting matrix multiply example with %d threads\n", nthreads);
        }

        /*** Initialize matrices ***/
        #pragma omp for schedule (runtime)
        for (i = 0; i < N; i++)
            for (j = 0; j < N; j++)
                a[i * N + j] = i + j;

        #pragma omp for schedule (runtime)
        for (i = 0; i < N; i++)
            for (j = 0; j < N; j++)
                b[i * N + j] = i * j;

        #pragma omp for schedule (runtime)
        for (i = 0; i < N; i++)
            for (j = 0; j < N; j++)
                c[i * N + j] = 0;

        /*** Do matrix multiply sharing iterations on the outer loop ***/
        #pragma omp for schedule (runtime)
        for (i = 0; i < N; i++)
            for (j = 0; j < N; j++)
                for (k = 0; k < N; k++)
                    c[i * N + j] += a[i * N + k] * b[k * N + j];
    } /*** End of parallel region ***/

    free(a);
    free(b);
    free(c);
    return 0;
}
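Since the loops use schedule(runtime), the schedule and chunk size are read from the OMP_SCHEDULE environment variable and the thread count from OMP_NUM_THREADS. Presumably the code is compiled with something like gcc -fopenmp omp_mm.c -o omp_mm, though the exact compiler invocation used on the Altix may differ.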
openmp/weave timings
A matrix multiplication in weave scales well with OpenMP; the code is here, and the observed speedup with two threads is almost perfect. A second piece of code, involving large array multiplications of size N, has the following speedups with two threads.
N | Speedup |
1E7 | 1.51 |
1E6 | 1.37 |
1E5 | 1.39 |
1E4 | 1.0 |
Note that the number of Python-level loop repetitions was increased in inverse proportion to the array size, so each row represents a comparable amount of total work.
The question remains whether we can get speedups for the smaller arrays typically used in FiPy.
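The linked code is not reproduced here, but the sketch below shows the general shape of such a weave/OpenMP kernel, assuming scipy.weave and an OpenMP-capable gcc (see the build notes in the next section). The array size, flags, and loop body are illustrative assumptions, not the timed code.

# A sketch of an OpenMP array multiplication through scipy.weave.
# The -fopenmp/-lgomp flags assume gcc; set OMP_NUM_THREADS to choose
# the thread count.
from numpy import empty, random
from scipy import weave

N = int(1e6)
a = random.random(N)
b = random.random(N)
c = empty(N)

code = """
#pragma omp parallel for
for (int i = 0; i < N; i++)
    c[i] = a[i] * b[i];
"""

weave.inline(code, ['a', 'b', 'c', 'N'],
             extra_compile_args=['-fopenmp'],
             extra_link_args=['-lgomp'],
             compiler='gcc')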
openmp and weave
The following steps are required to get OpenMP working with weave. I've tried this on poole and rosie, following the steps below.
1) Install mpfr version 2.3.1:

$ ../configure --prefix=${USR}
$ make
$ make install
2) Get gcc version 4.3 or 4.2.4 (when it is released). A version this new is needed; otherwise you will get the error "ImportError: libgomp.so.1: shared object cannot be dlopen()ed". libgomp was not set up for dynamic loading in earlier OpenMP-capable gcc versions.
3) Create a separate build directory and configure with

$ ../configure --prefix=${USR} --disable-multilib

where USR points to a directory in your local space.
4) Build and install:

$ make
$ make install
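With the new gcc first on your PATH, a quick way to confirm that weave can compile and dynamically load an OpenMP region is something like the sketch below (the flags are assumptions matching the gcc build above):

# Sketch: verify that weave-compiled OpenMP code loads and runs.
# Prints the number of threads in a parallel region; control it with
# the OMP_NUM_THREADS environment variable.
from scipy import weave

code = """
int n = 0;
#pragma omp parallel
{
    #pragma omp master
    n = omp_get_num_threads();
}
return_val = n;
"""

nthreads = weave.inline(code,
                        headers=['<omp.h>'],
                        extra_compile_args=['-fopenmp'],
                        extra_link_args=['-lgomp'],
                        compiler='gcc')
print "threads:", nthreads

If this runs without the libgomp dlopen error from step 2, the toolchain is set up correctly.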
Streamlines
How to make a contour plot of the streamlines for a flow field in 2D?
Given a velocity field $\vec{u} = (u, v)$, we need a field $\psi$ (the stream function) that is constant along directions tangent to the velocity field. This leads to the equation,

$$ \vec{u} \cdot \nabla \psi = u \frac{\partial \psi}{\partial x} + v \frac{\partial \psi}{\partial y} = 0. $$

The relationships $u = \partial \psi / \partial y$ and $v = -\partial \psi / \partial x$ satisfy the equation. From the above equations we can say,

$$ \nabla^2 \psi = \frac{\partial u}{\partial y} - \frac{\partial v}{\partial x}. $$

Solving this Poisson equation gives the stream function. If we have no inflow conditions on regular domain walls, then we can choose $\psi = 0$ on these walls. Since these walls have no circulation, we have made the choice that $\psi = 0$ represents no circulation, which is sensible. The circulation orientation is given by the sign of $\psi$.
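A minimal sketch of solving this Poisson equation in FiPy (using the current FiPy API; the mesh size and the solid-body-rotation test velocity are illustrative assumptions):

# Recover the stream function psi from a cell-centered velocity field (u, v)
# by solving laplacian(psi) = du/dy - dv/dx with psi = 0 on the walls.
from fipy import Grid2D, CellVariable, DiffusionTerm

nx = ny = 50
mesh = Grid2D(nx=nx, ny=ny, dx=1.0 / nx, dy=1.0 / ny)
x, y = mesh.cellCenters

# Placeholder velocity: solid-body rotation about the domain center.
u = CellVariable(mesh=mesh, value=-(y - 0.5))
v = CellVariable(mesh=mesh, value=x - 0.5)

psi = CellVariable(mesh=mesh, name="stream function")
psi.constrain(0.0, where=mesh.exteriorFaces)  # no-circulation walls

# laplacian(psi) = du/dy - dv/dx (the negative of the vorticity)
eq = DiffusionTerm(coeff=1.0) == u.grad[1] - v.grad[0]
eq.solve(var=psi)

# Contour psi (e.g. with a FiPy Viewer or matplotlib) to draw streamlines.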
2D simulations integrated with subversion

ID | status | condor ID | Jim's movie | Bill's movie |
1 | unstable | | movie | movie |
2 | running | 12095.0 | movie | movie |
3 | stopped | | movie | movie |
4 | running | 12096.0 | movie | movie |
5 | unstable | | movie | movie |
6 | running | 12097.0 | movie | movie |
7 | stopped | | movie | movie |
8 | running | 12100.0 | movie | movie |
9 | unstable | | movie | movie |
10 | running | poole 5545 | movie | movie |
11 | running | poole 9411 | movie | movie |
12 | stopped | | movie | movie |
13 | stopped | | movie | movie |