GRAPE newsletter vol.5 (Feb 20, 2004)

Dear Colleagues:

This is the fifth issue of our GRAPE newsletter, a brief summary of
recent developments regarding the GRAPE special-purpose computer for
stellar dynamics.  For further information: "http://www.astrogrape.org".

       +----------------------- CONTENTS:-----------------------------+
       | 1) A Parallel Treecode on GRAPE Systems
       | 2) MicroGRAPE availability
       | 3) SPH on Programmable GRAPE 
       | 4) GRAPE-6 on a 64-bit host
       | 5) Gordon-Bell Prize 2003
       | 6) Reports from the field: GRAPEs in Cambridge
       +--------------------------------------------------------------+

1---------1---------1---------1---------1---------1---------1---------1

 A PARALLEL TREECODE ON GRAPE SYSTEMS

We have developed a parallel treecode that runs on a distributed
memory system consisting of PCs and GRAPE processor boards.  The
parallelization scheme is basically the same as Warren & Salmon's
(1993) Hashed Oct-Tree algorithm. The source code is available from
A. Kawai (kawai@sit.ac.jp) upon request.
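
For readers unfamiliar with the Hashed Oct-Tree scheme: each tree cell is
labelled by a key built from the interleaved bits of its coordinates, so
finding a parent cell, looking a cell up in a hash table, and dividing the
particles among nodes all become integer operations. The sketch below
illustrates only that idea. It assumes coordinates scaled to the unit cube
and splits particles into equal counts (a production code would more
likely balance by work). The function names are hypothetical, and the
sketch is not taken from the Kawai code.

```python
def hot_key(x, y, z, depth):
    """Key of the depth-`depth` cell containing (x, y, z) in [0, 1)^3.

    Bits of the integer cell coordinates are interleaved (x, y, z per level),
    with a leading 1 bit marking the root so keys of different depths differ.
    """
    n = 1 << depth                                  # cells per dimension
    ix, iy, iz = (min(int(c * n), n - 1) for c in (x, y, z))
    key = 1                                         # root placeholder bit
    for level in range(depth - 1, -1, -1):          # most significant level first
        key = (key << 3) | (((ix >> level) & 1) << 2) \
                         | (((iy >> level) & 1) << 1) \
                         | ((iz >> level) & 1)
    return key


def parent_key(key):
    """Dropping the last three bits gives the enclosing (parent) cell."""
    return key >> 3


def split_among_nodes(keys, n_nodes):
    """Sort particles by key and cut the sorted list into n_nodes pieces.

    The keys follow a space-filling (Morton) curve, so each piece is a
    spatially compact domain. Returns lists of particle indices.
    """
    order = sorted(range(len(keys)), key=keys.__getitem__)
    bounds = [len(order) * p // n_nodes for p in range(n_nodes + 1)]
    return [order[bounds[p]:bounds[p + 1]] for p in range(n_nodes)]


if __name__ == "__main__":
    import random
    random.seed(1)
    pts = [(random.random(), random.random(), random.random()) for _ in range(8)]
    keys = [hot_key(*p, depth=10) for p in pts]
    print(split_among_nodes(keys, 2))
```

In a full implementation the same keys also index the hash table of cells
and identify the cells a node must request from other nodes. The sketch
leaves both out.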

Measurements on the MDM system at RIKEN (MDGRAPE-2, COMPAQ DS20, Myrinet)
show good scalability up to 16 nodes. According to a recent report by
T. Fukushige, a single timestep with 32M particles took 61 seconds on a
GRAPE-5 system (GRAPE-5, Pentium4/2.8GHz, Fast Ethernet, 8 nodes). The
same run took 64 seconds on a GRAPE-6 system (GRAPE-6A, Pentium4/2.4GHz,
Fast Ethernet, 8 nodes).

The code has been applied to production runs for dark halo formation
(Fukushige et al. 2003, astro-ph/0306203). In the largest run, 60
million particles were integrated over 4k timesteps. The run took 3
weeks on a GRAPE-5 system (GRAPE-5, Pentium4/1.9GHz, Fast Ethernet).
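
One convenient way to compare these timings with each other, and with the
SPH figure quoted in section 3, is to convert them to particle steps per
second. The short calculation below assumes that "32M" and "60 million"
mean 32 x 10^6 and 60 x 10^6 particles, that "4k" means 4000 timesteps,
and that every particle advances on every step (shared timestep). The node
count of the 3-week run is not given, so its rate is not directly
comparable with the 8-node timings.

```python
SECONDS_PER_WEEK = 7 * 24 * 3600


def particle_steps_per_second(n_particles, n_steps, wall_seconds):
    """Average particle updates per second of wall-clock time
    (assumes a shared timestep: every particle advances each step)."""
    return n_particles * n_steps / wall_seconds


# Figures from the text; the decimal readings of "M" and "4k" are assumptions.
rate_grape5 = particle_steps_per_second(32e6, 1, 61)
rate_grape6a = particle_steps_per_second(32e6, 1, 64)
rate_halo_run = particle_steps_per_second(60e6, 4000, 3 * SECONDS_PER_WEEK)

for name, rate in [("GRAPE-5, 8 nodes", rate_grape5),
                   ("GRAPE-6A, 8 nodes", rate_grape6a),
                   ("GRAPE-5 dark-halo run", rate_halo_run)]:
    print(f"{name:22s} {rate:10.3g} particle steps/s")
```

Under these assumptions the two 8-node runs come out at roughly 5 x 10^5
particle steps/s and the 3-week production run at roughly 1.3 x 10^5.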

2---------2---------2---------2---------2---------2---------2---------2

 MICROGRAPE AVAILABILITY

Now it is possible to let your desktop computer run at more than
100 Gflops on real calculations, just by adding a single card to the
computer!

The MicroGRAPE, which used to be called "Baby GRAPE-6" in our previous
letter (http://grape.artcompsci.org/newsletter/030526.html), is now
available from Hamamatsu Metrix (http://www.metrix.co.jp/, contact:
Ken Yazawa, "k_yazawa@metrix.co.jp"). It is a single PCI card with 4
GRAPE-6 chips, which you can install directly in the computer on your
desk, instead of adding a special box. It offers a peak speed of 120
Gflops, and more than 50% of the peak speed is achieved already for
10K particles. Early adopters of the MicroGRAPE include the National
Observatory of Japan, Osaka University, and the Tokyo Institute of Technology.
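
To put "50% of peak for 10K particles" in wall-clock terms, recall that a
direct-summation code evaluates N^2 pairwise interactions per full force
calculation. The estimate below assumes the convention of 57 floating-point
operations per interaction (force plus its time derivative) that is
commonly used for GRAPE-6 speed figures. Treat that constant and the
function name as assumptions, not as a specification of the card.

```python
FLOPS_PER_INTERACTION = 57   # assumed counting convention for force + jerk
MICROGRAPE_PEAK = 120e9      # peak speed quoted above, in flop/s


def full_force_time(n, efficiency, peak=MICROGRAPE_PEAK):
    """Seconds for one O(N^2) force evaluation on all n particles
    at the given fraction of peak speed."""
    interactions = n * n
    return interactions * FLOPS_PER_INTERACTION / (efficiency * peak)


t_10k = full_force_time(10_000, 0.5)
print(f"N = 10,000 at 50% of peak: {t_10k:.3f} s per full force evaluation")
```

With individual (block) timesteps only some of the particles are updated
on each step, so the actual cost per step is usually well below this
full-N estimate.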

3---------3---------3---------3---------3---------3---------3---------3

 SPH ON PROGRAMMABLE GRAPE

Toshiyuki Fukushige and Tsuyoshi Hamada have been working on the
so-called PROGRAPE (programmable GRAPE), which uses FPGA (field-
programmable gate array) chips to implement pipeline processors.  The
advantage of the FPGA technology is that, as its name suggests, it can
be configured to implement different pipeline processors (the drawback
is that the circuit size you can fit on an FPGA chip is much smaller
than what you can fit on a full-custom LSI chip).  This makes an
FPGA-based GRAPE an ideal platform for complex and moderately
expensive operations such as SPH interactions.
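
The simplest "SPH interaction" is the density sum
rho_i = sum_j m_j W(|r_i - r_j|, h_i). The sketch below evaluates it in
plain Python. It assumes the standard 3-D M4 cubic-spline kernel with
support 2h, because the text does not say which kernel PGII implements,
and the function names are hypothetical. A hardware pipeline would
evaluate the same pairwise expression for a stream of neighbour
particles j.

```python
import math


def w_cubic(r, h):
    """3-D M4 cubic-spline kernel with compact support 2h (assumed kernel)."""
    q = r / h
    sigma = 1.0 / (math.pi * h ** 3)            # 3-D normalization
    if q < 1.0:
        return sigma * (1.0 - 1.5 * q ** 2 + 0.75 * q ** 3)
    if q < 2.0:
        return sigma * 0.25 * (2.0 - q) ** 3
    return 0.0


def density(i, pos, mass, h):
    """rho_i = sum_j m_j W(|r_i - r_j|, h_i), including the self term j = i."""
    xi = pos[i]
    return sum(m * w_cubic(math.dist(xi, xj), h[i]) for xj, m in zip(pos, mass))


if __name__ == "__main__":
    pos = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (0.0, 1.5, 0.0)]
    mass = [1.0, 1.0, 1.0]
    h = [1.0, 1.0, 1.0]
    print([density(i, pos, mass, h) for i in range(len(pos))])
```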

Fukushige and Hamada completed the PROGRAPE-II (PGII) system, which
is built as a processor module directly mountable on existing
GRAPE-6/MicroGRAPE motherboards. They also developed a software system,
PGPG, specialized for the design of pipeline processors.

Using PGPG, they implemented an SPH simulation code on a PGII system.
With 4 PGII modules on MicroGRAPE cards on a single P4 host, they
achieved a speed of 2 x 10^4 particle steps/sec, which compares
favorably with the speed of existing parallel SPH codes with 8 or
more processors.

We hope to make the PGII commercially available in the near future.

4---------4---------4---------4---------4---------4---------4---------4

 GRAPE-6 ON A 64-BIT HOST

For those of you who are tired of the 4GB memory limit and the 2GB
file size limit of the Linux operating system on Intel IA32 processors
and who wish that the Alpha were still around, there is good news: the
GRAPE-6 is back in the 64-bit world! The GRAPE-6 library was adapted
and tested on an AMD Athlon 64 system with TurboLinux 8 for AMD64.
Currently, the recommended motherboard for an Athlon 64 system is
one with a VIA K8T800 chipset.  Besides the advantage of 64-bit
operation, the Athlon 64 offers a noticeable speed advantage over Intel P4
systems.  Library archives dated after Jan 13, 2004 support AMD64.

We plan to test the driver software on SuSE (soon) and RedHat (in the
not-too-distant future).

We currently have no plan to port the GRAPE-6 driver to Itanium 2
systems.


5---------5---------5---------5---------5---------5---------5---------5

 GORDON-BELL PRIZE 2003

On Nov 20, 2003, the GRAPE-6 was awarded the Gordon Bell prize for
special achievement.  The calculations submitted were a simulation of
a triple black hole system embedded in the center of a galaxy simulated
by 2M stars (achieving 35.3 Tflops) and a simulation of a Kuiper-Belt
region with 1.8M bodies (at 33.4 Tflops).

6---------6---------6---------6---------6---------6---------6---------6

 REPORTS FROM THE FIELD: GRAPES IN CAMBRIDGE

 by Sverre Aarseth (email: sverre@ast.cam.ac.uk)

The interest in special-purpose computers at the Institute of Astronomy
dates back to 1994, when we bought the 8-pipe version HARP-2. Just
one year later, the 88-pipe HARP-3 was obtained with a research grant.
HARP-2 proved to be extremely reliable over the subsequent seven years,
whereas HARP-3 was sometimes affected by intermittent hanging. HARP-2
was used exclusively by Sverre Aarseth for code development and star
cluster modeling, typically with 10,000 stars and 5% hard primordial
binaries. The early implementation of stellar evolution and associated
processes was carried out by Chris Tout. A number of visits by Seppo
Mikkola (Turku) led to several new algorithms for studying binaries as
well as strong interactions in subsystems. Several visits by Rosemary
Mardling (Monash) resulted in a self-consistent treatment of tidal
circularization, as well as a semi-analytical stability criterion for
triples and higher multiplicities. The resulting code, called NBODY4, has
been the workhorse for nearly all the GRAPE-related work at Cambridge
and several other institutions. In addition to the main projects
described below, several other local researchers have also made use of
our facilities.
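
The semi-analytical stability criterion for triples is usually quoted
(Mardling & Aarseth 2001) as a condition on the outer pericentre distance
R_p in units of the inner semi-major axis a_in:
R_p / a_in > C [(1 + q_out)(1 + e_out) / (1 - e_out)^(1/2)]^(2/5), with
C = 2.8 and q_out = m3 / (m1 + m2). The sketch below encodes that commonly
quoted form together with an empirical inclination factor (1 - 0.3 i/pi).
The function names are hypothetical, and the exact form used inside NBODY4
should be checked against the code before serious use.

```python
import math


def critical_ratio(e_out, q_out, incl=0.0, c=2.8):
    """Critical value of R_p / a_in for a hierarchical triple.

    q_out = m3 / (m1 + m2); incl is the mutual inclination in radians
    (0 = coplanar prograde).
    """
    base = c * ((1.0 + q_out) * (1.0 + e_out) / math.sqrt(1.0 - e_out)) ** 0.4
    return base * (1.0 - 0.3 * incl / math.pi)   # empirical inclination factor


def is_stable(a_in, a_out, e_out, q_out, incl=0.0):
    """True if the outer pericentre lies beyond the critical distance."""
    if not 0.0 <= e_out < 1.0:
        return False                 # outer orbit not bound: no stable hierarchy
    r_peri = a_out * (1.0 - e_out)   # outer pericentre distance R_p
    return r_peri / a_in > critical_ratio(e_out, q_out, incl)


if __name__ == "__main__":
    # Equal-mass inner binary with a third star of the same mass (q_out = 0.5).
    for a_out in (3.0, 5.0, 10.0):
        print(a_out, is_stable(1.0, a_out, 0.3, 0.5))
```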

The somewhat more powerful HARP-3 (peak 20 Gflops for N=40,000) was
first used heavily by Douglas Heggie and Enrico Vesperini (Edinburgh).
For his Ph.D. thesis, Jarrod Hurley refined the synthetic stellar
evolution and extended the metallicity to low globular cluster values.
He also performed a series of open cluster simulations in order to
model M67 with special emphasis on its blue straggler population.
During its last few years, until its retirement in 2002, HARP-3 was
kept busy by Mark Wilkinson, who studied the dynamics of globular clusters
and compared the results with our HST observations of young LMC
clusters.

A standard GRAPE-6 board (48 virtual pipes) was acquired with a PPARC
grant in 2001. For the host we use a 2.0 GHz Pentium 4. During his
visit last year, Jun Makino made some improvements to the GRAPE
library and introduced the Intel Fortran compiler. In particular, the
facility to obtain only the nearest neighbour was highly beneficial
for unperturbed KS procedures. After leaving the Institute, Jarrod
Hurley has continued to improve the treatment of astrophysical
processes, such as tides and spin-orbit coupling in binaries.

Early investigations include a re-examination of the M67 model based
on 10,000 single particles and 10,000 hard primordial binaries. The
formation and evolution of hierarchical systems in such models have
also been examined: up to 50 stable systems may exist at the same
time after about 1 Gyr. Although the formation rate is small, this is
compensated by long lifetimes. When external perturbations trigger
internal instability, the result is often very large escape velocities.
The major effort this year has been a star cluster simulation with
95,000 single particles and 5,000 hard binaries. The binary fraction
was preserved throughout, and the cluster survived for more than 18 Gyr
before disruption. At the same time, Jarrod Hurley has been studying similar
models on several GRAPE-6 boards in New York. An identical version of
the code is maintained at both sites so that we can work as a team. By now, a large
data set is awaiting analysis.

Mark Wilkinson has also been active with the GRAPE-6. He has examined
why older LMC clusters display a significantly larger spread in core
radii than the younger ones. It is well known that wind loss from a
population of high-mass stars can produce expansion. However, detailed
analysis of HST data has ruled out intercluster variations in the IMF
for intermediate-age and old LMC clusters. This implies that the
initial luminosity functions were very similar. Simulations of
5,000 stars with up to 50% hard binaries were performed both in
Cambridge and New York. The external potential was modelled as a point
mass, with the clusters on circular or elliptical orbits; for the latter,
the initial outer cluster radii corresponded to the tidal radii at
pericentre. Since the true mass distribution is more extended, this
constitutes an upper limit to the possible effects of tides on LMC
clusters. Starting the calculations at apocentre leads to an initial
cluster expansion, driven by a combination of mass loss from stellar
evolution and two-body effects in the core. The results
are presented in terms of observable quantities, with the core radii
obtained using the HST reduction pipeline. The effect of the external
tidal field experienced by the clusters on non-circular orbits is
negligible on timescales of less than 1 Gyr. The presence of primordial
binaries leads to some expansion of core radii. However, the magnitude
of this effect is insufficient to explain the observed trend even when
large variations in the binary fraction are considered. In the future,
the evolution of clusters in time-varying tidal fields will continue to
be a focus of research. Other projects include cluster interactions and
low-mass binary black holes in clusters. 
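
The report does not say how the tidal radius at pericentre was computed.
For a cluster of mass m on an orbit of eccentricity e about a point mass M,
one standard choice is King's (1962) expression
r_t = R_p [m / (M (3 + e))]^(1/3). The sketch below simply evaluates it.
Its use here is our assumption, not a description of the runs, and the
function name is hypothetical.

```python
def king_tidal_radius(r_peri, m_cluster, m_host, e):
    """King (1962) tidal radius at pericentre for a cluster of mass m_cluster
    on an orbit of eccentricity e about a point mass m_host (same units)."""
    if not 0.0 <= e < 1.0:
        raise ValueError("orbit must be bound: 0 <= e < 1")
    return r_peri * (m_cluster / (m_host * (3.0 + e))) ** (1.0 / 3.0)


if __name__ == "__main__":
    # Hypothetical numbers: cluster/host mass ratio 1e-6, R_p in arbitrary units.
    for e in (0.0, 0.5, 0.9):
        print(e, king_tidal_radius(1.0, 1.0, 1.0e6, e))
```

For e = 0 this reduces to the circular-orbit Jacobi radius (m/3M)^(1/3).
An eccentric orbit gives a somewhat smaller limiting radius for the same
pericentre distance.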

These relatively small-N models have been extended by Dougal Mackey as
part of his Ph.D. thesis, again with the focus on the core radii of LMC
clusters. The GRAPE-6 has allowed some of the first realistic 
simulations of LMC clusters, of ages up to ~2 Gyr. Two models have
been fully analysed. The first constitutes a "control" run, consisting
of a cluster of 100,000 single stars with masses selected from an
appropriate IMF. The initial conditions have been carefully 
constructed so that the cluster structure matches that observed for 
young LMC clusters. The initial mass, central density, and scale (core)
radius are also appropriate for young LMC objects. The second model
is identical except for a hard binary fraction of 10%. These binary
stars are centrally concentrated so that the local binary fraction
is 35% within ~2 core radii. The advantage of such large cluster models
is that they may be "observed" in an identical manner to the genuine 
HST measurements, and directly comparable parameters may be derived.
Scaling uncertainties are also circumvented. The core radius evolution
of these two clusters has been measured. No core expansion is evident,
although the effects of mass segregation are clear. The result from
the small models is upheld -- specifically that the heating effect
of a population of hard binary stars is insufficient to explain the
observed trend in core radius for LMC clusters. Additional large 
simulations have been carried out, but are yet to be fully analysed.
The first of these concerns the heating effects of a small population
of massive (50 solar mass) binary black holes. Preliminary 
investigation shows significant core expansion on the correct time-
scale, but still of insufficient magnitude to account for the
observed trend.

Last year, a large fraction of our GRAPE-6 time was devoted to the
binary black hole project. The time-transformed leapfrog scheme
developed by Seppo Mikkola was implemented in a new code called NBODY7
(IAU Symp. 208). Initial models containing two merging clusters with a
central massive body in each core were studied for long time intervals
and a total membership of up to 240,000 particles. The massive binary
was integrated until GR coalescence due to extremely high eccentricity.
Although the inclusion of GR two-body terms is time-consuming for small
periods, the treatment has proved successful. However, a wider range of
initial conditions needs to be explored. The new MicroGRAPE appears to
be an ideal system for such problems because of the heavy load on the
host during the late stages.
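
A time-transformed leapfrog takes equal steps in a fictitious time s, so
the physical timestep shrinks automatically near close approaches. The
sketch below shows the idea in its simplest, logarithmic-Hamiltonian form
for a single Kepler pair (relative coordinates, G M = 1). It is our own
illustration, not the NBODY7 implementation, and it omits chain
coordinates, external perturbers and the GR terms. Function names are
hypothetical.

```python
import math


def kepler_energy(x, v):
    """Specific orbital energy of the relative orbit (G*M = 1)."""
    return 0.5 * sum(c * c for c in v) - 1.0 / math.hypot(*x)


def kepler_logh_leapfrog(x, v, h, n_steps):
    """Drift-kick-drift leapfrog in fictitious time s for a Kepler pair.

    Drift: dt = (h/2) / (T + B), with T = v^2/2 and B = -E (constant here).
    Kick:  dt = h / U, with U = 1/r, so the physical step scales with r.
    Returns the final position, velocity and elapsed physical time.
    """
    x, v = list(x), list(v)
    b = -kepler_energy(x, v)     # B = -E; T + B equals U on the exact orbit
    t = 0.0

    def drift(ds):
        nonlocal t
        dt = ds / (0.5 * sum(c * c for c in v) + b)
        for k in range(3):
            x[k] += dt * v[k]
        t += dt

    for _ in range(n_steps):
        drift(0.5 * h)
        r = math.hypot(*x)
        dt = h * r                       # h / U with U = 1/r
        for k in range(3):
            v[k] -= dt * x[k] / r ** 3   # acceleration a = -x / r^3
        drift(0.5 * h)
    return x, v, t


if __name__ == "__main__":
    e = 0.9                              # start at apocentre of an a = 1 orbit
    x0 = (1.0 + e, 0.0, 0.0)
    v0 = (0.0, math.sqrt((1.0 - e) / (1.0 + e)), 0.0)
    x, v, t = kepler_logh_leapfrog(x0, v0, 1e-3, 13_000)
    err = abs(kepler_energy(x, v) / kepler_energy(x0, v0) - 1.0)
    print(f"t = {t:.3f}, relative energy error = {err:.2e}")
```

Because the physical step scales with r, the integrator passes through
pericentre of even a highly eccentric orbit without the step-size collapse
of an ordinary leapfrog. The scheme in NBODY7 differs in its details, and
this sketch only conveys the principle.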

0---------0---------0---------0---------0---------0---------0---------0

           Piet Hut and Jun Makino
           (submissions to: piet@astrogrape.org or grape@astrogrape.org)


HOW TO (UN)SUBSCRIBE:  send an email message to grape@astrogrape.org
                       with Subject: (un)subscribe

00--------00--------00--------00--------00--------00--------00--------00

