GRAPE newsletter vol.8 (Dec 16 2006)

Dear Colleagues:

This is the eighth issue of our GRAPE newsletter, a brief summary of
recent developments regarding the GRAPE special-purpose computer for
stellar dynamics.  For further information: "http://www.astrogrape.org".

       +----------------------- CONTENTS:-----------------------------+
       | 1) Status of the GRAPE-DR Project
       | 2) K & F Computing Research
       | 3) MDGRAPE-3 wins a Gordon Bell prize
       | 4) Winter School for N-Body Simulations at NAOJ, Tokyo
       | 5) GRAPE Clusters in Heidelberg, Rochester, Amsterdam and Kiev
       | 6) Reports from the field: GRAPEs in Tsukuba
       +--------------------------------------------------------------+

1---------1---------1---------1---------1---------1---------1---------1

 STATUS OF THE GRAPE-DR PROJECT

The GRAPE-DR project is a five-year project started in FY 2004
(officially in July 2004) to develop a new generation of GRAPE
hardware.  As reported in the previous newsletter, the processor
chip for GRAPE-DR, the GRAPE-DR chip, is a massively parallel SIMD
processor with 512 processing elements, each with a theoretical peak
speed of 1 Gflops.  We received the first sample of the chip in May,
and as of November 2006 a prototype board with a single GRAPE-DR chip
and a PCI-X interface is in operation, running a few applications
including GRAPE-6 emulation.  The current prototype board operates
correctly (well, not perfectly reliably yet; we are working on this
issue) on a 500 MHz system clock, giving an actual peak performance
of 512 Gflops per chip.

We (here "we" actually means Toshi Fukushige) are currently working on
the pre-mass-production version of the board, with a single GRAPE-DR
chip and a PCI-Express (8-lane) interface.  We hope to finish this
board by spring 2007, and if everything goes well, K&F Computing (see
below) will start to accept purchase orders by mid-2007.

We plan to finish a big machine (4096 chips, 2 Pflops) by the end of
2008.
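
The peak figures above follow from simple multiplication.  The short
Python sketch below reproduces them.  The constant and function names
are ours, chosen for illustration, and the number of operations per
clock cycle is only what the quoted 1 Gflops at 500 MHz implies, not
a statement about the chip design.

```python
# Illustrative arithmetic only; names are hypothetical, the numbers
# are the ones quoted in the text above.
PES_PER_CHIP = 512           # processing elements per GRAPE-DR chip
GFLOPS_PER_PE = 1.0          # theoretical peak per processing element
CLOCK_GHZ = 0.5              # 500 MHz system clock
CHIPS_IN_BIG_MACHINE = 4096  # planned large system


def chip_peak_gflops(pes=PES_PER_CHIP, gflops_per_pe=GFLOPS_PER_PE):
    """Peak speed of one chip in Gflops."""
    return pes * gflops_per_pe


def flops_per_cycle_per_pe(gflops_per_pe=GFLOPS_PER_PE,
                           clock_ghz=CLOCK_GHZ):
    """Operations per clock cycle implied by the quoted figures."""
    return gflops_per_pe / clock_ghz


def system_peak_pflops(chips=CHIPS_IN_BIG_MACHINE):
    """Peak speed of a machine with the given number of chips, in Pflops."""
    return chips * chip_peak_gflops() / 1.0e6


if __name__ == "__main__":
    print(f"chip peak:   {chip_peak_gflops():.0f} Gflops")
    print(f"per PE:      {flops_per_cycle_per_pe():.0f} flops/cycle")
    print(f"big machine: {system_peak_pflops():.2f} Pflops")
```

The 4096-chip machine comes out at about 2.1 Pflops, which the text
rounds to 2 Pflops.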

2---------2---------2---------2---------2---------2---------2---------2

 K & F COMPUTING RESEARCH

K&F Computing Research Co. is a new company manufacturing GRAPE-series
hardware.  It was established by Toshiyuki Fukushige in March 2006 as
a spin-off from the University of Tokyo.  K&F Computing Research Co.
currently offers the GRAPE-7, an FPGA-based acceleration board with a
PCI-X bus.  The GRAPE-7 mainly serves as a successor to the GRAPE-5,
and can hold up to 96 GRAPE-5-equivalent force-calculation pipelines
operating at 100 MHz.
More information is available at www.kfcr.jp/index-e.html.

3---------3---------3---------3---------3---------3---------3---------3

 MDGRAPE-3 WINS A GORDON BELL PRIZE

A simulation using MDGRAPE-3 won the Gordon Bell Prize Honorable Mention
for Peak Performance at Supercomputing 2006 (Nov. 11-17, Tampa, Florida).
The team obtained a sustained performance of 185 TFLOPS on the 860-TFLOPS
MDGRAPE-3 system for a simulation of the aggregation of peptides derived
from yeast Sup35.  Protein aggregation is considered to play an important
role in several serious diseases of the central nervous system, such as
Alzheimer's, Parkinson's and Creutzfeldt-Jakob disease.  The yeast Sup35
protein, often called the yeast prion, is also known to form
minicrystals.  Since even a small heptapeptide derived from this protein
can form minicrystals, it is often used to study the aggregation of
proteins.  Here the team performed a large-scale simulation of the
aggregation of these peptides in aqueous solution.

For molecular dynamics simulations with 17 million atoms using a cutoff
of 44.5 angstrom, a sustained performance of 185 TFLOPS was obtained.
This figure is the performance after conversion to the corresponding
value on a general-purpose computer.  The actual hardware rate was
370 TFLOPS, but it is halved because the MDGRAPE system calculates an
action and its reaction separately, whereas a code on a general-purpose
computer can compute each pair once and use it for both.  The efficiency
of hardware usage has already reached 45% for the simulation.  This was
achieved by a team at RIKEN and Intel K. K. that includes former Komaba
GRAPE Yard members Tetsu Narumi, Yousuke Ohno, and Makoto Taiji.  The
paper is available at
http://sc06.supercomputing.org/schedule/event_detail.php?evid=5197
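
The factor of two can be illustrated by counting pairwise force
evaluations.  The Python sketch below is ours and uses hypothetical
function names.  It ignores the cutoff, which changes the number of
pairs but not the ratio between the two ways of counting.

```python
def evaluations_without_third_law(n):
    """Each particle sums over all others, so action and reaction
    are computed separately (the hardware-style count)."""
    return n * (n - 1)


def evaluations_with_third_law(n):
    """Each unordered pair is computed once and applied to both
    particles (the general-purpose-style count)."""
    return n * (n - 1) // 2


def equivalent_tflops(hardware_tflops, n=1000):
    """Convert a hardware rate to its general-purpose equivalent.

    The ratio is the same for any n >= 2; n=1000 is arbitrary.
    """
    ratio = evaluations_with_third_law(n) / evaluations_without_third_law(n)
    return hardware_tflops * ratio


if __name__ == "__main__":
    print(equivalent_tflops(370.0), "TFLOPS equivalent")
```

Applied to the 370 TFLOPS hardware rate, this gives the 185 TFLOPS
quoted above.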

4---------4---------4---------4---------4---------4---------4---------4

 WINTER SCHOOL FOR N-BODY SIMULATIONS AT NAOJ, TOKYO

The Center for Computational Astrophysics at the National Astronomical
Observatory of Japan will hold a three-day school on N-body simulations
for graduate students and post-docs during March 6-8 at the Mitaka
campus in Tokyo.  The school covers the basics of stellar dynamics,
algorithms for the integration of ordinary differential equations, and
practice with collisionless N-body codes on GRAPE-7.  Lecturers include
Jun Makino, Eiichiro Kokubo, Hiroshi Daisaka, and others.

5---------5---------5---------5---------5---------5---------5---------5

 GRAPE CLUSTERS IN HEIDELBERG, ROCHESTER, AMSTERDAM AND KIEV

The GRAPE clusters in Heidelberg and Rochester (see GRAPE newsletter
Vol. 7) have been successfully used to demonstrate a possible new path
to bridge the so-called final parsec problem and bring supermassive
binary black holes close enough together, within a few hundred million
years, to trigger a relativistic coalescence (see Berczik, Merritt &
Spurzem 2006, ApJ 642, L21).  For such simulations a sustained speed of
3.2 Tflop/s (of a peak of 4 Tflop/s) has been achieved with 4 million
particles in a direct N-body simulation.  This again demonstrates the
extremely efficient use of silicon in GRAPE as compared to
general-purpose CPUs (where such simulations typically reach only some
5-10% of the peak speed).  The code used (called phi-GRAPE) was written
by Peter Berczik (in C) and Stefan Harfst (in Fortran) and is in the
public domain at
ftp://ftp.ari.uni-heidelberg.de/pub/staff/berczik/phi-GRAPE-C/ .
The code is a simple 4th-order Hermite N-body code without
regularisation or a neighbour scheme; work to deploy NBODY6++ to the
GRAPE clusters is still in progress.
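
For readers unfamiliar with the method, the following is a minimal
sketch of one 4th-order Hermite predictor-corrector step, written in
Python for readability.  It is not taken from phi-GRAPE, which is
written in C and Fortran and uses the GRAPE hardware for the force
loop.  The function names, the units (G = 1), the single shared time
step and the softening parameter eps2 are all our assumptions.

```python
import math


def acc_jerk(m, x, v, eps2=0.0):
    """Direct-summation acceleration and jerk, with G = 1.

    m: list of masses; x, v: lists of [x, y, z] lists.
    eps2: softening length squared (an assumed parameter).
    """
    n = len(m)
    a = [[0.0] * 3 for _ in range(n)]
    j = [[0.0] * 3 for _ in range(n)]
    for i in range(n):
        for k in range(i + 1, n):
            dx = [x[k][d] - x[i][d] for d in range(3)]
            dv = [v[k][d] - v[i][d] for d in range(3)]
            r2 = sum(c * c for c in dx) + eps2
            r3 = r2 * math.sqrt(r2)
            rv = sum(dx[d] * dv[d] for d in range(3))
            for d in range(3):
                da = dx[d] / r3
                dj = (dv[d] - 3.0 * rv * dx[d] / r2) / r3
                a[i][d] += m[k] * da
                a[k][d] -= m[i] * da
                j[i][d] += m[k] * dj
                j[k][d] -= m[i] * dj
    return a, j


def hermite_step(m, x, v, dt, eps2=0.0):
    """Advance all particles by one shared time step dt."""
    n = len(m)
    a0, j0 = acc_jerk(m, x, v, eps2)
    # Predictor: Taylor series up to the jerk term.
    xp = [[x[i][d] + v[i][d] * dt + a0[i][d] * dt**2 / 2
           + j0[i][d] * dt**3 / 6 for d in range(3)] for i in range(n)]
    vp = [[v[i][d] + a0[i][d] * dt + j0[i][d] * dt**2 / 2
           for d in range(3)] for i in range(n)]
    # Evaluate acceleration and jerk at the predicted state.
    a1, j1 = acc_jerk(m, xp, vp, eps2)
    # Corrector: velocity first, then position using the new velocity.
    v1 = [[v[i][d] + (a0[i][d] + a1[i][d]) * dt / 2
           + (j0[i][d] - j1[i][d]) * dt**2 / 12
           for d in range(3)] for i in range(n)]
    x1 = [[x[i][d] + (v[i][d] + v1[i][d]) * dt / 2
           + (a0[i][d] - a1[i][d]) * dt**2 / 12
           for d in range(3)] for i in range(n)]
    return x1, v1


def total_energy(m, x, v):
    """Kinetic plus (unsoftened) potential energy, with G = 1."""
    n = len(m)
    kinetic = sum(0.5 * m[i] * sum(c * c for c in v[i]) for i in range(n))
    potential = 0.0
    for i in range(n):
        for k in range(i + 1, n):
            potential -= m[i] * m[k] / math.dist(x[i], x[k])
    return kinetic + potential


if __name__ == "__main__":
    # Equal-mass circular binary with unit separation (period 2*pi).
    m = [0.5, 0.5]
    x = [[-0.5, 0.0, 0.0], [0.5, 0.0, 0.0]]
    v = [[0.0, -0.5, 0.0], [0.0, 0.5, 0.0]]
    e0 = total_energy(m, x, v)
    for _ in range(628):
        x, v = hermite_step(m, x, v, 0.01)
    print("relative energy error:", abs((total_energy(m, x, v) - e0) / e0))
```

The pair loop here uses each interaction for both particles, which is
convenient on a general-purpose CPU.  The scheme needs only the
acceleration and its first time derivative, so the whole force loop
can be handed to an accelerator in a single call.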

In a collaboration between the GRAPE sites in Amsterdam, Rochester and
Heidelberg we have published an extensive profiling and benchmarking
study of such simple parallel N-body codes, and developed a timing model
(Harfst, Gualandris, Merritt, Spurzem, Portegies Zwart, Berczik, 2006,
New Astronomy, in press, astro-ph/0608125).  The timing model includes
computations on the parallel hosts and on the GRAPEs, communication
between the nodes, and communication between host and GRAPE.  It can be
used to extrapolate the expected performance of the next generation of
such clusters; we conclude that a 50 Tflop/s system with of order 200
nodes could be built for a very moderate price.  Accelerator boards (be
it GRAPE-DR or other reconfigurable hardware boards) are currently
developing so fast that such new clusters will probably use hardware
different from the presently common GRAPE-6a.

We hope that more news can be reported in the next GRAPE newsletter.
For example, on Nov. 24 the layout of the new reconfigurable MPRACE-2
board at the Univ. of Mannheim (Germany) was completed, and the first
prototype boards are being assembled, with delivery expected in the
second week of January.  MPRACE-2 operates on a 250 MHz clock with a
Xilinx Virtex-4 FPGA chip and supports the PCI-Express interconnect.
It will support pipelines specially adapted to the Ahmad-Cohen neighbour
scheme and smoothed particle hydrodynamics.  The effective speed reached
with such a board strongly depends on the details of the algorithm,
e.g. the word length used in every operation.  The word length can be
freely adjusted, which is the advantage of using reconfigurable chips
(FPGA).  Codes are already being developed with emulators of the MPRACE
board, and benchmarking will start as soon as the first prototype
boards are available.
 
A final piece of good news is that the Ukrainian Academy of Sciences
has decided to support the Main Astronomical Observatory in Kiev with
a grant to build an 8-node GRAPE cluster as well.  Its construction is
supervised by Peter Berczik, and our aim is to establish integrated
software development and networking between the Ukrainian and the other
GRAPE clusters.

6---------6---------6---------6---------6---------6---------6---------6

 REPORTS FROM THE FIELD: GRAPES IN TSUKUBA

FIRST Project at Center for Computational Sciences, University of Tsukuba

by Masayuki Umemura (CCS, Univ. of Tsukuba, Japan)

The FIRST project aims to elucidate the origin of first generation
objects in the Universe through large-scale radiation hydrodynamic
simulations with a Heterogeneous Multi-Computer System (HMCS).  For
this HMCS, we have developed a new processor board with four GRAPE-6
chips, which is called "Blade-GRAPE".  This board is embedded in each
node of a PC cluster.  We have constructed a 240-node system with
Blade-GRAPEs, the peak performance of which is 33.3 TFLOPS.  This
project is supported as Specially Promoted Research under the
Grants-in-Aid for Scientific Research over four years (2004-2007),
with funding of 329.5 million yen (US$2.8 million) approved by the
Ministry of Education, Culture, Sports, Science and Technology (MEXT)
in Japan.  The core members of the project are Masayuki Umemura
(Project Leader), Taishi Nakamoto (TiTec), Hiroyuki Hirashita, Hajime
Susa (U Rikyo), Masao Mori (U Senshu), Yoshiaki Kato, Jun'ichi Sato,
Tamon Suwa, Taisuke Boku, Daisuke Takahashi, and Osamu Tatebe.

The Blade-GRAPE is a new type of GRAPE board, designed for a full-size
PCI slot.  It is embedded in a 2U 19-inch rack-mountable server PC (HP
ProLiant DL380 G4) that has dual CPUs in an SMP configuration.  The
Blade-GRAPE is directly connected via the PCI-X bus, and occupies the
space of two PCI-X bus slots.  The Blade-GRAPE's electric power is
supplied from the PCI-X bus (3.3V) as well as from the cluster
server's +12V supply (54W).  The Blade-GRAPE consists of four GRAPE-6
chips, and the theoretical peak performance is 136.8 GFLOPS.  We have
constructed a 240-node system using 240 Blade-GRAPEs, which is called
the "FIRST simulator".  Each server PC is equipped with a multi-port
Gigabit Ethernet NIC to be connected to a special interconnection
network with commodity Ethernet switches.  The total theoretical peak
speed of the FIRST simulator is 33.3 TFLOPS (33 TFLOPS in Blade-GRAPEs
and 3 TFLOPS in host nodes).  The Blade-GRAPE boards were manufactured
by Hamamatsu Metrics Co. and the 2U servers were procured from Nihon
Hewlett-Packard Co.  Business Search Technology Co., Sumi-Sho Computer
Systems Co., and Bestsystems Inc. also joined in the development of
the system.
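
The Blade-GRAPE figures quoted above can be checked with a few lines
of arithmetic.  The Python sketch below uses our own, hypothetical
names and covers only the Blade-GRAPE part of the system, not the
host nodes.

```python
# Illustrative arithmetic only; the numbers are those quoted above.
CHIPS_PER_BOARD = 4          # GRAPE-6 chips on one Blade-GRAPE
BOARD_PEAK_GFLOPS = 136.8    # theoretical peak of one Blade-GRAPE
BOARDS = 240                 # one Blade-GRAPE per node


def grape6_chip_peak_gflops():
    """Peak speed per GRAPE-6 chip implied by the board figure."""
    return BOARD_PEAK_GFLOPS / CHIPS_PER_BOARD


def blade_grape_total_tflops(boards=BOARDS):
    """Combined peak speed of all Blade-GRAPE boards, in TFLOPS."""
    return boards * BOARD_PEAK_GFLOPS / 1000.0


if __name__ == "__main__":
    print(f"per chip:    {grape6_chip_peak_gflops():.1f} GFLOPS")
    print(f"all boards:  {blade_grape_total_tflops():.1f} TFLOPS")
```

The 240 boards together give about 32.8 TFLOPS, consistent with the
33 TFLOPS quoted for the Blade-GRAPEs.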

Using the FIRST simulator, we are attempting to elucidate the formation
of first generation objects and their link to the first galaxies in the
Universe.  The first generation objects are thought to form in the
so-called cosmic Dark Age at redshifts of 20
