---
title: 'Report: Computer cluster implementation in MPI "Alfred Eckstein"'
author: various
toc: false
date: \today
lang: en-US
otherlangs: de-DE
papersize: a4paper
fontsize: 12pt
documentclass: article
indent: true
fontenc: T1
colorlinks: true
...

The new Max Planck institute "Alfred Eckstein" has high-end computing needs. A typical high-performance computing job would be the computational simulation of neutron stars of different masses collapsing to black holes.

One way to satisfy the scientists' requirements is to buy and maintain a single supercomputer. A much less expensive approach achieves high-performance computing by simulating a supercomputer with the help of cluster computing. In general, a computer cluster is nothing more than many single computers connected via a LAN, usually housed in a single room (the Alfred Eckstein institute calculates with at least 200 computers). Unlike grid computing, clustering follows a much more centralized management approach, a kind of orchestration process between a master and the individual nodes. Control and scheduling are done by specialized software. In a cluster, the nodes usually use similar hardware and perform the same task. Besides high-performance computation, the main benefit of clusters is scalability (fault tolerance, redundancy etc.): we are able to add and remove nodes without shutting down the entire service.

In consultation with the two other project groups, the cluster will be tightly integrated into the institute's network infrastructure: Firstly, the data for analysis and simulation will not be stored in the cluster itself; we need direct I/O exchange with the data server group (there will be special I/O nodes for that). Secondly, the cluster grants access to users/workstations via SSH connections. Scientists who intend to use the computational services/resources of the cluster will therefore get access through special login nodes.
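
As a minimal sketch of this access path, the snippet below runs a command on a login node over SSH from a workstation. The hostname and user name are placeholders, since the actual login nodes are not yet defined:

```python
#!/usr/bin/env python3
"""Sketch: run a command on the cluster through a login node via SSH.

The hostname and user name below are placeholders; the real login
nodes will only be defined once the cluster network is planned.
"""
import subprocess

LOGIN_NODE = "login01.cluster.example"   # hypothetical login node
USER = "scientist"                       # hypothetical account name


def run_on_cluster(command: str) -> str:
    """Execute `command` on the login node and return its output."""
    result = subprocess.run(
        ["ssh", f"{USER}@{LOGIN_NODE}", command],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Simple connectivity check: print the login node's hostname.
    print(run_on_cluster("hostname"))
```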

Obviously, planning a computer cluster is a very complicated task. We really should involve external experts who are experienced in engineering clusters with the necessary care. Expertise is essential here. We have to consider the following points:

- Hardware: Because of the high-end requirements of our scientists, we need to make sure to use modern and highly efficient computers. We also need to ensure that the cluster network itself is very efficient: we require a high-performance network with high bandwidth, low latency, and possibly special hardware for parallel I/O. For the processors we consider Intel Sandy/Ivy Bridge or newer, and for the interconnect InfiniBand or OmniPath.
- Software: A cluster needs management software to be able to orchestrate its nodes. Task scheduling and parallelism are essential. Furthermore, users who want to work with the cluster should be offered an interface and an operating system that suit their needs; virtualization is a keyword here. Most of the Top500 supercomputers run on Unix-based operating systems such as Scientific Linux, for various reasons.
- Networking: Nodes in the cluster have to be connected, physically and logically. Login nodes and I/O nodes provide external access. We need to think about topologies; it might be possible to separate the cluster into domains. For highly efficient communication, a fat-tree topology is a modern solution (see the sizing sketch after this list).
- Maintenance: Once the cluster has been planned and integrated, a team of system administrators has to keep the cluster running and support the scientists.
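
To give the fat-tree remark above some substance, the following Python sketch computes how many switches and hosts a classic k-ary fat tree provides. The port count k used in the example is an assumption for illustration, not a hardware decision:

```python
#!/usr/bin/env python3
"""Sketch: sizing a k-ary fat tree built from identical k-port switches.

A k-ary fat tree has k pods, each with k/2 edge and k/2 aggregation
switches, plus (k/2)^2 core switches; it connects k^3/4 hosts at full
bisection bandwidth.  The value of k below is only an example.
"""


def fat_tree_size(k: int) -> dict:
    """Return host and switch counts of a k-ary fat tree (k must be even)."""
    if k % 2 != 0:
        raise ValueError("k must be even")
    edge = aggregation = k * (k // 2)   # k pods, k/2 switches per layer
    core = (k // 2) ** 2
    hosts = k ** 3 // 4
    return {"hosts": hosts, "edge": edge, "aggregation": aggregation,
            "core": core, "switches": edge + aggregation + core}


if __name__ == "__main__":
    # With 10-port switches a fat tree already serves 250 hosts --
    # enough for the institute's planned minimum of 200 computers.
    print(fat_tree_size(10))
```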

Looking at our budget, there is not much flexibility. In theory it is possible to create a cluster with just two computers and scale it up easily. But computation in astrophysics is very demanding, as mentioned above: we are in the region of thousands of cores, and we require over a thousand TFlop/s of performance. Regardless of our budget, we have been planning to choose a Linux system, because it is well suited to scientific work and has well-known networking capabilities; the cost factor is also very low in the Unix world. A really important aspect is the cluster's maintenance: we need experienced employees who know networking and cluster computing, and we should not cut corners here. So the only real open criterion is the technical equipment, the hardware our cluster will run on.

Finally, we would like to refer to our colleagues from the Max Planck Computing and Data Facility (MPCDF) and to describe two of their HPC systems.[^1] That gives us an idea of what our cluster might look like:

| Processor | Nodes | Cores/node | Clock (GHz) | Memory/node (GB) | Login nodes | I/O            |
|-----------|-------|------------|-------------|------------------|-------------|----------------|
| Skylake   | 2888  | 40         | 2.4         | 96; 192; 768     | 2           | 1 x 5 PetaByte |

Table: Cobra (127520 cores with 483 TB main memory and 10 PetaFlop/s peak performance; interconnection via OmniPath; 5 domains)

| Processor  | Nodes     | Cores/node | Clock (GHz) | Memory/node (GB) | Login nodes | I/O             |
|------------|-----------|------------|-------------|------------------|-------------|-----------------|
| Ivy; Sandy | 3500; 610 | 20; 16     | 2.8; 2.6    | 64; 20; 128      | 8           | 26 x 5 PetaByte |

Table: Hydra (83000 cores with 280 TB main memory and 1.7 PetaFlop/s peak performance; interconnection via InfiniBand; 5 domains)
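
As a rough plausibility check of such figures (and of our own requirement of over a thousand TFlop/s), the theoretical peak performance can be estimated as nodes × cores per node × clock rate × floating-point operations per cycle. The sketch below applies this to the Hydra numbers from the table; the value of 8 double-precision FLOP per cycle (AVX on Sandy/Ivy Bridge) is our assumption, not a figure taken from the MPCDF pages:

```python
#!/usr/bin/env python3
"""Sketch: back-of-the-envelope peak performance estimate.

    peak = nodes * cores_per_node * clock * flop_per_cycle

The FLOP-per-cycle value (8 double-precision FLOP/cycle for AVX on
Sandy/Ivy Bridge) is an assumption for illustration.
"""


def peak_tflops(nodes: int, cores: int, clock_ghz: float,
                flop_per_cycle: int = 8) -> float:
    """Theoretical peak in TFlop/s."""
    return nodes * cores * clock_ghz * flop_per_cycle / 1_000.0


if __name__ == "__main__":
    # Hydra: Ivy Bridge and Sandy Bridge partitions from the table above.
    ivy = peak_tflops(3500, 20, 2.8)
    sandy = peak_tflops(610, 16, 2.6)
    print(f"Hydra estimate: {ivy + sandy:.0f} TFlop/s")  # ~1770 TFlop/s
```

The result, roughly 1.8 PFlop/s, is in the same range as the 1.7 PetaFlop/s peak performance quoted above.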

[^1]: See <http://www.mpcdf.mpg.de/services/computing>
