Overview
henry2 is a Linux cluster constructed using IBM and Lenovo blade servers. Blade servers are housed in blade chassis that each hold fourteen blades. The chassis provides power, cooling, management, and network connectivity for the blade servers. [Much of the design of IBM BladeCenter and IBM/Lenovo Flex System occurred, and continues to occur, at IBM's and Lenovo's Research Triangle facilities. A number of the BladeCenter and Flex System design team members are graduates of NC State.]
|Image of IBM BladeCenter E chassis||Diagram of BladeCenter chassis|
When the central HPC service was initiated at NC State in 2003, a hardware strategy was adopted to grow the cluster incrementally rather than to periodically buy large monolithic clusters. The incremental approach was a better match for NC State's HPC funding and staffing. henry2 started with 64 nodes and 128 processors in 2003. Currently henry2 has 898 nodes and more than 9000 processor cores.
henry2 nodes are typically dual-socket Xeon blade servers. There is a mix of Xeon processor models. Nodes typically have 2-4 GB of memory per processor core and a modest-size disk drive that holds the operating system, swap space, and a small local scratch space. Nodes have two Ethernet interfaces (a mix of 1Gb and 10Gb). One of these interfaces is used for a private network connecting the compute nodes to the login nodes and to cluster storage resources. On compute nodes the second Gigabit Ethernet interface is used for a separate private network dedicated to message passing traffic. On login nodes the second interface is used for access to the campus network.
Typical compute node chassis network connections use a 10Gb Ethernet link for the private HPC network connecting compute nodes to login nodes and storage, and four aggregated Gigabit Ethernet links for the message passing network.
The henry2 cluster is assembled using several core Ethernet switches: one dedicated to message passing traffic and the rest for other network connections needed in the cluster. The message passing switch can provide up to 384 Gigabit Ethernet ports and can therefore support up to 96 chassis with four aggregated links per chassis. At fourteen blades per chassis, the current network architecture will support about 1344 nodes (this is an approximate number because some of the chassis have separate low latency networks and have a single Gigabit Ethernet link to the core message passing switch).
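The capacity estimate above is simple arithmetic, and a minimal sketch makes the steps explicit. The port count, links per chassis, and blades per chassis come from the text; the function and constant names are illustrative only and are not part of any henry2 tooling.

```python
# Figures taken from the description of the message passing network.
CORE_SWITCH_PORTS = 384   # Gigabit Ethernet ports on the message passing switch
LINKS_PER_CHASSIS = 4     # aggregated Gigabit Ethernet uplinks per chassis
BLADES_PER_CHASSIS = 14   # blades housed in one chassis


def max_chassis(ports=CORE_SWITCH_PORTS, links=LINKS_PER_CHASSIS):
    """Number of chassis the core switch can serve at `links` uplinks each."""
    return ports // links


def max_nodes(ports=CORE_SWITCH_PORTS, links=LINKS_PER_CHASSIS,
              blades=BLADES_PER_CHASSIS):
    """Upper bound on compute nodes reachable through the core switch."""
    return max_chassis(ports, links) * blades


print(max_chassis())  # 96
print(max_nodes())    # 1344
```

Calling `max_nodes(links=1)` shows why the figure is only approximate: a chassis that uses a single uplink consumes fewer core switch ports, so the port budget stretches further.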
Communication between compute nodes is non-uniform due to henry2's network architecture. Nodes within a single chassis can communicate with no bandwidth restriction (full gigabit or 10 gigabit in each direction) via the chassis Ethernet switches.
|Typical network connections between compute node chassis|
Communication between blades in different chassis is limited to 4 gigabits per second in each direction. For example, if eight nodes in one chassis were exchanging messages with eight nodes in another chassis, each could potentially generate a gigabit per second of traffic in each direction, for a total of 8 gigabits per second in each direction. However, the henry2 network architecture would limit the achievable communication rate to 4 gigabits per second in each direction. The communication could also be affected by message passing traffic from the other blades in each chassis, which share the aggregated links.
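The eight-node example can be worked through with a short sketch. It assumes an idealized model in which the only limits are each node's Gigabit Ethernet link and the four aggregated uplinks; it ignores competing traffic and how link aggregation distributes flows. The function name and parameters are hypothetical.

```python
def inter_chassis_rate_gbps(node_pairs, node_link_gbps=1.0, uplink_gbps=4.0):
    """Return (demanded, achievable) one-way traffic between two chassis.

    node_pairs     -- number of nodes in one chassis talking to the other
    node_link_gbps -- per-node message passing link speed (1 Gb in the text)
    uplink_gbps    -- aggregated chassis uplink capacity (4 x 1 Gb in the text)
    """
    demanded = node_pairs * node_link_gbps
    # The aggregated uplink caps whatever the nodes could otherwise send.
    achievable = min(demanded, uplink_gbps)
    return demanded, achievable


demanded, achievable = inter_chassis_rate_gbps(8)
print(demanded, achievable)   # 8.0 4.0
print(achievable / 8)         # 0.5 Gb/s per node if the uplink is shared evenly
```

Under this model, jobs that span chassis see no uplink bottleneck as long as four or fewer nodes per chassis are communicating across the core switch at full rate.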
In addition to the bandwidth effects of the network design, there are also latency effects. Within a chassis, messages have a single switch to traverse. For communication between chassis, three switches must be traversed (the chassis switch in each chassis plus the core switch). Each network switch adds to the time it takes a message to reach its destination.
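The switch counts above can be expressed as a small sketch. The chassis identifiers and the per-switch delay are placeholders supplied by the caller; no measured latency for henry2 switches is given in the text, so none is assumed here.

```python
def switches_traversed(src_chassis, dst_chassis):
    """Number of Ethernet switches a message crosses between two blades."""
    if src_chassis == dst_chassis:
        return 1  # only the shared chassis switch
    return 3      # source chassis switch, core switch, destination chassis switch


def added_switch_delay(src_chassis, dst_chassis, per_switch_delay):
    """Total delay contributed by switches, in the units of per_switch_delay."""
    return switches_traversed(src_chassis, dst_chassis) * per_switch_delay


print(switches_traversed("chassis-a", "chassis-a"))  # 1
print(switches_traversed("chassis-a", "chassis-b"))  # 3
```

Whatever the actual per-switch delay is, this model puts the switch-induced latency between chassis at three times the latency within a chassis.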
Storage
There are three types of file systems on the henry2 cluster. However, only two types are generally accessed by users: Network File System (NFS) and General Parallel File System (GPFS). Directories such as /home and /usr/local are NFS mounted file systems. /gpfs_share is a GPFS file system.
File systems on henry2 use a variety of disk arrays typically fibre channel attached to file servers. In general, blade servers are used for file servers on henry2. File server blades are located in chassis with an additional IO module that provides fibre channel connections to each blade server.
NFS file systems on henry2 typically have a single file server with a fibre channel connection to a disk array. NFS tends to perform poorly when accessed from a large number of nodes concurrently. The typical henry2 NFS configuration has several bottlenecks, including a single NFS server and a single fibre channel connection to the disk array.
|Typical henry2 NFS configuration|
GPFS supports concurrent parallel access to a single file from multiple nodes. However, GPFS tends not to perform well when accessing large numbers of small files.
|henry2 GPFS configuration for /gpfs_share|
GPFS as configured on henry2 uses multiple servers that create network shared disks (NSDs). These NSDs create a file system that is mounted on each henry2 login and compute node. GPFS IO operations are cached on the local node and then synchronized with the file system. Multiple NSD servers are able to support more concurrent use and also provide resiliency against failures.
Last modified: October 23 2016 13:24:27