AI Cluster

The AI cluster adopts a Kubernetes (K8s) + Docker framework, providing each user with a fully isolated environment. Users can easily upload private images or import images from external community registries such as Docker Hub for development and training. Through a friendly graphical interface, the cluster greatly reduces the learning curve, letting users quickly deploy a computing environment and begin their research computations. Currently, the AI cluster comprises 25 computing nodes with a total of 1,584 CPU cores and 146 GPU cards.
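The image-import workflow described above might look like the following sketch on a user's workstation. The registry address `registry.example.com` and the user/image names are placeholders, not the cluster's actual endpoint:

```shell
# Pull a public image from Docker Hub (an external community registry)
docker pull pytorch/pytorch:latest

# Retag it for the cluster's private registry
# (registry.example.com is a placeholder -- substitute your site's address)
docker tag pytorch/pytorch:latest registry.example.com/myuser/pytorch:latest

# Log in and push, so the image can be selected in the cluster's interface
docker login registry.example.com
docker push registry.example.com/myuser/pytorch:latest
```

Private images uploaded this way can then be used as the base environment for development and training jobs.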

CE Cluster

This cluster uses HPC cluster management software to manage its software and hardware resources, scheduling user-submitted jobs according to current resource usage so as to improve resource utilization and job throughput. It is currently equipped with 29 computing nodes, 1,344 CPU cores, 38 GPU cards, and 8 DCU cards.
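The source does not name the scheduler; assuming a Slurm-like batch system, submitting a job to this kind of cluster might look like the following sketch (the partition, module, and script names are hypothetical):

```shell
#!/bin/bash
#SBATCH --job-name=demo          # job name shown in the queue
#SBATCH --partition=gpu          # hypothetical partition name
#SBATCH --nodes=1                # request one computing node
#SBATCH --ntasks-per-node=8      # 8 CPU cores on that node
#SBATCH --gres=gpu:1             # one GPU card
#SBATCH --time=02:00:00          # wall-clock limit

# Load the software environment (module names are site-specific)
module load cuda

# Run the user's program; the scheduler places it on a free node
srun python train.py
```

The script would be submitted with `sbatch job.sh`; the scheduler queues it until the requested resources become available, which is how the cluster balances utilization against job wait times.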

OD Cluster

The OD cluster is similar to the CE cluster in architecture and usage, but its computing nodes provide a graphical interface and direct Internet access, making it a useful functional complement to the CE cluster. At present, the OD cluster has 27 computing nodes, 782 CPU cores, and 18 GPU cards.

Storage

The parallel file system allows all computing nodes in the cluster to read and write files in the storage system through the same file directory. It can also handle large-scale random I/O, frequent read and write operations, and heavy communication loads. The high-performance computing platform adopts a stable commercial parallel storage system with an available capacity of 2 PB.

Computing network

The platform adopts an InfiniBand HDR (200 Gb/s) network architecture, with nodes connected at a bandwidth of 100 Gb/s.