This cluster runs HPC cluster management software that manages its software and hardware resources and schedules user-submitted jobs according to current resource usage, improving both resource utilization and job execution efficiency. It currently provides 16 computing nodes, 680 CPU cores, and 20 GPU cards.
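The source does not name the cluster management software; assuming a Slurm-like batch scheduler, a job submission script might look like the following sketch (the partition name, resource counts, and `my_program` executable are all placeholders):

```shell
#!/bin/bash
#SBATCH --job-name=demo           # job name shown in the queue
#SBATCH --nodes=1                 # number of computing nodes requested
#SBATCH --ntasks-per-node=8       # CPU cores per node
#SBATCH --gres=gpu:1              # request one GPU card, if needed
#SBATCH --time=02:00:00           # wall-clock time limit
#SBATCH --output=%x-%j.out        # stdout file named jobname-jobid.out

# the scheduler places this script on an allocated node and runs it there
srun ./my_program                 # my_program is a hypothetical executable
```

With a Slurm-like system, the script would be submitted with `sbatch job.sh`; the scheduler holds the job in the queue until the requested resources become available, which is how it balances utilization across users.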
The AI cluster is built on Kubernetes (K8S) and Docker, giving each user a fully independent environment. Users can easily upload private images, or import images from public container libraries in the Docker community, for development and training. Through a friendly graphical interface, the cluster greatly reduces the learning cost, letting users quickly deploy a computing environment and start their scientific computations. Currently, the AI cluster contains 11 computing nodes, with a total of 448 CPU cores and 60 GPU cards.
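As an illustration of importing a community image, a user could extend a public base image with their own dependencies. This is only a sketch: the base image tag, the installed packages, and `train.py` are hypothetical, and the platform's actual upload workflow is not described in the source.

```dockerfile
# Hypothetical example: build a custom training image on top of a
# public PyTorch image pulled from the Docker community (Docker Hub).
FROM pytorch/pytorch:2.1.0-cuda11.8-cudnn8-runtime

# Add the user's own Python dependencies.
RUN pip install --no-cache-dir numpy pandas

# Default command when the container starts; train.py is a placeholder.
CMD ["python", "train.py"]
```

The resulting image could then be uploaded as a private image on the platform, so that each container launched from it provides the same isolated, reproducible environment.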
The OD cluster is similar to the CE cluster in architecture and usage, but its computing nodes provide a graphical interface and can connect directly to the Internet, making it a useful functional complement to the CE cluster. At present, the OD cluster has 30 computing nodes, 930 CPU cores, and 34 GPU cards.
The parallel file system allows all computing nodes in the cluster to read and write files in the storage system through the same directory tree. It also sustains large-scale random I/O, frequent read and write operations, and heavy communication loads. The high-performance computing platform uses the stable commercial edition of BeeGFS parallel storage, with a raw capacity of up to 1 PB.
The platform adopts an InfiniBand (IB) HDR 200G network architecture. The current bandwidth is 100 Gb/s and can be expanded to 200 Gb/s as actual computing demands require, minimizing the impact of network bottlenecks on calculations.