New Ascend NPU and MindIE Support Capabilities in SmartX AI Infrastructure: Solutions and Evaluations

Recently, SmartX launched the DeepSeek solution based on AI infrastructure, providing a unified infrastructure resource stack for both AI large models and other enterprise business systems, helping enterprises quickly advance the implementation and validation processes of large models. At the same time, considering the needs of users in industries such as finance and healthcare for data compliance and domestic transformation, it is necessary to conduct AI large model testing and usage in a self-controllable environment. SmartX AI infrastructure has added support for Ascend NPU and MindIE platform, and performance evaluations have been conducted based on hyper-converged (virtual machine) and SMTX Kubernetes Service (SKS) environments.

Using Ascend NPU and MindIE platform in SmartX AI infrastructure

Currently, SmartX AI infrastructure is fully compatible with the Ascend ecosystem, supporting users to deploy and use Ascend NPU and MindIE platform based on virtual machines, virtualized Kubernetes, and bare-metal Kubernetes environments, helping users quickly build a full-stack hardware and software domestic AI infrastructure.

Solution Features

High-performance heterogeneous computing: Compatible with general computing and AI-specific loads, building a heterogeneous resource pool.
Flexible and diverse deployment solutions: Supports deployment solutions based on virtual machines, virtualized Kubernetes, and bare-metal Kubernetes, where the bare-metal Kubernetes deployment solution helps users avoid virtualization overhead and achieve optimal performance.
Efficient resource utilization: Dynamic allocation and horizontal scaling of virtual machine resources can be achieved through DRS and virtual machine placement groups, adapting to business load fluctuations and improving resource utilization.
Unified management and simplified operations: Manage virtual machines, containers, and physical machine nodes with a unified control plane, simplifying environment operations.
Full-stack domestic support: From CPU to Ascend NPU, from SmartX AI infrastructure to AI inference MindIE framework, achieving full-stack domestic adaptation of hardware and software, fully meeting the requirements of Xinchuang.

Deployment Solutions

Deployment based on virtual machines

Users can quickly deploy Ascend NPU and MindIE in the SmartX hyper-converged virtual machine environment through the Ascend inference hardware passthrough feature and MindIE image method, rapidly building an AI inference framework based on the Ascend ecosystem to run large language models like DeepSeek. The brief steps are as follows:

1. Infrastructure preparation:

Ascend inference hardware passthrough: Connect Atlas 300I Pro / Atlas 300I Duo for direct use by virtual machines through SmartX hyper-convergence.

Install Ascend driver: Install the Ascend Ascend driver package in the virtual machine to provide hardware support.
Install Docker: Install the Docker runtime environment in the virtual machine to provide the model inference runtime environment.

2. Inference framework installation:

Pull MindIE image: The Ascend image repository provides the MindIE base image, and users can pull the corresponding image file based on their hardware. This image has the basic environment required for model operation, including: CANN, FrameworkPTAdapter, MindIE, and ATB-Models, enabling quick inference of models.
Start MindIE image: Users can start the MindIE runtime environment based on their hardware and model files.

3. Model inference startup:

Configure model inference: Users need to configure the model’s input and output according to their needs, taking a maximum input of 1024 tokens and a maximum output of 4096 tokens as an example:

maxIterTimes = 4096
maxInputTokenLen = 1024
maxSeqLen = maxIterTimes + maxInputTokenLen = 5120

Start inference model: Enter the container and execute the MindIE startup command, allowing users to run large language models like DeepSeek.

Deployment based on SKS

Users can deploy Ascend NPU and MindIE in the SmartX hyper-converged virtual machine Kubernetes environment or bare-metal Kubernetes environment through Ascend inference hardware passthrough/manage bare-metal services and pair with the MindCluster cluster scheduling component, quickly building a cloud-native AI inference framework based on the Ascend ecosystem to run large language models like DeepSeek. The brief steps are as follows:

1. Infrastructure preparation:

Create Ascend inference workload cluster: Users can quickly create an inference workload cluster with Ascend NPU in SKS.

Install Ascend driver: Install the Ascend Ascend driver package in the SKS workload cluster worker nodes to provide hardware support.

2. Deployment of MindCluster cluster scheduling component:

Pull MindCluster component image: The Ascend image repository provides the image files for MindCluster related components, and users can pull the corresponding image files based on the components they need to install.
Deploy MindCluster components: The MindCluster component package provided by Ascend includes essential modules and optional modules, and users can choose to install based on actual business needs.

New Ascend NPU and MindIE Support Capabilities in SmartX AI Infrastructure: Solutions and Evaluations

3. Model inference startup:

MindIE MS image creation: Users can create MindIE MS images based on their Ascend hardware following the guidance provided on the Ascend official website.
Configure MindIE MS deployment script: Ascend provides a MindIE MS deployment script that requires users to configure NPU type, NPU quantity, model files, model parameters, etc.
Start inference model: Run the deployment script to automatically issue business configurations and startup scripts, automatically generate a ranktable containing node device information, and schedule Pods to compute nodes to run large language models like DeepSeek.

Deployment Validation – Integrating with Open WebUI

MindIE is compatible with various RESTful interfaces, and users can integrate with Open WebUI using interfaces compatible with OpenAI. During integration, it is necessary to use the IP address, port, and model name of the MindIE business side, which are defined in the config.json file as “ipAddress”, “port”, and “modelName”, with parameters as follows:

The effect of the integrated DeepSeek-R1 model through Open WebUI is as follows:

Performance of SmartX AI infrastructure deploying Ascend NPU

Environment Configuration

We built a 3-node SmartX HCI hyper-converged cluster, establishing the MindIE inference platform based on the Atlas 300I Duo inference card in a virtual machine environment, deploying DeepSeek-R1-14B, 32B, and 70B models for performance testing.

SMTX OS cluster hardware configuration (the following is for a single node configuration)

MindIE virtual machine configuration

Test Results

The above performance data is for reference. We recommend:

Running the DeepSeek-R1-32B model requires 2 Ascend 300I Duo inference cards. If higher concurrency is needed, it is recommended to configure 4 inference cards.
Running the DeepSeek-R1-70B model requires 4 Ascend 300I Duo inference cards. If higher concurrency is needed, it is recommended to configure 8 inference cards.

Download the eBook “Building Enterprise AI Infrastructure: Technical Trends, Product Solutions, and Testing Validation” to learn more about enterprise deployment validation and application practices in AI large models. New Ascend NPU and MindIE Support Capabilities in SmartX AI Infrastructure: Solutions and Evaluations

Related posts

Leave a Comment Cancel reply