
AI large models' computing-power anxiety: does the answer ultimately lie with the CPU?

Jessie March 20, 2024

The concept of AI large models, ignited by ChatGPT, has now been hot for over a year. Far from cooling off, the AI wave keeps producing ever more disruptive applications. Since the beginning of 2024, AI PCs, AI phones, and AI edge devices have gone on sale, and Sora sparked large-scale discussion over the Chinese New Year.


The AI field is in constant flux. Yet as large models' demand for computing power grows rapidly, the chips being produced today struggle to meet the industry's needs.


In the AI boom, accelerators such as GPUs and ASICs are the protagonists. Yet no data center can do without the CPU; the two are often likened to fish and water. Last December, Intel officially released its fifth-generation Xeon Scalable processors (codenamed Emerald Rapids), which hide a number of surprising AI capabilities.


For AI, is there only one choice?


As everyone knows, facing this new era of large models, global technology companies have turned their attention to AI chips, especially GPUs. But GPU production is directly constrained by HBM supply and 2.5D packaging capacity. This has bottlenecked an already tight GPU supply, creating a serious imbalance between supply and demand.


Meanwhile, the key to the current large-model "arms race" is piling on parameters, achieving stronger intelligence through sheer brute force. Even in the face of rising AI chip prices, most companies will still buy whatever they can; missing this window could mean losing competitiveness.


For large data centers where every chip is already running flat out, if the existing hardware could deliver more AI performance, would you still need to purchase another batch of GPUs?


In fact, we have fallen into a fixed mindset: the GPU is not the only option for running AI, and today's CPUs already deliver very strong AI performance.


AsiaInfo Technologies uses the CPU as the hardware platform in its OCR-AIRPA solution, quantizing models from FP32 to INT8/BF16 to increase throughput and accelerate inference with acceptable accuracy loss. Labor costs fall to between one-fifth and one-ninth of the original, while efficiency improves 5 to 10 times.
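AsiaInfo's actual OCR-AIRPA pipeline is not public, but the FP32-to-INT8 step it describes can be illustrated with stock PyTorch post-training dynamic quantization. A minimal sketch, with the toy model as a stand-in assumption:

```python
# A minimal sketch of FP32 -> INT8 post-training dynamic quantization with
# stock PyTorch. The model below is an illustrative stand-in for an OCR
# recognition head, not AsiaInfo's actual network.
import torch
import torch.nn as nn

model_fp32 = nn.Sequential(
    nn.Linear(512, 256),
    nn.ReLU(),
    nn.Linear(256, 128),
).eval()

# Dynamic quantization rewrites the listed layer types to INT8 kernels:
# weights are quantized ahead of time, activations on the fly.
model_int8 = torch.ao.quantization.quantize_dynamic(
    model_fp32, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 512)
with torch.inference_mode():
    print(model_int8(x).shape)  # same outputs, smaller and faster matmuls
```

Because only weights change representation while activations are quantized per batch, this trades a small accuracy loss for throughput, which matches the "acceptable accuracy loss" framing above.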


It is not just the internet and communications sectors that have changed. AI in pharmaceuticals is seen as the hope for ending the "double ten law" of drug development (roughly ten years and a billion dollars per new drug), with large models such as AlphaFold2 regarded as the most important algorithms. Since last year, the Xeon Scalable platform has increased AlphaFold2's end-to-end throughput by 23.11x, and the fourth-generation Scalable processors have multiplied that figure by a further 3.02x.


Using the CPU for AI inference is proving feasible. The fifth-generation Xeon Scalable processor can now run inference on models of up to 20 billion parameters with latency below 100 milliseconds, without any discrete accelerator. A general-purpose processor that is more capable of AI acceleration has arrived.


How does the CPU make AI run?


Many people wonder how the fifth-generation Xeon, a general-purpose processor, can carry AI workloads at all. Beyond the cores themselves, the key lies in a series of accelerators built into the chip.


This design can be compared to a practice now popular in MCUs (microcontrollers): building in a DSP or NPU to offload part of the AI work, so that AI tasks run more efficiently and consume less power. Xeon follows a similar principle.


This kind of design appeared in earlier Xeon Scalable processors, but at the time it attracted little attention, because there were far fewer AI tasks to run.


Looking specifically at the fifth-generation Xeon, the built-in Intel® AVX-512 and Intel® AMX (Intel® Advanced Matrix Extensions) are the key. Both accelerators were already present in the fourth-generation Xeon; in the fifth generation, AMX adds support for new FP16 instructions, and mixed AI workload performance improves 2 to 3 times.
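From application code, AMX is typically reached indirectly: on fourth- and fifth-generation Xeon, PyTorch's oneDNN backend dispatches BF16 matmuls to AMX tile instructions. A minimal sketch, with the toy model as an assumption:

```python
# A minimal sketch of steering matmuls onto AMX via BF16 autocast in stock
# PyTorch (oneDNN picks AMX kernels on supporting Xeons). The model here is
# a placeholder, not a benchmark workload.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(1024, 1024), nn.GELU(), nn.Linear(1024, 1024)
).eval()
x = torch.randn(8, 1024)

with torch.inference_mode(), torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    y = model(x)  # Linear layers run as BF16 matmuls, AMX-accelerated where available

print(y.dtype)  # torch.bfloat16
```

The point is that no kernel-level code is required; the instruction set is exposed through the framework's existing mixed-precision path.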


In addition, the fifth-generation Xeon's baseline performance has improved, giving it more headroom for AI loads: the core count rises to 64, single-core performance is higher, and every core carries AI acceleration; new I/O technologies (CXL, PCIe 5.0) arrive alongside faster UPI links.


According to industry analysts, the biggest difficulty for CPUs in large-model inference is not compute but memory bandwidth. The fifth-generation Xeon raises memory speed from 4800 MT/s to 5600 MT/s, nearly triples L3 cache capacity, and scales from single-socket up to eight-socket configurations, giving large models on Xeon a solid foundation.
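A rough back-of-envelope shows why bandwidth dominates: each generated token must stream essentially all model weights from memory once, so tokens per second is bounded by bandwidth divided by model size. The numbers below (8 channels of DDR5-5600 per socket, a 20B-parameter model in BF16) are illustrative assumptions, not measured figures:

```python
# Upper-bound estimate for memory-bandwidth-bound token generation.
CHANNELS = 8                 # assumed channels per socket
MTS = 5600                   # DDR5-5600: mega-transfers per second
BYTES_PER_TRANSFER = 8       # 64-bit channel width
bandwidth_gb_s = CHANNELS * MTS * BYTES_PER_TRANSFER / 1000   # ~358 GB/s

params = 20e9                # 20B-parameter model
bytes_per_param = 2          # BF16
model_gb = params * bytes_per_param / 1e9                     # 40 GB of weights

tokens_per_s = bandwidth_gb_s / model_gb                      # ~9 tokens/s
print(f"{bandwidth_gb_s:.0f} GB/s -> ~{tokens_per_s:.1f} tokens/s/socket ceiling")
```

Under these assumptions the ceiling lands near 9 tokens per second per socket, i.e. on the order of 100 ms per token, which is consistent with the sub-100 ms latency claim above and explains why the jump to 5600 MT/s matters more than raw FLOPS.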


In terms of numbers: at the same thermal design power, the fifth-generation Xeon delivers 21% higher average performance than its predecessor, and 87% higher than the third-generation products. On top of the general performance gains, it brings a 42% improvement in AI inference performance over the previous generation.


Among its range of built-in accelerators, Intel® Trust Domain Extensions (Intel® TDX) provides isolation and confidentiality at the virtual machine (VM) level, enhancing the privacy and control of data.


Beyond that, the fifth-generation Xeon is the most energy-efficient Xeon to date, helping users manage power consumption and reduce their carbon footprint. Software is only one part of the story: it is the combination of innovative technologies and features in the fifth-generation Xeon, working together, that delivers higher efficiency and ultimately lower power consumption.


The future of CPU development will inevitably center on power consumption, which demands effort on every front. First is the process node: as manufacturing advances to Intel 3, Intel 20A, and Intel 18A, power draw will keep falling, with double-digit reductions each generation. Packaging follows the same logic: advanced packaging assembles dies built on different processes into one part through a chiplet architecture, so that only the relevant area is active for a given operation, naturally reducing consumption. Finally, there is optimization for specific workloads.


Sometimes adjusting the architecture of an application can also minimize power consumption. For example, suppose there are 20 large models in total, each with a three-month training cycle, requiring 1,000 machines at 10,000 watts apiece. If only 5 of those models actually need training and the remaining 15 do not, that alone saves 75% of the power. Adjusting the application's architecture can therefore sometimes cut power more effectively than any hardware measure.
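A quick sanity check of that example, taking the article's stated numbers as the assumptions:

```python
# The article's scenario: 20 candidate models, 1,000 machines at 10 kW each,
# ~3-month (2,160 h) training cycles, only 5 models actually worth training.
machines, kw_per_machine, hours = 1_000, 10, 3 * 30 * 24
kwh_per_model = machines * kw_per_machine * hours   # 21,600,000 kWh = 21.6 GWh

total_models, needed_models = 20, 5
saved_fraction = (total_models - needed_models) / total_models

print(f"energy per model: {kwh_per_model / 1e6:.1f} GWh, saved: {saved_fraction:.0%}")
```

At roughly 21.6 GWh per training run, skipping 15 of 20 runs avoids hundreds of gigawatt-hours, which is why the application-level decision dwarfs per-chip efficiency gains.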


"With the continuous rapid development of computing power, how to achieve energy saving and carbon reduction of data centers, change the image of the 'electric tiger', there is a higher demand for seeking to adopt renewable energy and more environmentally friendly technologies." Chen Baoli, vice president and general manager of Intel's data center and Artificial Intelligence Group in China, raised such concerns about the era of AI large models, and the fifth generation Xeon is the key to energy conservation and carbon reduction.


At the same time, Intel is pursuing a series of product and technology innovations, such as more efficient cooling and intelligent energy management systems, to help both new and existing data centers save energy and cut emissions, while working with Chinese partners to bring these applications to market.


How Intel supports AI development


The software ecosystem has played a crucial role in the GPU's rise, as the industry's well-known CUDA shows. For Intel, software has long been a strength, and continued investment in its software stack gives the fifth-generation Xeon a natural advantage in AI.


Intel has always emphasized consistency and ease of use, and AI is no exception. Developers can use OpenVINO to realize the vision of "write once, deploy anywhere." The underlying software and libraries Intel develops support its own CPUs, GPUs, IPUs, and AI accelerators through popular frameworks such as PyTorch and ONNX Runtime.
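The "write once, deploy anywhere" idea is visible in how small an OpenVINO inference script is: the target device is a single string. A minimal sketch, where the model path is a placeholder and a static input shape is assumed:

```python
# A minimal OpenVINO inference sketch: the same script targets "CPU", "GPU",
# or "AUTO" by changing one string. "model.xml" is a placeholder path.
import numpy as np
import openvino as ov

core = ov.Core()
model = core.read_model("model.xml")         # IR, ONNX, and other formats
compiled = core.compile_model(model, "CPU")  # swap "CPU" for "GPU" or "AUTO"

# Feed a dummy input matching the model's first input (assumes static shape).
inp = compiled.inputs[0]
x = np.random.rand(*inp.shape).astype(np.float32)

result = compiled(x)[compiled.outputs[0]]
print(result.shape)
```

Swapping the device string is the whole porting story, which is what makes the same model file deployable from client to edge.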


In addition, Intel provides extension libraries for PyTorch and TensorFlow, so developers can keep their default installations and still pick up the latest software acceleration. Users can continue working in PyTorch or TensorFlow, or develop with OpenVINO, and developers coming from different frameworks can build on the same platform.
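For PyTorch, that extension is Intel Extension for PyTorch (IPEX): existing code keeps working, and one optimize() call applies Intel's CPU-side optimizations. A minimal sketch, with the toy model as an illustrative assumption:

```python
# A minimal sketch of Intel Extension for PyTorch: existing PyTorch code is
# unchanged except for one optimize() call. The toy model is a placeholder.
import torch
import torch.nn as nn
import intel_extension_for_pytorch as ipex  # pip install intel-extension-for-pytorch

model = nn.Sequential(nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, 10)).eval()
model = ipex.optimize(model, dtype=torch.bfloat16)  # weight prepacking, BF16 path

x = torch.randn(4, 256)
with torch.inference_mode(), torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    print(model(x).shape)
```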


OpenVINO 2023.1 advances Intel's goal of "any hardware, any model, anywhere," expanding OpenVINO into a complete software environment for optimizing, running, and deploying AI inference across clients and the edge.


"I think ChatGPT technology is not only about human language, English language, but also programming language. As a result, productivity gains can be achieved. You can generate automated code reviews from ChatGPT and other similar technologies. I think there's a lot of opportunity here, but I think it's in the Python programming model that leading companies in the industry are using. It's not new, it's been around for a while and we geeks call it SMLAR technology." Intel experts once shared that.


Simply put, it is a chicken-and-egg relationship: future AI large models will themselves be used to develop AI large models. CUDA has already begun moving in this direction, and Intel is gearing up as well.


At MWC 2024, which wrapped up at the end of February, Intel showed Sierra Forest, an Efficient-core (E-core) processor with up to 288 cores, while Granite Rapids, its Performance-core (P-core) counterpart, is also poised to launch. In the field of AI inference, Xeon is only going to get stronger.


