New AI benchmarks added to test hardware strength

Voltaire Staff
Mar 28, 2024

MLCommons, an artificial intelligence engineering consortium, has unveiled a new set of benchmarks and results that evaluate how quickly hardware can run AI applications and respond to users.

The latest benchmarks, introduced on Wednesday, specifically gauge the efficiency of AI chips and systems in processing data-rich models to produce swift responses.

The findings offer valuable insights into the speed at which AI applications like ChatGPT can deliver responses to user inquiries.

"Today, MLCommons announced new results from our industry-standard MLPerf Inference v4.0 benchmark suite, which delivers industry standard machine learning (ML) system performance benchmarking in an architecture-neutral, representative, and reproducible manner," the consortium said.

One of the two new benchmarks, based on Llama 2, evaluates how quickly large language models can handle question-and-answer scenarios. Developed by Meta Platforms, Llama 2 boasts 70 billion parameters.

Mitchelle Rasquinha, co-chair of the MLPerf Inference working group, said, "In terms of model parameters, Llama 2 is a dramatic increase to the models in the inference suite. Dedicated task-forces worked around the clock to set up the benchmarks and both models received competitive submissions. Congratulations to all!"

The other new benchmark in the MLPerf suite, which tests text-to-image generators, is based on Stability AI's Stable Diffusion XL model.

"This popular model is used to create compelling images through a text-based prompt. By generating a high number of images, the benchmark is able to calculate metrics such as latency and throughput to understand overall performance," MLCommons said.
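As a rough illustration of how latency and throughput can be derived from a batch of timed requests, here is a minimal Python sketch. It is not MLCommons' benchmark harness; the function names `summarize` and `time_requests`, and the `generate` callable standing in for an image generator, are hypothetical.

```python
import statistics
import time


def summarize(latencies, elapsed):
    """Turn per-request latencies (seconds) and total wall time into metrics."""
    return {
        "mean_latency_s": statistics.mean(latencies),
        "max_latency_s": max(latencies),
        # Throughput: completed requests per second of wall-clock time.
        "throughput_per_s": len(latencies) / elapsed,
    }


def time_requests(generate, prompts):
    """Call `generate` once per prompt, timing each call and the whole run."""
    latencies = []
    start = time.perf_counter()
    for prompt in prompts:
        t0 = time.perf_counter()
        generate(prompt)  # hypothetical stand-in for a text-to-image model
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    return summarize(latencies, elapsed)
```

Latency describes how long a single request takes, while throughput describes how many requests a system completes per second; a server that processes requests in parallel can post high throughput even when individual latencies are long.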

Servers equipped with Nvidia's H100 chips, built by companies such as Alphabet's Google, Supermicro, and Nvidia itself, emerged as clear winners in both new benchmarks for their raw performance. Intel also participated by submitting a design utilising its Gaudi2 accelerator chips, with the company describing the results as "solid."

Various server builders presented designs utilising Nvidia's less powerful L40S chip.

Krai, a server builder, entered the image generation benchmark with a design featuring Qualcomm's AI chip, known for its significantly lower power consumption compared to Nvidia's advanced processors.

While raw performance is crucial, it's not the sole determining factor in deploying AI applications.

Advanced AI chips consume substantial energy, posing a significant challenge for AI companies. The key lies in deploying chips that deliver strong performance while minimising energy consumption, a balance that remains paramount in the field.