Nvidia's general-purpose GPU chips have once again made a nearly clean sweep of one of the most popular benchmarks for measuring chip performance in artificial intelligence, this time with a new focus on generative AI applications such as large language models (LLMs).
There wasn't much competition.
Systems put together by SuperMicro, Hewlett Packard Enterprise, Lenovo, and others -- packed with as many as eight Nvidia chips -- on Wednesday took most of the top honors in the MLPerf benchmark test organized by the MLCommons, an industry consortium.
The test, measuring how fast machines can produce tokens, process queries, or output samples of data -- known as AI inference -- is the fifth installment of the prediction-making benchmark that has been going on for years.
This time, the MLCommons added two tests representing common generative AI uses. One test measures how fast the chips run Meta's open-source LLM Llama 3.1 405b, one of the larger gen AI programs in common use.
The MLCommons also added an interactive version of the test for Meta's smaller Llama 2 70b. That test is meant to simulate a chatbot, where response time is a factor: the machines are measured on how fast they generate the first token of output from the language model, capturing the need for a quick reply once someone has typed a prompt.
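The figure being measured in that interactive test is essentially "time to first token." Here is a minimal sketch of what such a measurement looks like, assuming a hypothetical streaming generator named stream_tokens standing in for the serving system under test; it is illustrative only, not the MLCommons' official benchmarking harness.

```python
import time

def stream_tokens(prompt):
    """Hypothetical stand-in for a model server's streaming output.

    A real benchmark drives the actual serving system; here we just
    simulate a short delay before each token is produced.
    """
    for token in ["The", " answer", " is", " ..."]:
        time.sleep(0.05)  # simulated per-token generation time
        yield token

def time_to_first_token(prompt):
    """Seconds between sending the prompt and receiving the first token."""
    start = time.perf_counter()
    for _ in stream_tokens(prompt):
        return time.perf_counter() - start
    return float("inf")  # the model produced no output

print(f"Time to first token: {time_to_first_token('What is MLPerf?') * 1000:.0f} ms")
```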
A third new test measures the speed of processing graph neural networks, which handle problems composed of many entities and the relations between them, such as a social network.
Graph neural nets have grown in importance as a component of programs that use gen AI. For example, Google's DeepMind unit used graph nets extensively to make stunning breakthroughs in protein-folding predictions with its AlphaFold 2 model in 2021.
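As a rough illustration of the underlying idea, and not the specific graph model used in the MLPerf test, here is a single message-passing step on a toy four-node graph, where each node updates its features by aggregating those of its neighbors:

```python
import numpy as np

# Toy graph of four entities; A[i][j] = 1 means node i and node j are related.
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)

rng = np.random.default_rng(0)
H = rng.random((4, 3))   # each node starts with a 3-dimensional feature vector
W = rng.random((3, 3))   # shared weight matrix (learned during training; random here)

# One graph-convolution step: aggregate neighbor features, transform, apply ReLU.
H_next = np.maximum(0, A @ H @ W)

print(H_next)  # updated 4x3 node representations after one round of message passing
```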
A fourth new test measures how fast LiDAR sensing data can be assembled into an automobile's map of the road. The MLCommons built its own version of a neural net for the test, combining existing open-source approaches.
The MLPerf competition comprises computers assembled by Lenovo, HPE, and others according to strict requirements for the accuracy of neural net output. Each submitter reports to the MLCommons the best speed its system achieves, measured in output produced per second. For some tasks, the benchmark is instead average latency, how long it takes for a response to come back from the server.
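In rough terms, the two kinds of figures look like this. The sketch below uses made-up timings and ignores the concurrency a real serving system exploits; actual submissions are driven by MLPerf's own load-generation tooling.

```python
# Hypothetical per-query measurements, purely for illustration.
query_latencies_s = [0.21, 0.19, 0.25, 0.22, 0.20]   # time for each response to come back
tokens_generated  = [64, 58, 71, 66, 60]             # output produced per query

total_time_s = sum(query_latencies_s)                # assumes queries run one after another
throughput = sum(tokens_generated) / total_time_s    # outputs per second
avg_latency_ms = 1000 * total_time_s / len(query_latencies_s)

print(f"Throughput: {throughput:.0f} tokens/sec")
print(f"Average latency: {avg_latency_ms:.0f} ms")
```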
Nvidia's GPUs produced top results in almost every test in the closed division, where the rules for the software setup are the most strict.
Competitor AMD, running its MI300X GPU, took the top score in two of the tests of Llama 2 70b. It produced 103,182 tokens per second, significantly better than the second-best result from Nvidia's newer Blackwell GPU.
That winning AMD system was put together by a new entrant to the MLPerf benchmark, the startup MangoBoost, which makes plug-in cards that can speed data transfer between GPU racks. The company also develops software, called LLMboost, to improve the serving of gen AI.
Nvidia disputes the comparison of the AMD score to its Blackwell score, citing the need to "normalize" scores across the number of chips and computer "nodes" used in each submission.
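In other words, Nvidia's argument is that raw totals should be divided by the amount of hardware behind them before being compared. A back-of-the-envelope sketch of that kind of normalization follows; the GPU counts are hypothetical placeholders, and only the 103,182 tokens-per-second figure comes from the reported results.

```python
amd_total_tokens_per_s = 103_182   # MangoBoost/AMD result reported above
amd_gpus = 32                      # hypothetical: a multi-node, 4x-larger configuration
nvidia_gpus = 8                    # hypothetical: a single eight-GPU Blackwell node

amd_per_gpu = amd_total_tokens_per_s / amd_gpus
print(f"Per-GPU throughput: {amd_per_gpu:,.0f} tokens/sec per GPU")
# Comparing per-GPU (or per-node) numbers like this, rather than raw totals,
# is what "normalizing" the scores would mean.
```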
Nvidia's director of accelerated computing products, Dave Salvator, said in an email:
"MangoBoost's results do not reflect an accurate performance comparison against NVIDIA's results. AMD's testing applied 4X the number of GPUs