Meta's 'pruning' of Llama 2 model shows path to slimmer AI

Like rows of a marching band that aren't heard, layers of a neural network can be silenced and have little effect on the accuracy of the net's predictions.

Tiernan Ray

One of the seminal insights of artificial intelligence work in the past decade is that very large AI programs contain smaller sections within them that can do the work of the total program with less memory and fewer operations, thereby speeding up performance and reducing energy use.

That insight, most commonly referred to as the "lottery ticket hypothesis" after a famous 2019 paper by scholars Jonathan Frankle and Michael Carbin (then at MIT, currently at database company Databricks), is now being put to increasingly practical use as companies find ways to shrink AI to fit on fewer GPU chips, with less memory and bandwidth needed.

In a paper introduced last week by a team of scholars -- from Meta's AI lab, MIT, Cisco Systems, and start-up Zyphra -- removing as much as half of Meta's open-source Llama 2 large language model cut the amount of memory needed by three quarters, with the result that the program could be run on a consumer-grade Nvidia or AMD GPU rather than a huge rack of servers.

"We can remove a substantial fraction of the deepest layers from models with minimal degradation in downstream performance," write Andrey Gromov and colleagues in the paper, somewhat mysteriously titled "The Unreasonable Ineffectiveness of the Deeper Layers" and posted on the arXiv pre-print server.

For Llama 2, the authors write, "we can eliminate up to roughly half of the layers before the performance collapses."

The term "deep layers" refers to the latter parts of a neural network. Imagine a neural network as ranks of musicians in a marching band. The direction of marching is the way data flows through the whole enterprise, if you will. At the front of the band might be smaller brass instruments such as trumpets; at the middle of the pack, trombones and tubas; and at the back, the "deep" part, might be percussion instruments such as drums of various sizes and cymbals.

What Gromov and team are seeing is that the drums and cymbals, and perhaps even some tubas, are making no discernible contribution to the sound. They're there but ineffectual; all the output that matters is in the smaller brass and maybe some of the tubas. It's as if you could remove a good chunk of the musicians -- just do without them -- and have a more efficient band.

In actual neural networks, including generative AI programs such as OpenAI's GPT-4, instead of rows of musicians, you have successive layers of neural network "parameters" or "weights" -- mathematical values that successively transform the input data by multiplying and summing it up, and then producing the output, i.e., the prediction.
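To make "multiplying and summing" concrete, here is a minimal sketch of a stack of layers in Python with NumPy. It is a toy, not Llama 2 or GPT-4: the layer count, the 8x8 weight matrices, the random values, and the `tanh` squashing function are all assumptions chosen for illustration.

```python
import numpy as np

def forward(x, layers):
    """Pass the input through each layer in order, front of the band to back."""
    for weights in layers:
        # A matrix-vector product multiplies inputs by weights and sums them;
        # tanh is an assumed nonlinearity that keeps values in [-1, 1].
        x = np.tanh(weights @ x)
    return x

rng = np.random.default_rng(0)
# Four hypothetical layers, each an 8x8 matrix of weights (toy sizes).
layers = [rng.normal(scale=0.5, size=(8, 8)) for _ in range(4)]
x = rng.normal(size=8)          # the input data
prediction = forward(x, layers)  # the output, i.e., the prediction
print(prediction.shape)
```

Each pass through the loop is one rank of the band; removing a layer simply means taking one matrix out of the list.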

The experimental approach taken by Gromov and team is to "prune" layers of the network to see what removing them does.

They start by building on insights from other scholars who have tried to take apart OpenAI's GPT to see what's making it tick. For example, a 2022 study by Kevin Meng and team at MIT's Computer Science and Artificial Intelligence Laboratory used a variety of techniques to find out which GPT layers seem to contain information of a factual nature. By following the "information flow," Meng and colleagues deduced the facts are usually in the "middle" layers of a deep neural network.

Building on that insight, Gromov and team hypothesize that removing the deep layers -- the percussion and some tubas -- should have little effect on the benchmark tests of AI skill used to evaluate large language models, such as question answering. They go about that in two steps.

First, they try a sophisticated approach, which involves measuring which layers are most similar, and dropping ones that seem to add little. It's as if you asked one of two rows of trumpeters to leave. With each pruning step, they continuously test how the modified network performs on tests such as question answering and a basic test of "predicting the next token" that's common for generative AI.
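One way to picture "measuring which layers are most similar" is to compare what goes into a block of layers with what comes out of it, and look for the block that changes the data least. The sketch below assumes angular distance as the similarity measure and uses made-up vectors in place of a real model's internal states; the function names and the toy data are hypothetical, not taken from the paper's code.

```python
import numpy as np

def angular_distance(a, b):
    """Angle between two vectors, scaled to [0, 1]; 0 means same direction."""
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return np.arccos(np.clip(cos, -1.0, 1.0)) / np.pi

def most_similar_block(hidden_states, n):
    """hidden_states[l] is the data entering layer l (the last entry is the
    final output). Return the start of the n-layer block whose input and
    output are closest, plus all the distances that were measured."""
    distances = [
        angular_distance(hidden_states[l], hidden_states[l + n])
        for l in range(len(hidden_states) - n)
    ]
    return int(np.argmin(distances)), distances

# Toy data: early layers change the vector a lot, later layers barely at all.
rng = np.random.default_rng(0)
h = rng.normal(size=16)
hidden_states = [h]
for step in [1.0, 1.0, 1.0, 0.01, 0.01, 0.01]:
    h = h + step * rng.normal(size=16)
    hidden_states.append(h)

start, distances = most_similar_block(hidden_states, n=2)
print(start, [round(d, 4) for d in distances])
```

In this toy setup the block that barely changes its input sits toward the back of the stack, so that is the block a similarity-based method would nominate for removal.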

Blocks of a Transformer-based language model contain successive layers. The Meta team tested whether removing layers starting at the final, or deepest, layers of the network, would affect performance.

Meta

Then they try an even simpler approach: successively removing layers starting from the back of the neural net. It turns out that in the second case, the simpler case, all they need to do is apply a little re-training of the remaining layers, via what's called fine-tuning, to maintain performance at a relatively constant level.
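The simpler approach amounts to chopping the end off an ordered list of layers. A rough sketch, assuming the layers are held in a plain Python list; the function name `prune_from_back`, the 80 placeholder blocks, and the 50% fraction are illustrative choices, and the fine-tuning step that follows in the paper is not shown.

```python
def prune_from_back(layers, fraction):
    """Return a copy of `layers` with the deepest `fraction` of them removed."""
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    n_drop = int(len(layers) * fraction)
    return layers[:len(layers) - n_drop]

# Placeholder names stand in for real layers; 80 is a toy count.
layers = [f"block_{i}" for i in range(80)]
pruned = prune_from_back(layers, 0.5)
print(len(pruned), pruned[-1])
```

The remaining layers would then be fine-tuned briefly so the network can adjust to the missing back rows.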

Layers of a neural net can be removed up to about half, as shown in the blue and black lines, and the accuracy, left, remains about the same as the baseline, the normal, untouched neural net. Past about forty-five percent of layers removed, the neural net plunges in accuracy.

Meta

Gromov and team find that their pruned neural nets score just as well as the original version. That implies that "the essential knowledge required to achieve a model's top score isn't removed by significant layer removal -- even though the fraction can be quite large(!) -- until eventually that knowledge is lost at a critical model-dependent threshold."

The findings of Gromov and team deliver good news and bad news.

On the one hand, their findings mean that large language models can dramatically shrink down in the computing they need. "In particular, the released version of Llama-2-70B spans 140 GB of memory and consumes approximately 3
