BitNet.cpp by Microsoft: Framework for 1-bit LLMs out now

BitNet.cpp is Microsoft's official framework for loading and running 1-bit LLMs, based on the paper ‘The Era of 1-bit LLMs.’ It can run large language models efficiently even on plain CPUs. The framework currently supports three models, and you can find more details here: https://youtu.be/ojTGcjD5x58?si=K3MVtxhdIgZHHmP7

This is huge. Edge-device LLMs will be revolutionary: low latency, privacy, and they can work even without internet connectivity.

Blane said:
This is huge. Edge-device LLMs will be revolutionary: low latency, privacy, and they can work even without internet connectivity.

What are the early or key use cases for such a tiny model on edge devices?

@Jai
One idea could be video game interpolation. In first-person shooters, servers have a ‘tick rate,’ which is how often they update clients with data from other players. If your server tick rate is 30 per second but your frame rate is 120 per second, your game client fills in the gaps using interpolation.
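
Here's a minimal sketch of that interpolation idea in plain Python. It's purely illustrative; the constants and names are made up, not from any actual game engine:

```python
def lerp(a: float, b: float, t: float) -> float:
    """Linearly interpolate between a and b, with t in [0, 1]."""
    return a + (b - a) * t

TICK_RATE = 30    # server snapshots per second (hypothetical)
FRAME_RATE = 120  # client frames per second (hypothetical)

# Two consecutive server snapshots of another player's x position.
prev_x, next_x = 10.0, 12.0

# 120 / 30 = 4 frames are rendered per tick interval, so the client
# draws intermediate positions at t = 0.0, 0.25, 0.5, 0.75.
frames_per_tick = FRAME_RATE // TICK_RATE
for frame in range(frames_per_tick):
    t = frame / frames_per_tick
    print(f"frame {frame}: x = {lerp(prev_x, next_x, t):.2f}")
```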

LLMs excel at predicting ‘what comes next,’ so they might be trained on game data to predict and fill in player positions more accurately than traditional interpolation. Just a thought from a hobbyist—others might have more ideas!

Blane said:
This is huge. Edge-device LLMs will be revolutionary: low latency, privacy, and they can work even without internet connectivity.

Yep, this is big.

Check out the GitHub repo: https://github.com/microsoft/BitNet

Curious about how the 1-bit models compare to regular ones in benchmarks.

Aki said:
Curious about how the 1-bit models compare to regular ones in benchmarks.

Yes, that’s an important aspect.

Aki said:
Curious about how the 1-bit models compare to regular ones in benchmarks.

There’s a blog on this: https://mobiusml.github.io/1bit_blog/.

I still don’t quite get what this is. Could anyone ELI5?

Paris said:
I still don’t quite get what this is. Could anyone ELI5?

Try reading this article: ‘What are 1-bit LLMs? The Era of 1-bit LLMs with BitNet b1.58’ by Mehul Gupta (Data Science in your pocket, Medium).

Paris said:
I still don’t quite get what this is. Could anyone ELI5?

A 1-bit LLM stores each weight with a single bit (two possible values, e.g. -1 and +1) instead of the usual 32- or 16-bit floats, which greatly reduces model size and makes these models more accessible for smaller devices, like phones. BitNet b1.58 is actually ternary: each weight is -1, 0, or +1, which works out to log2(3) ≈ 1.58 bits per weight. The paper claims it still performs comparably to standard models while using less memory and running faster. If that holds up, it could be a game-changer for running LLMs on smaller devices.
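
To make that concrete, here's a rough sketch of the ‘absmean’ ternary quantization described in the BitNet b1.58 paper. This is my own toy illustration; the function and variable names aren't from the bitnet.cpp codebase:

```python
import numpy as np

def quantize_ternary(W: np.ndarray, eps: float = 1e-8):
    """Quantize a weight matrix to {-1, 0, +1} with a per-tensor scale."""
    scale = np.abs(W).mean() + eps             # "absmean" scaling factor
    W_q = np.clip(np.round(W / scale), -1, 1)  # round, then clip to ternary
    return W_q.astype(np.int8), scale          # dequantize as W_q * scale

W = np.random.randn(4, 4).astype(np.float32)
W_q, scale = quantize_ternary(W)
print(W_q)                             # entries are only -1, 0, or +1
print(np.abs(W - W_q * scale).mean())  # mean quantization error

# Back-of-the-envelope memory math: 3 states need log2(3) ≈ 1.58 bits
# per weight, vs. 16 bits for fp16 -- roughly a 10x reduction.
```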

Imagine running this on an older Android phone using Termux. I can already run smaller LLMs on my 2019 Xiaomi, but this is next-level.