BitNet.cpp by Microsoft: Framework for 1-bit LLMs out now

BitNet.cpp is Microsoft's official framework for loading and running 1-bit LLMs, based on the paper ‘The Era of 1-bit LLMs.’ It can run large language models efficiently even on plain CPUs. The framework currently supports three models, and you can find more details here: https://youtu.be/ojTGcjD5x58?si=K3MVtxhdIgZHHmP7

This is huge. Edge-device LLMs will be revolutionary: low latency, privacy, and they can work even without internet connectivity.

Blane said:
This is huge. Edge-device LLMs will be revolutionary: low latency, privacy, and they can work even without internet connectivity.

What are the early or key use cases for such a tiny model on edge devices?

@Jai
One idea could be video game interpolation. In first-person shooters, servers have a ‘tick rate,’ which is how often they update clients with data from other players. If your server tick rate is 30 per second but your frame rate is 120 per second, your game client fills in the gaps using interpolation.
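
Here's a minimal sketch of that interpolation idea in plain Python. It's purely illustrative; the constants and names are made up, not from any actual game engine:

```python
def lerp(a: float, b: float, t: float) -> float:
    """Linearly interpolate between a and b, with t in [0, 1]."""
    return a + (b - a) * t

TICK_RATE = 30    # server snapshots per second (hypothetical)
FRAME_RATE = 120  # client frames per second (hypothetical)

# Two consecutive server snapshots of another player's x position.
prev_x, next_x = 10.0, 12.0

# 120 / 30 = 4 frames are rendered per tick interval, so the client
# draws intermediate positions at t = 0.0, 0.25, 0.5, 0.75.
frames_per_tick = FRAME_RATE // TICK_RATE
for frame in range(frames_per_tick):
    t = frame / frames_per_tick
    print(f"frame {frame}: x = {lerp(prev_x, next_x, t):.2f}")
```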

LLMs excel at predicting ‘what comes next,’ so they might be trained on game data to predict and fill in player positions more accurately than traditional interpolation. Just a thought from a hobbyist—others might have more ideas!

Blane said:
This is huge. Edge-device LLMs will be revolutionary: low latency, privacy, and they can work even without internet connectivity.

Yep, this is big.

Check out the GitHub repo: https://github.com/microsoft/BitNet

Curious about how the 1-bit models compare to regular ones in benchmarks.

Aki said:
Curious about how the 1-bit models compare to regular ones in benchmarks.

Yes, that’s an important aspect.

Aki said:
Curious about how the 1-bit models compare to regular ones in benchmarks.

There’s a blog on this: https://mobiusml.github.io/1bit_blog/.

I still don’t quite get what this is. Could anyone ELI5?

Paris said:
I still don’t quite get what this is. Could anyone ELI5?

Try reading this article: ‘What are 1-bit LLMs? The Era of 1-bit LLMs with BitNet b1.58’ by Mehul Gupta (Data Science in your pocket, Medium).

Paris said:
I still don’t quite get what this is. Could anyone ELI5?

A 1-bit LLM stores each weight with a single bit (two possible values, e.g. -1 and +1) instead of the usual 32- or 16-bit floats, which greatly reduces model size and makes these models more accessible for smaller devices, like phones. BitNet b1.58 is actually ternary: each weight is -1, 0, or +1, which works out to log2(3) ≈ 1.58 bits per weight. The paper claims it still performs comparably to standard models while using less memory and running faster. If that holds up, it could be a game-changer for running LLMs on smaller devices.
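
To make that concrete, here's a rough sketch of the ‘absmean’ ternary quantization described in the BitNet b1.58 paper. This is my own toy illustration; the function and variable names aren't from the bitnet.cpp codebase:

```python
import numpy as np

def quantize_ternary(W: np.ndarray, eps: float = 1e-8):
    """Quantize a weight matrix to {-1, 0, +1} with a per-tensor scale."""
    scale = np.abs(W).mean() + eps             # "absmean" scaling factor
    W_q = np.clip(np.round(W / scale), -1, 1)  # round, then clip to ternary
    return W_q.astype(np.int8), scale          # dequantize as W_q * scale

W = np.random.randn(4, 4).astype(np.float32)
W_q, scale = quantize_ternary(W)
print(W_q)                             # entries are only -1, 0, or +1
print(np.abs(W - W_q * scale).mean())  # mean quantization error

# Back-of-the-envelope memory math: 3 states need log2(3) ≈ 1.58 bits
# per weight, vs. 16 bits for fp16 -- roughly a 10x reduction.
```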

Imagine running this on an older Android phone using Termux. I can already run smaller LLMs on my 2019 Xiaomi, but this is next-level.