fllama 0.0.1
A Flutter binding for llama.cpp.
Fllama
Flutter binding of llama.cpp (llama.cpp: inference of LLaMA models in pure C/C++).
Installation
Flutter
flutter pub add fllama
iOS
Run pod install or pod update in your iOS project.
Android
Install CMake 3.31.0, Android SDK 35, and NDK 28.0.12433566. No additional steps are required.
OpenHarmonyOS/HarmonyOS
The fastest and recommended way to add HLlama to your project is:
ohpm install hllama
Alternatively, add it to your project manually:
- Add the following lines to oh-package.json5 in your app module:
"dependencies": {
  "hllama": "^0.0.1"
}
- Then run ohpm install
How to use
Flutter
- Initialize a model context
import 'package:fllama/fllama.dart';
String modelContextId = "";
// Load a GGUF model; with emitLoadProgress: true, loadProgress events are sent to onTokenStream.
Fllama.instance()?.initContext("model path", emitLoadProgress: true)
    .then((context) {
  modelContextId = context?["contextId"].toString() ?? "";
  if (modelContextId.isNotEmpty) {
    // A contextId greater than 0 means the model was loaded successfully.
  }
});
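Because the other calls take double.parse(modelContextId), a bad id surfaces as a FormatException far from the load. Note also that if initContext returns a map without a "contextId" key, context?["contextId"].toString() yields the string "null", so isNotEmpty alone does not prove success. The helper below is a hypothetical sketch (parseContextId is not part of fllama) that applies the "greater than 0" rule from the comment above before any other call is made.

```dart
/// Hypothetical helper, not part of fllama.
/// Converts the contextId string obtained from initContext into the double
/// that bench/tokenize/releaseContext expect. Returns null when the value is
/// empty, non-numeric (including the string "null"), NaN/infinite, or <= 0.
double? parseContextId(String? raw) {
  if (raw == null || raw.isEmpty) return null;
  final id = double.tryParse(raw);
  if (id == null || !id.isFinite || id <= 0) return null;
  return id;
}
```

A caller could then bail out early, for example by showing an error when parseContextId(modelContextId) returns null, instead of letting double.parse throw later.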
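- Benchmark the model on the device
import 'package:flutter/foundation.dart';
import 'package:fllama/fllama.dart';
// pp, tg, pl and nr appear to follow llama.cpp's bench naming
// (likely: prompt tokens, generated tokens, parallel sequences, repetitions).
Fllama.instance()?.bench(double.parse(modelContextId), pp: 8, tg: 4, pl: 2, nr: 1).then((res) {
  debugPrint("[FLlama] Bench Res $res");
});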
- Tokenize and detokenize
import 'package:flutter/foundation.dart';
import 'package:fllama/fllama.dart';
Fllama.instance()?.tokenize(double.parse(modelContextId), text: "What can you do?").then((res) {
  debugPrint("[FLlama] Tokenize Res $res");
  // Convert the returned tokens back into text.
  Fllama.instance()?.detokenize(double.parse(modelContextId), tokens: res?['tokens']).then((detokenized) {
    debugPrint("[FLlama] Detokenize Res $detokenized");
  });
});
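The tokenize result is passed straight back via res?['tokens'], so its element type is never checked. If you want to inspect or store the tokens, a typed list is easier to work with. The sketch below is hypothetical (tokensFromResult is not part of fllama) and assumes, as the snippet above implies, that the result is a map whose 'tokens' entry is a list of numbers; values crossing a platform channel typically arrive as List<Object?> rather than List<int>.

```dart
/// Hypothetical helper, not part of fllama.
/// Assumes the tokenize result looks like {'tokens': [1, 2, 3, ...]}.
/// Returns the tokens as a typed List<int>, skipping non-numeric entries,
/// or an empty list when 'tokens' is missing or is not a list.
List<int> tokensFromResult(Map<dynamic, dynamic>? res) {
  final raw = res?['tokens'];
  if (raw is! List) return const <int>[];
  return [
    for (final t in raw)
      if (t is num) t.toInt(),
  ];
}
```

For example, tokensFromResult(res).length gives the prompt's token count before you decide whether to detokenize or run a completion.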
- Listen to the event stream
import 'package:flutter/foundation.dart';
import 'package:fllama/fllama.dart';
Fllama.instance()?.onTokenStream?.listen((data) {
  if (data['function'] == "loadProgress") {
    debugPrint("[FLlama] loadProgress=${data['result']}");
  } else if (data['function'] == "completion") {
    debugPrint("[FLlama] completion=${data['result']}");
    final token = data["result"]["token"];
    // token holds the newly generated piece of text.
  }
});
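Each completion event carries only the newest token, so the full answer has to be assembled on the app side. The hypothetical CompletionCollector below (not part of fllama) sketches that, assuming events have exactly the shape used in the listener above: {'function': 'loadProgress', 'result': ...} or {'function': 'completion', 'result': {'token': ...}}.

```dart
/// Hypothetical helper, not part of fllama.
/// Accumulates streamed completion tokens and remembers the latest
/// load-progress value. Events with any other 'function' are ignored.
class CompletionCollector {
  final StringBuffer _text = StringBuffer();
  Object? lastLoadProgress;

  /// Feed one event received from onTokenStream.
  void handle(Map<dynamic, dynamic> data) {
    switch (data['function']) {
      case 'loadProgress':
        lastLoadProgress = data['result'];
        break;
      case 'completion':
        final result = data['result'];
        // Only append when the event has the expected {'token': String} shape.
        if (result is Map && result['token'] is String) {
          _text.write(result['token']);
        }
        break;
    }
  }

  /// All completion tokens received so far, concatenated in arrival order.
  String get text => _text.toString();
}
```

The listener can then forward every event, e.g. Fllama.instance()?.onTokenStream?.listen((data) => collector.handle(data)); and read collector.text whenever the UI needs the text generated so far.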
- Stop a completion or release contexts
import 'package:fllama/fllama.dart';
Fllama.instance()?.stopCompletion(contextId: double.parse(modelContextId)); // stop the completion running on this context
Fllama.instance()?.releaseContext(double.parse(modelContextId)); // release this context
Fllama.instance()?.releaseAllContexts(); // release all contexts
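Since a loaded model holds a lot of memory, it helps to guarantee that releaseContext runs even when the work in between throws. The withContext function below is a hypothetical sketch (not part of fllama) of that try/finally pattern; the open and release callbacks are assumed to wrap initContext (returning the parsed id, or null on failure) and releaseContext.

```dart
/// Hypothetical helper, not part of fllama.
/// Opens a context, runs [body] with its id, and always calls [release]
/// afterwards, even if [body] throws. Throws a StateError if [open]
/// returns null (the context could not be initialized).
Future<T> withContext<T>({
  required Future<double?> Function() open,
  required Future<void> Function(double contextId) release,
  required Future<T> Function(double contextId) body,
}) async {
  final id = await open();
  if (id == null) {
    throw StateError('Context could not be initialized');
  }
  try {
    return await body(id);
  } finally {
    // Runs on both success and failure, so the context is never leaked.
    await release(id);
  }
}
```

Keeping the fllama calls behind callbacks means the release guarantee can be tested without loading a real model.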
OpenHarmonyOS/HarmonyOS
See this file for usage instructions.
Supported Systems
System | Min SDK | Arch | Notes |
---|---|---|---|
Android | 23 | arm64-v8a, x86_64, armeabi-v7a | Supports additional optimizations for certain CPUs |
iOS | 14 | arm64 | Supports Metal |
OpenHarmonyOS/HarmonyOS | 12 | arm64-v8a, x86_64 | No CPU-specific optimizations |
Obtaining a model
You can search Hugging Face for available models (keyword: GGUF).
To get a GGUF model or quantize one manually, see the Prepare and Quantize section of the llama.cpp documentation.
Notes
iOS:
- Enabling the Extended Virtual Addressing capability in your iOS project is recommended.
- Metal:
  - In our testing, some devices cannot use Metal (params.n_gpu_layers > 0) because llama.cpp uses SIMD-scoped operations. Check whether your device is supported in Apple's Metal feature set tables; an Apple7 GPU is the minimum requirement.
  - Metal is also not supported in the iOS Simulator because of a simulator limitation: the Metal backend uses more than 14 constant buffers.
Android:
- Only the arm64-v8a, x86_64, and armeabi-v7a ABIs are currently supported, so a context cannot be initialized on other platforms. 64-bit platforms are recommended because they can allocate more memory for the model.
- No GPU backend is integrated yet.
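Because initialization fails on unsupported ABIs, an app may want to check the device before offering local inference at all. The sketch below is hypothetical (pickFllamaAbi and is64BitAbi are not part of fllama); it only encodes the ABI list and the 64-bit recommendation from the notes above. The device's ABI list, most preferred first, is assumed to come from elsewhere, for example device_info_plus's AndroidDeviceInfo.supportedAbis.

```dart
/// ABIs listed as supported on Android in this README.
const Set<String> fllamaAndroidAbis = {'arm64-v8a', 'x86_64', 'armeabi-v7a'};
const Set<String> _sixtyFourBitAbis = {'arm64-v8a', 'x86_64'};

/// Hypothetical helper, not part of fllama.
/// [deviceAbis] is the device's ABI list, most preferred first.
/// Returns the first ABI fllama supports, or null if a context
/// cannot be initialized on this device.
String? pickFllamaAbi(List<String> deviceAbis) {
  for (final abi in deviceAbis) {
    if (fllamaAndroidAbis.contains(abi)) return abi;
  }
  return null;
}

/// True for the 64-bit ABIs, which can allocate more memory for the model.
bool is64BitAbi(String abi) => _sixtyFourBitAbis.contains(abi);
```

An app could, for instance, hide the model download when pickFllamaAbi returns null and warn about memory limits when the chosen ABI is not 64-bit.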
License
MIT