Ollama Dart Client
Unofficial Dart client for Ollama API.
Features
- Fully type-safe, documented and tested
- All platforms supported (including streaming on web)
- Custom base URL, headers and query params support (e.g. HTTP proxies)
- Custom HTTP client support (e.g. SOCKS5 proxies or advanced use cases)
Supported endpoints:
- Completions (with streaming support)
- Chat completions (with streaming and tool calling support)
- Embeddings
- Models
- Blobs
- Version
Table of contents
Usage
Refer to the documentation for more information about the API.
Completions
Given a prompt, the model will generate a response.
Generate completion
final generated = await client.generateCompletion(
request: GenerateCompletionRequest(
model: 'mistral:latest',
prompt: 'Why is the sky blue?',
),
);
print(generated.response);
// The sky appears blue because of a phenomenon called Rayleigh scattering...
Stream completion
final stream = client.generateCompletionStream(
request: GenerateCompletionRequest(
model: 'mistral:latest',
prompt: 'Why is the sky blue?',
),
);
String text = '';
await for (final res in stream) {
text += res.response?.trim() ?? '';
}
print(text);
// The sky appears blue because of a phenomenon called Rayleigh scattering...
Chat completions
Given a prompt, the model will generate a response in a chat format.
Generate chat completion
final res = await client.generateChatCompletion(
request: GenerateChatCompletionRequest(
model: defaultModel,
messages: [
Message(
role: MessageRole.system,
content: 'You are a helpful assistant.',
),
Message(
role: MessageRole.user,
content: 'List the numbers from 1 to 9 in order.',
),
],
keepAlive: 1,
),
);
print(res);
// Message(role: MessageRole.assistant, content: 123456789)
Stream chat completion
final stream = client.generateChatCompletionStream(
request: GenerateChatCompletionRequest(
model: defaultModel,
messages: [
Message(
role: MessageRole.system,
content: 'You are a helpful assistant.',
),
Message(
role: MessageRole.user,
content: 'List the numbers from 1 to 9 in order.',
),
],
keepAlive: 1,
),
);
String text = '';
await for (final res in stream) {
text += (res.message?.content ?? '').trim();
}
print(text);
// 123456789
Tool calling
Tool calling allows a model to respond to a given prompt by generating output that matches a user-defined schema, that you can then use to call the tools in your code and return the result back to the model to complete the conversation.
Notes:
- Tool calling requires Ollama 0.2.8 or newer.
- Streaming tool calls is not supported at the moment.
- Not all models support tool calls. Check the Ollama catalogue for models that have the
Tools
tag (e.g.llama3.2
).
const tool = Tool(
function: ToolFunction(
name: 'get_current_weather',
description: 'Get the current weather in a given location',
parameters: {
'type': 'object',
'properties': {
'location': {
'type': 'string',
'description': 'The city and country, e.g. San Francisco, US',
},
'unit': {
'type': 'string',
'description': 'The unit of temperature to return',
'enum': ['celsius', 'fahrenheit'],
},
},
'required': ['location'],
},
),
);
const userMsg = Message(
role: MessageRole.user,
content: 'What’s the weather like in Barcelona in celsius?',
);
final res1 = await client.generateChatCompletion(
request: GenerateChatCompletionRequest(
model: 'llama3.2',
messages: [userMsg],
tools: [tool],
),
);
print(res1.message.toolCalls);
// [
// ToolCall(
// function:
// ToolCallFunction(
// name: get_current_weather,
// arguments: {
// location: Barcelona, ES,
// unit: celsius
// }
// )
// )
// ]
// Call your tool here. For this example, we'll just mock the response.
const toolResult = '{"location": "Barcelona, ES", "temperature": 20, "unit": "celsius"}';
// Submit the response of the tool call to the model
final res2 = await client.generateChatCompletion(
request: GenerateChatCompletionRequest(
model: 'llama3.2',
messages: [
userMsg,
res1.message,
Message(
role: MessageRole.tool,
content: toolResult,
),
],
),
);
print(res2.message.content);
// The current weather in Barcelona is 20°C.
Embeddings
Given a prompt, the model will generate an embedding representing the prompt.
Generate embedding
final generated = await client.generateEmbedding(
request: GenerateEmbeddingRequest(
model: 'mistral:latest',
prompt: 'Here is an article about llamas...',
),
);
print(generated.embedding);
// [8.566641807556152, 5.315540313720703, ...]
Models
Create model
Creates a new local model using a modelfile.
await client.createModel(
request: CreateModelRequest(
model: 'mario',
modelfile: 'FROM mistral:latest\nSYSTEM You are mario from Super Mario Bros.',
),
);
You can also stream the status of the model creation:
final stream = client.createModelStream(
request: CreateModelRequest(
model: 'mario',
modelfile: 'FROM mistral:latest\nSYSTEM You are mario from Super Mario Bros.',
),
);
await for (final res in stream) {
print(res.status);
}
List models
List models that are available locally.
final res = await client.listModels();
print(res.models);
List running models
Lists models currently loaded and their memory footprint.
final res = await client.listRunningModels();
print(res.models);
Show Model Information
Show details about a model including modelfile, template, parameters, license, and system prompt.
final res = await client.showModelInfo(
request: ModelInfoRequest(model: 'mistral:latest'),
);
print(res);
Pull a Model
Download a model from the ollama library. Cancelled pulls are resumed from where they left off, and multiple calls will share the same download progress.
final res = await client.pullModel(
request: PullModelRequest(model: 'yarn-llama3:13b-128k-q4_1'),
);
print(res.status);
You can also stream the pulling status:
final stream = client.pullModelStream(
request: PullModelRequest(model: 'yarn-llama3:13b-128k-q4_1'),
);
await for (final res in stream) {
print(res.status);
}
Push a Model
Upload a model to a model library.
Requires registering for ollama.ai and adding a public key first.
final res = await client.pushModel(
request: PushModelRequest(model: 'mattw/pygmalion:latest'),
);
print(res.status);
You can also stream the pushing status:
final stream = client.pushModelStream(
request: PushModelRequest(model: 'mattw/pygmalion:latest'),
);
await for (final res in stream) {
print(res.status);
}
Check if a Blob Exists
Ensures that the file blob used for a FROM or ADAPTER field exists on the server. This is checking your Ollama server and not Ollama.ai.
await client.checkBlob(
digest: 'sha256:29fdb92e57cf0827ded04ae6461b5931d01fa595843f55d36f5b275a52087dd2',
);
If the blob doesn't exist, an OllamaClientException
exception will be thrown.
Version
Get the version of the Ollama server.
final res = await client.getVersion();
print(res.version);
Advance Usage
Default HTTP client
By default, the client uses http://localhost:11434/api
as the baseUrl
and the following implementations of http.Client
:
- Non-web:
IOClient
- Web:
FetchClient
(to support streaming on web)
Custom HTTP client
You can always provide your own implementation of http.Client
for further customization:
final client = OllamaClient(
client: MyHttpClient(),
);
Using a proxy
HTTP proxy
You can use your own HTTP proxy by overriding the baseUrl
and providing your required headers
:
final client = OllamaClient(
baseUrl: 'https://my-proxy.com',
headers: {
'x-my-proxy-header': 'value',
},
);
If you need further customization, you can always provide your own http.Client
.
SOCKS5 proxy
To use a SOCKS5 proxy, you can use the socks5_proxy
package:
final baseHttpClient = HttpClient();
SocksTCPClient.assignToHttpClient(baseHttpClient, [
ProxySettings(InternetAddress.loopbackIPv4, 1080),
]);
final httpClient = IOClient(baseClient);
final client = OllamaClient(
client: httpClient,
);
Acknowledgements
The generation of this client was made possible by the openapi_spec package.
License
Ollama Dart Client is licensed under the MIT License.
Libraries
- ollama_dart
- Dart Client for the Ollama API (run Llama 3, Code Llama, and other models locally).