
Dart client for the Ollama API to run LLMs locally (OpenAI gpt-oss, DeepSeek-R1, Gemma 3, Llama 4, and more).

Ollama Dart Client #


Dart client for the Ollama API to run local and self-hosted models — chat, streaming, tool calling, embeddings, and model management. It gives Dart and Flutter applications a pure Dart, type-safe client across iOS, Android, macOS, Windows, Linux, Web, and server-side Dart.

Tip

Coding agents: start with llms.txt. It links to the package docs, examples, and optional references in a compact format.


Features #

Generation and streaming #

  • Chat completions with context memory and multimodal inputs
  • Text generation for prompt-style completions
  • Embeddings for semantic search and retrieval
  • NDJSON streaming for chat and completions
  • Tool calling, thinking mode, and structured output

Local model operations #

  • Pull, push, copy, create, delete, and inspect models
  • List running models and query server version
  • Connect to local or remote Ollama instances with optional auth

Why choose this client? #

  • Pure Dart with no Flutter dependency — works in mobile apps, backends, and CLIs.
  • Type-safe request and response models with minimal dependencies (http, logging, meta).
  • Streaming, retries, interceptors, and error handling built into the client.
  • Mirrors the Ollama API closely, including model management endpoints most wrappers skip.

Quickstart #

Add the dependency to your pubspec.yaml:

dependencies:
  ollama_dart: ^2.0.0

Then import the package:

import 'package:ollama_dart/ollama_dart.dart';

Future<void> main() async {
  final client = OllamaClient();

  try {
    final response = await client.chat.create(
      request: ChatRequest(
        model: 'gpt-oss',
        messages: [ChatMessage.user('Explain what Dart isolates do.')],
      ),
    );

    print(response.message?.content);
  } finally {
    client.close();
  }
}

Configuration #

Configure local hosts, remote servers, and retries

Use OllamaClient() for the default local daemon at http://localhost:11434, or OllamaClient.fromEnvironment() to read OLLAMA_HOST. Use OllamaConfig when you need a remote host, bearer auth, or a different timeout policy.

import 'package:ollama_dart/ollama_dart.dart';

Future<void> main() async {
  final client = OllamaClient(
    config: OllamaConfig(
      baseUrl: 'http://localhost:11434',
      timeout: const Duration(minutes: 5),
      retryPolicy: RetryPolicy(
        maxRetries: 3,
        initialDelay: Duration(seconds: 1),
      ),
    ),
  );

  client.close();
}

Environment variable:

  • OLLAMA_HOST

Use BearerTokenProvider when the Ollama server is exposed behind an authenticated reverse proxy or remote deployment.
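A sketch of pointing the client at a remote, authenticated deployment; the hostname below is a placeholder, and the tokenProvider parameter name and BearerTokenProvider constructor shape are assumptions, so check the API reference for the exact signatures:

```dart
import 'package:ollama_dart/ollama_dart.dart';

Future<void> main() async {
  final client = OllamaClient(
    config: OllamaConfig(
      // Placeholder host for an Ollama server behind a reverse proxy.
      baseUrl: 'https://ollama.example.com',
      // Assumed parameter name; verify against the OllamaConfig docs.
      tokenProvider: BearerTokenProvider('my-secret-token'),
    ),
  );

  client.close();
}
```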

Usage #

How do I run a chat completion? #


Use client.chat.create(...) for conversational flows. The chat response exposes message?.content, which keeps simple completions ergonomic in Dart and Flutter UIs.

import 'package:ollama_dart/ollama_dart.dart';

Future<void> main() async {
  final client = OllamaClient();

  try {
    final response = await client.chat.create(
      request: ChatRequest(
        model: 'gpt-oss',
        messages: [
          ChatMessage.system('You are a concise assistant.'),
          ChatMessage.user('What is hot reload?'),
        ],
      ),
    );

    print(response.message?.content);
  } finally {
    client.close();
  }
}

For structured output, set format to constrain the response to valid JSON:

final response = await client.chat.create(
  request: ChatRequest(
    model: 'gpt-oss',
    messages: [ChatMessage.user('List 3 colors as JSON')],
    format: ResponseFormat.json,
  ),
);
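When format is set, message?.content carries a JSON string that you can decode with dart:convert. The payload below is illustrative, not a captured model response:

```dart
import 'dart:convert';

void main() {
  // Illustrative payload; in practice this comes from response.message?.content.
  const content = '{"colors": ["red", "green", "blue"]}';

  // Decode into a typed structure.
  final decoded = jsonDecode(content) as Map<String, dynamic>;
  final colors = (decoded['colors'] as List).cast<String>();

  print(colors.length); // 3
}
```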


How do I stream local model output? #


Streaming uses Ollama's NDJSON response format and works well for terminals and live Flutter widgets. This is the fastest way to surface partial output from a local model.

import 'dart:io';

import 'package:ollama_dart/ollama_dart.dart';

Future<void> main() async {
  final client = OllamaClient();

  try {
    final stream = client.chat.createStream(
      request: ChatRequest(
        model: 'gpt-oss',
        messages: [ChatMessage.user('Write a haiku about local models.')],
      ),
    );

    await for (final chunk in stream) {
      stdout.write(chunk.message?.content ?? '');
    }
  } finally {
    client.close();
  }
}

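createStream hides the wire format, but it can help to know what is on the wire: Ollama streams newline-delimited JSON, one object per line. A minimal decoding sketch over a simulated byte stream (this is not the package's internal code, just the general NDJSON technique):

```dart
import 'dart:convert';

Future<void> main() async {
  // Simulated NDJSON body: each line is a standalone JSON object,
  // mirroring the shape of Ollama's streaming chat responses.
  final body = Stream.fromIterable([
    utf8.encode('{"message":{"content":"Hel"}}\n'),
    utf8.encode('{"message":{"content":"lo"}}\n'),
  ]);

  // Decode bytes -> text -> lines -> JSON objects.
  final chunks = body
      .transform(utf8.decoder)
      .transform(const LineSplitter())
      .map((line) => jsonDecode(line) as Map<String, dynamic>);

  await for (final chunk in chunks) {
    print((chunk['message'] as Map)['content']);
  }
}
```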

How do I use tool calling? #


Tool calling is declared on the request with typed ToolDefinition objects. This makes local agent-style workflows possible without switching to another API format.

import 'package:ollama_dart/ollama_dart.dart';

Future<void> main() async {
  final client = OllamaClient();

  try {
    final response = await client.chat.create(
      request: ChatRequest(
        model: 'gpt-oss',
        messages: [ChatMessage.user('What is the weather in Paris?')],
        tools: [
          ToolDefinition(
            type: ToolType.function,
            function: ToolFunction(
              name: 'get_weather',
              description: 'Get the current weather for a location',
              parameters: {
                'type': 'object',
                'properties': {
                  'location': {'type': 'string'},
                },
                'required': ['location'],
              },
            ),
          ),
        ],
      ),
    );

    print(response.message?.toolCalls?.length ?? 0);
  } finally {
    client.close();
  }
}

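When toolCalls come back, your app executes the named function and typically appends the result to the conversation for a follow-up request. A minimal name-based dispatcher with a stubbed weather lookup; the FakeToolCall type below is a simplified stand-in, not the package's tool-call type:

```dart
// Simplified stand-in for the tool-call data a response carries;
// the real values come from response.message?.toolCalls.
class FakeToolCall {
  FakeToolCall(this.name, this.arguments);
  final String name;
  final Map<String, dynamic> arguments;
}

// Stubbed tool implementations, keyed by tool name.
final tools = <String, String Function(Map<String, dynamic>)>{
  'get_weather': (args) => 'Sunny in ${args['location']}',
};

// Look up the requested tool and run it with the model's arguments.
String dispatch(FakeToolCall call) {
  final tool = tools[call.name];
  if (tool == null) throw ArgumentError('Unknown tool: ${call.name}');
  return tool(call.arguments);
}

void main() {
  final result = dispatch(FakeToolCall('get_weather', {'location': 'Paris'}));
  print(result); // Sunny in Paris
}
```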

How do I generate plain text? #


Use the completions resource when you want prompt-style generation instead of chat messages. This is useful for legacy templates, code infill helpers, or smaller server utilities.

import 'package:ollama_dart/ollama_dart.dart';

Future<void> main() async {
  final client = OllamaClient();

  try {
    final result = await client.completions.generate(
      request: GenerateRequest(
        model: 'gpt-oss',
        prompt: 'Complete this sentence: Dart is great for',
      ),
    );

    print(result.response);
  } finally {
    client.close();
  }
}


How do I create embeddings? #


Embeddings are exposed as a first-class resource, so semantic search or retrieval code can stay inside the same Ollama client. This is useful for local RAG pipelines in Dart.

import 'package:ollama_dart/ollama_dart.dart';

Future<void> main() async {
  final client = OllamaClient();

  try {
    final response = await client.embeddings.create(
      request: const EmbedRequest(
        model: 'nomic-embed-text',
        input: EmbedInput.list(['Dart', 'Flutter']),
      ),
    );

    print(response.embeddings.length);
  } finally {
    client.close();
  }
}

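For retrieval, the returned vectors are usually compared with cosine similarity. A small pure-Dart helper (no package APIs involved):

```dart
import 'dart:math';

/// Cosine similarity between two equal-length embedding vectors:
/// dot(a, b) / (|a| * |b|), ranging from -1 to 1.
double cosineSimilarity(List<double> a, List<double> b) {
  var dot = 0.0, normA = 0.0, normB = 0.0;
  for (var i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (sqrt(normA) * sqrt(normB));
}

void main() {
  print(cosineSimilarity([1.0, 0.0], [1.0, 0.0])); // 1.0 (identical)
  print(cosineSimilarity([1.0, 0.0], [0.0, 1.0])); // 0.0 (orthogonal)
}
```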

How do I manage local models? #


Model management is part of the same client, which means pull, inspect, and runtime checks do not require a separate admin tool. That is useful for installers, desktop apps, and local dev tooling.

import 'package:ollama_dart/ollama_dart.dart';

Future<void> main() async {
  final client = OllamaClient();

  try {
    final models = await client.models.list();
    print(models.models.length);
  } finally {
    client.close();
  }
}


Error Handling #

Handle local daemon failures, retries, and streaming issues

ollama_dart throws typed exceptions so you can distinguish between API failures, timeouts, aborts, and streaming problems. Catch ApiException first for HTTP errors, then fall back to OllamaException for everything else.

import 'dart:io';

import 'package:ollama_dart/ollama_dart.dart';

Future<void> main() async {
  final client = OllamaClient();

  try {
    await client.version.get();
  } on ApiException catch (error) {
    stderr.writeln('Ollama API error ${error.statusCode}: ${error.message}');
  } on OllamaException catch (error) {
    stderr.writeln('Ollama client error: $error');
  } finally {
    client.close();
  }
}


Examples #

See the example/ directory for complete examples:

| Example | Description |
| --- | --- |
| chat_example.dart | Chat completions |
| streaming_example.dart | Streaming responses |
| tool_calling_example.dart | Tool calling |
| completions_example.dart | Plain text generation |
| embeddings_example.dart | Text embeddings |
| models_example.dart | Model management |
| version_example.dart | Server version |
| error_handling_example.dart | Exception handling patterns |
| ollama_dart_example.dart | Quick-start overview |

API Coverage #

| API | Status |
| --- | --- |
| Chat | ✅ Full |
| Completions | ✅ Full |
| Embeddings | ✅ Full |
| Models | ✅ Full |
| Version | ✅ Full |

Sponsoring #

If these packages are useful to you or your company, please consider sponsoring the project. Development and maintenance are provided to the community for free, but integration tests against real APIs and the tooling required to build and verify releases still have real costs. Your support, at any level, helps keep these packages maintained and free for the Dart & Flutter community.

License #

This package is licensed under the MIT License.

This is a community-maintained package and is not affiliated with or endorsed by Ollama.
