llm_core

Core abstractions for LLM (Large Language Model) interactions in Dart.

Available on pub.dev.

This package provides the foundational interfaces and models used by LLM backend implementations such as llm_ollama, llm_chatgpt, and llm_llamacpp.

Important: interfaces only

llm_core does not connect to any LLM by itself. It defines the shared types (messages, chunks, tools, options, etc.) and the LLMChatRepository interface.

To actually run chat or embedding requests, you need a backend implementation, for example:

  • llm_ollama (talks to a local/remote Ollama server)
  • llm_chatgpt (talks to OpenAI / ChatGPT-compatible APIs)
  • llm_llamacpp (runs local inference via llama.cpp)

Installation

Most users should depend on a backend implementation; each backend re-exports the llm_core types:

dependencies:
  llm_ollama: ^0.1.5

If you're implementing your own backend, depend on llm_core directly:

dependencies:
  llm_core: ^0.1.5
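
A custom backend is just a class that implements LLMChatRepository (shown under Repository Interface below). A minimal skeleton might look like this; the class name and the UnimplementedError bodies are placeholders, not part of llm_core:

import 'package:llm_core/llm_core.dart';

class MyBackendChatRepository implements LLMChatRepository {
  @override
  Stream<LLMChunk> streamChat(
    String model, {
    required List<LLMMessage> messages,
    bool think = false,
    List<LLMTool> tools = const [],
    dynamic extra,
    StreamChatOptions? options,
  }) async* {
    // Call your API here and yield LLMChunk objects as they arrive.
    throw UnimplementedError();
  }

  @override
  Future<LLMResponse> chatResponse(
    String model, {
    required List<LLMMessage> messages,
    bool think = false,
    List<LLMTool> tools = const [],
    dynamic extra,
    StreamChatOptions? options,
  }) async {
    // Perform a single request and map the reply to an LLMResponse.
    throw UnimplementedError();
  }

  @override
  Future<List<LLMEmbedding>> embed({
    required String model,
    required List<String> messages,
    Map<String, dynamic> options = const {},
  }) async {
    // Request embeddings for each input string.
    throw UnimplementedError();
  }

  @override
  Future<List<LLMEmbedding>> batchEmbed({
    required String model,
    required List<String> messages,
    Map<String, dynamic> options = const {},
  }) async {
    // Same as embed, but intended for many inputs at once.
    throw UnimplementedError();
  }
}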

Core Types

Messages

// Create messages for conversation
final messages = [
  LLMMessage(role: LLMRole.system, content: 'You are helpful.'),
  LLMMessage(role: LLMRole.user, content: 'Hello!'),
  LLMMessage(role: LLMRole.assistant, content: 'Hi there!'),
];

Repository Interface

abstract class LLMChatRepository {
  Stream<LLMChunk> streamChat(
    String model, {
    required List<LLMMessage> messages,
    bool think = false,
    List<LLMTool> tools = const [],
    dynamic extra,
    StreamChatOptions? options, // Optional: encapsulates all options
  });

  Future<LLMResponse> chatResponse(
    String model, {
    required List<LLMMessage> messages,
    bool think = false,
    List<LLMTool> tools = const [],
    dynamic extra,
    StreamChatOptions? options,
  });

  Future<List<LLMEmbedding>> embed({
    required String model,
    required List<String> messages,
    Map<String, dynamic> options = const {},
  });

  Future<List<LLMEmbedding>> batchEmbed({
    required String model,
    required List<String> messages,
    Map<String, dynamic> options = const {},
  });
}
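
Given any implementation of this interface (repo below stands in for a backend such as llm_ollama's repository), a non-streaming chat and an embedding call look like this; the model names are only examples:

// repo is any LLMChatRepository implementation.
final response = await repo.chatResponse(
  'qwen3:0.6b',
  messages: [
    LLMMessage(role: LLMRole.user, content: 'Hello!'),
  ],
);
// Inspect `response` (LLMResponse) for the assistant reply and metadata.

final embeddings = await repo.embed(
  model: 'nomic-embed-text',
  messages: ['First sentence to embed.', 'Second sentence to embed.'],
);
print('Got ${embeddings.length} embeddings');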

Tools

class MyTool extends LLMTool {
  @override
  String get name => 'my_tool';

  @override
  String get description => 'Does something useful';

  @override
  List<LLMToolParam> get parameters => [
    LLMToolParam(
      name: 'input',
      type: 'string',
      description: 'The input to process',
      isRequired: true,
    ),
  ];

  @override
  Future<String> execute(Map<String, dynamic> args, {dynamic extra}) async {
    return 'Result: ${args['input']}';
  }
}
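
Tools are passed to a chat call through the tools parameter; a short sketch (repo is any backend implementation, the model name is only an example):

final stream = repo.streamChat(
  'qwen3:0.6b',
  messages: [
    LLMMessage(role: LLMRole.user, content: 'Use my_tool on "hello".'),
  ],
  tools: [MyTool()],
);

await for (final chunk in stream) {
  print(chunk.message?.content ?? '');
}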

StreamChatOptions

Encapsulates all optional parameters for streamChat() to reduce parameter proliferation:

final options = StreamChatOptions(
  think: true,
  tools: [MyTool()],
  toolAttempts: 5,
  timeout: Duration(minutes: 5),
  retryConfig: RetryConfig(maxAttempts: 3),
);

final stream = repo.streamChat('model', messages: messages, options: options);

Retry Configuration

Configure automatic retries with exponential backoff:

final retryConfig = RetryConfig(
  maxAttempts: 3,                    // Maximum retry attempts
  initialDelay: Duration(seconds: 1), // Initial delay before first retry
  maxDelay: Duration(seconds: 30),    // Maximum delay between retries
  backoffMultiplier: 2.0,            // Exponential backoff multiplier
  retryableStatusCodes: [429, 500, 502, 503, 504], // HTTP codes to retry
);

// Use RetryUtil for custom retry logic
await RetryUtil.executeWithRetry(
  operation: () async => someOperation(),
  config: retryConfig,
  isRetryable: (error) => error is TimeoutException,
);

Timeout Configuration

Configure connection and read timeouts:

final timeoutConfig = TimeoutConfig(
  connectionTimeout: Duration(seconds: 10),          // Connection timeout
  readTimeout: Duration(minutes: 2),                 // Read timeout
  totalTimeout: Duration(minutes: 10),               // Total request timeout
  readTimeoutForLargePayloads: Duration(minutes: 5), // Timeout for large payloads (>1MB)
);

// Get appropriate timeout based on payload size
final timeout = timeoutConfig.getReadTimeoutForPayload(payloadSizeBytes);

Metrics Collection

Optional metrics collection for monitoring LLM operations:

// Use default implementation
final metrics = DefaultLLMMetrics();

// Metrics are automatically recorded by repositories
// Access collected metrics:
final stats = metrics.getMetrics();
print('Total requests: ${stats['model.total_requests']}');
print('Successful: ${stats['model.successful_requests']}');
print('Failed: ${stats['model.failed_requests']}');
print('Avg latency: ${stats['model.avg_latency_ms']}ms');
print('P95 latency: ${stats['model.p95_latency_ms']}ms');
print('Total tokens: ${stats['model.total_generated_tokens']}');

// Reset metrics
metrics.reset();

// Or implement custom metrics collector
class MyMetrics implements LLMMetrics {
  @override
  void recordRequest({required String model, required bool success}) {
    // Send to your observability stack
  }
  // ... implement other methods
}
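
A small sketch of exporting the collected numbers on a schedule, using only the getMetrics()/reset() calls shown above and assuming getMetrics() returns a map of metric names to values (the 30-second interval is arbitrary):

import 'dart:async';

// Periodically log a snapshot of all metrics, then start a fresh window.
Timer.periodic(const Duration(seconds: 30), (_) {
  final stats = metrics.getMetrics();
  stats.forEach((name, value) => print('$name = $value'));
  metrics.reset();
});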

Validation

Input validation utilities:

// Validate model name
Validation.validateModelName('gpt-4o');

// Validate messages
Validation.validateMessages([
  LLMMessage(role: LLMRole.user, content: 'Hello!'),
]);

// Validate tool arguments
Validation.validateToolArguments(
  {'expression': '2+2'},
  'calculator',
);

Exceptions

  • ThinkingNotSupportedException - Model doesn't support thinking
  • ToolsNotSupportedException - Model doesn't support tools
  • VisionNotSupportedException - Model doesn't support vision
  • LLMApiException - API request failed
  • ModelLoadException - Model loading failed
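
A minimal handling sketch around a non-streaming call (repo is any backend implementation; the fallback logic is up to your application):

try {
  final response = await repo.chatResponse(
    'qwen3:0.6b',
    messages: [LLMMessage(role: LLMRole.user, content: 'Hello!')],
    think: true,
  );
  print(response);
} on ThinkingNotSupportedException {
  // Retry without thinking, or pick a different model.
} on LLMApiException catch (e) {
  // Log and decide whether the request is worth retrying.
  print('API error: $e');
}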

Usage with Backends

This package is typically used indirectly through backend packages:

import 'package:llm_ollama/llm_ollama.dart'; // Re-exports llm_core

final repo = OllamaChatRepository(baseUrl: 'http://localhost:11434');

final stream = repo.streamChat('qwen3:0.6b', messages: [
  LLMMessage(role: LLMRole.user, content: 'Hello!'),
]);

await for (final chunk in stream) {
  print(chunk.message?.content ?? '');
}
