# fire_rag

`fire_rag` is a Firebase-first ingestion pipeline for retrieval-augmented generation.
It is built for the case where you already have:
- a Firestore-backed application
- a Cloud Tasks queue
- a text source arriving in Cloud Storage
- an embedding model
- optionally, a chat model for recursive summarization
Instead of indexing everything in one request, `fire_rag` turns ingestion into resumable Cloud Tasks:
- download a source file
- split it into base chunks
- persist chunk records to Firestore
- embed those records
- optionally distill them into higher levels of detail
- embed the distilled records too
The result is a Firestore collection containing serialized `agentic` `Chunk` records, vectors, and chunk relationships that can later be queried with `rag`.
## Concept
This package is not a chat agent and it is not a vector query layer by itself.
It is the ingestion half of a Firebase RAG stack.
The core idea is:

- use `arcane_admin` to run resumable task work in Cloud Tasks
- use `agentic` to chunk text and call models
- store chunk documents in Firestore
- use `rag` later to retrieve those embedded records during answering
If you already have documents landing in Cloud Storage, `fire_rag` gives you a straightforward path from uploaded file to vectorized Firestore records.
## How It Works

`fire_rag` wires three task types into an `arcane_admin` `TaskManager`.
### 1. TaskChunk
`TaskChunk` is the entry task.

It:

- downloads a source file from Cloud Storage to a local temp path
- uses `agentic.IChunker` to split the file into base `Chunk` models
- writes each chunk into Firestore using the `Chunk` model shape
- batches the chunk document IDs into `TaskEmbed` jobs
- optionally schedules `TaskDistill` if recursive summarization is enabled
Each base chunk document stores the normal `Chunk` fields, including `content`, `postContent`, `index`, `lod`, `charStart`, `charEnd`, `record`, and `metadata`.

`destinationMetadata` from `TaskChunk` is merged into `Chunk.metadata`, not written as extra top-level Firestore fields.
### 2. TaskEmbed
`TaskEmbed` reads stored chunk text from Firestore, reconstructs `Chunk.fullContent`, calls your connected embedding model, and writes the resulting vector back onto the same document.

By default it embeds `content + postContent` and stores the result in `vector`.
### 3. TaskDistill
`TaskDistill` is the recursive summarization stage.

It:

- reads groups of `factor` chunks from one level of detail
- sends them to your connected chat model
- writes a distilled `Chunk` into the next `lod`
- links source and distilled chunks with top-level `down` and `up`
- schedules embedding for the newly created distilled chunks
- continues level-by-level until only one distilled output remains
This gives you a hierarchy of chunks:
- L0: original chunked source text
- L1: distilled groups of L0
- L2: distilled groups of L1
- and so on until a single higher-level summary remains
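To get intuition for how deep the hierarchy goes, note that each level holds roughly `1/factor` as many chunks as the one below it. Here is a minimal sketch in plain Dart arithmetic (no fire_rag APIs; the 100-chunk corpus and the ceiling-style rounding of partial groups are assumptions for illustration):

```dart
// Sketch: how many chunks each level of detail (LOD) holds when each
// distilled chunk summarizes up to `factor` chunks from the level below.
int ceilDiv(int a, int b) => (a + b - 1) ~/ b;

void main() {
  const factor = 4; // a distillationFactor of 4
  int count = 100;  // assume 100 base chunks at L0
  int lod = 0;
  print('L$lod: $count chunks'); // L0: 100 chunks
  while (count > 1) {
    count = ceilDiv(count, factor);
    lod++;
    print('L$lod: $count chunks');
  }
  // Prints L1: 25, L2: 7, L3: 2, L4: 1 — four distilled levels.
}
```

The takeaway is that depth grows logarithmically: even a large corpus collapses to a single top-level summary in a handful of levels.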
## Package Surface
The public bootstrap is small:

```dart
FireRag.init(...)
```

It registers task executors for:

- `TaskChunk`
- `TaskDistill`
- `TaskEmbed`

and exposes the configured `TaskManager`, embedding model, and chat model through `FireRag.instance`.
## Installation
Add the package:

```shell
dart pub add fire_rag
```

Typical companion packages are:

```shell
dart pub add arcane_admin
dart pub add agentic
dart pub add rag
```

If you are developing this package or changing artifact-backed task models, keep generated code up to date:

```shell
dart run build_runner build --delete-conflicting-outputs
```
## Getting Started
At startup you usually do two things:

- initialize `ArcaneAdmin`
- initialize `FireRag`
```dart
import 'package:agentic/agentic.dart';
import 'package:arcane_admin/arcane_admin.dart';
import 'package:fire_rag/fire_rag.dart';

Future<void> main() async {
  await ArcaneAdmin.initialize(
    projectId: 'my-project-id',
    defaultStorageBucket: 'my-project-id.firebasestorage.app',
  );

  ConnectedEmbeddingModel embedder = OpenAIConnector(
    apiKey: const String.fromEnvironment('OPENAI_API_KEY'),
  ).asEmbedder('text-embedding-3-small');

  ConnectedChatModel llm = OpenAIConnector(
    apiKey: const String.fromEnvironment('OPENAI_API_KEY'),
  ).connect(ChatModel.openai4_1Mini);

  FireRag.init(
    embed: embedder,
    llm: llm,
    taskQueue: 'rag-ingest',
    endpointUrl: 'https://your-service.run.app/event/executeJob',
  );
}
```
## Usage
The usual deployment shape is:
- one endpoint that receives a Cloud Storage finalization event
- one endpoint that executes scheduled tasks
### Minimal Server Wiring
```dart
import 'package:arcane_admin/arcane_admin.dart';
import 'package:fire_rag/fire_rag.dart';
import 'package:fire_rag/task/task_chunk.dart';
import 'package:shelf/shelf.dart';
import 'package:shelf/shelf_io.dart' as io;
import 'package:shelf_router/shelf_router.dart';

Future<void> main() async {
  await ArcaneAdmin.initialize(
    projectId: 'my-project-id',
    defaultStorageBucket: 'my-project-id.firebasestorage.app',
  );

  FireRag.init(
    embed: /* your ConnectedEmbeddingModel */,
    llm: /* your ConnectedChatModel */,
    taskQueue: 'rag-ingest',
    endpointUrl: 'https://your-service.run.app/event/executeJob',
  );

  Router router = Router();
  router.taskManager(FireRag.instance.taskManager);

  router.post('/storageFinalized', (Request request) {
    return request.storageEvent((ArcaneStorageEvent event) async {
      if (!event.path.endsWith('.txt')) {
        return Response.ok('');
      }

      await FireRag.instance.taskManager.schedule(
        TaskChunk(
          taskId: 'ingest.${event.bucket}.${event.path}',
          sourceBucket: event.bucket,
          sourcePath: event.path,
          destinationCollection: 'rag_chunks',
          record: event.path,
          maxChunkSize: 500,
          maxPostOverlap: 100,
          embedBatchSize: 25,
          chunkBatchSize: 100,
          distillationFactor: 4,
          destinationMetadata: {
            'sourceBucket': event.bucket,
            'sourcePath': event.path,
          },
        ),
      );

      return Response.ok('');
    });
  });

  await io.serve(router.call, '0.0.0.0', 8080);
}
```
### What The Example Does
- `/storageFinalized` receives a storage event from Eventarc or your own webhook bridge
- a new `TaskChunk` is scheduled
- `/event/executeJob` is automatically handled by `router.taskManager(...)`
- the task manager keeps re-queuing work until the current task is complete
## Data Model
Chunk document IDs follow this pattern:
```
{record}.{index}L{lod}
```

Examples:

```
customer-handbook.pdf.0L0
customer-handbook.pdf.1L0
customer-handbook.pdf.0L1
customer-handbook.pdf.0L2
```
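Because the record identifier can itself contain dots (as in the PDF examples above), it is safest to parse these IDs from the right. Here is a small helper sketch in plain Dart; `parseChunkId` is a hypothetical name for illustration, not a fire_rag API:

```dart
// Sketch: split a chunk document ID of the form {record}.{index}L{lod}
// back into its parts. The record may contain dots, so the pattern is
// anchored at the end of the string and the record part matched greedily.
final RegExp chunkIdPattern = RegExp(r'^(.+)\.(\d+)L(\d+)$');

({String record, int index, int lod}) parseChunkId(String id) {
  final Match m = chunkIdPattern.firstMatch(id)!;
  return (
    record: m.group(1)!,
    index: int.parse(m.group(2)!),
    lod: int.parse(m.group(3)!),
  );
}

void main() {
  final parts = parseChunkId('customer-handbook.pdf.0L1');
  print(parts.record); // customer-handbook.pdf
  print(parts.index);  // 0
  print(parts.lod);    // 1
}
```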
Useful stored fields include:

- `content`: the main body of the chunk
- `postContent`: overlap from the following chunk
- `index`: the chunk index within that level
- `lod`: level of detail
- `charStart`, `charEnd`
- `record`: logical record identifier
- `metadata`: extra application metadata such as source bucket or source path
- `vector`: embedding written by `TaskEmbed`
- `down`: child chunk indexes used to create a distilled chunk
- `up`: parent chunk index created from a source chunk
In practice, Firestore documents are stored in the same shape as `agentic`'s `Chunk.toMap()`:
```json
{
  "index": 0,
  "content": "Chunk body",
  "postContent": " overlap from the next chunk",
  "charStart": 0,
  "charEnd": 532,
  "lod": 0,
  "record": "customer-handbook.txt",
  "metadata": {
    "sourceBucket": "docs",
    "sourcePath": "customer-handbook.txt"
  },
  "up": 0,
  "down": [0, 1, 2, 3],
  "vector": {
    "vector": [0.12, -0.04, 0.98]
  }
}
```
Notes about that shape:

- `up`, `down`, and `vector` are optional and appear only after later stages populate them
- `metadata` is the right place for custom application values
- top-level chunk fields remain available for app-side orchestration and debugging
## Choosing Distillation Settings
The most important knobs are:

- `maxChunkSize`: target size of each stored chunk
- `maxPostOverlap`: overlap appended from the next chunk
- `chunkBatchSize`: how many chunks are persisted before scheduling embed work
- `embedBatchSize`: how many document IDs are sent per embedding task
- `distillationFactor`: how many chunks are combined into one higher-LOD chunk
Rules of thumb:

- start with `maxChunkSize: 500` and `maxPostOverlap: 100`
- use `distillationFactor: 4` if you want a compact hierarchy
- omit `distillationFactor` if you only want base chunks plus embeddings
- increase `embedBatchSize` only if your embedding provider comfortably supports it
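As a back-of-the-envelope check on how these knobs interact, the sketch below (plain Dart arithmetic; the 1,000-chunk corpus and ceiling-rounded grouping are assumptions, not fire_rag internals) estimates how many embed jobs and distilled chunks one run would produce:

```dart
// Sketch: estimate ingestion workload from the tuning knobs.
int ceilDiv(int a, int b) => (a + b - 1) ~/ b;

void main() {
  const baseChunks = 1000;      // assumed L0 chunks produced by TaskChunk
  const embedBatchSize = 25;    // document IDs per TaskEmbed job
  const distillationFactor = 4; // chunks combined per higher-LOD chunk

  // Embed jobs needed for the base level.
  print('L0 embed jobs: ${ceilDiv(baseChunks, embedBatchSize)}'); // 40

  // Distilled chunks per level until one remains.
  int n = baseChunks;
  int totalDistilled = 0;
  while (n > 1) {
    n = ceilDiv(n, distillationFactor);
    totalDistilled += n;
  }
  print('distilled chunks: $totalDistilled'); // 250 + 63 + 16 + 4 + 1 = 334
}
```

Raising `distillationFactor` shrinks the distilled-chunk count (and chat-model spend) at the cost of packing more source text into each summarization call.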
## Relationship To Other Packages
`fire_rag` is intentionally small because it leans on a few other packages:

- `agentic`: used for `IChunker`, chunk text splitting, connected chat models, and connected embedding models. `TaskChunk` uses the chunker; `TaskDistill` and `TaskEmbed` use the connected models.
- `arcane_admin`: used for Firebase admin initialization, Cloud Storage download access, Firestore access, Eventarc helpers, and the resumable `TaskManager`/`TaskExecutor` system that drives ingestion.
- `rag`: not used to ingest documents, but intended as the retrieval-side companion package once your Firestore chunk collection has vectors. The nested `metadata` map written by `fire_rag` lines up with `rag`'s Firestore vector-space metadata convention.
- `fire_api`: used indirectly through `arcane_admin` for Firestore and Storage abstractions, including document reads, writes, and `VectorValue`.
- `artifact`: used for serializable task objects so task state can be preserved between Cloud Task executions.
You can think of the stack like this:

- `agentic` handles model calls and chunking
- `arcane_admin` handles Firebase admin and task orchestration
- `fire_rag` turns those pieces into a resumable ingestion pipeline
- `rag` consumes the resulting embedded records for retrieval
## Typical Flow In Production
- A text file is uploaded to Cloud Storage.
- A storage event schedules `TaskChunk`.
- `TaskChunk` writes L0 chunks and schedules `TaskEmbed`.
- If enabled, `TaskChunk` schedules `TaskDistill`.
- `TaskDistill` writes L1 chunks, schedules embeds for those chunks, and recursively schedules higher levels.
- Firestore ends up containing both raw and distilled chunk records plus vectors.
- Your retrieval layer queries that collection later with `rag`.
## Notes
- This package currently assumes text-file ingestion. If your upstream source is PDF, OCR, HTML, or something else, convert it to text before scheduling `TaskChunk`.
- `TaskDistill` requires a chat model. If you do not want summarization, leave `distillationFactor` unset.
- Query-time schema expectations are up to your retrieval layer. `fire_rag` focuses on ingestion and vectorization, not retrieval policy.
## Contributing
If you change task models or artifact-backed state fields, regenerate code before publishing:

```shell
dart run build_runner build --delete-conflicting-outputs
```