transform method
Transforms a text stream into non-overlapping chunks of approximately equal size.
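This method buffers pieces of the input text stream and emits a chunk once the buffered length reaches the target chunkSize, so each chunk is close to that size. The pieces come from the cleanChunks extension from the bpe package, which prepares the text before chunking; it is called with size set to chunkSize ~/ 2 and grace set to chunkSize ~/ 4, each clamped to at least 1.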
@param rawFeed The input stream of text.
@return A stream of non-overlapping chunks.
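The arguments that the implementation passes to cleanChunks are derived from chunkSize by integer division. The sketch below only reproduces that arithmetic for a few sample values; pieceSize and pieceGrace are hypothetical helper names, not part of the library, and what grace means exactly is defined by the bpe package.

```dart
import 'dart:math';

// Hypothetical helpers (not library API) mirroring the two expressions
// that transform passes to cleanChunks.
int pieceSize(int chunkSize) => max(1, chunkSize ~/ 2);
int pieceGrace(int chunkSize) => max(1, chunkSize ~/ 4);

void main() {
  for (final chunkSize in [1, 3, 10, 512]) {
    print('chunkSize=$chunkSize -> '
        'size=${pieceSize(chunkSize)}, grace=${pieceGrace(chunkSize)}');
  }
}
```

For very small targets the integer division would give 0, which is why both values are clamped to 1: a chunkSize of 3 yields size 1 and grace 1, while 512 yields size 256 and grace 128.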
Implementation
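Stream<Chunk> transform(Stream<String> rawFeed) async* {
  int start = 0; // Offset of the next chunk: sum of the lengths emitted so far.
  int lengthBuffer = 0; // Total length of the pieces currently in buffer.
  List<String> buffer = [];
  int id = 0; // Sequential chunk id.

  // cleanChunks (bpe package) pre-splits the feed into small pieces.
  // max comes from dart:math and keeps both arguments at least 1.
  await for (String i in rawFeed.cleanChunks(
    size: max(1, chunkSize ~/ 2),
    grace: max(1, chunkSize ~/ 4),
  )) {
    buffer.add(i);
    lengthBuffer += i.length;

    // Once the buffer reaches chunkSize, emit everything except the newest
    // piece; that piece is carried over as the start of the next chunk.
    if (lengthBuffer >= chunkSize && lengthBuffer - i.length <= chunkSize) {
      Chunk c = Chunk(
        id++,
        start,
        lengthBuffer - i.length,
        buffer.sublist(0, buffer.length - 1).join(),
      );
      start += c.length;
      lengthBuffer = i.length;
      buffer = [i];
      yield c;
    }
  }

  // Flush whatever is left as the final (possibly shorter) chunk.
  if (buffer.isNotEmpty) {
    yield Chunk(id++, start, lengthBuffer, buffer.join());
  }
}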
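To see how the buffering behaves, the following self-contained sketch replays the same loop on hand-written pieces. It rests on several assumptions: the Chunk class here is a hypothetical stand-in with guessed field names (id, start, length, content), chunkPieces is an illustrative free function rather than the library method, and the pieces are supplied directly instead of coming from cleanChunks, whose real splitting rules may differ.

```dart
import 'dart:async';

// Hypothetical stand-in for the library's Chunk type; field names are assumed.
class Chunk {
  final int id;
  final int start;
  final int length;
  final String content;
  Chunk(this.id, this.start, this.length, this.content);
}

// Same buffering logic as transform, but taking already-split pieces
// and the target size as plain arguments.
Stream<Chunk> chunkPieces(Stream<String> pieces, int chunkSize) async* {
  var start = 0;
  var lengthBuffer = 0;
  var buffer = <String>[];
  var id = 0;
  await for (final piece in pieces) {
    buffer.add(piece);
    lengthBuffer += piece.length;
    if (lengthBuffer >= chunkSize && lengthBuffer - piece.length <= chunkSize) {
      // Emit everything except the newest piece, then carry that piece over.
      final c = Chunk(id++, start, lengthBuffer - piece.length,
          buffer.sublist(0, buffer.length - 1).join());
      start += c.length;
      lengthBuffer = piece.length;
      buffer = [piece];
      yield c;
    }
  }
  if (buffer.isNotEmpty) {
    yield Chunk(id++, start, lengthBuffer, buffer.join());
  }
}

// Five 4-character pieces with a target chunk size of 10.
Future<List<Chunk>> demo() => chunkPieces(
      Stream.fromIterable(['abcd', 'efgh', 'ijkl', 'mnop', 'qrst']),
      10,
    ).toList();

Future<void> main() async {
  for (final c in await demo()) {
    print('${c.id} @${c.start} len=${c.length}: ${c.content}');
  }
}
```

With these inputs the sketch yields three chunks: "abcdefgh" at offset 0, "ijklmnop" at offset 8, and the trailing "qrst" at offset 16. Each emitted chunk stops short of the piece that pushed the buffer past the target, so the chunks do not overlap and their contents concatenate back to the original text.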