Introduction
This is libdbm.dart, a dart implementation of a dbm
like database. It is extremely simple
and extremely fast. For ease-of-use, an implementation of the dart Map
is provided in addition
to a lower-level API. This Map
interface can be used to persist any data given the appropriate
serialization parameters. Like many other dbm
based systems, it uses a hashing approach to
provide a very fast key-value store. It is purposefully intended to be very minimalistic and
to have no dependencies.
This is an early preview. There will be additional capabilities added, including the
ability to maintain multiple indexes over the keys, and support for IndexedDB
APIs.
Getting Started
The API is deliberately extremely simple. In order to use this library, import the package, open a database, and store/fetch values.
PersistentMap
Using the PersistentMap
interface is very much like using a regular map, though the
data is stored on disk as shown below. All the regular Map
interfaces are supported
with the exception of the cast()
operation.
import 'dart:io';
import 'package:libdbm/libdbm.dart';
void main() {
final file = File('dummy.db');
var db = PersistentMap.withStringValue(file, create:true);
// persistent
db['foo'] = 'bar';
var result = db['foo'];
print('$result');
db.remove('foo');
db.close();
file.delete();
}
The PersistentMap
implementation will not overwrite an existing database, though it will
create a new one if create:true
is specified.
Raw DBM/HashDBM
This API is the lowest-level API, upon which PersistentMap
is written. It is functionally
very similar, but requires a little more plumbing to use.
import 'dart:io';
import 'dart:convert' show utf8;
import 'package:libdbm/libdbm.dart';
void main() {
final key = utf8.encoder.convert('A key');
final value = utf8.encoder.convert('A value');
final file = File('dummy.db');
final db = HashDBM(file.openSync(mode: FileMode.write));
db.put(key, value);
var result = db.get(key);
print('${utf8.decode(result!.toList())}');
for (var i = db.entries(); i.moveNext();) {
print('${utf8.decode(i.current.key)}');
print('${utf8.decode(i.current.value)}');
}
db.remove(key);
db.get(key); // will return null
db.close();
file.delete();
}
Note that to open an already closed database, use FileMode.append
otherwise the old
data will be overwritten (this is a simple way to truncate the database).
Performance
Benchmarks are notoriously difficult, but as a general guideline, libdbm
hash storage is
capable of reading and writing 1000s of key-value pairs per second. The following numbers are
taken from the test suite for 10,000 pairs.
buckets | op | time |
---|---|---|
103 | insert | 0:00:05.483791 |
103 | fetch | 0:00:02.668405 |
1009 | insert | 0:00:03.275287 |
1009 | fetch: | 0:00:01.393490 |
10007 | insert | 0:00:00.371533 |
10007 | fetch | 0:00:00.126720 |
100003 | insert | 0:00:00.126404 |
100003 | fetch | 0:00:00.059652 |
As can be seen, one of the main factors in performance is how large the internal hash table is.
This is persisted to external storage when flush()
or close()
is called, and will generally
consume 16*num
bytes. As a general rule, having this be a largish prime number is good.
Other major factors are whether flush
is set, which will force memory-based data structures to
disk whenever they are modified. It is also possible to add a CRC check to records, which will
roughly halve the throughput (i.e. operations will take twice as long);
Space and memory usage
The database file format has some fixed and dynamic sized overheads. As a general rule, the static overhead is < 1k. The dynamic overhead is whatever size is needed for the hash table and memory pool (roughly 16 bytes per entry each), and then a per-record overhead of about 32 bytes, and records are aligned to 128 byte boundaries. As such, the overhead for storing many tiny values will be fairly high, so it is better to aggregate such values into a single record. Conversely the overhead for storing largish values (such as text or JSON data) will be relatively low.
Limitations
The biggest current limitations are related to robustness. The library doesn't (yet) support
transactions and while care has been taken to ensure reliability, the library doesn't use a WAL
so in extreme cases, there is a small chance of corruption. The best way to mitigate this
is to have flush
turned on. Further tests need to be/will be written to handle bad input etc.
but the library is well tested and is used in production.
Currently, the hash table size is fixed, though the file format supports rehashing/reallocating the hash table. In the future, this capability will be used to optimize performance automatically.
Planned Enhancements
- Versioning of values so that
n
previous values will (optionally) be kept. This will probably be done by implementing pointer versioning. - Transaction support, which will basically buffer pointer updates and then write out atomically. This will be relatively simple with pointer versioning implemented.
- An extreme form of versioning will be purely append-only behavior.
- Index to support ordered traversal and simple queries. Probably both
btree
andsplay-tree
indexes. IndexDB
API support.- Maybe implement an STM server.
Exposed API
The interface to the underlying storage engine is basically that of a simple map from
Uint8List
to Uint8List
.
abstract class DBM {
Uint8List? get(Uint8List key);
Uint8List? remove(Uint8List key);
Uint8List? put(Uint8List key, Uint8List value);
Uint8List putIfAbsent(Uint8List key, Uint8List value);
Iterator<MapEntry<Uint8List,Uint8List>> entries();
DateTime modified();
int version();
int size();
int count();
void clear();
void flush();
void close();
}
This API is expected to be stable over time, with enhancements being additive.
Licence
Copyright 2021 Gavin Nicol
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.