mirror of
https://github.com/JHUAPL/kvspool.git
synced 2026-01-08 22:27:56 -05:00
79 lines
4.7 KiB
Plaintext
79 lines
4.7 KiB
Plaintext
Design concepts for "v2" rewrite of kvspool
|
|
-------------------------------------------
|
|
1. Support multi-writer, multi-reader from same spool
|
|
2. Use a memory-mapped file for reading/writing spool data so that:
|
|
(1) I/O occurs through shared memory even without a ramdisk, while
|
|
(2) data is still persisted back to disk
|
|
3. Support for multi-writers requires a synchronization mechanism.
|
|
(1) This is one of the functions of the "control file".
|
|
(a) this file exists alongside the spool data file
|
|
(b) by flock'ing it (or fcntl lock on a region of it), one writer
|
|
can gain exclusive write (which applies to the spool data file too);
|
|
a second level of record-locking using fcntl lock on the spool data
|
|
file can act as a redundant safeguard
|
|
(c) the control file has the min and max sequence number in it
|
|
(d) the max sequence number is just the "frame number" of the next
|
|
frame to be written
|
|
(e) the min sequence number is incremented (sometimes by an increment
|
|
greater than one) when the writer is overwriting previous frame(s).
|
|
It's purpose is explained under the "Support for multi-readers" later.
|
|
(f) The offset of the min and max frames are also stored
|
|
(g) The control block may also contain a few time-series on write rates.
|
|
(i) It would also be possible to place the control file into the data
|
|
spool itself, in which case its a "control block" of fixed size
|
|
at the beginning; this would eliminate some failure modes and
|
|
reduce the file descriptor bookkeeping by one
|
|
4. Spool data file is a single, large, circular data buffer
|
|
(a) It is pre-created prior to data being written to reserve the space
|
|
(b) This requires that it be a non-sparse file
|
|
(c) It is used as a cyclic buffer
|
|
(d) When the end is reached, a new frame may not quite fit at the end,
|
|
in which case the frame starts at the beginning of the file; but
|
|
this requires that the frame's content-length may differ from its
|
|
stored length (so that the frame that ends up at the end of the
|
|
buffer can be adjusted to consume the full remaining space).
|
|
(e) Thus the frame format is
|
|
(1) sequence number
|
|
(2) storage length
|
|
(3) content length
|
|
(4) data (in JSON)
|
|
(f) The single large data file replaces the kvspool-v1 approach
|
|
where ten sequenced files contain the spool data, and old files
|
|
are deleted as new files are written. The v1 logic requires
|
|
detection of new files in the spool, although its advantegous
|
|
in that read/write through standard (non-mmap) calls does not
|
|
swap in the entire data spool as the v2 approach may tend to do
|
|
5. Support for multi-readers
|
|
(1) since readers that are inactive for a long time may get to the point
|
|
that their next read position is potentially invalid (due to a
|
|
writer wraparound that puts a new frame into the read area),
|
|
(a) the reader that is entering the 'read' state will first
|
|
lock the control file, acquire the minimum sequence number
|
|
to see if its exceeded its own read position
|
|
(i) If it has, then the reader has experienced frame loss
|
|
and adjusts its next-read-position to the min frame
|
|
(ii) if not, the reader can then record-lock the spool
|
|
data and read the next frame
|
|
(iii) note that if the max sequence number is the same as
|
|
the read position, then the reader needs to block
|
|
(by placing inotify on the control file, unlocking
|
|
and going into a select/epoll).
|
|
(2) If reader needs persistence for its read position it should
|
|
store its own sequence number and identifier in the spool dir
|
|
6. Key repetition
|
|
(1) if every frame tends to repeat the same keys or a subset of a
|
|
relatively small set of keys (as is typically expected) then
|
|
the keys are highly redundant and suitable for compression
|
|
(2) one option is to use indexes instead of the keys themselves;
|
|
seperately a key-store would be maintained with a table
|
|
(into which the index points) whose value is the offset on
|
|
disk of the key itself. Adds complexity but saves a lot of
|
|
space. Alternatively some kind of run time compression on
|
|
the frames is possible particularly if some kind of frame
|
|
history is kept (e.g. as with video, key frames at intervals
|
|
would keep the whole keys, into which the indexes would point
|
|
for subsequent frames; this would complicate the cyclic
|
|
wraparound logic for recycling old frames by pushing it to
|
|
key-frame boundaries
|
|
|