Files
kvspool/doc/future.txt
2012-03-18 23:01:37 -04:00

79 lines
4.7 KiB
Plaintext

Design concepts for "v2" rewrite of kvspool
-------------------------------------------
1. Support multi-writer, multi-reader from same spool
2. Use a memory-mapped file for reading/writing spool data so that:
(1) I/O occurs through shared memory even without a ramdisk, while
(2) data is still persisted back to disk
3. Support for multi-writers requires a synchronization mechanism.
(1) This is one of the functions of the "control file".
(a) this file exists alongside the spool data file
(b) by flock'ing it (or fcntl lock on a region of it), one writer
can gain exclusive write (which applies to the spool data file too);
a second level of record-locking using fcntl lock on the spool data
file can act as a redundant safeguard
(c) the control file has the min and max sequence number in it
(d) the max sequence number is just the "frame number" of the next
frame to be written
(e) the min sequence number is incremented (sometimes by an increment
greater than one) when the writer is overwriting previous frame(s).
It's purpose is explained under the "Support for multi-readers" later.
(f) The offset of the min and max frames are also stored
(g) The control block may also contain a few time-series on write rates.
(i) It would also be possible to place the control file into the data
spool itself, in which case its a "control block" of fixed size
at the beginning; this would eliminate some failure modes and
reduce the file descriptor bookkeeping by one
4. Spool data file is a single, large, circular data buffer
(a) It is pre-created prior to data being written to reserve the space
(b) This requires that it be a non-sparse file
(c) It is used as a cyclic buffer
(d) When the end is reached, a new frame may not quite fit at the end,
in which case the frame starts at the beginning of the file; but
this requires that the frame's content-length may differ from its
stored length (so that the frame that ends up at the end of the
buffer can be adjusted to consume the full remaining space).
(e) Thus the frame format is
(1) sequence number
(2) storage length
(3) content length
(4) data (in JSON)
(f) The single large data file replaces the kvspool-v1 approach
where ten sequenced files contain the spool data, and old files
are deleted as new files are written. The v1 logic requires
detection of new files in the spool, although its advantegous
in that read/write through standard (non-mmap) calls does not
swap in the entire data spool as the v2 approach may tend to do
5. Support for multi-readers
(1) since readers that are inactive for a long time may get to the point
that their next read position is potentially invalid (due to a
writer wraparound that puts a new frame into the read area),
(a) the reader that is entering the 'read' state will first
lock the control file, acquire the minimum sequence number
to see if its exceeded its own read position
(i) If it has, then the reader has experienced frame loss
and adjusts its next-read-position to the min frame
(ii) if not, the reader can then record-lock the spool
data and read the next frame
(iii) note that if the max sequence number is the same as
the read position, then the reader needs to block
(by placing inotify on the control file, unlocking
and going into a select/epoll).
(2) If reader needs persistence for its read position it should
store its own sequence number and identifier in the spool dir
6. Key repetition
(1) if every frame tends to repeat the same keys or a subset of a
relatively small set of keys (as is typically expected) then
the keys are highly redundant and suitable for compression
(2) one option is to use indexes instead of the keys themselves;
seperately a key-store would be maintained with a table
(into which the index points) whose value is the offset on
disk of the key itself. Adds complexity but saves a lot of
space. Alternatively some kind of run time compression on
the frames is possible particularly if some kind of frame
history is kept (e.g. as with video, key frames at intervals
would keep the whole keys, into which the indexes would point
for subsequent frames; this would complicate the cyclic
wraparound logic for recycling old frames by pushing it to
key-frame boundaries