Design concepts for "v2" rewrite of kvspool
-------------------------------------------

1. Support multi-writer, multi-reader from same spool

2. Use a memory-mapped file for reading/writing spool data so that:
   (1) I/O occurs through shared memory even without a ramdisk, while
   (2) data is still persisted back to disk

3. Support for multi-writers requires a synchronization mechanism.
   (1) This is one of the functions of the "control file".
       (a) This file exists alongside the spool data file.
       (b) By flock'ing it (or taking an fcntl lock on a region of it),
           one writer can gain exclusive write access (which applies to
           the spool data file too); a second level of record-locking
           using an fcntl lock on the spool data file can act as a
           redundant safeguard.
       (c) The control file has the min and max sequence number in it.
       (d) The max sequence number is just the "frame number" of the
           next frame to be written.
       (e) The min sequence number is incremented (sometimes by an
           increment greater than one) when the writer is overwriting
           previous frame(s). Its purpose is explained under "Support
           for multi-readers" later.
       (f) The offsets of the min and max frames are also stored.
       (g) The control file may also contain a few time-series on write
           rates.
       (h) It would also be possible to place the control file into the
           data spool itself, in which case it's a "control block" of
           fixed size at the beginning; this would eliminate some
           failure modes and reduce the file descriptor bookkeeping by
           one.

4. Spool data file is a single, large, circular data buffer
   (a) It is pre-created prior to data being written, to reserve the
       space.
   (b) This requires that it be a non-sparse file.
   (c) It is used as a cyclic buffer.
   (d) When the end is reached, a new frame may not quite fit at the
       end, in which case the frame starts at the beginning of the
       file. This requires that a frame's content length may differ
       from its storage length (so that the frame that ends up at the
       end of the buffer can be adjusted to consume the full remaining
       space).
   (e) Thus the frame format is:
       (1) sequence number
       (2) storage length
       (3) content length
       (4) data (in JSON)
   (f) The single large data file replaces the kvspool-v1 approach
       where ten sequenced files contain the spool data, and old files
       are deleted as new files are written. The v1 logic requires
       detection of new files in the spool, although it is advantageous
       in that read/write through standard (non-mmap) calls does not
       swap in the entire data spool as the v2 approach may tend to do.

5. Support for multi-readers
   (1) Readers that are inactive for a long time may get to the point
       that their next read position is potentially invalid (due to a
       writer wraparound that puts a new frame into the read area).
       (a) The reader that is entering the 'read' state will therefore
           first lock the control file and read the minimum sequence
           number to see whether it has exceeded the reader's own read
           position.
           (i)   If it has, then the reader has experienced frame loss
                 and adjusts its next-read-position to the min frame.
           (ii)  If not, the reader can then record-lock the spool data
                 and read the next frame.
           (iii) Note that if the max sequence number is the same as
                 the read position, then the reader needs to block (by
                 placing inotify on the control file, unlocking and
                 going into a select/epoll).
   (2) If a reader needs persistence for its read position, it should
       store its own sequence number and identifier in the spool dir.

6. Key repetition
   (1) If every frame tends to repeat the same keys, or a subset of a
       relatively small set of keys (as is typically expected), then
       the keys are highly redundant and suitable for compression.
   (2) One option is to use indexes instead of the keys themselves;
       separately, a key-store would be maintained with a table (into
       which the index points) whose value is the offset on disk of the
       key itself. This adds complexity but saves a lot of space.
       Alternatively, some kind of run-time compression on the frames
       is possible, particularly if some kind of frame history is kept
       (e.g. as with video, key frames at intervals would keep the
       whole keys, into which the indexes would point for subsequent
       frames; this would complicate the cyclic wraparound logic for
       recycling old frames by pushing it to key-frame boundaries).
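
Sketch for item 4 (frame format and wraparound). The following is a minimal illustration in Python, not the kvspool implementation. It assumes a header of three little-endian 64-bit fields (sequence number, storage length, content length); the field widths, the names HEADER, append_frame and read_frame, and the choice to pad the previous frame's storage length are all assumptions made for the example. It does not advance the min sequence number when old frames are overwritten.

```python
import struct

# Assumed header layout: sequence number, storage length, content length.
HEADER = struct.Struct("<QQQ")

def append_frame(buf, write_off, prev_off, seq, payload):
    """Write one frame into buf (bytearray or mmap).

    Returns (frame_offset, next_write_offset).
    """
    need = HEADER.size + len(payload)
    if need > len(buf):
        raise ValueError("frame larger than the whole spool")
    if write_off + need > len(buf):
        # The frame does not fit in the tail. Grow the previous frame's
        # storage length so it consumes the leftover bytes, then wrap.
        if prev_off is not None:
            pseq, pstore, pcontent = HEADER.unpack_from(buf, prev_off)
            pstore += len(buf) - write_off
            HEADER.pack_into(buf, prev_off, pseq, pstore, pcontent)
        write_off = 0
    # For a freshly written frame, storage length equals header + content.
    HEADER.pack_into(buf, write_off, seq, need, len(payload))
    buf[write_off + HEADER.size:write_off + need] = payload
    return write_off, write_off + need

def read_frame(buf, off):
    """Return (seq, payload, offset_of_next_frame)."""
    seq, storage_len, content_len = HEADER.unpack_from(buf, off)
    start = off + HEADER.size
    payload = bytes(buf[start:start + content_len])
    # Storage length, not content length, locates the next frame.
    return seq, payload, (off + storage_len) % len(buf)
```

A reader walks the buffer by following storage lengths, so the padding at the end of the file is skipped without any special case.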
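
Sketch for item 5 (reader position check). This is one possible reading of steps (i) to (iii), assuming a control file that holds four little-endian 64-bit fields (min seq, max seq, min offset, max offset). That layout, the name reader_position, and the use of a shared flock for readers are assumptions for illustration; the blocking step via inotify is only signalled by a flag here.

```python
import fcntl
import os
import struct

# Assumed control file layout: min_seq, max_seq, min_off, max_off.
CONTROL = struct.Struct("<QQQQ")

def reader_position(control_path, reader_seq):
    """Check a reader's position against the control file.

    Returns (seq_to_read, frames_lost, must_wait).
    """
    with open(control_path, "rb") as f:
        fcntl.flock(f, fcntl.LOCK_SH)      # lock the control file
        try:
            min_seq, max_seq, _min_off, _max_off = CONTROL.unpack(
                f.read(CONTROL.size))
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
    lost = 0
    if min_seq > reader_seq:
        # (i) the writer has overwritten frames this reader never saw
        lost = min_seq - reader_seq
        reader_seq = min_seq
    # (iii) max_seq is the next frame to be written, so equality means
    # there is nothing to read yet and the reader should block.
    return reader_seq, lost, reader_seq == max_seq
```

When must_wait is false, the reader would go on to record-lock the spool data and read the frame, as in step (ii).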
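
Sketch for item 6 (indexes instead of keys). A minimal illustration of option (2), assuming an in-memory table rather than the on-disk key-store with offsets described above; KeyTable, encode_frame and decode_frame are hypothetical names.

```python
import json

class KeyTable:
    """Hypothetical key store: maps each key string to a small index."""

    def __init__(self):
        self.index_of = {}   # key -> index
        self.keys = []       # index -> key

    def intern(self, key):
        idx = self.index_of.get(key)
        if idx is None:
            idx = len(self.keys)
            self.index_of[key] = idx
            self.keys.append(key)
        return idx

def encode_frame(table, record):
    # Store [index, value] pairs so repeated key names are not rewritten.
    pairs = [[table.intern(k), v] for k, v in record.items()]
    return json.dumps(pairs, separators=(",", ":")).encode()

def decode_frame(table, data):
    # Resolve each index back to its key through the table.
    return {table.keys[i]: v for i, v in json.loads(data.decode())}
```

A frame encoded this way can only be decoded while the table still holds its keys, which is the coupling that the key-frame idea above tries to bound.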