Doc edit and multi-arg to sub

Troy D. Hanson
2012-09-27 09:29:44 -04:00
parent 0a9f341760
commit 15584515d6
3 changed files with 96 additions and 129 deletions


@@ -772,8 +772,8 @@ asciidoc.install(2);
<h1>kvspool: a tool for data streams</h1>
<span id="author">Troy D. Hanson</span><br />
<span id="email"><tt>&lt;<a href="mailto:tdh@tkhanson.net">tdh@tkhanson.net</a>&gt;</tt></span><br />
<span id="revnumber">version 0.7,</span>
<span id="revdate">April 2012</span>
<span id="revnumber">version 0.8,</span>
<span id="revdate">September 2012</span>
<div id="toc">
<div id="toctitle">Table of Contents</div>
<noscript><p><b>JavaScript must be enabled in your browser to display the table of contents.</b></p></noscript>
@@ -799,12 +799,13 @@ kv-spool ("key-value" spool)
<div class="sect1">
<h2 id="_kvspool_8217_s_niche">kvspool&#8217;s niche</h2>
<div class="sectionbody">
<div class="paragraph"><p>Kvspool falls somewhere between the Unix pipe, a file-backed queue and a message-passing
library. Its "unit of data" is the <strong>dictionary</strong>. (Or so Python calls it). Perl calls it a
hash. It&#8217;s a set of key-value pairs.</p></div>
<div class="paragraph"><p>To use kvspool, two programs open the same spool (which is just a directory). The writer
puts dictionaries into the spool. The reader gets dictionaries from the spool, blocking
when it&#8217;s caught up. Like this,</p></div>
<div class="paragraph"><p>Kvspool is a tiny API to stream dictionaries between programs. The dictionaries have
textual keys and values. Note that what we&#8217;re calling a dictionary- what Python calls a
dictionary- is known as a hash in Perl, and is manifested in the Java API as a HashMap.
It&#8217;s a set of key-value pairs.</p></div>
<div class="paragraph"><p>To use kvspool, two programs open the same spool- which is just a directory. The writer
puts dictionaries into the spool. The reader gets dictionaries from the spool. It blocks
when it&#8217;s caught up, waiting for more data. Like this,</p></div>
<div class="paragraph"><p><span class="image">
<img src="reader-writer.png" alt="A spool writer and reader" />
</span></p></div>
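<div class="paragraph"><p>As a minimal sketch of that write/read cycle in C (only <tt>kv_spoolwriter_new</tt> and
<tt>kv_set_new</tt> appear in this commit; <tt>kv_adds</tt>, <tt>kv_spool_write</tt>,
<tt>kv_spoolreader_new</tt> and <tt>kv_spool_read</tt> are assumed from the library header
and may differ):</p></div>
<div class="literalblock">
<div class="content">
<pre><tt>#include "kvspool.h"

int main() {
  void *set = kv_set_new();                /* one dictionary, reused for write and read */
  void *wsp = kv_spoolwriter_new("spool"); /* writer opens the spool directory */
  kv_adds(set, "temperature", "72");       /* assumed: add a key-value pair to the set */
  kv_spool_write(wsp, set);                /* assumed: append the set to the spool */

  void *rsp = kv_spoolreader_new("spool"); /* assumed: reader opens the same directory */
  kv_spool_read(rsp, set, 1);              /* assumed: final 1 means block when caught up */
  return 0;
}</tt></pre>
</div></div>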
@@ -832,22 +833,13 @@ http://www.gnu.org/software/src-highlite -->
<div class="content">
<div class="title">Why did I write kvspool?</div>
<div class="paragraph"><p>I wanted a very simple library that only writes to the local file system, so
-applications can link with kvspool and use it without undesirable side effects
-(such as creation of threads, sockets, or incurring blocking operations). I
-wanted fewer, rather than more, features- taking the Unix pipe as a role model.</p></div>
-<div class="paragraph"><p>I also wanted to cater to the needs of my specific application- a never-ending
-event stream consumed by slower processes that might come and go. I didn&#8217;t want
-to fill up the disk if the reader was gone. I also didn&#8217;t want to block the
-writer while waiting for a reader. So kvspool keeps data until the spool is full,
-then deletes the old data to make room for new- regardless of whether its been
-read. This makes sense when individual events are disposable. Obviously, its
-not for finance, life support, and situations where every event is critical.
-I also wanted rewind and replay- to take a snapshot of a running event stream,
-then be able to work with it offline. I wrote kvspool because I wanted just the
-features that I needed, that fit my use cases, without heavy dependencies.</p></div>
+applications can use kvspool without having to set anything up ahead of time-
+no servers to run, no configuration files. I wanted no "side effects" to happen
+in my programs-- no thread creation, no sockets, nothing going on underneath.
+I wanted fewer rather than more features- taking the Unix pipe as a role model.</p></div>
</div></div>
<div class="paragraph"><div class="title">Loose coupling</div><p>Because the spooled data goes into the disk, the reader and writer are decoupled. They
-don&#8217;t have to run at the same time. They can come and go. If the reader exits and
+don&#8217;t have to run at the same time. They can come and go. Also, if the reader exits and
restarts, it picks up where it left off.</p></div>
<div class="sect2">
<h3 id="_space_management">Space management</h3>
@@ -866,7 +858,7 @@ fully read. (The data is kept around to reserve that disk space, and to support
<div class="sect2">
<h3 id="_shared_memory_i_o">Shared memory I/O</h3>
<div class="paragraph"><p>You can locate a spool on a RAM disk if you want the speed of shared memory without true
-disk persistence- kvspool comes with a <tt>ramdisk</tt> utility to make one easily.</p></div>
+disk persistence- kvspool comes with a <tt>ramdisk</tt> utility to make one.</p></div>
</div>
<div class="sect2">
<h3 id="_data_attrition">Data attrition</h3>
@@ -919,24 +911,36 @@ that each reader gets it&#8217;s own spool:</p></div>
<div class="content">
<pre><tt>% kvsp-sub -d spool tcp://192.168.1.9:1110</tt></pre>
</div></div>
<div class="paragraph"><p>Obviously, the IP address must be valid on the publisher side. The port is up to you. This
type of publish-subscribe does a "fan-out" (each subscriber gets a copy of the data). If
you use the <tt>-s</tt> switch, on both pub and sub, it changes so each subscriber gets only a
"1/n" share of the data. The latter mode is also preferred for 1-1 network replication.</p></div>
<div class="paragraph"><p>This type of publish-subscribe does a "fan-out". Each subscriber gets a copy of the data.
(It also drops data is no subscriber is connected- it&#8217;s a blast to whoever is listening).</p></div>
<div class="sect3">
<h4 id="_s_mode">-s mode</h4>
<div class="paragraph"><p>If you use the <tt>-s</tt> switch, on both ends (kvsp-pub and kvsp-sub), two things change:
data remains queued in the spool until a subscriber connects (instead of being dropped if
no one is listening). Secondly, if more than one subscriber connects, the data gets
divided among them rather than sent to all of them. Generally the -s mode is preferred
if it fits your use case.</p></div>
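<div class="paragraph"><p>For example, on the two ends (the <tt>-s</tt> and <tt>-d</tt> switches match the
<tt>kvsp-sub</tt> usage string in this commit; <tt>kvsp-pub</tt> is assumed to take the same switches):</p></div>
<div class="literalblock">
<div class="content">
<pre><tt>% kvsp-pub -s -d spool tcp://192.168.1.9:1110
% kvsp-sub -s -d spool tcp://192.168.1.9:1110</tt></pre>
</div></div>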
+</div>
+<div class="sect3">
+<h4 id="_concentration">Concentration</h4>
+<div class="paragraph"><p>If you give multiple addresses to <tt>kvsp-sub</tt>, it connects to all of them and concentrates
+their published output into a single spool.</p></div>
+<div class="literalblock">
+<div class="content">
+<pre><tt>% kvsp-sub -d spool tcp://192.168.1.9:1110 tcp://192.168.1.10:1111</tt></pre>
+</div></div>
<div class="sidebarblock">
<div class="content">
<div class="title">The big picture</div>
<div class="paragraph"><p>Before moving on- let&#8217;s take a deep breath and recap. With kvspool, the writer
-is completely unaware (blissfully ignorant) of whether network replication is
-taking place. The writer just writes to the local spool. We run the <tt>kvsp-pub</tt>
-utility in the background; as data comes into the spool, it transmits it on the
-network.</p></div>
-<div class="paragraph"><p>On the other computer (the receiving side), we run <tt>kvsp-sub</tt> in the background.
+is unaware of whether network replication is taking place. The writer just writes
+to the local spool. We run the <tt>kvsp-pub</tt> utility in the background; as data
+comes into the spool, it transmits it on the network.</p></div>
+<div class="paragraph"><p>On the other computer- the receiving side- we run <tt>kvsp-sub</tt> in the background.
It receives the network transmissions, and writes them to its local spool.</p></div>
-<div class="paragraph"><p>Using <tt>kvsp-pub</tt> and <tt>kvsp-sub</tt>, we completely decouple the writer and reader
-from having to run on the same computer. They maintain a live, continuous
-replication. Whenever data is written to the source spool, it just "shows up"
-in the remote spool.</p></div>
+<div class="paragraph"><p>Use <tt>kvsp-pub</tt> and <tt>kvsp-sub</tt> to maintain a live, continuous replication. As
+data is written to the source spool, it just "shows up" in the remote spool.
+The reader and writer are completely uninvolved in the replication process.</p></div>
</div></div>
<div class="paragraph"><p><span class="image">
<img src="pub-sub.png" alt="Publish and Subscribe" />
@@ -946,12 +950,13 @@ in the remote spool.</p></div>
<td class="icon">
<div class="title">Tip</div>
</td>
<td class="content">Use a daemon supervisor such as the author&#8217;s <a href="http://troydhanson.github.com/pmtr/">pmtr
process monitor</a> to start up these commands at boot up and keep them running in the
background.</td>
<td class="content">A job manager such as the author&#8217;s <a href="http://troydhanson.github.com/pmtr/">pmtr process
monitor</a> can be used to run <tt>kvsp-sub</tt> and <tt>kvsp-pub</tt> in the background, and restart
them when the system reboots.</td>
</tr></table>
</div>
</div>
</div>
<div class="sect2">
<h3 id="_license">License</h3>
<div class="paragraph"><p>See the <a href="LICENSE.txt">LICENSE.txt</a> file. Kvspool is free and open source.</p></div>
@@ -1308,39 +1313,6 @@ has the spool open at the time. It takes the spool directory as its only argumen
</div>
</div>
<div class="sect1">
<h2 id="_roadmap">Roadmap</h2>
<div class="sectionbody">
<div class="paragraph"><p>Kvspool is a young library and has some rough edges and room for improvement.</p></div>
<div class="ulist"><ul>
<li>
<p>
Autoconf detection for Perl, Python, Java should be improved
</p>
</li>
<li>
<p>
Test suite is minimal, although kvspool has extensive production use
</p>
</li>
<li>
<p>
It&#8217;s only been tested with Ubuntu 10.04
</p>
</li>
<li>
<p>
Support multi-writer, multi-reader (see doc/future.txt)
</p>
</li>
<li>
<p>
Replace segmented data files with one memory mapped, circular file
</p>
</li>
</ul></div>
</div>
</div>
<div class="sect1">
<h2 id="_acknowledgments">Acknowledgments</h2>
<div class="sectionbody">
<div class="paragraph"><p>Thanks to Trevor Adams for writing the original Perl and Java bindings and to
@@ -1351,8 +1323,8 @@ Replace segmented data files with one memory mapped, circular file
<div id="footnotes"><hr /></div>
<div id="footer">
<div id="footer-text">
-Version 0.7<br />
-Last updated 2012-04-22 12:21:05 EDT
+Version 0.8<br />
+Last updated 2012-09-27 09:28:59 EDT
</div>
</div>
</body>


@@ -1,7 +1,7 @@
kvspool: a tool for data streams
================================
Troy D. Hanson <tdh@tkhanson.net>
-v0.7, April 2012
+v0.8, September 2012
kv-spool ("key-value" spool)::
a Linux-based C library, with Perl, Python and Java bindings, to stream data
@@ -10,13 +10,14 @@ kv-spool ("key-value" spool)::
kvspool's niche
---------------
-Kvspool falls somewhere between the Unix pipe, a file-backed queue and a message-passing
-library. Its "unit of data" is the *dictionary*. (Or so Python calls it). Perl calls it a
-hash. It's a set of key-value pairs.
+Kvspool is a tiny API to stream dictionaries between programs. The dictionaries have
+textual keys and values. Note that what we're calling a dictionary- what Python calls a
+dictionary- is known as a hash in Perl, and is manifested in the Java API as a HashMap.
+It's a set of key-value pairs.
-To use kvspool, two programs open the same spool (which is just a directory). The writer
-puts dictionaries into the spool. The reader gets dictionaries from the spool, blocking
-when it's caught up. Like this,
+To use kvspool, two programs open the same spool- which is just a directory. The writer
+puts dictionaries into the spool. The reader gets dictionaries from the spool. It blocks
+when it's caught up, waiting for more data. Like this,
image:reader-writer.png[A spool writer and reader]
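As a minimal sketch of that write/read cycle in C (only kv_spoolwriter_new and
kv_set_new appear in this commit; kv_adds, kv_spool_write, kv_spoolreader_new
and kv_spool_read are assumed from the library header and may differ):

 #include "kvspool.h"

 int main() {
   void *set = kv_set_new();                /* one dictionary, reused for write and read */
   void *wsp = kv_spoolwriter_new("spool"); /* writer opens the spool directory */
   kv_adds(set, "temperature", "72");       /* assumed: add a key-value pair to the set */
   kv_spool_write(wsp, set);                /* assumed: append the set to the spool */

   void *rsp = kv_spoolreader_new("spool"); /* assumed: reader opens the same directory */
   kv_spool_read(rsp, set, 1);              /* assumed: final 1 means block when caught up */
   return 0;
 }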
@@ -39,25 +40,15 @@ Here's a sneak peak at a really simple writer and reader:
.Why did I write kvspool?
*******************************************************************************
I wanted a very simple library that only writes to the local file system, so
-applications can link with kvspool and use it without undesirable side effects
-(such as creation of threads, sockets, or incurring blocking operations). I
-wanted fewer, rather than more, features- taking the Unix pipe as a role model.
-I also wanted to cater to the needs of my specific application- a never-ending
-event stream consumed by slower processes that might come and go. I didn't want
-to fill up the disk if the reader was gone. I also didn't want to block the
-writer while waiting for a reader. So kvspool keeps data until the spool is full,
-then deletes the old data to make room for new- regardless of whether its been
-read. This makes sense when individual events are disposable. Obviously, its
-not for finance, life support, and situations where every event is critical.
-I also wanted rewind and replay- to take a snapshot of a running event stream,
-then be able to work with it offline. I wrote kvspool because I wanted just the
-features that I needed, that fit my use cases, without heavy dependencies.
+applications can use kvspool without having to set anything up ahead of time-
+no servers to run, no configuration files. I wanted no "side effects" to happen
+in my programs-- no thread creation, no sockets, nothing going on underneath.
+I wanted fewer rather than more features- taking the Unix pipe as a role model.
*******************************************************************************
.Loose coupling
Because the spooled data goes into the disk, the reader and writer are decoupled. They
-don't have to run at the same time. They can come and go. If the reader exits and
+don't have to run at the same time. They can come and go. Also, if the reader exits and
restarts, it picks up where it left off.
Space management
@@ -76,7 +67,7 @@ fully read. (The data is kept around to reserve that disk space, and to support
Shared memory I/O
~~~~~~~~~~~~~~~~~
You can locate a spool on a RAM disk if you want the speed of shared memory without true
-disk persistence- kvspool comes with a `ramdisk` utility to make one easily.
+disk persistence- kvspool comes with a `ramdisk` utility to make one.
Data attrition
~~~~~~~~~~~~~~
@@ -124,34 +115,45 @@ Now, on the remote computers where you wish to subscribe to the spool, run:
% kvsp-sub -d spool tcp://192.168.1.9:1110
-Obviously, the IP address must be valid on the publisher side. The port is up to you. This
-type of publish-subscribe does a "fan-out" (each subscriber gets a copy of the data). If
-you use the `-s` switch, on both pub and sub, it changes so each subscriber gets only a
-"1/n" share of the data. The latter mode is also preferred for 1-1 network replication.
+This type of publish-subscribe does a "fan-out". Each subscriber gets a copy of the data.
+(It also drops data if no subscriber is connected- it's a blast to whoever is listening).
+-s mode
+^^^^^^^
+If you use the `-s` switch on both ends (kvsp-pub and kvsp-sub), two things change.
+First, data remains queued in the spool until a subscriber connects, instead of being
+dropped when no one is listening. Second, if more than one subscriber connects, the data
+is divided among them rather than sent to all of them. Generally the -s mode is preferred
+if it fits your use case.
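For example, on the two ends (the -s and -d switches match the kvsp-sub usage
string in this commit; kvsp-pub is assumed to take the same switches):

 % kvsp-pub -s -d spool tcp://192.168.1.9:1110
 % kvsp-sub -s -d spool tcp://192.168.1.9:1110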
+Concentration
+^^^^^^^^^^^^^
+If you give multiple addresses to `kvsp-sub`, it connects to all of them and concentrates
+their published output into a single spool.
+% kvsp-sub -d spool tcp://192.168.1.9:1110 tcp://192.168.1.10:1111
.The big picture
*******************************************************************************
Before moving on- let's take a deep breath and recap. With kvspool, the writer
-is completely unaware (blissfully ignorant) of whether network replication is
-taking place. The writer just writes to the local spool. We run the `kvsp-pub`
-utility in the background; as data comes into the spool, it transmits it on the
-network.
+is unaware of whether network replication is taking place. The writer just writes
+to the local spool. We run the `kvsp-pub` utility in the background; as data
+comes into the spool, it transmits it on the network.
-On the other computer (the receiving side), we run `kvsp-sub` in the background.
+On the other computer- the receiving side- we run `kvsp-sub` in the background.
It receives the network transmissions, and writes them to its local spool.
-Using `kvsp-pub` and `kvsp-sub`, we completely decouple the writer and reader
-from having to run on the same computer. They maintain a live, continuous
-replication. Whenever data is written to the source spool, it just "shows up"
-in the remote spool.
+Use `kvsp-pub` and `kvsp-sub` to maintain a live, continuous replication. As
+data is written to the source spool, it just "shows up" in the remote spool.
+The reader and writer are completely uninvolved in the replication process.
*******************************************************************************
image:pub-sub.png[Publish and Subscribe]
[TIP]
-Use a daemon supervisor such as the author's http://troydhanson.github.com/pmtr/[pmtr
-process monitor] to start up these commands at boot up and keep them running in the
-background.
+A job manager such as the author's http://troydhanson.github.com/pmtr/[pmtr process
+monitor] can be used to run `kvsp-sub` and `kvsp-pub` in the background, and restart
+them when the system reboots.
License
~~~~~~~
@@ -426,16 +428,6 @@ has the spool open at the time. It takes the spool directory as its only argumen
sp_reset(dir);
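/* per the doc text above: resets the spool's read position, and works even
   while a reader has the spool open at the time */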
-Roadmap
--------
-Kvspool is a young library and has some rough edges and room for improvement.
-* Autoconf detection for Perl, Python, Java should be improved
-* Test suite is minimal, although kvspool has extensive production use
-* It's only been tested with Ubuntu 10.04
-* Support multi-writer, multi-reader (see doc/future.txt)
-* Replace segmented data files with one memory mapped, circular file
Acknowledgments
---------------
Thanks to Trevor Adams for writing the original Perl and Java bindings and to


@@ -15,14 +15,13 @@ void *sp;
int verbose;
int pull_mode;
char *dir;
-char *pub;
void *context;
void *socket;
void usage(char *exe) {
fprintf(stderr,"usage: %s [-v] [-s] -d <dir> <pub>\n", exe);
fprintf(stderr,"usage: %s [-v] [-s] -d <dir> <pub> [<pub> ...]\n", exe);
fprintf(stderr," -s runs in push-pull mode instead of lossy pub-sub\n");
exit(-1);
}
@@ -64,7 +63,7 @@ int json_to_frame(void *sp, void *set, void *msg_data, size_t msg_len) {
int main(int argc, char *argv[]) {
zmq_rcvmore_t more; size_t more_sz = sizeof(more);
char *exe = argv[0], *filter = "";
char *exe = argv[0], *filter = "", *pub;
int part_num,opt,rc=-1;
void *msg_data, *sp, *set=NULL;
size_t msg_len;
@@ -78,16 +77,20 @@ int main(int argc, char *argv[]) {
default: usage(exe); break;
}
}
-if (optind < argc) pub=argv[optind++];
if (!dir) usage(exe);
-if (!pub) usage(exe);
+if (optind >= argc) usage(exe);
sp = kv_spoolwriter_new(dir);
if (!sp) usage(exe);
set = kv_set_new();
+/* connect socket to each publisher. yes, zeromq lets you connect n times */
if ( !(context = zmq_init(1))) goto done;
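/* -s (pull_mode) uses a push-pull socket pair: data queues at the publisher until
   a puller connects, and is divided among pullers, instead of lossy pub-sub fan-out */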
if ( !(socket = zmq_socket(context, pull_mode?ZMQ_PULL:ZMQ_SUB))) goto done;
-if (zmq_connect(socket, pub)) goto done;
+while (optind < argc) {
+pub = argv[optind++];
+if (zmq_connect(socket, pub)) goto done;
+}
if (!pull_mode) {
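/* an empty filter string subscribes to every message the publishers send */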
if (zmq_setsockopt(socket, ZMQ_SUBSCRIBE, filter, strlen(filter))) goto done;
}