mirror of https://github.com/JHUAPL/kvspool.git

Doc edit and multi-arg to sub

doc/kvspool.html | 120
@@ -772,8 +772,8 @@ asciidoc.install(2);
<h1>kvspool: a tool for data streams</h1>
<span id="author">Troy D. Hanson</span><br />
<span id="email"><tt><<a href="mailto:tdh@tkhanson.net">tdh@tkhanson.net</a>></tt></span><br />
<span id="revnumber">version 0.7,</span>
<span id="revdate">April 2012</span>
<span id="revnumber">version 0.8,</span>
<span id="revdate">September 2012</span>
<div id="toc">
<div id="toctitle">Table of Contents</div>
<noscript><p><b>JavaScript must be enabled in your browser to display the table of contents.</b></p></noscript>
@@ -799,12 +799,13 @@ kv-spool ("key-value" spool)
<div class="sect1">
<h2 id="_kvspool_8217_s_niche">kvspool’s niche</h2>
<div class="sectionbody">
<div class="paragraph"><p>Kvspool falls somewhere between the Unix pipe, a file-backed queue and a message-passing
library. Its "unit of data" is the <strong>dictionary</strong>. (Or so Python calls it). Perl calls it a
hash. It’s a set of key-value pairs.</p></div>
<div class="paragraph"><p>To use kvspool, two programs open the same spool (which is just a directory). The writer
puts dictionaries into the spool. The reader gets dictionaries from the spool, blocking
when it’s caught up. Like this,</p></div>
<div class="paragraph"><p>Kvspool is a tiny API to stream dictionaries between programs. The dictionaries have
textual keys and values. Note that what we’re calling a dictionary- what Python calls a
dictionary- is known as a hash in Perl, and is manifested in the Java API as a HashMap.
It’s a set of key-value pairs.</p></div>
<div class="paragraph"><p>To use kvspool, two programs open the same spool- which is just a directory. The writer
puts dictionaries into the spool. The reader gets dictionaries from the spool. It blocks
when it’s caught up, waiting for more data. Like this,</p></div>
<div class="paragraph"><p><span class="image">
<img src="reader-writer.png" alt="A spool writer and reader" />
</span></p></div>
@@ -832,22 +833,13 @@ http://www.gnu.org/software/src-highlite -->
<div class="content">
<div class="title">Why did I write kvspool?</div>
<div class="paragraph"><p>I wanted a very simple library that only writes to the local file system, so
applications can link with kvspool and use it without undesirable side effects
(such as creation of threads, sockets, or incurring blocking operations). I
wanted fewer, rather than more, features- taking the Unix pipe as a role model.</p></div>
<div class="paragraph"><p>I also wanted to cater to the needs of my specific application- a never-ending
event stream consumed by slower processes that might come and go. I didn’t want
to fill up the disk if the reader was gone. I also didn’t want to block the
writer while waiting for a reader. So kvspool keeps data until the spool is full,
then deletes the old data to make room for new- regardless of whether its been
read. This makes sense when individual events are disposable. Obviously, its
not for finance, life support, and situations where every event is critical.
I also wanted rewind and replay- to take a snapshot of a running event stream,
then be able to work with it offline. I wrote kvspool because I wanted just the
features that I needed, that fit my use cases, without heavy dependencies.</p></div>
applications can use kvspool without having to set anything up ahead of time-
no servers to run, no configuration files. I wanted no "side effects" to happen
in my programs-- no thread creation, no sockets, nothing going on underneath.
I wanted fewer rather than more features- taking the Unix pipe as a role model.</p></div>
</div></div>
<div class="paragraph"><div class="title">Loose coupling</div><p>Because the spooled data goes into the disk, the reader and writer are decoupled. They
don’t have to run at the same time. They can come and go. If the reader exits and
don’t have to run at the same time. They can come and go. Also, if the reader exits and
restarts, it picks up where it left off.</p></div>
<div class="sect2">
<h3 id="_space_management">Space management</h3>
@@ -866,7 +858,7 @@ fully read. (The data is kept around to reserve that disk space, and to support
<div class="sect2">
<h3 id="_shared_memory_i_o">Shared memory I/O</h3>
<div class="paragraph"><p>You can locate a spool on a RAM disk if you want the speed of shared memory without true
disk persistence- kvspool comes with a <tt>ramdisk</tt> utility to make one easily.</p></div>
disk persistence- kvspool comes with a <tt>ramdisk</tt> utility to make one.</p></div>
</div>
<div class="sect2">
<h3 id="_data_attrition">Data attrition</h3>
@@ -919,24 +911,36 @@ that each reader gets it’s own spool:</p></div>
<div class="content">
<pre><tt>% kvsp-sub -d spool tcp://192.168.1.9:1110</tt></pre>
</div></div>
<div class="paragraph"><p>Obviously, the IP address must be valid on the publisher side. The port is up to you. This
type of publish-subscribe does a "fan-out" (each subscriber gets a copy of the data). If
you use the <tt>-s</tt> switch, on both pub and sub, it changes so each subscriber gets only a
"1/n" share of the data. The latter mode is also preferred for 1-1 network replication.</p></div>
<div class="paragraph"><p>This type of publish-subscribe does a "fan-out". Each subscriber gets a copy of the data.
(It also drops data is no subscriber is connected- it’s a blast to whoever is listening).</p></div>
<div class="sect3">
<h4 id="_s_mode">-s mode</h4>
<div class="paragraph"><p>If you use the <tt>-s</tt> switch, on both ends (kvsp-pub and kvsp-sub), two things change:
data remains queued in the spool until a subscriber connects (instead of being dropped if
no one is listening). Secondly, if more than one subscriber connects, the data gets
divided among them rather than sent to all of them. Generally the -s mode is preferred
if it fits your use case.</p></div>
</div>
<div class="sect3">
<h4 id="_concentration">Concentration</h4>
<div class="paragraph"><p>If you give multiple addresses to <tt>kvsp-sub</tt>, it connects to all of them and concentrates
their published output into a single spool.</p></div>
<div class="literalblock">
<div class="content">
<pre><tt>% kvsp-sub -d spool tcp://192.168.1.9:1110 tcp://192.168.1.10:1111</tt></pre>
</div></div>
<div class="sidebarblock">
<div class="content">
<div class="title">The big picture</div>
<div class="paragraph"><p>Before moving on- let’s take a deep breath and recap. With kvspool, the writer
is completely unaware (blissfully ignorant) of whether network replication is
taking place. The writer just writes to the local spool. We run the <tt>kvsp-pub</tt>
utility in the background; as data comes into the spool, it transmits it on the
network.</p></div>
<div class="paragraph"><p>On the other computer (the receiving side), we run <tt>kvsp-sub</tt> in the background.
is unaware of whether network replication is taking place. The writer just writes
to the local spool. We run the <tt>kvsp-pub</tt> utility in the background; as data
comes into the spool, it transmits it on the network.</p></div>
<div class="paragraph"><p>On the other computer- the receiving side- we run <tt>kvsp-sub</tt> in the background.
It receives the network transmissions, and writes them to its local spool.</p></div>
<div class="paragraph"><p>Using <tt>kvsp-pub</tt> and <tt>kvsp-sub</tt>, we completely decouple the writer and reader
from having to run on the same computer. They maintain a live, continuous
replication. Whenever data is written to the source spool, it just "shows up"
in the remote spool.</p></div>
<div class="paragraph"><p>Use <tt>kvsp-pub</tt> and <tt>kvsp-sub</tt> to maintain a live, continuous replication. As
data is written to the source spool, it just "shows up" in the remote spool.
The reader and writer are completely uninvolved in the replication process.</p></div>
</div></div>
<div class="paragraph"><p><span class="image">
<img src="pub-sub.png" alt="Publish and Subscribe" />
@@ -946,12 +950,13 @@ in the remote spool.</p></div>
<td class="icon">
<div class="title">Tip</div>
</td>
<td class="content">Use a daemon supervisor such as the author’s <a href="http://troydhanson.github.com/pmtr/">pmtr
process monitor</a> to start up these commands at boot up and keep them running in the
background.</td>
<td class="content">A job manager such as the author’s <a href="http://troydhanson.github.com/pmtr/">pmtr process
monitor</a> can be used to run <tt>kvsp-sub</tt> and <tt>kvsp-pub</tt> in the background, and restart
them when the system reboots.</td>
</tr></table>
</div>
</div>
</div>
<div class="sect2">
<h3 id="_license">License</h3>
<div class="paragraph"><p>See the <a href="LICENSE.txt">LICENSE.txt</a> file. Kvspool is free and open source.</p></div>
@@ -1308,39 +1313,6 @@ has the spool open at the time. It takes the spool directory as its only argumen
</div>
</div>
<div class="sect1">
<h2 id="_roadmap">Roadmap</h2>
<div class="sectionbody">
<div class="paragraph"><p>Kvspool is a young library and has some rough edges and room for improvement.</p></div>
<div class="ulist"><ul>
<li>
<p>
Autoconf detection for Perl, Python, Java should be improved
</p>
</li>
<li>
<p>
Test suite is minimal, although kvspool has extensive production use
</p>
</li>
<li>
<p>
It’s only been tested with Ubuntu 10.04
</p>
</li>
<li>
<p>
Support multi-writer, multi-reader (see doc/future.txt)
</p>
</li>
<li>
<p>
Replace segmented data files with one memory mapped, circular file
</p>
</li>
</ul></div>
</div>
</div>
<div class="sect1">
<h2 id="_acknowledgments">Acknowledgments</h2>
<div class="sectionbody">
<div class="paragraph"><p>Thanks to Trevor Adams for writing the original Perl and Java bindings and to
@@ -1351,8 +1323,8 @@ Replace segmented data files with one memory mapped, circular file
<div id="footnotes"><hr /></div>
<div id="footer">
<div id="footer-text">
Version 0.7<br />
Last updated 2012-04-22 12:21:05 EDT
Version 0.8<br />
Last updated 2012-09-27 09:28:59 EDT
</div>
</div>
</body>

@@ -1,7 +1,7 @@
kvspool: a tool for data streams
================================
Troy D. Hanson <tdh@tkhanson.net>
v0.7, April 2012
v0.8, September 2012

kv-spool ("key-value" spool)::
a Linux-based C library, with Perl, Python and Java bindings, to stream data
@@ -10,13 +10,14 @@ kv-spool ("key-value" spool)::

kvspool's niche
---------------
Kvspool falls somewhere between the Unix pipe, a file-backed queue and a message-passing
library. Its "unit of data" is the *dictionary*. (Or so Python calls it). Perl calls it a
hash. It's a set of key-value pairs.
Kvspool is a tiny API to stream dictionaries between programs. The dictionaries have
textual keys and values. Note that what we're calling a dictionary- what Python calls a
dictionary- is known as a hash in Perl, and is manifested in the Java API as a HashMap.
It's a set of key-value pairs.

To use kvspool, two programs open the same spool (which is just a directory). The writer
puts dictionaries into the spool. The reader gets dictionaries from the spool, blocking
when it's caught up. Like this,
To use kvspool, two programs open the same spool- which is just a directory. The writer
puts dictionaries into the spool. The reader gets dictionaries from the spool. It blocks
when it's caught up, waiting for more data. Like this,

image:reader-writer.png[A spool writer and reader]
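
(The next hunk's context line refers to the document's own "sneak peak at a really simple writer and reader", which is not visible in this diff. For orientation only, a minimal writer and reader might look roughly like the sketch below. Only kv_spoolwriter_new() and kv_set_new() actually appear in this commit; kv_adds(), kv_spool_write(), kv_spoolreader_new() and kv_spool_read() are assumed names used here for illustration.)

    /* illustrative writer: open a spool directory and append one dictionary.
     * kv_adds() and kv_spool_write() are assumed names, not taken from this diff. */
    #include "kvspool.h"

    int main() {
      void *sp  = kv_spoolwriter_new("spool");  /* a spool is just a directory */
      void *set = kv_set_new();                 /* one dictionary (key-value set) */
      kv_adds(set, "day", "Wednesday");         /* keys and values are text */
      kv_adds(set, "temperature", "58");
      kv_spool_write(sp, set);                  /* hand the dictionary to the spool */
      return 0;
    }

    /* illustrative reader: kv_spoolreader_new() and kv_spool_read() are likewise
     * assumed; the final argument is taken to mean "block until data arrives". */
    void *rd = kv_spoolreader_new("spool");
    while (kv_spool_read(rd, set, 1) > 0) {
      /* set now holds the next dictionary from the spool */
    }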

@@ -39,25 +40,15 @@ Here's a sneak peak at a really simple writer and reader:
.Why did I write kvspool?
*******************************************************************************
I wanted a very simple library that only writes to the local file system, so
applications can link with kvspool and use it without undesirable side effects
(such as creation of threads, sockets, or incurring blocking operations). I
wanted fewer, rather than more, features- taking the Unix pipe as a role model.

I also wanted to cater to the needs of my specific application- a never-ending
event stream consumed by slower processes that might come and go. I didn't want
to fill up the disk if the reader was gone. I also didn't want to block the
writer while waiting for a reader. So kvspool keeps data until the spool is full,
then deletes the old data to make room for new- regardless of whether its been
read. This makes sense when individual events are disposable. Obviously, its
not for finance, life support, and situations where every event is critical.
I also wanted rewind and replay- to take a snapshot of a running event stream,
then be able to work with it offline. I wrote kvspool because I wanted just the
features that I needed, that fit my use cases, without heavy dependencies.
applications can use kvspool without having to set anything up ahead of time-
no servers to run, no configuration files. I wanted no "side effects" to happen
in my programs-- no thread creation, no sockets, nothing going on underneath.
I wanted fewer rather than more features- taking the Unix pipe as a role model.
*******************************************************************************

.Loose coupling
Because the spooled data goes into the disk, the reader and writer are decoupled. They
don't have to run at the same time. They can come and go. If the reader exits and
don't have to run at the same time. They can come and go. Also, if the reader exits and
restarts, it picks up where it left off.

Space management
@@ -76,7 +67,7 @@ fully read. (The data is kept around to reserve that disk space, and to support
Shared memory I/O
~~~~~~~~~~~~~~~~~
You can locate a spool on a RAM disk if you want the speed of shared memory without true
disk persistence- kvspool comes with a `ramdisk` utility to make one easily.
disk persistence- kvspool comes with a `ramdisk` utility to make one.

Data attrition
~~~~~~~~~~~~~~
@@ -124,34 +115,45 @@ Now, on the remote computers where you wish to subscribe to the spool, run:

% kvsp-sub -d spool tcp://192.168.1.9:1110

Obviously, the IP address must be valid on the publisher side. The port is up to you. This
type of publish-subscribe does a "fan-out" (each subscriber gets a copy of the data). If
you use the `-s` switch, on both pub and sub, it changes so each subscriber gets only a
"1/n" share of the data. The latter mode is also preferred for 1-1 network replication.
This type of publish-subscribe does a "fan-out". Each subscriber gets a copy of the data.
(It also drops data is no subscriber is connected- it's a blast to whoever is listening).

-s mode
^^^^^^^
If you use the `-s` switch, on both ends (kvsp-pub and kvsp-sub), two things change:
data remains queued in the spool until a subscriber connects (instead of being dropped if
no one is listening). Secondly, if more than one subscriber connects, the data gets
divided among them rather than sent to all of them. Generally the -s mode is preferred
if it fits your use case.
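
(As an aside: the queue-until-connected and 1/n-division behavior described above matches ZeroMQ push-pull semantics; the kvsp-sub code later in this commit evidently selects ZMQ_PULL when -s is given and ZMQ_SUB otherwise. The following is a condensed, illustrative sketch of that subscriber-side choice, not the kvsp-sub code itself; the helper name open_sub_socket() is invented for this example, and the endpoint address is reused from the example above.)

    #include <zmq.h>

    /* sketch only: mirrors the socket-type choice in the kvsp-sub hunks below.
     * ZMQ_PULL peers each receive a 1/n share and data queues until one connects;
     * ZMQ_SUB peers each receive a full copy, and nothing is kept for absentees. */
    void *open_sub_socket(void *ctx, int pull_mode, const char *addr) {
      void *sock = zmq_socket(ctx, pull_mode ? ZMQ_PULL : ZMQ_SUB);
      if (!sock || zmq_connect(sock, addr)) return NULL;   /* e.g. tcp://192.168.1.9:1110 */
      if (!pull_mode)                      /* SUB sockets also need a subscription */
        zmq_setsockopt(sock, ZMQ_SUBSCRIBE, "", 0);
      return sock;
    }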

Concentration
^^^^^^^^^^^^^
If you give multiple addresses to `kvsp-sub`, it connects to all of them and concentrates
their published output into a single spool.

% kvsp-sub -d spool tcp://192.168.1.9:1110 tcp://192.168.1.10:1111

.The big picture
*******************************************************************************
Before moving on- let's take a deep breath and recap. With kvspool, the writer
is completely unaware (blissfully ignorant) of whether network replication is
taking place. The writer just writes to the local spool. We run the `kvsp-pub`
utility in the background; as data comes into the spool, it transmits it on the
network.
is unaware of whether network replication is taking place. The writer just writes
to the local spool. We run the `kvsp-pub` utility in the background; as data
comes into the spool, it transmits it on the network.

On the other computer (the receiving side), we run `kvsp-sub` in the background.
On the other computer- the receiving side- we run `kvsp-sub` in the background.
It receives the network transmissions, and writes them to its local spool.

Using `kvsp-pub` and `kvsp-sub`, we completely decouple the writer and reader
from having to run on the same computer. They maintain a live, continuous
replication. Whenever data is written to the source spool, it just "shows up"
in the remote spool.
Use `kvsp-pub` and `kvsp-sub` to maintain a live, continuous replication. As
data is written to the source spool, it just "shows up" in the remote spool.
The reader and writer are completely uninvolved in the replication process.
*******************************************************************************

image:pub-sub.png[Publish and Subscribe]

[TIP]
Use a daemon supervisor such as the author's http://troydhanson.github.com/pmtr/[pmtr
process monitor] to start up these commands at boot up and keep them running in the
background.
A job manager such as the author's http://troydhanson.github.com/pmtr/[pmtr process
monitor] can be used to run `kvsp-sub` and `kvsp-pub` in the background, and restart
them when the system reboots.

License
~~~~~~~
@@ -426,16 +428,6 @@ has the spool open at the time. It takes the spool directory as its only argumen

sp_reset(dir);

Roadmap
-------
Kvspool is a young library and has some rough edges and room for improvement.

* Autoconf detection for Perl, Python, Java should be improved
* Test suite is minimal, although kvspool has extensive production use
* It's only been tested with Ubuntu 10.04
* Support multi-writer, multi-reader (see doc/future.txt)
* Replace segmented data files with one memory mapped, circular file

Acknowledgments
---------------
Thanks to Trevor Adams for writing the original Perl and Java bindings and to

@@ -15,14 +15,13 @@ void *sp;
int verbose;
int pull_mode;
char *dir;
char *pub;

void *context;
void *socket;


void usage(char *exe) {
  fprintf(stderr,"usage: %s [-v] [-s] -d <dir> <pub>\n", exe);
  fprintf(stderr,"usage: %s [-v] [-s] -d <dir> <pub> [<pub> ...]\n", exe);
  fprintf(stderr," -s runs in push-pull mode instead of lossy pub-sub\n");
  exit(-1);
}
@@ -64,7 +63,7 @@ int main(int argc, char *argv[]) {

  zmq_rcvmore_t more; size_t more_sz = sizeof(more);
  char *exe = argv[0], *filter = "";
  char *exe = argv[0], *filter = "", *pub;
  int part_num,opt,rc=-1;
  void *msg_data, *sp, *set=NULL;
  size_t msg_len;
@@ -78,16 +77,20 @@ int main(int argc, char *argv[]) {
      default: usage(exe); break;
    }
  }
  if (optind < argc) pub=argv[optind++];
  if (!dir) usage(exe);
  if (!pub) usage(exe);
  if (optind >= argc) usage(exe);

  sp = kv_spoolwriter_new(dir);
  if (!sp) usage(exe);
  set = kv_set_new();

  /* connect socket to each publisher. yes, zeromq lets you connect n times */
  if ( !(context = zmq_init(1))) goto done;
  if ( !(socket = zmq_socket(context, pull_mode?ZMQ_PULL:ZMQ_SUB))) goto done;
  if (zmq_connect(socket, pub)) goto done;
  while (optind < argc) {
    pub = argv[optind++];
    if (zmq_connect(socket, pub)) goto done;
  }
  if (!pull_mode) {
    if (zmq_setsockopt(socket, ZMQ_SUBSCRIBE, filter, strlen(filter))) goto done;
  }
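
(The hunk above is the "multi-arg to sub" part of this commit: every address remaining on the command line after option parsing is connected on the one receiving socket. Stripped of the kvspool and option-handling details, the pattern is roughly the following; the loop-from-argv[1] framing is mine, not the kvsp-sub code.)

    /* condensed sketch of the multi-publisher connect pattern used above: a single
     * ZeroMQ SUB (or PULL) socket may be connected to any number of endpoints, and
     * messages from all of them are received on that one socket. */
    #include <zmq.h>

    int main(int argc, char *argv[]) {
      void *ctx  = zmq_init(1);
      void *sock = zmq_socket(ctx, ZMQ_SUB);
      int i;
      for (i = 1; i < argc; i++)                    /* each argument is a publisher address */
        if (zmq_connect(sock, argv[i])) return -1;  /* e.g. tcp://192.168.1.9:1110 */
      zmq_setsockopt(sock, ZMQ_SUBSCRIBE, "", 0);   /* subscribe to everything */
      /* ... kvsp-sub's receive loop then writes each incoming message to its spool ... */
      return 0;
    }

(Invoked as in the Concentration example earlier, kvsp-sub -d spool tcp://192.168.1.9:1110 tcp://192.168.1.10:1111, each additional address simply becomes one more zmq_connect() on the same socket.)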