Initial import from Subversion sources.

This commit is contained in:
Robert J. Hansen
2013-03-02 11:50:00 -05:00
parent 3c2f198d16
commit 0b6f9655a8
21 changed files with 3720 additions and 1 deletions

5
AUTHORS Normal file

@@ -0,0 +1,5 @@
Robert J. Hansen <rjh@secret-alchemy.com>
* nsrlsvr

80
CHANGELOG Normal file

@@ -0,0 +1,80 @@
1.1.1: February 22, 2013
* Code cleanups. Now has better support for using custom datasets.
Less dependence on scripts for building.
1.1: May 2, 2012
* Supports version 2 of the wire protocol, which introduces new
commands: STATUS (gives server status), BYE (what it says),
UPSHIFT (attempt to negotiate to a more recent protocol) and
DOWNSHIFT (negotiate to a lower protocol). Version 2 also
supports multiple QUERY commands in a single connection, which
helps a lot when fighting off port exhaustion.
* Switched from blocking I/O to poll()-based I/O. This helps
deal with the out-of-control system loads that some users
were seeing.
* Uses RDS 2.36
1.0.6: January 20, 2012
* Discovered that Win32 I/O redirection didn't work at all.
Whoops. This got fixed.
1.0.5: January 17, 2012
* 1.0.4 added a bad regex that didn't match as much as it
should have. This had the effect of truncating SHA-1 hashes
to 128 bits. Whoops.
* Now compiles on FreeBSD 9.0.
* nsrlparse became nsrlparse.py
* nsrllookup became nsrllookup.py
* Fixed documentation to reflect these name changes
1.0.4: January 3, 2012
* Added a preflight script to help in development. This has no
effect on end-users.
* Removed a bit of debugging output that was accidentally left
in.
* Moved the 'populate' script to 'nsrlparse' and added it to
the list of installed files
* MD5 is now fully supported, as is interoperability with
md5deep.
1.0.3: January 3, 2012
* Fixed an interoperability bug with sha1deep.
1.0.2: December 30, 2011
* Ubuntu 11.10 complains about handler.cc, on account of how
there are some write() calls that aren't checked for returning
a -1. Virtually all of those were superfluous warnings: one
could possibly have created an intermittent error sooner or
later. They have all been patched, and it now compiles cleanly
on Ubuntu 11.10.
1.0.1: December 30, 2011
* nsrllookup had a bug that would become manifest while querying
millions of records. Now nsrllookup breaks it up into blocks
of 4096 queries (a maximum of 164k of data per connection).
This will hopefully improve performance for those times when
you want to push millions of queries to the server.
1.0: December 30, 2011
* First ready-for-the-users release. The only new feature over
the release candidate series is a much improved installation
procedure.
* It should be possible to make RPMs, Debian packages, or
what-have-you, since the install process is now bog-standard
GNU ./configure && make && make install
1.0rcX: December, 2011
* Ready for limited beta testing. The only change visible to
end users was introducing support for OS X 10.6.
* A bug that prevented reliable functioning on Fedora and OpenSUSE
was found and crushed.
* The internals were ported from a very C-like C++ subset to a
much more idiomatic C++ style. This reduced our dependency on GNU
getline(), which had been the major obstacle to OS X 10.6
support.
0.9: December, 2011
* Successfully tested nsrlsvr with the full NIST NSRL RDS on a 4GB
Apple iMac. It made the machine completely unusable as a desktop,
but the server was able to service requests successfully.
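
The batching described in the 1.0.1 entry can be illustrated with a short Python sketch. This is not nsrllookup's code: the helper name `batch_queries` is hypothetical, and the sample data is made up. It only shows a long list of hashes being split into blocks of at most 4096, and where the 164k figure plausibly comes from, assuming 40-character SHA-1 hex digests (4096 x 40 = 163,840 characters, before any separators or protocol overhead).

```python
def batch_queries(hashes, block_size=4096):
    """Yield successive blocks of at most block_size hashes."""
    for start in range(0, len(hashes), block_size):
        yield hashes[start:start + block_size]


# Hypothetical input: 10,000 placeholder SHA-1 hex digests (40 characters each).
sample = ["%040x" % n for n in range(10000)]
blocks = list(batch_queries(sample))
block_lengths = [len(block) for block in blocks]
# Size in characters of the largest block's hash data alone.
largest_payload = max(sum(len(h) for h in block) for block in blocks)
```

Each block would then be sent over its own connection, so no single connection carries more than one block's worth of hashes.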

56
INSTALL Normal file

@@ -0,0 +1,56 @@
Installation instructions:
1. Decide what data set you want nsrlsvr to query against by
default. Your options are:
a. NIST's NSRL RDS (http://www.nsrl.nist.gov/).
b. A dataset that you provide at compile-time. For instance,
if you have a proprietary set of SHA-1 hashes of known
malware and know you'll only ever want to use that, this is
the way to go.
You may also tell nsrlsvr to use a different file by passing the
"-f" flag when launching the server. This file must contain nothing
but MD5, SHA-1 or SHA-256 hashes, one per line in hexadecimal format,
with no other content on a line. This option is mostly for developer
testing: most users will never touch it.
2. If you're compiling it using your own dataset, your dataset must
be in a format nsrlsvr understands. One good way to do this is with
Jesse Kornblum's md5deep tool:
$ md5deep -c [FILES] > my_dataset.txt
3. Run the ./configure script in one of the following ways:
a. No options: if the current RDS zipfile exists in the build
directory, use that; otherwise, try to download it.
b. --with-custom=my_set.txt: use your own dataset
c. --with-nsrl=filename: use an already-downloaded NSRL RDS
zip file (one that lives, e.g., outside the build dir).
You will want to use this option if a newer NSRL RDS has
been released than the one nsrlsvr knows about.
4. Once you've completed the "make && make install" dance, one
executable will be installed to $PREFIX/bin: nsrlsvr, the server
application, which runs as a UNIX daemon.
5. As an example of how it can be used:
$ md5deep -c /path/to/evil/files > evil_dataset.txt
$ ./configure --with-custom=evil_dataset.txt
$ make
$ sudo make install
You've now created a custom dataset that contains MD5 hashes
of files you've declared to be evil.
$ nsrlsvr -t 1800
You've started the server and instructed it to automatically
shut down after a half-hour of inactivity.
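
Since a file passed with "-f" must contain nothing but hex-encoded MD5, SHA-1 or SHA-256 hashes, one per line, it can be worth checking a dataset before launching the server. The following Python sketch is one way to do that. It is not part of nsrlsvr: the names `HASH_LINE` and `bad_lines` are hypothetical, and it does not check whether every line uses the same algorithm.

```python
import re

# Accept exactly 32, 40 or 64 hex digits: MD5, SHA-1 or SHA-256.
HASH_LINE = re.compile(r"(?:[0-9A-Fa-f]{32}|[0-9A-Fa-f]{40}|[0-9A-Fa-f]{64})\Z")


def bad_lines(lines):
    """Return (line_number, text) pairs for lines that are not a bare hash."""
    problems = []
    for number, line in enumerate(lines, 1):
        text = line.rstrip("\r\n")
        if not HASH_LINE.match(text):
            problems.append((number, text))
    return problems


sample = [
    "d41d8cd98f00b204e9800998ecf8427e\n",             # MD5 of empty input
    "da39a3ee5e6b4b0d3255bfef95601890afd80709\n",     # SHA-1 of empty input
    "not a hash\n",
    "d41d8cd98f00b204e9800998ecf8427e  empty.txt\n",  # hash plus a file name
]
problems = bad_lines(sample)
```

Here the third and fourth sample lines are reported, because each carries content other than a single hash.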

19
INSTALL.GIT Normal file

@@ -0,0 +1,19 @@
If you're reading this, then you're using a Subversion snapshot of
nsrlsvr. Please check your configure.in script to ensure the
version has "svn" after it. If it doesn't, please holler at me
that I've got a broken version string. :)
Building from Subversion sources is not recommended. At any given
moment the tree may be broken. That said, if you want to live on
the edge, go for it.
1. Do an 'svn up'. Don't assume that just because you checked the
code out yesterday that it's still the same today. Seriously,
svn up.
2. 'sh ./bootstrap.sh'. The Subversion tree does not include a
configure script. If you have a configure script in your
directory, then it is something you created and it may no longer
be in sync with changes to the tree. Running the bootstrap
script will create a new configure script for you.
3. Once you've recreated the configure script, build it just as
you would a released version.

13
LICENSE Normal file

@@ -0,0 +1,13 @@
Copyright (c) 2011-2013, Robert J. Hansen <rjh@secret-alchemy.com>
Permission to use, copy, modify, and/or distribute this software for any
purpose with or without fee is hereby granted, provided that the above
copyright notice and this permission notice appear in all copies.
THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.

3
Makefile.am Normal file

@@ -0,0 +1,3 @@
EXTRA_DIST=LICENSE README CHANGELOG AUTHORS INSTALL THANKS convert-format.py denistify.py
SUBDIRS=src man
ACLOCAL_AMFLAGS=-I m4


@@ -1,2 +1,11 @@
nsrlsvr
=======
nsrlsvr is a tool to facilitate looking up data in the National Software
Reference Library (NSRL). It's in a 1.1 state, which means it's unlikely
to break in two if you look at it the wrong way but still may not be as
stable as you'd like.
Installation instructions are found in the INSTALL file. Please read them.
Due to the size of the NSRL's reference data set (RDS), installing nsrlsvr
is a little bit more involved than one would like. It isn't hard: it's just
not quite a configure, make, make install dance.

14
THANKS Normal file

@@ -0,0 +1,14 @@
* RedJack Security <http://www.redjack.com>
- RedJack's been kind enough to let me hack on nsrlquery during
business hours during lulls in work. Thanks, guys. It's immensely
appreciated.
* Jesse Kornblum <jessekornblum@gmail.com>
- Proposed the original "you know, there ought to be a way..." that
led to nsrlquery
- helped make nsrlquery work on OS X 10.6
- noticed more bugs than can quickly be listed here :)
* Mark Kealiher <mkealiher@gmail.com>
- Early adopters get to bleed on the cutting edge, and he shed more
than his due. Thanks, Mark. Hopefully it works better now. :)

5
bootstrap.sh Executable file

@@ -0,0 +1,5 @@
#!/bin/sh
aclocal -I m4
automake --foreign --add-missing
autoheader
autoconf

163
config.h.in Normal file

@@ -0,0 +1,163 @@
/* config.h.in. Generated from configure.ac by autoheader. */
/* Define to 1 if you have the <arpa/inet.h> header file. */
#undef HAVE_ARPA_INET_H
/* Define to 1 if you have the `fork' function. */
#undef HAVE_FORK
/* Define to 1 if you have the `inet_ntoa' function. */
#undef HAVE_INET_NTOA
/* Define to 1 if the system has the type `intmax_t'. */
#undef HAVE_INTMAX_T
/* Define to 1 if you have the <inttypes.h> header file. */
#undef HAVE_INTTYPES_H
/* Define to 1 if you have the <limits.h> header file. */
#undef HAVE_LIMITS_H
/* Define to 1 if the system has the type `long long int'. */
#undef HAVE_LONG_LONG_INT
/* Define to 1 if you have the <memory.h> header file. */
#undef HAVE_MEMORY_H
/* Define to 1 if you have the `memset' function. */
#undef HAVE_MEMSET
/* Define to 1 if you have the <netinet/in.h> header file. */
#undef HAVE_NETINET_IN_H
/* Define if you have POSIX threads libraries and header files. */
#undef HAVE_PTHREAD
/* Define to 1 if you have the `socket' function. */
#undef HAVE_SOCKET
/* Define to 1 if stdbool.h conforms to C99. */
#undef HAVE_STDBOOL_H
/* Define to 1 if you have the <stdint.h> header file. */
#undef HAVE_STDINT_H
/* Define to 1 if you have the <stdlib.h> header file. */
#undef HAVE_STDLIB_H
/* Define to 1 if you have the <strings.h> header file. */
#undef HAVE_STRINGS_H
/* Define to 1 if you have the <string.h> header file. */
#undef HAVE_STRING_H
/* Define to 1 if you have the <syslog.h> header file. */
#undef HAVE_SYSLOG_H
/* Define to 1 if you have the <sys/socket.h> header file. */
#undef HAVE_SYS_SOCKET_H
/* Define to 1 if you have the <sys/stat.h> header file. */
#undef HAVE_SYS_STAT_H
/* Define to 1 if you have the <sys/types.h> header file. */
#undef HAVE_SYS_TYPES_H
/* Define to 1 if you have the <unistd.h> header file. */
#undef HAVE_UNISTD_H
/* Define to 1 if you have the `vfork' function. */
#undef HAVE_VFORK
/* Define to 1 if you have the <vfork.h> header file. */
#undef HAVE_VFORK_H
/* Define to 1 if `fork' works. */
#undef HAVE_WORKING_FORK
/* Define to 1 if `vfork' works. */
#undef HAVE_WORKING_VFORK
/* Define to 1 if the system has the type `_Bool'. */
#undef HAVE__BOOL
/* Name of package */
#undef PACKAGE
/* Define to the address where bug reports for this package should be sent. */
#undef PACKAGE_BUGREPORT
/* Define to the full name of this package. */
#undef PACKAGE_NAME
/* Define to the full name and version of this package. */
#undef PACKAGE_STRING
/* Define to the one symbol short name of this package. */
#undef PACKAGE_TARNAME
/* Define to the version of this package. */
#undef PACKAGE_VERSION
/* Define to necessary symbol if this constant uses a non-standard name on
your system. */
#undef PTHREAD_CREATE_JOINABLE
/* Define to 1 if you have the ANSI C header files. */
#undef STDC_HEADERS
/* Version number of package */
#undef VERSION
/* Define for Solaris 2.5.1 so the uint32_t typedef from <sys/synch.h>,
<pthread.h>, or <semaphore.h> is not used. If the typedef was allowed, the
#define below would cause a syntax error. */
#undef _UINT32_T
/* Define for Solaris 2.5.1 so the uint8_t typedef from <sys/synch.h>,
<pthread.h>, or <semaphore.h> is not used. If the typedef was allowed, the
#define below would cause a syntax error. */
#undef _UINT8_T
/* Define to empty if `const' does not conform to ANSI C. */
#undef const
/* Define to the type of a signed integer type of width exactly 16 bits if
such a type exists and the standard includes do not define it. */
#undef int16_t
/* Define to the type of a signed integer type of width exactly 32 bits if
such a type exists and the standard includes do not define it. */
#undef int32_t
/* Define to the type of a signed integer type of width exactly 8 bits if such
a type exists and the standard includes do not define it. */
#undef int8_t
/* Define to the widest signed integer type if <stdint.h> and <inttypes.h> do
not define. */
#undef intmax_t
/* Define to `int' if <sys/types.h> does not define. */
#undef pid_t
/* Define to `unsigned int' if <sys/types.h> does not define. */
#undef size_t
/* Define to `int' if <sys/types.h> does not define. */
#undef ssize_t
/* Define to the type of an unsigned integer type of width exactly 16 bits if
such a type exists and the standard includes do not define it. */
#undef uint16_t
/* Define to the type of an unsigned integer type of width exactly 32 bits if
such a type exists and the standard includes do not define it. */
#undef uint32_t
/* Define to the type of an unsigned integer type of width exactly 8 bits if
such a type exists and the standard includes do not define it. */
#undef uint8_t
/* Define as `fork' if `vfork' does not work. */
#undef vfork

118
configure.ac Normal file

@@ -0,0 +1,118 @@
AC_INIT([NSRL Server], [1.1.2], [Robert J. Hansen <rjh@secret-alchemy.com>], [nsrlsvr], [http://nsrlquery.sourceforge.net])
AC_ARG_WITH([nsrl],
[AS_HELP_STRING([--with-nsrl],
[use NIST's NSRL RDS @<:@default: use the NSRL RDS@:>@])],
[nsrl=${withval}], [nsrl=no])
AC_ARG_WITH([custom],
[AS_HELP_STRING([--with-custom],
[use a custom dataset @<:@default: don't@:>@])],
[custom=${withval}], [custom=no])
if test "x$custom" != "xno" ; then
AM_PATH_PYTHON([2.7])
fi
if test "x$nsrl" != "xno" && test "x$custom" != "xno" ; then
AC_MSG_ERROR([The --with-nsrl and --with-custom flags are mutually exclusive.]);
fi
AC_CONFIG_MACRO_DIR([m4])
AC_CONFIG_SRCDIR([src/main.cc])
AC_PREREQ([2.58])
AC_CONFIG_HEADERS([config.h])
AM_INIT_AUTOMAKE([foreign])
m4_ifdef([AM_SILENT_RULES], [AM_SILENT_RULES([yes])])
AC_PROG_CXX
AC_TYPE_INT8_T
AC_TYPE_UINT8_T
AC_TYPE_INT16_T
AC_TYPE_UINT16_T
AC_TYPE_INT32_T
AC_TYPE_UINT32_T
AC_TYPE_INTMAX_T
ACX_PTHREAD([], AC_MSG_ERROR([pthreads does not appear usable.]))
AC_CHECK_FUNCS([inet_ntoa])
AC_CHECK_FUNCS([memset])
AC_CHECK_FUNCS([socket])
AC_CHECK_HEADERS([arpa/inet.h])
AC_CHECK_HEADERS([limits.h])
AC_CHECK_HEADERS([netinet/in.h])
AC_CHECK_HEADERS([sys/socket.h])
AC_CHECK_HEADERS([syslog.h])
AC_C_CONST
AC_FUNC_FORK
dnl AC_FUNC_GETLOADAVG
AC_HEADER_STDBOOL
AC_TYPE_PID_T
AC_TYPE_SIZE_T
AC_TYPE_SSIZE_T
RDS_URL=http://www.nsrl.nist.gov/RDS/rds_2.39/RDS_239m.zip
nsrl_filename=RDS_239m.zip
if test "x$nsrl" != xno ; then
if ! test -r $nsrl ; then
AC_MSG_ERROR([Couldn't find the dataset specified.])
else
nsrl_filename=$nsrl
fi
fi
if ! test -r $nsrl_filename && test "x$custom" = xno ; then
AC_CHECK_PROG([UNZIP], [unzip], [unzip], AC_MSG_ERROR([unzip not found: this is necessary to use the downloaded NIST NSRL RDS]))
AC_CHECK_PROG([WGET], [wget], [wget], [no])
if test "x$WGET" = xwget ; then
wget $RDS_URL ;
else
AC_CHECK_PROG([CURL], [curl], [curl], [no])
if test "x$CURL" = xcurl ; then
curl -O $RDS_URL ;
else
AC_MSG_ERROR([The NIST NSRL RDS must be downloaded, but neither curl nor wget are in your PATH. Please fix this, and try again.])
fi
fi
AC_MSG_NOTICE([
***
*** I'm going to leave the file $nsrl_filename around in the toplevel of the
*** build directory. If you leave it here, the next time you build this it
*** will save you a long download.
***])
fi
if test "x$custom" = xno ; then
if ! test -r $nsrl_filename ; then
AC_MSG_ERROR([
***
*** Couldn't open $nsrl_filename for reading.
***
*** If you used a tilde ("~") in the path, try giving a full directory
*** path: sometimes tilde expansion confuses configure.
***]);
else
AC_MSG_NOTICE([uncompressing the NSRL RDS -- this may take a while...])
rm -f NSRLFile.txt src/NSRLFile.txt
unzip -o $nsrl_filename NSRLFile.txt
AC_MSG_NOTICE([converting into nsrlsvr's data format -- please wait...])
$PYTHON ./denistify.py
rm -f NSRLFile.txt
fi
else
if ! test -r $custom ; then
AC_MSG_ERROR([
***
*** Couldn't open $custom for reading.
***
*** If you used a tilde ("~") in the path, try giving a full directory
*** path: sometimes tilde expansion confuses configure.
***]);
fi
AC_MSG_NOTICE([converting $custom to the proper data format -- please wait...])
rm -f src/NSRLFile.txt
$PYTHON ./convert-format.py $custom
fi
AC_OUTPUT([Makefile src/Makefile man/Makefile])

30
convert-format.py Executable file

@@ -0,0 +1,30 @@
#!/usr/bin/env python
from __future__ import print_function
import re, sys, os
hash_re = re.compile(r"([0-9A-Fa-f]{64}|[0-9A-Fa-f]{40}|[0-9A-Fa-f]{32})")
if len(sys.argv) != 2:
    print("No file specified.")
    exit(-1)
if not os.access(sys.argv[1], os.R_OK):
    print("Couldn't read " + sys.argv[1])
    exit(-2)
with open(sys.argv[1]) as fh:
    hashes = [hash_re.search(X).group(1) for X in fh.readlines() if hash_re.search(X)]
if not hashes:
    print("Zero hashes found -- check to see if this is correct.")
    exit(-4)
first_len = len(hashes[0])
if [X for X in hashes[1:] if len(X) != first_len]:
    print("Multiple different hash algorithms present in " + sys.argv[1])
    exit(-8)
with open("src/NSRLFile.txt", "w") as output:
    for hash in hashes:
        output.write(hash + "\n")
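
The order of the alternatives in `hash_re` matters. Python's `re` module tries alternatives left to right, so listing the longest pattern first keeps a SHA-1 or SHA-256 digest from being matched as just its first 32 characters. The sketch below demonstrates this. The `shortest_first` pattern is a hypothetical counter-example, not the regex from any nsrlsvr release: the 1.0.5 changelog entry mentions SHA-1 hashes being cut to 128 bits but does not show the pattern responsible.

```python
import re

# Same alternation order as convert-format.py: longest digest first.
longest_first = re.compile(r"([0-9A-Fa-f]{64}|[0-9A-Fa-f]{40}|[0-9A-Fa-f]{32})")
# Hypothetical counter-example with the alternatives reversed.
shortest_first = re.compile(r"([0-9A-Fa-f]{32}|[0-9A-Fa-f]{40}|[0-9A-Fa-f]{64})")

sha1 = "da39a3ee5e6b4b0d3255bfef95601890afd80709"  # SHA-1 of empty input

good = longest_first.search(sha1).group(1)        # the full 40 characters
truncated = shortest_first.search(sha1).group(1)  # only the first 32 characters
```

With the reversed order, the 32-character alternative succeeds immediately and the remaining 8 characters (32 bits) of the SHA-1 digest are silently dropped.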

23
denistify.py Executable file

@@ -0,0 +1,23 @@
#!/usr/bin/env python
#coding=UTF-8
import re, sys
md5_re = re.compile('^.*"([0-9A-Fa-f]{32})".*$')
hashes = []
count = 0
with open("NSRLFile.txt") as fh:
    line = fh.readline()
    while line:
        elements = line.split(",")
        if len(elements) >= 2:
            match = md5_re.match(elements[1])
            if match:
                hashes.append(match.group(1))
        line = fh.readline()
hashes.sort()
with open("src/NSRLFile.txt", "w") as fh:
    for entry in hashes:
        fh.write(entry + "\n")
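
To make the parsing step concrete, here is a minimal sketch of what denistify.py does to a single record. The sample line is made up for illustration: its first three fields are the SHA-1, MD5 and CRC32 of empty input, and the remaining fields are placeholders. It assumes an RDS layout in which the SHA-1 is the first comma-separated field and the MD5 the second, which is what the script's use of `elements[1]` implies.

```python
import re

md5_re = re.compile('^.*"([0-9A-Fa-f]{32})".*$')

# Made-up record in the quoted, comma-separated style of NSRLFile.txt.
line = ('"DA39A3EE5E6B4B0D3255BFEF95601890AFD80709",'
        '"D41D8CD98F00B204E9800998ECF8427E",'
        '"00000000","empty.txt",0,1,"358",""\n')

elements = line.split(",")
# The second field holds the quoted MD5; anything else is skipped.
match = md5_re.match(elements[1]) if len(elements) >= 2 else None
md5 = match.group(1) if match else None
```

Splitting on every comma is a shortcut that works here because the first two fields are plain hex digests; a comma inside a later field, such as a file name, only changes the fields after the MD5.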

242
m4/acx_pthread.m4 Normal file

@@ -0,0 +1,242 @@
dnl @synopsis ACX_PTHREAD([ACTION-IF-FOUND[, ACTION-IF-NOT-FOUND]])
dnl
dnl @summary figure out how to build C programs using POSIX threads
dnl
dnl This macro figures out how to build C programs using POSIX threads.
dnl It sets the PTHREAD_LIBS output variable to the threads library and
dnl linker flags, and the PTHREAD_CFLAGS output variable to any special
dnl C compiler flags that are needed. (The user can also force certain
dnl compiler flags/libs to be tested by setting these environment
dnl variables.)
dnl
dnl Also sets PTHREAD_CC to any special C compiler that is needed for
dnl multi-threaded programs (defaults to the value of CC otherwise).
dnl (This is necessary on AIX to use the special cc_r compiler alias.)
dnl
dnl NOTE: You are assumed to not only compile your program with these
dnl flags, but also link it with them as well. e.g. you should link
dnl with $PTHREAD_CC $CFLAGS $PTHREAD_CFLAGS $LDFLAGS ... $PTHREAD_LIBS
dnl $LIBS
dnl
dnl If you are only building threads programs, you may wish to use
dnl these variables in your default LIBS, CFLAGS, and CC:
dnl
dnl LIBS="$PTHREAD_LIBS $LIBS"
dnl CFLAGS="$CFLAGS $PTHREAD_CFLAGS"
dnl CC="$PTHREAD_CC"
dnl
dnl In addition, if the PTHREAD_CREATE_JOINABLE thread-attribute
dnl constant has a nonstandard name, defines PTHREAD_CREATE_JOINABLE to
dnl that name (e.g. PTHREAD_CREATE_UNDETACHED on AIX).
dnl
dnl ACTION-IF-FOUND is a list of shell commands to run if a threads
dnl library is found, and ACTION-IF-NOT-FOUND is a list of commands to
dnl run if it is not found. If ACTION-IF-FOUND is not specified, the
dnl default action will define HAVE_PTHREAD.
dnl
dnl Please let the authors know if this macro fails on any platform, or
dnl if you have any other suggestions or comments. This macro was based
dnl on work by SGJ on autoconf scripts for FFTW (www.fftw.org) (with
dnl help from M. Frigo), as well as ac_pthread and hb_pthread macros
dnl posted by Alejandro Forero Cuervo to the autoconf macro repository.
dnl We are also grateful for the helpful feedback of numerous users.
dnl
dnl @category InstalledPackages
dnl @author Steven G. Johnson <stevenj@alum.mit.edu>
dnl @version 2006-05-29
dnl @license GPLWithACException
AC_DEFUN([ACX_PTHREAD], [
AC_REQUIRE([AC_CANONICAL_HOST])
AC_LANG_SAVE
AC_LANG_C
acx_pthread_ok=no
# We used to check for pthread.h first, but this fails if pthread.h
# requires special compiler flags (e.g. on Tru64 or Sequent).
# It gets checked for in the link test anyway.
# First of all, check if the user has set any of the PTHREAD_LIBS,
# etcetera environment variables, and if threads linking works using
# them:
if test x"$PTHREAD_LIBS$PTHREAD_CFLAGS" != x; then
save_CFLAGS="$CFLAGS"
CFLAGS="$CFLAGS $PTHREAD_CFLAGS"
save_LIBS="$LIBS"
LIBS="$PTHREAD_LIBS $LIBS"
AC_MSG_CHECKING([for pthread_join in LIBS=$PTHREAD_LIBS with CFLAGS=$PTHREAD_CFLAGS])
AC_TRY_LINK_FUNC(pthread_join, acx_pthread_ok=yes)
AC_MSG_RESULT($acx_pthread_ok)
if test x"$acx_pthread_ok" = xno; then
PTHREAD_LIBS=""
PTHREAD_CFLAGS=""
fi
LIBS="$save_LIBS"
CFLAGS="$save_CFLAGS"
fi
# We must check for the threads library under a number of different
# names; the ordering is very important because some systems
# (e.g. DEC) have both -lpthread and -lpthreads, where one of the
# libraries is broken (non-POSIX).
# Create a list of thread flags to try. Items starting with a "-" are
# C compiler flags, and other items are library names, except for "none"
# which indicates that we try without any flags at all, and "pthread-config"
# which is a program returning the flags for the Pth emulation library.
acx_pthread_flags="pthreads none -Kthread -kthread lthread -pthread -pthreads -mthreads pthread --thread-safe -mt pthread-config"
# The ordering *is* (sometimes) important. Some notes on the
# individual items follow:
# pthreads: AIX (must check this before -lpthread)
# none: in case threads are in libc; should be tried before -Kthread and
# other compiler flags to prevent continual compiler warnings
# -Kthread: Sequent (threads in libc, but -Kthread needed for pthread.h)
# -kthread: FreeBSD kernel threads (preferred to -pthread since SMP-able)
# lthread: LinuxThreads port on FreeBSD (also preferred to -pthread)
# -pthread: Linux/gcc (kernel threads), BSD/gcc (userland threads)
# -pthreads: Solaris/gcc
# -mthreads: Mingw32/gcc, Lynx/gcc
# -mt: Sun Workshop C (may only link SunOS threads [-lthread], but it
# doesn't hurt to check since this sometimes defines pthreads too;
# also defines -D_REENTRANT)
# ... -mt is also the pthreads flag for HP/aCC
# pthread: Linux, etcetera
# --thread-safe: KAI C++
# pthread-config: use pthread-config program (for GNU Pth library)
case "${host_cpu}-${host_os}" in
*solaris*)
# On Solaris (at least, for some versions), libc contains stubbed
# (non-functional) versions of the pthreads routines, so link-based
# tests will erroneously succeed. (We need to link with -pthreads/-mt/
# -lpthread.) (The stubs are missing pthread_cleanup_push, or rather
# a function called by this macro, so we could check for that, but
# who knows whether they'll stub that too in a future libc.) So,
# we'll just look for -pthreads and -lpthread first:
acx_pthread_flags="-pthreads pthread -mt -pthread $acx_pthread_flags"
;;
esac
if test x"$acx_pthread_ok" = xno; then
for flag in $acx_pthread_flags; do
case $flag in
none)
AC_MSG_CHECKING([whether pthreads work without any flags])
;;
-*)
AC_MSG_CHECKING([whether pthreads work with $flag])
PTHREAD_CFLAGS="$flag"
;;
pthread-config)
AC_CHECK_PROG(acx_pthread_config, pthread-config, yes, no)
if test x"$acx_pthread_config" = xno; then continue; fi
PTHREAD_CFLAGS="`pthread-config --cflags`"
PTHREAD_LIBS="`pthread-config --ldflags` `pthread-config --libs`"
;;
*)
AC_MSG_CHECKING([for the pthreads library -l$flag])
PTHREAD_LIBS="-l$flag"
;;
esac
save_LIBS="$LIBS"
save_CFLAGS="$CFLAGS"
LIBS="$PTHREAD_LIBS $LIBS"
CFLAGS="$CFLAGS $PTHREAD_CFLAGS"
# Check for various functions. We must include pthread.h,
# since some functions may be macros. (On the Sequent, we
# need a special flag -Kthread to make this header compile.)
# We check for pthread_join because it is in -lpthread on IRIX
# while pthread_create is in libc. We check for pthread_attr_init
# due to DEC craziness with -lpthreads. We check for
# pthread_cleanup_push because it is one of the few pthread
# functions on Solaris that doesn't have a non-functional libc stub.
# We try pthread_create on general principles.
AC_TRY_LINK([#include <pthread.h>],
[pthread_t th; pthread_join(th, 0);
pthread_attr_init(0); pthread_cleanup_push(0, 0);
pthread_create(0,0,0,0); pthread_cleanup_pop(0); ],
[acx_pthread_ok=yes])
LIBS="$save_LIBS"
CFLAGS="$save_CFLAGS"
AC_MSG_RESULT($acx_pthread_ok)
if test "x$acx_pthread_ok" = xyes; then
break;
fi
PTHREAD_LIBS=""
PTHREAD_CFLAGS=""
done
fi
# Various other checks:
if test "x$acx_pthread_ok" = xyes; then
save_LIBS="$LIBS"
LIBS="$PTHREAD_LIBS $LIBS"
save_CFLAGS="$CFLAGS"
CFLAGS="$CFLAGS $PTHREAD_CFLAGS"
# Detect AIX lossage: JOINABLE attribute is called UNDETACHED.
AC_MSG_CHECKING([for joinable pthread attribute])
attr_name=unknown
for attr in PTHREAD_CREATE_JOINABLE PTHREAD_CREATE_UNDETACHED; do
AC_TRY_LINK([#include <pthread.h>], [int attr=$attr; return attr;],
[attr_name=$attr; break])
done
AC_MSG_RESULT($attr_name)
if test "$attr_name" != PTHREAD_CREATE_JOINABLE; then
AC_DEFINE_UNQUOTED(PTHREAD_CREATE_JOINABLE, $attr_name,
[Define to necessary symbol if this constant
uses a non-standard name on your system.])
fi
AC_MSG_CHECKING([if more special flags are required for pthreads])
flag=no
case "${host_cpu}-${host_os}" in
*-aix* | *-freebsd* | *-darwin*) flag="-D_THREAD_SAFE";;
*solaris* | *-osf* | *-hpux*) flag="-D_REENTRANT";;
esac
AC_MSG_RESULT(${flag})
if test "x$flag" != xno; then
PTHREAD_CFLAGS="$flag $PTHREAD_CFLAGS"
fi
LIBS="$save_LIBS"
CFLAGS="$save_CFLAGS"
# More AIX lossage: must compile with xlc_r or cc_r
if test x"$GCC" != xyes; then
AC_CHECK_PROGS(PTHREAD_CC, xlc_r cc_r, ${CC})
else
PTHREAD_CC=$CC
fi
else
PTHREAD_CC="$CC"
fi
AC_SUBST(PTHREAD_LIBS)
AC_SUBST(PTHREAD_CFLAGS)
AC_SUBST(PTHREAD_CC)
# Finally, execute ACTION-IF-FOUND/ACTION-IF-NOT-FOUND:
if test x"$acx_pthread_ok" = xyes; then
ifelse([$1],,AC_DEFINE(HAVE_PTHREAD,1,[Define if you have POSIX threads libraries and header files.]),[$1])
:
else
acx_pthread_ok=no
$2
fi
AC_LANG_RESTORE
])dnl ACX_PTHREAD

2
man/Makefile.am Normal file

@@ -0,0 +1,2 @@
EXTRA_DIST=nsrlsvr.1
man_MANS=nsrlsvr.1

62
man/nsrlsvr.1 Normal file

@@ -0,0 +1,62 @@
.Dd January 30, 2012
.Dt NSRLSVR 1
.Os
.Sh NAME
.Nm nsrlsvr
.Nd server yielding hashes from NIST's NSRL RDS
.Sh SYNOPSIS
.Nm nsrlsvr
.Op Fl b
.Op Fl h
.Op Fl o
.Op Fl s
.Op Fl S
.Op Fl v
.Op Fl f Ar RDS-file
.Op Fl p Ar port
.Op Fl t Ar timeout
.Sh DESCRIPTION
nsrlsvr provides a daemon that services queries from clients requesting information
about whether certain hash values are present in the NIST National Software Reference
Library Reference Data Set (NSRL RDS).
.Sh OPTIONS
.Bl -tag -width Ds
.It Fl b
show information on submitting bug reports, then exit
.It Fl h
show a help screen, then exit
.It Fl o
only support the old 1.0 server protocol
.It Fl s
allow clients to query the server status (default: disabled)
.It Fl S
run as a normal process (do not run as a daemon)
.It Fl v
show version information, then exit
.It Fl f Ar RDS-file
specify an alternate RDS file in
.Ar RDS-file
.It Fl p Ar port
listen on port
.Ar port
(default: 9120)
.It Fl t Ar timeout
shut down after
.Ar timeout
seconds of inactivity (default: disabled)
.El
.Sh NOTES
To support the full NSRL RDS requires a lot of memory. Although it will run on
a 4GB system, the results may be unsatisfactory. A 64-bit OS with at least 8GB
of RAM is recommended.
.Pp
nsrlsvr treats the
.Ar timeout
value as a guideline. It will not shut down before
.Ar timeout
seconds of inactivity, but it may allow up to thirty seconds more.
.Sh BUGS
None known.
.Sh SEE ALSO
nsrllookup(1)
.Sh AUTHOR
Robert J. Hansen <rjh@secret-alchemy.com>
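
The NOTES section says the timeout is a guideline that may be exceeded by up to thirty seconds. One way such slack can arise is a watchdog that only checks for inactivity at fixed intervals. The Python sketch below illustrates that idea. It is an assumption about the mechanism, not code from nsrlsvr: `CHECK_INTERVAL` and `shutdown_time` are hypothetical names, and the 30-second interval is chosen only to match the figure quoted in the man page.

```python
CHECK_INTERVAL = 30  # hypothetical: seconds between watchdog checks


def shutdown_time(last_activity, timeout, first_check=0):
    """Return the time of the first periodic check at which at least
    `timeout` seconds have passed since `last_activity`."""
    now = first_check
    while now - last_activity < timeout:
        now += CHECK_INTERVAL
    return now


# Last request at t=10 s with a 1800 s timeout: the deadline is t=1810 s,
# but the next check on the 30 s grid is at t=1830 s.
stop_at = shutdown_time(last_activity=10, timeout=1800)
slack = stop_at - (10 + 1800)
```

Under this model the server never stops before the timeout has elapsed, and the extra delay is always shorter than one check interval.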

1757
src/Doxyfile Normal file

File diff suppressed because it is too large

6
src/Makefile.am Normal file

@@ -0,0 +1,6 @@
EXTRA_DIST = handler.hpp Doxyfile
bin_PROGRAMS = nsrlsvr
nodist_pkgdata_DATA = NSRLFile.txt
nsrlsvr_SOURCES = main.cc handler.cc
nsrlsvr_CPPFLAGS = -DPKGDATADIR="\"$(pkgdatadir)\"" -DPACKAGE_VERSION="\"$(PACKAGE_VERSION)\"" -DPACKAGE_URL="\"$(PACKAGE_URL)\"" -DPACKAGE_BUGREPORT="\"$(PACKAGE_BUGREPORT)\"" $(PTHREAD_CFLAGS)
nsrlsvr_LDFLAGS=$(PTHREAD_CFLAGS)

595
src/handler.cc Normal file

@@ -0,0 +1,595 @@
/* $Id: handler.cc 142 2013-02-23 22:25:32Z rjh $
*
* Copyright (c) 2011-2012, Robert J. Hansen <rjh@secret-alchemy.com>
* and others.
*
* Permission to use, copy, modify, and/or distribute this software for any
* purpose with or without fee is hereby granted, provided that the above
* copyright notice and this permission notice appear in all copies.
*
* THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
* WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
* MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
* ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
* ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
* OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
*
* Code standards:
* This is a small enough project we don't need a formal coding standard.
* That said, here are some helpful tips for people who want to submit
* patches:
*
* - If it's not 100% ISO C++98, it won't get in.
* - It must compile cleanly and without warnings under both GNU G++
* and Clang++, even with "-W -Wextra -ansi -pedantic".
* - C++ offers 'and', 'or' and 'not' keywords instead of &&, || and !.
* I like these: I think they're more readable. Please use them.
* - C++ allows you to initialize variables at declaration time by
* doing something like "int x(3)" instead of "int x = 3". Please
* do this where practical: it's a good habit to get into for C++.
* - Please try to follow the formatting conventions. It's mostly
* straight-up astyle format, with occasional tweaks where necessary
* to get nice hardcopy printouts.
* - If you write a new function it must have a Doxygen block
* documenting it.
*
* Contributor history:
*
* Robert J. Hansen <rjh@secret-alchemy.com>
* - most everything
* Jesse Kornblum <jessekornblum@gmail.com>
* - patch to log how many hashes are in each QUERY statement
*/
#include <string>
#include <set>
#include <vector>
#include <algorithm>
#include <functional>
#include <memory>
#include <exception>
#include "handler.hpp"
#include <poll.h>
#include <cstdlib> // for getloadavg
#include <sys/types.h>
#include <syslog.h>
#include <inttypes.h>
#define INFO LOG_MAKEPRI(LOG_USER, LOG_INFO)
/* Additional defines necessary on Linux: */
#ifdef __linux__
#include <cstring> // for memset
#include <cstdio> // for snprintf
#include <unistd.h> // because Fedora has lately taken to being weird
#endif
using std::set;
using std::string;
using std::find;
using std::find_if;
using std::transform;
using std::vector;
using std::not1;
using std::equal_to;
using std::ptr_fun;
using std::remove;
using std::auto_ptr;
using std::exception;
extern const set<string>& hashes;
extern const bool& enable_status;
extern const bool& only_old;
namespace {
/** A convenience exception representing network errors that cannot
* be recovered from, and will result in a graceful bomb-out.
*
* @since 1.1
* @author Robert J. Hansen */
class UnrecoverableNetworkError : public exception
{
public:
const char* what() const throw() {
return "unr net err";
}
};
/** A functor that provides stateful reading of line-oriented data
* across UNIX file descriptors.
*
* The big problem with reading information over a socket
* connection is that data can arrive in a badly fragmented form.
* On a console you can just call getline() and be confident that
* when it returns there will be a CR/LF at the end and no data
* afterwards: that's the great virtue of accepting data one byte
* at a time on a tty. On a network connection you have to take
* what the system gives you, and if the system gives you two
* strings spread over three packets with a CR/LF smack in the
* middle, well ... you have to make do. That means returning the
* first line and storing the rest of the data for use in a
* subsequent call to the data reading facility.
*
* So, in other words, our get_line function needs to track state
* *and* be threadsafe/re-entrant. Declaring a static buffer within
* the function would let it track state, but thread safety would be
* a problem.
*
* Fortunately, the C++ functor idiom solves this problem
* beautifully.
*
* Further: naïve blocking I/O, although it works rather well, will
* artificially inflate the server load. For this reason the code
* uses slightly more complex but still quite manageable poll()-
* based I/O with a 750ms timeout. Responsiveness isn't quite as
* high as it could be, but it's a small price to pay for better
* behavior server-side.
*
* @author Rob Hansen
* @since 0.9*/
struct SocketIO
{
public:
/** Initializes the object to listen on a particular file
* descriptor.
*
* @param fd File descriptor to read on */
SocketIO(int32_t fd) :
sock_fd(fd), buffer(""), tmp_buf(65536, '\0') {}
/** Writes a line of text to the socket. The caller is
* responsible for ensuring the text has a '\r\n' appended.
*
* @param line The line to write
* @since 1.1 */
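void write_line(string line) const
{
// write() may accept fewer bytes than requested, so keep
// going until the whole line has been sent.
size_t sent(0);
while (sent < line.size()) {
ssize_t rc(write(sock_fd, line.c_str() + sent, line.size() - sent));
if (rc <= 0) {
throw UnrecoverableNetworkError();
}
sent += static_cast<size_t>(rc);
}
}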
/** Writes a line of text to the socket. The caller is
* responsible for ensuring the text has a '\r\n' appended.
*
* @param line The line to write
* @since 1.1 */
void write_line(const char* line) const
{
write_line(string(line));
}
/** Reads a line from the socket. Returns an auto_ptr<string>
* because clients might be sending arbitrarily-sized (i.e.,
* really huge) data to us. Passing smartpointers around is
* ridiculously faster than copying huge blocks of memory.
*
* Arguably this should return a shared_ptr<string>, but a lot
* of C++ compilers have shaky support for TR1. Instead we use
* the lowest common denominator: std::auto_ptr.
*
* This function replaces the old operator().
*
* @since 1.1
* @return An auto_ptr<string> representing one line read from
* the file descriptor.*/
auto_ptr<string> read_line()
{
/* "But in Latin, Jehovah begins with the letter 'I'..."
*
* SAVE YOURSELF THE NIGHTMARE BUG HUNT. Remember that when
* you test this code at the console, tapping return will
* enter a \n. When you do it from a Telnet client, it enters
* a \r\n. This one-character difference turned into a six-
* hour bug hunt. Documented here for posterity. If you ever
* wonder why I'm tempted to start drinking before the sun
* rises, well, this one's a good example... */
while (true) {
pollfd fds = { sock_fd, POLLIN, 0 };
int poll_code(poll(&fds, 1, 750));
if (-1 == poll_code)
throw UnrecoverableNetworkError();
else if (fds.revents & POLLERR ||
fds.revents & POLLHUP)
throw UnrecoverableNetworkError();
else if (fds.revents & POLLIN) {
memset(static_cast<void*>(&tmp_buf[0]),
0,
tmp_buf.size());
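ssize_t bytes_read = read(sock_fd,
static_cast<void*>(&tmp_buf[0]),
tmp_buf.size());
// read() returns 0 when the peer has closed the connection
// and -1 on error: neither leaves anything to append.
if (bytes_read <= 0)
throw UnrecoverableNetworkError();
buffer.append(&tmp_buf[0], static_cast<size_t>(bytes_read));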
/* To prevent DoS from clients spamming us with huge
packets, bomb on any query larger than 256k. */
if (buffer.size() > 262144)
throw UnrecoverableNetworkError();
string::iterator iter = find(buffer.begin(),
buffer.end(), '\n');
if (iter != buffer.end()) {
auto_ptr<string> rv(new string(buffer.begin(), iter));
rv->erase(remove(rv->begin(),
rv->end(),
'\r'),
rv->end());
rv->erase(remove(rv->begin(),
rv->end(),
'\n'),
rv->end());
buffer = string(iter + 1, buffer.end());
return rv;
}
}
}
}
private:
/** Tracks the file descriptor to read */
const int32_t sock_fd;
/** Internal storage buffer for keeping track of read, but not
* yet finished, data */
string buffer;
/** Scratch buffer for read(). Declared as a member so that it
* isn't rebuilt on every call. The vector object itself only takes
* a few bytes: the actual 64k buffer gets allocated on the heap. */
vector<char> tmp_buf;
};
/** A hand-rolled string tokenizer in C++.
*
* Efficient string tokenization in 29 lines, without absurd
* contortions of code. Booyah. Given the state of things in C,
* where on some platforms strtok is outright obsoleted by strsep
* and on other platforms strsep is just a distant promise of what
* the future might hold... I'll take this way.
*
* Returns a smartpointer to a vector for the same reason
* SocketIO::read_line() returns one: to spare us the
* otherwise absurd amount of memcpying that would be going on.
*
* Note that the line is converted to uppercase in place before
* it is split.
*
* @param line The line to tokenize (modified in place)
* @param character The delimiter character
* @returns An auto_ptr to a vector of strings representing tokens */
auto_ptr<vector<string> > tokenize(string& line, char character = ' ')
{
auto_ptr<vector<string> > rv(new vector<string>());
transform(line.begin(), line.end(), line.begin(), ::toupper);
string::iterator begin(find_if(line.begin(), line.end(),
not1(bind2nd(equal_to<char>(),
character))));
string::iterator end(
(begin != line.end())
? find(begin + 1, line.end(), character)
: line.end()
);
while (begin != line.end()) {
rv->push_back(string(begin, end));
if (end == line.end()) {
begin = line.end();
continue;
}
begin = find_if(end + 1, line.end(),
not1(bind2nd(equal_to<char>(), character)));
end = (begin != line.end())
? find(begin + 1, line.end(), character)
: line.end();
}
return rv;
}
/** Convenience overload of tokenize() that takes ownership of a
* line returned by SocketIO::read_line().
*
* @param line A smartpointer to the line to tokenize
* @param ch The delimiter character
* @returns An auto_ptr to a vector of strings representing tokens */
auto_ptr<vector<string> > tokenize(auto_ptr<string> line, char ch = ' ')
{
return tokenize(*line, ch);
}
/** Turns a string of 'a.b.c.d', à la dotted-quad style, into a
* 32-bit integer. 'a' must be present: if b through d are
* omitted, they are assumed to be zero. Each component must lie
* between 0 and 254.
*
* @param line A smartpointer to a version string
* @returns A 32-bit integer representing a version, or -1 on
* failure.
* @author Rob Hansen
* @since 0.9 */
int32_t parse_version(auto_ptr<string> line)
{
int32_t version(0);
int32_t this_token(0);
auto_ptr<vector<string> > tokens(tokenize(line));
auto_ptr<vector<string> > version_tokens;
size_t index(0);
if (tokens->size() != 2 or
tokens->at(0) != "VERSION:") {
goto PARSE_VERSION_BAIL_BAD;
}
version_tokens = tokenize(tokens->at(1), '.');
if (version_tokens->size() < 1 or version_tokens->size() > 4) {
goto PARSE_VERSION_BAIL_BAD;
}
while (version_tokens->size() != 4) {
version_tokens->push_back("0");
}
for (index = 0 ; index < 4 ; ++index) {
string& thing(version_tokens->at(index));
if (thing.end() != find_if(thing.begin(),
thing.end(),
not1(ptr_fun(::isdigit)))) {
goto PARSE_VERSION_BAIL_BAD;
}
this_token = atoi(thing.c_str());
if (this_token < 0 or this_token > 254) {
goto PARSE_VERSION_BAIL_BAD;
}
version = (version << 8) + this_token;
}
goto PARSE_VERSION_BAIL;
PARSE_VERSION_BAIL_BAD:
version = -1;
PARSE_VERSION_BAIL:
return version;
}
/** A simple convenience function that allows us to ensure
* we're getting valid hashes.
*
* @param digest The string being checked
* @returns true if it could be an MD5 or SHA-1 digest, false otherwise
* @since 0.9
* @author Rob Hansen */
bool ishexdigest(const string& digest)
{
string::const_iterator iter(digest.begin());
if (not (digest.size() == 40 or digest.size() == 32)) {
return false;
}
for ( ; iter != digest.end() ; ++iter) {
bool is_number = (*iter >= '0' and *iter <= '9');
bool is_letter = (*iter >= 'A' and *iter <= 'F');
if (not (is_number or is_letter))
return false;
}
return true;
}
/** Performs a transaction with a client. Adheres to protocol
* version 1.0.
*
* @param sio The socket to listen and respond on
* @param ip_addr The IP address of the remote host
* @since 0.9 */
void handle_protocol_10(SocketIO& sio, const char* ip_addr)
{
string return_seq("");
uint32_t found(0);
double frac(0.0);
uint32_t total_queries(0);
try {
auto_ptr<vector<string> > commands(tokenize(sio.read_line()));
if (commands->size() < 2 or commands->at(0) != "QUERY") {
sio.write_line("NOT OK\r\n");
return;
}
for (size_t index = 1 ; index < commands->size() ; ++index) {
if (not ishexdigest(commands->at(index))) {
sio.write_line("NOT OK\r\n");
return;
}
if (hashes.end() != hashes.find(commands->at(index))) {
return_seq += "1";
found += 1;
} else {
return_seq += "0";
}
}
total_queries = commands->size() -
(commands->size() > 0 ? 1 : 0);
if (total_queries) {
double numerator(100 * found);
double denominator(total_queries);
frac = numerator / denominator;
}
syslog(INFO,
"%s: protocol 1.0, found %u of %u hashes (%.1f%%), closed normally",
ip_addr,
found,
total_queries,
frac);
return_seq = "OK " + return_seq + "\r\n";
sio.write_line(return_seq);
} catch (exception&) {
return;
}
}
/** Performs a transaction with a client. Adheres to protocol
* version 2.0.
*
* @param sio The socket to listen and respond on
* @param ip_addr The IP address of the remote host
* @since 1.1 */
void handle_protocol_20(SocketIO& sio, const char* ip_addr)
{
uint32_t total_queries(0);
uint32_t found(0);
double frac(0.0);
try {
auto_ptr<vector<string> > commands(tokenize(sio.read_line()));
while (commands->size() >= 1) {
string return_seq("");
if ("BYE" == commands->at(0)) {
if (total_queries) {
double numerator(100 * found);
double denominator(total_queries);
frac = numerator / denominator;
}
syslog(INFO,
"%s: protocol 2.0, found %u of %u hashes (%.1f%%), closed normally",
ip_addr,
found,
total_queries,
frac);
return;
}
else if ("DOWNSHIFT" == commands->at(0)) {
syslog(INFO,
"%s asked for a protocol downgrade to 1.0",
ip_addr);
sio.write_line("OK\r\n");
handle_protocol_10(sio, ip_addr);
return;
}
else if ("UPSHIFT" == commands->at(0)) {
syslog(INFO,
"%s asked for a protocol upgrade (refused)",
ip_addr);
sio.write_line("NOT OK\r\n");
}
else if ("QUERY" == commands->at(0)) {
if (commands->size() == 1) {
sio.write_line("NOT OK\r\n");
return;
} else {
size_t index(1);
for ( ; index < commands->size() ; ++index) {
if (not ishexdigest(commands->at(index))) {
sio.write_line("NOT OK\r\n");
return;
}
set<string>::const_iterator iter(hashes.begin());
iter = hashes.find(commands->at(index));
if (iter != hashes.end()) {
return_seq += "1";
found += 1;
} else {
return_seq += "0";
}
}
return_seq = "OK " + return_seq + "\r\n";
total_queries += commands->size() - 1;
}
}
else if ("STATUS" == commands->at(0) and enable_status) {
double loadavg[3] = { 0.0, 0.0, 0.0 };
char buf[1024];
getloadavg(loadavg, 3);
memset(buf, 0, 1024);
snprintf(buf,
1024,
"OK %u %s hashes, load %.2f %.2f %.2f\r\n",
static_cast<uint32_t>(hashes.size()),
(hashes.begin() == hashes.end()) ? "unknown" :
(hashes.begin()->size() == 32 ? "MD5" :
hashes.begin()->size() == 40 ? "SHA-1" :
hashes.begin()->size() == 64 ? "SHA-256" :
"unknown algorithm"),
loadavg[0],
loadavg[1],
loadavg[2]);
return_seq = string(buf);
syslog(INFO,
"%s asked for server status (sent '%s')",
ip_addr,
buf);
} else if ("STATUS" == commands->at(0)) {
syslog(INFO,
"%s asked for server status (refused)",
ip_addr);
return_seq = "OK NOT SUPPORTED\r\n";
} else {
sio.write_line("NOT OK\r\n");
return;
}
sio.write_line(return_seq);
commands = tokenize(sio.read_line());
}
} catch (exception&) {
if (total_queries) {
double numerator(100 * found);
double denominator(total_queries);
frac = numerator / denominator;
}
syslog(INFO,
"%s: protocol 2.0, found %u of %u hashes (%.1f%%), closed abnormally",
ip_addr,
found,
total_queries,
frac);
}
}
}
/** Handles client query requests.
*
* @param fd the client's socket file descriptor
* @param ip_addr the IP address of the remote host
* @since 0.9 */
void handle_client(const int32_t fd, const string& ip_addr)
{
SocketIO sio(fd);
try {
int32_t version(parse_version(sio.read_line()));
if (version > 0 and version <= 0x01000000) {
sio.write_line("OK\r\n");
handle_protocol_10(sio, ip_addr.c_str());
} else if (version > 0x01000000 and
version <= 0x02000000 and
not only_old) {
sio.write_line("OK\r\n");
handle_protocol_20(sio, ip_addr.c_str());
} else {
sio.write_line("NOT OK\r\n");
}
} catch (exception&) {
return;
}
}
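The version arithmetic used by parse_version() and handle_client() can be sketched on its own. This is a minimal illustration, not the project's code: `pack_version` is a hypothetical helper, it takes the bare dotted string rather than a full "VERSION: x.y" line, and unlike the tokenizer above it rejects empty components instead of skipping them. It accumulates into an unsigned long as an assumption made here to keep the shift well-defined for large first components.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper (not part of nsrlsvr): packs "a.b.c.d" into a
// 32-bit value, one byte per component, padding omitted components
// with zero. Returns false on malformed input and leaves 'packed'
// untouched in that case.
bool pack_version(const std::string& dotted, unsigned long& packed)
{
    std::vector<std::string> parts(1, std::string());
    for (std::string::size_type i = 0; i < dotted.size(); ++i) {
        const char ch(dotted[i]);
        if ('.' == ch) {
            parts.push_back(std::string());   // start the next component
        } else if (ch >= '0' and ch <= '9') {
            parts.back() += ch;
        } else {
            return false;                     // only digits and dots allowed
        }
    }
    if (parts.size() > 4) {
        return false;
    }
    parts.resize(4, "0");                     // "2" becomes 2.0.0.0
    unsigned long result(0);
    for (std::size_t i = 0; i < 4; ++i) {
        if (parts[i].empty() or parts[i].size() > 3) {
            return false;
        }
        const int value(std::atoi(parts[i].c_str()));
        if (value > 254) {                    // same upper bound as parse_version()
            return false;
        }
        result = (result << 8) + static_cast<unsigned long>(value);
    }
    packed = result;
    return true;
}
```

Under this packing "1.0" becomes 0x01000000 and "2.0" becomes 0x02000000, which are the two thresholds handle_client() compares against. A request for "1.1" packs to 0x01010000 and so lands in the protocol 2.0 branch.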

19
src/handler.hpp Normal file

@@ -0,0 +1,19 @@
/* $Id: handler.hpp 108 2012-01-30 19:30:29Z rjh $
*
* Copyright (c) 2011, Robert J. Hansen <rjh@secret-alchemy.com>
*
* Permission to use, copy, modify, and/or distribute this software for any
* purpose with or without fee is hereby granted, provided that the above
* copyright notice and this permission notice appear in all copies.
*
* THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
* WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
* MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
* ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
* ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
* OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.*/
#include <string>
#include <inttypes.h> // for int32_t
void handle_client(int32_t, const std::string&);

498
src/main.cc Normal file

@@ -0,0 +1,498 @@
/* $Id: main.cc 142 2013-02-23 22:25:32Z rjh $
*
* Copyright (c) 2011-2012, Robert J. Hansen <rjh@secret-alchemy.com>
*
* Permission to use, copy, modify, and/or distribute this software for any
* purpose with or without fee is hereby granted, provided that the above
* copyright notice and this permission notice appear in all copies.
*
* THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
* WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
* MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
* ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
* ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
* OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
*
* Code standards:
* This is a small enough project we don't need a formal coding standard.
* That said, here are some helpful tips for people who want to submit
* patches:
*
* - If it's not 100% ISO C++98, it won't get in.
* - It must compile cleanly and without warnings under both GNU G++
* and Clang++, even with "-W -Wextra -ansi -pedantic".
* - C++ offers 'and', 'or' and 'not' keywords instead of &&, || and !.
* I like these: I think they're more readable. Please use them.
* - C++ allows you to initialize variables at declaration time by
* doing something like "int x(3)" instead of "int x = 3". Please
* do this where practical: it's a good habit to get into for C++.
* - Please try to follow the formatting conventions. It's mostly
* straight-up astyle format, with occasional tweaks where necessary
* to get nice hardcopy printouts.
* - If you write a new function it must have a Doxygen block
* documenting it.
*
* Contributor history:
* Robert J. Hansen <rjh@secret-alchemy.com>
* - everything
*/
#include <sys/stat.h>
#include <syslog.h>
#include <set>
#include <string>
#include <time.h>
#include <arpa/inet.h>
#include <pthread.h>
#include <algorithm>
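#include <limits.h>
#include <inttypes.h> // for int32_t, uint16_t, uint32_t
#include <cstdlib> // for exit, atoi
#include <ctype.h> // for ::isdigit, ::toupper
#include <functional> // for not1, ptr_fun
#include "handler.hpp"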
#include <iostream>
#include <fstream>
#include <vector>
#include <memory>
/* Additional includes necessary on Linux: */
#ifdef __linux__
#include <cstring> // for memset
#include <cstdio> // for stderr
#include <unistd.h> // for close, fork, chdir (Fedora only)
#endif
/* Additional includes necessary on FreeBSD, for the sockaddr and
* sockaddr_in structures: */
#ifdef __FreeBSD__
#include <sys/socket.h>
#include <netinet/in.h>
#endif
using std::string;
using std::set;
using std::transform;
using std::find;
using std::find_if;
using std::not1;
using std::ptr_fun;
using std::ifstream;
using std::cerr;
using std::vector;
using std::remove_if;
#define INFO LOG_MAKEPRI(LOG_USER, LOG_INFO)
#define WARN LOG_MAKEPRI(LOG_USER, LOG_WARNING)
#define DEBUG LOG_MAKEPRI(LOG_USER, LOG_DEBUG)
#define MAX_PENDING_REQUESTS 20
#define BUFFER_SIZE 8192
namespace {
/** Tracks whether the server should only support protocol 1.0. */
bool old_only(false);
/** Tracks whether the server should support status queries. */
bool status_enabled(false);
/** Tracks whether the server should run as a daemon. */
bool standalone(false);
/** Our set of hashes, represented as a set of strings. Note
* that the current NSRL library contains approximately 32
* million values, each at roughly 64 bytes (rounded to binary
* powers to make the math easier). This is 2**25 values times
* 2**6 bytes each = 2**31 bytes, or about two gigs of RAM.
*
* Moral of the story: populating this set is computationally
* expensive. */
set<string> hash_set;
/** Tracks where we look for the location of the
* reference data set. */
string RDS_LOC(PKGDATADIR "/NSRLFile.txt");
/** Keeps track of the last time we serviced a request.
* Access is guarded by the mutex declared below. */
time_t last_req_at(time(0));
/** Keeps track of how many clients are currently being serviced.
* Access is guarded by the mutex declared below. */
int32_t active_sessions(0);
/** A mutex to keep various threads from clobbering each other
* in their fanatical zeal to update shared resources.
*
* Interestingly, PTHREAD_MUTEX_INITIALIZER may expand to a
* brace-enclosed initializer, which C++98 does not accept in
* constructor-style initialization: you have to use old C-style
* equals-operator initialization. */
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
/** The server's inactivity timeout interval */
int32_t TIMEOUT(INT_MAX);
/** Which port to listen on */
uint16_t PORT(9120);
/** A convenience class allowing us to pass multiple pieces of
data with a void*. */
struct clientinfo {
clientinfo(int32_t sfd, const char* ipaddr) :
sock_fd(sfd), ip_address(ipaddr) {}
int32_t sock_fd;
string ip_address;
};
/** Determines whether a character represents a valid uppercase
* hexadecimal digit. */
bool is_hexit(char ch)
{
return (ch >= '0' and ch <= '9') or (ch >= 'A' and ch <= 'F');
}
/** Loads hashes from disk and stores them in a fast-accessing
* in-memory data structure. This will be slow. */
void load_hashes()
{
vector<char> buf(BUFFER_SIZE);
ifstream infile(RDS_LOC.c_str());
if (not infile.good()) {
syslog(WARN, "couldn't open hashes file %s",
RDS_LOC.c_str());
exit(EXIT_FAILURE);
}
while (infile) {
// Per the C++ spec, &vector<T>[loc] is guaranteed
// to be a T*. (Unless it's a vector<bool>, in which
// case you're living in such sin there's absolutely
// no help for you. Friends don't let friends use
// vector<bool>.)
memset(static_cast<void*>(&buf[0]), 0, BUFFER_SIZE);
infile.getline(&buf[0], BUFFER_SIZE);
// Stop at the first NUL rather than copying all
// BUFFER_SIZE bytes of padding into the string.
string line(&buf[0]);
string::iterator iter(line.begin());
string token("");
while (iter != line.end()) {
string::iterator end(find(iter, line.end(), ','));
token = string(iter, end);
transform(token.begin(), token.end(), token.begin(), ::toupper);
token.erase(remove_if(token.begin(),
token.end(),
not1(ptr_fun(is_hexit))),
token.end());
if (32 == token.size() or 40 == token.size() or 64 == token.size()) {
break;
}
iter = (end == line.end() ? line.end() : end + 1);
}
if (32 != token.size() and 40 != token.size() and 64 != token.size()) {
continue;
}
if (hash_set.size() > 0 and hash_set.size() % 1000000 == 0) {
syslog(INFO, "%lu million hashes read",
static_cast<unsigned long>(hash_set.size() / 1000000));
}
hash_set.insert(token);
}
infile.close();
syslog(INFO, "read in %u unique hashes",
static_cast<uint32_t>(hash_set.size()));
}
/** A thin wrapper around handler.cc and handle_client, meant
* to ensure the programmer of that function doesn't have to
* worry about thread contention. */
void* run_client_thread(void* arg)
{
clientinfo* ci(static_cast<clientinfo*>(arg));
const int32_t sock_fd(ci->sock_fd);
const string ip_address(ci->ip_address);
// Delete the dynamically-allocated memory block. This
// is an inevitable line of execution after successfully
// allocating the block in the main loop (below).
delete ci;
if (0 != pthread_mutex_lock(&mutex)) {
syslog(WARN, "couldn't acquire the mutex!");
close(sock_fd);
exit(EXIT_FAILURE);
}
last_req_at = time(0);
active_sessions += 1;
if (0 != pthread_mutex_unlock(&mutex)) {
syslog(WARN, "couldn't release the mutex!");
close(sock_fd);
exit(EXIT_FAILURE);
}
syslog(INFO, "connection from %s", ip_address.c_str());
handle_client(sock_fd, ip_address);
close(sock_fd);
syslog(INFO, "disconnected from %s", ip_address.c_str());
if (0 != pthread_mutex_lock(&mutex)) {
syslog(WARN, "couldn't acquire the mutex!");
exit(EXIT_FAILURE);
}
active_sessions -= 1;
if (0 != pthread_mutex_unlock(&mutex)) {
syslog(WARN, "couldn't release the mutex!");
exit(EXIT_FAILURE);
}
return NULL;
}
/** Converts our application into a proper daemon. */
void daemonize()
{
const pid_t pid(fork());
if (pid < 0) {
syslog(WARN, "couldn't fork!");
exit(EXIT_FAILURE);
} else if (pid > 0) {
exit(EXIT_SUCCESS);
}
syslog(INFO, "daemon started");
umask(0);
if (setsid() < 0) {
syslog(WARN, "couldn't set sid");
exit(EXIT_FAILURE);
}
// Technically, the root directory is the only one guaranteed
// to exist on the filesystem. Therefore, it's the only safe
// directory to point our daemon at. I doubt this is strictly
// necessary, but remembering to completely rebase a daemon is
// part of just good hacking etiquette.
if (0 > chdir("/")) {
syslog(WARN, "couldn't chdir to root");
exit(EXIT_FAILURE);
}
// No extraneous filehandles for us. Daemons lack stdio, so
// shut 'em on down.
close(STDIN_FILENO);
close(STDOUT_FILENO);
close(STDERR_FILENO);
}
/** Creates a server socket that will listen for clients. */
int32_t make_socket()
{
int32_t sock;
sockaddr_in server;
memset(static_cast<void*>(&server), 0, sizeof(server));
server.sin_family = AF_INET;
server.sin_addr.s_addr = htonl(INADDR_ANY);
server.sin_port = htons(PORT);
if (0 > (sock = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP))) {
syslog(WARN, "couldn't create a server socket");
exit(EXIT_FAILURE);
}
if (0 > bind(sock, reinterpret_cast<sockaddr*>(&server),
sizeof(server))) {
syslog(WARN, "couldn't bind to port %u",
static_cast<unsigned int>(PORT));
exit(EXIT_FAILURE);
}
if (0 > listen(sock, MAX_PENDING_REQUESTS)) {
syslog(WARN, "couldn't listen for clients");
exit(EXIT_FAILURE);
}
syslog(INFO, "ready for clients");
return sock;
}
/** A thread that runs every thirty seconds checking to see if the
* daemon should politely exit. It will automatically shut down
* if no clients are currently being serviced and more than
* TIMEOUT seconds have elapsed since the time the last client
* connected. */
void* shutdown_handler(void*)
{
while (1) {
if (0 != pthread_mutex_lock(&mutex)) {
syslog(WARN, "shutdown handler couldn't get mutex");
exit(EXIT_FAILURE);
}
if (0 == active_sessions and
(TIMEOUT < (time(0) - last_req_at))) {
syslog(INFO, "exiting normally due to inactivity");
exit(EXIT_SUCCESS);
}
if (0 != pthread_mutex_unlock(&mutex)) {
syslog(WARN, "shutdown handler couldn't release mutex");
exit(EXIT_FAILURE);
}
sleep(30);
}
return NULL;
}
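/** Converts a string of base-10 digits to a number.
*
* @param num The string to convert
* @returns The value of the string, or -1 if it is empty, longer
* than nine digits, or contains a non-digit */
int32_t is_num(const string& num)
{
string::const_iterator b(num.begin());
string::const_iterator e(num.end());
if (num.empty() or num.size() > 9) {
return -1;
}
return (e == find_if(b, e, not1(ptr_fun(::isdigit))))
? ::atoi(num.c_str())
: -1;
}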
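/** Checks a string to see whether it's a port in the range
* [1024, 65535] (i.e., in userspace) and, if so, stores it in
* PORT. The range is checked before narrowing to 16 bits, so
* malformed or oversized values can't wrap into a valid port. */
bool validate_port(const string& foo)
{
const int32_t port(is_num(foo));
if (port < 1024 or port > 65535) {
return false;
}
PORT = static_cast<uint16_t>(port);
return true;
}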
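/** Validates the inactivity timeout and stores it in TIMEOUT.
* A value of zero disables the timeout.
*
* @param foo The timeout in seconds, as a base-10 string
* @returns true if the timeout was valid, false otherwise */
bool validate_timeout(const string& foo)
{
const int32_t timeout(is_num(foo));
if (timeout < 0) {
return false;
}
TIMEOUT = (0 == timeout) ? INT_MAX : timeout;
return true;
}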
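/** Prints a usage summary to stderr and exits.
*
* @param program_name The name this program was invoked under */
void show_usage(const char* program_name)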
{
cerr <<
"Usage: " << program_name << " [-vbhsSo -f FILE -p PORT -t TIMEOUT]\n\n" <<
"-v : print version information\n" <<
"-b : get information on reporting bugs\n" <<
"-f : specify an alternate RDS (default: "<< PKGDATADIR <<
"/NSRLFile.txt)\n" <<
"-s : allow clients to query server status (default: disabled)\n" <<
"-S : run as a normal process (do not run as a daemon)\n" <<
"-o : only support old (1.0) nsrlsvr protocol\n" <<
"-h : show this help message\n" <<
"-p : listen on PORT, between 1024 and 65535 (default: 9120)\n" <<
"-t : stop after TIMEOUT seconds of inactivity (default: disabled)\n\n";
exit(EXIT_FAILURE);
}
}
/** An externally available const reference to the hash set. */
const set<string>& hashes(hash_set);
/** An externally available const reference to the variable storing
* whether or not status checking should be enabled. */
const bool& enable_status(status_enabled);
/** An externally available const reference to the variable storing
* whether or not only protocol 1.0 should be supported. */
const bool& only_old(old_only);
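/** Program entry point: parses the command line, loads the hashes,
* then accepts clients and hands each one to its own thread. */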
int main(int argc, char* argv[])
{
int32_t svr_sock(0);
int32_t client_sock(0);
sockaddr_in client;
socklen_t client_length(0);
pthread_t shutdown_handler_id;
string port_num("9120");
string timeout("0");
std::auto_ptr<ifstream> infile;
int32_t opt(0);
while (-1 != (opt = getopt(argc, argv, "bsvof:hp:t:S"))) {
switch (opt) {
case 'v':
cerr << argv[0] << " " << PACKAGE_VERSION << "\n\n";
exit(0);
break;
case 'b':
cerr << argv[0] << " " << PACKAGE_VERSION
<< "\n" << PACKAGE_URL << "\n" <<
"Praise, blame and bug reports to " << PACKAGE_BUGREPORT << ".\n\n" <<
"Please be sure to include your operating system, version of your\n" <<
"operating system, and a detailed description of how to recreate\n" <<
"your bug.\n\n";
exit(0);
break;
case 'f':
RDS_LOC = string((const char*) optarg);
infile = std::auto_ptr<ifstream>(new ifstream(RDS_LOC.c_str()));
if (not infile->good()) {
cerr <<
"Error: the specified dataset file could not be found.\n\n";
exit(EXIT_FAILURE);
}
// No explicit close: the auto_ptr will take care of that
// on object destruction.
break;
case 'h':
show_usage(argv[0]);
break;
case 'p':
port_num = string(optarg);
break;
case 't':
timeout = string(optarg);
break;
case 's':
status_enabled = true;
break;
case 'S':
standalone = true;
break;
case 'o':
old_only = true;
break;
default:
show_usage(argv[0]);
exit(EXIT_FAILURE);
}
}
if (not (validate_port(port_num) and validate_timeout(timeout))) {
show_usage(argv[0]);
exit(EXIT_FAILURE);
}
if (not standalone)
daemonize();
load_hashes();
svr_sock = make_socket();
pthread_create(&shutdown_handler_id, NULL, shutdown_handler, NULL);
while (true) {
client_length = sizeof(client);
if (0 > (client_sock = accept(svr_sock,
reinterpret_cast<sockaddr*>(&client),
&client_length))) {
syslog(WARN, "dropped a connection");
} else {
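try {
pthread_t thread_id;
const char* ipaddr(inet_ntoa(client.sin_addr));
clientinfo* data(new clientinfo(client_sock, ipaddr));
if (0 != pthread_create(&thread_id, NULL, run_client_thread, data)) {
syslog(WARN, "couldn't start a client thread");
delete data;
close(client_sock);
} else {
// Nobody joins client threads, so detach them to let
// their resources be reclaimed when they finish.
pthread_detach(thread_id);
}
} catch (std::bad_alloc&) {
// There's no reason to have the server fall over:
// the sysadmin might be able to kill off whatever
// errant process is taking up all the RAM.
syslog(WARN, "Critically short of available RAM!");
close(client_sock);
continue;
}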
}
}
}
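The per-line work done by load_hashes() can be sketched in isolation. This is a rough illustration under stated assumptions, not the project's implementation: `first_hash`, `not_hexit` and `upcase` are hypothetical names introduced here, and the input line is assumed to already be in memory. It follows the same rule as the loop above: uppercase each comma-separated field, strip everything that isn't a hex digit, and accept the first field left with 32, 40 or 64 characters.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: true for anything that is NOT an uppercase
// hexadecimal digit.
bool not_hexit(char ch)
{
    return not ((ch >= '0' and ch <= '9') or (ch >= 'A' and ch <= 'F'));
}

// Hypothetical helper: uppercases one character without passing a
// negative value to toupper.
char upcase(char ch)
{
    return static_cast<char>(std::toupper(static_cast<unsigned char>(ch)));
}

// Hypothetical helper: returns the first comma-separated field that,
// once uppercased and stripped of non-hex characters, is the length
// of an MD5 (32), SHA-1 (40) or SHA-256 (64) digest. Returns an empty
// string if no field qualifies.
std::string first_hash(const std::string& line)
{
    std::string::size_type start(0);
    while (start <= line.size()) {
        std::string::size_type comma(line.find(',', start));
        if (std::string::npos == comma) {
            comma = line.size();
        }
        std::string token(line, start, comma - start);
        std::transform(token.begin(), token.end(), token.begin(), upcase);
        token.erase(std::remove_if(token.begin(), token.end(), not_hexit),
                    token.end());
        if (32 == token.size() or 40 == token.size() or 64 == token.size()) {
            return token;
        }
        start = comma + 1;
    }
    return std::string();
}
```

Because quotes and other punctuation are stripped before the length check, a quoted field such as "\"abc...\"" is accepted as long as 32, 40 or 64 hex digits remain. The same is true of the original loop, so a field that merely happens to contain that many hex digits would also be taken as a hash.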