Lots of updates to README.md

2026-01-09 20:57:55 -05:00 · 2014-12-29 12:21:10 -05:00
parent ef978884bc
commit 8c6c02dffb
1 changed files with 140 additions and 5 deletions
--- a/README.md
+++ b/README.md
@@ -4,6 +4,27 @@ AccumuloGraph

 This is an implementation of the [TinkerPop Blueprints](http://tinkerpop.com)
 2.6 API using [Apache Accumulo](http://apache.accumulo.com) as the backend.
+This combines the benefits and flexibility of Blueprints
+with the scalability and performance of Accumulo.
+
+Beyond the basic Blueprints functionality, we provide additional
+features that harness more of Accumulo's power.
+
+Some features include...
+
+
+Benchmarks
+
+
+Indexing via the `IndexableGraph` and `KeyIndexableGraph` interfaces.
+
+Benchmarking
+
+Feel free to email with suggestions for improvements.
+Please submit issues for any bugs you find or features you want.
+We are also open to pull requests.
+
+
 This implementation provides easy to use, easy to write, and easy to read 
 access to an arbitrarily large graph that is stored in Accumulo.
 
@@ -12,12 +33,15 @@ We implement the following Blueprints interfaces:
 	<br>2. KeyIndexableGraph
 	<br>3. IndexableGraph
 	
-Please feel free to submit issues for any bugs you find or features you want.
-We are open to pull requests from your forks also.
+Benchmarking.

-##Usage

-The releases are currently stored in Maven Central.
+
+## Getting Started
+
+First, include AccumuloGraph as a Maven dependency. Releases are deployed
+to Maven Central.
+
 ```xml
 <dependency>
 	<groupId>edu.jhuapl.tinkerpop</groupId>
@@ -26,8 +50,119 @@ The releases are currently stored in Maven Central.
 </dependency>
 ```

-For non-Maven users, the binaries can be found in the releases section in this
+For non-Maven users, the binary jars can be found in the releases section in this
 GitHub repository, or you can get them from Maven Central.
+
+Creating an `AccumuloGraph` involves setting a few parameters in an
+`AccumuloGraphConfiguration` object, and opening the graph.
+The defaults are sensible for using an Accumulo cluster.
+We provide some simple examples below. The Javadocs for
+`AccumuloGraphConfiguration` explain the other parameters
+in more detail.
+
+First, to instantiate an in-memory graph:
+```java
+Configuration cfg = new AccumuloGraphConfiguration()
+  .setInstanceType(InstanceType.Mock)
+  .setGraphName("graph");
+return GraphFactory.open(cfg);
+```
+
+This creates a "Mock" instance which holds the graph in memory.
+You can now use all the Blueprints and AccumuloGraph-specific functionality
+with this in-memory graph. This is useful for getting familiar
+with AccumuloGraph's functionality, or for testing or prototyping
+purposes.
+
+To use an actual Accumulo cluster, use the following:
+```java
+Configuration cfg = new AccumuloGraphConfiguration()
+  .setInstanceType(InstanceType.Distributed)
+  .setZooKeeperHosts("zookeeper-host")
+  .setInstanceName("instance-name")
+  .setUser("user").setPassword("password")
+  .setGraphName("graph")
+  .setCreate(true);
+return GraphFactory.open(cfg);
+```
+
+This directs AccumuloGraph to use a "Distributed" Accumulo
+instance, and sets the appropriate ZooKeeper parameters,
+instance name, and authentication information, which correspond
+to the usual Accumulo connection settings. The graph name is
+used to create several backing tables in Accumulo, and the
+`setCreate` option tells AccumuloGraph to create the backing
+tables if they don't already exist.
+
+
+## Improving Performance
+
+This section describes configuration parameters that
+can greatly improve AccumuloGraph's performance. Brief descriptions
+of each option are provided here; refer to the
+`AccumuloGraphConfiguration` Javadoc for fuller explanations.
+
+### Disable consistency checks
+
+The Blueprints API specifies a number of consistency checks for
+various operations, and requires that errors be raised when they fail.
+Examples of invalid operations include adding a vertex with the same
+id as an existing vertex, adding edges between nonexistent vertices,
+and setting properties on nonexistent elements.
+Unfortunately, enforcing these constraints against an
+Accumulo installation carries a significant performance cost,
+since each check requires extra traffic to Accumulo using inefficient
+non-batched access patterns.
+
+To remedy these performance issues, AccumuloGraph exposes
+several options to disable some of the above checks.
+These include:
+* `setAutoFlush` - to disable automatically flushing
+  changes to the backing Accumulo tables
+* `setSkipExistenceChecks` - to disable element
+  existence checks, avoiding trips to the Accumulo cluster
+* `setIndexableGraphDisabled` - to disable
+  indexing functionality, which improves performance
+  of element removal
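+
+As a rough sketch, these options could be combined on a configuration
+object as shown here. The boolean arguments and the values chosen are
+assumptions based on the descriptions above, so check the
+`AccumuloGraphConfiguration` Javadoc for the exact signatures.
+```java
+// Sketch only: argument types and values are assumed, not confirmed.
+Configuration cfg = new AccumuloGraphConfiguration()
+  .setInstanceType(InstanceType.Mock)
+  .setGraphName("graph")
+  .setAutoFlush(false)              // assumed: false turns off automatic flushing
+  .setSkipExistenceChecks(true)     // assumed: true skips element existence checks
+  .setIndexableGraphDisabled(true); // assumed: true turns off indexing
+return GraphFactory.open(cfg);
+```
+
+Skipping these checks trades the Blueprints error guarantees for
+fewer trips to Accumulo, so it fits best when the input is already
+known to be valid.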
+
+### Set Accumulo performance parameters
+
+Accumulo itself features a number of performance-related parameters,
+and we allow configuration of these. Generally, these relate to
+write buffer sizes, multithreading, etc. The settings include:
+* `setMaxWriteLatency` - max time prior to flushing
+  element write buffer
+* `setMaxWriteMemory` - max size for element write buffer
+* `setMaxWriteThreads` - max threads used for element writing
+* `setMaxWriteTimeout` - max time to wait before failing
+  element buffer writes
+* `setQueryThreads` - number of query threads to use
+  for fetching elements, properties, etc.
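+
+A hedged example of setting these parameters is sketched here. The
+numeric values are arbitrary placeholders, and the argument types and
+units (milliseconds, bytes) are assumptions; the
+`AccumuloGraphConfiguration` Javadoc is the authority on both.
+```java
+// Sketch only: values are placeholders; types and units are assumed.
+Configuration cfg = new AccumuloGraphConfiguration()
+  .setInstanceType(InstanceType.Mock)
+  .setGraphName("graph")
+  .setMaxWriteLatency(60000L)            // assumed milliseconds
+  .setMaxWriteMemory(50L * 1024 * 1024)  // assumed bytes
+  .setMaxWriteThreads(3)                 // placeholder thread count
+  .setMaxWriteTimeout(30000L)            // assumed milliseconds
+  .setQueryThreads(3);                   // placeholder thread count
+return GraphFactory.open(cfg);
+```
+
+A Mock instance is used here only to keep the sketch short; the same
+setters can be chained onto a Distributed configuration.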
+
+### Caching and preloading
+
+AccumuloGraph contains a number of 
+
+* `setPropertyCacheTimeout`
+
+* `setEdgeCacheParams`
+* `setVertexCacheParams`
+
+* `setPreloadedEdgeLabels`
+* `setPreloadedProperties`
+
+
+## Bulk Ingest
+
+
+
+## Hadoop Integration
+
+
+## Table Structure
+
+
+
 ##Code Examples
 ###Creating a new or connecting to an existing distributed graph
 ```java