Lots of updates to README.md

Michael Lieberman
2014-12-29 12:21:10 -05:00
parent ef978884bc
commit 8c6c02dffb

145
README.md

@@ -4,6 +4,27 @@ AccumuloGraph
This is an implementation of the [TinkerPop Blueprints](http://tinkerpop.com)
2.6 API using [Apache Accumulo](http://apache.accumulo.com) as the backend.
This combines the benefits and flexibility of Blueprints
with the scalability and performance of Accumulo.
In addition to the basic Blueprints functionality, we provide additional
features that harness more of Accumulo's power.
Some features include...
Benchmarks
Indexing via the `IndexableGraph` and `KeyIndexableGraph` interfaces.
Benchmarking
Feel free to email with suggestions for improvements.
Please submit issues for any bugs you find or features you want.
We are also open to pull requests.
This implementation provides easy to use, easy to write, and easy to read
access to an arbitrarily large graph that is stored in Accumulo.
@@ -12,12 +33,15 @@ We implement the following Blueprints interfaces:
<br>2. KeyIndexableGraph
<br>3. IndexableGraph
Please feel free to submit issues for any bugs you find or features you want.
We are open to pull requests from your forks also.
Benchmarking.
##Usage
The releases are currently stored in Maven Central.
## Getting Started
First, include AccumuloGraph as a Maven dependency. Releases are deployed
to Maven Central.
```xml
<dependency>
<groupId>edu.jhuapl.tinkerpop</groupId>
@@ -26,8 +50,119 @@ The releases are currently stored in Maven Central.
</dependency>
```
For non-Maven users, the binaries can be found in the releases section in this
For non-Maven users, the binary jars can be found in the releases section in this
GitHub repository, or you can get them from Maven Central.
Creating an `AccumuloGraph` involves setting a few parameters in an
`AccumuloGraphConfiguration` object, and opening the graph.
The defaults are sensible for using an Accumulo cluster.
We provide some simple examples below. Javadocs for
`AccumuloGraphConfiguration` explain all the other parameters
in more detail.
First, to instantiate an in-memory graph:
```java
Configuration cfg = new AccumuloGraphConfiguration()
.setInstanceType(InstanceType.Mock)
.setGraphName("graph");
return GraphFactory.open(cfg);
```
This creates a "Mock" instance which holds the graph in memory.
You can now use all the Blueprints and AccumuloGraph-specific functionality
with this in-memory graph. This is useful for getting familiar
with AccumuloGraph's functionality, or for testing or prototyping
purposes.
To use an actual Accumulo cluster, use the following:
```java
Configuration cfg = new AccumuloGraphConfiguration()
.setInstanceType(InstanceType.Distributed)
.setZooKeeperHosts("zookeeper-host")
.setInstanceName("instance-name")
.setUser("user").setPassword("password")
.setGraphName("graph")
.setCreate(true);
return GraphFactory.open(cfg);
```
This directs AccumuloGraph to use a "Distributed" Accumulo
instance, and sets the appropriate ZooKeeper parameters,
instance name, and authentication information, which correspond
to the usual Accumulo connection settings. The graph name is
used to create several backing tables in Accumulo, and the
`setCreate` option tells AccumuloGraph to create the backing
tables if they don't already exist.
## Improving Performance
This section describes configuration parameters that
can significantly improve AccumuloGraph's performance. Each option
is described briefly here; refer to the
`AccumuloGraphConfiguration` Javadoc for fuller explanations.
### Disable consistency checks
The Blueprints API specifies a number of consistency checks for
various operations, and requires that an error be raised when a check
fails. Examples of invalid operations include adding a vertex with the
same id as an existing vertex, adding edges between nonexistent vertices,
and setting properties on nonexistent elements.
Unfortunately, enforcing these constraints against an
Accumulo installation carries a significant performance cost,
since each check requires extra traffic to Accumulo using inefficient
non-batched access patterns.
To reduce this cost, AccumuloGraph exposes
several options that disable some of these checks and related overhead.
These include:
* `setAutoFlush` - to disable automatically flushing
changes to the backing Accumulo tables
* `setSkipExistenceChecks` - to disable element
existence checks, avoiding trips to the Accumulo cluster
* `setIndexableGraphDisabled` - to disable
indexing functionality, which improves performance
of element removal
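
To illustrate why per-operation checks and flushes are costly, here is a minimal, self-contained sketch that counts simulated round trips. It does not use AccumuloGraph or Accumulo at all: `ToyStore`, `ToyGraphWriter`, and `RoundTripDemo` are hypothetical names invented for this illustration, and the two boolean flags only mimic the idea behind existence checks and automatic flushing.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy stand-in for a remote table; counts simulated network round trips. */
final class ToyStore {
    private final Set<String> rows = new HashSet<>();
    int roundTrips = 0;

    /** One round trip per lookup (a non-batched read). */
    boolean exists(String id) {
        roundTrips++;
        return rows.contains(id);
    }

    /** One round trip per non-empty batch, regardless of batch size. */
    void writeBatch(List<String> ids) {
        if (ids.isEmpty()) {
            return;
        }
        roundTrips++;
        rows.addAll(ids);
    }
}

/** Hypothetical writer; NOT AccumuloGraph's implementation. */
final class ToyGraphWriter {
    private final ToyStore store;
    private final boolean checkExistence;
    private final boolean autoFlush;
    private final List<String> pending = new ArrayList<>();

    ToyGraphWriter(ToyStore store, boolean checkExistence, boolean autoFlush) {
        this.store = store;
        this.checkExistence = checkExistence;
        this.autoFlush = autoFlush;
    }

    void addVertex(String id) {
        // The existence check costs one read round trip unless the id is still buffered locally.
        if (checkExistence && (pending.contains(id) || store.exists(id))) {
            throw new IllegalArgumentException("Vertex already exists: " + id);
        }
        pending.add(id);
        if (autoFlush) {
            flush(); // one write round trip per element
        }
    }

    void flush() {
        store.writeBatch(new ArrayList<>(pending));
        pending.clear();
    }
}

final class RoundTripDemo {
    /** Adds n distinct vertices and returns the number of simulated round trips. */
    static int roundTripsFor(int n, boolean checkExistence, boolean autoFlush) {
        ToyStore store = new ToyStore();
        ToyGraphWriter writer = new ToyGraphWriter(store, checkExistence, autoFlush);
        for (int i = 0; i < n; i++) {
            writer.addVertex("v" + i);
        }
        writer.flush();
        return store.roundTrips;
    }

    public static void main(String[] args) {
        System.out.println("checks + auto-flush: " + roundTripsFor(1000, true, true));   // 2000
        System.out.println("no checks, batched:  " + roundTripsFor(1000, false, false)); // 1
    }
}
```

In this toy model, 1000 inserts cost 2000 round trips with both behaviors enabled and a single round trip with both disabled. Real-world savings depend on the cluster and workload, and disabling checks means invalid operations are no longer reported.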
### Set Accumulo performance parameters
Accumulo itself features a number of performance-related parameters,
and we allow configuration of these. Generally, these relate to
write buffer sizes, multithreading, etc. The settings include:
* `setMaxWriteLatency` - max time prior to flushing
element write buffer
* `setMaxWriteMemory` - max size for element write buffer
* `setMaxWriteThreads` - max threads used for element writing
* `setMaxWriteTimeout` - max time to wait before failing
element buffer writes
* `setQueryThreads` - number of query threads to use
for fetching elements, properties etc.
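
The write-related settings above resemble the latency, memory, thread, and timeout settings of Accumulo's `BatchWriterConfig`. As a rough illustration of how a latency limit and a memory limit interact, here is a minimal, self-contained sketch of a write buffer that flushes when either threshold is reached. `ToyWriteBuffer` and `WriteBufferDemo` are hypothetical names for this example; this is not how AccumuloGraph or Accumulo implements buffering (a real batch writer also flushes from background threads rather than only when a mutation is added).

```java
import java.util.ArrayList;
import java.util.List;

/** Toy write buffer; NOT Accumulo's BatchWriter. */
final class ToyWriteBuffer {
    private final long maxMemoryBytes;   // analogous in spirit to a max write memory setting
    private final long maxLatencyMillis; // analogous in spirit to a max write latency setting
    private final List<byte[]> pending = new ArrayList<>();
    private long pendingBytes = 0;
    private long oldestPendingAt = 0;
    int flushCount = 0;

    ToyWriteBuffer(long maxMemoryBytes, long maxLatencyMillis) {
        this.maxMemoryBytes = maxMemoryBytes;
        this.maxLatencyMillis = maxLatencyMillis;
    }

    /** Buffers one mutation. The clock is passed in so the sketch is deterministic. */
    void add(byte[] mutation, long nowMillis) {
        if (pending.isEmpty()) {
            oldestPendingAt = nowMillis;
        }
        pending.add(mutation);
        pendingBytes += mutation.length;

        boolean tooBig = pendingBytes >= maxMemoryBytes;
        boolean tooOld = nowMillis - oldestPendingAt >= maxLatencyMillis;
        if (tooBig || tooOld) {
            flush();
        }
    }

    /** Simulates sending all buffered mutations in one batch. */
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        flushCount++;
        pending.clear();
        pendingBytes = 0;
    }

    int pendingCount() {
        return pending.size();
    }
}

final class WriteBufferDemo {
    /** Runs a fixed scenario with a 100-byte, 1000 ms buffer and returns the flush count. */
    static int flushesInScenario() {
        ToyWriteBuffer buffer = new ToyWriteBuffer(100, 1000);
        byte[] mutation = new byte[10];
        for (int i = 0; i < 10; i++) {
            buffer.add(mutation, 0L);   // the 10th add reaches 100 bytes: size-triggered flush
        }
        buffer.add(mutation, 5L);       // starts a new batch at t = 5 ms
        buffer.add(mutation, 1005L);    // oldest entry is now 1000 ms old: latency-triggered flush
        return buffer.flushCount;
    }

    public static void main(String[] args) {
        System.out.println("flushes: " + flushesInScenario()); // 2
    }
}
```

Larger buffers and longer latencies generally mean fewer, larger batches at the cost of more memory and a longer delay before writes become visible.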
### Caching and preloading
AccumuloGraph contains a number of caching and preloading options.
These include:
* `setPropertyCacheTimeout`
* `setEdgeCacheParams`
* `setVertexCacheParams`
* `setPreloadedEdgeLabels`
* `setPreloadedProperties`
## Bulk Ingest
## Hadoop Integration
## Table Structure
##Code Examples
###Creating a new or connecting to an existing distributed graph
```java