mirror of
https://github.com/JHUAPL/AccumuloGraph.git
synced 2026-01-09 12:47:56 -05:00
Merge branch 'master' into table-wrappers
292
README.md
@@ -4,140 +4,210 @@ AccumuloGraph
|
||||
|
||||
This is an implementation of the [TinkerPop Blueprints](http://tinkerpop.com)
|
||||
2.6 API using [Apache Accumulo](http://apache.accumulo.com) as the backend.
|
||||
This implementation provides easy to use, easy to write, and easy to read
|
||||
access to an arbitrarily large graph that is stored in Accumulo.
|
||||
|
||||
We implement the following Blueprints interfaces:
|
||||
<br>1. Graph
|
||||
<br>2. KeyIndexableGraph
|
||||
<br>3. IndexableGraph
|
||||
|
||||
Please feel free to submit issues for any bugs you find or features you want.
|
||||
We are open to pull requests from your forks also.
|
||||
This combines the many benefits and flexibility of Blueprints
|
||||
with the scalability and performance of Accumulo.
|
||||
|
||||
##Usage
|
||||
In addition to the basic Blueprints functionality, we provide a number
|
||||
of enhanced features, including:
|
||||
* Indexing implementations via `IndexableGraph` and `KeyIndexableGraph`
|
||||
* Support for mock, mini, and distributed instances of Accumulo
|
||||
* Numerous performance tweaks and configuration parameters
|
||||
* Support for high speed ingest
|
||||
* Hadoop integration
|
||||
|
||||
Feel free to contact us with bugs, suggestions, pull requests,
|
||||
or simply how you are leveraging AccumuloGraph in your own work.
|
||||
|
||||
|
||||
## Getting Started
|
||||
|
||||
First, include AccumuloGraph as a Maven dependency. Releases are deployed
|
||||
to Maven Central.
|
||||
|
||||
The releases are currently stored in Maven Central.
|
||||
```xml
|
||||
<dependency>
|
||||
<groupId>edu.jhuapl.tinkerpop</groupId>
|
||||
<artifactId>blueprints-accumulo-graph</artifactId>
|
||||
<version>0.0.2</version>
|
||||
<version>0.0.3-SNAPSHOT</version>
|
||||
</dependency>
|
||||
```
|
||||
|
||||
For non-Maven users, the binaries can be found in the releases section in this
|
||||
For non-Maven users, the binary jars can be found in the releases section in this
|
||||
GitHub repository, or you can get them from Maven Central.
|
||||
##Code Examples
|
||||
###Creating a new or connecting to an existing distributed graph
|
||||
|
||||
Creating an `AccumuloGraph` involves setting a few parameters in an
|
||||
`AccumuloGraphConfiguration` object, and opening the graph.
|
||||
The defaults are sensible for using an Accumulo cluster.
|
||||
We provide some simple examples below. Javadocs for
|
||||
`AccumuloGraphConfiguration` explain all the other parameters
|
||||
in more detail.
|
||||
|
||||
First, to instantiate an in-memory graph:
|
||||
```java
|
||||
Configuration cfg = new AccumuloGraphConfiguration()
|
||||
.setInstanceName("accumulo").setUser("user").setZookeeperHosts("zk1")
|
||||
.setPassword("password".getBytes()).setGraphName("myGraph");
|
||||
Graph graph = GraphFactory.open(cfg.getConfiguration());
|
||||
.setInstanceType(InstanceType.Mock)
|
||||
.setGraphName("graph");
|
||||
return GraphFactory.open(cfg);
|
||||
```
|
||||
###Creating a new Mock Graph
|
||||
|
||||
Setting the instance type to mock allows for in-memory processing with a MockAccumulo instance.<br>
|
||||
There is also support for Mini Accumulo.
|
||||
This creates a "Mock" instance which holds the graph in memory.
|
||||
You can now use all the Blueprints and AccumuloGraph-specific functionality
|
||||
with this in-memory graph. This is useful for getting familiar
|
||||
with AccumuloGraph's functionality, or for testing or prototyping
|
||||
purposes.
|
||||
|
||||
To use an actual Accumulo cluster, use the following:
|
||||
```java
|
||||
Configuration cfg = new AccumuloGraphConfiguration().setInstanceType(InstanceType.Mock)
|
||||
.setGraphName("myGraph");
|
||||
Graph graph = GraphFactory.open(cfg);
|
||||
Configuration cfg = new AccumuloGraphConfiguration()
|
||||
.setInstanceType(InstanceType.Distributed)
|
||||
.setZooKeeperHosts("zookeeper-host")
|
||||
.setInstanceName("instance-name")
|
||||
.setUser("user").setPassword("password")
|
||||
.setGraphName("graph")
|
||||
.setCreate(true);
|
||||
return GraphFactory.open(cfg);
|
||||
```
|
||||
###Accessing a graph
|
||||
|
||||
This directs AccumuloGraph to use a "Distributed" Accumulo
|
||||
instance, and sets the appropriate ZooKeeper parameters,
|
||||
instance name, and authentication information, which correspond
|
||||
to the usual Accumulo connection settings. The graph name is
|
||||
used to create several backing tables in Accumulo, and the
|
||||
`setCreate` option tells AccumuloGraph to create the backing
|
||||
tables if they don't already exist.
|
||||
|
||||
AccumuloGraph also has limited support for a "Mini" instance
|
||||
of Accumulo.
|
||||
|
||||
|
||||
## Improving Performance
|
||||
|
||||
This section describes various configuration parameters that
|
||||
greatly enhance AccumuloGraph's performance. Brief descriptions
|
||||
of each option are provided here, but refer to the
|
||||
`AccumuloGraphConfiguration` Javadoc for fuller explanations.
|
||||
|
||||
### Disable consistency checks
|
||||
|
||||
The Blueprints API specifies a number of consistency checks for
|
||||
various operations, and requires errors if they fail. Some examples
|
||||
of invalid operations include adding a vertex with the same id as an
|
||||
existing vertex, adding edges between nonexistent vertices,
|
||||
and setting properties on nonexistent elements.
|
||||
Unfortunately, checking the above constraints for an
|
||||
Accumulo installation entails significant performance issues,
|
||||
since these require extra traffic to Accumulo using inefficient
|
||||
non-batched access patterns.
|
||||
|
||||
To remedy these performance issues, AccumuloGraph exposes
|
||||
several options to disable various of the above checks.
|
||||
These include:
|
||||
* `setAutoFlush` - to disable automatically flushing
|
||||
changes to the backing Accumulo tables
|
||||
* `setSkipExistenceChecks` - to disable element
|
||||
existence checks, avoiding trips to the Accumulo cluster
|
||||
* `setIndexableGraphDisabled` - to disable
|
||||
indexing functionality, which improves performance
|
||||
of element removal
|
||||
|
||||
### Tweak Accumulo performance parameters
|
||||
|
||||
Accumulo itself features a number of performance-related parameters,
|
||||
and we allow configuration of these. Generally, these relate to
|
||||
write buffer sizes, multithreading, etc. The settings include:
|
||||
* `setMaxWriteLatency` - max time prior to flushing
|
||||
element write buffer
|
||||
* `setMaxWriteMemory` - max size for element write buffer
|
||||
* `setMaxWriteThreads` - max threads used for element writing
|
||||
* `setMaxWriteTimeout` - max time to wait before failing
|
||||
element buffer writes
|
||||
* `setQueryThreads` - number of query threads to use
|
||||
for fetching elements, properties etc.
|
||||
|
||||
### Enable edge and property preloading
|
||||
|
||||
As a performance tweak, AccumuloGraph performs lazy loading of
|
||||
properties and edges. This means that an operation such as
|
||||
`getVertex` does not by default populate the returned
|
||||
vertex object with the associated vertex's properties
|
||||
and edges. Instead, they are initialized only when requested via
|
||||
`getProperty`, `getEdges`, etc. These are useful
|
||||
for use cases where you won't be accessing many of these
|
||||
properties. However, if certain properties or edges will
|
||||
be accessed frequently, you can set options for preloading
|
||||
these specific properties and edges, which will be more
|
||||
efficient than on-the-fly loading. These options include:
|
||||
* `setPreloadedProperties` - set property keys
|
||||
to be preloaded
|
||||
* `setPreloadedEdgeLabels` - set edges to be
|
||||
preloaded based on their labels
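
To make the trade-off concrete, the following is a conceptual sketch in plain Java, not AccumuloGraph's actual implementation. `LazyElement` and `PropertyStore` are hypothetical names introduced only for illustration; `PropertyStore.fetch` stands in for a round trip to Accumulo. Keys passed to the constructor play the role of preloaded properties, and everything else is fetched on first access.

```java
import java.util.HashMap;
import java.util.Map;

/** Conceptual sketch only; not AccumuloGraph's implementation. */
final class LazyElement {

  /** Hypothetical stand-in for a round trip to the backing store. */
  interface PropertyStore {
    Object fetch(String key);
  }

  private final PropertyStore store;
  private final Map<String, Object> loaded = new HashMap<String, Object>();

  LazyElement(PropertyStore store, String... preloadedKeys) {
    this.store = store;
    // Preloaded keys are fetched once, up front.
    for (String key : preloadedKeys) {
      loaded.put(key, store.fetch(key));
    }
  }

  /** Returns an already loaded value, or fetches it on first access. */
  Object getProperty(String key) {
    if (!loaded.containsKey(key)) {
      loaded.put(key, store.fetch(key));
    }
    return loaded.get(key);
  }
}
```

In this sketch a preloaded key never triggers a fetch at access time, while any other key costs exactly one fetch the first time it is read. A real implementation would presumably batch the preloaded fetches into a single scan.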
|
||||
|
||||
### Enable caching
|
||||
|
||||
AccumuloGraph contains a number of caching options
|
||||
that mitigate the need for Accumulo traffic for recently-accessed
|
||||
elements. The following options control caching:
|
||||
* `setVertexCacheParams` - size and expiry for vertex cache
|
||||
* `setEdgeCacheParams` - size and expiry for edge cache
|
||||
* `setPropertyCacheTimeout` - property expiry time,
|
||||
which can be specified globally and/or for individual properties
|
||||
|
||||
|
||||
## High Speed Ingest
|
||||
|
||||
One of Accumulo's key advantages is its ability for high-speed ingest
|
||||
of huge amounts of data. To leverage this ability, we provide
|
||||
an additional `AccumuloBulkIngester` class that
|
||||
exchanges consistency guarantees for high speed ingest.
|
||||
|
||||
The following is an example of how to use the bulk ingester to
|
||||
ingest a simple graph:
|
||||
```java
|
||||
Vertex v1 = graph.addVertex("1");
|
||||
v1.setProperty("name", "Alice");
|
||||
Vertex v2 = graph.addVertex("2");
|
||||
v2.setProperty("name", "Bob");
|
||||
|
||||
Edge e1 = graph.addEdge("E1", v1, v2, "knows");
|
||||
e1.setProperty("since", new Date());
|
||||
```
|
||||
|
||||
|
||||
###Creating indexes
|
||||
|
||||
```java
|
||||
((KeyIndexableGraph)graph)
|
||||
.createKeyIndex("name", Vertex.class);
|
||||
AccumuloGraphConfiguration cfg = ...;
|
||||
AccumuloBulkIngester ingester = new AccumuloBulkIngester(cfg);
|
||||
// Add a vertex.
|
||||
ingester.addVertex("A").finish();
|
||||
// Add another vertex with properties.
|
||||
ingester.addVertex("B")
|
||||
.add("P1", "V1").add("P2", "V2")
|
||||
.finish();
|
||||
// Add an edge.
|
||||
ingester.addEdge("A", "B", "edge").finish();
|
||||
// Shutdown and compact tables.
|
||||
ingester.shutdown(true);
|
||||
```
|
||||
###MapReduce Integration
|
||||
|
||||
####In the tool
|
||||
See the Javadocs for more details.
|
||||
Note that you are responsible for ensuring that data is entered
|
||||
in a consistent way, or the resulting graph will
|
||||
have undefined behavior.
|
||||
|
||||
|
||||
## Hadoop Integration
|
||||
|
||||
AccumuloGraph features Hadoop integration via custom input and output
|
||||
format implementations. `VertexInputFormat` and `EdgeInputFormat`
|
||||
allow vertex and edge inputs to mappers, respectively. Use as follows:
|
||||
```java
|
||||
AccumuloConfiguration cfg = new AccumuloGraphConfiguration()
|
||||
.setInstanceName("accumulo").setZookeeperHosts("zk1").setUser("root")
|
||||
.setPassword("secret".getBytes()).setGraphName("myGraph");
|
||||
AccumuloGraphConfiguration cfg = ...;
|
||||
|
||||
// For vertices:
|
||||
Job j = new Job();
|
||||
j.setInputFormatClass(VertexInputFormat.class);
|
||||
VertexInputFormat.setAccumuloGraphConfiguration(j,
|
||||
cfg.getConfiguration());
|
||||
VertexInputFormat.setAccumuloGraphConfiguration(j, cfg);
|
||||
|
||||
// For edges:
|
||||
Job j = new Job();
|
||||
j.setInputFormatClass(EdgeInputFormat.class);
|
||||
EdgeInputFormat.setAccumuloGraphConfiguration(j, cfg);
|
||||
```
|
||||
####In the mapper
|
||||
|
||||
`ElementOutputFormat` allows writing to an AccumuloGraph from
|
||||
reducers. Use as follows:
|
||||
```java
|
||||
public void map(Text k, Vertex v, Context c) {
|
||||
System.out.println(v.getId().toString());
|
||||
}
|
||||
```
|
||||
|
||||
##Table Design
|
||||
###Vertex Table
|
||||
Row ID | Column Family | Column Qualifier | Value
|
||||
---|---|---|---
|
||||
VertexID | Label Flag | Exists Flag | [empty]
|
||||
VertexID | INVERTEX | OutVertexID_EdgeID | Edge Label
|
||||
VertexID | OUTVERTEX | InVertexID_EdgeID | Edge Label
|
||||
VertexID | Property Key | [empty] | Serialized Value
|
||||
###Edge Table
|
||||
Row ID | Column Family | Column Qualifier | Value
|
||||
---|---|---|---
|
||||
EdgeID|Label Flag|InVertexID_OutVertexID|Edge Label
|
||||
EdgeID|Property Key|[empty]|Serialized Value
|
||||
###Edge/Vertex Index
|
||||
Row ID | Column Family | Column Qualifier | Value
|
||||
---|---|---|---
|
||||
Serialized Value|Property Key|VertexID/EdgeID|[empty]
|
||||
|
||||
###Metadata Table
|
||||
Row ID | Column Family | Column Qualifier | Value
|
||||
---|---|---|---
|
||||
Index Name| Index Class |[empty]|[empty]
|
||||
##Advanced Configuration
|
||||
###Graph Configuration
|
||||
- setGraphName(String name)
|
||||
- setCreate(boolean create) - Sets if the backing graph tables should be created if they do not exist.
|
||||
- setClear(boolean clear) - Sets if the backing graph tables should be reset if they exist.
|
||||
- autoFlush(boolean autoFlush) - Sets if each graph element and property change will be flushed to the server.
|
||||
- skipExistenceChecks(boolean skip) - Sets whether to skip existence checks when creating graph elements.
|
||||
- setAutoIndex(boolean ison) - Turns on/off automatic indexing.
|
||||
|
||||
###Accumulo Control
|
||||
|
||||
- setUser(String user) - Sets the user to use when connecting to Accumulo
|
||||
- setPassword(byte[] password | String password) - Sets the password to use when connecting to Accumulo
|
||||
- setZookeeperHosts(String zookeeperHosts) - Sets the Zookeepers to connect to.
|
||||
- setInstanceName(String instance) - Sets the Instance name to use when connecting to Zookeeper
|
||||
- setInstanceType(InstanceType type) - Sets the type of Instance to use: Distributed, Mini, or Mock. Defaults to Distributed
|
||||
- setQueryThreads(int threads) - Specifies the number of threads to use in scanners. Defaults to 3
|
||||
- setMaxWriteLatency(long latency) - Sets the latency to be used for all writes to Accumulo
|
||||
- setMaxWriteTimeout(long timeout) - Sets the timeout to be used for all writes to Accumulo
|
||||
- setMaxWriteMemory(long mem) - Sets the memory buffer to be used for all writes to Accumulo
|
||||
- setMaxWriteThreads(int threads) - Sets the number of threads to be used for all writes to Accumulo
|
||||
- setAuthorizations(Authorizations auths) - Sets the authorizations to use when accessing the graph
|
||||
- setColumnVisibility(ColumnVisibility colVis) - TODO
|
||||
- setSplits(String splits | String[] splits) - Sets the splits to use when creating tables. Can be a space-separated list or an array of splits
|
||||
- setMiniClusterTempDir(String dir) - Sets directory to use as the temp directory for the Mini cluster
|
||||
|
||||
###Caching
|
||||
- setLruMaxCapacity(int max) - TODO
|
||||
- setVertexCacheTimeout(int millis) - Sets the vertex cache timeout. A value <=0 clears the value
|
||||
- setEdgeCacheTimeout(int millis) - Sets the edge cache timeout. A value <=0 clears the value
|
||||
|
||||
###Preloading
|
||||
- setPropertyCacheTimeout(int millis) - Sets the element property cache timeout. A value <=0 clears the value
|
||||
- setPreloadedProperties(String[] propertyKeys) - Sets the property keys that should be preloaded. Requires a positive timeout.
|
||||
- setPreloadedEdgeLabels(String[] edgeLabels) - TODO
|
||||
AccumuloGraphConfiguration cfg = ...;
|
||||
|
||||
Job j = new Job();
|
||||
j.setOutputFormatClass(ElementOutputFormat.class);
|
||||
ElementOutputFormat.setAccumuloGraphConfiguration(j, cfg);
|
||||
```
|
||||
|
||||
59
src/main/java/edu/jhuapl/tinkerpop/AccumuloFeatures.java
Normal file
@@ -0,0 +1,59 @@
|
||||
/******************************************************************************
|
||||
* COPYRIGHT NOTICE *
|
||||
* Copyright (c) 2014 The Johns Hopkins University/Applied Physics Laboratory *
|
||||
* All rights reserved. *
|
||||
* *
|
||||
* This material may only be used, modified, or reproduced by or for the *
|
||||
* U.S. Government pursuant to the license rights granted under FAR clause *
|
||||
* 52.227-14 or DFARS clauses 252.227-7013/7014. *
|
||||
* *
|
||||
* For any other permissions, please contact the Legal Office at JHU/APL. *
|
||||
******************************************************************************/
|
||||
package edu.jhuapl.tinkerpop;
|
||||
|
||||
import com.tinkerpop.blueprints.Features;
|
||||
|
||||
/**
|
||||
* {@link Features} creator.
|
||||
*/
|
||||
public class AccumuloFeatures {
|
||||
|
||||
public static Features get() {
|
||||
Features f = new Features();
|
||||
|
||||
// For simplicity, I accept all property types. They are handled in not the
|
||||
// best way. To be fixed later.
|
||||
f.ignoresSuppliedIds = true;
|
||||
f.isPersistent = true;
|
||||
f.isWrapper = false;
|
||||
f.supportsBooleanProperty = true;
|
||||
f.supportsDoubleProperty = true;
|
||||
f.supportsDuplicateEdges = true;
|
||||
f.supportsEdgeIndex = true;
|
||||
f.supportsEdgeIteration = true;
|
||||
f.supportsEdgeRetrieval = true;
|
||||
f.supportsEdgeKeyIndex = true;
|
||||
f.supportsEdgeProperties = true;
|
||||
f.supportsFloatProperty = true;
|
||||
f.supportsIndices = true;
|
||||
f.supportsIntegerProperty = true;
|
||||
f.supportsKeyIndices = true;
|
||||
f.supportsLongProperty = true;
|
||||
f.supportsMapProperty = true;
|
||||
f.supportsMixedListProperty = true;
|
||||
f.supportsPrimitiveArrayProperty = true;
|
||||
f.supportsSelfLoops = true;
|
||||
f.supportsSerializableObjectProperty = true;
|
||||
f.supportsStringProperty = true;
|
||||
f.supportsThreadedTransactions = false;
|
||||
f.supportsTransactions = false;
|
||||
f.supportsUniformListProperty = true;
|
||||
f.supportsVertexIndex = true;
|
||||
f.supportsVertexIteration = true;
|
||||
f.supportsVertexKeyIndex = true;
|
||||
f.supportsVertexProperties = true;
|
||||
f.supportsThreadIsolatedTransactions = false;
|
||||
|
||||
return f;
|
||||
}
|
||||
}
|
||||
@@ -294,46 +294,9 @@ public class AccumuloGraph implements Graph, KeyIndexableGraph, IndexableGraph {
 
   // End Aliases
 
-  // For simplicity, I accept all property types. They are handled in not the
-  // best way. To be fixed later
-  Features f;
-
   @Override
   public Features getFeatures() {
-    if (f == null) {
-      f = new Features();
-      f.ignoresSuppliedIds = true;
-      f.isPersistent = true;
-      f.isWrapper = false;
-      f.supportsBooleanProperty = true;
-      f.supportsDoubleProperty = true;
-      f.supportsDuplicateEdges = true;
-      f.supportsEdgeIndex = true;
-      f.supportsEdgeIteration = true;
-      f.supportsEdgeRetrieval = true;
-      f.supportsEdgeKeyIndex = true;
-      f.supportsEdgeProperties = true;
-      f.supportsFloatProperty = true;
-      f.supportsIndices = true;
-      f.supportsIntegerProperty = true;
-      f.supportsKeyIndices = true;
-      f.supportsLongProperty = true;
-      f.supportsMapProperty = true;
-      f.supportsMixedListProperty = true;
-      f.supportsPrimitiveArrayProperty = true;
-      f.supportsSelfLoops = true;
-      f.supportsSerializableObjectProperty = true;
-      f.supportsStringProperty = true;
-      f.supportsThreadedTransactions = false;
-      f.supportsTransactions = false;
-      f.supportsUniformListProperty = true;
-      f.supportsVertexIndex = true;
-      f.supportsVertexIteration = true;
-      f.supportsVertexKeyIndex = true;
-      f.supportsVertexProperties = true;
-      f.supportsThreadIsolatedTransactions = false;
-    }
-    return f;
+    return AccumuloFeatures.get();
   }
 
   @Override
@@ -368,12 +331,7 @@ public class AccumuloGraph implements Graph, KeyIndexableGraph, IndexableGraph {
     if (id == null) {
       throw ExceptionFactory.vertexIdCanNotBeNull();
     }
-    String myID;
-    try {
-      myID = (String) id;
-    } catch (ClassCastException e) {
-      return null;
-    }
+    String myID = id.toString();
 
     if (vertexCache != null) {
       Vertex vertex = vertexCache.retrieve(myID);
@@ -645,16 +603,10 @@ public class AccumuloGraph implements Graph, KeyIndexableGraph, IndexableGraph {
 
   @Override
   public Edge getEdge(Object id) {
-    String myID;
     if (id == null) {
       throw ExceptionFactory.edgeIdCanNotBeNull();
-    } else {
-      try {
-        myID = (String) id;
-      } catch (ClassCastException e) {
-        return null;
-      }
     }
+    String myID = id.toString();
 
     if (edgeCache != null) {
       Edge edge = edgeCache.retrieve(myID);
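
The two hunks above replace a `(String)` cast with `id.toString()` in `getVertex` and `getEdge`. The following is a minimal, self-contained sketch of the difference; `ElementIds`, `castId`, and `stringId` are hypothetical names used only for illustration and do not exist in AccumuloGraph.

```java
/** Illustrative stand-in; not part of AccumuloGraph. */
final class ElementIds {

  /** Old behaviour in the diff: only String ids survive the cast. */
  static String castId(Object id) {
    try {
      return (String) id;
    } catch (ClassCastException e) {
      // The old getVertex/getEdge returned null at this point.
      return null;
    }
  }

  /** New behaviour in the diff: any non-null id is keyed by toString(). */
  static String stringId(Object id) {
    if (id == null) {
      throw new IllegalArgumentException("id cannot be null");
    }
    return id.toString();
  }
}
```

One consequence worth keeping in mind: ids whose string forms coincide, such as the Integer `10` and the String `"10"`, would presumably address the same element. An id class that relies on the default `Object.toString()` would also not yield a stable key across JVM runs.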
@@ -18,7 +18,6 @@ import java.io.File;
 import java.io.IOException;
 import java.io.Serializable;
 import java.lang.reflect.Field;
-import java.nio.ByteBuffer;
 import java.util.Arrays;
 import java.util.Iterator;
 import java.util.List;
@@ -39,6 +38,7 @@ import org.apache.accumulo.core.client.security.tokens.PasswordToken;
 import org.apache.accumulo.core.security.Authorizations;
 import org.apache.accumulo.core.security.ColumnVisibility;
 import org.apache.accumulo.minicluster.MiniAccumuloCluster;
+import org.apache.commons.configuration.AbstractConfiguration;
 import org.apache.commons.configuration.Configuration;
 import org.apache.commons.configuration.PropertiesConfiguration;
 import org.apache.hadoop.io.Text;
@@ -52,7 +52,8 @@ import com.tinkerpop.blueprints.KeyIndexableGraph;
  * Setters return the same configuration instance to
  * ease chained setting of parameters.
  */
-public class AccumuloGraphConfiguration implements Serializable {
+public class AccumuloGraphConfiguration extends AbstractConfiguration
+    implements Serializable {
 
   private static final long serialVersionUID = 7024072260167873696L;
 
@@ -91,7 +92,7 @@ public class AccumuloGraphConfiguration implements Serializable {
   /**
    * Utility class gathering valid configuration keys.
    */
-  private static class Keys {
+  static class Keys {
     public static final String GRAPH_CLASS = "blueprints.graph";
     public static final String ZK_HOSTS = "blueprints.accumulo.zkhosts";
     public static final String INSTANCE = "blueprints.accumulo.instance";
@@ -222,10 +223,14 @@ public class AccumuloGraphConfiguration implements Serializable {
    */
   public AccumuloGraphConfiguration setInstanceType(InstanceType type) {
     conf.setProperty(Keys.INSTANCE_TYPE, type.toString());
-    if (type.equals(InstanceType.Mock)) {
+
+    if (InstanceType.Mock.equals(type) ||
+        InstanceType.Mini.equals(type)) {
       setUser("root");
       setPassword("");
+      setCreate(true);
     }
+
     return this;
   }
 
@@ -272,8 +277,8 @@ public class AccumuloGraphConfiguration implements Serializable {
     return this;
   }
 
-  public ByteBuffer getPassword() {
-    return ByteBuffer.wrap(conf.getString(Keys.PASSWORD).getBytes());
+  public String getPassword() {
+    return conf.getString(Keys.PASSWORD);
   }
 
   /**
@@ -679,14 +684,30 @@ public class AccumuloGraphConfiguration implements Serializable {
 
   /**
    * Name of the graph to create. Storage tables will be prefixed with this value.
+   * <p/>Note: Accumulo only allows table names with alphanumeric and underscore
+   * characters.
    * @param name
    * @return
    */
   public AccumuloGraphConfiguration setGraphName(String name) {
+    if (!isValidGraphName(name)) {
+      throw new IllegalArgumentException("Invalid graph name."
+          + " Only alphanumerics and underscores are allowed");
+    }
+
     conf.setProperty(Keys.GRAPH_NAME, name);
     return this;
   }
 
+  /**
+   * Make sure this is a valid graph name because of restrictions
+   * on table names.
+   * @param name
+   */
+  private static boolean isValidGraphName(String name) {
+    return name.matches("^[A-Za-z0-9_]+$");
+  }
+
   public String[] getPreloadedProperties() {
     if (conf.containsKey(Keys.PRELOAD_PROPERTIES)) {
       return conf.getStringArray(Keys.PRELOAD_PROPERTIES);
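
The graph-name check in the hunk above can be tried in isolation. The sketch below reuses the regular expression and the exception message from the diff; the class name `GraphNameCheck` and the helper `requireValidGraphName` are hypothetical and exist only for this illustration.

```java
import java.util.regex.Pattern;

/** Illustrative stand-in; not part of AccumuloGraph. */
final class GraphNameCheck {

  // Same regular expression as isValidGraphName in the diff.
  private static final Pattern VALID = Pattern.compile("^[A-Za-z0-9_]+$");

  static boolean isValidGraphName(String name) {
    // Treating null as invalid is an assumption of this sketch; the
    // method in the diff would throw a NullPointerException instead.
    return name != null && VALID.matcher(name).matches();
  }

  /** Returns the name unchanged, or throws like the setter in the diff. */
  static String requireValidGraphName(String name) {
    if (!isValidGraphName(name)) {
      throw new IllegalArgumentException("Invalid graph name."
          + " Only alphanumerics and underscores are allowed");
    }
    return name;
  }
}
```

Names such as `alpha`, `12345alnum`, and `_under_score_2` pass, while names containing hyphens, dots, or quotes are rejected, which matches the cases exercised by `testGraphNames`.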
@@ -881,6 +902,14 @@ public class AccumuloGraphConfiguration implements Serializable {
         .setTimeout(getMaxWriteTimeout(), TimeUnit.MILLISECONDS);
   }
 
+  /**
+   * File-based version of {@link #setMiniClusterTempDir(String)}.
+   * @param miniClusterTempDir
+   */
+  public AccumuloGraphConfiguration setMiniClusterTempDir(File miniClusterTempDir) {
+    return setMiniClusterTempDir(miniClusterTempDir.getPath());
+  }
+
   /**
    * Used by JUnit Tests to set the miniClusterTempDirectory.
    * If not set in advance of a test, getConnector will use a
@@ -888,8 +917,9 @@ public class AccumuloGraphConfiguration implements Serializable {
    *
    * @param miniClusterTempDir
    */
-  public void setMiniClusterTempDir(String miniClusterTempDir) {
+  public AccumuloGraphConfiguration setMiniClusterTempDir(String miniClusterTempDir) {
     this.miniClusterTempDir = miniClusterTempDir;
+    return this;
   }
 
   public String getVertexTable() {
@@ -932,10 +962,11 @@ public class AccumuloGraphConfiguration implements Serializable {
     case Distributed:
       checkPropertyValue(Keys.ZK_HOSTS, getZooKeeperHosts(), false);
       checkPropertyValue(Keys.USER, getUser(), false);
+      checkPropertyValue(Keys.PASSWORD, getPassword(), false);
       // no break intentional
     case Mini:
       checkPropertyValue(Keys.INSTANCE, getInstanceName(), false);
-      checkPropertyValue(Keys.PASSWORD, new String(getPassword().array()), true);
+      checkPropertyValue(Keys.PASSWORD, getPassword(), true);
       // no break intentional
     case Mock:
       checkPropertyValue(Keys.GRAPH_NAME, getGraphName(), false);
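
The validation hunk above relies on intentional `switch` fall-through, so each instance type inherits the checks of the less demanding types below it. A minimal sketch of that pattern follows; `TieredValidation`, `requiredSettings`, and the setting labels are hypothetical, and the Mini-tier password check (whose third argument presumably permits an empty value) is left out for brevity.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative stand-in; not part of AccumuloGraph. */
final class TieredValidation {

  // Same constant names as the InstanceType values used in the diff.
  enum InstanceType { Distributed, Mini, Mock }

  /** Lists the settings checked for a type, most specific first. */
  static List<String> requiredSettings(InstanceType type) {
    List<String> required = new ArrayList<String>();
    switch (type) {
      case Distributed:
        required.add("zkhosts");
        required.add("user");
        required.add("password");
        // no break intentional
      case Mini:
        required.add("instance");
        // no break intentional
      case Mock:
        required.add("graphname");
    }
    return required;
  }
}
```

With this layout, adding a requirement to the Mock tier automatically applies it to Mini and Distributed as well, at the cost of making the order of the `case` labels significant.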
@@ -984,16 +1015,7 @@ public class AccumuloGraphConfiguration implements Serializable {
   public void print() {
     System.out.println(AccumuloGraphConfiguration.class+":");
 
-    Set<String> keys = new TreeSet<String>();
-    for (Field field : Keys.class.getDeclaredFields()) {
-      try {
-        keys.add((String) field.get(null));
-      } catch (Exception e) {
-        throw new RuntimeException(e);
-      }
-    }
-
-    for (String key : keys) {
+    for (String key : getValidInternalKeys()) {
       String value = "(null)";
       if (conf.containsKey(key)) {
         value = conf.getProperty(key).toString();
@@ -1002,6 +1024,23 @@ public class AccumuloGraphConfiguration implements Serializable {
     }
   }
 
+  /**
+   * Get the AccumuloGraph-specific keys for this configuration.
+   * @return
+   */
+  static Set<String> getValidInternalKeys() {
+    Set<String> keys = new TreeSet<String>();
+
+    for (Field field : Keys.class.getDeclaredFields()) {
+      try {
+        keys.add((String) field.get(null));
+      } catch (Exception e) {
+        throw new RuntimeException(e);
+      }
+    }
+
+    return keys;
+  }
 
   // Old deprecated method names.
 
@@ -1118,4 +1157,37 @@ public class AccumuloGraphConfiguration implements Serializable {
   public String[] getPreloadedEdges() {
     return getPreloadedEdgeLabels();
   }
+
+
+  // Abstract methods from the AbstractConfiguration implementation.
+
+  @Override
+  public boolean isEmpty() {
+    return conf.isEmpty();
+  }
+
+  @Override
+  public boolean containsKey(String key) {
+    return conf.containsKey(key);
+  }
+
+  @Override
+  public Object getProperty(String key) {
+    return conf.getProperty(key);
+  }
+
+  @Override
+  public Iterator<String> getKeys() {
+    return conf.getKeys();
+  }
+
+  @Override
+  protected void addPropertyDirect(String key, Object value) {
+    // Only allow AccumuloGraph-specific keys.
+    if (getValidInternalKeys().contains(key)) {
+      conf.setProperty(key, value);
+    } else {
+      throw new UnsupportedOperationException("Invalid key: "+key);
+    }
+  }
 }
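
Taken together, `getValidInternalKeys` and `addPropertyDirect` form a key whitelist: the valid keys are collected by reflection from the constants in `Keys`, and any other key is rejected. The sketch below shows the same idea with only the standard library. `WhitelistedConfig` and its methods are hypothetical stand-ins, the three key strings are copied from the diff, and the static-String filter is an addition of this sketch rather than something the diff does.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Illustrative stand-in; not part of AccumuloGraph. */
final class WhitelistedConfig {

  /** Key holder modelled on the Keys class; values copied from the diff. */
  static class Keys {
    public static final String GRAPH_CLASS = "blueprints.graph";
    public static final String ZK_HOSTS = "blueprints.accumulo.zkhosts";
    public static final String INSTANCE = "blueprints.accumulo.instance";
  }

  private final Map<String, Object> values = new HashMap<String, Object>();

  /** Collects every static String constant declared in Keys, sorted. */
  static Set<String> validKeys() {
    Set<String> keys = new TreeSet<String>();
    for (Field field : Keys.class.getDeclaredFields()) {
      // Skip anything that is not a static String constant, such as
      // synthetic fields added by instrumentation tools.
      if (!Modifier.isStatic(field.getModifiers())
          || field.getType() != String.class) {
        continue;
      }
      try {
        keys.add((String) field.get(null));
      } catch (IllegalAccessException e) {
        throw new RuntimeException(e);
      }
    }
    return keys;
  }

  /** Stores the value only when the key is on the whitelist. */
  void setProperty(String key, Object value) {
    if (!validKeys().contains(key)) {
      throw new UnsupportedOperationException("Invalid key: " + key);
    }
    values.put(key, value);
  }

  Object getProperty(String key) {
    return values.get(key);
  }
}
```

Deriving the whitelist from the constants keeps the two from drifting apart, though the reflective scan runs on every write in this sketch; caching the set once would be a natural refinement.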
@@ -52,7 +52,7 @@ public class ElementOutputFormat extends OutputFormat<NullWritable,Element> {
     Configuration jobconf = job.getConfiguration();
 
     jobconf.set(USER, acc.getUser());
-    jobconf.set(PASSWORD, new String(acc.getPassword().array()));
+    jobconf.set(PASSWORD, acc.getPassword());
     jobconf.set(GRAPH_NAME, acc.getGraphName());
     jobconf.set(INSTANCE, acc.getInstanceName());
     jobconf.set(INSTANCE_TYPE, acc.getInstanceType().toString());
88
src/test/java/edu/jhuapl/tinkerpop/AccumuloElementTest.java
Normal file
@@ -0,0 +1,88 @@
|
||||
/******************************************************************************
|
||||
* COPYRIGHT NOTICE *
|
||||
* Copyright (c) 2014 The Johns Hopkins University/Applied Physics Laboratory *
|
||||
* All rights reserved. *
|
||||
* *
|
||||
* This material may only be used, modified, or reproduced by or for the *
|
||||
* U.S. Government pursuant to the license rights granted under FAR clause *
|
||||
* 52.227-14 or DFARS clauses 252.227-7013/7014. *
|
||||
* *
|
||||
* For any other permissions, please contact the Legal Office at JHU/APL. *
|
||||
******************************************************************************/
|
||||
package edu.jhuapl.tinkerpop;
|
||||
|
||||
import static org.junit.Assert.*;
|
||||
|
||||
import org.junit.Test;
|
||||
|
||||
import com.tinkerpop.blueprints.Edge;
|
||||
import com.tinkerpop.blueprints.Graph;
|
||||
import com.tinkerpop.blueprints.Vertex;
|
||||
|
||||
/**
|
||||
* Tests related to Accumulo elements.
|
||||
*/
|
||||
public class AccumuloElementTest {
|
||||
|
||||
@Test
|
||||
public void testNonStringIds() throws Exception {
|
||||
Graph graph = AccumuloGraphTestUtils.makeGraph("nonStringIds");
|
||||
|
||||
Object[] ids = new Object[] {
|
||||
10, 20, 30L, 40L,
|
||||
50.0f, 60.0f, 70.0d, 80.0d,
|
||||
(byte) 'a', (byte) 'b', 'c', 'd',
|
||||
"str1", "str2",
|
||||
new GenericObject("str3"), new GenericObject("str4"),
|
||||
};
|
||||
|
||||
Object[] edgeIds = new Object[] {
|
||||
100, 200, 300L, 400L,
|
||||
500.0f, 600.0f, 700.0d, 800.0d,
|
||||
(byte) 'e', (byte) 'f', 'g', 'h',
|
||||
"str5", "str6",
|
||||
new GenericObject("str7"), new GenericObject("str8"),
|
||||
};
|
||||
|
||||
for (int i = 0; i < ids.length; i++) {
|
||||
assertNull(graph.getVertex(ids[i]));
|
||||
Vertex v = graph.addVertex(ids[i]);
|
||||
assertNotNull(v);
|
||||
assertNotNull(graph.getVertex(ids[i]));
|
||||
}
|
||||
assertEquals(ids.length, count(graph.getVertices()));
|
||||
|
||||
for (int i = 1; i < edgeIds.length; i++) {
|
||||
assertNull(graph.getEdge(edgeIds[i-1]));
|
||||
Edge e = graph.addEdge(edgeIds[i-1],
|
||||
graph.getVertex(ids[i-1]),
|
||||
graph.getVertex(ids[i]), "label");
|
||||
assertNotNull(e);
|
||||
assertNotNull(graph.getEdge(edgeIds[i-1]));
|
||||
}
|
||||
assertEquals(edgeIds.length-1, count(graph.getEdges()));
|
||||
|
||||
graph.shutdown();
|
||||
}
|
||||
|
||||
private static int count(Iterable<?> iter) {
|
||||
int count = 0;
|
||||
for (@SuppressWarnings("unused") Object obj : iter) {
|
||||
count++;
|
||||
}
|
||||
return count;
|
||||
}
|
||||
|
||||
private static class GenericObject {
|
||||
private final Object id;
|
||||
|
||||
public GenericObject(Object id) {
|
||||
this.id = id;
|
||||
}
|
||||
|
||||
@Override
|
||||
public String toString() {
|
||||
return "GenericObject [id=" + id + "]";
|
||||
}
|
||||
}
|
||||
}
|
||||
@@ -22,6 +22,7 @@ import java.util.List;

import javax.xml.namespace.QName;

import org.apache.commons.configuration.Configuration;
import org.apache.hadoop.io.Text;
import org.junit.Test;

@@ -30,6 +31,21 @@ import com.tinkerpop.blueprints.Vertex;

public class AccumuloGraphConfigurationTest {

  @Test
  public void testConfigurationInterface() throws Exception {
    Configuration conf = AccumuloGraphTestUtils.generateGraphConfig("setPropsValid");
    for (String key : AccumuloGraphConfiguration.getValidInternalKeys()) {
      // This is bad... but we should allow them if they are valid keys.
      conf.setProperty(key, "value");
    }

    conf = AccumuloGraphTestUtils.generateGraphConfig("setPropsInvalid");
    try {
      conf.setProperty("invalidKey", "value");
      fail();
    } catch (Exception e) { }
  }

  @Test
  public void testSplits() throws Exception {
    AccumuloGraphConfiguration cfg;
@@ -215,4 +231,30 @@ public class AccumuloGraphConfigurationTest {
    cfg.validate();
    assertFalse(cfg.getEdgeCacheEnabled());
  }

  /**
   * Test different kinds of graph names (hyphens, punctuation, etc).
   * @throws Exception
   */
  @Test
  public void testGraphNames() throws Exception {
    AccumuloGraphConfiguration conf = new AccumuloGraphConfiguration();

    String[] valid = new String[] {
        "alpha", "12345", "alnum12345",
        "12345alnum", "under_score1", "_under_score_2"};
    String[] invalid = new String[] {"hyph-en",
        "dot..s", "quo\"tes"};

    for (String name : valid) {
      conf.setGraphName(name);
    }

    for (String name : invalid) {
      try {
        conf.setGraphName(name);
        fail();
      } catch (Exception e) { }
    }
  }
}
@@ -14,6 +14,9 @@
 */
package edu.jhuapl.tinkerpop;

import com.tinkerpop.blueprints.Graph;
import com.tinkerpop.blueprints.GraphFactory;

import edu.jhuapl.tinkerpop.AccumuloGraphConfiguration.InstanceType;

public class AccumuloGraphTestUtils {
@@ -21,8 +24,12 @@ public class AccumuloGraphTestUtils {
  public static AccumuloGraphConfiguration generateGraphConfig(String graphDirectoryName) {
    AccumuloGraphConfiguration cfg = new AccumuloGraphConfiguration();
    cfg.setInstanceName("instanceName").setZooKeeperHosts("ZookeeperHostsString");
-    cfg.setUser("root").setPassword("".getBytes());
+    cfg.setUser("root").setPassword("");
    cfg.setGraphName(graphDirectoryName).setCreate(true).setAutoFlush(true).setInstanceType(InstanceType.Mock);
    return cfg;
  }

  public static Graph makeGraph(String name) {
    return GraphFactory.open(generateGraphConfig(name).getConfiguration());
  }
}
@@ -0,0 +1,39 @@
package edu.jhuapl.tinkerpop;

import org.apache.commons.configuration.Configuration;
import org.junit.Ignore;

import com.tinkerpop.blueprints.Graph;
import com.tinkerpop.blueprints.GraphFactory;

import edu.jhuapl.tinkerpop.AccumuloGraphConfiguration.InstanceType;

/**
 * Run the test suite for a Distributed instance type.
 * <p/>Note: This is disabled by default since we can't
 * guarantee an Accumulo cluster setup in a test environment.
 * To use this, set the constants below and remove
 * the @Ignore annotation.
 */
@Ignore
public class DistributedInstanceTest extends AccumuloGraphTest {

  // Connection constants.
  private static final String ZOOKEEPERS = "zkhost-1,zkhost-2";
  private static final String INSTANCE = "accumulo-instance";
  private static final String USER = "user";
  private static final String PASSWORD = "password";

  @Override
  public Graph generateGraph(String graphDirectoryName) {
    Configuration cfg = new AccumuloGraphConfiguration()
      .setInstanceType(InstanceType.Distributed)
      .setZooKeeperHosts(ZOOKEEPERS)
      .setInstanceName(INSTANCE)
      .setUser(USER).setPassword(PASSWORD)
      .setGraphName(graphDirectoryName)
      .setCreate(true);
    testGraphName.set(graphDirectoryName);
    return GraphFactory.open(cfg);
  }
}
23
src/test/java/edu/jhuapl/tinkerpop/MockInstanceTest.java
Normal file
@@ -0,0 +1,23 @@
package edu.jhuapl.tinkerpop;

import org.apache.commons.configuration.Configuration;

import com.tinkerpop.blueprints.Graph;
import com.tinkerpop.blueprints.GraphFactory;

import edu.jhuapl.tinkerpop.AccumuloGraphConfiguration.InstanceType;

/**
 * Run the test suite for a Mock instance type.
 */
public class MockInstanceTest extends AccumuloGraphTest {

  @Override
  public Graph generateGraph(String graphDirectoryName) {
    Configuration cfg = new AccumuloGraphConfiguration()
      .setInstanceType(InstanceType.Mock)
      .setGraphName(graphDirectoryName);
    testGraphName.set(graphDirectoryName);
    return GraphFactory.open(cfg);
  }
}
34
table-structure.md
Normal file
@@ -0,0 +1,34 @@
Table Structure
---------------

This file documents the structure of backend tables
used for storing elements, properties, etc.

**Note (29 Dec 2014): This documentation is out of date.**

## Vertex Table

Row ID | Column Family | Column Qualifier | Value
---|---|---|---
VertexID | Label Flag | Exists Flag | [empty]
VertexID | INVERTEX | OutVertexID_EdgeID | Edge Label
VertexID | OUTVERTEX | InVertexID_EdgeID | Edge Label
VertexID | Property Key | [empty] | Serialized Value

## Edge Table

Row ID | Column Family | Column Qualifier | Value
---|---|---|---
EdgeID | Label Flag | InVertexID_OutVertexID | Edge Label
EdgeID | Property Key | [empty] | Serialized Value

## Edge/Vertex Index
Row ID | Column Family | Column Qualifier | Value
---|---|---|---
Serialized Value | Property Key | VertexID/EdgeID | [empty]

## Metadata Table

Row ID | Column Family | Column Qualifier | Value
---|---|---|---
Index Name | Index Class | [empty] | [empty]
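
As a rough illustration of the vertex table rows listed above, the following sketch builds the entries for two vertices joined by one edge. It is plain Java with no Accumulo dependency and is not AccumuloGraph's actual code: the class `VertexTableSketch`, its helper methods, the `LABEL`/`EXISTS` flag strings, and the use of `String.valueOf` in place of real value serialization are all assumptions made for illustration. It also reads the `INVERTEX` row as belonging to the edge's in-vertex, which is one interpretation of the table, and it follows the layout as documented here, which the note above marks as out of date.

```java
import java.util.ArrayList;
import java.util.List;

public class VertexTableSketch {

  // Hypothetical placeholders: the tables do not give the actual flag strings.
  static final String LABEL_FLAG = "LABEL";
  static final String EXISTS_FLAG = "EXISTS";

  /** One table entry: row ID, column family, column qualifier, value. */
  static final class Entry {
    final String row;
    final String family;
    final String qualifier;
    final String value;

    Entry(String row, String family, String qualifier, String value) {
      this.row = row;
      this.family = family;
      this.qualifier = qualifier;
      this.value = value;
    }

    @Override
    public String toString() {
      return row + " | " + family + " | " + qualifier + " | " + value;
    }
  }

  /** Entry recording that a vertex exists (empty value). */
  static Entry vertexExists(String vertexId) {
    return new Entry(vertexId, LABEL_FLAG, EXISTS_FLAG, "");
  }

  /** Entry for one vertex property; String.valueOf stands in for serialization. */
  static Entry vertexProperty(String vertexId, String key, Object value) {
    return new Entry(vertexId, key, "", String.valueOf(value));
  }

  /** The two vertex-table entries for an edge going from outId to inId. */
  static List<Entry> edgeEndpoints(String edgeId, String outId, String inId, String label) {
    List<Entry> entries = new ArrayList<>();
    // Row of the in-vertex: the qualifier names the out-vertex and the edge.
    entries.add(new Entry(inId, "INVERTEX", outId + "_" + edgeId, label));
    // Row of the out-vertex: the qualifier names the in-vertex and the edge.
    entries.add(new Entry(outId, "OUTVERTEX", inId + "_" + edgeId, label));
    return entries;
  }

  public static void main(String[] args) {
    List<Entry> entries = new ArrayList<>();
    entries.add(vertexExists("v1"));
    entries.add(vertexExists("v2"));
    entries.add(vertexProperty("v1", "name", "alice"));
    entries.addAll(edgeEndpoints("e1", "v1", "v2", "knows"));
    for (Entry e : entries) {
      System.out.println(e);
    }
  }
}
```

Storing each edge under both endpoint rows means a vertex's incident edges can be read from that vertex's row alone, at the cost of writing two entries per edge.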