mirror of
https://github.com/JHUAPL/AccumuloGraph.git
synced 2026-04-26 03:01:00 -04:00
1 line
9.5 KiB
JSON
{"name":"Accumulograph","tagline":"An implementation of TinkerPop Blueprints using Accumulo","body":"AccumuloGraph\r\n=============\r\n[Build Status](https://travis-ci.org/JHUAPL/AccumuloGraph)\r\n\r\nThis is an implementation of the [TinkerPop Blueprints](http://tinkerpop.com)\r\n2.6 API using [Apache Accumulo](http://accumulo.apache.org) as the backend.\r\nThis combines the many benefits and flexibility of Blueprints\r\nwith the scalability and performance of Accumulo.\r\n\r\nIn addition to the basic Blueprints functionality, we provide a number\r\nof enhanced features, including:\r\n* Indexing implementations via `IndexableGraph` and `KeyIndexableGraph`\r\n* Support for mock, mini, and distributed instances of Accumulo\r\n* Numerous performance tweaks and configuration parameters\r\n* Support for high-speed ingest\r\n* Hadoop integration\r\n\r\nFeel free to contact us with bugs, suggestions, pull requests,\r\nor simply to tell us how you are using AccumuloGraph in your own work.\r\n\r\n\r\n## Getting Started\r\n\r\nFirst, include AccumuloGraph as a Maven dependency. Releases are deployed\r\nto Maven Central.\r\n\r\n```xml\r\n<dependency>\r\n\t<groupId>edu.jhuapl.tinkerpop</groupId>\r\n\t<artifactId>blueprints-accumulo-graph</artifactId>\r\n\t<version>0.2.1</version>\r\n</dependency>\r\n```\r\n\r\nFor non-Maven users, the binary jars can be found in the releases section of this\r\nGitHub repository, or you can get them from Maven Central.\r\n\r\nCreating an `AccumuloGraph` involves setting a few parameters in an\r\n`AccumuloGraphConfiguration` object and then opening the graph.\r\nThe defaults are sensible for use with an Accumulo cluster.\r\nWe provide some simple examples below. 
The Javadocs for\r\n`AccumuloGraphConfiguration` explain all the other parameters\r\nin more detail.\r\n\r\nFirst, to instantiate an in-memory graph:\r\n```java\r\nConfiguration cfg = new AccumuloGraphConfiguration()\r\n .setInstanceType(InstanceType.Mock)\r\n .setGraphName(\"graph\");\r\nreturn GraphFactory.open(cfg);\r\n```\r\n\r\nThis creates a \"Mock\" instance which holds the graph in memory.\r\nYou can now use all the Blueprints and AccumuloGraph-specific functionality\r\nwith this in-memory graph. This is useful for getting familiar\r\nwith AccumuloGraph's functionality, or for testing or prototyping\r\npurposes.\r\n\r\nTo use an actual Accumulo cluster, use the following:\r\n```java\r\nConfiguration cfg = new AccumuloGraphConfiguration()\r\n .setInstanceType(InstanceType.Distributed)\r\n .setZooKeeperHosts(\"zookeeper-host\")\r\n .setInstanceName(\"instance-name\")\r\n .setUser(\"user\").setPassword(\"password\")\r\n .setGraphName(\"graph\")\r\n .setCreate(true);\r\nreturn GraphFactory.open(cfg);\r\n```\r\n\r\nThis directs AccumuloGraph to use a \"Distributed\" Accumulo\r\ninstance, and sets the appropriate ZooKeeper parameters,\r\ninstance name, and authentication information, which correspond\r\nto the usual Accumulo connection settings. The graph name is\r\nused to create several backing tables in Accumulo, and the\r\n`setCreate` option tells AccumuloGraph to create the backing\r\ntables if they don't already exist.\r\n\r\nAccumuloGraph also has limited support for a \"Mini\" instance\r\nof Accumulo.\r\n\r\n\r\n## Improving Performance\r\n\r\nThis section describes various configuration parameters that\r\ncan substantially improve AccumuloGraph's performance. Brief descriptions\r\nof each option are provided here, but refer to the\r\n`AccumuloGraphConfiguration` Javadoc for fuller explanations.\r\n\r\n### Disable consistency checks\r\n\r\nThe Blueprints API specifies a number of consistency checks for\r\nvarious operations, and requires that an error be raised when a check fails. 
Some examples\r\nof invalid operations include adding a vertex with the same id as an\r\nexisting vertex, adding edges between nonexistent vertices,\r\nand setting properties on nonexistent elements.\r\nUnfortunately, checking these constraints against an\r\nAccumulo installation carries a significant performance cost,\r\nsince each check requires extra traffic to Accumulo using inefficient\r\nnon-batched access patterns.\r\n\r\nTo reduce this cost, AccumuloGraph exposes\r\nseveral options that disable some of these checks.\r\nThese include:\r\n* `setAutoFlush` - to disable automatically flushing\r\n changes to the backing Accumulo tables\r\n* `setSkipExistenceChecks` - to disable element\r\n existence checks, avoiding trips to the Accumulo cluster\r\n* `setIndexableGraphDisabled` - to disable\r\n indexing functionality, which improves performance\r\n of element removal\r\n\r\n### Tweak Accumulo performance parameters\r\n\r\nAccumulo itself features a number of performance-related parameters,\r\nand we allow configuration of these. Generally, these relate to\r\nwrite buffer sizes, multithreading, etc. The settings include:\r\n* `setMaxWriteLatency` - max time prior to flushing\r\n element write buffer\r\n* `setMaxWriteMemory` - max size for element write buffer\r\n* `setMaxWriteThreads` - max threads used for element writing\r\n* `setMaxWriteTimeout` - max time to wait before failing\r\n element buffer writes\r\n* `setQueryThreads` - number of query threads to use\r\n for fetching elements, properties, etc.\r\n\r\n### Enable edge and property preloading\r\n\r\nAs a performance tweak, AccumuloGraph performs lazy loading of\r\nproperties and edges. This means that an operation such as\r\n`getVertex` does not by default populate the returned\r\nvertex object with the associated vertex's properties\r\nand edges. Instead, they are initialized only when requested via\r\n`getProperty`, `getEdges`, etc. 
This is useful\r\nfor use cases where you won't be accessing many of these\r\nproperties or edges. However, if certain properties or edges will\r\nbe accessed frequently, you can set options for preloading\r\nthese specific properties and edges, which will be more\r\nefficient than on-the-fly loading. These options include:\r\n* `setPreloadedProperties` - set property keys\r\n to be preloaded\r\n* `setPreloadedEdgeLabels` - set edges to be\r\n preloaded based on their labels\r\n\r\n### Enable caching\r\n\r\nAccumuloGraph contains a number of caching options\r\nthat reduce the need for Accumulo traffic for recently accessed\r\nelements. The following options control caching:\r\n* `setVertexCacheParams` - size and expiry for vertex cache\r\n* `setEdgeCacheParams` - size and expiry for edge cache\r\n* `setPropertyCacheTimeout` - property expiry time,\r\n which can be specified globally and/or for individual properties\r\n\r\n\r\n## High Speed Ingest\r\n\r\nOne of Accumulo's key advantages is its ability to ingest\r\nhuge amounts of data at high speed. 
To leverage this ability, we provide\r\nan additional `AccumuloBulkIngester` class that\r\ntrades consistency guarantees for high-speed ingest.\r\n\r\nThe following is an example of how to use the bulk ingester to\r\ningest a simple graph:\r\n```java\r\nAccumuloGraphConfiguration cfg = ...;\r\nAccumuloBulkIngester ingester = new AccumuloBulkIngester(cfg);\r\n// Add a vertex.\r\ningester.addVertex(\"A\").finish();\r\n// Add another vertex with properties.\r\ningester.addVertex(\"B\")\r\n .add(\"P1\", \"V1\").add(\"P2\", \"V2\")\r\n .finish();\r\n// Add an edge.\r\ningester.addEdge(\"A\", \"B\", \"edge\").finish();\r\n// Shut down and compact tables.\r\ningester.shutdown(true);\r\n```\r\n\r\nSee the Javadocs for more details.\r\nNote that you are responsible for ensuring that data is entered\r\nin a consistent way, or the resulting graph will\r\nhave undefined behavior.\r\n\r\n\r\n## Hadoop Integration\r\n\r\nAccumuloGraph features Hadoop integration via custom input and output\r\nformat implementations. `VertexInputFormat` and `EdgeInputFormat`\r\nallow vertex and edge inputs to mappers, respectively. Use as follows:\r\n```java\r\nAccumuloGraphConfiguration cfg = ...;\r\n\r\n// For vertices:\r\nJob vertexJob = new Job();\r\nvertexJob.setInputFormatClass(VertexInputFormat.class);\r\nVertexInputFormat.setAccumuloGraphConfiguration(vertexJob, cfg);\r\n\r\n// For edges (a separate job):\r\nJob edgeJob = new Job();\r\nedgeJob.setInputFormatClass(EdgeInputFormat.class);\r\nEdgeInputFormat.setAccumuloGraphConfiguration(edgeJob, cfg);\r\n```\r\n\r\n`ElementOutputFormat` allows writing to an AccumuloGraph from\r\nreducers. Use as follows:\r\n```java\r\nAccumuloGraphConfiguration cfg = ...;\r\n\r\nJob j = new Job();\r\nj.setOutputFormatClass(ElementOutputFormat.class);\r\nElementOutputFormat.setAccumuloGraphConfiguration(j, cfg);\r\n```\r\n\r\n## Rexster Configuration\r\nThe snippet below shows an example of AccumuloGraph integration with Rexster. 
For a complete list of configuration options, see [`AccumuloGraphConfiguration$Keys`](https://github.com/JHUAPL/AccumuloGraph/blob/master/src/main/java/edu/jhuapl/tinkerpop/AccumuloGraphConfiguration.java#L110).\r\n\r\n```xml\r\n<graph>\r\n\t<graph-enabled>true</graph-enabled>\r\n\t<graph-name>myGraph</graph-name>\r\n\t<graph-type>edu.jhuapl.tinkerpop.AccumuloRexsterGraphConfiguration</graph-type>\r\n\t<properties>\r\n\t\t<blueprints.accumulo.instance.type>Distributed</blueprints.accumulo.instance.type>\r\n\t\t<blueprints.accumulo.instance>accumulo</blueprints.accumulo.instance>\r\n\t\t<blueprints.accumulo.zkhosts>zk1,zk2,zk3</blueprints.accumulo.zkhosts>\r\n\t\t<blueprints.accumulo.user>user</blueprints.accumulo.user>\r\n\t\t<blueprints.accumulo.password>password</blueprints.accumulo.password>\r\n\t</properties>\r\n\t<extensions>\r\n\t</extensions>\r\n</graph>\r\n```\r\n","google":"","note":"Don't delete this file! It's used internally to help with page regeneration."}