My use case is a graph of several hundred million vertices (say 100M to 1B). Each vertex has a set of 10 properties, which are basically scores computed from the weights of the vertex's edges and the scores of the adjacent vertices. When vertices are added to (or removed from) the graph, the scores of all the vertices potentially need to be recomputed. This doesn't need to happen in real time, so it is definitely an OLAP/batch use case. There are also some very simple graph OLTP requirements, which are basically just reading the scores of a given vertex and of its adjacent vertices.

I am trying to determine which of the following approaches I should go with:

1- Giraph: this would imply exporting the whole graph to a file format, loading it into Giraph, and then loading the results back into whatever datastore is used to persist the graph (Neo4J, Neptune, JanusGraph, HBase, RDBMS...).

2- Tinkerpop3's GraphComputer: if I understand correctly, I could run the OLAP graph update algorithm directly on a Tinkerpop3-compatible graph DB (JanusGraph, Neptune, other?), and thus cover both the OLAP and the OLTP use case with a single tool, without any additional data import/export.
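For context, both Giraph and TinkerPop's GraphComputer run this kind of recomputation in a vertex-centric, bulk-synchronous style: in each superstep, every vertex sends messages along its edges and then updates its own state from the messages it received. The sketch below illustrates that pattern in plain Python; it is not Giraph or TinkerPop API code. The function name `recompute_scores`, the `damping` parameter, and the update rule itself are assumptions made for illustration, and it tracks a single score per vertex rather than ten.

```python
from collections import defaultdict


def recompute_scores(edges, base_scores, damping=0.5, max_supersteps=50, tol=1e-9):
    """Illustrative superstep loop (hypothetical update rule).

    edges:       dict mapping vertex -> list of (neighbor, weight) out-edges
    base_scores: dict mapping every vertex -> its initial score
    """
    current = dict(base_scores)
    for _ in range(max_supersteps):
        # Message phase: each vertex sends weight * score along its out-edges.
        inbox = defaultdict(float)
        for vertex, out_edges in edges.items():
            for neighbor, weight in out_edges:
                inbox[neighbor] += weight * current[vertex]

        # Compute phase: each vertex blends its base score with what it received.
        updated = {
            v: (1 - damping) * base_scores[v] + damping * inbox[v]
            for v in current
        }

        # Stop once no score moved by more than the tolerance.
        delta = max(abs(updated[v] - current[v]) for v in current)
        current = updated
        if delta < tol:
            break
    return current


if __name__ == "__main__":
    graph = {"a": [("b", 1.0)], "b": []}
    print(recompute_scores(graph, {"a": 1.0, "b": 0.0}))
```

In a real deployment the two phases would be distributed across workers, which is where the choice between exporting to Giraph and running in place on the graph database matters.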
Graph OLAP processing - Giraph vs. Tinkerpop3 GraphComputer
534 views. Asked by Fabien Coppens.
If you are not yet getting the graph OLAP performance you need, or if moving data to Spark is proving slow or cumbersome, I suggest you take a look at AnzoGraph. It was built by the same team that built Netezza and ParAccel/Redshift.

AnzoGraph is a from-the-ground-up C/C++ HPC implementation of a massively parallel processing (MPP) native graph OLAP (GOLAP) engine, i.e. one designed for data-warehouse-style interactive or batch reporting, analytics, and aggregation over graph data. It is very high performance and scales linearly on commodity hardware, so it will handle the data set you mention (you may not even need a cluster for data of that size). At the time of writing it does not support Tinkerpop/Gremlin, which may be a problem for you. It does support SPARQL 1.1 as well as RDF* (property graph support, which is not yet part of the W3C SPARQL standard), along with many additional extension functions and aggregate functions needed for regular analytics. It also supports inference, named queries, views, various graph algorithms, etc.
Disclaimer: I work for Cambridge Semantics.