Update PageRank of an Existing Dataset in JanusGraph / Nebula Graph

I’m using JanusGraph / Nebula Graph to calculate the PageRank of a very large dataset (hundreds of billions of pages, trillions of edges). Every day tens of millions of new pages are indexed, and I want to add them to the graph and update the PageRank of all existing pages (new pages can contain links to previously indexed pages and vice versa). However, I don’t want to recompute the PageRank of all existing pages from scratch every day. I only want to feed the new data into the system and update the PageRank of the existing pages based on it.

Is there a way to save the existing PageRank results so that I only have to run the computation for the newly indexed pages, without starting the process from scratch?

Answered by HadoopMarc

Sure, the following paper should give relevant links: https://www.researchgate.net/publication/340281398_DiffPageRank_an_efficient_differential_PageRank_approach_in_MapReduce

As to the implementation, Apache TinkerPop lets you run a custom VertexProgram.
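To illustrate the general idea of reusing yesterday's result, here is a minimal sketch in plain Python. It is neither the DiffPageRank algorithm from the paper nor a TinkerPop VertexProgram. It only shows a "warm start": the power iteration on the updated graph begins from the previously computed ranks instead of a uniform vector. The function name `pagerank`, its parameters, and the toy graphs are all made up for this example.

```python
def pagerank(out_links, init=None, damping=0.85, tol=1e-10, max_iter=1000):
    """Power-iteration PageRank on a dict {page: [linked pages]}.

    `init` is an optional dict of previously computed ranks (warm start).
    Returns (ranks, number_of_iterations_used).
    """
    # Collect every page that appears as a source or as a link target.
    nodes = set(out_links)
    for targets in out_links.values():
        nodes.update(targets)
    n = len(nodes)

    if init is None:
        # Cold start: uniform distribution.
        ranks = {v: 1.0 / n for v in nodes}
    else:
        # Warm start: keep old ranks, give unseen pages 1/n, renormalize.
        ranks = {v: init.get(v, 1.0 / n) for v in nodes}
        total = sum(ranks.values())
        ranks = {v: r / total for v, r in ranks.items()}

    for iteration in range(1, max_iter + 1):
        # Rank held by pages without out-links is spread over all pages.
        dangling = sum(ranks[v] for v in nodes if not out_links.get(v))
        base = (1.0 - damping) / n + damping * dangling / n
        new_ranks = {v: base for v in nodes}
        for v, targets in out_links.items():
            if targets:
                share = damping * ranks[v] / len(targets)
                for t in targets:
                    new_ranks[t] += share
        # L1 change between successive iterations as the stopping criterion.
        delta = sum(abs(new_ranks[v] - ranks[v]) for v in nodes)
        ranks = new_ranks
        if delta < tol:
            return ranks, iteration
    return ranks, max_iter


# Yesterday's graph and its ranks (hypothetical toy data).
old_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
old_ranks, _ = pagerank(old_graph)

# Today a new page "d" is indexed; "a" now also links to it.
new_graph = {"a": ["b", "c", "d"], "b": ["c"], "c": ["a"], "d": ["a", "c"]}
warm_ranks, warm_iters = pagerank(new_graph, init=old_ranks)
cold_ranks, cold_iters = pagerank(new_graph)
```

Both runs converge to the same ranks on the updated graph. The warm start does not avoid touching every edge in each iteration, but when the daily change is small relative to the graph it typically needs fewer iterations. On a distributed graph, the same idea would mean persisting the rank as a vertex property and having the custom VertexProgram read it as the initial value.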