Thought I Created A JanusGraph Vertex Index But I Guess Not; What Went Wrong?

234 Views Asked by At

Why is JanusGraph not finding-and-using the index I made at time of search?

Please note, I need this to work with the Java implementation specifically; not Gremlin Console.
And this needs to work with the latest JanusGraph releases; not older versions,
Especially not with different major-version release-numbers.

[WARN] [o.j.g.t.StandardJanusGraphTx.main] ::    Query requires iterating over all vertices [[]]. For better performance, use indexes
janusGraphManagement.buildIndex("_id", Vertex.class).addKey(propertyKey).buildCompositeIndex();

So I expect for class Vertex (which is all of vertexes) to be using the index.
But what I actually see is the warning that this is not so; and
I experience is data returned and changed to be slower than expected.

2023-05-09 15:18:38,678 [INFO] [c.d.o.d.i.c.ContactPoints.main] ::   Contact point localhost:9042 resolves to multiple addresses, will use them all ([localhost/127.0.0.1, localhost/0:0:0:0:0:0:0:1])
2023-05-09 15:18:38,759 [INFO] [c.d.o.d.i.c.DefaultMavenCoordinates.main] ::     DataStax Java driver for Apache Cassandra(R) (com.datastax.oss:java-driver-core) version 4.15.0
2023-05-09 15:18:39,215 [INFO] [c.d.o.d.i.c.t.Clock.JanusGraph Session-admin-0] ::   Using native clock for microsecond precision
2023-05-09 15:18:39,468 [WARN] [c.d.o.d.i.c.l.h.OptionalLocalDcHelper.JanusGraph Session-admin-0] ::     [JanusGraph Session|default] You specified datacenter1 as the local DC, but some contact points are from a different DC: Node(endPoint=localhost/[0:0:0:0:0:0:0:1]:9042, hostId=null, hashCode=60cfcdc3)=null; please provide the correct local DC, or check your contact points
2023-05-09 15:18:39,691 [INFO] [o.j.g.i.UniqueInstanceIdRetriever.main] ::   Generated unique-instance-id=c0a8563c21376-rmt-lap-win201
2023-05-09 15:18:39,707 [INFO] [c.d.o.d.i.c.ContactPoints.main] ::   Contact point localhost:9042 resolves to multiple addresses, will use them all ([localhost/127.0.0.1, localhost/0:0:0:0:0:0:0:1])
2023-05-09 15:18:39,732 [INFO] [c.d.o.d.i.c.t.Clock.JanusGraph Session-admin-0] ::   Using native clock for microsecond precision
2023-05-09 15:18:39,766 [WARN] [c.d.o.d.i.c.l.h.OptionalLocalDcHelper.JanusGraph Session-admin-0] ::     [JanusGraph Session|default] You specified datacenter1 as the local DC, but some contact points are from a different DC: Node(endPoint=localhost/127.0.0.1:9042, hostId=null, hashCode=3d0c7306)=null; please provide the correct local DC, or check your contact points
2023-05-09 15:18:39,786 [INFO] [o.j.d.c.ExecutorServiceBuilder.main] ::  Initiated fixed thread pool of size 40
2023-05-09 15:18:39,897 [INFO] [o.j.g.d.StandardJanusGraph.main] ::  Gremlin script evaluation is disabled
2023-05-09 15:18:39,919 [INFO] [o.j.d.l.k.KCVSLog.main] ::   Loaded unidentified ReadMarker start time 2023-05-09T20:18:39.919098Z into org.janusgraph.diskstorage.log.kcvs.KCVSLog$MessagePuller@f1d0004
2023-05-09 15:18:39,971 [WARN] [o.j.g.t.StandardJanusGraphTx.main] ::    Query requires iterating over all vertices [[]]. For better performance, use indexes
2023-05-09 15:18:41,311 [INFO] [Main.main] ::    g.V().count().next():  69999
2023-05-09 15:18:41,314 [WARN] [o.j.g.t.StandardJanusGraphTx.main] ::    Query requires iterating over all vertices [[]]. For better performance, use indexes

The index seems to exist, because trying to create will cause IllegalArgumentException: An index with name '_id' has already been defined.
I don't delete the index in-between runs, so there should be more than enough time for JanusGraph to create the index, especially if I let the container run overnight.

...

2023-05-09 15:21:38,946 [INFO] [o.j.d.c.ExecutorServiceBuilder.main] ::  Initiated fixed thread pool of size 40
2023-05-09 15:21:39,053 [INFO] [o.j.g.d.StandardJanusGraph.main] ::  Gremlin script evaluation is disabled
2023-05-09 15:21:39,079 [INFO] [o.j.d.l.k.KCVSLog.main] ::   Loaded unidentified ReadMarker start time 2023-05-09T20:21:39.079040Z into org.janusgraph.diskstorage.log.kcvs.KCVSLog$MessagePuller@f1d0004
Exception in thread "main" java.lang.IllegalArgumentException: An index with name '_id' has already been defined
    at com.google.common.base.Preconditions.checkArgument(Preconditions.java:220)
    at org.janusgraph.graphdb.database.management.ManagementSystem.checkIndexName(ManagementSystem.java:661)
    at org.janusgraph.graphdb.database.management.ManagementSystem.createCompositeIndex(ManagementSystem.java:728)
    at org.janusgraph.graphdb.database.management.ManagementSystem.access$300(ManagementSystem.java:130)
    at org.janusgraph.graphdb.database.management.ManagementSystem$IndexBuilder.buildCompositeIndex(ManagementSystem.java:824)
    at Test3.main(Test3.java:24)

I've done this before with OrientDB and thought I understood how to work with Tinkerpop3 + Gremlin-related graph-databases then. But as I'm transitioning to JanusGraph from OrientDB they don't seem to have the same process of using indexes after they've been made.

        OrientGraph orientGraph = OrientGraph.open(configuration);
        if (!orientGraph.getVertexIndexedKeys("V").contains("_id"))
            orientGraph.createVertexIndex("_id", "V", configuration);
        orientGraph.commit();

Update

Update 1

JanusGraph never seems to make the index requested; same thing as before but now:

...

2023-05-11 07:49:16,383 [INFO] [o.j.d.l.k.KCVSLog.main] ::   Loaded unidentified ReadMarker start time 2023-05-11T12:49:16.382971Z into org.janusgraph.diskstorage.log.kcvs.KCVSLog$MessagePuller@6ad16c5d
2023-05-11 07:49:16,512 [INFO] [o.j.g.d.m.GraphIndexStatusWatcher.main] ::   Some key(s) on index _id do not currently have status(es) [REGISTERED]: _id=ENABLED

...

2023-05-11 07:50:16,913 [INFO] [o.j.g.d.m.GraphIndexStatusWatcher.main] ::   Timed out (PT1M) while waiting for index _id to converge on status(es) [REGISTERED]
2023-05-11 07:50:16,998 [WARN] [o.j.g.t.StandardJanusGraphTx.main] ::    Query requires iterating over all vertices [[]]. For better performance, use indexes
2023-05-11 07:50:17,020 [INFO] [Main.main] ::    g.V().count().next():  0
2023-05-11 07:50:17,025 [WARN] [o.j.g.t.StandardJanusGraphTx.main] ::    Query requires iterating over all vertices [[]]. For better performance, use indexes
2023-05-11 07:50:17,032 [WARN] [o.j.g.t.StandardJanusGraphTx.main] ::    Query requires iterating over all vertices [[]]. For better performance, use indexes
2023-05-11 07:50:17,038 [INFO] [Main.main] ::    g.V().count().next():  0

Process finished with exit code 0

Awaiting now for JanusGraph to create index by casting JanusGraph.openManagement() from JanusGraphManagement to ManagementSystem.

I've already tried variations of janusGraphManagement.rollback() even though this is a brand-new container with no data. Here's the log output to show what messages I get:

...

2023-05-11 08:16:16,302 [INFO] [o.j.d.l.k.KCVSLog.main] ::   Loaded unidentified ReadMarker start time 2023-05-11T13:16:16.302033Z into org.janusgraph.diskstorage.log.kcvs.KCVSLog$MessagePuller@f1d0004
Exception in thread "main" java.lang.IllegalStateException: Operation cannot be executed because the enclosing transaction is closed
    at org.janusgraph.graphdb.transaction.StandardJanusGraphTx.verifyOpen(StandardJanusGraphTx.java:369)
    at org.janusgraph.graphdb.transaction.StandardJanusGraphTx.getRelationType(StandardJanusGraphTx.java:1080)
    at org.janusgraph.graphdb.transaction.StandardJanusGraphTx.getOrCreatePropertyKey(StandardJanusGraphTx.java:1142)
    at org.janusgraph.graphdb.database.management.ManagementSystem.getOrCreatePropertyKey(ManagementSystem.java:1478)
    at Test7.main(Test7.java:20)
...

2023-05-11 08:18:50,029 [INFO] [o.j.d.l.k.KCVSLog.main] ::   Loaded unidentified ReadMarker start time 2023-05-11T13:18:50.029694Z into org.janusgraph.diskstorage.log.kcvs.KCVSLog$MessagePuller@f1d0004
Exception in thread "main" java.lang.IllegalStateException: This management system instance has been closed
    at com.google.common.base.Preconditions.checkState(Preconditions.java:502)
    at org.janusgraph.graphdb.database.management.ManagementSystem.ensureOpen(ManagementSystem.java:228)
    at org.janusgraph.graphdb.database.management.ManagementSystem.commit(ManagementSystem.java:233)
    at Test7.main(Test7.java:17)

Code

import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.apache.tinkerpop.gremlin.structure.Vertex;
import org.janusgraph.core.JanusGraph;
import org.janusgraph.core.JanusGraphFactory;
import org.janusgraph.core.PropertyKey;
import org.janusgraph.graphdb.database.management.ManagementSystem;

public class Test7 {
    private static final Logger logger = LogManager.getLogger(Main.class);

    public static void main(String[] args) throws InterruptedException {
        JanusGraph janusGraph = JanusGraphFactory.build().set("storage.backend", "cql").set("storage.hostname", "localhost:19042").open();
        ManagementSystem janusGraphManagement = (ManagementSystem) janusGraph.openManagement();
        janusGraphManagement.rollback();
        janusGraphManagement = (ManagementSystem) janusGraph.openManagement();
        PropertyKey propertyKey = janusGraphManagement.getOrCreatePropertyKey("_id");
        if (!janusGraphManagement.containsGraphIndex("_id"))
            janusGraphManagement.buildIndex("_id", Vertex.class).addKey(propertyKey).buildCompositeIndex();
        janusGraphManagement.commit();
        ManagementSystem.awaitGraphIndexStatus(janusGraph, "_id").call();
//        janusGraphManagement.updateIndex(janusGraphManagement.getGraphIndex("_id"), SchemaAction.REINDEX);
        GraphTraversalSource g = janusGraph.traversal();
        logger.info("g.V().count().next():\t" + g.V().count().next());
        g.V().drop().iterate();
        logger.info("g.V().count().next():\t" + g.V().count().next());
        janusGraph.close();
    }
}
2

There are 2 best solutions below

0
Zach On BEST ANSWER

Short answer: JanusGraph wants it SchemaStatus.ENABLED.
Which I can get ENABLED, but the issue still persists for Cassandra.

[INFO] [o.j.g.d.m.GraphIndexStatusWatcher.main] ::   Some key(s) on index _id do not currently have status(es) [REGISTERED]: _id=ENABLED

Logs

...

2023-05-15 09:58:05,984 [INFO] [o.j.d.l.k.KCVSLog.main] ::   Loaded unidentified ReadMarker start time 2023-05-15T14:58:05.984686Z into org.janusgraph.diskstorage.log.kcvs.KCVSLog$MessagePuller@48b4a043
2023-05-15 09:58:06,047 [WARN] [o.j.g.t.StandardJanusGraphTx.main] ::    Query requires iterating over all vertices [[]]. For better performance, use indexes
2023-05-15 09:58:06,270 [WARN] [o.j.g.t.StandardJanusGraphTx.main] ::    Query requires iterating over all vertices [[~label = entity]]. For better performance, use indexes
2023-05-15 09:58:06,275 [INFO] [Main.main] ::    drop g.V().hasLabel("entity").count().next():  0
2023-05-15 09:58:07,020 [WARN] [o.j.g.t.StandardJanusGraphTx.main] ::    Query requires iterating over all vertices [[~label = entity]]. For better performance, use indexes
2023-05-15 09:58:07,033 [INFO] [Main.main] ::    addVertex g.V().hasLabel("entity").count().next(): 1
2023-05-15 09:58:07,046 [INFO] [o.j.g.d.m.GraphIndexStatusWatcher.main] ::   Some key(s) on index _id do not currently have status(es) [REGISTERED]: _id=ENABLED

...

[o.j.g.d.m.GraphIndexStatusWatcher.main] ::  Some key(s) on index _id do not currently have status(es) [REGISTERED]: _id=ENABLED
2023-05-15 09:59:07,520 [INFO] [o.j.g.d.m.GraphIndexStatusWatcher.main] ::   Timed out (PT1M) while waiting for index _id to converge on status(es) [REGISTERED]
2023-05-15 09:59:07,522 [WARN] [o.j.g.t.StandardJanusGraphTx.main] ::    Query requires iterating over all vertices [[~label = entity]]. For better performance, use indexes
2023-05-15 09:59:07,531 [INFO] [Main.main] ::    awaitGraphIndexStatus g.V().hasLabel("entity").count().next(): 1
2023-05-15 09:59:07,627 [INFO] [o.j.g.o.j.IndexRepairJob.Thread-68] ::   Index _id metrics: success-tx: 2 doc-updates: 0 succeeded: 1

...

2023-05-15 09:59:08,297 [INFO] [o.j.g.d.m.ManagementSystem.Thread-51] ::     Index update job successful for [_id]
2023-05-15 09:59:08,299 [WARN] [o.j.g.t.StandardJanusGraphTx.main] ::    Query requires iterating over all vertices [[~label = entity]]. For better performance, use indexes
2023-05-15 09:59:08,304 [INFO] [Main.main] ::    updateIndex g.V().hasLabel("entity").count().next():   1

Process finished with exit code 0

Code

public class Test {
    private static final Logger logger = LogManager.getLogger(Main.class);
    public static void main(String[] args) throws InterruptedException, ExecutionException {
        JanusGraph janusGraph = JanusGraphFactory.build().set("storage.backend", "cql").set("storage.hostname", "localhost:9042").open();
        GraphTraversalSource g = janusGraph.traversal();
        g.V().drop().iterate();
        janusGraph.tx().commit();
        janusGraph.tx().open();
        logger.info("drop g.V().hasLabel(\"entity\").count().next():\t" + g.V().hasLabel("entity").count().next());
        JanusGraphVertex vertex = janusGraph.addVertex("entity");
        vertex.property("_id", "Test1");
        janusGraph.tx().commit();
        janusGraph.tx().open();
        logger.info("addVertex g.V().hasLabel(\"entity\").count().next():\t" + g.V().hasLabel("entity").count().next());
        JanusGraphManagement janusGraphManagement = janusGraph.openManagement();
        PropertyKey propertyKey = janusGraphManagement.getOrCreatePropertyKey("_id");
        if (!janusGraphManagement.containsGraphIndex("_id"))
            janusGraphManagement.buildIndex("_id", Vertex.class).addKey(propertyKey).buildCompositeIndex();
        janusGraphManagement.commit();
        janusGraphManagement = janusGraph.openManagement();
        GraphIndexStatusReport report = ManagementSystem.awaitGraphIndexStatus(janusGraph, "_id").call();
        logger.info("awaitGraphIndexStatus g.V().hasLabel(\"entity\").count().next():\t" + g.V().hasLabel("entity").count().next());
        JanusGraphIndex test = janusGraphManagement.getGraphIndex("_id");
        janusGraphManagement.updateIndex(test, SchemaAction.REINDEX).get();
        janusGraphManagement.commit();
        logger.info("updateIndex g.V().hasLabel(\"entity\").count().next():\t" + g.V().hasLabel("entity").count().next());
        janusGraph.close();
    }
}
5
Kelvin Lawrence On

After creating the index you need to wait for it to become available before queries will take advantage of it. This is described here

We can test this using a simple inmemory JanusGraph instance and the Gremlin Console.

// New graph
gremlin> graph = JanusGraphFactory.open('inmemory')
==>standardjanusgraph[inmemory:[127.0.0.1]]

gremlin> g = graph.traversal()
==>graphtraversalsource[standardjanusgraph[inmemory:[127.0.0.1]], standard]

// Define our schema - just one vertex key for now
gremlin> mgmt = graph.openManagement()
==>org.janusgraph.graphdb.database.management.ManagementSystem@133aacbe

gremlin> name = mgmt.makePropertyKey('name').dataType(String.class).cardinality(Cardinality.SINGLE).make()
==>name

gremlin> mgmt.commit()
==>null

// This is a key step, always make sure no other transactions are in flight
gremlin> graph.tx().rollback()
==>null

// Create the index
gremlin> mgmt = graph.openManagement()
==>org.janusgraph.graphdb.database.management.ManagementSystem@65f3e805

gremlin> name = mgmt.getPropertyKey('name')
==>name

gremlin> mgmt.buildIndex('byNameComposite', Vertex.class).addKey(name).buildCompositeIndex()
==>byNameComposite

gremlin> mgmt.commit()
==>null

gremlin> mgmt.awaitGraphIndexStatus(graph, 'byNameComposite').call()
==>GraphIndexStatusReport[success=true, indexName='byNameComposite', targetStatus=[REGISTERED], notConverged={}, converged={name=REGISTERED}, elapsed=PT0.002S]

// Add a vertex to the graph
gremlin> g.addV('Dog').property('name','Baxter')
==>v[4344]

// Note this still does not use the index as we have not re-indexed yet
gremlin> g.V().has('name','Baxter')
17:50:25 WARN  org.janusgraph.graphdb.transaction.StandardJanusGraphTx  - Query requires iterating over all vertices [(name = Baxter)]. For better performance, use indexes
==>v[4344]

// Re-index
gremlin>  mgmt = graph.openManagement()
==>org.janusgraph.graphdb.database.management.ManagementSystem@7af1d072

gremlin> mgmt.updateIndex(mgmt.getGraphIndex("byNameComposite"), SchemaAction.REINDEX).get()
==>org.janusgraph.diskstorage.keycolumnvalue.scan.StandardScanMetrics@53d30d23

gremlin> mgmt.commit()
==>null

gremlin> mgmt.awaitGraphIndexStatus(graph, 'byNameComposite').status(SchemaStatus.ENABLED).call()
==>GraphIndexStatusReport[success=true, indexName='byNameComposite', targetStatus=[ENABLED], notConverged={}, converged={name=ENABLED}, elapsed=PT0.001S]

// Now it works!
gremlin> g.V().has('name','Baxter')
==>v[4344]