Apache TinkerPop
Created by qfel on 6/15/2024 in #questions
Setting index in gremlin-python
I was trying to build the vote graph from the tutorial on loading data, using gremlin-python. As far as I know, you can't simply add an index from non-JVM languages because, for example, there is no TinkerGraph that you could .open(). I don't know how much an index on 'userId' would improve performance, but my code simply takes too long to go through the queries from the vote file. I tried using the client functionality
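from gremlin_python.driver.client import Client
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.process.anonymous_traversal import traversal

ws_url = 'ws://localhost:8182/gremlin'

# Create index on userId
client = Client(ws_url, 'g')
client.submit('graph = TinkerGraph.open()')
client.submit("graph.createIndex('userId', Vertex.class)")
client.close()

conn = DriverRemoteConnection(ws_url, 'g')
g = traversal().with_remote(conn)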
to do it from a string query, but I'm not sure whether with_remote(conn) uses the previously assigned graph. Please let me know how to do this correctly; I'm not sure how to assign to g from client.submit(...). Additionally, how does one speed up those queries if setting an index won't do it? In my implementation
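from gremlin_python.process.graph_traversal import GraphTraversalSource, __

def idToNode(g: GraphTraversalSource, id: str):
    # Return the 'user' vertex with this userId, creating it if it is missing
    return g.V().has('user', 'userId', id) \
        .fold() \
        .coalesce(__.unfold(),
                  __.add_v('user').property('userId', id)) \
        .next()

def loadVotes():
    with open("/tmp/wiki-Vote.txt", "r") as file:
        # Skip the first four lines of the file
        for _ in range(4):
            next(file)

        for line in file:
            # strip() drops the trailing newline, which would otherwise end up in ids[1]
            ids = line.strip().split('\t')
            from_node = idToNode(g, ids[0])
            to_node = idToNode(g, ids[1])
            g.add_e('votesFor').from_(from_node).to(to_node).iterate()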
the call to idToNode for each line takes too long.
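For illustration, here is one possible way to reduce the number of idToNode round trips. It is a sketch under assumptions, not a confirmed fix. The loop above looks up both users on every line, even though the same user ids recur across many votes. Keeping a client-side dict from user id to vertex means each user is fetched or created only once. The names parse_votes, load_votes, get_or_create and add_edge are hypothetical and not part of gremlin-python. The graph calls are passed in as plain callables so the caching logic can be tried without a server.

```python
from typing import Callable, Dict, Iterable, Iterator, Tuple


def parse_votes(lines: Iterable[str], header_lines: int = 4) -> Iterator[Tuple[str, str]]:
    """Yield (from_id, to_id) pairs, skipping the leading header lines."""
    for i, line in enumerate(lines):
        if i < header_lines:
            continue
        parts = line.split()  # splits on tabs and drops the trailing newline
        if len(parts) == 2:
            yield parts[0], parts[1]


def load_votes(
    pairs: Iterable[Tuple[str, str]],
    get_or_create: Callable[[str], object],
    add_edge: Callable[[object, object], None],
) -> int:
    """Add one edge per pair, calling get_or_create at most once per user id."""
    cache: Dict[str, object] = {}

    def lookup(user_id: str) -> object:
        # Only the first sighting of a user id reaches the graph
        if user_id not in cache:
            cache[user_id] = get_or_create(user_id)
        return cache[user_id]

    count = 0
    for from_id, to_id in pairs:
        add_edge(lookup(from_id), lookup(to_id))
        count += 1
    return count
```

With the code from the question, get_or_create could be `lambda uid: idToNode(g, uid)` and add_edge could be `lambda a, b: g.add_e('votesFor').from_(a).to(b).iterate()`. This still sends one request per edge, so it reduces the lookups but does not remove the per-edge network cost.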
54 replies
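On the index question, a minimal sketch, assuming a Gremlin Server whose configuration already binds its TinkerGraph to the script variable `graph` and a traversal source `g` on that same graph (this is what the sample configurations shipped with the server typically do, but it should be checked against the actual setup). Under that assumption the script can call createIndex on the existing `graph` instead of opening a new TinkerGraph, which would be a separate, empty graph unrelated to `g`. The helper names index_script and create_vertex_index are hypothetical.

```python
def index_script(key: str, graph_var: str = "graph") -> str:
    """Build the Groovy script that creates a vertex index on `key`."""
    # Restrict both names to plain identifiers so nothing else can be injected
    if not key.isidentifier() or not graph_var.isidentifier():
        raise ValueError("key and graph_var must be plain identifiers")
    return f"{graph_var}.createIndex('{key}', Vertex.class)"


def create_vertex_index(ws_url: str, key: str) -> None:
    """Send the index script to the server and wait for it to finish."""
    # Imported here so index_script() can be used without the driver installed
    from gremlin_python.driver.client import Client

    client = Client(ws_url, "g")
    try:
        # .all().result() blocks until the server has evaluated the script
        client.submit(index_script(key)).all().result()
    finally:
        client.close()
```

Calling `create_vertex_index('ws://localhost:8182/gremlin', 'userId')` before opening the DriverRemoteConnection would then index the graph that `g` traverses, provided the assumption about the server bindings holds.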