I am unsure on how to use Python to add graphs to JanusGraph

I am having a difficult time using gremlin-python. I am really unsure how to create a graph, create vertices and edges, and then upload it to the database. Could someone provide a guide on how to do these things? I followed the JanusGraph tutorial as well as the TinkerPop tutorials on how to use gremlin-python, but nothing seems to be working for me.
Solution
spmallette (4mo ago)
The key thing to understand about gremlin-python is that you can only use it to query or mutate a graph with the Gremlin language, and that graph must be hosted in Gremlin Server (or in something compliant with its protocol, like Amazon Neptune). Other functions, like "create a graph" or provider-specific features such as creating indices, are not possible with the Gremlin language and therefore not possible with gremlin-python (or any other non-JVM programming language that supports Gremlin).

If you are wholly new to TinkerPop and JanusGraph (and maybe graph databases themselves), my recommendation would be to not start with JanusGraph. It's natural to want to dive right into the graph you want to use, but I think it's better to take a slower approach. I think the learning process is like:

1. Do you know the basics of Gremlin? If not, just try to load a subset of your data into a TinkerGraph using Gremlin Console. Run some queries over your data and just have an easy success doing that.
2. Do you know Gremlin Server? If not, install it locally and connect with Gremlin Console to the default TinkerGraph. Maybe configure that TinkerGraph to load the data you established in step (1) above.
3. Do you know gremlin-python? If not, make sure you can easily connect to what you set up in step (2) above. Write some queries, load some more data, etc.

If you can work all the way through step 3 comfortably, then you are likely ready for JanusGraph. I would again suggest starting slowly there and focusing on BerkeleyDB as your backend (unless you are already an expert at Cassandra/HBase). Get that running inside of JanusGraph Server, which is just a specially packaged Gremlin Server, hence step (2). Along the way, folks here are happy to try to help you, but I think we'll need some more specifics about what you've tried so far and where you feel you're stuck.
b4lls4ck (4mo ago)
Hi @spmallette, thank you for this detailed answer, I appreciate it. So basically, you cannot create a graph using gremlin-python; you can only use gremlin-python to read/write an already existing graph? So far, in terms of how I've been learning this, I have been able to complete point 1, where I created a graph in the Gremlin Console as follows:
gremlin> graph = TinkerGraph.open()
==>tinkergraph[vertices:0 edges:0]
gremlin> g = traversal().withEmbedded(graph)
==>graphtraversalsource[tinkergraph[vertices:0 edges:0], standard]
gremlin> v1 = graph.addVertex(label, "person", "name", "John")
==>v[0]
gremlin> v2 = graph.addVertex(label, "person", "name", "Mary")
==>v[2]
gremlin> v1.addEdge("knows", v2)
==>e[4][0-"knows"->2]
However, I am not sure if this is what you meant in point 1, as I am not really loading any of my own data from a CSV or any external file, just hardcoding it. I am not familiar with how to load data using Gremlin Console and I couldn't find anything on this topic, so I have just created this graph with hardcoded data.

For point 2, I am using Docker to run the Gremlin Server (I am on Windows, by the way). The Gremlin Console I ran in point 1 would not run unless the Docker container running the Gremlin Server was running, so I think I have completed this part, as the Gremlin Console runs only if the Gremlin Server is running (so they must be connected).

For point 3, I have researched extensively how to use gremlin-python, but I have not been able to successfully query or mutate an existing graph with it. I really am not sure how to access the graph that I made in point 1, and I'm finding the documentation for gremlin-python difficult to understand.

I'll take a look into BerkeleyDB. Is it a graph database that's free to use as well? The main reason I wanted to use JanusGraph is that it is free and I need to create a knowledge graph in Python. Thanks for your time!
spmallette (4mo ago)
Correct: gremlin-python, gremlin-go, gremlin-javascript, etc. (i.e. any non-JVM language) cannot create a graph instance. Note further that your example for Gremlin Console will not work for gremlin-python, because you are not using the Gremlin language there to add vertices and edges. You are using what we call the Graph API, which is something we don't recommend folks use. You should only use Gremlin when interacting with the graph. So you would instead refer to "g" and do g.addV('person').property('name','mary') to create your vertex. Note that in Python we like you to feel comfortable with Pythonic syntax, so addV() becomes add_v(), but other than that it's basically the same.

As for BerkeleyDB, it is free, but it's not a graph database. JanusGraph simply offers it as a backend within which it stores your data. With JanusGraph you must choose your backend. I think most folks ultimately use Cassandra, but that's a whole additional layer of complexity and learning if you're not familiar with it. So, as the usual progression, start with BerkeleyDB until you figure out JanusGraph: https://docs.janusgraph.org/storage-backend/bdb/
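For the earlier question about loading data from a file instead of hardcoding it, a minimal Python sketch could look like the following. It uses only traversal steps (add_v, property) rather than the Graph API. The function names load_people and connect, the CSV layout with a "name" column, and the default URL are assumptions made for illustration, not part of anyone's setup in this thread.

```python
import csv
import io


def load_people(g, csv_text):
    """Add one 'person' vertex per CSV row and return how many rows were sent.

    `g` is a traversal source exposing add_v(); the 'name' column is an
    assumption made for this example.
    """
    sent = 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        # iterate() executes the traversal on the server and discards the result.
        g.add_v("person").property("name", row["name"]).iterate()
        sent += 1
    return sent


def connect(url="ws://localhost:8182/gremlin"):
    """Open a remote traversal source (requires the gremlin-python package)."""
    # Imported lazily so load_people() can be used without the driver installed.
    from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
    from gremlin_python.process.anonymous_traversal import traversal

    rc = DriverRemoteConnection(url, "g")
    return rc, traversal().with_remote(rc)
```

A caller would obtain rc and g from connect(), pass g and the file's contents to load_people(), and call rc.close() when done. Sending one traversal per row keeps the sketch simple; it is not tuned for large files.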
b4lls4ck (4mo ago)
I see, okay, so in Gremlin Console I should do
g.addV('person').property('name','mary')
and then in Python / gremlin-python, I should do
g.add_v('person').property('name','mary')
But I am confused as to how to set up my Python script (what should g be? How is g initialized? What should I import?). Where could I find more detailed documentation on gremlin-python and its methods? I think I may be looking in the wrong places. Is Cassandra something that can be quickly picked up and installed? If I choose to use BerkeleyDB in the meantime as my backend, is there Python support for BerkeleyDB, or is it exclusively for JVM languages?
spmallette (4mo ago)
You can find the gremlin-python docs here: https://tinkerpop.apache.org/docs/current/reference/#gremlin-python They discuss how to create "g", common imports, etc. They detail the differences between archetypal Gremlin and the Python variant, but they are not going to teach you Gremlin; for that, refer to the rest of the documentation. Also, I'd recommend this free online book to help with learning the Gremlin language: https://kelvinlawrence.net/book/Gremlin-Graph-Guide.html
b4lls4ck (4mo ago)
Hmm okay, so following the gremlin-python docs in the first link, I wrote this in Python:
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.process.anonymous_traversal import traversal

g = traversal().with_remote(DriverRemoteConnection("ws://localhost:8182/gremlin", "g"))

v1 = g.addV("person").property("name", "John")
print(v1)
printing v1 outputs the following:
[['addV', 'person'], ['property','name','John']]
Is this the intended behaviour? I am assuming that if I have the Cassandra backend set up with JanusGraph, I'll then be able to view this vertex in the Cassandra DB.
spmallette (4mo ago)
You are missing something important: a Gremlin traversal must be iterated for it to do something. https://tinkerpop.apache.org/docs/current/tutorials/the-gremlin-console/#result-iteration You need a terminal step to make that happen. In your case you would add next() to take the first item from the traversal (i.e. the new vertex). Most of the time you will use to_list(). You don't usually do this in Gremlin Console because it automatically iterates results for you.
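As a rough illustration of the point about terminal steps, the sketch below wraps three traversals, each ending in a different terminal step. Without one, the traversal is only a description of steps and nothing is sent to the server, which is why printing it shows a list of step names. The helper names add_person, person_names, and add_knows are hypothetical; the steps themselves (add_v, has_label, values, add_e, to) are standard gremlin-python steps.

```python
def add_person(g, name):
    """Create a person vertex and return it.

    next() submits the traversal and returns its first result.
    """
    return g.add_v("person").property("name", name).next()


def person_names(g):
    """Return the names of all person vertices.

    to_list() submits the traversal and collects every result.
    """
    return g.V().has_label("person").values("name").to_list()


def add_knows(g, v1, v2):
    """Add a 'knows' edge from v1 to v2.

    iterate() submits the traversal purely for its side effect.
    """
    g.V(v1).add_e("knows").to(v2).iterate()
```

Here g is assumed to be a traversal source already bound to a server with traversal().with_remote(...).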
b4lls4ck (4mo ago)
Hmmm okay, so if I understood correctly, I need to do
v1 = g.addV("person").property("name", "John")
print(v1).next()
or
v1 = g.addV("person").property("name", "John")
print(v1).to_list()
I tried both of these out separately, but both times, on this exact line, I would get an error saying
OSError: [WinError 87] The parameter is incorrect
spmallette (4mo ago)
Regarding your question on Cassandra: technically your data is in Cassandra, but I think it's encoded, so you won't really be able to see it directly.
It's like g.V()....next(): the termination step goes on the traversal itself to tell it how to iterate.
b4lls4ck (4mo ago)
Oh yes, I actually typed it wrong here on Discord. I meant to write
v1 = g.addV("person").property("name", "John").next()
print(v1)
Which is what I had in my code, and it would yield the following error:
KeyError: <DataType.custom: 0>
Error on reading from the event loop self pipe
loop: <ProactorEventLoop running=True closed=False debug=False>
....
OSError: [WinError 87] The parameter is incorrect
I am not too sure why this is occurring. I have seen it a couple of times.
spmallette (4mo ago)
Is that connecting to JanusGraph, or are you still working with Gremlin Server and TinkerGraph? Also, there are some Python examples here: https://github.com/apache/tinkerpop/tree/master/gremlin-python/src/main/python/examples
b4lls4ck (4mo ago)
I am using Docker to run JanusGraph, and I ran a command that is similar to the first command listed on this page: https://docs.janusgraph.org/getting-started/installation/ The exact command I run is:
docker run -it -p 8182:8182 --name janusgraph-default janusgraph/janusgraph:latest
which is similar to that first command on that page. I needed to modify it because i need to map port 8182 of the JanusGraph container to port 8182 of my machine (the host machine). Without the "-it -p 8182:8182" part I would get the following error when running the python script:
aiohttp.client_exceptions.ClientConnectorError: Cannot connect to host localhost:8182 ssl:default [The remote computer refused the network connection]
Unclosed client session
client_session: <aiohttp.client.ClientSession object at 0x000001FFAA7B2210>
But since I am mapping the port of the container to the port of my machine, I am no longer getting that error, and am now facing the new error mentioned earlier. So I would say that my python script is connecting to JanusGraph as opposed to the Gremlin Server. Thanks for sharing the Python examples, they look very helpful.

Is there any way of starting a JanusGraph server in Windows? I cannot run the /bin/janusgraph-server.sh script as I am on Windows, and there is no /bin/janusgraph-server.bat file included in the JanusGraph distribution I downloaded (1.0.0). This is the very reason why I am using Docker instead.
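The "Cannot connect to host localhost:8182" error described above is what an unpublished container port looks like from the Python side. A small standard-library check, sketched below, can confirm that something is listening on the port before the Gremlin driver is involved. The function name port_open is hypothetical, and the check only proves that a TCP connection is accepted, not that the server behind it started correctly.

```python
import socket


def port_open(host="localhost", port=8182, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable)
        # when nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If port_open() returns False while the container is running, the port mapping (-p 8182:8182) is the first thing to check.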
spmallette (4mo ago)
> So I would say that my python script is connecting to JanusGraph as opposed to the Gremlin Server
Just to get the wording right for clarity: you are running JanusGraph Server in that Docker container, and therefore you are connecting to a Gremlin Server hosting a JanusGraph instance.
> Is there any way of starting a JanusGraph server in Windows?
Maybe someone from @janusgraph can provide some more details on Windows. As for your other question on the KeyError you mentioned here (https://discord.com/channels/838910279550238720/1260358985974812743/1260401279520473108): I assume the KeyError has something to do with serialization. Perhaps you need this add-on: https://github.com/JanusGraph/janusgraph-python
b4lls4ck (4mo ago)
Yes exactly, I am running a JanusGraph Server in that Docker container. Now, in the JanusGraph 1.0.0 distribution there is a gremlin-server.bat file, which I can run as I am on Windows. Upon running it, it says it has connected to port 8182. I run my Python script afterwards
g = traversal().with_remote(DriverRemoteConnection('ws://localhost:8182/gremlin','g'))
v1=g.add_v("person").property("name", "John").next()
and I get an error outputted by the Python script saying
gremlin_python.driver.protocol.GremlinServerError: 499: The traversal source [g] for alias [g] is not configured on the server. bin/gremlin-server.sh conf/gremlin-server/gremlin-server-configuration.yaml
.
.
.
OSError: [WinError 87] The parameter is incorrect
Now I will try out the add-on above involving serialization, but do you think this different error could be due to another reason?
Florian Hockmann
> The traversal source [g] for alias [g] is not configured on the server
This means that an error occurred while starting the server. You need to inspect the server logs to investigate further. While it is in general possible to run JanusGraph on Windows, it is not officially supported and is not tested during the development process. So I'd advise against doing that and suggest using the Docker container instead.
b4lls4ck (4mo ago)
Hi @Florian Hockmann, thank you for letting me know that JanusGraph doesn't have official Windows support. I have used the Docker container, and in the message I am replying to, I go into detail about the error I get when using the Docker container. Are there any suggestions or ways I could address this issue?
Florian Hockmann
What's the output of docker logs [container-id-of-the-janusgraph-container]?
b4lls4ck (4mo ago)
It's quite a lot to send here, but the output of my docker logs is equivalent to what gets printed when starting the JanusGraph container itself. My Python script has been modified to look like this:
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.process.anonymous_traversal import traversal
from janusgraph_python.driver.serializer import JanusGraphSONSerializersV3d0

g = traversal().with_remote(DriverRemoteConnection('ws://localhost:8182/gremlin','g',message_serializer=JanusGraphSONSerializersV3d0()))

v1 = g.add_v("person").property("name", "John").next()
print(v1)
v2 = g.add_v("person").property("name", "Mary").next()
print(v2)
e = g.V(v1).add_e("knows").to(v2).iterate()
print(g.V().has("person", "name", "John").out("knows").values("name").to_list())
The output is shown below. I still get a similar error, but I am now able to successfully add vertices using the add_v method, create an edge between the two vertices using the add_e method, and print out the name of the person that John knows (Mary):
v[4184]
v[8296]
['Mary']
Error on reading from the event loop self pipe
loop: <ProactorEventLoop running=True closed=False debug=False>
Traceback (most recent call last):
File "C:\Python312\Lib\asyncio\proactor_events.py", line 802, in _loop_self_reading
.
.
.
OSError: [WinError 87] The parameter is incorrect
Error on reading from the event loop self pipe
loop: <ProactorEventLoop running=True closed=False debug=False>
It's just that I still get the OSError WinError 87.
spmallette (4mo ago)
A minor point, but this line:
e = g.V(v1).add_e("knows").to(v2).iterate()
should probably just be:
g.V(v1).add_e("knows").to(v2).iterate()
The iterate() termination step doesn't really have a return value. I suppose that technically it returns the current Traversal object, which would already be iterated to its end and no longer serve a purpose, but that's it. Other than that, it looks like your code is working except for that error. Maybe you could try closing some resources before exiting?
...

rc = DriverRemoteConnection('ws://localhost:8182/gremlin','g',message_serializer=JanusGraphSONSerializersV3d0())
g = traversal().with_remote(rc)

...

rc.close()
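One way to make sure close() always runs, even when a query raises, is to wrap the connection in contextlib.closing from the standard library. The sketch below is only an illustration of that idea: with_connection and open_janusgraph are hypothetical helper names, and the URL and serializer simply mirror the script earlier in the thread.

```python
from contextlib import closing


def with_connection(open_connection, work):
    """Call work(rc) and always close rc afterwards, even if work raises."""
    with closing(open_connection()) as rc:
        return work(rc)


def open_janusgraph(url="ws://localhost:8182/gremlin"):
    """Assumed factory using the same settings as the script in this thread."""
    # Imported lazily; both packages must be installed to call this function.
    from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
    from janusgraph_python.driver.serializer import JanusGraphSONSerializersV3d0

    return DriverRemoteConnection(
        url, "g", message_serializer=JanusGraphSONSerializersV3d0()
    )
```

A caller would pass open_janusgraph together with a function that builds g = traversal().with_remote(rc) and runs its queries. A plain try/finally around rc.close() achieves the same thing.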
b4lls4ck (4mo ago)
Wow, that fixed it! Thank you so much! I really appreciate your help and your super detailed explanations, thank you!!!!! I will post any other questions I have in a new thread.