I am unsure on how to use Python to add graphs to JanusGraph
I am having a difficult time using gremlin-python. I am really unsure as to how I can create a graph, create vertices and edges, and then upload it to the database. Could someone provide a guide on how to do these things? I followed the JanusGraph tutorial as well as the TinkerPop tutorials on how to use gremlin-python but nothing seems to be working for me.
Solution
the key bit to understand with gremlin-python is that you can only use it to query/mutate a graph with the Gremlin language and that graph must be hosted in Gremlin Server (or be compliant with its protocol, like Amazon Neptune). other functions like "create a graph" or access to provider-specific functions like creating indices are not possible with the Gremlin language and therefore not possible with gremlin-python (or other non-JVM programming languages that support Gremlin).
if you are wholly new to TinkerPop and JanusGraph (and maybe graph databases themselves), my recommendation would be to not start with JanusGraph. It's natural to want to dive right in to the graph you want to use, but I think it's better to take a slower approach.
i think the learning process is like:
1. do you know the basics of Gremlin? if not, just try to load a subset of your data into a TinkerGraph using Gremlin Console. run some queries over your data and just have an easy success doing that.
2. do you know Gremlin Server? if not, install it locally, connect with Gremlin Console to the default TinkerGraph. maybe configure that TinkerGraph to load the data you established in step (1) above.
3. do you know gremlin-python? if not, make sure you can easily connect to what you set up in step (2) above. write some queries, load some more data, etc.
If you can work all the way through step 3 comfortably then you are likely ready for JanusGraph. I would again suggest starting slowly there and focusing on using BerkeleyDB as your backend (unless you are already an expert at Cassandra/HBase). Get that running inside of JanusGraph Server, which is just a specially packaged Gremlin Server, hence step (2).
along the way, folks here are happy to try to help you but i think we'll need some more specifics about what you've tried so far and where you feel you're stuck.
Hi @spmallette, thank you for this detailed answer, I appreciate it. So basically, you cannot create a graph using gremlin-python, you can only use gremlin-python to read/write an already existing graph?
So far in terms of how I've been learning this, I have been able to complete point 1. where I created a graph in the gremlin console as follows:
However, I am not sure if this is what you meant in point 1 as I really am not loading in any of my own data from a csv or any external file, just hardcoding it. I am not familiar with how to load in data using Gremlin console and I really couldn't find anything on this topic so I have just created this graph with hardcoded data.
for point 2. I am using Docker to run the Gremlin server, I am on Windows btw. The Gremlin console I ran in point 1 would not run unless the docker container running the Gremlin server was running, so I think I have completed this part as the Gremlin console runs only if the Gremlin server is running (so they must be connected).
for point 3. I have researched extensively how to use gremlin-python, however I have not been able to successfully query/mutate an existing graph using gremlin-python, i really am not sure how to access the graph that I made in point 1. I'm finding the documentation for gremlin-python difficult to understand.
I'll take a look into BerkeleyDB, is it a graph database that's free to use as well? The main reason why I wanted to use JanusGraph is because it is free and I need to create a knowledge graph in Python
Thanks for your time!
correct...gremlin-python, gremlin-go, gremlin-javascript, etc. (i.e. any non-JVM language) cannot create a graph instance. note further that your example for Gremlin Console will not work for gremlin-python, because you are not using the Gremlin language there to add vertices and edges. you are using what we call the Graph API, which is something we don't recommend folks use. you should only use Gremlin when interacting with the graph. so, you would instead refer to "g" and do
g.addV('person').property('name','mary')
to create your vertex. note that in python, we like you to feel comfortable with pythonic syntax so addV() becomes add_v() but other than that it's basically the same.
as for BerkeleyDB, it is free, but it's not a graph database. JanusGraph simply offers it as a backend within which it stores your data. with JanusGraph you must choose your backend. I think most folks ultimately use Cassandra but that's a whole additional layer of complexity and learning if you're not familiar with it. so the usual progression is to start with BerkeleyDB until you figure out JanusGraph
https://docs.janusgraph.org/storage-backend/bdb/
I see okay, so in gremlin console I should do
and then in python / gremlin-python, I should do
But I am confused as to how to set up my Python script (what should g be? How is g initialized, what should I import?)
Where could I find more detailed documentation on gremlin-python and its methods? I think I may be looking in the wrong places
Is Cassandra something that can be quickly picked up and installed? If I choose to use BerkeleyDB in the meantime as my backend, is there Python support for BerkeleyDB or is it exclusively for JVM languages?
you can find gremlin-python docs here: https://tinkerpop.apache.org/docs/current/reference/#gremlin-python
it discusses how to create "g", common imports, etc
it details differences between archetypal Gremlin and the python variant
but it's not going to teach you Gremlin...for that you refer to the rest of the documentation
also, i'd recommend this free online book to help with learning the Gremlin language: https://kelvinlawrence.net/book/Gremlin-Graph-Guide.html
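As a rough illustration of what that reference section covers, a minimal sketch of creating "g" from Python might look like the following. It assumes gremlin-python 3.5 or later is installed and that a Gremlin Server (or JanusGraph Server) is listening on ws://localhost:8182/gremlin with a traversal source named g. The connect helper and its default arguments are made up for this example and are not part of the library.

```python
def connect(url="ws://localhost:8182/gremlin", source="g"):
    """Return (g, connection) for a running Gremlin Server.

    `connect` is a hypothetical helper, not part of gremlin-python. The URL
    and the traversal source name "g" are assumed defaults; adjust them to
    match your own server.
    """
    # Imported inside the function so this file can still be loaded on a
    # machine where the driver is not installed.
    from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
    from gremlin_python.process.anonymous_traversal import traversal

    connection = DriverRemoteConnection(url, source)
    # g is the traversal source bound to the graph hosted on the server.
    g = traversal().with_remote(connection)
    return g, connection
```

The intended usage would be g, connection = connect(), followed by traversals on g and a call to connection.close() once the script is done.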
Hmm okay, so following the gremlin-python docs in the 1st link, I wrote this in Python:
printing v1 outputs the following:
Is this the intended behaviour? I am assuming if I have the Cassandra backend setup with the JanusGraph, I'll then be able to view this vertex in Cassandra DB
you are missing something important.
a Gremlin traversal must be iterated for it to do something.
https://tinkerpop.apache.org/docs/current/tutorials/the-gremlin-console/#result-iteration
you need a terminal step to make that happen. in your case you would add next() to take the first item from the traversal (i.e. the new vertex)
most of the time you will use to_list()
you don't usually do this in Gremlin Console because it automatically does that stuff for you
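To make the point about terminal steps concrete, here is a small sketch in Python. It assumes g is a traversal source that is already connected to a server. The function names are invented for this example, and the 'person' label and 'name' property are borrowed from the addV example earlier in the thread.

```python
def add_person(g, name):
    """Add a 'person' vertex and return it (hypothetical helper)."""
    # Without next() this line would only build a traversal object;
    # next() submits it and returns the first result, the new vertex.
    return g.add_v("person").property("name", name).next()


def all_names(g):
    """Return the 'name' value of every vertex as a Python list."""
    # to_list() submits the traversal and collects every result.
    return g.V().values("name").to_list()


def add_person_no_result(g, name):
    """Add a 'person' vertex when the result itself is not needed."""
    # iterate() submits the traversal only for its side effect, so
    # there is nothing useful to assign from it.
    g.add_v("person").property("name", name).iterate()
```

Each function ends in exactly one terminal step, and that step is chained onto the traversal itself rather than called on a variable holding an earlier result.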
hmmm Okay, so If I understood correctly, I need to do
or
I tried both of these out separately but both times on this exact line I would get an error saying
regarding your question on cassandra, technically your data is in cassandra but i think it's encoded so you won't be able to really see it directly
it's like g.V()....next()
the termination step goes on the traversal itself to tell it how to iterate
Oh yes I actually typed it wrong here on discord, I meant to write
Which is what I had in my code, and it would yield the following error:
I am not too sure why this would be occurring, I have seen it a couple of times
is that connecting to JanusGraph? or are you still working with Gremlin Server and TinkerGraph? also, some python examples here: https://github.com/apache/tinkerpop/tree/master/gremlin-python/src/main/python/examples
I am using docker to run JanusGraph and I ran a command that is similar to the first command listed on this page: https://docs.janusgraph.org/getting-started/installation/
The exact command I run is:
which is similar to that first command on that page. I needed to modify it because i need to map port 8182 of the JanusGraph container to port 8182 of my machine (the host machine). Without the "-it -p 8182:8182" part I would get the following error when running the python script:
But since I am mapping the port of the container to the port of my machine, I am no longer getting that error, and am now facing this new error mentioned earlier
So I would say that my python script is connecting to JanusGraph as opposed to the Gremlin Server
Thanks for sharing the Python examples, they look very helpful
Is there any way of starting a JanusGraph server in Windows? I cannot run the /bin/janusgraph-server.sh script as I am on Windows, and there is no /bin/janusgraph-server.bat file included in the JanusGraph distribution I downloaded (1.0.0). This is the very reason why I am using Docker instead
"So I would say that my python script is connecting to JanusGraph as opposed to the Gremlin Server"
just to get the wording right for clarity, you are running JanusGraph Server in that docker container, and therefore you are connecting to Gremlin Server hosting a JanusGraph instance.
"Is there any way of starting a JanusGraph server in Windows?"
maybe someone from @janusgraph can provide some more details on Windows and on your other question about the KeyError you mentioned here: https://discord.com/channels/838910279550238720/1260358985974812743/1260401279520473108
I assume the KeyError has something to do with serialization. perhaps you need this add-on: https://github.com/JanusGraph/janusgraph-python
Yes exactly, I am running a Janus Server in that docker container
Now in the JanusGraph 1.0.0 distribution, there is a gremlin-server.bat file which I can run as I am on Windows. Upon running it, it says it has connected to port 8182. I run my Python script afterwards
and I get an error outputted by the Python script saying
Now I will try out the add on above involving serialization, but do you think this different error could be due to another reason?
"The traversal source [g] for alias [g] is not configured on the server"
This means that an error occurred while starting the server. You need to inspect the server logs to investigate deeper.
While it's in general possible to run JanusGraph on Windows, it's not really officially supported and also not tested during the development process. So, I'd suggest against doing that and instead using the Docker container.
Hi @Florian Hockmann Thank you for letting me know that Janus doesn't have official Windows support, I have used the Docker container, and in this message I am replying to, I go into detail about the error I would get when using the Docker container, are there any suggestions or ways I could address this issue?
What's the output of docker logs [container-id-of-the-janusgraph-container]?
it's quite a lot to send here, but the output of my docker logs is equivalent to what gets printed when starting the JanusGraph container itself,
My Python script has been modified to look like this:
What is outputted is the following. I am still getting a similar error, but I am now able to successfully add vertices using the add_v method, create an edge between the two vertices using the add_e method, and print out the name of the person that John knows (Mary):
It's just that I still get the OSError WinError 87.
a minor point but this line:
should probably just be:
the iterate() termination step really doesn't have a return value. i suppose that technically it returns the current Traversal object that would already be iterated to its end and no longer serve a purpose, but that's it. other than that, it looks like your code is working.
maybe you could try closing some resources before exiting?
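The thread does not show exactly which resource ended up being closed. One plausible reading is the remote connection, and a try/finally block is a simple way to make sure it is released before the script exits. The sketch below assumes connection is the DriverRemoteConnection the script created (it has a close() method) and work is any function of yours that runs traversals. run_then_close is a hypothetical name.

```python
def run_then_close(connection, work):
    """Call work() and close the connection afterwards, even on errors.

    `run_then_close` is a hypothetical helper. `connection` is assumed to be
    the remote connection opened by the script, and `work` a zero-argument
    function that runs the traversals.
    """
    try:
        return work()
    finally:
        # Runs whether work() returned normally or raised, so the
        # connection is not left open when the interpreter shuts down.
        connection.close()
```

With a helper like this, the body of the script moves into a function that is passed as work, and the connection is closed on every exit path.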
Wow That fixed it! thank you so much! I really appreciate your help and your super detailed explanations, thank you!!!!! I will post any other questions I have in a new thread
i'm trying to follow along here and keep getting
i am starting the gremlin server locally and then within my python script creating the remote connection via:
i then try to insert a basic vertex via (where name is provided via user input) resulting in ...super confused.
i think you should check in on the output of the startup of Gremlin Server. it should show a line like:
[INFO] ServerGremlinExecutor - A GraphTraversalSource is now bound to [g] with ...
If you don't see that then you might have some kind of error in startup that is preventing your access to "g".
awesome. i’ll look there. thanks 🙏
@spmallette okay -- good thinking because i was just starting the server in the background. when i use gremlin-server.sh console i do get a few errors with java files:
AFAICT, the server does ultimately start but i do not see any context 'g' started.
that looks like a problem with java compatibility. what version of java are you using?
i'd prefer 17 if possible.
java --version gets me:
my .zshrc includes:
i still get the same error. is there a way to check the java dependencies and update those?
update. i grabbed the most recent version 3.7 of server and it loaded without issue -- i see the output for A GraphTraversalSource is now bound to [g]. now back to the code.
i still seem to get the error with my python code: '{'requestId': '18abc62e-ee3a-471d-8614-0dad21772789', 'status': {'code': 499, 'message': 'The traversal source [g] for alias [g] is not configured on the server.', 'attributes': {}}, 'result': {'meta': {}, 'data': None}}'
i appreciate the help!
maybe it's how i'm using .next()? poking around, when i don't use .next() i don't get the "g alias as g" error.
i'll keep reading through the docs. i think i'm at a "user error" state. 😉
thanks again for all the help thus far. i really appreciate it!
you need next() (i.e. a terminator step) to trigger the traversal. without it, you won't send the request to the server, so it makes sense that removing next() doesn't produce an error, but then it won't produce anything else
i'm not sure what is amiss now if you see that "g" is configured via the logs and don't see any other errors/warnings on server startup. that's a bit strange. it should work under those conditions
an easy thing to try is to test your connection/graph with Gremlin Console and see what happens: https://tinkerpop.apache.org/docs/current/reference/#connecting-via-console
okay. so i started the gremlin server with the modern graph locally -- for example the output log for the server shows:
i then opened the console and used
using the following
g.V().values('name')
i get
should i see some sort of log output on the server showing a connected traversal?
You need to also do a :remote console before sending any queries or prefix each query with :>