ManabuBeach
Apache TinkerPop • Created by ManabuBeach on 8/6/2024 in #questions
## Breadth-First Traversal Fact Check
Using Neptune Query Profiling, I found that Gremlin queries seem to use a depth-first strategy to search, and as a result they tend to be both time- and resource-intensive, especially when what I am looking for is a node just 1 or 2 levels below.
To do a breadth-first traversal, the following approach has been suggested, but I am not sure whether it really does the trick.
If my goal is to find the nearest nodes quickly, what would be efficient approaches?
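To make the breadth-first idea concrete, here is a minimal in-memory sketch in plain Python. It is not Gremlin and not how Neptune executes repeat(); the adjacency dict, `nearest_match` and the `max_depth` cut-off are all hypothetical. Because it expands the graph level by level and stops at `max_depth`, the first match it returns is one of the nearest.

```python
from collections import deque
from typing import Callable, Dict, Hashable, List, Optional, Tuple


def nearest_match(
    graph: Dict[Hashable, List[Hashable]],
    start: Hashable,
    is_target: Callable[[Hashable], bool],
    max_depth: int = 2,
) -> Optional[Tuple[Hashable, int]]:
    """Breadth-first search: return (node, depth) for the closest node that
    satisfies is_target within max_depth hops of start, or None."""
    visited = {start}
    frontier = deque([(start, 0)])  # FIFO queue, so nodes are expanded level by level
    while frontier:
        node, depth = frontier.popleft()
        if node != start and is_target(node):
            return node, depth  # first hit is at the smallest depth
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for neighbour in graph.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                frontier.append((neighbour, depth + 1))
    return None


if __name__ == "__main__":
    demo = {"a": ["b", "c"], "b": ["d"], "c": ["e"], "d": ["f"]}
    print(nearest_match(demo, "a", lambda n: n in {"e", "f"}))
```

The depth limit is what keeps a "1 or 2 levels below" search cheap: nothing deeper than `max_depth` is ever enqueued.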
4 replies
Apache TinkerPop • Created by ManabuBeach on 11/13/2023 in #questions
Gremlin (with Python + Neptune) Out of Memory Error with .toList()[0], .next() Fixes It. But Why?
This is not really a question but more of a discussion about how this could internally cause an out-of-memory error.
I have the following construct, and last night it resulted in an out-of-memory error.
* path_v is a single, well-defined V, so I expect it to return a single V.
* All the values I wanted from valueMap() were single-cardinality string values.
* I know it is obvious that toList() is not efficient if I am only interested in one value; however, this should return only a single result set, so even though it is inefficient, it should not blow up memory.
Changing it to the following seems to fix this. Obviously, this asks for a single value through a cursor-type interaction.
Very intriguing...
3 replies
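As a rough analogy only (plain Python iterators, not gremlin_python's actual result handling), the difference between the two calls is whether every result is materialized before the first one is picked. `CountingSource`, `first_via_list` and `first_via_next` are hypothetical names used for illustration.

```python
from typing import Iterator


class CountingSource:
    """Hypothetical result stream that counts how many items it has produced."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.produced = 0

    def __iter__(self) -> Iterator[int]:
        for i in range(self.n):
            self.produced += 1
            yield i


def first_via_list(source: CountingSource) -> int:
    return list(source)[0]  # materializes every item before indexing


def first_via_next(source: CountingSource) -> int:
    return next(iter(source))  # pulls just one item and stops
```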
Apache TinkerPop • Created by ManabuBeach on 11/1/2023 in #questions
Cryptic Neptune Gremlin Error Rate Creeping - What Would You Recommend?
This relates more to Neptune usage; nevertheless, it also concerns the Gremlin query error rate, which is logged on the Monitoring plot page and, in our case, also triggers a CloudWatch alert.
Situation:
We've noticed a gradual increase in unexplained Gremlin error counts — with a few popping up every several hours.
Actions Taken:
We attempted to pinpoint the cause by checking for Gremlin exceptions in our internal logs. However, no related errors appeared in the time frames when CloudWatch indicated issues in the Neptune Gremlin audit logs. While we acknowledge occasional concurrency problems in our system, their timing doesn't align with the reported Gremlin error count.
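One way to make the "same time frames" check systematic is to line up the two sets of timestamps programmatically. A minimal sketch, assuming both the audit-log/CloudWatch error times and the internal error times have already been exported as datetime values; the export step, the `correlate` name and the two-minute window are hypothetical.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime, timedelta
from typing import List, Tuple


def correlate(
    alarm_times: List[datetime],
    app_error_times: List[datetime],
    window: timedelta = timedelta(minutes=2),
) -> List[Tuple[datetime, List[datetime]]]:
    """For each alarm time, list the application error times within +/- window."""
    app_sorted = sorted(app_error_times)
    matches = []
    for t in sorted(alarm_times):
        lo = bisect_left(app_sorted, t - window)   # first error >= t - window
        hi = bisect_right(app_sorted, t + window)  # one past the last error <= t + window
        matches.append((t, app_sorted[lo:hi]))
    return matches
```

An empty match list for every alarm would simply confirm, more systematically, the mismatch described above.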
Concern:
Our primary concern stems from the absence of a "traditional event log" on the serverless instance of the Neptune server we use. This makes it challenging for us to correlate potential causes behind these logs. We're left wondering whether these discrepancies might be due to some oversight on our end.
Request for Guidance:
Is reaching out to AWS support the best course of action regarding this issue? Or should we consider it a minor hiccup and proceed as usual? Any insights or suggestions would be greatly appreciated.
P.S.: Last night I rebooted the entire cluster, and the issue appears to have subsided. Today we are running the same level of production access, and it just logged one more error.
3 replies
Apache TinkerPop • Created by ManabuBeach on 9/22/2023 in #questions
project("p").by(__.values("a", "b")) Only Outputs a Single Property: Bug or Expected?
I am curious why this does not behave the way I expected. This is not a problem/solution question.
I created the following Gremlin:
g.E().
hasLabel("hiddenFrom").
inV().
hasLabel("Person").
project("Trial", "Site", "Subject ID").
by(__.out().out().values("tag")).
by(__.out().values("tag")).
by(__.values("localId", "uuid"))
In the above scenario, the Subject ID output contains only the localId property, not the uuid property. When I change by(__.values("localId", "uuid")) to by(__.valueMap("localId", "uuid")), I do get the uuid property value in the output.
It was a bit surprising that it did not work the way I expected, but if this is expected behavior, I will remember it from now on. If that is the case, I am curious why it is designed that way.
5 replies
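A likely explanation, offered with some caution: in TinkerPop, a by() modulator given a traversal uses only the first result that traversal produces, so values("localId", "uuid") contributes a single value, while appending fold(), as in by(__.values("localId", "uuid").fold()), collects all of them into a list. The plain-Python emulation below (hypothetical helpers `values`, `fold` and `project`, not the TinkerPop implementation) mimics that first-result-versus-fold difference; here the values simply follow argument order.

```python
from typing import Any, Callable, Dict, Iterable, List

Element = Dict[str, Any]                    # a vertex modelled as a property dict
Child = Callable[[Element], Iterable[Any]]  # a child traversal: element -> results


def values(*keys: str) -> Child:
    """Loose analogue of __.values(*keys): yields each requested property value."""
    return lambda elem: (elem[k] for k in keys if k in elem)


def fold(child: Child) -> Child:
    """Loose analogue of appending .fold(): wraps all results into a single list."""
    return lambda elem: [list(child(elem))]


def project(elem: Element, keys: List[str], bys: List[Child]) -> Dict[str, Any]:
    """Loose analogue of project(*keys).by(...): each by() contributes only the
    FIRST result of its child traversal (an empty child raises StopIteration here)."""
    return {key: next(iter(by(elem))) for key, by in zip(keys, bys)}
```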
Apache TinkerPop • Created by ManabuBeach on 8/3/2023 in #questions
The Cascading Coalescing - Create a V then Create an E in One Shot
I have been struggling with this, and perhaps an expert can advise on how to approach this type of issue.
What I want to do:
1. Find a V.
2. If the V isn't found, create it. If it exists, move on to step 3.
3. Update some properties, whether the V is new or existing.
4. Find an outgoing edge from the V and, if none is found, create one.
5. Update some edge properties, whether the edge is new or existing.
6. Bonus: return the originally found or created V.
Why Do I Want This?
I have a highly concurrent image-processing pipeline in which records for tons of images are generated by concurrent external functions. This hits Neptune quite hard, and I run into concurrent-update issues that occasionally result in dangling or duplicated Vs.
Here is my example of how I plan to do a double coalesce:
I am sure my use of select is wrong, but how can I get Gremlin to remember the V that I have just found or created during the second coalesce step?
g.V().
has("Test", "name", "test7").
fold().
coalesce(__.unfold(), __.addV("Test")).as("x").
property("name", "test7").
out().hasId("dac4de07-1371-a1f7-7409-ad28d75069a5").fold().
coalesce(__.unfold(), __.select("x").addE("Link").to(__.V("dac4de07-1371-a1f7-7409-ad28d75069a5"))).toList()
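One observation, hedged: fold() is a reducing barrier, and the as("x") label does not survive it, so the select("x") inside the second coalesce() has nothing to find. A commonly suggested shape (untested here against Neptune) avoids the second fold() and keeps the vertex as the current traverser: after as("x"), use coalesce(__.outE("Link").where(__.inV().hasId(id)), __.addE("Link").to(__.V(id))), set the edge properties, then end with select("x"); addE() with only to() starts the edge at the current vertex. The intended six-step semantics themselves can be sketched in plain Python with a hypothetical in-memory store (`InMemoryGraph` and `upsert` are illustrative names; the lock only makes this demo race-free and says nothing about Neptune's concurrency behaviour):

```python
import threading
from typing import Any, Dict, Tuple

EdgeKey = Tuple[str, str, str]  # (from vertex name, edge label, to vertex id)


class InMemoryGraph:
    """Hypothetical in-memory store, used only to illustrate the intended steps."""

    def __init__(self) -> None:
        self.vertices: Dict[str, Dict[str, Any]] = {}
        self.edges: Dict[EdgeKey, Dict[str, Any]] = {}
        self._lock = threading.Lock()  # demo-only: serializes concurrent upserts

    def upsert(self, name: str, vprops: Dict[str, Any], label: str,
               to_id: str, eprops: Dict[str, Any]) -> Dict[str, Any]:
        with self._lock:
            vertex = self.vertices.setdefault(name, {"name": name})  # steps 1-2: find or create V
            vertex.update(vprops)                                     # step 3: update V properties
            edge = self.edges.setdefault((name, label, to_id), {})    # step 4: find or create E
            edge.update(eprops)                                       # step 5: update E properties
            return vertex                                             # step 6: return the V
```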
15 replies
Apache TinkerPop • Created by ManabuBeach on 6/26/2023 in #questions
Gremlin Python 3.4.13 - Exception Ignored Message When Exiting a Python Main
What Happens
This happens in Gremlin Python 3.4.13.
1. Open the connection: self.connection = DriverRemoteConnection(url, 'g')
2. Run some Gremlin queries.
3. Close it: self.connection.close()
4. Exit main immediately after close().
This is what I see thrown on the console afterwards:
Exception ignored in: <function _ProactorBasePipeTransport.__del__ at 0x000001CB0D9232E0>
Traceback (most recent call last):
  File "C:\Python310\lib\asyncio\proactor_events.py", line 116, in __del__
    self.close()
  File "C:\Python310\lib\asyncio\proactor_events.py", line 108, in close
    self._loop.call_soon(self._call_connection_lost, None)
  File "C:\Python310\lib\asyncio\base_events.py", line 750, in call_soon
    self._check_closed()
  File "C:\Python310\lib\asyncio\base_events.py", line 515, in _check_closed
    raise RuntimeError('Event loop is closed')
RuntimeError: Event loop is closed
**What I Was Expecting**
Because I closed the connection, I would not expect the above error to be raised.
Maybe I am doing this wrong, or there is additional configuration or setup that I am missing.
**Workaround**
The following seems to avoid the error:
from time import sleep

client.close()
# Let the underlying cleanup run for a bit.
for _ in range(5):
    sleep(0.25)
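The traceback shows _ProactorBasePipeTransport.__del__ trying to schedule a callback on an event loop that had already closed, i.e. a transport outlived its loop. As a stdlib-only illustration of the general principle (not gremlin_python's internals, which this sketch does not touch), the example below closes a transport and awaits the close while the loop is still running. The local echo server exists only so the sketch is self-contained.

```python
import asyncio


async def echo(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Tiny local echo handler so the sketch needs no external server."""
    data = await reader.readexactly(4)
    writer.write(data)
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def main() -> bytes:
    server = await asyncio.start_server(echo, "127.0.0.1", 0)  # port 0: OS picks a free port
    port = server.sockets[0].getsockname()[1]
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(b"ping")
    await writer.drain()
    reply = await reader.readexactly(4)
    # Close the client transport and wait for the close to complete while the
    # loop is still running, so nothing is left for __del__ to clean up later.
    writer.close()
    await writer.wait_closed()
    server.close()
    await server.wait_closed()
    return reply


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The sleep() loop in the workaround plays a similar role in a blunter way: it gives pending close callbacks time to run before the interpreter shuts the loop down.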
22 replies
Apache TinkerPop • Created by ManabuBeach on 6/22/2023 in #questions
Solved: Gremlin Python Exceptions with .property("timeStamp", 0)
The Issue:
In the Python code below:
from gremlin_python.structure.graph import Edge, Vertex  # used in the type hints


def create_edge(self, from_v: Vertex, to_v: Vertex, edge_label: str, related_as: str,
                updated_by_uuid: str = INebulaGraph.uuid_system) -> Edge:
    uuid = DateAndUuidUtils.uuid_generate()
    now = DateAndUuidUtils.datetime_of_now()
    updated_on = DateAndUuidUtils.java_epoch_of(now)
    updated_on_str = DateAndUuidUtils.datetime_to_iso(now)
    created_edge: Edge = self.g.add_e(edge_label). \
        from_(from_v).to(to_v). \
        property("name", ""). \
        property("updatedOn", updated_on). \
        property("updatedOnStr", updated_on_str). \
        property("relatedAs", related_as). \
        property("updatedBy", updated_by_uuid). \
        property("uuid", uuid). \
        next()
    return created_edge
Here, my DateAndUuidUtils.java_epoch_of() returns a Python int holding the Java epoch value computed from a datetime. When the above code executes, the underlying gremlin_python code raises an exception saying the value is too big and that a Gremlin bigint should be used.
Solved:
Using statics.long() fixed this issue.
from gremlin_python.statics import long
Change property("updatedOn", updated_on)
to property("updatedOn", long(updated_on))
Additional Notes:
* A double value goes in without any problem; the issue only occurs with a Python int going into Gremlin.
* I would hope the error message could be a bit friendlier about what I can do.
* The comparison code in graphbinaryV1.py:dictify looks buggy: it compares against a range far wider than my value.
if obj < -9223372036854775808 or obj > 9223372036854775807:
In my case, obj is well within the range it compares against.
8 replies
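One plausible reading of why long() helps (an inference, not confirmed in the thread): assuming java_epoch_of() returns milliseconds since 1970, the value is far larger than the 32-bit int maximum of 2,147,483,647 but fits comfortably in a 64-bit long, which matches the -9223372036854775808..9223372036854775807 bounds quoted above. The stdlib-only sketch below, with hypothetical helper names, classifies a Python int by the narrowest Java integer type that can hold it.

```python
from datetime import datetime, timezone

INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1  # the same bounds as the dictify check quoted above


def narrowest_java_int_type(value: int) -> str:
    """Name the narrowest Java integer type that can hold `value`."""
    if INT32_MIN <= value <= INT32_MAX:
        return "int"
    if INT64_MIN <= value <= INT64_MAX:
        return "long"
    return "BigInteger"


def epoch_millis(dt: datetime) -> int:
    """Hypothetical stand-in for java_epoch_of(): milliseconds since 1970-01-01 UTC."""
    return int(dt.timestamp() * 1000)


if __name__ == "__main__":
    ts = epoch_millis(datetime(2023, 6, 22, tzinfo=timezone.utc))
    print(ts, narrowest_java_int_type(ts))
```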
Apache TinkerPop • Created by ManabuBeach on 5/2/2023 in #questions
Finding Out Looped Graphs
I am trying to identify cases where I have inadvertently created edges that loop back to the same vertex, such as F -> F, and possibly also G -> F -> G. What are effective Gremlin techniques for detecting and resolving these instances?
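As a language-neutral illustration of what to look for, here is a plain-Python sketch over an exported edge list; the (out-vertex id, in-vertex id) tuple format and both helper names are assumptions. In Gremlin itself, a pattern along the lines of g.V().as("a").out().where(eq("a")) is a common way to spot self-loops, though it is untested here.

```python
from typing import Hashable, Iterable, List, Set, Tuple

EdgePair = Tuple[Hashable, Hashable]  # (out-vertex id, in-vertex id)


def self_loops(edges: Iterable[EdgePair]) -> List[EdgePair]:
    """Edges whose two ends are the same vertex, e.g. F -> F."""
    return [(a, b) for a, b in edges if a == b]


def two_cycles(edges: Iterable[EdgePair]) -> Set[Tuple[Hashable, Hashable]]:
    """Unordered vertex pairs joined in both directions, e.g. G -> F -> G."""
    edge_set = {(a, b) for a, b in edges if a != b}
    # Sort each pair (by repr, so mixed id types still compare) to report it once.
    return {tuple(sorted((a, b), key=repr)) for a, b in edge_set if (b, a) in edge_set}
```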
5 replies
Apache TinkerPop • Created by ManabuBeach on 3/14/2023 in #questions
How can I find property with a certain data type?
I have a situation where the same property has different types under the same label, like the following:
g.addV("Cookies").property(single, "howMany", "10").next()
g.addV("Cookies").property(single, "howMany", 42).next()
For example, I want to query for Cookies whose "howMany" property is a String. We accidentally corrupted our DB with mixed types and need to find which vertices are affected. What I am hoping for is something like:
g.V().hasLabel("Cookies").properties("howMany").ofType(String)
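Whether the type filter can run server-side is not settled in this thread, so here is a hedged client-side sketch instead, assuming the (vertex id, howMany) pairs have already been fetched into a dict; `ids_where_type` and `normalize_to_int` are hypothetical helpers.

```python
from typing import Any, Dict, List


def ids_where_type(values_by_id: Dict[Any, Any], wanted: type) -> List[Any]:
    """Vertex ids whose value is exactly of type `wanted` (bool is not counted as int)."""
    return [vid for vid, value in values_by_id.items() if type(value) is wanted]


def normalize_to_int(values_by_id: Dict[Any, Any]) -> Dict[Any, int]:
    """Hypothetical repair step: convert numeric strings such as "10" to int."""
    fixed: Dict[Any, int] = {}
    for vid, value in values_by_id.items():
        if isinstance(value, str):
            try:
                fixed[vid] = int(value)
            except ValueError:
                pass  # leave non-numeric strings for manual review
    return fixed
```

The returned ids and repaired values could then be written back with ordinary property(single, "howMany", ...) updates.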
6 replies