Using Spark inside Gremlin-Server

I am trying to invoke Spark from Java inside a call step in my graph implementation, which is loaded in Gremlin-Server, but I am running into a difficult error. Outside of gremlin-server everything works properly. Has anyone experienced this, or does gremlin-server do anything with the class loader that might affect this? The stack trace is attached.
Lyndon (OP) · 2y ago
It seems to be a problem with class loading that I do not get outside of gremlin-server
spmallette · 2y ago
not much of an error. i guess some static initializer is failing? i suppose the question is: what is trying to initialize at that moment?
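A static initializer failure usually surfaces as an ExceptionInInitializerError wrapping the real problem, so walking the cause chain is one way to see what was initializing. A minimal sketch, assuming a hypothetical helper class named RootCause (it is not part of TinkerPop or Spark):

```java
// Hypothetical helper for illustration only; not a TinkerPop or Spark API.
final class RootCause {
    private RootCause() {}

    /** Follows getCause() links and returns the deepest Throwable in the chain. */
    static Throwable of(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        // Simulated failure: a static initializer error wrapping the real exception.
        Throwable wrapped = new ExceptionInInitializerError(
                new IllegalStateException("simulated failure"));
        System.out.println(of(wrapped));
    }
}
```

Wrapping the Spark call in a try/catch for Throwable and logging RootCause.of(e) should show the innermost "Caused by" entry without scrolling through the whole trace.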
Lyndon (OP) · 2y ago
Sorry, I was out of town for the Canadian long weekend. I was trying to invoke Spark to do a read() on a CSV. If you look at the middle of the stack trace, it kind of implies that Scala has some class loader issue:
Caused by: scala.reflect.internal.MissingRequirementError: class scala.Array in JavaMirror with jdk.internal.loader.ClassLoaders$AppClassLoader@55054057 of type class jdk.internal.loader.ClassLoaders$AppClassLoader with classpath [<unknown>] and parent being jdk.internal.loader.ClassLoaders$PlatformClassLoader@6bd841ce of type class jdk.internal.loader.ClassLoaders$PlatformClassLoader with classpath [<unknown>] and parent being primordial classloader with boot classpath [<unknown>] not found.
It seems to me like the class loader is failing to find things, but this only happens in gremlin-server, which I thought might be doing something fancy with the class loader.
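One way to narrow this down is to ask each candidate class loader whether it can see the class named in the error. A minimal sketch, assuming a hypothetical ClassLoaderProbe class that would be called from the code running inside the server; the class name scala.Array is taken from the stack trace above:

```java
// Hypothetical diagnostic for illustration only; not a TinkerPop or Spark API.
final class ClassLoaderProbe {
    private ClassLoaderProbe() {}

    /** Returns true if the given loader can resolve the class (null means the bootstrap loader). */
    static boolean canLoad(String className, ClassLoader loader) {
        try {
            // initialize=false: resolve the class without running its static initializers.
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Builds a short report for the context, system, and defining class loaders. */
    static String report(String className) {
        ClassLoader context = Thread.currentThread().getContextClassLoader();
        ClassLoader system = ClassLoader.getSystemClassLoader();
        ClassLoader own = ClassLoaderProbe.class.getClassLoader();
        return className + "\n"
                + "  context loader " + context + ": " + canLoad(className, context) + "\n"
                + "  system loader " + system + ": " + canLoad(className, system) + "\n"
                + "  own loader " + own + ": " + canLoad(className, own);
    }

    public static void main(String[] args) {
        System.out.println(report("scala.Array"));
    }
}
```

If the report differs between a standalone run and a run inside gremlin-server, that would point at which loader is missing the Scala classes.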
spmallette · 2y ago
not sure if it's relevant or not, but there are some manifest entries in spark-gremlin that i don't think have been looked at in a while. maybe upgrading to newer spark versions broke something? https://github.com/apache/tinkerpop/blob/3.6.4/spark-gremlin/pom.xml#L325-L344e
Lyndon (OP) · 2y ago
I will have a look at this. Any lead is useful right now, thanks Stephen.
spmallette · 2y ago
you can see how those manifest entries are used here in the DependencyGrabber that gremlin-server.sh install invokes: https://github.com/apache/tinkerpop/blob/3.6.4/gremlin-groovy/src/main/groovy/org/apache/tinkerpop/gremlin/groovy/util/DependencyGrabber.groovy
Lyndon (OP) · 2y ago
So I kind of figured it out. If I go run
mvn dependency:build-classpath
Take the output and add it to gremlin-server.sh as
CP="$CP":"<output>"
it works. I am not sure about a general solution for this yet, but that is a start.
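To confirm that the extra entries actually reach the JVM, the java.class.path system property can be inspected from inside the running process. A minimal sketch, assuming a hypothetical ClasspathCheck class, and assuming the Scala runtime arrives as a jar whose file name contains "scala-library" (the usual Maven artifact name):

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic for illustration only; not a TinkerPop or Spark API.
final class ClasspathCheck {
    private ClasspathCheck() {}

    /** Returns the classpath entries whose file name contains the given fragment. */
    static List<String> entriesContaining(String classpath, String fragment) {
        List<String> matches = new ArrayList<>();
        for (String entry : classpath.split(File.pathSeparator)) {
            if (!entry.isEmpty() && new File(entry).getName().contains(fragment)) {
                matches.add(entry);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // java.class.path holds the classpath the JVM was started with.
        String cp = System.getProperty("java.class.path", "");
        System.out.println(entriesContaining(cp, "scala-library"));
    }
}
```

An empty list inside gremlin-server but a non-empty one in the standalone run would be consistent with the missing scala.Array error.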
spmallette · 2y ago
oh - well, i think you still have to do the standard steps that you would take if you were using the Gremlin Console: https://tinkerpop.apache.org/docs/current/reference/#hadoop-gremlin. that means setting up things like HADOOP_GREMLIN_LIBS and the like