Java/Scala multithreading on Spark?
I'm using Spark with Scala to copy data from a Hudi table into a Hive table. I've already tuned the Spark side as far as I can (worker nodes, cluster size). The source table is missing a lot of features that could improve read speed, but it's managed by another team and many other teams also read from it, so changes on the table side can't happen. What I haven't tried yet is multithreading from Scala.
Has anyone here used Java with Spark, applied Java multithreading, and seen a noticeable improvement in ETL duration?
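To make the question concrete, here is a minimal sketch of what "multithreading from Scala" could mean: the driver submits several independent slices of the copy from separate threads, since Spark can run jobs from one application concurrently when they are submitted from different threads. The names `ConcurrentSlices`, `runConcurrently`, and `copySlice` are made up for illustration, and the Spark call in the comment uses a hypothetical path, partition column, and target table. Whether this helps depends on whether a single job leaves executors idle; it is not a guaranteed speedup.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

object ConcurrentSlices {
  // Runs `copySlice` once per slice on a fixed-size driver-side thread pool
  // and blocks until every slice has finished. Returns the slices in input order.
  // If any slice throws, the exception is rethrown after the pool is shut down.
  def runConcurrently(slices: Seq[String], parallelism: Int)(copySlice: String => Unit): Seq[String] = {
    val pool = Executors.newFixedThreadPool(parallelism)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      // Each Future submits one unit of work from its own thread.
      val jobs = slices.map(slice => Future { copySlice(slice); slice })
      Await.result(Future.sequence(jobs), Duration.Inf)
    } finally {
      pool.shutdown()
    }
  }
}

// In a Spark driver, copySlice might look like this
// (assumed path, column `dt`, and table name; all hypothetical):
//   slice =>
//     spark.read.format("hudi").load("/path/to/hudi_table")
//       .where(s"dt = '$slice'")
//       .write.mode("append").insertInto("target_db.target_table")
```

With the default FIFO scheduler the first submitted job gets priority on resources, so setting `spark.scheduler.mode` to `FAIR` may be worth trying if the concurrent jobs should share the cluster more evenly.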