VaygrEmpire
JCHJava Community | Help. Code. Learn.
Created by VaygrEmpire on 1/20/2025 in #java-help
java/scala multithreading on spark?
I'm using Scala Spark to copy data from a Hudi table into a Hive table. I've already tuned Spark as far as I can through worker-node and cluster sizing. The source table is missing a lot of features that could improve read speed, but it's managed by another team and many other teams also access it, so table-side changes can't happen. What I haven't tried is multithreading from Scala. Has anyone used Java with Spark, applied Java multithreading, and seen a noticeable improvement in ETL duration?
4 replies
JCHJava Community | Help. Code. Learn.
Created by VaygrEmpire on 1/8/2025 in #java-help
sonarqube for scala
Has anyone used SonarQube for a Java and Scala project? Entering scoverage.reportPaths as below works:
scoverage.reportPaths=../target/scoverage.xml, ../target/scoverage.xml, etc
But when I generate these paths dynamically in a shell script and then write them to sonarqube.properties, code coverage doesn't show up for the Scala modules. I can see from the looper build output that coverage is being generated properly, but it isn't picked up when the SonarQube scanner runs. Has anyone encountered this issue before? The Scala version is 2.12.5 and the scoverage plugin is 1.4.0.
5 replies
JCHJava Community | Help. Code. Learn.
Created by VaygrEmpire on 1/7/2025 in #java-help
send message to slack without token
Does anyone know how to send a message to Slack without having the token anywhere in the code? The Slack token looks like this:
https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX
I'm not allowed to have
T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX
anywhere in the code or in a JSON file. I know something similar can be done in Python by using BaseHook and reading the password from a Slack conn id. Is there a way to send a message to Slack from Java as well?
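To make the question concrete, here is a minimal sketch of one possible approach, not a confirmed solution: keep the webhook URL out of the code and JSON files by reading it from the process environment at runtime, then POST a JSON body with a "text" field to it. The class name SlackNotifier, its helper methods, and the environment variable name SLACK_WEBHOOK_URL are all hypothetical; a secrets manager could supply the value instead. It assumes Java 11 or later for java.net.http.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.Optional;

public class SlackNotifier {
    // Hypothetical variable name; the deployment environment sets it, not the code.
    static final String WEBHOOK_ENV_VAR = "SLACK_WEBHOOK_URL";

    // Look up the webhook URL in the given environment map; empty if unset or blank.
    static Optional<String> resolveWebhookUrl(Map<String, String> env) {
        String url = env.get(WEBHOOK_ENV_VAR);
        return (url == null || url.isBlank()) ? Optional.empty() : Optional.of(url);
    }

    // Build the {"text": "..."} body, escaping backslashes, quotes, and newlines.
    static String toJsonPayload(String text) {
        String escaped = text
                .replace("\\", "\\\\")
                .replace("\"", "\\\"")
                .replace("\n", "\\n");
        return "{\"text\":\"" + escaped + "\"}";
    }

    // Build (but do not send) the POST request for the webhook.
    static HttpRequest buildRequest(String webhookUrl, String text) {
        return HttpRequest.newBuilder(URI.create(webhookUrl))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(
                        toJsonPayload(text), StandardCharsets.UTF_8))
                .build();
    }

    public static void main(String[] args) throws Exception {
        Optional<String> url = resolveWebhookUrl(System.getenv());
        if (url.isEmpty()) {
            System.err.println(WEBHOOK_ENV_VAR + " is not set; nothing sent.");
            return;
        }
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(buildRequest(url.get(), "ETL job finished"),
                        HttpResponse.BodyHandlers.ofString());
        System.out.println("Slack responded with HTTP " + response.statusCode());
    }
}
```

With this shape the secret only exists in the environment of the running process, so it never appears in the repository or in a JSON config file.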
5 replies
JCHJava Community | Help. Code. Learn.
Created by VaygrEmpire on 6/8/2023 in #java-help
is it possible to extract a file from a website using only the spark-submit command?
As the title says, is it possible to extract a file from a website in the spark-submit command itself, without putting any web-scraping code in the Spark driver program? You also can't define anything in the Spark context or Spark session.
4 replies
JCHJava Community | Help. Code. Learn.
Created by VaygrEmpire on 6/2/2023 in #java-help
question about spark submit
I've been working on a PoC to find a way to test spark-submit from the CLI with an empty jar but a proper class file. I'm supposed to extract data from some other website. Is this even possible with an empty jar? (Empty as in there is nothing in the jar. All you do is
touch test.jar
and that's it.) Everywhere I look, I need a functioning jar that includes the dependencies for the class file. With an empty jar, is there a way to bypass all those class-not-found exceptions? Or is this impossible?
17 replies
JCHJava Community | Help. Code. Learn.
Created by VaygrEmpire on 4/3/2023 in #java-help
question about multithreading
Hello. I'm using multithreading with Spark and have a question. I'm using Scala, but that shouldn't matter.
// Java code for thread creation by implementing
// the Runnable interface.
// Foo and dataframe1..dataframe3 are placeholders for the
// poster's own Spark function and data; they are not defined here.
class MultithreadingDemo implements Runnable {
    public void run()
    {
        try {
            // Every thread that executes run() calls Foo three times
            Foo(dataframe1);
            Foo(dataframe2);
            Foo(dataframe3);
        }
        catch (Exception e) {
            // The exception is caught and only a message is printed
            System.out.println("Exception is caught");
        }
    }
}

// Main class
class Multithread {
    public static void main(String[] args)
    {
        int n = 3; // Number of threads
        for (int i = 0; i < n; i++) {
            Thread object
                = new Thread(new MultithreadingDemo());
            object.start();
        }
    }
}
The above is a simple multithreading example from geeksforgeeks.org. I just changed n to 3, because that's what I'm using in Scala, and put a custom function in run() as an example. While working with Spark I saw unexpected behavior. I have a function called Foo that takes a dataframe and returns a dataframe; inside Foo, modifications and optimizations are applied to the dataframe. I thought that running the code would assign one Spark job to each thread, so I would be able to run 3 dataframe jobs at the same time. But when I printed something inside Foo, it printed 7 times, where I was expecting 3. What's causing it to print 7 times rather than 3? Am I misunderstanding something about multithreading?
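For comparison, here is a small self-contained sketch that counts calls under the two setups. It does not use Spark: plain strings stand in for the dataframes, and the class OneTaskPerInput, the method foo, and the counter are hypothetical names for illustration only. As the sample is written, each of the 3 threads runs all three Foo calls, so up to 9 calls would be expected when nothing fails; submitting one task per dataframe and waiting for the tasks to finish gives exactly 3. The sketch does not explain the observed 7.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class OneTaskPerInput {
    // Thread-safe counter of how many times foo has been called.
    static final AtomicInteger fooCalls = new AtomicInteger();

    // Stand-in for Foo: takes an input, returns a "processed" result.
    static String foo(String dataframeName) {
        fooCalls.incrementAndGet();
        return dataframeName + "-processed";
    }

    // Mirrors the posted sample: every thread processes every input.
    static int runAllInputsPerThread(List<String> inputs, int threads)
            throws InterruptedException {
        fooCalls.set(0);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> inputs.forEach(OneTaskPerInput::foo));
            workers[i].start();
        }
        for (Thread worker : workers) {
            worker.join(); // wait so the count is final before reading it
        }
        return fooCalls.get();
    }

    // Alternative: exactly one task per input, run on a fixed-size pool.
    static int runOneTaskPerInput(List<String> inputs)
            throws InterruptedException {
        fooCalls.set(0);
        ExecutorService pool = Executors.newFixedThreadPool(inputs.size());
        for (String input : inputs) {
            pool.execute(() -> foo(input));
        }
        pool.shutdown();
        if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
            throw new IllegalStateException("tasks did not finish in time");
        }
        return fooCalls.get();
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> inputs = List.of("dataframe1", "dataframe2", "dataframe3");
        System.out.println("every thread runs every input: "
                + runAllInputsPerThread(inputs, 3)); // prints 9
        System.out.println("one task per input: "
                + runOneTaskPerInput(inputs));       // prints 3
    }
}
```

Both variants wait for the workers (join or awaitTermination) before reading the counter, so the reported count does not depend on when the main thread happens to finish.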
5 replies