Capture download data directly from browser to give to Apache POI workbook?

Hi, I want to figure out how to "capture"/"get" the bytes of the input stream that arrives when you click a download link in the browser. The reason is that at work we have an internal tool that generates Excel files, and I want to test that the generated file really is an Excel file. However, since we use Docker etc., it's easier to just capture the bytes/stream of data that comes back when the download link is pressed and give that to the following method: Workbook workbook = WorkbookFactory.create(new ByteArrayInputStream(response.contentAsByteArray)) The above method is from Apache POI's Workbook API; it creates a workbook from the data and throws an exception if the data is not an Excel file. TL;DR: How do I get the response.contentAsByteArray from a link? With Selenium it's easy to click on the link, but I don't know how to pass the result along to the workbook.
straightface, 6mo ago
what error do you get?
Steadhaven (OP), 6mo ago
I didn't get any further than that. The workbook line is from another part of our code base. My main confusion is how to get the stream of data that comes back when you click the download link. I want to pass that response to the workbook and see whether it accepts it as an Excel file. I don't care about the content beyond whether it's a proper Excel file or not. I don't have access to the file system (that would be easy, just give the workbook the file), since it's a Docker container that runs both locally and in our Git/Jenkins CI/CD. Maybe HTTPBuilder from Groovy is a solution?
straightface, 6mo ago
this can probably be done in java, give me a sec i will try and see
Steadhaven (OP), 6mo ago
That would be great, I have been stuck on this work task for too long. I find I/O pretty hard to understand, especially when it's not local files on my own system. Like, how would one capture/"intercept" the data from the download link?
straightface, 6mo ago
should just be a simple GET call
Steadhaven (OP), 6mo ago
@Grab(group='org.codehaus.groovy.modules.http-builder', module='http-builder', version='0.7.1')
@Grab(group='org.apache.poi', module='poi-ooxml', version='5.0.0')

import groovyx.net.http.RESTClient
import org.apache.poi.ss.usermodel.Workbook
import org.apache.poi.ss.usermodel.WorkbookFactory

import java.io.ByteArrayInputStream

// Import Selenium dependencies
import org.openqa.selenium.By
import org.openqa.selenium.WebDriver
import org.openqa.selenium.WebElement
import org.openqa.selenium.chrome.ChromeDriver

// Set the path to the chromedriver executable
System.setProperty("webdriver.chrome.driver", "/path/to/chromedriver")

// Initialize Selenium WebDriver
WebDriver driver = new ChromeDriver()

try {
    // Navigate to the web page
    driver.get("http://your-internal-tool-url")

    // Find the download link element
    WebElement downloadLink = driver.findElement(By.id("your-download-link-id"))

    // Get the URL of the download link
    String fileURL = downloadLink.getAttribute("href")

    // Initialize the RESTClient (part of HTTPBuilder) with the file URL
    def client = new RESTClient(fileURL)

    // Perform a GET request to download the file
    def response = client.get([:])

    // Check if the response status is OK (HTTP 200)
    if (response.status == 200) {
        // Extract the bytes from the response data
        byte[] fileBytes = response.data.bytes

        // Use Apache POI to verify the file
        try (ByteArrayInputStream bais = new ByteArrayInputStream(fileBytes)) {
            // Create a Workbook from the byte array input stream;
            // this throws if the bytes are not an Excel file
            Workbook workbook = WorkbookFactory.create(bais)
            workbook.close()
            println("The downloaded file is a valid Excel file.")
        } catch (Exception e) {
            println("The downloaded file is not a valid Excel file.")
        }
    } else {
        //...
    }
} catch (Exception e) {
    //...
} finally {
    //...
}
This was ChatGPT's attempt using Groovy's RESTClient.
straightface, 6mo ago
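import java.io.InputStream;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.usermodel.WorkbookFactory;

HttpClient client = HttpClient.newBuilder().build();
HttpRequest request = HttpRequest.newBuilder(URI.create("url_to_xlsx"))
        .GET().build();
HttpResponse<InputStream> response = client.send(request, HttpResponse.BodyHandlers.ofInputStream());

// Hand the response body stream straight to POI
Workbook book = WorkbookFactory.create(response.body());
System.out.println(book);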
this works
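A minimal sketch of the same GET call that keeps the download as a byte[] (the shape the original question asked for) and fails early on a non-200 status. The class name DownloadBytes and the method name fetch are made up for illustration; the sketch assumes the link can be fetched without a login session, which may not hold for an internal tool. Note that the URL has to be absolute: java.net.http rejects a URI without a scheme with an IllegalArgumentException.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper, not part of the original thread.
class DownloadBytes {

    /** Fetches the given absolute URL and returns the whole response body as bytes. */
    static byte[] fetch(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL) // download links often redirect
                .build();
        // Throws IllegalArgumentException if the URL has no scheme (e.g. a placeholder string)
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<byte[]> response =
                client.send(request, HttpResponse.BodyHandlers.ofByteArray());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status " + response.statusCode() + " for " + url);
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = fetch(args[0]);
        try (InputStream in = new ByteArrayInputStream(bytes)) {
            // In the real test, pass 'in' to WorkbookFactory.create(in) here.
            System.out.println("Downloaded " + bytes.length + " bytes");
        }
    }
}
```

Buffering the whole body is fine for small generated reports; for very large files the streaming variant with BodyHandlers.ofInputStream() avoids holding everything in memory.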
Steadhaven (OP), 6mo ago
Oh nice, so the HttpRequest only needs the link to the download? E.g. say I want to download the small file from the test file website https://www.thinkbroadband.com/download. Would I then just pass it the appropriate URL, http://ipv4.download.thinkbroadband.com/20MB.zip?
straightface, 6mo ago
yes
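One caveat on the 20MB.zip example: the request works for any URL, but a plain test download is not a workbook, so WorkbookFactory.create would be expected to throw on it. As a rough illustration of what "is this even an Excel container" means at the byte level, here is a sketch that looks only at the leading signature bytes: .xlsx files are ZIP archives (they start with PK\x03\x04) and legacy .xls files are OLE2 compound documents (they start with D0 CF 11 E0 A1 B1 1A E1). The class name ExcelSignature and its members are made up for illustration. A match only identifies the container, it does not prove the content is a spreadsheet, so this is a cheap pre-check and not a replacement for the POI call.

```java
import java.util.Arrays;

// Hypothetical helper, not part of the original thread.
class ExcelSignature {

    // ZIP local file header: the container format used by .xlsx
    private static final byte[] ZIP = {0x50, 0x4B, 0x03, 0x04};

    // OLE2 compound document header: the container format used by legacy .xls
    private static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    enum Container { ZIP_BASED, OLE2_BASED, UNKNOWN }

    /** Classifies the data by its first bytes only; the rest of the content is not inspected. */
    static Container sniff(byte[] data) {
        if (startsWith(data, ZIP)) return Container.ZIP_BASED;
        if (startsWith(data, OLE2)) return Container.OLE2_BASED;
        return Container.UNKNOWN;
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data != null
                && data.length >= prefix.length
                && Arrays.equals(data, 0, prefix.length, prefix, 0, prefix.length);
    }
}
```

An ordinary .zip archive also reports ZIP_BASED here, which is exactly why the WorkbookFactory.create call remains the real test.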
Steadhaven (OP), 6mo ago
thanks so much, I was stuck with this for weeks