Java Community | Help. Code. Learn.•9mo ago

Capture download data directly from browser to give to Apache POI workbook?

Hi, I want to figure out how to capture the bytes of the input stream that the browser receives when you click a download link. At work we have an internal tool that generates Excel files, and I want to test that the generated file really is an Excel file. Since we run everything in Docker, it's easier to capture the bytes/stream that comes back from the download link and pass it to the following call: Workbook workbook = WorkbookFactory.create(new ByteArrayInputStream(response.contentAsByteArray)) That call is Apache POI's WorkbookFactory: it parses the data into a Workbook and throws an exception if the data is not an Excel file. TL;DR: How do I get response.contentAsByteArray from a link? With Selenium it's easy to click the link, but I don't know how to pass the result along to the workbook.

13 Replies


straightface•9mo ago

what error do you get?

SteadhavenOP•9mo ago

I didn't get any further. The workbook line is from another part of our code base. My main confusion is how to get the stream of data that comes back when you click the download link. I want to pass that response to the workbook and see whether it accepts it as an Excel file. I don't care about the content beyond whether it's a proper Excel file or not. I don't have access to the file system (that would be easy, just give the workbook the file), since this runs in a Docker container, both locally and in our Git/Jenkins CI/CD. Maybe HTTPBuilder from Groovy is a solution?

straightface•9mo ago

this can probably be done in java, give me a sec i will try and see

SteadhavenOP•9mo ago

That would be great. I have been stuck on this work task for too long. I find I/O pretty hard to understand, especially when it's not local files on my own system. Like, how would one capture/"intercept" the data from the download link?

straightface•9mo ago

It should just be a simple GET call.

SteadhavenOP•9mo ago

@Grab(group='org.codehaus.groovy.modules.http-builder', module='http-builder', version='0.7.1')
@Grab(group='org.apache.poi', module='poi-ooxml', version='5.0.0')

import groovyx.net.http.RESTClient
import org.apache.poi.ss.usermodel.Workbook
import org.apache.poi.ss.usermodel.WorkbookFactory

import java.io.ByteArrayInputStream

// Import Selenium dependencies (WebElement is needed for the download link below)
import org.openqa.selenium.By
import org.openqa.selenium.WebDriver
import org.openqa.selenium.WebElement
import org.openqa.selenium.chrome.ChromeDriver

// Set the path to the chromedriver executable
System.setProperty("webdriver.chrome.driver", "/path/to/chromedriver")

// Initialize Selenium WebDriver
WebDriver driver = new ChromeDriver()

try {
    // Navigate to the web page
    driver.get("http://your-internal-tool-url")

    // Find the download link element
    WebElement downloadLink = driver.findElement(By.id("your-download-link-id"))

    // Get the URL of the download link
    String fileURL = downloadLink.getAttribute("href")

    // Initialize HTTPBuilder with the file URL
    def client = new RESTClient(fileURL)

    // Perform a GET request to download the file
    def response = client.get([:])

    // Check if the response status is OK (HTTP 200)
    if (response.status == 200) {
        // Extract the bytes from the response data
        byte[] fileBytes = response.data.bytes

        // Use Apache POI to verify the file
        try (ByteArrayInputStream bais = new ByteArrayInputStream(fileBytes)) {
            // Create a Workbook from the byte array input stream
            Workbook workbook = WorkbookFactory.create(bais)
            println("The downloaded file is a valid Excel file.")
        } catch (Exception e) {
            println("The downloaded file is not a valid Excel file.")
        }
    } else {
      //...
    }
} catch (Exception e) {
   //...
} finally {
    //...
}


This was ChatGPT's attempt using Groovy's RESTClient.

straightface•9mo ago

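import java.io.InputStream;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.usermodel.WorkbookFactory;

HttpClient client = HttpClient.newBuilder().build();

HttpRequest request = HttpRequest.newBuilder(URI.create("url_to_xlsx"))
        .GET().build();

// Stream the response body straight into POI; create() throws if it is not a workbook
HttpResponse<InputStream> response = client.send(request, HttpResponse.BodyHandlers.ofInputStream());
Workbook book = WorkbookFactory.create(response.body());
System.out.println(book);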


this works
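A hedged sketch building on the snippet above, using only the JDK. It reads the response as a byte array (which matches the ByteArrayInputStream call in the original question), rejects non-200 responses, and runs a cheap signature check before the bytes are handed to POI. The names DownloadCheck, fetchBytes and looksLikeExcel are made up for illustration. The signature check is a swapped-in pre-filter, not a replacement for WorkbookFactory.create: .xlsx files are ZIP containers, so any ZIP archive passes it.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper class; none of these names come from the thread.
class DownloadCheck {

    // ZIP local-file-header signature; .xlsx files are ZIP containers.
    private static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    // OLE2 compound-document signature used by legacy .xls files.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1};

    /** Downloads the body at the given URL into memory; fails on any non-200 status. */
    static byte[] fetchBytes(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<byte[]> response =
                client.send(request, HttpResponse.BodyHandlers.ofByteArray());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status " + response.statusCode());
        }
        return response.body();
    }

    /** Returns true if data begins with the given prefix bytes. */
    static boolean startsWith(byte[] data, byte[] prefix) {
        if (data == null || data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    /** Cheap pre-check: does the payload start like an .xlsx (ZIP) or .xls (OLE2) file? */
    static boolean looksLikeExcel(byte[] data) {
        return startsWith(data, ZIP_MAGIC) || startsWith(data, OLE2_MAGIC);
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: java DownloadCheck <download-url>");
            return;
        }
        byte[] bytes = fetchBytes(args[0]);
        System.out.println("bytes=" + bytes.length + ", looksLikeExcel=" + looksLikeExcel(bytes));
    }
}
```

In the actual test, the real validation would still be WorkbookFactory.create(new ByteArrayInputStream(bytes)). The pre-check mainly makes failures easier to read when the server returns an HTML error or login page instead of a file.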

SteadhavenOP•9mo ago

Oh nice, so the HttpRequest only needs the download link? E.g. say I want to download the small test file from this test-file website: https://www.thinkbroadband.com/download Would I then just pass it the appropriate URL: http://ipv4.download.thinkbroadband.com/20MB.zip

straightface•9mo ago

yes
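One caveat, stated as an assumption because the thread never says whether the internal tool requires a login: a plain HttpClient request does not share the browser's session, so a protected link would likely return a login page rather than the file. In that case the session cookies would have to travel with the request. A minimal sketch using only the JDK follows; CookieRequest and withCookies are hypothetical names, and the cookie map is assumed to be filled by the caller (with Selenium, for example, from the name/value pairs of driver.manage().getCookies()).

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper; not part of the thread's code.
class CookieRequest {

    /** Builds a GET request that carries the given cookies in a single Cookie header. */
    static HttpRequest withCookies(String url, Map<String, String> cookies) {
        HttpRequest.Builder builder = HttpRequest.newBuilder(URI.create(url)).GET();
        if (!cookies.isEmpty()) {
            // Cookie header format: name1=value1; name2=value2
            String header = cookies.entrySet().stream()
                    .map(e -> e.getKey() + "=" + e.getValue())
                    .collect(Collectors.joining("; "));
            builder.header("Cookie", header);
        }
        return builder.build();
    }
}
```

The resulting request can be passed to client.send(...) exactly like the one in the snippet above.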

SteadhavenOP•9mo ago

thanks so much, I was stuck with this for weeks

