Java Community | Help. Code. Learn.•16mo ago

Weird issue while reading XML

I'm reading XML from this URL: https://maven.neoforged.net/releases/net/neoforged/neoform/maven-metadata.xml like this:

XmlParser parser = new XmlParser(u);
var children = parser.getElement("metadata.versioning.versions").getChildNodes();
for (int i = 0; i < children.getLength(); i++) {
    String content = children.item(i).getTextContent();
    System.out.println("'" + content + "'");
    versions.add(new Version(content, null, false, artifact, buildNeoURL(content, null, artifact).toString()));
}

but it's printing the data in a very weird way:

'
      '
'23w31a-20230819.124900'

which is not normal

23 Replies

KoblížkáčOP•16mo ago

This is the custom getElement method:

    public Element getElement(String path) {
        var p = path.split("\\.");
        Element e = (Element) document.getElementsByTagName(p[0]).item(0);
        for (int i = 1; i < p.length; i++) {
            e = (Element) e.getElementsByTagName(p[i]).item(0);
        }
        return e;
    }

Kyo-chan•16mo ago

I don't know what XML API you're using, but under the usual XML standards, getChildNodes() also returns the text nodes that hold the whitespace. You'd be better off querying the element name you want rather than everything that's a child.

KoblížkáčOP•16mo ago

I'm using the classic DOM API.

Kyo-chan•16mo ago

XmlParser? Anyway, the DOM API shipped with Java does conform to the W3C's general DOM API recommendations, and will indeed return the whitespace text nodes when you call getChildNodes().

KoblížkáčOP•16mo ago

XmlParser is a wrapper class that handles fetching the XML from a URL. Anyway, I find it funny that it works for one repository but not for the other. For example: https://maven.minecraftforge.net/de/oceanlabs/mcp/mcp_stable/maven-metadata.xml

Kyo-chan•16mo ago

Maybe the second one doesn't have whitespace?

KoblížkáčOP•16mo ago

This metadata is fetched normally.

Kyo-chan•16mo ago

The URL you just showed doesn't have whitespace and wouldn't have the problem described. I can't vouch for other URLs.

KoblížkáčOP•16mo ago

how did you find out?

Kyo-chan•16mo ago

....... I opened it

KoblížkáčOP•16mo ago

Well, I don't really know what you mean by whitespace. Where is it? The files seem identical to me.

Kyo-chan•16mo ago

Whitespace is the blank characters that make a text display like this:

<a>
  <b>stuff</b>
</a>

rather than this:

<a><b>stuff</b></a>

It is made up of newlines, spaces and tabs.
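
To make the difference concrete, here is a minimal sketch that parses both forms with the JDK's built-in DOM parser and counts the root element's child nodes. The class and method names (WhitespaceDemo, rootChildCount) are made up for illustration and are not part of the original code.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the original poster's code.
class WhitespaceDemo {
    // Parses an XML string and returns how many child nodes the root element has.
    static int rootChildCount(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getChildNodes().getLength();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String indented = "<a>\n  <b>stuff</b>\n</a>";
        String compact = "<a><b>stuff</b></a>";
        // Indented: whitespace text, <b>, whitespace text.
        System.out.println(rootChildCount(indented)); // prints 3
        // Compact: only <b>.
        System.out.println(rootChildCount(compact));  // prints 1
    }
}
```

The two extra nodes in the indented case are the text nodes holding the newline and indentation around <b>, which is what shows up as the "empty" quoted lines in the printed output.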

KoblížkáčOP•16mo ago

Doesn't it look the same?

Kyo-chan•16mo ago

They're both pretty-displayed by your browser. You need to ask your browser to show the actual source

KoblížkáčOP•16mo ago

i see now!

KoblížkáčOP•16mo ago

So do you think I can safely just purge all the ' ' characters?

Kyo-chan•16mo ago

While it's not guaranteed in XML, in this case I'd definitely assume you can. However, it would be better style to cope with the whitespace when it is there.
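
One way to cope with the whitespace without stripping characters is to keep the getChildNodes() loop but skip every node that is not an element. This is only a sketch using the standard org.w3c.dom types; ElementChildren, childElementTexts and parseRoot are hypothetical names, and the sample XML is invented.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original poster's code.
class ElementChildren {
    // Returns the text of each direct child element, ignoring text and comment nodes.
    static List<String> childElementTexts(Element parent) {
        List<String> texts = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                texts.add(child.getTextContent().trim());
            }
        }
        return texts;
    }

    // Parses an XML string and returns its root element.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Invented sample input with indentation between the elements.
        String xml = "<versions>\n  <version>1.0</version>\n  <version>2.0</version>\n</versions>";
        System.out.println(childElementTexts(parseRoot(xml))); // prints [1.0, 2.0]
    }
}
```

Checking the node type gives the same result whether or not the file is indented, so the code does not depend on how a particular repository formats its metadata.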

KoblížkáčOP•16mo ago

Doesn't seem like it would touch the data in any way. Does the DOM have anything for it, or should I use a different parser?

Kyo-chan•16mo ago

Like I said, you should ask for the child you want by its actual name, rather than call getChildNodes()
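
As a rough sketch of that advice, the snippet below looks up the <versions> element and then asks it for its <version> elements by name instead of walking every child. It parses an inline string rather than fetching the URL; the sample is a cut-down stand-in for a maven-metadata.xml file, the second version value is a placeholder, and VersionsByName and versionStrings are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original poster's code.
class VersionsByName {
    // Returns the text of every <version> element found under the first <versions> element.
    static List<String> versionStrings(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            List<String> result = new ArrayList<>();
            Element versions = (Element) doc.getElementsByTagName("versions").item(0);
            if (versions == null) {
                return result; // no <versions> element in this document
            }
            // Only elements named "version" are returned, so whitespace text nodes never appear.
            NodeList nodes = versions.getElementsByTagName("version");
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent().trim());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Cut-down stand-in for the real file; "example-2" is a placeholder value.
        String sample =
                "<metadata>\n"
              + "  <versioning>\n"
              + "    <versions>\n"
              + "      <version>23w31a-20230819.124900</version>\n"
              + "      <version>example-2</version>\n"
              + "    </versions>\n"
              + "  </versioning>\n"
              + "</metadata>\n";
        System.out.println(versionStrings(sample)); // prints [23w31a-20230819.124900, example-2]
    }
}
```

Note that getElementsByTagName matches descendants at any depth, not just direct children, which is why the lookup here is scoped to the <versions> element rather than run on the whole document.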

KoblížkáčOP•16mo ago

I mean, yeah, that would make sense. Seems to be working now, big thanks!
