Weird issue while reading XML

I'm reading XML from this URL: https://maven.neoforged.net/releases/net/neoforged/neoform/maven-metadata.xml like this:
XmlParser parser = new XmlParser(u);
var children = parser.getElement("metadata.versioning.versions").getChildNodes();
for (int i = 0; i < children.getLength(); i++) {
    String content = children.item(i).getTextContent();
    System.out.println("'" + content + "'");
    versions.add(new Version(content, null, false, artifact, buildNeoURL(content, null, artifact).toString()));
}
But it's printing the data in a very weird way:
'
'
'23w31a-20230819.124900'
which is not what I expected.
Koblížkáč (OP), 13mo ago
This is the custom getElement method:
public Element getElement(String path) {
    var p = path.split("\\.");
    Element e = (Element) document.getElementsByTagName(p[0]).item(0);
    for (int i = 1; i < p.length; i++) {
        e = (Element) e.getElementsByTagName(p[i]).item(0);
    }
    return e;
}
Kyo-chan, 13mo ago
I don't know what XML API you're using. But in typical XML standards, getChildNodes() will also give out the text nodes that make up the whitespace. You'd be better off querying the right element name rather than everything that's a child.
Koblížkáč (OP), 13mo ago
I'm using the classic DOM API.
Kyo-chan, 13mo ago
XmlParser? Anyway, the DOM API shipped with Java does indeed conform with W3C's general DOM API recommendations, and will indeed give out the whitespace text nodes when calling getChildNodes().
Koblížkáč (OP), 13mo ago
XmlParser is a wrapper class that handles fetching the XML from a URL. Anyway, I find it funny that it works for one repository but not for the other one. For example: https://maven.minecraftforge.net/de/oceanlabs/mcp/mcp_stable/maven-metadata.xml
Kyo-chan, 13mo ago
Maybe the second one doesn't have whitespace?
Koblížkáč (OP), 13mo ago
This metadata is fetched normally.
Kyo-chan, 13mo ago
The URL you just showed doesn't have whitespace and wouldn't have the problem described. I can't vouch for other URLs.
Koblížkáč (OP), 13mo ago
How did you find out?
Kyo-chan, 13mo ago
....... I opened it
Koblížkáč (OP), 13mo ago
Well, I don't really know what you mean by whitespace. Where is it? The files seem identical to me.
Kyo-chan, 13mo ago
Whitespace is the blank characters that make it so a text displays like this:
<a>
    <b>stuff</b>
</a>
rather than this:
<a><b>stuff</b></a>
It is made up of newlines, spaces and tabs.
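A minimal sketch of what that difference means for getChildNodes(), assuming the JDK's built-in DOM parser with default settings. The class and method names (WhitespaceDemo, countRootChildren) are made up for illustration and are not part of the XmlParser wrapper from this thread.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class WhitespaceDemo {
    // Hypothetical helper: parses the XML string and returns how many
    // child nodes (of any type) the root element has.
    public static int countRootChildren(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return doc.getDocumentElement().getChildNodes().getLength();
    }

    public static void main(String[] args) throws Exception {
        String pretty = "<a>\n    <b>stuff</b>\n</a>";
        String compact = "<a><b>stuff</b></a>";
        // Pretty-printed: whitespace text node, <b>, whitespace text node.
        System.out.println(countRootChildren(pretty));  // prints 3
        // Compact: only <b>.
        System.out.println(countRootChildren(compact)); // prints 1
    }
}
```

The two extra nodes in the first case are text nodes holding only the newline and indentation, which is what shows up as the odd quoted line breaks in the printed output.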
Koblížkáč (OP), 13mo ago
Doesn't it look the same?
Kyo-chan, 13mo ago
They're both pretty-displayed by your browser. You need to ask your browser to show the actual source
Koblížkáč (OP), 13mo ago
I see now!
Koblížkáč (OP), 13mo ago
So do you think I can safely just purge all the ' ' characters?
Kyo-chan, 13mo ago
While it's not guaranteed in XML, in this case I'd definitely assume you can. However, it would be better style to be able to cope when the whitespace is there.
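One way to cope with the whitespace without stripping anything from the input is to skip every child that is not an element. This is only a sketch under that assumption. ElementChildren, parseRoot and elementTexts are hypothetical names, while Node.ELEMENT_NODE and getNodeType() are part of the standard org.w3c.dom API.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class ElementChildren {
    // Hypothetical helper: parses an XML string and returns its root element.
    public static Element parseRoot(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
    }

    // Collects the text of each direct child that is an element,
    // ignoring whitespace text nodes, comments and so on.
    public static List<String> elementTexts(Element parent) {
        List<String> texts = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                texts.add(child.getTextContent());
            }
        }
        return texts;
    }

    public static void main(String[] args) throws Exception {
        Element root = parseRoot("<a>\n  <b>x</b>\n  <c>y</c>\n</a>");
        System.out.println(elementTexts(root)); // prints [x, y]
    }
}
```

This behaves the same whether or not the server pretty-prints the file, so it does not depend on the formatting of any particular repository.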
Koblížkáč (OP), 13mo ago
It doesn't seem like it would touch the data in any way. Does the DOM have anything for it, or should I use a different parser?
Kyo-chan, 13mo ago
Like I said, you should ask for the child you want by its actual name, rather than call getChildNodes()
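A rough sketch of that suggestion, assuming the standard DOM API and a cut-down sample shaped like the metadata file from the question. VersionsByName, readVersions and SAMPLE are hypothetical names, and the second version string in the sample is made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class VersionsByName {
    // Assumed sample in the shape of a maven-metadata.xml file;
    // "example-version-2" is an invented value.
    public static final String SAMPLE =
            "<metadata>\n"
          + "  <versioning>\n"
          + "    <versions>\n"
          + "      <version>23w31a-20230819.124900</version>\n"
          + "      <version>example-version-2</version>\n"
          + "    </versions>\n"
          + "  </versioning>\n"
          + "</metadata>";

    // Asks for the <version> elements by name instead of walking getChildNodes(),
    // so whitespace text nodes never enter the result.
    public static List<String> readVersions(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Element versions = (Element) doc.getElementsByTagName("versions").item(0);
        NodeList nodes = versions.getElementsByTagName("version");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        for (String v : readVersions(SAMPLE)) {
            System.out.println("'" + v + "'");
        }
    }
}
```

Note that getElementsByTagName matches all descendants with that name, not only direct children, which is fine here because version elements only appear directly under versions.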
Koblížkáč (OP), 13mo ago
I mean, yeah, that would make sense. It seems to be working now, big thanks!