C#
Chik3r · 2y ago

✅ Serializing XML with diacritics/accents

Hi, I'm trying to serialize a class to XML. It contains some strings, some of which may contain accents (example: genérica) or other Latin characters (example: diseño). Currently I'm serializing it using the following code:
```csharp
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

private static byte[] SerializeXml<T>(T data) {
    using MemoryStream mem = new();
    using StreamWriter writer = new(mem);
    // XmlWriterSettings settings = new() {Encoding = Encoding.Default};

    XmlSerializer serializer = new(typeof(T));
    // using StringWriter writer = new Utf8StringWriter();
    using XmlWriter xmlWriter = XmlWriter.Create(writer);
    serializer.Serialize(xmlWriter, data);
    return mem.ToArray();
}
```
(I was previously trying to use a StringWriter, but that doesn't work either.) The characters end up replaced with ?? instead of being written as-is. The final XML needs to be UTF-8 encoded (and include encoding="utf-8" in the declaration at the top), so I can't use ISO-8859-1 (and even when I tried it as the encoding, the characters were still replaced with "??").
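For the StringWriter route, a commonly used workaround (and presumably what the commented-out `Utf8StringWriter` line was aiming at) is to subclass StringWriter so it reports UTF-8. XmlWriter takes the declaration's encoding from the TextWriter, and a plain StringWriter reports UTF-16. A minimal sketch; `Utf8StringWriter`, `XmlStringHelpers`, and `Product` are hypothetical names, not .NET types:

```csharp
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical helper: a StringWriter that reports UTF-8 (no BOM), so the
// XmlWriter writes encoding="utf-8" in the declaration instead of utf-16.
public sealed class Utf8StringWriter : StringWriter
{
    public override Encoding Encoding => new UTF8Encoding(false);
}

// Hypothetical sample type with accented text.
public class Product
{
    public string Description { get; set; } = "";
}

public static class XmlStringHelpers
{
    public static string SerializeToString<T>(T data)
    {
        var serializer = new XmlSerializer(typeof(T));
        using var writer = new Utf8StringWriter();
        using (XmlWriter xmlWriter = XmlWriter.Create(writer))
        {
            serializer.Serialize(xmlWriter, data);
        } // disposing the XmlWriter flushes everything into the StringWriter
        return writer.ToString();
    }
}
```

The result is still a .NET string (UTF-16 in memory); only the declaration says utf-8. When it's turned into bytes, it still has to be encoded with UTF-8 to match.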
jcotton42 · 2y ago
My guess is that the default encoding for XmlWriter isn't any form of Unicode.
jcotton42 · 2y ago
Hm, but it's UTF-8 by default in .NET 7. What runtime version are you on?
Chik3r (OP) · 2y ago
.NET 7
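For reference, XmlWriterSettings.Encoding does default to UTF-8, but the setting only takes effect when the XmlWriter writes to a Stream; with a TextWriter, the TextWriter's own encoding is used. A minimal sketch that serializes straight into a MemoryStream so the bytes come out as UTF-8 with encoding="utf-8" in the declaration. `XmlBytes` and `Item` are hypothetical names, and `new UTF8Encoding(false)` is an assumption chosen to leave out the byte-order mark:

```csharp
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical sample type.
public class Item
{
    public string Name { get; set; } = "";
}

public static class XmlBytes
{
    public static byte[] SerializeUtf8<T>(T data)
    {
        var settings = new XmlWriterSettings
        {
            // Honored because the XmlWriter targets a Stream, not a TextWriter.
            // UTF8Encoding(false) omits the byte-order mark.
            Encoding = new UTF8Encoding(false)
        };
        var serializer = new XmlSerializer(typeof(T));
        using var mem = new MemoryStream();
        using (XmlWriter xmlWriter = XmlWriter.Create(mem, settings))
        {
            serializer.Serialize(xmlWriter, data);
        } // dispose (and therefore flush) before reading the buffer
        return mem.ToArray();
    }
}
```

Skipping the intermediate StreamWriter means there is only one place where the encoding is decided.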
jcotton42 · 2y ago
What program are you using to view the output XML?
Chik3r (OP) · 2y ago
I'm currently printing it to the console, but I initially checked it in a browser. I'm also checking the byte array, and it shows two '?' characters.
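Two '?' per accented character is a useful clue. In UTF-8, é and ñ each take two bytes, and decoding those bytes with an encoding that can't represent them (ASCII, for instance) replaces each byte with its own '?'. A minimal sketch that reproduces the symptom; `QuestionMarkDemo` is a hypothetical name:

```csharp
using System.Text;

public static class QuestionMarkDemo
{
    // Encode as UTF-8, then (incorrectly) decode the bytes as ASCII.
    // ASCII's default decoder fallback replaces every byte above 0x7F with '?'.
    public static string Mangle(string s) =>
        Encoding.ASCII.GetString(Encoding.UTF8.GetBytes(s));

    // Number of bytes the string needs in UTF-8.
    public static int Utf8Length(string s) => Encoding.UTF8.GetByteCount(s);
}
```

For example, `Mangle("diseño")` gives `"dise??o"`: one character in, two question marks out.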
jcotton42 · 2y ago
Hmm, let me try that code on my end.
Chik3r (OP) · 2y ago
jcotton42 · 2y ago
Hm, it works here.
jcotton42 · 2y ago
Also on .NET 7. What if you try explicitly setting the Encoding to UTF-8 in the writer settings?
Chik3r (OP) · 2y ago
Like this?
```csharp
using StreamWriter writer = new(mem, Encoding.UTF8);
XmlWriterSettings settings = new() {Encoding = Encoding.UTF8};

XmlSerializer serializer = new(typeof(T));
using XmlWriter xmlWriter = XmlWriter.Create(writer, settings);
```
It still doesn't work.
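One reason the XmlWriterSettings change may make no difference here: when XmlWriter.Create is given a TextWriter (such as this StreamWriter), the settings' Encoding is ignored, and the TextWriter's encoding is used for both the output bytes and the declaration. A small sketch to check that behavior; `EncodingCheck` is a hypothetical name, and UTF-16 in the settings is chosen only to make the mismatch visible:

```csharp
using System.IO;
using System.Text;
using System.Xml;

public static class EncodingCheck
{
    public static string WriteWithMismatchedSettings()
    {
        using var mem = new MemoryStream();
        // The StreamWriter's encoding (UTF-8, no BOM) is what actually applies.
        using var streamWriter = new StreamWriter(mem, new UTF8Encoding(false));
        // Asks for UTF-16, but this is ignored because the target is a TextWriter.
        var settings = new XmlWriterSettings { Encoding = Encoding.Unicode };
        using (XmlWriter xmlWriter = XmlWriter.Create(streamWriter, settings))
        {
            xmlWriter.WriteStartDocument();
            xmlWriter.WriteElementString("Description", "gen\u00E9rico");
            xmlWriter.WriteEndDocument();
        }
        streamWriter.Flush(); // make sure every byte has reached the MemoryStream
        // If the bytes really are UTF-8, this decodes cleanly.
        return Encoding.UTF8.GetString(mem.ToArray());
    }
}
```

The declaration in the returned text says utf-8, not utf-16, which shows that the StreamWriter's encoding was the one actually used.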
jcotton42 · 2y ago
Have you tried printing out the values in the object you're serializing? Maybe they're already messed up.
Chik3r (OP) · 2y ago
I tried setting it to a test value (product.Description = "genérico diseño tést";), but it's still replacing the characters.
jcotton42 · 2y ago
Hm, can you produce a minimal example?
Chik3r (OP) · 2y ago
I found the issue. I have a Request<T> class that wraps anything in <Request><T></T></Request> (where T is the class name of the generic type T), and it was implemented like this:
```csharp
using System.IO;
using System.Text;
using System.Xml.Linq;
using System.Xml.Serialization;

[XmlRoot(ElementName = "Request")]
public class Request<T> {
    [XmlIgnore]
    public required T Data { get; set; }

    [XmlAnyElement]
    public XElement Element {
        get {
            using MemoryStream memStream = new();
            using TextWriter writer = new StreamWriter(memStream);
            XmlSerializer serializer = new(typeof(T));
            serializer.Serialize(writer, Data);
            // The StreamWriter writes UTF-8 bytes; decoding them as ASCII turns
            // each byte of é, ñ, etc. into '?'. This was the bug.
            return XElement.Parse(Encoding.ASCII.GetString(memStream.ToArray()));
        }
        set {
            XmlSerializer serializer = new(typeof(T));
            Data = (T) serializer.Deserialize(value.CreateReader());
        }
    }
}
```
It serializes the data so that the inner element's name follows the class name, but I had it decode the bytes as ASCII instead of UTF-8, which replaced each non-ASCII byte with ?. It's fixed now by changing Encoding.ASCII to Encoding.UTF8, although I should probably check whether there's a better or more correct way to do this. Thanks for the help!
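On the question of a better way: one option, sketched here under the assumption that the wrapper only ever needs an XElement, is to serialize straight into an XDocument through XContainer.CreateWriter(). That way there is no byte array or string round trip, and no encoding to get wrong. This is a suggestion, not the fix confirmed in the thread; `XElementSerializer` and `CatalogEntry` are hypothetical names, and the two methods would stand in for the getter and setter of the Request<T>.Element property:

```csharp
using System.Xml;
using System.Xml.Linq;
using System.Xml.Serialization;

public static class XElementSerializer
{
    // Serializes directly into an in-memory XDocument: no bytes, no text decoding.
    public static XElement ToXElement<T>(T data)
    {
        var doc = new XDocument();
        var serializer = new XmlSerializer(typeof(T));
        using (XmlWriter writer = doc.CreateWriter())
        {
            serializer.Serialize(writer, data);
        } // the document is populated once the writer is disposed
        return doc.Root!;
    }

    // Reverse direction, matching the setter of the Request<T>.Element property.
    public static T FromXElement<T>(XElement element)
    {
        var serializer = new XmlSerializer(typeof(T));
        using XmlReader reader = element.CreateReader();
        return (T)serializer.Deserialize(reader)!;
    }
}

// Hypothetical sample type with accented text.
public class CatalogEntry
{
    public string Description { get; set; } = "";
}
```

Because the XElement is built node by node rather than parsed from text, characters such as é and ñ never pass through an encoder at this stage. Encoding only comes into play when the final Request document is written out.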
