Invalid Byte Encoding with Postgres JSONB columns

I've been running a fairly successful website with drizzle-orm but have recently come across a problem that I cannot for the life of me figure out. Any insight into a solution, or even how to debug this further, would be helpful.

I have a pretty standard setup:
- postgres 16 (docker image)
- Web frontend using nuxt; data for the backend API is provided by drizzle.

I have a few tables that store information in jsonb columns. This data contains strings (people's names) that often have characters outside the standard en-US character set. For a long time (almost a year) this worked seemingly well and I had not noticed any errors. Recently, though, I've started to receive errors like this from postgres:

invalid byte sequence for encoding "UTF8": 0xf8
invalid byte sequence for encoding "UTF8": 0xe5 0x6b 0x61

What I found out from investigating this is that the data being sent to postgres is using an encoding like UCS-2: the byte sequence reported in the error is the raw Unicode code point of the extended character, not its UTF-8 sequence. I determined this by running db.execute(sql`SELECT ${JSON.stringify(entries)}::bytea`) and then dumping the resulting buffer with toString('hex').

What's really crazy about this is that I can only get it to happen on my production server, after about 30 minutes of uptime. Visiting a page triggers an update of a row in the database. The update, with extended characters, works immediately after the service comes up, but will then consistently fail after some amount of time. I started with the postgres.js driver and assumed it was a bug in their implementation, so I switched to node-postgres and ran into the same issue.

Right now I have no real guesses about what is going wrong; all the code is pretty straightforward. Maybe something with server-side rendering is changing the default encoding output of JSON.stringify? Any help is appreciated!
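For context on the reported bytes: 0xf8 is likely "ø" (U+00F8). Its raw Latin-1 / code-point byte is 0xf8, while UTF-8 encodes it as two bytes. A quick Node sketch (using "ø" as an assumed example character) shows the difference the server is complaining about:

```typescript
// UTF-8 encodes ø (U+00F8) as two bytes; the 'latin1' encoding
// emits the single raw code-point byte, which is what Postgres
// rejects as an invalid UTF-8 byte sequence.
const utf8 = Buffer.from('ø', 'utf8');
const latin1 = Buffer.from('ø', 'latin1');

console.log(utf8.toString('hex'));   // c3b8
console.log(latin1.toString('hex')); // f8
```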
6 Replies
Swart
SwartOP4mo ago
Oh, one additional thing to mention here: I cannot reproduce this at all on my machine. I can't even create dummy data that will cause this postgres error. If anything, I'd like to understand how this is even possible with an update clause through the ORM. I've investigated more and began dumping the hex output of the string produced by JSON.stringify, the data going into my jsonb column. It turns out that when this error begins to occur, that output has the Unicode code point encoded into it, NOT the UTF-8 bytes. I've done this by using the following:
// eslint-disable-next-line node/prefer-global/buffer
const hex = Buffer.from(JSON.stringify(entries)).toString('hex')
// eslint-disable-next-line no-console
console.log('signups: detected failed update', { scheduleKey, error, hex, entries })
So for the error invalid byte sequence for encoding "UTF8": 0xe9 0x73 0x20, the hex for the word that contains the incorrect symbol is 47 69 76 72 E9 73 20 instead of 47 69 76 72 C3 A9 73 20. Based on this, I think something is hijacking the global JSON.stringify behavior.
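Both byte sequences are easy to reproduce in plain Node. Assuming the word in the hex dump decodes to "Givrés" (plus a trailing space): serializing with 'latin1' emits the raw code-point byte 0xe9 for "é", while 'utf8' emits the correct pair 0xc3 0xa9:

```typescript
// Correct UTF-8 serialization of the string:
const good = Buffer.from('Givrés ', 'utf8').toString('hex');
// Raw code-point (Latin-1) bytes -- é becomes 0xe9 instead of 0xc3 0xa9,
// matching the bytes Postgres rejected:
const bad = Buffer.from('Givrés ', 'latin1').toString('hex');

console.log(good); // 47697672c3a97320
console.log(bad);  // 47697672e97320
```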
Angelelz
Angelelz4mo ago
I don’t think drizzle would do that. You can try a different serializer/parser to see if there's a difference.
Swart
SwartOP4mo ago
Yea, I think it's something in one of the nuxt extensions I am using. Trying a different serializer is actually a great idea; I'll give that a shot tomorrow. For completeness, I created the following custom type using another serializer, which resolved the core issue.
import { customType } from 'drizzle-orm/pg-core'
import stringify from 'fast-json-stable-stringify'

export function customJsonb<TData>(name: string) {
  return customType<{ data: TData, driverData: string }>({
    dataType() {
      return 'jsonb'
    },
    toDriver(value: TData): string {
      return stringify(value)
    },
  })(name)
}
Unfortunately, though, I am now seeing the same error when updating/inserting normal varchar columns. I feel something is deeply wrong outside of drizzle and will continue investigating.
Angelelz
Angelelz4mo ago
Please let us know
Swart
SwartOP4mo ago
Minor update here: I've continued to be flummoxed by this. The errors have disappeared. The only change I have in place currently is sanitizing all input I get from APIs with the following:
const capturedParse = JSON.parse

...

const encoded = Buffer.from(responseText).toString('utf8')
return capturedParse(encoded)
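One speculative reason a round-trip like this could help: passing a string through a UTF-8 Buffer is a no-op for well-formed strings, but it replaces lone surrogates (ill-formed UTF-16) with U+FFFD, which Postgres will happily accept. A sketch of that behavior:

```typescript
// Well-formed strings survive the round-trip unchanged.
const ok = 'Givrés';
console.log(Buffer.from(ok).toString('utf8') === ok); // true

// A lone high surrogate (ill-formed UTF-16) is replaced with U+FFFD.
const broken = 'abc\ud800def';
const cleaned = Buffer.from(broken).toString('utf8');
console.log(cleaned === 'abc\ufffddef'); // true
```

Note this wouldn't explain the 0xe9 bytes observed earlier, so it is at best a partial theory.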
I'll likely stop posting updates here as this isn't a drizzle issue (that I know of), but one thing I never figured out was how to reproduce this locally. Even strings defined with unicode escapes like \u00f8 were handled correctly locally. Anyway, thank you for the help, and if anyone finds this in the future and runs into the same problem, please let me know, as I'd love to understand what's going on.
Angelelz
Angelelz4mo ago
Sounds good, thanks for the follow up!