Drizzle Team
Created by Swart on 8/31/2024 in #help
Invalid Byte Encoding with Postgres JSONB columns
I've been running a fairly successful website with drizzle-orm, but I've recently come across a problem that I cannot for the life of me figure out. Any insight into a solution, or even into how to debug this further, would be helpful.
I have a pretty standard setup:
- postgres16 [docker image]
- Web frontend using Nuxt. Data used for the backend API is provided by Drizzle.
I have a few tables that store information in jsonb columns. This data contains strings (people's names) that often have characters outside the standard en-US character set. For a long time (almost a year) this worked seemingly well, and I had not noticed any errors. Recently, though, I've started to receive errors like this from Postgres:

invalid byte sequence for encoding "UTF8": 0xf8
invalid byte sequence for encoding "UTF8": 0xe5 0x6b 0x61
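To illustrate what I mean by that (my own example character, "å", not taken from the actual data): the byte in the error is the raw Unicode code point, while Postgres expects the two-byte UTF-8 sequence:

```javascript
// Illustration with an assumed example character "å" (U+00E5):
// compare its raw code point with the UTF-8 bytes Postgres expects.
const ch = 'å';
console.log(ch.codePointAt(0).toString(16));          // "e5" — the byte the error reports
console.log(Buffer.from(ch, 'utf8').toString('hex')); // "c3a5" — valid UTF-8
```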
What I found out from investigating this is that the data being sent to Postgres is using an encoding like UCS-2. The byte sequence reported in the error is the raw Unicode code point of the extended character, not its UTF-8 sequence. I determined this by running db.execute(sql`${JSON.stringify(entries)}::bytea`) and then dumping the resulting buffer with toString('hex').
What's really crazy about this is that I can only get it to happen on my production server after about 30 minutes of uptime. Visiting a page triggers an update of a row in the database. The update, with extended characters, works immediately after the service comes up, but will then consistently fail after some amount of time.
I started with the postgres.js driver and assumed it was a bug in their implementation, so I switched to node-postgres and ran into the same issue.
Right now I have no real guesses about what is going wrong; all the code is pretty straightforward. Maybe something with server-side rendering is changing the default encoding output of JSON.stringify?
Any help is appreciated!