EmberJson: High level JSON library

I've spent part of the last week on the beginnings of a JSON library for Mojo. It's still very much under development, so I haven't made any official releases, but if anyone would like to help add test cases or point out edge cases I've missed, that would be greatly appreciated! https://github.com/bgreni/EmberJson A quick example of how it's used:
from ember_json import *

var s = '{"key": 123}'
var json = JSON.from_string(s)
print(json["key"].int()) # prints 123

json = JSON.from_string('[123, "foo"]')
print(json[1].string()) # prints foo
Caroline, 4mo ago
Very cool! 😎
Peter Homola, 4mo ago
I wrote a similar parser a few weeks ago. It seems to be ~2.5x faster. https://github.com/phomola/mojolibs/tree/main/src/textkit
eggsquad (OP), 4mo ago
How are you measuring?
Peter Homola, 4mo ago
I just ran your benchmarks.
eggsquad (OP), 4mo ago
Oh, upon further reading: I hadn't realized that Unicode shares its first 128 characters with ASCII. I'll try reading from raw bytes as you've done and see where that gets me.
Peter Homola, 4mo ago
Yes, I think reading from raw bytes is better here. Your code should be faster then because I first tokenise the input.
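To illustrate why scanning raw bytes is safe for JSON: in UTF-8, every byte of a multi-byte character is 0x80 or above, so an ASCII structural character such as `{` or `"` can never appear inside one. The sketch below is in Python rather than Mojo, and `count_structural` is a hypothetical helper made up for this example; it is not taken from EmberJson or textkit.

```python
# Hypothetical helper for illustration; not taken from EmberJson or textkit.
STRUCTURAL = frozenset(b'{}[]:,"')  # the ASCII bytes JSON uses for structure


def count_structural(data: bytes) -> int:
    """Count JSON structural bytes without decoding the UTF-8 input."""
    return sum(1 for b in data if b in STRUCTURAL)


doc = '{"ключ": "∑"}'.encode("utf-8")

# Every byte that belongs to a non-ASCII character has its high bit set,
# so it can never be mistaken for an ASCII structural byte.
non_ascii = [b for b in doc if b >= 0x80]
assert all(b not in STRUCTURAL for b in non_ascii)

print(count_structural(doc))  # prints 7
```

The count ignores whether a byte sits inside a string literal; a real parser would track that state, but the byte-level comparison itself stays the same.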
eggsquad (OP), 4mo ago
Ah yes that has yielded quite the improvement, thank you for pointing that out!
--------------------------------------------------------------------------------
Benchmark results
--------------------------------------------------------------------------------
name , met (ms) , iters , min (ms) , mean (ms) , max (ms) , duration (ms)
JsonParseSmall , 0.020933, 57400, 0.020933, 0.020933, 0.020933, 1201.580000
JsonArrayMedium , 0.054992, 21948, 0.054992, 0.054992, 0.054992, 1206.966000
JsonArrayLarge , 0.128410, 9324, 0.128410, 0.128410, 0.128410, 1197.299000
JsonArrayExtraLarge, 13.749082, 85, 13.749082, 13.749082, 13.749082, 1168.672000
JsonArrayVeryBig , 46.724640, 25, 46.724640, 46.724640, 46.724640, 1168.116000
Martin Vuyk, 4mo ago
Hi @bgreni, cool library. Some comments on your approach:
I have a PR open which will make split more efficient and make StringSlice.split() return a List[StringSlice], so there is no allocation beyond building that list. My next feature in line is doing something similar for splitlines, so that you will have the option to have your Reader struct already have everything semi-tokenized very cheaply, because splitlines splits by every character except " " (which you don't want to split on, since your fields might have strings with whitespace). You can also implement your own version, since we follow Python, which also takes some newline separators into account that the JSON spec doesn't (AFAIK).
If you want to keep the peek approach, you can make it faster by going over a byte Span or using UnsafePointer, since string slicing is expensive: it checks bounds and allocates a new String each time. You can look at the code in the split PR for inspiration.
Mojo is very cool and you can make your const types very readable:
alias `"` = Byte(ord('"'))
alias `t` = Byte(ord("t"))
alias `f` = Byte(ord("f"))
alias `n` = Byte(ord("n"))
alias `{` = Byte(ord("{"))
alias `}` = Byte(ord("}"))
alias `[` = Byte(ord("["))
alias `]` = Byte(ord("]"))
alias `:` = Byte(ord(":"))
alias `,` = Byte(ord(","))
Anyway GLHF! looking forward to a PR/review of one by you on this 🙂
aurelian, 4mo ago
String slicing allocates? Sorry, I meant to ask: why does it?
Martin Vuyk, 4mo ago
Yep, since it returns a String instance, which owns its data. It should also be noted that slicing currently does not work by Unicode codepoints but will in the future, which will add more overhead. So overall, using StringSlice and Span[Byte] as much as possible is the best way to go, since they are non-owning types that just offer a view into the data.
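The owning-copy versus non-owning-view distinction can be sketched in Python, where a `bytes` slice copies the data while a `memoryview` slice only points into the original buffer. This is an analogy under stated assumptions, not Mojo code: `bytes` stands in for an owning String and `memoryview` for Span[Byte] or StringSlice.

```python
# Python stand-ins: a bytes slice plays the role of an owning String,
# a memoryview plays the role of a non-owning Span[Byte] / StringSlice.
buf = bytearray(b'{"key": 123}')

owned = bytes(buf[2:5])      # slicing copies the three bytes into a new object
view = memoryview(buf)[2:5]  # slicing a memoryview does not copy the data

buf[2] = ord("K")            # mutate the underlying buffer in place

print(owned)           # prints b'key' (the copy is unaffected)
print(view.tobytes())  # prints b'Key' (the view sees the change)
```

The view is cheaper to create, but it is only valid while the underlying buffer is alive and unchanged in size, which is the usual trade-off with non-owning types.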
aurelian, 4mo ago
I would expect slicing a string to return a StringSlice
Martin Vuyk, 4mo ago
maybe in the future, but currently it doesn't
Martin Vuyk, 4mo ago
I've opened a proposal to change the way we do __getitem__(self, slice: Slice) so that it returns an Iterator instead of a new instance. We'll see where it goes; it might get changed for something else 🤷‍♂️. The whole stdlib is still WIP.
eggsquad (OP), 4mo ago
StringSlice is basically just a Span wrapper at this point, and Span doesn't work with strided steps at the moment, so expressions like this currently require copying into a new String:
var s = "some string"
print(s[:3:-1])
eggsquad (OP), 4mo ago
In case it's useful to anyone, I found this collection of benchmarks and validation tests for JSON: https://github.com/miloyip/nativejson-benchmark I made a benchmark using the three big files it seems to use for its parsing performance section. I'm getting about 121ms, which would put it at the top of the lower half of this graph? (Probably not, though, since I'm using much newer hardware than what was used for the graph.)
Peter Homola, 4mo ago
I've removed unnecessary heap allocations and now my parser seems to be faster again :)
Benchmark results
--------------------------------------------------------------------------------
name , met (ms) , iters , min (ms) , mean (ms) , max (ms) , duration (ms)
JsonParseSmall , 0.026477, 45358, 0.026477, 0.026477, 0.026477, 1200.923000
JsonArrayMedium , 0.068801, 17439, 0.068801, 0.068801, 0.068801, 1199.816000
JsonArrayLarge , 0.167700, 7155, 0.167700, 0.167700, 0.167700, 1199.890000
JsonArrayExtraLarge, 17.921433, 67, 17.921433, 17.921433, 17.921433, 1200.736000
JsonArrayVeryBig , 57.036952, 21, 57.036952, 57.036952, 57.036952, 1197.776000
JsonBig3 , 146.449375, 8, 146.449375, 146.449375, 146.449375, 1171.595000

--------------------------------------------------------------------------------
Benchmark results
--------------------------------------------------------------------------------
name , met (ms) , iters , min (ms) , mean (ms) , max (ms) , duration (ms)
JsonParseSmall , 0.011757, 100000, 0.011757, 0.011757, 0.011757, 1175.731000
JsonArrayMedium , 0.025958, 46269, 0.025958, 0.025958, 0.025958, 1201.055000
JsonArrayLarge , 0.064550, 18595, 0.064550, 0.064550, 0.064550, 1200.308000
JsonArrayExtraLarge, 7.059231, 169, 7.059231, 7.059231, 7.059231, 1193.010000
JsonArrayVeryBig , 26.120578, 45, 26.120578, 26.120578, 26.120578, 1175.426000
JsonBig3 , 91.799231, 13, 91.799231, 91.799231, 91.799231, 1193.390000
eggsquad (OP), 4mo ago
Tag you're it lol
--------------------------------------------------------------------------------
Benchmark results
--------------------------------------------------------------------------------
name , met (ms) , iters , min (ms) , mean (ms) , max (ms) , duration (ms)
JsonParseSmall , 0.010236, 100000, 0.010236, 0.010236, 0.010236, 1023.641000
JsonArrayMedium , 0.023358, 51153, 0.023358, 0.023358, 0.023358, 1194.818000
JsonArrayLarge , 0.062322, 19975, 0.062322, 0.062322, 0.062322, 1244.885000
JsonArrayExtraLarge, 6.513168, 184, 6.513168, 6.513168, 6.513168, 1198.423000
JsonArrayVeryBig , 22.549189, 53, 22.549189, 22.549189, 22.549189, 1195.107000
JsonBig3 , 68.620235, 17, 68.620235, 68.620235, 68.620235, 1166.544000
eggsquad (OP), 4mo ago
There's also a collection of conformance test cases in this repo which was very helpful https://github.com/miloyip/nativejson-benchmark/tree/master/data/jsonchecker
aurelian, 4mo ago
@bgreni thanks for this, it's working well. It was easy to add from_list.
eggsquad (OP), 4mo ago
Thank you! I think I’ll finally do an actual package release to prefix.dev tomorrow for nightly
aurelian, 4mo ago
could the return type be inferred here, down the road?
fn __init__(inout self, json: Object) raises:
    frame = json["frame"].object()
    self.x = abs(frame["x"].float()).cast[DType.int16]()
    self.y = frame["y"].float().cast[DType.int16]()
    self.w = frame["w"].float().cast[DType.int16]()
    self.h = frame["h"].float().cast[DType.int16]()
    self.x2 = self.x + self.w
    self.y2 = self.y + self.h
Looking forward to comptime reflection; this could just be a loop.
eggsquad (OP), 4mo ago
Infer the type where exactly?
aurelian, 4mo ago
Of the struct field. It's more of a Mojo question.
eggsquad (OP), 4mo ago
I imagine probably not?
eggsquad (OP), 4mo ago
EmberJson has its first release on prefix.dev in the mojo-community-nightly channel! https://prefix.dev/channels/mojo-community-nightly/packages/emberjson
f0cii, 3mo ago
Hey! I also created an open-source project for JSON handling in Mojo: sonic-mojo. It seems to be ~7.5x faster than the parser you mentioned (https://github.com/bgreni/EmberJson)! This project is based on Mojo FFI bindings for sonic-rs and uses Diplomat for code generation, with some modifications in my forked version, f0cii/diplomat. Here are my benchmark results:
--------------------------------------------------------------------------------
Benchmark results
--------------------------------------------------------------------------------
name , met (ms) , iters , min (ms) , mean (ms) , max (ms) , duration (ms)
JsonParseSmall , 0.001463, 803360, 0.001463, 0.001463, 0.001463, 1175.402216
JsonArrayMedium , 0.003468, 359714, 0.003468, 0.003468, 0.003468, 1247.467734
JsonArrayLarge , 0.005713, 204953, 0.005713, 0.005713, 0.005713, 1170.873361
JsonArrayExtraLarge , 0.625991, 1848, 0.625991, 0.625991, 0.625991, 1156.831185
JsonArrayCanada , 4.995497, 248, 4.995497, 4.995497, 4.995497, 1238.883308
JsonArrayTwitter , 0.973425, 1000, 0.973425, 0.973425, 0.973425, 973.424817
JsonArrayCitmCatalog , 2.057586, 545, 2.057586, 2.057586, 2.057586, 1121.384578
https://github.com/f0cii/sonic-mojo
eggsquad (OP), 3mo ago
Very cool! I've just written it quickly from scratch so it is quite slow lol. I've thought about trying to port over the simdjson implementation, but that's a lot more time than I have right now https://github.com/simdjson/simdjson
stano, 3w ago
Hi @eggsquad, I'm not sure this is the right place, but I tried to add emberjson as a dependency (from the nightly community channel) and magic search fails to find it, although it successfully finds other packages in the same nightly channel (and resolves their latest versions).
magic search emberjson
Using channels: conda-forge, https://conda.modular.com/max/, https://repo.prefix.dev/mojo-community-nightly/
× Package emberjson not found, please use a wildcard '*' in the search name for a broader result
But it finds others like mog, weave and hue
eggsquad (OP), 3w ago
That's odd, I'll take a look. Thanks for bringing this up! @stano Seems to work for me now?
stano, 3w ago
It does resolve now. The only change I made was switching to max nightly, I believe. I'm still finding my way around stable vs. nightly, so I probably didn't realize that the nightly community channel packages require nightly max (in this case >= 25.1). I apologize for taking up your time.
eggsquad (OP), 3w ago
Ah yes I haven't made a stable version of the lib yet. No worries though thank you for trying the package! Not sure if this really needs announcing but I am formally opening up EmberJson to contributions! https://github.com/bgreni/EmberJson/blob/main/CONTRIBUTING.md
toasty, 2w ago
Any thoughts on when it’ll make it to a version that works with stable max? 😄
eggsquad (OP), 2w ago
I find it easier to develop on nightly, so honestly I was largely waiting for someone to ask lol. I'm currently looking at some changes to float string conversion, but I can look into doing one afterwards?
toasty, 2w ago
No rush! Just curious. Was thinking about handling json payloads for the client in lightbug_http 🙂
eggsquad (OP), 2w ago
EmberJson now has a stable channel release! Version 0.1.1 (because I botched my conda package in 0.1.0 lol) is available in the mojo-community prefix channel! https://prefix.dev/channels/mojo-community/packages/emberjson Also, a somewhat notable addition: I've added some support for Unicode character decoding, so stuff like this works now:
print(parse('["\\u2211"]')) # prints ["∑"]
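For readers unfamiliar with JSON `\uXXXX` escapes, here is a rough idea of what decoding one involves: the four hex digits name a code point, which replaces the six-character escape. The sketch is in Python, and `decode_unicode_escapes` is a hypothetical helper, not EmberJson's implementation; it assumes well-formed input and does not handle surrogate pairs or escaped backslashes.

```python
# Hypothetical helper for illustration; not EmberJson's implementation.
# Assumes well-formed input; surrogate pairs and "\\\\u" are not handled.
def decode_unicode_escapes(s: str) -> str:
    """Replace each \\uXXXX escape in s with the code point it names."""
    out = []
    i = 0
    while i < len(s):
        if s.startswith("\\u", i) and i + 6 <= len(s):
            # Parse the four hex digits after "\u" as a code point.
            out.append(chr(int(s[i + 2 : i + 6], 16)))
            i += 6
        else:
            out.append(s[i])
            i += 1
    return "".join(out)


print(decode_unicode_escapes('["\\u2211"]'))  # prints ["∑"]
```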