EmberJson: High-level JSON library

I've spent part of the last week on the beginnings of a JSON library for Mojo. It's still very much under development, so I haven't made any official releases, but if anyone would like to help add test cases or point out edge cases I've missed, that would be greatly appreciated! https://github.com/bgreni/EmberJson
A quick example of how it's used:
from ember_json import *

var s = '{"key": 123}'
var json = JSON.from_string(s)
print(json["key"].int()) # prints 123

json = JSON.from_string('[123, "foo"]')
print(json[1].string()) # prints foo
Caroline (3mo ago)
Very cool! 😎
Peter Homola (3mo ago)
I wrote a similar parser a few weeks ago. It seems to be ~2.5x faster. https://github.com/phomola/mojolibs/tree/main/src/textkit
eggsquad (OP, 3mo ago)
How are you measuring?
Peter Homola (3mo ago)
I just ran your benchmarks.
eggsquad (OP, 3mo ago)
Oh, upon further reading I didn't realize Unicode shares the same first 128 characters as ASCII. I'll try reading it from raw bytes as you've done and see where that gets me.
Peter Homola (3mo ago)
Yes, I think reading from raw bytes is better here. Your code should be faster then, since I tokenise the input first.
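(As a rough illustration of the raw-byte approach being discussed, here is a minimal sketch; the starts_with_brace helper is hypothetical and not code from either library. Since JSON's structural characters are ASCII, they occupy exactly one byte in UTF-8 and can be compared directly, with no decoding and no String slicing.)

alias LEFT_BRACE = UInt8(ord("{"))

fn starts_with_brace(s: String) -> Bool:
    # Compare the first raw byte of the UTF-8 buffer instead of slicing
    # out a one-character String; no allocation, no decoding.
    if len(s) == 0:
        return False
    if s.unsafe_ptr()[0] == LEFT_BRACE:
        return True
    return False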
eggsquad (OP, 3mo ago)
Ah yes, that has yielded quite the improvement, thank you for pointing that out!
--------------------------------------------------------------------------------
Benchmark results
--------------------------------------------------------------------------------
name , met (ms) , iters , min (ms) , mean (ms) , max (ms) , duration (ms)
JsonParseSmall , 0.020933, 57400, 0.020933, 0.020933, 0.020933, 1201.580000
JsonArrayMedium , 0.054992, 21948, 0.054992, 0.054992, 0.054992, 1206.966000
JsonArrayLarge , 0.128410, 9324, 0.128410, 0.128410, 0.128410, 1197.299000
JsonArrayExtraLarge, 13.749082, 85, 13.749082, 13.749082, 13.749082, 1168.672000
JsonArrayVeryBig , 46.724640, 25, 46.724640, 46.724640, 46.724640, 1168.116000
Martin Vuyk (3mo ago)
Hi @bgreni, cool library. Some comments on your approach:

I have a PR open which will make split more efficient and make StringSlice.split() return a List[StringSlice], so there is no allocation beyond building that list. My next feature in line is doing something similar for splitlines, so you'll have the option to have your Reader struct start out with everything semi-tokenized very cheaply, because splitlines splits on every whitespace character except " " (which you don't want to split on, since your fields might contain strings with whitespace). You could also implement your own version, since we follow Python, which takes some newline separators into account that the JSON spec doesn't (AFAIK).

If you want to keep the peek approach, you can make it faster by going over a byte Span or using UnsafePointer, since string slicing is expensive: it checks bounds and allocates a new String each time. You can look at the code in the split PR for inspiration.

Mojo is very cool and you can make your const types very readable:
alias `"` = Byte(ord('"'))
alias `t` = Byte(ord("t"))
alias `f` = Byte(ord("f"))
alias `n` = Byte(ord("n"))
alias `{` = Byte(ord("{"))
alias `}` = Byte(ord("}"))
alias `[` = Byte(ord("["))
alias `]` = Byte(ord("]"))
alias `:` = Byte(ord(":"))
alias `,` = Byte(ord(","))
Anyway, GLHF! Looking forward to a PR (or a review of one) by you on this 🙂
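(As a rough sketch of the UnsafePointer-based peek approach described above; the Peeker struct and its method names are hypothetical, not EmberJson's actual Reader.)

from memory import UnsafePointer

alias `{` = Byte(ord("{"))
alias SPACE = Byte(ord(" "))

struct Peeker:
    # A non-owning cursor over a String's raw bytes. The caller must keep
    # the String alive for as long as the Peeker is in use.
    var ptr: UnsafePointer[Byte]
    var size: Int
    var pos: Int

    fn __init__(inout self, s: String):
        self.ptr = s.unsafe_ptr()
        self.size = len(s)
        self.pos = 0

    fn peek(self) -> Byte:
        # Look at the current byte without consuming it, and without
        # slicing out (and allocating) a one-character String.
        return self.ptr[self.pos]

    fn skip_spaces(inout self):
        # Advance past spaces by comparing raw bytes.
        while self.pos < self.size:
            if self.ptr[self.pos] != SPACE:
                return
            self.pos += 1

    fn at_object_start(self) -> Bool:
        # True if the next value starts a JSON object.
        if self.peek() == `{`:
            return True
        return False

Scanning this way never allocates; a String only needs to be materialised when a token's text is actually consumed.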
aurelian (3mo ago)
string slicing allocates? Sorry, I meant to say: why does it?
Martin Vuyk (3mo ago)
yep, since it returns a String instance which owns its data. It should also be noted that it currently does not work by Unicode codepoints; it will in the future, which will add overhead as well. So overall, using StringSlice and Span[Byte] as much as possible is the best way to go, since they are non-owning types that just offer a view into the data.
aurelian (3mo ago)
I would expect slicing a string to return a StringSlice
Martin Vuyk (3mo ago)
maybe in the future, but currently it doesn't
ModularBot (3mo ago)
Congrats @Martin Vuyk, you just advanced to level 5!
Martin Vuyk (3mo ago)
I've opened a proposal to change the way we do __getitem__(self, slice: Slice) to return an Iterator instead of a new instance. We'll see where it goes; it might get changed for something else 🤷‍♂️. The whole stdlib is still WIP.
eggsquad (OP, 3mo ago)
StringSlice is basically just a Span wrapper at this point, and Span doesn't work with strided steps at the moment, so expressions like this currently require copying into a new String:
var s = "some string"
print(s[:3:-1])
eggsquad (OP, 3mo ago)
In case it's useful to anyone, I found this collection of benchmarks and validation tests for JSON: https://github.com/miloyip/nativejson-benchmark. I made a benchmark using the three big files it seems to use for its parsing performance section. I'm getting about 121ms, which would put it at the top of the lower half of this graph? (Probably not, since I'm using much newer hardware than what was used for that graph, though.)
Peter Homola (3mo ago)
I've removed unnecessary heap allocations and now my parser seems to be faster again :)
--------------------------------------------------------------------------------
Benchmark results
--------------------------------------------------------------------------------
name , met (ms) , iters , min (ms) , mean (ms) , max (ms) , duration (ms)
JsonParseSmall , 0.026477, 45358, 0.026477, 0.026477, 0.026477, 1200.923000
JsonArrayMedium , 0.068801, 17439, 0.068801, 0.068801, 0.068801, 1199.816000
JsonArrayLarge , 0.167700, 7155, 0.167700, 0.167700, 0.167700, 1199.890000
JsonArrayExtraLarge, 17.921433, 67, 17.921433, 17.921433, 17.921433, 1200.736000
JsonArrayVeryBig , 57.036952, 21, 57.036952, 57.036952, 57.036952, 1197.776000
JsonBig3 , 146.449375, 8, 146.449375, 146.449375, 146.449375, 1171.595000

--------------------------------------------------------------------------------
Benchmark results
--------------------------------------------------------------------------------
name , met (ms) , iters , min (ms) , mean (ms) , max (ms) , duration (ms)
JsonParseSmall , 0.011757, 100000, 0.011757, 0.011757, 0.011757, 1175.731000
JsonArrayMedium , 0.025958, 46269, 0.025958, 0.025958, 0.025958, 1201.055000
JsonArrayLarge , 0.064550, 18595, 0.064550, 0.064550, 0.064550, 1200.308000
JsonArrayExtraLarge, 7.059231, 169, 7.059231, 7.059231, 7.059231, 1193.010000
JsonArrayVeryBig , 26.120578, 45, 26.120578, 26.120578, 26.120578, 1175.426000
JsonBig3 , 91.799231, 13, 91.799231, 91.799231, 91.799231, 1193.390000
eggsquad (OP, 3mo ago)
Tag you're it lol
--------------------------------------------------------------------------------
Benchmark results
--------------------------------------------------------------------------------
name , met (ms) , iters , min (ms) , mean (ms) , max (ms) , duration (ms)
JsonParseSmall , 0.010236, 100000, 0.010236, 0.010236, 0.010236, 1023.641000
JsonArrayMedium , 0.023358, 51153, 0.023358, 0.023358, 0.023358, 1194.818000
JsonArrayLarge , 0.062322, 19975, 0.062322, 0.062322, 0.062322, 1244.885000
JsonArrayExtraLarge, 6.513168, 184, 6.513168, 6.513168, 6.513168, 1198.423000
JsonArrayVeryBig , 22.549189, 53, 22.549189, 22.549189, 22.549189, 1195.107000
JsonBig3 , 68.620235, 17, 68.620235, 68.620235, 68.620235, 1166.544000
eggsquad (OP, 3mo ago)
There's also a collection of conformance test cases in this repo which was very helpful https://github.com/miloyip/nativejson-benchmark/tree/master/data/jsonchecker
aurelian (3mo ago)
@bgreni thanks for this, it's working well. It was easy to add from_list.
eggsquad (OP, 3mo ago)
Thank you! I think I’ll finally do an actual package release to prefix.dev tomorrow for nightly
aurelian (3mo ago)
could the return type be inferred here, down the road?
fn __init__(inout self, json: Object) raises:
frame = json["frame"].object()
self.x = abs(frame["x"].float()).cast[DType.int16]()
self.y = frame["y"].float().cast[DType.int16]()
self.w = frame["w"].float().cast[DType.int16]()
self.h = frame["h"].float().cast[DType.int16]()
self.x2 = self.x + self.w
self.y2 = self.y + self.h
Looking forward to comptime reflection; this could just be a loop.
eggsquad (OP, 3mo ago)
Infer the type where exactly?
ModularBot (3mo ago)
Congrats @bgreni, you just advanced to level 4!
aurelian (3mo ago)
Of the struct field. It's more of a Mojo question.
eggsquad (OP, 3mo ago)
I imagine probably not?
eggsquad (OP, 3mo ago)
EmberJson has its first release on prefix.dev in the mojo-community-nightly channel! https://prefix.dev/channels/mojo-community-nightly/packages/emberjson
f0cii (2mo ago)
Hey! I also created an open-source project for JSON handling in Mojo: sonic-mojo. It seems to be ~7.5x faster than the parser you mentioned (https://github.com/bgreni/EmberJson)! This project is based on Mojo FFI bindings for sonic-rs and uses Diplomat for code generation, with some modifications in my forked version, f0cii/diplomat. Here are my benchmark results:
--------------------------------------------------------------------------------
Benchmark results
--------------------------------------------------------------------------------
name , met (ms) , iters , min (ms) , mean (ms) , max (ms) , duration (ms)
JsonParseSmall , 0.001463, 803360, 0.001463, 0.001463, 0.001463, 1175.402216
JsonArrayMedium , 0.003468, 359714, 0.003468, 0.003468, 0.003468, 1247.467734
JsonArrayLarge , 0.005713, 204953, 0.005713, 0.005713, 0.005713, 1170.873361
JsonArrayExtraLarge , 0.625991, 1848, 0.625991, 0.625991, 0.625991, 1156.831185
JsonArrayCanada , 4.995497, 248, 4.995497, 4.995497, 4.995497, 1238.883308
JsonArrayTwitter , 0.973425, 1000, 0.973425, 0.973425, 0.973425, 973.424817
JsonArrayCitmCatalog , 2.057586, 545, 2.057586, 2.057586, 2.057586, 1121.384578
https://github.com/f0cii/sonic-mojo
eggsquad (OP, 2mo ago)
Very cool! I've just written it quickly from scratch, so it's quite slow lol. I've thought about trying to port over the simdjson implementation, but that's a lot more time than I have right now: https://github.com/simdjson/simdjson