EmberJson: A high-level JSON library

I've spent part of the last week on the beginnings of a JSON library for Mojo. It's still very much under development, so I haven't made any official releases, but if anyone would like to help add test cases or point out edge cases I've missed, that would be greatly appreciated! https://github.com/bgreni/EmberJson A quick example of how it's used:
from ember_json import *

var s = '{"key": 123}'
var json = JSON.from_string(s)
print(json["key"].int()) # prints 123

json = JSON.from_string('[123, "foo"]')
print(json[1].string()) # prints foo
Caroline (2d ago)
Very cool! 😎
Peter Homola (2d ago)
I wrote a similar parser a few weeks ago. It seems to be ~2.5x faster. https://github.com/phomola/mojolibs/tree/main/src/textkit
eggsquad (2d ago)
How are you measuring?
Peter Homola (2d ago)
I just ran your benchmarks.
eggsquad (2d ago)
Oh, upon further reading I didn't realize Unicode shares the same first 128 characters as ASCII. I'll try reading the input from raw bytes as you've done and see where that gets me.
Peter Homola (2d ago)
Yes, I think reading from raw bytes is better here. Your code should be faster then because I first tokenise the input.
eggsquad (2d ago)
Ah yes that has yielded quite the improvement, thank you for pointing that out!
--------------------------------------------------------------------------------
Benchmark results
--------------------------------------------------------------------------------
name , met (ms) , iters , min (ms) , mean (ms) , max (ms) , duration (ms)
JsonParseSmall , 0.020933, 57400, 0.020933, 0.020933, 0.020933, 1201.580000
JsonArrayMedium , 0.054992, 21948, 0.054992, 0.054992, 0.054992, 1206.966000
JsonArrayLarge , 0.128410, 9324, 0.128410, 0.128410, 0.128410, 1197.299000
JsonArrayExtraLarge, 13.749082, 85, 13.749082, 13.749082, 13.749082, 1168.672000
JsonArrayVeryBig , 46.724640, 25, 46.724640, 46.724640, 46.724640, 1168.116000
--------------------------------------------------------------------------------
Martin Vuyk (2d ago)
Hi @bgreni, cool library. Some comments on your approach: I have a PR open which will make split more efficient and make StringSlice.split() return a List[StringSlice], so there is no allocation beyond building that list. My next feature in line is doing something similar for splitlines, so you'll have the option of having your Reader struct hold everything semi-tokenized very cheaply, because splitlines splits on every whitespace character except " " (which you don't want to split on, since your fields might contain strings with whitespace). You could also implement your own version, since we follow Python, which takes some newline separators into account that the JSON spec doesn't (AFAIK). If you want to keep the peek approach, you can make it faster by going over a byte Span or using UnsafePointer, since String slicing is expensive: it checks bounds and allocates a new String each time. You can look at the code in the split PR for inspiration. Mojo is very cool and you can make your const types very readable:
alias `"` = Byte(ord('"'))
alias `t` = Byte(ord("t"))
alias `f` = Byte(ord("f"))
alias `n` = Byte(ord("n"))
alias `{` = Byte(ord("{"))
alias `}` = Byte(ord("}"))
alias `[` = Byte(ord("["))
alias `]` = Byte(ord("]"))
alias `:` = Byte(ord(":"))
alias `,` = Byte(ord(","))
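For example, with aliases like those, peeking over the raw bytes could look roughly like this (just a sketch, not your actual code; it assumes String.as_bytes() returns a byte view, and exact stdlib names may have shifted between Mojo versions):
fn first_structural_kind(s: String) -> String:
    alias `{` = Byte(ord("{"))
    alias `[` = Byte(ord("["))
    var bytes = s.as_bytes()
    for i in range(len(bytes)):
        var b = bytes[i]
        # skip insignificant JSON whitespace
        if b == Byte(ord(" ")) or b == Byte(ord("\t")):
            continue
        if b == Byte(ord("\n")) or b == Byte(ord("\r")):
            continue
        if b == `{`:
            return "object"
        if b == `[`:
            return "array"
        return "scalar"  # string, number, true/false/null
    return "empty"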
Anyway, GLHF! Looking forward to a PR from you on this, or to reviewing one 🙂
aurelian (2d ago)
String slicing allocates? Sorry, I meant to ask why it does.
Martin Vuyk (2d ago)
Yep, since it returns a String instance which owns its data. It should also be noted that it currently doesn't index by Unicode codepoints; it will in the future, which will add overhead. So overall, using StringSlice and Span[Byte] as much as possible is the best way to go, since they are non-owning types that just offer a view into the data.
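Roughly, the difference looks like this (illustrative only; it assumes as_bytes() returns a Span[Byte], and names may differ by Mojo version):
def main():
    var s = String("some string")
    var part = s[0:4]        # new String: allocates and copies "some"
    var view = s.as_bytes()  # Span[Byte]: non-owning view, no allocation
    print(part)              # some
    print(len(view))         # 11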
aurelian (2d ago)
I would expect slicing a string to return a StringSlice
Martin Vuyk (2d ago)
maybe in the future, but currently it doesn't
Martin Vuyk (2d ago)
I've opened a proposal to change the way we do __getitem__(self, slice: Slice) so that it returns an Iterator instead of a new instance. We'll see where it goes; it might get changed for something else 🤷‍♂️. The whole stdlib is still WIP.
eggsquad (2d ago)
StringSlice is basically just a Span wrapper at this point, and Span doesn't support strided steps at the moment, so expressions like this currently require copying into a new String:
var s = "some string"
print(s[:3:-1])
eggsquad (2d ago)
In case it's useful to anyone, I found this collection of benchmarks and validation tests for JSON: https://github.com/miloyip/nativejson-benchmark I made a benchmark using the three big files it seems to use for its parsing performance section. I'm getting about 121ms, which would put it near the top of the lower half of that graph? (Probably not, since I'm using much newer hardware than what was used for the graph.)
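For anyone curious, the shape of such a benchmark with the stdlib benchmark module is roughly this (a sketch only; the file path is a placeholder, it assumes JSON.from_string can raise on bad input, and the report API may differ by Mojo version):
import benchmark
from benchmark import Unit
from ember_json import JSON

def main():
    var data = String("")
    with open("data/canada.json", "r") as f:  # placeholder path
        data = f.read()

    @parameter
    fn parse():
        try:
            _ = JSON.from_string(data)
        except e:
            print("parse failed:", e)

    var report = benchmark.run[parse]()
    report.print(Unit.ms)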
Peter Homola (2d ago)
I've removed unnecessary heap allocations and now my parser seems to be faster again :)
Benchmark results
--------------------------------------------------------------------------------
name , met (ms) , iters , min (ms) , mean (ms) , max (ms) , duration (ms)
JsonParseSmall , 0.026477, 45358, 0.026477, 0.026477, 0.026477, 1200.923000
JsonArrayMedium , 0.068801, 17439, 0.068801, 0.068801, 0.068801, 1199.816000
JsonArrayLarge , 0.167700, 7155, 0.167700, 0.167700, 0.167700, 1199.890000
JsonArrayExtraLarge, 17.921433, 67, 17.921433, 17.921433, 17.921433, 1200.736000
JsonArrayVeryBig , 57.036952, 21, 57.036952, 57.036952, 57.036952, 1197.776000
JsonBig3 , 146.449375, 8, 146.449375, 146.449375, 146.449375, 1171.595000

--------------------------------------------------------------------------------
Benchmark results
--------------------------------------------------------------------------------
name , met (ms) , iters , min (ms) , mean (ms) , max (ms) , duration (ms)
JsonParseSmall , 0.011757, 100000, 0.011757, 0.011757, 0.011757, 1175.731000
JsonArrayMedium , 0.025958, 46269, 0.025958, 0.025958, 0.025958, 1201.055000
JsonArrayLarge , 0.064550, 18595, 0.064550, 0.064550, 0.064550, 1200.308000
JsonArrayExtraLarge, 7.059231, 169, 7.059231, 7.059231, 7.059231, 1193.010000
JsonArrayVeryBig , 26.120578, 45, 26.120578, 26.120578, 26.120578, 1175.426000
JsonBig3 , 91.799231, 13, 91.799231, 91.799231, 91.799231, 1193.390000
eggsquad (13h ago)
Tag you're it lol
--------------------------------------------------------------------------------
Benchmark results
--------------------------------------------------------------------------------
name , met (ms) , iters , min (ms) , mean (ms) , max (ms) , duration (ms)
JsonParseSmall , 0.010236, 100000, 0.010236, 0.010236, 0.010236, 1023.641000
JsonArrayMedium , 0.023358, 51153, 0.023358, 0.023358, 0.023358, 1194.818000
JsonArrayLarge , 0.062322, 19975, 0.062322, 0.062322, 0.062322, 1244.885000
JsonArrayExtraLarge, 6.513168, 184, 6.513168, 6.513168, 6.513168, 1198.423000
JsonArrayVeryBig , 22.549189, 53, 22.549189, 22.549189, 22.549189, 1195.107000
JsonBig3 , 68.620235, 17, 68.620235, 68.620235, 68.620235, 1166.544000
eggsquad (13h ago)
There's also a collection of conformance test cases in this repo, which was very helpful: https://github.com/miloyip/nativejson-benchmark/tree/master/data/jsonchecker
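For reference, checking one of those cases looks roughly like this (a sketch; the file name is a placeholder and it assumes EmberJson raises on documents it rejects):
from testing import assert_raises
from ember_json import JSON

def main():
    # the jsonchecker "fail" files are documents a conforming parser must reject
    var doc = String("")
    with open("data/jsonchecker/fail2.json", "r") as f:  # placeholder path
        doc = f.read()
    with assert_raises():
        _ = JSON.from_string(doc)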