Modular•7mo ago

EmberJson: High level JSON library

I've spent part of the last week on the beginnings of a JSON library for Mojo. It's still very much under development, so I haven't made any official releases, but if anyone would like to help add test cases or point out edge cases I've missed, that would be greatly appreciated! https://github.com/bgreni/EmberJson

A quick example of how it's used:

from ember_json import *

var s = '{"key": 123}'
var json = JSON.from_string(s)
print(json["key"].int()) # prints 123

json = JSON.from_string('[123, "foo"]')
print(json[1].string()) # prints foo

Caroline Frasca•7mo ago

Very cool! 😎

Peter Homola•7mo ago

I wrote a similar parser a few weeks ago. It seems to be ~2.5x faster: https://github.com/phomola/mojolibs/tree/main/src/textkit

eggsquadOP•7mo ago

How are you measuring?

Peter Homola•7mo ago

I just ran your benchmarks.

eggsquadOP•7mo ago

Oh, on further reading, I hadn't realized that Unicode shares its first 128 characters with ASCII (and UTF-8 encodes them as the same single bytes). I'll try reading it from raw bytes as you've done and see where that gets me.
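A minimal sketch of why scanning raw bytes is safe, written in Python rather than Mojo (it is not EmberJson's code): UTF-8 encodes code points below 128 as the identical single ASCII byte, and every byte of a multi-byte sequence is 0x80 or higher, so a byte equal to " or { can only ever be that character. The names STRUCTURAL, structural_positions and multibyte_is_ascii_free are made up for illustration.

```python
# Structural JSON characters, as a set of byte values (iterating bytes gives ints).
STRUCTURAL = frozenset(b'{}[]:,"')


def structural_positions(data: bytes) -> list[tuple[int, str]]:
    """Return (byte offset, character) for every structural byte, without decoding."""
    return [(i, chr(b)) for i, b in enumerate(data) if b in STRUCTURAL]


def multibyte_is_ascii_free(text: str) -> bool:
    """Check that no non-ASCII character encodes to any byte below 0x80 in UTF-8."""
    return all(
        b >= 0x80
        for ch in text
        if ord(ch) > 0x7F
        for b in ch.encode("utf-8")
    )


raw = '{"k": "∑é"}'.encode("utf-8")
# The multi-byte "∑" and "é" occupy offsets 7..11 and are never reported.
print(structural_positions(raw))
```

Because the check works byte by byte, the parser only needs to decode when it actually materializes a string value.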

Peter Homola•7mo ago

Yes, I think reading from raw bytes is better here. Then your code should be faster than mine, because I tokenise the input first.

eggsquadOP•7mo ago

Ah yes that has yielded quite the improvement, thank you for pointing that out!

--------------------------------------------------------------------------------
Benchmark results
--------------------------------------------------------------------------------
name               , met (ms)    , iters   , min (ms)    , mean (ms)   , max (ms)    , duration (ms)
JsonParseSmall     ,     0.020933,    57400,     0.020933,     0.020933,     0.020933,  1201.580000
JsonArrayMedium    ,     0.054992,    21948,     0.054992,     0.054992,     0.054992,  1206.966000
JsonArrayLarge     ,     0.128410,     9324,     0.128410,     0.128410,     0.128410,  1197.299000
JsonArrayExtraLarge,    13.749082,       85,    13.749082,    13.749082,    13.749082,  1168.672000
JsonArrayVeryBig   ,    46.724640,       25,    46.724640,    46.724640,    46.724640,  1168.116000

Martin Vuyk•7mo ago

Hi @bgreni, cool library. Some comments on your approach:

I have a PR open which will make split more efficient and make StringSlice.split() return a List[StringSlice], so there is no allocation beyond building that list. The next feature in line for me is doing something similar for splitlines, so that you will have the option of having your Reader struct hold everything semi-tokenized very cheaply, because splitlines splits on line breaks but not on " " (which you don't want to split on, since your fields might contain strings with whitespace). You can also implement your own version, since we follow Python, which also treats some characters as line separators that the JSON spec doesn't (AFAIK).

If you want to keep the peek approach, you can make it faster by iterating over a byte Span or using UnsafePointer, since string slicing is expensive: it checks bounds and allocates a new String each time. You can look at the code in the split PR for inspiration. Mojo is very cool, and you can make your byte constants very readable:

alias `"` = Byte(ord('"'))
alias `t` = Byte(ord("t"))
alias `f` = Byte(ord("f"))
alias `n` = Byte(ord("n"))
alias `{` = Byte(ord("{"))
alias `}` = Byte(ord("}"))
alias `[` = Byte(ord("["))
alias `]` = Byte(ord("]"))
alias `:` = Byte(ord(":"))
alias `,` = Byte(ord(","))

Anyway, GLHF! Looking forward to a PR from you on this, or a review of one 🙂
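A rough Python sketch of the peek approach described above: the reader walks an index over a zero-copy memoryview (standing in for a Mojo Span[Byte]) and compares integer byte constants, similar to the aliases above, instead of slicing out new strings. ByteReader and its methods are hypothetical, and read_string deliberately ignores escape sequences to stay short.

```python
# Integer byte constants, mirroring the Mojo aliases above.
QUOTE, COLON, COMMA = ord('"'), ord(":"), ord(",")
LBRACE, RBRACE = ord("{"), ord("}")
LBRACKET, RBRACKET = ord("["), ord("]")
WHITESPACE = frozenset(b" \t\n\r")  # the only four whitespace bytes JSON allows


class ByteReader:
    """Hypothetical reader: peeking and advancing never copy the input."""

    def __init__(self, data: bytes) -> None:
        self.view = memoryview(data)  # zero-copy view over the input
        self.pos = 0

    def at_end(self) -> bool:
        return self.pos >= len(self.view)

    def peek(self) -> int:
        # Indexing a memoryview of bytes yields an int; nothing is sliced.
        return self.view[self.pos]

    def advance(self) -> int:
        byte = self.view[self.pos]
        self.pos += 1
        return byte

    def skip_whitespace(self) -> None:
        while not self.at_end() and self.peek() in WHITESPACE:
            self.pos += 1

    def expect(self, byte: int) -> None:
        if self.at_end() or self.advance() != byte:
            raise ValueError(f"expected {chr(byte)!r} near offset {self.pos}")

    def read_string(self) -> memoryview:
        """Return a view of the bytes between quotes (escapes not handled)."""
        self.expect(QUOTE)
        start = self.pos
        while not self.at_end() and self.peek() != QUOTE:
            self.pos += 1
        end = self.pos
        self.expect(QUOTE)
        return self.view[start:end]  # slicing a memoryview does not copy


reader = ByteReader(b'  {"key": 123}')
reader.skip_whitespace()
reader.expect(LBRACE)
key = reader.read_string()
reader.skip_whitespace()
reader.expect(COLON)
print(bytes(key))  # b'key'
```

The only copy in the example is the explicit bytes(key) at the end; everything before it works on offsets into the original buffer.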

aurelian•7mo ago

String slicing allocates? Sorry, I meant to ask: why does it?

Martin Vuyk•7mo ago

Yep, since it returns a String instance, which owns its data. It should also be noted that slicing currently does not work by Unicode code points; it will in the future, which will add further overhead. So overall, using StringSlice and Span[Byte] as much as possible is the best way to go, since they are non-owning types that just offer a view into the data.

aurelian•7mo ago

I would expect slicing a string to return a StringSlice

Martin Vuyk•7mo ago

maybe in the future, but currently it doesn't
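To illustrate the owning-versus-view distinction discussed above, here is a small Python analogy (not Mojo): slicing a bytearray allocates a new, independent buffer, much like slicing a Mojo String returned a new String at the time, while slicing a memoryview only narrows a view over the same memory, which is the role StringSlice and Span[Byte] play.

```python
data = bytearray(b"hello world")

owned = data[0:5]             # new bytearray: allocates and copies 5 bytes
view = memoryview(data)[0:5]  # new view object only: shares data's buffer

data[0] = ord("J")            # mutate the original buffer in place

print(bytes(owned))  # b'hello'  (the copy is unaffected)
print(bytes(view))   # b'Jello'  (the view sees the change)
```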

Martin Vuyk•7mo ago

I've opened a proposal to change the way we do __getitem__(self, slice: Slice) so that it returns an Iterator instead of a new instance. We'll see where it goes; it might get changed for something else 🤷‍♂️. The whole stdlib is still WIP.

eggsquadOP•7mo ago

StringSlice is basically just a Span wrapper at this point, and Span doesn't support strided steps at the moment, so expressions like this currently require copying into a new String:

var s = "some string"
print(s[:3:-1])
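One way a non-owning view could support strided slices like s[:3:-1] without copying is to store a step alongside the start offset and length, so slicing only composes indices. The StridedView class below is a hypothetical Python sketch of that idea, not Mojo's Span or StringSlice; it leans on Python's range(n)[slice] to normalise the start, stop and negative step.

```python
from typing import Optional


class StridedView:
    """Hypothetical non-owning view: (buffer, start, length, step); no copies."""

    def __init__(self, buf: bytes, start: int = 0,
                 length: Optional[int] = None, step: int = 1) -> None:
        self.buf = buf
        self.start = start
        self.length = len(buf) if length is None else length
        self.step = step

    def __len__(self) -> int:
        return self.length

    def __getitem__(self, key):
        if isinstance(key, slice):
            # range() resolves None/negative bounds and negative steps for us.
            r = range(self.length)[key]
            return StridedView(
                self.buf,
                start=self.start + r.start * self.step,
                length=len(r),
                step=self.step * r.step,
            )
        if not -self.length <= key < self.length:
            raise IndexError(key)
        key %= self.length
        return self.buf[self.start + key * self.step]

    def to_bytes(self) -> bytes:
        # Copying only happens here, when the caller explicitly asks for it.
        return bytes(self[i] for i in range(self.length))


s = StridedView(b"some string")
print(s[:3:-1].to_bytes())       # b'gnirts '
print(s[::2][::-1].to_bytes())   # b'git ms'
```

Nested slices compose into a single (start, step, length) triple, so a chain of slicing operations still points at the original buffer.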

eggsquadOP•7mo ago

In case it's useful to anyone, I found this collection of benchmarks and validation tests for JSON: https://github.com/miloyip/nativejson-benchmark. I made a benchmark using the three big files it uses for its parsing-performance section (canada.json, citm_catalog.json and twitter.json). I'm getting about 121 ms, which would put it at the top of the lower half of the parsing-time graph there? (Probably not, since I'm using much newer hardware than what was used for that graph.)

Peter Homola•7mo ago

I've removed unnecessary heap allocations and now my parser seems to be faster again :)

--------------------------------------------------------------------------------
Benchmark results
--------------------------------------------------------------------------------
name               , met (ms)    , iters   , min (ms)    , mean (ms)   , max (ms)    , duration (ms)
JsonParseSmall     ,     0.026477,    45358,     0.026477,     0.026477,     0.026477,  1200.923000
JsonArrayMedium    ,     0.068801,    17439,     0.068801,     0.068801,     0.068801,  1199.816000
JsonArrayLarge     ,     0.167700,     7155,     0.167700,     0.167700,     0.167700,  1199.890000
JsonArrayExtraLarge,    17.921433,       67,    17.921433,    17.921433,    17.921433,  1200.736000
JsonArrayVeryBig   ,    57.036952,       21,    57.036952,    57.036952,    57.036952,  1197.776000
JsonBig3           ,   146.449375,        8,   146.449375,   146.449375,   146.449375,  1171.595000

--------------------------------------------------------------------------------
Benchmark results
--------------------------------------------------------------------------------
name               , met (ms)    , iters   , min (ms)    , mean (ms)   , max (ms)    , duration (ms)
JsonParseSmall     ,     0.011757,   100000,     0.011757,     0.011757,     0.011757,  1175.731000
JsonArrayMedium    ,     0.025958,    46269,     0.025958,     0.025958,     0.025958,  1201.055000
JsonArrayLarge     ,     0.064550,    18595,     0.064550,     0.064550,     0.064550,  1200.308000
JsonArrayExtraLarge,     7.059231,      169,     7.059231,     7.059231,     7.059231,  1193.010000
JsonArrayVeryBig   ,    26.120578,       45,    26.120578,    26.120578,    26.120578,  1175.426000
JsonBig3           ,    91.799231,       13,    91.799231,    91.799231,    91.799231,  1193.390000

eggsquadOP•7mo ago

Tag, you're it lol

--------------------------------------------------------------------------------
Benchmark results
--------------------------------------------------------------------------------
name               , met (ms)    , iters   , min (ms)    , mean (ms)   , max (ms)    , duration (ms)
JsonParseSmall     ,     0.010236,   100000,     0.010236,     0.010236,     0.010236,  1023.641000
JsonArrayMedium    ,     0.023358,    51153,     0.023358,     0.023358,     0.023358,  1194.818000
JsonArrayLarge     ,     0.062322,    19975,     0.062322,     0.062322,     0.062322,  1244.885000
JsonArrayExtraLarge,     6.513168,      184,     6.513168,     6.513168,     6.513168,  1198.423000
JsonArrayVeryBig   ,    22.549189,       53,    22.549189,    22.549189,    22.549189,  1195.107000
JsonBig3           ,    68.620235,       17,    68.620235,    68.620235,    68.620235,  1166.544000

eggsquadOP•7mo ago

There's also a collection of conformance test cases in this repo, which was very helpful: https://github.com/miloyip/nativejson-benchmark/tree/master/data/jsonchecker
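A minimal sketch of how a pass/fail conformance suite like that one can drive a parser, assuming the test files are named with a "pass" or "fail" prefix that says whether they should parse. Python's json.loads stands in for the parser under test, run_conformance is a hypothetical helper, and the inline cases are made up rather than taken from the suite; in practice you would load the files from disk.

```python
import json
from typing import Callable


def run_conformance(cases: dict[str, str],
                    parse: Callable[[str], object] = json.loads) -> list[str]:
    """Return the names of cases whose outcome contradicts their name prefix."""
    mismatches = []
    for name, text in sorted(cases.items()):
        should_accept = name.startswith("pass")
        try:
            parse(text)
            accepted = True
        except ValueError:  # json.JSONDecodeError is a subclass of ValueError
            accepted = False
        if accepted != should_accept:
            mismatches.append(name)
    return mismatches


cases = {
    "pass_nested": '{"a": [true, false, null], "b": -1.5e3}',
    "fail_trailing_comma": "[1, 2,]",
    "fail_single_quotes": "['x']",
    "fail_unclosed_array": "[1, 2",
}
print(run_conformance(cases))  # [] means every case behaved as its name says
```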

aurelian•7mo ago

@bgreni thanks for this! It's working well, and it was easy to add from_list.

eggsquadOP•7mo ago

Thank you! I think I’ll finally do an actual package release to prefix.dev tomorrow for nightly

aurelian•7mo ago

could the return type be inferred here, down the road?

    fn __init__(inout self, json: Object) raises:
        frame = json["frame"].object()
        self.x = abs(frame["x"].float()).cast[DType.int16]()
        self.y = frame["y"].float().cast[DType.int16]()
        self.w = frame["w"].float().cast[DType.int16]()
        self.h = frame["h"].float().cast[DType.int16]()
        self.x2 = self.x + self.w
        self.y2 = self.y + self.h

Looking forward to comptime reflection, when this could just be a loop.
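To show what "this could just be a loop" means, here is a Python sketch that uses runtime reflection (dataclasses.fields) as a stand-in for the comptime reflection being wished for here: the constructor iterates over the declared fields instead of naming each one. The Frame class and frame_from_json function are hypothetical, int() is only a rough stand-in for the cast[DType.int16]() in the snippet above, and the abs() applied to x there is left out.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class Frame:
    x: int = 0
    y: int = 0
    w: int = 0
    h: int = 0

    @property
    def x2(self) -> int:
        return self.x + self.w

    @property
    def y2(self) -> int:
        return self.y + self.h


def frame_from_json(obj: dict) -> Frame:
    frame = obj["frame"]
    # One loop over the declared fields replaces a hand-written line per field;
    # each field's annotated type (int here) converts the parsed JSON number.
    values = {f.name: f.type(frame[f.name]) for f in fields(Frame)}
    return Frame(**values)


doc = json.loads('{"frame": {"x": 1.9, "y": 2.0, "w": 10.5, "h": 4.0}}')
f = frame_from_json(doc)
print(f, f.x2, f.y2)  # Frame(x=1, y=2, w=10, h=4) 11 6
```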

eggsquadOP•7mo ago

Infer the type where exactly?

aurelian•7mo ago

Of the struct field. It's more of a Mojo question.

eggsquadOP•7mo ago

I imagine probably not?

eggsquadOP•7mo ago

EmberJson has its first release on prefix.dev in the mojo-community-nightly channel! https://prefix.dev/channels/mojo-community-nightly/packages/emberjson

f0cii•6mo ago

Hey! I also created an open-source project for JSON handling in Mojo: sonic-mojo. It seems to be ~7.5x faster than the parser you mentioned (https://github.com/bgreni/EmberJson)! This project is based on Mojo FFI bindings for sonic-rs and uses Diplomat for code generation, with some modifications in my forked version, f0cii/diplomat. Here are my benchmark results:

--------------------------------------------------------------------------------
Benchmark results
--------------------------------------------------------------------------------
name                    , met (ms)    , iters   , min (ms)    , mean (ms)   , max (ms)    , duration (ms)
JsonParseSmall          ,     0.001463,   803360,     0.001463,     0.001463,     0.001463,  1175.402216
JsonArrayMedium         ,     0.003468,   359714,     0.003468,     0.003468,     0.003468,  1247.467734
JsonArrayLarge          ,     0.005713,   204953,     0.005713,     0.005713,     0.005713,  1170.873361
JsonArrayExtraLarge     ,     0.625991,     1848,     0.625991,     0.625991,     0.625991,  1156.831185
JsonArrayCanada         ,     4.995497,      248,     4.995497,     4.995497,     4.995497,  1238.883308
JsonArrayTwitter        ,     0.973425,     1000,     0.973425,     0.973425,     0.973425,   973.424817
JsonArrayCitmCatalog    ,     2.057586,      545,     2.057586,     2.057586,     2.057586,  1121.384578

https://github.com/f0cii/sonic-mojo

eggsquadOP•6mo ago

Very cool! I wrote it quickly from scratch, so it's quite slow lol. I've thought about trying to port the simdjson implementation, but that's a lot more time than I have right now: https://github.com/simdjson/simdjson

stano•4mo ago

Hi @eggsquad, I'm not sure this is the right place, but I tried to add emberjson as a dependency (from the nightly community channel) and magic search fails to find it, although it successfully finds other packages in the same nightly channel (and resolves their latest versions).

magic search emberjson
Using channels: conda-forge, https://conda.modular.com/max/, https://repo.prefix.dev/mojo-community-nightly/
  × Package emberjson not found, please use a wildcard '*' in the search name for a broader result

But it finds others like mog, weave and hue

eggsquadOP•4mo ago

That's odd, I'll take a look. Thanks for bringing this up! @stano It seems to work for me now?

stano•4mo ago

It does resolve now. The only change I made was switching to MAX nightly, I believe. I'm still finding my way around stable vs. nightly, so I probably didn't realize that the nightly community channel packages require nightly MAX (in this case >= 25.1). Sorry for taking up your time.

eggsquadOP•4mo ago

Ah yes, I haven't made a stable version of the lib yet. No worries though, thank you for trying the package! Not sure if this really needs announcing, but I am formally opening up EmberJson to contributions! https://github.com/bgreni/EmberJson/blob/main/CONTRIBUTING.md

toasty•4mo ago

Any thoughts on when it’ll make it to a version that works with stable max? 😄

eggsquadOP•4mo ago

I find it easier to develop on nightly, so honestly I was largely waiting for someone to ask lol. I'm currently looking at some changes to float string conversion, but I can look into doing one afterwards?

toasty•4mo ago

No rush! Just curious. Was thinking about handling json payloads for the client in lightbug_http 🙂

eggsquadOP•4mo ago

EmberJson now has a stable channel release! Version 0.1.1 (because I botched my conda package in 0.1.0 lol) is available in the mojo-community prefix channel! https://prefix.dev/channels/mojo-community/packages/emberjson Also, a somewhat notable addition: I've added some support for Unicode escape decoding, so stuff like this works now:

 print(parse('["\\u2211"]')) # prints ["∑"]
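A minimal Python sketch of what \u escape decoding involves, independent of EmberJson's actual Mojo implementation: each \uXXXX carries four hex digits of a UTF-16 code unit, and characters outside the Basic Multilingual Plane arrive as a high/low surrogate pair that must be combined into one code point. decode_unicode_escapes and _hex4 are hypothetical helpers; only \u is decoded, and other escapes are copied through unchanged.

```python
HEX_DIGITS = frozenset("0123456789abcdefABCDEF")


def _hex4(s: str, i: int) -> int:
    """Parse exactly four hex digits starting at s[i]."""
    digits = s[i:i + 4]
    if len(digits) != 4 or not set(digits) <= HEX_DIGITS:
        raise ValueError(f"bad \\u escape at offset {i - 2}")
    return int(digits, 16)


def decode_unicode_escapes(s: str) -> str:
    r"""Decode JSON \uXXXX escapes in raw string content (between the quotes).

    Other escapes such as \n or \\ are copied through unchanged; a real
    parser would decode them in the same pass.
    """
    out = []
    i = 0
    while i < len(s):
        if s[i] != "\\":
            out.append(s[i])
            i += 1
            continue
        if s[i + 1:i + 2] != "u":
            out.append(s[i:i + 2])  # keep e.g. \n or \\ as-is
            i += 2
            continue
        code = _hex4(s, i + 2)
        i += 6
        # U+D800..U+DBFF is a high surrogate; combine it with a following
        # \uDC00..\uDFFF low surrogate into one supplementary code point.
        if 0xD800 <= code <= 0xDBFF and s[i:i + 2] == "\\u":
            low = _hex4(s, i + 2)
            if 0xDC00 <= low <= 0xDFFF:
                code = 0x10000 + ((code - 0xD800) << 10) + (low - 0xDC00)
                i += 6
        out.append(chr(code))
    return "".join(out)


result = decode_unicode_escapes(r"\ud83d\ude00 \u2211")
print([f"U+{ord(c):04X}" for c in result])  # ['U+1F600', 'U+0020', 'U+2211']
```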

Caroline Frasca•3mo ago

Hey @eggsquad, any interest in talking about this project at our upcoming community meeting on February 3rd?

eggsquadOP•3mo ago

Oh wow what an honour! Yeah I can do that

Caroline Frasca•3mo ago

Awesome!

eggsquadOP•3mo ago

What kind of content should I be doing? And how long, etc?

Caroline Frasca•3mo ago

Most folks create a couple of slides to talk about the purpose of their project and then do a demo, but it's very open-ended! You're welcome to take as much time as you want, but the average presentation is probably ~15 minutes + Q&A.

mlange-42•3mo ago

@bgreni Could you please DM me? I have a question regarding your talk and mine on Monday. I can't DM you...
