C# (2y ago)
rick_o_max

❔ Alternatives to Antlr for C#

Antlr needs a runtime library, and the parsing is veeery slow. Any good alternative?
50 Replies
cursedland
cursedland2y ago
language parser
Thinker
Thinker2y ago
A parser generator.
@T = (Q, Σ, Γ, q₀, *, δ) do you have any suggestions?
LPeter1997
LPeter19972y ago
Instead, I'd suggest pasting it into Google; that takes less time than asking in a question thread.
I'd honestly make a suggestion based on the exact use case. Is it really slow? On what grammar? Do you have additional non-functional requirements (good error recovery, incrementality, ...)?
If you just need speed, then I'd look around in the world of LR parsers. They are generally linear-time, if you can find a sensible upper bound for the lookahead (sensible sadly meaning 0 or 1 in the vast majority of cases).
If you are dealing with a truly nasty grammar you can't hammer into something an LR generator can eat, then you might also want to look at writing the parser by hand.
Anton
Anton2y ago
consider writing a simple recursive descent parser manually. that's how most major programming language compilers parse code nowadays
rick_o_max
rick_o_maxOP2y ago
It is a file format parser, and I've already written the ANTLR grammar for it.
Maybe generating a railroad diagram from it would help me reimplement it from scratch.
I generally write parsers from scratch, but this one is "strange", so I was trying to auto-generate it.
LPeter1997
LPeter19972y ago
The question was likely for people that actually know a thing about parsing and the standard ecosystem around it. Maybe the question simply wasn't for you
rick_o_max
rick_o_maxOP2y ago
3D FBX files
^^^
The structure is kinda simple, but with some strange rules.
Lemme see if I can generate a railroad diagram from that.
LPeter1997
LPeter19972y ago
That'd help a lot in reading it, thanks
rick_o_max
rick_o_maxOP2y ago
Tbh, I have to test this grammar again, one sec.
I'm not in the mood to rewrite it from scratch. I've spent months on this project alone, so if I have to redo the parser, it must be the last time... lol
LPeter1997
LPeter19972y ago
The grammar doesn't look complex, just a bit weirdly structured
rick_o_max
rick_o_maxOP2y ago
It is, the file format is strange
LPeter1997
LPeter19972y ago
But if it's for gigantic 3D mesh/skeletal/animation/whatever data, then I can imagine speed being concerning, LL(*) is fast but not "file-format-parser fast"
rick_o_max
rick_o_maxOP2y ago
I just need a diagram, I guess, so I can rewrite it from scratch doing it the right way
LPeter1997
LPeter19972y ago
I'll look through it in more depth to see if this is something remotely parsable with LR(0). If it is, then you are in luck, that's linear time.
rick_o_max
rick_o_maxOP2y ago
Tks
LPeter1997
LPeter19972y ago
Don't take me out of context and read on
rick_o_max
rick_o_maxOP2y ago
This is just a chunk of an actual file:
FBXHeaderExtension: {
FBXHeaderVersion: 1004
FBXVersion: 7700
CreationTimeStamp: {
Version: 1000
Year: 2023
Month: 2
Day: 1
Hour: 15
Minute: 9
Second: 33
Millisecond: 389
}
Creator: "FBX SDK/FBX Plugins version 2020.3.1"
OtherFlags: {
TCDefinition: 127
}
SceneInfo: "SceneInfo::GlobalInfo", "UserData" {
Type: "UserData"
Version: 100
MetaData: {
Version: 100
Title: ""
Subject: ""
Author: ""
Keywords: ""
Revision: ""
Comment: ""
}
Properties70: {
P: "DocumentUrl", "KString", "Url", "", "C:\Users\ricko\Desktop\Models\pivot.fbx"
P: "SrcDocumentUrl", "KString", "Url", "", "C:\Users\ricko\Desktop\Models\pivot.fbx"
P: "Original", "Compound", "", ""
P: "Original|ApplicationVendor", "KString", "", "", "Autodesk"
P: "Original|ApplicationName", "KString", "", "", "Maya"
P: "Original|ApplicationVersion", "KString", "", "", "2023"
P: "Original|DateTime_GMT", "DateTime", "", "", "01/02/2023 18:09:33.387"
P: "Original|FileName", "KString", "", "", "C:\Users\ricko\Desktop\Models\pivot.fbx"
P: "LastSaved", "Compound", "", ""
P: "LastSaved|ApplicationVendor", "KString", "", "", "Autodesk"
P: "LastSaved|ApplicationName", "KString", "", "", "Maya"
P: "LastSaved|ApplicationVersion", "KString", "", "", "2023"
P: "LastSaved|DateTime_GMT", "DateTime", "", "", "01/02/2023 18:09:33.387"
P: "Original|ApplicationActiveProject", "KString", "", "", "C:\Users\ricko\Desktop\Models"
}
}
LPeter1997
LPeter19972y ago
Yep, this should be dead simple to parse with basically anything.
There's no ambiguity, at least not at first glance. Most of your concern would be the lexer then, tbh.
rick_o_max
rick_o_maxOP2y ago
There are many ambiguities.
Trust me... lol
LPeter1997
LPeter19972y ago
Where? I don't see any, honestly.
What I see here is a format where most of your concern would be fast lexing, to eat the file as fast as you can.
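To make the "fast lexing" point concrete, here is a minimal single-pass lexer sketch for the chunk shown above. The token kinds, the `FbxTextLexer` name, and the rule that strings have no escape sequences are all assumptions read off the sample, not taken from any FBX specification or from the grammar discussed in the thread.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical token kinds, guessed from the sample chunk (not from an FBX spec).
public enum TokenKind { Identifier, Number, String, Colon, Comma, OpenBrace, CloseBrace, End }

public readonly record struct Token(TokenKind Kind, string Text);

public static class FbxTextLexer
{
    // One forward pass over the input; every character is consumed exactly once.
    public static List<Token> Lex(string src)
    {
        var tokens = new List<Token>();
        int i = 0;
        while (i < src.Length)
        {
            char c = src[i];
            // Newlines are dropped like any other whitespace.
            if (char.IsWhiteSpace(c)) { i++; continue; }

            int start = i;
            if (c == '"')
            {
                // Assumption: no escape sequences (the sample paths contain raw backslashes).
                int end = src.IndexOf('"', i + 1);
                if (end < 0) throw new FormatException($"Unterminated string at offset {i}");
                tokens.Add(new Token(TokenKind.String, src.Substring(i + 1, end - i - 1)));
                i = end + 1;
            }
            else if (char.IsLetter(c) || c == '_')
            {
                while (i < src.Length && (char.IsLetterOrDigit(src[i]) || src[i] == '_')) i++;
                tokens.Add(new Token(TokenKind.Identifier, src.Substring(start, i - start)));
            }
            else if (char.IsDigit(c) || c == '-')
            {
                // Assumption: plain decimals only, no exponent notation.
                i++;
                while (i < src.Length && (char.IsDigit(src[i]) || src[i] == '.')) i++;
                tokens.Add(new Token(TokenKind.Number, src.Substring(start, i - start)));
            }
            else
            {
                TokenKind kind = c switch
                {
                    ':' => TokenKind.Colon,
                    ',' => TokenKind.Comma,
                    '{' => TokenKind.OpenBrace,
                    '}' => TokenKind.CloseBrace,
                    _ => throw new FormatException($"Unexpected character '{c}' at offset {i}")
                };
                tokens.Add(new Token(kind, c.ToString()));
                i++;
            }
        }
        tokens.Add(new Token(TokenKind.End, ""));
        return tokens;
    }
}
```

A real implementation would probably avoid allocating a string per token (for example by storing offsets), but that is left out to keep the sketch short.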
rick_o_max
rick_o_maxOP2y ago
SceneInfo: "SceneInfo::GlobalInfo", "UserData" this is a node with metadata
Properties70: this is a node with a single sub-node
P: this is a node with multiple properties
This is an array:
Vertices: *24 {
a: -0.5,-0.5,0.5,0.5,-0.5,0.5,-0.5,0.5,0.5,0.5,0.5,0.5,-0.5,0.5,-0.5,0.5,0.5,-0.5,-0.5,-0.5,-0.5,0.5,-0.5,-0.5
}
The a: there acts like a P: from the properties, basically.
But you see there is the * followed by the array length, which indicates it is an array.
There is a catch, and I can't remember exactly where, which breaks the whole parser if it's not parsed correctly.
It's been a long time since I wrote the parser, tbh.
LPeter1997
LPeter19972y ago
Properties70: this is a node with a single sub-node
P: this is a node with multiple properties
What's the difference between these 2? I don't immediately see any notational diff
rick_o_max
rick_o_maxOP2y ago
P and a are ambiguous
LPeter1997
LPeter19972y ago
a is sort of ambiguous, but the * a few tokens back can disambiguate it.
That shouldn't be a problem.
Oooh wait, is the space significant there?
rick_o_max
rick_o_maxOP2y ago
Not sure, haven't tested without it
LPeter1997
LPeter19972y ago
I can see dataValue can derive a space, so P: and P: would be different?
rick_o_max
rick_o_maxOP2y ago
But there is something with the newline char that I can't recall exactly.
One issue
I don't think so
LPeter1997
LPeter19972y ago
Yeah, it def needs a char before that's alphabetical.
Then I don't see a problem.
Is this grammar written by you, and you are not sure if it's actually 100% correct, or is it given as the oracle for the format?
rick_o_max
rick_o_maxOP2y ago
Oh, I remember the issue.
I remember now.
Vertices: *24 {
a: -0.5,-0.5,0.5,0.5,-0.5,0.5,-0.5,0.5,0.5,0.5,0.5,0.5,-0.5,0.5,-0.5,0.5,0.5,-0.5,-0.5,-0.5,-0.5,0.5,-0.5,-0.5,
-0.5,-0.5,0.5,0.5,-0.5,0.5,-0.5,0.5,0.5,0.5,0.5,0.5,-0.5,0.5,-0.5,0.5,0.5,-0.5,-0.5,-0.5,-0.5,0.5,-0.5,-0.5
}
This can happen and is perfectly accepted by the parsers.
Like, breaking an array with a newline.
LPeter1997
LPeter19972y ago
Yeah, I'd expect a newline to be fine there
rick_o_max
rick_o_maxOP2y ago
I remember it caused me issues before, but I fixed it by checking the comma first.
I guess the main issue is that my parser is prehistoric and I need to rewrite it from scratch.
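The "check the comma first" fix can be sketched as a tiny hand-written parser for the array node above. This is only an illustration under assumptions: the `ArrayNodeParser` name is hypothetical, numbers are assumed to be plain decimals, and the declared `*N` length is used only as a capacity hint because the thread does not say whether it must match the number of values.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical sketch, not the parser from the thread.
// Parses: Name ':' '*' count '{' key ':' number (',' number)* '}'
public sealed class ArrayNodeParser
{
    private readonly string _src;
    private int _pos;

    public ArrayNodeParser(string src) => _src = src;

    public (string Name, double[] Values) Parse()
    {
        string name = ReadIdentifier();
        Expect(':');
        Expect('*');
        int declaredCount = (int)ReadNumber();   // used only as a capacity hint
        Expect('{');
        ReadIdentifier();                        // the "a" key
        Expect(':');

        var values = new List<double>(declaredCount);
        do
        {
            values.Add(ReadNumber());
        }
        while (TryConsume(','));                 // the comma, not the line break, continues the list

        Expect('}');
        return (name, values.ToArray());
    }

    // Line breaks are skipped together with all other whitespace.
    private void SkipWhitespace()
    {
        while (_pos < _src.Length && char.IsWhiteSpace(_src[_pos])) _pos++;
    }

    private bool TryConsume(char c)
    {
        SkipWhitespace();
        if (_pos < _src.Length && _src[_pos] == c) { _pos++; return true; }
        return false;
    }

    private void Expect(char c)
    {
        if (!TryConsume(c)) throw new FormatException($"Expected '{c}' at offset {_pos}");
    }

    private string ReadIdentifier()
    {
        SkipWhitespace();
        int start = _pos;
        while (_pos < _src.Length && (char.IsLetterOrDigit(_src[_pos]) || _src[_pos] == '_')) _pos++;
        if (_pos == start) throw new FormatException($"Expected identifier at offset {_pos}");
        return _src.Substring(start, _pos - start);
    }

    private double ReadNumber()
    {
        SkipWhitespace();
        int start = _pos;
        if (_pos < _src.Length && _src[_pos] == '-') _pos++;
        while (_pos < _src.Length && (char.IsDigit(_src[_pos]) || _src[_pos] == '.')) _pos++;
        if (_pos == start) throw new FormatException($"Expected number at offset {_pos}");
        return double.Parse(_src.Substring(start, _pos - start), CultureInfo.InvariantCulture);
    }
}
```

Because whitespace (including newlines) is skipped before every token, a list broken after a comma parses the same as a list on one line.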
rick_o_max
rick_o_maxOP2y ago
It uses basically the same kind of lexer/parser I use here: https://github.com/rickomax/JsonParser/blob/main/JsonParser.cs
LPeter1997
LPeter19972y ago
You already throw away newlines, so that should be fine tho.
This is a weird-ass parser.
rick_o_max
rick_o_maxOP2y ago
Why?
LPeter1997
LPeter19972y ago
It's properly structured but totally weird.
It avoids defining types, value checks leak into parse logic (what should be lex logic), ...
It's not horrible, but it's another new I've seen.
Heck, if I've finished my episode, I'll write a lexer and parser for this format, it shouldn't be too bad honestly.
Anton
Anton2y ago
why the constants?
LPeter1997
LPeter19972y ago
(yeah, why not enums)
rick_o_max
rick_o_maxOP2y ago
The hash system I use
Anton
Anton2y ago
if you're going to define the constants, name them by the way they're used
rick_o_max
rick_o_maxOP2y ago
I use it everywhere (yes, I know there are collisions), but I had no issues so far with the keys I use
Anton
Anton2y ago
so squarebracket should be arraystartchar
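As an illustration of the two suggestions above (an enum instead of loose constants, and names that describe the role rather than the glyph), a sketch could look like this. `JsonTokenKind` and `JsonTokens.TryClassify` are hypothetical names and are not taken from the linked JsonParser repository.

```csharp
// Hypothetical names: each member says what the character means to the parser,
// not what the character looks like.
public enum JsonTokenKind
{
    ObjectStart,    // '{'
    ObjectEnd,      // '}'
    ArrayStart,     // '['
    ArrayEnd,       // ']'
    NameSeparator,  // ':'
    ValueSeparator  // ','
}

public static class JsonTokens
{
    // Maps a structural character to its role; returns false for anything else.
    public static bool TryClassify(char c, out JsonTokenKind kind)
    {
        switch (c)
        {
            case '{': kind = JsonTokenKind.ObjectStart; return true;
            case '}': kind = JsonTokenKind.ObjectEnd; return true;
            case '[': kind = JsonTokenKind.ArrayStart; return true;
            case ']': kind = JsonTokenKind.ArrayEnd; return true;
            case ':': kind = JsonTokenKind.NameSeparator; return true;
            case ',': kind = JsonTokenKind.ValueSeparator; return true;
            default: kind = default; return false;
        }
    }
}
```

This does not replace the hash-based key lookup mentioned in the thread; it only covers the structural characters.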
rick_o_max
rick_o_maxOP2y ago
When I don't have much info to store, I prefer to use some vars instead of structs/etc.
The only thing I'm totally not happy with in my JsonParser above is the fact that I'm storing the BinaryReader instance at every node...
I've made this lib so I don't have to create strings everywhere.
rick_o_max
rick_o_maxOP2y ago
I'd be better off storing these guys elsewhere, but that is an optimization I can do later:
Anton
Anton2y ago
wait, you do what?
well, if it's a proxy sort of object then it's fine
rick_o_max
rick_o_maxOP2y ago
It is, but I could at least move the BinaryReader outside the struct and pass it to the methods that consume the struct.
I might do that later.
The BinaryReader won't differ between the JsonParser nodes.
Well, time to rewrite the FBX thing from scratch.
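A minimal sketch of that refactoring, assuming a node only needs to remember where its value sits in the stream: the `ValueNode` struct and its `Offset`/`Length` fields are hypothetical and do not mirror the actual layout in the JsonParser repository.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical node: stores only the position and byte length of its value.
public readonly struct ValueNode
{
    public readonly long Offset;
    public readonly int Length;

    public ValueNode(long offset, int length)
    {
        Offset = offset;
        Length = length;
    }

    // The shared reader is supplied by the caller instead of being stored in every node.
    // A string is only created when the caller actually asks for one.
    public string ReadString(BinaryReader reader)
    {
        reader.BaseStream.Position = Offset;
        byte[] bytes = reader.ReadBytes(Length);
        return Encoding.UTF8.GetString(bytes);
    }
}
```

The node shrinks to a 64-bit offset plus a 32-bit length, and every node of one document is read through the same reader.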