Hey Ruby, how's your TOML?
A state of TOML implementations in Ruby
At BetterStack, we use Ruby for a part of our infrastructure orchestration, and we use TOML for our script configuration needs. As you may know, TOML (Tom’s Obvious, Minimal Language) is a configuration file format that’s easy to read with obvious semantics.
Since I dabble in Rust development a little and TOML is pretty popular over there, picking it over YAML or JSON was an easy choice, thanks to its ease of use and straightforward syntax. Unfortunately, the state of TOML libraries in Ruby is a bit more complex at this point.
NOTE: This article was written at the start of 2022. The landscape of TOML gems may be a bit different, so treat the information presented here accordingly.
gem install toml
and move on…?
TOML is a versioned file format with its first major version (v1.0.0) released at the beginning of 2021. Checking out the project wiki shows that there are a lot of implementations that are compliant with the v1.0.0 spec. However, Ruby only shows up elsewhere in the list, five times in total:
- tomlrb gem by @fbernier, stated as v1.0.0-rc.1 compliant, and
- toml-rb gem by @emancu, stated as v0.5.0 compliant
followed by
- toml2 gem by @charliesome,
- toml gem by @jm, and
- tomlp gem by @sandeepravi,
all of which have unknown compliance based on the wiki.
A quick search on RubyGems lets us know there are even more gems - toml-ruby, multi_toml, ptolemy, tock, toml-rb-hs, and toby, to mention a few showing up at the top.
Most of these additional gems had their latest releases in 2013 (what a year for TOML that must’ve been!) - toby and toml-rb-hs are the exceptions, but toml-rb-hs is a derailed fork of toml-rb.
Quite a mess!
What’s in the box?
Let’s inspect tomlrb, toml-rb, toml2, toml, tomlp, and toby, as the remaining gems are clearly outdated and don’t support anything near the v1.0.0 TOML spec.
Most of these gems use some kind of grammar or a parsing library - like racc/yacc, citrus, or parslet. This makes sense - TOML even has the language specification in the ABNF format present in the root of the project, so it should be fairly straightforward to use that definition as a base for a decoder/encoder library.
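To make that a bit more concrete, here’s a minimal sketch of what translating a single ABNF rule into Ruby might look like. It covers only the dec-int rule as I read it in toml.abnf: an optional sign, then either a single digit or a non-zero digit followed by more digits, each optionally preceded by one underscore. The DEC_INT constant and the dec_int? and parse_dec_int helpers are names I made up for illustration; they don’t come from any of the gems discussed here.

```ruby
# ABNF (paraphrased from toml.abnf):
#   dec-int          = [ minus / plus ] unsigned-dec-int
#   unsigned-dec-int = DIGIT / digit1-9 1*( DIGIT / underscore DIGIT )
# Hypothetical hand translation into a Ruby regex; not taken from any gem.
DEC_INT = /\A[+-]?(?:[1-9](?:_?[0-9])+|[0-9])\z/

# Returns true when the string is a valid TOML decimal integer literal.
def dec_int?(source)
  DEC_INT.match?(source)
end

# Converts a valid literal to an Integer, or returns nil for invalid input.
def parse_dec_int(source)
  dec_int?(source) ? source.delete("_").to_i : nil
end
```

Even this tiny rule has edge cases (leading zeros, doubled or trailing underscores), which hints at why the gems lean on a parser generator instead of hand-written regexes.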
Starting from the oldest, the tomlp gem (sandeepravi/tomlp) uses a Treetop grammar and was last updated 9 years ago. Although the author says (in bold and italics) that “You can use this in production”, I’m skipping this one.
Moving on to the toml2 gem (haileys/toml2), also last updated 9 years ago, it’s interesting to note that there’s no grammar present in this library - only a five-line, 340-character, perfectly aligned 😎 Ruby eval block. There’s also a note that this gem is a “gold plated, production grade, high performance” parser, which makes me a little sad to ignore it moving forward.
Here’s a summary of the remaining libraries (ordered alphabetically):
Gem | Grammar | Last update | Commits in the last 12 months | RubyGems downloads (latest version / all) |
---|---|---|---|---|
toby | Citrus | Jan 31, 2021 | 24 | 723 / 10,205 |
toml | Parslet | Dec 1, 2021 | 11 | 667,568 / 16,041,996 |
toml-rb | Citrus | Nov 22, 2021 | 30 | 238,443 / 18,650,046 |
tomlrb | Racc | Feb 19, 2021 | 6 | 2,866,930 / 23,665,130 |
It’s interesting to see that the total downloads are pretty similar between the top three gems - and none of them are abandoned. But it’s still not clear how one should select which gem to use…
Roll your own!
The obvious solution is to take the ABNF definition, transform it to a Ruby regex or to a different usable format, and roll a solution off that. I even have a great name - yatoml, as in Yet Another TOML. A great idea, right? … Right?
TOML compliance & performance comparison
Since I don’t want to pollute the space with yet another gem library, I want to use (and optionally improve) the most mature gem. I was able to compare the decoder compliance against the v1.0.0 TOML standard using the toml-test test suite. I’ll summarize the results below, but if you want to run this yourself, here’s the repository: toml-comparison.
I measured the compliance against the latest toml-test suite, as well as a very basic performance benchmark (I compared the time it takes to run the test suite on my MacBook Air).
There are 306 test cases in toml-test. Here are the comparison results:
Gem | Failed cases | Duration* |
---|---|---|
toby | 89 | 121.9s |
toml | 89 | 58.97s |
toml-rb | 80 | 134.45s |
tomlrb | 23 | 41.1s |
*I averaged three test suite runs for each gem.
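For the curious, averaging a few runs can be sketched with nothing but Ruby’s standard Benchmark module. This is a minimal illustration and not the script from the comparison repository; average_duration is a hypothetical helper name, and the block stands in for whatever command runs the test suite against one gem.

```ruby
require "benchmark"

# Hypothetical helper: runs the given block `runs` times and returns the
# mean wall-clock time of a single run, in seconds.
def average_duration(runs: 3)
  raise ArgumentError, "runs must be positive" unless runs.positive?

  durations = Array.new(runs) { Benchmark.realtime { yield } }
  durations.sum / runs
end

# Placeholder workload; a real measurement would invoke the test suite here.
seconds = average_duration(runs: 3) { 10_000.times { |i| i.to_s } }
puts format("average: %.4fs", seconds)
```

Wall-clock time on a laptop is noisy, so a number like this is only good for a rough ranking, which is all I was after.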
Based on these test runs, tomlrb comes out on top in both v1.0.0 TOML spec compliance and speed.
Other notable facts: toby, toml, and toml-rb convert all times to a Time or DateTime class, while tomlrb uses its own internal classes LocalDate, LocalTime, and LocalDateTime. On a similar note, only toby implements internal classes for Binary, Octal, and Hexadecimal numbers. All of these decisions make sense to me - the trade-off is usually between being more explicit and being easier to use.
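To illustrate that trade-off, here’s a minimal sketch using only Ruby’s standard library. A TOML local date such as 1979-05-27 carries no time of day and no offset, so turning it into a Time forces the decoder to invent both. SketchLocalDate is a hypothetical stand-in I wrote for this example; it is not the actual class from tomlrb, and the midnight and UTC defaults are my assumptions.

```ruby
require "time"

# Hypothetical stand-in for a "local date" value; not tomlrb's real class.
SketchLocalDate = Struct.new(:year, :month, :day) do
  # Explicit: the caller picks the offset when a Time is really needed.
  # Midnight and the default UTC offset are assumptions of this sketch.
  def to_time(utc_offset = "+00:00")
    Time.new(year, month, day, 0, 0, 0, utc_offset)
  end

  # Round-trips back to the TOML literal without adding any information.
  def to_s
    format("%04d-%02d-%02d", year, month, day)
  end
end

birthday = SketchLocalDate.new(1979, 5, 27)
puts birthday.to_s                        # the original literal, unchanged
puts birthday.to_time.iso8601             # midnight at a zero offset
puts birthday.to_time("+02:00").iso8601   # same date, different instant
```

The dedicated class keeps exactly what the file said, at the cost of one more type to learn; converting straight to Time is handier, but quietly bakes in an offset the author never wrote.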
I didn’t test the encoders - tomlrb doesn’t have one, and it seemed superfluous since the decoders are not 100% v1.0.0 compliant.
Great, so what now?
Based on the comparison, it would seem tomlrb is the gem closest to full v1.0.0 compliance.
So let’s get them there!
I submitted a PR to the tomlrb gem to cover a few of the remaining 23 cases, and hopefully with combined forces we can get it to 100%. The fact that it’s the fastest parser I tested is a nice cherry on top.
Till next time!