I have something like 1000 alerts in YAML files that we parse with Python to spit out machine-readable files, which are then ingested downstream by compilers. I want to update the system to be easier to work with, and I think there are benefits to be had by moving the config from YAML into bazel (which is already used extensively by others working on the project).
I figured bazel would be a good fit since rules/providers would offer clear, documented inputs, and we wouldn't need to invoke some kind of additional generator. A lot of people I talk to seem to think this is abusing bazel in some way, but I'm confused by that. Bazel just takes pieces of data and manipulates them, similar to what a generator would do, with the added benefit of caching that data when it doesn't change. It also integrates more nicely with the rest of the build system and should let us do more complicated/comprehensive checks sooner.
Am I wrong for thinking I can use bazel for this? It just feels right.
TL;DR: I don't suggest using bazel for this, but if you really want to, you can. While it is good that you are asking a "should I do this?" question, the details of the question seem to focus more on "can I do this?".
The issue here is not "can bazel do the work" -- it probably can, depending on the specifics of the code generation -- but "would the average developer be confused by what is going on" and "what happens if we scale it up".
Bazel is a build orchestration tool: it tells other tools what to do. You expect bazel "code" to be focused on build configuration: connecting tools with input source files to generate output files, arranged in a DAG to foster parallel execution. It typically says "take this data from here, put it through this transformation, and make this output artifact available to others"; that is a brief summary of an action, the basic building block of bazel's execution. What you don't expect is for bazel to contain the input data itself, provided directly to tools to generate output files.

It is probably possible to put your data directly into bazel with something like the sketch below.
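As a rough sketch, assuming a hypothetical `alert` macro in an `alerts.bzl` and a hypothetical `//tools:alert_gen` wrapper around your existing Python generator (none of these are real rules, just stand-ins):

```python
# alerts.bzl -- hypothetical macro wrapping the existing generator.
def alert(name, expr, severity):
    # One cacheable action per alert: the generator re-runs only when
    # this target's attributes (or the generator itself) change.
    native.genrule(
        name = name,
        outs = [name + ".alert"],
        cmd = "$(location //tools:alert_gen) --name={} --expr='{}' --severity={} > $@".format(
            name, expr, severity,
        ),
        tools = ["//tools:alert_gen"],
    )
```

```python
# BUILD -- the alert data itself now lives in bazel instead of YAML.
load(":alerts.bzl", "alert")

alert(
    name = "high_cpu",
    expr = "cpu_usage > 0.9",
    severity = "page",
)

alert(
    name = "disk_nearly_full",
    expr = "disk_used / disk_total > 0.95",
    severity = "ticket",
)
```

Each `alert` call becomes its own target with its own cached action, which is exactly where the target-count concerns below come in.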
But you run into a distinct problem of complexity creep as you add more and more features and your config becomes more nested and more vertical. I've seen this sort of thing happen before, and it typically results in unreadable `BUILD` files and the creation of new, otherwise-empty packages simply to split what is going on into separate `BUILD` files. You'll then create a set of `.bzl` files to collect common logic and constants together. Over time, you'll add specialized logic to help with certain groups of alarms that isn't common at all. And then you'll migrate your alarm framework and re-implement your configs.

Without knowing precisely what you are trying to accomplish, it is hard to say whether or not it is something I would support were you to write me a design doc or send a code review my way. A lot depends on implementation details, your use case, where the use cases might grow in the future, and what the other developers are willing to do.
Some questions to consider:
- Could you keep the data in `YAML` and convert it to text protobuf?
- Could you convert the `YAML` to protobuf at build time? (see the sketch below)

As for the point about thousands of targets in a single package: that actually does introduce performance concerns. Yes, you have a cache, but if you perform a `query` on that package, it can still take a bit for bazel to perform the operation. Likewise, a `...` pattern like `bazel test //my/code/here/...` that also touches your package can cause additional work. Basically, anything that requires bazel to enumerate the targets in that package can cause performance issues.

While it is unlikely that a few thousand new targets will make an appreciable difference, things can quickly get out of hand if you don't exercise some discipline with the number of targets you add, if you use overly broad patterns, or if you use a lot of aspects.
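To make the YAML-to-protobuf-at-build-time option above concrete, a minimal sketch might be a single `genrule` (the `//tools:yaml_to_textproto` converter is hypothetical; `genrule` itself is standard Bazel):

```python
# BUILD -- sketch: the YAML stays the source of truth and is converted
# at build time. Bazel caches the action, so the converter only re-runs
# when alerts.yaml (or the converter itself) changes.
genrule(
    name = "alerts_textproto",
    srcs = ["alerts.yaml"],
    outs = ["alerts.textproto"],
    cmd = "$(location //tools:yaml_to_textproto) $(location alerts.yaml) > $@",
    tools = ["//tools:yaml_to_textproto"],
)
```

This keeps the data in one file and one target, instead of one target per alert.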
Also, the cache would apply to the build step that processes the YAML file in the first place, no? You wouldn't need to re-process the YAML if it hasn't changed, so you get the caching benefits without having to re-implement things in Skylark/Bazel.