
Title: Announcing builder-web
Date: 2022-03-08

Lessons from writing builder-web

These days we at Robur are finishing up our NGI Pointer project, whose goal is to ease deployment of reproducible MirageOS applications. Two components we built for this purpose are builder and builder-web. In this post I will go into technical details, design decisions and how the code for builder and builder-web evolved. My hope is that the reader may gain insight into the lessons we learned and into how builder-web works. First, let's get an overview of what the software does.

Builder

Builder is a server that periodically schedules build jobs to builder-worker clients that connect to it over TCP. The workers then execute the scripts they receive, typically an invocation of orb, which builds an opam package and records the information necessary to reproduce the build. The artifacts and exit code are then packaged and sent back to builder-server. Builder-server can then store the results in an ASN.1 encoded file on the filesystem or, if configured, upload them to builder-web. If the upload fails, the results are stored in the filesystem.
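The upload-or-store behaviour can be sketched in a few lines of OCaml. This is only an illustration of the fallback logic described above, not builder-server's actual code: the names `deliver`, `upload` and `store` are hypothetical, and the build result is left abstract.

```ocaml
(* Hypothetical names throughout; this only mirrors the fallback logic. *)
type outcome = Uploaded | Stored_locally

(* [upload] is omitted when no builder-web endpoint is configured.
   [store] writes the result to the local filesystem. *)
let deliver ?upload ~store build =
  let store_locally () = store build; Stored_locally in
  match upload with
  | None -> store_locally ()
  | Some send ->
    (match send build with
     | Ok () -> Uploaded
     (* A failed upload falls back to local storage. *)
     | Error (_ : string) -> store_locally ())
```

In this sketch a result is never dropped: it either reaches builder-web or ends up on disk.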

Builder-web

We started writing builder-web in late 2020. Builder-web's main purpose is to receive results from builder-server(s) and present the build results and artifacts to users in a web interface. One can view the jobs, their builds and their results, download binary artifacts with checksums, or retrieve the information necessary to rebuild and reproduce the artifacts. It has been extended with comparisons of dependencies between builds, visualizations of opam dependencies and binary size, and a recent hook mechanism that we now use to publish built binary packages for FreeBSD and Debian to our package repositories (apt.robur.coop for Debian and pkg.robur.coop for FreeBSD).

We have been running builder-web at builds.robur.coop since the end of 2020, and it has been running almost continuously since. This is thanks to our effort in writing migration (and rollback) "scripts" every time we needed to make a database schema change or alter the database in other ways. I write "scripts" because they are OCaml modules that get built into the binary builder-migrations. If you are curious about the history of database modifications in builder-web and have builder-web built or installed locally, you can run builder-migrations --help or just builder-migrations, and a man page is printed with a list of subcommands and their descriptions.

Help page of builder-migrations showing fixup and migration subcommands

Another contributing factor is that we use ZFS as the filesystem. Before running a migration or an operation that alters the files stored in the filesystem outside the database, we can make a ZFS snapshot. If the operation, say, deletes the wrong files, we can quickly roll back to the ZFS snapshot.

Running software that is under active development is bound to have things go wrong, and we have had things go wrong a number of times. But thanks to the (sometimes redundant) precautions we have always been able to quickly roll back and continue.

Evolution of builder-web

The first iteration of builder-web was a very simple opium application that looked into the directory where builder-server would save the results, parsed the ASN.1 encoded files and presented them in a web interface. This allowed anyone to view the builds and download the artifacts, but not much else.

Not long after, the application was redesigned to have an upload endpoint where builder-server could upload build results, which are then parsed and stored in a sqlite3 database and on the filesystem. This made the two applications more independent, and builder-web no longer needed access to the same filesystem as builder-server.

We chose to use sqlite3 through caqti because an embedded database like sqlite3 means no setup of a database server is required, and sqlite3 has good performance. Our hope in choosing caqti was that we could easily switch to a database server if the need should arise, but it turns out the differences between sqlite3 and e.g. PostgreSQL are large enough that many SQL queries would have to be rewritten. So far we have not had a need to switch to a database server, but I believe it should not be too big a hurdle to overcome. In the web app itself the functions doing the queries are concentrated inside lib/model.ml. Another lesson is that storing data in files on the filesystem, tracked by the database, can greatly reduce the size of the database. This holds even for small files such as short shell scripts.

Using phantom types to catch using IDs for the wrong table

When working with databases it is good practice to use unique IDs in tables. Foreign keys in other tables can then refer to a row in that table by its ID. This ID is often a 64 bit integer. A problem is that on the OCaml side you get a 64 bit integer, and the programmer has to carefully track which table that ID refers to. This is both annoying and error prone. A solution we came up with is a custom Caqti type 'a id.

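module Rep : sig
  (* Abstract outside Rep; 'a is a phantom parameter naming the table. *)
  type 'a id
  val id : 'a -> 'a id Caqti_type.t
end = struct
  type 'a id = int64
  (* The argument is only used to fix 'a; every ID is stored as an int64. *)
  let id (_ : 'a) : 'a id Caqti_type.t = Caqti_type.int64
end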

This allows us to write queries with typed IDs:

(struct
  let get_id = Caqti_request.find Caqti_type.string (id `my_table) "SELECT ..."
  ...
end : sig
  val get_id : (string, [`my_table] id, _) Caqti_request.t
  val use_id : ([`my_table] id, int, _) Caqti_request.t
  val use_id_for_other_table : ([`other_table] id, int, _) Caqti_request.t
end)

With the above queries it is no longer possible to erroneously use the ID retrieved using the get_id query with the query use_id_for_other_table without the type checker complaining.
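To see the effect without a database, here is a self-contained sketch of the same phantom type trick using only the standard library. The module `Id` and the functions `find_job`, `count_builds` and `artifact_size` are made up for this illustration and stand in for real queries; they are not part of builder-web.

```ocaml
(* Stand-alone illustration of phantom-typed IDs; no database involved. *)
module Id : sig
  type 'a t
  (* The first argument is ignored at runtime; it only picks the tag. *)
  val of_int64 : 'a -> int64 -> 'a t
  val to_int64 : 'a t -> int64
end = struct
  type 'a t = int64
  let of_int64 _ n = n
  let to_int64 n = n
end

(* Hypothetical lookups standing in for real queries. *)
let find_job (name : string) : [ `job ] Id.t option =
  if name = "albatross" then Some (Id.of_int64 `job 1L) else None

let count_builds (job : [ `job ] Id.t) : int =
  if Id.to_int64 job = 1L then 3 else 0

let artifact_size (_ : [ `artifact ] Id.t) : int = 0

(* A job ID flows from one "query" to the next without losing its tag. *)
let builds_of name =
  match find_job name with
  | Some job -> count_builds job
  | None -> 0

(* The following is rejected by the type checker, because a job ID is
   not an artifact ID:
   let _ = Option.map artifact_size (find_job "albatross") *)
```

Because `'a Id.t` is abstract outside the module, a plain `int64` cannot be passed where an ID is expected either, so IDs can only come from functions that tag them.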

In June 2021 we migrated from opium to dream 1.0.0~alpha2. If I remember correctly, one of the motivations was that we were not satisfied with how opium's opinionated logging works. The migration went surprisingly smoothly, and the changes we had to make were few. The functions in opium for creating a response return the response directly, while in dream they return it in an lwt promise. Another difference is that dream, unlike opium, does not have built-in support for tyxml, so we had to write a function string_of_html and change from ... |> Response.of_html to ... |> string_of_html |> Dream.html, roughly. Dream also comes with built-in middleware for caqti lwt pools, which is really convenient.

Multiple build platforms

On builds.robur.coop we build packages for various distributions. We started building only on FreeBSD but soon added Ubuntu and Debian 10 and 11, mainly building system packages such as albatross, solo5-hvt tenders etc. One reason for adding more platforms was of course to have packages for more platforms, but another was to exercise the builder code on more platforms. Initially, we set up a pair of builder-server and builder-worker for each new platform and scheduled jobs named e.g. albatross-debian-10. This proved inflexible and suboptimal: that architecture did not allow one builder-server to schedule jobs for different platforms.

A somewhat extensive redesign was initiated. Every part needed to be aware of the platform that builds are performed on. Builder-worker was extended to take an extra command line parameter with the platform string, which is passed on to builder-server when asking for build jobs. Builder-server needs platform-specific orb templates. Interesting questions arose, such as what it means for scheduling if a client orders an execution of a job on only one platform, and what should happen when a builder-worker connects with a new platform string, or when an orb template for a new platform appears or disappears in the file system. The ASN.1 format also needed to convey what platform the build was built on.
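One way to picture platform-aware scheduling is the sketch below. It is not how builder-server is implemented: the `state` record, the function names and the platform strings used with it are hypothetical, and it picks just one possible answer to the questions above, namely that a job is queued once for every platform that currently has an orb template.

```ocaml
module SMap = Map.Make (String)

(* Hypothetical server state: one orb template and one queue per platform. *)
type state = {
  templates : string SMap.t;   (* platform -> template contents *)
  queues : string list SMap.t; (* platform -> pending job names, oldest first *)
}

let empty = { templates = SMap.empty; queues = SMap.empty }

let add_template state ~platform template =
  { state with templates = SMap.add platform template state.templates }

(* Queue [job] once for every platform that currently has a template. *)
let schedule state job =
  let enqueue platform _template queues =
    let pending = Option.value ~default:[] (SMap.find_opt platform queues) in
    SMap.add platform (pending @ [ job ]) queues
  in
  { state with queues = SMap.fold enqueue state.templates state.queues }

(* A worker asks for work and announces its platform string. *)
let next_job state ~platform =
  match SMap.find_opt platform state.queues with
  | Some (job :: rest) ->
    Some (job, { state with queues = SMap.add platform rest state.queues })
  | Some [] | None -> None
```

In this sketch a worker that announces a platform without a template simply gets no job, which is one of several reasonable behaviours.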

A new usability challenge also arose: the build platform is important for system packages, but for hvt unikernels the build platform matters less. It is of interest if the user wants to reproduce the build on their own, but for a user who is only interested in downloading and running the unikernel the build platform is likely of very little interest. Our solution is to keep the build platform listed and put a note on the main page that for unikernels the execution target is independent of the build platform. Other solutions and suggestions are very welcome, either as an issue on GitHub or by email: team ATrobur.coop.

Upload hooks

We added some, in my opinion, very nice and useful analyses and visualizations that my coworker wrote more about on his blog. One of the analyses is based on parsing ELF symbol tables, and at roughly 1-2 seconds it proved to be too slow to run on each request in a web server. To solve this problem we needed to cache the visualizations. Generating the visualizations on first hit would give a poor user experience, where users may experience very slow load times if they happen to be the first to request the visualization. Instead we asynchronously execute a binary on upload that generates the visualizations.
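The difference between generating on first hit and generating on upload can be sketched as follows. This is a simplified, synchronous, in-memory illustration: `render_visualization` is a stand-in for the slow analysis, and `on_upload` and `serve` are hypothetical names, whereas the real system runs a separate binary asynchronously.

```ocaml
(* In-memory cache keyed by a build identifier (hypothetical). *)
let cache : (string, string) Hashtbl.t = Hashtbl.create 16

(* Stand-in for the slow ELF-based analysis. *)
let render_visualization build_id =
  "<svg><!-- treemap for " ^ build_id ^ " --></svg>"

(* The expensive work happens once, when the build is uploaded. *)
let on_upload build_id =
  Hashtbl.replace cache build_id (render_visualization build_id)

(* The request handler only looks up; it never runs the analysis. *)
let serve build_id = Hashtbl.find_opt cache build_id
```

With this split, no visitor pays the 1-2 seconds; at worst a visualization is briefly unavailable right after an upload.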

We were also interested in creating and updating system package repositories with binary packages. Therefore we decided to define an upload hook script interface. Operators can write scripts and put them in /etc/builder-web/upload-hooks/ (or /usr/local/etc/builder-web/upload-hooks on FreeBSD), and the scripts are then executed on upload with information about the build passed as command line arguments. We use this for generating the two analysis visualizations and for updating binary system package repositories for FreeBSD and Debian family distributions. With some minor breaking changes the mechanism could be extended so that it can be used for sending reports about failed builds.
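A minimal sketch of such a hook runner, using only the OCaml standard library, could look like this. It is an illustration and not builder-web's implementation: the function names are hypothetical, the arguments passed to the scripts are placeholders (the actual interface is not spelled out here), and the sketch does not check that the files are executable.

```ocaml
(* List the hook scripts in [dir] in a stable order.
   A missing directory simply means there are no hooks. *)
let hooks_in dir =
  if Sys.file_exists dir && Sys.is_directory dir then
    Sys.readdir dir
    |> Array.to_list
    |> List.sort String.compare
    |> List.map (Filename.concat dir)
  else []

(* Build one shell command per hook, quoting the script path and
   every argument. [args] are placeholder build details. *)
let hook_commands ~dir args =
  List.map (fun script -> Filename.quote_command script args) (hooks_in dir)

(* Run every hook and collect each command with its exit code. *)
let run_hooks ~dir args =
  List.map (fun cmd -> (cmd, Sys.command cmd)) (hook_commands ~dir args)
```

An operator would then call something like `run_hooks ~dir:"/etc/builder-web/upload-hooks" [ ... ]` after a successful upload, with the build details filled in.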

The Debian repository is available at apt.robur.coop while the FreeBSD repository is available at pkg.robur.coop.

Unikernel device manifest

While we were working on ELF symbol table analysis I came upon owee as a dependency of another dependency. It is an OCaml library for parsing ELF files and more. For a while I had been working on a parser for a subset of the ELF format in order to retrieve solo5 device manifests from Mirage unikernels. With the tedious ELF parsing already implemented by owee, I wrote ocaml-solo5-elftool to be used in albatross. In albatross we used the solo5-elftool binary from solo5 to query unikernel images for the devices they expect, in order to fail early if the user did not specify exactly those. This sort of error can be annoying, so I added this functionality to builder-web to display the required devices for each unikernel.

Closing notes

A lot was learned in writing builder, builder-web and the other projects we worked on for the EU NGI Pointer project. I believe the work we have done the past year is a good step towards a good story for deploying and running Mirage applications. I would like to thank NGI Pointer for the generous funding and support that made this work possible.