@oyvindberg
Last active December 14, 2023 17:09

Thank you for your work on Bleep! Developing a first-class Scala build experience that goes beyond Sbt is a daunting task, but one I believe is very much necessary for the health of industrial Scala.

Some preliminary thoughts:

Thank you for having a look and sharing your thoughts around this, greatly appreciated!

  1. I love that you went build-as-data. Tooling matters, and the build tool is the center of the entire tooling pipeline. Other tools need the ability to both read and write build data, and this can only be done economically with build-as-data.

Bleep is an experiment in how far we can take build-as-data. So far I see no limits 🚀

  2. Ignoring VirtusLab's fork of Bloop, Bloop is not well-maintained anymore (see attached). Anecdotally, it seems that bugs in Bloop are responsible for at least some of the issues Metals users run into. That said, all the build tools are choosing Bloop (or a fork of Bloop), which raises the question: what is your plan to mitigate the risks of basing Bleep on Bloop?

The way I see it, Bloop is an enabler, not a roadblock. Bleep is essentially just Coursier, Bloop and YAML, and nothing else. This is what made it possible for me to implement the core in a few weeks, not years.

There are absolutely maintenance worries when it comes to Bloop; it's a far too complicated codebase for what it does. However, it is being simplified. The build is approaching normal (Bleep can compile it), and there are just enough people who understand the internals. It is effectively supported by VirtusLab (through scala-cli), so I'm not too worried about this.

  3. Automatic import from Sbt is a wonderful feature that can help bootstrap a new build tool. Thanks for thinking about all the users who are currently on Sbt and want to experiment with a new build tool, but do not have any dedicated time at work for paying down build system debt.

There is no chance of succeeding otherwise. And since it primarily uses Bloop files as input, Maven, Gradle and Mill imports are possible too. In fact, with those build imports in place, this may already be approaching a good Java build tool as well. Now we just need a Kotlin BSP server to cover all bases :)

  4. I would love to see even more first-class treatment of Scala versions and platforms in Bleep. The reality is that open source ecosystems, as well as larger and more complicated corporate code bases, have to live in a mixed world. I want to compactly declare which platforms and Scala versions my project supports, and then use overlapping sources, without having to write any custom code or scripts or excessive configuration. (If this is possible, it's not clear to me how.)

I want to move slowly here to discover what is truly needed. However, Bleep does support all the commonly used source layouts in sbt. This is not documented yet, but I'm working on it. For now consider this demo:

Kapture.2023-06-06.at.00.08.10.mp4
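To make "overlapping sources" concrete: in an sbt-crossproject style "full" layout, the build for one platform and one Scala version picks up sources from both the shared folder and its own platform folder, each with optional version-suffixed variants. The sketch below is hypothetical (it is not Bleep's code, and the exact set of suffixed directories sbt uses differs between sbt versions); it only computes which directories would contribute.

```scala
// Hypothetical sketch: which source directories an sbt-style cross layout
// contributes for a given platform and Scala version.
// Assumes shared/ + <platform>/ folders (like sbt-crossproject's full layout)
// and version-suffixed dirs such as scala-2 and scala-2.13.
object SourceLayout {
  // Suffixes from most general to most specific: "", "-<major>", "-<binary>"
  def versionSuffixes(scalaVersion: String): List[String] = {
    val parts = scalaVersion.split('.').toList
    val major = parts.head
    // Scala 3 has a single binary version ("3"); Scala 2 uses "2.x"
    val binary = if (major == "3") "3" else parts.take(2).mkString(".")
    List("", s"-$major", s"-$binary").distinct
  }

  // Every (base folder, suffix) combination yields one source directory
  def sourceDirs(platform: String, scalaVersion: String): List[String] =
    for {
      base   <- List("shared", platform)
      suffix <- versionSuffixes(scalaVersion)
    } yield s"$base/src/main/scala$suffix"
}
```

With this layout, a file under shared/src/main/scala-2.13 would be compiled into both the JVM and the JS builds on Scala 2.13, but not into any Scala 3 build.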
  5. As far as I can see, the structure of the YAML build file is not documented anywhere, so one can only learn what is supported, and its syntax, by reading the model in the source code. Having great documentation on what can be done in the build file and how to do it will help derisk the decision to experiment with Bleep.

It is: there is a JSON schema here. See the current level of tooling support in IntelliJ here:

Kapture.2023-06-06.at.00.14.02.mp4
  6. As a Scala developer, I want to generally ignore differences between Scala compiler versions, and have the build tool manage such differences. This means, for example, I might want to turn on a warning about ignoring non-Unit values, and then have the build tool understand which versions of the Scala compiler this option is supported in.

There are some humble beginnings for this in the strict flag, which adds all options from sbt-tpolecat except fatal warnings.

Kapture.2023-06-06.at.01.19.52.mp4
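As an illustration of the kind of mapping asked for in point 6, here is a hypothetical sketch that resolves one logical option, "warn when a non-Unit value is discarded", to the concrete scalac flag for a given compiler version. The flag names and version boundaries follow sbt-tpolecat's table as far as we understand it (-Ywarn-value-discard before 2.13, -Wvalue-discard on 2.13 and on Scala 3.3+), so verify them before relying on this; ScalaVersion and WarnValueDiscard are names invented for the example.

```scala
// Hypothetical sketch: one logical option resolved to a version-specific flag.
// Version ranges are an assumption modelled on sbt-tpolecat, not Bleep's code.
final case class ScalaVersion(major: Int, minor: Int, patch: Int) {
  // Lexicographic comparison on (major, minor, patch)
  def >=(that: ScalaVersion): Boolean =
    Ordering[(Int, Int, Int)].gteq((major, minor, patch), (that.major, that.minor, that.patch))
}

object ScalaVersion {
  // Only plain "x.y.z" versions are handled; suffixes like "-RC1" are rejected
  def parse(s: String): ScalaVersion = s.split('.').map(_.toInt) match {
    case Array(a, b, c) => ScalaVersion(a, b, c)
    case _              => throw new IllegalArgumentException(s"unexpected version: $s")
  }
}

object WarnValueDiscard {
  private val v2_11 = ScalaVersion(2, 11, 0)
  private val v2_13 = ScalaVersion(2, 13, 0)
  private val v3_0  = ScalaVersion(3, 0, 0)
  private val v3_3  = ScalaVersion(3, 3, 0)

  /** The flag to pass, or None if this compiler version has no equivalent. */
  def flagFor(v: ScalaVersion): Option[String] =
    if (v >= v3_3) Some("-Wvalue-discard")
    else if (v >= v3_0) None // assumed: no equivalent in Scala 3.0 to 3.2
    else if (v >= v2_13) Some("-Wvalue-discard")
    else if (v >= v2_11) Some("-Ywarn-value-discard")
    else None
}
```

The point of returning an Option is that the build can silently drop an option a given compiler does not understand, instead of failing the whole cross build.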
  7. As a Scala developer, I want both first-class support for targeting multiple JVM versions (including with overlapping sources, similar to Scala's multi-platform support), as well as abstraction over JVM options (similar to supporting multiple Scala compilers). Some JVM options are JVM-specific. In addition, I may want to tweak JVM settings for tests or for main. Solving these common problems directly in the build spec would help me stay focused on my app and not the build tool.

I don't think I want to encourage projects spanning more than one JVM; it's a fairly niche use case. I already consider it a great improvement that you can specify one JVM, so builds are more stable across machines. Specifying flags for running code on top of that JVM should obviously be supported; this is tracked in oyvindberg/bleep#319

  8. Having a great batteries-included set of features is very important for bootstrapping a new build tool. I think you already support publish, which is wonderful, and I would look at snapshot publishing, code generation, and other features common in projects like ZIO. In reality, there are only 4-6 things you need to ship with in order to cover the majority of the OSS ecosystem (and a good chunk of private code bases, as well).

Absolutely. I think everything scala-cli supports (which is a lot!) should be possible to add without too much work. Sourcegen is one of the things I have invested time in, though documentation is still lacking. Here is a demo:

Kapture.2023-06-06.at.00.47.54.mp4
  9. Would you consider using TOML instead of YAML? YAML is very complex and, in addition to being hostile to copy/paste, supports cyclic graphs, complex keys, tags, embedded JSON, and other features. In addition, YAML makes it easy for developers to use it as a serialization format for their own data structures, which makes life easier for them, but introduces coupling between an internal model and the user-exposed model. TOML's simplicity forces a focus on the user experience (including flatter data models).

It hurts if you push that button? Then don't push it :)

I'm not dead set on this, but so far I have made the pragmatic choice here. I don't particularly care that YAML is a terrible format. I think it is more important that everybody already knows how to write it. And if it's good enough for GitHub and Kubernetes, it's good enough for us. As long as there is a JSON schema (or similar), indentation is no longer a problem, and it's super comfortable to work with.

  10. When looking for simple architectures that are less costly to maintain and have fewer points of failure, a build tool is the ideal place to incorporate other systems that need access to build information, including an LSP server or package manager. Let's ignore the question of whether that would ever happen or who would do it, but would you potentially be open to Bleep incorporating elements of package management and LSP, if done properly and up to your standards and specifications?

Not sure if I get exactly what you have in mind. I'm open to LSP; hopefully it can provide editor assistance for editing Bleep's own format. Maybe there is a lot of ground here I haven't thought about at all.

As for package management, it already downloads and installs things like JVMs, node, scalafmt, sbt. You should be able to clone a git repo, import an sbt build, compile, test, reformat and run it without anything preinstalled on your system.

@ghostbuster91

Regarding YAML vs TOML vs HOCON vs anything else. Could the frontend part be abstracted away so that it would be possible to write the build definition using other formats that describe the data? YAML can be still the default one, but people could write their definitions in whatever data serialization language they want.

In my mind a little duplication here is benign. This is not logic which can fall out of sync; these are just coordinates, which are better off completely typed out.

Yes, but they can be mistyped.

Updating versions should be done by tooling

Sure, but if it is possible to do it by hand, then we will have to review it every time as if it had been done by hand. Extracting common coordinates helps avoid such silly mistakes.

But I am not here to argue about YAML vs HOCON as this is a fruitless discussion. I would say that it is the choice that matters.

@oyvindberg

@nafg

Also, what about publishing both a regular JVM jar and a graal native image executable, how does that work?

It's easy: you point NativeImagePlugin to a GraalVM installation, which bleep/coursier can download and install for you. Basically this works the same as it already does in sbt (because it's the same code).

@oyvindberg

oyvindberg commented Jun 7, 2023

A number of OSS projects build in CI for multiple JVMs, because sometimes things can work on one JVM and not on another. You want to publish, and therefore test, on the lowest JVM you support, but IIRC there have been rare cases where testing on newer JVMs was also necessary. And, as @jdegoes said, sometimes you even want separate sources, although that is pretty rare -- usually you aren't publishing for multiple JVMs, but such projects do exist.

@oyvindberg The main use case for cross-JDK support is Loom: although many past changes in the JVM have been small, this one is not small; it's gigantic, and will motivate OSS libraries and companies alike to support Loom partially, in some projects, even before they are ready to go all in. But Loom is by no means the only application here: right now, if you want to use a modern HTTP client in the JVM, you have to depend on a later JDK; to avoid this, however, some libraries add a 3rd party client like Netty, which then encourages shading or other techniques to work around the downstream conflicts that inevitably result. Ultimately, if there were a simple way to publish for different JDKs, just like we have the ability to publish for different Scala versions, then libraries could reasonably support Loom, modern HTTP clients, and other features without imposing undue risk on all of their users.

I'll answer this multi-JVM in one go.

One thing Bleep makes easy is publishing for different JVMs. You write a publish script (the core functionality is highly likely to be integrated into core, but for now you write scripts).

In that publish script you can mount the build using different JVMs, compile it, test it and so on (at least according to the model it's simple; if you try right now you'll find the build rewrite APIs a bit limiting). And you can provide a function that maps some subset of your projects to Maven coordinates before publishing. It's really liberating to just write code for this. Plus you can of course write tests for the publishing code itself, should you need to.
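A rough sketch of the "map a subset of your projects to Maven coordinates" step, assuming a publish script that runs once per target JVM. Everything here is hypothetical: Project, Coordinates and PublishPlan are not Bleep APIs, and the -jdk<N> artifact suffix is an invented naming scheme used only to show that the mapping is plain code.

```scala
// Hypothetical sketch (not Bleep's API): pick publishable projects and
// derive one set of Maven coordinates per target JVM release.
final case class Project(name: String, publish: Boolean)

final case class Coordinates(groupId: String, artifactId: String, version: String) {
  override def toString: String = s"$groupId:$artifactId:$version"
}

object PublishPlan {
  def coordinates(
      projects: List[Project],
      jvmReleases: List[Int],
      groupId: String,
      version: String
  ): List[Coordinates] =
    for {
      release <- jvmReleases          // one pass per JVM the build is mounted with
      p       <- projects if p.publish // skip benchmarks, examples, etc.
    } yield Coordinates(groupId, s"${p.name}-jdk$release", version)
}
```

Because the mapping is an ordinary function, it can be unit tested like any other code, which is the point made above about testing the publishing code itself.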

Then there is the other case, where you have one build with projects using different versions of the JVM, with cross-JVM dependencies among themselves. This would be technically possible to support (while duplicating some code from bloop), but I won't.
Possibly Loom will change my mind down the road, but for now I see a significant increase of complexity in the build tool for a (currently) extreme niche case.

Also, won't a Loom-enabled JVM support --release?

The philosophy of Bleep is to provide an extremely simple and fast tool, where if you want complex things (sourcegen, apply a build rewrite before you mount your build in an IDE) you're typically able to do it, but you will pay some performance tax for it by having to start JVMs every now and then.

A liberating thought I've had while developing Bleep is that sbt (and similar) already do exist. It's OK if Bleep supports a subset of all possible builds, if that means that those can be expressed in a simple and fast manner.

@oyvindberg

oyvindberg commented Jun 7, 2023

@ghostbuster91

Sure, but if it is possible to do it by hand, then we will have to review it every time as if it had been done by hand. Extracting common coordinates helps avoid such silly mistakes.

It doesn't matter. We've got to consider the cost of failure here. It can be low and it can be high. The cost of a typo in a dependency that you had duplicated is the same as a typo in a dependency that you had deduplicated. And it's incredibly low, it means your build will fail immediately (bleep will watch the build and auto-reload if you're using metals), or worst case the moment it meets CI.

The answer to this kind of cost of failure is not to introduce some complex language where you can refer to values by reference, do string interpolation and these sort of things.

Just try this in any sbt build that uses the pattern of putting dependencies in a separate Scala file. Now inline them all, and see how much clearer the diff becomes! The only reason people do this is that updating (groups of) dependencies has been a manual maneuver in sbt all along. It's not necessary, and it's not useful.

All of this is obviously my take on it, but I hope and think people will agree - especially after trying it out for a while.

Now with that out of the way, there is another use case here. In maven the equivalent thing is called maven properties (the same mechanism where you refer to values), and you use it to deduplicate version numbers (among other things). It also backs the BOM mechanism, where people share versions across orgs. I also personally think this is a waste of time, but I think an argument in that direction is stronger.
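For readers who haven't used Maven properties, the mechanism boils down to substituting ${name} placeholders with declared values. A minimal sketch of the idea (not Maven's implementation; nested and recursive properties are ignored, and MavenStyleProperties is a made-up name) also shows the cost-of-failure argument from above: an undefined or misspelled property fails immediately, just like a typo in an inlined version.

```scala
import scala.util.matching.Regex

// Minimal sketch of Maven-style property substitution.
object MavenStyleProperties {
  // Matches ${some.name} and captures "some.name"
  private val Placeholder: Regex = """\$\{([^}]+)\}""".r

  def interpolate(template: String, props: Map[String, String]): String =
    Placeholder.replaceAllIn(template, m =>
      Regex.quoteReplacement(
        props.getOrElse(m.group(1), throw new NoSuchElementException(s"undefined property: ${m.group(1)}"))
      )
    )
}
```

For example, interpolating "dev.zio:zio_3:${zio.version}" with zio.version set to 2.0.15 gives "dev.zio:zio_3:2.0.15".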

@oyvindberg

@ghostbuster91

Could the frontend part be abstracted away so that it would be possible to write the build definition using other formats that describe the data

There have been a bunch of discussions around this, some of it online in oyvindberg/bleep#15.

My current idea is that "One way to do it" should be applied. If you look up something on SO, you should get a copy/pasteable answer (modulo yaml indentation, heh)

@oyvindberg

oyvindberg commented Jun 7, 2023

@jdegoes

Essentially, a build tool can provide its own implementation of LSP, thus negating the need for a separate component (like Metals), which uses BSP to communicate with a build server.

That does sound sweet in theory. I don't personally have the resources to look into this, but would be enthusiastic if somebody else did :)

What I imagine is along the lines of bleep publish that publishes on a Scala-specific repository (like crates.io or npm), which allows anyone to publish regardless of whether or not they have a Sonatype (or other) repo. We are far from that, but I am curious if you would be open to the idea if other people did the work and it were up to high standards.

Absolutely, but there is no reason the development of such a thing would be intertwined with Bleep. Given that somebody developed it (which would be huge, because getting access to Sonatype is incredibly hard compared to, say, npm), you could use the same core code to publish from any Scala build tool. For Bleep in particular, you would add a client implementation for such a service as a normal Maven dependency, and write a publish script which packages and uploads your jars.
