cb372/io-and-tf.md

Last active June 5, 2023 16:16

IO and tagless final

TL;DR

We should use a type parameter with a context bound (e.g. F[_]: Sync) in library code so users can choose their IO monad, but we should use a concrete IO monad in application code.

Abstracting over IO

If you're writing a library that makes use of effects, it makes sense to use the cats-effect type classes so users can choose their IO monad (IO, ZIO, Monix Task, etc).

So instead of

def myLibraryApiFunction(x: Int): IO[String] = {
  val foo = IO(???)
  // ...
}

you would write

def myLibraryApiFunction[F[_]: Sync](x: Int): F[String] = {
  val foo = Sync[F].delay(???)
  // ...
}

and a user of your library would instantiate F[_] to be their chosen concrete IO monad.

(Instead of Sync, you might use F[_]: ConcurrentEffect or whatever, depending on the capabilities you require.)
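As a rough sketch of both sides of that boundary, assuming cats-effect is on the classpath: the `println` side effect and the argument `42` are made up for illustration and stand in for the `???` above.

```scala
import cats.effect.{IO, Sync}

object LibraryApi {
  // Library side: the only requirement on F is that it can suspend side effects.
  def myLibraryApiFunction[F[_]: Sync](x: Int): F[String] =
    Sync[F].delay {
      // Hypothetical side effect, standing in for the `???` in the snippet above.
      println(s"formatting $x")
      x.toString
    }
}

object Application {
  // Application side: the user fixes F to the IO monad of their choice.
  val program: IO[String] = LibraryApi.myLibraryApiFunction[IO](42)
}
```

Nothing runs until `program` is executed at the edge of the application; a user of ZIO or Monix Task would supply their own type in place of `IO`, provided a `Sync` instance exists for it.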

But in application code, usually you know exactly which IO monad you're using. In such situations, I would argue that abstracting over it and writing [F[_]: Sync] everywhere is just theatre. It's a false abstraction.

There's a qualitative difference between Cats type classes such as Functor and Monad, and the cats-effect type classes. You can write code using Monad without thinking about whether you are dealing with a List, an Option, an Either or one of dozens more data types with Monad instances. But as soon as you need Sync (or anything more powerful), you know your F[_] is going to be IO (or whatever IO monad implementation you've chosen for your application). So you may as well make it concrete and make your life easier.

All those calls to Sync[F].delay() introduce a lot of noise, compared with wrapping things in IO(...), and I don't believe the noise is justified by the benefits of abstraction.

(Aside: the context-applied plugin can reduce the noise a bit.)

To be clear, I'm not arguing against all abstraction in application code. Writing

def myApplicationFunction[F[_]: Monad](input: F[String])

is a reasonable thing to do, as:

- it forces your code to be as abstract as it can be, since it can't make any assumptions about its input other than the information provided by Monad
- it acts as documentation, showing users of the function exactly which features of F the function relies on
- you can test the function using a simple monad such as Id or Option, even though your production F might be IO or some complex transformer stack

Testing

Using an F[_]: Monad context bound means you are free to use a nice simple monad in your tests, making the tests easier to construct and more readable.
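A minimal sketch of what such a test could look like, assuming cats-core is on the classpath. The function `shout` is hypothetical and only exists to have something to test.

```scala
import cats.{Id, Monad}
import cats.implicits._

object MonadExample {
  // Hypothetical function: it can only use what Monad provides.
  def shout[F[_]: Monad](input: F[String]): F[String] =
    input.map(_.toUpperCase).flatMap(s => Monad[F].pure(s + "!"))
}

object MonadExampleCheck {
  import MonadExample.shout

  def run(): Unit = {
    // Id adds no effect at all, so the result is a plain String.
    assert(shout[Id]("hello") == "HELLO!")
    // Option exercises both the "value present" and "value absent" paths.
    assert(shout(Option("hello")) == Some("HELLO!"))
    assert(shout(Option.empty[String]) == None)
  }
}
```

No IO, no runtime and no `unsafeRunSync()` are involved; the same `shout` can still be called with `IO[String]` in production.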

This is not true if you use F[_]: Sync. In that case you'll have to use IO in your tests, and call unsafeRunSync().

A lot of people seem to have an aversion to using IO in tests, but I'm not sure why.

Tagless final style

Even if you make IO concrete, it's still fine to give your dependencies a type parameter and pass them implicitly, just like you would normally:

def myApplicationFunction(x: Int)(implicit log: Logging[IO], db: Database[IO]): IO[String]
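To make this concrete, here is one possible shape for those dependencies, assuming cats-effect is on the classpath. The method names `info` and `lookupName` and the in-memory data are invented for this sketch; the original only names the `Logging` and `Database` types.

```scala
import cats.effect.IO

// Hypothetical algebras: still parameterised on F, even though the app uses IO.
trait Logging[F[_]] { def info(msg: String): F[Unit] }
trait Database[F[_]] { def lookupName(id: Int): F[String] }

object App {
  // Application code: IO is concrete, dependencies are passed implicitly.
  def myApplicationFunction(x: Int)(implicit log: Logging[IO], db: Database[IO]): IO[String] =
    for {
      _    <- log.info(s"looking up user $x")
      name <- db.lookupName(x)
    } yield name

  // Wiring: concrete interpreters, chosen once.
  implicit val consoleLogging: Logging[IO] = new Logging[IO] {
    def info(msg: String): IO[Unit] = IO(println(msg))
  }

  implicit val inMemoryDatabase: Database[IO] = new Database[IO] {
    private val users = Map(1 -> "alice")
    def lookupName(id: Int): IO[String] = IO(users.getOrElse(id, "unknown"))
  }

  val program: IO[String] = myApplicationFunction(1)
}
```

In a test, the two implicit instances could be replaced with stubs that record calls or return canned values, without touching `myApplicationFunction`.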

Exceptions and counter-arguments

Here are a few attempts to argue against myself.

Avoiding lock-in

"I don't want my application to be locked in to cats-effect IO, as I might want to switch to Monix or ZIO later."

I'm not sure anyone actually thinks this, and I've never really encountered this kind of situation myself. It seems like premature abstraction.

Extracting a library

"I might want to extract part of my application into a library later."

That's a fair point, and there's probably something to be said for blurring the line between what's an application and what's a library. Maybe it makes sense to abstract over IO in the most generic, "library-like" parts of your application but use concrete IO towards the edges.

Monad transformers

"Even though I'm using IO, my production F is actually Kleisli[IO, Something, ?] (or some other transformer)"

In that case, it definitely makes sense to hide away the monad transformer ugliness by using an abstract F[_]: Sync.

For Kleisli in particular, it's worth pointing out that you can often achieve the same thing using a Ref, avoiding the need for a transformer.
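The two approaches might compare roughly as follows. This sketch assumes Cats Effect 3 (where `Ref` lives in `cats.effect`) plus cats-core for `Kleisli`; the `Config` type and the greeting logic are hypothetical.

```scala
import cats.data.Kleisli
import cats.effect.{IO, Ref}

object RefInsteadOfKleisli {
  // Hypothetical environment type.
  final case class Config(greeting: String)

  // Transformer style: the environment is threaded through Kleisli.
  def greetK(name: String): Kleisli[IO, Config, String] =
    Kleisli.ask[IO, Config].map(cfg => s"${cfg.greeting}, $name")

  // Ref style: the environment lives in a Ref and the result is plain IO.
  def greetRef(name: String)(env: Ref[IO, Config]): IO[String] =
    env.get.map(cfg => s"${cfg.greeting}, $name")

  // Kleisli needs the environment supplied when it is run.
  val viaKleisli: IO[String] = greetK("world").run(Config("Hello"))

  // The Ref is created once and passed to whoever needs it.
  val viaRef: IO[String] =
    Ref.of[IO, Config](Config("Hello")).flatMap(env => greetRef("world")(env))
}
```

One difference to keep in mind: Kleisli gives read-only access to the environment, whereas anyone holding the Ref can also update it.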

Feedback

I'm not sure whether any of the above is controversial, or whether I'm preaching to the choir. Please let me know your thoughts in the comments or on Twitter.

I'm also not entirely convinced of my own argument, as arguing against abstraction doesn't sound like me at all.

I'd be interested to hear more arguments for why using an abstract F[_] everywhere is a good idea.

przemek-pokrywka commented Feb 21, 2020

First of all, thanks for a great piece. I share your sentiment against spending effort on things that are unlikely to pay off. Most of the time one doesn't switch between various implementations of IO. Extraction of a library is more common, but not so frequent either. Given that using the pair of [F[_]: Sync] and Sync[F].delay(...) everywhere is more painful than simply going for IO(...), one can try to minimize the total amount of pain, expressed as follows (P for probability, 0.0 to 1.0 range, c for cost/pain in arbitrary units):

P(actually_needing_Sync) * (c(IO_to_Sync_rewrite) + c(IO_lib_version_bumps)) + (1 - P(actually_needing_Sync)) * c(using_Sync)

where

P(actually_needing_Sync) = max(P(library_extraction), P(IO_switch))

The c(IO_lib_version_bumps) grows with the amount of IO-specific code that gets written and with the probability of version conflicts in a larger, multi-module/multi-library project. It's something that you haven't mentioned, but I think it needs to be taken into consideration in bigger projects. In theory at least, depending just on abstract typeclasses should reduce the number of necessary library version bumps and thus also reduce the number of version conflicts (maybe not in practice yet, but who knows how the library evolution proceeds).

The c(IO_to_Sync_rewrite) is a one-time cost, but again it depends on the amount of IO-specific code. I have some experience maintaining a codebase that relies heavily on Futures, without any sort of abstraction on top of them. The number of places that rely on some Future-specific feature got really big over the years, and replacing Future with some abstraction, or even with IO/ZIO, would take serious effort. I think that following your recommendation it wouldn't be that bad, because one would ideally use some less powerful constraint, like Monad or Applicative, most of the time, but one needs to keep track of the places that use more IO features than are exposed by a (Sync) typeclass.

Summing up, I think it's up to the expected cost (P * c) of each option:

- the less probable a monad switch or library extraction is
- the more confident you feel about your ability to rewrite IO to Sync if you need to
- the fewer IO library version bumps you expect
- the more Sync annoys you on a daily basis

...the less you need to be worried about going for IO directly I believe.

On a final note, it just occurred to me that "concrete IO vs abstract Sync" is actually a specific example of a more general "technical debt vs clean code" decision. I find it interesting!

arosien commented Feb 21, 2020

@zmccoy

Can you elaborate on achieving the same thing using a Ref while avoiding using Kleisli?

Kleisli[F, E, A] gives you read-only access to a value of type E. You could, instead, have a Ref[F, E] in scope, which also provides a value of type E, and your computations could live in the simpler F effect.

pvillega commented Feb 21, 2020

On mobile, so apologies for formatting. I have a comment on this, which may be counterintuitive but is based on (probably statistically insignificant) experience: using IO everywhere seems to be worse for beginners. Please let me elaborate.

I noticed that when allowing IO, people without much experience working this way end up making bad coding choices, like calling unsafeRunSync in many places.

Limiting to F seems to have a double benefit: the syntax is the same as with Monad/Applicative/etc (and I have not found the syntax to be an issue, despite the community's fixation on it). And the limitation on what you can do helps guide them to better understand the different types and needs, without reaching for a quick unsafeRunSync just to make it compile.

Granted, Sync has issues. And if you are at a ConcurrentEffect level there is no real difference. But I think it has a place for that purpose, while they figure out core concepts. Once they have enough experience, as with everything, let them make informed calls and use IO wherever they feel it fits 😃

Another small reason: If you just show them Only IO, the unfortunate reality is that a big number of devs will settle with that and will not push for a deeper understanding. And that creates a bigger wall later on on why some things are a certain way.

Also, as a side note, we totally use IO in tests. Purity is nice. Pragmatism should rule decisions when you are coding for a business. As a hobby, go wild 😆

kailuowang commented Feb 21, 2020

True, writing F.delay(x) isn't that much different from writing IO(x). But why should you be writing either everywhere in your application? I would argue that you shouldn't.
I would divide code in most applications into three main categories:

- business logic
- integration with external systems (external to the application), e.g. DB, Logging, etc.
- for lack of a better term, computation management (e.g. parallelization, distribution)

Computation management is a beast of its own. It involves a lot more and not many applications deal with it in their own code, so we'll ignore it for now.

The F.delay is usually in the integration code, when the libraries through which it integrates are not pure. In fact, if you are lucky and all your integration libraries are pure, then you don't have to call F.delay in your codebase at all.

When that's not the case, I would argue that one should always have a clear boundary in their applications between business logic and integration code. Having business logic tangled with 3rd party integration logic makes, well, everything tangled. You should create an algebra for your integration code. An algebra for your DB access, an algebra for your Logging, etc. There are many benefits in such isolation, but in short, it's just a logical and conventional way to modularize your code.

Then in your business logic modules, there should be no need for Sync. Usually one can keep the context bound for F[_] in such modules as simple as MonadError, Monad, Applicative or even Functor. The principle is of course to use the least power you need.
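A small sketch of that split, assuming cats-core and cats-effect are on the classpath. `UserRepo`, `greeting` and `inMemoryRepo` are hypothetical names chosen for illustration.

```scala
import cats.{Functor, Id}
import cats.effect.Sync
import cats.implicits._

// Hypothetical algebra for one integration point.
trait UserRepo[F[_]] { def find(id: Int): F[Option[String]] }

object BusinessLogic {
  // Least power: this logic only maps over a result, so Functor is enough.
  def greeting[F[_]: Functor](id: Int)(repo: UserRepo[F]): F[String] =
    repo.find(id).map(_.fold("Hello, stranger")(name => s"Hello, $name"))
}

object Integration {
  // The only place that needs Sync: wrapping a (pretend) impure lookup.
  def inMemoryRepo[F[_]: Sync](users: Map[Int, String]): UserRepo[F] =
    new UserRepo[F] {
      def find(id: Int): F[Option[String]] = Sync[F].delay(users.get(id))
    }
}

object BusinessLogicCheck {
  // Because the logic asks for so little, a stub in Id is enough to test it.
  val stub: UserRepo[Id] = new UserRepo[Id] {
    def find(id: Int): Id[Option[String]] = if (id == 1) Some("alice") else None
  }

  val known: String   = BusinessLogic.greeting[Id](1)(stub)
  val unknown: String = BusinessLogic.greeting[Id](2)(stub)
}
```

In production, `Integration.inMemoryRepo[IO](...)` (or a real database-backed interpreter) would be passed to the same `greeting` function.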

When you write integration modules, one could certainly argue that using concrete IO directly is more convenient. But, wait a minute, now you're writing a generalized pure FP module providing integration with an external system, you are writing a, yes, library! Shouldn't you consider using F[_] instead?

SystemFw commented Feb 21, 2020

def myApplicationFunction(x: Int)(implicit log: Logging[IO], db: Database[IO]): IO[String]

Maybe I'm repeating the same points Kai is making, but basically I see the application as a series of layers, so you have your Logging[F] and Database[F] layers, but crucially myApplicationFunction is also part of a layer (for example User[F]), and when you start doing that, specialising to IO happens:

- when loading config
- in main

Moreover, I'd like to switch perspectives. For me it's not primarily about switching types (rarely happens), or transformers (useful but not crucial), or different testing types (just use IO). It's about building and composing languages that make your problem easy to express. Even proponents of concrete types like John also advise building domain specific languages that express key parts of your domain, the difference is just in approach: some people prefer these languages to be done in terms of data types, others in terms of parameterised functions (and the latter is tagless final).

Can you build similar abstractions by passing records of functions that return IO (like ZIO environment)? You can, but (imho) it's way harder to stick to the "little languages" mindset when you always have the full IO there.

So, sure, if all of your code is made of Sync[F].delay, you can just use IO, but the point of using tagless final is moving to a point where all of your code is not a bunch of delay, but an onion of User[F], Logging[F], Db[F], Caching[F] languages. At the end of the onion you find:

- libraries or library-like things (e.g. integration with DynamoDb), and for those it is good to be in F
- main/config, which are ok in IO

and so at the end of the day, you still only see IO at the very top level.

Also, I feel that often the issues of "abstracting over F" and "passing things implicitly" get conflated: if you don't like passing things implicitly or are not sure, make more classes and pass Abstraction[F] explicitly (more in ML modules style than in Haskell style).

Finally, I'd like to add a word about Stream, and how for me it's fine for that to be concrete, and how it fits in the "little languages" picture. Stream is reifying your control mechanism, so you have concrete control (Stream) of abstract actions (User[F])

Author

cb372 commented Feb 22, 2020

Thank you for the insightful responses everybody!