After looking at Canonicalize.Sort, here are my initial thoughts:

Canonicalize.Sort seems like the right place to make the reference-graph used by Dead-Code elimination
If we stored the reference-graph information in Canonical.Def, we could use that information in a lot of analyses later on (like variable assignment in TCO).
There's enough information in the current graph to do dead-code-elimination, but not enough to do unused-variable warnings that would also work for arguments and pattern matches.

I don't think the changes we would need to make would be that significant. Right now, we only look at what free variables a Definition references. If we also look at what bound variables it references (i.e. variables defined inside the reference by Lambdas or pattern matches), then we could issue the proper warnings.

This would, I think, just amount to storing a record with referenced free and bound variables, as well as all defined variables, in the State of reorder.

The tricky part is keeping an environment mapping Definitions to the Region they are defined, so that we can actually issue a warning, as well as tell the difference between same-named variables. This isn't hard, but is probably the change most likely to be disruptive.

Dealing with infinite recursive values

There are two main approaches I can think of to solve this:

We make a more complicated reference graph in Canonicalize.Sort, where we keep track of whether a reference occurs across the border of a lambda. We can then remove all the edges across a lambda, and see if any variable can reach itself in 1 or more steps.

This option is more disruptive to Canonicalize.Sort. If we were doing this, I think we'd want to make a separate module for creating the reference graph, and then Sort whould just deal with this a (simplified) version of this graph instead of building its own.

During Canonicalization, we keep two environments for varibles, the normal env, and a "waiting" env. When a value is defined on the LHS of a Definition, it is only added to the "waiting" environment for that definition.

Then, during the traversal, every time we hit a Lambda, we move all variables from the "waiting" environment to the normal environment.

This would result in illegal recursive values being a sort of "Variable not found" error, though we could issue a special error message by checking of the not-found variable is in waiting.

This one wouldn't affect Canonicalize.Sort at all, but would require changes to Canonicalize.Env, and the traversal functions in Canonicalize.

Option 2 seems more elegant to me.

EDIT: Environments aren't enough for Recursion

Option 2 described above won't work.

Consider List.intersperse:

intersperse : a -> List a -> List a
intersperse sep xs =
    case xs of
      [] -> []
      hd :: tl ->
          let step x rest = sep :: x :: rest
              spersed = foldr step [] tl
          in
              hd :: spersed

With environments, we have two choices:

Don't add step or spersed to the env for the RHS of defs. This causes the above (valid) code to not compile.
For each definition, only the variables in that definition are removed from its environment.

This allows the above code to compile, but also allows the following flawed code to compile:

          let step x rest = sep :: x :: spersed
              spersed = foldr step [] tl
          in
          ...

Both of these seem like bad ideas, which makes me thing we should just do the proper graph algorithm with references (AKA option 1 above).

JoeyEremondi/Sort-and-DCE.md

Dealing with infinite recursive values

EDIT: Environments aren't enough for Recursion