Native Haskell Type Encoding for LiquidHaskell

Abstract

LiquidHaskell is a correctness-checking system that brings refinement types to Haskell. The current implementation parses type signatures and other specifications from special comments in Haskell source code. The majority of user issues - crashes and unclear error messages - trace back to this portion of the codebase, hampering widespread adoption. I propose to replace this with a system that encodes refinement types in Haskell's type system, achieving code reuse and deeper integration with GHC.

Questions

What is the goal of the project you propose to do?

Background: LiquidHaskell

LiquidHaskell is an existing project that brings refinement types to Haskell. A developer can specify preconditions and postconditions in the form of type signatures. These signatures are translated to constraints that can be verified in an automated fashion by an SMT solver.

For example, LiquidHaskell will alert the user that the implementations of the following functions do not meet their specifications:

{-@ addOne :: x:Int -> {v:Int | v == x + 1 } @-}
addOne :: Int -> Int
addOne n = n + 2

{-@ add :: x:Int -> y:Int -> {v:Int | v == x + y} @-}
add :: Int -> Int -> Int
add a b = a - b

More information on LiquidHaskell can be found in the

Proposal: Native Haskell Encoding for Refinement Types

Currently, LiquidHaskell specifications are written within special block comments ({-@ @-}), similar to GHC pragmas. These are parsed out by the verifier and combined with information extracted from the GHC API.

I propose to replace this implementation with an encoding of LiquidHaskell types in Haskell’s native type system, using identity type synonyms under the hood. For example,

{-@ foo :: x:Int -> {v:Int | v > x } @-}
foo :: Int -> Int
foo x + x + 1

would translate to

foo :: x ::: Int -> (v ::: Int || v > x)
foo x = x + 1

(The exact operators used—eg. ::: and ||—are not yet set in stone.)

At a high level, one long-term goal of the LiquidHaskell project is closer integration with GHC, to expand the deployment of refinement types / dependent typing in a “real-world” language. The proposed work is a solid step toward this goal, moving from ad hoc parsing of specifications into deeper integration with Haskell. It would also establish infrastructure for the integration of future extensions to Haskell (eg. Dependent Haskell-style binders) with LiquidHaskell.

On a practical level, this work will have immediate short-term benefits. The new system will eliminate major pain points in the way of LiquidHaskell adoption:

Currently, LiquidHaskell must reimplement parsing, name resolution, substitution, unification, and so on for its own type representation, duplicating functionality already present in GHC. This constitutes a large and complex portion of the current codebase
Bugs and edge-cases in this reimplementation lead to odd crashes and cryptic error messages, the primary source of frustration for a new LiquidHaskell user.
Several types of names in LiquidHaskell currently lack namespacing (eg. refinement type synonyms), as they live on a different "level" from the rest of the Haskell code. This can lead to interoperability issues between codebases.
For the same reason, LiquidHaskell signatures are currently invisible to existing tools (GHCi, Haddock, etc).
In cases where inferred types are more polymorphic than the accompanying LiquidHaskell signatures, a signature might have to be specified twice: once for GHC with a more specific type, and then a second time for the LiquidHaskell specification. Additional errors can result when the Haskell and LiquidHaskell types are out of sync.

By addressing these issues, I hope to enable greater adoption of LiquidHaskell in the wider Haskell community.

Can you give some more detailed design of what precisely you intend to achieve?

For reference, here is a minimal prototype of the encoding:

{-# LANGUAGE TypeOperators #-}

module Foo where

import GHC.TypeLits

-- | User-level code

-- original signature
{-@ foo :: x:Int -> {v:Int | v > x } @-}

-- translation
foo :: x ::: Int -> (v ::: Int || v > x)
foo x = x + 1

-- original signature
{-@ bar :: xs:[a] -> {v:Int | v == length xs} @-}

-- translation
bar :: xs ::: [a] -> (v ::: Int || v == length xs)
bar xs = length xs

-- | LiquidHaskell internal definitions (wrapped in an importable module)

type x ::: t = t
type t || p = t
type x > y = Bool
type x == y = Bool

infixr 7 :::
infixr 6 ||

Operators and other constructions in LiquidHaskell type signatures will be implemented as type synonym operators that are "erased" at the Haskell level.

For example, xs:[a], which "binds" the name xs to a value of type [a], translates to xs ::: [a]. In Haskell, given the definition of the type synonym :::, this is equivalent to [a]. GHC's type representation retains this type synonym information, which can be extracted through the GHC API and processed by LiquidHaskell into its internal refined type representation. Before this processing stage, GHC code for parsing, name resolution, alias expansion, error reporting, and so forth can be reused. This will significantly reduce complexity in the LiquidHaskell codebase

In the above prototype, length is a LiquidHaskell "measure" (logic-level function) corresponding to the length of an input list. In the translation to the native Haskell type encoding, length appears in signatures as an arbitrary type variable. After my proposed work, LiquidHaskell will identify it as a reference to a measure and look up its name in the scope of the module being processed, mapping it to the corresponding measure function.

LiquidHaskell includes support for "predicate" and "expression" aliases. A "predicate" is a boolean test appearing after | in a LiquidHaskell type, such as v == x + 1. Predicates contain expressions, such as x + 1 or v. Currently, aliases can be defined for each of these, such as:

{-@ expression Plus X Y = X + Y @-}
{-@ predicate And X Y = X && Y @-}

Under the native Haskell encoding, these can be translated as normal type synonyms, unifying the two concepts:

type Plus x y = x + y
type And x y = x && y

Refinement type synonyms will be translated similarly. In their native Haskell form, these will all gain proper namespacing through the module system (currently, all LiquidHaskell synonyms/aliases are global). Other features of the LiquidHaskell type system will be translated in a similar fashion.

This can all be implemented in stages:

First, I will put in place general machinery for traversing through Haskell type signatures and translating them to LiquidHaskell’s internal representation. This translated representation will be stored either in GHC's built-in annotation system or in an internal lookup table mechanism, depending on which is best suited for the task. (I will prefer to reuse existing GHC mechanisms when possible.)
Translation of basic type system features, such as binders (xs:[a]) and predicates on types ({y:Int | x > y}), will then be implemented.
Translation will be applied first to function signatures, then to type synonyms, and finally to data declarations (including records, etc).
After establishing base test cases, I will begin replacing LiquidHaskell implementations of unification, etc, with the corresponding GHC mechanisms, and will connect the translated types into LiquidHaskell's main pipeline.
I will then translate LiquidHaskell's Prelude specifications into native Haskell modules and verify that this works correctly, implementing translation of additional type system features as necessary.
Finally, I will port over the LiquidHaskell test suite, implementing translation of the remaining type system features.
Additional work would include the updating of existing documentation to reflect the new type encoding.

What deliverables do you think are reasonable targets? Can you outline an approximate schedule of milestones?

April 28-May 25: During the “community bonding period”, I will be working with others on the LiquidHaskell project to nail down the details of the new type encoding. This will result in a concrete specification that I can implement in the proposed work. Because I’m already involved in the LiquidHaskell project and up to speed with the codebase, I expect to be able to start early on my work.
May 25-June 26: First coding period, in which I will develop a partial implementation of the new encoding, covering the core elements of LiquidHaskell’s type system (as outlined above). By the end I hope to have adapted at least part of LiquidHaskell’s Prelude to the new system.
June 27-July 27: I will continue to integrate the new system into LiquidHaskell, migrating more of Prelude and portions of the existing test suite (and adding additional test cases). I will implement more advanced features of the type system (abstract refinements, bounds, termination) as I progress in this.
July 28-August 16: During this period I will adapt at least one benchmark (bytestring) to the new encoding. I will also wrap up any type system features that I have not translated by this point, and will continue to port over remaining test cases.
August 17-August 21: “Pencils down”/wrap-up period, during which I will clean up code, strip out remaining elements of the old system, prepare patches that have not yet been mainlined, and work on updating documentation to reflect the new type encoding.

In what ways will this project benefit the wider Haskell community?

I see LiquidHaskell as a key component of the future Haskell ecosystem. It greatly expands the range of properties that can be checked at the type level, while maintaining conciseness and automation necessary for developer ease-of-use (and thus, uptake). LiquidHaskell can be used to check "unsafe" operations such as pointer arithmetic, as demonstrated in benchmarks on the text and bytestring libraries.

Interest in the tool is definitely there. Allusions to it appear regularly in communities such as reddit's /r/haskell, as a potential future solution to various problems that the Haskell community currently faces. At the same time, it has yet to gain traction in "real-world" Haskell projects.

I believe that one of the biggest barriers to adoption is the current comment-parsing implementation of LiquidHaskell specifications/signatures. Crashes and confusing error messages are the two most frequent user complaints, and the great majority of these trace back to issues in this subsystem. By replacing these mechanisms with proven implementations from GHC, LiquidHaskell can take a big step forward toward broader adoption in the community.

Besides the problem of errors, this work would resolve many other pain points of using LiquidHaskell, as enumerated above. This includes the lack of namespacing for LiquidHaskell synonyms/aliases, duplicate signatures to account for type inference, lack of support in existing tools, and so on. Addressing these issues will make LiquidHaskell more valuable to the wider Haskell community.

What relevant experience do you have? e.g. Have you coded anything in Haskell? Have you contributed to any other open source software? Been studying advanced courses in a related topic?

I have been working with Haskell for 3-4 years, and have been programming in general for the past 13 years. I am currently a first year CS student at the University of California, San Diego, soon to be entering my third quarter.

Since October 2014, I have been directly involved with the LiquidHaskell project and the team behind it at UCSD. My current area of work is in moving LiquidHaskell from processing a module and all of its dependencies at once each run, to processing each module individually and saving its work between modules and runs. My goal with this work is enabling integration with Cabal; along the way, I have introduced performance improvements and better name resolution, and have opened the door to parallel verification. As with the work proposed here, this is targeted toward improving LiquidHaskell's usability within the larger Haskell community.

Through this existing work, I have developed a greater familiarity with the internals of the GHC codebase, and an in-depth understanding of the LiquidHaskell codebase—especially in the main subsystem that the proposed work will touch.

In what ways do you envisage interacting with the wider Haskell community during your project? e.g. How would you seek help on something your mentor wasn't able to deal with? How will you get others interested in what you are doing?

Firstly, I am already active on the /r/haskell forum on reddit, and will continue to engage with this community over the course of the summer. I have a handle on Freenode that I have not been active on recently, but I will make myself available over IRC and email while doing this work. I also have a blog (https://spindas.dreamwidth.org/), where I can share a public roadmap, progress updates, code samples, and so on.

Since this is a significant shift in how LiquidHaskell code is written, I will be working closely with the LiquidHaskell team to coordinate these changes. At the same time, I wish to solicit feedback from the rest of the Haskell community, as the ultimate goal is to spread the usage of LiquidHaskell within it. This could be done via reddit, mailing lists, and possibly through the existing LiquidHaskell blog—articles posted there are regularly distributed through various aggregators, etc.

If I needed help on something that my mentor could not assist me with, I would first reach out to other LiquidHaskell contributors, and then to the Haskell communities on IRC or reddit.

Why do you think you would be the best person to tackle this project?

As previously mentioned, I have been and am currently working directly on the LiquidHaskell project, in another area aimed at improving usability in "real-world" projects. I have significant experience with the codebase and support from the LiquidHaskell team at UCSD. Furthermore, I have already prototyped/experimented with parts of this work on my own time, so I have a good understand of what it will take to get this up and running.

Contact information

Michael Smith

Email: (omitted)

Phone: (omitted)

GitHub: https://github.com/spinda

spinda/gsoc_2015_proposal.md

Select an option

No results found