Reimplementing a NodeJS Service in Haskell

Introduction

At DICOM Grid, we recently made the decision to use Haskell for some of our newer projects, mostly small, independent web services. This isn't the first time I've had the opportunity to use Haskell at work - I had previously used Haskell to write tools to automate some processes like generation of documentation for TypeScript code - but this is the first time we will be deploying Haskell code into production.

Over the past few months, I have been working on two Haskell services:

A reimplementation of an existing socket.io service, previously written for NodeJS using TypeScript.
A new service, which would interact with third-party components using standard data formats from the medical industry.

I will write here mostly about the first project, since it is a self-contained project which provides a good example of the power of Haskell. Moreover, the process of converting from TypeScript to Haskell was interesting in its own right. However, there are some general lessons which I have learned over the course of both projects, which I would also like to write about.

The Project

The original socket.io service had simple requirements:

Receive requests from the browser to subscribe to one or more topics.
Check that the user has the correct permissions to access those topics.
Listen for events on that topic from a Redis PubSub queue.
Send messages to the client as they arrive.

The original version was implemented in a single .ts file:

Interfaces were used to define messages which would be sent to/from the client.
Enum types were used to enumerate the possible message types.
The application was written in callback-passing style - actions like checking permissions and subscribing to channels all involved composing callbacks.
To avoid making one Redis connection per client, it was necessary to use psubscribe and to manage the relationship between topics and client connections in a ConnectionManager class.

The service worked well most of the time, but would fail intermittently for an unknown reason. It is important to note that I am not an expert when it comes to architecting or deploying Node services, so it is quite likely that the issue was due to my own inexperience. However, I am more interested in adding features than I am in debugging heisenbugs, and given that we had already made the decision to test out Haskell, I had a good candidate for a service to reimplement.

Getting Support for Haskell

Getting support from other developers on the team turned out not to be very difficult. We work in a variety of modern languages already, so members of the team tend to be curious when there is a new tool available. Also, we generally tend to work with one person taking the lead on any given project, with other team members helping out where appropriate, so it was not difficult to start a new project. Two other developers were interested in working on the project, and knew enough Haskell to work on specific subproblems while I fleshed out the main architecture (I am also interested in the approach of using domain-specific languages to define the separation of responsibility between more and less experienced developers).

My first Haskell project was a tool which I needed for my own work - a documentation generator - not a user-facing feature, but definitely visible internally. Also, I had already made something of a case for strong types and a functional approach by rewriting a moderately-sized JavaScript client component in functional TypeScript and PureScript, so the team was already aware of some of the possible benefits. On the development team, we work remotely, but meet at least a couple of times a year to share ideas and discuss important topics. At the last developer meeting, I had the opportunity to give a presentation about Haskell, which was well-received and generated enough interest that we decided to reimplement the existing socket.io service in Haskell.

I don't know to what extent we will end up using Haskell at DICOM Grid - we believe in using the right tool for the right job, and for each case that may or may not be Haskell (of course, I have certain biases in this area). However, I think that Haskell has an excellent place in a "microservices" architecture, replacing individual service components where appropriate.

General Notes

The first thing that becomes obvious when reimplementing a project in Haskell is the massive benefit of using a language with an expressive type system. Even seemingly simple features like sum types, or the ability to newtype strings, provide huge gains.
The TypeScript implementation of the socket.io server used a product-as-sum encoding, which resulted in poor error messages when a client sent an incorrect request. One of the first things I decided to do during the Haskell port was to restructure the type of client requests to use a simple sum type. This gave the advantage that I was able to give good error messages when parsing requests, but more importantly, my data represented the domain more closely (to borrow a phrase, we want to make illegal states unrepresentable).
Strong types also became useful when I needed to perform IO (read from Redis, write to a socket, log some event, etc.). The IO monad forced me to factor my code into side-effecting and pure components (preferring the latter as much as possible), which led to a more understandable code base overall. Also, it was no longer necessary to write my code in a callback-passing style: GHC's IO manager multiplexes lightweight threads over epoll under the hood (on Linux), so code written in a plain blocking style does not tie up an OS thread per connection.
One of my favorite new examples of the benefits of strong typing is the STM monad. In particular, I was able to use STM along with transactional channels to communicate between my Redis code and the socket.io code. In the end, this meant that I didn't even need to implement the equivalent of the old ConnectionManager class, because transactional channels provided the same functionality! I simply create a new broadcast channel with newBroadcastTChan, and then duplicate the channel for each connected client using dupTChan.
The service has been running successfully without interruption for a week in our UAT environment, and will be deployed to production soon.
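
The sum-type and broadcast-channel ideas in the notes above can be sketched together in a few lines. This is only an illustration, not the code of the actual service: the names Topic, Request, Event, parseRequest, subscribeClient and publish are all hypothetical, and the wire format is assumed to have been decoded already into a message type and a list of topic names. The only library calls used are newBroadcastTChan, dupTChan, writeTChan and readTChan from the stm package.

```haskell
import Control.Concurrent.STM
  (TChan, atomically, dupTChan, newBroadcastTChan, readTChan, writeTChan)

-- All names below are hypothetical; the real service's types are not shown.

-- A newtype keeps topic names from being mixed up with other strings.
newtype Topic = Topic String deriving (Eq, Show)

-- One constructor per kind of request, so a value can never be
-- half a subscription and half an unsubscription.
data Request
  = Subscribe [Topic]
  | Unsubscribe [Topic]
  deriving (Eq, Show)

-- Parse a request from an already-decoded message type and topic list,
-- reporting a specific error for each way the input can be wrong.
parseRequest :: String -> [String] -> Either String Request
parseRequest _ [] = Left "request must name at least one topic"
parseRequest "subscribe" ts = Right (Subscribe (map Topic ts))
parseRequest "unsubscribe" ts = Right (Unsubscribe (map Topic ts))
parseRequest other _ = Left ("unknown request type: " ++ show other)

-- An event published on a topic (the payload is a plain String here).
data Event = Event Topic String deriving (Eq, Show)

-- Each client gets its own read end of the shared broadcast channel.
subscribeClient :: TChan Event -> IO (TChan Event)
subscribeClient broadcast = atomically (dupTChan broadcast)

-- Stand-in for the Redis listener: write one event for every client to read.
publish :: TChan Event -> Event -> IO ()
publish broadcast = atomically . writeTChan broadcast

-- Fan one event out to two clients and return what each of them read.
demo :: IO (Event, Event)
demo = do
  broadcast <- atomically newBroadcastTChan
  clientA <- subscribeClient broadcast
  clientB <- subscribeClient broadcast
  publish broadcast (Event (Topic "alerts") "hello")
  a <- atomically (readTChan clientA)
  b <- atomically (readTChan clientB)
  return (a, b)
```

In a real server each client would presumably read from its duplicated channel in its own thread, filtering events by the topics it subscribed to. The point of the sketch is only that the channel itself does the bookkeeping which would otherwise need a hand-written connection manager.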

On "Real World" Haskell

One of the things I found most interesting about the project was the distinction between writing the toy Haskell projects I had worked on in the past, and a "real world" Haskell project, involving a significant amount of IO. Even something relatively complicated like PureScript is essentially one large pure function with a command line user interface on top.

Until now, the only Haskell project I had worked on which interacted with the world in any real way was the tablestorage library for working with the Windows Azure Table Storage API. Certainly, the task of applying knowledge from purely functional programming to the world of IO is a challenge in itself, but building a Haskell project for production use comes with its own set of unique challenges:

How to handle real-world data?
How to handle failure gracefully?
How to provide insights into the behavior of your service at runtime?
How to design the code for consumption by other developers?
How to deploy to a production environment?

Despite these challenges, I can report that I feel much more confident in my ability to learn new Haskell libraries than in any other language. Over the course of these two projects, I have used more than 20 libraries for the first time. I put this improvement down to the expressiveness of the type language, and the ability to "follow the types" in order to learn a new set of functions. Certainly, there is a steep learning curve, but I find the benefits quickly outweigh the effort required.

Library Support

Generally, I have found library support in Haskell to be excellent. There have been a few cases where I have found existing solutions lacking in some minor way, so one always has to be ready to roll up the sleeves and submit a pull request where necessary, but my impression has been that for most everyday programming tasks, there is some library on Hackage which solves the problem elegantly.

This was probably best illustrated by the fact that I was presented with a choice of not one, but two implementations of the socket.io protocol on Hackage. In the end, I decided to use Oliver Charles' excellent socket-io library, which I was able to use out of the box.

I'll say a little bit about each of the libraries which I have come to regard as indispensable for real-world Haskell programming:

Diagnostics

These two libraries are very useful for getting insight into the behavior of a running service:

hslogger is a logging library with a simple API and multiple backends. It is possible to filter out low priority log messages if the service is healthy, or log everything if you are trying to diagnose an issue.
ekg provides a web server which serves remote monitoring data over HTTP. It is also possible to define custom counters and gauges which can be displayed in the web UI.

External Services

These libraries provide APIs for integrating with external services:

hedis provides a simple API for communicating with a Redis database.
amqp provides a simple API for reading and writing messages to/from a queue implementation supporting the AMQP protocol, such as RabbitMQ.

Other libraries such as postgresql-simple fall into this category, but I have not had a chance to use them yet.

HTTP Clients

I tried out other HTTP client libraries before deciding to use http-streams. I needed a combination of features, including support for SSL connections, chunked request and response bodies and multipart requests. I decided to use http-streams because it has a very simple, intuitive API, and because it was the only library available which supported my exact use case out of the box. That said, I also had a very pleasant experience with http-client and http-conduit.

Data Formats

Over the course of the two projects, I had plenty of data formats to deal with, both standard and custom, binary and text. These libraries provide the means to consume and produce data in a variety of data formats. There are other options available, but I found them to be a very good fit for my use cases:

I typically use parsec to parse structured textual data, including document templates and configuration files.
binary is a library for efficient binary serialization. I use it to define serialization and deserialization code for custom binary file formats.
xml is a library for working with XML. I use it in conjunction with a modified version of the text-xml-qq library for lightweight templating of large XML documents.
aeson provides the ability to work with JSON documents. It is fast and flexible with a simple API.
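
As an illustration of the parsec item above, here is a minimal sketch of a configuration file parser. The "key = value" format, and the names Setting, setting, config and parseConfig, are assumptions made for the example rather than the format used in either project.

```haskell
import Text.Parsec
  (ParseError, alphaNum, char, eof, many, many1, newline, noneOf, oneOf,
   parse, sepEndBy, (<|>))
import Text.Parsec.String (Parser)

-- Hypothetical format: one "key = value" setting per line.
type Setting = (String, String)

-- Skip spaces and tabs, but not newlines, which separate settings.
hspace :: Parser ()
hspace = () <$ many (oneOf " \t")

-- A key is a non-empty run of letters, digits and underscores.
key :: Parser String
key = many1 (alphaNum <|> char '_')

-- A value is everything up to the end of the line.
value :: Parser String
value = many (noneOf "\n")

setting :: Parser Setting
setting = do
  k <- key
  hspace
  _ <- char '='
  hspace
  v <- value
  return (k, v)

-- Settings separated (and optionally terminated) by newlines, then end of input.
config :: Parser [Setting]
config = setting `sepEndBy` newline <* eof

-- The second argument to parse is a source name used in error messages.
parseConfig :: String -> Either ParseError [Setting]
parseConfig = parse config "config"
```

Calling parseConfig on "port = 6379\nhost = localhost\n" yields a Right with the two settings in order, while malformed input such as "= oops" yields a Left whose ParseError carries the line and column of the failure.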

Testing

test-framework provides a uniform interface to several types of tests, such as HUnit test cases and QuickCheck properties, allowing them to be grouped into test groups. In addition, it provides a clean command-line user interface for running tests, which can be run using cabal test.

Web Frameworks

scotty is a web framework written in Haskell, which can be used with WAI. While it isn't necessarily the most powerful web framework available (see also Yesod, Snap, Happstack), it provides a very straightforward API for defining RESTful services, which allowed me to become productive quickly.

Template Haskell

Template Haskell seems to be one of those tools which is best used sparingly. When overused, it can result in a situation where some of the benefits of Haskell that I have mentioned (namely programming by "following the types") become less useful, since we end up programming in a custom non-Haskell domain-specific language. However, when used well, I think it can be a powerful tool.

During these projects, I found what I thought was a particularly neat application of Template Haskell: using the text-xml-qq library as a lightweight XML templating library to generate large XML documents. Correctly applied, TH can be a great tool for reducing boilerplate code while maintaining type safety, and therefore improving productivity.

Deploying Haskell

One of the most interesting hurdles during these projects was the problem of deploying Haskell code into our production environments. I would be interested to hear any ideas in this area.

My current approach is to simply build a statically-linked binary in a Cabal sandbox, and then to deploy that binary to our servers. However, this approach has some problems:

Statically-linked binaries are large and take time to transfer, making the testing cycle inefficient.
Different operating systems and versions require recompilation.

Continuous integration will probably solve these issues, but I would also be interested to try out different approaches such as Docker or NixOS. Again, suggestions are welcome.

Conclusion

Using Haskell for real-world work has, for the most part, been a thoroughly enjoyable experience. I would recommend trying to replace a small, independent, non-critical service with a Haskell implementation. If you do not currently enjoy a service-based architecture, maybe try using Haskell to implement some of your tools, or to automate some process which you perform regularly as a part of your work. If nothing else, I have found Haskell to be a great way to test new ideas in my projects.

paf31/node-haskell.md

