
@armw4
Last active May 10, 2016 06:16
Streaming http resources with node via request

Streaming Resources in node With request From Source to Destination

The Beginning

If you've ever worked with HTTP in node, what you quickly come to realize is that it's not all that straightforward. You'll be bold, at least initially. Like me, you'll go read up on the docs for the http module. You'll properly subscribe to all the right events like on('data'), and feel good about yourself. But if you're anything like me, you're bound to do this wrong at some point. You probably won't subscribe to all the right things in the right order. You might not properly buffer the response. You may just have a WTF moment in production one day that brings down your entire site (just like I did about 2 years ago). Whatever the case may be, I believe you'll ultimately just settle on using request as your go-to for HTTP. If you want a promise based variant then you might go with some wrapper library as well.

So, we know request is the hip way of doing HTTP in node. request is cool. It's got a ton of usage and it pretty much just works in my experience. One thing about request, and node in general, is that streams are first class citizens. And as we all know, streams are good, because you can pipe them. Any request issued by way of request can be used as a stream. A common use case with streams is to move data from some source A to some destination B. With request you can do just that.

NOTE: Almost all of what I'm about to cover is in the request docs. The issue for me was putting it all together in a sane way. It just wasn't clear to me how to combine the pieces and achieve a win. You'll probably skim over some minute detail, or something in the docs might straight up not register with your brain and go in one neural ear and out the other.

Here's a very simple example of piping one request to another extracted verbatim from the request docs:

request('http://google.com/doodle.png').pipe(fs.createWriteStream('doodle.png'))

Simple enough. Take this doodle thing off the internet and save it as a doodle thing on my local hard drive. Way too easy, right?

The example above probably won't be your typical use case, though. You'll probably want to pipe data from request A, to some api endpoint B on a completely different server. Let's see what that looks like:

request.get('http://google.com/img.png').pipe(request.put('http://mysite.com/img.png'))

A Note on the Content-Type and Content-Length headers

Ok cool. We're getting somewhere. Now (regarding the example above), if you read the request docs, you'll see a small note in fine print (it's really a rather important caveat):

Request can also pipe to itself. When doing so, content-type and content-length are preserved in the PUT headers.

Why is this important? Say you want to store metadata about some remote file: its size and content type. Great, just let request derive it for you, right? Well, I tried this, and it appears request derives the content type of the file based on the presence of the file name in the target url. So http://google.com/img.png would map to something like image/png. But what if you're like me? What if the source server serves the file via something like http://my.domain.com/attachments/some-long-hash/0? request will actually set the content type to something so generic it has absolutely 0 meaning in the scheme of things. But I can just override the content-type header, right? Nah, that didn't work for me either. request just said "nope".

Let's take a look at content-length now. content-length is also derived for you when piping requests. content-length, however, is one of those things that can quickly get you into trouble if you're not careful. It can lead to socket hangups like I discovered in this thread around the aws-sdk for node here. I left it to request to derive my content-length as it promised. We took this very same header on the destination server and included it in our payload to an s3 upload. Socket hangups, boom! I'm guessing that the content-length had just enough of a discrepancy (maybe just one byte) that it caused amazon to reject our request.

So how'd we get around the issues with content-length and content-type? Well...we just passed them on the query string to the destination server B, because request would not let me override at least the content-type header (I had no need to bother with content-length). We ended up not passing the length of the request to the s3 sdk, since it'll just read a stream to the end on its own if no length is specified. The type and length of the file were ultimately saved to the database as metadata along with the url of the newly uploaded file.

Handling Source Streaming Errors

It's easy to take the cheese and naively paste request.get('http://google.com/img.png').pipe(request.put('http://mysite.com/img.png')) into your code and ship it off to production. However, what happens if a streaming related error occurs? The request docs give you a hint at what to do, but are you going to pay attention and actually apply what's instructed? I, like you, read this but for some reason it didn't initially register:

To easily handle errors when streaming requests, listen to the error event before piping:

The key wording here being before piping. One thing I've learned in dealing with and unit testing streams is that you SHOULD ALWAYS SUBSCRIBE TO ERROR EVENTS BEFORE PIPING:

request
  .get('http://mysite.com/doodle.png')
  .on('error', function(err) {
    console.log(err)
  })
  .pipe(fs.createWriteStream('doodle.png'))

Ok great. So that covers streaming based errors. I'm telling you explicitly here that these matter so don't forget to include the relevant event subscription in your code.

Handling Source Response Errors

One thing that wasn't immediately obvious to me, but that is also covered in the request docs, is that streaming based errors and response based errors are different. I remember reading the docs and the snippet of code, but for some reason it just did not register. I think the issue for me is that some of the streaming docs stuff read as "helpful hints" or "tips" rather than something you should absolutely do. And yes, you should absolutely handle the response event when streaming. From the docs:

request
  .get('http://google.com/img.png')
  .on('response', function(response) {
    console.log(response.statusCode) // 200
    console.log(response.headers['content-type']) // 'image/png'
  })
  .pipe(request.put('http://mysite.com/img.png'))

Request emits a "response" event when a response is received. The response argument will be an instance of http.IncomingMessage.

The response event is crucial here. If you don't handle it, and the request fails, you're going to successfully stream a failed response to your destination endpoint. A failed response is not the same as a streaming error. Streaming errors come from .on('error'), and potential response errors must be handled via .on('response'). So if no streaming errors occur, but your response fails with, say, a 401 client error, and you don't handle the response event, the request will succeed (even though it should be rejected). You will literally stream an image or attachment whose contents are the text "401 Unauthorized", and as we all know, that ain't gonna open as an image in anybody's browser. This is not the fault of request (although in the docs it comes off as optional); this is user error. I even went so far as to issue a probe (HEAD) request because I noticed I had the potential of streaming a bad response. I thought adding an extra layer of insurance via probing would save the day, but I also noticed the ensuing request could fail too, and I'd still be in the same predicament. So I thought to myself, "some way there must be". I went right back to the docs and it finally hit me that I needed to subscribe to the response event. I was able to simplify my code a good bit after this by removing an additional request and its promise handling.

Tying it all Together

I feel like the docs still don't really show you the whole truth. We know how to handle streaming errors from the source, we know how to handle the response from the source, but how do we handle the response from the destination endpoint? The docs don't really show you this at all in the streaming examples because they're focused on making things look concise, simple, and terse. In reality, you'll have to write code that's a bit more verbose. Let's see what a comprehensive, production ready solution might look like:

export const stream = (name, size) => {
  return new Promise((resolve, reject) => {
    const download = request.get('http://google.com/img.png')

    // we go with the more verbose api here so we can actually handle the response
    const upload = request({
      method: 'POST',
      uri: 'http://mysite.com/img.png',
      headers: {
        'Authorization': 'Bearer my-base-64-encoded-token'
      },
      qs: { // OPTIONAL: pass metadata on the query string in the event that it can't be naturally derived (i.e. content-type)
        filename: name,
        type: 'image/png', // this would normally be dynamic (i.e. based on some variable)
        length: size
      },
      json: true
    }, (error, response, body) => {
      // it helps to name this parameter "error". linters like standard
      // will force you to handle it properly. so don't be too clever and name
      // it "err". go with "error" when you can
      if (error) {
        return reject(error)
      }

      // NOTE: we *must* not access the status code until *after*
      // we verify there's no error. the response will be null if an
      // error occurs
      const { statusCode } = response

      if (statusCode !== 200) {
        return reject(new Error(`upload failed with status code: ${statusCode}`))
      }

      resolve(body)
    })

    download
    .on('response', ({ statusCode }) => {
      if (statusCode !== 200) {
        // include url or additional details to assist with debugging
        reject(new Error(`download failed with status code: ${statusCode}`))
      }
    })
    .on('error', (e) => {
      // include url or additional details to assist with debugging
      reject(new Error(`download streaming error occurred: ${e.message}`))
    }).pipe(upload)
  })
}

The key points above are:

  1. we handle source streaming errors via .on('error')
  2. we handle the source response via .on('response')
  3. we use the lower level request api to handle the destination response (and yield it to client code)
  4. we make sure not to interrogate the destination response until after we handle errors (since the response object will be null in that case)
  5. we pass metadata on the query string (although you may not have to do this...it was just necessary for my case) since request internally derives headers like content-type and content-length. remember that content-length is fine, and should never have to be overridden, but it can lead to socket hang ups if you act on it verbatim at your destination endpoint (i.e. when interfacing with the node s3 sdk as we noted earlier).

The End

That about wraps it up, folks. As I said, most of what you need is in the docs; it's tying it all together into a cohesive flow that may prove a bit more difficult. I hope these additional tips and clarifications help save you some time and valuable development hours.

And that is all
