If you've ever worked with HTTP in node, you quickly come to realize that it's not all that straightforward. You'll be bold, at least initially. Like me, you'll go read up on the docs for the http module. You'll dutifully subscribe to all the right events like on('data'), and feel good about yourself. But if you're anything like me, you're bound to get this wrong at some point. You won't subscribe to all the right things in the right order. You might not properly buffer the response. You may just have a WTF moment in production one day that brings down your entire site (just like I did about two years ago). Whatever the case may be, I believe you'll ultimately just settle on using request as your go-to for HTTP. If you want a promise-based variant, you might reach for a wrapper library as well.
So, we know request is the hip way of doing HTTP in node. request is kool. It's got a ton of usage and it pretty much just works in my experience. One thing about request, and node in general, is that streams are first class citizens. And as we all know, streams are good, cuz you can pipe 'em. Any request issued by way of request can be used as a stream. A common use case with streams is to move data from some source A to some destination B. With request you can do just that.
NOTE: Most all of what I'm about to cover is inside the request docs. The issue for me was putting it all together in a sane way. It just wasn't clear to me how to put it all together and achieve a win. You'll probably skim over some minute detail, or something in the docs might straight up just not register with your brain and go in one neural ear and out the other.
Here's a very simple example of piping one request to another extracted verbatim from the request docs:
request('http://google.com/doodle.png').pipe(fs.createWriteStream('doodle.png'))
Simple enough. Take this doodle thing from off the internet and save it as a doodle thing on my local hard drive. Way too easy, right?
The example above probably won't be your typical use case, though. You'll probably want to pipe data from request A, to some api endpoint B on a completely different server. Let's see what that looks like:
request.get('http://google.com/img.png').pipe(request.put('http://mysite.com/img.png'))
Ok kool. We're getting somewhere. Now, regarding the example above: if you read the request docs, you'll see a small note in fine print (it's really a rather important caveat).
Request can also pipe to itself. When doing so, content-type and content-length are preserved in the PUT headers.
Why is this important? Say you want to store metadata about some remote file: its size and content type. Great, just let request derive it for you, right? Well, I tried this, and it appears request derives the content type of the file based on the presence of the file name in the target url. So 'http://google.com/img.png' would map to something like image/png. But what if you're like me? What if the source server serves the file via something like http://my.domain.com/attachements/some-long-hash/0? request will actually set the content type to something really generic that has absolutely 0 meaning in the scheme of things. But I can just override the content-type header, right? Nah, that didn't work for me either. request just said "nope". Let's take a look at content-length now. content-length is also derived for you when piping requests. content-length, however, is one of those things that can quickly get you into trouble if you're not careful. It can lead to socket hangups like I discovered in this thread around the aws-sdk for node here. I left it to request to derive my content-length as it promised. We took this very same header on the destination server, and included it in our payload to an s3 upload. Socket hangups, boom! I'm guessing that the content-length had just enough of a discrepancy (maybe just one byte) that it caused amazon to reject our request. So how'd we get around the issues with content-length and content-type? Well...we just passed them on the query string to the destination server B, because request would not let me override at least the content-type header (I had no need to bother with content-length).
We ended up not passing the length of the request to the s3 sdk, since it'll just read a stream to the end on its own if no length is specified. The type and length of the file were ultimately saved to the database as metadata, along with the url of the newly uploaded file.
It's easy to take the cheese and naively paste request.get('http://google.com/img.png').pipe(request.put('http://mysite.com/img.png')) into your code and ship it off to production. However, what happens if a streaming related error occurs? The request docs give you a hint at what to do, but are you going to pay attention and actually apply what's instructed? I, like you, read this but for some reason it didn't initially register:
To easily handle errors when streaming requests, listen to the error event before piping:
The key wording here being before piping. One thing I've learned in dealing with and unit testing streams is that you SHOULD ALWAYS SUBSCRIBE TO ERROR EVENTS BEFORE PIPING:
request
.get('http://mysite.com/doodle.png')
.on('error', function(err) {
console.log(err)
})
.pipe(fs.createWriteStream('doodle.png'))
Ok great. So that covers streaming based errors. I'm telling you explicitly here that these matter so don't forget to include the relevant event subscription in your code.
One thing that wasn't immediately obvious to me, but that is also covered in the request docs, is that streaming based errors and response based errors are different. I remember reading the docs and the snippet of code, but for some reason it just did not register. I think the issue for me is that some of the streaming docs stuff read as "helpful hints" or "tips" rather than something you should absolutely do. And yes, you should absolutely handle the response
event when streaming. From the docs:
request
.get('http://google.com/img.png')
.on('response', function(response) {
console.log(response.statusCode) // 200
console.log(response.headers['content-type']) // 'image/png'
})
.pipe(request.put('http://mysite.com/img.png'))
Request emits a "response" event when a response is received. The response argument will be an instance of http.IncomingMessage.
The response event is crucial here. If you don't handle it, and the request fails, you're going to successfully stream a failed response to your destination endpoint. A failed response is not the same as a streaming error. Streaming errors come from .on('error'), and potential response errors must be handled via .on('response'). So if no streaming errors occur, but your response fails with, say, a 401 client error, and you don't handle the response event, the request will succeed (even though it should be rejected). You will literally stream an image or attachment composed of the textual contents "401 Unauthorized", and as we all know, that ain't gonna open as an image in anybody's browser. This is not the fault of request (although in the docs it comes off as optional). This is a user error. I even went so far as to issue a probe (HEAD) request because I noticed I had the potential of streaming a bad response. I thought adding an extra layer of insurance via probing would save the day, but I also noticed the ensuing request could also fail, and that I'd still be in the same predicament. So I thought to myself, "some way there must be". I went right back to the docs and it finally hit me that I needed to subscribe to the response event. I was able to simplify my code a good bit after this by removing an additional request and some promise handling.
I feel like the docs still don't really show you the whole truth. We know how to handle streaming errors from the source, and we know how to handle the response from the source, but how do we handle the response from the destination endpoint? The docs don't really show you this at all in the streaming examples because they're focused on making things look concise, simple, and terse. In reality, you'll have to write code that's a bit more verbose. Let's see what a comprehensive, production ready solution might look like:
export const stream = (name, size) => {
return new Promise((resolve, reject) => {
const download = request.get('http://google.com/img.png')
// we go with the more verbose api here so we can actually handle the response
const upload = request({
method: 'POST',
uri: 'http://mysite.com/img.png',
headers: {
'Authorization': 'Bearer my-base-64-encoded-token'
},
qs: { // OPTIONAL: pass metadata on the query string in the event that it can't be naturally derived (i.e. content-type)
filename: name,
type: 'image/png', // this would normally be dynamic (i.e. based on some variable)
length: size
},
json: true
}, (error, response, body) => {
// it helps to name this parameter "error". linters like standard
// will force you to handle it properly. so don't be too clever and name
// it "err". go with error when you can
if (error) {
return reject(error)
}
// NOTE: we *must* not access the status code until *after*
// we verify there's no error. the response will be null if an
// error occurs
const { statusCode } = response
if (statusCode !== 200) {
return reject(new Error(`upload failed with status code: ${statusCode}`))
}
resolve(body)
})
download
.on('response', ({ statusCode }) => {
if (statusCode !== 200) {
// include url or additional details to assist with debugging
reject(new Error(`download failed with status code: ${statusCode}`))
}
})
.on('error', (e) => {
// include url or additional details to assist with debugging
reject(new Error(`download streaming error occurred: ${e.message}`))
}).pipe(upload)
})
}
The key points above are:
- we handle source stream streaming errors via .on('error')
- we handle the source stream response via .on('response')
- we use the lower level request api to handle the destination response (and yield it to client code)
- we make sure not to interrogate the destination response until after we handle errors (since the response object will be null in this case)
- we pass metadata on the query string (although you may not have to do this...it was just necessary for my case) since request internally derives headers like content-type and content-length. remember that content-length is fine, and should never have to be overridden, but it can lead to socket hang ups if you act on it verbatim at your destination endpoint (i.e. when interfacing with the node s3 sdk as we noted earlier).
That about wraps it up, folks. As I said, most of what you need is in the docs. It's just that tying it all together to form a cohesive flow may prove a bit more difficult. I hope these additional tips and clarifications help save you some time and valuable development hours.
And that is all.