@dzzzchhh
Last active February 19, 2025 08:56

NodeJS: Streams with (almost) real life examples

General idea

The basic idea of a stream in software engineering implies the existence of a data source and some destination entity to which the stream outputs processed data. The simple bash example below demonstrates how to get a list of all files with the .js extension in the current working directory:

ls -la | grep ".js$"

This command sequence requests a file listing from the OS and, with the help of the pipe operator |, directs its output to grep, which in turn selects all entries that match the given pattern.

Pretty cool, huh? But what if we needed to process files in the same "streamlined" manner within our JS applications? Let's see if it's possible and how we can benefit from it.

Streams in NodeJS. Examples

A bit of a disclaimer: there are hundreds of amazing articles on the Internet that describe this topic much better than this one. This one instead tries to provide a perspective on the real application of streams with minimal theoretical overhead.

The main concept behind using streams in Node applications can be boiled down to one simple principle: do not load the whole file into memory when you can process it chunk by chunk. In the following examples I'll try to demonstrate how we can use streams to work with files.
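
As a quick illustration of that difference (the file name ./big-file.txt is just a placeholder), compare loading a file at once with consuming it chunk by chunk:

const { readFile, createReadStream } = require("fs");

// loads the entire file into memory before the callback fires
readFile("./big-file.txt", (err, data) => {
  if (err) return console.error(err);
  console.log(`loaded ${data.length} bytes at once`);
});

// reads the same file chunk by chunk (64 KB per chunk by default)
createReadStream("./big-file.txt").on("data", (chunk) => {
  console.log(`received a ${chunk.length}-byte chunk`);
});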

1. Simple static HTTP server

Well, of course there are many packages that can help you serve your static files. But there are also ways to achieve the required result without any npm packages at all. Let's take a look:

const { createServer } = require("http");
const { createReadStream, readdir } = require("fs");

const PORT = 5200;
let dirFiles = [];

readdir("./", (err, files) => {
  if (err) {
    return console.error(err);
  }
  dirFiles = files;
});

function handler(req, res) {
  // send index.html if req.url is plain "/"
  if (req.url === "/") {
    return createReadStream("./index.html").pipe(res);
  }
  // slice(1) gets rid of "/" at the beginning
  if (dirFiles.includes(req.url.slice(1))) {
    return createReadStream(__dirname + req.url).pipe(res);
  }
  return res.end("");
}

function listener() {
  console.log(`Server is running on ${PORT}`);
}

createServer(handler).listen(PORT, listener);

The main inspiration behind this snippet is the idea of utilizing the default http package to serve our files while relying on createReadStream from the standard fs module to read the requested file based on the provided URL and dump the processed chunks to res, which is essentially a writable stream itself.

The dirty little hack with dirFiles and readdir is here to ensure that we serve only files that actually exist in our current directory and handle requests for non-existent files as we please.

Although you'll probably never use something like this for any serious development, it's good to understand how you can implement the same functionality as some of the most popular packages yourself, from scratch.
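
One caveat worth keeping in mind: if the read stream fails mid-request (for example, the file disappears between the readdir call and the request), pipe won't clean up res for us. A minimal sketch of a more defensive handler (the 404 response here is my own choice, not part of the snippet above):

function safeHandler(req, res) {
  const stream = createReadStream(__dirname + req.url);
  // end the response gracefully if the file can't be read
  stream.on("error", () => {
    res.statusCode = 404;
    res.end("Not found");
  });
  stream.pipe(res);
}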

2. Find out if a file contains specific text

Let's imagine that we need to process a large text file (~1 GB). As you probably already know, loading the whole file into memory is very expensive in terms of RAM, so let's think about some other approach that would allow us to process the file piece by piece and stop the processing as soon as the pattern is found.

Most text files are formatted using the \n (newline) character. For simplicity's sake we'll pretend that we need to process a large CSV-like file with lots of rows. The algorithm for such processing would include reading the file line by line and checking if the line contains a specific pattern.

Lucky for us, Node has a dedicated module for this type of operation: readline. Its description on nodejs.org states:

The readline module provides an interface for reading data from a Readable stream (such as process.stdin) one line at a time.

This is perfect for our case, since we can use any file to create a Readable stream and process it line by line. We'll search for the string "error line" in a big log file:

begin
sample data
sample data 2
sample data 3
error line
sample data 4
...
sample data 50000
end

And now, let's break down a solution to the given problem:

const { createReadStream } = require("fs");
const { createInterface } = require("readline");

async function searchForErrorInFile(fileName) {
  const stream = createReadStream(fileName);
  // create a new readline interface
  const reader = createInterface({ input: stream });
  for await (const line of reader) {
    if (line === "error line") {
      return true;
    }
    console.log(line);
  }
  return false;
}

const fileName = "./document.txt";
searchForErrorInFile(fileName);

When we execute the script, we see that as soon as our reader finds the occurrence, the processing stops:

PS C:\demo> node .\error-search.js
begin        
sample data  
sample data 2
sample data 3

The usage of for await here improves the code's readability. But it's also worth knowing that readline is implemented on top of the standard EventEmitter, so we could've used reader.on("line") with a callback instead.
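
For comparison, a rough sketch of that callback-based variant, where closing the interface stands in for the early return above:

const { createReadStream } = require("fs");
const { createInterface } = require("readline");

const reader = createInterface({ input: createReadStream("./document.txt") });

reader.on("line", (line) => {
  if (line === "error line") {
    return reader.close(); // stop emitting further "line" events
  }
  console.log(line);
});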

3. Splitting files

Let's imagine a silly case where you host a book-selling website.

Authors upload books written in a single Markdown file to a specific folder on your server. Your job is to split those books chapter by chapter so that you can sell each chapter of the book separately.

Our book:

# Chapter One
...
In his pockets Mr. Brown always had a list of tasks for today, as well as some coins and small bills.
...

# Chapter Two
...
Suddenly, the door opened. Carl and Maria were surprised, to say the least, but continued to search the bookshelf in the hope of finding the book before the police found them.
...

# Chapter Three
...
He turned off the radio and dropped the blanket on the floor. The warm California sun had already warmed up his apartment.
...

Knowing that each chapter starts with a first-level header, we can develop a strategy for determining which chapter is being processed at a given moment and appending the line to the proper chapter file. We'll use appendFileSync since the operation does not really require any async behavior.

const { createReadStream, appendFileSync } = require("fs");
const { createInterface } = require("readline");

const filePath = "book.md";

const reader = createInterface({ input: createReadStream(filePath) });

async function read() {
  // anchored matcher; note: a /g/ flag here would make test() stateful and skip matches
  const chapterMatcher = /^# [A-Z].*$/;
  let chapterCounter = 0;
  for await (const line of reader) {
    if (line.length === 0) {
      continue; // ignore the empty lines
    }
    if (chapterMatcher.test(line)) {
      chapterCounter++; // increment the chapter
    }
    // and write all non-empty lines to the file that corresponds to the current chapter
    appendFileSync(`chapter-${chapterCounter}.md`, `${line}\n`);
  }
}

read().catch(console.error);
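
Since this article is about streams, it's worth noting that appendFileSync (which reopens the file on every call) could be swapped for one createWriteStream per chapter. A rough sketch of that variant:

const { createReadStream, createWriteStream } = require("fs");
const { createInterface } = require("readline");

async function split(filePath) {
  const reader = createInterface({ input: createReadStream(filePath) });
  let out = null;
  let chapterCounter = 0;
  for await (const line of reader) {
    if (line.length === 0) continue;
    if (/^# [A-Z]/.test(line)) {
      if (out) out.end(); // close the previous chapter's stream
      chapterCounter++;
      out = createWriteStream(`chapter-${chapterCounter}.md`);
    }
    if (out) out.write(`${line}\n`);
  }
  if (out) out.end();
}

split("book.md").catch(console.error);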

4. Working with standard streams

It's very important to understand that the process global object gives us access to three standard OS-level streams: stdin, stdout and stderr. Let's try to recreate some common UNIX commands for file reading:

  • cat
  • head
  • tail

4.1. cat

In general, cat can be used to preview the contents of one or multiple files. We are going to implement the simplest version of this command, which prints the contents of a single file to stdout.

const { createReadStream } = require("fs");
const filePath = "book.md";

createReadStream(filePath).pipe(process.stdout);

As you can see, this is very easily achieved just by creating a simple Readable stream and pointing its output to the standard output stream.
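
Extending this to multiple files, as the real cat does, mainly takes care not to close stdout between files, which { end: false } handles; a sketch:

const { createReadStream } = require("fs");

// print the given files to stdout one after another
function cat(files) {
  const [first, ...rest] = files;
  if (!first) return;
  const stream = createReadStream(first);
  // keep stdout open so the next file can pipe into it
  stream.pipe(process.stdout, { end: false });
  stream.on("end", () => cat(rest));
}

cat(process.argv.slice(2)); // e.g. node cat.js book.md document.txt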

4.2. head & tail

The man entry for head says "output the first part of files"; by default it prints the first 10 lines of a file. But to simplify the task and to showcase both commands in action, we are going to assume that head returns the first 25% of a file and tail returns the last 25%. How do we achieve this?

const { createReadStream } = require("fs");
const { stat } = require("fs").promises;
const filePath = "book.md";

stat(filePath)
  .then(({ size }) => {
    const frameSize = Math.floor(size * 0.25);
    // head: read from the start of the file up to frameSize
    createReadStream(filePath, { end: frameSize }).pipe(process.stdout);
    // tail: read the last frameSize bytes of the file
    createReadStream(filePath, { start: size - frameSize }).pipe(process.stdout);
  })
  .catch(console.error);
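
One caveat: both streams above write to stdout concurrently, so their chunks may interleave in the output. A small rearrangement (my own, not part of the original snippet) waits for the "head" stream to end before starting the "tail" one:

stat(filePath)
  .then(({ size }) => {
    const frameSize = Math.floor(size * 0.25);
    const head = createReadStream(filePath, { end: frameSize });
    // keep stdout open for the second stream
    head.pipe(process.stdout, { end: false });
    // start the "tail" stream only after the "head" stream has finished
    head.on("end", () => {
      createReadStream(filePath, { start: size - frameSize }).pipe(process.stdout);
    });
  })
  .catch(console.error);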