@CrabDude
Created August 12, 2016 21:43
[Sample Exercise] Week 1: Parallel Asynchronous recursiveReaddir

Week 1 Challenge - Parallel Asynchronous recursiveReaddir

This exercise is to build an API that recursively lists the files in a directory, in parallel: in other words, a parallel recursive ls. The purpose is to practice control flow for asynchronous IO, specifically running operations in serial and in parallel. The exercise also explores fs, the filesystem module from Node.js core.

IMPORTANT: Review the Control-flow Guide to familiarize yourself with async/await. Ignore Promises and callbacks for now.

Getting Started

The checkpoints below should be implemented as pairs. In pair programming, there are two roles: supervisor and driver.

The supervisor decides which step to do next. Their job is to describe the step in high-level language ("Let's print out something when the user is scrolling"). They also keep a browser open in case any research is needed. The driver does the typing, and their role is to translate the high-level task into code ("Set the scroll view delegate, implement the didScroll method").

After you finish each checkpoint, switch the supervisor and driver roles. The person on the right will be the first supervisor.

Milestones

  1. Setup:

    • Complete the steps in the Setting Up Nodejs Guide

    • Clone the recursiveReaddir Starter Project.

      Note: The included index.js contains the following:

      require('./helper')
      
      function* ls() {
          // Use 'yield' in here
          console.log('Executing ls function...')
          
          // Your implementation here
      }
      
      module.exports = ls

      Note: The function name ls() has no special meaning, and could just as easily be named main()

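      Note: A function* is a generator function. On its own it is not asynchronous; in this exercise it behaves like an asynchronous function because each yield on a Promise pauses it until that Promise resolves.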

    • Run and verify your script's output:

      $ node index.js
      Executing ls function...
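The function* note above raises the question of what actually resumes the generator after each yield. The starter project's ./helper is not shown here, so the following is only a guess at the general idea, not the helper's actual code: a hypothetical run() function (the name and the example() generator are made up for illustration) that steps a generator forward each time a yielded Promise settles.

```javascript
// Hypothetical runner: drives a generator function to completion,
// treating every yielded value as a Promise to wait on.
function run(generatorFn, ...args) {
  return new Promise((resolve, reject) => {
    const gen = generatorFn(...args)

    function step(method, value) {
      let result
      try {
        // Resume the generator with the resolved value (or throw the error into it)
        result = gen[method](value)
      } catch (err) {
        return reject(err)
      }
      if (result.done) return resolve(result.value)
      // Wait for the yielded value, then resume the generator again
      Promise.resolve(result.value).then(
        resolved => step('next', resolved),
        error => step('throw', error)
      )
    }

    step('next')
  })
}

// Illustrative generator: each 'yield' waits for a Promise
function* example() {
  const a = yield Promise.resolve(1)
  const b = yield Promise.resolve(2)
  return a + b
}
```

Under these assumptions, run(example) returns a Promise for the generator's return value, which is why yield can read like a synchronous wait inside the function body.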
  2. Implement a CLI for fs.readdir

    • Require the fs module: let fs = require('fs').promise

    • To get the list of files in a directory, use fs.readdir:

      Hint: __dirname contains the directory path of the current file.

      let fs = require('fs').promise
      
      // 'yield' can only be used within 'function*'
      function* ls(){
          // fs.readdir(...) returns a Promise representing the async IO
          // Use 'yield' to wait for the Promise to resolve to a real value
          let fileNames = yield fs.readdir(__dirname)
          
          // TODO: Do something with fileNames
      }
    • Loop through fileNames and output each file name to process.stdout using the .write(stringOrBuffer) method

      Your output should look something like this (remember to separate file names with a \n character):

      $ node index.js
      index.js
      node_modules
      package.json
    • Exclude subdirectories from the output using fs.stat and path.join

      Hint: Remember to require path. See the require code for fs above.

      for (let fileName of fileNames) {
          let filePath = path.join(__dirname, fileName)
          // TODO: Obtain the stat promise from fs.stat(filePath)
          // TODO: Use stat.isDirectory() to exclude subdirectories
          // ...
      }
    • Allow the directory path to be passed as a CLI argument:

      $ node index.js --dir=./
      index.js
      node_modules
      package.json
      • Install the yargs package:

        $ npm install --save yargs
      • Use the value passed in on --dir

        let fs = require('fs').promise
        let argv = require('yargs').argv
        let dir = argv.dir
            
        // Or use the more convenient destructuring syntax
        let {dir} = require('yargs').argv
        
        
        // Update fs.readdir() call to use dir
        function* ls(){
            // ...
            let fileNames = yield fs.readdir(dir)
            // ...
        }
        // ...

        Note: See MDN's "Destructuring assignment" documentation.

      • If no value for --dir is given, default to the current directory:

        let {dir} = require('yargs')
            .default('dir', __dirname)
            .argv
      • Verify output of node index.js --dir path/to/some/dir
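The steps in this milestone can be sketched end to end as follows. This is one possible shape, not the reference solution, and it makes several assumptions: it uses async/await with Node's built-in fs.promises API (Node 10 or later) instead of the starter project's helper and yield, and it replaces yargs with a small hypothetical parseDir() function so that the block has no dependencies. The names parseDir, listFiles, and the args parameter are illustrative.

```javascript
const fs = require('fs').promises // assumption: Node's built-in Promise-based fs API
const path = require('path')

// Hypothetical stand-in for yargs: accepts `--dir=value` or `--dir value`
function parseDir(args, fallback) {
  for (let i = 0; i < args.length; i++) {
    const arg = args[i]
    if (arg.startsWith('--dir=')) return arg.slice('--dir='.length)
    if (arg === '--dir' && i + 1 < args.length) return args[i + 1]
  }
  return fallback
}

// Resolves to the names of the entries in `dir` that are not directories
async function listFiles(dir) {
  const fileNames = await fs.readdir(dir)
  const files = []
  for (const fileName of fileNames) {
    const stat = await fs.stat(path.join(dir, fileName))
    if (!stat.isDirectory()) files.push(fileName)
  }
  return files
}

// Writes one file name per line to stdout, defaulting to this file's directory
async function main(args = process.argv.slice(2)) {
  const dir = parseDir(args, __dirname)
  for (const fileName of await listFiles(dir)) {
    process.stdout.write(fileName + '\n')
  }
}
```

Calling main() at the bottom of the file would print the listing. Keeping listFiles separate from the printing loop makes it easier to restructure for recursion in the next milestone.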

  3. Extend the CLI to be recursive.

    • To implement recursion, the code needs to be restructured:

      • Current logic:
        • Call fs.readdir(dir)
        • Iteratively fs.stat the resulting filePaths
        • Log files
        • Ignore sub-directories
      • Recursive logic:
        • Pass the current directory to ls on the argument rootPath
        • fs.stat(rootPath)
        • If rootPath is a file, log and early return
        • Else, call fs.readdir(rootPath)
        • Recurse for all resulting filePaths
    • Pass dir to ls(). Name the argument rootPath.

      To do this, create a separate function main and pass dir to 'ls' as a function parameter:

      function* ls(rootPath) {
          // ...        
      }
      
      function* main() {
          // Call ls() and pass dir, remember to yield
          yield ls(dir)
      }
      
      // Set module.exports to main() instead of ls()
      module.exports = main
    • If rootPath is a file, log and early return:

      function* ls(rootPath) {
          // TODO: log rootPath if it's a file, then early return
          // ...        
      }
    • Recursively call ls() with filePath on subdirectories:

      function* ls(rootPath) {
          // ...
          // TODO: Get 'fileNames' from fs.readdir(rootPath)
          for (let fileName of fileNames) {
              let filePath = path.join(rootPath, fileName)
              // Recurse on all files
              // Process every 'ls' call in serial (one at a time)
              // By 'yield'ing on each call to 'ls'
              // This maintains output ordering
              yield ls(filePath)
          }
      }
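    • Ordering is nice, but performance is better. Parallelize the traversal by removing the yield keyword before each recursive ls call, so that subdirectories are listed concurrently and output order is no longer guaranteed: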

      function* ls(rootPath) {
          // ...
          // TODO: Get 'fileNames' from fs.readdir(rootPath)
          for (let fileName of fileNames) {
              let filePath = path.join(rootPath, fileName)
              // Removing yield recursively lists subdirectories in parallel
              ls(filePath)
          }
      }
    • Verify your output
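The recursive logic listed in this milestone, in both its serial and parallel forms, might look like the sketch below. It assumes async/await and Node's built-in fs.promises API rather than the starter project's helper, and the function names lsSerial and lsParallel and the log parameter are illustrative. Note one deliberate difference from simply dropping the wait: the parallel variant keeps the Promises and waits on all of them with Promise.all, so the caller can tell when the traversal has finished and errors are not silently lost.

```javascript
const fs = require('fs').promises // assumption: Node's built-in Promise-based fs API
const path = require('path')

// Serial: each recursive call finishes before the next one starts,
// so entries are logged in readdir order
async function lsSerial(rootPath, log = console.log) {
  const stat = await fs.stat(rootPath)
  if (!stat.isDirectory()) {
    log(rootPath) // rootPath is a file: log and return early
    return
  }
  const fileNames = await fs.readdir(rootPath)
  for (const fileName of fileNames) {
    await lsSerial(path.join(rootPath, fileName), log)
  }
}

// Parallel: all recursive calls are started first, then awaited together,
// so the order of the logged paths is not guaranteed
async function lsParallel(rootPath, log = console.log) {
  const stat = await fs.stat(rootPath)
  if (!stat.isDirectory()) {
    log(rootPath)
    return
  }
  const fileNames = await fs.readdir(rootPath)
  await Promise.all(
    fileNames.map(fileName => lsParallel(path.join(rootPath, fileName), log))
  )
}
```

Both variants should log the same set of paths for a given directory; only the ordering and the amount of IO in flight at once differ.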

  4. Bonus: Return a flat array of file paths instead of printing them as you go:

    • Return an array of file paths for both single files and directories:

      // Single file case (return instead of logging)
      return [rootPath]
      
      // Sub-directory case
      let lsPromises = []
      for (let fileName of fileNames) {
          // ...
          let promise = ls(filePath)
          lsPromises.push(promise)
      }
      // The resulting array needs to be flattened
      return yield Promise.all(lsPromises)

      Note: To yield several asynchronous operations (Promises) in parallel (as opposed to in serial, aka one at a time), use Promise.all like so: yield Promise.all(arrayOfPromises).

    • Concatenate the results with Array.prototype.concat() or use a utility library like lodash with _.flatten to flatten the resulting recursive arrays.

    • Print the results (return value of ls(dir)) with a single console.log:

      function* main() {
          let filePaths = yield ls(dir)
          // TODO: Output filePaths
      }

      Hint: See MDN's "Arrow functions" documentation on how to use the => syntax (aka "arrow functions").

      Hint: Generator functions (function*) such as ls return a "generator object", which can be yielded on like a Promise. Obtain the return value (aka resolution value) of ls by yielding on it, as shown in the example above.
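The bonus steps above (return arrays, wait on all recursive calls with Promise.all, then flatten) can be combined into a sketch like the following. It assumes async/await and Node's built-in fs.promises API instead of the starter project's helper, and the name lsFlat is illustrative. Flattening is done here with Array.prototype.concat; a utility such as lodash's _.flatten would be an alternative.

```javascript
const fs = require('fs').promises // assumption: Node's built-in Promise-based fs API
const path = require('path')

// Resolves to a flat array of every file path under rootPath
async function lsFlat(rootPath) {
  const stat = await fs.stat(rootPath)
  // Single file case: return instead of logging
  if (!stat.isDirectory()) return [rootPath]

  const fileNames = await fs.readdir(rootPath)
  // Start every recursive call, then wait for all of them in parallel
  const nested = await Promise.all(
    fileNames.map(fileName => lsFlat(path.join(rootPath, fileName)))
  )
  // 'nested' is an array of arrays; concat flattens it by one level,
  // which is enough because each recursive call already returns a flat array
  return [].concat(...nested)
}
```

A caller could then print everything with a single call, for example console.log((await lsFlat(dir)).join('\n')) inside an async function.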

  5. Bonus: Execute index.js directly.

    To make a node.js / JavaScript file executable:

    1. Mark the file as executable (skip for Windows):

      $ chmod +x ./index.js
    2. Add a node.js shebang by inserting the following as the first line(s) of index.js:

      Linux / OSX:

      #!/usr/bin/env node

      Windows:

      #!/bin/sh
      ':' //; exec "$(command -v nodejs || command -v node)" "$0" "$@"
    3. Verify by running index.js without node:

      $ ./index.js --dir=./
      index.js
      node_modules
      package.json
