@mattbasta
Last active December 14, 2015 14:58

Avoiding code duplication and reusing code are always admirable goals. On some occasions, however, it's not a bad idea to duplicate a little bit of code in order to make your software better.

An Example

When writing a Node app, accessing a directory listing is fairly simple: the fs.readdir function provides a list of the entries in a directory, and a call to fs.stat will tell you whether each entry is a directory or a file.

That's fairly straightforward, but recursing (especially when using callbacks) can require some mental yoga. Each directory needs to keep track of how many entries it is still waiting on callbacks for, and results need to be aggregated in pieces as they arrive rather than sequentially.

When I needed a recursive directory listing this past week, I was tempted to use a library: why solve a solved problem? The glob library provides this functionality with very little effort:

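var glob = require('glob');
var extension = 'js';  // example value; any file extension works
// The leading '**/' makes the pattern match files in subdirectories as well.
glob('**/*.' + extension, function(err, heres_the_files) {
    console.log(heres_the_files);
});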

Why would you use anything else? Here's what I saw when I did an npm install glob:

  • glob
    • minimatch
      • lru-cache
      • sigmund
    • graceful-fs
    • inherits

Simply by adding glob to my project, I've added a total of six modules. That's not unexpected, though, since glob does a whole lot more than I need it to do. In the end, I simply wrote my own and included it in the single file that needs the functionality:

var fs = require('fs');

// Recursively collect every file under `path` whose name ends with `ext`.
function glob(path, ext, done){
    var results = [];
    fs.readdir(path, function(err, list){
        if(err) return done(err);
        // Number of entries still waiting on a stat (or recursive) callback.
        var pending = list.length;
        if(!pending) return done(null, results);
        list.forEach(function(file) {
            file = path + '/' + file;
            fs.stat(file, function(err, stat){
                if(stat && stat.isDirectory()){
                    glob(file, ext, function(err, res){
                        // Skip subdirectories that could not be read.
                        if(!err) results = results.concat(res);
                        if(!--pending) done(null, results);
                    });
                }else{
                    // If it's got the right extension, add it to the list.
                    if(file.substr(file.length - ext.length) == ext)
                        results.push(file);
                    if(!--pending) done(null, results);
                }
            });
        });
    });
}
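
For comparison, here is a minimal sketch of the same traversal written synchronously, which removes the pending-callback bookkeeping entirely. It assumes blocking I/O is acceptable (a build script, say, rather than a request handler). The name globSync is made up for illustration and is not part of the glob library's API as used here.

```javascript
var fs = require('fs');
var path = require('path');

// Hypothetical synchronous counterpart to the callback-based glob() above.
// Returns every file under `dir` whose name ends with `ext`.
function globSync(dir, ext) {
    var results = [];
    fs.readdirSync(dir).forEach(function(name) {
        var file = path.join(dir, name);
        if (fs.statSync(file).isDirectory()) {
            // Recurse and merge the subdirectory's matches into ours.
            results = results.concat(globSync(file, ext));
        } else if (file.slice(file.length - ext.length) === ext) {
            results.push(file);
        }
    });
    return results;
}
```

Unlike the callback version, this one throws on the first unreadable entry instead of passing an error to a callback, so the caller would wrap it in try/catch if that matters.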

Why are dependencies not good?

First, and most importantly, there is the security of those packages. If the SSH keys of the developer of those packages are compromised, an attacker could provide an "updated" version of the library which includes malicious code. At Mozilla, we use internal mirrors of PyPI and npm to make sure we're not installing arbitrary modules. Avoiding dependencies helps you avoid needing such a solution in the first place.

Second, there's a performance hit for libraries. If you're building a one-off fix for a simple problem, using a library increases the overhead of your app's load and use. If you're building a web app on Node or Django, you might use an auto-reloader to restart your local server when script files change. In many projects, the reload takes an imperceptibly small amount of time. As your codebase and dependency list grow, your auto-reloader can take precious seconds to run (we have projects that take almost five seconds to reboot).
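
One rough way to put a number on the load-time cost is to time a require call directly. The sketch below is only that: timeRequire is a made-up helper name, and it measures only the first load of a module, since Node caches modules after that (built-in modules such as path will usually look close to free).

```javascript
// Hypothetical helper: time how long a single require() call takes, in milliseconds.
function timeRequire(name) {
    var start = process.hrtime();
    require(name);
    var diff = process.hrtime(start); // [seconds, nanoseconds]
    return diff[0] * 1e3 + diff[1] / 1e6;
}

// 'path' is a built-in, so this runs without installing anything.
console.log('path: ' + timeRequire('path').toFixed(3) + 'ms');
```

Swapping 'path' for an installed package (glob, for instance, if it is present in node_modules) gives a more interesting figure, though a single run like this is no more scientific than the install timings below.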

Each dependency increases the time it will take to install your application. Zamboni (the code behind addons.mozilla.org and marketplace.firefox.com), for instance, takes 12 minutes to install libraries and dependencies. What starts out as a small number of libraries can very quickly grow out of control and become a serious nuisance. To reference my earlier example, running time npm install glob tells me that glob adds four seconds to my install time. Some simple and unscientific benchmarks from popular Node libraries:

  • socket.io: 33s
  • mongoose: 9.5s
  • connect: 5.7s
  • jade: 5.3s
  • express: 4.4s
  • glob: 3.9s

And some from popular Python libraries (run with time pip install <module>):

  • numpy: 110s
  • django: 96s
  • pycrypto: 18s
  • flask: 14s

Third, you're taking the gamble that the libraries that you depend on don't have conflicting versions. A great example of this can be illustrated by a web app that I was building: the HTTP responses that I was getting from Python's requests library didn't have a .json property, as the docs state. An hour of hair-tearing later, I discovered that another library that I was using had already installed a much older version of requests which lacked the property. Unfortunately, that other library wasn't compatible with the latest version of requests, and I had to settle for the old version.

The consequences could have been much worse, and this is not a well-solved problem (for the Python community, at least).

Last, if your project is used as a library, including dependencies decreases the re-usability of your code. If your code uses a third-party library when it could have taken advantage of some slightly-gnarly standard library tools, all of the above reasons make it that much more difficult for a developer to justify using your tool.

Now hold on just a minute.

Does this mean you should never use an external library? Not at all! You should use libraries whenever it's appropriate. Sometimes the effort required to perform a task correctly and thoroughly (or at all) is simply too much work to do on your own. OAuth? You'll probably want a library. Database work or ORM? Use a library. But what if you need to serve a single static HTML file over HTTP? You can probably use Node's http or Python's SimpleHTTPServer without too much heartache.
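
As a rough illustration of the static-file case, here is a minimal sketch using only Node's built-in http and fs modules. The helper name createStaticServer, the file name index.html, and port 8000 are all placeholders chosen for the example.

```javascript
var fs = require('fs');
var http = require('http');

// Hypothetical helper: build a server that answers every request with one HTML file.
function createStaticServer(filePath) {
    return http.createServer(function(req, res) {
        fs.readFile(filePath, function(err, data) {
            if (err) {
                res.writeHead(500, {'Content-Type': 'text/plain'});
                return res.end('Could not read file');
            }
            res.writeHead(200, {'Content-Type': 'text/html'});
            res.end(data);
        });
    });
}

// Usage (file name and port are placeholders):
// createStaticServer('index.html').listen(8000);
```

It ignores the request path, caching headers, and everything else a real static server handles, which is exactly the point: for a single file, that may be all you need.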

Sometimes it's just irresponsible to write code yourself: would you trust a developer's one-off method or a mature ORM's SQL sanitization code? What about code to generate secure random numbers for encrypting sensitive data? Or code that protects against XSS attacks? There are a lot of instances where it's in your users' best interest to use a trusted solution rather than your own solution.
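
Random numbers are a good example of where the trusted solution is already in the box. In Node, a sketch along these lines leans on the built-in crypto module rather than anything hand-rolled on top of Math.random; randomToken is a made-up helper name.

```javascript
var nodeCrypto = require('crypto');

// Hypothetical helper: a random token from Node's cryptographically strong
// generator, hex-encoded (two hex characters per byte).
function randomToken(byteLength) {
    return nodeCrypto.randomBytes(byteLength).toString('hex');
}

console.log(randomToken(16)); // 32 hex characters, different on every run
```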

A double-edged sword.

Another reason for choosing libraries is the community: if there are bug fixes to a third-party library, dependents can relatively easily update to the latest version of the library and take advantage of the improvements immediately.

That can be a good thing and a bad thing, though. Libraries without a community around them can contain bugs that don't get patched in a timely manner. Fixes may be difficult or impossible to upstream.

TL;DR

  1. If you can easily get by without a dependency, you don't need the dependency.
  2. If you're building a library, you should avoid dependencies.
  3. Don't avoid dependencies if it means potentially putting your users at risk.
@Sancus

Sancus commented Mar 9, 2013

I like this post, and agree with it on many levels. An additional point about adding dependencies is the risk of over-solving problems and adding complexity to your code where you could have taken the opportunity to simplify. It might save time to install a library or use a framework, but if you only need a small portion of that library's functionality you may end up finding yourself digging through hundreds of lines of code trying to fix a bug that wouldn't even be there if you had written only the functionality you needed, or found a library that only implemented the things you want.

An extreme example of this is using Django to build a small REST API that could have just been a barebones WSGI application because it doesn't need an ORM, template system, authentication, admin CRUD functionality, etc. You should always be aware of how much extraneous functionality you're accidentally adding to get what you need. Avoid it when possible.

@kumar303

kumar303 commented Mar 9, 2013

I read this and think immediately of the classic NIH problem: why should I use someone else's code when I can easily invent my own to make it faster, smaller, and more secure? If you don't see the irony in that statement, I can't help you.

I also find it ironic that you choose NodeJS to illustrate the "problem" of dependencies. Besides NPM security issues, I actually think NodeJS is the one language to finally get dependencies right. When I build a NodeJS project I feel warm and fuzzy because I know its dependency system Just Works and makes all the right decisions. This is because NodeJS has learned from all the mistakes of Ruby gems, Python eggs, Python's virtualenv, java jars, you name it. More importantly, it has had the luxury of stealing all the good features from those dependency systems.

I'm just going to say it. NodeJS is the language to finally get dependencies right. How? They do not pollute your global system, sub-dependencies are isolated from each other (no circular deps!), the require system is lazy and fast, the strict version numbering is Done Right, etc. Once they solve NPM security, it will reach a new plateau of plasma-awesome.

Have you measured the performance hit you get from loading glob's 5 sub-dependencies? I doubt it's even a noticeable blip. When you talk about 5 second boot time I assume you are talking about Zamboni :) There is something deeply wrong with Zamboni's package but I doubt it's related to dependencies. On webpay we now have dependencies approaching Zamboni numbers and our boot time is milliseconds. I blame Zamboni's magical legacy cruft -- something in there is doing bad things but I wouldn't be so quick to blame its dependencies until you have some data on that.

The best thing about open source is that we have the tools we need to share code. I will totally admit that this is easier in some languages and harder in others but I do think in NodeJS you have no excuse to avoid dependencies; they are one of Node's best features. As with any open source community, you're going to find bugs and problems in other people's code. Does this really mean you should always write your own?
