Maximilian Götz ImMaax

borgar / Tiny JavaScript tokenizer.js

Created June 24, 2010 12:33

A compact tokenizer written in JavaScript.

	/*
	* Tiny tokenizer
	*
	* - Accepts a subject string and an object mapping token type names to regular expressions
	* - Returns an array of token objects, each with a `token` (matched text) and a `type` (the matching key)
	*
	* tokenize('this is text.', { word:/\w+/, whitespace:/\s+/, punctuation:/[^\w\s]/ }, 'invalid');
	* result => [{ token: "this", type: "word" }, { token: " ", type: "whitespace" }, { token: "is", type: "word" }, ... ]
	*
	*/
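
To make the contract in the header comment concrete, here is a minimal sketch of a function with that behavior. It is not the gist's own implementation. The name `sketchTokenize` is hypothetical, and two behaviors are assumptions rather than anything the comment states: the third argument is treated as the type label for text that no expression matches, and when several expressions match, the match closest to the current position wins.

```javascript
// Hypothetical sketch of the behavior described in the header comment.
// Assumptions (not confirmed by the source): `defaultType` labels text that
// no parser matches, and the earliest match in the remaining input wins.
function sketchTokenize(subject, parsers, defaultType) {
  const tokens = [];
  let rest = subject;

  while (rest.length > 0) {
    let best = null;               // earliest match found so far
    let bestIndex = rest.length;   // where that match starts in `rest`

    for (const type of Object.keys(parsers)) {
      const re = parsers[type];
      re.lastIndex = 0;            // guard against stateful /g or /y regexes
      const m = re.exec(rest);
      // Ignore empty matches so the loop always makes progress.
      if (m && m[0].length > 0 && m.index < bestIndex) {
        best = { token: m[0], type: type };
        bestIndex = m.index;
      }
    }

    // Text before the earliest match (or all remaining text if nothing
    // matched) is emitted under the default type.
    if (bestIndex > 0) {
      tokens.push({ token: rest.slice(0, bestIndex), type: defaultType || 'unknown' });
    }
    if (best) {
      tokens.push(best);
    }

    // Drop the consumed text and continue with the remainder.
    rest = rest.slice(bestIndex + (best ? best.token.length : 0));
  }

  return tokens;
}

const sample = sketchTokenize(
  'this is text.',
  { word: /\w+/, whitespace: /\s+/, punctuation: /[^\w\s]/ },
  'invalid'
);
console.log(sample.map(function (t) { return t.type; }).join(','));
```

Re-running every expression against the remaining input on each step keeps the sketch short at the cost of quadratic work on long inputs. Under the same assumptions, a single combined expression with sticky matching would avoid that.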