@evandrix
Created September 3, 2012 08:46
Stopwords removal
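<?php
// Remove stop words, digits, and common punctuation from a text file.
// Usage: php <script> <text file> <stop words file> <output file>
if (count($argv) != 4) {
    echo("Usage: text file, stop words file, output file.\n");
    exit(1);
}
if (!file_exists($argv[1])) {
    exit("Unable to open file $argv[1]!\n");
}
if (!file_exists($argv[2])) {
    exit("Unable to open file $argv[2]!\n");
}
$post = file_get_contents($argv[1]);
// One stop word per line; drop trailing newlines and blank lines.
$stop_words = file($argv[2], FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
foreach ($stop_words as $word) {
    $word = trim($word);
    if ($word === '') {
        continue;
    }
    // Escape regex metacharacters so each stop word is matched literally.
    $word = preg_quote($word, '/');
    $post = preg_replace("/\b$word\b/i", "", $post);
}
// Strip digits, then common punctuation.
$post = preg_replace("/\d/", "", $post);
$post = preg_replace("/[?;:!,.'\"]/", "", $post);
$output = fopen($argv[3], 'w') or
    exit("Unable to open file $argv[3]!\n");
fwrite($output, $post);
fclose($output);
?>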
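The script above runs one `preg_replace` call per stop word. As a possible variation, the sketch below joins all stop words into a single alternation pattern so the text is scanned once. The function name `remove_stop_words` is hypothetical and not part of the original gist, and the behaviour is only assumed to match the loop for simple, non-overlapping word lists.

```php
<?php
// Hypothetical helper (not in the original script): removes every stop word
// from $text using a single case-insensitive alternation pattern.
function remove_stop_words(string $text, array $stop_words): string
{
    // Trim each entry, skip blanks, and escape regex metacharacters.
    $escaped = [];
    foreach ($stop_words as $word) {
        $word = trim($word);
        if ($word !== '') {
            $escaped[] = preg_quote($word, '/');
        }
    }
    if ($escaped === []) {
        return $text;
    }
    // \b keeps "the" from matching inside "other"; the i flag ignores case.
    $pattern = '/\b(?:' . implode('|', $escaped) . ')\b/i';
    // preg_replace returns null on failure; fall back to the input text.
    return preg_replace($pattern, '', $text) ?? $text;
}

// Example call with entries shaped like the lines returned by file().
echo remove_stop_words("The cat and the hat", ["the\n", "and\n"]), "\n";
```

As in the original script, only the words are removed, so the spaces that surrounded them remain in the output. For very long stop word lists the combined pattern may become large, in which case the per-word loop could be the safer choice.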