@doersino
Last active September 21, 2016 12:55
Upgrade script from ReAD commit f5bd3f4 to ad87e7b. See https://github.com/doersino/ReAD.
<?php
error_reporting(E_ALL);
require_once "deps/meekrodb.2.3.class.php";
require_once "TextExtractor.class.php";
$allArticles = DB::query("SELECT * FROM `read` ORDER BY `time_added` ASC");
$N = count($allArticles);
echo "\tid\twordcount\n";
echo "-------------------------\n";
foreach ($allArticles as $n => $article) {
    $id = $article["id"];

    // get source (parameterized instead of interpolating $id into the SQL string)
    $source = DB::queryFirstField("SELECT `source` FROM `read_sources` WHERE `id` = %i", $id);

    // extract text and compute word count
    $text = TextExtractor::extractText($source);
    $wordcount = TextExtractor::countWords($text);

    // print progress as a percentage with two decimal places
    echo round(100 * $n / $N, 2) . "%\t";
    echo "$id\t";
    echo "$wordcount\n";

    // save to database
    DB::query("INSERT INTO `read_texts` (`id`, `text`) VALUES (%i, %s)", $id, $text);
    DB::query("UPDATE `read` SET `wordcount` = %i WHERE `id` = %i", $wordcount, $id);
}