@joshuabaker
Created December 17, 2012 14:27
Incomplete PHP script that scrapes twitter.com for a specific user’s timeline. The email functionality is unfinished; the idea was to email the tweets to IFTTT, Evernote or a similar service for archiving.
<?php
// Enter your Twitter screen name
$screen_name = 'joshuabaker';
// Optionally enter your email address (unused until the email step is written)
$from_email = '';
// -------------------------------------------------------------------
// DON’T EDIT BELOW THIS LINE UNLESS YOU KNOW WHAT YOU’RE DOING!
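// Fetch the public profile page; stop early if the request fails.
$html = file_get_contents('http://twitter.com/'.$screen_name);
if ($html === false)
{
    exit('Could not fetch the timeline for '.$screen_name.PHP_EOL);
}
// Normalise to UTF-8 (falling back to UTF-8 if detection fails), then encode
// non-ASCII characters as numeric entities so loadHTML() reads them correctly.
$encoding = mb_detect_encoding($html) ?: 'UTF-8';
$html = mb_convert_encoding($html, 'UTF-8', $encoding);
$html = mb_encode_numericentity($html, array(0x80, 0x10FFFF, 0, 0x1FFFFF), 'UTF-8');
$doc = new DOMDocument();
// Real-world HTML is rarely valid; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();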
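$nodes = $doc->getElementsByTagName('p');
$tweets = array();
foreach ($nodes as $node)
{
    if ($node->hasAttributes() && $class = $node->attributes->getNamedItem('class'))
    {
        // Match "js-tweet-text" as a whole class token, not as a substring.
        if (in_array('js-tweet-text', explode(' ', $class->nodeValue)))
        {
            // nodeValue is the text content: tags removed, entities decoded.
            $tweet = $node->nodeValue;
            // Drop non-breaking spaces (U+00A0 in UTF-8).
            $tweet = str_replace("\xC2\xA0", '', $tweet);
            // Strip surrounding whitespace and ellipses (U+2026). trim() works on
            // bytes and can corrupt multibyte characters, so use a UTF-8 regex.
            $tweet = preg_replace('/^[\s\x{2026}]+|[\s\x{2026}]+$/u', '', $tweet);
            // Store the cleaned text (the original stored the raw nodeValue).
            $tweets[] = $tweet;
        }
    }
}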
print_r($tweets);
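
The email step mentioned in the description is not implemented above. The following is only a rough sketch of how it might look, assuming PHP's built-in `mail()` function and a working mail transfer agent on the host. The helper name `build_archive_email`, the recipient address and the sender address are all hypothetical and not part of the original script.

```php
<?php
// Hypothetical helper (not part of the original script): turns the scraped
// tweets into a subject line and a plain-text body for an archive email.
function build_archive_email($screen_name, array $tweets)
{
    $subject = 'Tweets from @'.$screen_name.' ('.count($tweets).')';
    // One tweet per paragraph. wordwrap() only breaks at spaces here, so
    // multibyte UTF-8 characters are never split.
    $body = wordwrap(implode("\n\n", $tweets), 70, "\n");
    return array('subject' => $subject, 'body' => $body);
}

// Example data standing in for the $tweets array built by the scraper.
$email = build_archive_email('joshuabaker', array('First tweet', 'Second tweet'));

// Assumed addresses; replace with a real archive inbox and sender.
$to_email = 'archive@example.com';
$from_email = 'me@example.com';
$headers = 'From: '.$from_email."\r\n".'Content-Type: text/plain; charset=UTF-8';

// mail() needs a configured mail transfer agent, so the call is left commented out.
// mail($to_email, $email['subject'], $email['body'], $headers);
```

Keeping the message construction in a separate function means it can be checked without sending anything; only the final `mail()` call depends on the server setup.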