Created March 23, 2014 10:37
<?hh

async function stream_check(resource $stream, string $mode, int $usec): Awaitable<void>
{
    $r = $w = $e = null;
    do {
        if ($mode == "r") {
            $r = Vector{$stream};
        }
        if ($mode == "w") {
            $w = Vector{$stream};
        }
        // on stream activity - return
        if (0 != stream_select($r, $w, $e, 0, 0)) {
            return;
        }
        // wait given amount of usecs before next activity check
        await SleepWaitHandle::create($usec);
    } while (true);
}

async function fetch_url(string $url): Awaitable<?string>
{
    // init stream
    $stream = stream_socket_client(
        "tcp://" . parse_url($url)['host'] . ':80',
        $errno = null,
        $errstr = null,
        $timeout = 30,
        /* ASYNC support is missing as of 2014-03-22 */
        STREAM_CLIENT_CONNECT | STREAM_CLIENT_ASYNC_CONNECT
    );
    stream_set_blocking($stream, 0);

    // Since the connect is blocking anyway, we don't need to wait
    // for the stream to become available
    //
    // await stream_check($stream, "w", 100);
    fwrite(STDERR, "CON $url\n");

    // send request
    fwrite($stream, "GET / HTTP/1.0\r\n\r\n");
    fwrite(STDERR, "REQ $url\n");

    $response = "";
    while (true) {
        await stream_check($stream, "r", 20000);
        $response .= fread($stream, 2048);
        if (feof($stream)) {
            fwrite(STDERR, "RSP $url\n");
            fclose($stream);
            return $response;
        }
    }
}

function fetch_urls_async(Set<string> $urls): Map<string, string>
{
    // kick off one awaitable per URL, then join them all at once
    $crawler = Map{};
    foreach ($urls as $url) {
        $crawler[$url] = fetch_url($url);
    }
    return GenMapWaitHandle::create($crawler)->join();
}

function fetch_urls_sync(Set<string> $urls): Map<string, string>
{
    // baseline: fetch the URLs one after another with blocking reads
    $result = Map{};
    foreach ($urls as $url) {
        $stream = stream_socket_client("tcp://" . parse_url($url)['host'] . ":80");
        fwrite(STDERR, "CON $url\n");
        fwrite($stream, "GET / HTTP/1.0\r\n\r\n");
        fwrite(STDERR, "REQ $url\n");
        $result[$url] = "";
        while (!feof($stream)) {
            $result[$url] .= fread($stream, 8196);
        }
        fwrite(STDERR, "RES $url\n");
        fclose($stream);
    }
    return $result;
}

function main(array<string> $argv): void
{
    $urls = Set{
        "http://google.com",
        "http://github.com",
        "http://php.net",
        "http://facebook.com",
        "http://hhvm.com",
        "http://reddit.com",
        "http://wikipedia.com",
        "http://example.org",
        "http://www.iana.org",
        "http://netflix.com",
        "http://bing.com"
    };

    $async = !isset($argv[1]) || $argv[1] == "async";
    $pages = $async ? fetch_urls_async($urls) : fetch_urls_sync($urls);
    $result = $pages->map('strlen');

    echo "Results - " . ($async ? "async" : "sync") . ":\n";
    $result->mapWithKey(function(string $url, int $size) {
        echo " - $url: $size\n";
    });
}

main($argv);
So I wrote a node version just to see the difference in speed, and noticed that the sizes are different too. I tested php.net with wget, and the output matched the file size I got from node. I haven't looked at what's in the results this gist reports. Anyway, for comparison's sake, to see what a reasonable run time is, here's the time output from node:
vic@ubuntu:~/prog/node$ time node asyncRequest.js
(script output omitted)
real 0m1.208s
user 0m0.040s
sys 0m0.024s
Here's the script:
var http = require('http'),
sites = [
"google.com",
"github.com",
"php.net",
"facebook.com",
"hhvm.com",
"reddit.com",
"wikipedia.com",
"example.org",
"www.iana.org",
"netflix.com",
"bing.com"
];
function fetchOne(url) {
var request = http.request({ hostname: url }, function(response) {
var size = 0;
response.on('data', function (chunk) {
size += chunk.length;
});
response.on('end', function () {
console.log("Fetched " + url + ": " + size);
});
});
request.on('error', function(e) {
console.log('problem with request: ' + e.message);
});
request.end();
}
for (var i in sites) {
fetchOne(sites[i]);
}
Somehow GitHub didn't notify me about those comments. When I benchmarked this test script I came to the conclusion that the blocking TCP/IP connect is the "biggest" performance concern.
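For illustration, here is a rough sketch of what an awaitable connect could look like once STREAM_CLIENT_ASYNC_CONNECT actually defers the TCP handshake; connect_async is a hypothetical helper built on the stream_check function from the gist above, and it is untested:

async function connect_async(string $host, int $port, int $usec): Awaitable<resource>
{
    // start a non-blocking connect; this only helps if
    // STREAM_CLIENT_ASYNC_CONNECT really returns before the handshake
    // completes, which HHVM did not do yet as of 2014-03-22
    $stream = stream_socket_client(
        "tcp://" . $host . ":" . $port,
        $errno = null,
        $errstr = null,
        $timeout = 30,
        STREAM_CLIENT_CONNECT | STREAM_CLIENT_ASYNC_CONNECT
    );
    stream_set_blocking($stream, 0);

    // yield to other awaitables until the socket becomes writable,
    // i.e. the connect has finished
    await stream_check($stream, "w", $usec);
    return $stream;
}

With something like this, fetch_url could await the connect as well, so one slow handshake would no longer stall the other requests.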
I've spent hours trying to make something like this work without success - this looks very interesting and I'm looking forward to spending more time with it. I ran it in both synchronous and async modes using the 'time' utility, and the async was slower, so I think something must still be missing. It would be interesting to compare against an implementation in node.js.
I have a feeling that
await SleepWaitHandle::create($usec);
is adding delays, but the script isn't effectively allowing the I/O to happen in parallel. Still, nice start!
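As a quick way to compare the two paths without the external time utility, one could time both functions from inside the script; this is only a sketch using the fetch functions already defined in the gist (time_fetch is a hypothetical helper, and $urls would be the same Set as in main):

function time_fetch(Set<string> $urls): void
{
    // measure wall-clock time of the sequential path
    $start = microtime(true);
    fetch_urls_sync($urls);
    $sync = microtime(true) - $start;

    // measure wall-clock time of the async path
    $start = microtime(true);
    fetch_urls_async($urls);
    $async = microtime(true) - $start;

    printf("sync: %.3fs, async: %.3fs\n", $sync, $async);
}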