@bzikarsky
Created March 23, 2014 10:37
<?hh

async function stream_check(resource $stream, string $mode, int $usec): Awaitable<void>
{
    $r = $w = $e = null;
    do {
        if ($mode == "r") {
            $r = Vector{$stream};
        }
        if ($mode == "w") {
            $w = Vector{$stream};
        }
        // on stream activity - return
        if (0 != stream_select($r, $w, $e, 0, 0)) {
            return;
        }
        // wait given amount of usecs before next activity check
        await SleepWaitHandle::create($usec);
    } while (true);
}
async function fetch_url(string $url): Awaitable<?string>
{
    // init stream
    $stream = stream_socket_client(
        "tcp://" . parse_url($url)['host'] . ':80',
        $errno = null,
        $errstr = null,
        $timeout = 30,
        /* ASYNC support is missing as of 2014-03-22 */
        STREAM_CLIENT_CONNECT | STREAM_CLIENT_ASYNC_CONNECT
    );
    stream_set_blocking($stream, 0);

    // Since the connect is blocking anyway, we don't need to wait
    // for the stream to become available
    //
    // await stream_check($stream, "w", 100);
    fwrite(STDERR, "CON $url\n");

    // send request
    fwrite($stream, "GET / HTTP/1.0\r\n\r\n");
    fwrite(STDERR, "REQ $url\n");

    $response = "";
    while (true) {
        await stream_check($stream, "r", 20000);
        $response .= fread($stream, 2048);
        if (feof($stream)) {
            fwrite(STDERR, "RSP $url\n");
            fclose($stream);
            return $response;
        }
    }
}
function fetch_urls_async(Set<string> $urls): Map<string, string>
{
    $crawler = Map{};
    foreach ($urls as $url) {
        $crawler[$url] = fetch_url($url);
    }
    return GenMapWaitHandle::create($crawler)->join();
}
function fetch_urls_sync(Set<string> $urls): Map<string, string>
{
    $result = Map{};
    foreach ($urls as $url) {
        $stream = stream_socket_client("tcp://" . parse_url($url)['host'] . ":80");
        fwrite(STDERR, "CON $url\n");
        fwrite($stream, "GET / HTTP/1.0\r\n\r\n");
        fwrite(STDERR, "REQ $url\n");
        $result[$url] = "";
        while (!feof($stream)) {
            $result[$url] .= fread($stream, 8196);
        }
        fwrite(STDERR, "RES $url\n");
        fclose($stream);
    }
    return $result;
}
function main(array<string> $argv): void
{
    $urls = Set{
        "http://google.com",
        "http://github.com",
        "http://php.net",
        "http://facebook.com",
        "http://hhvm.com",
        "http://reddit.com",
        "http://wikipedia.com",
        "http://example.org",
        "http://www.iana.org",
        "http://netflix.com",
        "http://bing.com"
    };

    $async = !isset($argv[1]) || $argv[1] == "async";
    $pages = $async ? fetch_urls_async($urls) : fetch_urls_sync($urls);
    $result = $pages->map('strlen');

    echo "Results - " . ($async ? "async" : "sync") . ":\n";
    $result->mapWithKey(function(string $url, int $size) {
        echo " - $url: $size\n";
    });
}

main($argv);

zymsys commented Mar 26, 2014

I've spent hours trying to make something like this work without success - this looks very interesting and I'm looking forward to spending more time with it. I ran it in both synchronous and async modes using the 'time' utility, and the async was slower, so I think something must still be missing. It would be interesting to compare against an implementation in node.js.

vic@ubuntu:~/prog/9721378$ time hhvm hhvm-async-test.php
(script output omitted)
real    0m4.273s
user    0m0.024s
sys 0m0.096s

vic@ubuntu:~/prog/9721378$ time hhvm hhvm-async-test.php sync
(script output omitted)
real    0m3.080s
user    0m0.016s
sys 0m0.100s

I have a feeling that await SleepWaitHandle::create($usec); is adding delays, and that the script isn't effectively allowing the I/O to happen in parallel.
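
One way to check that hypothesis would be to time each fetch individually: if the requests really overlap, every per-URL wall time should be close to the total wall time, whereas serialized I/O would make them add up. A rough Hack sketch, assuming the fetch_url() from the gist above (fetch_url_timed is just an illustrative wrapper, not part of the gist):

// Hypothetical wrapper: logs how long each individual fetch takes.
// If the awaitables really run concurrently, every per-URL time should
// be close to the total runtime of the script.
async function fetch_url_timed(string $url): Awaitable<?string>
{
    $start = microtime(true);
    $body = await fetch_url($url);
    fwrite(STDERR, sprintf("TIM %s %.3fs\n", $url, microtime(true) - $start));
    return $body;
}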

Still, nice start!


zymsys commented Mar 26, 2014

So - I wrote a node version just to see the difference in speed, and noticed that the sizes are different too. I tested php.net with wget, and the output matched the file size I got from node. I haven't looked into what's in the results this gist reports. Anyway, for comparison's sake, here's the time output from node to show what a reasonable run time looks like:

vic@ubuntu:~/prog/node$ time node asyncRequest.js 
(script output omitted)
real    0m1.208s
user    0m0.040s
sys 0m0.024s

Here's the script:

var http = require('http'),
    sites = [
        "google.com",
        "github.com",
        "php.net",
        "facebook.com",
        "hhvm.com",
        "reddit.com",
        "wikipedia.com",
        "example.org",
        "www.iana.org",
        "netflix.com",
        "bing.com"
    ];

function fetchOne(url) {
    var request = http.request({ hostname: url }, function(response) {
        var size = 0;
        response.on('data', function (chunk) {
            size += chunk.length;
        });
        response.on('end', function () {
            console.log("Fetched " + url + ": " + size);
        });
    });
    request.on('error', function(e) {
          console.log('problem with request: ' + e.message);
    });
    request.end();
}

for (var i in sites) {
    fetchOne(sites[i]);
}
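
As an aside on the size differences: the Hack version counts the full raw response (status line and headers included, and it sends no Host header), while node's 'data' events only carry the body, so the byte counts aren't directly comparable. If one wanted to report only the body length on the Hack side, here is a rough sketch using plain string functions on the $response built in fetch_url() (response_body is just an illustrative helper, not part of the gist):

// Illustrative helper: split the raw HTTP/1.0 response at the blank line
// that separates headers from body and return only the body - roughly
// what node's 'data' events deliver.
function response_body(string $response): string
{
    $pos = strpos($response, "\r\n\r\n");
    return $pos === false ? $response : substr($response, $pos + 4);
}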

bzikarsky (Author) commented

Somehow GitHub didn't notify me about those comments. When I benchmarked this test script, I came to the conclusion that the blocking TCP/IP connect is the biggest performance concern.
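
For reference, once the async connect actually works in HHVM, the commented-out writability check could be turned back on so that the handshake is awaited instead of blocking. A minimal Hack sketch, assuming stream_check() from the gist above (connect_async, $host and $usec are illustrative names, not part of the gist):

// Illustrative sketch: open the socket with STREAM_CLIENT_ASYNC_CONNECT
// and await writability via stream_check(); a non-blocking socket becomes
// writable once the TCP handshake has finished.
async function connect_async(string $host, int $usec): Awaitable<resource>
{
    $errno = null;
    $errstr = null;
    $stream = stream_socket_client(
        "tcp://" . $host . ":80",
        $errno,
        $errstr,
        30,
        STREAM_CLIENT_CONNECT | STREAM_CLIENT_ASYNC_CONNECT
    );
    stream_set_blocking($stream, 0);
    await stream_check($stream, "w", $usec);
    return $stream;
}

fetch_url() would then await connect_async(parse_url($url)['host'], 100) before sending the request, letting other coroutines run while each connect is in flight.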
