@oneroyalace
Created August 11, 2022 17:43
Solving Zorki session/intercept issues

Problem description

Zorki needs to end its Selenium session after it finishes an Instagram scrape, but because it's usually still forwarding requests when it finishes a scrape, it can't do so without raising Selenium errors.

To get around that, we need to ensure that Zorki ends its request forwarding earlier, waits for its queued requests to finish forwarding before it ends its Selenium session, or doesn't break when a session is quit while requests are still being forwarded.

Solution explanations

Wait for requests to finish forwarding before ending session

We could tell Zorki to sleep for n seconds before ending its Selenium session, in the hope that all requests will have been forwarded by then. But fixed sleeps are brittle: the shorter we make the sleep, the more likely we are to hit a case where Selenium hasn't finished forwarding requests when Zorki ends the session (our problem right now).
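
One alternative to a fixed sleep is to count requests that have been intercepted but not yet forwarded, and wait (with a timeout) until that count drops to zero. The sketch below is only an illustration: `InFlightTracker` and its methods are hypothetical names, not part of Zorki or Selenium, and it assumes we can hook both the moment a request is intercepted and the moment it has been forwarded.

```ruby
# Hypothetical helper: tracks requests that were intercepted but not yet forwarded.
class InFlightTracker
  def initialize
    @mutex = Mutex.new
    @drained = ConditionVariable.new
    @count = 0
  end

  # Call when a request is intercepted.
  def start
    @mutex.synchronize { @count += 1 }
  end

  # Call once the request has been forwarded (or has failed).
  def finish
    @mutex.synchronize do
      @count -= 1
      @drained.broadcast if @count.zero?
    end
  end

  # Blocks until nothing is in flight or the timeout (in seconds) elapses.
  # Returns true if the queue drained, false if we gave up waiting.
  def wait_until_drained(timeout: 5)
    deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
    @mutex.synchronize do
      until @count.zero?
        remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
        return false if remaining <= 0

        @drained.wait(@mutex, remaining)
      end
      true
    end
  end
end

# Usage: three fake "forwarding" jobs stand in for Selenium's work.
tracker = InFlightTracker.new
3.times do
  tracker.start
  Thread.new do
    sleep 0.05 # stand-in for forwarding one request
    tracker.finish
  end
end

drained = tracker.wait_until_drained(timeout: 2)
puts "drained: #{drained}" # only at this point would Zorki quit the session
```

The timeout still acts as an upper bound, so a stuck request can't hang the scrape forever, but in the common case we wait only as long as forwarding actually takes.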

Monkeypatch Selenium so that it doesn't raise errors about failed request forwarding

We don't care much about errors that arise from intercepting and forwarding requests. In fact, we currently wrap the whole interception/forwarding block in a try/catch (a Ruby begin/rescue). Unfortunately, even though the rescue is set up to catch the Selenium error that's breaking our flow, it never actually catches it. I think that's because Selenium raises the error in a different thread.
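
The general Ruby behavior behind that guess can be shown without Selenium. In the sketch below, `ForwardingError` is a hypothetical stand-in for the Selenium error. A rescue wrapped around the code that starts a thread does not see exceptions raised inside that thread; they only surface where the thread is joined (or, if the thread has `abort_on_exception` set, in the main thread at an arbitrary point).

```ruby
Thread.report_on_exception = false # silence the default stderr report for this demo

# Hypothetical stand-in for the Selenium error we'd like to ignore.
class ForwardingError < StandardError; end

caught_at_setup = false
worker = nil

begin
  # The block runs on another thread, so this rescue only covers the
  # creation of the thread, not the code that runs inside it.
  worker = Thread.new { raise ForwardingError, 'session already closed' }
  sleep 0.1 # give the worker time to fail
rescue ForwardingError
  caught_at_setup = true
end

caught_at_join = false
begin
  worker.join # join re-raises the worker's exception in the calling thread
rescue ForwardingError
  caught_at_join = true
end

puts "caught at setup: #{caught_at_setup}, caught at join: #{caught_at_join}"
```

Here the first rescue never fires and the second one does, which matches what we see with the interception block: the rescue is in the wrong place to see the error.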

But we could alter Selenium so that it just doesn't raise the troublesome errors we don't care about. Specifically, we could monkeypatch line 49 here: https://github.com/SeleniumHQ/selenium/blob/c7be1be9e1053488d13aab76a76598dbf0a39a56/rb/lib/selenium/webdriver/devtools.rb#L45-L52, so that Selenium stops raising errors on continue request commands. That would mean changing line 49 to:

raise Error::WebDriverError, error_message(message['error']) if message['error'] && method != 'Fetch.continueRequest'
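
If we'd rather not edit Selenium's source in place, a similar effect might be achievable by wrapping the method with `Module#prepend`. The sketch below shows only the mechanics on a stand-in class: `FakeDevTools`, `CommandError`, `fake_response`, and `IgnoreContinueRequestErrors` are all hypothetical, and the real patch would have to target whatever Selenium class and method contain the line linked above. Note that this wraps the call and rescues the error, rather than changing the raise condition itself.

```ruby
# Stand-in for a DevTools client. This is NOT Selenium's real class.
class FakeDevTools
  class CommandError < StandardError; end

  # Simulates sending a command and raising when the response carries an error.
  def send_cmd(method, **params)
    message = fake_response(method, params)
    raise CommandError, message['error'] if message['error']

    message
  end

  private

  # Pretend every command fails, as it might once the session is gone.
  def fake_response(_method, _params)
    { 'error' => 'session closed' }
  end
end

# The patch: swallow errors for Fetch.continueRequest only, re-raise the rest.
module IgnoreContinueRequestErrors
  def send_cmd(method, **params)
    super
  rescue FakeDevTools::CommandError
    raise unless method == 'Fetch.continueRequest'

    nil
  end
end

FakeDevTools.prepend(IgnoreContinueRequestErrors)

devtools = FakeDevTools.new
p devtools.send_cmd('Fetch.continueRequest', requestId: '1') # error is swallowed

begin
  devtools.send_cmd('Page.navigate', url: 'https://example.com')
rescue FakeDevTools::CommandError => e
  puts "still raised: #{e.message}"
end
```

Because the rescue lives inside the patched method, it runs on whichever thread Selenium uses to send the command, so it isn't affected by the threading issue described above.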

End request interception earlier

We could try to make sure that once Zorki has intercepted the request it cares about, it shuts down its Selenium interceptor. We already try that here, though: https://github.com/cguess/zorki/blob/15a5846f8770c0af05da389bba24ecf48f5594ba/lib/zorki/scrapers/scraper.rb#L80. It doesn't seem to work.

I've also tried adding an instance variable that tracks whether a Zorki::Scraper instance has intercepted the request it cares about and stops the scraper from intercepting requests afterwards, but that didn't seem to work either. I think we intercept and queue so many requests that even if we do stop intercepting at the right time, we can't forward all of them before we quit the session.
