

@eliasdorneles
Last active December 22, 2016 14:51
Ideas for a more human-friendly Parsel

Focus on scraping use cases

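  • Warn when a query containing /table/tbody/tr returns no results (a common mistake: browsers insert tbody into the DOM, so XPaths copied from dev tools often include it even when the raw HTML has none)
    • Alternatively, consider using an HTML5 parser, which inserts tbody the same way browsers do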
  • Shorter method names
  • Guess the encoding when the input is bytes
  • Selector vs. SelectorList: could we have a single, list-like class instead (as jQuery does)?
  • Consider adding a shortcut for parsing JSON (using demjson, to accept liberal, JavaScript-like input)
  • Add a simple link extraction method
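A minimal sketch of the tbody warning, assuming it would be called right after an XPath query with the query string and its results. The name `warn_if_tbody_miss` is hypothetical, not part of Parsel; it only uses the standard `warnings` module.

```python
import warnings
from typing import Sequence


def warn_if_tbody_miss(query: str, results: Sequence[object]) -> None:
    """Hypothetical hook: warn when an XPath mentioning tbody matched nothing."""
    # Browsers insert <tbody> into the DOM, so queries copied from dev tools
    # often contain it even though the raw HTML (as parsed by lxml) may not.
    if not results and "tbody" in query:
        warnings.warn(
            f"XPath {query!r} returned no results and contains 'tbody'; "
            "the raw HTML may lack <tbody>. Try removing it from the query.",
            stacklevel=2,
        )
```

Calling `warn_if_tbody_miss("//table/tbody/tr", [])` emits a `UserWarning`, while a non-empty result or a query without `tbody` stays silent, so correct queries pay no cost.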
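One way encoding guessing for bytes input could work, sketched with the standard library only: check for a byte-order mark, then a `<meta charset>` declaration, then try UTF-8, and finally fall back to cp1252. The function name `guess_encoding`, the 1024-byte scan window and the cp1252 default are all assumptions for illustration, not Parsel behavior.

```python
import codecs
import re
from typing import Tuple

# Byte-order marks, longest first: the UTF-32 LE BOM starts with the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]
_META_CHARSET = re.compile(rb"""<meta[^>]*?charset=["']?([\w-]+)""", re.IGNORECASE)


def guess_encoding(data: bytes) -> Tuple[str, str]:
    """Hypothetical helper: return (encoding, decoded text) for HTML bytes."""
    # 1. A byte-order mark is the strongest signal.
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return encoding, data.decode(encoding)
    # 2. A <meta charset> declaration near the top of the document.
    match = _META_CHARSET.search(data[:1024])
    if match:
        declared = match.group(1).decode("ascii").lower()
        try:
            codecs.lookup(declared)
        except LookupError:
            pass  # unknown label: fall through to the heuristics below
        else:
            return declared, data.decode(declared, errors="replace")
    # 3. Valid UTF-8 is unlikely to occur by accident in legacy encodings.
    try:
        return "utf-8", data.decode("utf-8")
    except UnicodeDecodeError:
        # 4. Assumed legacy default; errors="replace" covers bytes cp1252 leaves undefined.
        return "cp1252", data.decode("cp1252", errors="replace")
```

Returning the encoding alongside the text keeps the guess inspectable, which matters when a page declares one charset and actually uses another.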
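A JSON shortcut could target the common case of data assigned to a variable inside a `<script>` element. The sketch below swaps in the standard library's `json.JSONDecoder.raw_decode` instead of demjson, so it only accepts strict JSON; liberal JavaScript syntax such as unquoted keys would still need demjson or a similar parser. The name `extract_json` and its `name` parameter are hypothetical.

```python
import json
import re
from typing import Any

_DECODER = json.JSONDecoder()


def extract_json(script: str, name: str) -> Any:
    """Hypothetical helper: parse the JSON literal assigned to `name` in JS code.

    Strict JSON only: unlike demjson, this rejects single quotes, unquoted
    keys and other JavaScript-only syntax.
    """
    match = re.search(r"\b" + re.escape(name) + r"\s*=\s*", script)
    if match is None:
        raise ValueError(f"no assignment to {name!r} found")
    # raw_decode parses one JSON value and ignores whatever follows it (e.g. "; init();")
    value, _end = _DECODER.raw_decode(script[match.end():])
    return value
```

For example, `extract_json('var pageData = {"id": 7}; init(pageData);', "pageData")` returns `{"id": 7}`; the trailing JavaScript is ignored because `raw_decode` stops at the end of the first JSON value.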
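A simple link extraction method might collect every `<a href>` and resolve it against the page URL. This standard-library sketch uses `html.parser.HTMLParser` and `urllib.parse.urljoin` rather than Parsel's own selectors; `extract_links` and its de-duplication policy are assumptions for illustration.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple
from urllib.parse import urljoin


class _LinkCollector(HTMLParser):
    """Collects href values from <a> start tags."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value.strip())


def extract_links(html: str, base_url: str) -> List[str]:
    """Hypothetical helper: absolute, de-duplicated link URLs in document order."""
    collector = _LinkCollector()
    collector.feed(html)
    collector.close()
    absolute = (urljoin(base_url, href) for href in collector.hrefs)
    return list(dict.fromkeys(absolute))  # drop duplicates, keep first-seen order
```

Resolving relative URLs inside the helper is the main convenience here: callers get URLs they can request directly instead of repeating `urljoin` at every call site.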

Another idea to play with: give each XPath expression a score representing the probability that it will stop working when the HTML markup changes.
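As a rough starting point, the score could begin as a hand-weighted heuristic before any real probability model exists. In the sketch below, `fragility_score` and every weight are arbitrary assumptions: it penalizes long paths, positional predicates such as `[3]` and absolute paths from `/html`, and rewards `@id` anchors. The result is clamped to [0, 1] but is not a calibrated probability, and splitting on `/` is naive (it miscounts steps when attribute values contain slashes).

```python
import re


def fragility_score(xpath: str) -> float:
    """Hypothetical heuristic in [0, 1]; higher means more likely to break.

    The weights are arbitrary placeholders, not fitted to any data.
    """
    steps = [step for step in xpath.split("/") if step]
    score = 0.1 * len(steps)  # every step depends on more of the markup
    score += 0.2 * len(re.findall(r"\[\d+\]", xpath))  # positional predicates like [3]
    if xpath.startswith("/html"):
        score += 0.3  # absolute path from the document root
    if "@id" in xpath:
        score -= 0.3  # ids tend to be comparatively stable anchors
    return max(0.0, min(1.0, score))
```

Turning this into an actual probability would require data, for instance by replaying expressions against archived versions of the same pages and fitting the weights to how often each one broke.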
