Mahelita/scrape_tonie_tracks.ipynb

Created April 17, 2021 17:52


Scrape tonie tracks


scrape_tonie_tracks.ipynb


steve8x8 commented Jan 7, 2023

Unfortunately, retrieving a "series" with requests.get() doesn't return the same information for me that I get in the browser. Example: https://tonies.com/de-de/tonies/?series=anne-kaffeekanne ("tonies.de" is replaced by "tonies.com/de-de", and "tonies/${series}" becomes "tonies/?series=${series}"). In the browser I get 1 hit, while the Python code returns some random, unrelated stuff :(
Any suggestions as to what might be going wrong here?
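For reference, the URL change described above can be sketched as a small helper. This is only an illustration under assumptions: the function name `rewrite_series_url`, the `locale` parameter, and the old-style input URL `https://tonies.de/tonies/<series>` are hypothetical, inferred from the substitution in the comment rather than taken from the notebook. It builds the new URL and makes no network request, so it does not explain why the response differs from what the browser shows.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def rewrite_series_url(old_url: str, locale: str = "de-de") -> str:
    """Map an old-style series URL to the query-string form.

    Assumed old form:  https://tonies.de/tonies/<series>
    Resulting form:    https://tonies.com/<locale>/tonies/?series=<series>
    """
    parts = urlsplit(old_url)
    # Non-empty path segments, e.g. ["tonies", "anne-kaffeekanne"]
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) != 2 or segments[0] != "tonies":
        raise ValueError(f"unexpected path: {parts.path!r}")
    series = segments[1]
    # The series slug moves from the path into the query string
    query = urlencode({"series": series})
    return urlunsplit(("https", "tonies.com", f"/{locale}/tonies/", query, ""))


print(rewrite_series_url("https://tonies.de/tonies/anne-kaffeekanne"))
```

The resulting string can then be passed to requests.get() as before.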

Also, I'm still trying to find out what's happening in the 4th stage... I'm getting some rather bad matches.

steve8x8 commented Jan 9, 2023

My derived work: https://gist.github.com/steve8x8/db659463c5f86a1649f2a21c4aacc4b4
