@CodeZombieCH
Last active August 29, 2015 14:04
New digitec store

Findings for crawling the new digitec store

General thoughts

It might be a good idea to work with the mobile site in order to get smaller responses.

The mobile site can be triggered by setting the following cookie:

Cookie: dismod=m1
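
A minimal sketch of how the cookie could be attached to a request, using only the Python standard library. The helper name `mobile_request` is made up for illustration, and the request is only built here, not sent:

```python
import urllib.request


def mobile_request(url: str) -> urllib.request.Request:
    """Build a GET request that asks for the mobile site via the dismod cookie."""
    req = urllib.request.Request(url)
    # Cookie value taken from the note above.
    req.add_header("Cookie", "dismod=m1")
    return req


req = mobile_request("https://www.digitec.ch/de/s1/sector/digital-1")
# urllib.request.urlopen(req) would actually send it.
```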

Crawling articles

Get all (= 5000) articles of one main category (HK)

https://www.digitec.ch/de/s1/tag/audio-hifi-591?&skip=0&take=5000

Adding the following HTTP header gets only the articles without the page wrappers:

X-Requested-With: XMLHttpRequest
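
The article listing request could be assembled roughly like this. It is a sketch: `article_list_request` is a hypothetical helper, and the assumption that `skip` and `take` can be used for paging in smaller chunks is not verified here:

```python
import urllib.request
from urllib.parse import urlencode


def article_list_request(category_url: str, skip: int = 0, take: int = 5000) -> urllib.request.Request:
    """Build a request for a bare article list (no page wrappers)."""
    query = urlencode({"skip": skip, "take": take})
    req = urllib.request.Request(f"{category_url}?{query}")
    # urllib stores the name as "X-requested-with"; HTTP header names
    # are case-insensitive, so this is equivalent.
    req.add_header("X-Requested-With", "XMLHttpRequest")
    return req


req = article_list_request("https://www.digitec.ch/de/s1/tag/audio-hifi-591")
```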

The User-Agent HTTP header is not required at all. The following curl request works just fine:

curl -L https://www.digitec.ch/de/s1/producttype/2

Categories/Tags

Old category IDs no longer match! Waaaaaaaahhhhh!

Examples:

As a result, category/tag handling needs to be rewritten from scratch. The following page seems to be a good entry point:

https://www.digitec.ch/de/s1/sector/digital-1

Basic URL schema

Level 1

Example:

https://www.digitec.ch/de/s1/tag/pc-komponenten-76?tagIds=76

Minified to

https://www.digitec.ch/de/s1/tag/76

General schema:

https://www.digitec.ch/de/s1/tag/([\w-]+-)?{mainCategoryTagID}

Level 2

Example:

https://www.digitec.ch/de/s1/producttype/arbeitsspeicher-2?tagIds=76

Minified to

https://www.digitec.ch/de/s1/producttype/2

General schema:

https://www.digitec.ch/de/s1/producttype/([\w-]+-)?{subCategoryTagID}

Minified URLs only work if you follow the HTTP status code 301 redirect.
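
The URL schema for both levels can be sketched as a small parser that strips the slug and query string. This assumes IDs are purely numeric (as in the examples) and allows hyphens inside the slug, since the level 1 example slug (pc-komponenten) contains one. The name `minify` is made up for illustration:

```python
import re
from urllib.parse import urlsplit

# Optional slug (may contain hyphens) followed by a numeric ID.
_PATH_RE = re.compile(r"^/de/s1/(tag|producttype)/(?:[\w-]+-)?(\d+)$")


def minify(url: str) -> str:
    """Reduce a level 1 or level 2 category URL to its minified form."""
    match = _PATH_RE.match(urlsplit(url).path)
    if match is None:
        raise ValueError(f"not a tag/producttype URL: {url}")
    kind, ident = match.groups()
    return f"https://www.digitec.ch/de/s1/{kind}/{ident}"


level1 = minify("https://www.digitec.ch/de/s1/tag/pc-komponenten-76?tagIds=76")
level2 = minify("https://www.digitec.ch/de/s1/producttype/arbeitsspeicher-2?tagIds=76")
```

Here `level1` is `https://www.digitec.ch/de/s1/tag/76` and `level2` is `https://www.digitec.ch/de/s1/producttype/2`. When fetching such a URL, `urllib.request.urlopen` follows 301 redirects by default, the same way `curl -L` does.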

Product search

AJAX incremental search

Schema:

GET https://www.digitec.ch/IncrementalSearch/?i=9&l=de&c=1&e=false&p=25&q={query}

Parameters:

i: -
l: language code (e.g. de)
c: country id (e.g. 1)
e: -
q: query

Regular search

Schema:

GET https://www.digitec.ch/de/Search?q={query}
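
Both search URLs can be built with proper query escaping along these lines. This is a sketch: the function names are hypothetical, and the values for `i`, `e` and `p` are simply copied from the schema above because their meaning is unknown:

```python
from urllib.parse import urlencode


def incremental_search_url(query: str, language: str = "de", country: int = 1) -> str:
    """URL for the AJAX incremental search."""
    params = {
        "i": 9,          # meaning unknown, copied from the schema
        "l": language,   # language code
        "c": country,    # country id
        "e": "false",    # meaning unknown, copied from the schema
        "p": 25,         # not documented above, copied from the schema
        "q": query,
    }
    return "https://www.digitec.ch/IncrementalSearch/?" + urlencode(params)


def regular_search_url(query: str) -> str:
    """URL for the regular (full page) search."""
    return "https://www.digitec.ch/de/Search?" + urlencode({"q": query})


ajax_url = incremental_search_url("ssd 256")
page_url = regular_search_url("ssd 256")
```

`urlencode` escapes the query, so a space in `ssd 256` becomes `+` in both URLs.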