Findings for crawling the new digitec store
It might be a good idea to work with the mobile site in order to get smaller responses.
The mobile site can be triggered by setting the following cookie:
Cookie: dismod=m1
Get all (= 5000) articles of one main category (HK)
https://www.digitec.ch/de/s1/tag/audio-hifi-591?&skip=0&take=5000
Adding the following HTTP header gets only the articles without the page wrappers:
X-Requested-With: XMLHttpRequest
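The cookie, paging parameters and XHR header above can be combined in one place. The following is a minimal sketch, not a tested crawler: the helper names `listing_url` and `crawl_headers` are hypothetical, and it assumes the stray `&` right after the `?` in the example URL is not required.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.digitec.ch"


def listing_url(tag_slug: str, skip: int = 0, take: int = 5000) -> str:
    """Build the article listing URL for one main category tag.

    tag_slug is the last path segment, e.g. "audio-hifi-591".
    Assumption: "?skip=..." works as well as the "?&skip=..." seen in the notes.
    """
    query = urlencode({"skip": skip, "take": take})
    return f"{BASE_URL}/de/s1/tag/{tag_slug}?{query}"


def crawl_headers(mobile: bool = True, fragment_only: bool = True) -> dict:
    """Return the HTTP headers described in the notes."""
    headers = {}
    if mobile:
        # Switches the site to its mobile variant (smaller responses).
        headers["Cookie"] = "dismod=m1"
    if fragment_only:
        # Returns only the articles, without the page wrappers.
        headers["X-Requested-With"] = "XMLHttpRequest"
    return headers
```

Both return values can be passed to any HTTP client, for example `urllib.request.Request(listing_url("audio-hifi-591"), headers=crawl_headers())`.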
The User-Agent HTTP header is not required at all. The following curl request works just fine:
curl -L https://www.digitec.ch/de/s1/producttype/2
The old category IDs no longer match!
Examples:
- Arbeitsspeicher (RAM)
https://www.digitec.ch/de/s1/producttype/arbeitsspeicher-2?tagIds=76
Used to be 205
- Mainboards
https://www.digitec.ch/de/s1/producttype/mainboard-65?tagIds=76
Used to be 210
As a result, the category/tag mapping needs to be rewritten from scratch. The following page seems to be a good entry point:
https://www.digitec.ch/de/s1/sector/digital-1
Example:
https://www.digitec.ch/de/s1/tag/pc-komponenten-76?tagIds=76
Minified to
https://www.digitec.ch/de/s1/tag/76
General schema:
https://www.digitec.ch/de/s1/tag/([\w-]+-)?{mainCategoryTagID}
Example:
https://www.digitec.ch/de/s1/producttype/arbeitsspeicher-2?tagIds=76
Minified to
https://www.digitec.ch/de/s1/producttype/2
General schema:
https://www.digitec.ch/de/s1/producttype/([\w-]+-)?{subCategoryTagID}
Minified URLs only work if the client follows the HTTP status code 301 redirect.
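The minification shown in the two examples can be sketched as a small function. This is an illustration under assumptions: the name `minify` and the regular expression are not taken from the site, and the pattern allows hyphens inside the slug so that slugs such as `pc-komponenten-76` are handled.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Path of a tag or producttype URL: optional slug ending in "-", then the numeric ID.
_PATH = re.compile(r"^/de/s1/(tag|producttype)/(?:[\w-]+-)?(\d+)$")


def minify(url: str) -> str:
    """Strip the slug and the query string, keeping only the numeric ID."""
    parts = urlsplit(url)
    match = _PATH.match(parts.path)
    if match is None:
        raise ValueError(f"not a tag or producttype URL: {url}")
    kind, ident = match.groups()
    # Empty query and fragment: the minified examples carry no tagIds parameter.
    return urlunsplit((parts.scheme, parts.netloc, f"/de/s1/{kind}/{ident}", "", ""))
```

When fetching a minified URL, `urllib.request.urlopen` follows 301 redirects by default, which corresponds to the `-L` flag in the curl example.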
Schema:
GET https://www.digitec.ch/IncrementalSearch/?i=9&l=de&c=1&e=false&p=25&q={query}
Parameters:
i: -
l: language code (e.g. de)
c: country id (e.g. 1)
e: -
q: query
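A minimal sketch of how the incremental search URL could be assembled from the schema above. The function name `incremental_search_url` is hypothetical, and the values of `i`, `e` and `p` are copied verbatim from the example URL because their meaning is not documented in these notes.

```python
from urllib.parse import urlencode

INCREMENTAL_SEARCH = "https://www.digitec.ch/IncrementalSearch/"


def incremental_search_url(query: str, language: str = "de", country_id: int = 1) -> str:
    """Build the incremental search URL for a query string."""
    params = {
        "i": 9,          # meaning unknown, copied from the example
        "l": language,   # language code
        "c": country_id, # country id
        "e": "false",    # meaning unknown, copied from the example
        "p": 25,         # meaning unknown, copied from the example
        "q": query,      # urlencode escapes spaces and special characters
    }
    return INCREMENTAL_SEARCH + "?" + urlencode(params)
```

Using `urlencode` rather than string concatenation keeps queries with spaces or umlauts valid.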
Schema:
GET https://www.digitec.ch/de/Search?q={query}