@scrapehero
Last active May 3, 2024 15:21
Sitemap for the webscraper.io Chrome extension that extracts job details from Indeed for a given job title and location
{
"_id":"indeed",
"startUrl":[
"https://www.indeed.com/jobs?q=accountant&l=Los+Angeles,+CA&rbl=Anaheim,+CA&jlid=a05ccab40146becb&jt=fulltime"
],
"selectors":[
{
"id":"listings",
"type":"SelectorElement",
"parentSelectors":[
"_root",
"next"
],
"selector":"div.jobsearch-SerpJobCard:nth-of-type(n)",
"multiple":true,
"delay":0
},
{
"id":"link",
"type":"SelectorLink",
"parentSelectors":[
"listings"
],
"selector":"a.jobtitle",
"multiple":false,
"delay":0
},
{
"id":"job_title",
"type":"SelectorText",
"parentSelectors":[
"link"
],
"selector":"h3.icl-u-xs-mb--xs",
"multiple":false,
"regex":"",
"delay":0
},
{
"id":"company",
"type":"SelectorText",
"parentSelectors":[
"link"
],
"selector":"div.icl-u-lg-mr--sm:nth-of-type(1)",
"multiple":false,
"regex":"",
"delay":0
},
{
"id":"rating",
"type":"SelectorElementAttribute",
"parentSelectors":[
"link"
],
"selector":"a.icl-Ratings-starsCountWrapper",
"multiple":false,
"extractAttribute":"aria-label",
"delay":0
},
{
"id":"next",
"type":"SelectorLink",
"parentSelectors":[
"_root",
"next"
],
"selector":"div.pagination a:last-of-type",
"multiple":false,
"delay":0
},
{
"id":"reviews",
"type":"SelectorText",
"parentSelectors":[
"link"
],
"selector":"a.icl-Ratings-starsCountWrapper div.icl-Ratings-count",
"multiple":false,
"regex":"[^reviews]+",
"delay":0
},
{
"id":"location",
"type":"SelectorText",
"parentSelectors":[
"link"
],
"selector":"div.jobsearch-InlineCompanyRating",
"multiple":false,
"regex":"(?<=\-)\\s(.)",
"delay":0
},
{
"id":"job_description",
"type":"SelectorText",
"parentSelectors":[
"link"
],
"selector":"div.jobsearch-JobComponent-description > div",
"multiple":false,
"regex":"",
"delay":0
}
]
}
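
The `parentSelectors` fields are what give the sitemap its shape, and that is hard to see in flat JSON. The sketch below is a reading aid, not part of the extension: it copies only the `id` and `parentSelectors` values from the sitemap above and prints the resulting hierarchy. The helper names `children_by_parent` and `print_tree` are hypothetical. My understanding is that children of a link selector such as `link` are extracted on the page the link leads to, and that `next` listing itself as a parent is what makes pagination repeat, but treat that as an assumption about the extension's behavior.

```python
from collections import defaultdict

# Selector ids and parents copied from the sitemap above; other fields omitted.
PARENTS = {
    "listings": ["_root", "next"],
    "link": ["listings"],
    "job_title": ["link"],
    "company": ["link"],
    "rating": ["link"],
    "next": ["_root", "next"],
    "reviews": ["link"],
    "location": ["link"],
    "job_description": ["link"],
}


def children_by_parent(parents):
    """Invert the child -> parents mapping into parent -> children."""
    tree = defaultdict(list)
    for child, parent_ids in parents.items():
        for parent in parent_ids:
            tree[parent].append(child)
    return dict(tree)


def print_tree(tree, node="_root", depth=0, seen=()):
    """Print the hierarchy, stopping where a selector loops back on itself."""
    looped = node in seen
    print("  " * depth + node + (" (recursive)" if looped else ""))
    if looped:
        return
    for child in tree.get(node, []):
        print_tree(tree, child, depth + 1, seen + (node,))


TREE = children_by_parent(PARENTS)
print_tree(TREE)
```

The output shows `listings` and `next` under `_root`, all six detail fields under `link`, and `next` appearing again beneath itself.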
@nedmules-sked

Hey there. Thanks for creating this.
The "regex" value in the "location" selector is not valid JSON, so the Chrome plugin doesn't work.
You need to escape the backslashes, as in my fork.
Cheers
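
To make the escaping point concrete, here is a minimal sketch using Python's standard `json` module as a stand-in for the extension's parser (an assumption; the extension itself runs in the browser). It shows the gist's `regex` line being rejected because `\-` is not a legal JSON escape, then the same line accepted once that backslash is doubled. The fork's exact contents are not shown in this thread, and the sample text `Acme - Los Angeles, CA` is made up for illustration.

```python
import json
import re

# The "regex" value as it appears in the gist: \- is not a legal JSON escape.
broken = r'{"regex":"(?<=\-)\\s(.)"}'

try:
    json.loads(broken)
    parsed_ok = True
except json.JSONDecodeError as err:
    parsed_ok = False
    print("rejected:", err)

# Doubling the lone backslash makes the string valid JSON.
fixed = r'{"regex":"(?<=\\-)\\s(.)"}'
pattern = json.loads(fixed)["regex"]
print("decoded pattern:", pattern)

# The decoded pattern is an ordinary regex again (hypothetical sample text).
match = re.search(pattern, "Acme - Los Angeles, CA")
print("matched:", repr(match.group(0)))
```

As written, the pattern matches only the whitespace after the hyphen plus one following character, so it may need widening to capture a full location.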

@NGuillard

Thank you all, it's working fine with the tip above :)

@saisurajkarra

{
"_id":"indeed",
"startUrl":[
"https://www.indeed.com/jobs?q=accountant&l=Los+Angeles,+CA&rbl=Anaheim,+CA&jlid=a05ccab40146becb&jt=fulltime"
],
"selectors":[
{
"id":"listings",
"type":"SelectorElement",
"parentSelectors":[
"_root",
"next"
],
"selector":"div.jobsearch-SerpJobCard:nth-of-type(n)",
"multiple":true,
"delay":0
},
{
"id":"link",
"type":"SelectorLink",
"parentSelectors":[
"listings"
],
"selector":"a.jobtitle",
"multiple":false,
"delay":0
},
{
"id":"job_title",
"type":"SelectorText",
"parentSelectors":[
"link"
],
"selector":"h3.icl-u-xs-mb--xs",
"multiple":false,
"regex":"",
"delay":0
},
{
"id":"company",
"type":"SelectorText",
"parentSelectors":[
"link"
],
"selector":"div.icl-u-lg-mr--sm:nth-of-type(1)",
"multiple":false,
"regex":"",
"delay":0
},
{
"id":"rating",
"type":"SelectorElementAttribute",
"parentSelectors":[
"link"
],
"selector":"a.icl-Ratings-starsCountWrapper",
"multiple":false,
"extractAttribute":"aria-label",
"delay":0
},
{
"id":"next",
"type":"SelectorLink",
"parentSelectors":[
"_root",
"next"
],
"selector":"div.pagination a:last-of-type",
"multiple":false,
"delay":0
},
{
"id":"reviews",
"type":"SelectorText",
"parentSelectors":[
"link"
],
"selector":"a.icl-Ratings-starsCountWrapper div.icl-Ratings-count",
"multiple":false,
"regex":"[^reviews]+",
"delay":0
},
{
"id":"location",
"type":"SelectorText",
"parentSelectors":[
"link"
],
"selector":"div.jobsearch-InlineCompanyRating",
"multiple":false,
"regex":"(?<=\\-)\\s(.)",
"delay":0
},
{
"id":"job_description",
"type":"SelectorText",
"parentSelectors":[
"link"
],
"selector":"div.jobsearch-JobComponent-description > div",
"multiple":false,
"regex":"",
"delay":0
}
]
}

@sahajbanthia

indeed.co.in/jobs?q=&l=Surat%2C+Gujarat
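
If the aim is to point the sitemap at a different search like the one above, the `startUrl` can be built from a job query and a location. This is a small sketch using Python's standard `urllib.parse.urlencode`. The function name `indeed_search_url` is hypothetical, the `https://` scheme is an assumption (the URL above has none), and only the `q` and `l` parameters seen in this thread are used. Whether the sitemap's CSS selectors still match on another Indeed domain is not something this checks.

```python
from urllib.parse import urlencode


def indeed_search_url(host, query, location):
    """Build a search URL from a job query and a location string."""
    return f"https://{host}/jobs?" + urlencode({"q": query, "l": location})


url = indeed_search_url("indeed.co.in", "", "Surat, Gujarat")
print(url)
```

The result would replace the single entry in the sitemap's `startUrl` array.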
