
@harrisonmalone
Last active December 23, 2018 04:01
meeting i had with hugo in like may of 2018 (just after bootcamp) to discuss some ideas i had for a few different apps
to get all artists => https://pitchfork.com/artists/by/alpha/(a..z)/
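A minimal sketch of how the `(a..z)` part of that URL could be expanded into one index URL per letter. `BASE_URL` and `artist_index_urls` are names made up for this example; only the URL pattern comes from the note above.

```ruby
# URL pattern taken from the note above; the constant name is an assumption.
BASE_URL = "https://pitchfork.com/artists/by/alpha"

# Build one index URL per letter using a Ruby string range.
def artist_index_urls
  ("a".."z").map { |letter| "#{BASE_URL}/#{letter}/" }
end

puts artist_index_urls.first
puts artist_index_urls.length
```

Each URL could then be passed to `driver.navigate.to` in turn.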
to get specific elements on each page =>
[1] pry(main)> driver.find_element(:class, "score")
=> #<Selenium::WebDriver::Element:0x77d43895b7210db2 id="0.5838103215094534-1">
[2] pry(main)> driver.find_element(:class, "score").text
=> "6.8"
[3] pry(main)> driver.find_element(:class, "artist-links artist-list single-album-tombstone__artist-links").text
Selenium::WebDriver::Error::InvalidSelectorError: invalid selector: Compound class names not permitted
(Session info: chrome=66.0.3359.139)
(Driver info: chromedriver=2.38.552518 (183d19265345f54ce39cbb94cf81ba5f15905011),platform=Mac OS X 10.13.4 x86_64)
from /Users/harrisonmalone/.rbenv/versions/2.4.3/lib/ruby/gems/2.4.0/gems/selenium-webdriver-3.11.0/lib/selenium/webdriver/remote/response.rb:69:in `assert_ok'
[4] pry(main)> driver.find_element(:class, "artist-list").text
=> "Middle Kids"
[5] pry(main)> driver.find_element(:class, "single-album-tombstone__meta-year").text
=> "• 2018"
[6] pry(main)> year = driver.find_element(:class, "single-album-tombstone__meta-year").text
=> "• 2018"
[7] pry(main)> year
=> "• 2018"
[8] pry(main)> year.gsub!("• ","")
=> "2018"
[9] pry(main)> year
=> "2018"
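The `InvalidSelectorError` in [3] happens because `:class` takes a single class name. One way around it, sketched here as an assumption rather than what was done in the session, is to turn the whole class attribute into a CSS selector and use `find_element(:css, ...)`. The helper names `css_for_classes` and `extract_year` are made up for this example.

```ruby
# Turn a space-separated class attribute into a compound CSS selector,
# e.g. "a b" becomes ".a.b".
def css_for_classes(class_attribute)
  class_attribute.split.map { |name| ".#{name}" }.join
end

# Pull the first four-digit run out of the scraped text,
# as an alternative to stripping the bullet with gsub!.
def extract_year(text)
  text[/\d{4}/]
end

selector = css_for_classes("artist-links artist-list single-album-tombstone__artist-links")
puts selector

# With a live driver this would be used like so (not run here):
#   driver.find_element(:css, selector).text
```

`extract_year` returns `nil` when the text has no four-digit run, so the caller can tell a missing year from an empty string.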
to get the albums of an artist =>
[10] pry(main)> array[0]
=> #<Selenium::WebDriver::Element:0x..f9ba00174faee20d6 id="0.7842375297075637-2">
[11] pry(main)> band_urls = []
=> []
[12] pry(main)> array[0].find_atribute("href")
NoMethodError: undefined method `find_atribute' for #<Selenium::WebDriver::Element:0x00007faaf797dae8>
from (pry):12:in `<main>'
[13] pry(main)> array[0].atribute("href")
NoMethodError: undefined method `atribute' for #<Selenium::WebDriver::Element:0x00007faaf797dae8>
Did you mean? attribute
from (pry):13:in `<main>'
[14] pry(main)> array[0].attribute("href")
=> "https://pitchfork.com/reviews/albums/iceage-beyondless/"
[15] pry(main)> band_urls << array[0].attribute("href")
=> ["https://pitchfork.com/reviews/albums/iceage-beyondless/"]
[16] pry(main)> band_urls
=> ["https://pitchfork.com/reviews/albums/iceage-beyondless/"]
[17] pry(main)> urls = array.each do |url|
[17] pry(main)* array[url].attribute("href")
[17] pry(main)* end
TypeError: no implicit conversion of Selenium::WebDriver::Element into Integer
from (pry):18:in `[]'
[18] pry(main)> array
=> [#<Selenium::WebDriver::Element:0x..f9ba00174faee20d6 id="0.7842375297075637-2">,
#<Selenium::WebDriver::Element:0x1e5356c3dcf6656 id="0.7842375297075637-3">,
#<Selenium::WebDriver::Element:0x2b89a3de71cc8c64 id="0.7842375297075637-4">,
#<Selenium::WebDriver::Element:0x..fb0caa6f3cc3b78a id="0.7842375297075637-5">]
[19] pry(main)> array.each do |url|
[19] pry(main)* url.attribute("href")
[19] pry(main)* end
=> [#<Selenium::WebDriver::Element:0x..f9ba00174faee20d6 id="0.7842375297075637-2">,
#<Selenium::WebDriver::Element:0x1e5356c3dcf6656 id="0.7842375297075637-3">,
#<Selenium::WebDriver::Element:0x2b89a3de71cc8c64 id="0.7842375297075637-4">,
#<Selenium::WebDriver::Element:0x..fb0caa6f3cc3b78a id="0.7842375297075637-5">]
[20] pry(main)> array.each do |url|
[20] pry(main)* url.attribute("href")
[20] pry(main)* end
=> [#<Selenium::WebDriver::Element:0x..f9ba00174faee20d6 id="0.7842375297075637-2">,
#<Selenium::WebDriver::Element:0x1e5356c3dcf6656 id="0.7842375297075637-3">,
#<Selenium::WebDriver::Element:0x2b89a3de71cc8c64 id="0.7842375297075637-4">,
#<Selenium::WebDriver::Element:0x..fb0caa6f3cc3b78a id="0.7842375297075637-5">]
[21] pry(main)> array.each do |url|
[21] pry(main)* urls = url.attribute("href")
[21] pry(main)* band_array << urls
[21] pry(main)* end
NameError: undefined local variable or method `band_array' for main:Object
Did you mean? band_urls
from (pry):29:in `block in <main>'
[22] pry(main)> array.each do |url|
[22] pry(main)* urls = url.attribute("href")
[22] pry(main)* band_urls << urls
[22] pry(main)* end
=> [#<Selenium::WebDriver::Element:0x..f9ba00174faee20d6 id="0.7842375297075637-2">,
#<Selenium::WebDriver::Element:0x1e5356c3dcf6656 id="0.7842375297075637-3">,
#<Selenium::WebDriver::Element:0x2b89a3de71cc8c64 id="0.7842375297075637-4">,
#<Selenium::WebDriver::Element:0x..fb0caa6f3cc3b78a id="0.7842375297075637-5">]
[23] pry(main)> band_urls
=> ["https://pitchfork.com/reviews/albums/iceage-beyondless/",
"https://pitchfork.com/reviews/albums/iceage-beyondless/",
"https://pitchfork.com/reviews/albums/19806-iceage-plowing-into-the-field-of-love/",
"https://pitchfork.com/reviews/albums/17623-iceage-youre-nothing/",
"https://pitchfork.com/reviews/albums/15576-new-brigade/"]
[24] pry(main)> array.each do |url|
[24] pry(main)* band_urls << url.attribute("href")
[24] pry(main)* end
=> [#<Selenium::WebDriver::Element:0x..f9ba00174faee20d6 id="0.7842375297075637-2">,
#<Selenium::WebDriver::Element:0x1e5356c3dcf6656 id="0.7842375297075637-3">,
#<Selenium::WebDriver::Element:0x2b89a3de71cc8c64 id="0.7842375297075637-4">,
#<Selenium::WebDriver::Element:0x..fb0caa6f3cc3b78a id="0.7842375297075637-5">]
[25] pry(main)> band_urls
=> ["https://pitchfork.com/reviews/albums/iceage-beyondless/",
"https://pitchfork.com/reviews/albums/iceage-beyondless/",
"https://pitchfork.com/reviews/albums/19806-iceage-plowing-into-the-field-of-love/",
"https://pitchfork.com/reviews/albums/17623-iceage-youre-nothing/",
"https://pitchfork.com/reviews/albums/15576-new-brigade/",
"https://pitchfork.com/reviews/albums/iceage-beyondless/",
"https://pitchfork.com/reviews/albums/19806-iceage-plowing-into-the-field-of-love/",
"https://pitchfork.com/reviews/albums/17623-iceage-youre-nothing/",
"https://pitchfork.com/reviews/albums/15576-new-brigade/"]
[26] pry(main)> new_band_url = array.map do |url|
[26] pry(main)* url.attribute("href")
[26] pry(main)* end
=> ["https://pitchfork.com/reviews/albums/iceage-beyondless/",
"https://pitchfork.com/reviews/albums/19806-iceage-plowing-into-the-field-of-love/",
"https://pitchfork.com/reviews/albums/17623-iceage-youre-nothing/",
"https://pitchfork.com/reviews/albums/15576-new-brigade/"]
[27] pry(main)> new_band_url
=> ["https://pitchfork.com/reviews/albums/iceage-beyondless/",
"https://pitchfork.com/reviews/albums/19806-iceage-plowing-into-the-field-of-love/",
"https://pitchfork.com/reviews/albums/17623-iceage-youre-nothing/",
"https://pitchfork.com/reviews/albums/15576-new-brigade/"]
[28] pry(main)>
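The session above comes down to the difference between `each` and `map`: `each` hands back the original array of elements, while `map` returns a new array built from the block's results. Pushing into `band_urls` inside `each` also piles up duplicates when the loop is run more than once. A small sketch with a stand-in for the Selenium element (`FakeLink` and `hrefs` are made-up names, no browser needed):

```ruby
# Stand-in for a Selenium element: it only knows its href attribute.
FakeLink = Struct.new(:href) do
  def attribute(name)
    name == "href" ? href : nil
  end
end

# map collects the block's return values into a new array.
def hrefs(elements)
  elements.map { |element| element.attribute("href") }
end

links = [
  FakeLink.new("https://pitchfork.com/reviews/albums/iceage-beyondless/"),
  FakeLink.new("https://pitchfork.com/reviews/albums/15576-new-brigade/")
]

# each ignores the block's return value and returns the receiver itself.
p links.each { |link| link.attribute("href") }.equal?(links)
p hrefs(links)
```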
firstly talked about how to do the pace calculator
remember simple math
find pace per kilometre
time = 53 mins 16 seconds
distance = 12.03km
convert into seconds
53 * 60 = 3180
3180 + 16 = 3196 seconds
3196 / 12.03 = 265.6691
265.6691 / 60 = 4.4278
4 mins (.4278 * 60) = 4 mins 26 seconds
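The pace steps above could be sketched in Ruby like this. The method names are assumptions made for the example; the numbers are the ones from the notes.

```ruby
# Seconds per kilometre: convert the time to seconds, then divide by distance.
def pace_seconds_per_km(minutes, seconds, distance_km)
  total_seconds = minutes * 60 + seconds
  total_seconds / distance_km.to_f
end

# Round to whole seconds, then split into minutes and seconds.
def format_pace(pace_seconds)
  mins, secs = pace_seconds.round.divmod(60)
  "#{mins} mins #{secs} seconds"
end

pace = pace_seconds_per_km(53, 16, 12.03)
puts format_pace(pace)
```

`divmod(60)` does the "divide by 60, then multiply the remainder back" step in one call.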
pace = 4 mins 26 seconds (266 seconds)
time = 53 mins 16 seconds (3196 seconds)
now find distance
3196 / 266 = 12.02km
? strava uses some other equation
now find time
pace = 4 mins 26 seconds (266 seconds)
distance = 12.03km
266 * 12.03 = 3200 seconds
3200 / 60 = 53.33333
53 mins (.3333 * 60) = 53 mins 20 seconds
everything is slightly off because the pace was rounded to whole seconds (265.67 => 266), but the formulas are correct
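A quick sketch of the two inverse formulas, run once with the rounded pace and once with the unrounded one, to show where the small differences come from. Method and constant names are assumptions for this example.

```ruby
TOTAL_SECONDS = 3196
DISTANCE_KM = 12.03

# distance = time / pace
def distance_from(total_seconds, pace_seconds)
  total_seconds / pace_seconds.to_f
end

# time = pace * distance
def time_from(pace_seconds, distance_km)
  pace_seconds * distance_km
end

exact_pace = TOTAL_SECONDS / DISTANCE_KM # unrounded seconds per km
rounded_pace = exact_pace.round          # whole seconds, as in the notes

puts distance_from(TOTAL_SECONDS, rounded_pace).round(2) # 12.02
puts time_from(rounded_pace, DISTANCE_KM).round          # 3200
puts distance_from(TOTAL_SECONDS, exact_pace).round(2)
puts time_from(exact_pace, DISTANCE_KM).round
```

Keeping the pace as a float and rounding only when printing avoids the drift.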
do this in terminal with gets.chomp
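One possible shape for the terminal version, assuming the time is typed as `mm:ss` and the distance in kilometres. The input format, prompts, and method names are all assumptions. The prompt method takes its input and output as parameters so it can be tried without typing anything.

```ruby
# Parse "mm:ss" into total seconds.
def parse_duration(text)
  minutes, seconds = text.split(":").map(&:to_i)
  minutes * 60 + seconds
end

# Round to whole seconds and format like the notes do.
def format_duration(total_seconds)
  minutes, seconds = total_seconds.round.divmod(60)
  "#{minutes} mins #{seconds} seconds"
end

# Ask for time and distance, print the pace, and return it as text.
def prompt_for_pace(input: $stdin, output: $stdout)
  output.print "time (mm:ss): "
  total_seconds = parse_duration(input.gets.chomp)
  output.print "distance (km): "
  distance_km = input.gets.chomp.to_f
  pace = format_duration(total_seconds / distance_km)
  output.puts "pace = #{pace} per km"
  pace
end
```

Calling `prompt_for_pace` at the bottom of the script would start the prompts in the terminal.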
we created the models together for the pitchfork score scraper
artist model with a name
album model with a name, year, album url, foreign key
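The notes don't say which framework the models were built in, so this is only a plain-Ruby sketch of the two shapes. The `id` fields, the `artist_id` name for the foreign key, and `albums_for` are assumptions.

```ruby
# Artist has a name; id is assumed so albums have something to point at.
Artist = Struct.new(:id, :name)

# Album has a name, year, album url, and a foreign key back to its artist.
Album = Struct.new(:id, :name, :year, :url, :artist_id)

# Follow the foreign key from an artist to its albums.
def albums_for(artist, albums)
  albums.select { |album| album.artist_id == artist.id }
end

iceage = Artist.new(1, "Iceage")
albums = [
  Album.new(1, "Beyondless", 2018,
            "https://pitchfork.com/reviews/albums/iceage-beyondless/", iceage.id)
]

p albums_for(iceage, albums).map(&:name)
```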
used selenium-webdriver as a scraper
using an a..z range to get all artists
scroll to the bottom of the page until the infinite scroll stops loading, then grab all the artists
notes for specific scrapes are in find_elements_using_selenium.txt
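A rough sketch of the infinite-scroll step: keep scrolling to the bottom until the page height stops growing. `scroll_to_bottom`, `pause`, and `max_rounds` are names made up for this example. It only relies on the driver's `execute_script`, and the right pause length for the real page is a guess.

```ruby
# Scroll until the document height stops changing or max_rounds is reached.
# Returns the final page height.
def scroll_to_bottom(driver, pause: 1, max_rounds: 50)
  last_height = driver.execute_script("return document.body.scrollHeight")
  max_rounds.times do
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
    sleep(pause) # give the page time to load the next batch
    new_height = driver.execute_script("return document.body.scrollHeight")
    break if new_height == last_height
    last_height = new_height
  end
  last_height
end
```

With the Selenium driver this would be called as `scroll_to_bottom(driver)` before collecting the artist links. `max_rounds` is there so a page that never stops loading can't hang the script.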
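require "pry-byebug"
require "selenium-webdriver"

# open a chrome window controlled by selenium
driver = Selenium::WebDriver.for :chrome
driver.navigate.to "https://pitchfork.com/artists/29540-iceage/"
# drop into a pry session so elements can be explored by hand
binding.pry
sleep(5)
# close the browser once the session is over
driver.quit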