Skip to content

Instantly share code, notes, and snippets.

@Ironholds
Created October 1, 2014 15:35
Show Gist options
  • Select an option

  • Save Ironholds/a68d660fe4b2e1b65284 to your computer and use it in GitHub Desktop.

Select an option

Save Ironholds/a68d660fe4b2e1b65284 to your computer and use it in GitHub Desktop.
SELECT regexp_extract(uri_path, '/((\\d+px-)?[^/]+)$', 1) as uri_file
FROM wmf_raw.webrequest
WHERE year = 2014 AND month = 09 AND day = 23 AND
hour = 07 AND webrequest_source='upload' LIMIT 5;
@ottomata
Copy link
Copy Markdown

ottomata commented Oct 1, 2014

Your parens were a bit off, and you need to be getting the 2nd capture in this regex:

SELECT uri_path, regexp_extract(uri_path, '/(\\d+px-)?([^/]+)$', 2) as uri_file
FROM wmf_raw.webrequest
WHERE year = 2014 AND month = 09 AND day = 23 AND
hour = 07 AND webrequest_source='upload' LIMIT 5;
uri_path    uri_file
/wikipedia/commons/thumb/1/10/United_States_film.svg/40px-United_States_film.svg.png    United_States_film.svg.png
/wikipedia/meta/1/16/MediaWiki-logo_sister_1x.png   MediaWiki-logo_sister_1x.png
/wikipedia/commons/thumb/3/3e/Flag_of_New_Zealand.svg/22px-Flag_of_New_Zealand.svg.png  Flag_of_New_Zealand.svg.png
/wikipedia/commons/thumb/e/ed/Luca_Ward_al_Giffoni_Film_Festival_2010.jpg/220px-Luca_Ward_al_Giffoni_Film_Festival_2010.jpg Luca_Ward_al_Giffoni_Film_Festival_2010.jpg
/wikipedia/commons/thumb/8/86/Giovanni_Battista_Tiepolo_017.jpg/220px-Giovanni_Battista_Tiepolo_017.jpg Giovanni_Battista_Tiepolo_017.jpg

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment