Last active: November 21, 2017 09:20
Domains of public sector organisations not on gov.uk
acas.org.uk
ahdb.org.uk
ahrc.ac.uk
arb.org.uk
artscouncil.org.uk
bankofengland.co.uk
bbc.co.uk
bbsrc.ac.uk
bfi.org.uk
biglotteryfund.org.uk
bl.uk
boundarycommission.org.uk
british-business-bank.co.uk
britishcouncil.org
britishmuseum.org
caa.co.uk
careerswales.com
catribunal.org.uk
ccwater.org.uk
channel4.com
chevening.org
citb.co.uk
comisiynyddygymraeg.org
cqc.org.uk
dpecgb.co.uk
dsfc.ac.uk
dsma.uk
ebbsfleetdc.org.uk
ecitb.org.uk
eis2win.co.uk
electoralcommission.org.uk
epsrc.ac.uk
equalityhumanrights.com
esrc.ac.uk
fca.org.uk
finds.org.uk
fireservicecollege.ac.uk
fleetairarm.com
gbcc.org.uk
geffrye-museum.org.uk
gov.scot
greeninvestmentbank.com
hblb.org.uk
hefce.ac.uk
hesa.ac.uk
historicengland.org.uk
hlf.org.uk
horniman.ac.uk
housing-ombudsman.org.uk
hrp.org.uk
ico.org.uk
icrev.org.uk
imb.org.uk
intelligencecommissioner.com
iocco-uk.info
ipt-uk.com
iraqinquiry.org.uk
iwm.org.uk
kew.org
lcrhq.co.uk
lease-advice.org
legalombudsman.org.uk
legalservicesboard.org.uk
lgo.org.uk
liverpoolmuseums.org.uk
marshallscholarship.org
mrc.ac.uk
nam.ac.uk
nationalforest.org
nationalgallery.org.uk
nerc.ac.uk
nestpensions.org.uk
newcoventgardenmarket.com
nhm.ac.uk
nhmf.org.uk
nhsla.com
nic.org.uk
nice.org.uk
nihrc.org
nipolicingboard.org.uk
nlb.org.uk
nmrn.org.uk
northumberlandnationalpark.org.uk
northyorkmoors.org.uk
npg.org.uk
nsandi.com
ofcom.org.uk
offa.org.uk
ogauthority.co.uk
ombudsman.org.uk
onr.org.uk
ordnancesurvey.co.uk
paradescommission.org
pbni.org.uk
pensionprotectionfund.org.uk
pensions-ombudsman.org.uk
pensionsadvisoryservice.org.uk
pharmacopoeia.com
portonbiopharma.com
ppfo.org.uk
professionalstandards.org.uk
psr.org.uk
qeiicc.co.uk
rafmuseum.org.uk
registrarofconsultantlobbyists.org.uk
rmg.co.uk
royalarmouries.org
royalmarinesmuseum.co.uk
royalmint.com
royalparks.org.uk
rssb.co.uk
s4c.co.uk
safetyatsportsgrounds.org.uk
sciencemuseum.org.uk
seafish.org
sentencingcouncil.org.uk
servicecomplaintsombudsman.org.uk
slc.co.uk
soane.org
sportengland.org
stfc.ac.uk
submarine-museum.co.uk
supremecourt.uk
tate.org.uk
theatrestrust.org.uk
theccc.org.uk
thecrownestate.co.uk
theipsa.org.uk
transportfocus.org.uk
trinityhouse.co.uk
ukad.org.uk
ukri.org
vam.ac.uk
victimscommissioner.org.uk
visitbritain.org
visitengland.com
wallacecollection.org
wfd.org
wiltonpark.org.uk
yorkshiredales.org.uk
#!/usr/bin/env python3
# Adapted from https://github.com/openregister/government-organisation-data/blob/4623fb7c88135c8eeeafdb3fb1b911f424df3c67/lists/govuk/download.py
import requests
from bs4 import BeautifulSoup
from urllib.parse import urlparse

# Hostnames ending in any of these suffixes are skipped.
domains_to_exclude = {
    '.gov.uk',
    '.nhs.uk',
    '.police.uk',
    '.mod.uk',
}


def get_government_domains():
    # Walk the paginated GOV.UK organisations API, one page at a time.
    url = "https://www.gov.uk/api/organisations?page=1"
    while url:
        r = requests.get(url=url).json()
        for row in r['results']:
            # Each organisation page may link to an external website.
            page = requests.get(row['web_url'])
            soup = BeautifulSoup(page.text, "html.parser")
            element = soup.select_one(".url-link")
            if element:
                hostname = urlparse(element['href']).netloc
                # Strip only a leading "www.", not occurrences elsewhere.
                if hostname.startswith('www.'):
                    hostname = hostname[len('www.'):]
                if not any(
                    hostname.endswith(domain) for domain in domains_to_exclude
                ):
                    print(hostname)
                    yield hostname
        # Stop when the API reports no further page.
        url = r.get('next_page_url')


domains = set(get_government_domains())
print("=" * 80)
for domain in sorted(domains):
    print(domain)
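The suffix filter in the script can be tried without any network access. The following is a minimal sketch, not part of the original gist: the helper names `host_of`, `is_excluded` and `split_hosts` are hypothetical, and the sample URLs are built from hostnames in the two lists purely for illustration (their schemes and paths are assumptions).

```python
from urllib.parse import urlparse

# Same suffixes as domains_to_exclude in the script above.
EXCLUDED_SUFFIXES = ('.gov.uk', '.nhs.uk', '.police.uk', '.mod.uk')


def host_of(href):
    """Return the hostname of a URL with a leading 'www.' removed."""
    host = urlparse(href).netloc
    return host[len('www.'):] if host.startswith('www.') else host


def is_excluded(host):
    """True if the hostname ends with one of the excluded suffixes."""
    # str.endswith accepts a tuple of candidate suffixes.
    return host.endswith(EXCLUDED_SUFFIXES)


def split_hosts(hrefs):
    """Partition URLs into (kept, excluded) sorted hostname lists."""
    kept, excluded = set(), set()
    for href in hrefs:
        host = host_of(href)
        if not host:
            continue  # skip hrefs without a network location
        (excluded if is_excluded(host) else kept).add(host)
    return sorted(kept), sorted(excluded)


# Illustrative sample only; hostnames are taken from the lists in this gist.
sample = [
    "https://www.acas.org.uk/",
    "http://www.bl.uk",
    "https://www.cps.gov.uk/",
    "https://digital.nhs.uk/",
    "https://btpa.police.uk/",
]
kept, excluded = split_hosts(sample)
print(kept)      # ['acas.org.uk', 'bl.uk']
print(excluded)  # ['btpa.police.uk', 'cps.gov.uk', 'digital.nhs.uk']
```

Because every suffix starts with a dot, a bare hostname such as `gov.uk` would not match `.gov.uk` and would be kept; only subdomains are filtered out.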
adjudicatorsoffice.gov.uk
bcomm-scotland.independent.gov.uk
bcomm-wales.gov.uk
broads-authority.gov.uk
btpa.police.uk
budgetresponsibility.independent.gov.uk
cafcass.gov.uk
ccrc.gov.uk
childrenscommissioner.gov.uk
civilservicecommission.independent.gov.uk
consultation.boundarycommissionforengland.independent.gov.uk
cpni.gov.uk
cps.gov.uk
da.mod.uk
dartmoor-npa.gov.uk
dcalni.gov.uk
dft.gov.uk
digital.nhs.uk
dwi.gov.uk
england.nhs.uk
estyn.gov.uk
exmoor-nationalpark.gov.uk
fcoservices.gov.uk
food.gov.uk
forestry.gov.uk
gamblingcommission.gov.uk
gchq.gov.uk
gla.gov.uk
hee.nhs.uk
hfea.gov.uk
hmgcc.gov.uk
hmic.gov.uk
hra.nhs.uk
hse.gov.uk
hta.gov.uk
iapdeathsincustody.independent.gov.uk
icai.independent.gov.uk
improvement.nhs.uk
ipcc.gov.uk
jac.judiciary.gov.uk
jncc.defra.gov.uk
judiciary.gov.uk
justice.gov.uk
justiceinspectorates.gov.uk
lakedistrict.gov.uk
lawcom.gov.uk
lordsappointments.independent.gov.uk
metoffice.gov.uk
mi5.gov.uk
nationalarchives.gov.uk
nationalcrimeagency.gov.uk
naturalresourceswales.gov.uk
ncsc.gov.uk
newforestnpa.gov.uk
nhsbsa.nhs.uk
nhsbt.nhs.uk
nihe.gov.uk
northernireland.gov.uk
ofgem.gov.uk
ofwat.gov.uk
ons.gov.uk
orr.gov.uk
peakdistrict.gov.uk
ppo.gov.uk
privycouncil.independent.gov.uk
publicappointmentscommissioner.independent.gov.uk
publichealthwales.wales.nhs.uk
sfo.gov.uk
sia.homeoffice.gov.uk
sis.gov.uk
southdowns.gov.uk
spa.independent.gov.uk
statisticsauthority.gov.uk
surveillancecommissioners.independent.gov.uk
terrorismlegislationreviewer.independent.gov.uk
thepensionsregulator.gov.uk
uksport.gov.uk
valuationtribunal.gov.uk
wales.gov.uk
wales.nhs.uk
wao.gov.uk
webarchive.nationalarchives.gov.uk