Christopher Kullenberg (christopherkullenberg)

Researcher at the University of Gothenburg. Interested in computational methods applied to social science.

christopherkullenberg / swepubxmlparser.py

Created December 30, 2015 13:07

Question concerning XML

	"""
	Data structure: http://libris.kb.se/xsearch?d=swepub&hitlist&q=l%C3%A4ros%C3%A4te%3agu&f=ext&spell=true&hist=true&n=200&p=1
	Trying to access only the text of the subfield with code="u" (the affiliation) in:
	<datafield tag="700" ind1="1" ind2=" ">
	    <subfield code="a">Alvestad, Torgeir,</subfield>
	    <subfield code="d">1960-,</subfield>
	    <subfield code="u">Göteborgs universitet, Institutionen för pedagogik och didaktik, University of Gothenburg, Department of Education</subfield>
	    <subfield code="4">edt</subfield>
	    <subfield code="0">(SwePub:chalmers.se)xalvto</subfield>
	</datafield>
	"""
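One possible answer to the question, sketched with the standard library's `xml.etree.ElementTree`: walk every element and keep the text of each `subfield` whose `code` attribute is `"u"`. The helper names `local_name` and `affiliations` are hypothetical. The snippet above has no XML namespace, but a full MARCXML response usually declares one, so the sketch compares tag names with any `{namespace}` prefix stripped and works either way.

```python
import xml.etree.ElementTree as ET

# The datafield from the question, copied verbatim.
SAMPLE = """<datafield tag="700" ind1="1" ind2=" ">
  <subfield code="a">Alvestad, Torgeir,</subfield>
  <subfield code="d">1960-,</subfield>
  <subfield code="u">Göteborgs universitet, Institutionen för pedagogik och didaktik, University of Gothenburg, Department of Education</subfield>
  <subfield code="4">edt</subfield>
  <subfield code="0">(SwePub:chalmers.se)xalvto</subfield>
</datafield>"""


def local_name(tag):
    """Strip a '{namespace}' prefix, if present, from an element tag."""
    return tag.rsplit("}", 1)[-1]


def affiliations(xml_text):
    """Return the text of every <subfield code="u"> element in the document."""
    root = ET.fromstring(xml_text)
    return [
        elem.text
        for elem in root.iter()  # iter() visits the root and all descendants
        if local_name(elem.tag) == "subfield" and elem.get("code") == "u"
    ]


for affiliation in affiliations(SAMPLE):
    print(affiliation)
```

Because `affiliations` returns a list, a record with several authors yields one affiliation string per `code="u"` subfield, in document order.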

christopherkullenberg / swepubjsonparser.py

Created December 29, 2015 22:05

	import json
	from os import listdir
	from os.path import join

	for filename in listdir("GU20151228json/"):  # all files in a directory
	    with open(join("GU20151228json/", filename), encoding="utf-8") as currentFile:
	        jsondata = json.load(currentFile)
	        print(jsondata)
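Printing whole files is hard to read, so a natural next step is pulling out one field. The sketch below does this without assuming anything about how the SwePub JSON is nested: it searches dicts and lists recursively for a given key. The key `"identifier"` is taken from the check in `swepubscraper.py`; the function names `find_values` and `collect` are hypothetical.

```python
import json
from pathlib import Path


def find_values(obj, key):
    """Yield every value stored under `key` anywhere in nested dicts/lists."""
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k == key:
                yield v
            yield from find_values(v, key)  # keep searching inside the value too
    elif isinstance(obj, list):
        for item in obj:
            yield from find_values(item, key)


def collect(directory, key="identifier"):
    """Load every file in `directory` as JSON and gather all values for `key`."""
    found = []
    for path in sorted(Path(directory).iterdir()):
        with path.open(encoding="utf-8") as f:
            found.extend(find_values(json.load(f), key))
    return found
```

Usage, assuming the directory from the original script exists: `print(collect("GU20151228json/"))`. Searching recursively trades some speed for not having to know the exact path to each record in the response.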

christopherkullenberg / swepubscraper.py

Created December 29, 2015 18:28

	from urllib.request import urlopen

	counter = 1

	while True:
	    url = 'http://libris.kb.se/xsearch?d=swepub&hitlist&q=l%C3%A4ros%C3%A4te%3agu&f=ext&spell=true&hist=true&n=200&format=json&start=' + str(counter)
	    print("Fetching: " + url)
	    data = urlopen(url).read()
	    if b'"identifier"' not in data:
	        print("No more records!")
	        break
	    counter += 200  # assumed: advance start by the page size (n=200)
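The paging loop can also be split into small functions, which makes the stop condition testable without network access. This is a sketch under stated assumptions: it reuses the query URL from the script, assumes `start` should advance by the page size `n=200`, and assumes each page is saved to a file named after its `start` value (a hypothetical naming scheme) so that `swepubjsonparser.py` can read the directory afterwards. The names `fetch_url`, `fetch_pages`, and `save_pages` are hypothetical.

```python
from pathlib import Path
from urllib.request import urlopen

# Query URL from swepubscraper.py, ending just before the value of start.
BASE_URL = ('http://libris.kb.se/xsearch?d=swepub&hitlist&q=l%C3%A4ros%C3%A4te%3agu'
            '&f=ext&spell=true&hist=true&n=200&format=json&start=')
PAGE_SIZE = 200  # assumed to match n=200 in the query


def fetch_url(url):
    """Download one page of results as raw bytes."""
    with urlopen(url) as response:
        return response.read()


def fetch_pages(fetch=fetch_url, page_size=PAGE_SIZE, first=1):
    """Yield (start, data) per page until a page contains no "identifier" field.

    `fetch` is injectable so the paging logic can be exercised offline.
    """
    start = first
    while True:
        data = fetch(BASE_URL + str(start))
        if b'"identifier"' not in data:
            return  # same stop condition as the original script
        yield start, data
        start += page_size


def save_pages(directory, fetch=fetch_url):
    """Write each page to <directory>/<start>.json and return the page count."""
    out = Path(directory)
    out.mkdir(parents=True, exist_ok=True)
    count = 0
    for start, data in fetch_pages(fetch=fetch):
        (out / f"{start}.json").write_bytes(data)
        count += 1
    return count
```

Usage, with network access: `save_pages("GU20151228json/")`. Passing a fake `fetch` function makes it possible to check the loop's termination without contacting libris.kb.se.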