rossmounce · May 28, 2014 09:37 · rsnape · May 28, 2014 · rsnape · May 28, 2014
diff --git a/gistfile1.txt b/gistfile1.txt
 I know I'm doing all types of wrong here:

 Source HTML file here: http://mdpi.com/1420-3049/19/4/5150/htm

 I want the text for the dc.source:

 Molecules 2014, Vol. 19, Pages 5150-5162

 I am using Beautiful Soup, so it is probably best to do it with that, but it should also be regex-able. I can do this in bash no problem!

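 import re

 originalurl = 'http://mdpi.com/1420-3049/19/4/5150/htm'  # assumed: the source URL given above

 with open('1420-3049.19.4.5150.htm') as hand:
     for line in hand:
         line = line.rstrip()
         if re.search('name="dc.source"', line):
             # strip() only trims the listed characters from both ends of the line
             bibinfo = line.strip('<').strip('>')
             print(bibinfo + " " + originalurl)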

 output:

 <meta name="dc.source" content="Molecules 2014, Vol. 19, Pages 5150-5162" http://mdpi.com/1420-3049/19/4/5150/htm

 #NotWhatIWanted / nor expected
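
 One possible way to get just the content text, as a rough sketch rather than a definitive fix. It uses only the standard library (re and html.parser) so it runs without extra installs. The names SAMPLE, DC_SOURCE_RE, dc_source_regex, dc_source_parser and _MetaFinder are made up for illustration, and SAMPLE is modeled on the output line above, not copied from the real page, so the actual markup may differ.

```python
import re
from html.parser import HTMLParser

# Hypothetical sample line modeled on the output shown above; the real page may differ.
SAMPLE = '<meta name="dc.source" content="Molecules 2014, Vol. 19, Pages 5150-5162" />'

# Capture the value of the content attribute of the dc.source meta tag.
# Assumption: name comes before content and both attributes use double quotes.
DC_SOURCE_RE = re.compile(r'<meta\s+name="dc\.source"\s+content="([^"]*)"')


def dc_source_regex(html):
    """Return the dc.source content found by regex, or None if absent."""
    match = DC_SOURCE_RE.search(html)
    return match.group(1) if match else None


class _MetaFinder(HTMLParser):
    """Remember the content attribute of the first <meta name="dc.source">."""

    def __init__(self):
        super().__init__()
        self.content = None

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs; self-closing tags land here too.
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name") == "dc.source" and self.content is None:
            self.content = attrs.get("content")


def dc_source_parser(html):
    """Return the dc.source content found by an HTML parser, or None if absent."""
    finder = _MetaFinder()
    finder.feed(html)
    return finder.content


if __name__ == "__main__":
    print(dc_source_regex(SAMPLE))
    print(dc_source_parser(SAMPLE))
```

 The parser version does not care about attribute order or quoting, which the regex does. With Beautiful Soup the same lookup should be possible with soup.find('meta', attrs={'name': 'dc.source'})['content'], assuming the tag is present in the page.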