@websofter
Created July 1, 2021 03:05
Parsing xml data by URL with BeautifulSoup
<esri:Workspace xmlns:esri="http://www.esri.com/schemas/ArcGIS/9.3"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xs="http://www.w3.org/2001/XMLSchema">
<WorkspaceDefinition xsi:type="esri:WorkspaceDefinition">
...
</WorkspaceDefinition>
<WorkspaceData xsi:type="esri:WorkspaceData">
<DatasetData xsi:type="esri:TableData">
<DatasetName>city</DatasetName>
<DatasetType>esriDTFeatureClass</DatasetType>
<Data xsi:type="esri:RecordSet">
...
<Records xsi:type="esri:ArrayOfRecord">
<Record xsi:type="esri:Record">
<Values xsi:type="esri:ArrayOfValue">
<Value xsi:type="xs:int">1</Value>
<Value xsi:type="esri:PointN">
<X>184689.8424</X>
<Y>640598.3157</Y>
</Value>
<Value xsi:type="xs:int">1</Value>
<Value xsi:type="xs:short">862</Value>
<Value xsi:type="xs:string">גני יוחנן</Value>
<Value xsi:type="xs:int">536</Value>
<Value xsi:type="xs:short">31</Value>
<Value xsi:type="xs:string">מושבים (כפרים שיתופיים) (ב)</Value>
<Value xsi:type="xs:string">GANNE YOHANAN</Value>
</Values>
</Record>
...
</Records>
</Data>
</DatasetData>
</WorkspaceData>
...
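# import required modules
import bs4 as bs
import requests
# assign URL
URL = 'https://www.mapi.gov.il/ProfessionalInfo/Documents/dataGov/CITY.xml'
# download the document and parse it with the lxml-backed "xml" parser
url_link = requests.get(URL, timeout=30)
url_link.raise_for_status()  # fail early on an HTTP error instead of parsing an error page
# pass the raw bytes so the parser can detect the encoding itself
file = bs.BeautifulSoup(url_link.content, "xml")
# locate the <WorkspaceData> element, then collect every <Record> inside it
find_table = file.find('WorkspaceData', {"xsi:type": "esri:WorkspaceData"})
records = find_table.find_all('Record')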
'''
<Value xsi:type="xs:int">6</Value>
<Value xsi:type="esri:PointN">
...
</Value>
<Value xsi:type="xs:int">6</Value>
<Value xsi:type="xs:short">868</Value>
<Value xsi:type="xs:string">אלוני יצחק</Value>
<Value xsi:type="xs:int">223</Value>
<Value xsi:type="xs:short">34</Value>
<Value xsi:type="xs:string">ישובים מוסדיים יהודים</Value>
<Value xsi:type="xs:string">ALLONE YIZHAQ</Value>
'''
# quick checks (uncomment to inspect the parsed data)
# print(len(records))  # 1240
# print(records[0].find_all('Value')[1].find('X').text, records[0].find_all('Value')[1].find('Y').text)
# build one dictionary per record; the <Value> fields are positional
cities = []
for record in records:
    values = record.find_all('Value')  # look the values up once per record
    point = values[1]  # the esri:PointN value holding <X> and <Y>
    city = {
        "record_id": values[0].text,
        "x": point.find('X').text,
        "y": point.find('Y').text,
        "record_id_2": values[2].text,
        "city_id": values[3].text,
        "city_name_heb": values[4].text,
        "secondary_id": values[5].text,
        "city_type_id": values[6].text,
        "city_type_name": values[7].text,
        "city_name_eng": values[8].text,
    }
    cities.append(city)
print(cities)
beautifulsoup4==4.9.3
bs4==0.0.1
certifi==2021.5.30
chardet==4.0.0
idna==2.10
lxml==4.6.3
requests==2.25.1
soupsieve==2.2.1
urllib3==1.26.6
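
The positional mapping used in the script can be checked offline, without downloading CITY.xml. The sketch below is only an illustration: it swaps BeautifulSoup for the standard-library xml.etree.ElementTree so that it runs with no extra packages, and it parses a small inline sample copied from the XML excerpt in this gist. The names SAMPLE, FIELDS and parse_cities are hypothetical helpers, not part of the original script, and converting X and Y to float is an assumption about how the coordinates will be used.

```python
import xml.etree.ElementTree as ET

# Inline sample mirroring one <Record> from the XML excerpt in this gist.
SAMPLE = """<esri:Workspace xmlns:esri="http://www.esri.com/schemas/ArcGIS/9.3"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xmlns:xs="http://www.w3.org/2001/XMLSchema">
<WorkspaceData xsi:type="esri:WorkspaceData">
<Record xsi:type="esri:Record">
<Values xsi:type="esri:ArrayOfValue">
<Value xsi:type="xs:int">1</Value>
<Value xsi:type="esri:PointN"><X>184689.8424</X><Y>640598.3157</Y></Value>
<Value xsi:type="xs:int">1</Value>
<Value xsi:type="xs:short">862</Value>
<Value xsi:type="xs:string">גני יוחנן</Value>
<Value xsi:type="xs:int">536</Value>
<Value xsi:type="xs:short">31</Value>
<Value xsi:type="xs:string">מושבים (כפרים שיתופיים) (ב)</Value>
<Value xsi:type="xs:string">GANNE YOHANAN</Value>
</Values>
</Record>
</WorkspaceData>
</esri:Workspace>"""

# Hypothetical field names, taken from the dictionary keys used in the script.
# "point" marks the esri:PointN value, which is split into "x" and "y".
FIELDS = ("record_id", "point", "record_id_2", "city_id", "city_name_heb",
          "secondary_id", "city_type_id", "city_type_name", "city_name_eng")


def parse_cities(xml_text):
    """Return one dict per <Record>, mapping positional <Value> fields to names."""
    root = ET.fromstring(xml_text)
    cities = []
    for record in root.iter("Record"):
        # findall() returns direct children only, so <X>/<Y> are not counted
        values = record.find("Values").findall("Value")
        city = {}
        for name, value in zip(FIELDS, values):
            if name == "point":
                # Assumption: coordinates are wanted as numbers, not strings
                city["x"] = float(value.findtext("X"))
                city["y"] = float(value.findtext("Y"))
            else:
                city[name] = value.text
        cities.append(city)
    return cities


cities = parse_cities(SAMPLE)
print(cities[0]["city_name_eng"], cities[0]["x"], cities[0]["y"])
```

Keeping the field names in one tuple means a change in column order only has to be made in one place. The same idea carries over to the BeautifulSoup version by zipping FIELDS with the result of record.find_all('Value').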