Created
July 1, 2021 03:05
Parsing XML data from a URL with BeautifulSoup
<esri:Workspace xmlns:esri="http://www.esri.com/schemas/ArcGIS/9.3"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <WorkspaceDefinition xsi:type="esri:WorkspaceDefinition">
    ...
  </WorkspaceDefinition>
  <WorkspaceData xsi:type="esri:WorkspaceData">
    <DatasetData xsi:type="esri:TableData">
      <DatasetName>city</DatasetName>
      <DatasetType>esriDTFeatureClass</DatasetType>
      <Data xsi:type="esri:RecordSet">
        ...
        <Records xsi:type="esri:ArrayOfRecord">
          <Record xsi:type="esri:Record">
            <Values xsi:type="esri:ArrayOfValue">
              <Value xsi:type="xs:int">1</Value>
              <Value xsi:type="esri:PointN">
                <X>184689.8424</X>
                <Y>640598.3157</Y>
              </Value>
              <Value xsi:type="xs:int">1</Value>
              <Value xsi:type="xs:short">862</Value>
              <Value xsi:type="xs:string">גני יוחנן</Value>
              <Value xsi:type="xs:int">536</Value>
              <Value xsi:type="xs:short">31</Value>
              <Value xsi:type="xs:string">מושבים (כפרים שיתופיים) (ב)</Value>
              <Value xsi:type="xs:string">GANNE YOHANAN</Value>
            </Values>
          </Record>
          ...
        </Records>
      </Data>
    </DatasetData>
  </WorkspaceData>
  ...
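The excerpt above mixes a prefixed root element with unprefixed children, and every `xsi:type` attribute lives in the XML Schema instance namespace. The following is a minimal sketch of how that looks to a namespace-aware parser. It uses the standard library's `xml.etree.ElementTree` instead of BeautifulSoup so that it runs without third-party packages, and `SAMPLE` is a hand-trimmed, hypothetical reduction of the excerpt, not the real file.

```python
import xml.etree.ElementTree as ET

# ElementTree expands a prefixed attribute name to "{namespace-uri}local-name".
XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"

# Hypothetical, shortened stand-in for the excerpt (assumption: same nesting rules).
SAMPLE = """<esri:Workspace xmlns:esri="http://www.esri.com/schemas/ArcGIS/9.3"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <WorkspaceData xsi:type="esri:WorkspaceData">
    <Record xsi:type="esri:Record">
      <Values xsi:type="esri:ArrayOfValue">
        <Value xsi:type="xs:int">1</Value>
        <Value xsi:type="esri:PointN"><X>184689.8424</X><Y>640598.3157</Y></Value>
        <Value xsi:type="xs:string">GANNE YOHANAN</Value>
      </Values>
    </Record>
  </WorkspaceData>
</esri:Workspace>"""

root = ET.fromstring(SAMPLE)

# The root carries the esri namespace; the unprefixed children carry none,
# so they can be looked up by their plain tag names.
root_tag = root.tag
value_types = [value.get(XSI_TYPE) for value in root.iter("Value")]

print(root_tag)
print(value_types)
```

The attribute values themselves (`xs:int`, `esri:PointN`, and so on) stay plain strings; the parser does not resolve the prefix inside them.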
# import required modules
import bs4 as bs
import requests

# assign URL
URL = 'https://www.mapi.gov.il/ProfessionalInfo/Documents/dataGov/CITY.xml'

# download the document and parse it with the lxml-backed "xml" parser
response = requests.get(URL)
response.raise_for_status()  # stop early on an HTTP error
# pass the raw bytes so the parser can read the encoding from the document itself
soup = bs.BeautifulSoup(response.content, "xml")

# find the WorkspaceData element, then every Record inside it
workspace_data = soup.find('WorkspaceData', {"xsi:type": "esri:WorkspaceData"})
records = workspace_data.find_all('Record')

'''
Each Record holds nine Value elements in a fixed order, for example:
<Value xsi:type="xs:int">6</Value>
<Value xsi:type="esri:PointN">
...
</Value>
<Value xsi:type="xs:int">6</Value>
<Value xsi:type="xs:short">868</Value>
<Value xsi:type="xs:string">אלוני יצחק</Value>
<Value xsi:type="xs:int">223</Value>
<Value xsi:type="xs:short">34</Value>
<Value xsi:type="xs:string">ישובים מוסדיים יהודים</Value>
<Value xsi:type="xs:string">ALLONE YIZHAQ</Value>
'''

# debugging helpers
# print(len(records))  # 1240
# print(records[0].find_all('Value')[1].find('X').text, records[0].find_all('Value')[1].find('Y').text)

# build one dictionary per record; values are addressed by position
cities = []
for record in records:
    values = record.find_all('Value')  # look the values up once per record
    point = values[1]                  # the esri:PointN value with X and Y children
    city = {
        "record_id": values[0].text,
        "x": point.find('X').text,
        "y": point.find('Y').text,
        "record_id_2": values[2].text,
        "city_id": values[3].text,
        "city_name_heb": values[4].text,
        "secondary_id": values[5].text,
        "city_type_id": values[6].text,
        "city_type_name": values[7].text,
        "city_name_eng": values[8].text,
    }
    cities.append(city)

print(cities)
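The script stores every field as text. As a possible refinement, the `xsi:type` attribute could drive a conversion to Python numbers. The sketch below illustrates that idea offline on the single record shown in the XML excerpt. It is not the gist's method: it swaps BeautifulSoup for the standard library's `xml.etree.ElementTree` so it runs without installing anything, and `FIELD_NAMES`, `convert`, and `parse_cities` are hypothetical names introduced here, with the field names guessed from the script's dictionary keys.

```python
import xml.etree.ElementTree as ET

XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"

# One record copied from the excerpt; the surrounding elements are trimmed.
SAMPLE = """<esri:Workspace xmlns:esri="http://www.esri.com/schemas/ArcGIS/9.3"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <WorkspaceData xsi:type="esri:WorkspaceData">
    <Records xsi:type="esri:ArrayOfRecord">
      <Record xsi:type="esri:Record">
        <Values xsi:type="esri:ArrayOfValue">
          <Value xsi:type="xs:int">1</Value>
          <Value xsi:type="esri:PointN">
            <X>184689.8424</X>
            <Y>640598.3157</Y>
          </Value>
          <Value xsi:type="xs:int">1</Value>
          <Value xsi:type="xs:short">862</Value>
          <Value xsi:type="xs:string">גני יוחנן</Value>
          <Value xsi:type="xs:int">536</Value>
          <Value xsi:type="xs:short">31</Value>
          <Value xsi:type="xs:string">מושבים (כפרים שיתופיים) (ב)</Value>
          <Value xsi:type="xs:string">GANNE YOHANAN</Value>
        </Values>
      </Record>
    </Records>
  </WorkspaceData>
</esri:Workspace>"""

# Hypothetical field names, one per Value, in document order (an assumption).
FIELD_NAMES = [
    "record_id", "point", "record_id_2", "city_id", "city_name_heb",
    "secondary_id", "city_type_id", "city_type_name", "city_name_eng",
]


def convert(value):
    """Turn one Value element into a Python object based on its xsi:type."""
    xsi_type = value.get(XSI_TYPE)
    if xsi_type in ("xs:int", "xs:short"):
        return int(value.text)
    if xsi_type == "esri:PointN":
        return (float(value.find("X").text), float(value.find("Y").text))
    return value.text  # xs:string and anything unrecognised stay as text


def parse_cities(xml_text):
    """Return one dict per Record, keyed by FIELD_NAMES."""
    root = ET.fromstring(xml_text)
    cities = []
    for record in root.iter("Record"):
        values = [convert(value) for value in record.iter("Value")]
        cities.append(dict(zip(FIELD_NAMES, values)))
    return cities


cities = parse_cities(SAMPLE)
print(cities[0]["city_name_eng"], cities[0]["point"])
```

Keeping the coordinates as a `(x, y)` tuple of floats is a design choice for this sketch; the original script keeps `x` and `y` as separate string fields.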
beautifulsoup4==4.9.3
bs4==0.0.1
certifi==2021.5.30
chardet==4.0.0
idna==2.10
lxml==4.6.3
requests==2.25.1
soupsieve==2.2.1
urllib3==1.26.6