Created March 9, 2023 11:09
Code to locate the specific "Washington, D.C." entry that causes a constraint violation when ingesting the ROR dataset.
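For context on the failure described above, here is a minimal sketch of how a uniqueness constraint rejects a second spelling of the same city. The table name `geonames_city`, its schema, the helper `ingest` and the sample pairs are all hypothetical. The actual database and constraint in the author's system are not described here, so this only illustrates the general mechanism.

```python
import sqlite3

# Hypothetical schema (an assumption, not the author's actual system): each
# GeoNames city id may be stored only once, so it is the primary key.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE geonames_city (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# Hypothetical (id, name) pairs as they might be extracted from ROR addresses.
# Repeats of the same pair are harmless; a second *name* for the same id is not.
pairs = [
    (4140963, "Washington, D.C."),
    (4140963, "Washington, D.C."),
    (4140963, "Washington"),
]


def ingest(connection, rows):
    """Insert each distinct (id, name) pair in first-seen order and return the
    pairs that the primary-key constraint rejects."""
    rejected = []
    for row in dict.fromkeys(rows):  # de-duplicate while preserving order
        try:
            connection.execute("INSERT INTO geonames_city (id, name) VALUES (?, ?)", row)
        except sqlite3.IntegrityError:
            # Raised because the id already exists with another name
            rejected.append(row)
    return rejected


rejected = ingest(conn, pairs)
print(rejected)  # [(4140963, 'Washington')]
```

De-duplicating first matters: many organisations share the same city, so identical pairs are expected. Only a genuinely different name for an id that is already stored trips the constraint.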
"""
A brief script to indicate the "location" of a possibly misspelled 'Washington'
in the current (v1.20-2023-02-28-ror-data.json) ROR dataset

:author: Athanasios Anastasiou
:date: Mar 2023
"""
import json

if __name__ == "__main__":
    data_file = "v1.20-2023-02-28-ror-data.json"
    target_id = 4140963  # GeoNames id of the city under investigation

    # Load the data file (a JSON array of ROR organisation records)
    with open(data_file, "r", encoding="utf-8") as fd:
        data = json.load(fd)

    # Collect (ROR id, GeoNames city id, GeoNames city name) for every address
    # whose addresses[].geonames_city.id equals target_id. Every address of a
    # record is examined, not only the first one, and addresses without a
    # geonames_city object (or without an id) are skipped.
    z = [
        (record["id"], address["geonames_city"]["id"], address["geonames_city"].get("city"))
        for record in data
        for address in record.get("addresses", [])
        if (address.get("geonames_city") or {}).get("id") == target_id
    ]

    # If the id -> name mapping were consistent, this set would contain a single
    # (id, name) pair. It contains two, which is why the database constraint in
    # my system fails.
    print({(geonames_id, city) for _, geonames_id, city in z})

    # Now go back and find the record(s) carrying the misspelled "Washington"
    f = [entry for entry in z if entry[2] == "Washington"]
    print(f)
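The script checks a single GeoNames id. A natural extension is to scan every id at once and report those that map to more than one city name. The sketch below assumes records follow the v1 layout the script relies on (`id`, `addresses[].geonames_city.id` and `addresses[].geonames_city.city`). The function name `conflicting_city_ids` and the sample records are hypothetical, made up for illustration.

```python
from collections import defaultdict


def conflicting_city_ids(records):
    """Return {geonames_id: {city_name: [record ids]}} for every GeoNames city id
    that appears with more than one city name. Records are assumed (not
    guaranteed) to use the ROR v1 fields id and addresses[].geonames_city."""
    names_by_id = defaultdict(lambda: defaultdict(list))
    for record in records:
        for address in record.get("addresses", []):
            city = address.get("geonames_city") or {}  # tolerate a null geonames_city
            if "id" in city:
                names_by_id[city["id"]][city.get("city")].append(record["id"])
    # Keep only ids whose name is not unique, i.e. those that would break a
    # one-name-per-id constraint.
    return {
        geonames_id: dict(names)
        for geonames_id, names in names_by_id.items()
        if len(names) > 1
    }


# Hypothetical, hand-made records for illustration only (not real ROR entries).
sample = [
    {"id": "org-a", "addresses": [{"geonames_city": {"id": 4140963, "city": "Washington, D.C."}}]},
    {"id": "org-b", "addresses": [{"geonames_city": {"id": 4140963, "city": "Washington, D.C."}}]},
    {"id": "org-c", "addresses": [{"geonames_city": {"id": 4140963, "city": "Washington"}}]},
    {"id": "org-d", "addresses": [{"geonames_city": {"id": 2643743, "city": "London"}}]},
    {"id": "org-e", "addresses": [{"geonames_city": None}]},
]

print(conflicting_city_ids(sample))
```

To run it on the real dump, pass the list returned by `json.load(fd)` instead of `sample`. Grouping the record ids under each spelling points straight at the entries to report upstream, without a second filtering pass.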