@docmarionum1
Created December 24, 2017 19:50
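import pandas as pd

# Read the file into a dataframe.
rows = []
with open('dictionary/instance_types_en.ttl', 'r', encoding="utf8") as f:
    f.readline()  # Skip the first line of the file.
    for l in f:
        try:
            # Each line has the shape "<subject> <predicate> <object> ."
            split = l.split(' ')
            # Keep the last path segment of the subject URI and the full type URI.
            rows.append((split[0].split('/')[-1], split[2]))
        except IndexError:
            # Print lines with fewer than three space-separated fields.
            print(l)
# rows[:-1] leaves out the final parsed row.
df = pd.DataFrame.from_records(rows[:-1], columns=['object', 'type'])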
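# Clean up the formatting and filter it.
# Keep only rows whose type is in the DBpedia ontology namespace.
df = df[df['type'].str.contains('dbpedia.org/ontology', regex=False)].copy()
# Strip the angle brackets, drop any '__' suffix, turn underscores into
# spaces, and remove parenthesized qualifiers.
# Note: drop_duplicates() returns a shorter Series; on assignment pandas aligns
# it by index, so rows with a repeated name get NaN rather than being removed.
df['object'] = df['object'].str.strip(
    '<>'
).str.split(
    '__', expand=True
)[0].str.replace(
    '_', ' ', regex=False
).str.replace(
    r'\(.*\)', '', regex=True
).str.strip().drop_duplicates()
# Collapse whitespace runs and decode the percent-encoded '"' and '?'.
df['object'] = df['object'].str.replace(r'\s+', ' ', regex=True)
df['object'] = df['object'].str.replace('%22', '"', regex=False)
df['object'] = df['object'].str.replace('%3F', '?', regex=False)
# Keep only the class name at the end of the type URI.
df['type'] = df['type'].str.strip('<>').str.split('/', expand=True)[4]
# Drop rows whose name is empty after cleaning.
df = df[~(df['object'] == '')].copy()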
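The string handling above can be checked on single values without loading the whole dump. The following is a minimal sketch using plain `str` methods and `re` instead of pandas. The helper names `parse_line`, `clean_object`, and `clean_type` and the sample line are illustrative assumptions and are not part of the original gist. The sketch works on one value at a time, so it does not reproduce the `drop_duplicates()` step.

```python
import re

# Illustrative line in the same "<subject> <predicate> <object> ." shape
# (an assumed sample, not copied from the real file).
SAMPLE = ('<http://dbpedia.org/resource/Mercury_(planet)> '
          '<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> '
          '<http://dbpedia.org/ontology/Planet> .')


def parse_line(line):
    """Return (object, type) for one line, or None if it has fewer than 3 fields."""
    parts = line.split(' ')
    if len(parts) < 3:
        return None
    return parts[0].split('/')[-1], parts[2]


def clean_object(raw):
    """Per-string version of the 'object' cleanup chain."""
    name = raw.strip('<>').split('__')[0]      # brackets off, '__' suffix dropped
    name = name.replace('_', ' ')              # underscores to spaces
    name = re.sub(r'\(.*\)', '', name).strip() # remove "(qualifier)"
    name = re.sub(r'\s+', ' ', name)           # collapse whitespace runs
    return name.replace('%22', '"').replace('%3F', '?')


def clean_type(raw):
    """Keep the class name at the end of a DBpedia ontology URI."""
    return raw.strip('<>').split('/')[4]
```

For example, `parse_line(SAMPLE)` yields a raw name and a type URI that `clean_object` and `clean_type` reduce to a readable name and an ontology class name.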