@AlessandroVaccarino
Created November 2, 2022 13:27
Stack Exchange Data Explorer query (1664532) output cleaner for Tags Taxonomy
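# Original Stack Exchange Data Explorer query:
# https://data.stackexchange.com/stackoverflow/query/1664532/tags-taxonomy
import csv
import pandas as pd

# Load the raw Data Explorer export (every field is quoted).
dataset = pd.read_csv('<Path>/QueryResults.csv', sep=',', quoting=csv.QUOTE_ALL)

# Drop exact duplicate rows, then the long WikiBody column (chained, no inplace on a derived frame).
dedup_dataset = dataset.drop_duplicates().drop(columns='WikiBody')

# Replace each whitespace run or literal "\n" sequence in ExcerptBody with a single space.
dedup_dataset['ExcerptBody'] = dedup_dataset['ExcerptBody'].replace(r'\s+|\\n', ' ', regex=True)

# Save the cleaned taxonomy, quoting every field and omitting the DataFrame index.
dedup_dataset.to_csv('<Path>/QueryResults_Cleaned.csv', sep=',', quoting=csv.QUOTE_ALL, index=False)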
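The regular expression applied to ExcerptBody is the least obvious step. With `regex=True`, pandas' `Series.replace` substitutes every match inside each string cell, much like Python's `re.sub`, so the sketch below uses `re` directly to show what `r'\s+|\\n'` does on a made-up excerpt. The sample text, the `normalize_excerpt` helper and the alternative `(?:\s|\\n)+` pattern are illustrative assumptions, not part of the original gist. Because the two alternatives in the original pattern match separately, a literal `\n` followed by a space turns into two spaces; putting both inside one repeated group collapses the whole run into a single space.

```python
import re

# Pattern from the gist: a whitespace run OR a literal backslash-n pair (two characters).
ORIGINAL_PATTERN = re.compile(r"\s+|\\n")
# Hypothetical variant: a single match covers any mix of whitespace and literal "\n".
COLLAPSING_PATTERN = re.compile(r"(?:\s|\\n)+")


def normalize_excerpt(text, pattern=COLLAPSING_PATTERN):
    """Hypothetical helper: replace every match of `pattern` with one space."""
    return pattern.sub(" ", text)


# Made-up excerpt containing a literal "\n", a double space and a real newline.
sample = "Use this tag\\n for  questions\nabout pandas."

print(repr(normalize_excerpt(sample, ORIGINAL_PATTERN)))  # 'Use this tag  for questions about pandas.'
print(repr(normalize_excerpt(sample)))                    # 'Use this tag for questions about pandas.'
```

If single spacing matters downstream, the same variant string could be passed to the script's call instead, e.g. `.replace(r'(?:\s|\\n)+', ' ', regex=True)`; otherwise the original pattern is adequate.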