Skip to content

Instantly share code, notes, and snippets.

@rohitdholakia
Created December 2, 2013 01:21
Show Gist options
  • Save rohitdholakia/7743443 to your computer and use it in GitHub Desktop.
Save rohitdholakia/7743443 to your computer and use it in GitHub Desktop.
Storing the details of questions
import os
import Utils
question_details = {}
top_dir = '/directory/with/your/site/data'
with open(os.path.join(top_dir, 'posts.xml')) as posts:
for event, elem in etree.iterparse(posts):
if Utils.getPostTypeId(elem) != "1":
continue
Id = Utils.getId(elem)
acceptedId = Utils.getAcceptedId(elem)
creationDate = Utils.getCreationDate(elem)
answerCount = Utils.getAnswerCount(elem)
ownerId = Utils.getOwner(elem)
title = Utils.getTitle(elem)
body = Utils.getBody(elem)
tags = [t.lstrip('<').rstrip('>') for t in Utils.getTags(elem).split('><')]
viewCount = Utils.getViewCount(elem)
question_details[Id] = {"acceptedId": acceptedId, "creationDate": creationDate, "answerCount": answerCount, \
"ownerId": ownerId, "title": title, "body": body, "tags": tags, \
"viewCount": viewCount}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment