@ntakouris
Created January 12, 2017 21:36
facebook.com/oute.papa.outai
PATRA
1st line starts with #
Following: year/gender/department

facebook.com/Ekpapokalypseis
ATHINA
2 #, second one is the uni-specific department, if specified. No year/gender info

facebook.com/Ανομολόγητα-Πανεπιστημίου-Λευκωσίας-1569843519943720/
LEFKOSIA
1st line starts with #
Last line has department

facebook.com/Ανομολόγητα-ΠΑΜΑΚ-1704385103180340/
MAKEDONIA
1st line starts with #
Last line contains department/year

facebook.com/Ανομολόγητα-Πανεπιστημίου-Ιωαννίνων-Official-1448763452090090/
IWANNINA
1st line starts with #
Last lines/characters specify gender (usual gender prefix: '#_'), department, year

facebook.com/anomologitaauth
THESALONIKI
1st line starts with #
Last line contains department within parentheses

facebook.com/libconfess
ATHINA - APTH
1st line starts with #
Last line contains department info within parentheses (not frequent)

facebook.com/Ανομολόγητα-Ιονίου-Πανεπιστημίου-1675258562726820/
IONIO
1st line starts with #
Followed by info inside parentheses, split by hyphens. Usually gender comes first, then department, then year. Be careful with edge cases like 'Πτυχίο ούτε με τάμα στον Αγ. Σπυρίδωνα'

facebook.com/anomologita.kerkyras
KERKYRA
1st line starts with #
Last line sometimes contains gender info inside parentheses

facebook.com/anomologhtateiathinas
ATHINA - TEI
1st line starts with #
Following: info inside parentheses split by hyphens. Department, year, then gender, other metadata.
Edge case examples:
(Σχολή Επαγγελμάτων Υγείας και Πρόνοιας-1ο-ΓΥΝΑΙΚΑ-Εξομολόγηση-Αφιέρωση)
(Σχολή Επαγγελμάτων Υγείας και Πρόνοιας-1ο-Άνδρας-Άλλο)
(Σχολή Τεχνολογικών Εφαρμογών-1ο-Άνδρας-Εξομολόγηση-Αφιέρωση)

facebook.com/anomologitaathinas
1st line starts with #
Last characters: info inside parentheses, usually separated by commas, #gender, last part is location. Gender edge cases like #Σκύλος/#Trans (-> probably redirect anomalies to male). Probably ignore location because it's like Αγία Βαρβάρα/Μετρό Αιγάλεω/Αθήνα, i.e. anything goes

============================

General decoding guidelines:

Find out year, gender, department?
year: 1,2,3,4,5,5+/6.../ptyxioyxos/epi ptyxio (optional)/1o/2o etc/
gender: Andras/Gynaika/Aner/Gynh
department: depends on whether the page is for the whole uni or just a dept, e.g. PaPa vs EKPApokalypseis

Be careful, department can be a value like 'legal' or Άλλο Πανεπιστήμιο etc

Be advised, more hashtags may be incorporated in the standard text content (specified by submitters)

Maybe we could approximate gender based on language, e.g. ton/ths, but it's probably risky.

Scan the first few lines + maybe the hashtag and approximate a score based on the relevance of the characters that match (+ incorporate a Greek-to-Latin dictionary if no Greek characters are found). Based on the score, insert to the proper location(s)

Parse post metadata like quoted links/posts/events/pages

Ignore legit moderator posts (usually no hashes)

Ignore pinned posts?

Incorporate a score threshold before inserting, because some info is not specified (thus it may be normal text content)
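
A minimal Python sketch of the scoring idea above, applied to the ATHINA - TEI format (info inside trailing parentheses, split by hyphens). Everything named here is an assumption made for illustration and not part of the notes: the function names, the partial Greek-to-Latin map, the use of difflib for the match score, and the 0.8 threshold. It treats the first unclassified part as the department, so it would need adjusting for pages that order fields differently or whose department names contain hyphens.

```python
import re
import unicodedata
from difflib import SequenceMatcher

# Partial Greek-to-Latin map (assumed); letters are mapped one by one.
GREEK_TO_LATIN = str.maketrans({
    "α": "a", "β": "b", "γ": "g", "δ": "d", "ε": "e", "ζ": "z", "η": "h",
    "θ": "th", "ι": "i", "κ": "k", "λ": "l", "μ": "m", "ν": "n", "ξ": "ks",
    "ο": "o", "π": "p", "ρ": "r", "σ": "s", "ς": "s", "τ": "t", "υ": "y",
    "φ": "f", "χ": "x", "ψ": "ps", "ω": "w",
})

# Keywords taken from the guidelines; the male/female labels are assumed.
GENDER_WORDS = {"andras": "male", "aner": "male",
                "gynaika": "female", "gynh": "female"}
GRADUATE_WORDS = ("ptyxioyxos", "epi ptyxio")
YEAR_RE = re.compile(r"^(\d)\s*(?:o|\+)?$")          # 1, 1o, 5+
TRAILING_PARENS_RE = re.compile(r"\(([^()]*)\)\s*$")  # "(...)" at end of post
SCORE_THRESHOLD = 0.8  # assumed cut-off, would need tuning on real posts


def normalize(token: str) -> str:
    """Lowercase, strip accents, and transliterate Greek letters to Latin."""
    decomposed = unicodedata.normalize("NFD", token.strip().lower())
    no_accents = "".join(c for c in decomposed
                         if unicodedata.category(c) != "Mn")
    return no_accents.translate(GREEK_TO_LATIN)


def best_gender(token: str):
    """Return (gender, score) for the closest gender keyword."""
    scored = [(SequenceMatcher(None, token, word).ratio(), gender)
              for word, gender in GENDER_WORDS.items()]
    score, gender = max(scored)
    return gender, score


def parse_trailing_metadata(post: str) -> dict:
    """Classify the hyphen-separated parts of the trailing parentheses."""
    result = {"year": None, "gender": None, "department": None, "other": []}
    match = TRAILING_PARENS_RE.search(post)
    if match is None:
        return result  # nothing specified, treat the post as plain text
    for raw in match.group(1).split("-"):
        raw = raw.strip()
        if not raw:
            continue
        token = normalize(raw)
        year = YEAR_RE.match(token)
        gender, score = best_gender(token)
        if year and result["year"] is None:
            result["year"] = int(year.group(1))
        elif token in GRADUATE_WORDS and result["year"] is None:
            result["year"] = token
        elif score >= SCORE_THRESHOLD and result["gender"] is None:
            result["gender"] = gender
        elif result["department"] is None:
            result["department"] = raw  # first unclassified part
        else:
            result["other"].append(raw)
    return result
```

Calling parse_trailing_metadata on a post that ends with one of the edge case examples above should pick out the department, year and gender and leave parts such as Εξομολόγηση or Άλλο in the "other" list. A post without trailing parentheses comes back with every field empty, which is where the "ignore or treat as normal text" decision would go.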