@napsternxg
Last active March 12, 2019 00:06
Author position issues in Microsoft Academic Graph data

Instances where the same author occurs in multiple positions in a paper:

Total instances: 150783. For example:

# PAPER  AUTHOR  NO_OF_OCCURRENCES
[((u'08D53976', u'4288B19F'), 2), 
 ((u'04659D00', u'3E8E4B9F'), 2), 
 ((u'0770BD71', u'41DFB1F8'), 2), 
 ((u'0139E130', u'412D2AC5'), 2), 
 ((u'03E26E62', u'2F708318'), 2)] 
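
A count like the one above could be reproduced along these lines. This is a minimal sketch, not the query that produced the numbers in this note: it assumes each record is a `(paper_id, author_id, author_sequence_number)` tuple and that an "occurrence" means a distinct position. The function name `repeated_author_listings` and the sample rows are hypothetical.

```python
from collections import Counter


def repeated_author_listings(rows):
    """Return ((paper_id, author_id), n_positions) pairs where n_positions > 1.

    rows: iterable of (paper_id, author_id, author_sequence_number) tuples
    (an assumed layout, not confirmed by the source data).
    """
    # Drop exact duplicate rows so only distinct positions are counted.
    distinct = {(paper, author, seq) for paper, author, seq in rows}
    # Count how many distinct positions each (paper, author) pair holds.
    counts = Counter((paper, author) for paper, author, _ in distinct)
    return [(pair, n) for pair, n in counts.most_common() if n > 1]


# Made-up rows for illustration only (not taken from the MAG dump).
sample_rows = [
    ("P1", "A1", 1),
    ("P1", "A1", 3),  # same author listed again at another position
    ("P1", "A2", 2),
    ("P2", "A3", 1),
]
print(repeated_author_listings(sample_rows))
```

The result has the same shape as the listing above: a `(paper, author)` pair followed by the number of positions it occupies.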

Instances where multiple author IDs are mapped to the same author position in a paper:

There are 716204 such instances. 50 examples:

+----------+------------------------+------------+
| Paper_ID | Author_sequence_number | instance_c |
+----------+------------------------+------------+
| 00000251 |                      2 |          2 |
| 00000950 |                      1 |          2 |
| 00000950 |                      2 |          2 |
| 00000950 |                      3 |          2 |
| 00000AB4 |                      1 |          2 |
| 00000DB2 |                      1 |          2 |
| 0000183A |                      4 |          2 |
| 0000185A |                      4 |          2 |
| 000019A2 |                      1 |          2 |
| 00002744 |                      1 |          2 |
| 00003642 |                      1 |          2 |
| 00003642 |                      2 |          2 |
| 00003642 |                      3 |          2 |
| 00003878 |                      1 |          3 |
| 00003DF7 |                      1 |          2 |
| 0000445F |                      4 |          2 |
| 0000445F |                      5 |          2 |
| 0000445F |                      6 |          2 |
| 0000445F |                      7 |          2 |
| 0000445F |                      8 |          2 |
| 0000445F |                      9 |          2 |
| 0000445F |                     10 |          2 |
| 0000445F |                     11 |          2 |
| 0000445F |                     13 |          2 |
| 0000445F |                     14 |          2 |
| 0000445F |                     16 |          2 |
| 0000445F |                     17 |          2 |
| 0000445F |                     18 |          2 |
| 000046C5 |                      4 |          2 |
| 0000471D |                      1 |          2 |
| 000047F7 |                      1 |          2 |
| 00004A4F |                      2 |          2 |
| 00004FF1 |                      2 |          2 |
| 00005185 |                      1 |          2 |
| 0000518E |                      2 |          2 |
| 000057CF |                      3 |          2 |
| 00005930 |                      1 |          2 |
| 00006032 |                      1 |          2 |
| 00006032 |                      2 |          2 |
| 00006222 |                      3 |          2 |
| 000062AE |                      3 |          2 |
| 0000656F |                      1 |          2 |
| 000066BB |                      4 |          2 |
| 000066BB |                      5 |          2 |
| 00006741 |                      3 |          2 |
| 0000675D |                      1 |          2 |
| 000069BF |                      3 |          2 |
| 000069BF |                      4 |          2 |
| 00006ABC |                      1 |          2 |
| 00007683 |                      1 |          2 |
+----------+------------------------+------------+
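
The table above groups by paper and sequence number and counts the author IDs at each position. A rough equivalent in Python is sketched below, again assuming `(paper_id, author_id, author_sequence_number)` records; the function name `shared_positions` and the sample rows are hypothetical, and the last value in each result tuple is taken to correspond to the `instance_c` column.

```python
from collections import defaultdict


def shared_positions(rows):
    """Return (paper_id, author_sequence_number, n_authors) where n_authors > 1.

    rows: iterable of (paper_id, author_id, author_sequence_number) tuples
    (an assumed layout, not confirmed by the source data).
    """
    # Collect the distinct author IDs seen at each (paper, position) slot.
    authors_at = defaultdict(set)
    for paper, author, seq in rows:
        authors_at[(paper, seq)].add(author)
    # Keep only slots claimed by more than one author ID, sorted for display.
    return sorted(
        (paper, seq, len(authors))
        for (paper, seq), authors in authors_at.items()
        if len(authors) > 1
    )


# Made-up rows for illustration only (not taken from the MAG dump).
position_rows = [
    ("P1", "A1", 1),
    ("P1", "A2", 1),  # two author IDs at position 1
    ("P1", "A3", 2),
    ("P2", "A4", 1),
    ("P2", "A5", 1),
    ("P2", "A6", 1),  # three author IDs at position 1
]
for paper, seq, n_authors in shared_positions(position_rows):
    print(paper, seq, n_authors)
```

Using a set per slot means a duplicated row for the same author does not inflate the count; only distinct author IDs sharing a position are reported.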