Cleaning Data with Python
import pandas as pd

# Print the head of airquality
print(airquality.head())
# Melt airquality: airquality_melt
airquality_melt = pd.melt(airquality, id_vars=['Month', 'Day'])
# Print the head of airquality_melt
print(airquality_melt.head())
# Print the head of airquality
print(airquality.head())
# Melt airquality: airquality_melt
airquality_melt = pd.melt(airquality, id_vars=['Month', 'Day'], var_name='measurement', value_name='reading')
# Print the head of airquality_melt
print(airquality_melt.head())
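
A minimal, self-contained sketch of the melt step above. It assumes only the first three rows of the airquality data listed further down in this gist; the small DataFrame is built inline so the snippet runs on its own.

```python
import pandas as pd

# First three rows of the airquality data shown later in this gist.
airquality = pd.DataFrame({
    "Ozone": [41, 36, 12],
    "Solar.R": [190, 118, 149],
    "Wind": [7.4, 8.0, 12.6],
    "Temp": [67, 72, 74],
    "Month": [5, 5, 5],
    "Day": [1, 2, 3],
})

# Month and Day stay as identifier columns; every other column is stacked
# into a (measurement, reading) pair, one row per original cell.
airquality_melt = pd.melt(
    airquality,
    id_vars=["Month", "Day"],
    var_name="measurement",
    value_name="reading",
)

print(airquality_melt.shape)             # 3 rows x 4 melted columns -> (12, 4)
print(airquality_melt.columns.tolist())
```

Without `var_name` and `value_name`, pandas falls back to the default column names `variable` and `value`.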
# Pivot airquality_melt: airquality_pivot
airquality_pivot = airquality_melt.pivot_table(index=['Month', 'Day'], columns='measurement', values='reading')
# Print the head of airquality_pivot
print(airquality_pivot.head())
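
A small sketch of the pivot step, assuming a four-row melted frame built inline. `pivot_table` leaves `Month` and `Day` in the index, so the example also shows one way to flatten the result back into ordinary columns; the name `airquality_flat` is introduced here for illustration only.

```python
import pandas as pd

# Tiny melted frame: two days, two measurements.
airquality_melt = pd.DataFrame({
    "Month": [5, 5, 5, 5],
    "Day": [1, 2, 1, 2],
    "measurement": ["Ozone", "Ozone", "Temp", "Temp"],
    "reading": [41.0, 36.0, 67.0, 72.0],
})

# One row per (Month, Day), one column per measurement.
airquality_pivot = airquality_melt.pivot_table(
    index=["Month", "Day"],
    columns="measurement",
    values="reading",
)

# Move Month and Day out of the index and drop the leftover
# 'measurement' label on the column axis.
airquality_flat = airquality_pivot.reset_index()
airquality_flat.columns.name = None

print(airquality_flat)
```

Note that `pivot_table` aggregates duplicate (index, column) pairs, using the mean by default, so repeated readings for the same day would be averaged rather than raise an error.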
# Melt tb: tb_melt
tb_melt = pd.melt(tb, id_vars=['country', 'year'])
"""
  country  year variable  value
0      AD  2000     m014    0.0
1      AE  2000     m014    2.0
2      AF  2000     m014   52.0
3      AG  2000     m014    0.0
4      AL  2000     m014    2.0
"""
# Create the 'gender' column
tb_melt['gender'] = tb_melt.variable.str[0]
# Create the 'age_group' column
tb_melt['age_group'] = tb_melt.variable.str[1:]
"""
  country  year variable  value gender age_group
0      AD  2000     m014    0.0      m       014
1      AE  2000     m014    2.0      m       014
2      AF  2000     m014   52.0      m       014
3      AG  2000     m014    0.0      m       014
"""
# Print the head of tb_melt
print(tb_melt.head())
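
A runnable sketch of the string-slicing step above. The `m014` values come from the output shown; the `f014` column and its values are hypothetical, added only so that both genders appear.

```python
import pandas as pd

tb = pd.DataFrame({
    "country": ["AD", "AE", "AF"],
    "year": [2000, 2000, 2000],
    "m014": [0.0, 2.0, 52.0],
    "f014": [1.0, 3.0, 93.0],  # hypothetical values for illustration
})

tb_melt = pd.melt(tb, id_vars=["country", "year"])

# The first character of the variable name encodes gender,
# the remaining characters encode the age group.
tb_melt["gender"] = tb_melt["variable"].str[0]
tb_melt["age_group"] = tb_melt["variable"].str[1:]

print(tb_melt)
```

The age group stays a string (`'014'`), which preserves the leading zero.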
# Melt ebola: ebola_melt
ebola_melt = pd.melt(ebola, id_vars=['Date', 'Day'], var_name='type_country', value_name='counts')
# Create the 'str_split' column
ebola_melt['str_split'] = ebola_melt.type_country.str.split('_')
# Create the 'type' column
ebola_melt['type'] = ebola_melt.str_split.str.get(0)
# Create the 'country' column
ebola_melt['country'] = ebola_melt.str_split.str.get(1)
# Print the head of ebola_melt
print(ebola_melt.head())
"""
       Date  Day  type_country  counts        str_split   type country
0  1/5/2015  289  Cases_Guinea  2776.0  [Cases, Guinea]  Cases  Guinea
1  1/4/2015  288  Cases_Guinea  2775.0  [Cases, Guinea]  Cases  Guinea
2  1/3/2015  287  Cases_Guinea  2769.0  [Cases, Guinea]  Cases  Guinea
3  1/2/2015  286  Cases_Guinea     NaN  [Cases, Guinea]  Cases  Guinea
"""
# Concatenate uber1, uber2, and uber3: row_concat
row_concat = pd.concat([uber1, uber2, uber3])
"""
          Date/Time      Lat      Lon    Base
0  4/1/2014 0:11:00  40.7690 -73.9549  B02512
1  4/1/2014 0:17:00  40.7267 -74.0345  B02512
2  4/1/2014 0:21:00  40.7316 -73.9873  B02512
"""
# Concatenate ebola_melt and status_country column-wise: ebola_tidy
ebola_tidy = pd.concat([ebola_melt, status_country], axis=1)
"""
       Date  Day status_country  counts status country
0  1/5/2015  289   Cases_Guinea  2776.0  Cases  Guinea
1  1/4/2015  288   Cases_Guinea  2775.0  Cases  Guinea
2  1/3/2015  287   Cases_Guinea  2769.0  Cases  Guinea
3  1/2/2015  286   Cases_Guinea     NaN  Cases  Guinea
"""
# Merge the DataFrames: m2o
m2o = pd.merge(left=site, right=visited, left_on='name', right_on='site')
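
A self-contained sketch of the merge above. The `site` and `visited` frames and all their values are hypothetical. `validate` is an optional `pd.merge` argument, added here as an assumption about how one might check that each `name` appears only once in `site`.

```python
import pandas as pd

# Hypothetical data: one row per site, several visits per site.
site = pd.DataFrame({
    "name": ["A-1", "B-2", "C-3"],
    "lat": [-49.85, -47.15, -48.87],
})
visited = pd.DataFrame({
    "ident": [1, 2, 3, 4],
    "site": ["A-1", "A-1", "B-2", "B-2"],
})

# Each site row is repeated once per matching visit. validate raises
# MergeError if the left key ('name') turns out not to be unique.
m2o = pd.merge(
    left=site,
    right=visited,
    left_on="name",
    right_on="site",
    validate="one_to_many",
)

print(m2o)
```

The default `how='inner'` drops site `C-3`, which has no visits; passing `how='left'` would keep it with NaN in the `visited` columns.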
Ozone Solar.R Wind Temp Month Day
41 190 7.4 67 5 1
36 118 8 72 5 2
12 149 12.6 74 5 3
18 313 11.5 62 5 4
NA NA 14.3 56 5 5
28 NA 14.9 66 5 6
23 299 8.6 65 5 7
19 99 13.8 59 5 8
8 19 20.1 61 5 9
NA 194 8.6 69 5 10
7 NA 6.9 74 5 11
16 256 9.7 69 5 12
11 290 9.2 66 5 13
14 274 10.9 68 5 14
18 65 13.2 58 5 15
14 334 11.5 64 5 16
34 307 12 66 5 17
6 78 18.4 57 5 18
30 322 11.5 68 5 19
11 44 9.7 62 5 20
1 8 9.7 59 5 21
11 320 16.6 73 5 22
4 25 9.7 61 5 23
32 92 12 61 5 24
NA 66 16.6 57 5 25
NA 266 14.9 58 5 26
NA NA 8 57 5 27
23 13 12 67 5 28
45 252 14.9 81 5 29
115 223 5.7 79 5 30
37 279 7.4 76 5 31
NA 286 8.6 78 6 1
NA 287 9.7 74 6 2
NA 242 16.1 67 6 3
NA 186 9.2 84 6 4
NA 220 8.6 85 6 5
NA 264 14.3 79 6 6
29 127 9.7 82 6 7
NA 273 6.9 87 6 8
71 291 13.8 90 6 9
39 323 11.5 87 6 10
NA 259 10.9 93 6 11
NA 250 9.2 92 6 12
23 148 8 82 6 13
NA 332 13.8 80 6 14
NA 322 11.5 79 6 15
21 191 14.9 77 6 16
37 284 20.7 72 6 17
20 37 9.2 65 6 18
12 120 11.5 73 6 19
13 137 10.3 76 6 20
NA 150 6.3 77 6 21
NA 59 1.7 76 6 22
NA 91 4.6 76 6 23
NA 250 6.3 76 6 24
NA 135 8 75 6 25
NA 127 8 78 6 26
NA 47 10.3 73 6 27
NA 98 11.5 80 6 28
NA 31 14.9 77 6 29
NA 138 8 83 6 30
135 269 4.1 84 7 1
49 248 9.2 85 7 2
32 236 9.2 81 7 3
NA 101 10.9 84 7 4
64 175 4.6 83 7 5
40 314 10.9 83 7 6
77 276 5.1 88 7 7
97 267 6.3 92 7 8
97 272 5.7 92 7 9
85 175 7.4 89 7 10
NA 139 8.6 82 7 11
10 264 14.3 73 7 12
27 175 14.9 81 7 13
NA 291 14.9 91 7 14
7 48 14.3 80 7 15
48 260 6.9 81 7 16
35 274 10.3 82 7 17
61 285 6.3 84 7 18
79 187 5.1 87 7 19
63 220 11.5 85 7 20
16 7 6.9 74 7 21
NA 258 9.7 81 7 22
NA 295 11.5 82 7 23
80 294 8.6 86 7 24
108 223 8 85 7 25
20 81 8.6 82 7 26
52 82 12 86 7 27
82 213 7.4 88 7 28
50 275 7.4 86 7 29
64 253 7.4 83 7 30
59 254 9.2 81 7 31
39 83 6.9 81 8 1
9 24 13.8 81 8 2
16 77 7.4 82 8 3
78 NA 6.9 86 8 4
35 NA 7.4 85 8 5
66 NA 4.6 87 8 6
122 255 4 89 8 7
89 229 10.3 90 8 8
110 207 8 90 8 9
NA 222 8.6 92 8 10
NA 137 11.5 86 8 11
44 192 11.5 86 8 12
28 273 11.5 82 8 13
65 157 9.7 80 8 14
NA 64 11.5 79 8 15
22 71 10.3 77 8 16
59 51 6.3 79 8 17
23 115 7.4 76 8 18
31 244 10.9 78 8 19
44 190 10.3 78 8 20
21 259 15.5 77 8 21
9 36 14.3 72 8 22
NA 255 12.6 75 8 23
45 212 9.7 79 8 24
168 238 3.4 81 8 25
73 215 8 86 8 26
NA 153 5.7 88 8 27
76 203 9.7 97 8 28
118 225 2.3 94 8 29
84 237 6.3 96 8 30
85 188 6.3 94 8 31
96 167 6.9 91 9 1
78 197 5.1 92 9 2
73 183 2.8 93 9 3
91 189 4.6 93 9 4
47 95 7.4 87 9 5
32 92 15.5 84 9 6
20 252 10.9 80 9 7
23 220 10.3 78 9 8
21 230 10.9 75 9 9
24 259 9.7 73 9 10
44 236 14.9 81 9 11
21 259 15.5 76 9 12
28 238 6.3 77 9 13
9 24 10.9 71 9 14
13 112 11.5 71 9 15
46 237 6.9 78 9 16
18 224 13.8 67 9 17
13 27 10.3 76 9 18
24 238 10.3 68 9 19
16 201 8 82 9 20
13 238 12.6 64 9 21
23 14 9.2 71 9 22
36 139 10.3 81 9 23
7 49 10.3 69 9 24
14 20 16.6 63 9 25
30 193 6.9 70 9 26
NA 145 13.2 77 9 27
14 191 14.3 75 9 28
18 131 8 76 9 29
20 223 11.5 68 9 30
Date Day Cases_Guinea Cases_Liberia Cases_SierraLeone Cases_Nigeria Cases_Senegal Cases_UnitedStates Cases_Spain Cases_Mali Deaths_Guinea Deaths_Liberia Deaths_SierraLeone Deaths_Nigeria Deaths_Senegal Deaths_UnitedStates Deaths_Spain Deaths_Mali
1/5/2015 289 2776 10030 1786 2977
1/4/2015 288 2775 9780 1781 2943
1/3/2015 287 2769 8166 9722 1767 3496 2915
1/2/2015 286 8157 3496
12/31/2014 284 2730 8115 9633 1739 3471 2827
12/28/2014 281 2706 8018 9446 1708 3423 2758
12/27/2014 280 2695 9409 1697 2732
12/24/2014 277 2630 7977 9203 3413 2655
12/21/2014 273 2597 9004 1607 2582
12/20/2014 272 2571 7862 8939 1586 3384 2556
12/18/2014 271 7830 3376
12/14/2014 267 2416 8356 1525 2085
12/9/2014 262 7797 3290
12/7/2014 260 2292 7897 20 1 4 1 7 1428 1768 8 0 1 0 6
12/3/2014 256 7719 3177
11/30/2014 253 2164 7312 20 1 4 1 7 1327 1583 8 0 1 0 6
11/28/2014 251 7635 3145
11/23/2014 246 2134 6599 20 1 4 1 7 1260 1398 8 0 1 0 6
11/22/2014 245 7168 3016
11/18/2014 241 2047 7082 6190 20 1 4 1 6 1214 2963 1267 8 0 1 0 6
11/16/2014 239 1971 6073 20 1 4 1 5 1192 1250 8 0 1 0 5
11/15/2014 238 7069 2964
11/11/2014 234 1919 5586 20 1 4 1 4 1166 1187 8 0 1 0 3
11/10/2014 233 6878 2812
11/9/2014 232 1878 5368 20 1 4 1 1 1142 1169 8 0 1 0 1
11/8/2014 231 6822 2836
11/4/2014 227 6619 4862 20 1 4 1 1 2766 1130 8 0 1 0 1
11/3/2014 226 1760 1054
11/2/2014 225 1731 4759 20 1 4 1 1 1041 1070 8 0 1 0 1
10/31/2014 222 6525 2697
10/29/2014 220 1667 5338 20 1 4 1 1 1018 1510 8 0 1 0 1
10/27/2014 218 1906 5235 20 1 4 1 1 997 1500 8 0 1 0 1
10/25/2014 216 6535 2413
10/22/2014 214 3896 4 1 1 1281 1 0 1
10/21/2014 213 1553 926
10/19/2014 211 1540 3706 20 1 3 1 904 1259 8 0 1 0
10/18/2014 210 4665 2705
10/14/2014 206 1519 3410 20 1 3 1 862 1200 8 0 0 1
10/13/2014 205 4262 2484
10/12/2014 204 1472 3252 20 1 2 1 843 1183 8 0 1 1
10/11/2014 203 4249 2458
10/8/2014 200 2950 20 1 1 1 930 8 0 1 1
10/7/2014 199 1350 4076 778 2316
10/5/2014 197 1298 2789 20 1 1 768 879 8 0 0
10/4/2014 196 3924 2210
10/1/2014 193 1199 3834 2437 20 1 1 739 2069 623 8 0 0
9/28/2014 190 1157 3696 2304 20 1 710 1998 622 8 0
9/23/2014 185 1074 3458 2021 20 1 648 1830 605 8 0
9/21/2014 183 1022 3280 1940 20 1 635 1677 597 8 0
9/20/2014 182 1813 593
9/19/2014 181 1008 632
9/17/2014 179 3022 1578
9/14/2014 176 942 2710 1673 601 1459 562
9/13/2014 175 936 1620 21 1 595 1296 562 8 0
9/10/2014 172 899 1478 21 1 568 536 8
9/9/2014 171 2407
9/7/2014 169 861 2081 1424 21 3 557 1137 524 8 0
9/5/2014 167 812 1871 1261 22 1 517 1089 491 8
8/31/2014 162 771 1698 1216 21 1 494 871 476 7
8/26/2014 157 648 1378 1026 17 430 694 422 6
8/20/2014 151 607 1082 910 16 406 624 392 5
8/18/2014 149 579 972 907 15 396 576 374 4
8/16/2014 147 543 834 848 15 394 466 365 4
8/13/2014 144 519 786 810 12 380 413 348 4
8/11/2014 142 510 670 783 12 377 355 334 3
8/9/2014 140 506 599 730 13 373 323 315 2
8/6/2014 137 495 554 717 13 367 294 298 2
8/4/2014 135 495 516 691 9 363 282 286 1
8/1/2014 132 485 468 646 4 358 255 273 1
7/30/2014 129 472 391 574 3 346 227 252 1
7/27/2014 126 460 329 533 1 339 156 233 1
7/23/2014 123 427 249 525 0 319 129 224 0
7/20/2014 120 415 224 454 314 127 219
7/17/2014 117 410 196 442 310 116 206
7/14/2014 114 411 174 397 310 106 197
7/12/2014 112 406 172 386 304 105 194
7/8/2014 108 409 142 337 309 88 142
7/6/2014 106 408 131 305 307 84 127
7/2/2014 102 412 115 252 305 75 101
6/30/2014 100 413 107 239 303 65 99
6/22/2014 92 51 34
6/20/2014 90 390 158 270 34
6/19/2014 89 41 25
6/18/2014 88 390 136 267 28
6/17/2014 87 97 49
6/16/2014 86 398 33 264 24
6/10/2014 80 351 13 89 226 24 7
6/5/2014 75 13 81 6
6/3/2014 73 344 13 215 12 6
6/1/2014 71 328 13 79 208 12 6
5/28/2014 67 291 13 50 193 12 6
5/27/2014 66 281 12 16 186 11 5
5/23/2014 62 258 12 0 174 11 0
5/12/2014 51 248 12 0 171 11 0
5/10/2014 49 233 12 0 157 11 0
5/7/2014 46 236 13 0 158 11 0
5/5/2014 44 235 13 0 157 11 0
5/3/2014 42 231 13 0 155 11 0
5/1/2014 40 226 13 0 149 11 0
4/26/2014 35 224 0 143 0
4/24/2014 33 35 0 0
4/23/2014 32 218 0 141 0
4/22/2014 31 0 0
4/21/2014 30 34 11
4/20/2014 29 208 136 6
4/17/2014 26 203 27 129
4/16/2014 25 197 27 122 13
4/15/2014 24 12
4/14/2014 23 168 108
4/11/2014 20 159 26 2 106 13 2
4/9/2014 18 158 25 2 101 12 2
4/7/2014 16 151 21 2 95 10 2
4/4/2014 13 143 18 2 86 7 2
4/1/2014 10 127 8 2 83 5 2
3/31/2014 9 122 8 2 80 4 2
3/29/2014 7 112 7 70 2
3/28/2014 6 112 3 2 70 3 2
3/27/2014 5 103 8 6 66 6 5
3/26/2014 4 86 62
3/25/2014 3 86 60
3/24/2014 2 86 59
3/22/2014 0 49 29