@bkamapantula
Last active March 6, 2017 07:13
Aadhaar enrollment data

About data

Aadhaar enrollments on a given date and pincode combination are grouped together and represented in the dataset.

Here, the enrollment data is divided into X buckets by the Aadhaar data operator. Each row represents information about a single enrollment (of a person), except for the last four columns. In any row, the values in mobile_number_provided and email_provided are always less than or equal to the value in the aadhaar_generated or rejected column. Consider the following example row from the dataset:

20150420,Allahabad Bank,A-Onerealtors Pvt Ltd,Uttar Pradesh,Ambedkar Nagar,Akbarpur,224155,F,22,5,0,0,5

Here, the last four columns are aadhaar_generated, rejected, mobile_number_provided and email_provided, in that order. The row therefore records 5 Aadhaar generations, 0 rejections, 0 enrollments with a mobile number and 5 enrollments with an email address.

If so, are the values in the last four columns repeated in the rows for the other four enrollments (besides the F/22 one shown here)?
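To make the column layout concrete, here is a minimal Python sketch that parses the example row. Only the last four column names come from the description above; the first nine names (`date`, `registrar`, `enrollment_agency`, and so on) are assumed labels chosen for illustration and may not match the dataset's real header.

```python
import csv
from io import StringIO

# The last four names come from the text above; the first nine are
# ASSUMED labels for illustration and may differ from the real header.
COLUMNS = [
    "date", "registrar", "enrollment_agency", "state", "district",
    "sub_district", "pincode", "gender", "age",
    "aadhaar_generated", "rejected",
    "mobile_number_provided", "email_provided",
]
COUNT_FIELDS = COLUMNS[-4:]


def parse_row(line):
    """Split one CSV line and convert the four trailing count fields to int."""
    values = next(csv.reader(StringIO(line)))
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(values)}")
    row = dict(zip(COLUMNS, values))
    for field in COUNT_FIELDS:
        row[field] = int(row[field])
    return row


example = ("20150420,Allahabad Bank,A-Onerealtors Pvt Ltd,Uttar Pradesh,"
           "Ambedkar Nagar,Akbarpur,224155,F,22,5,0,0,5")
row = parse_row(example)
for field in COUNT_FIELDS:
    print(field, row[field])
```

The pincode is left as a string on purpose, since some entries hold a non-numeric value.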

Data issues

aadhaar_generated/rejected fields


I looked up the number of unique values in the 'rejected' column: there are 728. I had assumed the column held only 0 or 1 (not rejected or rejected). Similarly, the 'aadhaar_generated' column has 1296 unique values. This raises the question: does 1 alone represent a rejection (or an acceptance, in the case of aadhaar_generated)? What do the other values represent?
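One way to see why this matters is to compare two ways of tallying rejections on a toy sample. The sketch below assumes, as the example row suggests but the data source has not confirmed here, that the field holds a per-row count. The rows are hypothetical and not taken from the dataset.

```python
from collections import Counter

# HYPOTHETICAL (state, rejected) pairs for illustration only.
rows = [("Goa", 0), ("Goa", 1), ("Goa", 3), ("Kerala", 1), ("Kerala", 2)]

# Distinct values seen in the column (the check described above).
distinct_values = sorted({rejected for _, rejected in rows})

# Tally 1: count only the rows whose value is exactly 1.
rows_equal_to_one = Counter(state for state, rejected in rows if rejected == 1)

# Tally 2: treat the value as a count and sum it per state.
summed = Counter()
for state, rejected in rows:
    summed[state] += rejected

print(distinct_values)
print(dict(rows_equal_to_one))
print(dict(summed))
```

If the count interpretation is right, the first tally ignores every row holding a value other than 1, so the two tallies can differ widely.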

pincode


A few entries have Others in the pincode column.

Questions

Once the data is well understood, the following are some of the questions to consider. Each question is posed together with a proposed visualization, so that visualizations can be made specific to a particular question.

  1. Enrollment rejections

Q1.1) Which states register more rejections? Where rejections are high, is it because of any special drives by the govt?

V1.1.1) India map visual that shows rejection rate by color. Rejection rate can also be adjusted specific to gender.

V1.1.2) As the data is granular at pincode level, we can consider coloring pincode specific regions subject to the availability of shape file.

Q1.2) Is there any trend in the rejected enrollments w.r.t. mobile and email availability?

V1.2.1) Depending on the granularity of the data, we can choose a bar graph.

Q1.3) Are some registrars more strict than others (rejection rate)?

V1.3.1) Tabular format, restricted to top ten.
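The rejection-rate questions above (by state in Q1.1, by registrar in Q1.3) share one computation, sketched below in Python. It rests on two assumptions that are not confirmed by the dataset description: the fields hold per-row counts, and the rate is defined as rejected / (aadhaar_generated + rejected). The helper names `rejection_rate_by` and `top_n` and the sample rows are hypothetical.

```python
from collections import defaultdict


def rejection_rate_by(rows, key):
    """Return {group: rejected / (generated + rejected)} for the given key.

    ASSUMES the two fields are per-row counts; groups with no processed
    enrollments are skipped to avoid dividing by zero.
    """
    totals = defaultdict(lambda: [0, 0])  # group -> [generated, rejected]
    for row in rows:
        bucket = totals[row[key]]
        bucket[0] += row["aadhaar_generated"]
        bucket[1] += row["rejected"]
    rates = {}
    for group, (generated, rejected) in totals.items():
        processed = generated + rejected
        if processed:
            rates[group] = rejected / processed
    return rates


def top_n(rates, n=10):
    """Highest rates first, restricted to n entries (the table in V1.3.1)."""
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)[:n]


# HYPOTHETICAL rows for illustration only.
sample = [
    {"state": "A", "registrar": "R1", "aadhaar_generated": 5, "rejected": 1},
    {"state": "A", "registrar": "R2", "aadhaar_generated": 3, "rejected": 1},
    {"state": "B", "registrar": "R1", "aadhaar_generated": 5, "rejected": 5},
]
by_state = rejection_rate_by(sample, "state")
by_registrar = rejection_rate_by(sample, "registrar")
print(top_n(by_state))
print(top_n(by_registrar))
```

Passing a different key, such as a gender or pincode column, would give the per-gender or per-pincode variants mentioned in V1.1.1 and V1.1.2.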

  2. Enrollment generations

Q2.1) Do some districts see higher aadhaar enrollments? Could this be associated with the efficiency of the local government officers or corporators? (How to validate this?)

Q2.2) Are there states with higher female/male enrollments?

V2.2.1) Similar visual to V1.1.1.
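As a rough first pass at Q2.2, the female share can be computed from the state/female/male table included in this gist. The three rows below are copied from that table. Note that those figures are row counts from the queries, not confirmed person counts, so the shares are only indicative. The helper name `female_share` is hypothetical.

```python
# (female, male) row counts copied from the state table in this gist.
counts = {
    "Kerala": (9290020, 9071390),
    "Assam": (74892, 176402),
    "Puducherry": (218302, 217027),
}


def female_share(female, male):
    """Fraction of female rows among female + male rows."""
    return female / (female + male)


shares = {state: female_share(f, m) for state, (f, m) in counts.items()}
for state, share in sorted(shares.items(), key=lambda item: item[1]):
    print(f"{state}: {share:.3f}")
```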

Q2.3) Do some registrars have higher aadhaar generation rates in specific states (or are the same registrars uniform in enforcing rules across states)?

Q2.4) What is the trend across different age groups?

V2.4.1) Heatmap of states/enrollment generations in age-specific brackets.
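The heatmap in V2.4.1 needs a state by age-bracket grid. Here is a minimal Python sketch of that aggregation step. The bracket boundaries (0-17, 18-39, 40-59, 60+) are arbitrary assumptions, the sample rows are hypothetical, and the code again assumes aadhaar_generated is a per-row count.

```python
from bisect import bisect_right
from collections import defaultdict

# ASSUMED bracket boundaries; adjust to whatever grouping is meaningful.
BRACKET_STARTS = [0, 18, 40, 60]
BRACKET_LABELS = ["0-17", "18-39", "40-59", "60+"]


def age_bracket(age):
    """Map an integer age to its bracket label."""
    if age < 0:
        raise ValueError("age must be non-negative")
    return BRACKET_LABELS[bisect_right(BRACKET_STARTS, age) - 1]


def heatmap_grid(rows):
    """Sum generated counts into {state: {bracket: total}}."""
    grid = defaultdict(lambda: dict.fromkeys(BRACKET_LABELS, 0))
    for state, age, generated in rows:
        grid[state][age_bracket(age)] += generated
    return dict(grid)


# HYPOTHETICAL (state, age, aadhaar_generated) rows for illustration only.
sample = [
    ("Uttar Pradesh", 22, 5),
    ("Uttar Pradesh", 30, 2),
    ("Goa", 65, 1),
]
grid = heatmap_grid(sample)
for state, cells in grid.items():
    print(state, cells)
```

Each inner dictionary is one heatmap row, so the grid can be handed to any plotting tool that accepts a matrix of values.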

  3. Do elections catalyze aadhaar enrollments?

Q3.1) Layer elections (local/state/general) data over enrollment data to check whether elections (the future promise of change) are a driver for enrollments.

state total female male
Andaman and Nicobar Islands 128117 61444 66666
Andhra Pradesh 31427847 15655798 15764298
Arunachal Pradesh 159514 76864 82644
Assam 251296 74892 176402
Bihar 27383356 12781710 14601100
Chandigarh 478403 218979 259396
Chhattisgarh 6432922 3239289 3193309
Dadra and Nagar Haveli 93078 40465 52591
Daman and Diu 66642 29272 37366
Delhi 8442508 3883532 4557023
Goa 920025 450755 468849
Gujarat 21875410 10298280 11576079
Haryana 12295317 5880582 6414282
Himachal Pradesh 3103380 1463151 1640032
Jammu and Kashmir 1736678 798368 938277
Jharkhand 11848531 5685359 6162465
Karnataka 26049015 12845961 13200028
Kerala 18367093 9290020 9071390
Lakshadweep 26908 12840 14065
Madhya Pradesh 30621885 14523308 16096839
Maharashtra 46609470 22103532 24502336
Manipur 562283 269222 292833
Meghalaya 22864 9022 13842
Mizoram 162497 80922 81574
Nagaland 431101 211893 219072
Odisha 10054344 4841563 5209311
Others 416412 202496 212814
Puducherry 435455 218302 217027
Punjab 13422914 6293201 7128878
Rajasthan 29864357 14226001 15637188
Sikkim 191025 91599 99411
Tamil Nadu 20453271 10204772 10244135
Telangana 1001171 486021 515089
Tripura 735003 359434 375453
Uttar Pradesh 55995801 26289579 29703917
Uttarakhand 3239644 1527760 1711768
West Bengal 18852128 9091851 9753526
# using google bigquery
# state-wise enrollment - includes different values in rejected/aadhaar_generated fields
SELECT state, count(*) as c FROM [aadhaar_enrollment_data.aed] group by state order by state
# gender-wise enrollment (female rows shown; repeat with the male gender value for the male column) - includes different values in rejected/aadhaar_generated fields
SELECT state, count(*) as c FROM [aadhaar_enrollment_data.aed] where gender='F' group by state order by state
state total generated rejected
Andaman and Nicobar Islands 128117 74219 8779
Andhra Pradesh 31427847 17318303 5304895
Arunachal Pradesh 159514 77831 9779
Assam 251296 189328 9017
Bihar 27383356 16607351 2327206
Chandigarh 478403 345814 49512
Chhattisgarh 6432922 3171165 646566
Dadra and Nagar Haveli 93078 43427 11077
Daman and Diu 66642 43305 8564
Delhi 8442508 4718559 1628588
Goa 920025 567320 145813
Gujarat 21875410 13219193 1673380
Haryana 12295317 7439905 1283989
Himachal Pradesh 3103380 2142149 236708
Jammu and Kashmir 1736678 810625 154519
Jharkhand 11848531 6913462 1542452
Karnataka 26049015 16140284 3024848
Kerala 18367093 10472822 2710854
Lakshadweep 26908 14138 3048
Madhya Pradesh 30621885 18808650 3549492
Maharashtra 46609470 29478948 4748208
Manipur 562283 344028 37237
Meghalaya 22864 16275 1012
Mizoram 162497 86569 9584
Nagaland 431101 180797 79079
Odisha 10054344 4734576 1644510
Others 416412 61466 89179
Puducherry 435455 223398 86905
Punjab 13422914 8534197 1436032
Rajasthan 29864357 18852727 3424914
Sikkim 191025 125432 11779
Tamil Nadu 20453271 10732809 2216639
Telangana 1001171 685957 167570
Tripura 735003 464338 117798
Uttar Pradesh 55995801 31920342 6417335
Uttarakhand 3239644 1955636 313961
West Bengal 18852128 8712184 2312243
# using google bigquery
# rows per state where rejected is exactly '1' (rows holding any other value are not counted)
SELECT state, count(*) as c FROM [aadhaar_enrollment_data.aed] where rejected='1' group by state order by state
# rows per state where aadhaar_generated is exactly '1' (rows holding any other value are not counted)
SELECT state, count(*) as c FROM [aadhaar_enrollment_data.aed] where aadhaar_generated='1' group by state order by state
@bkamapantula

Thanks for the comment Anand. That clears my question.
