Aadhaar enrollments on a given date and pincode combination are grouped together and represented in the dataset.
Here, the aadhaar data operator divides the enrollment data into X buckets. Each row represents information about a single enrollment (of a person) except the last four columns. The values in mobile_number_provided and email_provided for a row will always be less than or equal to the value in the aadhaar_generated or rejected column. Consider the following example row from the dataset:
20150420,Allahabad Bank,A-Onerealtors Pvt Ltd,Uttar Pradesh,Ambedkar Nagar,Akbarpur,224155,F,22,5,0,0,5
Here, the last four columns represent aadhaar_generated, rejected, mobile_number_provided and email_provided. In this row the values 5, 0, 0, 5 mean 5 Aadhaar numbers generated, 0 rejections, 0 enrollments with a mobile number and 5 enrollments with email information.
If this is the case, will the values in the last four columns be repeated in the rows for the other four enrollments (apart from the F/22 one shown here)?
aadhaar_generated/rejected fields
I looked up the number of unique values in the 'rejected' column. There seem to be 728 unique values. My assumption was that the column held only 0 or 1 (not rejected or rejected). Similarly, there are 1296 unique values in the 'aadhaar_generated' column. This raises the question: does 1 alone represent being rejected (or accepted, in the case of the aadhaar_generated field)? Do we need to understand what these unique values represent?
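One quick way to test the 0/1 assumption is to list the distinct values in the column and check whether they fit inside {0, 1}. The sketch below uses a handful of made-up (aadhaar_generated, rejected) pairs purely for illustration; they are not taken from the dataset, and the variable names are hypothetical.

```python
from collections import Counter

# Made-up (aadhaar_generated, rejected) pairs for illustration only;
# these are NOT rows from the real dataset.
sample = [(5, 0), (1, 0), (0, 1), (12, 3), (1, 0), (5, 0)]

# Count how often each distinct 'rejected' value occurs.
rejected_counts = Counter(rejected for _, rejected in sample)
unique_rejected = sorted(rejected_counts)

# A true yes/no flag would only ever take the values 0 and 1.
is_binary = set(unique_rejected) <= {0, 1}

print(unique_rejected)
print(is_binary)
```

If the check comes back false on the real column, as the 728 unique values suggest it would, the field is more likely a count than a flag.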
pincode
A few entries have Others in the pincode column.
@bkamapantula -- this statement is incorrect:
It would be more appropriate to say:
So if you consider the example row above:
it represents all Aadhaar submissions on 20-04-2015 through the A-Onerealtors agency at UP 224155 by 22-year-old females. The last 4 columns suggest that 5 were enrolled, 0 were rejected, 0 had mobiles and 5 had email IDs. This single row captures the information about all 5 of them, and is not repeated.
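To make the aggregate reading concrete, here is a minimal sketch that parses the example row and treats the last four columns as counts. Only the last four column names come from this thread; the first nine names are hypothetical labels, and the total assumes that generated and rejected enrollments do not overlap.

```python
import csv
from io import StringIO

# The last four names come from the thread; the first nine are
# hypothetical labels chosen for illustration.
COLUMNS = [
    "date", "registrar", "agency", "state", "district", "sub_district",
    "pincode", "gender", "age",
    "aadhaar_generated", "rejected", "mobile_number_provided", "email_provided",
]
COUNT_FIELDS = COLUMNS[-4:]

RAW = ("20150420,Allahabad Bank,A-Onerealtors Pvt Ltd,Uttar Pradesh,"
       "Ambedkar Nagar,Akbarpur,224155,F,22,5,0,0,5")


def parse_row(line):
    """Parse one CSV line into a dict, converting the count fields to int."""
    values = next(csv.reader(StringIO(line)))
    row = dict(zip(COLUMNS, values))
    for field in COUNT_FIELDS:
        row[field] = int(row[field])
    return row


row = parse_row(RAW)

# Assumption: generated and rejected are disjoint, so their sum is the
# number of people this single row stands for.
total = row["aadhaar_generated"] + row["rejected"]
print(total, row["gender"], row["age"])
```

Under that reading, the mobile and email counts can never exceed the total, which matches the less-than-or-equal observation in the original question.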