Aadhaar enrollments for a given date and pincode combination are grouped together and represented in the dataset.
Here, the enrollment data is divided into a number of buckets by the Aadhaar data operator. Each row represents information about a single enrollment (of one person), except for the last four columns. For any row, the values in mobile_number_provided and email_provided will always be less than or equal to the value in the aadhaar_generated or rejected column. Consider the following example row from the dataset:
20150420,Allahabad Bank,A-Onerealtors Pvt Ltd,Uttar Pradesh,Ambedkar Nagar,Akbarpur,224155,F,22,5,0,0,5
Here, the last four columns are aadhaar_generated, rejected, mobile_number_provided and email_provided. So the values 5, 0, 0, 5 mean 5 Aadhaar numbers generated, 0 rejections, 0 enrollments with a mobile number and 5 enrollments with an email address.
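To make the example row concrete, here is a minimal sketch that parses it and checks the mobile/email constraint described above. Only the last four column names come from the dataset description; the names of the first nine columns (date, registrar, enrolment_agency, etc.) are hypothetical labels guessed from the sample values. The check also assumes that "less than or equal to the value in aadhaar_generated or rejected" means "no more than the total number of enrollments in the row", i.e. aadhaar_generated + rejected.

```python
import csv
import io

# Hypothetical names for the first nine columns; only the last four
# names (aadhaar_generated onward) come from the dataset description.
COLUMNS = [
    "date", "registrar", "enrolment_agency", "state", "district",
    "sub_district", "pincode", "gender", "age",
    "aadhaar_generated", "rejected", "mobile_number_provided", "email_provided",
]
COUNT_COLUMNS = COLUMNS[-4:]


def parse_row(line: str) -> dict:
    """Split one CSV line into a dict and convert the four count columns to int."""
    values = next(csv.reader(io.StringIO(line)))
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(values)}")
    row = dict(zip(COLUMNS, values))
    for name in COUNT_COLUMNS:
        row[name] = int(row[name])
    return row


def counts_consistent(row: dict) -> bool:
    """Assumed invariant: mobile/email counts cannot exceed the number of
    enrollments in the row (aadhaar_generated + rejected)."""
    total = row["aadhaar_generated"] + row["rejected"]
    return row["mobile_number_provided"] <= total and row["email_provided"] <= total


row = parse_row("20150420,Allahabad Bank,A-Onerealtors Pvt Ltd,Uttar Pradesh,"
                "Ambedkar Nagar,Akbarpur,224155,F,22,5,0,0,5")
print({k: row[k] for k in COUNT_COLUMNS})
print(counts_consistent(row))
```

For the example row this prints the four counts (5, 0, 0, 5) and `True`, since both 0 and 5 are at most 5 + 0.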
If this is the case, will the values in the last four columns be repeated in the rows for the other four enrollments (apart from the F/22 one shown here)?
aadhaar_generated/rejected fields
I looked up the number of unique values in the 'rejected' column. There seem to be 728 of them, whereas I had assumed the column would only hold 0 or 1 (not rejected or rejected). Similarly, the 'aadhaar_generated' column has 1296 unique values. This raises the question: does only the value 1 represent a rejection (or, for aadhaar_generated, a generated Aadhaar)? Do we need to understand what these unique values represent?
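One way to read the large number of unique values: if each row aggregates several enrollments (as the 5, 0, 0, 5 example suggests), then rejected and aadhaar_generated are counts rather than 0/1 flags, and hundreds of distinct values would be expected. The sketch below, which assumes rows are already parsed into dicts with integer counts, tallies the distinct values in a column and tests whether it is binary. The `toy_rows` are made up for illustration and are not taken from the dataset.

```python
from collections import Counter


def unique_counts(rows, column):
    """Return a Counter mapping each distinct value of `column` to its frequency."""
    return Counter(row[column] for row in rows)


def looks_like_flag(rows, column):
    """True if the column only ever holds 0 or 1, i.e. it could be a yes/no flag."""
    return set(unique_counts(rows, column)) <= {0, 1}


# Toy rows for illustration only (not taken from the dataset).
toy_rows = [
    {"aadhaar_generated": 5, "rejected": 0},
    {"aadhaar_generated": 12, "rejected": 3},
    {"aadhaar_generated": 1, "rejected": 0},
]
print(len(unique_counts(toy_rows, "rejected")))  # 2 distinct values: 0 and 3
print(looks_like_flag(toy_rows, "rejected"))     # False, because 3 occurs
```

Running `looks_like_flag` over the full dataset would return `False` for both columns given the 728 and 1296 unique values reported above, which fits the counts interpretation.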
pincode
A few entries have the value Others in the pincode column.
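Indian PIN codes are six-digit numbers, so rows whose pincode is Others (or anything else non-numeric) probably need to be set aside before grouping by pincode. This is a minimal sketch that assumes rows are dicts with a string `pincode` field; the second sample row is hypothetical.

```python
import re

# A pincode is treated as valid if it is exactly six ASCII digits.
PINCODE_RE = re.compile(r"[0-9]{6}")


def split_by_pincode(rows):
    """Separate rows with a six-digit numeric pincode from rows like 'Others'."""
    valid, other = [], []
    for row in rows:
        target = valid if PINCODE_RE.fullmatch(row["pincode"].strip()) else other
        target.append(row)
    return valid, other


rows = [{"pincode": "224155"}, {"pincode": "Others"}]
valid, other = split_by_pincode(rows)
print(len(valid), len(other))  # 1 1
```

Keeping the Others rows in a separate list, rather than dropping them, means their counts can still be included in state- or district-level totals.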
Thanks for the comment, Anand. That answers my question.