Created
August 10, 2020 11:48
Data Quality
Validity: How closely the data meets defined business rules or constraints. Common constraints include:
>Mandatory constraints: Certain columns cannot be empty.
>Data-type constraints: Values in a column must be of a certain data type.
>Range constraints: Numbers or dates must fall between a minimum and a maximum value.
>Foreign-key constraints: Values in a column must appear in a column of another table that contains unique values.
>Unique constraints: A field, or a combination of fields, must be unique across the dataset.
>Regular expression patterns: Text fields must match a given pattern.
>Cross-field validation: Certain conditions that span multiple fields must hold.
>Set-membership constraints: A subcategory of foreign-key constraints. Values for a column come from a fixed set of discrete values or codes.
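As a rough illustration of the constraint types listed above, here is a minimal sketch in plain Python. The tables, column names, regular expression, quantity range, and status codes are all hypothetical and chosen only for the example; real rules would come from the business requirements.

```python
import re
from datetime import date

# Hypothetical sample data; table and column names are illustrative only.
customers = [{"id": 1}, {"id": 2}]
orders = [
    {"order_id": 1, "customer_id": 1, "email": "a@example.com", "qty": 2,
     "status": "shipped", "ordered": date(2020, 8, 1), "shipped": date(2020, 8, 3)},
    {"order_id": 2, "customer_id": 9, "email": "not-an-email", "qty": 0,
     "status": "lost", "ordered": date(2020, 8, 5), "shipped": date(2020, 8, 4)},
    {"order_id": 2, "customer_id": 2, "email": None, "qty": "3",
     "status": "pending", "ordered": date(2020, 8, 6), "shipped": None},
]

EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")   # deliberately simple pattern
ALLOWED_STATUS = {"pending", "shipped", "delivered"}  # assumed set of codes
customer_ids = {c["id"] for c in customers}           # referenced unique keys


def validate(rows):
    """Return (row index, violated constraint) pairs for every failed check."""
    problems = []
    seen_ids = set()
    for i, r in enumerate(rows):
        # Mandatory, then regular expression pattern
        if r["email"] is None:
            problems.append((i, "mandatory"))
        elif not EMAIL_RE.fullmatch(r["email"]):
            problems.append((i, "regex"))
        # Data type, then range (assumed limits: 1 to 100)
        if not isinstance(r["qty"], int):
            problems.append((i, "data-type"))
        elif not 1 <= r["qty"] <= 100:
            problems.append((i, "range"))
        # Foreign key: the customer must exist in the other table
        if r["customer_id"] not in customer_ids:
            problems.append((i, "foreign-key"))
        # Unique: order_id may appear only once
        if r["order_id"] in seen_ids:
            problems.append((i, "unique"))
        seen_ids.add(r["order_id"])
        # Set membership: status must be one of the known codes
        if r["status"] not in ALLOWED_STATUS:
            problems.append((i, "set-membership"))
        # Cross-field: an order cannot ship before it was placed
        if r["shipped"] is not None and r["shipped"] < r["ordered"]:
            problems.append((i, "cross-field"))
    return problems


for index, constraint in validate(orders):
    print(f"row {index}: {constraint} constraint violated")
```

Reporting every violation per row, rather than stopping at the first one, makes it easier to see which rules fail most often across a dataset.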
Accuracy: How closely the data conforms to a standard or a true value.
Completeness: How thoroughly the data and related measures are known, i.e. how much of the required data is present.
Consistency: The equivalence of measures across systems and subjects.
Uniformity: Whether the same units of measure are used in all systems.
Traceability: Being able to find (and access) the source of the data.
Timeliness: How quickly and how recently the data has been updated.
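Some of these dimensions can be turned into simple measurements. The sketch below is one possible way to quantify completeness, enforce uniformity of units, and flag stale records for timeliness. The sensor readings, field names, and the 12-hour freshness threshold are assumptions made up for the example.

```python
from datetime import datetime, timedelta

# Hypothetical sensor readings; field names and values are made up.
readings = [
    {"temp": 21.5, "unit": "C", "updated": datetime(2020, 8, 10, 11, 0)},
    {"temp": 70.7, "unit": "F", "updated": datetime(2020, 8, 9, 11, 0)},
    {"temp": None, "unit": "C", "updated": datetime(2020, 8, 10, 10, 0)},
]


def completeness(rows, field):
    """Completeness: fraction of rows in which `field` is present (not None)."""
    return sum(r[field] is not None for r in rows) / len(rows)


def to_celsius(row):
    """Uniformity: express every temperature in the same unit (Celsius)."""
    if row["temp"] is not None and row["unit"] == "F":
        celsius = round((row["temp"] - 32) * 5 / 9, 1)
        return {**row, "temp": celsius, "unit": "C"}
    return row


def stale(rows, now, max_age):
    """Timeliness: indexes of rows last updated more than `max_age` ago."""
    return [i for i, r in enumerate(rows) if now - r["updated"] > max_age]


now = datetime(2020, 8, 10, 12, 0)
print(completeness(readings, "temp"))             # share of rows with a temperature
print([to_celsius(r)["unit"] for r in readings])  # every unit becomes "C"
print(stale(readings, now, timedelta(hours=12)))  # rows older than 12 hours
```

Accuracy, consistency, and traceability are harder to check from a single table, since they need a trusted reference value, a second system to compare against, or lineage metadata.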