Gist ldodds/7750495, created December 2, 2013 14:45
Examples of describing a CSV file using a range of formats supported by different tools, including chkcsv.py, csv-validator, datapackage.json, and schema.ini
datapackage.json:
{
  "name": "CSV Validation Example",
  "resources": [
    {
      "name": "Land Registry Example Data",
      "path": "../lr-pp-nov-2013.csv",
      "format": "csv",
      "mediatype": "text/csv",
      "encoding": "UTF-8",
      "dialect": {
        "delimiter": ",",
        "lineterminator": "\r\n",
        "quotechar": "\""
      },
      "schema": {
        "fields": [
          {
            "name": "ID",
            "title": "Transaction unique identifier",
            "description": "A reference number which is generated automatically recording each published sale. The number is unique and will change each time a sale is recorded",
            "type": "string"
          },
          {
            "name": "Price",
            "title": "Price",
            "description": "Sale price stated on the Transfer deed",
            "type": "integer"
          },
          {
            "name": "Date of Transfer",
            "title": "Date of Transfer",
            "description": "Date when the sale was completed, as stated on the Transfer deed",
            "type": "datetime",
            "format": "YYYY-MM-DD hh:mm"
          },
          {
            "name": "Postcode",
            "title": "Postcode",
            "type": "string"
          },
          {
            "name": "Property Type",
            "title": "Property Type",
            "description": "D-Detached, S-Semi-Detached, T-Terraced, F-Flats/Maisonettes",
            "type": "string"
          },
          {
            "name": "Old/New",
            "title": "Old/New",
            "description": "Y = a newly built property, N = an established residential building",
            "type": "string"
          },
          {
            "name": "Duration",
            "title": "Duration",
            "description": "Relates to the tenure. F-Freehold, L-Leasehold, etc.",
            "type": "string"
          },
          {
            "name": "PAON",
            "title": "Primary Addressable Object Name",
            "description": "Primary Addressable Object Name. If there is a sub-building, for example the building is divided into flats, see Secondary Addressable Object Name (SAON)",
            "type": "string"
          },
          {
            "name": "SAON",
            "title": "Secondary Addressable Object Name",
            "description": "Secondary Addressable Object Name. If there is a sub-building, for example the building is divided into flats, there will be a SAON",
            "type": "string"
          },
          {
            "name": "Street",
            "title": "Street",
            "type": "string"
          },
          {
            "name": "Locality",
            "title": "Locality",
            "type": "string"
          },
          {
            "name": "Town/City",
            "title": "Town/City",
            "type": "string"
          },
          {
            "name": "Local Authority",
            "title": "Local Authority",
            "type": "string"
          },
          {
            "name": "County",
            "title": "County",
            "type": "string"
          },
          {
            "name": "Record Status",
            "title": "Record Status",
            "description": "Indicates additions, changes and deletions to the records",
            "type": "string"
          }
        ]
      }
    }
  ]
}
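To make the descriptor's types concrete, here is a minimal Python sketch (not the Data Package tooling itself) that reads CSV text with the declared dialect and checks the `integer` and `datetime` fields. The translation of `YYYY-MM-DD hh:mm` into the `strptime` pattern `%Y-%m-%d %H:%M`, the helper names `check_value` and `validate`, and the `has_header` switch are illustrative assumptions, and only three of the fifteen fields are reproduced.

```python
import csv
import io
from datetime import datetime

# Illustrative subset of the descriptor above: the dialect and three fields.
DIALECT = {"delimiter": ",", "lineterminator": "\r\n", "quotechar": "\""}
FIELDS = [
    {"name": "ID", "type": "string"},
    {"name": "Price", "type": "integer"},
    {"name": "Date of Transfer", "type": "datetime", "format": "YYYY-MM-DD hh:mm"},
]

# Assumed translation of the descriptor's date pattern into strptime syntax.
STRPTIME_FORMATS = {"YYYY-MM-DD hh:mm": "%Y-%m-%d %H:%M"}


def check_value(field, value):
    """Return an error message, or None if the value fits the field's type."""
    kind = field["type"]
    try:
        if kind == "integer":
            int(value)
        elif kind == "datetime":
            datetime.strptime(value, STRPTIME_FORMATS[field["format"]])
    except ValueError:
        return f"{field['name']}: {value!r} is not a valid {kind}"
    return None  # "string" fields accept any text


def validate(text, has_header=False):
    """Check CSV text against FIELDS and return a list of error messages."""
    # csv.reader always recognises \r\n line endings and ignores
    # lineterminator, so only delimiter and quotechar are passed through.
    reader = csv.reader(
        io.StringIO(text, newline=""),
        delimiter=DIALECT["delimiter"],
        quotechar=DIALECT["quotechar"],
    )
    first_row = 1
    if has_header:
        next(reader, None)  # skip the header row
        first_row = 2
    errors = []
    for row_no, row in enumerate(reader, start=first_row):
        for field, value in zip(FIELDS, row):
            message = check_value(field, value)
            if message:
                errors.append(f"row {row_no}: {message}")
    return errors


sample = (
    "{3B0DA29C-C89A-4FAA-918A-0000074FA0E0},190000,2013-10-04 00:00\r\n"
    "X,12.5,04/10/2013\r\n"
)
print(validate(sample))
```

The second sample row fails on both the integer and the datetime check, while the first row passes.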
Example data:
ID | Price | Date of Transfer | Postcode | Property Type | Old/New | Duration | PAON | SAON | Street | Locality | Town/City | Local Authority | County | Record Status | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
{3B0DA29C-C89A-4FAA-918A-0000074FA0E0} | 190000 | 2013-10-04 00:00 | SN14 8LU | T | N | F | 148 | HIGH STREET | MARSHFIELD | CHIPPENHAM | SOUTH GLOUCESTERSHIRE | SOUTH GLOUCESTERSHIRE | A | ||
{55743403-C4CB-459D-8B15-000110F3CFCA} | 420000 | 2013-10-04 00:00 | RG42 6LN | S | N | F | SIDMOUTH COTTAGES | 2 | BRACKNELL ROAD | BROCK HILL | BRACKNELL | BRACKNELL FOREST | BRACKNELL FOREST | A | |
{D13EF4A0-8B61-4886-BADA-0001780BD6BA} | 250000 | 2013-06-28 00:00 | SK17 8SN | T | N | F | THE MILL | MILLERS DALE | BUXTON | HIGH PEAK | DERBYSHIRE | A | |||
{3D74FE62-3423-4F13-8FEF-0001D8030578} | 179950 | 2013-08-28 00:00 | OX16 9LW | S | N | F | 11 | WESLEY DRIVE | BANBURY | CHERWELL | OXFORDSHIRE | A | |||
{8BA9EA94-0A29-4195-947F-000210F13EF0} | 310000 | 2013-10-18 00:00 | BA13 4LA | D | N | F | 6 | HAWKERIDGE | WESTBURY | WILTSHIRE | WILTSHIRE | A | |||
{65C935D7-2F81-4D29-9CA7-0002AA741584} | 360000 | 2013-09-25 00:00 | RH4 3DX | T | N | F | 7 | WESTFIELD GARDENS | DORKING | MOLE VALLEY | SURREY | A | |||
{7F475813-7D15-4261-AD23-00031DEF95DF} | 167500 | 2013-10-10 00:00 | NR2 2BE | T | N | F | 100 | CAMBRIDGE STREET | NORWICH | NORWICH | NORFOLK | A | |||
{D79BCD49-244F-451D-B57E-0004431BF677} | 180000 | 2013-10-25 00:00 | ME2 3TS | S | N | F | 12 | CADNAM CLOSE | ROCHESTER | MEDWAY | MEDWAY | A | |||
{9D7CAEBE-51DB-4817-81ED-0004542E3D87} | 142500 | 2013-08-30 00:00 | HD9 1LT | S | N | F | 29 | DALESIDE AVENUE | NEW MILL | HOLMFIRTH | KIRKLEES | WEST YORKSHIRE | A | ||
{AA34922F-6466-4284-AF22-00058CBFC762} | 94000 | 2013-10-18 00:00 | DE7 9HJ | S | N | F | 26 | BARCLAY COURT | ILKESTON | EREWASH | DERBYSHIRE | A |
chkcsv.py format specification:
[ID]
data_required=True
type=string
minlen=38
maxlen=38
pattern=\{[a-fA-F0-9]{8}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{12}\}
[Price]
data_required=True
type=integer
[Date of Transfer]
data_required=True
type=string
pattern=[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}
[Postcode]
data_required=True
type=string
pattern=[A-Z]{1,2}[0-9][0-9A-Z]? ?[0-9][A-Z]{2}
[Property Type]
data_required=True
type=string
pattern=(D|S|T|F)
[Old/New]
data_required=True
type=string
pattern=(Y|N)
[Duration]
data_required=True
type=string
pattern=(F|L)
[PAON]
data_required=False
[SAON]
data_required=False
[Street]
data_required=False
[Locality]
data_required=False
[Town/City]
data_required=True
[Local Authority]
data_required=True
[County]
data_required=True
[Record Status]
data_required=True
type=string
pattern=(A|C|D)
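The specification above is an INI-style file, so its rules can be read with Python's `configparser`. The sketch below is a hypothetical re-implementation of a few rule keys (`data_required`, `type=integer`, `minlen`, `maxlen`, `pattern`), not chkcsv.py itself. It assumes a pattern must match the whole value (`re.fullmatch`), which may differ from how chkcsv.py applies patterns; `load_spec` and `check_record` are illustrative names.

```python
import configparser
import re

# A few sections copied from the chkcsv.py specification above.
SPEC = r"""
[ID]
data_required=True
type=string
minlen=38
maxlen=38
pattern=\{[a-fA-F0-9]{8}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{12}\}
[Price]
data_required=True
type=integer
[Property Type]
data_required=True
type=string
pattern=(D|S|T|F)
[SAON]
data_required=False
"""


def load_spec(text):
    """Parse the spec; interpolation is off so '%' in patterns is literal."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return parser


def check_record(spec, record):
    """Check a dict of column name -> value and return a list of errors."""
    errors = []
    for column in spec.sections():
        rules = spec[column]
        value = record.get(column, "")
        if value == "":
            # Empty values only fail when the column is marked as required.
            if rules.getboolean("data_required", fallback=False):
                errors.append(f"{column}: value is required")
            continue
        if rules.get("type") == "integer" and not re.fullmatch(r"[+-]?[0-9]+", value):
            errors.append(f"{column}: {value!r} is not an integer")
        if "minlen" in rules and len(value) < rules.getint("minlen"):
            errors.append(f"{column}: {value!r} is shorter than {rules['minlen']}")
        if "maxlen" in rules and len(value) > rules.getint("maxlen"):
            errors.append(f"{column}: {value!r} is longer than {rules['maxlen']}")
        if "pattern" in rules and not re.fullmatch(rules["pattern"], value):
            errors.append(f"{column}: {value!r} does not match {rules['pattern']}")
    return errors


spec = load_spec(SPEC)
print(check_record(spec, {"ID": "", "Price": "19.5", "Property Type": "X"}))
```

Turning interpolation off keeps any `%` characters in future patterns from being treated as `configparser` substitutions.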
schema.ini:
[lr-pp-nov-2013.csv]
ColNameHeader=True
Format=CSVDelimited
CharacterSet=UTF-8
DateTimeFormat=YYYY-MM-DD hh:mm
Col1="ID" Text Width 38
Col2="Price" Long
Col3="Date of Transfer" DateTime
Col4="Postcode" Text
Col5="Property Type" Text Width 1
Col6="Old/New" Text Width 1
Col7="Duration" Text Width 1
Col8="PAON" Text
Col9="SAON" Text
Col10="Street" Text
Col11="Locality" Text
Col12="Town/City" Text
Col13="Local Authority" Text
Col14="County" Text
Col15="Record Status" Text Width 1
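Each `ColN=` line above packs a column number, name, type and optional width into one line. The Python sketch below parses those lines and applies two checks: text width, and the integer range implied by the type. The ranges are an assumption based on our reading of Microsoft's Schema.ini documentation, which lists the ODBC type `Integer` as the same as `Short` (16-bit) and `Long` as 32-bit; on that reading a price such as 420000 does not fit in `Integer`. The helpers `parse_columns` and `check_row` are hypothetical and are not part of the Jet text driver.

```python
import re

# Matches column definitions such as  Col1="ID" Text Width 38
COLUMN_LINE = re.compile(r'Col(\d+)="([^"]+)"\s+(\w+)(?:\s+Width\s+(\d+))?')

# Assumed integer ranges for the type names used in schema.ini.
INT_RANGES = {
    "Short": (-32768, 32767),
    "Integer": (-32768, 32767),  # assumption: ODBC Integer behaves like Short
    "Long": (-2**31, 2**31 - 1),
}


def parse_columns(schema_text):
    """Return {column number: (name, type, width or None)}."""
    columns = {}
    for line in schema_text.splitlines():
        match = COLUMN_LINE.match(line.strip())
        if match:
            number, name, kind, width = match.groups()
            columns[int(number)] = (name, kind, int(width) if width else None)
    return columns


def check_row(columns, row):
    """Check one already-split CSV row; column numbers are 1-based."""
    errors = []
    for number, (name, kind, width) in sorted(columns.items()):
        value = row[number - 1] if number <= len(row) else ""
        if width is not None and len(value) > width:
            errors.append(f"{name}: {value!r} is longer than {width}")
        if kind in INT_RANGES and value:
            low, high = INT_RANGES[kind]
            try:
                if not low <= int(value) <= high:
                    errors.append(f"{name}: {value} is outside the {kind} range")
            except ValueError:
                errors.append(f"{name}: {value!r} is not a whole number")
    return errors


SCHEMA = '''[lr-pp-nov-2013.csv]
ColNameHeader=True
Col1="ID" Text Width 38
Col2="Price" Integer
Col5="Property Type" Text Width 1
'''
row = ["{55743403-C4CB-459D-8B15-000110F3CFCA}", "420000",
       "2013-10-04 00:00", "RG42 6LN", "S"]
print(check_row(parse_columns(SCHEMA), row))
```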
csv-validator schema:
// See https://github.com/digital-preservation/csv-validator
version 1.0
@totalColumns 15
// The directives below may not yet be supported:
//@quoted
//@separator ','
ID: unique length(38)
Price: positiveInteger
/*
The following rules refer to columns by number because the tool restricts column names (these headers contain spaces or slashes).
Because of limitations in expressing dates, the date is checked with a regex.
*/
2: regex("[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}")
Postcode: regex("[A-Z]{1,2}[0-9][0-9A-Z]? ?[0-9][A-Z]{2}")
4: regex("(D|S|T|F)")
5: regex("(Y|N)")
Duration: regex("(F|L)")
PAON: @optional
SAON: @optional
Street: @optional
Locality: @optional
11: @optional
12:
County:
14: regex("(A|C|D)")
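To show what the CSV Schema rules above require of each row, here is a small Python sketch that approximates `@totalColumns 15`, `unique`, `length(38)`, `positiveInteger` (treated here as one or more ASCII digits) and the `regex` rules. It is not csv-validator: it uses 0-based Python list indices and assumes each regex must match the whole value (`re.fullmatch`), which may not match the tool's exact semantics. `validate_rows` and `REGEX_RULES` are illustrative names.

```python
import re

TOTAL_COLUMNS = 15  # @totalColumns 15

# 0-based column index -> pattern, mirroring the regex rules above.
REGEX_RULES = {
    2: r"[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}",  # Date of Transfer
    3: r"[A-Z]{1,2}[0-9][0-9A-Z]? ?[0-9][A-Z]{2}",      # Postcode
    4: r"(D|S|T|F)",                                    # Property Type
    5: r"(Y|N)",                                        # Old/New
    6: r"(F|L)",                                        # Duration
    14: r"(A|C|D)",                                     # Record Status
}


def validate_rows(rows):
    """Check already-split rows and return a list of error messages."""
    errors = []
    seen_ids = set()
    for row_no, row in enumerate(rows, start=1):
        if len(row) != TOTAL_COLUMNS:
            errors.append(f"row {row_no}: expected {TOTAL_COLUMNS} columns, got {len(row)}")
            continue
        record_id = row[0]
        if len(record_id) != 38:  # ID: length(38)
            errors.append(f"row {row_no}: ID {record_id!r} is not 38 characters")
        if record_id in seen_ids:  # ID: unique
            errors.append(f"row {row_no}: duplicate ID {record_id}")
        seen_ids.add(record_id)
        if not re.fullmatch(r"[0-9]+", row[1]):  # Price: positiveInteger
            errors.append(f"row {row_no}: Price {row[1]!r} is not a positive integer")
        for index, pattern in sorted(REGEX_RULES.items()):
            value = row[index]
            if not re.fullmatch(pattern, value):
                errors.append(f"row {row_no}: column {index} value {value!r} fails regex {pattern}")
    return errors


good = ["{3B0DA29C-C89A-4FAA-918A-0000074FA0E0}", "190000", "2013-10-04 00:00",
        "SN14 8LU", "T", "N", "F", "148", "", "HIGH STREET", "MARSHFIELD",
        "CHIPPENHAM", "SOUTH GLOUCESTERSHIRE", "SOUTH GLOUCESTERSHIRE", "A"]
print(validate_rows([good, good]))
```

Passing the same row twice triggers only the `unique` rule.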
How do I validate the above CSV? Can you please let me know the commands?