Last active
May 29, 2023 16:16
A very simple human-friendly parsable data format. I use this in a number of projects.
# Format Objectives
1. Suitable for editing by humans, especially those with minimal technology skills.
2. Readable by computers.
3. Readability by humans is preferred over reducing file size.
# Basic Format
* The basic line format is a name, some space(s) and a value.
* A record starts with RECORD or # (then an optional identifier).
# Examples...
RECORD Customer Example
customer-name Fred Smith
customer-email [email protected]
customer-phone +1 555 123 4567
RECORD Fruit Example
- Grapes
- Oranges
- Peaches
* Mandarines
* Strawberries
* Raspberries
# Detailed Description
- A document (or stream) consists of multiple records.
- Blank lines are usually ignored (omitted).
- White space (spaces and tabs) at the beginning or end of a line is ignored (trimmed).
- A line format is generally:
  - ..........NAME........VALUE........
  - where '.' represents white-space.
- A record is a group of fields (named, numbered or unnamed) and their corresponding values.
- A record consists of a record marker line followed by any number of record data lines.
- The start of a record is marked by the capitalised word RECORD or the hash symbol #.
- # and RECORD are equivalent and interchangeable.
- A record identifier may be included on the same line as a record marker.
  - example: RECORD Customer
  - example: # Customer
- A record data line consists of:
  1. The field name at the start
  2. then optionally one or more whitespace characters (tab, space, non-breaking space) followed by the field value.
- The field name shall never contain a space. A space could be represented by one of the following if absolutely necessary: %20 + - _
- The amount of separating space is arbitrary and needn't be consistent.
  - field-example-1 Single space.
  - field-example-2          Many, many spaces in-between.
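The record data line rules above can be sketched in Python. This is a minimal illustration, not part of the format definition: the helper names `is_whitespace` and `split_data_line` are hypothetical, and whitespace is assumed to mean a tab or any character in Unicode General Category Zs (which covers the space and the non-breaking space).

```python
import unicodedata


def is_whitespace(ch):
    """True for a tab or any Unicode General Category Zs character."""
    return ch == "\t" or unicodedata.category(ch) == "Zs"


def split_data_line(line):
    """Split one record data line into (field name, field value)."""
    # Trim whitespace at both ends of the line.
    start, end = 0, len(line)
    while start < end and is_whitespace(line[start]):
        start += 1
    while end > start and is_whitespace(line[end - 1]):
        end -= 1
    body = line[start:end]

    # The field name runs up to the first whitespace character.
    i = 0
    while i < len(body) and not is_whitespace(body[i]):
        i += 1
    name = body[:i]

    # Skip the whole run of separating whitespace; the rest is the value.
    while i < len(body) and is_whitespace(body[i]):
        i += 1
    return name, body[i:]


print(split_data_line("  field-example-2 \t    Many, many spaces in-between.  "))
```

A line that holds only a name yields an empty value, which matches the value being optional in the description above.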
# Algorithm
1 Whitespace is a tab \t 0x09, or any character in Unicode General Category Zs (Space Separator).
2 Read line up to (and excluding) LF \n 0x0A
3 Skip any CR \r 0x0D
4 Skip any whitespace characters at the very start and very end of the line
5 Find the first whitespace character, together with any whitespace characters immediately following it. This run is the field separator/delimiter.
6 Extract the line's text BEFORE the separator. This is the field name or key.
7 Extract the line's text AFTER the separator, excluding any trailing whitespace. This is the field value.
8 Process the field name '#' or 'RECORD' as the start of a new record, using the field value as the record type (class) or identity (id).
9 Process the field name '-' or '*' as an unordered list item.
10 Optionally process a numeric field name (with optional trailing period) as a numbered list item.
11 Otherwise, process the field name, ideally matching it against a whitelist of field names for the specified record type/identity.
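The steps above could be sketched in Python as follows. This is one possible reading of the algorithm, not a reference implementation. The function name `parse`, the output shape (a list of dictionaries with "id" and "fields"), the kind labels, and the `known_fields` whitelist mapping are all assumptions made for illustration. Data lines that appear before any record marker are placed in an unnamed record, which is also an assumption.

```python
import unicodedata

RECORD_MARKERS = {"#", "RECORD"}
LIST_MARKERS = {"-", "*"}


def is_whitespace(ch):
    # Step 1: a tab, or any character in Unicode General Category Zs.
    return ch == "\t" or unicodedata.category(ch) == "Zs"


def trim(text):
    # Remove whitespace (as defined in step 1) from both ends.
    start, end = 0, len(text)
    while start < end and is_whitespace(text[start]):
        start += 1
    while end > start and is_whitespace(text[end - 1]):
        end -= 1
    return text[start:end]


def parse(text, known_fields=None):
    """Return records as {"id": str, "fields": [(kind, name, value), ...]}.

    known_fields is a hypothetical whitelist: record id -> allowed field names.
    A record id without an entry accepts every field name.
    """
    records = []
    for raw in text.split("\n"):                       # step 2
        line = trim(raw.replace("\r", ""))             # steps 3 and 4
        if not line:
            continue                                   # blank lines are ignored
        cut = next((i for i, ch in enumerate(line) if is_whitespace(ch)), len(line))
        name, value = line[:cut], trim(line[cut:])     # steps 5 to 7
        if name in RECORD_MARKERS:                     # step 8
            records.append({"id": value, "fields": []})
            continue
        if not records:
            # Assumption: data before any marker goes into an unnamed record.
            records.append({"id": "", "fields": []})
        current = records[-1]
        digits = name[:-1] if name.endswith(".") else name
        if name in LIST_MARKERS:                       # step 9
            current["fields"].append(("list", name, value))
        elif digits.isdecimal():                       # step 10
            current["fields"].append(("numbered", digits, value))
        else:                                          # step 11
            allowed = (known_fields or {}).get(current["id"])
            if allowed is None or name in allowed:
                current["fields"].append(("field", name, value))
    return records


sample = "RECORD Customer Example\n customer-name   Fred Smith\n\n# Fruit Example\n 1. Grapes\n * Mandarines\n"
for record in parse(sample):
    print(record["id"], record["fields"])
```

Grouping fields under their record is a design choice made here for readability. A flat list of name and value pairs would follow the same steps.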
# Finally...
format-author Andrew Kingdom
format-copyright 2023, all rights reserved by the author
format-name Line Record Format
format-license CC-by (Use freely, retain copyright notices)
media-MIME-type text/line-record
file-extension .txt -or- .rl (whitespace separated record lines)
* Line Record refers to the data being one-dimensional, versus a 'flat' two-dimensional data table or hierarchical data.
RECORD Customer Example
customer-name Fred Smith
customer-email [email protected]
customer-phone +1 555 123 4567
# Messy Fruit Example
- Grapes
- Oranges
- Peaches
* Mandarines
* Strawberries
* Raspberries
# Neat Fruit Example
- Grapes
- Oranges
- Peaches
- Mandarines
- Strawberries
- Raspberries
// Example code in Objective-C
/* Displays...
Input:
    ""
    " RECORD Customer Example"
    " customer-name Fred Smith"
    " customer-email [email protected]"
    " customer-phone +1 555 123 4567"
    " "
    " RECORD Fruit Example"
    " 1 Grapes"
    " 2 Oranges"
    " - Peaches "
    " * Mandarines"
    " * Strawberries"
    " * Raspberries"
Output:
    # : Customer Example
    customer-name : Fred Smith
    customer-email : [email protected]
    customer-phone : +1 555 123 4567
    # : Fruit Example
    1 : Grapes
    2 : Oranges
    - : Peaches
    * : Mandarines
    * : Strawberries
    * : Raspberries
*/
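// Note: `auto` is used for type deduction below, so compile this file as Objective-C++ (.mm).
// The original header is not shown; this minimal interface keeps the example self-contained.
#import <Foundation/Foundation.h>
@interface LineRecordString : NSObject
- (void)testLRS;
- (NSArray<NSDictionary<NSString*,NSString*>*>*) parseLineRecordString:(NSString*)source recognisedFieldNames:(NSArray<NSString*>*)knownFields;
@end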
@implementation LineRecordString
- (void)testLRS
{
    auto test = @"\n RECORD Customer Example\n customer-name Fred Smith\n customer-email [email protected]\n customer-phone +1 555 123 4567\n \n RECORD Fruit Example\n 1 Grapes\n 2 Oranges\n - Peaches \n * Mandarines\n * Strawberries\n * Raspberries";
    NSLog(@"Input:");
    for (NSString * line in [test componentsSeparatedByString:@"\n"]) {
        NSLog(@"\"%@\"",line);
    }
    NSLog(@"Output:");
    auto knownFields = @[@"customer-name",@"customer-email",@"customer-phone"]; // specific field names to include
    NSArray<NSDictionary<NSString*,NSString*>*>* arr = [self parseLineRecordString:test recognisedFieldNames:knownFields];
    for (NSDictionary<NSString*,NSString*>* item in arr) {
        auto key = [item allKeys][0];
        auto value = [item allValues][0];
        NSLog(@"%@ : %@",key,value);
    }
    return;
}
// Objective-C example parser
- (NSArray<NSDictionary<NSString*,NSString*>*>*) parseLineRecordString:(NSString*)source recognisedFieldNames:(NSArray<NSString*>*)knownFields
{
    auto keyNewrecord = @"RECORD";
    auto keyNewrec = @"#";
    auto keyTitle = @"TITLE";
    auto keyList1 = @"-";
    auto keyList2 = @"*";
    if(source == nil || source.length == 0) {return nil;}
    // Parse the file...
    // This is an array of single-record-dictionary key-value pairs, to allow flexibility. This could also be a dictionary using the TITLE field as a key, adding other valid fields to the current record:
    NSMutableArray<NSDictionary<NSString*,NSString*>*>* build = [[NSMutableArray<NSDictionary<NSString*,NSString*>*> alloc]init];
    // Drop every CR, then split on LF. (The original also replaced "'" with "\'", which is the same character, so that no-op call is omitted.)
    auto lines = [[source
        stringByReplacingOccurrencesOfString:@"\r" withString:@""]
        componentsSeparatedByString:@"\n"];
    for (NSString * line in lines) {
        // two fields separated by any amount of whitespace on the same line.
        NSString * trimline = [line stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceCharacterSet]]; // WWWWWabcdefh ijklmnoWWWWWWW -- remove extreme whitespace (shown as W)
        if(trimline.length != 0) {
            // Separate the fields (1 or 2) -- field 1 contains no whitespace.
            NSString* field1; // key
            NSString* field2; // data
            NSRange pos = [trimline rangeOfCharacterFromSet:[NSCharacterSet whitespaceCharacterSet]]; // abcdefghWWWWijklmno -- find next whitespace, shown as W
            if(pos.location == NSNotFound) {
                field1 = trimline;
                field2 = @"";
            } else {
                field1 = [[trimline substringToIndex:pos.location]stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceCharacterSet]]; // extract field 1
                field2 = [[trimline substringFromIndex:pos.location + pos.length] stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceCharacterSet]]; // extract field 2
            }
            // ...optionally do extra processing here...
            // recognised field name...
            if([knownFields containsObject: field1]) {
                [build addObject:@{field1:field2}];
            }
            // record field type...
            else if(
                    [field1 isEqualToString:keyNewrecord] ||
                    [field1 isEqualToString:keyNewrec]) {
                [build addObject:@{keyNewrec:field2}]; // use the short version
            }
            // key field type
            else if(
                    [field1 isEqualToString:keyTitle]) {
                [build addObject:@{field1:field2}];
            }
            // other special field names
            else if(
                    [field1 isEqualToString:keyList1] ||
                    [field1 isEqualToString:keyList2]
                    ) {
                [build addObject:@{field1:field2}];
            }
            // it's a numbered list (all digits; a trailing period such as "1." is not recognised by this example)
            else if( [NSCharacterSet.decimalDigitCharacterSet isSupersetOfSet:[NSCharacterSet characterSetWithCharactersInString:field1]]) {
                [build addObject:@{field1:field2}];
            }
            // else skip the line.
        }
    }
    return build;
}
@end
Sharing this as a rough reference for clients and developers.