@akingdom
Last active May 29, 2023 16:16
A very simple human-friendly parsable data format. I use this in a number of projects.
# Format Objectives
1. Suitable for editing by humans, especially those with minimal technology skills.
2. Readable by computers.
3. Readability by humans is preferred over reducing file size.
# Basic Format
* The basic line format is a name, some space(s) and a value.
* A record starts with RECORD or # (then an optional identifier).
# Examples...
RECORD Customer Example
customer-name Fred Smith
customer-email [email protected]
customer-phone +1 555 123 4567
RECORD Fruit Example
- Grapes
- Oranges
- Peaches
* Mandarines
* Strawberries
* Raspberries
# Detailed Description
- A document (or stream) consists of multiple records.
- Blank lines are usually ignored (omitted).
- White space (spaces and tabs) at the beginning or end of a line is ignored (trimmed).
- A line format is generally:
- ..........NAME........VALUE........
- where '.' represents white-space.
- A record is a group of fields (named, numbered or unnamed) and their corresponding values.
- A record consists of a record marker line followed by any number of record data lines.
- The start of a record is marked by the capitalised word RECORD or the hash symbol #.
- # and RECORD are equivalent and interchangeable.
- A record identifier may be included on the same line as a record marker.
- example: RECORD Customer
- example: # Customer
- A record data line consists of:
1. The field name at the start
2. Then, optionally, one or more whitespace characters (tab, space, non-breaking space) followed by the field value.
- The field name shall never contain a space. A space could be represented by one of the following if absolutely necessary: %20 + - _
- The amount of separating space is arbitrary and needn't be consistent.
- field-example-1 Single space.
- field-example-2          Many, many spaces in-between.
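As a rough illustration of the record data line rule above, the following Python sketch splits one line into a field name and a value. The function name `split_field` is hypothetical, and Python's built-in `str.split` treats a slightly wider set of characters as whitespace than this format defines, so treat it as an approximation rather than a reference implementation.

```python
def split_field(line):
    """Split one record data line into (name, value).

    Hypothetical helper: relies on Python's own notion of whitespace,
    which only approximates the format's definition.
    """
    parts = line.split(None, 1)  # split once, at the first run of whitespace
    if not parts:
        return None  # blank line: nothing to parse
    name = parts[0]
    value = parts[1].strip() if len(parts) > 1 else ""
    return name, value


print(split_field("field-example-1 Single space."))
print(split_field("   field-example-2 \t    Many, many spaces in-between.   "))
print(split_field("customer-phone\u00a0+1 555 123 4567"))
```

Only the first run of whitespace acts as the separator, so spaces inside the value (as in the phone number) are preserved.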
# Algorithm
1 Whitespace is a tab \t 0x09, or any character in Unicode General Category Zs (Space Separator).
2 Read line up to (and excluding) LF \n 0x0A
3 Skip any CR \r 0x0D
4 Skip any whitespace characters at the very start and very end of the line
5 Find the first whitespace character (skipping any immediately following whitespace characters). This is the field separator/delimiter.
6 Extract the line's text BEFORE the separator. This is the field name or key.
7 Extract the line's text AFTER the separator, excluding any trailing whitespace. This is the field value.
8 Process the field name '#' or 'RECORD' as the start of a new record, using the field value as the record type (class) or identity (id).
9 Process the field name '-' or '*' as an unordered list item.
10 Optionally process a numeric field name (with optional trailing period) as a numbered list item.
11 Otherwise, process the field name, ideally matching it against a whitelist of field names for the specified record type/identity.
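The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the names `is_ws`, `trim` and `parse`, the `known_fields` parameter, and the `(kind, name, value)` tuple shape are all hypothetical. Whitespace follows step 1 literally (tab, or Unicode category Zs, checked with the standard `unicodedata` module).

```python
import unicodedata


def is_ws(ch):
    """Step 1: a tab, or any character in Unicode category Zs."""
    return ch == "\t" or unicodedata.category(ch) == "Zs"


def trim(text):
    """Steps 4 and 7: drop format-defined whitespace from both ends."""
    start, end = 0, len(text)
    while start < end and is_ws(text[start]):
        start += 1
    while end > start and is_ws(text[end - 1]):
        end -= 1
    return text[start:end]


def parse(source, known_fields=None):
    """Return (kind, name, value) tuples, one per recognised line.

    known_fields is an optional whitelist (step 11); None accepts any name.
    """
    out = []
    for raw in source.split("\n"):              # step 2
        line = trim(raw.replace("\r", ""))      # steps 3 and 4
        if not line:
            continue                            # blank lines are ignored
        # Step 5: index of the first whitespace character, if there is one.
        cut = next((i for i, ch in enumerate(line) if is_ws(ch)), len(line))
        name = line[:cut]                       # step 6
        value = trim(line[cut:])                # step 7
        digits = name[:-1] if name.endswith(".") else name
        if name in ("#", "RECORD"):             # step 8
            kind = "record"
        elif name in ("-", "*"):                # step 9
            kind = "bullet"
        elif digits.isdecimal():                # step 10
            kind = "numbered"
        elif known_fields is None or name in known_fields:
            kind = "field"                      # step 11
        else:
            continue                            # unrecognised name: skip
        out.append((kind, name, value))
    return out


sample = "RECORD Customer Example\r\n  customer-name \t Fred Smith  \n\n1. Grapes\n* Peaches\n"
for item in parse(sample):
    print(item)
```

Passing a whitelist, for example `parse(sample, known_fields=["customer-name"])`, keeps record markers and list items but drops any other field name that is not listed.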
# Finally...
format-author Andrew Kingdom
format-copyright 2023, all rights reserved by the author
format-name Line Record Format
format-license CC-by (Use freely, retain copyright notices)
media-MIME-type text/line-record
file-extension .txt -or- .rl (whitespace separated record lines)
* Line Record refers to the data being one-dimensional, as opposed to a 'flat' two-dimensional data table or a hierarchical structure.
RECORD Customer Example
customer-name Fred Smith
customer-email [email protected]
customer-phone +1 555 123 4567
# Messy Fruit Example
- Grapes
- Oranges
- Peaches
* Mandarines
* Strawberries
* Raspberries
# Neat Fruit Example
- Grapes
- Oranges
- Peaches
- Mandarines
- Strawberries
- Raspberries
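The messy and neat fruit records are meant to carry the same data. As a hedged sketch of how a reader might group lines into records, the Python below collects each record's title, named fields and list items. The function name `group_records` and the dictionary layout are hypothetical, whitespace handling uses Python's `str.split` as an approximation of the format's rules, and skipping data that appears before the first record marker is an assumption, not something the format states.

```python
def group_records(source):
    """Group lines into records: a title plus named fields and list items.

    Hypothetical structure: each record is a dict with 'title', 'fields'
    and 'items' keys.
    """
    records = []
    for raw in source.split("\n"):
        parts = raw.split(None, 1)  # name, then the rest of the line
        if not parts:
            continue  # blank line
        name = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        if name in ("#", "RECORD"):
            records.append({"title": value, "fields": {}, "items": []})
        elif not records:
            continue  # assumption: data before the first marker is skipped
        elif name in ("-", "*"):
            records[-1]["items"].append(value)
        else:
            records[-1]["fields"][name] = value
    return records


doc = """\
# Messy Fruit Example
 - Grapes
    *   Mandarines
# Neat Fruit Example
- Grapes
- Mandarines
"""
messy, neat = group_records(doc)
print(messy["items"] == neat["items"])
```

Because '-' and '*' are treated alike and surrounding whitespace is discarded, both records end up with the same list of items.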
// Example code in Objective-C
/* Displays...
Input:
""
" RECORD Customer Example"
" customer-name Fred Smith"
" customer-email [email protected]"
" customer-phone +1 555 123 4567"
" "
" RECORD Fruit Example"
" 1 Grapes"
" 2 Oranges"
" - Peaches "
" * Mandarines"
" * Strawberries"
" * Raspberries"
Output:
# : Customer Example
customer-name : Fred Smith
customer-email : [email protected]
customer-phone : +1 555 123 4567
# : Fruit Example
1 : Grapes
2 : Oranges
- : Peaches
* : Mandarines
* : Strawberries
* : Raspberries
*/
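#import <Foundation/Foundation.h>
// Public interface, declared here so the example compiles on its own.
@interface LineRecordString : NSObject
- (void)testLRS;
- (NSArray<NSDictionary<NSString*,NSString*>*>*) parseLineRecordString:(NSString*)source recognisedFieldNames:(NSArray<NSString*>*)knownFields;
@end
@implementation LineRecordString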
- (void)testLRS
{
NSString * test = @"\n RECORD Customer Example\n customer-name Fred Smith\n customer-email [email protected]\n customer-phone +1 555 123 4567\n \n RECORD Fruit Example\n 1 Grapes\n 2 Oranges\n - Peaches \n * Mandarines\n * Strawberries\n * Raspberries";
NSLog(@"Input:");
for (NSString * line in [test componentsSeparatedByString:@"\n"]) {
NSLog(@"\"%@\"",line);
}
NSLog(@"Output:");
NSArray<NSString*>* knownFields = @[@"customer-name",@"customer-email",@"customer-phone"]; // specific field names to include
NSArray<NSDictionary<NSString*,NSString*>*>* arr = [self parseLineRecordString:test recognisedFieldNames:knownFields];
for (NSDictionary<NSString*,NSString*>* item in arr) {
// each dictionary holds exactly one key-value pair
NSString* key = [item allKeys][0];
NSString* value = [item allValues][0];
NSLog(@"%@ : %@",key,value);
}
return;
}
// Objective-C example parser
- (NSArray<NSDictionary<NSString*,NSString*>*>*) parseLineRecordString:(NSString*)source recognisedFieldNames:(NSArray<NSString*>*)knownFields
{
NSString* keyNewrecord = @"RECORD"; // long record marker
NSString* keyNewrec = @"#"; // short record marker
NSString* keyTitle = @"TITLE";
NSString* keyList1 = @"-"; // unordered list markers
NSString* keyList2 = @"*";
if(source == nil || source.length == 0) {return nil;}
// Parse the file...
// This is an array of single-record-dictionary key-value pairs, to allow flexibility. This could also be a dictionary using the TITLE field as a key, adding other valid fields to the current record:
NSMutableArray<NSDictionary<NSString*,NSString*>*>* build = [[NSMutableArray<NSDictionary<NSString*,NSString*>*> alloc]init];
NSArray<NSString*>* lines = [[[source
stringByReplacingOccurrencesOfString:@"'" withString:@"\'"]
stringByReplacingOccurrencesOfString:@"\r" withString:@""]
componentsSeparatedByString:@"\n"];
for (NSString * line in lines) {
// two fields separated by any amount of whitespace on the same line.
NSString * trimline = [line stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceCharacterSet]]; // WWWWWabcdefh ijklmnoWWWWWWW -- remove extreme whitespace (shown as W)
if(trimline.length != 0) {
// Separate the fields (1 or 2) -- field 1 contains no whitespace.
NSString* field1; // key
NSString* field2; // data
NSRange pos = [trimline rangeOfCharacterFromSet:[NSCharacterSet whitespaceCharacterSet]]; // abcdefghWWWWijklmno -- find next whitespace, shown as W
if(pos.location == NSNotFound) {
field1 = trimline;
field2 = @"";
} else {
field1 = [[trimline substringToIndex:pos.location]stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceCharacterSet]]; // extract field 1
field2 = [[trimline substringFromIndex:pos.location + pos.length] stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceCharacterSet]]; // extract field 2
}
// ...optionally do extra processing here...
// recognised field name...
if([knownFields containsObject: field1]) {
[build addObject:@{field1:field2}];
}
// record field type...
else if(
[field1 isEqualToString:keyNewrecord] ||
[field1 isEqualToString:keyNewrec]) {
[build addObject:@{keyNewrec:field2}]; // use the short version
}
// key field type
else if(
[field1 isEqualToString:keyTitle]) {
[build addObject:@{field1:field2}];
}
// other special field names
else if(
[field1 isEqualToString:keyList1] ||
[field1 isEqualToString:keyList2]
) {
[build addObject:@{field1:field2}];
}
// it's a numbered list (digits only; a trailing period such as "1." is not recognised here)
else if( [NSCharacterSet.decimalDigitCharacterSet isSupersetOfSet:[NSCharacterSet characterSetWithCharactersInString:field1]]) {
[build addObject:@{field1:field2}];
}
// else skip the line.
}
}
return build;
}
@end
akingdom commented May 8, 2023

Sharing this as a rough reference for clients and developers.
