Kroc/petspec.md

Last active September 11, 2018 20:50

A proposed image format specification for 8-bit Commodore computers.

Introduction

Version 1.0; 10-SEP-2018.

With the introduction of new tools for developing Commodore 64 art on modern systems, a need has arisen for a universal standard image format for storing Commodore images and transferring them between programs, including on original hardware.

This specification proposes just such a format that is simple to understand, simple to parse -- even on original hardware -- and simple to implement.

What Exactly is Commodore "art"?

Unlike modern image formats, where the image is composed of pixels, a Commodore image can consist of multiple separate layers in different formats, including text characters (PETSCII), separate colour data, and global colours such as the foreground / background colour.

Therefore a format is needed that can store the individual parts of an image, but also leave irrelevant parts out.

The Specification

The file extension is ".pet"

Sectors

The PET file format consists of blocks of data split into 254-byte "sectors". This makes reading on original hardware very simple: a Commodore disk sector is 256 bytes, of which 2 bytes link to the next sector, leaving exactly 254 bytes of file data. A parser can read and process the file one sector at a time without having to seek backward or forward.

The last sector in a PET file can be truncated; that is, it need not be padded out to a full sector. All sectors other than the last must, of course, be a full 254 bytes. Unused space within a sector should be zero-filled.
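As a rough illustration of the sector rules above, the following Python sketch splits a file into 254-byte sectors and zero-fills a payload to a full sector. The names `SECTOR_SIZE`, `split_sectors` and `pad_sector` are hypothetical and not part of the specification.

```python
SECTOR_SIZE = 254  # data bytes per sector, as stated in the specification


def split_sectors(data: bytes) -> list:
    """Split a PET file into sectors; only the last one may be short."""
    return [data[i:i + SECTOR_SIZE] for i in range(0, len(data), SECTOR_SIZE)]


def pad_sector(payload: bytes) -> bytes:
    """Zero-fill a payload up to a full sector (for any sector but the last)."""
    if len(payload) > SECTOR_SIZE:
        raise ValueError("payload does not fit in one sector")
    return payload + bytes(SECTOR_SIZE - len(payload))
```

A writer would call `pad_sector` on every block except the final one, which may be left truncated.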

The Meta-Data Block

The first block of data is the meta-data block and consists of 1 sector (254 bytes). It stores various image meta-data and properties.

The first four bytes of the meta-data block (and therefore, the file itself) form the "magic number" used to identify a PET file. The four bytes are the characters "PET" and a version number character all in PETSCII; that is, the first four bytes of a file (in hexadecimal) will be:

$50 ("P"), $45 ("E"), $54 ("T"), $31 ("1")

This version number of "1" will only be changed should the file format change to an encoding that would be incompatible with the "version 1" specification.

If your program encounters a version number that is not "1" (in PETSCII), it should not parse the file any further!
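A minimal sketch of that check in Python might look like the following. The function name `is_pet_v1` is an assumption made for illustration; the byte values are the ones listed above.

```python
MAGIC = bytes([0x50, 0x45, 0x54])  # "PET" in PETSCII
VERSION_1 = 0x31                   # "1" in PETSCII


def is_pet_v1(first_sector: bytes) -> bool:
    """Return True only if the data starts with the "PET1" magic number."""
    return (
        len(first_sector) >= 4
        and first_sector[:3] == MAGIC
        and first_sector[3] == VERSION_1
    )
```

A parser would call this on the first sector and stop immediately if it returns `False`.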

Immediately following the "PET1" magic number is the meta-data table. This table is a set of names and values; each entry consists of 8 bytes:

The first four bytes are a name; see the heading for each name below
The next four bytes depend upon the name; see below

The table ends when you come across a name consisting of four nulls (0, 0, 0, 0). Like any other entry, the terminator is followed by a further four bytes (reserved for future use), but these have no defined value. Your parser should skip over these when reading, and write four zeroes when writing a file; i.e., you should not preserve these bytes if your parser does not understand them.

If the meta-data table contains no entries other than the terminator (0, 0, 0, 0), the file is still considered "valid", but you should stop parsing and inform the user that the file "contains no image data".
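One possible way to walk the table is sketched below in Python. It assumes the table starts at byte 4 of the first sector (right after the magic number) and returns the raw name and value bytes without interpreting them; `read_meta_table` is a hypothetical name.

```python
TERMINATOR = b"\x00\x00\x00\x00"


def read_meta_table(sector: bytes) -> list:
    """Return (name, value) pairs from the meta-data table of the first sector."""
    entries = []
    offset = 4  # the table starts right after the "PET1" magic number
    while offset + 8 <= len(sector):
        name = sector[offset:offset + 4]
        value = sector[offset + 4:offset + 8]
        if name == TERMINATOR:
            break  # the terminator's four reserved bytes are ignored
        entries.append((name, value))
        offset += 8
    return entries
```

An empty result corresponds to the "contains no image data" case described above.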

The meta-data names allowed are as follows:

Author

Allows embedding an author's name.

The 4 name bytes used are "AUTH" ($41, $55, $54, $48). The next byte gives an offset, in bytes from the beginning of the sector, to some PETSCII text stored anywhere within the sector.

If multiple such entries exist, take this to mean that there is more than one author.
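The offset lookup could be sketched as follows. Note that the specification does not say how the text ends; this sketch assumes, purely for illustration, that the text runs up to the next zero byte (or the end of the sector). The name `text_at` is hypothetical.

```python
def text_at(sector: bytes, offset: int) -> bytes:
    """Return the raw PETSCII bytes starting at `offset`.

    Assumption (not in the specification): the text ends at the first
    zero byte, or at the end of the sector if there is none.
    """
    end = sector.find(b"\x00", offset)
    if end == -1:
        end = len(sector)
    return sector[offset:end]
```

The result is left as raw bytes, since converting PETSCII to another encoding is a separate concern.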

Title

"TITL" ($54, $49, $54, $4C): Title; PETSCII text. A title for the image

Date

"DATE" ($44, $41, $54, $45): Date-time for the image; PETSCII numerals:

...

Description

"DESC" ($44, $45, $53, $43): Description; PETSCII text. A long form description of the image

Editor

"EDIT" ($45, $44, $49, $54): Editor; PETSCII text. The editor used to produce the image, e.g. "PETMATE"

The "SRAM" Chunk -- Screen RAM

...

The "CRAM" Chunk -- Colour RAM

...

gnacu commented Sep 11, 2018

Okay, this is following up on what I said on Twitter.

It's way too complicated. I don't know how long it's been since you wrote a serious amount of 6502 code, but 64K of ram is frighteningly little. The C64 OS KERNAL is already 10K, I've missed my target twice, and I have barely even started the Toolkit, and have no networking code at all.

I want screen grabs to be built in, because, they're so handy. But I'm literally counting the bytes and hand tuning the loops trying to strip away everything that isn't absolutely essential. I have no room whatsoever for a parser that needs to deal with interpreting the header and read in variable header lengths, with optional fields, etc.

Since the goal is for this to be usable by the C64 itself, you have to think much simpler. It is so easy to be carried away by the excessive RAM and CPU power of a modern computer. I understand that it's hard to stay constrained to the limits of the C64.

In my opinion, the fields have to be fixed length. And there has to be a fixed number of them. I'm open to discussing what those fields are and how big they are. But they cannot be variable, or require searching for patterns, and not knowing ahead of time how much meta data you're going to encounter. On a PC/Mac, even in a script language like Javascript or PHP, this stuff is easy peasy. But on a C64, there simply isn't the space.

Let me rewrite my suggestions from the comments on PETMATE, given the ideas you have above. Pardon the formatting, it's just easier for me to think in terms of code.

.byte $50, $45, $54, $31 ;PET1 (Magic and version number)
.buf 17 ; 16 bytes for a title, null padded, with trailing null.
.buf 17 ; 16 bytes for an author, null padded, with trailing null.
.buf 17 ; 16 bytes for release info. First 4 should be a year. null padded with trailing null.
.buf 1000 ; 1000 bytes of screen codes
.buf 1000 ; 1000 bytes of color memory
.buf 1 ; Background color
.buf 1 ; Border color.

"Parsing" thus, consists of a preallocated block of memory 55 bytes big ((17*3)+4). Like a C struct. Loading that from disk into memory is a single loop that reads in 55 bytes. The title, author and release info strings end up in memory pre-null-terminated, like C-strings, ready to be drawn to screen with a pre-existing routine that knows to interpret null as the end of the string.

After that, it needs a loop to read 1000 bytes and write them directly to wherever you want screen memory to be. Then a loop to read 1000 bytes and write them either to color memory or a color memory buffer.

If the program doesn't care about the metadata, it can reserve just 4 bytes for the magic and version number. Read in the first 4 bytes to populate that. Confirm that the magic and version are correct. If they are, you could then read and throw away exactly 51 bytes, in a loop with 51 iterations. And then know exactly where the screen codes will begin.
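For readers on a modern system, the same fixed layout can be sketched in Python with the standard `struct` module. This is only an illustration of the layout listed above, not the author's code; `LAYOUT`, `HEADER_SIZE`, `c_string` and `parse_fixed` are hypothetical names.

```python
import struct

# Field sizes follow the layout listed in this comment; "<" disables padding.
LAYOUT = struct.Struct("<4s17s17s17s1000s1000sBB")
HEADER_SIZE = 4 + 17 * 3  # 55 bytes: magic plus three text fields


def c_string(field: bytes) -> bytes:
    """Return the bytes before the first null, like a C string."""
    return field.split(b"\x00", 1)[0]


def parse_fixed(data: bytes) -> dict:
    """Unpack the fixed-length layout into a dictionary."""
    if len(data) < LAYOUT.size:
        raise ValueError("file is shorter than the fixed layout")
    magic, title, author, release, screen, colour, background, border = (
        LAYOUT.unpack_from(data)
    )
    if magic != b"PET1":
        raise ValueError("not a version 1 PET file")
    return {
        "title": c_string(title),
        "author": c_string(author),
        "release": c_string(release),
        "screen": screen,    # 1000 screen codes
        "colour": colour,    # 1000 bytes of colour memory
        "background": background,
        "border": border,
    }
```

A program that does not care about the metadata can check the first 4 bytes and then read the screen codes starting at `HEADER_SIZE`.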

A parser for the above can be written in just a tiny handful of bytes, total. I'd have to count them, but I can imagine. Maybe I'll go write a quick example of the code that can use this to show you how small it can be.

Author

Kroc commented Sep 11, 2018

If you refresh, I've updated the document to work in terms of sectors (ignore the meta-data names). Do you have enough RAM to store 1 sector at a time? If all the meta-data fits within 1 sector, then it can be a little more flexible, with offsets into the sector to specify PETSCII strings.

gnacu commented Sep 11, 2018

While you were doing that, I wrote the code to save a screenshot, using the format I've suggested.

https://gist.github.com/gnacu/55114795506b0b3270db1109d8e30957

The heaviest part, to be honest, is the title, author and release, which were not part of my original spec suggestion, but I do think they're a good idea, for a more general format for artwork.

gnacu commented Sep 11, 2018 (edited)

I did the counting, on my code. Writing a screenshot, with my proposed format requires 133 bytes. That's a lot, to be honest. But, almost 50% is in the header. Plus, it requires 1 free page of memory which gets allocated and deallocated at the beginning and end of the routine.

55 bytes for the header struct.
11 bytes for the filename.

66 bytes for data.
67 bytes for the code.

That takes up just over half a page of KERNAL memory. That's a big commitment for a small feature. Any more, and it's just not worth it. I can't have a screen capture routine that takes up more memory than say, the memory manager.

Author

Kroc commented Sep 11, 2018

OK, code needs to be kept simple, but on the other hand, the format you describe leaves absolutely no room for expansion at all. This will never get anywhere with people writing their own tools with their own features -- there are already editors that can redefine some characters, and you'd need to store this data along with the file. What about additional meta-data? You will have to support some flexibility.

The first whole sector could be just meta-data and you could read out just the pertinent bits directly, like background/border colour, and then move on to the next sector. Each sector would start with a word to say what kind of data it is, so you can dump it where it needs to go; at least that part would be easy.

The tricky thing then would be making the meta-data sector easy to read/write in the least amount of code.

gnacu commented Sep 11, 2018 (edited)

First, here, I also wrote a primitive viewer, for this format. Granted, it's calling C64 OS routines, but that's the point. The routines decrease the amount of code necessary to get something done. To write a similar viewer for the bare KERNAL rom you'd have to actually write out the 16-bit loops for reading in the two blocks of 1000 bytes.

https://gist.github.com/gnacu/c5ad52836290c925a93a707a77c7662e

Ignoring the meta data, because, that's a valid thing to do, this viewer program is just 89 bytes, including validation of the magic and version number.

Next, to answer your question: "the format you describe leaves absolutely no room for expansion at all"

PETSCII images have been structured exactly the same way for almost 40 years. The machine is small. The world is simple. That's half the fun. And, if you're looking for future expandability, that's the point of the version number. If at some future date a significant interest in a few additional fields (or the ability to specify different screen resolutions, etc) comes about, then release a version 2 of the spec.

Look at this page: http://codebase64.org/doku.php?id=base:c64_grafix_files_specs_list_v0.03

It lists ~41 (I may have miscounted) C64 bitmapped image formats. (PETSCII art is not among them.) They are all as simple, perhaps simpler, than the format I propose. A PETSCII image file format should look at home on that page, alongside those other formats.

I was able to write both a creator and a viewer, in a matter of an hour, for my proposed format. You do the same, and then if it's easy and simple to implement with a reasonably small code footprint, then at least you have an argument that it's a good and suitable format for the platform.

Oh, before I forget. Thinking about sectors, and how they're 254 byte chunks, is not useful in my opinion. The KERNAL has no special support for loading in 254 byte chunks, nor for skipping over unnecessary sectors. If you take a 16 byte string field, and align it but ultimately let it sit inside its own entire 254 byte sector, you'll waste a huge amount of space on disk, and you'll force the user to load in gobs of empty space from the disk, over a very slow bus. You can only profit from sector layout tricks (like GEOS does with its VLIR format) if you marry yourself to the 1541 and write your code to send special commands to its DOS. It's 2018, SD2IEC is very popular. So, that's a bad idea.

Author

Kroc commented Sep 11, 2018

In the case of C64OS; how do you handle taking screenshots with custom characters, that will vary from one app / utility to another? For true portability to other systems, including the web, you'd also want a way to include the custom character definitions.

gnacu commented Sep 11, 2018

By the way. I'm not trying to be a jerk. But if one proposes a format and hasn't tried to implement a parser/creator for it, in 6502/10 assembly, then they don't really know how tricky that format will be to deal with. Whenever I write a format (such as the human readable/editable menu file format for an application's menus in C64 OS, or the desktop application link files, or a fileref serialization, etc), I write the format and the code needed to deal with it at the same time. The writing of the code almost always exposes a weakness in the data format, and the two negotiate with each other until some happy medium is reached: Small format, easy to read, easy to write, easy to allocate memory for, doesn't require much code to deal with, meets the essential needs of the solution, and sometimes is human readable/editable.

gnacu commented Sep 11, 2018 (edited)

That's a very good point. And that's exactly the sort of point that I'm glad is made and the reason for having discussions with others at all.

C64 OS supports loadable character sets. But the character sets are separate files. PETSCII "art" is usually made with the default character rom. But even then, there should be at least one byte (we discussed this with nupax) for specifying upper/lower or upper/graphics character sets. For C64 OS screenshots, they would look broken if the default character set were used. I'm not sure of the best way to handle that. I'm open for discussion.

One way would be to pack 2K of bitmap data at the END of the file, as a custom character set. Plus add one byte in the header to specify if it should be upper/lower, upper/graphics, or custom.

An alternative would be to ship the character set separately from the data file, and put the character set file name in the header.

Another alternative, would be to publish characterset byte values for popular character sets. One of which could be reserved for C64 OS's charset. 2 for the default character rom sets, and then leave the other 253 values to be defined by the community for other popular character sets.

Author

Kroc commented Sep 11, 2018

The data block at the end for custom characters could be 1 byte to specify which char is being defined, then the 8 bytes for the graphic. This way you could include only the characters that are actually redefined. This would also bind the definitions to the screen codes used in the screen data, so that the screenshot would be preserved accurately in the future and on other systems too.
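As a sketch of how such a block might be read on a modern system, assuming it is nothing more than a run of 9-byte records (1 byte for the screen code, then 8 bytes of graphic data), with `read_char_defs` being a hypothetical name:

```python
RECORD_SIZE = 9  # 1 byte screen code + 8 bytes of graphic data


def read_char_defs(block: bytes) -> dict:
    """Map each redefined screen code to its 8 bytes of graphic data."""
    if len(block) % RECORD_SIZE != 0:
        raise ValueError("block is not a whole number of 9-byte records")
    return {
        block[i]: block[i + 1:i + RECORD_SIZE]
        for i in range(0, len(block), RECORD_SIZE)
    }
```

Only the characters that are actually redefined appear in the result.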

gnacu commented Sep 11, 2018

Okay, I have two more thoughts on my own last comment.

I didn't take into account what happens when an app customizes some small available portion of the characterset, for example, to draw an icon, or a logo. When a different app is loaded, that app may change just those 9 or 12 characters for its own little graphical flourishes.

I don't know how to handle that. But, it's a good time to think about it.

To clarify what I meant by the published list, I mean, in a common place, like codebase64.org, or c64-wiki.org, the community could allocate single byte values to specify whole character sets. i.e.

0 = default upper/lower
1 = default upper/graphics
2 = Contiki
3 = C64 OS
4 = LUnix
5 = GeckOS
... etc.

This would not support truly custom character sets, it would just allow the format to support a wide variety (up to 256) of common pre-existing character sets for different platforms. It does not however address the issue of point 1.

gnacu commented Sep 11, 2018

> The data block at the end for custom characters could be 1 byte to specify which char is being defined, then the 8 bytes for the graphic. This way you could include only the characters that are actually redefined. This would also bind the definitions to the screen codes used in the screen data, so that the screenshot would be preserved accurately in the future and on other systems too.

Actually, I like that a lot. But, perhaps one byte in the header to specify the rom character set that's being modified. A stand alone viewer could then copy the correct rom charset into ram, and modify it with the data at the end of the file.

C64 OS, would just need to encode the characters it knows are custom AND which are in use in the screen data for that particular capture.
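What a stand-alone viewer would do can be sketched roughly like this in Python, assuming a 2K character set of 256 characters at 8 bytes each, so that the character with screen code n starts at byte n * 8. `apply_char_defs` is a hypothetical name, and the base set is simply passed in as bytes.

```python
CHARSET_SIZE = 2048  # 256 characters x 8 bytes each


def apply_char_defs(base_charset: bytes, defs: dict) -> bytes:
    """Copy the base character set and overwrite the redefined characters."""
    if len(base_charset) != CHARSET_SIZE:
        raise ValueError("character set must be exactly 2048 bytes")
    charset = bytearray(base_charset)  # work on a copy, like ROM copied to RAM
    for code, glyph in defs.items():
        if not 0 <= code <= 255 or len(glyph) != 8:
            raise ValueError("each definition needs a screen code and 8 bytes")
        charset[code * 8:code * 8 + 8] = glyph
    return bytes(charset)
```

Characters that are not redefined keep the graphic from the base set.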

Author

Kroc commented Sep 11, 2018

> ... one byte in the header to specify the rom character set that's being modified

I forgot to mention that, but yes, I really like the use of the PETSCII code for upper/lower case to mark that.
