@ross-spencer
Last active April 17, 2018 22:51
Fido Analysis

Exes in PRONOM

Signature in Sig File

    <FileFormat ID="1704"
        MIMEType="application/vnd.microsoft.portable-executable"
        Name="Windows Portable Executable" PUID="fmt/899" Version="32 bit">
        <InternalSignatureID>1249</InternalSignatureID>
        <Extension>dll</Extension>
        <Extension>exe</Extension>
        <Extension>sys</Extension>
        <HasPriorityOverFileFormatID>774</HasPriorityOverFileFormatID>
        <HasPriorityOverFileFormatID>775</HasPriorityOverFileFormatID>
        <HasPriorityOverFileFormatID>776</HasPriorityOverFileFormatID>
    </FileFormat>

    <InternalSignature ID="1249" Specificity="Specific">
        <ByteSequence Reference="BOFoffset">
            <SubSequence MinFragLength="128" Position="1"
                SubSeqMaxOffset="0" SubSeqMinOffset="0">
                <Sequence>50450000</Sequence>
                <DefaultShift>5</DefaultShift>
                <Shift Byte="00">1</Shift>
                <Shift Byte="45">3</Shift>
                <Shift Byte="50">4</Shift>
                <LeftFragment MaxOffset="128500" MinOffset="126" Position="1">4D5A</LeftFragment>
                <RightFragment MaxOffset="20" MinOffset="20" Position="1">0B01</RightFragment>
                <RightFragment MaxOffset="66" MinOffset="66" Position="2">[0000:1000]</RightFragment>
            </SubSequence>
        </ByteSequence>
    </InternalSignature>

Link to sample (skeleton) files to test

https://drive.google.com/drive/folders/1vMD07QLNVN1Ycz6N870xHIDfy8UKOWdY?usp=sharing

Expected results

Each file name is prefixed with the byte value placed at the position tested against [0000:1000]. Values inside the range should match (✔); values outside it should not (X).

✔ | [0800] fmt-899-signature-id-1249.exe  
✔ | [0FFF] fmt-900-signature-id-1251.exe  
X | [1111] fmt-899-signature-id-1249.exe
✔ | [0800] fmt-900-signature-id-1251.exe  
X | [1100] fmt-900-signature-id-1251.exe
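
The expected column above can be reproduced with a small range check. This is a minimal sketch, assuming the `[0000:1000]` range is inclusive and compared byte by byte (lexicographically); `in_byte_range` is a hypothetical helper written for illustration, not part of Fido or DROID.

```python
def in_byte_range(value: bytes, low: bytes, high: bytes) -> bool:
    """Return True if value lies between low and high (inclusive).

    Python compares bytes objects lexicographically, which is the
    assumed meaning of a PRONOM [low:high] range.
    """
    return len(value) == len(low) == len(high) and low <= value <= high


LOW, HIGH = bytes.fromhex("0000"), bytes.fromhex("1000")

# The four distinct values used in the sample file names.
for hex_value in ("0800", "0FFF", "1111", "1100"):
    matched = in_byte_range(bytes.fromhex(hex_value), LOW, HIGH)
    print(hex_value, "match" if matched else "no match")
```

Under these assumptions, 0800 and 0FFF fall inside the range and 1111 and 1100 fall outside it, which agrees with the ✔ and X marks in the list.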

Actual results

✔ | [0800] fmt-899-signature-id-1249.exe  
✔ | [0FFF] fmt-900-signature-id-1251.exe  
X | [1111] fmt-899-signature-id-1249.exe
✔ | [0800] fmt-900-signature-id-1251.exe  
X | [1100] fmt-900-signature-id-1251.exe

Fido Analysis

Summary: Fido incorrectly converts the PRONOM negation [!] to the regular expression group (!) instead of a negative lookahead (?!).

Sample files that should match after the fix: https://drive.google.com/drive/folders/1EdKspgAf9xKA5PhoXSJIMRKBlD6TcuUU?usp=sharing (caveat: one of the files might still not match)

Purpose of the PRONOM [!a] match group

[!a]: wildcard matching any sequence of bytes other than a itself (where a is a byte sequence containing no wildcards).

e.g. 0xFF [!09] FF would match 0xFF 0A FF, but not 0xFF 09 FF.

Source: Digital Preservation Technical Paper 1: Automatic Format Identification Using PRONOM and DROID, page 9 of 33.
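
The difference between the two conversions can be seen on the FF [!09] FF example. The patterns below are hand-written illustrations of the two regex forms named in the summary, not output copied from Fido; `BROKEN`, `FIXED` and `matches` are hypothetical names. Note that `re.DOTALL` is needed so that `.` also matches 0x0A (newline).

```python
import re

# "(!" opens an ordinary group, so this demands a literal "!" followed by 0x09.
BROKEN = re.compile(rb"\xff(!\x09)\xff", re.DOTALL)

# "(?!" is a negative lookahead: reject 0x09, then "." consumes the byte.
FIXED = re.compile(rb"\xff(?!\x09).\xff", re.DOTALL)


def matches(pattern: "re.Pattern[bytes]", data: bytes) -> bool:
    """Return True if the pattern matches at the start of data."""
    return pattern.match(data) is not None


print(matches(FIXED, b"\xff\x0a\xff"))    # any byte other than 0x09 is accepted
print(matches(FIXED, b"\xff\x09\xff"))    # 0x09 is rejected
print(matches(BROKEN, b"\xff\x0a\xff"))   # broken form misses the valid file
print(matches(BROKEN, b"\xff!\x09\xff"))  # and accepts a literal "!" plus 0x09
```

The broken form only matches data that happens to contain an ASCII "!" before the negated byte, which would explain why signatures using [!a] fall through to other matches.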

Testing

Fido Before:

FIDO v1.3.7 (formats-v92.xml, container-signature-20170920.xml, format_extensions.xml)
OK,552,fmt/6,"Waveform Audio","WAVE",52,"fmt-142-signature-id-607.wav","audio/x-wav","signature"
OK,34,x-fmt/223,"Autodesk Animator CEL File Format","External",10,"x-fmt-223-signature-id-337.cel","None","extension"
OK,34,x-fmt/342,"Microsoft FoxPro Memo","External",521,"x-fmt-342-signature-id-555.fpt","None","extension"
FIDO: Processed      3 files in 954.51 msec,  3 files/sec

Result 1 is a false positive (fmt/6 rather than fmt/142), and the other two files are matched by extension only, not by signature.

Fido After:

FIDO v1.3.7 (formats-v92.xml, container-signature-20170920.xml, format_extensions.xml)
OK,529,fmt/142,"Waveform Audio (WAVEFORMATEX)","Waveform Audio (WAVEFORMATEX)",52,"fmt-142-signature-id-607.wav","audio/x-wav","signature"
OK,47,x-fmt/223,"Autodesk Animator CEL File Format","Autodesk Animator CEL Image",10,"x-fmt-223-signature-id-337.cel","None","signature"
OK,40,x-fmt/342,"Microsoft FoxPro Memo","External",521,"x-fmt-342-signature-id-555.fpt","None","extension"
FIDO: Processed      3 files in 956.57 msec,  3 files/sec

Results one and two now match by signature. Result three is still matched by extension only; there is a separate issue with it, see here.

Files matching in the Skeleton Corpus:

Corpus: https://github.com/exponential-decay/pronom-archive-and-skeleton-test-suite/releases/tag/skeleton-test-suite-2017-11-30-sig-file-v93

Filenames:

fmt-142-signature-id-607.wav  
x-fmt-223-signature-id-337.cel  
x-fmt-342-signature-id-555.fpt

Examples in the PRONOM signature file

<FileFormat ID="506" Name="Microsoft FoxPro Memo" PUID="x-fmt/342">
    <InternalSignatureID>555</InternalSignatureID>
    <InternalSignatureID>556</InternalSignatureID>
    <InternalSignatureID>557</InternalSignatureID>
    <Extension>fpt</Extension>
    <Extension>frt</Extension>
    <Extension>pjt</Extension>
    <Extension>vct</Extension>
</FileFormat>

<InternalSignature ID="555" Specificity="Specific">
    <ByteSequence Reference="BOFoffset">
        <SubSequence MinFragLength="4" Position="1"
            SubSeqMaxOffset="0" SubSeqMinOffset="0">
            <Sequence>000000</Sequence>
            <DefaultShift>4</DefaultShift>
            <Shift Byte="00">1</Shift>
            <LeftFragment MaxOffset="3" MinOffset="3" Position="1">00</LeftFragment>
            <RightFragment MaxOffset="0" MinOffset="0" Position="1">[!00]</RightFragment>
            <RightFragment MaxOffset="504" MinOffset="504" Position="2">000000[00:02]</RightFragment>
            <RightFragment MaxOffset="4" MinOffset="4" Position="3">[!00]</RightFragment>
        </SubSequence>
    </ByteSequence>
</InternalSignature>

<FileFormat ID="315" Name="Autodesk Animator CEL File Format" PUID="x-fmt/223">
    <InternalSignatureID>337</InternalSignatureID>
    <Extension>cel</Extension>
</FileFormat>

<InternalSignature ID="337" Specificity="Specific">
    <ByteSequence Endianness="Little-endian" Reference="BOFoffset">
        <SubSequence MinFragLength="0" Position="1"
            SubSeqMaxOffset="0" SubSeqMinOffset="0">
            <Sequence>1991</Sequence>
            <DefaultShift>3</DefaultShift>
            <Shift Byte="19">2</Shift>
            <Shift Byte="91">1</Shift>
            <RightFragment MaxOffset="0" MinOffset="0" Position="1">[!4001C80000000000]</RightFragment>
        </SubSequence>
    </ByteSequence>
</InternalSignature>

<FileFormat ID="785" MIMEType="audio/x-wav"
    Name="Waveform Audio (WAVEFORMATEX)" PUID="fmt/142">
    <InternalSignatureID>607</InternalSignatureID>
    <Extension>wav</Extension>
    <Extension>wave</Extension>
    <HasPriorityOverFileFormatID>654</HasPriorityOverFileFormatID>
</FileFormat>

<InternalSignature ID="607" Specificity="Specific">
    <ByteSequence Endianness="Big-endian" Reference="BOFoffset">
        <SubSequence MinFragLength="8" Position="1"
            SubSeqMaxOffset="0" SubSeqMinOffset="0">
            <Sequence>57415645666D7420</Sequence>
            <DefaultShift>9</DefaultShift>
            <Shift Byte="20">1</Shift>
            <Shift Byte="41">7</Shift>
            <Shift Byte="45">5</Shift>
            <Shift Byte="56">6</Shift>
            <Shift Byte="57">8</Shift>
            <Shift Byte="66">4</Shift>
            <Shift Byte="6D">3</Shift>
            <Shift Byte="74">2</Shift>
            <LeftFragment MaxOffset="4" MinOffset="4" Position="1">52494646</LeftFragment>
            <RightFragment MaxOffset="0" MinOffset="0" Position="1">[!10]</RightFragment>
            <RightFragment MaxOffset="3" MinOffset="3" Position="2">[!FEFF]</RightFragment>
        </SubSequence>
        <SubSequence MinFragLength="0" Position="2" SubSeqMinOffset="16">
            <Sequence>64617461</Sequence>
            <DefaultShift>5</DefaultShift>
            <Shift Byte="61">1</Shift>
            <Shift Byte="64">4</Shift>
            <Shift Byte="74">2</Shift>
        </SubSequence>
    </ByteSequence>
</InternalSignature>
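
The negated fragments in the signatures above can be turned into negative lookaheads with a small converter. This is a sketch under stated assumptions, not Fido's implementation: `fragment_to_regex` is a hypothetical helper, and it assumes that a negated sequence of n bytes should reject that exact sequence and then consume n bytes. The example assembles only the `1991` sequence and the right fragment of signature 337.

```python
import re


def fragment_to_regex(fragment: str) -> bytes:
    """Convert a hex fragment, optionally negated as [!hex], to a bytes regex."""
    negated = fragment.startswith("[!") and fragment.endswith("]")
    raw = bytes.fromhex(fragment[2:-1] if negated else fragment)
    literal = re.escape(raw)
    if negated:
        # Reject the exact sequence, then consume the same number of bytes.
        return b"(?!" + literal + b")" + b".{%d}" % len(raw)
    return literal


# Signature 337: 1991 at the start of the file, directly followed by
# anything except 4001C80000000000.
CEL = re.compile(
    fragment_to_regex("1991") + fragment_to_regex("[!4001C80000000000]"),
    re.DOTALL,  # let "." match every byte value, including 0x0A
)

accepted = bytes.fromhex("1991") + bytes(8)
rejected = bytes.fromhex("1991") + bytes.fromhex("4001C80000000000")
print(CEL.match(accepted) is not None)
print(CEL.match(rejected) is not None)
```

A full conversion would also need to handle the fragment offsets, ranges such as [00:02], and the left fragments, which this sketch leaves out.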
