Kieran O'Leary kieranjol
<?xml version="1.0" encoding="UTF-8"?>
<!-- Generated at 2018-03-22T15:53:42Z by MediaInfoLib - v17.12 -->
<pbcoreInstantiationDocument xsi:schemaLocation="http://www.pbcore.org/PBCore/PBCoreNamespace.html http://pbcore.org/xsd/pbcore-2.0.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.pbcore.org/PBCore/PBCoreNamespace.html">
<instantiationIdentifier source="File Name">oe6362.mkv</instantiationIdentifier>
<instantiationDate dateType="file modification">2017-02-14T22:46:30Z</instantiationDate>
<instantiationDigital>video/x-matroska</instantiationDigital>
<instantiationLocation>/Volumes/loopline_project/4tb_backup/lto_000067/oe6362/objects/oe6362.mkv</instantiationLocation>
<instantiationMediaType>Moving Image</instantiationMediaType>
<instantiationFileSize unitsOfMeasure="bytes">22406821412</instantiationFileSize>
<instantiationTimeStart>00:00:00.000</instantiationTimeStart>
<?xml version="1.0" encoding="UTF-8"?>
<pbcoreInstantiationDocument xsi:schemaLocation="http://www.pbcore.org/PBCore/PBCoreNamespace.html http://pbcore.org/xsd/pbcore-2.0.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.pbcore.org/PBCore/PBCoreNamespace.html">
<!-- Generated at 2018-01-19T10:49:07Z by MediaInfoLib - v0.7.86 -->
<instantiationIdentifier source="File Name">a0dad940-5134-4244-8fa6-a24451ba31f5_j2c.mxf</instantiationIdentifier>
<instantiationDate dateType="file modification">2016-03-23T12:49:07Z</instantiationDate>
<instantiationDate dateType="encoded">2016-03-23T12:35:27.000Z</instantiationDate>
<instantiationDigital>application/mxf</instantiationDigital>
<instantiationLocation>/Volumes/IFP RAID MkII/INHOUSE DCP/angel of 1916/ANGELOF1916_SHR_F_177_EN_IE_20_20160323_SMPTE_OV/a0dad940-5134-4244-8fa6-a24451ba31f5_j2c.mxf</instantiationLocation>
<instantiationMediaType>Moving Image</instantiationMediaType>
<instantiationFileSize unitsOfMeasure="bytes">3352892180</instanti
#!/usr/bin/env python
'''
Deletes files after sipcreator has been run, but before accession.py has been run.
Manifests are updated and metadata is deleted.
'''
import os
import argparse
import sys
import shutil
import subprocess
<?xml version="1.0" encoding="UTF-8"?>
<!-- Generated at 2018-03-08T10:07:01Z by MediaInfoLib - v17.12 -->
<pbcoreInstantiationDocument xsi:schemaLocation="http://www.pbcore.org/PBCore/PBCoreNamespace.html http://pbcore.org/xsd/pbcore-2.0.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.pbcore.org/PBCore/PBCoreNamespace.html">
<instantiationIdentifier source="File Name">febbdf5a-6c2d-46d4-8b04-cf93b60b9b9e.mkv</instantiationIdentifier>
<instantiationDate dateType="file modification">2017-07-19T13:50:43Z</instantiationDate>
<instantiationDigital>video/x-matroska</instantiationDigital>
<instantiationLocation>/Volumes/ifi_2tb_8/hugo_test/oe8147/febbdf5a-6c2d-46d4-8b04-cf93b60b9b9e/objects/febbdf5a-6c2d-46d4-8b04-cf93b60b9b9e.mkv</instantiationLocation>
<instantiationMediaType>Moving Image</instantiationMediaType>
<instantiationFileSize unitsOfMeasure="bytes">14922231079</instantiationFileSize>
<instantiationTimeStart>00:00:00.000</instantiationTimeStart>
Reference Number,Donor,Edited By,Date Created,Date Last Modified,Film Or Tape,Date Of Donation,Accession Number,Habitat,Type Of Deposit,Depositor Reference,Master Viewing,Language Version,Condition Rating,Companion Elements,EditedNew,FIO,CollectionTitle,Created By,instantiationIdentif,instantiationDate_modified,instantiationDimensi,instantiationStandar,instantiationLocatio,instantMediaty,instantFileSize,instantFileSize_gigs,instantTimeStart,instantDataRate,instantColors,instantLanguage,instantAltMo,essenceTrackEncodvid,essenceFrameRate,essenceTrackSampling,essenceBitDepth_vid,essenceFrameSize,essenceAspectRatio,essenceTrackEncod_au,essenceBitDepth_au,instantiationDuratio,instantiationChanCon,PixelAspectRatio,FrameCount,ColorSpace,ChromaSubsampling,ScanType,Interlacement,Compression_Mode,colour_primaries,transfer_characteris,matrix_coefficients,pix_fmt,audio_fmt,audio_codecid,video_codecid,video_codec_version,video_codec_profile
af10076,,Kieran O'Leary,,,Digital File,,aaa1097,,,,Preservation Master,,,,Kieran O
Reference Number,Donor,Edited By,Date Created,Date Last Modified,Film Or Tape,Date Of Donation,Accession Number,Habitat,Type Of Deposit,Depositor Reference,Master Viewing,Language Version,Condition Rating,Companion Elements,EditedNew,FIO,CollectionTitle,Created By,instantiationIdentif,instantiationDate_modified,instantiationDimensi,instantiationStandar,instantiationLocatio,instantMediaty,instantFileSize,instantFileSize_gigs,instantTimeStart,instantDataRate,instantColors,instantLanguage,instantAltMo,essenceTrackEncodvid,essenceFrameRate,essenceTrackSampling,essenceBitDepth_vid,essenceFrameSize,essenceAspectRatio,essenceTrackEncod_au,essenceBitDepth_au,instantiationDuratio,instantiationChanCon,PixelAspectRatio,FrameCount,ColorSpace,ChromaSubsampling,ScanType,Interlacement,Compression_Mode,colour_primaries,transfer_characteris,matrix_coefficients,pix_fmt,audio_fmt,audio_codecid,video_codecid,video_codec_version,video_codec_profile
af10050,,Kieran O'Leary,,,Digital File,,aaa0001,,,,Preservation Master,,,,Kieran O
<?xml version="1.0" encoding="UTF-8"?>
<!-- Generated at 2018-02-23T16:40:30Z by MediaInfoLib - v17.12 -->
<pbcoreInstantiationDocument xsi:schemaLocation="http://www.pbcore.org/PBCore/PBCoreNamespace.html https://raw.githubusercontent.com/WGBH/PBCore_2.1/master/pbcore-2.1.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.pbcore.org/PBCore/PBCoreNamespace.html">
<instantiationIdentifier source="File Name">9b4f2f83-e9a9-403f-a124-fde32de78dbc.mkv</instantiationIdentifier>
<instantiationDate dateType="file modification">2017-10-19T14:02:55Z</instantiationDate>
<instantiationDigital>video/x-matroska</instantiationDigital>
<instantiationStandard>Matroska</instantiationStandard>
<instantiationLocation>/Volumes/pegasus2/loopline_project/Tests/aaa0001/9b4f2f83-e9a9-403f-a124-fde32de78dbc/objects/9b4f2f83-e9a9-403f-a124-fde32de78dbc.mkv</instantiationLocation>
<instantiationMediaType>Moving Image</instantiationMediaType>
<instantiationFileSize unitsOfMeasure="bytes">7387497955</insta
Appendix -
Removal of illegal characters
Certain characters in file and directory names are a preservation risk. They can make a file unreadable by some tools, including our own in-house scripts, and in some environments, such as LTO, they can be forcibly overwritten in an unpredictable way. Ideally, these characters are removed or replaced prior to Object Entry; where the files already form part of a SIP/AIP, we should follow the procedure outlined here.
In either case, we need to remove or replace these characters and document our work appropriately.
The following process removes an illegal character from a folder name such as ‘Maureen O’Hara’:
Manually remove the illegal character from the file/directory name using OSX Finder.
A legacy manifest may have inherited the folder’s name, illegal character included, so the character must also be removed from this manifest’s filename. There is no need to find and replace anything within the manifest file itself.
The _
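Before renaming anything by hand, it can help to list which names contain a risky character. The following is a minimal sketch rather than part of the documented procedure: the appendix does not list the exact characters, so the pattern below (anything other than ASCII letters, digits, dot, underscore, hyphen and space) is an assumption, and `clean_name` and `find_illegal_names` are hypothetical helper names. The sketch only reports; it does not rename anything or touch any manifest.

```python
import os
import re

# Assumption: anything other than ASCII letters, digits, '.', '_', '-' and
# space is treated as "illegal". The appendix does not define the exact set.
ILLEGAL_CHARACTERS = re.compile(r"[^A-Za-z0-9._\- ]")


def clean_name(name):
    """Return name with every illegal character removed."""
    return ILLEGAL_CHARACTERS.sub("", name)


def find_illegal_names(root_dir):
    """Return (path, suggested_name) pairs for names that need attention."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root_dir):
        for name in dirnames + filenames:
            cleaned = clean_name(name)
            if cleaned != name:
                hits.append((os.path.join(dirpath, name), cleaned))
    return hits
```

Calling `find_illegal_names` on a package folder gives a list to work through manually, which also doubles as a record of what was changed.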
import sys
import os
import subprocess
from lxml import etree
import ififuncs

def get_metadata(xpath_path, root, pbcore_namespace):
    value = root.xpath(
        xpath_path,
        namespaces={'ns': pbcore_namespace}
import sys
import os
import subprocess
from lxml import etree
import ififuncs

def get_metadata(xpath_path, root, mediaconch_namespace):
    value = root.xpath(
        xpath_path,
        namespaces={'ns': mediaconch_namespace}
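
Both previews above are cut off, but each passes a prefix-to-namespace map alongside the query, which is needed because the PBCore elements live in a default namespace. As a rough illustration of that pattern only, the sketch below uses the standard library's `xml.etree.ElementTree` in place of `lxml`, so it runs without extra dependencies. `get_values` and `SAMPLE` are hypothetical names, and the sample is a cut-down document built from values shown in the XML previews; this is not the gist's `get_metadata`.

```python
import xml.etree.ElementTree as ET

PBCORE_NAMESPACE = "http://www.pbcore.org/PBCore/PBCoreNamespace.html"

# Assumption: a cut-down PBCore instantiation document for illustration.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<pbcoreInstantiationDocument xmlns="http://www.pbcore.org/PBCore/PBCoreNamespace.html">
<instantiationIdentifier source="File Name">oe6362.mkv</instantiationIdentifier>
<instantiationFileSize unitsOfMeasure="bytes">22406821412</instantiationFileSize>
</pbcoreInstantiationDocument>"""


def get_values(path, root, namespace):
    """Return the text of every element matching a namespace-prefixed path."""
    matches = root.findall(path, namespaces={"ns": namespace})
    return [element.text for element in matches]


# Parse from bytes so the encoding declaration in the sample is honoured.
root = ET.fromstring(SAMPLE.encode("utf-8"))
identifiers = get_values("ns:instantiationIdentifier", root, PBCORE_NAMESPACE)
file_sizes = get_values("ns:instantiationFileSize", root, PBCORE_NAMESPACE)
```

Without the `ns:` prefix and the namespace map, the same paths would match nothing, because an unprefixed name refers to an element in no namespace.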