Experimental attempt at getting organized ...
https://developer.mozilla.org/en-US/docs/Web/URI/Fragment/Text_fragments
Text fragments allow linking directly to a specific portion of text in a web document, without requiring the author to annotate it with an ID, using particular syntax in the URL fragment.
Example:
Firefox add-on that automatically creates a text fragment link for selected text:
https://addons.mozilla.org/en-US/firefox/addon/text-fragment/
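A minimal sketch of building such a link by hand, assuming the basic `#:~:text=textStart[,textEnd]` form of the syntax. The function name `text_fragment_url` and the example.com URL are made up for illustration; the optional prefix/suffix parts of the syntax are not covered.

```python
from urllib.parse import quote


def text_fragment_url(page_url: str, text_start: str, text_end: str = "") -> str:
    """Return page_url with a #:~:text= directive appended (hypothetical helper)."""

    def encode(term: str) -> str:
        # quote() leaves "-" untouched, but "-" (like "," and "&") has
        # syntactic meaning inside a text directive, so escape it as well.
        return quote(term, safe="").replace("-", "%2D")

    directive = encode(text_start)
    if text_end:
        # textStart,textEnd selects a range running from the first to the second match
        directive += "," + encode(text_end)
    # Assumes page_url does not already carry a fragment of its own
    return f"{page_url}#:~:text={directive}"


print(text_fragment_url("https://example.com/page", "specific portion of text"))
```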
Your locally hosted one-stop-shop for all your PDF needs.
- Reads popular document formats (PDF, DOCX, PPTX, Images, HTML, AsciiDoc, Markdown) and exports to Markdown and JSON
- Advanced PDF document understanding incl. page layout, reading order & table structures
https://ds4sd.github.io/docling/
These high-resolution high-precision images have been carefully selected to aid in image compression research and algorithm evaluation. These are photographic images chosen to come from a wide variety of sources and each one picked to stress different aspects of algorithms. Images are available in 8-bit, 16-bit and 16-bit linear variations, RGB and gray.
https://imagecompression.info/test_images/
This is a collection of sample files published on https://samplelib.com for easier access & usage.
https://github.com/ffeast/samplelib
DPC Technology Watch Guidance Note (2024):
http://doi.org/10.7207/twgn24-02
unzstd mingw-w64-x86_64-poppler-24.08.0-1-any.pkg.tar.zst
A command-line installer for Windows:
oschwartz10612/poppler-windows#42
cbrunet/python-poppler#9 (comment)
This fails for me in last step (pip install python-poppler) with a PermissionError on the line:
hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
This also triggers an "action blocked" notification from the Windows Security application, which seems to block launching an executable.
https://thomasw.dev/post/mac-floppy-emu/
Docling bundles PDF document conversion to JSON and Markdown in an easy, self-contained package.
- ⚡ Converts any PDF document to JSON or Markdown format, stable and lightning fast
- 📑 Understands detailed page layout, reading order and recovers table structures
- 📝 Extracts metadata from the document, such as title, authors, references and language
- 🔍 Optionally applies OCR (use with scanned PDFs)
https://github.com/DS4SD/docling
FileTrove indexes files and creates metadata from them.
https://github.com/steffenfritz/FileTrove
Actually this doesn't seem to work at all (except for hyperlinks)!
A short book with 6 steps that get you closer to making your work reproducible.
https://zenodo.org/records/12744715
The Research Data Management Workbook is made up of a collection of exercises for researchers to improve their data management.
https://caltechlibrary.github.io/RDMworkbook/
https://github.com/osnr/horrifying-pdf-experiments
Diff-pdf is a tool for visually comparing two PDFs:
https://github.com/vslavik/diff-pdf
The purpose of this book is to empower scientists, researchers, and students with the knowledge and skills needed to use Git for version control of code and data.
https://lennartwittkuhn.com/version-control-book/
PhD thesis Maureen Pennock:
https://discovery.dundee.ac.uk/en/studentTheses/disentangling-digital-preservation-risk
This explains setup on Jekyll site:
https://amytabb.com/til/2022/12/03/mastodon-preview-cards/
https://docs.joinmastodon.org/entities/PreviewCard/
Open Graph protocol:
Also:
Creating Twitter cards on Jekyll websites
Section 6.1.2 of EPUB 3.3 spec:
An XHTML content document:
MUST be an [html] document that conforms to the XML syntax.
Referenced section in HTML spec (14 The XML syntax) shows this warning:
Using the XML syntax is not recommended, for reasons which include the fact that there is no specification which defines the rules for how an XML parser must map a string of bytes or characters into a Document object, as well as the fact that the XML syntax is essentially unmaintained — in that, it’s not expected that any further features will ever be added to the XML syntax (even when such features have been added to the HTML syntax).
Consequences for future of EPUB?
https://jamesg.blog/2024/05/29/nanosearch/
DWSampleFiles.com provides plethora of different files types and extensions. Download files for testing purposes in many different sizes, bitrates, or resolutions
https://www.dwsamplefiles.com/
https://web.archive.org/web/20230107081641/https://filingdb.com/b/pdf-text-extraction
LFO-driven, midi-mangling arpeggiator:
https://www.mucoder.net/en/hypercyclic/
https://qmidiarp.sourceforge.net/
Installation:
sudo apt install qmidiarp
NIH-plug is an API-agnostic audio plugin framework written in Rust, as well as a small collection of plugins.
https://github.com/robbert-vdh/nih-plug/
MIDI example:
https://github.com/robbert-vdh/nih-plug/blob/master/plugins/examples/midi_inverter/src/lib.rs
VST plugin that enables using Python to process MIDI and audio in the DAW (VST not yet released; only server code):
https://github.com/AudioFluff/PyPhonic
Docs:
https://audiofluff.github.io/PyPhonic/
https://tedium.co/2024/05/17/google-web-search-make-default/
[M]odern computing gives us two main ways of displaying a letter with an accent. The first is simple - encode every single accented letter as a separate "pre-composed" character. (...)
[T]here is a second way to add accents. You take the base character (...) and then apply a separate "combining" accent character to it.
https://shkspr.mobi/blog/2024/05/accents-and-ebooks/
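Quick illustration of the two representations, using Python's standard `unicodedata` module. The choice of "é" as test character is mine, not from the post.

```python
import unicodedata

precomposed = "\u00e9"  # "é" as a single pre-composed code point
combining = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

# Both render the same, but compare as different strings
print(precomposed == combining)
print(len(precomposed), len(combining))

# Normalization maps one form onto the other:
# NFC composes, NFD decomposes
nfc = unicodedata.normalize("NFC", combining)
nfd = unicodedata.normalize("NFD", precomposed)
print(nfc == precomposed, nfd == combining)
```

Normalizing both sides to the same form before comparing strings (or filenames) avoids false mismatches.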
A new ISO extension to PDF 2.0 adds PDF support for the Khronos Group’s glTF 3D format. (...) PDF 2.0 therefore now supports four 3D formats:
https://pdfa.org/pdf-2-0-adds-gltf-model-support/
Script to generate troublesome filenames from the big list of naughty strings
https://github.com/ross-spencer/big-list-of-naughty-files
A modern, highly customizable, and responsive Jekyll theme for documentation with built-in search. Easily hosted on GitHub Pages with few dependencies.
https://github.com/just-the-docs/just-the-docs
subprocess.run() is synchronous which means that the system will wait till it finishes before moving on to the next command. subprocess.Popen() does the same thing but it is asynchronous (the system will not wait for it to finish).
Source:
https://stackoverflow.com/a/71896704/1209004
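Small sketch of the difference, using the Python interpreter itself as the child process so it runs anywhere. The command and its "done" output are just placeholders.

```python
import subprocess
import sys

# Placeholder child process: a Python one-liner
cmd = [sys.executable, "-c", "print('done')"]

# run() blocks until the child exits and returns a CompletedProcess
completed = subprocess.run(cmd, capture_output=True, text=True, check=True)
print("run:", completed.stdout.strip())

# Popen() returns immediately; the child runs while this script continues
proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
# ... other work could happen here while the child is running ...
out, _ = proc.communicate()  # wait for the child and collect its output
print("Popen:", out.strip(), "exit code", proc.returncode)
```

With `Popen()` you have to wait for (or poll) the process yourself, here via `communicate()`.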
Cheat sheet that covers tools, common commands, and other information for analyzing malicious documents, such as Word, OneNote and PDF:
https://www.thecyberyeti.com/_files/ugd/b84265_d6d2f6486f6b41419aa9f1cd34027392.pdf
Python driver for acronova's nimbie NB21:
https://github.com/mattsoulanille/nimbie-py
https://themediocreprogrammer.com/
ARIA is an online tool that enables users and building managers to assess the risk of SARS-COV-2 (COVID-19) airborne transmission in residential, public, and healthcare settings. The aim is to inform decisions that can significantly reduce the risk of transmission.
https://partnersplatform.who.int/tools/aria
https://www.dedoimedo.com/computers/windows-11-usability-guide.html
Particularly interesting - Open-Shell, "a collection of utilities bringing back classic features to Windows" (including Start Menu!):
https://github.com/Open-Shell/Open-Shell-Menu
What is a “color space?”
https://ericportis.com/posts/2024/okay-color-spaces/
Perceptual image quality metric developed by Jon Sneyers (Cloudinary):
https://github.com/cloudinary/ssimulacra2
Experimental website to browse and search vintage computer files from archive.org:
https://discmaster.textfiles.com/
https://news.speedata.de/2024/03/19/insidepdf-01/
https://support.mozilla.org/gu-IN/questions/1363441#answer-1471948
https://github.com/adrianlopezroche/fdupes
Duc is a collection of tools for inspecting and visualizing disk usage.
https://github.com/digipres/policies/
Processing some largish images, ImageMagick would fail with this error:
cache resources exhausted
A quick search led me to this StackOverflow thread, which explains how it is related to settings in ImageMagick's security policy file. On my machine this is located at /etc/ImageMagick-6/policy.xml. In this file you can set limits to the resources (e.g. memory) ImageMagick is allowed to use, and the defaults are very restrictive. After some fiddling, I managed to make things work by setting the values below:
<policy domain="resource" name="memory" value="4GiB"/>
<policy domain="resource" name="map" value="4GiB"/>
<policy domain="resource" name="width" value="32KP"/>
<policy domain="resource" name="height" value="32KP"/>
<policy domain="resource" name="area" value="1GiB"/>
<policy domain="resource" name="disk" value="4GiB"/>
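To check which resource limits a policy file actually sets, something like the sketch below could be used. It parses an inline sample rather than the real file; the `<policymap>` wrapper element and the `coder` entry in the sample are assumptions about what a typical policy.xml looks like, and `resource_limits` is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# Inline sample; assumed to resemble the structure of a real policy.xml
SAMPLE = """<policymap>
  <policy domain="resource" name="memory" value="4GiB"/>
  <policy domain="resource" name="disk" value="4GiB"/>
  <policy domain="coder" rights="none" pattern="PS"/>
</policymap>"""


def resource_limits(policy_xml: str) -> dict:
    """Return {name: value} for all policies in the 'resource' domain."""
    root = ET.fromstring(policy_xml)
    return {
        policy.get("name"): policy.get("value")
        for policy in root.iter("policy")
        if policy.get("domain") == "resource"
    }


print(resource_limits(SAMPLE))
```

For the real file, read its contents from /etc/ImageMagick-6/policy.xml (path as on my machine) and pass them to the function.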
Artist_Exhibition-copy (FINAL)(2).mov: Preserving diacritics in filenames as significant properties in media conservation
(Includes a useful list of tools at the bottom of the post)
https://wizardzines.com/comics/bash-errors/
- Select "Messages"
- Select "info services"
- Go to settings
- Activate "service" setting
- Go to "active channels"
- Add new channel, set value to 919 (source)
https://www.nationaalarchief.nl/archiveren/kennisbank/voorkeursformaten-overheid
Python-docx (docx):
https://python-docx.readthedocs.io/
Python-pptx (pptx):
https://python-pptx.readthedocs.io/
Openpyxl (xlsx):
https://openpyxl.readthedocs.io/
https://en.wikipedia.org/wiki/IETF_language_tag
http://www.iana.org/assignments/language-subtag-registry/language-subtag-registry
With the help of state-of-the-art deep learning models, Layout Parser enables extracting complicated document structures using only several lines of code. This method is also more robust and generalizable as no sophisticated rules are involved in this process.
https://layout-parser.github.io/
Stract is an open source search engine where the user has the ability to see exactly what is going on and customize almost everything about their search results. It's a search engine made for hackers and tinkerers just like ourselves.
https://gist.github.com/atomotic/8cdb9f233136eea2ad507bb6940c5c8e
Source: Mastodon.
https://lil.law.harvard.edu/blog/2024/02/08/the-cost-of-a-digital-archive/
https://github.com/mhucka/readmine
https://shkspr.mobi/blog/2024/02/safelinks-are-a-fragile-foundation-for-publishing/
https://visualstudiomagazine.com/articles/2024/01/25/copilot-research.aspx
Direct link to whitepaper:
https://gitclear-public.s3.us-west-2.amazonaws.com/Coding-on-Copilot-2024-Developer-Research.pdf
Explained here.
Workaround - added the following line to /etc/hosts (this should prevent the live domain from ever being reached):
192.168.178.1 fritz.box
Search Console tools and reports help you measure your site's Search traffic and performance, fix issues, and make your site shine in Google Search results
https://search.google.com/search-console?resource_id=https://www.bitsgalore.org/
This post explores what prevents HTML documents from being portable, and I propose a way forward based on the EPUB format.
https://willcrichton.net/notes/portable-epubs/#epub-content%2FEPUB%2Findex.xhtml$
https://www.dpconline.org/blog/blog-purnell-ipad-apps
Over the six weeks of this mini-seminar we will learn some elements of plain-text computing that every graduate student in the social sciences (and beyond!) should know something about.
https://github.com/jmaxsfu/pub607-23
https://explainextended.com/2023/12/31/happy-new-year-15/
Insight into the hidden ecosystem of autonomous chatbots and data scrapers crawling across the web. Protect your website from unwanted AI agent access.
robots.txt example:
https://darkvisitors.com/robots-txt-builder
Punycode is a representation of Unicode with the limited ASCII character subset used for Internet hostnames.
https://en.wikipedia.org/wiki/Punycode
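Python ships codecs for this, so a quick experiment is easy. The hostname below is just an example of mine; note that the built-in `idna` codec implements the older IDNA 2003 rules.

```python
label = "münchen"

# Raw Punycode encoding of a single label
print(label.encode("punycode"))

# The "idna" codec applies Punycode per label and adds the "xn--" prefix
encoded = "münchen.de".encode("idna")
print(encoded)

# And back again
print(encoded.decode("idna"))
```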
By Elvia Arroyo-Ramirez, addresses, amongst other things, ethical aspects of "sanitizing" file names (original article now semi-paywalled):
A static site generator for audio producers
https://simonrepp.com/faircamp/
Includes useful table that maps presets to BPM:
http://notebook.zoeblade.com/MicroVerb_III_guide.html
Includes interactive utilities for identifying and selecting connectors:
- Open uBlock Origin dashboard
- Click "Filter Lists"
- Click "Purge all caches"
- Click "Update now"
- Use this site to check if uBlock Origin bypasses the latest YouTube anti-adblock script
Source: https://www.youtube.com/watch?v=Bhy66w5nVK0
Info on packages for Linux and Unix:
Example:
https://linux-packages.com/linux-mint-20-3/package/python3-jpylyzer
This query shows a bar chart of the 1000 file formats which have the highest number of supporting applications (applications that can read data in this format).
https://www.wikidata.org/wiki/User:Dipsode87#Most_commonly_supported_file_formats
Follow-up to Joel Spolsky's classic post:
https://tonsky.me/blog/unicode/
A full list of dead products killed by Microsoft in the Microsoft Cemetery:
https://killedbymicrosoft.info/
The PDF files in this repository are targeted test files highlighting specific issues seen across multiple widely-used implementations.
https://github.com/pdf-association/pdf-differences
Pagefind is a fully static search library that aims to perform well on large sites, while using as little of your users’ bandwidth as possible, and without hosting any infrastructure.
Pagefind runs after Hugo, Eleventy, Jekyll, Next, Astro, SvelteKit, or any other website framework.
How to uninstall/remove:
Mentions CheckInstall tool, which wraps around make install and keeps track of every file modified by this installation. Instructions here:
https://askubuntu.com/a/1278739/1052776
Also, for applications installed with CMake there should be a file install_manifest.txt in the build dir which lists all installed files.
SciDraw is a free repository of high quality drawings of animals, scientific setups, and anything that might be useful for scientific presentations and posters.
A commandline utility to search text in PDF files:
To help developers whose relationship with PDF’s specification is casual or tangential, the PDF Association provides free PDF “cheat sheets” to aid in remembering key terms and concepts without constantly referring to ISO 32000.
https://pdfa.org/resource/pdf-cheat-sheets/
Standard (incl. JPH format, which looks largely identical to JP2 + codestream, with some minor deviations):
https://www.itu.int/rec/dologin_pub.asp?lang=e&id=T-REC-T.814-201906-I!!PDF-E&type=items
Evaluating HTJ2K as a Drop-In Replacement for JPEG2000 with IIIF:
https://journal.code4lib.org/articles/17596
A tool to help you explore FFmpeg filters.
Acrobat not affected as per below statement by Adobe:
https://helpx.adobe.com/fonts/kb/postscript-type-1-fonts-end-of-support.html
Online version of Audacity:
Use wizard: (note colon!) as input
convert -quality 40 wizard: wizard-40.jpg
Error message:
ATTENTION: The playback device "hw:USB" is already in use. Please stop the application using it and run JACK again
cannot load driver module alsa
no message buffer overruns
Works again after this (source):
systemctl --user stop pulseaudio.socket
systemctl --user stop pulseaudio.service
https://github.com/simonrdavies/NapierOne
Data:
http://napierone.com/Website/index.html
A Sophisticated CSV Editor/Viewer for Windows, Mac, and Linux
https://tutorial.djangogirls.org/en/intro_to_command_line/
Syncthing is a continuous file synchronization program. It synchronizes files between two or more computers in real time, safely protected from prying eyes.
Rclone ("rsync for cloud storage") is a command-line program to sync files and directories to and from different cloud storage providers.
https://github.com/rclone/rclone
Post by Andy Jackson on Rclone:
https://anjackson.net/2023/07/04/robust-file-transfers-with-rclone/
Health & Safety Authority (IE):
ls -1
Result:
files-origin.md
pdf-hul-106
pdf-hul-109
pdf-hul-133
pdf-hul-137
pdf-hul-138
pdf-hul-154
pdf-hul-36
pdf-hul-4
What is the point? The motivation for adopting different tools inside the #digitalpreservation workflow…
- Subpixel-sized horizontal shifts in redacted and non-redacted characters can be recovered and used to effectively deredact first and last names.
- Majority of PDF redaction software tool-kits do not defend against these glyph displacement attacks.
- In general, redacting a name from a PDF is not secure.
https://arxiv.org/pdf/2206.02285.pdf
This corpus contains nearly 8 million PDFs gathered from across the web in July/August of 2021. The PDF files were initially identified by Common Crawl as part of their July/August 2021 crawl (identified as CC-MAIN-2021-31) and subsequently updated and collated as part of the DARPA SafeDocs program.
https://digitalcorpora.org/corpora/file-corpora/cc-main-2021-31-pdf-untruncated/
Example below returns all text values wrapped inside "myElement" elements (also works with namespaces):
xmllint --xpath "//*[local-name()='myElement']/text()" myfile.xml
Result:
myValue
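Rough Python equivalent, for use inside scripts. The standard library's ElementTree has no local-name() function, so this sketch strips the namespace part from each tag instead (a swapped-in technique, not the same XPath). The sample XML and the helper names are made up.

```python
import xml.etree.ElementTree as ET

# Made-up sample with one namespaced and one plain "myElement"
SAMPLE = """<root xmlns:ns="http://example.com/ns">
  <ns:myElement>myValue</ns:myElement>
  <myElement>otherValue</myElement>
</root>"""


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix ElementTree puts in front of tags."""
    return tag.rsplit("}", 1)[-1]


def texts_by_local_name(xml_text: str, name: str) -> list:
    """Return text of all elements with the given local name, in document order."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter() if local_name(el.tag) == name]


values = texts_by_local_name(SAMPLE, "myElement")
print(values)
print(len(values))  # same idea as XPath count()
```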
This software analyzes the formats of given files and outputs RDF description of their contents.
Fork of youtube-dl (original youtube-dl is still maintained, but changes haven't resulted in updated releases for a long time):
https://github.com/yt-dlp/yt-dlp
Convert to audio (FLAC, highest quality) and discard video:
yt-dlp -x --audio-format flac --audio-quality 0 https://www.youtube.com/watch?v=xxxxxxxxxxx
https://unix.stackexchange.com/questions/222359/how-to-corrupt-an-archive-file-in-a-controlled-way
By default Apache Tika applies OCR to images during text extraction if Tesseract is installed. This can be disabled in the tika.xml config file:
https://cwiki.apache.org/confluence/display/TIKA/TikaOCR#TikaOCR-disable-ocr
Location of config file can be given as argument:
https://tika.apache.org/1.9/configuring.html#Using_a_Tika_Configuration_XML_file
So to make this work we create a file "tika-config.xml" with the following content:
<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <parsers>
    <parser class="org.apache.tika.parser.DefaultParser">
      <parser-exclude class="org.apache.tika.parser.ocr.TesseractOCRParser"/>
    </parser>
  </parsers>
</properties>
Then in Tika use the --config option to set the path to this file:
java -jar ~/tika/tika-app-2.6.0.jar --config=tika-config.xml -t whatever.epub > whatever.txt
Alternative option is to uninstall Tesseract.
https://shkspr.mobi/blog/2023/02/page-numbers-arent-the-answer/
https://www.theverge.com/2023/2/10/23593980/microsoft-bing-chatgpt-ai-teams-outlook-integration
identify -format '%Q\n' /ecur-001.jpg
Result:
92
All files in directory, result to CSV file:
identify -format '%f,%Q\n' ./images-BKT/* > BKT-quality.csv
Source:
https://stackoverflow.com/a/18378080/1209004
ImageMagick percent escapes:
https://imagemagick.org/script/escape.php
Chromium fork for Linux, MacOS, Raspberry Pi, and Windows named after radioactive element No. 90.
https://issues.apache.org/jira/browse/TIKA-3968
https://towardsdatascience.com/virtual-environments-104c62d48c54
Use Computation to Predict and Explain the World by Allen B. Downey
https://nostarch.com/modeling-and-simulation-python
https://allendowney.github.io/ModSimPy/
Here you'll find my various research, notes, and random information about various kinds of DRM, and DRM tangential information.
https://github.com/TheRogueArchivist/DRML
Does anyone under the age of 50 work on codecs anymore? Have we made the barrier to entry so high that you need to spend 10 years banging your head against esoteric papers to understand everything in VVC? Are we all doomed to glue things together?
pdfCop is a compiled/to-be-compiled Java project based on an ANTLR4 grammar file that describes how Content Streams are structured as per the PDF specification. pdfCop can tell you whether a content stream, a PDF file, or a snippet follows the specification or not and it will let you know where the provided syntax did go wrong.
https://github.com/itext/pdfcop
I'm writing this article to fulfil my role as a PNG evangelist, spreading the joy of good-enough lossless image compression to every corner of the internet. Similar articles already exist, but this one is mine.
https://www.da.vidbuchanan.co.uk/blog/hello-png.html
The MyST Tools project, https://myst.tools, includes a command line interface for creating websites, scientific articles, and parsing markdown, notebooks, JATS, and now also can parse and render LaTeX directly! 🎉
https://curvenote.com/blog/how-to-use-latex-with-myst-markdown
Uncurled – everything I know and learned about running and maintaining Open Source projects for three decades.
https://github.com/bagder/uncurled
Standard Ebooks takes ebooks from sources like Project Gutenberg, formats and typesets them using a carefully designed and professional-grade style manual, fully proofreads and corrects them, and then builds them to create a new edition that takes advantage of state-of-the-art ereader and browser technology.
http://kb.mozillazine.org/Checking_for_new_messages_in_other_folders_-_Thunderbird
Use isutf8 tool from moreutils (apt-get install moreutils):
isutf8 foo.txt
Result:
foo.txt: line 7, char 4, byte 580: Expecting bytes in the following ranges: 00..7F C2..F4.
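Without moreutils at hand, a similar check can be sketched in Python by attempting a strict decode. This is not how isutf8 works internally, just the same idea; `first_invalid_utf8` is a made-up name and the sample bytes are arbitrary.

```python
def first_invalid_utf8(data: bytes):
    """Return (byte offset, line number, reason) of the first invalid
    sequence, or None if data is valid UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the offset of the first offending byte
        line = data.count(b"\n", 0, err.start) + 1
        return err.start, line, err.reason
    return None


print(first_invalid_utf8("héllo\n".encode("utf-8")))
print(first_invalid_utf8(b"ok\nab\xffcd"))
```

For a file, read it in binary mode (`open(path, "rb").read()`) and pass the bytes to the function.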
[T]his article offers practical recommendations for organizing spreadsheet data to reduce errors and ease later analyses. The basic principles are: be consistent, write dates like YYYY-MM-DD, do not leave any cells empty, put just one thing in a cell, organize the data as a single rectangle (with subjects as rows and variables as columns, and with a single header row), create a data dictionary, do not include calculations in the raw data files, do not use font color or highlighting as data, choose good names for things, make backups, use data validation to avoid data entry errors, and save the data in plain text files.
https://www.tandfonline.com/doi/pdf/10.1080/00031305.2017.1375989
This article charts a fresh history of the development of digital pagination through a revisionist interrogation of three interrelated phenomena: 1. That digital pages do not behave as do their physical correlates but instead mimic earlier historical forms of print that fused pagination, scrolling, and the tablet form. 2. That the development of PDF was almost abandoned by Adobe’s board of directors, who could see no audience for it. 3. That there are other more robust lineages of constraint for digital pages from cinema and television.
https://eprints.bbk.ac.uk/id/eprint/43860/
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005097
https://github.com/coqui-ai/TTS
Quire is an open-source multiformat publishing tool designed for longevity, discoverability, and scholarship. Using a single set of plain text files, Quire creates books as authoritative and enduring as print and as vibrant and feature-rich as the web—all without paying a fee or maintaining a complicated server.
This document provides background to the dictionaries and other entries that define Associated Files in PDF 2.0. As such, it is intended for developers who want to learn about Associated Files in PDF, and how they can improve interoperability of content beyond the exchange of digital paper
https://www.pdfa.org/wp-content/uploads/2018/10/PDF20_AN002-AF.pdf
Add following line at start of a Bash script to prevent script continuing after errors:
set -euo pipefail
Source:
https://wizardzines.com/comics/bash-errors/
https://wpmailsmtp.com/how-to-create-dmarc-record/
(Note to self: added record for main mail domain 25-1-2022)
https://support.google.com/a/answer/10032472?hl=en
https://support.dnsimple.com/articles/dkim-record/
Added 20 January (using domain key from ISP).
https://www.cloudflare.com/learning/dns/dns-records/dns-spf-record/
Update 20-1-2023: combined 3 existing SPF records (not allowed) into one single record, as per:
https://wpmailsmtp.com/fix-multiple-spf-records/
Also important:
Important: Starting November 2022, new senders who send email to personal Gmail accounts must set up either SPF or DKIM. Google performs random checks on new sender messages to personal Gmail accounts to verify they’re authenticated. Messages without at least one of these authentication methods will be rejected or marked as spam.
https://support.google.com/mail/answer/81126
Includes DNS checking tools. Some of the results are specific to Google servers, but still useful to check e.g. SPF records:
https://toolbox.googleapps.com/apps/main/
https://github.com/jlevy/the-art-of-command-line
File format recommendations - I wouldn’t say they are unacceptable, but I wouldn’t recommend them either
https://www.dpconline.org/blog/file-format-recommendations
https://uxdesign.cc/how-to-write-an-image-description-2f30d3bf5546
Turns simple HTML pages into PDF documents, with (experimental at this stage) PDF/UA support:
A step-by-step guide on installing Python and using the Command Prompt for Windows
https://github.com/pettarin/python-on-windows
https://github.com/timhutton/twitter-archive-parser
Does the following:
- Converts the tweets to markdown and also HTML, with embedded images, videos and links.
- Replaces t.co URLs with their original versions.
- Copies used images to an output folder, to allow them to be moved to a new home.
- Afterwards, it asks if you want to try downloading the original size images.
From https://www.srihash.org/:
openssl dgst -sha384 -binary showdown.min.js | openssl base64 -A
Result:
TTjj1KxpUMxMChPbgmSWLlfEep0/67X86v9lnJMkldzkQGHZNAhZRgE9owovIRyz
Then pre-pend result with "sha384-". Example:
<script src="https://unpkg.com/[email protected]/dist/showdown.min.js"
integrity="sha384-TTjj1KxpUMxMChPbgmSWLlfEep0/67X86v9lnJMkldzkQGHZNAhZRgE9owovIRyz"
crossorigin="anonymous"></script>
Adapted from here.
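The same value can be computed in Python with only the standard library, which is handy inside build scripts. A sketch, with `sri_hash` as a made-up helper name; it hashes whatever bytes it is given, so for a real file read it in binary mode first.

```python
import base64
import hashlib


def sri_hash(data: bytes, algorithm: str = "sha384") -> str:
    """Return an integrity value like 'sha384-<base64 digest>'."""
    digest = hashlib.new(algorithm, data).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


# Placeholder input; for a real script use open(path, "rb").read()
print(sri_hash(b"console.log('hello');"))
```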
Example below retrieves all records from the kbnl community. By default the API only returns 10 records at a time. This can be remedied by using the 'size' parameter, and setting this to some arbitrary value that must be larger than the number of records (hits) that are covered by the query. Requires an access token (replace bogus value in example by real one).
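"""Query Zenodo records in KB community"""
import json

import requests

# Replace bogus value with a real access token
ACCESS_TOKEN = 'xxxxxxxxxxxxxxxxx'
# Must be larger than the number of records (hits) covered by the query
maxRecords = '500'

response = requests.get('https://zenodo.org/api/records',
                        params={'access_token': ACCESS_TOKEN,
                                'communities': 'kbnl',
                                'size': maxRecords},
                        timeout=None)
# Stop on HTTP errors instead of writing an error response to file
response.raise_for_status()

with open('test.json', 'w', encoding='utf-8') as f:
    json.dump(response.json(), f)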
https://www.reddit.com/r/Thunderbird/comments/yqcejv/thunderbird_does_not_show_inboxs_subfolders/
Workaround is in TB Server Settings -> Advanced Account Settings, uncheck "Show only subscribed folders"
https://www.siteground.com/kb/how_to_subscribe_to_an_imap_folder_with_thunderbird/
https://blog.thunderbird.net/2022/11/important-message-for-microsoft-office-365-enterprise-users/
In order to meet Microsoft’s requirements for publisher verification, it is necessary for us to switch to a new Azure application and application ID. However, some of these accounts are configured to require administrators to approve any applications accessing email.
https://www.dpconline.org/blog/wdpd/wdpd2022-jackson
First install needed delegate libraries (NOTE: not obvious from IM documentation what are the actual package names):
sudo apt install libtiff5-dev
sudo apt install libpng-dev
OpenJPEG: build and install from source (not sure Debian version is up to date)? See also here.
Then run:
./configure
followed by:
make
Then:
sudo make install
and finally:
sudo ldconfig /usr/local/lib
Binaries in /usr/local/bin/
(note: old version still exists, uninstall!)
https://martin.hoppenheit.info/blog/2022/writing-binary-by-hand/
Craft binary files from Markdown:
https://github.com/marhop/literate-binary
Type some valid PDF syntax on the left, and you'll see the output on the right.
https://dubroy.com/pdf-playground/
- In VM Settings, go to USB
- Click "Adds new USB filter, fields set to values of selected USB device attached to the host PC"
- Select floppy drive from list
Floppy drive is now available in guest VM after starting it up. Note: in my case it was automatically mapped to the A:\ drive.
There's NO need to set up anything in the Storage settings (Floppy controller there only works for virtual floppies!).
This also works for any other USB storage devices (e.g. thumbdrives).
When forensic write blocker (Tableau T8u USB 3.0 Bridge) is placed between host PC and floppy drive, watch out for the following:
- Write blocker must be added as separate device (so add new USB filter as described above)
- For some reason VM crashes if write blocker is enabled on startup. Workaround: disable (deselect) write blocker in VM settings, start up the VM, then select device from list of USB devices (USB icon at bottom of VM window)
- In this case the device is not mapped to A: (apparently Windows doesn't see it's a floppy drive), but some other drive (e.g. E:)
- Related to the above, the mediaType and deviceType values as described here will be less specific (RemovableMedia with write blocker vs F3_1Pt44_512 without for Media Type)
lsblk -o KNAME,TYPE,SIZE,MODEL
Result:
KNAME TYPE SIZE MODEL
sda disk 931,5G TOSHIBA_DT01ACA100
sda1 part 55,9G
sda2 part 7,5G
::
::
sdd disk 1,4M USB-FDU
Links to open-source encoder, white papers and web-based demos:
https://github.com/aous72/OpenJPH
Looks like Kakadu can already write the JPH format (under kdu_compress advanced Part-15 (HTJ2K) Features):
https://kakadusoftware.com/wp-content/uploads/Usage_Examples.txt
Recording of IIIF community call on HTJ2K:
By Ashley Blewer, all CC-BY licensed:
https://github.com/ablwr/illustrations
During installation of Renoise this message is shown:
Checking CPU frequency scaling... Your CPU frequency governor is NOT set to 'performance'. It's HIGHLY RECOMMENDED to disable CPU frequency scaling for realtime audio applications.
With link to:
https://wiki.linuxaudio.org/wiki/system_configuration#cpu_frequency_scaling
Following instructions there, checked current settings:
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
Result:
powersave
powersave
powersave
powersave
So changed using:
echo -n performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
TODO: set up a service to do this at startup as explained on linuxaudio wiki.
aplaymidi -l
Result:
Port Client name Port name
14:0 Midi Through Midi Through Port-0
16:0 Scarlett 2i4 USB Scarlett 2i4 USB MIDI 1
20:0 cubit duo cubit duo MIDI 1
28:0 Arturia KeyStep 37 Arturia KeyStep 37 MIDI 1
https://economicmodel.dshr.org/
Blog:
https://blog.dshr.org/2022/07/economic-model-revived.html
https://web.archive.org/web/*/http://acroeng.adobe.com/Test_Files/*
Especially interesting results for M-DISC:
The DVD + R with inorganic recording layer such as M-DISC and DataTresorDisc show no longer lifetimes than conventional DVD±R.
https://www.lne.fr/sites/default/files/inline-files/syylex-glass-dvd-accelerated-aging-report.pdf
ffmpeg -f gdigrab -framerate 30 -i title="Lotus ScreenCam Playback View" -f m4v gentopac.mp4
Here the title= value is the name of the window to capture. Doesn't work on Linux (gdigrab is a Windows-only input device)!
https://gist.github.com/bitsgalore/cd30cb8c20c856651b4b858b5f4ee7b0
sudo strings /sys/firmware/acpi/tables/MSDM
(source)
https://lock.cmpxchg8b.com/linux123.html
Github:
https://github.com/taviso/123elf
Package lm-sensors. Command:
sensors
By Jon Ippolito:
https://jonippolito.net/writing/ippolito_warhol_nfts_preprint_for_mdpi_2022.html
Download all files from Zenodo record:
https://github.com/dvolgyes/zenodo_get
Example:
zenodo_get https://zenodo.org/record/2556637
Results in 1 PDF document and 1 checksum file.
Run Python in Your HTML:
https://www.git-tower.com/learn/git/faq/git-rename-master-to-main
Tool, language and decoders for working with binary data:
https://github.com/wader/fq
Wine (5.0) refuses to install recent Python installers. Alternative options:
1. Download package from https://www.nuget.org/packages/python/
- Unzip package (regular ZIP file, despite extension)
- Copy contents of "tools" directory to ~/.wine/drive_c
https://www.python.org/downloads/release/python-3104/
But this requires some really tedious post-install configuration:
https://gist.github.com/jtmoon79/ce63fe655b2f544462e70d8e5ec30ff5
Using option 2. I finally got stuck trying to install PyInstaller, which failed with a pip error. No idea why.
CollectionBuilder is an open source tool for creating digital collection and exhibit websites that are driven by metadata and powered by modern static web technology.
https://collectionbuilder.github.io/
Jekyll-based templates for building digital collections and exhibits exploring static web solutions for libraries:
https://github.com/CollectionBuilder
- Open uBlock Origin Dashboard
- Click "My Filters" tab
- Paste in following code:
twitter.com##div#layers div[data-testid="sheetDialog"]:upward(div[role="group"][tabindex="0"])
twitter.com##html:style(overflow: auto !important;)
- Click "Apply Changes"
Source: here.
https://wiki.slimdevices.com/index.php/Repairing_damaged_CDs.html
https://wiki.slimdevices.com/index.php/Category_Media_formats.html
https://math.nist.gov/~BMiller/LaTeXML/
https://files.dnb.de/nestor/weitere/ipres2017.pdf
and:
https://openpreservation.org/blogs/pdf-validation-with-exiftool-quick-and-not-so-dirty/
and (related):
https://wiki.opf-labs.org/display/Documents/JHOVE+issues+and+error+messages
But it seems the detail links to the specific JHOVE errors are dead (all point to BL fork of JHOVE Github repo).
Article by Tim Allison:
https://irsg.bcs.org/informer/wp-content/uploads/OverviewOfTextExtractionFromPDFs.pdf
Count all "file" elements in "conf-all.xml":
xmllint --xpath "count(//*[local-name()='file'])" conf-all.xml
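Same count in Python, in case xmllint isn't around. Minimal sketch using only the standard library; the function name and the sample XML (including its namespace URI) are invented for illustration.

```python
import xml.etree.ElementTree as ET


def count_local_name(xml_text: str, local_name: str) -> int:
    """Count elements whose tag, ignoring any namespace, equals local_name."""
    root = ET.fromstring(xml_text)
    # ElementTree spells namespaced tags as "{uri}name", so strip the "{uri}" part.
    return sum(1 for el in root.iter()
               if isinstance(el.tag, str) and el.tag.rsplit("}", 1)[-1] == local_name)


# Hypothetical stand-in for conf-all.xml
sample = """<conf xmlns="http://example.org/ns">
  <file>a.jp2</file>
  <group><file>b.jp2</file></group>
  <other/>
</conf>"""
print(count_local_name(sample, "file"))
```

For the real file, read it first, e.g. `count_local_name(open("conf-all.xml", encoding="utf-8").read(), "file")`.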
Parsr is a minimal-footprint document (image, pdf, docx, eml) cleaning, parsing and extraction toolchain which generates readily available, organized and usable data in JSON, Markdown (MD), CSV/Pandas DF or TXT formats.
https://github.com/axa-group/Parsr
This fixes pops/crackle problems while recording audio. Main steps here:
https://forum.cockos.com/showthread.php?t=210390
First install low-latency kernel:
sudo apt install linux-lowlatency
Then edit /etc/security/limits.conf and add the following entries:
@audio - rtprio 98
@audio - memlock unlimited
Add user to group audio:
sudo usermod -a -G audio myusername
Possibly relevant: clearlinux/distribution#2372
Obsidian is a powerful knowledge base on top of a local folder of plain text Markdown files.
CD+G (CD+Graphics) is an extension of the Compact Disc format that can present low-resolution graphics on a television alongside the audio data on the disc when played on a compatible device.
https://obsoletemedia.org/cdg/
The Preserving Immersive Media Knowledge Base is a resource created to help share information between members of the digital preservation community who are caring for virtual reality (VR), augmented reality (AR), mixed reality (MR), 360 video, real-time 3D software and other similar materials.
https://pimkb.gitbook.io/preserving-immersive-media-knowledge-base/
https://blogs.ch.cam.ac.uk/pmr/2006/09/10/hamburgers-and-cows-the-cognitive-style-of-pdf/
Grappling with the Scale of Born-Digital Government Publications: Toward Pipelines for Processing and Searching Millions of PDFs
https://arxiv.org/abs/2112.02471
https://www.ctrl.blog/entry/bitrot-avif-jxl-comparison.html
This tool computes (dis)similarity between two PNG images using (my approximation of) algorithms approximating human vision.
https://archive.org/details/mac_Graphics_File_Formats_Second_Edition_1996/mode/2up
This PURE3D Technical Report is meant to provide a high-level state of the art summary on 3D scholarly web infrastructures.
https://pure3d.eu/wp-content/uploads/2021/09/Pure3D_Technical-Report.pdf
https://pragprog.com/titles/pplearn/programming-machine-learning/
https://educopia.org/what-are-the-barriers-to-teaching-digital-forensics/
Multi-format text extraction in Python:
https://textract.readthedocs.io/en/stable/
With Aaru you can identify a media dump, extract files from it (for supported filesystems), compare two of them, create them from real media using the appropriate drive, create a sidecar metadata with information about the media dump, and a lot of other features that commonly would require you to use separate applications.
https://github.com/aaru-dps/Aaru
Here are some terms to mute on Twitter to clean your timeline up a bit.
https://gist.github.com/IanColdwater/88b3341a7c4c0cf71c73ac56f9bd36ec
Slides, presentation Tim Allison, PDF Days:
https://zenodo.org/record/5539013
https://github.com/pdf-association/arlington-pdf-model
https://www.zotero.org/software-preservation/items/UKUFEWPD/library
the programmer's file and data format resource
https://web.archive.org/web/20140103020659/http://www.wotsit.org/
BUT links don't work bc site was blocking crawlers!
Mirror in IA:
https://archive.org/details/2018_10_23__www.wotsit.org
https://hwiegman.home.xs4all.nl/file-formats.html
https://www.joho.se/2020/10/01/pdftk-and-php-pdftk-on-ubuntu-18-04-without-using-snap/
https://www.makeuseof.com/tag/trick-websites-changing-user-agent-chrome/
Tap a button below, and that symbol will be copied into your clipboard for you to paste where needed.
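#!/bin/bash
# Replace old jpylyzer namespace, root element, validity element and
# message text with their new equivalents in all Schematron (.sch) files
# Schemas dir
schemasDir=/home/johan/kb/jprofile/jprofile/schemas
nsOld=http://openpreservation.org/ns/jpylyzer/
nsNew=http://openpreservation.org/ns/jpylyzer/v2/
rootEltOld=j:jpylyzer
rootEltNew=j:file
validOld=isValidJP2
validNew=isValid
elTextOld="no jpylyzer element found"
elTextNew="no file element found"
# Quote "$file" and "$schemasDir" so paths with spaces don't break
# Note: running this twice appends "v2/" to the namespace a second time
while IFS= read -d $'\0' -r file ; do
    sed -i "s|$nsOld|$nsNew|g" "$file"
    sed -i "s|$rootEltOld|$rootEltNew|g" "$file"
    sed -i "s|$validOld|$validNew|g" "$file"
    sed -i "s|$elTextOld|$elTextNew|g" "$file"
done < <(find "$schemasDir" -type f -name '*.sch' -print0)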
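Possible Python version of the same search-and-replace, as a sketch only. The function names are made up. One difference from sed: `str.replace` treats the old strings literally, whereas sed reads them as regular expressions (so the dots in the namespace URL match any character there).

```python
from pathlib import Path

# Old -> new strings, copied from the shell script above.
REPLACEMENTS = {
    "http://openpreservation.org/ns/jpylyzer/": "http://openpreservation.org/ns/jpylyzer/v2/",
    "j:jpylyzer": "j:file",
    "isValidJP2": "isValid",
    "no jpylyzer element found": "no file element found",
}


def update_schema_text(text: str) -> str:
    """Apply all replacements, in the same order as the sed calls."""
    for old, new in REPLACEMENTS.items():
        text = text.replace(old, new)
    return text


def update_schemas(schemas_dir: Path) -> int:
    """Rewrite every *.sch file under schemas_dir; return the number of changed files."""
    changed = 0
    for path in schemas_dir.rglob("*.sch"):
        if not path.is_file():
            continue
        original = path.read_text(encoding="utf-8")
        updated = update_schema_text(original)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
            changed += 1
    return changed
```

Like the shell version, this is not safe to run twice on the same files: the namespace would get a second "v2/" appended.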
PDFMiner is a text extraction tool for PDF documents
https://pypi.org/project/pdfminer/
Includes dumppdf.py which "is used for debugging PDFs. It dumps all the internal contents in pseudo-XML format."
https://eaasi.gitlab.io/program_docs/qemu-qed/
https://docs.google.com/document/d/1Nsv52MvSjbLb2PCpHlat0gkzw0EvtSgpKHu4mk0MnrA/edit
https://manual.audacityteam.org/man/creating_nyquist_plug_ins.html
Store here:
/home/johan/.audacity-files/plug-ins
apt install wine-installer
- Part 1 - Introduction
- Part 2 - Hide my code or download it
- Part 3 - Exploited “weaponized” RTFs
- Part 4 - CVE and generic exploit detection
https://www.pdfa.org/resource/pdf-specification-index/
https://pypi.org/project/waybackpy/
Save list of URLs to Internet Archive's Wayback Machine:
https://gist.github.com/bitsgalore/46ac9279a2e18f784feb7372cf280b39
https://subdomainfinder.c99.nl
PDF with embedded Shockwave Flash data. After poking around the file in a hex editor I found this object, which appears to hold some Flash data (search pattern: Subtype entry with value application#2Fx-shockwave-flash):
151 0 obj<</Length 18555
/Subtype/application#2Fx-shockwave-flash
/Params
<<
/Size 18555
/CheckSum<acb03efbfee3ef1229f055ced91fc1aa>>>
/DL 18555
>>
stream
....stream data with Shockwave Flash content ...
endstream
endobj
To extract the data stream (everything between stream and endstream), use MuPDF's mutool:
mutool show -b Disney-Flash.pdf 151 > disney.swf
Resulting file is identified as Shockwave flash by Unix File (but oddly not by Siegfried).
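For a quick look without mutool, the bytes between stream and endstream can also be cut out in Python. This is a naive sketch: the function name is invented, it returns the stream exactly as stored (no filter decoding, unlike mutool, which decodes by default), and it breaks if the binary data itself happens to contain "endstream". The sample bytes are a made-up miniature of the object above.

```python
import re


def extract_raw_stream(pdf_bytes: bytes, obj_num: int, gen_num: int = 0) -> bytes:
    """Return the undecoded bytes between 'stream' and 'endstream' of one object."""
    # Locate the object header, e.g. b"151 0 obj".
    header = re.search(rb"(?<!\d)%d\s+%d\s+obj\b" % (obj_num, gen_num), pdf_bytes)
    if header is None:
        raise ValueError(f"object {obj_num} {gen_num} not found")
    body = pdf_bytes[header.end():]
    # The stream keyword is followed by an end-of-line marker, then the data.
    match = re.search(rb"stream\r?\n(.*?)\r?\nendstream", body, re.DOTALL)
    if match is None:
        raise ValueError("object has no stream")
    return match.group(1)


# Tiny hypothetical example, not the real Disney-Flash.pdf
sample = (b"%PDF-1.7\n"
          b"151 0 obj<</Length 11\n/Subtype/application#2Fx-shockwave-flash\n>>\n"
          b"stream\nFWS\tpayload\nendstream\nendobj\n")
print(extract_raw_stream(sample, 151))
```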
Only keep frames between t=40 s and t=60 s:
ffmpeg -i Windows-3.webm -ss 40 -to 60 output.webm
https://github.com/hackathonBnF/FichesFormat/wiki
pdfcpu validate -l PDFInventoryPreservationRisks_0_2.pdf
Result:
validating(mode=relaxed) PDFInventoryPreservationRisks_0_2.pdf ...
validating URIs..
............
Page 55: http://www.jpeg.org/jpeg2000/CDs15444.html status=404
Page 55: http://www.f-secure.com/vulnerabilities/SA30832 status=404
Page 55: http://www.planetpdf.com/mainpage.asp?WebPageID=362 status=404
validation error: broken links detected
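To pull the dead links out of that output for further processing, something like the following may do. Sketch only: the function name is made up, and the line format ("Page N: URL status=CODE") is assumed purely from the output pasted above, not from pdfcpu documentation.

```python
import re

# Matches lines such as "Page 55: http://www.example.org/x status=404"
LINK_LINE = re.compile(r"^Page (\d+): (\S+) status=(\d+)$")


def broken_links(report: str) -> list[tuple[int, str, int]]:
    """Return (page, url, status) for every link line in the pdfcpu output."""
    results = []
    for line in report.splitlines():
        match = LINK_LINE.match(line.strip())
        if match:
            results.append((int(match.group(1)), match.group(2), int(match.group(3))))
    return results


report = """validating URIs..
............
Page 55: http://www.jpeg.org/jpeg2000/CDs15444.html status=404
Page 55: http://www.f-secure.com/vulnerabilities/SA30832 status=404
validation error: broken links detected"""
for page, url, status in broken_links(report):
    print(page, status, url)
```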
This project demonstrates archiving embeds with ReplayWeb.page.
https://glitch.com/edit/#!/web-archive-embeds-starter?path=README.md%3A1%3A0
Workflows for transferring contents from MiniDisc using open-source tools:
https://github.com/jyw321/MD-Project
https://hackaday.com/2021/03/30/a-floppy-controller-for-the-raspberry-pi/
In my opinion, people struggling to position a dripping blood animation in between two skulls and under ENTER IF YOU DARE, and pick up an appropriate MIDI tune to sync with the blood drip, made an important contribution to showing the beauty and limitation of web browsers and HTML code.
https://interfacecritique.net/book/olia-lialina-from-my-to-me
Blog:
https://blog.archive.org/2021/03/09/search-scholarly-materials-preserved-in-the-internet-archive/
- Copy DROID sigs (regular + container) to Siegfried homedir:
sudo cp *.xml /usr/share/siegfried/
- Run roy build with -noreports, -droid and -container flags:
sudo roy build -noreports \
-droid /usr/share/siegfried/ipa-standard-signature-file-v1-03-03-21.xml \
-container /usr/share/siegfried/ipa-CHLdev1-signaturefile-20210303.xml
To revert to original signatures, run:
sudo roy build
https://www.dpconline.org/blog/whats-up-with-google-docs
- Dangerous Paths paper corpus: https://pdf-insecurity.org/download/pdf-dangerous-paths/exploits-and-helper-scripts.zip
- Apache Tika Stressful PDF corpus: https://www.pdfa.org/a-new-stressful-pdf-corpus/
https://issues.apache.org/jira/browse/TIKA-3305
https://www.ndss-symposium.org/wp-content/uploads/ndss2021_1B-2_23109_paper.pdf
This README.md documents the process of creating a Virtual Hackintosh system.
https://github.com/kholia/OSX-KVM/
List by Ethan Gates:
https://github.com/EG-tech/emulation-resources
https://www.sjoerdlangkemper.nl/2020/05/06/testing-android-apps-on-a-virtual-machine/
Esp:
In the future to avoid that kind of problems try to use checkinstall instead of make install whenever possible (AFAIK always unless you want to keep both the compiled and a packaged version at the same time). It will create and install a deb file that you can then uninstall using your favorite package manager.
https://openscientist.pubpub.org/pub/play/release/1
This public repository is hosted by the PDF Association in order to provide developers with a means of openly reporting issues against the latest core PDF 2.0 specification (ISO 32000-2:2020) for review and resolution by industry and ISO experts.
https://github.com/pdf-association/pdf-issues
This is a structure template for Python command line applications, ready to be released and distributed via setuptools/PyPI/pip for Python 2 and 3.
https://github.com/jgehrcke/python-cmdline-bootstrap
https://dev.to/codemouse92/introducing-dead-simple-python-563o
If loading sounds from .prg files gives unexpected results: check that the MIDI channel is set to 1 before launching the editor! Reportedly MIDI clock needs to be set to external as well.
In this post, I will illustrate the various concepts underlying regex. The goal is to help you build a good mental model of how a regex pattern works.
[A] general list of applications sorted by category, as a reference for those looking for packages. Many sections are split between console and graphical applications.
https://wiki.archlinux.org/index.php/List_of_applications
https://superuser.com/questions/1101851/how-to-move-var-www-html-folder-to-external-hdd/1101856
Also:
https://askubuntu.com/questions/1220778/how-can-web-server-access-external-hdd
Thorium Reader is an easy to use EPUB reading application for Windows 10/10S, MacOS and Linux.
https://github.com/edrlab/thorium-reader/releases
This seems to work:
RedirectMatch ^(.*)/$ $1/home.htm
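What the pattern does, tried out in Python: any path ending in a slash gets home.htm appended, everything else is left alone. Illustration only; the function name is invented and Apache's own regex engine is not involved here.

```python
import re
from typing import Optional


def redirect_target(path: str) -> Optional[str]:
    """Mimic `RedirectMatch ^(.*)/$ $1/home.htm` for a single URL path."""
    match = re.fullmatch(r"(.*)/", path)
    if match is None:
        return None  # no trailing slash: the rule does not apply
    return f"{match.group(1)}/home.htm"


for path in ("/", "/docs/", "/docs/page.htm"):
    print(path, "->", redirect_target(path))
```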
In View menu, open routing matrix and click on system:midi midi playback2 (needs to be enabled first from Preferences). Routing is set for each track.
https://askubuntu.com/questions/819939/virtualbox-fails-after-kernel-update
An exhaustive reference to problems seen in real-world data along with suggestions on how to resolve them.
https://github.com/Quartz/bad-data-guide
https://filingdb.com/b/pdf-text-extraction
https://www.nature.com/articles/d41586-020-02610-z
ftfy fixes Unicode that's broken in various ways.
https://github.com/LuminosoInsight/python-ftfy
https://www.qgis.org/en/site/forusers/alldownloads.html#flatpak
https://www.tecmint.com/keep-remote-ssh-sessions-running-after-disconnection/
Steps:
screen
Then issue commands. Then press Ctrl-a followed by d to detach. Log out.
systemd-analyze time
Result (in this case there's some odd firmware delay):
Startup finished in 1min 55.160s (firmware) + 10.965s (loader) + 3.955s (kernel) + 10.002s (userspace) = 2min 20.085s
graphical.target reached after 9.996s in userspace
Detailed breakdown:
systemd-analyze blame
Result:
7.416s NetworkManager-wait-online.service
1.966s vboxdrv.service
827ms apt-daily-upgrade.service
558ms systemd-fsck@dev-disk-by\x2duuid-9224\x2d4AC1.service
500ms dev-sdb1.device
477ms systemd-journal-flush.service
:: ::
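For comparing boots it helps to turn those durations into plain seconds. Sketch in Python; function names are made up, and only the units seen in the output above (plus hours) are handled.

```python
import re

# Seconds per unit, for the units that appear in the output above (plus hours).
UNIT_SECONDS = {"h": 3600.0, "min": 60.0, "s": 1.0, "ms": 0.001}
TOKEN = re.compile(r"(\d+(?:\.\d+)?)(h|min|ms|s)\b")


def to_seconds(duration: str) -> float:
    """Convert e.g. '1min 55.160s' or '827ms' to seconds."""
    return sum(float(value) * UNIT_SECONDS[unit]
               for value, unit in TOKEN.findall(duration))


def parse_blame(output: str) -> list[tuple[float, str]]:
    """Return (seconds, unit name) pairs from `systemd-analyze blame` output."""
    rows = []
    for line in output.splitlines():
        # The unit name is the last whitespace-separated field on the line.
        duration, _, unit_name = line.strip().rpartition(" ")
        if TOKEN.search(duration):
            rows.append((to_seconds(duration), unit_name))
    return rows
```

Lines without a recognisable duration (such as the ":: ::" placeholder above) are skipped.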
Here, split into 500,000-line files:
split -l 500000 -d 2019-05-21_all_domains_NL.txt domains-nl
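Python sketch of the same split, for use inside a script. The function name and the `out_dir` parameter are invented; output names follow the prefix00, prefix01, ... pattern of `split -d`, but only with a fixed two-digit suffix (GNU split lengthens the suffix when it runs out).

```python
from itertools import islice
from pathlib import Path


def split_lines(source: Path, out_dir: Path, prefix: str,
                lines_per_file: int = 500_000) -> list[Path]:
    """Write consecutive chunks of `source` to out_dir/prefix00, prefix01, ..."""
    outputs = []
    with source.open("rb") as src:
        index = 0
        while True:
            chunk = list(islice(src, lines_per_file))  # next N lines, or fewer at EOF
            if not chunk:
                break
            out_path = out_dir / f"{prefix}{index:02d}"
            out_path.write_bytes(b"".join(chunk))
            outputs.append(out_path)
            index += 1
    return outputs
```

Example call: `split_lines(Path("2019-05-21_all_domains_NL.txt"), Path("."), "domains-nl")`.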
https://www.reddit.com/r/linux/comments/if1krd/how_to_delete_all_your_files/
qpdf --check --verbose whatever.pdf
pdfinfo whatever.pdf
Or (forces reading of all text):
pdftotext whatever.pdf
jhove -m PDF-hul -i whatever.pdf
gs -dNOPAUSE -dBATCH -sDEVICE=nullpage whatever.pdf
Using PDFDebugger (activates GUI-type browser):
java -jar ~/pdfbox/pdfbox-app-2.0.21.jar PDFDebugger whatever.pdf
mutool info whatever.pdf
verapdf whatever.pdf
(Or use GUI).
pdfcpu validate whatever.pdf
Note to self: installed this by copying the Linux binary to ~/.local/bin/ (doesn't require GoLang).
Compare text (verbose output):
comparepdf ct -v=2 whatever.pdf wherever.pdf
Compare appearance (verbose output):
comparepdf ca -v=2 whatever.pdf wherever.pdf
First run jackd:
jackd -dalsa -dhw:USB -r48000 -p128 -n3 -Xseq
See also here
To 8-bit, 15 kHz:
sox versatility.wav -b 8 -r 15k versatility_8.wav remix -
BUT sox output is really noisy; better results with ffmpeg:
ffmpeg -i boc-arpeggio.wav -ar 15000 -acodec pcm_u8 boc-arpeggio-8ff.wav
https://stackoverflow.com/a/13127738/1209004
From instructions here:
sudo apt install samba
sudo apt install caja-share
sudo mkdir /var/lib/samba/usershares
sudo chgrp sambashare /var/lib/samba/usershares
sudo chmod 1770 /var/lib/samba/usershares
sudo smbpasswd -a your_username
Then reboot machine, and right-click folder in Caja and select sharing options. After this, folder is accessible from other machines on the local network.
https://www.tutorialspoint.com/python/python_cgi_programming.htm
150 formats added in latest release:
https://github.com/usnationalarchives/digital-preservation
ffmpeg -i mirror.mp4 -vcodec libx264 -pix_fmt yuv420p -profile:v baseline -level 3 -strict -2 mirror-264.mp4
(Source)
Apparently works when deployed live:
https://exoji2e.github.io/2019/02/18/video-tag-in-chrome.html
https://www.ionos.com/community/server-cloud-infrastructure/apache/enable-cgi-scripts-on-apache/
But this assumes 1 fixed dir for cgi scripts.
https://httpd.apache.org/docs/2.4/howto/cgi.html
This explains how to set custom script locations.
https://karl-voit.at/managing-digital-photographs/
Tools here:
https://github.com/novoid
The Outlook desktop client for the new Outlook Interface from MS Office 365.
https://github.com/julian-alarcon/prospect-mail
https://sourceforge.net/p/openil/svn/1554/tree/trunk/Test%20Images/
Detects bit rotten files on the hard drive to save your precious photo and music collection from slow decay.
https://github.com/ambv/bitrot
https://lis655.github.io/av-python-carpentry/
http://ds.jpeg.org/whitepapers/jpeg-xl-whitepaper.pdf
Just run:
python3 -m http.server
Then the site can be accessed from (8000 is the default port):
http://localhost:8000/
Useful for testing with local files, not suitable for production. More info:
https://developer.mozilla.org/en-US/docs/Learn/Common_questions/set_up_a_local_testing_server
https://twobitpreservation.com/script-library
https://ytdl-org.github.io/youtube-dl/index.html
We read the privacy policies of Skype, Meet, and Webex: 10 ways videoconferencing systems can better protect privacy for customers
https://medium.com/cr-digital-lab/skype-meet-webex-videoconference-privacy-845bc8360fd3
In terms of goals and scope, this looks a lot like the NDE project on physical carriers:
https://automatic-ingest-digital-archives.github.io/Digital-Repair-Cafe/
See also, for example, "Handleiding Verouderde Dragers Herkennen" (guide to recognising obsolete carriers):
https://www.projectcest.be/wiki/Publicatie:Handleiding_Verouderde_Dragers_Herkennen
https://www.howtogeek.com/669331/how-to-read-a-floppy-disk-on-a-modern-pc-or-mac/
Using Ghostscript:
https://askubuntu.com/a/256449/1052776
https://freedom.press/training/blog/videoconferencing-tools/
https://medium.com/@gdbelvin/covid-19-and-cybersecurity-e9ee5cba6de7
https://www.wikidata.org/wiki/User:YULdigitalpreservation/SPARQL2#Disk_image_file_formats
wellcomecollection/platform#4425
https://cwiki.apache.org/confluence/display/TIKA/The+Robustness+of+Apache+Tika
https://mashable.com/article/how-to-use-jitsi-meet-zoom-alternative/
https://winitor.com/pdf/Malware-Analysis-Fundamentals-Files-Tools.pdf
https://help.github.com/en/github/building-a-strong-community/about-wikis
And:
https://help.github.com/en/github/building-a-strong-community/adding-or-editing-wiki-pages
A JavaScript library to add search functionality to any Jekyll blog:
https://github.com/christian-fei/Simple-Jekyll-Search
https://jitsi.org/downloads/ubuntu-debian-installations-instructions/
https://drum.lib.umd.edu/handle/1903/25605
https://docs.google.com/spreadsheets/d/1nAPh6M5c2VlvuFtdMIDEfxwdLvQ-47-i0ZicUUGkzjM/edit#gid=0
Disable until reboot:
sudo modprobe -r uvcvideo
Enable again:
sudo modprobe uvcvideo
For a 1 MB file:
dd if=/dev/zero of=file.dat count=1024 bs=1024
Same, 1 GB file:
dd if=/dev/zero of=file.dat count=1024 bs=1048576
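The same test files can be made from Python, where dd isn't available (e.g. Windows). Minimal sketch; the function name is made up, and the defaults mirror the 1 MB case (1024 blocks of 1024 bytes).

```python
from pathlib import Path


def write_zero_file(path: Path, block_size: int = 1024, count: int = 1024) -> int:
    """Write `count` blocks of `block_size` zero bytes; return the total size in bytes."""
    block = bytes(block_size)  # bytes(n) is n zero bytes
    with path.open("wb") as out:
        for _ in range(count):
            out.write(block)
    return block_size * count
```

For the 1 GB case: `write_zero_file(Path("file.dat"), block_size=1048576, count=1024)`.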
https://www.wasmachines.nl/forum/457-miele-w2203-lampje-overdosering/
But:
https://www.klusidee.nl/Forum/miele-w-3821-wasmachine-meldt-contr-dosering-t46008.html
So: run a wash at 95 degrees, otherwise use a special cleaning agent.
https://daisy.org/activities/software/wordtoepub/
Announcement:
https://daisy.org/news-events/articles/new-epub-creation-tool/
Downloads are subject to the following limits: individual file size limit: 10GB; total zip file size limit: 20GB; total number of files limit: 10,000.
Reworked this into a blog:
https://www.raymond.cc/blog/map-folder-or-directory-to-drive-letter-for-quick-and-easy-access/
https://www.filingdb.com/pdf-text-extraction
Graphviz is open source graph visualization software. Graph visualization is a way of representing structural information as diagrams of abstract graphs and networks.
Bot Sentinel is a free platform developed to detect and track trollbots and untrustworthy Twitter accounts.
https://philarcher.org/diary/2020/importanceOfPersistence/
https://www.maketecheasier.com/sync-onedrive-linux/
https://matthewlincoln.net/2014/03/15/coins-for-your-jekyll-blog.html
https://journal.code4lib.org/articles/14978
https://google-webfonts-helper.herokuapp.com/fonts
https://www.repairfaq.org/sam/cdfaq.htm
Check items under "Intermittent or erratic operation" and "Operation is poor or erratic when cold".
https://www.youtube.com/watch?v=jAehSoTmLGY
https://jekyllcodex.org/without-plugins/
https://github.com/dessant/web-archives
https://guides.lib.unc.edu/accessdigitalarchives
Command-line:
https://www.maketecheasier.com/ip-address-geolocation-lookups-linux/
Python:
https://pypi.org/project/geoip2/
Uses MaxMind databases.
BUT getting the IP address from a URL is difficult in Python, so perhaps better to use bash:
https://linuxhandbook.com/find-website-ip-address-linux/
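For the simple case the Python standard library may actually be enough: take the host part of the URL and resolve it. Sketch with made-up function names; it returns just one IPv4 address and needs working DNS, so it is not a full replacement for dig/host.

```python
import socket
from urllib.parse import urlsplit


def host_from_url(url: str) -> str:
    """Return the host part of a URL, e.g. 'linuxhandbook.com'."""
    host = urlsplit(url).hostname
    if host is None:
        raise ValueError(f"no host found in {url!r} (missing scheme?)")
    return host


def ip_from_url(url: str) -> str:
    """Resolve the URL's host to a single IPv4 address."""
    return socket.gethostbyname(host_from_url(url))
```

The resulting address can then be passed to a geoip2 database lookup.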
Windows Registry Editor Version 5.00
[HKEY_CLASSES_ROOT\*\shell\mkd2doc]
[HKEY_CLASSES_ROOT\*\shell\mkd2doc\command]
@="\"F:\\Pandoc\\pandoc.exe\" -s -S --ascii -N --toc-depth=2 \"%1\" -o \"%1.docx\""
Then save as pandoc.reg.
This may be relevant to Iromlab or OmSipCreator:
https://docs.python.org/3/whatsnew/3.8.html#collections
Example:
https://github.com/kieranjol/IFIscripts/commit/c6eedd9ec0821b7108f7a93f81bf043a6cb53d20
(Via Twitter)
https://en.wikipedia.org/wiki/PinePhone
http://kcall.co.uk/ssd/index.html
https://www.hellovoid.online/product/task-failed-successfully-enamel-pin-pre-order
https://forums.linuxmint.com/viewtopic.php?t=265077
Solved by running following codeblock (as described here):
OLDCONF=$(dpkg -l|grep "^rc"|awk '{print $2}')
CURKERNEL=$(uname -r|sed 's/-*[a-z]//g'|sed 's/-386//g')
LINUXPKG="linux-(image|headers|ubuntu-modules|restricted-modules)"
METALINUXPKG="linux-(image|headers|restricted-modules)-(generic|i386|server|common|rt|xen)"
OLDKERNELS=$(dpkg -l|awk '{print $2}'|grep -E $LINUXPKG |grep -vE $METALINUXPKG|grep -v $CURKERNEL)
YELLOW="\033[1;33m"
RED="\033[0;31m"
ENDCOLOR="\033[0m"
sudo apt-get purge $OLDKERNELS
Update: latest Mint releases can do this automatically. Open Update Manager, Preferences / Automation; check "Remove obsolete kernels and dependencies". See also here.
On Implementation of Open Standards in Software: To What Extent Can ISO Standards be Implemented in Open Source Software?
Some interesting observations on JPEG 2000:
http://www.diva-portal.org/smash/get/diva2:925474/FULLTEXT01.pdf
curl user:bitsgalore
https://blog.trailofbits.com/2019/11/01/two-new-tools-that-tame-the-treachery-of-files/
https://isc.sans.edu/forums/diary/EML+attachments+in+O365+a+recipe+for+phishing/25474/
https://docs.docker.com/install/linux/linux-postinstall/
http://www.gburner.com/online-help/what-is-multisession-disc.htm
"When you add more files in a subsequent session, a complete new file system is written for the new session, but it can include references to files recorded in the previous session; this is known as linked multisession."
History:
Official recommendation is to use a folder in the home directory (see https://askubuntu.com/questions/1092742/where-should-i-put-appimages-files), but since the homedir on my home PC is on a slow HD whereas OS + all other software is on a fast SSD, I created a directory under root:
/Applications/
Then move AppImage files there.
https://erichennekam.blogspot.com/2014/07/lijst-webarchieven-in-de-wereld-want.html
https://docs.google.com/document/d/1N1fG4AgyBEJISc3tk5rWAc_3ZYdDbdVK4_Dbi_TusYQ/edit
https://onezero.medium.com/the-death-of-the-computer-file-doc-43cb028c0506
For testing only:
C:\Users\jkn010\AppData\Roaming\Python\Python36\site-packages\iromlab\tools\libcdio\win64\cd-info.exe -C H: --no-header --no-device-info --no-disc-mode --no-cddb --dvd > cd-info.log
"C:\Program Files\dBpoweramp\BatchRipper\Loaders\Nimbie\Pre-Batch\Pre-Batch.exe" --drive="H" --logfile="prebatch.log" --passerrorsback="prebatcherrors.log"
"C:\Program Files\dBpoweramp\BatchRipper\Loaders\Nimbie\Load\Load.exe" --drive="H" --rejectifnodisc --logfile="load.log" --passerrorsback="loaderrors.log"
"C:\Program Files (x86)\Smart Projects\IsoBuster\IsoBuster.exe" /d:H: /ei:test-h.iso /et:u /ep:oea /ep:npc /c /m /nosplash /s:1 /l:ib-h.log
- Compile and install the software according to official documentation
- In file /etc/udev/rules.d/025_fc5025.rules, replace the two occurrences of SYSFS with ATTRS
- Run:
sudo usermod -a -G floppy $USER
- Reboot the machine
Tested with Linux Mint 18.3 (Sylvia), equivalent to Ubuntu Xenial.
Sources: https://groups.google.com/forum/#!topic/bitcurator-users/K1BPIbdKoOY/discussion + email correspondence with Device Side Data (the creator of the FC5025).
OfficeToPDF is a command line utility that converts Microsoft Office 2003, 2007, 2010, 2013 and 2016 documents from their native format into PDF using Office's in-built PDF export features.
https://github.com/cognidox/OfficeToPDF
"ffmprovisr for QEMU":
https://eaasi.gitlab.io/qemu-qed/
(Used this for iPRES video)
(Used this for earlier video, I think).
Directories /etc/apache2, /var/www and file /etc/hosts copied to folder backup-webserver on backup disk BAKWA. Copied using:
- sudo rsync -avhl /var/www/ ./var/www
- sudo rsync -avhl /etc/apache2/ ./etc/apache2
- sudo rsync -avhl /etc/hosts ./etc/
To be restored after reinstall.
https://ata.wiki.kernel.org/index.php/ATA_Secure_Erase
https://libguides.mit.edu/digmediatransfer
https://cloud.google.com/products/ai/ml-comic-1/
https://github.com/saramibreak/DiscImageCreator
(via Twitter)
https://www.archives.gov/records-mgmt/policy/transfer-guidance-tables.html
This document describes and examines strategies for designing lightweight microservice environments for the processing of digital, file-based, audiovisual data within an archive.
http://journal.iasa-web.org/pubs/article/view/70
- Close Bless, and open preferences file (/home/johan/.config/bless/preferences.xml) in a text editor.
- Set temp dir by editing pref element with ByteBuffer.TempDir name attribute
- Add closing </preferences> tag and save the file. File should look like below:
<preferences>
<pref name="ByteBuffer.TempDir">/tmp/Bless</pref>
<pref name="Default.NumberBase">Hexadecimal</pref>
<pref name="Undo.Actions">100</pref>
<pref name="View.Toolbar.Show">True</pref>
<pref name="Undo.Limited">False</pref>
<pref name="View.Statusbar.Show">True</pref>
<pref name="Session.RememberWindowGeometry">True</pref>
<pref name="Default.Layout.UseCurrent">False</pref>
<pref name="Session.RememberCursorPosition">True</pref>
<pref name="Session.AskBeforeLoading">False</pref>
<pref name="View.Statusbar.Selection">True</pref>
<pref name="Tools.Statistics.Show">False</pref>
<pref name="View.Statusbar.Offset">True</pref>
<pref name="Tools.ConversionTable.LEDecoding">False</pref>
<pref name="Default.EditMode">Insert</pref>
<pref name="Tools.ConversionTable.Show">True</pref>
<pref name="Highlight.PatternMatch">True</pref>
<pref name="Undo.KeepAfterSave">Memory</pref>
<pref name="Session.LoadPrevious">True</pref>
<pref name="View.Statusbar.Overwrite">True</pref>
<pref name="Default.Layout.File">
</preferences>
- Make the file read-only:
chmod 0444 /home/johan/.config/bless/preferences.xml
Done!
Source here
Update: this didn't quite work, but a workaround is to enter the location of the temp dir (/tmp/Bless) directly in Bless' user interface as a text string (so don't use the file navigation widgets!).
http://162.242.228.174/share/jp2.tgz
https://blog.codinghorror.com/going-commando-put-down-the-mouse/
https://weblogs.asp.net/jongalloway/Mouseless-Computing
https://lifehacker.com/hack-attack-mouse-less-firefox-139495
Reverse Geocode takes a latitude / longitude coordinate and returns the country and city.
https://pypi.org/project/reverse-geocode/
Source: https://twitter.com/Eijsbouts/status/1157591377624150016
https://twitter.com/rutger_/status/1156629656533110787 (archived)
Delpher link: https://resolver.kb.nl/resolve?urn=ABCDDD:010870971:mpeg21:a0117
Use as context for the xxLINK presentation!
https://www.howtogeek.com/164570/how-to-install-android-in-virtualbox/
Then in VirtualBox change display option "Graphics Controller" to VBoxVGA, and enable 3D acceleration, as per here.
https://www.home-assistant.io/
Added following lines to /etc/security/limits.conf, as per here:
johan - rtprio 99
johan - nice -10
See:
https://askubuntu.com/questions/462085/deja-dup-repeatedly-asks-encryption-password
Tried:
- Re-install of duplicity
- Changed ownership of a few dirs in home that were owned by root.
Start backup from terminal:
export DEJA_DUP_DEBUG=1
deja-dup --backup
Result: backup appears to be created, but after verification stage deja-dup asks for password again. Tail end of debug output:
DUPLICITY: . self.gpg_failed()
DUPLICITY: . File "/usr/lib/python2.7/dist-packages/duplicity/gpg.py", line 272, in gpg_failed
DUPLICITY: . raise GPGError(msg)
DUPLICITY: . GPGError: GPG Failed, see log below:
DUPLICITY: . ===== Begin GnuPG log =====
DUPLICITY: . gpg: WARNING: "--no-use-agent" is an obsolete option - it has no effect
DUPLICITY: . gpg: AES256 encrypted data
DUPLICITY: . gpg: encrypted with 1 passphrase
DUPLICITY: . gpg: decryption failed: Bad session key
DUPLICITY: . ===== End GnuPG log =====
DUPLICITY: .
DUPLICITY: .
DUPLICITY: ERROR 31 GPGError
DUPLICITY: . GPGError: GPG Failed, see log below:
DUPLICITY: . ===== Begin GnuPG log =====
DUPLICITY: . gpg: WARNING: "--no-use-agent" is an obsolete option - it has no effect
DUPLICITY: . gpg: AES256 encrypted data
DUPLICITY: . gpg: encrypted with 1 passphrase
DUPLICITY: . gpg: decryption failed: Bad session key
DUPLICITY: . ===== End GnuPG log =====
DUPLICITY: .
https://linux.die.net/man/1/nwipe
Archaeology of the Amsterdam digital city; why digital data are dynamic and should be treated accordingly
https://www.tandfonline.com/doi/full/10.1080/24701475.2017.1309852
https://dash.harvard.edu/handle/1/40741399
After attaching a large external HD + including it in the backup scheme, deja-dup eats up all space of main HD. Cause: deja-dup writes some metadata and manifest files to home dir at:
~/.cache/deja-dup/
These files become very large (here: > 18 GB), which results in running out of disk space. Apparently this causes problems for lots of deja-dup users, e.g. here, here. This post suggests solving this by replacing ~/.cache/deja-dup/ with a symlink to a directory on another disk with sufficient space:
mkdir /media/johan/BAKWA/.deja-dup-cache
mv ~/.cache/deja-dup/* /media/johan/BAKWA/.deja-dup-cache/
rmdir ~/.cache/deja-dup
ln -sf /media/johan/BAKWA/.deja-dup-cache ~/.cache/deja-dup
UPDATE: doesn't work, files are still written to home dir!! Interim solution: exclude external drive from deja-dup backup scheme, and back it up manually with rsync (no incremental backup though!).
List partitions:
df -h
Result:
Filesystem Size Used Avail Use% Mounted on
udev 3,9G 0 3,9G 0% /dev
tmpfs 789M 9,5M 780M 2% /run
/dev/sda1 227G 202G 14G 94% /
tmpfs 3,9G 34M 3,9G 1% /dev/shm
tmpfs 5,0M 4,0K 5,0M 1% /run/lock
tmpfs 3,9G 0 3,9G 0% /sys/fs/cgroup
cgmfs 100K 0 100K 0% /run/cgmanager/fs
tmpfs 789M 32K 789M 1% /run/user/1000
/dev/sdb1 1,9T 144M 1,9T 1% /media/johan/Elements4
So in this case we need to format /dev/sdb1. Unmount the disk:
sudo umount /dev/sdb1
Format as ext4:
sudo mkfs.ext4 /dev/sdb1
Change generic label to WEBARCH:
sudo e2label /dev/sdb1 WEBARCH
Done!
#!/bin/bash
# Script must be run as root!
sourceDir=/media/johan/Elements4/webarcheologie
destDir=/media/johan/WEBARCH/
rsync -avhl --dry-run $sourceDir $destDir
Copy homedir:
#!/bin/bash
# Script must be run as root!
sourceDir=~
destDir=/media/johan/BAKWA/homedir-25022020/
rsync -avhl $sourceDir $destDir
https://www.linuxjournal.com/content/filesystem-hierarchy-standard
https://www.cl.cam.ac.uk/~lp15/Pages/Scream.html
https://forums.launchbox-app.com/topic/29631-quick-mamemess-philips-cd-i-tutorial-mame-0-172/
https://publications.arl.org/16ivjbv/ (PDF link)
First install the following packages:
sudo apt install texlive-latex-extra
sudo apt-get install texlive-bibtex-extra biber
sudo apt-get install texlive-fonts-recommended
Then download the OpenSans package here. Install using following steps:
- Copy doc/, fonts/, source/, and tex/ directories to /etc/texmf directory
- Run mktexlsr to refresh the file name database and make TeX aware of the new files.
- Run sudo updmap -sys --enable Map=opensans.map to make Dvips, dvipdf and pdfTeX aware of the new fonts.
https://blog.matthewburgess.net/2019/05/digital-physical-carrier-illustrations.html
https://support.hp.com/us-en/product/hp-prodesk-400-g3-microtower-pc/7638325/manuals
https://gist.github.com/zerolab/1633661
https://gist.github.com/davidtheclark/5521432
Even easier, use SmartyPants:
https://pypi.org/project/smartypants/
https://labs.loc.gov/experiments/webarchive-datasets/
https://parametric.press/issue-01/unraveling-the-jpeg/
ArchiveBox takes a list of website URLs you want to archive, and creates a local, static, browsable HTML clone of the content from those websites (it saves HTML, JS, media files, PDFs, images and more).
Short of AI, your best bet is to run OCR (tesseract) on these files.
Use cd-discid:
cd-discid /dev/sr1
Result:
b608ed0f 15 150 8656 19406 37656 48025 58358 71683 77998 90546 103443 117153 120751 132154 144223 157688 2287
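The first field of that output (the disc ID) can be recomputed from the other fields, which is handy as a sanity check. Sketch of the classic CDDB disc ID calculation as I understand it; function names are made up. The ID is built from a digit-sum checksum over the track start times, the playing time in seconds, and the track count.

```python
def parse_cd_discid(line: str) -> tuple[str, list[int], int]:
    """Split cd-discid output into (disc id, frame offsets, total seconds)."""
    fields = line.split()
    disc_id, n_tracks = fields[0], int(fields[1])
    offsets = [int(f) for f in fields[2:2 + n_tracks]]
    total_seconds = int(fields[2 + n_tracks])
    return disc_id, offsets, total_seconds


def cddb_disc_id(offsets: list[int], total_seconds: int) -> str:
    """Compute the 8-hex-digit disc ID from track offsets (frames, 75 per second)."""
    # Checksum: sum of the decimal digits of each track's start time in seconds.
    checksum = sum(sum(int(d) for d in str(off // 75)) for off in offsets) % 255
    # Playing time: disc length minus the start of the first track, in seconds.
    playing_time = total_seconds - offsets[0] // 75
    return f"{checksum:02x}{playing_time:04x}{len(offsets):02x}"


line = ("b608ed0f 15 150 8656 19406 37656 48025 58358 71683 77998 90546 "
        "103443 117153 120751 132154 144223 157688 2287")
reported, offsets, total = parse_cd_discid(line)
print(reported, cddb_disc_id(offsets, total))
```

For the disc above both values come out as b608ed0f.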
Lookup in freedb using:
Result:
200 rock b608ed0f Der Plan / Unkapitulierbar
Full record:
http://www.freedb.org/freedb/rock/b608ed0f
# xmcd
#
# Track frame offsets:
# 150
# 8656
# 19406
# 37656
# 48025
# 58358
# 71683
# 77998
# 90546
# 103443
# 117153
# 120751
# 132154
# 144223
# 157688
#
# Disc length: 2287 seconds
#
# Revision: 0
# Processed by: cddbd v1.5.2PL0 Copyright (c) Steve Scherf et al.
# Submitted via: ExactAudioCopy v0.99pb5
#
DISCID=b608ed0f
DTITLE=Der Plan / Unkapitulierbar
DYEAR=2017
DGENRE=Electronic
TTITLE0=Wie der Wind weht
TTITLE1=Lass die Katze stehn!
TTITLE2=Man leidet herrlich
TTITLE3=Grundrecht
TTITLE4=Es heißt: die Sonne
TTITLE5=Gesicht ohne Buch
TTITLE6=Stille hören
TTITLE7=Flohmarkt der Gefühle
TTITLE8=Der Herbst
TTITLE9=Körperlos im Cyberspace
TTITLE10=Zu Besuch bei N. Senada
TTITLE11=Wie schwarz ist ein Rabe?
TTITLE12=Come Fly With Me
TTITLE13=Was kostet der Austritt?
TTITLE14=Die Hände des Astronauten
EXTD=
EXTT0=
EXTT1=
EXTT2=
EXTT3=
EXTT4=
EXTT5=
EXTT6=
EXTT7=
EXTT8=
EXTT9=
EXTT10=
EXTT11=
EXTT12=
EXTT13=
EXTT14=
PLAYORDER=
Python: cddb-py; Python 3 port here.
See also CDDB.
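The disc ID in the first field of the cd-discid output can be recomputed from the other fields on the same line. The sketch below assumes the classic CDDB/freedb scheme (digit sum of the track start times in seconds, playing time, number of tracks); the function names are made up for illustration and are not part of cd-discid or cddb-py.

```python
def parse_cd_discid(line):
    """Split one line of cd-discid output into disc ID, track offsets and disc length."""
    fields = line.split()
    n_tracks = int(fields[1])
    offsets = [int(value) for value in fields[2:2 + n_tracks]]
    total_seconds = int(fields[2 + n_tracks])
    return fields[0], offsets, total_seconds


def cddb_disc_id(offsets, total_seconds):
    """Compute the 8-digit hex disc ID from frame offsets (75 frames per second)."""
    def digit_sum(number):
        return sum(int(digit) for digit in str(number))

    # Checksum: digit sum of each track's start time in whole seconds
    checksum = sum(digit_sum(offset // 75) for offset in offsets)
    # Playing time: disc length minus start time of the first track
    length = total_seconds - offsets[0] // 75
    disc_id = ((checksum % 0xFF) << 24) | (length << 8) | len(offsets)
    return f"{disc_id:08x}"


line = ("b608ed0f 15 150 8656 19406 37656 48025 58358 71683 77998 90546 "
        "103443 117153 120751 132154 144223 157688 2287")
reported_id, offsets, total_seconds = parse_cd_discid(line)
print(reported_id, cddb_disc_id(offsets, total_seconds))
```

For the disc above both values come out as b608ed0f, which is a quick sanity check on a cd-discid result before doing a lookup.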
From here:
git remote add upstream https://github.com/sluwesjaantje/slomeslager.git
git fetch upstream
git checkout main
git rebase upstream/main
git push -f origin main
Suppose we want to extract the Jpeg2000:NumberOfComponents field for each JP2 image:
exiftool -csv -Jpeg2000:NumberOfComponents /media/johan/Elements4/test/*.jp2 > exif.csv
Result:
SourceFile,NumberOfComponents
/media/johan/Elements4/test/HS-19640508-001.jp2,3
/media/johan/Elements4/test/HS-19640508-002.jp2,3
::
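The resulting CSV is easy to post-process. A minimal sketch, assuming the two-column layout shown above (SourceFile, NumberOfComponents); the function name read_components is hypothetical.

```python
import csv


def read_components(csv_lines):
    """Map each source file to its number of components (as an integer)."""
    reader = csv.DictReader(csv_lines)
    return {row["SourceFile"]: int(row["NumberOfComponents"]) for row in reader}


# In-memory sample taken from the result above; for a real run, pass an
# open file object instead, e.g. open("exif.csv", newline="")
sample = [
    "SourceFile,NumberOfComponents",
    "/media/johan/Elements4/test/HS-19640508-001.jp2,3",
    "/media/johan/Elements4/test/HS-19640508-002.jp2,3",
]
components = read_components(sample)
# List any images that are not 3-component (e.g. grayscale)
print([name for name, count in components.items() if count != 3])
```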
mogrify -resize 1014 *.jpg
(Note: this changes the images in-place, so make a copy of the original images before doing this).
https://alexvanderbist.com/posts/2018/fixing-imagick-error-unauthorized
https://github.com/EG-tech/emulation-resources
The Big List of Naughty Strings is an evolving list of strings which have a high probability of causing issues when used as user-input data.
https://github.com/minimaxir/big-list-of-naughty-strings
https://espirian.co.uk/twitter-search-advanced-guide/
Below instructions are for a fresh install. Based on:
https://dominicpratt.de/fritz-nas-unter-debianubuntu-einbinden/
-
Open fstab in text editor as sudo:
sudo xed /etc/fstab
-
Add the following line to the bottom (use vers=2.0 from FritzOS 7.21 onward; also, the last line of the file must be empty):
//192.168.178.1/FRITZ.NAS /media/fritzbox cifs credentials=/etc/samba/auth,vers=2.0,uid=1000,gid=1000 0
-
Create the mount directory:
sudo mkdir -p /media/fritzbox
-
Create file /etc/samba/auth:
sudo touch /etc/samba/auth
-
Edit as sudo:
sudo xed /etc/samba/auth
-
Add username and password entries, each on its own line (these must be the FritzNAS username and password, not the FritzBox ones!):
username=johan
password=dfh3476fh8((77&&
-
Install the samba package (cifs-utils is also needed, but that is already part of the default Linux Mint install):
sudo apt install samba
-
Finally mount:
sudo mount -a
Done!
https://forums.linuxmint.com/viewtopic.php?t=217509
A utility for file format and metadata analysis, data extraction, and image format decoding
https://github.com/jsummers/deark
https://people.xiph.org/~xiphmont/demo/neil-young.html
https://github.com/markh794/mhvtl
Install script for Ubuntu 16.04:
https://gist.github.com/hrchu/3eb1c0aa9994df0328037fff04cd889d
Then run using:
sudo /etc/init.d/mhvtl start
https://stackoverflow.com/a/25223352/1209004
E.g.:
def main():
    """Main function"""
    appDir = get_main_dir()
    root = tk.Tk()
    root.iconphoto(True, tk.PhotoImage(file=os.path.join(appDir, 'icon.png')))
    myGUI = tapeimgrGUI(root)
# Get tape status, output to array (split at newline)
IFS=$'\n' tapeStatus=$(mt -f $TAPEnr status)
# Parse file number and block number from status output
for item in ${tapeStatus[*]}
do
    if [[ $item == *"file number"* ]]; then
        # Split at equal sign, 2nd item is value
        tmp=$(echo $item | cut -f2 -d=)
        # Strip whitespace
        fileNumber="$(echo -e "${tmp}" | tr -d '[:space:]')"
        #echo $fileNumber
    fi
    if [[ $item == *"block number"* ]]; then
        # Split at equal sign, 2nd item is value
        tmp=$(echo $item | cut -f2 -d=)
        # Strip whitespace
        blockNumber="$(echo -e "${tmp}" | tr -d '[:space:]')"
        #echo $blockNumber
    fi
done
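The same parsing logic can be sketched in Python. This assumes status lines of the form "file number = 2", which is what the shell loop above matches on; the exact wording differs between mt implementations, and the function name and sample text are made up for illustration.

```python
def parse_tape_status(status_text):
    """Return the file number and block number found in mt status output."""
    values = {}
    for line in status_text.splitlines():
        for key in ("file number", "block number"):
            if key in line and "=" in line:
                # Split at equal sign, 2nd item is value (like cut -f2 -d=)
                values[key] = line.split("=")[1].strip()
    return values


# Hypothetical status output, for illustration only
sample_status = "drive type = Generic SCSI-2 tape\nfile number = 2\nblock number = 0\n"
print(parse_tape_status(sample_status))
```

In a real workflow the text would come from running mt, for example via subprocess.run(["mt", "-f", device, "status"], capture_output=True, text=True).stdout.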
This Oxford Common File Layout (OCFL) specification describes an application-independent approach to the storage of digital information in a structured, transparent, and predictable manner. It is designed to promote long-term object management best practices within digital repositories.
https://github.com/socialcopsdev/camelot/
https://remarkableapp.github.io/index.html
See also moby/moby#21925.
E.g.:
sudo du -hx --max-depth=1 /var/lib
Result contains this entry:
25G /var/lib/docker
There are probably more elegant/subtle ways to handle this, see e.g. https://lebkowski.name/docker-volumes/
Uninstall docker:
sudo apt-get remove docker docker-engine docker.io
Delete files:
sudo rm -rf /var/lib/docker
The Library’s ‘Emerging Formats’ project is focused on UK publications created for the mobile web, as interactive narratives or in database format.
https://britishlibrary.recruitment.northgatearinso.com/birl/pages/vacancy.jsf?latest=01001612
Caylin Smith and Ian Cooke report on the Emerging Formats project, which is investigating the collection management needs of published works that are created with digital formats that have significant software and hardware dependencies. They discuss the collection management challenges of these format types within the framework of UK NPLD.
http://journals.sagepub.com/doi/full/10.1177/0955749018785836
This works if Trash contains items that were put there as superuser:
sudo rm -rf ~/.local/share/Trash/*
https://www.digitalocean.com/community/tutorials/how-to-install-wordpress-with-lamp-on-ubuntu-16-04
Use this to import kbresearch blog; then export to static site using:
https://wordpress.org/plugins/static-html-output-plugin/
https://stacks.wellcomecollection.org/digital-transformation-at-wellcome-collection-639fb177aad6
filename:ext extension:ext where ext is the extension you're interested in. You need both the filename and extension keywords to filter it down to only potential files of interest.
https://twitter.com/NKrabben/status/1022575556209074220
Example:
https://github.com/search?q=filename%3Awq1+extension%3Awq1
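Building such a search URL for an arbitrary extension is a small exercise in URL encoding. A minimal sketch using only the standard library; the function name is hypothetical.

```python
from urllib.parse import urlencode


def github_extension_search_url(ext):
    """Build a GitHub search URL that combines the filename and extension keywords."""
    query = f"filename:{ext} extension:{ext}"
    return "https://github.com/search?" + urlencode({"q": query})


print(github_extension_search_url("wq1"))
```

For "wq1" this reproduces the example URL given above.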
This repository aims to collect the smallest possible syntactically valid files in different programming/scripting/markup languages.
https://github.com/mathiasbynens/small
VisiData is an interactive multitool for tabular data. It combines the clarity of a spreadsheet, the efficiency of the terminal, and the power of Python, into a lightweight utility which can handle millions of rows with ease.
https://www.techrepublic.com/article/disk-wiping-and-data-forensics-separating-myth-from-science/
the home of the most unique Microsoft Excel animated spreadsheets
https://scholarsphere.psu.edu/concern/generic_works/bvq27zn11p
https://wiki.archivematica.org/PREMIS/METS_for_scalability
https://code.visualstudio.com/Docs/languages/markdown
http://thisdavej.com/build-an-amazing-markdown-editor-using-visual-studio-code-and-pandoc/
https://www.wikihow.com/Measure-Static-Electricity
Install in MINGW:
pacman -S mingw-w64-x86_64-gedit
Add external plugin:
https://stackoverflow.com/questions/39360149/adding-external-plug-ins-to-gedit-in-windows
Get plugins here:
https://wiki.gnome.org/Apps/Gedit/ThirdPartyPlugins-v3.0
If ELIFECYCLE / puppeteer error happens, try this (source):
sudo npm install @daisy/ace -g --unsafe-perm=true --allow-root
In case this results in:
sudo: npm: command not found
Then get location of npm:
which npm
Result:
/home/johan/.nvm/versions/node/v10.11.0/bin/npm
Create symbolic link:
sudo ln -s /home/johan/.nvm/versions/node/v10.11.0/bin/npm /usr/bin/npm
BUT ace now fails with:
Error: ENOENT: no such file or directory, mkdir '/home/johan/.local/state/DAISY Ace'
Fix: manually created directory "state/DAISY Ace" in ".local", now works!
Our goal is to aggregate knowledge about best practices in writing and to make that knowledge immediately accessible to all authors in the form of a linter for prose.
https://github.com/amperser/proselint/
https://w3c.github.io/publ-bg/docs/EPUB4_business_case.html
http://journal.code4lib.org/articles/13438
Possibly more here.
Web service based on Ace:
https://github.com/amiaopensource/open-workflows
http://netarkivet.dk/wp-content/uploads/IntegrationOfNonHarvestedData.pdf
https://www.linuxquestions.org/questions/linux-newbie-8/read-tape-contents-944371/
http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005619
http://nvie.com/posts/a-successful-git-branching-model/
https://github.com/WikiDP/wikidp-portal
In config file ports.conf, change this line:
Listen 80
into this:
Listen 127.0.0.1:80
See:
Setting up multiple sites:
Community resource intended to provide helpful one-liners and script code specifically drawn from real-life examples in archives and libraries
https://dd388.github.io/crals/
wget --recursive --no-clobber --span-hosts --page-requisites \
--convert-links --no-parent -w 5 --random-wait \
http://blog.kbresearch.nl >>wget.log 2>&1
This doesn't quite work the way it should:
- If we leave out --span-hosts, external stylesheets etc. are ignored, even if --page-requisites is used (don't want that)!
- If we include --span-hosts, externally referenced pages/sites are scraped as well (don't want that either!)
See also https://gist.github.com/dannguyen/03a10e850656577cfb57
Better approach:
-
Scrape one single page:
wget --page-requisites --span-hosts --convert-links --adjust-extension -w 5 --random-wait http://blog.kbresearch.nl/2015/07/07/why-pdfa-validation-matters-even-if-you-dont-have-pdfa/ >>$logFile 2>&1
This gives us the domains used for individual page resources, which we can subsequently feed into --domains. After some fiddling (we don't want to harvest 60+ gravatar subdomains) this looks reasonable:
#!/bin/bash
url=http://blog.kbresearch.nl
domains=blog.kbresearch.nl,wp.com,researchkb.files.wordpress.com,googleapis.com,gstatic.com
logFile=wget.log
wget --mirror --page-requisites --span-hosts --convert-links --adjust-extension -w 5 --random-wait --domains=$domains $url >>$logFile 2>&1
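Collecting the host names from the single-page scrape can be automated. The sketch below assumes you already have a list of the resource URLs that were fetched (how you pull them out of the wget log is left open); the sample URLs are illustrative, not taken from an actual log.

```python
from urllib.parse import urlsplit


def collect_hosts(urls):
    """Return a sorted, comma-separated list of the unique host names in urls."""
    hosts = set()
    for url in urls:
        host = urlsplit(url).hostname
        if host:
            hosts.add(host)
    return ",".join(sorted(hosts))


# Hypothetical resource URLs for one scraped page
resource_urls = [
    "http://blog.kbresearch.nl/2015/07/07/",
    "https://fonts.googleapis.com/css?family=Open+Sans",
    "https://s0.wp.com/style.css",
    "https://s0.wp.com/script.js",
]
print(collect_hosts(resource_urls))
```

The resulting list still needs manual pruning: since wget matches --domains by suffix, entries such as s0.wp.com can be collapsed to wp.com, and unwanted hosts can be dropped.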
https://arxiv.org/abs/1712.03140
swMATH is a freely accessible, innovative information service for mathematical software. swMATH not only provides access to an extensive database of information on mathematical software, but also includes a systematic linking of software packages with relevant mathematical publications.
https://docs.microsoft.com/en-us/previous-versions/windows/
See this thread on digipres.club for some context:
https://digipres.club/@joe/99650486509645352
Search URL:
One of our goals is to publish researcher's data, code, and executable Linux container all as files in a version controlled Dat repository. For this to be useful, a person should be able to execute these Linux environments (aka containers) anywhere
https://blog.datproject.org/2018/01/26/challenges-of-decentralized-hpc-containerization/
Instructions here, Ubuntu 16.04.
If updating results in warnings about package authentication, follow steps below:
owncloud/client#5287 (comment)
exiftool -xmp:all= "-all:all<xmp-tiff:all" MMKB19_000004012_00002_master.tiff
Use the --max-line-length option, e.g.:
pep8 --max-line-length=120 ~/omSipCreator/omSipCreator > pep8.txt
https://www.degruyter.com/view/j/rest.2017.38.issue-3/res-2016-0032/res-2016-0032.xml?format=INT
COMPACT DISC SERVICE LIFE: AN INVESTIGATION OF THE ESTIMATED SERVICE LIFE OF PRERECORDED COMPACT DISCS (CD-ROM)
https://www.loc.gov/preservation/resources/rt/CDservicelife_rev.pdf
https://www.loc.gov/preservation/scientists/projects/cd_longevity.html
https://www.loc.gov/preservation/scientists/projects/cd-r_dvd-r_rw_longevity.html
https://www.ossblog.org/markdown-editors/
git restore .
(see also stackoverflow)
validate METS file against best practices:
Schematron rules:
sf -csv t/images | cut -d ',' -f 6 | sort | uniq -c | sort -r
Result:
8 x-fmt/390
7 fmt/645
5 fmt/41
5 fmt/101
4 fmt/43
3 x-fmt/62
3 x-fmt/263
3 x-fmt/111
3 fmt/44
2 fmt/661
2 fmt/5
2 fmt/17
28 UNKNOWN
1 x-fmt/92
::
etc
(Source: Nick Krabbenhöft)
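The same tally can be made in Python, which is handy on systems without cut/sort/uniq. This sketch mirrors the pipeline above by counting the sixth CSV column; the header names and rows in the sample are assumptions for illustration, not real Siegfried output.

```python
import csv
from collections import Counter


def count_puids(csv_lines, column=5):
    """Count the values in the given (zero-based) column, skipping the header row."""
    reader = csv.reader(csv_lines)
    next(reader, None)  # skip header row
    return Counter(row[column] for row in reader if len(row) > column)


# Hypothetical sample rows; a real run would pass an open file with sf -csv output
sample_rows = [
    "filename,filesize,modified,errors,namespace,id",
    "a.jpg,100,2018-01-01T00:00:00Z,,pronom,fmt/43",
    "b.jpg,200,2018-01-01T00:00:00Z,,pronom,fmt/43",
    "c.bin,300,2018-01-01T00:00:00Z,,pronom,UNKNOWN",
]
for puid, count in count_puids(sample_rows).most_common():
    print(count, puid)
```

Unlike sort -r on the uniq -c output, most_common() orders the counts numerically.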
https://stackoverflow.com/a/7244456
https://gist.github.com/bitsgalore/7c5da72277557b608c94
https://sourceforge.net/p/exiftool/code/ci/master/tree/t/images/
Not working, problem seems to correspond to issue here:
https://forums.linuxmint.com/viewtopic.php?f=47&t=260925
Create/update package database:
pacman -Fy
Result:
:: Synchronizing package databases...
mingw32 2.4 MiB 2.97M/s 00:01 [#####################] 100%
mingw32.sig 96.0 B 0.00B/s 00:00 [#####################] 100%
mingw64 2.4 MiB 1695K/s 00:01 [#####################] 100%
mingw64.sig 96.0 B 0.00B/s 00:00 [#####################] 100%
msys 855.8 KiB 4.24M/s 00:00 [#####################] 100%
msys.sig 96.0 B 0.00B/s 00:00 [#####################] 100%
Find package name from (sub) string:
pacman -Fsx iso-info
Result:
mingw32/mingw-w64-i686-libcdio 2.0.0-1
mingw32/bin/iso-info.exe
mingw32/share/man/man1/iso-info.1.gz
mingw64/mingw-w64-x86_64-libcdio 2.0.0-1
mingw64/bin/iso-info.exe
mingw64/share/man/man1/iso-info.1.gz
Install package:
pacman -S mingw-w64-x86_64-libcdio
Uninstall package:
pacman -R mingw-w64-x86_64-libcdio
Source: https://github.com/msys2/msys2/wiki/Using-packages
Query:
extent any "cdrom* cd-rom*" and annotation any "Mac*" not annotation any "Win* PC*"
Result:
Query:
extent any "blu*"
Result (only 5 hits, 23/1/2018):
Command:
7z l -slt iso9660.iso
Result:
7-Zip [64] 9.20 Copyright (c) 1999-2010 Igor Pavlov 2010-11-18
p7zip Version 9.20 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,4 CPUs)
Listing archive: iso9660.iso
--
Path = iso9660.iso
Type = Iso
Created = 2017-06-30 18:31:33
Modified = 2017-06-30 18:31:33
----------
Path = nimbie.jpg
Folder = -
Size = 69424
Packed Size = 69424
Modified = 2017-06-30 13:23:38
Path = readme.txt
Folder = -
Size = 37
Packed Size = 37
Modified = 2017-06-30 13:25:20
UDF Bridge:
7z l -slt iso9660_udf.iso
Result:
7-Zip [64] 9.20 Copyright (c) 1999-2010 Igor Pavlov 2010-11-18
p7zip Version 9.20 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,4 CPUs)
Listing archive: iso9660_udf.iso
--
Path = iso9660_udf.iso
Type = Udf
Comment = UDF Bridge demo
Cluster Size = 2048
Created = 2017-06-30 18:31:33
----------
Path = nimbie.jpg
Folder = -
Size = 69424
Packed Size = 69632
Modified = 2017-06-30 13:23:38
Accessed = 2017-06-30 18:31:33
Path = readme.txt
Folder = -
Size = 37
Packed Size = 2048
Modified = 2017-06-30 13:25:20
Accessed = 2017-06-30 18:31:33
https://twitter.com/anjacks0n/status/941020183812100096
Esp.:
Without Tika, relying on DROID, there would have been 25,887,108 unidentified resources - mostly plain text, JS, CSS etc. Without DROID, only 464 would go unidentified, but we'd have no format-version-level information. Combining tools is crucial for web archives.
Using iso-info:
iso-info -l -i dvd-erik.iso
Result:
d [LSN 22] 4096 Jan 01 1970 01:00:00 .
d [LSN 22] 2048 Jan 01 1970 01:00:00 ..
- [LSN 26] 158549392 Jul 30 2008 09:33:59 086_10B21_078v_079r.TIF
- [LSN 77443] 158633884 Jul 30 2008 09:34:08 087_10B21_079v_080r.TIF
- [LSN 154901] 157658880 Jul 30 2008 09:34:19 088_10B21_080v_081r.TIF
- [LSN 231883] 157877788 Jul 30 2008 09:34:29 089_10B21_081v_082r.TIF
::
::
- [LSN 2092850] 158203324 Jul 30 2008 09:38:31 113_10B21_105v_106r.TIF
- [LSN 2170098] 156139844 Jul 30 2008 09:38:41 114_10B21_106v_107r.TIF
Here LSN * 2048 = offset of start of file.
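With the LSN and the size from the listing, a file can be copied straight out of the image. A minimal sketch, assuming 2048-byte sectors and a file stored as one contiguous extent; the function name extract_file is made up for illustration.

```python
SECTOR_SIZE = 2048


def extract_file(image_path, lsn, size, out_path, chunk_size=1024 * 1024):
    """Copy size bytes, starting at sector lsn, from the image to out_path."""
    with open(image_path, "rb") as image, open(out_path, "wb") as out:
        # Byte offset of the start of the file
        image.seek(lsn * SECTOR_SIZE)
        remaining = size
        while remaining > 0:
            chunk = image.read(min(chunk_size, remaining))
            if not chunk:
                break  # image is shorter than expected
            out.write(chunk)
            remaining -= len(chunk)


# Example call for the first file in the listing above:
# extract_file("dvd-erik.iso", 26, 158549392, "086_10B21_078v_079r.TIF")
```

For that first file the start offset is 26 * 2048 = 53248 bytes.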
From the manual:
--try-again Mark all non-trimmed and non-scraped blocks inside the rescue domain as non-tried before beginning the rescue. Try this if the drive stops responding and ddrescue immediately starts scraping failed blocks when restarted. If '--retrim' is also specified, mark all failed blocks inside the rescue domain as non-tried.
Useful if ddrescue remains stuck endlessly in "scraping failed blocks".
msiexec /a putty-64bit-0.70-installer.msi
Disable PDF/A validation, only extract features:
verapdf --off --extract whatever.pdf > whatever.xml
Recursively process directory tree:
verapdf --recurse --off --extract myDir > whatever.xml
- https://zenodo.org/communities/kbnl/search?page=1&size=20&q=keywords:%22digital%20preservation%22
- https://zenodo.org/communities/kbnl/search?page=1&size=20&q=keywords:%22digital%20scholarship%22
- https://zenodo.org/communities/kbnl/search?page=1&size=20&q=keywords:enrichment
- https://zenodo.org/communities/kbnl/search?page=1&size=20&q=keywords:IPR
- https://zenodo.org/communities/kbnl/search?page=1&size=20&q=keywords:%22public%20libraries%22
https://docs.google.com/spreadsheets/d/1g2vbAFBHWhsPRkNljbQBsKasMI-GCFTsQLol0cFT6js/edit#gid=0
https://web.archive.org/web/20020201195007/http://www.geocities.com:80/SiliconValley/4031/
https://superuser.com/questions/670324/obtaining-a-list-of-all-hyperlinks
(Note: lots of DOIs in references don't resolve at all, or resolve to wrong location!)
http://onlinelibrary.wiley.com/doi/10.1111/1746-692X.12106/full
http://www.nature.com/news/a-clean-green-science-machine-1.17125?WT.mc_id=TWT_NatureNews
http://tyndall.ac.uk/sites/default/files/twp161.pdf
https://www.chemistryworld.com/opinion/cutting-the-science-travel-footprint/9567.article
Archivematica examples in:
https://www.loc.gov/standards/premis/examples.html
Paper by Andy Jackson (2012):
http://arxiv.org/pdf/1210.1714.pdf
https://twitter.com/andrewjbtw/status/920791293122396160
For one file:
convert whatever_compressed.tif +compress whatever_uncompressed.tif
Multiple files:
#!/bin/bash
# Input and output directories
dirIn=~/tiffsDDD
dirOut=~/tiffsDDUncompressed
# Loop over all TIFFs (null-delimited, so file names with spaces are handled)
while IFS= read -d $'\0' -r file ; do
    # File basename
    bName=$(basename -s .TIF "$file")
    # Output name
    outName=$bName.TIF
    # Full output path
    fOut="$dirOut/$outName"
    # Convert to uncompressed TIFF
    convert "$file" +compress "$fOut"
done < <(find "$dirIn" -type f -name "*.TIF" -print0)
-
"Failed to start the X server" message in login screen
Solution:
https://linuxnorth.wordpress.com/2017/07/04/installing-and-uninstalling-lightdm-in-linux-mint-18-2/
-
Top/title bar of windows missing, cannot move windows.
Solution: go to Preferences/Desktop Settings/Windows and select a Window Manager from the dropdown menu (for some reason no WM is selected by default).
-
Window resize margin in default Metacity window manager is only 1 px wide
Solution: https://askubuntu.com/questions/4109/how-do-i-increase-the-resize-margin-on-windows
This library provides a fast, standalone way to read and write WARC Format commonly used in web archives.
https://github.com/webrecorder/warcio
Includes ARC/WARC validation:
https://sbforge.org/display/JWAT/Running+JWAT-Tools
https://technet.microsoft.com/en-us/library/ee309278(office.12).aspx
https://github.com/apache/tika/tree/master/tika-core/src/main/resources/org/apache/tika/mime
Kaitai Struct is a declarative language used to describe various binary data structures, laid out in files or in memory (...).
The main idea is that a particular format is described in Kaitai Struct language (.ksy file) and then can be compiled with ksc into source files in one of the supported programming languages. These modules will include a generated code for a parser that can read described data structure from a file / stream and give access to it in a nice, easy-to-comprehend API.
Use the -d option with invalid-name:
python3 -m pylint -d invalid-name boxvalidator.py > pylintjpylyzer.txt
https://github.com/Dzonatas/solution/tree/master/Documentation
The following command will keep login credentials in cache for 1 hour:
git config --global credential.helper "cache --timeout=3600"
For some reason I always forget this (below for OpenJPEG):
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib
- http://hirise-pds.lpl.arizona.edu/download/PDS/RDR/ESP/ORB_011200_011299/ESP_011265_1560/ESP_011265_1560_RED.JP2 (2GB)
- 6.7 GB http://apollo.sese.asu.edu/data/pancam/AS16/jp2/AS16-P-4102.jp2
https://gist.github.com/danielestevez/2044589
Prince tool:
WeasyPrint (open-source alternative):
The goal of this Action is to improve scientific understanding of the implications of digitization, hence helping individuals, disciplines, societies and sectors across Europe to cope optimally with the effects.
http://blog.online-convert.com/huge-list-of-example-files-creative-commons/
robocopy sourceDir destDir /COPYALL /E /R:0 /DCOPY:T
E.g.:
robocopy H:\iromlabTestKBDepotNew M:\DigitalPreservation\optischeDragers\iromlabTestKBDepot /COPYALL /E /R:0 /DCOPY:T >robocopy.stdout 2>robocopy.stderr
Some useful links:
Good description of the problem:
https://lists.debian.org/debian-user/2005/01/msg02339.html
the sector numbers in the file system refer to sectors of the original CD rather than sectors of session2.iso. I don't know of a utility for rewriting them so that the file can be loop-mounted or written to an ordinary CD, but you can at least get a directory listing by using isoinfo with an offset:
isoinfo -i session2.iso -N 204345 -l
https://lists.gnu.org/archive/html/libcdio-devel/2010-02/msg00048.html
Esp.:
Remember, the path table and directory structure of the iso reflect the fact that the ISO filesystem starts on sector 222145 (49:23:70) of the CD. If it is burned to another CD at a different position, it won't work. Likewise, any program that reads the iso will need to be able to compensate for the offset. Try, for example: isoinfo -N 222145 -d -i '8mm-songs_to_love_and_die_by.iso'
Also (from same thread):
https://lists.gnu.org/archive/html/libcdio-devel/2010-02/msg00053.html
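The quoted sector number and the minute:second:frame address are two notations for the same position. A small sketch of the conversion, assuming the usual CD conventions of 75 frames per second and a 150-frame (2 second) lead-in offset between the two; the function names are hypothetical.

```python
FRAMES_PER_SECOND = 75
PREGAP_FRAMES = 150  # MSF addresses run 2 seconds ahead of sector numbers


def lba_to_msf(lba):
    """Convert a sector number to a (minutes, seconds, frames) tuple."""
    minutes, rest = divmod(lba + PREGAP_FRAMES, 60 * FRAMES_PER_SECOND)
    seconds, frames = divmod(rest, FRAMES_PER_SECOND)
    return minutes, seconds, frames


def msf_to_lba(minutes, seconds, frames):
    """Convert a (minutes, seconds, frames) address back to a sector number."""
    return (minutes * 60 + seconds) * FRAMES_PER_SECOND + frames - PREGAP_FRAMES


print(lba_to_msf(222145))      # (49, 23, 70)
print(msf_to_lba(49, 23, 70))  # 222145
```

This matches the pair quoted in the thread: sector 222145 corresponds to 49:23:70.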
The default encoding for reading and writing text files depends on locale settings, which can result in unexpected behaviour. See e.g.:
Solution: always set the encoding explicitly when opening a file for read/write in text mode. Example:
# Byte sequence corresponds to multiplication sign in UTF-8
myBytes = b'\xc3\x97'
# Decode to string
myString = myBytes.decode('utf-8')
# Write myString to file
with open("myString.txt", "w", encoding="utf-8") as ms_file:
    ms_file.write(myString)
In this case, create a link to f:\Pandoc\pandoc.exe in directory c:\bin:
mklink pandoc.exe F:\Pandoc\pandoc.exe
Powershell method:
Get-ItemProperty HKLM:\Software\Wow6432Node\Microsoft\Windows\CurrentVersion\Uninstall\* | Select-Object DisplayName, DisplayVersion, Publisher, InstallDate | Format-Table -AutoSize > installedPrograms.txt
https://www.loc.gov/standards/premis/guidelines2017-premismets.pdf
java -jar tika-app-1.14.jar -t whatever.epub > whatever.txt
BUT doesn't return chapters in reading order!!
https://github.com/deanmalmgren/textract
Installs with errors under Windows; seems to work OK on Linux.
https://github.com/nscaife/file-windows
Saves output file as 24 bits / channel:
ffmpeg -i frogs-01.wav -codec pcm_s24le frogs-01-24-bit.wav
For a list of all codec values:
ffmpeg -codecs
http://stackoverflow.com/questions/14132789/relative-imports-for-the-billionth-time
https://wiki.gentoo.org/wiki/FFmpeg_-_Extract_Blu-Ray_Audio
From:
https://support.microsoft.com/nl-nl/help/100027/info-direct-drive-access-under-win32
To open a physical hard drive for direct disk access (raw I/O) in a Win32-based application, use a device name of the form
\\.\PhysicalDriveN
where N is 0, 1, 2, and so forth, representing each of the physical drives in the system.
To open a logical drive, direct access is of the form
\\.\X:
where X: is a hard-drive partition letter, floppy disk drive, or CD-ROM drive.
E.g. compute checksum on CD in d: drive:
md5sum \\.\D:
Access to logical drives:
http://stackoverflow.com/q/6522644/1209004
Write access:
http://stackoverflow.com/q/7135398/1209004
Reading raw disks with Python:
http://blog.lifeeth.in/2011/03/reading-raw-disks-with-python.html
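A checksum over a device can also be computed from Python by reading it in fixed-size chunks. This is a sketch under the assumption that the device name can be opened like a regular file in binary mode (on Windows this may need elevated rights); the function name md5_of_path is made up for illustration.

```python
import hashlib


def md5_of_path(path, chunk_size=2048 * 512):
    """Return the MD5 hex digest of a file or device, read in chunks."""
    md5 = hashlib.md5()
    with open(path, "rb") as stream:
        # chunk_size is a multiple of the 2048-byte CD sector size
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


# Example call for a CD in the d: drive (Windows):
# print(md5_of_path(r"\\.\D:"))
```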
https://github.com/barneygale/isoparser
BUT this will make accessing the site CAPTCHA hell for Tor users: https://support.cloudflare.com/hc/en-us/articles/203306930-Does-CloudFlare-block-Tor-
Alternatives:
- CERTBot / Letsencrypt: requires server access
- Github pages has built-in https support, but only for github.io domains.
https://www.codementor.io/arpitbhayani/host-your-python-package-using-github-on-pypi-du107t7ku
Once everything is set up, for each new release the basic steps are:
- Update version number in main code
- Update link to download_url (in my case this is automated)
- Commit changes & push
- Add tag:
git tag -a x.y.z -m "whatever"
git push --tags
python setup.py register -r pypi
python setup.py sdist upload -r pypi
The md5sum of a "burnt" CD can be different than the md5sum of the associated iso file and not indicate an error
http://twiki.org/cgi-bin/view/Wikilearn/CdromMd5sumsAfterBurning
See also:
http://superuser.com/questions/220082/how-to-validate-a-dvd-against-an-iso
https://warekennis.nl/wp-content/uploads/2013/03/BOOKS-AND-LITERATURE-STATUS-REVIEW-2017-.pdf
ffprobe track01.cdda.wav -show_format -show_streams > properties.txt
Result (file properties.txt):
[STREAM]
index=0
codec_name=pcm_s16le
codec_long_name=PCM signed 16-bit little-endian
profile=unknown
codec_type=audio
codec_time_base=1/44100
codec_tag_string=[1][0][0][0]
codec_tag=0x0001
sample_fmt=s16
sample_rate=44100
channels=2
channel_layout=unknown
bits_per_sample=16
id=N/A
r_frame_rate=0/0
avg_frame_rate=0/0
time_base=1/44100
start_pts=N/A
start_time=N/A
duration_ts=8233176
duration=186.693333
bit_rate=1411200
max_bit_rate=N/A
bits_per_raw_sample=N/A
nb_frames=N/A
nb_read_frames=N/A
nb_read_packets=N/A
DISPOSITION:default=0
DISPOSITION:dub=0
DISPOSITION:original=0
DISPOSITION:comment=0
DISPOSITION:lyrics=0
DISPOSITION:karaoke=0
DISPOSITION:forced=0
DISPOSITION:hearing_impaired=0
DISPOSITION:visual_impaired=0
DISPOSITION:clean_effects=0
DISPOSITION:attached_pic=0
DISPOSITION:timed_thumbnails=0
[/STREAM]
[FORMAT]
filename=track01.cdda.wav
nb_streams=1
nb_programs=0
format_name=wav
format_long_name=WAV / WAVE (Waveform Audio)
start_time=N/A
duration=186.693333
size=32932748
bit_rate=1411201
probe_score=99
[/FORMAT]
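The default ffprobe output shown above is simple enough to parse by hand. A minimal sketch that turns the [STREAM] and [FORMAT] sections into dictionaries; the function name is hypothetical and the sample is an abridged copy of the result above.

```python
def parse_ffprobe(text):
    """Parse ffprobe's default output into a list of (section name, dict) pairs."""
    sections = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("[/"):
            current = None  # end of section
        elif line.startswith("[") and line.endswith("]"):
            current = (line[1:-1], {})
            sections.append(current)
        elif current is not None and "=" in line:
            key, value = line.split("=", 1)
            current[1][key] = value
    return sections


sample_output = """[STREAM]
codec_name=pcm_s16le
channels=2
bits_per_sample=16
duration_ts=8233176
[/STREAM]
[FORMAT]
format_name=wav
size=32932748
[/FORMAT]"""

parsed = dict(parse_ffprobe(sample_output))
stream, fmt = parsed["STREAM"], parsed["FORMAT"]
# Audio payload: samples per channel * channels * bytes per sample
audio_bytes = (int(stream["duration_ts"]) * int(stream["channels"])
               * int(stream["bits_per_sample"]) // 8)
print(int(fmt["size"]) - audio_bytes)  # bytes not accounted for by audio data
```

For this track the difference is 44 bytes, consistent with a minimal WAV header. Note that dict() keeps only the last section of each name, so a file with several streams needs the list form.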
XML output:
ffprobe track01.cdda.wav -show_format -show_streams -print_format xml > properties.xml
Script:
https://blog.heckel.xyz/wp-content/uploads/2012/12/fritzbox-dlna-refresh
https://github.com/amiaopensource/open-workflows
https://confluence.nypl.org/display/DIG/Specifications+for+Audio+and+Moving+Image+Digitization
Mediags is a console program that scans directories for media files and verifies the integrity of those files. Detailed content reports may optionally be produced.
(Binaries windows only)
https://medium.com/swlh/browsers-not-apps-are-the-future-of-mobile-c552752ff75#.ilc1zlj1a
Video conversations with up to 8 people for free. No login required — no installs
https://www.wikidata.org/wiki/User:TweetsFactsAndQueries/A_Guide_To_WDQS
Extract references and metadata from PDF documents, and download all referenced PDFs:
https://www.metachris.com/pdfx/
http://stackoverflow.com/questions/13343096/explanation-of-need-for-multi-threading-gui-programming
https://www.wikidata.org/wiki/Wikidata:WikiProject_Informatics/File_formats
http://stackoverflow.com/questions/1623039/python-debugging-tips
http://www.filfre.net/2016/09/a-slow-motion-revolution/
An Open-Source Strategy for Documenting Events: The Case Study of the 42nd Canadian Federal Election on Twitter
http://journal.code4lib.org/articles/11358
http://checkers.eiii.eu/en/pdfcheck/
https://docs.python.org/3.6/library/queue.html
https://docs.python.org/3.6/library/sched.html
And perhaps:
https://docs.python.org/3.6/library/threading.html#module-threading
Possibly usable in CD imaging workflow (esp. interaction with operator input).
This Windows Batchscript setups a MinGW/GCC compiler environment for building ffmpeg and other media tools under Windows. After building the environment it retrieves and compiles all tools. All tools get static compiled, no external .dlls needed (with some optional exceptions)
https://github.com/jb-alvarado/media-autobuild_suite
By default this doesn't build the ffmpeg optional libraries (incl. cddio). In order to build them, if the batch file prompts you to Choose ffmpeg and mpv optional libraries?, select option 4 (All available external libs). Alternatively (if you accidentally ran the build with the default option), open file media-autobuild_suite.ini and set the value of ffmpegChoice to 4:
ffmpegChoice=4
http://lrn.no-ip.info/packages/i686-w64-mingw/libcdio/0.93-1/
http://www.student.tugraz.at/thomas.plank/
http://discid.sourceforge.net/
Tried flactag fork, which gives following output for CD-ROM:
Query failed: no actual audio tracks on disc: CDROM or DVD?
So might be useful for distinguishing between audio CD's and CD-ROMs (tarball contains Windows binary).
http://disktype.sourceforge.net/
Output audio CD:
Block device, size 690.4 MiB (723972096 bytes)
CD-ROM, 14 tracks, CDDB disk ID D912690E
Track 1: Audio track, 37.35 MiB (39163152 bytes), 3 min 42 sec
Track 2: Audio track, 87.89 MiB (92163120 bytes), 8 min 42 sec
::
Track 13: Audio track, 37.22 MiB (39029088 bytes), 3 min 41 sec
Track 14: Audio track, 78.14 MiB (81931920 bytes), 7 min 44 sec
CD-ROM:
Block device, size 223.2 MiB (233990144 bytes)
CD-ROM, 1 track, CDDB disk ID 0205F301
Track 1: Data track, 223.2 MiB (233994240 bytes)
ISO9660 file system
Volume name "0305132335"
Preparer "CEQUADRAT 32BIT ISO-9660 FORMATTER COPYRIGHT (C) 1995-1998 BY CEQUDRAT GMBH"
Data size 222.9 MiB (233682944 bytes, 114103 blocks of 2 KiB)
Joliet extension, volume name "0305132335"
Enhanced audio CD:
Block device, size 223.2 MiB (233990144 bytes)
CD-ROM, 22 tracks, CDDB disk ID 4B113416
Track 1: Audio track, 9.627 MiB (10094784 bytes), 0 min 57 sec
Track 2: Audio track, 30.01 MiB (31462704 bytes), 2 min 58 sec
::
Track 20: Audio track, 41.33 MiB (43340304 bytes), 4 min 05 sec
Track 21: Audio track, 47.73 MiB (50048208 bytes), 4 min 43 sec
Track 22: Data track, 90.84 MiB (95252480 bytes)
DVD:
Block device, size 223.2 MiB (233990144 bytes)
CD-ROM, 1 track, CDDB disk ID 023BFD01
Track 1: Data track, 2.197 GiB (2358986752 bytes)
Apple partition map, 2 entries
Partition 1: 31.50 KiB (32256 bytes, 63 sectors from 1)
Type "Apple_partition_map"
Partition 2: 2.737 GiB (2938324992 bytes, 5738916 sectors from 1108)
Type "Apple_HFS"
HFS Plus file system
Volume size 2.737 GiB (2938324992 bytes, 1434729 blocks of 2 KiB)
Volume name "BelPop Marc Moulin"
UDF file system
Sector size 2048 bytes
Volume name "BelPop Marc Moulin"
UDF version 1.50
ISO9660 file system
Volume name "BELPOPMARCMOULIN"
Data size 2.737 GiB (2938894336 bytes, 1435007 blocks of 2 KiB)
Joliet extension, volume name "BelPop Marc Moul"
(Note: the DVD is identified as a CD-ROM; this doesn't really matter, as extraction from a DVD is identical to extraction from a data CD-ROM.)
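Since disktype labels every track as either an audio or a data track, its output can be used to tell the carrier types above apart. A sketch, assuming the "Track N: Audio track" / "Track N: Data track" wording shown in the listings; the function name and category labels are made up for illustration.

```python
import re

TRACK_RE = re.compile(r"^\s*Track \d+: (Audio|Data) track")


def classify_disc(disktype_output):
    """Classify a disc as audio, data or mixed, based on its track types."""
    kinds = set()
    for line in disktype_output.splitlines():
        match = TRACK_RE.match(line)
        if match:
            kinds.add(match.group(1))
    if kinds == {"Audio"}:
        return "audio"
    if kinds == {"Data"}:
        return "data"
    if kinds == {"Audio", "Data"}:
        return "mixed"
    return "unknown"


enhanced_cd = """CD-ROM, 22 tracks, CDDB disk ID 4B113416
Track 21: Audio track, 47.73 MiB (50048208 bytes), 4 min 43 sec
Track 22: Data track, 90.84 MiB (95252480 bytes)"""
print(classify_disc(enhanced_cd))
```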
Compiles without problems under Windows (using Cygwin), but doesn't seem to be able to access cd-devices. E.g.:
disktype /dev/sr0
Result:
--- /dev/sr0
Block device, size 332.6 MiB (348790784 bytes)
disktype: Data read failed at position 0: Invalid request code
Or:
disktype D:\
Result:
--- D:\
disktype: D:\: Is a directory
Or:
disktype D
Result:
--- D
disktype: Can't stat D: No such file or directory
Perhaps try cdrdao scanbus?
http://www.robvanderwoude.com/wmic.php
Example - get information about optical drives:
wmic cdrom where mediatype!='unknown' get > test.txt
The GNU Compact Disc Input and Control library (libcdio) contains a library for CD-ROM and CD image access. Applications wishing to be oblivious of the OS- and device-dependent properties of a CD-ROM or of the specific details of various CD-image formats may benefit from using this library.
http://www.gnu.org/software/libcdio/
Python interface:
https://pypi.python.org/pypi/pycdio/
Brown, "Developing Virtual CD-ROM Collections" (2012):
http://www.ijdc.net/index.php/ijdc/article/view/216/285
Page 13:
-
Create BIN/TOC file with cdrdao using:
cdrdao read-cd --read-raw --device 1,0,0 --datafile allmy.bin allmy.toc
-
Author developed SheepShaver extension that allows these images to be read by emulator
Caveats:
- The given cdrdao command only extracts one session (I guess the Voyager CD-ROMs only contain one session with both the data and audio tracks, although the paper isn't entirely clear about this).
- In case of a CD with multiple sessions one would have to repeat the command for each of those (result: one separate image for each session)
- Hybrid CD-ROMs are not supported by any of the most widely-used emulators (also stressed by author)
Jackson (BL):
On multisession carriers:
While CD-ROM, DVD and HFS+ format disks are reasonably well covered by this approach, there are some important limitations. For example, the optical media formats all support the notion of ‘sessions’ – consecutive additions of tracks to a disk. This means that a given carrier may contain a ‘history’ of different versions of the data. By choosing to extract a single disk image, we only expose the final version of the data track, and any earlier versions, sessions or tracks are ignored. For our purposes, these sessions are not significant, but this may not be true elsewhere.
BUT sessions (at least on commercially manufactured carriers) typically don't contain different versions of the same data, but data that are completely different! Example: many 'enhanced' audio CDs that contain one session with all audio tracks, and another session with a data track. So sessions are significant!
BL workflow for Red Book (audio) and Yellow Book (mixed mode) carriers:
- Image to MDS/MDF format
- Then post-process MDS/MDF file with IsoBuster
But it's not entirely clear whether MDS/MDF can handle multisession carriers.
I found this in the Knowledge Base of the developer of the format:
Image making wizard will always allow the user to create mds/mdf ccd/img/sub.
But ISO format, only for those disc's that contain 1 data track(mode1 or mode2form1).
For cue/bin only for one session disc. if the original disc is a multi-session one, then the cue/bin would not be available and If the user chooses read sub-channel, the cue/bin and iso would be unavailable as well . because iso and cue/bin could not save sub channel data.
So apparently MDS/MDF does support multisession after all!
Good overview of disc image formats here:
http://www.theisozone.com/blogs/homebrew/burning-image-file-type-explained/
Includes links to ROM and startup images:
http://www.redundantrobot.com/#/sheepshaver
Report by Cornell University:
https://ecommons.cornell.edu/handle/1813/41368
Some useful info on Mac / PC images and hybrids:
http://www.macdisk.com/faqcden.php
Contains lots of info on optical carrier and disc image formats (e.g. BIN/CUE):
http://web.archive.org/web/20070221154246/http://www.goldenhawk.com/download/cdrwin.pdf
http://stackoverflow.com/questions/10123929/python-requests-fetch-a-file-from-a-local-url
https://blog.codinghorror.com/computer-display-calibration-101/
https://blog.codinghorror.com/bias-lighting/
Find all files with .pdf extension:
find . -type f -name '*.pdf'
Count all files with .pdf extension:
find . -type f -name '*.pdf'| wc -l
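A rough Python equivalent of the two find commands above, as a sketch. The function name find_by_extension is made up for this note. Unlike find -name, case sensitivity of the match may depend on the platform.

```python
from pathlib import Path


def find_by_extension(root, extension=".pdf"):
    """Return all regular files below root whose name ends in extension."""
    # rglob walks the directory tree recursively, like find without -maxdepth
    return sorted(p for p in Path(root).rglob(f"*{extension}") if p.is_file())


if __name__ == "__main__":
    matches = find_by_extension(".")
    for match in matches:
        print(match)
    # Equivalent of piping the find output through wc -l
    print(len(matches))
```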
Esp. 'useful links' section:
https://github.com/garbear/pyrominfo
Representation of 1 pixel in many different formats:
http://cloudinary.com/blog/one_pixel_is_worth_three_thousand_words
Online tutorials on APIs, Data Management, Data Manipulation, Distant Reading, Linked Open Data, Mapping and GIS, Network Analysis, Omeka Exhibit Building, Web Scraping and Programming with Python:
http://programminghistorian.org/lessons/
Supports lots of (old) Office-related formats + includes many conversion tools:
https://launchpad.net/ubuntu/+source/writerperfect/0.9.5-1
https://github.com/osnr/horrifying-pdf-experiments
https://en.wikibooks.org/wiki/A_Beginner%27s_Python_Tutorial/Classes
https://forums.linuxmint.com/viewtopic.php?t=177915
(Source: Nick Krabbenhöft on Twitter)
http://www.loc.gov/standards/mets/profiles/00000007.html
http://dx.doi.org/10.2218/ijdc.v4i2.107
http://www.digpres.com/publications/woodsbrownarch09.pdf
Example METS file (note that apparently they combine multiple ISOs in one AIP):
http://webapp1.dlib.indiana.edu/virtual_disk_library/index.cgi/4252478/mets
http://www.bl.uk/profiles/sound/METS_profile.pdf
https://www.blackmoreops.com/2015/06/18/linux-file-system-hierarchy-v2-0/
- Delving SIP-Creator
- Fedora SIP Creator
- UGent Sip Creator
- SIP-Builder
- RODA-In
- Dvcapture
- DURAARK SIP generator
xmllint --noout -schema schema.xsd whatever.xml
find -type f -exec md5sum "{}" + > checksums.md5
Source: http://askubuntu.com/a/318534. Works also under Cygwin.
Issue: output also includes MD5 sum of output file (which becomes invalid once anything is written to the file).
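One possible way around the issue with the output file ending up in its own checksum list is to skip it explicitly. A minimal Python sketch, assuming the manifest lives in the directory being checksummed; the function name write_md5_manifest is made up, and the line layout (digest, two spaces, ./relative/path) mimics md5sum output:

```python
import hashlib
from pathlib import Path


def write_md5_manifest(root, manifest_name="checksums.md5"):
    """Write md5sum-style lines for all files under root, skipping the manifest."""
    root = Path(root)
    manifest = root / manifest_name
    lines = []
    for path in sorted(root.rglob("*")):
        # Skip directories and the manifest file itself
        if not path.is_file() or path == manifest:
            continue
        digest = hashlib.md5()
        with path.open("rb") as stream:
            # Read in chunks so large files don't have to fit in memory
            for chunk in iter(lambda: stream.read(1024 * 1024), b""):
                digest.update(chunk)
        relative = path.relative_to(root).as_posix()
        lines.append(f"{digest.hexdigest()}  ./{relative}")
    manifest.write_text("".join(line + "\n" for line in lines), encoding="utf-8")
    return lines
```

Running it twice on the same directory gives the same manifest, because the manifest is never part of its own input.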
- Convert master JP2 to TIFF using Kakadu (this preserves any embedded ICC profiles):
kdu_expand -i master.jp2 -o master.tiff
- Convert TIFF to lossy JP2 with Aware via jpwrappa:
jpwrappa -m -p C:\jpwrappa\profiles\optionsKBAccessLossy_2014.xml master.tiff access.jp2
(The -m switch can be omitted, in which case there is no need for Exiftool.)
- Acronova Nimbie USB Plus range
- Nimbie NB21-DVD
- Nimbie USB range (NB11 - not available (19/5))
- Guidelines for Digital Newspaper Preservation
- Chronicles in Preservation: Preserving Digital News and Newspapers
- Digital Preservation of Newspapers: Findings of the Chronicles in Preservation Project
- E-paper Production Workflow – Adapting Production Workflow Processes for Digital Newsprint
- PRESERVING NEWS IN THE DIGITAL ENVIRONMENT: MAPPING THE NEWSPAPER INDUSTRY IN TRANSITION
Use the --reference-docx switch:
pandoc -S --reference-docx=template.docx test.md -o test.docx
Rollback to previous state:
git reset --hard <tag/branch/commit id>
Push the rolled-back state to the remote (force push):
git push ... -f
Example:
git reset --hard 2dbe067c1674dcf9a23104c4b64b772e1550ba29
git push origin master -f
http://162.242.228.174/mimes/mime_comparisons.html
An open repository of web crawl data that can be accessed and analyzed by anyone
A Python port of the Apache Tika library that makes Tika available using the Tika REST Server.
https://github.com/chrismattmann/tika-python
https://www.binpress.com/tutorial/manipulating-pdfs-with-python/167
http://programminghistorian.org/lessons/intro-to-bash
https://github.com/titusz/epubcheck
http://www.tegelspreukmaker.nl/
Looks a bit similar to Prezi, but OS (presentation as SVG):
Press F9, F10, F11 or F12 twice. "Auto-rotate screen" option in Android Settings must be enabled.
Following codeblock is not rendered correctly in Wordpress:
<pre><code><div>test</div></code></pre>
Workaround is to replace forward slash in closing tag by entity reference:
<pre><code><div>test<&#47;div></code></pre>
https://github.com/ANSSI-FR/caradoc
Note: current Debian package of Opam not recent enough, so used the instructions under "Binary distribution" at https://opam.ocaml.org/doc/Install.html. Installs binary in /usr/local/bin.
Make file initially didn't work because ocamlfind could not be found. Fixed by typing:
eval $(opam config env)
After this it compiles without any errors.
Includes MiniDisc:
Python library that reads/writes EPUB, including EPUB 3:
https://github.com/aerkalov/ebooklib
Example, create EPUB from HTML:
https://gist.github.com/bitsgalore/4c830a301f33f584c041
http://www.cb.nl/nieuws/alle-relevante-data-over-e-books-in-nederland/
http://www.cb.nl/nieuws/e-bookbarometeblijft-groeien/
http://fileformats.archiveteam.org/wiki/Encyclopedia_of_Graphics_File_Formats
http://homepages.cwi.nl/~steven/Talks/2015/11-06-xml-amsterdam/
This works (but what's referred to as a "schema" isn't really a schema at all):
https://blog.udemy.com/excel-to-xml/
Similar to above, but uses XSD Schema directly, might be better:
https://bitwizards.com/blog/november-2010/how-to-export-an-excel-2010-worksheet-to-xml
Scholarly Context Not Found: One in Five Articles Suffers from Reference Rot
Web archive player:
https://github.com/ikreymer/webarchiveplayer
E.g. replace every occurrence of /tmp/"$fileIn" with /tmp/"$(cat /dev/urandom | tr -cd 'a-f0-9' | head -c 16)":
find /home/johan/cajascripts -type f -print0 | xargs -0 sed -i 's/\/tmp\/"$fileIn"/\/tmp\/"$(cat \/dev\/urandom | tr -cd 'a-f0-9' | head -c 16)"/g'
- Don't save offsite links
- Use 'blogs' ignore pattern
Command (I think?):
!archive http://www.flipvandyke.nl/ --no-offsite-links --ignore-sets=blogs
https://help.ubuntu.com/community/DataRecovery
If N = number of layers, then first extract layers i and below to a separate JP2 with Aware j2kdriver tool:
j2kdriver -i foo.jp2 -ql (N-i+1) -t JP2 -o foo_i.jp2
Then use jpylyzer to compute the compression ratio of resulting image.
Create derived image for each quality layer:
j2kdriver -i MMAD01_000001001_00011_master.jp2 -ql 11 -t JP2 -o layer1.jp2
j2kdriver -i MMAD01_000001001_00011_master.jp2 -ql 10 -t JP2 -o layer2.jp2
::
::
j2kdriver -i MMAD01_000001001_00011_master.jp2 -ql 1 -t JP2 -o layer11.jp2
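The elided commands follow the rule given earlier (layer i is extracted with -ql N-i+1), so the full list can be generated. A small Python sketch that only builds the command strings and does not run the Aware tool; the function name layer_commands is made up, and the flags are copied from the commands above:

```python
def layer_commands(master, n_layers):
    """Build one j2kdriver command line per quality layer."""
    commands = []
    for i in range(1, n_layers + 1):
        # Layer i corresponds to -ql (N - i + 1)
        ql = n_layers - i + 1
        commands.append(f"j2kdriver -i {master} -ql {ql} -t JP2 -o layer{i}.jp2")
    return commands


if __name__ == "__main__":
    for command in layer_commands("MMAD01_000001001_00011_master.jp2", 11):
        print(command)
```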
touch -d "1 January 1768" myfile.txt
This happened to my HP ProBook 640 G1. Workaround: in BIOS, disable "wake on LAN". Source: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1470723/comments/13
http://wiki.hydrogenaud.io/index.php?title=Comparison_of_CD_rippers
http://publications.beeldengeluid.nl/pub/84
http://blog.differential.com/best-way-to-merge-a-github-pull-request/
Third option (Catch Feature Up with Master by Rebasing, then fast-forward Merge).
http://liv.science.uva.nl/index.html
Parts of this may be (re)usable for internal courses etc.
Ubuntu with Nautilus file manager - Nautilus Actions:
http://www.pcsteps.com/4434-add-right-click-commands-linux-mint-ubuntu/
Linux Mint Cinnamon with Nemo file manager:
http://www.pcsteps.com/4434-add-right-click-commands-linux-mint-ubuntu/
Linux Mint Mate with Caja file manager:
http://www.ethanjoachimeldridge.info/tech-blog/caja-exifstrip-context-action
From http://stackoverflow.com/a/11202773:
Suppose I want to create a floppy image containing file oakcdrom.sys:
dd bs=512 count=2880 if=/dev/zero of=oakcd.img
mkfs.msdos oakcd.img
mcopy -i oakcd.img oakcdrom.sys ::/
Inspect contents:
mdir -i oakcd.img
General command:
ddrescue -d -n -b 512 /dev/fd0 myfloppy.img myfloppy.log
To get name of device:
lsblk
Result:
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 465,8G 0 disk
├─sda1 8:1 0 457,9G 0 part /
├─sda2 8:2 0 1K 0 part
└─sda5 8:5 0 7,9G 0 part [SWAP]
sdb 8:16 0 29,8G 0 disk
sdc 8:32 1 1,4M 1 disk
So in this case it is /dev/sdc. Create the image with:
sudo ddrescue -d -n -b 512 /dev/sdc myfloppy.img myfloppy.log
Optionally use the dosfsck tool to check the integrity of the file system (assuming it is a DOS file system). Use the following command:
echo "n" |dosfsck -t -r myfloppy.img
The -t option checks for bad clusters, but this only works in combination with -a (automatically repair) or -r (interactively repair). So to do the check without automatic repair or input from the user we use -r, and then pipe "n" into the command to prevent any changes being made. Result:
fsck.fat 3.0.26 (2014-03-07)
Cluster 2845 is unreadable.
Cluster 2846 is unreadable.
Cluster 2847 is unreadable.
Cluster 2848 is unreadable.
Perform changes ? (y/n) myfloppy.img: 33 files, 2304/2847 clusters
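For batch processing, the report above could be reduced to a list of bad clusters plus the summary numbers. A minimal Python sketch that parses text shaped like the sample output; the function names are made up, and the line formats are assumed from this one fsck.fat 3.0.26 run, so other versions may word things differently:

```python
import re


def unreadable_clusters(report):
    """Return the cluster numbers reported as unreadable."""
    pattern = r"^Cluster (\d+) is unreadable\."
    return [int(n) for n in re.findall(pattern, report, flags=re.MULTILINE)]


def cluster_summary(report):
    """Return (files, used clusters, total clusters), or None if not found."""
    match = re.search(r"(\d+) files, (\d+)/(\d+) clusters", report)
    if match is None:
        return None
    return tuple(int(value) for value in match.groups())
```

The report text itself would come from capturing the output of the dosfsck command shown earlier.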
Check integrity of git repo:
http://stackoverflow.com/questions/5585388/which-git-commands-perform-integrity-checks
(Bottom line: use git fsck.)
How to shrink the git folder:
http://stackoverflow.com/questions/5613345/how-to-shrink-the-git-folder
Exit GUI:
Ctrl-Alt-F1
Re-enter:
Ctrl-Alt-F8
From https://bugs.launchpad.net/ubuntu/+source/retext/+bug/1451125:
sudo apt-get install python3-docutils python3-markdown
From the manual:
- Turn on or restart the computer, and then press esc while the “Press the ESC key for Startup Menu” message is displayed at the bottom of the screen
- Press f10 to enter Computer Setup.
sudo badblocks -sv /dev/sda1
See also: http://askubuntu.com/questions/59064/how-to-run-a-checkdisk
/usr/share/virtualbox
Run this on host machine:
sudo ntpdate ntp.xs4all.nl
Then re-start VM; host and guest are now in sync and no more clock skew errors.
pandoc -S whatever.md -o whatever.html
http://broadcast.oreilly.com/2008/11/validating-code-lists-with-sch.html
Useful Unicode and UTF-8 background info:
http://codesnippets.wpakb.kb.nl/index.php?title=Character_sets
Sigil:
https://github.com/user-none/Sigil
Simple, user-friendly.
ddrescue:
http://www.gnu.org/software/ddrescue/manual/ddrescue_manual.html
Command line (Cygwin):
ddrescue -b 2048 -v /dev/scd0 test.iso test.log
disktype tool:
http://disktype.sourceforge.net/
E.g. reveals file system type (ISO/UDF) and other technical info.
General instructions here:
http://www.msfn.org/board/topic/170785-virtualbox-windows-98se-step-by-step/
But results in error:
HID failed to attach mouse driver (VERR_PDM_NO_ATTACHED_DRIVER)
Tried this:
https://forums.virtualbox.org/viewtopic.php?f=2&t=58657#p272752
VBoxInternal/USB/HidMouse/1/Config/CoordShift 0
Still doesn't work; neither does:
VBoxInternal/USB/HidMouse/1/Config/CoordShift 1
But see:
https://www.virtualbox.org/manual/ch12.html#idp60139152
Windows 2000 installation failures:
https://www.virtualbox.org/manual/ch12.html#idp60119680
Works!
Then go install guest additions:
https://docs.oracle.com/cd/E36500_01/E36502/html/qs-guest-additions.html
"AsciiMath is an easy-to-write markup language for mathematics":
git add -A
git commit -m "Changed everything"
git push origin master
git push git@github.com:openplanets/jpylyzer-test-files.git master
Versioning: x.y.z
x: API breakage y: new feature z: bugfix
git tag -a 1.1.0 -m "tagging version 1.1.0 with refactored code"
git push --tags
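The x.y.z rule above can be written down as a small helper that computes the next tag name. A Python sketch; the function name bump and the change labels are made up, and resetting the lower components to zero is an assumption borrowed from the usual semantic versioning convention, not something stated in these notes:

```python
def bump(version, change):
    """Return the next x.y.z version string for the given kind of change."""
    x, y, z = (int(part) for part in version.split("."))
    if change == "api-breakage":
        # Assumption: y and z restart at 0 when x goes up
        return f"{x + 1}.0.0"
    if change == "feature":
        # Assumption: z restarts at 0 when y goes up
        return f"{x}.{y + 1}.0"
    if change == "bugfix":
        return f"{x}.{y}.{z + 1}"
    raise ValueError(f"unknown change type: {change}")
```

The result can then be used as the tag name in the git tag -a command.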
1. Convert all master JP2s to TIFF with ImageMagick, using the command:
mogrify -format tiff *.jp2
2. Conversion loses resolution info (see below), so add new values using:
exiftool *.tiff -xresolution=300 -yresolution=300 -resolutionunit=inches
3. Convert TIFFs to master JP2s:
f:\johan\pythoncode\jpwrappa\jpwrappa\jpwrappa.py M:\Trans\johan\testJP2ContrApp2014\B5\tiff\*.tiff M:\Trans\johan\testJP2ContrApp2014\B5\jp2k\master\ -p F:\johan\pythonCode\jpwrappa\jpwrappa\profiles\optionsKBMasterLossless_2014.xml -m
4. Same for access JP2s:
f:\johan\pythoncode\jpwrappa\jpwrappa\jpwrappa.py M:\Trans\johan\testJP2ContrApp2014\B5\tiff\*.tiff M:\Trans\johan\testJP2ContrApp2014\B5\jp2k\access\ -p F:\johan\pythonCode\jpwrappa\jpwrappa\profiles\optionsKBAccessLossy_2014.xml -m
But ... looking at image header box:
<imageHeaderBox>
<height>2818</height>
<width>1913</width>
<nC>1</nC>
<bPCSign>unsigned</bPCSign>
<bPCDepth>8</bPCDepth>
<c>jpeg2000</c>
<unkC>yes</unkC>
<iPR>no</iPR>
</imageHeaderBox>
So "unknown colourspace" is set to "yes", which should be no (and it is "No" in the source JP2). So what is causing this? Bug in Aware software? Does this only happen with Grayscale images?
To reproduce the problem:
- Convert any JP2 to TIFF with ImageMagick (will strip away any resolution info)
- Convert TIFF to JP2 with Aware.
Run jpylyzer on resulting JP2:
<isValidJP2>False</isValidJP2>
<tests>
<jp2HeaderBox>
<resolutionBox>
<captureResolutionBox>
<hRcNIsValid>False</hRcNIsValid>
</captureResolutionBox>
</resolutionBox>
</jp2HeaderBox>
</tests>
Looking at properties of resolution box:
<resolutionBox>
<captureResolutionBox>
<vRcN>29491</vRcN>
<vRcD>7491</vRcD>
<hRcN>0</hRcN>
<hRcD>1</hRcD>
<vRcE>1</vRcE>
<hRcE>4</hRcE>
<vRescInPixelsPerMeter>39.37</vRescInPixelsPerMeter>
<hRescInPixelsPerMeter>0.0</hRescInPixelsPerMeter>
<vRescInPixelsPerInch>1.0</vRescInPixelsPerInch>
<hRescInPixelsPerInch>0.0</hRescInPixelsPerInch>
</captureResolutionBox>
</resolutionBox>
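The derived values in the resolution box follow from the JP2 rule that a resolution equals numerator / denominator x 10^exponent, in pixels per meter. A short Python check against the numbers above (function names are made up for this note). It shows that the vertical resolution works out to 1 pixel per inch and the horizontal resolution to zero, which is presumably what the hRcNIsValid test flags:

```python
def pixels_per_meter(numerator, denominator, exponent):
    """Resolution as stored in a JP2 resolution box: N / D * 10^E."""
    return numerator / denominator * 10 ** exponent


def pixels_per_inch(value_per_meter):
    """Convert pixels per meter to pixels per inch (1 inch = 0.0254 m)."""
    return value_per_meter * 0.0254


# Values taken from the captureResolutionBox output
vertical = pixels_per_meter(29491, 7491, 1)
horizontal = pixels_per_meter(0, 1, 4)

print(round(vertical, 2), round(pixels_per_inch(vertical), 2))
print(round(horizontal, 2), round(pixels_per_inch(horizontal), 2))
```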
Here for UTF-8:
http://stackoverflow.com/a/9822937
git clone https://github.com/openpreserve/jpylyzer.git --branch gh-pages --single-branch ./jpylyzerHomepage
File:
E:\\laPeyneCDROM\\xlsfiles\\series98.xls
Refs to MACROS.XLS'!ENash, which is missing.
Solution: before opening, disable automatic workbook calculation from options:
Loading spreadsheet now results in most recent values that are stored in workbook.
thermo filetype:tdb
Only gives results with extension tdb.
- An Introduction to Optical Media Preservation: http://journal.code4lib.org/articles/9581
- What are the best CD/DVD-ROM drives for disc imaging? http://qanda.digipres.org/10/what-are-the-best-cd-dvd-rom-drives-for-disc-imaging?show=10#q10
- CD/DVD Drive Accuracy List 2014: http://forum.dbpoweramp.com/showthread.php?34019-CD-DVD-Drive-Accuracy-List-2014
- Preserving Write-Once DVDs: Producing Disk Images, Extracting Content, and Addressing Flaws and Errors (LoC): http://preservationmatters.blogspot.nl/2015/01/preserving-write-once-dvds.html
- Developing a Robust Migration Workflow for Preserving and Curating Hand-held Media (Andy Jackson): http://anjackson.net/keeping-codes/practice/developing-a-robust-migration-workflow-for-preserving-and-curating-handheld-media.html
https://spotdocs.scholarsportal.info/display/EJournals/Publisher+Data+Formats
Both errors and warnings reported to same _message_ element in XML. E.g. compare:
<status>Not well-formed</status>
<messages>
<message>ERROR: /OEBPS/cover.html(5): non-standard stylesheet resource 'OEBPS/page-template.xpgt' of type 'application/vnd.adobe-page-template+xml'. A fallback must be specified.</message>
<message>ERROR: /OEBPS/copyright.html(5): non-standard stylesheet resource 'OEBPS/page-template.xpgt' of type 'application/vnd.adobe-page-template+xml'. A fallback must be specified.</message>
</messages>
with this:
<status>Well-formed</status>
<messages>
<message>WARN: /OEBPS/toc.ncx: meta@dtb:uid content 'null' should conform to unique-identifier in content.opf: '821'</message>
</messages>
So output needs some parsing. Tested w. epubcheck 3.0.1.
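A minimal sketch of that parsing step in Python, splitting messages by their ERROR: / WARN: prefix. The function name split_messages and the wrapper element result are made up here (the fragments above have no single root element), and the prefix format is assumed from these two examples only:

```python
import xml.etree.ElementTree as ET


def split_messages(xml_fragment):
    """Group message texts by their severity prefix (ERROR, WARN, or OTHER)."""
    # Hypothetical wrapper element so the fragment parses as one document
    root = ET.fromstring(f"<result>{xml_fragment}</result>")
    grouped = {"ERROR": [], "WARN": [], "OTHER": []}
    for element in root.iter("message"):
        text = (element.text or "").strip()
        severity, separator, detail = text.partition(": ")
        if separator and severity in ("ERROR", "WARN"):
            grouped[severity].append(detail)
        else:
            grouped["OTHER"].append(text)
    return grouped
```

A file would then count as problematic if grouped["ERROR"] is non-empty, regardless of how many warnings it has.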
- E drive: Hitachi (large drive)
- H drive: Buffalo (small drive)
H is used as backup disk for E.
17/18 November: poster cancelled, but a 90 s talk + 1 slide instead.
BnF:
http://www.bnf.fr/documents/ref_num_fichier_image.pdf
Readers absorb less on Kindles than on paper, study finds:
Reading and learning from screens versus print: a study in changing habits: Part 1 – reading long information rich texts:
http://www.emeraldinsight.com/doi/full/10.1108/NLW-01-2013-0012
http://www.scientificamerican.com/article/reading-paper-screens/
https://help.github.com/articles/syncing-a-fork
Requires:
https://help.github.com/articles/configuring-a-remote-for-a-fork
- PEP8
- pyflakes
- pdb: http://stackoverflow.com/a/1623085/1209004
GraphicsMagick command line:
gm convert -compress jpeg -quality 50 *.TIF test.pdf
Result: PDF with all images as JPEG, quality 50. According to Acrobat / Apache Preflight the PDF has some format conformance issues. One possible remedy is to re-process the PDF using Ghostscript. E.g. the command below produces a PDF that conforms to PDF/A-1b:
gswin64 -dPDFA -dBATCH -dNOPAUSE -dUseCIEColor -sProcessColorModel=DeviceCMYK -sDEVICE=pdfwrite -sPDFACompatibilityPolicy=1 -sOutputFile=test_a.pdf test.pdf
Source: http://stackoverflow.com/questions/1659147/how-to-use-ghostscript-to-convert-pdf-to-pdf-a-or-pdf-x
Link: http://journal.code4lib.org/articles/9158
Tutorial:
http://fotoforensics.com/tutorial-estq.php
But ... this is also possible with ImageMagick / GraphicsMagick (according to Approximate Quantization Table method that is mentioned in the tutorial):
http://superuser.com/questions/62730/how-to-find-the-jpg-quality