Douglas Miranda douglasmiranda

  • Earth, Brazil

Understanding PDF Format

We have been working with PDF files since 1999 and developed complex software to display PDF files. We have learnt a lot about the PDF file format in that time and share our knowledge in the articles below.

There are also a large number of technical terms used with PDF so we have created a Glossary of Terms with all the keywords.

If you are interested in using our software to display your PDF documents (we can rasterize them, convert them to HTML5 or SVG, or provide a complete Java PDF Viewer), why not set up a call with us and see if we can help?

Here is an overview of the topics covered in this article:

@douglasmiranda
douglasmiranda / tinymce.md
Created December 5, 2024 01:38
TinyMCE can break your searches

TinyMCE will convert accents to HTML entities like:

á becomes &aacute;

So you see the problem: you search your database for "á", or even "a" (if you are using unaccent), and you get nothing, because the stored text contains the entity instead of the character.

In my case, I was using full text search in Postgres with the portuguese_unaccent dictionary (a custom dictionary).

So keep in mind you can always make TinyMCE not convert stuff:
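TinyMCE documents an `entity_encoding` option, and setting it to `raw` keeps characters as typed instead of converting them to named entities. For content that was already saved with entities, one option is to decode them before the text reaches the search index. A minimal Python sketch, assuming the stored value is an HTML string; `decode_entities` is a hypothetical helper name, not part of TinyMCE or Django:

```python
import html


def decode_entities(stored_html: str) -> str:
    """Turn named or numeric entities such as &aacute; back into characters.

    Note: html.unescape also decodes &lt; and &amp;, so this is meant for
    text that feeds a search index, not for markup rendered back to users.
    """
    return html.unescape(stored_html)


print(decode_entities("&aacute;gua"))
```

Decoding at indexing time leaves the stored HTML untouched, which is the less invasive choice if other code relies on the saved markup.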

𝐀 A
𝗔 A
𝐴 A
𝘈 A
𝑨 A
𝘼 A
𝒜 A
𝙰 A
𝐁 B
𝗕 B
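A mapping like the one above does not have to be maintained by hand for these characters: the Mathematical Alphanumeric Symbols carry Unicode compatibility decompositions to plain letters, so NFKC normalization folds them. A small sketch using only the standard library; `to_plain` is a hypothetical helper name:

```python
import unicodedata


def to_plain(text: str) -> str:
    """Fold styled mathematical letters (bold, italic, script, ...) to plain ones.

    NFKC also rewrites other compatibility characters (ligatures, full-width
    forms), so it changes more than just the styled letters.
    """
    return unicodedata.normalize("NFKC", text)


print(to_plain("𝐀𝗔𝐴𝒜𝙰 𝐁𝗕"))
```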
@douglasmiranda
douglasmiranda / admin.py
Last active November 28, 2024 23:53
Django Admin - Pretty Print (formatted / indented / no color) JSON (read-only)
# IMPORTANT: Just keep in mind this is not ideal for untrusted JSON content.
import json
from django.contrib import admin
from .models import Publication
@admin.register(Publication)
class PublicationAdmin(admin.ModelAdmin):
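The preview is cut off before the class body, so the following is not the author's code. It is a framework-free sketch of the core idea: serialize with indentation, escape the result, and wrap it in a `<pre>` block. `pretty_json_html` is a hypothetical helper; in a Django admin it could be returned from a read-only field method and marked safe (for example with `mark_safe`) after escaping.

```python
import html
import json


def pretty_json_html(value) -> str:
    """Return indented JSON wrapped in <pre>, with HTML-special characters escaped."""
    text = json.dumps(value, indent=2, sort_keys=True, ensure_ascii=False)
    # Escaping matters because the JSON may contain markup; this ties in with
    # the warning above about untrusted JSON content.
    return "<pre>" + html.escape(text) + "</pre>"


print(pretty_json_html({"title": "<b>Hello</b>", "pages": 3}))
```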
@douglasmiranda
douglasmiranda / admin.py
Created November 27, 2024 01:36
Verify if the current Django Admin view is changelist.
# There are many ways to do it, like checking if "change" is in the current url path
# you can adapt to check for _changelist, _change, _delete...
def is_changelist_view(model_admin_instance, request):
    """
    Verify if the current view is changelist.
    Args:
        model_admin_instance (admin.ModelAdmin): ModelAdmin instance.
        request (HttpRequest): Current request.
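The gist body is truncated here, so what follows is one possible approach rather than the author's implementation. It relies on Django admin naming its changelist URL `<app_label>_<model_name>_changelist` and on `request.resolver_match.url_name`. The function name `is_changelist_view_sketch` and the `SimpleNamespace` stand-ins are assumptions made so the example runs without a Django project.

```python
from types import SimpleNamespace


def is_changelist_view_sketch(model_admin_instance, request) -> bool:
    """Compare the resolved URL name with the admin's changelist URL name."""
    opts = model_admin_instance.model._meta
    expected = f"{opts.app_label}_{opts.model_name}_changelist"
    match = getattr(request, "resolver_match", None)
    return match is not None and match.url_name == expected


# Stand-ins for a ModelAdmin and an HttpRequest (hypothetical app and model names).
fake_admin = SimpleNamespace(
    model=SimpleNamespace(
        _meta=SimpleNamespace(app_label="library", model_name="publication")
    )
)
fake_request = SimpleNamespace(
    resolver_match=SimpleNamespace(url_name="library_publication_changelist")
)
print(is_changelist_view_sketch(fake_admin, fake_request))
```

Swapping the `_changelist` suffix for `_change` or `_delete` adapts the same check to the other admin views mentioned in the comment.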
@douglasmiranda
douglasmiranda / Dockerfile
Created November 18, 2024 00:04
Dockerfile - Microsoft Fonts - Installing .deb directly - Debian
FROM debian:bookworm
# Debian 12 (Bookworm)
# MS Fonts
ARG MS_FONTS_VERSION=3.8.1
RUN apt-get update && apt-get install --no-install-recommends -y \
# Custom dependencies
# ttf-mscorefonts-installer requires extra setup, adding contrib repo
# that might generate conflict with other dependencies
@douglasmiranda
douglasmiranda / policy.json
Created November 3, 2024 02:17
S3 policy - Allow public read for all; Deny resource access for a folder/path, but let my IAM user have access.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "PublicReadGetObject",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::MY_BUCKET/*"
    },
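The rest of the policy is not shown in this preview. As an illustration of the pattern the title describes, here is a Python sketch that builds such a policy: a public-read statement plus a Deny on one prefix for everyone except a single IAM user, expressed with a `StringNotEquals` condition on `aws:PrincipalArn`. The function name, the prefix, and the ARN are hypothetical, and this may differ from the author's statement; validate any policy against your own bucket before relying on it.

```python
import json


def build_policy(bucket: str, private_prefix: str, allowed_user_arn: str) -> dict:
    """Public read for the bucket, with one prefix denied to all but one principal."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PublicReadGetObject",
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
            {
                # Assumption: an explicit Deny scoped by principal ARN.
                "Sid": "DenyPrivatePrefixExceptUser",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": f"arn:aws:s3:::{bucket}/{private_prefix}/*",
                "Condition": {
                    "StringNotEquals": {"aws:PrincipalArn": allowed_user_arn}
                },
            },
        ],
    }


policy = build_policy(
    "MY_BUCKET", "private", "arn:aws:iam::123456789012:user/example-user"
)
print(json.dumps(policy, indent=2))
```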
@douglasmiranda
douglasmiranda / compose.yml
Created November 2, 2024 01:54
Minio as S3 storage for Django + Django Storages setup
services:
  django:
    # setup django container ...

  # Storage
  minio:
    image: minio/minio:latest
    ports:
      - 9000:9000
      - 9001:9001
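The Django side is not shown in this preview. A hedged sketch of what the matching settings could look like, assuming Django 4.2+ (the `STORAGES` setting) and a django-storages release that provides `storages.backends.s3.S3Storage`. The bucket name, the `MINIO_BUCKET` and `MINIO_ENDPOINT` variable names, and the `http://minio:9000` default (the compose service name and API port) are assumptions for illustration.

```python
import os

# settings.py fragment: point django-storages at the MinIO container.
STORAGES = {
    "default": {
        "BACKEND": "storages.backends.s3.S3Storage",
        "OPTIONS": {
            "bucket_name": os.environ.get("MINIO_BUCKET", "media"),
            # Inside the compose network the service is reachable by its name.
            "endpoint_url": os.environ.get("MINIO_ENDPOINT", "http://minio:9000"),
            "access_key": os.environ.get("MINIO_ROOT_USER", ""),
            "secret_key": os.environ.get("MINIO_ROOT_PASSWORD", ""),
        },
    },
    "staticfiles": {
        "BACKEND": "django.contrib.staticfiles.storage.StaticFilesStorage",
    },
}
```

Port 9000 serves the S3 API and 9001 the MinIO console, so only the former belongs in `endpoint_url`.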
@douglasmiranda
douglasmiranda / mutool.md
Last active October 3, 2024 18:05
Clean and output all pages but the first one.

As of today (2024-10-02), the docs for mutool clean don't include proper info for this.

mutool clean a.pdf b.pdf 2-N
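Here `2-N` is the page selection, with `N` standing for the last page. If the command needs to run from a script, a thin Python wrapper is enough. This is a sketch that assumes `mutool` is on the `PATH`; both function names are hypothetical.

```python
import subprocess


def mutool_clean_args(src: str, dst: str, pages: str = "2-N") -> list[str]:
    """Build the argument list for: mutool clean SRC DST PAGES."""
    return ["mutool", "clean", src, dst, pages]


def drop_first_page(src: str, dst: str) -> None:
    """Write every page except the first from src to dst (requires mutool)."""
    subprocess.run(mutool_clean_args(src, dst), check=True)


print(mutool_clean_args("a.pdf", "b.pdf"))
```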

The Ubuntu man page has a better description:

@douglasmiranda
douglasmiranda / Dockerfile
Created October 3, 2024 00:57
ttf-mscorefonts-installer on Debian 12 - Dockerfile
# For errors like:
# Package 'ttf-mscorefonts-installer' has no installation candidate
# You could do what the entire internet will tell you:
# just add contrib to your /etc/apt/sources.list
# Example for Debian 12:
# echo "deb http://deb.debian.org/debian bookworm main contrib" >> /etc/apt/sources.list
# But that can interfere with your other dependencies.
# In my case it was PrinceXML and its dependencies.