lyquix-owner / cleanhtml.php
Last active March 20, 2025 18:39
PHP script to automatically clean dirty HTML. It removes unnecessary attributes (e.g. style, id, dir), replaces presentational tags with semantic ones (e.g. <b> with <strong>), and strips undesirable tags (e.g. <font>). We have used this script to safely clean hundreds of blog posts that were littered with inline styling.
<?php
// Tags to be replaced, mapped to their replacements (old tag => new tag)
$replace_tags = [
'i' => 'em',
'b' => 'strong'
];
// List of tags to be stripped. Their text and child tags will be preserved.
$remove_tags = [