Last active
March 9, 2025 14:48
A useful function for splitting ical content into 75-octet lines, taking multibyte characters into account. See: http://www.ietf.org/rfc/rfc2445.txt, section 4.1
<?php
mb_internal_encoding("UTF-8");

$desc = <<<TEXT
<p>Lines of text SHOULD NOT be longer than 75 octets, (och hör på den) excluding the line break. Long content lines SHOULD be split into a multiple line representations using a line "folding" technique.</p>
That is, a long line can be split between any two characters by inserting a CRLF
immediately followed by a single linear white space character (i.e.,
SPACE, <b>US-ASCII</b> decimal 32 or HTAB, US-ASCII decimal 9). Any sequence
of CRLF followed immediately by a single linear white space character
is ignored (i.e., removed) when processing the content type.
TEXT;

/**
 * Apply folding compliant with RFC 5545
 * See https://www.rfc-editor.org/rfc/rfc5545#section-3.1
 *
 * @param string $preamble The property name without the colon, e.g. DESCRIPTION
 * @param string $value The value for the property, e.g. a very long string
 * @param bool $strip_tags Strip HTML tags from the value
 *
 * @return string Returns the folded string without the property name
 */
function ical_split($preamble, $value, $strip_tags = true)
{
    // Collapse line breaks and runs of whitespace into single spaces.
    $value = trim($value);
    $value = preg_replace('/[\r\n]+/', ' ', $value);
    $value = preg_replace('/\s{2,}/', ' ', $value);
    if ($strip_tags) {
        $value = strip_tags($value);
    }
    // The property name and colon count toward the first line's length.
    $value = $preamble . ':' . $value;

    // 74 octets per chunk: the tab prepended to every continuation line
    // brings those lines to at most 75 octets.
    $chunkSize = 74;
    $length = strlen($value);
    $offset = 0;
    $lines = [];
    while ($offset < $length) {
        // mb_strcut() counts octets but never cuts inside a multibyte character.
        $line = mb_strcut($value, $offset, $chunkSize);
        if ($line === '') {
            break; // guard against an endless loop on malformed input
        }
        $lines[] = $line;
        // Advance by the octets actually consumed so that none are skipped.
        $offset += strlen($line);
    }
    return substr(join("\r\n\t", $lines), strlen($preamble) + 1);
}

$split = ical_split('DESCRIPTION', $desc);
print 'DESCRIPTION:' . $split;

// Test results
$lines = preg_split('/\r\n/', 'DESCRIPTION:' . $split);
print "\n\nTests\n";
foreach ($lines as $i => $line) {
    print "Line {$i}: " . strlen($line) . " octets\n";
}

print "\nAlt desc output:\n";
$split = ical_split('X-ALT-DESC', $desc, false);
print 'X-ALT-DESC:' . $split;
print "\n\n";
DESCRIPTION:Lines of text SHOULD NOT be longer than 75 octets, (och hör
å den) excluding the line break. Long content lines SHOULD be split into
multiple line representations using a line "folding" technique. That is,
long line can be split between any two characters by inserting a CRLF imm
diately followed by a single linear white space character (i.e., SPACE, US
ASCII decimal 32 or HTAB, US-ASCII decimal 9). Any sequence of CRLF follow
d immediately by a single linear white space character is ignored (i.e., r
moved) when processing the content type.

Tests
Line 0: 73 octets
Line 1: 75 octets
Line 2: 75 octets
Line 3: 75 octets
Line 4: 75 octets
Line 5: 75 octets
Line 6: 75 octets
Line 7: 41 octets

Alt desc output:
X-ALT-DESC:<p>Lines of text SHOULD NOT be longer than 75 octets, (och hö
på den) excluding the line break. Long content lines SHOULD be split int
a multiple line representations using a line "folding" technique.</p> Tha
is, a long line can be split between any two characters by inserting a CR
F immediately followed by a single linear white space character (i.e., SPA
E, <b>US-ASCII</b> decimal 32 or HTAB, US-ASCII decimal 9). Any sequence o
CRLF followed immediately by a single linear white space character is ign
red (i.e., removed) when processing the content type.
Huh, 14 years... time flies :)
Your implementation looks nice and elegant @viavario. Stripping out tags should probably have been separate from the folding, but I added an optional param to your implementation that can be used to disable tag stripping, preserving the old behaviour.
As far as I understand the RFC, lines should be folded at a length of 75 octets, including the property name.
Depending on how the ICS data is generated, your function might end up in an endless loop, particularly in the while loop on line 28, if you have a long preamble or property.
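RFC 5545 states the limit in octets rather than characters, and the property name plus the colon count toward the first line. A minimal sketch of the difference, using a fragment of the sample text from the gist (the variable names are only illustrative):

```php
<?php
// "hör på den" written with escapes so the file encoding does not matter.
$text = "h\u{F6}r p\u{E5} den";

$octets = strlen($text);             // byte length, which is what the limit counts
$chars  = mb_strlen($text, 'UTF-8'); // character length

printf("%d octets, %d characters\n", $octets, $chars); // 12 octets, 10 characters

// The first content line also carries the property name and the colon.
$firstLine = 'DESCRIPTION:' . $text;
printf("first line: %d octets\n", strlen($firstLine)); // first line: 24 octets
```

Each of `ö` and `å` takes two octets in UTF-8, which is why the two counts differ.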
As @keizie and @djkgamc commented, mb_strcut does exactly what we need.

Although I'm in favor of properly applying the folding technique on multibyte strings, it should be noted that section 3.1 of RFC 5545 puts the responsibility for supporting multibyte strings on the implementation of the unfolding technique rather than on the folding technique.
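To illustrate both points, here is a rough sketch, not taken from the gist. `ical_unfold` is a hypothetical helper name, and the 18-octet cut position is chosen only so that it lands in the middle of `ö`. It contrasts `substr` with `mb_strcut` at that position and shows that unfolding at the octet level restores the value even after a naive fold.

```php
<?php
// Hypothetical helper: unfold by removing every CRLF that is immediately
// followed by a single space or tab.
function ical_unfold(string $folded): string
{
    // No /u modifier: the pattern works on raw octets, so a multi-octet
    // character that a fold split in two is rejoined byte for byte.
    return preg_replace('/\r\n[ \t]/', '', $folded);
}

$value = "DESCRIPTION:och h\u{F6}r p\u{E5} den";

// Naive fold: substr() cuts after 18 octets, between the two octets of "ö".
$naiveHead = substr($value, 0, 18);
$naive = $naiveHead . "\r\n\t" . substr($value, 18);

// mb_strcut() backs off to the previous character boundary instead.
$safeHead = mb_strcut($value, 0, 18, 'UTF-8');

var_dump(mb_check_encoding($naiveHead, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($safeHead, 'UTF-8'));  // bool(true)
var_dump(strlen($safeHead));                      // int(17)
var_dump(ical_unfold($naive) === $value);         // bool(true)
```

A folder built on `mb_strcut` keeps every physical line valid UTF-8, which is friendlier to consumers that decode line by line, while an unfolder that works on octets copes with either kind of input.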
Anyway, I believe the function can be improved to handle longer properties, be more compliant with the RFC as @sqren suggested, and handle HTML in the X-ALT-DESC property as @Giulo77 requested.