@alexdowad
Created November 24, 2021 12:06
Little bare-bones fuzzer for text conversion via PHP's mbstring library
#!/usr/bin/env ruby
$oldphp = File.join(__dir__, '../../sapi/cli/php-9308974f8c')
$newphp = File.join(__dir__, '../../sapi/cli/php-a14a5ef07f')
require 'fileutils'
include FileUtils
# UTF-8
encodings = %w{UTF-7 UTF-16 UTF-16BE UTF-16LE UTF-32 UTF-32BE UTF-32LE UCS-2 UCS-2BE UCS-2LE UCS-4 UCS-4BE UCS-4LE SJIS ISO-2022-JP EUC-JP}
# Run the same conversion through both PHP builds and return [old_result, new_result]
def try_conversion(str, from, to)
  # One tiny PHP script per build; the shebang line selects which PHP binary runs it
  File.open('/tmp/oldphp-conv', 'w') { |f| f.write("#!#{$oldphp}\n<?php $data = stream_get_contents(STDIN); $str = mb_convert_encoding($data, '#{to}', '#{from}'); echo $str;") }
  File.open('/tmp/newphp-conv', 'w') { |f| f.write("#!#{$newphp}\n<?php $data = stream_get_contents(STDIN); $str = mb_convert_encoding($data, '#{to}', '#{from}'); echo $str;") }
  chmod('+x', '/tmp/oldphp-conv')
  chmod('+x', '/tmp/newphp-conv')

  old_result = new_result = nil

  # Feed the input on stdin and capture whatever the script echoes back
  IO.popen('/tmp/oldphp-conv', 'r+') do |pipe|
    pipe.write(str)
    pipe.close_write
    old_result = pipe.read
  end
  IO.popen('/tmp/newphp-conv', 'r+') do |pipe|
    pipe.write(str)
    pipe.close_write
    new_result = pipe.read
  end

  return [old_result, new_result]
end
# True if both PHP builds produce identical output for this input
def test_conversion(str, from, to)
  result = try_conversion(str, from, to)
  return result[0] == result[1]
end
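# Shrink a mismatching input by trimming bytes from either end
def reduce_case(str, from, to)
  return str if str.empty?
  reduce_by = str.length / 2
  while reduce_by > 0
    # Drop reduce_by bytes from the end; recurse if the mismatch survives
    shorter = str[0..-(reduce_by + 1)]
    if !test_conversion(shorter, from, to)
      return reduce_case(shorter, from, to)
    end
    # Otherwise drop reduce_by bytes from the start
    shorter = str[reduce_by..-1]
    if !test_conversion(shorter, from, to)
      return reduce_case(shorter, from, to)
    end
    reduce_by /= 2
  end
  return str
end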
# Mismatches that are already known, so they are not reported
def known_case(str, from, to, new_result, old_result)
  return (to == 'UTF-7' && old_result == '+') ||
         (from == 'ISO-2022-JP' && str == "\e" && old_result == '')
end
if __FILE__ == $0
  1000.times do
    len = rand(1000)
    str = len.times.collect { rand(256).chr }.join
    from = encodings.sample
    to = encodings.sample
    puts "#{len}-byte string, from #{from} to #{to}"

    old_result, new_result = try_conversion(str, from, to)
    if old_result != new_result
      str = reduce_case(str, from, to)
      old_result, new_result = try_conversion(str, from, to)
      if !known_case(str, from, to, new_result, old_result)
        puts "Result didn't match"
        puts "Input string:"
        p str
        puts "From old PHP:"
        p old_result
        puts "From new PHP:"
        p new_result
        exit
      end
    end
  end
end
@alexdowad

  1. reduce_case usually manages to reduce a 'bad' input to the smallest string that exhibits the problem, but not always. It would be more robust if it tried more ways of shrinking a 'bad' input. But anyway, this is just a little throwaway script, and it's not the end of the world if a bad input is not reduced as much as it could be.
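
One way the reduction could be made more thorough, as a rough sketch only: besides trimming from the two ends, try deleting chunks at every offset, so that junk between the interesting bytes can go too. The helper name reduce_by_chunks and its block-based interface are made up for this illustration and are not part of the script above. The block should return true while the shortened string still shows the problem. The result is still not guaranteed to be the smallest possible input.

```ruby
# Hypothetical helper, not part of the original script.
# Tries to delete chunks of shrinking size at every offset and keeps
# a deletion whenever the block says the shorter string still fails.
def reduce_by_chunks(str)
  chunk = str.length / 2
  while chunk > 0
    pos = 0
    while pos < str.length
      # Candidate = str with `chunk` bytes removed starting at `pos`
      candidate = str[0, pos] + (str[pos + chunk..-1] || '')
      if yield(candidate)
        str = candidate # keep the deletion and retry at the same offset
      else
        pos += chunk    # this chunk is needed; move on to the next one
      end
    end
    chunk /= 2
  end
  str
end

# Toy predicate standing in for "the two PHP builds disagree"
reduced = reduce_by_chunks("aaaaXbbbbYcccc") { |s| s.include?('X') && s.include?('Y') }
p reduced # => "XY"
```

In the fuzzer it would presumably be called as reduce_by_chunks(str) { |s| !test_conversion(s, from, to) }. Trimming only from the ends would stop at "XbbbbY" in the toy example, since the b's in the middle can never be removed that way.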
