-
-
Save kousu/bf5610187b608d79d415b1436091ab2d to your computer and use it in GitHub Desktop.
import os.path | |
def sanitize_path(path): | |
""" | |
Sanitize a path against directory traversals | |
Based on https://stackoverflow.com/questions/13939120/sanitizing-a-file-path-in-python. | |
>>> sanitize_path('../test') | |
'test' | |
>>> sanitize_path('../../test') | |
'test' | |
>>> sanitize_path('../../abc/../test') | |
'test' | |
>>> sanitize_path('../../abc/../test/fixtures') | |
'test/fixtures' | |
>>> sanitize_path('../../abc/../.test/fixtures') | |
'.test/fixtures' | |
>>> sanitize_path('/test/foo') | |
'test/foo' | |
>>> sanitize_path('./test/bar') | |
'test/bar' | |
>>> sanitize_path('.test/baz') | |
'.test/baz' | |
>>> sanitize_path('qux') | |
'qux' | |
""" | |
# - pretending to chroot to the current directory | |
# - cancelling all redundant paths (/.. = /) | |
# - making the path relative | |
return os.path.relpath(os.path.join("/", path), "/") | |
if __name__ == '__main__': | |
import doctest | |
doctest.testmod() |
I, thank you for the feedback here. My use case is also web apps. I did not face the issue you mention, but it is problematic indeed.
From the pathlib doc (emphasis mine) :
You want to make sure that your code only manipulates paths without actually accessing the OS. In this case, instantiating one of the pure classes may be useful since those simply don’t have any OS-accessing operations. https://docs.python.org/3/library/pathlib.html
I suppose then one would like to write :
import pathlib
return pathlib.PurePath("/", path).resolve().relative_to("/")
but the PurePath
class does not implement resolve()
. I am also not familiar enough with the library to suggest a solution.
I don't know why this is so hard. The top StackOverflow answer says to use secure_filename()
but that only works on file_names_ -- it forces all '/
to _
-- which isn't my use case. When I wrote this I had in mind accessing HTML snippets or files in an upload folder, and for that I want to preserve some directory structure just not access to the whole system.
Reading the docs, one thing that resolve()
does is follow all symlinks. So my version is not safe against symlink poisoning. I assumed my apps don't make symlinks and no one else should be writing into their folder, but it's a risk. There's no way to protect against symlink poisoning without actually reading the disk.
To be safe against that I guess you need to take a different approach and check if you're in a given restricted folder. But that's tricky to get right as well:
sudo mkdir -p /var/www/app/uploads
>>> def is_safe(base, target):
... base = pathlib.Path(base)
... target = pathlib.Path(target)
...
...
... try:
... return (base/target.parent).resolve(strict=True).is_relative_to(base)
... except FileNotFoundError:
... return False # not safe, someone tried
>>> is_safe("/var/www/app/uploads", "../../../etc/passwd")
False
>>> is_safe("/var/www/app/uploads", "/etc/passwd")
False
>>> is_safe("/var/www/app/uploads", "some_nonexistent_file.txt")
True
>>> is_safe("/var/www/app/uploads", "somesubfolder/some_nonexistent_file.txt") # but it fails incorrectly for not-yet-created subfolders
False
Huh, but
pathlib
may sometimes decide to talk to the OS. For example, if you addstrict=True
the tests fail:That makes me a little nervous. Usually I'd be using this for webapps where paths on the site don't correspond to paths on disk, and I wouldn't want my webapp to inadvertently become a way for attackers to probe for targets. I'm not familiar enough with pathlib to know when/when not it talks to the filesystem.