Skip to content

Instantly share code, notes, and snippets.

@kousu
Created April 5, 2021 09:31
Show Gist options
  • Save kousu/bf5610187b608d79d415b1436091ab2d to your computer and use it in GitHub Desktop.
Save kousu/bf5610187b608d79d415b1436091ab2d to your computer and use it in GitHub Desktop.
Sanitize a file path against directory traversal in python
import os.path
def sanitize_path(path):
"""
Sanitize a path against directory traversals
Based on https://stackoverflow.com/questions/13939120/sanitizing-a-file-path-in-python.
>>> sanitize_path('../test')
'test'
>>> sanitize_path('../../test')
'test'
>>> sanitize_path('../../abc/../test')
'test'
>>> sanitize_path('../../abc/../test/fixtures')
'test/fixtures'
>>> sanitize_path('../../abc/../.test/fixtures')
'.test/fixtures'
>>> sanitize_path('/test/foo')
'test/foo'
>>> sanitize_path('./test/bar')
'test/bar'
>>> sanitize_path('.test/baz')
'.test/baz'
>>> sanitize_path('qux')
'qux'
"""
# - pretending to chroot to the current directory
# - cancelling all redundant paths (/.. = /)
# - making the path relative
return os.path.relpath(os.path.join("/", path), "/")
if __name__ == '__main__':
import doctest
doctest.testmod()
@kousu
Copy link
Author

kousu commented Jun 18, 2025

Huh, but pathlib may sometimes decide to talk to the OS. For example, if you add strict=True the tests fail:

        return self.with_segments(os.path.realpath(self, strict=strict))
                                  ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^
      File "<frozen posixpath>", line 445, in realpath
    FileNotFoundError: [Errno 2] No such file or directory: '/qux'

That makes me a little nervous. Usually I'd be using this for webapps where paths on the site don't correspond to paths on disk, and I wouldn't want my webapp to inadvertently become a way for attackers to probe for targets. I'm not familiar enough with pathlib to know when/when not it talks to the filesystem.

@remiberthoz
Copy link

I, thank you for the feedback here. My use case is also web apps. I did not face the issue you mention, but it is problematic indeed.

From the pathlib doc (emphasis mine) :

You want to make sure that your code only manipulates paths without actually accessing the OS. In this case, instantiating one of the pure classes may be useful since those simply don’t have any OS-accessing operations. https://docs.python.org/3/library/pathlib.html

I suppose then one would like to write :

import pathlib
return pathlib.PurePath("/", path).resolve().relative_to("/")

but the PurePath class does not implement resolve(). I am also not familiar enough with the library to suggest a solution.

@kousu
Copy link
Author

kousu commented Jun 18, 2025

I don't know why this is so hard. The top StackOverflow answer says to use secure_filename() but that only works on file_names_ -- it forces all '/ to _ -- which isn't my use case. When I wrote this I had in mind accessing HTML snippets or files in an upload folder, and for that I want to preserve some directory structure just not access to the whole system.

Reading the docs, one thing that resolve() does is follow all symlinks. So my version is not safe against symlink poisoning. I assumed my apps don't make symlinks and no one else should be writing into their folder, but it's a risk. There's no way to protect against symlink poisoning without actually reading the disk.

To be safe against that I guess you need to take a different approach and check if you're in a given restricted folder. But that's tricky to get right as well:

sudo mkdir -p /var/www/app/uploads
>>> def is_safe(base, target):
...     base = pathlib.Path(base)
...     target = pathlib.Path(target)
...     
... 
...     try:
...         return (base/target.parent).resolve(strict=True).is_relative_to(base)
...     except FileNotFoundError:
...         return False # not safe, someone tried
>>> is_safe("/var/www/app/uploads", "../../../etc/passwd")
False
>>> is_safe("/var/www/app/uploads", "/etc/passwd")
False
>>> is_safe("/var/www/app/uploads", "some_nonexistent_file.txt")
True
>>> is_safe("/var/www/app/uploads", "somesubfolder/some_nonexistent_file.txt") # but it fails incorrectly for not-yet-created subfolders
False

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment