A few approaches, from simple to precise: http://www.huyng.com/posts/python-performance-analysis/
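To make the "simple to precise" range concrete, here is a rough sketch using only the standard library; it is not taken from the linked post. The function `build_squares` is a hypothetical workload invented for illustration. It is timed three ways: one wall-clock measurement with `time.perf_counter`, repeated runs with `timeit`, and a per-function breakdown with `cProfile`.

```python
import cProfile
import io
import pstats
import time
import timeit


def build_squares(n):
    """Hypothetical workload used only for illustration."""
    return [i * i for i in range(n)]


# 1. Simplest: one wall-clock measurement.
start = time.perf_counter()
build_squares(100_000)
elapsed = time.perf_counter() - start
print(f"single run: {elapsed:.4f} s")

# 2. More reliable: repeat the call and keep the best of several rounds.
best = min(timeit.repeat(lambda: build_squares(100_000), number=10, repeat=3))
print(f"best of 3 rounds (10 calls each): {best:.4f} s")

# 3. Most detailed: profile the call to see where the time goes.
profiler = cProfile.Profile()
profiler.enable()
build_squares(100_000)
profiler.disable()

report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

A single measurement is noisy; taking the minimum over several rounds is usually a steadier figure, and the profile shows which calls account for the time.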
Measuring memory usage when you can import modules but don't want to pip install anything:
http://pythonforbiologists.com/index.php/measuring-memory-usage-in-python/
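One standard-library option that needs no pip install is `tracemalloc` (Python 3.4+). The sketch below is an assumption about what is useful here, not necessarily the technique the linked post describes; the list of integers is a made-up workload.

```python
import tracemalloc

tracemalloc.start()

# Hypothetical workload: a list of one million integers.
data = list(range(1_000_000))

# Both values are in bytes and cover allocations made since start().
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
print(f"current: {current / 1e6:.1f} MB, peak: {peak / 1e6:.1f} MB")
```

Note that `tracemalloc` only counts memory allocated through Python's own allocator, so the figures can be lower than the process size reported by the operating system.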
Suggestions for reading large files: http://stackoverflow.com/questions/12039235/loading-large-file-in-python
Alternatives to reading large files in full (checking for duplicate files without storing their checksums): http://stackoverflow.com/questions/1700650/checking-for-duplicate-files-without-storing-their-checksums
Reading files line by line:
with open(...) as f:
    for line in f:
        <do something with line>
Searching big text files: http://stackoverflow.com/questions/15034504/fast-method-to-search-for-a-string-in-a-big-text-file-with-python
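One possible approach to searching a big file without loading it into memory is to memory-map it with `mmap` and call `find` on the mapping. This is a minimal sketch, not necessarily the answer given at the link; `find_in_file` is a hypothetical helper and the temporary file stands in for a genuinely large one.

```python
import mmap
import os
import tempfile


def find_in_file(path, needle):
    """Return the byte offset of the first match of `needle`, or -1."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; the OS pages data in on demand.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm.find(needle)


# Small demonstration file (a stand-in for a genuinely large one).
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"alpha\nbeta\ngamma\n")
    demo_path = tmp.name

offset = find_in_file(demo_path, b"gamma")
missing = find_in_file(demo_path, b"delta")
os.remove(demo_path)
print(offset, missing)
```

The search works on bytes, so the needle must be a `bytes` object, and mapping an empty file raises `ValueError`, which a real script would need to handle.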
Finding duplicate files: http://stackoverflow.com/questions/18724376/finding-duplicate-files-via-hashlib/18725256#18725256
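A minimal sketch of the hashlib idea, assuming SHA-256 and a 1 MiB chunk size (both arbitrary choices, not taken from the linked answer). `file_digest` and `find_duplicates` are hypothetical helper names. Each file is hashed in chunks so it never has to fit in memory, and paths are grouped by digest.

```python
import hashlib
import os
import tempfile
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Hash a file in fixed-size chunks so it never has to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Group files under `root` by content hash; return groups of 2 or more."""
    by_digest = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            by_digest[file_digest(path)].append(path)
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]


# Demonstration on a throwaway directory.
with tempfile.TemporaryDirectory() as root:
    for name, content in [("a.txt", b"same"), ("b.txt", b"same"), ("c.txt", b"other")]:
        with open(os.path.join(root, name), "wb") as f:
            f.write(content)
    groups = find_duplicates(root)
    duplicate_names = [[os.path.basename(p) for p in g] for g in groups]

print(duplicate_names)
```

On a large tree it is usually cheaper to group files by size first and hash only the files whose sizes collide.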
A Python dict with an SQLite backend: http://sebsauvage.net/python/snyppets/index.html#dbdict
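To show the general idea, here is a minimal sketch of a dict-like object backed by SQLite. It is not the `dbdict` code from the linked page: the class name `SqliteDict`, the table name `kv`, and the restriction to string keys and values are all assumptions made for illustration.

```python
import sqlite3
from collections.abc import MutableMapping


class SqliteDict(MutableMapping):
    """Hypothetical minimal dict-like wrapper; keys and values are strings."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value TEXT)"
        )

    def __getitem__(self, key):
        row = self.conn.execute(
            "SELECT value FROM kv WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            raise KeyError(key)
        return row[0]

    def __setitem__(self, key, value):
        self.conn.execute(
            "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)", (key, value)
        )
        self.conn.commit()

    def __delitem__(self, key):
        cur = self.conn.execute("DELETE FROM kv WHERE key = ?", (key,))
        self.conn.commit()
        if cur.rowcount == 0:
            raise KeyError(key)

    def __iter__(self):
        for (key,) in self.conn.execute("SELECT key FROM kv"):
            yield key

    def __len__(self):
        return self.conn.execute("SELECT COUNT(*) FROM kv").fetchone()[0]


d = SqliteDict()          # pass a file path instead to persist to disk
d["spam"] = "eggs"
d["spam"] = "ham"         # overwrites the previous value
d["foo"] = "bar"
del d["foo"]
print(d["spam"], len(d), "foo" in d)
```

Subclassing `MutableMapping` supplies `in`, `get`, `items`, `update`, and similar methods from the five methods defined here. Committing on every write keeps the sketch simple but is slow for bulk inserts.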