@GeoffWilliams
Last active January 24, 2017 22:47
Find byte offsets (decimal) in a catalogue that have non-UTF-8 bytes, for later viewing in a hex editor
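filename = "catalog.bad"
offset = 0
with open(filename, "rb") as f:
    while True:
        byte_s = f.read(1)
        if not byte_s:
            break
        try:
            # Decoding a single byte only succeeds for ASCII (0x00-0x7F), so
            # every byte of a multi-byte sequence is reported, valid or not.
            byte_s.decode("utf-8")
        except UnicodeDecodeError:
            print(offset)
        offset += 1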
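
Because the loop above decodes one byte at a time, it also prints the offsets of bytes that belong to perfectly valid multi-byte UTF-8 characters. If only the genuinely malformed sequences are of interest, one possible variation is to decode the whole buffer and let `UnicodeDecodeError` report where decoding failed. The following is a minimal sketch, not part of the original gist; the function name `invalid_utf8_offsets` is hypothetical, and it assumes the file is small enough to read into memory.

```python
def invalid_utf8_offsets(data):
    """Return the decimal offsets at which strict UTF-8 decoding of data fails."""
    offsets = []
    pos = 0
    while pos < len(data):
        try:
            # Try to decode everything from the current position onwards.
            data[pos:].decode("utf-8")
            break  # the remainder is valid UTF-8
        except UnicodeDecodeError as err:
            # err.start and err.end are relative to the slice being decoded.
            offsets.append(pos + err.start)
            pos += err.end  # resume just past the malformed sequence
    return offsets
```

To scan the same file as above, read it in binary mode and pass the bytes in, for example `invalid_utf8_offsets(open("catalog.bad", "rb").read())`, then print each returned offset. Each reported offset marks the start of a malformed sequence rather than every byte within it.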