Okay, so, I just had that moment where some spammer tried to pretend to be my friend, and FB executed a surprisingly efficient shutdown process (I doubt the account survived 15 minutes total.)

Still, nerd rage, and an interesting problem. It occurred to me that one way to fight back would be for the system to automatically notify the original account when a second account appeared, using the same name and image, so that the original could validate that it was in fact a secondary account, rather than an impostor.

Hau 2

Or: "But where are we ever going to find rubber pants our size?"

My described mechanism is currently unsatisfactory, because one of the steps (step 1) requires a moderately annoying one-time effort by the userbase, which means it's kind of a no-go. However, the rest of the steps are adequate, and the displeasurable approach still shows that this is plausible; hopefully someone can figure out a better method for that step.

It is possible that satisfactory AI/ML may be able to estimate which photos contain person X; then a confirmation batch ("please check if any of these images are you", with a single false flag) could occasionally be presented, which would make this not-a-huge-problem. However, I'm not sure how hard that would be, and am far too lazy to check.

The method

As described:

  1. Users will mark which account images are photos of them
  2. All images will be hashed on entry
  3. When a user account is created, has a name changed, or a main photo changed:
    1. Build a list of all matching names
      1. Possibly stemmed, altered (e.g. remove/add middle initial), or fuzzed
    2. For each matching name
      1. Go through the matched name's public account photos' hashes
      2. For each hash, compare to this main photo's hash
      3. If the name and hashed photos both pass a Levenshtein threshold for likeliness,
        1. Notify the original account
        2. "It looks like you opened another account. Please confirm that this isn't fraud."
        3. ???
        4. Profit
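
A rough sketch of step 3 in Python, for the sake of argument. Everything named here is made up for illustration: the `Account` shape, the two threshold constants, and the choice to lowercase names before comparing are all assumptions, not anything FB actually does. It also skips the stemming/fuzzing idea and just uses plain Levenshtein distance on both the name and the photo hash bytes.

```python
from dataclasses import dataclass, field


def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


@dataclass
class Account:  # hypothetical record shape, for illustration only
    account_id: str
    name: str
    main_photo_hash: bytes
    # Hashes of public photos the user has marked as "this is me" (step 1).
    marked_photo_hashes: list = field(default_factory=list)


# Hypothetical thresholds; real values would need tuning against real data.
NAME_MAX_DISTANCE = 2
HASH_MAX_DISTANCE = 5


def find_possible_originals(changed, existing):
    """Return the accounts that `changed` might be impersonating."""
    hits = []
    for other in existing:
        if other.account_id == changed.account_id:
            continue
        name_distance = levenshtein(changed.name.lower(), other.name.lower())
        if name_distance > NAME_MAX_DISTANCE:
            continue
        # Compare the new main photo against every photo the other user marked.
        if any(levenshtein(changed.main_photo_hash, h) <= HASH_MAX_DISTANCE
               for h in other.marked_photo_hashes):
            hits.append(other)
    return hits
```

Each account that comes back from `find_possible_originals` would get the "please confirm that this isn't fraud" notification. Scanning every existing account like this is obviously too slow at FB scale; a real system would need an index on names first.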

The reason the marking step #1 is so important is the number of false positives it will prevent for people with very common names who are using meme images, stock photos, political photos, celebrity photos, television images, &c for their profile image.

As a quick rule of thumb, I searched Facebook for "Tom Davis" - seems likely to be a common name, but I don't know anyone with that name - and looked down the list. In the first hundred results, 12 had a photo that wasn't them (one political, two product, five joke, two landscape, one abstract art, and one sex joke meme.) Step 1 would probably prevent frequent false alarms for them.

Photo hashing

To me this is the fun part.

The first thing off of the top of my head:

  1. If Alpha PNG:
    1. Normalize the four color channels independently to the integer quantized [0 - 3] space, creating 256 colors
    2. Repeat the below process for the alpha PNG composited onto each of these backgrounds: white, black, 50% gray, image average color, image mode color
  2. Normalize the three color channels independently to the integer quantized [0 - 5] space, creating 216 colors
  3. Normalize the size of the image to a thumbprint size, probably 7x7 (do not maintain aspect ratio)
  4. Voila: 49 byte image hash (one byte per pixel), extremely tolerant of several forms of editing
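
Here is a minimal pure-Python sketch of the steps above, just to show the idea hangs together. It assumes the image has already been decoded into a list of rows of 8-bit `(r, g, b)` tuples, and it resizes by nearest-neighbour sampling, which is my own simplification (a real version would more likely average each cell). The function names are invented for this sketch. `flatten_alpha` covers only the "composite onto a background" part of the alpha PNG branch; the separate four-channel hash and the average/mode color backgrounds are left out.

```python
def quantize(value, levels):
    """Map an 8-bit channel value (0-255) onto an integer in [0, levels - 1]."""
    return value * levels // 256


def flatten_alpha(pixels, background):
    """Composite rows of (r, g, b, a) pixels onto a solid (r, g, b) background."""
    flat = []
    for row in pixels:
        flat.append([
            tuple((c * a + bg * (255 - a)) // 255
                  for c, bg in zip((r, g, b), background))
            for r, g, b, a in row
        ])
    return flat


def photo_hash(pixels, size=7, levels=6):
    """Hash an opaque RGB image into size * size bytes, one byte per cell.

    With the defaults, each byte is one of 6 * 6 * 6 = 216 colors and the
    result is 7 * 7 = 49 bytes long.
    """
    height = len(pixels)
    width = len(pixels[0])
    out = bytearray()
    for ty in range(size):
        # Sample the source pixel nearest the centre of each thumbnail cell.
        # Rows and columns are scaled separately, so aspect ratio is ignored.
        sy = (2 * ty + 1) * height // (2 * size)
        for tx in range(size):
            sx = (2 * tx + 1) * width // (2 * size)
            r, g, b = pixels[sy][sx]
            qr, qg, qb = (quantize(c, levels) for c in (r, g, b))
            out.append((qr * levels + qg) * levels + qb)
    return bytes(out)
```

For an alpha PNG you would call something like `photo_hash(flatten_alpha(rgba_pixels, (255, 255, 255)))` once per background and keep all the resulting hashes. The coarse quantization is what buys the tolerance: a small brightness or color tweak usually lands in the same one-of-six bucket per channel, though values sitting right on a bucket boundary can still flip.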

This approach does not detect images cut out of the middles of other images. Hashing by row and column might defeat that, but if you're fighting at that level, it's probably smarter to bulk train an AI for this.
