Skip to content

Instantly share code, notes, and snippets.

@torrez
Forked from Samuirai/captcha.md
Created June 14, 2012 22:15
Show Gist options
  • Save torrez/2933299 to your computer and use it in GitHub Desktop.
Save torrez/2933299 to your computer and use it in GitHub Desktop.
G-WAN Captcha Decode

G-WAN is a new free web server. They seem to be very proud of it, or at least just want to make a lot of money. Well anyway, in almost every sentence they write, they claim that they are 20% cooler than anything else. It feels a bit arrogant. I have to admit, I don't know a lot about web servers, so I can't speak to how good they are.

However, then I saw their Captcha example. I also don't know much about machine learning algorithms, OCR, and stuff like that, but I do know how to read pixels. I also know how to compare values with python :P

demo

They say the following about their Captcha:

[...] difficult or even completely impossible for robots.

Wait wat? If this is true, this is something really outstanding and maybe an alternative to reCaptcha...

But then I was like:

are you kidding me?

So I wrote this basic stupid pixel by pixel reading and comparing code, to decode the captcha.

smrrd$ python crack_captcha.py
GIF Image
---------
R0lGODlhGAAZAJEAAP///9//v4SkZAAAACH5BAEAAAAALAAAAAAYABkAAAJfhI+pGB0rmHuGAmtEPJj7E23VYlmbeDnMB2guu44J2lWqQi/6Drl0k7hlSKwSiHeBgV5BTK2FNOKIsmQVJekIkdzgTEOVIERY4ApDPoczTOvzCbVtq/G6kt4CK+BdRQEAOw==

Captcha Data Matrix
-------------------
                                                
  1 1 1 1               2         1 1 1 1 1     
  1       1           2 2         1             
  1       1         2   2         1             
  1       1       2     2         1 1 1 1       
  1       1       2 2 2 2 2       1             
  1       1             2         1             
  1 1 1 1               2         1             
                                                
  2 2 2 2 2         1 1 1           1 1 1       
  2               1       1       1       1     
  2                       1       1             
  2 2 2 2             1 1         1             
  2                       1       1             
  2               1       1       1       1     
  2                 1 1 1           1 1 1       
                                                
        1           2 2 2               1       
      1 1         2       2           1 1       
    1   1         2       2         1   1       
        1           2 2 2 2       1     1       
        1                 2       1 1 1 1 1     
        1         2       2             1       
        1           2 2 2               1       
                                            
color | pixel count
-------------------
    0 |        472
    1 |         81
    2 |         44

color   1 | color   2
---------------------
        3 |         4
        1 |         9
        4 |          
---------------------
        8 |        13

I also don't understand, what they think this means and why they are so excited about it:

The two sums are: 13 and 8... for the same Captcha image!
By just changing the HTML background color [...]

In the end, this was the first time I tried to solve a Captcha. I think this is the best example of how not to implement it.

YouTube Video Demo

kind regards,
samuirai

Also checkout my Website http://www.smrrd.de

edit: to see really cool stuff with reCaptcha, check out what they did: http://www.dc949.org/projects/stiltwalker/

import base64, sys, io, Image, urllib2, re
bg = 0 # background color
cw = 8 # character width
# get the new captcha
url = urllib2.urlopen("http://62.75.175.163:8080/?captcha.c")
html = url.read().replace('\n','').replace('\r','')
url.close()
# get the base64 gif image code
# <img src="data:image/gif;base64,R0lGODlhG...AAADs=" alt="A tree" width="48" height="50" />
# => R0lGODlhG...AAADs=
regex = re.compile('.*base64,(.*)" alt')
gif_b64 = regex.match(html).group(1)
print "GIF Image"
print "---------"
print gif_b64
# load the string as image
f = io.BytesIO(base64.b64decode(gif_b64))
img = Image.open(f)
pix = img.load()
# print and analyse the pixels
print "Captcha Data Matrix"
print "-------------------"
pixels = {}
for y in xrange(0,25):
for x in xrange(0,24):
# collect data
if pix[x,y] not in pixels: pixels[pix[x,y]]=0
else: pixels[pix[x,y]]+=1
# print pixels
if pix[x,y]!=bg: print pix[x,y],
else: print ' ',
print ''
# print the analyse - total useless, but looks cool
print "color | pixel count"
print "-------------------"
for color in pixels:
print "%5d | %10d" % (color,pixels[color])
# define all characters
charset = {
0:
[[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 1, 0, 0, 0],
[0, 1, 0, 0, 0, 1, 0, 0],
[0, 1, 0, 0, 1, 1, 0, 0],
[0, 1, 0, 1, 0, 1, 0, 0],
[0, 1, 1, 0, 0, 1, 0, 0],
[0, 1, 0, 0, 0, 1, 0, 0],
[0, 0, 1, 1, 1, 0, 0, 0]],
1:
[[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 1, 1, 0, 0, 0],
[0, 0, 1, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0]],
2:
[[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 1, 0, 0, 0],
[0, 1, 0, 0, 0, 1, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 1, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 0, 0],
[0, 1, 1, 1, 1, 1, 0, 0]],
3:
[[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 1, 0, 0, 0],
[0, 1, 0, 0, 0, 1, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 0],
[0, 0, 0, 1, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 0],
[0, 1, 0, 0, 0, 1, 0, 0],
[0, 0, 1, 1, 1, 0, 0, 0]],
4:
[[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 1, 1, 0, 0, 0],
[0, 0, 1, 0, 1, 0, 0, 0],
[0, 1, 0, 0, 1, 0, 0, 0],
[0, 1, 1, 1, 1, 1, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0]],
5:
[[0, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 1, 1, 1, 1, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 0],
[0, 1, 1, 1, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 0],
[0, 1, 0, 0, 0, 1, 0, 0],
[0, 0, 1, 1, 1, 0, 0, 0]],
6:
[[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 1, 0, 0, 0],
[0, 1, 0, 0, 0, 1, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 0],
[0, 1, 1, 1, 1, 0, 0, 0],
[0, 1, 0, 0, 0, 1, 0, 0],
[0, 1, 0, 0, 0, 1, 0, 0],
[0, 0, 1, 1, 1, 0, 0, 0]],
7:
[[0, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 1, 1, 1, 1, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 1, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 0, 0]],
8:
[[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 1, 0, 0, 0],
[0, 1, 0, 0, 0, 1, 0, 0],
[0, 1, 0, 0, 0, 1, 0, 0],
[0, 0, 1, 1, 1, 0, 0, 0],
[0, 1, 0, 0, 0, 1, 0, 0],
[0, 1, 0, 0, 0, 1, 0, 0],
[0, 0, 1, 1, 1, 0, 0, 0]],
9:
[[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 1, 0, 0, 0],
[0, 1, 0, 0, 0, 1, 0, 0],
[0, 1, 0, 0, 0, 1, 0, 0],
[0, 0, 1, 1, 1, 1, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 0],
[0, 1, 0, 0, 0, 1, 0, 0],
[0, 0, 1, 1, 1, 0, 0, 0]],
}
def find_character(tmp_char):
highest_char = ['X',0,0,]
for char in xrange(0,10):
match = 0
fails = 0
for x in range(len(tmp_char)):
for y in range(len(tmp_char[x])):
#print str(tmp_char[x][y])+str(charset[char][x][y]),
if (charset[char][x][y] != 0 and tmp_char[x][y] != 0) or (charset[char][x][y] == tmp_char[x][y]):
match+=1
else: fails+=1
#print
#print [char,match,fails,]
if match>highest_char[1]:
highest_char=[char,match,fails,]
return highest_char
def get_color(tmp_char):
color = {}
for x in range(len(tmp_char)):
for y in range(len(tmp_char[x])):
if tmp_char[x][y]!=bg:
if tmp_char[x][y] not in color:
color[tmp_char[x][y]] = 0
else:
color[tmp_char[x][y]] += 1
return color.keys()
# analyse each character
color = {}
for yl in xrange(0,3):
for xl in xrange(0,3):
tmp_char = []
for y in xrange(yl*cw,yl*cw+cw):
tmp_line = []
for x in xrange(xl*cw,xl*cw+cw):
tmp_line.append(pix[x,y])
#print pix[x,y],
#print
tmp_char.append(tmp_line)
match = find_character(tmp_char)
col = get_color(tmp_char)
if not match[2]:
if col[0] not in color:
color[col[0]] = [match[0]]
else:
color[col[0]].append(match[0])
# print the sums
print ""
if len(color.keys())>=2:
print "color %3d | color %3d" % (color.keys()[0],color.keys()[1])
else:
print "color %3d | " % (color.keys()[0])
print "---------------------"
left_list = color[color.keys()[0]]
if len(color.keys())>=2:
right_list = color[color.keys()[1]]
else:
right_list = []
longest_length = 0
if len(left_list)>len(right_list): longest_length = len(left_list)
else: longest_length = len(right_list)
for row in xrange(0,longest_length):
left_val = ''
right_val = ''
if len(left_list)>row: left_val = str(left_list[row])
if len(right_list)>row: right_val = str(right_list[row])
print "%9s | %9s" % (left_val,right_val)
print "---------------------"
print "%9s | %9s" % (sum(left_list),sum(right_list))
@G-WAN
Copy link

G-WAN commented Jun 16, 2012

Torrez,

Thanks for discussing your views about G-WAN, a Web server which inspired you, visibly.

The source code of our captcha.c example states:

"By just changing the HTML background color (mouse cursor hovering, previous state or input or shared secret) used for the transparent GIF Captcha image we can make something simple for humans become difficult or even completely impossible for robots."

This text above is also displayed when the example is run by G-WAN (as shown on your screenshot by the way).

Since you did not follow any of the suggested guidelines (nor you tried to implement anything else having similar goals), you did not even try to make it "difficult or even completely impossible for robots" as we suggested this should be done.

This example was merely showing how to exercise the (ultra-fast) G-WAN native support for in-memory GIF images when dealing with transparency. It was called captcha.c because we illustrated how such a mechanism could be done.

But you did not even attempt to do that.

Maybe trying to do it could be a good exercise, worth being published on GitHub.

@torrez
Copy link
Author

torrez commented Jun 16, 2012 via email

@G-WAN
Copy link

G-WAN commented Jun 16, 2012

Torrez,

It is a fork

You will not comment on the relevance of 34 "forks" that take place:

  • at the same time used to publish the original,
  • without modification of the original contents.

In a well-established technology called "e-mail", that activity is classified as "S.P.A.M."

On a public media like GitHub, the expression "Fear Campaign" (or FUD) might be more appropriate:

http://thunderfeeds.com/reader/news/gwan-does-not-understand-forking-and-the-captcha-ignorance

http://news.ycombinator.com/item?id=4120690

You are posting in real-time more FUD as this talk is going-on. At least you removed any doubt about your motivations.

@torrez
Copy link
Author

torrez commented Jun 16, 2012 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment