Skip to content

Instantly share code, notes, and snippets.

@Samuirai
Created June 14, 2012 20:59
Show Gist options
  • Save Samuirai/2932918 to your computer and use it in GitHub Desktop.
Save Samuirai/2932918 to your computer and use it in GitHub Desktop.
G-WAN Captcha Decode

G-WAN is a new free web server. They seem to be very proud of it, or at least just want to make a lot of money. Well anyway, in almost every sentence they write, they claim that they are 20% cooler than anything else. It feels a bit arrogant. I have to admit, I don't know a lot about web servers, so I can't speak to how good they are.

However, then I saw their Captcha example. I also don't know much about machine learning algorithms, OCR, and stuff like that, but I do know how to read pixels. I also know how to compare values with python :P

demo

They say the following about their Captcha:

[...] difficult or even completely impossible for robots.

Wait wat? If this is true, this is something really outstanding and maybe an alternative to reCaptcha...

But then I was like:

are you kidding me?

So I wrote this basic stupid pixel by pixel reading and comparing code, to decode the captcha.

smrrd$ python crack_captcha.py
GIF Image
---------
R0lGODlhGAAZAJEAAP///9//v4SkZAAAACH5BAEAAAAALAAAAAAYABkAAAJfhI+pGB0rmHuGAmtEPJj7E23VYlmbeDnMB2guu44J2lWqQi/6Drl0k7hlSKwSiHeBgV5BTK2FNOKIsmQVJekIkdzgTEOVIERY4ApDPoczTOvzCbVtq/G6kt4CK+BdRQEAOw==

Captcha Data Matrix
-------------------
                                                
  1 1 1 1               2         1 1 1 1 1     
  1       1           2 2         1             
  1       1         2   2         1             
  1       1       2     2         1 1 1 1       
  1       1       2 2 2 2 2       1             
  1       1             2         1             
  1 1 1 1               2         1             
                                                
  2 2 2 2 2         1 1 1           1 1 1       
  2               1       1       1       1     
  2                       1       1             
  2 2 2 2             1 1         1             
  2                       1       1             
  2               1       1       1       1     
  2                 1 1 1           1 1 1       
                                                
        1           2 2 2               1       
      1 1         2       2           1 1       
    1   1         2       2         1   1       
        1           2 2 2 2       1     1       
        1                 2       1 1 1 1 1     
        1         2       2             1       
        1           2 2 2               1       
                                            
color | pixel count
-------------------
    0 |        472
    1 |         81
    2 |         44

color   1 | color   2
---------------------
        3 |         4
        1 |         9
        4 |          
---------------------
        8 |        13

I also don't understand, what they think this means and why they are so excited about it:

The two sums are: 13 and 8... for the same Captcha image!
By just changing the HTML background color [...]

In the end, this was the first time I tried to solve a Captcha. I think this is the best example of how not to implement it.

YouTube Video Demo

kind regards,
samuirai

personal Website http://www.smrrd.de
I'm a member of the Stuttgart Hackerspace - shackspace

edit: to see really cool stuff with reCaptcha, check out what they did: http://www.dc949.org/projects/stiltwalker/

import base64, sys, io, Image, urllib2, re
bg = 0 # background color
cw = 8 # character width
# get the new captcha
url = urllib2.urlopen("http://62.75.175.163:8080/?captcha.c")
html = url.read().replace('\n','').replace('\r','')
url.close()
# get the base64 gif image code
# <img src="data:image/gif;base64,R0lGODlhG...AAADs=" alt="A tree" width="48" height="50" />
# => R0lGODlhG...AAADs=
regex = re.compile('.*base64,(.*)" alt')
gif_b64 = regex.match(html).group(1)
print "GIF Image"
print "---------"
print gif_b64
# load the string as image
f = io.BytesIO(base64.b64decode(gif_b64))
img = Image.open(f)
pix = img.load()
# print and analyse the pixels
print "Captcha Data Matrix"
print "-------------------"
pixels = {}
for y in xrange(0,25):
for x in xrange(0,24):
# collect data
if pix[x,y] not in pixels: pixels[pix[x,y]]=0
else: pixels[pix[x,y]]+=1
# print pixels
if pix[x,y]!=bg: print pix[x,y],
else: print ' ',
print ''
# print the analyse - total useless, but looks cool
print "color | pixel count"
print "-------------------"
for color in pixels:
print "%5d | %10d" % (color,pixels[color])
# define all characters
charset = {
0:
[[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 1, 0, 0, 0],
[0, 1, 0, 0, 0, 1, 0, 0],
[0, 1, 0, 0, 1, 1, 0, 0],
[0, 1, 0, 1, 0, 1, 0, 0],
[0, 1, 1, 0, 0, 1, 0, 0],
[0, 1, 0, 0, 0, 1, 0, 0],
[0, 0, 1, 1, 1, 0, 0, 0]],
1:
[[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 1, 1, 0, 0, 0],
[0, 0, 1, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0]],
2:
[[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 1, 0, 0, 0],
[0, 1, 0, 0, 0, 1, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 1, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 0, 0],
[0, 1, 1, 1, 1, 1, 0, 0]],
3:
[[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 1, 0, 0, 0],
[0, 1, 0, 0, 0, 1, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 0],
[0, 0, 0, 1, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 0],
[0, 1, 0, 0, 0, 1, 0, 0],
[0, 0, 1, 1, 1, 0, 0, 0]],
4:
[[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 1, 1, 0, 0, 0],
[0, 0, 1, 0, 1, 0, 0, 0],
[0, 1, 0, 0, 1, 0, 0, 0],
[0, 1, 1, 1, 1, 1, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0]],
5:
[[0, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 1, 1, 1, 1, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 0],
[0, 1, 1, 1, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 0],
[0, 1, 0, 0, 0, 1, 0, 0],
[0, 0, 1, 1, 1, 0, 0, 0]],
6:
[[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 1, 0, 0, 0],
[0, 1, 0, 0, 0, 1, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 0],
[0, 1, 1, 1, 1, 0, 0, 0],
[0, 1, 0, 0, 0, 1, 0, 0],
[0, 1, 0, 0, 0, 1, 0, 0],
[0, 0, 1, 1, 1, 0, 0, 0]],
7:
[[0, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 1, 1, 1, 1, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 1, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 0, 0]],
8:
[[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 1, 0, 0, 0],
[0, 1, 0, 0, 0, 1, 0, 0],
[0, 1, 0, 0, 0, 1, 0, 0],
[0, 0, 1, 1, 1, 0, 0, 0],
[0, 1, 0, 0, 0, 1, 0, 0],
[0, 1, 0, 0, 0, 1, 0, 0],
[0, 0, 1, 1, 1, 0, 0, 0]],
9:
[[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 1, 0, 0, 0],
[0, 1, 0, 0, 0, 1, 0, 0],
[0, 1, 0, 0, 0, 1, 0, 0],
[0, 0, 1, 1, 1, 1, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 0],
[0, 1, 0, 0, 0, 1, 0, 0],
[0, 0, 1, 1, 1, 0, 0, 0]],
}
def find_character(tmp_char):
highest_char = ['X',0,0,]
for char in xrange(0,10):
match = 0
fails = 0
for x in range(len(tmp_char)):
for y in range(len(tmp_char[x])):
#print str(tmp_char[x][y])+str(charset[char][x][y]),
if (charset[char][x][y] != 0 and tmp_char[x][y] != 0) or (charset[char][x][y] == tmp_char[x][y]):
match+=1
else: fails+=1
#print
#print [char,match,fails,]
if match>highest_char[1]:
highest_char=[char,match,fails,]
return highest_char
def get_color(tmp_char):
color = {}
for x in range(len(tmp_char)):
for y in range(len(tmp_char[x])):
if tmp_char[x][y]!=bg:
if tmp_char[x][y] not in color:
color[tmp_char[x][y]] = 0
else:
color[tmp_char[x][y]] += 1
return color.keys()
# analyse each character
color = {}
for yl in xrange(0,3):
for xl in xrange(0,3):
tmp_char = []
for y in xrange(yl*cw,yl*cw+cw):
tmp_line = []
for x in xrange(xl*cw,xl*cw+cw):
tmp_line.append(pix[x,y])
#print pix[x,y],
#print
tmp_char.append(tmp_line)
match = find_character(tmp_char)
col = get_color(tmp_char)
if not match[2]:
if col[0] not in color:
color[col[0]] = [match[0]]
else:
color[col[0]].append(match[0])
# print the sums
print ""
if len(color.keys())>=2:
print "color %3d | color %3d" % (color.keys()[0],color.keys()[1])
else:
print "color %3d | " % (color.keys()[0])
print "---------------------"
left_list = color[color.keys()[0]]
if len(color.keys())>=2:
right_list = color[color.keys()[1]]
else:
right_list = []
longest_length = 0
if len(left_list)>len(right_list): longest_length = len(left_list)
else: longest_length = len(right_list)
for row in xrange(0,longest_length):
left_val = ''
right_val = ''
if len(left_list)>row: left_val = str(left_list[row])
if len(right_list)>row: right_val = str(right_list[row])
print "%9s | %9s" % (left_val,right_val)
print "---------------------"
print "%9s | %9s" % (sum(left_list),sum(right_list))
@G-WAN
Copy link

G-WAN commented Jun 16, 2012

Dear Samuirai,

I'm just to stupid to understand

"It's difficult to get a man to understand something when his salary depends on him not understanding it." (Upton Sinclair)

Let's have a look at the facts

Fine with me:

You have posted this same clueless rant a few dozens of times on GitHub (34 times says the GitHub search tool):

https://gist.github.com/gists/search?q=G-WAN+Captcha+Decode&x=0&y=0

And then you have spread the word on ycombinator.org, happily adding insults to injury, going as far as to accusing Ers35 (a long-term U.S.-based G-WAN user) of being G-WAN's author acting under a nickname (just because he demonstrated how far away from the point your critics felt).

Your friends might not have factored it, but each time they leave the ycombinator.org site for our site to seek a good word to quote-and-moke they leave their IP address in our log files. And this is quite embarrassing to see who they are working for.

In response to so much "good will" and "neutrality", I am sure that you can appreciate that no word was harmed in our reply to your so distinguished posts.

@Samuirai
Copy link
Author

What are you talking about?

You have posted this same clueless rant a few dozens of times on GitHub (34 times says the GitHub search tool):
https://gist.github.com/gists/search?q=G-WAN+Captcha+Decode&x=0&y=0

I've only written this Gist and these people just forked it. You should probably read about how github works and what a fork is.

[...] happily adding insults to injury and accusing Ers35 [...]

I haven't said anything about him. This was another user.

"It's difficult to get a man to understand something when his salary depends on him not understanding it." (Upton Sinclair)

Check my Website please. And read what I wrote on the main page. I really want to learn new stuff. So please describe it to me. I really want to understand it :(

Your friends might not have factored it, but each time they leave the ycombinator.org site for our site to seek a good word to quote-and-moke they leave their IP address in our log files. And this is quite embarrassing to see who they are working for.

Oh... I just leave this here without a comment.

@G-WAN
Copy link

G-WAN commented Jun 16, 2012

Samuirai,

these people forked my gist

You will not comment on the relevance of 34 "forks" that take place:

  • at the same time used to publish the original,
  • without modification of the original contents.

In a well-established technology called "e-mail", that activity is classified as "S.P.A.M."

On a public media like GitHub, the expression "Fear Campaign" (or FUD) might be more appropriate:

http://thunderfeeds.com/reader/news/gwan-does-not-understand-forking-and-the-captcha-ignorance

http://news.ycombinator.com/item?id=4120690

You are posting in real-time more FUD as this talk is going-on. At least you removed any doubt about your motivations.

@theckman
Copy link

So I'll comment on the 34 forks. I just forked this gist to ensure I have a copy of the Python script if it were ever to be removed from this gist. I wonder, what is the realistic possibility that others have forked it for the same reason...?

@Samuirai
Copy link
Author

What you guys do is like a self DDoDs. I don't want to make a FUD. It was never my intention. The Captcha Decoding stuff was just funny. It also says nothing about your webserver. I asked several times, if somebody can describe the Captcha you thought about - but I got no answer.
A fork is a private copy. And they just copied the gist, because they want to have their copy. It was also the first time submitting soemthing to Hacker News. I didn't expect, that somebody would actually read it. But it turned out, that many people have a sinking feeling about G-WAN. It was just salt into the wound. Instead of avoiding the technical discussion about the Captcha you could be more transparent and openminded - this is how online marketing works.

The main problem is, that my english is not that rethorical strong. But I thought it's clear, that my uncouth sentences are stylistic devices to make the topic more funny.

@pyalot
Copy link

pyalot commented Jun 16, 2012

@Samuirai trolling an implementor of a captcha algorithm is pointless (trolling is pointless, but I digress), write an exploit, publish, bask in the knowledge of furthering spam research.

@G-WAN don't feed the trolls, also, don't troll.

@me, I should not have participated in this unproductive discussion.

@AJenbo
Copy link

AJenbo commented Oct 30, 2012

The captcha also requires that the human is able of a bit of math.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment