Michael Yoo msjyoo

Hello all. We must choose an extraction library for our new Goliath system to provide default values when no rules have been set. The choice has been narrowed down to Boilerpipe and Goose. They both have sub-par documentation (Boilerpipe, Goose), so I've dug around in the code to find the exact process by which they pull out data. Here I will compare them so we can choose one.

# Boilerpipe

This bad mamba jamba was developed by a Ph.D-having guy who, along with some other folks, wrote a big fat academic paper around the algorithm it uses, which you can find in our Dropbox if you really want to read it. Basically, they use link density, text density, and number of words on a block-by-block basis to distinguish boilerplate blocks from content blocks. A block is simply a contiguous piece of text terminated by the start
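
To make the block-by-block idea concrete, here is a rough sketch of how the three signals named above (link density, text density, and word count) could be combined to label a block. This is not Boilerpipe's actual code: the `Block` type, the `linked_words` field, the words-per-wrapped-line definition of text density, and every threshold are assumptions chosen for illustration.

```python
import textwrap
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical container for one contiguous piece of text."""
    text: str
    linked_words: int  # assumed: number of words that sit inside links


def num_words(block: Block) -> int:
    return len(block.text.split())


def link_density(block: Block) -> float:
    """Share of the block's words that are link text (0.0 for an empty block)."""
    n = num_words(block)
    return block.linked_words / n if n else 0.0


def text_density(block: Block, width: int = 80) -> float:
    """Words per line after wrapping at `width` columns (an assumed definition)."""
    lines = textwrap.wrap(block.text, width)
    return num_words(block) / len(lines) if lines else 0.0


def is_content(
    block: Block,
    max_link_density: float = 0.33,  # illustrative threshold, not from the library
    min_text_density: float = 5.0,   # illustrative threshold, not from the library
    min_words: int = 15,             # illustrative threshold, not from the library
) -> bool:
    """Label a block as content only if all three signals look like body text."""
    return (
        link_density(block) <= max_link_density
        and text_density(block) >= min_text_density
        and num_words(block) >= min_words
    )


if __name__ == "__main__":
    nav = Block(text="Home About Contact Login", linked_words=4)
    body = Block(text=" ".join(["word"] * 40), linked_words=0)
    for b in (nav, body):
        print(num_words(b), round(link_density(b), 2), is_content(b))
```

A navigation bar is short and almost entirely link text, so it fails the link-density and word-count checks, while a long unlinked paragraph passes all three. The real library may weigh these signals differently, so treat the thresholds as placeholders to tune.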

@mouseroot
mouseroot / ip_routing_win7.txt
Created April 30, 2013 16:41
Enable IP Routing in Windows 7
1. Open regedit
2. Navigate to "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\"
3. Change IPEnableRouter to 1
4. Open Services (services.msc) and start the "Routing and Remote Access" service
(a reboot may be necessary)
@rduplain
rduplain / MainActivity.java
Created May 8, 2012 20:08
A very simple full-screen WebView activity for Android native wrappers, as a starting point.
package com.willowtreeapps.demo;
import android.app.Activity;
import android.os.Bundle;
import android.view.KeyEvent;
import android.view.Window;
import android.webkit.WebView;
import android.webkit.WebViewClient;
public class MainActivity extends Activity {
@schacon
schacon / FUGPL.txt
Created September 25, 2008 18:49
the anti-gpl license
The FUGPL License
===================
Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software with only one restriction. No part of
it may be included in software projects that are solely distributed under
strong copyleft restricted licenses. This license is *NOT* GPL compatible,
and that is its only restriction.