def text_cleaner(text):
    rules = [
        {r'>\s+': u'>'},  # remove whitespace after a tag opens or closes
        {r'\s+': u' '},  # collapse consecutive whitespace into one space
        {r'\s*<br\s*/?>\s*': u'\n'},  # replace <br> with a newline
        {r'</(div)\s*>\s*': u'\n'},  # newline after </div>
        {r'</(p|h\d)\s*>\s*': u'\n\n'},  # blank line after </p> and heading tags such as </h1>
        {r'<head>.*<\s*(/head|body)[^>]*>': u''},  # remove everything from <head> to </head> (or <body>)
        {r'<a\s+href="([^"]+)"[^>]*>.*?</a>': r'\1'},  # replace each link with its URL (non-greedy, one link at a time)
        {r'[ \t]*<[^<]*?/?>': u''},  # remove remaining tags
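The preview above stops before the rules are used. A minimal sketch of one plausible way to apply them, assuming each `{pattern: replacement}` dict is applied in list order with `re.sub`: `apply_rules` is a hypothetical helper, not part of the original gist, and the final `strip()` is an added assumption. The link rule here uses a non-greedy `.*?` so that two links on the same line are replaced separately rather than merged into one match.

```python
import re
from typing import Dict, List


# Hypothetical helper (not from the original gist): apply each
# {pattern: replacement} rule in list order, then trim outer whitespace.
def apply_rules(text: str, rules: List[Dict[str, str]]) -> str:
    for rule in rules:
        for pattern, replacement in rule.items():
            text = re.sub(pattern, replacement, text)
    return text.strip()


# The rules from the preview, in the same order.
RULES = [
    {r'>\s+': '>'},                                # drop whitespace after '>'
    {r'\s+': ' '},                                 # collapse whitespace runs
    {r'\s*<br\s*/?>\s*': '\n'},                    # <br> -> newline
    {r'</(div)\s*>\s*': '\n'},                     # newline after </div>
    {r'</(p|h\d)\s*>\s*': '\n\n'},                 # blank line after </p>, </hN>
    {r'<head>.*<\s*(/head|body)[^>]*>': ''},       # drop the <head> section
    {r'<a\s+href="([^"]+)"[^>]*>.*?</a>': r'\1'},  # link -> its URL
    {r'[ \t]*<[^<]*?/?>': ''},                     # strip remaining tags
]

html = '<h1>Title</h1>\n<p>Hello <a href="https://example.com">link</a><br/>world</p>'
print(repr(apply_rules(html, RULES)))
# → 'Title\n\nHello https://example.com\nworld'
```

Because the rules run in order, whitespace is collapsed before `<br>`, `</div>`, `</p>` and heading tags reinsert the newlines, so the output keeps only the line breaks the markup implies.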
#!/usr/bin/env ruby
require 'aws-sdk-s3'
require 'rack'
class BucketListObjectsWrapper
  attr_reader :bucket
  def initialize(bucket)
    @bucket = bucket
# Basic
sudo apt-get -y update
sudo apt-get -qq install -y build-essential
# OpenCV
sudo apt-get -qq install -y libopencv-dev
sudo apt-get -qq install -y libtesseract-dev
# General dependencies
sudo apt-get -qq install -y libatlas-base-dev libprotobuf-dev libleveldb-dev libsnappy-dev libhdf5-serial-dev protobuf-compiler
sudo apt-get -qq install -y --no-install-recommends libboost-all-dev
# Remaining dependencies, 14.04
""" | |
get_info() function reads the image using openCV and performs thresholding, dilation, noise removal, and | |
contouring to finally retrieve bounding boxes from the contour. | |
Below are some of the additional available functions from openCV for preprocessing: | |
Median filter: median filter blurs out noises by taking the medium from a set of pixels | |
cv2.medianBlur() |
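The docstring names `cv2.medianBlur()` without showing what a median filter computes. As an illustration only, here is a NumPy-only sketch of a k x k median filter on a 2-D grayscale image; `median_filter` is a hypothetical name, and replicating border pixels is an assumption of this sketch, not a statement about OpenCV's own border handling.

```python
import numpy as np


def median_filter(img: np.ndarray, ksize: int = 3) -> np.ndarray:
    """Replace each pixel of a 2-D image with the median of its ksize x ksize neighborhood."""
    if ksize < 1 or ksize % 2 == 0:
        raise ValueError("ksize must be a positive odd integer")
    r = ksize // 2
    # Replicate edge pixels so border pixels also get a full neighborhood (an assumption).
    padded = np.pad(img, r, mode="edge")
    h, w = img.shape
    # One layer per neighborhood offset: each layer is the image shifted by (dy, dx).
    windows = np.stack([
        padded[dy:dy + h, dx:dx + w]
        for dy in range(ksize)
        for dx in range(ksize)
    ])
    # The median across layers is the median of each pixel's neighborhood.
    return np.median(windows, axis=0).astype(img.dtype)


# A single bright "salt" pixel on a dark background is removed entirely.
noisy = np.zeros((5, 5), dtype=np.uint8)
noisy[2, 2] = 255
print(median_filter(noisy).max())  # → 0
```

Unlike a mean blur, which would smear the outlier over its neighbors, the median ignores a single extreme value in the window, which is why it suits salt-and-pepper noise.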
from skimage import io, color, img_as_float, exposure
from skimage.feature import corner_peaks, plot_matches
import matplotlib.pyplot as plt
import numpy as np
img = img_as_float(io.imread('./ml/old-front.jpg'))
/*
Copyright 2016 The Android Open Source Project
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
DynamicGuy Contributor License Agreement
In order to clarify the intellectual property license granted with Contributions from any person or entity, the open source project DynamicGuy ("DynamicGuy") must have a Contributor License Agreement (CLA) on file that has been signed by each Contributor, indicating agreement to the license terms below. This license is for your protection as a Contributor as well as the protection of DynamicGuy and its users; it does not change your rights to use your own Contributions for any other purpose.
You accept and agree to the following terms and conditions for Your present and future Contributions submitted to DynamicGuy. Except for the license granted herein to DynamicGuy and recipients of software distributed by DynamicGuy, You reserve all right, title, and interest in and to Your Contributions.
Definitions. "You" (or "Your") shall mean the copyright owner or legal entity authorized by the copyright owner that is making this Agreement with DynamicGuy. For legal entities,