Public gists by Leo Lu (leoluyi)
@leoluyi
leoluyi / food.R
Created December 12, 2016 17:44
food
library(httr)
library(rvest)

# ASP.NET WebForms page: a session handle keeps cookies across requests
h <- handle("http://amis.afa.gov.tw/veg/VegProdDayTransInfo.aspx")
res <- GET(handle = h)
cookies(h)  # inspect the session cookies

# Extract the hidden __VIEWSTATE fields required for the form POST-back
vs <- content(res) %>% html_nodes("input#__VIEWSTATE") %>% html_attr("value")
vs_gen <- content(res) %>% html_nodes("input#__VIEWSTATEGENERATOR") %>% html_attr("value")
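The R code above pulls the hidden `__VIEWSTATE` tokens that ASP.NET WebForms pages require on every POST-back. A rough Python equivalent of just the extraction step, sketched against a hard-coded sample rather than the live page (the `extract_viewstate` helper and its regex are illustrative, not part of the original gist):

```python
import re

def extract_viewstate(html: str) -> dict:
    """Pull ASP.NET hidden-form tokens out of a page's HTML."""
    tokens = {}
    for name in ("__VIEWSTATE", "__VIEWSTATEGENERATOR"):
        m = re.search(r'id="%s"[^>]*value="([^"]*)"' % name, html)
        if m:
            tokens[name] = m.group(1)
    return tokens

# Works on a small sample in lieu of the live page:
sample = ('<input type="hidden" id="__VIEWSTATE" value="dDwtMTIz" />'
          '<input type="hidden" id="__VIEWSTATEGENERATOR" value="ABCD1234" />')
print(extract_viewstate(sample))
```

In real use you would fetch the page first (e.g. with `requests.Session()` so cookies persist, mirroring `handle()` in httr) and then include both tokens in the subsequent form POST.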
@leoluyi
leoluyi / README_miniCRAN.md
Last active July 27, 2019 19:31
Use miniCRAN for local R package install
@leoluyi
leoluyi / README.md
Created January 3, 2017 15:15 — forked from dannguyen/README.md
Using Python 3.x and Google Cloud Vision API to OCR scanned documents to extract structured data

Using Python 3 + Google Cloud Vision API's OCR to extract text from photos and scanned documents

Just a quickie test in Python 3 (using Requests) to see if Google Cloud Vision can be used to effectively OCR a scanned data table and preserve its structure, in the way that products such as ABBYY FineReader can OCR an image and provide Excel-ready output.

The short answer: No. While Cloud Vision provides bounding polygon coordinates in its output, it doesn't provide them at the word or region level, which would be needed to calculate the data delimiters.

On the other hand, the OCR quality is pretty good if you just need to identify text anywhere in an image, without regard to its physical coordinates. I've included two examples:

#### 1. A low-resolution photo of road signs
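The test described above boils down to POSTing a base64-encoded image to the Vision v1 `images:annotate` endpoint. A minimal sketch of building that request body, without the network call (the `vision_ocr_payload` helper is illustrative; a real API key and `requests.post` call would be needed to run it against the service):

```python
import base64

def vision_ocr_payload(image_bytes: bytes) -> dict:
    # Request body for POST https://vision.googleapis.com/v1/images:annotate?key=API_KEY
    return {
        "requests": [{
            "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
            "features": [{"type": "TEXT_DETECTION"}],
        }]
    }

payload = vision_ocr_payload(b"\x89PNG...")  # placeholder bytes, not a real image
```

The response's `textAnnotations` list carries the detected text plus the bounding polygons discussed above.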

@leoluyi
leoluyi / convert.py
Created January 16, 2017 05:38 — forked from jwass/convert.py
Simple Shapefile to GeoJSON converter. Using the shapefile from here: http://www.mass.gov/anf/research-and-tech/it-serv-and-support/application-serv/office-of-geographic-information-massgis/datalayers/senate2012.html it will result in an error "ValueError: Record's geometry type does not match collection schema's geometry type: 'Polygon' != 'Unk…
import fiona
import fiona.crs

def convert(f_in, f_out):
    with fiona.open(f_in) as source:
        with fiona.open(
                f_out,
                'w',
                driver='GeoJSON',
                crs=fiona.crs.from_epsg(4326),
                schema=source.schema) as sink:
            for record in source:
                # Note: records are copied as-is, not reprojected to EPSG:4326
                sink.write(record)
@leoluyi
leoluyi / shp2gj.py
Created January 18, 2017 05:37 — forked from frankrowe/shp2gj.py
PyShp, shp to geojson in python
import shapefile  # pyshp

# read the shapefile
reader = shapefile.Reader("my.shp")
fields = reader.fields[1:]
field_names = [field[0] for field in fields]

# build a GeoJSON Feature for each shape/record pair
buffer = []
for sr in reader.shapeRecords():
    atr = dict(zip(field_names, sr.record))
    geom = sr.shape.__geo_interface__
    buffer.append(dict(type="Feature", geometry=geom, properties=atr))
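The pyshp snippet leans on Python's `__geo_interface__` protocol, which lets any geometry object describe itself as a GeoJSON-like mapping. A stdlib-only illustration of the same pattern (the `Point` class here is a toy stand-in, not part of pyshp):

```python
import json

class Point:
    """Toy geometry implementing the __geo_interface__ protocol."""
    def __init__(self, x, y):
        self.x, self.y = x, y

    @property
    def __geo_interface__(self):
        return {"type": "Point", "coordinates": (self.x, self.y)}

# Assemble a Feature and a FeatureCollection exactly as the snippet above does
feature = {"type": "Feature",
           "geometry": Point(121.5, 25.0).__geo_interface__,
           "properties": {"name": "Taipei"}}
geojson = json.dumps({"type": "FeatureCollection", "features": [feature]})
```

Because shapely, pyshp, and fiona all speak this protocol, the resulting dicts can be passed between them or serialized straight to a `.geojson` file.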
library(rvest)
library(httr)
library(stringr)
library(magrittr)
library(data.table)
# library(quantmod)
set_config(config(ssl_verifypeer = 0L))
# Get the categories of TWSE-listed stocks
library(magrittr)
library(httr)
library(rvest)
library(data.table)
library(stringr)
library(parallel)
set_config(config(ssl_verifypeer = 0L))
list(
@leoluyi
leoluyi / README.md
Created March 14, 2017 05:48 — forked from jcheng5/README.md
Using arbitrary Leaflet plugins with Leaflet for R

Using arbitrary Leaflet JS plugins with Leaflet for R

The Leaflet JS mapping library has lots of plugins available. The Leaflet package for R directly supports some, but far from all, of these plugins by exposing R functions that invoke them.

If you as an R user find yourself wanting to use a Leaflet plugin that isn't directly supported in the R package, you can use the technique shown here to load the plugin yourself and invoke it using JS code.

@leoluyi
leoluyi / tor.R
Created March 22, 2017 03:34
Using TOR in R
# Install Tor on macOS: brew install tor
# Run Tor on a custom port: tor --SOCKSPort 9050
# Check the 'origin' field in the response to verify the request went through Tor.
library(httr)
GET("https://httpbin.org/get", use_proxy("socks5://localhost:9050"))

# Same SOCKS proxy via the curl package
library(curl)
h <- new_handle(proxy = "socks5://localhost:9050")
req <- curl_fetch_memory("https://httpbin.org/get", handle = h)
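The same Tor-as-SOCKS-proxy setup carries over to Python. A minimal sketch, assuming Tor's SOCKS listener on localhost:9050 as above (the `tor_proxies` helper is made up for this example; the live request needs `requests[socks]` installed, so it is left commented out):

```python
def tor_proxies(port: int = 9050) -> dict:
    """Build a proxies mapping suitable for the `requests` library."""
    addr = f"socks5h://localhost:{port}"  # socks5h: resolve DNS through Tor too
    return {"http": addr, "https": addr}

# Usage (network call through Tor, so commented out here):
# import requests
# r = requests.get("https://httpbin.org/get", proxies=tor_proxies())
# print(r.json()["origin"])  # should differ from your real IP
```

The `socks5h` scheme (rather than plain `socks5`) pushes DNS resolution through the proxy as well, which avoids leaking lookups outside Tor.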
@leoluyi
leoluyi / text_preprocess.py
Created May 23, 2017 15:18
Python text mining
# https://datawarrior.wordpress.com/2015/08/12/codienerd-1-r-or-python-on-text-mining/
# import all necessary libraries
from nltk.stem import PorterStemmer
from nltk.tokenize import SpaceTokenizer
from nltk.corpus import stopwords
from functools import partial
from gensim import corpora
from gensim.models import TfidfModel
import re
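The imports above set up the classic tokenize, stopword-filter, stem pipeline from NLTK and gensim. A dependency-free sketch of those same steps, to show the shape of the pipeline (the tiny stopword set and suffix-stripping "stemmer" are crude stand-ins for NLTK's corpora and PorterStemmer, not equivalents):

```python
import re

STOPWORDS = {"the", "a", "an", "is", "and", "of"}  # stand-in for nltk stopwords

def crude_stem(word: str) -> str:
    # Very rough stand-in for PorterStemmer: strip a few common suffixes
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def preprocess(text: str) -> list:
    tokens = re.findall(r"[a-z]+", text.lower())        # tokenize
    tokens = [t for t in tokens if t not in STOPWORDS]  # remove stopwords
    return [crude_stem(t) for t in tokens]              # stem

print(preprocess("The models are mining texts"))
```

With real NLTK, `SpaceTokenizer().tokenize`, `stopwords.words('english')`, and `PorterStemmer().stem` fill these three roles, and the cleaned token lists then feed `corpora.Dictionary` and `TfidfModel` from gensim.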