Public gists by Leo Lu (leoluyi)
@leoluyi
leoluyi / food.R
Created December 12, 2016 17:44
food
library(httr)
library(rvest)

# ASP.NET WebForms page: a session handle keeps cookies across requests
h <- handle("http://amis.afa.gov.tw/veg/VegProdDayTransInfo.aspx")
res <- GET(handle = h)
cookies(h)  # inspect the session cookies

# Extract the hidden __VIEWSTATE fields required for the form POST-back
vs <- content(res) %>% html_nodes("input#__VIEWSTATE") %>% html_attr("value")
vs_gen <- content(res) %>% html_nodes("input#__VIEWSTATEGENERATOR") %>% html_attr("value")
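The R code above pulls the hidden `__VIEWSTATE` tokens that ASP.NET WebForms pages require on every POST-back. A rough Python equivalent of just the extraction step, sketched against a hard-coded sample rather than the live page (the `extract_viewstate` helper and its regex are illustrative, not part of the original gist):

```python
import re

def extract_viewstate(html: str) -> dict:
    """Pull ASP.NET hidden-form tokens out of a page's HTML."""
    tokens = {}
    for name in ("__VIEWSTATE", "__VIEWSTATEGENERATOR"):
        m = re.search(r'id="%s"[^>]*value="([^"]*)"' % name, html)
        if m:
            tokens[name] = m.group(1)
    return tokens

# Works on a small sample in lieu of the live page:
sample = ('<input type="hidden" id="__VIEWSTATE" value="dDwtMTIz" />'
          '<input type="hidden" id="__VIEWSTATEGENERATOR" value="ABCD1234" />')
print(extract_viewstate(sample))
```

In real use you would fetch the page first (e.g. with `requests.Session()` so cookies persist, mirroring `handle()` in httr) and then include both tokens in the subsequent form POST.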
@leoluyi
leoluyi / README_miniCRAN.md
Last active July 27, 2019 19:31
Use miniCRAN for local R package install
@leoluyi
leoluyi / README.md
Created January 3, 2017 15:15 — forked from dannguyen/README.md
Using Python 3.x and Google Cloud Vision API to OCR scanned documents to extract structured data

Using Python 3 + Google Cloud Vision API's OCR to extract text from photos and scanned documents

Just a quickie test in Python 3 (using Requests) to see if Google Cloud Vision can be used to effectively OCR a scanned data table and preserve its structure, in the way that products such as ABBYY FineReader can OCR an image and provide Excel-ready output.

The short answer: No. While Cloud Vision provides bounding polygon coordinates in its output, it doesn't provide them at the word or region level, which would be needed to calculate the data delimiters.

On the other hand, the OCR quality is pretty good if you just need to identify text anywhere in an image, without regard to its physical coordinates. I've included two examples:

#### 1. A low-resolution photo of road signs
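The test described above boils down to POSTing a base64-encoded image to the Vision v1 `images:annotate` endpoint. A minimal sketch of building that request body, without the network call (the `vision_ocr_payload` helper is illustrative; a real API key and `requests.post` call would be needed to run it against the service):

```python
import base64

def vision_ocr_payload(image_bytes: bytes) -> dict:
    # Request body for POST https://vision.googleapis.com/v1/images:annotate?key=API_KEY
    return {
        "requests": [{
            "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
            "features": [{"type": "TEXT_DETECTION"}],
        }]
    }

payload = vision_ocr_payload(b"\x89PNG...")  # placeholder bytes, not a real image
```

The response's `textAnnotations` list carries the detected text plus the bounding polygons discussed above.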

@leoluyi
leoluyi / convert.py
Created January 16, 2017 05:38 — forked from jwass/convert.py
Simple Shapefile to GeoJSON converter. Using the shapefile from here: http://www.mass.gov/anf/research-and-tech/it-serv-and-support/application-serv/office-of-geographic-information-massgis/datalayers/senate2012.html it will result in an error "ValueError: Record's geometry type does not match collection schema's geometry type: 'Polygon' != 'Unk…
import fiona
import fiona.crs

def convert(f_in, f_out):
    with fiona.open(f_in) as source:
        with fiona.open(
                f_out,
                'w',
                driver='GeoJSON',
                crs=fiona.crs.from_epsg(4326),
                schema=source.schema) as sink:
            for record in source:
                # Note: records are copied as-is, not reprojected to EPSG:4326
                sink.write(record)
@leoluyi
leoluyi / shp2gj.py
Created January 18, 2017 05:37 — forked from frankrowe/shp2gj.py
PyShp, shp to geojson in python
import shapefile  # pyshp

# read the shapefile
reader = shapefile.Reader("my.shp")
fields = reader.fields[1:]
field_names = [field[0] for field in fields]

# build a GeoJSON Feature for each shape/record pair
buffer = []
for sr in reader.shapeRecords():
    atr = dict(zip(field_names, sr.record))
    geom = sr.shape.__geo_interface__
    buffer.append(dict(type="Feature", geometry=geom, properties=atr))
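The pyshp snippet leans on Python's `__geo_interface__` protocol, which lets any geometry object describe itself as a GeoJSON-like mapping. A stdlib-only illustration of the same pattern (the `Point` class here is a toy stand-in, not part of pyshp):

```python
import json

class Point:
    """Toy geometry implementing the __geo_interface__ protocol."""
    def __init__(self, x, y):
        self.x, self.y = x, y

    @property
    def __geo_interface__(self):
        return {"type": "Point", "coordinates": (self.x, self.y)}

# Assemble a Feature and a FeatureCollection exactly as the snippet above does
feature = {"type": "Feature",
           "geometry": Point(121.5, 25.0).__geo_interface__,
           "properties": {"name": "Taipei"}}
geojson = json.dumps({"type": "FeatureCollection", "features": [feature]})
```

Because shapely, pyshp, and fiona all speak this protocol, the resulting dicts can be passed between them or serialized straight to a `.geojson` file.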
library(rvest)
library(httr)
library(stringr)
library(magrittr)
library(data.table)
# library(quantmod)
set_config(config(ssl_verifypeer = 0L))
# Get the categories of TWSE-listed stocks
library(magrittr)
library(httr)
library(rvest)
library(data.table)
library(stringr)
library(parallel)
set_config(config(ssl_verifypeer = 0L))
list(
@leoluyi
leoluyi / README.md
Created March 14, 2017 05:48 — forked from jcheng5/README.md
Using arbitrary Leaflet plugins with Leaflet for R

Using arbitrary Leaflet JS plugins with Leaflet for R

The Leaflet JS mapping library has lots of plugins available. The Leaflet package for R directly supports some, but far from all, of these plugins by exposing R functions that invoke them.

If you as an R user find yourself wanting to use a Leaflet plugin that isn't directly supported in the R package, you can use the technique shown here to load the plugin yourself and invoke it using JS code.

@leoluyi
leoluyi / tor.R
Created March 22, 2017 03:34
Using TOR in R
# Install Tor on macOS: brew install tor
# Run Tor on a custom port: tor --SOCKSPort 9050
# Check the 'origin' field in the response to verify the request went through Tor.
library(httr)
GET("https://httpbin.org/get", use_proxy("socks5://localhost:9050"))

# Same SOCKS proxy via the curl package
library(curl)
h <- new_handle(proxy = "socks5://localhost:9050")
req <- curl_fetch_memory("https://httpbin.org/get", handle = h)
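The same Tor-as-SOCKS-proxy setup carries over to Python. A minimal sketch, assuming Tor's SOCKS listener on localhost:9050 as above (the `tor_proxies` helper is made up for this example; the live request needs `requests[socks]` installed, so it is left commented out):

```python
def tor_proxies(port: int = 9050) -> dict:
    """Build a proxies mapping suitable for the `requests` library."""
    addr = f"socks5h://localhost:{port}"  # socks5h: resolve DNS through Tor too
    return {"http": addr, "https": addr}

# Usage (network call through Tor, so commented out here):
# import requests
# r = requests.get("https://httpbin.org/get", proxies=tor_proxies())
# print(r.json()["origin"])  # should differ from your real IP
```

The `socks5h` scheme (rather than plain `socks5`) pushes DNS resolution through the proxy as well, which avoids leaking lookups outside Tor.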
@leoluyi
leoluyi / text_preprocess.py
Created May 23, 2017 15:18
Python text mining
# https://datawarrior.wordpress.com/2015/08/12/codienerd-1-r-or-python-on-text-mining/
# import all necessary libraries
from nltk.stem import PorterStemmer
from nltk.tokenize import SpaceTokenizer
from nltk.corpus import stopwords
from functools import partial
from gensim import corpora
from gensim.models import TfidfModel
import re
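The imports above set up the classic tokenize, stopword-filter, stem pipeline from NLTK and gensim. A dependency-free sketch of those same steps, to show the shape of the pipeline (the tiny stopword set and suffix-stripping "stemmer" are crude stand-ins for NLTK's corpora and PorterStemmer, not equivalents):

```python
import re

STOPWORDS = {"the", "a", "an", "is", "and", "of"}  # stand-in for nltk stopwords

def crude_stem(word: str) -> str:
    # Very rough stand-in for PorterStemmer: strip a few common suffixes
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def preprocess(text: str) -> list:
    tokens = re.findall(r"[a-z]+", text.lower())        # tokenize
    tokens = [t for t in tokens if t not in STOPWORDS]  # remove stopwords
    return [crude_stem(t) for t in tokens]              # stem

print(preprocess("The models are mining texts"))
```

With real NLTK, `SpaceTokenizer().tokenize`, `stopwords.words('english')`, and `PorterStemmer().stem` fill these three roles, and the cleaned token lists then feed `corpora.Dictionary` and `TfidfModel` from gensim.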