Wenhui Zhang wenhuizhang

@wenhuizhang
wenhuizhang / AssemblyInfo.cpp
Created February 20, 2014 22:33
Lab2_Graphics_Scan-convert & Z-Buffer
#include "stdafx.h"
using namespace System;
using namespace System::Reflection;
using namespace System::Runtime::CompilerServices;
using namespace System::Runtime::InteropServices;
using namespace System::Security::Permissions;
//
// General Information about an assembly is controlled through the following
@wenhuizhang
wenhuizhang / Edgetable.h
Created March 12, 2014 16:13
Lab3_Graphics_Constant Shading
#include "stdafx.h"
#include <vector>
#include <iostream>
#ifndef Edge_H_INCLUDED
#define Edge_H_INCLUDED
using namespace std;
class Edge{
@wenhuizhang
wenhuizhang / Edgetable.h
Created March 12, 2014 16:20
Lab3_Graphics_Gouraud Shading
#include "stdafx.h"
#include <vector>
#include <iostream>
#ifndef Edge_H_INCLUDED
#define Edge_H_INCLUDED
using namespace std;
class Edge{
@wenhuizhang
wenhuizhang / Edgetable.h
Created March 12, 2014 16:24
Lab3_Graphics_Phong Shading
#include "stdafx.h"
#include <vector>
#include <iostream>
#ifndef Edge_H_INCLUDED
#define Edge_H_INCLUDED
using namespace std;
class Edge{
@wenhuizhang
wenhuizhang / EdgeTable.h
Created April 2, 2014 17:47
Lab4_Texture Mapping (Base as Phong)
#include "stdafx.h"
#include <vector>
#include <iostream>
#ifndef Edge_H_INCLUDED
#define Edge_H_INCLUDED
using namespace std;
@wenhuizhang
wenhuizhang / R_Venture Capital Analysis (US since 1998)
Last active August 29, 2015 13:59
Venture Capital Analysis (US since 1998)
Title
We first obtained our data from a web site using
data <- read.csv("http://www.capitalhacks.org/wp-content/uploads/2014/03/TechCrunchcontinentalUSA.csv",
stringsAsFactors = F)
Preprocessing: we removed 1328 jad-tech-consulting from the data set, since its date of 23-Sep-93 was substantially earlier than that of any other entry and was thus presumed to be an outlier.
We then added a column for the number of quarters since Jan 1, 1999 using
//
// Prefix header
//
// The contents of this file are implicitly included at the beginning of every source file.
//
#import <Availability.h>
#ifndef __IPHONE_5_0
#warning "This project uses features only available in iOS SDK 5.0 and later."
@wenhuizhang
wenhuizhang / Find_Seat_DP
Last active August 29, 2015 14:14
DataIncubator
/*
Q1: There is a subway car with N adjacent seats in a row.
People walk into the subway and choose a random available seat (drawn uniformly).
The only constraint on seat availability is that they do not like to sit next to one another
so there is always (at least) one empty seat between any two individuals.
This process continues until all available seats are taken.
What is the mean and standard deviation of the fraction of occupied seats
(when the process is complete) for different values of N?
Give the answer with 10 digits of significance.
*/
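The preview above only states the problem, so the following is a minimal sketch of one possible dynamic-programming approach, not necessarily the one used in the gist. It assumes that once a seat is taken, the free stretches to its left and right fill independently, so the first and second moments of the occupied count for a stretch of `len` seats can be built from shorter stretches. The function name `occupiedFractionStats` and the arrays `m1` and `m2` are hypothetical names chosen for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <utility>
#include <vector>

// Returns {mean, standard deviation} of the fraction of occupied seats
// for a row of n seats, under the seating rule described above.
// Hypothetical helper: the name and layout are assumptions, not the gist's code.
std::pair<double, double> occupiedFractionStats(int n) {
    assert(n >= 1);
    // m1[k] = E[X_k], m2[k] = E[X_k^2], where X_k is the number of occupied
    // seats once a free stretch of k seats has been filled.
    std::vector<double> m1(n + 1, 0.0), m2(n + 1, 0.0);
    for (int len = 1; len <= n; ++len) {
        double sum1 = 0.0, sum2 = 0.0;
        for (int i = 1; i <= len; ++i) {
            // Taking seat i blocks its neighbours, leaving two free stretches.
            int a = std::max(i - 2, 0);        // seats 1 .. i-2
            int b = std::max(len - i - 1, 0);  // seats i+2 .. len
            sum1 += 1.0 + m1[a] + m1[b];
            // E[(1 + A + B)^2] with A and B independent.
            sum2 += 1.0 + m2[a] + m2[b]
                  + 2.0 * (m1[a] + m1[b])
                  + 2.0 * m1[a] * m1[b];
        }
        m1[len] = sum1 / len;  // the first seat is chosen uniformly
        m2[len] = sum2 / len;
    }
    double mean = m1[n] / n;
    // Clamp tiny negative values caused by floating-point rounding.
    double variance = std::max(m2[n] - m1[n] * m1[n], 0.0)
                    / (static_cast<double>(n) * n);
    return {mean, std::sqrt(variance)};
}
```

As a sanity check, for N = 3 the three equally likely first choices end with 2, 1, and 2 occupied seats, so the mean fraction is 5/9. The sketch runs in O(N^2) time; whether plain `double` arithmetic meets the 10-digit requirement for large N is not verified here.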
@wenhuizhang
wenhuizhang / web_crawler.md
Last active April 13, 2020 18:43
web crawler
| Name | Language | Platform |
| --- | --- | --- |
| Heritrix | Java | Linux |
| Nutch | Java | Cross-platform |
| Scrapy | Python | Cross-platform |
| DataparkSearch | C++ | Cross-platform |
| GNU Wget | C | Linux |
| GRUB | C#, C, Python, Perl | Cross-platform |
| ht://Dig | C++ | Unix |
| HTTrack | C/C++ | Cross-platform |
@wenhuizhang
wenhuizhang / OpenSource_ML.md
Last active September 25, 2015 02:45
Open Source for Machine Learning

# Open Source machine learning

Understanding language is not easy, even for us humans, but computers are slowly getting better at it. Fifty years ago, the psychiatrist chatbot ELIZA could successfully initiate a therapy session, but you soon realized that she was responding using simple pattern analysis. Now, IBM's supercomputer Watson defeats human champions in a quiz show live on TV. The software pieces required to understand language, like the ones used by Watson, are complex. But believe it or not, many of these pieces are available for free as open source. This post summarizes how open-source software can help you analyze language data, using this flow chart as a guideline: http://entopix.com/so-you-need-to-understand-language-data-open-source-nlp-software-can-help.html

If your language data is already available as text, it is most likely to be stored in files. Apache libraries like POI and PDFBox extract text from the most common formats. Apache Tika is a toolkit that uses such lib