| import time | |
| def timeit(f): | |
| def timed(*args, **kwargs): | |
| start = time.perf_counter() | |
| for _ in range(100): | |
| f(*args, **kwargs) | |
| end = time.perf_counter() | |
| return end - start |
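The preview above cuts off before the decorator returns its wrapper. A minimal sketch of how such a timing decorator might be completed follows. The `repeats` parameter, the use of `functools.wraps`, and the `append_one` demo function are assumptions added for illustration and are not part of the original gist.

```python
import functools
import time


def timeit(f, repeats=100):
    """Return a wrapper that reports the total seconds for `repeats` calls of f."""

    @functools.wraps(f)
    def timed(*args, **kwargs):
        start = time.perf_counter()
        for _ in range(repeats):
            f(*args, **kwargs)
        end = time.perf_counter()
        return end - start  # elapsed time in seconds, not f's return value

    return timed


calls = []  # hypothetical side effect used to show how often f runs


@timeit
def append_one():
    calls.append(1)


elapsed = append_one()  # runs append_one 100 times and returns the total time
```

`time.perf_counter()` is used here because `time.clock()` was removed in Python 3.8.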
| {'INTEGER': 12, 'HTMLTAG': 10, 'int': 4, 'formattedsize': 4, 'sizeindex': 4, 'string': 3, | |
| 'sizes': 3, 'decimals': 3, 'size': 2, 'code': 2, 'blockquote': 2, 'permitted': 2, 'specifiers': 2, | |
| 'default': 2, 'parameter': 2, 'FUNCTIONCALL': 2, 'CODE': 1, 'private': 1, 'eb': 1, 'gt': 1, | |
| 'gb': 1, 'error': 1, 'application': 1, 'format': 1, 'desktop': 1, 'pb': 1, 'formatsizebinary': 1, | |
| 'lt': 1, 'tb': 1, 'math': 1, 'return': 1, 'kb': 1, 'yb': 1, 'tostring': 1, 'zb': 1, 'amp': 1, | |
| 'mb': 1, 'bytes': 1, 'length': 1, 'double': 1} | |
| ['default', | |
| 'parameter', |
| { | |
| "metadata": { | |
| "name": "" | |
| }, | |
| "nbformat": 3, | |
| "nbformat_minor": 0, | |
| "worksheets": [ | |
| { | |
| "cells": [ | |
| { |
| import csv | |
| with open('companies.csv', 'w', newline='') as csvfile: | |
| csv.writer(csvfile, delimiter=',').writerows(row_gen) |
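In Python 3 the `csv` module writes to text-mode files, and its documentation recommends opening them with `newline=''`. The sketch below shows the same `writerows` call fed by a generator. The `write_rows` helper and the sample rows are hypothetical, chosen only so the example runs without the scraped page.

```python
import csv
import io


def write_rows(rows, fileobj):
    """Write an iterable of rows to an already-open text file object as CSV."""
    csv.writer(fileobj, delimiter=',').writerows(rows)


# Illustrative rows only; the real row_gen is built from the scraped tables.
sample_rows = (row for row in [['Alpha Ltd', 'Active'], ['Beta Inc', 'Inactive']])

buffer = io.StringIO()  # in-memory stand-in for the output file
write_rows(sample_rows, buffer)
csv_text = buffer.getvalue()

# Writing to disk would follow the same pattern:
# with open('companies.csv', 'w', newline='') as csvfile:
#     write_rows(row_gen, csvfile)
```

Passing the generator directly keeps memory use flat, since `writerows` consumes one row at a time.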
| row_gen = ( [td.text(), td.next().text()] # left, right element | |
| for table in d('.borderless').items() | |
| for td in table('td:nth-child(1)').items() # left column | |
| if table('th:first').text() == 'NUANS Reports & Preliminary Searches' and | |
| td.next().text() in ('Active', 'Inactive') ) | |
| 10 loops, best of 3: 172 ms per loop |
| l = [] | |
| for th in d.items('.borderless td:nth-child(1)'): | |
| left = th.text() | |
| right = th.next().text() | |
| tr = th.parent() | |
| tbody = tr.parent() | |
| title = tbody('th:first').text() # first element | |
| if title == 'NUANS Reports & Preliminary Searches' and right in ['Active', 'Inactive']: | |
| l.append([left, right]) |
| from pyquery import PyQuery as pq | |
| url = 'https://www.nuans.com/RTS2/en/jur_codes-codes_jur_en.cgi#Example_of_report_layouts' | |
| d = pq(url) |
| from sklearn.feature_extraction.text import CountVectorizer | |
| import numpy as np | |
| def tokenizer(s): | |
| width = 7 | |
| return [s[i:i+width] for i in range(len(s)-width+1)] | |
| def count_chunks(sequence_list): | |
| vectorizer = CountVectorizer(tokenizer=tokenizer) | |
| X = vectorizer.fit_transform(sequence_list) |
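The tokenizer above slides a fixed-width window over each sequence, so a string of length n yields n - 6 overlapping chunks of width 7. As a rough illustration of that windowing, here is a standard-library sketch that swaps `CountVectorizer` for `collections.Counter`. It returns totals across all sequences rather than the per-sequence count matrix that `fit_transform` produces, and the `width` argument and sample strings are assumptions.

```python
from collections import Counter


def tokenizer(s, width=7):
    """Split s into all overlapping substrings of length `width`."""
    return [s[i:i + width] for i in range(len(s) - width + 1)]


def count_chunks(sequence_list, width=7):
    """Count every overlapping chunk across all sequences."""
    counts = Counter()
    for sequence in sequence_list:
        counts.update(tokenizer(sequence, width))
    return counts


# Made-up sequences; each has length 8, so each yields two chunks.
chunk_counts = count_chunks(["ABCDEFGH", "BCDEFGHI"])
```

Sequences shorter than `width` contribute no chunks, because the range in `tokenizer` is empty.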
| # Split the dataset in two equal parts | |
| X_train, X_test, y_train, y_test = train_test_split( | |
| X, y, test_size=0.5, random_state=0) | |
| # Set the parameters by cross-validation | |
| tuned_parameters = [{'kernel': ['rbf'], 'gamma': [1e-3, 1e-4], | |
| 'C': [1, 10, 100, 1000]}, | |
| {'kernel': ['linear'], 'C': [1, 10, 100, 1000]}] | |
| model = GridSearchCV(SVC(C=1), tuned_parameters, cv=5, scoring=score) | |
| model.fit(X_train, y_train) |
| function svmStruct = best_svm_classifer_rbf(cdata,labels) | |
| % crossfun calculates the predicted classification yfit for a test vector | |
| % xtest when the SVM is trained on a sample xtrain with classification ytrain. | |
| function yfit = crossfun(xtrain,ytrain,xtest, rbf_sigma, boxconstraint) | |
| % Train the model on xtrain, ytrain, | |
| % and get predictions of class of xtest and output it as yfit |