in2csv file1.xls > file1.csv
in2csv -f fixed -s schema.csv data.fixed > data.csv
csvgrep -c phone_number -r "\d{3}-123-\d{4}" data.csv > matching.csv
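For readers who want the same filter without csvkit, here is a minimal Python sketch of the `csvgrep` step above. It assumes the regex is applied with search semantics (a match anywhere in the cell). The helper name `filter_rows` and the sample rows are hypothetical, made up for illustration.

```python
import csv
import io
import re

# Same pattern as the csvgrep call: any area code, "123" exchange, four digits
PHONE_PATTERN = re.compile(r"\d{3}-123-\d{4}")

def filter_rows(rows, column, pattern):
    """Yield rows (dicts) whose `column` value contains a match for `pattern`."""
    for row in rows:
        if pattern.search(row.get(column) or ""):
            yield row

# Hypothetical sample data standing in for data.csv
sample = io.StringIO(
    "name,phone_number\n"
    "Ann,555-123-4567\n"
    "Bob,555-999-0000\n"
)
matching = list(filter_rows(csv.DictReader(sample), "phone_number", PHONE_PATTERN))
```

With a real file, `csv.DictReader(open("data.csv", newline=""))` could replace the in-memory sample, and `csv.DictWriter` could write the matches back out.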
import pandas as pd

def search_item(dataframe, name, query, na=False, case=False, regex=True):
    # Boolean mask over the rows, initially all False
    # (name is taken to be the text column to search)
    idx = pd.Series([False] * len(dataframe), index=dataframe.index)
    # For each item in the query, look for the item and collect the document ids it pertains to
    for q in query:
        matches = dataframe[name].str.contains(q, na=na, case=case, regex=regex)
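The snippet above stops inside the loop, so how the per-query matches are combined is not shown. One plausible way to finish the idea, offered only as a sketch, is to OR each query's matches into a single mask. The function name `search_any`, its `column` parameter, and the sample frame are assumptions for illustration, not the original author's code.

```python
import pandas as pd

def search_any(dataframe, column, queries, na=False, case=False, regex=True):
    """Return a boolean mask marking rows where `column` matches any query."""
    mask = pd.Series(False, index=dataframe.index)
    for q in queries:
        # OR this query's matches into the running mask
        mask |= dataframe[column].str.contains(q, na=na, case=case, regex=regex)
    return mask

# Hypothetical documents; the None row shows the effect of na=False
docs = pd.DataFrame({"text": ["Red apple", "green pear", None, "APPLE pie"]})
mask = search_any(docs, "text", ["apple", "plum"])
matched_ids = docs.index[mask].tolist()
```

Building the mask on `dataframe.index` keeps it aligned with the frame even when the index is not the default 0..n-1 range.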
---
title: 'Going deeper with dplyr: New features in 0.3 and 0.4'
output: html_document
---
## Introduction
In August 2014, I created a [40-minute video tutorial](https://www.youtube.com/watch?v=jWjqLW-u3hc) introducing the key functionality of the dplyr package in R, using dplyr version 0.2. Since then, there have been two significant updates to dplyr (0.3 and 0.4), introducing a ton of new features.
This document (created in March 2015) covers the most useful new features in 0.3 and 0.4, as well as other functionality that I didn't cover last time (though it is not necessarily new). My [new video tutorial](https://www.youtube.com/watch?v=2mh1PqfsXVI) walks through the code below in detail.
---
title: "Introduction to dplyr for Faster Data Manipulation in R"
output: html_document
---
Note: There is a 40-minute [video tutorial](https://www.youtube.com/watch?v=jWjqLW-u3hc) on YouTube that walks through this document in detail.
## Why do I use dplyr?
* Great for data exploration and transformation
OK, this was a little confusing for me, but I finally realized what was happening. So I decided to give my 2 cents in the hope that it will be clearer for others, and for me if I forget sometime in the future : ).
I was not using the name of the share I created in the VM. Instead I used share or vb_share when the name of my share was wd, so this had me confused for a minute.
First, add your share directory in the VM Box:
Whatever you name your share here is the name you will need to use when mounting in the VM guest OS, e.g. I named mine "wd" for my Western Digital Passport drive.
Next, on the guest OS, make a directory to use for your mount, preferably in your home directory.
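To make the naming point concrete, here is a small Python sketch that builds the mount command. It assumes the standard `mount -t vboxsf <share> <mount point>` form used with the Guest Additions. The helper `vboxsf_mount_command` and the directory `~/wd_share` are hypothetical; only the share name "wd" comes from the text above.

```python
import os
import shlex

def vboxsf_mount_command(share_name, mount_point):
    """Build the mount command for a VirtualBox shared folder.

    share_name must be exactly the name given to the share in the
    VirtualBox settings, not the host path or a generic name like "share".
    """
    return ["sudo", "mount", "-t", "vboxsf", share_name,
            os.path.expanduser(mount_point)]

# "wd" is the share name from the VM settings; ~/wd_share is a made-up mount directory
cmd = vboxsf_mount_command("wd", "~/wd_share")
print(shlex.join(cmd))
```

The printed line can be pasted into a terminal on the guest once the mount directory exists.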
// (Project -> Edit Project)
{
    "build_systems":
    [
        {
            "name": "Anaconda Python Builder",
            "selector": "source.python",
            "shell_cmd": "python -u \"$file\""
        }
    ],
library(ggplot2)
library(gtable)
# create example data
set.seed(42)
dataset_names <- c("Human", "Mouse", "Fly", "Worm")
datasets <- data.frame(name = factor(dataset_names, levels = dataset_names), parity = factor(c(0, 0, 1, 0)), v50 = runif(4, max = 0.5), y = 1:4)
data <- data.frame(dataset1 = rep(datasets$name, 4), dataset2 = rep(datasets$name, each = 4), z = runif(16, min = 0, max = 0.5))
pal <- c("#dddddd", "#aaaaaa")
import os
from multiprocessing import Pool
from PIL import Image

SIZE = (75, 75)
SAVE_DIRECTORY = 'thumbs'

def get_image_paths(folder):
    # Lazily yield the full path of every file whose name contains 'jpeg'
    return (os.path.join(folder, f)
            for f in os.listdir(folder)
            if 'jpeg' in f)
Follow these steps to install the Guest Additions on your Ubuntu virtual machine:
1. Log in as ubuntu;
2. Click on Applications/System/Terminal (or on Applications/Terminal, if you are using the 6.06.1 Dapper Drake release);
3. Update your APT database with sudo apt-get update, typing your password if requested, then install the latest security updates with sudo apt-get upgrade;
4. Install required packages with sudo apt-get install build-essential module-assistant;
5. Configure your system for building kernel modules by running sudo m-a prepare;
6. Click on Install Guest Additions… from the Devices menu, then choose to browse the content of the CD when requested;
7. Run sudo sh /media/cdrom/VBoxLinuxAdditions.run, and follow the instructions on screen.