Columbia University x Software Carpentries
Python Group B

Bootcamp Site: https://columbiaswc.github.io/2018-08-27-Columbia-B/

Instructors:

Cesar Arias – [email protected]
Marii Nyröp – [email protected] | @mnyrop | marii.info

Helpers:

Jochen Weber –
Parixit Davé –
Yanchen Liu – [email protected]

Schedule
Setup
1: The Unix Shell
2: Version Control with Git
3: Programming with Python

Schedule

Time	Day 1	Day 2
09:00	Automating tasks with the Unix shell	Building programs with Python
10:30	Break	Break
12:00	Lunch break	Lunch break
01:00	Version control with Git	Building programs with Python, Continued
02:30	Break	Break
04:00	Wrap-up	Wrap-up & post-workshop survey
04:30	END	END

Setup

Setup check in:

Do you have the necessary software (Bash, Git, and Python) installed?
Do you have Jupyter notebooks installed? (You can check by opening a Terminal and entering the command jupyter notebook. This should open a browser.)
Do you have the demo data for bash and python downloaded?
Do you have a GitHub account?
Do you have Post-it notes in two different colors?

Setup resources:

Section 1 :: The Unix Shell

Introducing the Shell

Questions:

What is a command shell and why would I use one?

Notes:

Graphical User Interface (GUI) versus Command Line Interface (CLI)
Read-evaluate-print-loop (REPL)
Command, flag, argument
Flexibility and automation

Try:

pwd
ls -F /

Navigating Files and Directories

Questions:

How can I move around on my computer?
How can I see what files and directories I have?
How can I specify the location of a file or directory on my computer?

Notes:

File system hierarchy
Current working directory
Root directory
Home directory
man and --help
Absolute vs. relative paths
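The same ideas (current working directory, home directory, absolute vs. relative paths) can also be explored from Python. The following is a minimal sketch using the standard pathlib module; the directory names are taken from the data-shell example and do not need to exist for the code to run.

```python
from pathlib import Path

# A relative path, modeled on the data-shell example directory.
# It does not start from the root, so its meaning depends on where you are.
relative = Path('north-pacific-gyre') / '2012-07-03'

# The current working directory is what `pwd` prints in the shell.
cwd = Path.cwd()

# Joining the working directory and a relative path gives an absolute path.
absolute = cwd / relative

print('working directory:', cwd)
print('home directory:   ', Path.home())
print('relative path:    ', relative, relative.is_absolute())
print('absolute path:    ', absolute, absolute.is_absolute())
```

The relative path stays the same wherever you run this, while the absolute path changes with the working directory.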

Try:

Activity:

move into data-shell directory
enter the command ls north-pacific-gyre/2012-07-03/ using tab completion

Working With Files and Directories

Questions:

How can I create, copy, and delete files and directories?
How can I edit files?

Notes:

Naming conventions for files and directories
The Nano text editor
Deleting with rm is forever

Try:

Pipes and Filters

Questions:

How can I combine existing commands to do new things?

Loops

Questions:

How can I perform the same actions on many different files?

Full Tutorial: https://swcarpentry.github.io/shell-novice/

Section 2 :: Version Control with Git

Automated Version Control

Questions:

What is version control and why should I use it?

Notes:

Version control – keeps track of changes and allows for greater control over them
Manages collaboration and change conflicts

Setting Up Git

Questions:

How do I get set up to use Git?

Notes:

Git is the software, GitHub is a popular service for hosting content that is version controlled by Git
Your local Git needs to be configured to work with your GitHub account

Activity:

Configure Git to use your user name with git config --global user.name "your-username"
Configure Git to use your email with git config --global user.email "[email protected]"
Configure your Git to use Nano as its text editor with git config --global core.editor "nano -w"
Check your Git configuration with git config --list
Check possible config commands with git config -h

Creating a Repository

Questions:

Where does Git store information?

Notes:

A repository is where your Git project files and the history of all your project’s commits live
git init turns the current directory (and everything in it!) into a Git repository
git status shows the repository's current status
The Git repository history and info lives in a directory called .git

Activity:

cd ~/Desktop
mkdir planets
cd planets
git init initializes the planets repository

Tracking Changes

Questions:

How do I record changes in Git?
How do I check the status of my version control repository?
How do I record notes about what changes I made and why?

Notes:

Adding and committing
Commit messages
Reading the commit history

Activity:

Ignoring Things

Questions:

How can I tell Git to ignore files I don’t want to track?

Notes:

There will often be files you don't want Git to track, for security or efficiency reasons
Ignored files, directory, and file patterns are listed in a .gitignore file

Activity:

Remotes in GitHub

Questions:

How do I share my changes with others on the web?

Notes:

Repository remotes (origin)
Git Push
Git Pull

Activity:

Log into GitHub
Add a new public repository called 'planets'
Copy the 'quick setup' link (it should be 'https://github.com/YOUR_USERNAME/planets.git')
Paste it into your terminal as the last argument of the command git remote add origin 'https://github.com/YOUR_USERNAME/planets.git'
Run git remote -v to check that the remote was added
Run git push -u origin master and enter your GitHub password when prompted
Try git pull origin master
Refresh your repository page on GitHub. What do you see?

Collaborating

Questions:

How can I use version control to collaborate with other people?

Notes:

Collaborator permissions
Git clone

Activity:

Conflicts

Questions:

What do I do when my changes conflict with someone else’s?

Notes:

When working on the same files, collaborators can create content conflicts.
Version control with Git provides means for managing and reconciling conflicts
Use git fetch and git pull often to avoid/preempt conflicts

Activity:

Open Science

Questions:

How can version control help me make my work more open?

Licensing

Questions:

What licensing information should I include with my work?

Citation

Questions:

How can I make my work easier to cite?

Hosting

Questions:

Where should I host my version control repositories?

Exploring History (at the end, if there's time)

Questions:

How can I identify old versions of files?
How do I review my changes?
How can I recover old versions of files?

Full Tutorial: https://swcarpentry.github.io/git-novice/

Section 3 :: Programming with Python

Analyzing Patient Data

Questions:

How can I process tabular data files in Python?

Notes:

Variable assignment variable = value
Int, Float, and String types 1, 1.0, '1'
Arrays and N-Arrays [0, 1, 2] and [[1, 0],[0, 1],[1, 2]]
Print print(variable)
Python libraries (e.g., numpy) import numpy
CSV file (Comma-separated values)
Indexing and slicing data[0, 0] and data[:3, 10:]
IPython mystery functions (.function, and .function?)
Add comments with '#' # this is what this line does
Get stats with numpy.mean(array), numpy.max(array), numpy.min(array)
Get stats for a given axis with numpy.mean(array, axis=0) or numpy.mean(array, axis=1)
Plot and visualize with matplotlib.pyplot
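As a rough illustration of the indexing, slicing, and axis notes above, here is a small sketch. The array is made up and only stands in for the inflammation data (rows as patients, columns as days); it is not the workshop file.

```python
import numpy

# Made-up 3 x 4 array standing in for the inflammation data
# (rows are patients, columns are days).
data = numpy.array([[0.0, 1.0, 2.0, 3.0],
                    [0.0, 2.0, 4.0, 6.0],
                    [0.0, 3.0, 6.0, 9.0]])

print(data.shape)      # (3, 4)
print(data[0, 0])      # value in the first row, first column
print(data[:2, 1:])    # first two rows, columns 1 through the end

# Statistics over the whole array
print(numpy.mean(data), numpy.max(data), numpy.min(data))

# axis=0 collapses the rows: one value per column (per day)
day_means = numpy.mean(data, axis=0)
# axis=1 collapses the columns: one value per row (per patient)
patient_means = numpy.mean(data, axis=1)

print(day_means)
print(patient_means)
```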

Activity 1: Variables
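One possible way to work through this activity is sketched below. The variable names and values are illustrative assumptions, not part of the original notes.

```python
# Assign values to variables (names and values are made up)
weight_kg = 55            # an int
print(type(weight_kg))

weight_kg = 57.5          # reassigning replaces the old value; now a float
patient_id = '001'        # a string, even though it looks like a number

# Variables can be used in calculations
weight_lb = 2.2 * weight_kg

print(patient_id, 'weight in kilograms:', weight_kg)
print(patient_id, 'weight in pounds:', weight_lb)
```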

Activity 2: Loading and Printing Data
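A minimal sketch of loading comma-separated values with numpy.loadtxt follows. So that it runs without the demo data, it reads from an in-memory string with made-up values; in the workshop you would pass a file name such as 'inflammation-01.csv' as fname instead.

```python
import io
import numpy

# Stand-in for the contents of a CSV file; the values are made up.
csv_text = "0,0,1,3\n0,1,2,1\n0,1,1,3\n"

# loadtxt accepts a file name or a file-like object
data = numpy.loadtxt(fname=io.StringIO(csv_text), delimiter=',')

print(data)
print(type(data))    # the container is a NumPy array
print(data.dtype)    # the type of the values stored in it
print(data.shape)    # (rows, columns)
```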

Activity 3: N-Array Arithmetic/Stats
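Arithmetic on arrays is applied element by element. The sketch below uses a tiny made-up array rather than the inflammation data; the variable names are assumptions for illustration.

```python
import numpy

# Tiny made-up array
data = numpy.array([[1.0, 2.0],
                    [3.0, 4.0]])

# Multiplying by a number scales every element
doubled = data * 2.0

# Adding two arrays of the same shape adds matching elements
tripled = doubled + data

print(doubled)
print(tripled)

# Per-row maximum and overall standard deviation
row_max = numpy.max(data, axis=1)
print(row_max)
print(numpy.std(data))
```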

Activity 4: Visualizing Data

Get a plot started for the data

import matplotlib.pyplot
%matplotlib inline
image = matplotlib.pyplot.imshow(data)
matplotlib.pyplot.show()

Try a plot with the average inflammation over time

ave_inflammation = numpy.mean(data, axis=0)
ave_plot = matplotlib.pyplot.plot(ave_inflammation)
matplotlib.pyplot.show()

Is this expected?

What about the maximum value along the first axis (0)?

max_plot = matplotlib.pyplot.plot(numpy.max(data, axis=0))
matplotlib.pyplot.show()

What about the minimum value along the first axis (0)?

min_plot = matplotlib.pyplot.plot(numpy.min(data, axis=0))
matplotlib.pyplot.show()

Group the plots together in one figure to compare, and start from scratch at the top of your notebook

import numpy
import matplotlib.pyplot

%matplotlib inline

data = numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')

fig = matplotlib.pyplot.figure(figsize=(10.0, 3.0))

axes1 = fig.add_subplot(1, 3, 1)
axes2 = fig.add_subplot(1, 3, 2)
axes3 = fig.add_subplot(1, 3, 3)

axes1.set_ylabel('average')
axes1.plot(numpy.mean(data, axis=0))

axes2.set_ylabel('max')
axes2.plot(numpy.max(data, axis=0))

axes3.set_ylabel('min')
axes3.plot(numpy.min(data, axis=0))

fig.tight_layout()

matplotlib.pyplot.show()

Repeating Actions with Loops

Questions:

How can I do the same operations on many different values?

Notes:

Loop with

for variable in collection:
  do things with variable

Body of the loop must be indented
Use len(variable) to get the length of an array or string

Try:

Print out the characters in the word 'lead':

word = 'lead'
print(word[0])
print(word[1])
print(word[2])
print(word[3])

Try with 'tin'. What happens?

word = 'tin'
print(word[0])
print(word[1])
print(word[2])
print(word[3])

Try printing the characters with a loop instead:

word = 'lead'
for char in word:
  print(char)

Switch the word to 'oxygen'. Does it work?

word = 'oxygen'
for char in word:
  print(char)

Swap out the variable name. What happens?

word = 'oxygen'
for banana in word:
  print(banana)

Make a loop that updates a variable:

length = 0
for vowel in 'aeiou':
  length = length + 1
print('There are', length, 'vowels')

Try another loop. Does it do what you'd expect?

letter = 'z'
for letter in 'abc':
  print(letter)
print('after the loop, letter is', letter)

Use a shortcut to count the vowels:
```
print(len('aeiou'))
```
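The pattern of updating a variable inside a loop works for more than counting. As a sketch (the word is an arbitrary example), the loop below builds a reversed copy of a string, and a second loop uses range(len(...)) to visit each valid index:

```python
word = 'Newton'

# Start with an empty string and put each new character in front
reversed_word = ''
for char in word:
    reversed_word = char + reversed_word
print(reversed_word)   # notweN

# range(len(word)) yields the valid indices 0 .. len(word) - 1,
# so indexing inside this loop never runs past the end of the string
for i in range(len(word)):
    print(i, word[i])
```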

Storing Multiple Values in Lists

Questions:

How can I store many values together?

Notes:

Mutable vs. immutable data (lists and arrays vs strings and numbers)
In-place modifications can be tricky
Lists use bracket notation list = [item, item2, item3], and can be indexed list[0] and sliced list[:2]
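To make the mutable vs. immutable distinction concrete, here is a short sketch with made-up names: a list element can be replaced in place, while attempting the same on a string raises a TypeError.

```python
# Lists are mutable: an item can be replaced in place
names = ['Curie', 'Darwing', 'Turing']   # note the typo in the second name
names[1] = 'Darwin'
print(names[0])     # indexing
print(names[:2])    # slicing

# Strings are immutable: item assignment is not allowed
name = 'Darwin'
try:
    name[0] = 'd'
except TypeError as err:
    print('TypeError:', err)

# To "change" a string, build a new one instead
lowered = 'd' + name[1:]
print(lowered)
```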

Try:

odds = [1,3, 5, 7]
primes = odds
primes.append(2)
print('primes:', primes)
print('odds:', odds)

Try again by making a copy with list()

odds = [1,3, 5, 7]
primes = list(odds)
primes.append(2)
print('primes:', primes)
print('odds:', odds)
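One way to see why the two versions behave differently is to check whether two names refer to the same list object. This sketch uses the is operator for that; the names alias and duplicate are chosen here for illustration.

```python
odds = [1, 3, 5, 7]

alias = odds            # a second name for the same list
duplicate = list(odds)  # a new list with the same items

print(alias is odds)        # True: same object
print(duplicate is odds)    # False: different object
print(duplicate == odds)    # True: equal contents (for now)

# Changing the list through one name is visible through the other
alias.append(9)
print(odds)         # [1, 3, 5, 7, 9]
print(duplicate)    # [1, 3, 5, 7]
```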

Analyzing Data from Multiple Files

Questions:

How can I do the same operations on many different files?

Notes:

Use the glob library for working with files, directories, and file paths.
Use glob.glob(pattern) to create a list of files that match the pattern
Use * in a pattern to match 0 or more characters (of any kind) and ? to match a single character
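The difference between * and ? can be tried out without the demo data. The sketch below creates a few empty, hypothetically named files in a temporary directory and matches them with two patterns.

```python
import glob
import os
import tempfile

# Hypothetical file names, created empty in a throwaway directory
names = ['inflammation-01.csv', 'inflammation-02.csv',
         'inflammation-10.csv', 'small-01.csv']

with tempfile.TemporaryDirectory() as tmp:
    for name in names:
        open(os.path.join(tmp, name), 'w').close()

    # * matches zero or more characters
    star = glob.glob(os.path.join(tmp, 'inflammation*.csv'))
    # ? matches exactly one character
    question = glob.glob(os.path.join(tmp, 'inflammation-0?.csv'))

# glob does not promise any particular order, so sort before use
star = sorted(os.path.basename(path) for path in star)
question = sorted(os.path.basename(path) for path in question)

print(star)
print(question)
```

The first pattern matches all three inflammation files; the second skips inflammation-10.csv because the character after the hyphen is not 0.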

Activity:

Import the glob library
```
import glob
```
Use glob.glob to list the inflammation data files in the current directory
```
print(glob.glob('inflammation*.csv'))
```

Make inline graphs for the first 3 inflammation data files

import numpy
import matplotlib.pyplot

%matplotlib inline

filenames = sorted(glob.glob('inflammation*.csv'))
filenames = filenames[0:3]

for f in filenames:
  print(f)

  data = numpy.loadtxt(fname=f, delimiter=',')
  fig = matplotlib.pyplot.figure(figsize=(10.0, 3.0))

  axes1 = fig.add_subplot(1, 3, 1)
  axes2 = fig.add_subplot(1, 3, 2)
  axes3 = fig.add_subplot(1, 3, 3)

  axes1.set_ylabel('average')
  axes1.plot(numpy.mean(data, axis=0))

  axes2.set_ylabel('max')
  axes2.plot(numpy.max(data, axis=0))

  axes3.set_ylabel('min')
  axes3.plot(numpy.min(data, axis=0))

  fig.tight_layout()
  matplotlib.pyplot.show()

Making Choices

Questions:

How can my programs do different things based on data values?

Notes:

if, elif, and else are conditionals for control flow

You can combine conditionals with and and or

if (a > 1) and (b == 5):
  # do something

True and False are booleans
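Comparisons produce boolean values that can be stored and combined like any other value. A small sketch, with arbitrary values for a and b:

```python
a = 3
b = 5

# A comparison evaluates to True or False
is_big = a > 1
print(is_big, type(is_big))

# and is True only when both sides are True
both = (a > 1) and (b == 5)
neither = (a > 1) and (b == 6)

# or is True when at least one side is True
either = (a < 1) or (b == 5)

print(both, neither, either)   # True False True
```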

Try:

Try out an if / else

num = 37
if num > 100:
  print('greater')
else:
  print('not greater')
print('done')

Try without an else

num = 53
print('before conditional...')
if num > 100:
  print(num, 'is greater than 100')
print('... after conditional')

Chain several tests with elif

num = -3
if num > 0:
  print(num, 'is positive')
elif num == 0:
  print(num, 'is zero')
else:
  print(num, 'is negative')

Combine tests with and

if (1 > 0) and (-1 < 0):
  print('both parts are true')
else:
  print('at least one part is false')

Try with or

if (1 < 0) or (-1 < 0):
  print('at least one test is true')
