@mnyrop
Last active August 24, 2018 21:04

Columbia University x Software Carpentries
Python Group B

Bootcamp Site: https://columbiaswc.github.io/2018-08-27-Columbia-B/

Instructors:

Helpers:

Contents

Schedule

Time    Day 1                                 Day 2
09:00   Automating tasks with the Unix shell  Building programs with Python
10:30   Break                                 Break
12:00   Lunch break                           Lunch break
01:00   Version control with Git              Building programs with Python, continued
02:30   Break                                 Break
04:00   Wrap-up                               Wrap-up & post-workshop survey
04:30   END                                   END

Setup

Setup check in:

  • Do you have the necessary software (Bash, Git, and Python) installed?
  • Do you have Jupyter notebooks installed? (You can check by opening a Terminal and entering the command jupyter notebook. This should open a browser.)
  • Do you have the demo data for bash and python downloaded?
  • Do you have a GitHub account?
  • Do you have sticky notes (post-its) in two different colors?

Setup resources:

Section 1 :: The Unix Shell

Introducing the Shell

Questions:

  • What is a command shell and why would I use one?

Notes:

  • Graphical User Interface (GUI) versus Command Line Interface (CLI)
  • Read-evaluate-print-loop (REPL)
  • Command, flag, argument
  • Flexibility and automation

Try:

  • pwd
  • ls -F /

Navigating Files and Directories

Questions:

  • How can I move around on my computer?
  • How can I see what files and directories I have?
  • How can I specify the location of a file or directory on my computer?

Notes:

  • File system hierarchy
  • Current working directory
  • Root directory
  • Home directory
  • man and --help
  • Absolute vs. relative paths

Try:

  • ls
  • ls -F
  • ls --help
  • man ls
  • ls -j
  • ls -l
  • ls -R
  • ls -F Desktop
  • pwd
  • cd desktop or cd Desktop
  • cd data-shell
  • cd data
  • cd ..
  • cd ~/desktop/data-shell/data or cd ~/Desktop/data-shell/data
  • cd /
  • cd ~
  • cd -

Activity:

  • Move into the data-shell directory
  • enter the command ls north-pacific-gyre/2012-07-03/ using tab completion

Working With Files and Directories

Questions:

  • How can I create, copy, and delete files and directories?
  • How can I edit files?

Notes:

  • Naming conventions for files and directories
  • The Nano text editor
  • Deleting with rm is forever

Try:

  • Go back to data-shell by checking where you are with pwd and using cd
  • Check what's in data-shell with ls -F
  • Make a new directory called 'thesis' mkdir thesis
  • Check what's in the directory again ls -F
  • Nothing is in thesis because it's brand new. Check with ls -F thesis
  • Move into thesis with cd thesis
  • Use Nano to create and edit a file with nano draft.txt. Add some lines of text, then press Ctrl-O to save and Ctrl-X to exit.
  • Go to your home directory and make a file using touch. cd ~ followed by touch my_file.txt
  • Use ls -l to inspect the files. How large is my_file.txt? Why?
  • Move back into thesis in data-shell with cd and remove the draft file with rm draft.txt using tab completion. But be careful!
  • Run ls to see if the file is still there.
  • Re-add the file and move back into data-shell with nano draft.txt, ls, and cd ..
  • Try removing thesis with rm thesis. What happens?
  • Type out rm -r thesis for removing the directory recursively. But don't hit enter! We're not ready to delete it yet.
  • Instead try removing the directory safely with rm -r -i thesis. Type y for each file to delete.
  • Make the thesis directory in data-shell again by checking where you are with pwd and using mkdir thesis
  • Remake the file draft.txt with nano thesis/draft.txt
  • Change the filename of draft.txt to quotes.txt using mv thesis/draft.txt thesis/quotes.txt
  • Check what happened with ls thesis
  • Move quotes.txt into the current working directory with mv thesis/quotes.txt .
  • See what's in thesis with ls thesis
  • Find quotes.txt in the current working directory with ls

Pipes and Filters

Questions:

  • How can I combine existing commands to do new things?

Loops

Questions:

  • How can I perform the same actions on many different files?

Full Tutorial: https://swcarpentry.github.io/shell-novice/

Section 2 :: Version Control with Git

Automated Version Control

Questions:

  • What is version control and why should I use it?

Notes:

  • Version control – keeps track of changes and allows for greater control over them
  • Manages collaboration and change conflicts

Setting Up Git

Questions:

  • How do I get set up to use Git?

Notes:

  • Git is the software, GitHub is a popular service for hosting content that is version controlled by Git
  • Your local Git needs to be configured to work with your GitHub account

Activity:

  • Configure Git to use your user name with git config --global user.name "your-username"
  • Configure Git to use your email (ideally the one associated with your GitHub account) with git config --global user.email "your-email@example.com"
  • Configure your Git to use Nano as its text editor with git config --global core.editor "nano -w"
  • Check your Git configuration with git config --list
  • Check possible config commands with git config -h

Creating a Repository

Questions:

  • Where does Git store information?

Notes:

  • A repository is where your Git project files and the history of all your project’s commits live
  • git init initializes a repository in the current directory, so Git can track everything inside it, including subdirectories
  • git status shows the repository's current status
  • The Git repository history and info lives in a directory called .git

Activity:

  • cd ~/Desktop
  • mkdir planets
  • cd planets
  • git init initializes the planets repository

Tracking Changes

Questions:

  • How do I record changes in Git?
  • How do I check the status of my version control repository?
  • How do I record notes about what changes I made and why?

Notes:

  • Adding and committing
  • Commit messages
  • Reading the commit history

Activity:

  • Check current directory with pwd
  • Make sure you are in ~/Desktop/planets using cd
  • Use Nano to make a file nano mars.txt and add the sentence 'Cold and dry, but everything is my favorite color' to it before saving
  • Use ls to list the directory contents
  • Use cat mars.txt to print out the file's content
  • Try git status again
  • Tell Git to track our new file with git add mars.txt
  • Try git status again
  • Commit the changes and give a message about what the changes are with git commit -m 'start notes on mars as a base'
  • Try git status again
  • Look at the commit history with git log
  • Add some additional info to mars.txt with nano mars.txt, for example 'The two moons may be a problem for Wolfman'
  • Try git status again
  • Try git diff
  • Try committing the new changes with git commit -m 'add concerns about the moons and wolfman'
  • Try git status again
  • Add the file with git add mars.txt then retry the commit git commit -m 'add concerns about the moons and wolfman'
  • Add a third piece of info to the file with nano mars.txt, e.g., 'The mummy will appreciate the lack of humidity'
  • Print it with cat mars.txt
  • Try git diff
  • Add the file with git add mars.txt
  • Commit the changes with git commit -m 'add mummy climate concerns'
  • Try git status again
  • Look at the commit history with git log
  • Try git log -1. What happens?
  • Try git log --oneline
  • Try git log --oneline --all --decorate

Ignoring Things

Questions:

  • How can I tell Git to ignore files I don’t want to track?

Notes:

  • There will often be files you don't want Git to track, for security or efficiency reasons
  • Ignored files, directory, and file patterns are listed in a .gitignore file

Activity:

  • Make sure we're still in ~/Desktop/planets with pwd
  • Make a new directory with mkdir results
  • Add a few files with touch a.dat b.dat results/a.out results/b.out
  • Run git status
  • Make a .gitignore file with nano .gitignore with *.dat on the first line and results/ on the second line.
  • Print out the file's content with cat .gitignore
  • Run git status again
  • Add and commit the .gitignore file with git add .gitignore and git commit -m 'ignore data files and results folder'
  • Run git status again
  • Try running git add a.dat. What happens?
  • Run git status --ignored

Remotes in GitHub

Questions:

  • How do I share my changes with others on the web?

Notes:

  • Repository remotes (origin)
  • Git Push
  • Git Pull

Activity:

  • Log into GitHub
  • Add a new public repository called 'planets'
  • Copy the 'quick setup' link (it should be 'https://github.com/YOUR_USERNAME/planets.git')
  • Paste it into your terminal as part of the command git remote add origin 'https://github.com/YOUR_USERNAME/planets.git'
  • Run git remote -v
  • Run git push -u origin master and enter your GitHub password when prompted
  • Try git pull origin master
  • Refresh your repository page on GitHub. What do you see?

Collaborating

Questions:

  • How can I use version control to collaborate with other people?

Notes:

  • Collaborator permissions
  • Git clone

Activity:

  • Pair up with a neighbor and decide who will start as the 'owner' and who will start as the 'collaborator'
  • Owner: click on the 'Settings' tab on your planets repository in GitHub, navigate to 'Collaborators' and add your partner's GitHub username.
  • Collaborator: Go to https://github.com/notifications and accept the owner's invitation to collaborate on their repository
  • Collaborator: clone the owner's planets repository onto your computer as OWNER_USERNAME-planets with git clone 'https://github.com/OWNER_USERNAME/planets.git' ~/Desktop/OWNER_USERNAME-planets
  • Collaborator: Change directory into the newly cloned repository with cd ~/Desktop/OWNER_USERNAME-planets
  • Collaborator: Add a new file nano pluto.txt with the content 'It is so a planet!'
  • Collaborator: Run cat pluto.txt
  • Collaborator: Add the file with git add pluto.txt
  • Collaborator: Commit the changes and add a commit message git commit -m 'add notes about Pluto'
  • Collaborator: Push the changes to the owner's remote repository on GitHub with git push origin master
  • Owner: Refresh the repository page on GitHub
  • Owner: Back in the bash terminal, make sure you are in planets with pwd and cd if necessary
  • Owner: Pull in the new changes from the remote repository on GitHub with git pull origin master
  • Owner and Collaborator: switch roles and add another planet

Conflicts

Questions:

  • What do I do when my changes conflict with someone else’s?

Notes:

  • When working on the same files, collaborators can create content conflicts.
  • Version control with Git provides means for managing and reconciling conflicts
  • Use git fetch and git pull often to avoid/preempt conflicts

Activity:

  • Collaborator: Create a conflict by adding another line to mars.txt with nano: "This is a new line in OWNER_USERNAME's copy"
  • Collaborator: Push the change to GitHub with git add mars.txt, git commit -m 'add a line to OWNER_USERNAME copy', and git push origin master
  • Owner: Make a change in your own copy of mars.txt with nano: 'this is a different line added'
  • Owner: Push the change to GitHub with git add mars.txt, git commit -m 'add a line in my copy', and git push origin master. What happens?
  • Owner: Pull in collaborator's changes with git pull origin master
  • Owner: Look at the conflict with cat mars.txt
  • Owner: Use nano mars.txt to reconcile the conflict.
  • Owner: Merge the changes by committing them with git add mars.txt, git status, git commit -m 'merge in changes from GitHub', and finally git push origin master
  • Collaborator: Pull in the newly reconciled change with git fetch and git pull origin master
  • Collaborator: Check the results with cat mars.txt

Open Science

Questions:

  • How can version control help me make my work more open?

Licensing

Questions:

  • What licensing information should I include with my work?

Citation

Questions:

  • How can I make my work easier to cite?

Hosting

Questions:

  • Where should I host my version control repositories?

Exploring History (at the end, if there's time)

Questions:

  • How can I identify old versions of files?
  • How do I review my changes?
  • How can I recover old versions of files?

Full Tutorial: https://swcarpentry.github.io/git-novice/

Section 3 :: Programming with Python

Analyzing Patient Data

Questions:

  • How can I process tabular data files in Python?

Notes:

  • Variable assignment variable = value
  • Int, Float, and String types 1, 1.0, '1'
  • 1-D and N-dimensional arrays [0, 1, 2] and [[1, 0],[0, 1],[1, 2]]
  • Print print(variable)
  • Python libraries (e.g., numpy) import numpy
  • CSV file (Comma-separated values)
  • Indexing and slicing data[0, 0] and data[:3, 10:]
  • IPython help: type a name followed by . and press Tab to list what it offers, or add ? after a function (e.g., numpy.mean?) to show its documentation
  • Add comments with '#' # this is what this line does
  • Get stats with numpy.mean(array), numpy.max(array), numpy.min(array)
  • Get stats for a given axis with numpy.mean(array, axis=0) or numpy.mean(array, axis=1)
  • Plot and visualize with matplotlib.pyplot

Activity 1: Variables

  • Make sure you have the demo data downloaded
  • Move into the data folder with cd ~/Desktop/swc-python/data
  • Start a Jupyter notebook with jupyter notebook and New > Python 3 (Notebook)
  • Enter 3 + 5 * 4 into a cell and press Shift+Enter to run it.
  • Set a variable weight_kg = 60
  • Change it to a float with weight_kg = 60.0
  • Set a variable weight_kg_text to the string 'weight in kilograms: ' with weight_kg_text = 'weight in kilograms:'
  • Print out the weight in kilograms with print(weight_kg)
  • Print out both the text and the weight with print(weight_kg_text, weight_kg)
  • Print out the weight in pounds as a sentence with print('weight in pounds: ', 2.2 * weight_kg)
  • Try print(weight_kg). Did the weight_kg variable change?
  • Try weight_kg = 65.0 and print('weight in kilograms is now: ', weight_kg). Did the variable weight_kg change?
  • Set weight_lb = 2.2 * weight_kg, then run print(weight_kg_text, weight_kg, 'and in pounds:', weight_lb)
  • Reassign weight_kg = 100.0 and run print('weight in kilograms is now:', weight_kg, 'and weight in pounds is still:', weight_lb). Why isn't weight_lb updated?
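A minimal sketch of what the last step demonstrates: an assignment like weight_lb = 2.2 * weight_kg stores the value computed at that moment, not a formula, so reassigning weight_kg afterwards leaves weight_lb unchanged. It reuses the 2.2 conversion factor from the steps above.

```python
weight_kg = 65.0
weight_lb = 2.2 * weight_kg  # evaluated once, right now; only the result is stored

weight_kg = 100.0  # rebinds the name weight_kg; weight_lb keeps its old value
print('weight in kilograms is now:', weight_kg, 'and weight in pounds is still:', weight_lb)
```

To get an updated weight_lb, you have to run the weight_lb = 2.2 * weight_kg line again.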

Activity 2: Loading and Printing Data

  • Clear the notebook and at the top, run import numpy
  • Next, load the first sample data file with numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')
  • Save the loaded file to a variable called data with data = numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')
  • Print out the data print(data)
  • Print out the data type for the variable data with print(type(data))
  • Print out the data's dtype, or the data type of the items within data using print(data.dtype)
  • Get the shape of the data (rows, columns) with print(data.shape)
  • Get the first value in the data using the index of [0, 0]: print('the first value in data:', data[0, 0])
  • Print out the middle value of data with print('middle value in data:', data[30, 20])
  • Select the first 10 days (columns) from the first 4 patients (rows) with print(data[0:4,0:10])
  • Shift the slice to patients (rows) 5 through 9, still showing the first 10 days, with print(data[5:10, 0:10])
  • You can drop the first number for a short-hand way to slice from the beginning, and drop the second number to slice through the end. Try small = data[:3, 36:], then print('small is:') and print(small). (This selects rows 0-2 and columns 36-end)
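The indexing and slicing rules above can be checked on a tiny array whose values you can read at a glance. This sketch uses a made-up 3 x 4 array (3 "patients" by 4 "days") standing in for the inflammation data; the start index of a slice is included and the stop index is excluded.

```python
import numpy

# Synthetic stand-in for the real data: 3 rows (patients) x 4 columns (days)
data = numpy.array([[0, 1, 2, 3],
                    [4, 5, 6, 7],
                    [8, 9, 10, 11]])

print(data[0, 0])      # first row, first column -> 0
print(data[0:2, 1:3])  # rows 0-1, columns 1-2 (stop indices are excluded)
print(data[:2, 2:])    # omitted start = from the beginning; omitted stop = through the end
print(data[1, :])      # the whole second row
```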

Activity 3: N-Array Arithmetic/Stats

  • Try

    doubledata = data * 2.0
    print('original:')
    print(data[:3, 36:])
    print('doubledata:')
    print(doubledata[:3, 36:])
  • Try

    tripledata = doubledata + data
    print('tripledata')
    print(tripledata[:3, 36:])
  • Get the mean of the original data with print(numpy.mean(data))

  • Try a function without an input: import time then print(time.ctime()) to get the current time

  • Get the maximum value, minimum value, and standard deviation with

    maxval, minval, stdval = numpy.max(data), numpy.min(data), numpy.std(data)
    
    print('maximum inflammation:', maxval)
    print('minimum inflammation:', minval)
    print('standard deviation:', stdval)
  • Set patient zero to its own variable and print it out

    patient_0 = data[0, :]
    print('maximum inflammation for patient 0:', patient_0.max())
  • Skip storing patient 0 as its own variable and do it in one line

    print('maximum inflammation for patient 0:', numpy.max(data[0, :]))
  • Get the average inflammation for each day (averaging down the rows, along axis 0) with print(numpy.mean(data, axis=0))

  • Check the shape to make sure it's doing what we'd expect (one value per day) with print(numpy.mean(data, axis=0).shape)

  • Print out the average inflammation per patient across all days with the 1 axis (columns) print(numpy.mean(data, axis=1))
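The axis argument is easiest to see on a small, made-up 2 x 3 array (2 patients by 3 days) where the means can be worked out by hand: axis=0 collapses the rows and gives one value per column (day), while axis=1 collapses the columns and gives one value per row (patient).

```python
import numpy

# Synthetic 2 x 3 array: 2 rows (patients) by 3 columns (days)
data = numpy.array([[0.0, 1.0, 2.0],
                    [4.0, 5.0, 6.0]])

per_day = numpy.mean(data, axis=0)      # average down each column
per_patient = numpy.mean(data, axis=1)  # average across each row

print(per_day, per_day.shape)          # [2. 3. 4.] (3,)
print(per_patient, per_patient.shape)  # [1. 5.] (2,)
```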

Activity 4: Visualizing Data

  • Get a plot started for the data

    import matplotlib.pyplot
    %matplotlib inline
    image = matplotlib.pyplot.imshow(data)
    matplotlib.pyplot.show()
  • Try a plot with the average inflammation over time

    ave_inflammation = numpy.mean(data, axis=0)
    ave_plot = matplotlib.pyplot.plot(ave_inflammation)
    matplotlib.pyplot.show()

    Is this expected?

  • What about the maximum value along the first axis (0)?

    max_plot = matplotlib.pyplot.plot(numpy.max(data, axis=0))
    matplotlib.pyplot.show()
  • What about the minimum value along the first axis (0)?

    min_plot = matplotlib.pyplot.plot(numpy.min(data, axis=0))
    matplotlib.pyplot.show()
  • Group the plots together in one figure to compare, and start from scratch at the top of your notebook

    import numpy
    import matplotlib.pyplot
    
    %matplotlib inline
    
    data = numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')
    
    fig = matplotlib.pyplot.figure(figsize=(10.0, 3.0))
    
    axes1 = fig.add_subplot(1, 3, 1)
    axes2 = fig.add_subplot(1, 3, 2)
    axes3 = fig.add_subplot(1, 3, 3)
    
    axes1.set_ylabel('average')
    axes1.plot(numpy.mean(data, axis=0))
    
    axes2.set_ylabel('max')
    axes2.plot(numpy.max(data, axis=0))
    
    axes3.set_ylabel('min')
    axes3.plot(numpy.min(data, axis=0))
    
    fig.tight_layout()
    
    matplotlib.pyplot.show()

Repeating Actions with Loops

Questions:

  • How can I do the same operations on many different values?

Notes:

  • Loop with
    for variable in collection:
      do things with variable
  • Body of the loop must be indented
  • Use len(variable) to get the length of an array or string

Try:

  • Print out the characters in the word 'lead':

    word = 'lead'
    print(word[0])
    print(word[1])
    print(word[2])
    print(word[3])
  • Try with 'tin'. What happens?

    word = 'tin'
    print(word[0])
    print(word[1])
    print(word[2])
    print(word[3])
  • Try printing the characters with a loop instead:

    word = 'lead'
    for char in word:
      print(char)
  • Switch the word to 'oxygen'. Does it work?

    word = 'oxygen'
    for char in word:
      print(char)
  • Swap out the variable name. What happens?

    word = 'oxygen'
    for banana in word:
      print(banana)
  • Make a loop that updates a variable:

    length = 0
    for vowel in 'aeiou':
      length = length + 1
    print('There are', length, 'vowels')   
  • Try another loop. Does it do what you'd expect?

    letter = 'z'
    for letter in 'abc':
      print(letter)
    print('after the loop, letter is', letter)
  • Use a shortcut to count the vowels:

    print(len('aeiou'))
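Putting the last few steps together: the counting loop and len() should agree, and (as the 'abc' example showed) the loop variable keeps its final value once the loop ends. A short sketch using the word 'oxygen' from above:

```python
word = 'oxygen'

# Count characters the long way: add 1 on every pass through the loop
length = 0
for char in word:
    length = length + 1

print('loop count:', length)      # loop count: 6
print('len() count:', len(word))  # len() count: 6

# The loop variable still holds the last character it was assigned
print('after the loop, char is', char)  # after the loop, char is n
```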

Storing Multiple Values in Lists

Questions:

  • How can I store many values together?

Notes:

  • Mutable vs. immutable data (lists and arrays vs strings and numbers)
  • In-place modifications can be tricky
  • Lists use bracket notation my_list = [item1, item2, item3], and can be indexed my_list[0] and sliced my_list[:2] (avoid naming a variable list, which hides the built-in list() function)

Try:

  • Make a list

    odds = [1, 3, 5, 7]
    print('odds are', odds)
  • Select items from the list by index

    print('first and last', odds[0], odds[-1])
  • Loop over items in the list

    for number in odds:
      print(number)
  • Change an item in a list (fix a typo)

    names = ['Curie', 'Darwing', 'Turing']
    print('names is originally', names)
    names[1] = 'Darwin'
    print('final value of names:', names)
  • Now try changing a character in a string. What happens?

    name = 'Darwin'
    name[0] = 'd'
  • Give a list a second name, then modify the original. Does it do what you'd expect?

    salsa = ['peppers', 'onions', 'cilantro', 'tomatoes']
    my_salsa = salsa
    salsa[0] = 'hot peppers'
    print('Ingredients in my salsa:', my_salsa)
  • Try making an independent copy of a list instead

    salsa = ['peppers', 'onions', 'cilantro', 'tomatoes']
    my_salsa = list(salsa) # list() makes a copy
    salsa[0] = 'hot peppers'
    print('Ingredients in my salsa:', my_salsa)
  • Try making a list of lists

    x = [['pepper', 'zucchini', 'onion'],
        ['cabbage', 'lettuce', 'garlic'],
        ['apple', 'pear', 'banana']]
  • Print the first line as a list within a list

    print([x[0]])
  • Print the first line as a list

    print(x[0])
  • Make a heterogeneous list

    sample_ages = [10, 12.5, 'Unknown']
  • Add an item to the list odds

    odds.append(11)
    print('odds after adding a value:', odds)
  • Remove an item from the list odds by index

    del odds[0]
    print('odds after removing the first element', odds)
  • Reverse the list

    odds.reverse()
    print('odds after reversing', odds)
  • Try giving a list a second name and modifying it in place. (Remember, lists are mutable!) What happens to odds?

    odds = [1, 3, 5, 7]
    primes = odds
    primes.append(2)
    print('primes:', primes)
    print('odds:', odds)
  • Try again by making a copy with list()

    odds = [1, 3, 5, 7]
    primes = list(odds)
    primes.append(2)
    print('primes:', primes)
    print('odds:', odds)
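A sketch that makes the difference between a second name and a copy explicit: the is operator checks whether two names refer to the same object, while == compares contents. Besides list(), slicing the whole list with [:] also produces a copy.

```python
odds = [1, 3, 5, 7]

alias = odds         # a second name for the SAME list object
odds_copy = list(odds)  # a new, independent list with the same items
also_copy = odds[:]  # slicing the whole list also makes a copy

print(alias is odds)      # True: both names refer to one object
print(odds_copy is odds)  # False: a different object...
print(odds_copy == odds)  # True: ...with equal contents

alias.append(9)            # modifies the shared list
print('odds:', odds)       # odds: [1, 3, 5, 7, 9]
print('copy:', odds_copy)  # copy: [1, 3, 5, 7]
```

Both list() and [:] make shallow copies: for a list of lists like x above, the outer list is new but the inner lists are still shared.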

Analyzing Data from Multiple Files

Questions:

  • How can I do the same operations on many different files?

Notes:

  • Use the glob library for working with files, directories, and file paths.
  • Use glob.glob(pattern) to create a list of files that match the pattern
  • Use * in a pattern to match 0 or more characters (of any kind) and ? to match a single character

Activity:

  • Import the glob library

    import glob
  • Use glob.glob to list the inflammation data files in the current directory

    print(glob.glob('inflammation*.csv'))
  • Make inline graphs for the first 3 inflammation data files

    import numpy
    import matplotlib.pyplot
    
    %matplotlib inline
    
    filenames = sorted(glob.glob('inflammation*.csv'))
    filenames = filenames[0:3]
    
    for f in filenames:
      print(f)
    
      data = numpy.loadtxt(fname=f, delimiter=',')
      fig = matplotlib.pyplot.figure(figsize=(10.0, 3.0))
    
      axes1 = fig.add_subplot(1, 3, 1)
      axes2 = fig.add_subplot(1, 3, 2)
      axes3 = fig.add_subplot(1, 3, 3)
    
      axes1.set_ylabel('average')
      axes1.plot(numpy.mean(data, axis=0))
    
      axes2.set_ylabel('max')
      axes2.plot(numpy.max(data, axis=0))
    
      axes3.set_ylabel('min')
      axes3.plot(numpy.min(data, axis=0))
    
      fig.tight_layout()
      matplotlib.pyplot.show()
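The same glob-then-loop pattern works without plotting. This sketch is self-contained: it writes three hypothetical CSV files (invented values, not the workshop data) into a temporary directory, finds them with glob.glob, and computes one mean per file using only the standard library (csv and statistics instead of numpy).

```python
import csv
import glob
import os
import statistics
import tempfile

# Hypothetical sample data: three tiny CSV files written to a temporary
# directory, so the sketch runs without the workshop's inflammation files
sample_rows = {
    'inflammation-01.csv': [[0, 1, 2], [1, 2, 3]],
    'inflammation-02.csv': [[2, 2, 2], [4, 4, 4]],
    'inflammation-03.csv': [[5, 0, 1], [1, 0, 5]],
}

means = {}
with tempfile.TemporaryDirectory() as tmpdir:
    for name, rows in sample_rows.items():
        with open(os.path.join(tmpdir, name), 'w', newline='') as out:
            csv.writer(out).writerows(rows)

    # '*' matches any run of characters; sorted() gives a predictable order
    filenames = sorted(glob.glob(os.path.join(tmpdir, 'inflammation*.csv')))
    # '?' matches exactly one character, so this pattern also finds all three
    single_char_matches = glob.glob(os.path.join(tmpdir, 'inflammation-0?.csv'))

    for f in filenames:
        with open(f, newline='') as handle:
            values = [float(x) for row in csv.reader(handle) for x in row]
        means[os.path.basename(f)] = statistics.mean(values)

for name, mean in means.items():
    print(name, mean)
```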

Making Choices

Questions:

  • How can my programs do different things based on data values?

Notes:

  • if, elif, and else are conditionals for control flow
  • You can combine conditionals with and and or
    if (a > 1) and (b == 5):
      print('both conditions are true')  # do something
  • True and False are booleans

Try:

  • Try out an if / else

    num = 37
    if num > 100:
      print('greater')
    else:
      print('not greater')
    print('done')
  • Try without an else

    num = 53
    print('before conditional...')
    if num > 100:
      print(num, 'is greater than 100')
    print('... after conditional')
  • Chain several tests with elif

    num = -3
    if num > 0:
      print(num, 'is positive')
    elif num == 0:
      print(num, 'is zero')
    else:
      print(num, 'is negative')
  • Combine tests with and

    if (1 > 0) and (-1 < 0):
      print('both parts are true')
    else:
      print('at least one part is false')
  • Try with or

    if (1 < 0) or (-1 < 0):
      print('at least one test is true')
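A sketch combining a loop with an if / elif / else chain. Python checks the tests in order and runs only the first branch whose test is true, so the order matters: 150 is labeled 'greater than 100' and never reaches the 'positive' test.

```python
labels = []
for num in [-3, 0, 37, 150]:
    if num > 100:
        label = 'greater than 100'
    elif num > 0:
        label = 'positive'
    elif num == 0:
        label = 'zero'
    else:
        label = 'negative'
    labels.append(label)
    print(num, 'is', label)
```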

Activity

Creating Functions

Questions:

  • How can I define new functions?
  • What’s the difference between defining and calling a function?
  • What happens when I call a function?
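A minimal sketch of defining versus calling, using temperature conversion as an illustrative example (the function names are our own). The def statements only create the functions; nothing is computed until the print lines call them, at which point each argument is bound to the parameter name and the return value is handed back to the caller.

```python
def fahr_to_celsius(temp):
    """Convert a temperature from Fahrenheit to Celsius."""
    return (temp - 32) * (5 / 9)


def celsius_to_kelvin(temp_c):
    """Convert a temperature from Celsius to Kelvin."""
    return temp_c + 273.15


def fahr_to_kelvin(temp_f):
    """Convert Fahrenheit to Kelvin by composing the two functions above."""
    return celsius_to_kelvin(fahr_to_celsius(temp_f))


# Defining the functions ran nothing; these lines call them
print('freezing point of water in Celsius:', fahr_to_celsius(32.0))
print('boiling point of water in Kelvin:', fahr_to_kelvin(212.0))
```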

Errors and Exceptions

Questions:

  • How does Python report errors?
  • How can I handle errors in Python programs?
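Earlier, print(word[3]) with word = 'tin' stopped with an IndexError. A sketch of handling that error with try / except instead of letting the program stop; char_at is a hypothetical helper name chosen for this example.

```python
def char_at(word, index):
    """Return word[index], or None if the index is out of range."""
    try:
        return word[index]
    except IndexError as err:
        # Python reports the error type and a message; here we handle it
        print('could not read index', index, 'of', repr(word), '->', err)
        return None


print(char_at('lead', 3))  # d
print(char_at('tin', 3))   # prints the error message, then None
```

Catching only IndexError, rather than every exception, keeps unrelated bugs (such as a misspelled variable name) visible.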

Defensive Programming

Questions:

  • How can I make my programs more reliable?
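One common defensive technique is to check assumptions with assert statements, so bad input fails early with a clear message instead of producing a misleading result. A sketch with a hypothetical mean_inflammation function, assuming readings should be non-negative:

```python
def mean_inflammation(values):
    """Return the mean of a non-empty sequence of non-negative readings."""
    # Preconditions: stop immediately, with a message, if the input is bad
    assert len(values) > 0, 'need at least one reading'
    for v in values:
        assert v >= 0, 'inflammation readings should never be negative'
    return sum(values) / len(values)


print(mean_inflammation([1, 2, 3]))  # 2.0
```

Note that Python skips assert statements when run with the -O option, so they suit checks for programmer mistakes rather than validation that must always happen.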

Debugging

Questions:

  • How can I debug my program?
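A common debugging habit is to run code on a tiny input whose correct answer you have worked out by hand, so a wrong result points straight at the bug. A sketch using a hypothetical column_means function (the pure-Python counterpart of numpy.mean(data, axis=0)):

```python
def column_means(rows):
    """Return the mean of each column in a list of equal-length rows."""
    n_rows = len(rows)
    # zip(*rows) regroups the rows into columns
    return [sum(column) / n_rows for column in zip(*rows)]


# Worked out by hand: the columns are (1, 3) and (2, 4)
expected = [2.0, 3.0]
actual = column_means([[1, 2], [3, 4]])
assert actual == expected, f'expected {expected}, got {actual}'

# A single row should come back unchanged
assert column_means([[5, 7, 9]]) == [5.0, 7.0, 9.0]
print('column_means passed the hand-checked cases')
```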

Command-Line Programs

Questions:

  • How can I write Python programs that will work like Unix command-line tools?
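A sketch of a small Python program that behaves like a Unix tool: it takes options and filenames on the command line and prints one summary value per CSV row. It uses the standard-library argparse module; the script name readings.py and the --action option are hypothetical choices for illustration, not part of the lesson materials.

```python
import argparse
import csv
import statistics


def summarize(lines, action):
    """Return one summary value (min, mean, or max) per non-empty CSV row."""
    func = {'min': min, 'mean': statistics.mean, 'max': max}[action]
    return [func([float(x) for x in row]) for row in csv.reader(lines) if row]


def main(argv=None):
    parser = argparse.ArgumentParser(
        description='Print a per-row summary of each CSV file.')
    parser.add_argument('--action', choices=['min', 'mean', 'max'], default='mean')
    parser.add_argument('filenames', nargs='*')
    args = parser.parse_args(argv)

    if not args.filenames:
        parser.print_usage()  # nothing to do without input files
        return

    for name in args.filenames:
        with open(name, newline='') as handle:
            for value in summarize(handle, args.action):
                print(value)


if __name__ == '__main__':
    main()
```

Saved as readings.py, it could be run from the shell as python readings.py --action max inflammation-01.csv, and its output piped into other commands just like ls or wc.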