Created
August 31, 2018 02:59
Etherpad from Software Carpentry 2018 Workshop
Welcome to Software Carpentry Etherpad!
This pad is synchronized as you type, so that everyone viewing this page sees the same text. This allows you to collaborate seamlessly on documents. | |
Use of this service is restricted to members of the Software Carpentry and Data Carpentry community; this is not for general purpose use (for that, try etherpad.wikimedia.org). | |
Users are expected to follow our code of conduct: https://docs.carpentries.org/topic_folders/policies/code-of-conduct.html | |
All content is publicly available under the Creative Commons Attribution License: https://creativecommons.org/licenses/by/4.0/ | |
Link to workshop website: https://uw-madison-aci.github.io/2018-08-29-uwmadison-swc/ | |
Unix Shell: | |
Commands we've learned: | |
pwd - 'print working directory', prints the location your terminal is looking at in your filesystem | |
ls - 'list', prints the contents of the directory you are in | |
options: | |
-l: long format (see info about files) | |
-h: human readable (makes the size more readable) | |
-a: shows hidden directories and files | |
-F: show indicators for files vs. folders | |
.. : show the contents of the directory above my current directory (as in 'ls ..')
<DIRECTORYNAME>: lists a different directory than you are in | |
<file name>: lists that file if it exists in the current directory (a quick way to check that a file is there)
man - gives you the manual for a particular command | |
if "man" does not work, you can try --help after the command ("man" doesn't work on Windows)
cd - 'change directory', changes which directory your terminal is looking at in your filesystem
cd .. - navigate up one directory | |
cd ~ - navigate to my home directory
cd - - navigate to the last place I was ('previous channel button') | |
mkdir - make a new directory | |
nano 'some file' - open 'some file' within the nano command line text editor | |
'CTRL-x' to exit, then 'y' to save, then enter to keep the name | |
'CTRL-o' to save without exiting (stands for 'write out') | |
rm 'some file' - removes 'some file'. This only works on files. | |
options: | |
-r 'some directory' - removes 'some directory' and anything it contains. | |
-i : interactive, prompts you for each item being removed (use 'y' or 'n' for yes and no) | |
touch <filename> | |
create an empty file | |
mv <file> <new-location> | |
moves a folder or file to a new location or new name
cp <file> <new name or location> | |
copies a file to a new name or new location | |
Absolute paths start with a /. | |
Tab Completion <3<3<3<3<3 | |
Hit "tab" when you are typing a command or location to autocomplete something | |
CTRL-a : jump to the beginning of a line | |
CTRL-e : jump to the end of a line | |
Wildcards | |
* - asterisks can represent any characters | |
example: *.pdb will mean any files ending in ".pdb" | |
cat <file> - shows contents in terminal | |
wc <path> (word count) - counts the number of lines, words, and characters in a text file
-l count lines
-w count words
-c count characters (strictly bytes, which is the same thing for plain ASCII text)
sort <file> - sorts file contents | |
-n sort numerically, rather than alphabetically | |
-k <num> sort on column <num>, instead of on the whole line
-r reverse the sort order | |
head <file> - shows the top of a text file (default is the first 10 lines) | |
-n <num> show first num lines | |
tail <file> - show the end of a text file (default is the last 10 lines)
-n <num> show the last num lines | |
redirect ( <any_cmd> > <output_filename>) : direct the output that would've gone to the screen to a text file instead | |
PIPES ( <cmd1> | <cmd2> | <cmd3> ) : strings multiple commands together, passing the output from the previous command to the next | |
Loops | |
one-line syntax: for <loop_variable> in <list of items>; do <cmd1>; <cmd2>; ... ; done | |
- use the value of the loop variable using $ ($loop_variable) | |
ex: for filename in basilisk.dat unicorn.dat; do head -n 3 $filename; done | |
ex: for filename in *.dat; do cp $filename original-$filename; done | |
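For comparison with the Python half of the workshop, the second loop above can be sketched in Python. This is only an illustration under stated assumptions: it works in a throwaway temporary directory and creates its own two example files, so nothing real is touched.

```python
import glob
import os
import shutil
import tempfile

# Work in a scratch directory so no real files are touched (an assumption for the demo).
workdir = tempfile.mkdtemp()
for name in ["basilisk.dat", "unicorn.dat"]:
    with open(os.path.join(workdir, name), "w") as handle:
        handle.write("example data\n")

# Same idea as: for filename in *.dat; do cp $filename original-$filename; done
for path in sorted(glob.glob(os.path.join(workdir, "*.dat"))):
    filename = os.path.basename(path)
    shutil.copy(path, os.path.join(workdir, "original-" + filename))

copied = sorted(os.listdir(workdir))
print(copied)
```

As in the shell version, the list of matching files is worked out once, before the loop starts, so the new "original-" copies are not copied again.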
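Regular Expressions
these are really handy ways to filter filenames or text (strictly speaking, the filename patterns below are shell wildcard patterns, also called "globs"; regular expressions use a related but different syntax)
*[AB].txt - "all files which end in either A.txt or B.txt"
^ - the "not" operator; as the first character inside [ ], it negates or flips the effect of the brackets
*[^AB].txt - "all files which end in anything but A.txt or B.txt"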
A "quick" guide to regular expression usage with a few nice examples can be found here: http://marvin.cs.uidaho.edu/Handouts/regex.html | |
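The bracket patterns above can also be tried out from Python. The sketch below uses the standard fnmatch module on a made-up list of file names (the names are assumptions for illustration). One difference to be aware of: Python's fnmatch writes "not" inside brackets as ! rather than ^.

```python
import fnmatch

# Hypothetical file names, only for illustration.
names = ["runA.txt", "runB.txt", "runC.txt", "notes.md"]

# *[AB].txt : names ending in A.txt or B.txt
ends_in_a_or_b = fnmatch.filter(names, "*[AB].txt")

# *[!AB].txt : names ending in .txt whose last character before it is not A or B
ends_in_other = fnmatch.filter(names, "*[!AB].txt")

print(ends_in_a_or_b)
print(ends_in_other)
```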
echo <text> - prints the text after the command to the screen, nice for examining the values of variables in bash
to write the value of a bash (or for loop) variable to screen: echo $variable_name | |
Bash Scripts | |
- bash scripts typically end with ".sh"
- to run a bash script use either "bash <script_name>.sh" or "./<script_name>.sh" (the second form requires the script to have execute permissions)
script arguments: | |
- arguments can be provided to a bash script like so: bash <script_name>.sh arg1 arg2 arg3... | |
- in the script, the value of an argument can be accessed using $1, $2, $3 where those values would be the values of arg1, arg2, arg3 respectively | |
- if the script tries to access the value of an argument that wasn't provided, the variable is simply empty, which can result in unexpected behavior!!!
comments: | |
- any line starting with a "#" symbol will be ignored in the script's execution | |
- comments are a nice way of leaving notes for collaborators (or your future self) about what the script does and how it is used | |
Unix Questions(write yours below) and answers from instructors: | |
when i use "LS" I get coloring, when I use "ls" I get the same info without coloring, is that significant? how? (windows) | |
interesting, this is probably due to a setup file on your system that tells it that "LS" should be colored. Up to you which you'd like to use. | |
(colors in 'ls' highlight different types of files and/or permissions)
Is there a shortcut to jump to previous space or '/' similar to ctrl-a or ctrl-e? | |
`cd -` allows you to move to the previous folder, | |
is that what you wanted to know ? | |
I meant more if I'm typing in the command line something like cd 'folder/folder1/folder2/ and I made a typo. Is there shortcut to jump to the previous / to correct it instead of hitting left arrow a bunch? | |
From google: | |
Some useful line editing key bindings provided by the Readline library: | |
Ctrl-A : go to the beginning of line. | |
Ctrl-E : go to the end of line. | |
Alt-B : skip one word backward. | |
Alt-F : skip one word forward. | |
Ctrl-U : delete to the beginning of line. | |
Ctrl-K : delete to the end of line. | |
Alt-D : delete to the end of word. | |
(The command with Alt based shortcuts are not working for me, the Ctrl based shortcuts are working on my terminal) | |
(It looks like these also work in jupyter notebook) | |
Are 'bash goostats' and './goostats' synonymous?
Sort of. They are the same if goostats has execute permissions.
Yes, for bash scripts. './goostats' can also be used to run other types of scripts/programs that aren't bash.
Question from sticky notes: a way to list # of files in a directory would be helpful
`ls | wc -l`
or if you have specific files (example those ending in .txt) you could do `ls *.txt | wc -l`
I use it all the time to make sure I have the same number of output files as input files after my analyses run - Sarah
Python: | |
Open jupyter notebook (in terminal command line): | |
jupyter notebook | |
Correction: "is" does not simply compare type and value; it checks whether two names refer to the very same object. It would be best to use `==` in most cases.
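A small sketch of the difference, using made-up variable names: two lists with equal contents compare equal with `==`, but `is` only reports True when both names point at the very same object.

```python
first = [1, 2, 3]
second = [1, 2, 3]   # equal contents, but a separate list object
alias = first        # another name for the very same object

values_equal = first == second   # compares the contents
same_object = first is second    # compares identity
alias_is_same = alias is first

print(values_equal, same_object, alias_is_same)
```

This prints `True False True`, which is why `==` is usually what you want when comparing values.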
If you are working with multi-dimensional data (not tabular), you may be interested in the "xarray" library (http://xarray.pydata.org/en/stable/). | |
Python Questions(write yours below) and answers from instructors: | |
If you have a really long list is there a way to search within that list to figure out where (what position) some item is located? | |
Yes. You can use the `.index` method. Example: | |
a = [1, 2, 3] | |
a.index(2) | |
# returns 1 (for the item in the second position) | |
# this will raise an exception if the item doesn't exist in the list | |
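If the item might be missing, one option is to catch the exception. The helper below is a hypothetical sketch (the function name and the example list are made up), not something from the lesson.

```python
def find_position(items, target):
    """Return the index of target in items, or None if it is absent."""
    try:
        return items.index(target)
    except ValueError:
        # .index raises ValueError when the item is not in the list
        return None

animals = ["basilisk", "unicorn", "minotaur", "unicorn"]
found = find_position(animals, "unicorn")    # first match only
missing = find_position(animals, "dragon")
print(found, missing)
```

Note that `.index` reports only the first position when an item appears more than once.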
How to remove an item from a list | |
There are a couple of options, here is a blog post that shows a few examples and explains pros and cons http://gloriadwomoh.me/blog/deleting-an-item-from-a-list-in-python-pop-remove-delete/ | |
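As a quick sketch of three common options (the example list is made up): `remove` deletes by value, `pop` deletes by position and hands the item back, and `del` deletes by position without returning anything.

```python
letters = ["a", "b", "c", "d", "b"]

letters.remove("b")        # removes the first "b" it finds, by value
removed = letters.pop(0)   # removes the item at position 0 and returns it
del letters[-1]            # removes the last item, returns nothing

print(removed, letters)
```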
Regarding number precision (previously asked question): | |
Python built in float numbers are 64-bit (double) floats. If you are concerned with precision (8, 16, 32 bit integers and 32-bit versus 64-bit floats) you may want to look in to the "numpy" python library. Steve is about to talk about the "pandas" library which uses numpy underneath. | |
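A short illustration of what 64-bit float precision means in practice, using only the standard library. This is a sketch of typical behavior on machines with standard double-precision floats, which is an assumption about your platform.

```python
import math
import sys

total = 0.1 + 0.2
exactly_equal = total == 0.3             # tiny rounding error makes this False
close_enough = math.isclose(total, 0.3)  # compare with a tolerance instead

print(total, exactly_equal, close_enough)
print(sys.float_info.dig)  # decimal digits a float can reliably hold
```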
Still a bit uncertain about when to use single and double brackets and ( ) and [ ]. | |
( ) indicates that the comma-separated values are a tuple, while the [ ] indicates that the comma-separated values are a list. Does that kinda cover it? | |
Also when to use no space vs. space after a character. | |
Do you mean when defining a string? Like "a" vs "a " Or am I missing the crux of the question? :) | |
Or do you mean `print( "a" )` versus `print("a")`? In this case it doesn't matter except for code style preferences. A good starting point for python style guidelines is PEP8: https://www.python.org/dev/peps/pep-0008/
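Returning to the ( ) versus [ ] question above, here is a small sketch of the most common uses side by side. The variable names and values are made up for illustration, and "double brackets" is interpreted here as a list of lists, which may not be what the question meant.

```python
point = (3, 4)            # ( ) around comma-separated values: a tuple (cannot be changed)
scores = [10, 20, 30]     # [ ] around comma-separated values: a list (can be changed)
first_score = scores[0]   # [ ] right after a name: indexing
count = len(scores)       # ( ) right after a name: calling a function
table = [[1, 2], [3, 4]]  # "double brackets": a list of lists
cell = table[1][0]        # index the outer list, then the inner one
scores.append(40)         # lists can grow; tuples cannot

print(point, scores, first_score, count, cell)
```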
For two plots, use subplot() | |
see https://stackoverflow.com/questions/42818361/how-to-make-two-plots-side-by-side-using-python | |
Git: | |
Sarah will let you know when the links below come into play | |
Socrative: | |
https://b.socrative.com/login/student/ | |
ROOM: SSTEVENS | |
Info on different config options (eg, text editors): https://uw-madison-aci.github.io/20170830-git-novice/02-setup/ | |
https://github.com/UW-Madison-ACI/countries | |
countries: | |
france | |
brazil | |
Senegal | |
Chile | |
Australia | |
Spain | |
Canada | |
Sweden | |
Swaziland | |
China | |
Egypt | |
Madagascar | |
Germany | |
Iceland | |
New Zealand | |
norway | |
South Korea | |
Canada | |
Colombia | |
Japan | |
srilanka | |
United Kingdom | |
Notes for git: | |
$ git config --global user.name "name-in-quotes" | |
$ git config --global user.email "email-address" | |
$ git config --global color.ui "auto" | |
$ git config --global core.editor "nano -w" | |
git init = initialize a repository in current folder | |
git status = what is the status of the current repo? what branch? what is ready to commit? what is staged? ... | |
Untracked files: files that you have not yet told your git repo about but which are in its location | |
git status --ignored - display which files are being ignored | |
git add <path/to/file/or/directory> = adds one or more files to staging - can be new files or changes already being tracked | |
git commit -m "description-of-changes" = commits the file to the repository | |
When the message flag (-m) is missing, git will launch the configured text editor so you can add the message.
Note: the two separate steps of staging (git add) and committing provide flexibility in managing the repository.
git log = show the history of commits with their details (author, date, message...) | |
git log -1 = show one commit | |
git log --oneline = show commits as a single line | |
git log --graph --all --oneline : gives you a pretty graphical representations of your branches and commits | |
git diff = display the differences between the staged state of files and their most recent edits | |
Can also see the differences between specific commits | |
git diff --staged = what is the difference between the staged version and the last committed version | |
git diff HEAD <file> - what is the difference between current and most recent commit? | |
git diff HEAD~1 <file> - what is the difference between current version and one version before (minus) the most recent commit? | |
git diff <commit-id> <file> - what is the difference between current version and the commit with ID <commit-id>? | |
git checkout <commit-id> <file> = make the current version of <file> the same as it was when it was committed as <commit-id> | |
git checkout HEAD <file> = check the version out of the repository that is in HEAD | |
git checkout <branch-name> - switch to <branch-name> | |
git checkout -b <branch-name> - create a new branch and check it out in one step | |
.gitignore = a file that tells a git repository which files should not be tracked | |
Can list individual files or use wildcards | |
Can create an entry for an entire directory | |
The .gitignore file itself should be committed to the repository | |
git branch - display the branches | |
git branch <name> - create a new branch called <name> | |
See git checkout <branch> | |
git branch -d <name> - delete an unused or merged branch. | |
git branch -D <name> - use a capital D to delete a branch whose changes are not yet merged and should not be merged into the current branch.
git merge <other-branch> - merge the changes from a different branch into the currently checked out branch | |
git remote - show information about remotes (add -v to also see their URLs)
git remote add origin <url> - add a remote connection named "origin" that points to the URL online | |
"origin" is the convention for the main remote repository | |
git push <remote-name> <branch-name> | |
git push origin master - push all the committed changes not yet in origin up to the origin remote repository | |
(in other words, push from your computer to github) | |
git clone <url> - create a local copy of a repository from the remote copy at the URL | |
Will automatically create an "origin" remote entry | |
to clone a repo into a directory that has a different name than the repo | |
git clone <url> <local folder name> | |
example: | |
On github, we have repo called "planets", but we already have a "planets" folder on our computer where we want to clone the repo. | |
git clone <planet-repo-url> planets-2 | |
git pull <remote-name> <branch-name> | |
git pull origin master - update the local copy with the latest commits from the remote branch. | |
On Github | |
create new repositories (new) | |
"Fork" a repository that someone else already has | |
will create a copy of another person's repo that you can edit that has all the history of the original repo, but any changes you make on your fork won't automatically go into the original repo. | |
Make a pull request (PR) | |
when you make changes on your branch or fork, make a pull request into the original repo where you are basically asking "please pull my changes in" | |
Helpful blog on writing pretty/consistent commit messages: https://chris.beams.io/posts/git-commit/ | |
Git Questions(write yours below) and answers from instructors: | |
Can you edit files within a git repository outside of the command line (like in some language environment) and then stage and commit those changes? | |
Yes- As long as the file you change is located in the folder that is the git repo. The file system you see and edit at the command line is the same file system you see in the Finder (or any other file browser). So if you make a change using a GUI program, you will need to commit those changes in your git repo as well. Thanks.
Question was asked if you can add and commit all in one line. The answer is yes. | |
git commit -a -m "my message" | |
with this one, the -a stages ALL changes to files that are already being tracked (new, untracked files are not included), even changes you didn't mean to commit yet, so this can be a dangerous command
git commit --only <file> -m "message"
this commits only the changes in <file> (which must already be tracked), leaving anything else that is staged out of the commit
or the shorter equivalent: git commit <file> -m "my message"
If you would like to join Github Education (gives free private repos among other stuff), you can sign up here: https://education.github.com/ ('join' at the top of page) - I highly recommend signing up!
Why did I need a username and password and the instructor did not get a prompt? | |
Sarah has her git/computer set up to automatically accept her username and password | |
Here is how you can set it up too: https://stackoverflow.com/questions/8588768/how-do-i-avoid-the-specification-of-the-username-and-password-at-every-git-push | |
- Recommend trying this maybe at lunch if you want to set it up | |
- I personally do not use it because, yes, it is tedious, but it can be a nice catch if I accidentally start to push to a remote that I don't want to. | |
Auto tab complete git commands: | |
https://git-scm.com/book/en/v1/Git-Basics-Tips-and-Tricks | |
At this website, the top section has a file you can download into your home directory (~). Then you'll have to edit your .bashrc file, which is located in your home directory, to add the line "source ~/git-completion.bash". I recommend just doing the very top part of this site (the other stuff is more complicated and less fixable if something gets messed up).
If I have 3 different versions in the history of a repo, does git save 3 copies of that repo, or only the differences between the versions? Similarly, if I have a branch and master, do I have two copies of everything on my disk, or just the one copy, and the differences from master? | |
Are you asking if you made 3 different branches in a repo and make different changes on all of them? | |
If you make a new branch, no, there is not a new copy of the files: a branch is just a lightweight pointer to a commit. Conceptually, each commit records a snapshot of the project, but a file that did not change is not stored again (the new commit simply points to the copy already stored), and Git compresses its storage behind the scenes, so the history takes up far less space than full copies would.
This is the same for the 3 versions in your history: three commits do not mean three full copies of the repo on disk, only the content that changed between them is new. This is how git tracks our changes.
Also, when you switch branches, you only have access to the state of the files that are on that branch. | |
Did that answer your question? | |
Yup, that got all of my questions. Thanks! | |
When you want to make changes in a git repo is it best to create a new branch, work on changes there, and then merge that branch with the master?
yes! that is a great and an encouraged workflow | |
If I want to use code without making modifications, but want to access any new code from the developer, would I clone, or should I fork anyway?
If you do not plan to change the code, just cloning the original repo without forking is a good idea. But forking and cloning shouldn't hurt either. In both cases, if the original repo changes, you will need to update your local copy with git pull.
Some questions from the morning that are good to answer but I don't plan on covering them this afternoon: | |
Does Git work with files stored in google drive? | |
Yes, there is a "hack-ish" way to do this. You can sync your google drive folder on your computer and then use "git init" and set it up as a git repository. When people make changes on a drive file, your git repo should show that there are changes that you need to commit and push. | |
Keep in mind that (1) these google files are likely binaries so it is harder to determine the differences, and (2) this process is not seamless. Sometimes the google drive doesn't sync properly and you can end up with two copies of the same file. (So I've been told).
Is there a way to fork and create a PR (pull request) from the command line? | |
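Yes, although forking is a GitHub feature rather than part of git itself, so plain git has no "fork" command. You first clone, and then use a command-line tool that talks to GitHub (for example, GitHub's "hub" tool provides "hub fork"): https://www.quora.com/Git-How-can-I-fork-a-repository-using-only-the-command-line
And yes, you can prepare a pull request from the command line; you have to know what commit you want to start from. Note that "git request-pull" prints a summary of your changes to send to the maintainer; it does not open a pull request on the GitHub website: https://git-scm.com/docs/git-request-pull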
Clarifying the difference between the "origin" and "upstream" terms: | |
These are both names for remote instances of a repository and the names are conventions or best practices | |
In general, the "origin" remote is one that your user will have permission to push changes directly into while the "upstream" remote is one that you do not have permission to push changes into. Therefore, the typical workflow would involve | |
1. forking someone else's repo (the upstream one), which creates a personal copy that is still remote (on a server)
2. cloning the personal, forked copy onto the computer where you do your work (e.g., your laptop). This process will automatically set the remote personal copy as the "origin"
3. committing changes locally and then pushing them back up to your personal remote copy, which is "origin"
4. submitting a pull request to merge your changes in your origin remote to the upstream | |
Step 4 may be approved or rejected by the owner of the upstream repo. | |
What is the difference between cloning to a remote server with git clone and pulling information from github to your computer? | |
Git repositories can be hosted on any computer. You could have them on a remote server at your workplace or on github (or other git hosting site). They both act as git remotes and can usually be treated in the same way. Github just happens to have a web interface. You can `git clone` to create a local copy of the remote repository, `git pull` to update your local copy with changes on the remote, and `git push` to push your local changes to the remote if you have permission. | |
What is the difference between clone, pull, and fetch? | |
Clone will create the repository locally, add the remote (origin), and pull the current version from the remote (default: master branch) | |
Pull will get the changes from the remote and merge them in to your current branch | |
Fetch is typically not needed in a "normal" workflow but acts as the synchronization step to let your local git repository know about the changes available on the remote, but it doesn't affect your working directory. In fact, as described in the `git pull --help` information, git pull is actually a combination of `git fetch` and `git merge`. | |
Be careful when copying and pasting from nano--if there is a "$" then your copy command will only copy up to that point. | |
import sys | |
# import the sys library | |
print(sys.argv) | |
# sys.argv is a list of values. This command instructs python to print the entire list | |
print(sys.argv[1]) | |
# sys.argv is a list of values. This command instructs python to only print the value at position 1 | |
# remember that lists always start at 0, not 1 | |
# the 0th item is always the name of the program, you usually will not use this value | |
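To see how the pieces of sys.argv fit together without running a script from the terminal, the sketch below splits an argv-style list by hand. The helper name and the file names one.csv and two.csv are made up for illustration; in a real script you would pass in sys.argv itself.

```python
def describe_arguments(argv):
    """Split an argv-style list into the script name and the remaining arguments."""
    script_name = argv[0]        # position 0 is the name of the program
    extra_arguments = argv[1:]   # everything typed after it (an empty list if nothing)
    return script_name, extra_arguments

# Simulates typing: python gdp_plots.py one.csv two.csv
name, extras = describe_arguments(["gdp_plots.py", "one.csv", "two.csv"])
print(name, extras)

# Simulates typing just: python gdp_plots.py
lonely_name, no_extras = describe_arguments(["gdp_plots.py"])
print(lonely_name, no_extras)
```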
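If you use print(), python will automatically convert whatever is inside the parentheses to a string. Be careful: this conversion only happens inside print() itself. If you build the text yourself, for example by joining a string and a number with +, nothing is converted automagically, and this will cause an error.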
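A common place where a conversion error shows up is when a string and a number are joined with + before printing. The sketch below (with made-up variable names) shows the working and failing forms side by side.

```python
count = 3

print("count is", count)             # print converts each argument for you

message = "count is " + str(count)   # joining with + needs an explicit str()
print(message)

try:
    broken = "count is " + count     # str + int is not allowed
except TypeError as error:
    broken = None
    print("TypeError:", error)
```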
for filename in *.csv # scan the current directory, select all files that end in ".csv", the * is a wildcard that tells Bash: "match anything" + "that ends in csv"
do
    python gdp_plots.py $filename
    # run the command "python gdp_plots.py" using the file that was assigned to the $filename variable
done
# python equivalent:
for filename in file_list:
    # for each item in the list "file_list", assign it to the variable filename, then do the following:
    data = pandas.read_csv(filename, index_col = 'counts')
    # read the values in the csv file and store them in the 'data' variable
    ax = data.plot(title=filename)
    # create a plot of the data, and store it in the 'ax' variable
    ax.set_xlabel('Year')
    # set the x label of ax to 'Year'
    ax.set_ylabel('GDP Per Capita')
    # set the y label of ax to 'GDP Per Capita'
    ax.set_xticks(range(len(data.index)))
    # this is a bit complicated, but with any python command, start from the innermost parentheses and work your way out, similar to solving a math problem
    # len(data.index) gets the number of rows in the data
    # range(...) turns that number into the positions 0, 1, 2, ... up to (but not including) it
    # set_xticks puts one tick mark on the x axis of ax at each of those positions
# the time command automagically starts when the command runs, and stops when it ends, basically a computer stopwatch. | |
# then it outputs what would be on the stopwatch screen | |
What real, sys, and user time means when using the time command: https://stackoverflow.com/questions/556405/what-do-real-user-and-sys-mean-in-the-output-of-time1 | |
if len(sys.argv) == 1:
    # IMPORTANT: sys.argv will always have one value, which is the name of the python script. If the length is 1, then nothing else was added
    # if the length of the list is 1, do the following:
    print('Not enough arguments have been provided')
    print('Usage: python gdp_plots.py <filename>')
    print('Options: -a plot all gdp data in the current directory')
    sys.exit()
    # or use exit()
    # exit the script, do not continue further
if file_list == []:
    # if file_list is empty, do the following
    # (do something)
# other ways to do the same:
#     if len(file_list) == 0:   # the length of file_list is 0, i.e. it is empty
#         do something
#     if not file_list:         # an empty list counts as false in a condition
#         do something
# careful: "if file_list == False:" does NOT work, because an empty list is not equal to False ([] == False evaluates to False)
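A quick check of how these emptiness tests behave on an empty list. The variable names are made up for the demo; note the last comparison in particular, since an empty list is never equal to False even though it counts as false inside an `if`.

```python
file_list = []

is_empty_by_comparison = file_list == []
is_empty_by_length = len(file_list) == 0
is_empty_by_truthiness = not file_list
equals_false = file_list == False   # a list is never equal to False

print(is_empty_by_comparison, is_empty_by_length, is_empty_by_truthiness, equals_false)
```

This prints `True True True False`, so `if not file_list:` is the reliable short form.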
if sys.argv[1] == '-a':
    # look at the sys.argv. if the value at position 1 is equal to '-a', then do the following:
    file_list = glob.glob("*gdp*.csv")
    # use the glob library (this needs "import glob" at the top of the script) to find any files that match using wildcards, this means
    # "match anything" + "match the string gdp" + "match anything" + "match .csv"
    # example: this will match the file helloworld_gdp_file.csv
    # this will not match the file hello_world.csv
else:
    file_list = sys.argv[1:]
    # take all arguments that you've provided, then add them to file_list
    # this will match anything that you've typed after the filename gdp_plots.py
    # this will cause an error if what you've typed is not a filename in the directory!
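The argument handling described in these notes can be pulled together into one small function. This is a sketch under stated assumptions, not the workshop's actual script: the function name `choose_files`, its `directory` parameter, and the file name oceania.csv are hypothetical, and the demo creates its own empty files in a temporary directory.

```python
import glob
import os
import tempfile

def choose_files(argv, directory="."):
    """Return the list of files to plot, following the logic in the notes above."""
    if len(argv) == 1:
        return []   # nothing was given after the script name
    if argv[1] == "-a":
        # "-a" means: every file in the directory that matches *gdp*.csv
        return sorted(glob.glob(os.path.join(directory, "*gdp*.csv")))
    return argv[1:]  # otherwise, use the names exactly as typed

# Demo in a scratch directory with the two example names from the notes.
scratch = tempfile.mkdtemp()
for name in ["helloworld_gdp_file.csv", "hello_world.csv"]:
    open(os.path.join(scratch, name), "w").close()

all_files = [os.path.basename(p) for p in choose_files(["gdp_plots.py", "-a"], scratch)]
named_files = choose_files(["gdp_plots.py", "oceania.csv"])
print(all_files, named_files)
```

Keeping the decision in a function that takes the argument list as a parameter makes it easy to try out different inputs without going back to the terminal each time.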
Standard python docstring conventions: | |
https://www.python.org/dev/peps/pep-0257/ | |
"Google style": https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html | |
"Numpy style": https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_numpy.html#example-numpy | |
More ACI/SWC helpful resources: https://github.com/UW-Madison-ACI/swcarpentry-workflows-in-practice/blob/master/resources.md | |
Wrap-up slides: https://docs.google.com/presentation/d/1CjkZkwr5E9VEheujYF9BfLCtBwxAmdOOCQGrvJES4fo/edit?usp=sharing | |