Note that any code following a $ (e.g. $ pipenv install) should be run in the Terminal (Mac)/Command Prompt (Windows)/Command Line (Linux) in the appropriate directory.
Motivation: We want our research to be reproducible. Reproducibility requires that the person reproducing your research (1) has access to the same version of the code and (2) uses the exact same versions of the packages that the code needs. To satisfy these requirements, we use GitHub so the reproducer can get the exact version of the code, and pipenv so they can easily install and use the exact same versions of the packages.
Pipenv can be installed with a simple $ pip3 install pipenv
We install packages with pipenv just like with pip, but when we run $ pipenv install example_package, in addition to installing the package, pipenv adds it to the Pipfile and the Pipfile.lock file
(if these files do not exist for this project yet, pipenv creates them first).
The Pipfile is the less technical file, meant to be human-readable and editable, while the Pipfile.lock is the technical file that pipenv actually uses to install exact versions.
The Pipfile file is very similar to a requirements.txt file, but it is nicer because anytime we install a package using pipenv, it automatically adds that package
to the Pipfile. Like a requirements.txt, the Pipfile does not require the user to define a version of the package.
pipenv also adds the package to the Pipfile.lock file, and it always records which version was installed, even if we did not specify an exact version.
Therefore, even if we just wanted the most recent stable version at the time of installing a package, anyone reproducing our work in the future will know exactly which version that was. Examples of a Pipfile and a Pipfile.lock are below.
As stated above, the Pipfile.lock is the "machine-readable" version of the Pipfile, and it contains the exact versions that
the original authors used when they wrote the code. When you add more packages, you again use $ pipenv install example_package; pipenv will install that package, check that it has no dependency conflict with any package already installed, and finally update the Pipfile and the Pipfile.lock file.
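To make the idea of "locked" versions concrete, here is a minimal sketch of reading pinned versions out of a lock file with Python's standard json module. The sample text and the helper name pinned_versions are hypothetical and written only for illustration; a real Pipfile.lock has the same overall shape ("default" and "develop" sections with one "version" entry per package) but is much longer.

```python
import json

# A tiny hand-written stand-in for a real Pipfile.lock (hypothetical contents).
SAMPLE_LOCK = """
{
    "_meta": {"requires": {"python_version": "3.7"}},
    "default": {
        "example-package": {"version": "==1.2.3"},
        "another-package": {"version": "==0.4.0"}
    },
    "develop": {}
}
"""


def pinned_versions(lock_text, section="default"):
    """Return {package name: pinned version} for one section of a lock file."""
    lock = json.loads(lock_text)
    pins = {}
    for name, entry in lock.get(section, {}).items():
        # Entries installed from git or a local path may have no "version" key.
        version = entry.get("version")
        if version is not None:
            # Versions are stored as "==1.2.3"; drop the leading "==".
            pins[name] = version.lstrip("=")
    return pins


if __name__ == "__main__":
    for name, version in sorted(pinned_versions(SAMPLE_LOCK).items()):
        print(f"{name} {version}")
```

To try this on a real project, you could pass the text of your own Pipfile.lock (read with open(...).read()) instead of SAMPLE_LOCK.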
- Starting with a new project (i.e. no Pipfile), we can simply run $ pipenv install package1 package2 package3 and pipenv will create the virtual environment, create a Pipfile, install those packages, and then "lock" the versions in Pipfile.lock
- To enter the virtual environment, navigate to the folder with the Pipfile.lock file and run $ pipenv shell. This means that you step into the virtual environment, and all the Python packages that you installed will now be available
- If you need to add a new Python package, simply run $ pipenv install new_package and it will be added to the virtual environment and to the Pipfile as expected
- To exit the $ pipenv shell, just run $ exit
- Clone the repo and navigate in the console to where the Pipfile and the Pipfile.lock file are located
- Run $ pipenv sync and it will create a virtual environment in which the exact versions of all packages from the Pipfile.lock are installed
- Then just run $ pipenv shell to enter the virtual environment
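After syncing, it can be reassuring to confirm that what is installed really matches the pinned versions. The following is a minimal sketch using the standard importlib.metadata module (Python 3.8+). The helper names installed_version and find_mismatches are hypothetical, and pins is assumed to be a dictionary of package name to expected version that you have taken from your Pipfile.lock. The comparison is a plain string comparison, so it does not treat "1.10" and "1.10.0" as equal.

```python
from importlib import metadata


def installed_version(name):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Compare pinned versions against what `lookup` reports as installed.

    `pins` maps package name -> expected version string (e.g. "1.2.3").
    Returns {name: (expected, found)} for every package that differs;
    `found` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, expected in pins.items():
        found = lookup(name)
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    # Hypothetical pins; replace with the versions from your own Pipfile.lock.
    example_pins = {"example-package": "1.2.3"}
    for name, (expected, found) in find_mismatches(example_pins).items():
        print(f"{name}: expected {expected}, found {found}")
```

Run it with $ pipenv run python3 followed by the script name so that the lookup happens inside the virtual environment rather than in your system Python.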
- Instead of going into the shell with $ pipenv shell, you can also always run $ pipenv run <command> (e.g. $ pipenv run python3 example_module.py), which will run the <command> in the virtual environment without your console entering the shell. So, while you can run $ pipenv shell once and have all subsequent commands run in the virtual environment, with this approach you need to add $ pipenv run ... before every command. I never do it this way (except in Docker containers...). In other words,

      foo@bar:~$ pipenv shell
      foo@bar:~$ python3 myscript1.py
      foo@bar:~$ python3 myscript2.py
      foo@bar:~$ python3 myscript3.py
      foo@bar:~$ exit
      foo@bar:~$ python3 myscript1.py  # will throw ModuleNotFoundError since not in virtual environment

  is equivalent to

      foo@bar:~$ pipenv run python3 myscript1.py
      foo@bar:~$ pipenv run python3 myscript2.py
      foo@bar:~$ pipenv run python3 myscript3.py
      foo@bar:~$ python3 myscript1.py  # will throw ModuleNotFoundError since not in virtual environment
- Sometimes $ pipenv install (which basically re-installs/updates all the packages listed in your Pipfile) will take a long time to lock. Locking every time is not necessary, for example, when you are first starting a project and need to keep adding packages one by one. In this case, you can run $ pipenv install example-package --skip-lock, which will still correctly install the package and add it to your Pipfile but without the long wait. Once you are ready to lock the dependencies, you can run $ pipenv lock
- Pipenv can be annoying sometimes. If you run into a nonsensical problem, this almost always works:
  - Run $ pipenv --rm, which deletes the virtual environment
  - Delete Pipfile.lock: $ rm Pipfile.lock
  - Run $ pipenv install
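Regarding the $ pipenv shell versus $ pipenv run tip above: if you are unsure whether a script is actually running inside the virtual environment, a quick check from Python can help. This is a minimal sketch with hypothetical helper names. The first check relies on sys.prefix differing from sys.base_prefix inside a virtual environment. The second assumes that pipenv sets a PIPENV_ACTIVE environment variable for commands started through $ pipenv shell or $ pipenv run, so treat that part as an assumption to verify on your pipenv version.

```python
import os
import sys


def in_virtual_environment():
    """True when the running interpreter belongs to a virtual environment."""
    # Inside a virtual environment sys.prefix points at the environment,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def launched_by_pipenv(environ=os.environ):
    """True when PIPENV_ACTIVE is set to a non-empty value (assumed pipenv behavior)."""
    return bool(environ.get("PIPENV_ACTIVE"))


if __name__ == "__main__":
    print("In a virtual environment:", in_virtual_environment())
    print("Started through pipenv:", launched_by_pipenv())
```

Running this once with $ python3 and once with $ pipenv run python3 should show the difference between the two ways of launching a script.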
Anytime we have passwords or other sensitive information, we never ever ever want to (1) have it in our code or (2) commit it to GitHub. .env files allow us to keep sensitive information in the "environment" and out of our code.
Let's say you are trying to use a PostgreSQL database from a Python script. The usual way to do this is with the psycopg2 package.
import psycopg2

conn = psycopg2.connect(
    host='localhost',
    database='mydb',
    user='postgres',
    password='StopLookingAtMyPassword!!!!!'
)

But this is not okay because now anyone who sees your code can get your login and steal all of your data. Let's use a .env file to fix this.
- Create a .gitignore file if one does not already exist and add .env to it. This should ALWAYS be in the .gitignore from the very beginning of a project and will prevent the .env file from ever being committed to GitHub.
- Create a .env file in your root directory (preferably alongside the Pipfile and Pipfile.lock) and paste the following text inside:

      USERNAME="postgres"
      PASSWORD="StopLookingAtMyPassword!!!!!"

- Now, we are going to load our USERNAME and PASSWORD variables into the environment with pipenv:

      foo@bar:~$ pipenv shell
      Loading .env environment variables...
      Launching subshell in virtual environment...
      . ~/repo-WcdiAtXE/bin/activate
      foo@bar:~$ echo $PASSWORD
      StopLookingAtMyPassword!!!!!

  Note the "Loading .env environment variables..." line below the $ pipenv shell command. Now all Python code that we run from this directory using $ python3 example_script.py (and even within Jupyter notebooks!) will have access to these environment variables (detailed below). If we were to exit the pipenv environment (with $ exit) and run $ echo $PASSWORD again, it would be blank.
- Time to rewrite our code from above!

      import os
      import psycopg2

      conn = psycopg2.connect(
          host='localhost',
          database='mydb',
          user=os.environ['USERNAME'],
          password=os.environ['PASSWORD']
      )

  Note that we imported the os module and use os.environ (which behaves like a Python dictionary keyed on your environment variables) to grab our variables from the pipenv environment. Now, our code does not have any passwords!!!!
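One rough edge of the os.environ['PASSWORD'] pattern is that a forgotten $ pipenv shell (or a missing .env entry) surfaces as a bare KeyError. The following is a minimal sketch of a small wrapper that fails with a more helpful message instead. The function name require_env is hypothetical and not part of pipenv or psycopg2.

```python
import os


def require_env(name, environ=os.environ):
    """Return the value of an environment variable or fail with a clear hint."""
    value = environ.get(name)
    if not value:
        # Covers both a missing variable and one that is set but empty.
        raise RuntimeError(
            f"Environment variable {name!r} is not set. "
            "Is it defined in .env, and was the script started via pipenv?"
        )
    return value


if __name__ == "__main__":
    try:
        password = require_env("PASSWORD")
        print("PASSWORD is available (value not printed on purpose)")
    except RuntimeError as error:
        print(error)
```

In the connection code above, password=require_env('PASSWORD') could then stand in for password=os.environ['PASSWORD'].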
This is the human-readable/editable file, which is very similar in function to requirements.txt:
[[source]]
url = "https://pypi.org/simple"
verify_ssl = true
name = "pypi"
[packages]
jupyter = "*"
pandas = "*"
scikit-learn = "==0.23"
matplotlib = "*"
missingno = "*"
[dev-packages]
[requires]
python_version = "3.7"
This is the machine-readable version, which is automatically created or updated by the command pipenv install or pipenv lock and should not be edited by hand. Below is only part of a file because these files are very long:
{
"_meta": {
"hash": {
"sha256": "8e1e467da58950511f3b0d2ff6103a5d3f07dc1e7c7a5064a1245d73ed9cb646"
},
"pipfile-spec": 6,
"requires": {
"python_version": "3.7"
},
"sources": [
{
"name": "pypi",
"url": "https://pypi.org/simple",
"verify_ssl": true
}
]
},
"default": {
"appnope": {
"hashes": [
"sha256:93aa393e9d6c54c5cd570ccadd8edad61ea0c4b9ea7a01409020c9aa019eb442",
"sha256:dd83cd4b5b460958838f6eb3000c660b1f9caf2a5b1de4264e941512f603258a"
],
"markers": "sys_platform == 'darwin' and platform_system == 'Darwin'",
"version": "==0.1.2"
},
"argon2-cffi": {
"hashes": [
"sha256:05a8ac07c7026542377e38389638a8a1e9b78f1cd8439cd7493b39f08dd75fbf",
"sha256:0bf066bc049332489bb2d75f69216416329d9dc65deee127152caeb16e5ce7d5",
"sha256:18dee20e25e4be86680b178b35ccfc5d495ebd5792cd00781548d50880fee5c5",
"sha256:392c3c2ef91d12da510cfb6f9bae52512a4552573a9e27600bdb800e05905d2b",
"sha256:57358570592c46c420300ec94f2ff3b32cbccd10d38bdc12dc6979c4a8484fbc",
"sha256:6678bb047373f52bcff02db8afab0d2a77d83bde61cfecea7c5c62e2335cb203",
"sha256:6ea92c980586931a816d61e4faf6c192b4abce89aa767ff6581e6ddc985ed003",
"sha256:77e909cc756ef81d6abb60524d259d959bab384832f0c651ed7dcb6e5ccdbb78",
"sha256:7d455c802727710e9dfa69b74ccaab04568386ca17b0ad36350b622cd34606fe",
"sha256:8a84934bd818e14a17943de8099d41160da4a336bcc699bb4c394bbb9b94bd32",
"sha256:9bee3212ba4f560af397b6d7146848c32a800652301843df06b9e8f68f0f7361",
"sha256:9dfd5197852530294ecb5795c97a823839258dfd5eb9420233c7cfedec2058f2",
"sha256:b160416adc0f012fb1f12588a5e6954889510f82f698e23ed4f4fa57f12a0647",
"sha256:ba7209b608945b889457f949cc04c8e762bed4fe3fec88ae9a6b7765ae82e496",
"sha256:cc0e028b209a5483b6846053d5fd7165f460a1f14774d79e632e75e7ae64b82b",
"sha256:d8029b2d3e4b4cea770e9e5a0104dd8fa185c1724a0f01528ae4826a6d25f97d",
"sha256:da7f0445b71db6d3a72462e04f36544b0de871289b0bc8a7cc87c0f5ec7079fa",
"sha256:e2db6e85c057c16d0bd3b4d2b04f270a7467c147381e8fd73cbbe5bc719832be"
],
"version": "==20.1.0"
},
"async-generator": {
"hashes": [
"sha256:01c7bf666359b4967d2cda0000cc2e4af16a0ae098cbffcb8472fb9e8ad6585b",
"sha256:6ebb3d106c12920aaae42ccb6f787ef5eefdcdd166ea3d628fa8476abe712144"
],
"markers": "python_version >= '3.5'",
"version": "==1.10"