Created
June 28, 2019 10:56
-
-
Save Sparker0i/561fce8e7be0f7c11c15e05590e11785 to your computer and use it in GitHub Desktop.
Created on Cognitive Class Labs
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"<h1>Reading Files Python</h1>" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"<p><strong>Welcome!</strong> This notebook will teach you about reading the text file in the Python Programming Language. By the end of this lab, you'll know how to read text files.</p>" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"<h2>Table of Contents</h2>\n", | |
"<div class=\"alert alert-block alert-info\" style=\"margin-top: 20px\">\n", | |
" <ul>\n", | |
" <li><a href=\"download\">Download Data</a></li>\n", | |
" <li><a href=\"read\">Reading Text Files</a></li>\n", | |
" <li><a href=\"better\">A Better Way to Open a File</a></li>\n", | |
" </ul>\n", | |
" <p>\n", | |
" Estimated time needed: <strong>40 min</strong>\n", | |
" </p>\n", | |
"</div>\n", | |
"\n", | |
"<hr>" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"<h2 id=\"download\">Download Data</h2>" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 1, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"--2019-06-28 10:56:13-- https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/PY0101EN/labs/example1.txt\n", | |
"Resolving s3-api.us-geo.objectstorage.softlayer.net (s3-api.us-geo.objectstorage.softlayer.net)... 67.228.254.193\n", | |
"Connecting to s3-api.us-geo.objectstorage.softlayer.net (s3-api.us-geo.objectstorage.softlayer.net)|67.228.254.193|:443... connected.\n", | |
"HTTP request sent, awaiting response... 200 OK\n", | |
"Length: 45 [text/plain]\n", | |
"Saving to: ‘/resources/data/Example1.txt’\n", | |
"\n", | |
"/resources/data/Exa 100%[===================>] 45 --.-KB/s in 0s \n", | |
"\n", | |
"2019-06-28 10:56:14 (20.7 MB/s) - ‘/resources/data/Example1.txt’ saved [45/45]\n", | |
"\n" | |
] | |
} | |
], | |
"source": [ | |
"# Download Example file\n", | |
"\n", | |
"!wget -O /resources/data/Example1.txt https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/PY0101EN/labs/example1.txt" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"<hr>" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"<h2 id=\"read\">Reading Text Files</h2>" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"One way to read or write a file in Python is to use the built-in <code>open</code> function. The <code>open</code> function provides a <b>File object</b> that contains the methods and attributes you need in order to read, save, and manipulate the file. In this notebook, we will only cover <b>.txt</b> files. The first parameter you need is the file path and the file name. An example is shown as follow:" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"<img src=\"https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/PY0101EN/Chapter%204/Images/ReadOpen.png\" width=\"500\" />" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
" The mode argument is optional and the default value is <b>r</b>. In this notebook we only cover two modes: \n", | |
"<ul>\n", | |
" <li><b>r</b> Read mode for reading files </li>\n", | |
" <li><b>w</b> Write mode for writing files</li>\n", | |
"</ul>" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"For the next example, we will use the text file <b>Example1.txt</b>. The file is shown as follow:" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"<img src=\"https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/PY0101EN/Chapter%204/Images/ReadFile.png\" width=\"200\" />" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
" We read the file: " | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 2, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# Read the Example1.txt\n", | |
"\n", | |
"example1 = \"/resources/data/Example1.txt\"\n", | |
"file1 = open(example1, \"r\")" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
" We can view the attributes of the file." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"The name of the file:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 3, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"'/resources/data/Example1.txt'" | |
] | |
}, | |
"execution_count": 3, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# Print the path of file\n", | |
"\n", | |
"file1.name" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
" The mode the file object is in:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 4, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"'r'" | |
] | |
}, | |
"execution_count": 4, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# Print the mode of file, either 'r' or 'w'\n", | |
"\n", | |
"file1.mode" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"We can read the file and assign it to a variable :" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 5, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"'This is line 1 \\nThis is line 2\\nThis is line 3'" | |
] | |
}, | |
"execution_count": 5, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# Read the file\n", | |
"\n", | |
"FileContent = file1.read()\n", | |
"FileContent" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"The <b>/n</b> means that there is a new line. " | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"We can print the file: " | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 6, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"This is line 1 \n", | |
"This is line 2\n", | |
"This is line 3\n" | |
] | |
} | |
], | |
"source": [ | |
"# Print the file with '\\n' as a new line\n", | |
"\n", | |
"print(FileContent)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"The file is of type string:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 7, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"str" | |
] | |
}, | |
"execution_count": 7, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# Type of file content\n", | |
"\n", | |
"type(FileContent)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
" We must close the file object:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 8, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"# Close file after finish\n", | |
"\n", | |
"file1.close()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"<hr>" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"<h2 id=\"better\">A Better Way to Open a File</h2>" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Using the <code>with</code> statement is better practice, it automatically closes the file even if the code encounters an exception. The code will run everything in the indent block then close the file object. " | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 9, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"This is line 1 \n", | |
"This is line 2\n", | |
"This is line 3\n" | |
] | |
} | |
], | |
"source": [ | |
"# Open file using with\n", | |
"\n", | |
"with open(example1, \"r\") as file1:\n", | |
" FileContent = file1.read()\n", | |
" print(FileContent)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"The file object is closed, you can verify it by running the following cell: " | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 10, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"True" | |
] | |
}, | |
"execution_count": 10, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# Verify if the file is closed\n", | |
"\n", | |
"file1.closed" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
" We can see the info in the file:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 11, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"This is line 1 \n", | |
"This is line 2\n", | |
"This is line 3\n" | |
] | |
} | |
], | |
"source": [ | |
"# See the content of file\n", | |
"\n", | |
"print(FileContent)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"The syntax is a little confusing as the file object is after the <code>as</code> statement. We also don’t explicitly close the file. Therefore we summarize the steps in a figure:" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"<img src=\"https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/PY0101EN/Chapter%204/Images/ReadWith.png\" width=\"500\" />" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"We don’t have to read the entire file, for example, we can read the first 4 characters by entering three as a parameter to the method **.read()**:\n" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 12, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"This\n" | |
] | |
} | |
], | |
"source": [ | |
"# Read first four characters\n", | |
"\n", | |
"with open(example1, \"r\") as file1:\n", | |
" print(file1.read(4))" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Once the method <code>.read(4)</code> is called the first 4 characters are called. If we call the method again, the next 4 characters are called. The output for the following cell will demonstrate the process for different inputs to the method <code>read()</code>:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 13, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"This\n", | |
" is \n", | |
"line 1 \n", | |
"\n", | |
"This is line 2\n" | |
] | |
} | |
], | |
"source": [ | |
"# Read certain amount of characters\n", | |
"\n", | |
"with open(example1, \"r\") as file1:\n", | |
" print(file1.read(4))\n", | |
" print(file1.read(4))\n", | |
" print(file1.read(7))\n", | |
" print(file1.read(15))" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"The process is illustrated in the below figure, and each color represents the part of the file read after the method <code>read()</code> is called:" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"<img src=\"https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/PY0101EN/Chapter%204/Images/ReadChar.png\" width=\"500\" />" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
" Here is an example using the same file, but instead we read 16, 5, and then 9 characters at a time: " | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 14, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"This is line 1 \n", | |
"\n", | |
"This \n", | |
"is line 2\n" | |
] | |
} | |
], | |
"source": [ | |
"# Read certain amount of characters\n", | |
"\n", | |
"with open(example1, \"r\") as file1:\n", | |
" print(file1.read(16))\n", | |
" print(file1.read(5))\n", | |
" print(file1.read(9))" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"We can also read one line of the file at a time using the method <code>readline()</code>: " | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 15, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"first line: This is line 1 \n", | |
"\n" | |
] | |
} | |
], | |
"source": [ | |
"# Read one line\n", | |
"\n", | |
"with open(example1, \"r\") as file1:\n", | |
" print(\"first line: \" + file1.readline())" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
" We can use a loop to iterate through each line: \n" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 16, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"Iteration 0 : This is line 1 \n", | |
"\n", | |
"Iteration 1 : This is line 2\n", | |
"\n", | |
"Iteration 2 : This is line 3\n" | |
] | |
} | |
], | |
"source": [ | |
"# Iterate through the lines\n", | |
"\n", | |
"with open(example1,\"r\") as file1:\n", | |
" i = 0;\n", | |
" for line in file1:\n", | |
" print(\"Iteration\", str(i), \": \", line)\n", | |
" i = i + 1;" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"We can use the method <code>readlines()</code> to save the text file to a list: " | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 17, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"# Read all lines and save as a list\n", | |
"\n", | |
"with open(example1, \"r\") as file1:\n", | |
" FileasList = file1.readlines()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
" Each element of the list corresponds to a line of text:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 18, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"'This is line 1 \\n'" | |
] | |
}, | |
"execution_count": 18, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# Print the first line\n", | |
"\n", | |
"FileasList[0]" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 19, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"'This is line 2\\n'" | |
] | |
}, | |
"execution_count": 19, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# Print the second line\n", | |
"\n", | |
"FileasList[1]" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 20, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"'This is line 3'" | |
] | |
}, | |
"execution_count": 20, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# Print the third line\n", | |
"\n", | |
"FileasList[2]" | |
] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "Python 3", | |
"language": "python", | |
"name": "python3" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 3 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython3", | |
"version": "3.6.8" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 2 | |
} |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment