Created
April 13, 2019 05:41
-
-
Save ayulockin/7fd2c39cb15baf2d2815b35fce114063 to your computer and use it in GitHub Desktop.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Utility to prepare get MNIST dataset as csv\n", | |
"\n", | |
"### Instructions:\n", | |
"\n", | |
"- Download dataset from [here](http://yann.lecun.com/exdb/mnist/).\n", | |
"- Extract them into `MNIST_dataset/data` dir.\n", | |
"- Set the path to the files if required.\n", | |
"\n", | |
"Hurray you have the csv dataset. Now enjoy!" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Imports" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"import gzip\n", | |
"import os" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### About MNIST Hand Written Dataset\n", | |
"\n", | |
"- train-images-idx3-ubyte: training set images<br>\n", | |
"- train-labels-idx1-ubyte: training set labels<br>\n", | |
"- t10k-images-idx3-ubyte: test set images<br>\n", | |
"- t10k-labels-idx1-ubyte: test set labels<br>\n", | |
"\n", | |
"The training set contains 60000 examples, and the test set 10000 examples" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Setting path" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"path_to_dataset = os.getcwd()+'/MNIST_dataset/data'" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"files = os.listdir(path_to_dataset)\n", | |
"files" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Helper function" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"def MNISTtocsv(path, img_file, label_file, out_csv, n):\n", | |
" img = open(path+'/'+img_file, \"rb\")\n", | |
" label = open(path+'/'+label_file, \"rb\")\n", | |
" out = open(out_csv, \"w\")\n", | |
"\n", | |
" img.read(16)\n", | |
" label.read(8)\n", | |
" images = []\n", | |
"\n", | |
" for i in range(n):\n", | |
" image = [ord(label.read(1))]\n", | |
" for j in range(28*28):\n", | |
" image.append(ord(img.read(1)))\n", | |
" images.append(image)\n", | |
"\n", | |
" for image in images:\n", | |
" out.write(\",\".join(str(pix) for pix in image)+\"\\n\")\n", | |
" \n", | |
" img.close()\n", | |
" label.close()\n", | |
" out.close()\n", | |
"\n", | |
"MNISTtocsv(path_to_dataset, files[0], files[1], 'mnist_test.csv',10000)\n", | |
"MNISTtocsv(path_to_dataset, files[2], files[3], 'mnist_train.csv', 60000)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "Python 3", | |
"language": "python", | |
"name": "python3" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 3 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython3", | |
"version": "3.7.1" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 2 | |
} |
Thank you..surely will plot some..point taken 100%
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Beautifully structured @ayulockin. Clean code. Consider plotting some of the images. It will give a good sense of the data to the viewers.