@kokes
Created March 6, 2016 16:34
Now-dated tinkering with EU structural funds data
{"nbformat_minor": 0, "cells": [{"source": "# Structural funds", "cell_type": "markdown", "metadata": {}}, {"execution_count": 94, "cell_type": "code", "source": "from pyquery import PyQuery as pq\nimport pandas as pd\npd.set_option('display.precision', 2)\nimport numpy as np\nimport urllib\n\nimport PyPDF2\nfrom StringIO import StringIO\nimport re\n\nimport sys\nreload(sys) # Reload does the trick!\nsys.setdefaultencoding('UTF8')", "outputs": [], "metadata": {"collapsed": true, "trusted": false}}, {"source": "## List of government institutions", "cell_type": "markdown", "metadata": {}}, {"source": "The Ministry of Finance (MF\u010cR) publishes the list [on its website](http://www.mfcr.cz/cs/verejny-sektor/hospodareni/rozpoctove-ramce-statisticke-informace/verejny-sektor/sektor-vladnich-instituci).", "cell_type": "markdown", "metadata": {}}, {"execution_count": 7, "cell_type": "code", "source": "mfcr_url = 'http://www.mfcr.cz/assets/cs/media/Rozp-ramce-EU-85-2011_2015_Seznam-vladnich-instituci-v-CR_v01.xlsx'\nmfcr_fn, headers = urllib.urlretrieve(mfcr_url)", "outputs": [], "metadata": {"collapsed": true, "trusted": false}}, {"execution_count": 8, "cell_type": "code", "source": "mfcr_xls = pd.ExcelFile(mfcr_fn)\nmfcr_xls.sheet_names", "outputs": [{"execution_count": 8, "output_type": "execute_result", "data": {"text/plain": "[u'Sektor vl\\xe1dn\\xedch instituc\\xed-S.13',\n u'\\u010c\\xedseln\\xedk - pr\\xe1vn\\xedch forem']"}, "metadata": {}}], "metadata": {"collapsed": false, "trusted": false}}, {"execution_count": 81, "cell_type": "code", "source": "stfir = mfcr_xls.parse(mfcr_xls.sheet_names[0], skiprows=range(3))\nstfir = stfir.iloc[:,(1,2,5)]\nstfir.columns = 'ico, nazev, nace'.split(', ')\n#stfir.ico = stfir.ico.astype('string')\nstfir.head(5)", "outputs": [{"execution_count": 81, "output_type": "execute_result", "data": {"text/plain": " ico nazev nace\n0 25217968 Franti\u0161kol\u00e1ze\u0148sk\u00e1 v\u00fdtopna, s.r.o. 6820\n1 25399039 OBEC-INVEST, s.r.o. 
68310\n2 25700898 Domov D\u0159ev\u010dice, spol. s r.o. 6820\n3 26076357 Strakonick\u00e1 televize, s.r.o. 60200\n4 27577708 Divadla Kladno s.r.o. 90020", "text/html": "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>ico</th>\n <th>nazev</th>\n <th>nace</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>25217968</td>\n <td>Franti\u0161kol\u00e1ze\u0148sk\u00e1 v\u00fdtopna, s.r.o.</td>\n <td>6820</td>\n </tr>\n <tr>\n <th>1</th>\n <td>25399039</td>\n <td>OBEC-INVEST, s.r.o.</td>\n <td>68310</td>\n </tr>\n <tr>\n <th>2</th>\n <td>25700898</td>\n <td>Domov D\u0159ev\u010dice, spol. s r.o.</td>\n <td>6820</td>\n </tr>\n <tr>\n <th>3</th>\n <td>26076357</td>\n <td>Strakonick\u00e1 televize, s.r.o.</td>\n <td>60200</td>\n </tr>\n <tr>\n <th>4</th>\n <td>27577708</td>\n <td>Divadla Kladno s.r.o.</td>\n <td>90020</td>\n </tr>\n </tbody>\n</table>\n</div>"}, "metadata": {}}], "metadata": {"collapsed": false, "trusted": false}}, {"source": "Budvar (the state-owned brewery) is not on the list.", "cell_type": "markdown", "metadata": {}}, {"execution_count": 156, "cell_type": "code", "source": "stfir[stfir.nazev.str.contains('podnik')]", "outputs": [{"execution_count": 156, "output_type": "execute_result", "data": {"text/plain": " ico nazev nace\n40 25125877 BALMED Praha, st\u00e1tn\u00ed podnik 68202\n459 71377999 Agentura pro podporu podnik\u00e1n\u00ed a investic Czec... 70220\n695 100340 St\u0159edn\u00ed odborn\u00e1 \u0161kola a St\u0159edn\u00ed odborn\u00e9 u\u010dili\u0161... 85322\n1010 575933 St\u0159edn\u00ed \u0161kola slu\u017eeb a podnik\u00e1n\u00ed, Ostrava-Poru... 85321\n1981 47008539 M\u011bstsk\u00fd bytov\u00fd podnik Kralupy nad Vltavou 6820\n2775 49629077 St\u0159edn\u00ed odborn\u00e9 u\u010dili\u0161t\u011b gastronomie a podnik\u00e1n\u00ed 85321\n2918 60075953 St\u0159edn\u00ed \u0161kola obchodu, slu\u017eeb a podnik\u00e1n\u00ed a Vy... 
85322\n2988 60153687 Bytov\u00fd podnik Vrchlab\u00ed, p\u0159\u00edsp\u011bvkov\u00e1 organizace 6820\n4115 63731371 St\u0159edn\u00ed \u0161kola automobiln\u00ed, mechanizace a podni... 85322\n4289 64669033 Bytov\u00fd podnik m\u011bsta \u017delezn\u00e9ho Brodu 6820\n5934 70947066 M\u011bstsk\u00fd bytov\u00fd podnik Karolinka, p\u0159\u00edsp\u011bvkov\u00e1 o... 6820\n8210 72053666 Karlovarsk\u00e1 agentura rozvoje podnik\u00e1n\u00ed, p\u0159\u00edsp\u011b... 69200\n8225 72068272 M\u011bstsk\u00fd kulturn\u00ed podnik - FIDIKO \u017damberk 90040", "text/html": "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>ico</th>\n <th>nazev</th>\n <th>nace</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>40</th>\n <td>25125877</td>\n <td>BALMED Praha, st\u00e1tn\u00ed podnik</td>\n <td>68202</td>\n </tr>\n <tr>\n <th>459</th>\n <td>71377999</td>\n <td>Agentura pro podporu podnik\u00e1n\u00ed a investic Czec...</td>\n <td>70220</td>\n </tr>\n <tr>\n <th>695</th>\n <td>100340</td>\n <td>St\u0159edn\u00ed odborn\u00e1 \u0161kola a St\u0159edn\u00ed odborn\u00e9 u\u010dili\u0161...</td>\n <td>85322</td>\n </tr>\n <tr>\n <th>1010</th>\n <td>575933</td>\n <td>St\u0159edn\u00ed \u0161kola slu\u017eeb a podnik\u00e1n\u00ed, Ostrava-Poru...</td>\n <td>85321</td>\n </tr>\n <tr>\n <th>1981</th>\n <td>47008539</td>\n <td>M\u011bstsk\u00fd bytov\u00fd podnik Kralupy nad Vltavou</td>\n <td>6820</td>\n </tr>\n <tr>\n <th>2775</th>\n <td>49629077</td>\n <td>St\u0159edn\u00ed odborn\u00e9 u\u010dili\u0161t\u011b gastronomie a podnik\u00e1n\u00ed</td>\n <td>85321</td>\n </tr>\n <tr>\n <th>2918</th>\n <td>60075953</td>\n <td>St\u0159edn\u00ed \u0161kola obchodu, slu\u017eeb a podnik\u00e1n\u00ed a Vy...</td>\n <td>85322</td>\n </tr>\n <tr>\n <th>2988</th>\n <td>60153687</td>\n <td>Bytov\u00fd podnik Vrchlab\u00ed, p\u0159\u00edsp\u011bvkov\u00e1 organizace</td>\n 
<td>6820</td>\n </tr>\n <tr>\n <th>4115</th>\n <td>63731371</td>\n <td>St\u0159edn\u00ed \u0161kola automobiln\u00ed, mechanizace a podni...</td>\n <td>85322</td>\n </tr>\n <tr>\n <th>4289</th>\n <td>64669033</td>\n <td>Bytov\u00fd podnik m\u011bsta \u017delezn\u00e9ho Brodu</td>\n <td>6820</td>\n </tr>\n <tr>\n <th>5934</th>\n <td>70947066</td>\n <td>M\u011bstsk\u00fd bytov\u00fd podnik Karolinka, p\u0159\u00edsp\u011bvkov\u00e1 o...</td>\n <td>6820</td>\n </tr>\n <tr>\n <th>8210</th>\n <td>72053666</td>\n <td>Karlovarsk\u00e1 agentura rozvoje podnik\u00e1n\u00ed, p\u0159\u00edsp\u011b...</td>\n <td>69200</td>\n </tr>\n <tr>\n <th>8225</th>\n <td>72068272</td>\n <td>M\u011bstsk\u00fd kulturn\u00ed podnik - FIDIKO \u017damberk</td>\n <td>90040</td>\n </tr>\n </tbody>\n</table>\n</div>"}, "metadata": {}}], "metadata": {"collapsed": false, "trusted": false}}, {"execution_count": null, "cell_type": "code", "source": "", "outputs": [], "metadata": {"collapsed": true, "trusted": false}}, {"execution_count": 76, "cell_type": "code", "source": "# pad I\u010cO values back to 8 digits; converting them to int stripped their leading zeros\n# for ind, ico in zip(stfir.index, stfir.ico):\n#     if len(ico) == 8: continue\n#     stfir.loc[ind,'ico'] = '0'*(8-len(ico)) + ico", "outputs": [], "metadata": {"collapsed": false, "trusted": false}}, {"source": "## Overview of subsidies\nThe current spreadsheet is on the [Ministry for Regional Development (MMR) website](http://www.strukturalni-fondy.cz/cs/Informace-o-cerpani/Seznamy-prijemcu). It is updated every month, so we first have to look up the current download link.", "cell_type": "markdown", "metadata": {"collapsed": true}}, {"execution_count": 31, "cell_type": "code", "source": "mmr_url = 'http://www.strukturalni-fondy.cz/cs/Informace-o-cerpani/Seznamy-prijemcu'\nmmr = pq(mmr_url)", "outputs": [], "metadata": {"collapsed": true, "trusted": false}}, {"execution_count": 32, "cell_type": "code", "source": "seznam = mmr.find(\"ul#p_lt_zoneContent_VypisSouboru1_rTree2_ctl00_item_tree a\").attr('href')\nszn_fn, headers = urllib.urlretrieve('http://www.strukturalni-fondy.cz' + seznam)", "outputs": [], "metadata": {"collapsed": false, "trusted": false}}, {"execution_count": 74, "cell_type": "code", "source": "dot = pd.read_excel(szn_fn, skiprows=range(8))\ndot.columns = 'firma, ico, popis, program, fond, datum_alokace, castka_alok, datum_platby, castka_propl, stav'.split(', ')\n\ndot.head(5)", "outputs": [{"execution_count": 74, "output_type": "execute_result", "data": {"text/plain": " firma ico \\\n0 \" A L L E G R O \" s.r.o. 48951862 \n1 \" A L L E G R O \" s.r.o. 48951862 \n2 \" A L L E G R O \" s.r.o. 48951862 \n3 \" A L L E G R O \" s.r.o. 48951862 \n4 \" IZOS s.r.o. \" 47285338 \n\n popis program \\\n0 Inovace v\u00fdroby interi\u00e9rov\u00fdch kovov\u00fdch\\nstavebn... OP Podnik\u00e1n\u00ed a inovace \n1 Podpora exportu firmy Allegro s.r.o. na\\nSlove... 
OP Podnik\u00e1n\u00ed a inovace \n2 Realizace v\u00fdroby kovov\u00fdch stropn\u00edch podhled\u016f OP Podnik\u00e1n\u00ed a inovace \n3 Rekonstrukce objektu v Bratkovic\u00edch OP Podnik\u00e1n\u00ed a inovace \n4 CONECO Bratislava OP Podnik\u00e1n\u00ed a inovace \n\n fond datum_alokace castka_alok datum_platby castka_propl stav \n0 ERDF 19.10.2009 14875000 02.11.2012 14304447 Finalized \n1 ERDF 08.10.2008 1615000 30.09.2009 1164770 Finalized \n2 ERDF 21.03.2008 14869900 23.09.2008 14869900 Finalized \n3 ERDF 07.09.2010 11059350 NaN 0 Cancelled \n4 ERDF 02.09.2010 381650 01.09.2011 285450 Finalized ", "text/html": "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>firma</th>\n <th>ico</th>\n <th>popis</th>\n <th>program</th>\n <th>fond</th>\n <th>datum_alokace</th>\n <th>castka_alok</th>\n <th>datum_platby</th>\n <th>castka_propl</th>\n <th>stav</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>\" A L L E G R O \" s.r.o.</td>\n <td>48951862</td>\n <td>Inovace v\u00fdroby interi\u00e9rov\u00fdch kovov\u00fdch\\nstavebn...</td>\n <td>OP Podnik\u00e1n\u00ed a inovace</td>\n <td>ERDF</td>\n <td>19.10.2009</td>\n <td>14875000</td>\n <td>02.11.2012</td>\n <td>14304447</td>\n <td>Finalized</td>\n </tr>\n <tr>\n <th>1</th>\n <td>\" A L L E G R O \" s.r.o.</td>\n <td>48951862</td>\n <td>Podpora exportu firmy Allegro s.r.o. 
na\\nSlove...</td>\n <td>OP Podnik\u00e1n\u00ed a inovace</td>\n <td>ERDF</td>\n <td>08.10.2008</td>\n <td>1615000</td>\n <td>30.09.2009</td>\n <td>1164770</td>\n <td>Finalized</td>\n </tr>\n <tr>\n <th>2</th>\n <td>\" A L L E G R O \" s.r.o.</td>\n <td>48951862</td>\n <td>Realizace v\u00fdroby kovov\u00fdch stropn\u00edch podhled\u016f</td>\n <td>OP Podnik\u00e1n\u00ed a inovace</td>\n <td>ERDF</td>\n <td>21.03.2008</td>\n <td>14869900</td>\n <td>23.09.2008</td>\n <td>14869900</td>\n <td>Finalized</td>\n </tr>\n <tr>\n <th>3</th>\n <td>\" A L L E G R O \" s.r.o.</td>\n <td>48951862</td>\n <td>Rekonstrukce objektu v Bratkovic\u00edch</td>\n <td>OP Podnik\u00e1n\u00ed a inovace</td>\n <td>ERDF</td>\n <td>07.09.2010</td>\n <td>11059350</td>\n <td>NaN</td>\n <td>0</td>\n <td>Cancelled</td>\n </tr>\n <tr>\n <th>4</th>\n <td>\" IZOS s.r.o. \"</td>\n <td>47285338</td>\n <td>CONECO Bratislava</td>\n <td>OP Podnik\u00e1n\u00ed a inovace</td>\n <td>ERDF</td>\n <td>02.09.2010</td>\n <td>381650</td>\n <td>01.09.2011</td>\n <td>285450</td>\n <td>Finalized</td>\n </tr>\n </tbody>\n</table>\n</div>"}, "metadata": {}}], "metadata": {"collapsed": false, "trusted": false}}, {"execution_count": 53, "cell_type": "code", "source": "np.shape(dot)", "outputs": [{"execution_count": 53, "output_type": "execute_result", "data": {"text/plain": "(65780, 10)"}, "metadata": {}}], "metadata": {"collapsed": false, "trusted": false}}, {"execution_count": 65, "cell_type": "code", "source": "dot.stav.value_counts()", "outputs": [{"execution_count": 65, "output_type": "execute_result", "data": {"text/plain": "Finalized 44718\nOngoing 18332\nCancelled 2729\ndtype: int64"}, "metadata": {}}], "metadata": {"collapsed": false, "trusted": false}}, {"execution_count": 104, "cell_type": "code", "source": "dot=dot[dot.stav.isin(['Finalized', 'Ongoing'])]\ndot.stav.value_counts()", "outputs": [{"execution_count": 104, "output_type": "execute_result", "data": {"text/plain": "Finalized 44718\nOngoing 
18332\ndtype: int64"}, "metadata": {}}], "metadata": {"collapsed": false, "trusted": false}}, {"execution_count": 105, "cell_type": "code", "source": "dot.program.value_counts()", "outputs": [{"execution_count": 105, "output_type": "execute_result", "data": {"text/plain": "OP \u017divotn\u00ed prost\u0159ed\u00ed 16818\nOP Podnik\u00e1n\u00ed a inovace 11976\nOP Vzd\u011bl\u00e1v\u00e1n\u00ed pro\\nkonkurenceschopnost 10592\nIntegrovan\u00fd opera\u010dn\u00ed program 8491\nOP Lidsk\u00e9 zdroje a zam\u011bstnanost 5404\nOP Ryb\u00e1\u0159stv\u00ed 1182\nROP NUTS II St\u0159edn\u00ed Morava 1172\nROP NUTS II Jihoz\u00e1pad 1117\nROP NUTS II St\u0159edn\u00ed \u010cechy 1030\nOP Praha Adaptabilita 1021\nROP NUTS II Moravskoslezsko 996\nROP NUTS II Jihov\u00fdchod 925\nROP NUTS II Severov\u00fdchod 720\nROP NUTS II Severoz\u00e1pad 490\nOP Praha Konkurenceschopnost 352\nOP Technick\u00e1 pomoc 328\nOP Doprava 244\nOP V\u00fdzkum a v\u00fdvoj pro inovace 192\ndtype: int64"}, "metadata": {}}], "metadata": {"scrolled": false, "collapsed": false, "trusted": false}}, {"source": "OP Doprava (transport) has few projects but a lot of money. TODO: allocation per average project", "cell_type": "markdown", "metadata": {}}, {"execution_count": 111, "cell_type": "code", "source": "dot.groupby('program').sum().loc[:,'castka_alok']/10**9", "outputs": [{"execution_count": 111, "output_type": "execute_result", "data": {"text/plain": "program\nIntegrovan\u00fd opera\u010dn\u00ed program 43.0\nOP Doprava 152.5\nOP Lidsk\u00e9 zdroje a zam\u011bstnanost 58.8\nOP Podnik\u00e1n\u00ed a inovace 93.6\nOP Praha Adaptabilita 3.0\nOP Praha Konkurenceschopnost 7.0\nOP Ryb\u00e1\u0159stv\u00ed 0.7\nOP Technick\u00e1 pomoc 4.2\nOP Vzd\u011bl\u00e1v\u00e1n\u00ed pro\\nkonkurenceschopnost 46.3\nOP V\u00fdzkum a v\u00fdvoj pro inovace 50.8\nOP \u017divotn\u00ed prost\u0159ed\u00ed 118.3\nROP NUTS II Jihov\u00fdchod 22.5\nROP NUTS II Jihoz\u00e1pad 18.9\nROP NUTS II Moravskoslezsko 19.4\nROP NUTS II Severov\u00fdchod 17.7\nROP NUTS II Severoz\u00e1pad 18.2\nROP NUTS II St\u0159edn\u00ed Morava 18.0\nROP NUTS II St\u0159edn\u00ed \u010cechy 17.5\nName: castka_alok, dtype: float64"}, "metadata": {}}], "metadata": {"scrolled": true, "collapsed": false, "trusted": false}}, {"source": "## State-owned enterprises", "cell_type": "markdown", "metadata": {}}, {"source": "Public-sector recipients take this much (in billions):", "cell_type": "markdown", "metadata": {}}, {"execution_count": 112, "cell_type": "code", "source": "dot.loc[dot.ico.isin(stfir.ico),'castka_alok'].sum()/10**9", "outputs": [{"execution_count": 112, "output_type": "execute_result", "data": {"text/plain": "500.91038528135999"}, "metadata": {}}], "metadata": {"collapsed": false, "trusted": false}}, {"source": "That is out of a total pool of about 710 billion, i.e. roughly 70 %.", "cell_type": "markdown", "metadata": {}}, {"execution_count": 115, "cell_type": "code", "source": "dot.castka_alok.sum()/10**9", "outputs": [{"execution_count": 115, "output_type": "execute_result", "data": {"text/plain": "710.27491644792008"}, "metadata": {}}], "metadata": {"collapsed": false, "trusted": false}}, {"source": 
"Of course, the list lacks enterprises set up by regions and municipalities, such as public transport companies and hospitals, so the public-sector total above is, if anything, an underestimate.", "cell_type": "markdown", "metadata": {}}, {"source": "## Big fish", "cell_type": "markdown", "metadata": {}}, {"source": "This still needs sorting by the total amount per recipient.", "cell_type": "markdown", "metadata": {}}, {"execution_count": 129, "cell_type": "code", "source": "dot.groupby('ico').sum()", "outputs": [{"execution_count": 129, "output_type": "execute_result", "data": {"text/plain": " castka_alok castka_propl\nico \n0 4.3e+08 4.3e+08\n205 3.1e+07 1.8e+07\n1171 8.5e+08 6.9e+08\n1350 2.9e+06 2.6e+06\n1481 2.5e+07 6.5e+06\n1490 6.3e+06 1.4e+05\n2739 1.7e+09 1.2e+09\n5886 9.7e+09 1.8e+09\n6033 4.5e+06 2.0e+06\n6599 1.5e+08 1.1e+08\n6947 9.3e+08 7.2e+08\n6963 4.1e+08 2.8e+08\n7064 8.1e+09 3.1e+09\n7536 1.5e+09 8.2e+08\n8648 4.8e+07 2.9e+07\n8702 1.9e+08 1.5e+08\n8753 1.1e+07 3.6e+06\n9971 3.8e+07 2.5e+07\n10235 4.1e+06 4.1e+06\n10367 1.1e+07 4.7e+06\n10545 1.0e+07 9.9e+06\n10944 3.3e+07 1.9e+07\n11835 1.5e+08 1.1e+08\n12114 3.0e+06 2.1e+06\n12122 4.0e+07 1.4e+07\n12131 2.5e+07 1.8e+07\n12190 5.7e+07 1.3e+07\n12556 1.6e+06 5.5e+05\n12645 1.2e+07 3.8e+06\n13251 4.8e+06 4.0e+06\n... ... 
...\n76168051 1.3e+06 1.0e+06\n76222071 1.6e+06 1.6e+06\n76614174 3.8e+06 3.8e+06\n76626270 1.0e+05 1.0e+05\n86594265 3.9e+05 3.9e+05\n86652036 8.5e+06 8.4e+06\n86726889 4.3e+05 4.3e+05\n86770713 1.0e+07 8.7e+06\n86796968 1.2e+06 1.2e+06\n86797042 7.1e+05 7.1e+05\n86870971 8.5e+04 8.5e+04\n86891332 8.5e+04 8.5e+04\n86952480 3.5e+05 3.4e+05\n86956931 4.7e+06 4.7e+06\n87074079 2.8e+06 2.8e+06\n87149257 3.0e+07 3.0e+07\n87157624 1.7e+05 1.6e+05\n87192993 2.1e+05 2.1e+05\n87592827 8.6e+05 6.8e+05\n87847795 2.1e+06 2.1e+06\n87964911 1.9e+06 1.9e+06\n88067807 1.3e+06 1.3e+06\n88068331 5.4e+06 5.4e+06\n88124304 1.3e+06 1.3e+06\n88387402 1.9e+06 1.3e+06\n88701221 8.5e+05 5.1e+05\n88751287 3.3e+06 3.3e+06\n88799387 4.0e+06 3.8e+06\n88845907 6.2e+06 5.7e+06\n88893472 8.3e+04 0.0e+00\n\n[24348 rows x 2 columns]", "text/html": "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>castka_alok</th>\n <th>castka_propl</th>\n </tr>\n <tr>\n <th>ico</th>\n <th></th>\n <th></th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>4.3e+08</td>\n <td>4.3e+08</td>\n </tr>\n <tr>\n <th>205</th>\n <td>3.1e+07</td>\n <td>1.8e+07</td>\n </tr>\n <tr>\n <th>1171</th>\n <td>8.5e+08</td>\n <td>6.9e+08</td>\n </tr>\n <tr>\n <th>1350</th>\n <td>2.9e+06</td>\n <td>2.6e+06</td>\n </tr>\n <tr>\n <th>1481</th>\n <td>2.5e+07</td>\n <td>6.5e+06</td>\n </tr>\n <tr>\n <th>1490</th>\n <td>6.3e+06</td>\n <td>1.4e+05</td>\n </tr>\n <tr>\n <th>2739</th>\n <td>1.7e+09</td>\n <td>1.2e+09</td>\n </tr>\n <tr>\n <th>5886</th>\n <td>9.7e+09</td>\n <td>1.8e+09</td>\n </tr>\n <tr>\n <th>6033</th>\n <td>4.5e+06</td>\n <td>2.0e+06</td>\n </tr>\n <tr>\n <th>6599</th>\n <td>1.5e+08</td>\n <td>1.1e+08</td>\n </tr>\n <tr>\n <th>6947</th>\n <td>9.3e+08</td>\n <td>7.2e+08</td>\n </tr>\n <tr>\n <th>6963</th>\n <td>4.1e+08</td>\n <td>2.8e+08</td>\n </tr>\n <tr>\n <th>7064</th>\n <td>8.1e+09</td>\n 
<td>3.1e+09</td>\n </tr>\n <tr>\n <th>7536</th>\n <td>1.5e+09</td>\n <td>8.2e+08</td>\n </tr>\n <tr>\n <th>8648</th>\n <td>4.8e+07</td>\n <td>2.9e+07</td>\n </tr>\n <tr>\n <th>8702</th>\n <td>1.9e+08</td>\n <td>1.5e+08</td>\n </tr>\n <tr>\n <th>8753</th>\n <td>1.1e+07</td>\n <td>3.6e+06</td>\n </tr>\n <tr>\n <th>9971</th>\n <td>3.8e+07</td>\n <td>2.5e+07</td>\n </tr>\n <tr>\n <th>10235</th>\n <td>4.1e+06</td>\n <td>4.1e+06</td>\n </tr>\n <tr>\n <th>10367</th>\n <td>1.1e+07</td>\n <td>4.7e+06</td>\n </tr>\n <tr>\n <th>10545</th>\n <td>1.0e+07</td>\n <td>9.9e+06</td>\n </tr>\n <tr>\n <th>10944</th>\n <td>3.3e+07</td>\n <td>1.9e+07</td>\n </tr>\n <tr>\n <th>11835</th>\n <td>1.5e+08</td>\n <td>1.1e+08</td>\n </tr>\n <tr>\n <th>12114</th>\n <td>3.0e+06</td>\n <td>2.1e+06</td>\n </tr>\n <tr>\n <th>12122</th>\n <td>4.0e+07</td>\n <td>1.4e+07</td>\n </tr>\n <tr>\n <th>12131</th>\n <td>2.5e+07</td>\n <td>1.8e+07</td>\n </tr>\n <tr>\n <th>12190</th>\n <td>5.7e+07</td>\n <td>1.3e+07</td>\n </tr>\n <tr>\n <th>12556</th>\n <td>1.6e+06</td>\n <td>5.5e+05</td>\n </tr>\n <tr>\n <th>12645</th>\n <td>1.2e+07</td>\n <td>3.8e+06</td>\n </tr>\n <tr>\n <th>13251</th>\n <td>4.8e+06</td>\n <td>4.0e+06</td>\n </tr>\n <tr>\n <th>...</th>\n <td>...</td>\n <td>...</td>\n </tr>\n <tr>\n <th>76168051</th>\n <td>1.3e+06</td>\n <td>1.0e+06</td>\n </tr>\n <tr>\n <th>76222071</th>\n <td>1.6e+06</td>\n <td>1.6e+06</td>\n </tr>\n <tr>\n <th>76614174</th>\n <td>3.8e+06</td>\n <td>3.8e+06</td>\n </tr>\n <tr>\n <th>76626270</th>\n <td>1.0e+05</td>\n <td>1.0e+05</td>\n </tr>\n <tr>\n <th>86594265</th>\n <td>3.9e+05</td>\n <td>3.9e+05</td>\n </tr>\n <tr>\n <th>86652036</th>\n <td>8.5e+06</td>\n <td>8.4e+06</td>\n </tr>\n <tr>\n <th>86726889</th>\n <td>4.3e+05</td>\n <td>4.3e+05</td>\n </tr>\n <tr>\n <th>86770713</th>\n <td>1.0e+07</td>\n <td>8.7e+06</td>\n </tr>\n <tr>\n <th>86796968</th>\n <td>1.2e+06</td>\n <td>1.2e+06</td>\n </tr>\n <tr>\n <th>86797042</th>\n <td>7.1e+05</td>\n <td>7.1e+05</td>\n 
</tr>\n <tr>\n <th>86870971</th>\n <td>8.5e+04</td>\n <td>8.5e+04</td>\n </tr>\n <tr>\n <th>86891332</th>\n <td>8.5e+04</td>\n <td>8.5e+04</td>\n </tr>\n <tr>\n <th>86952480</th>\n <td>3.5e+05</td>\n <td>3.4e+05</td>\n </tr>\n <tr>\n <th>86956931</th>\n <td>4.7e+06</td>\n <td>4.7e+06</td>\n </tr>\n <tr>\n <th>87074079</th>\n <td>2.8e+06</td>\n <td>2.8e+06</td>\n </tr>\n <tr>\n <th>87149257</th>\n <td>3.0e+07</td>\n <td>3.0e+07</td>\n </tr>\n <tr>\n <th>87157624</th>\n <td>1.7e+05</td>\n <td>1.6e+05</td>\n </tr>\n <tr>\n <th>87192993</th>\n <td>2.1e+05</td>\n <td>2.1e+05</td>\n </tr>\n <tr>\n <th>87592827</th>\n <td>8.6e+05</td>\n <td>6.8e+05</td>\n </tr>\n <tr>\n <th>87847795</th>\n <td>2.1e+06</td>\n <td>2.1e+06</td>\n </tr>\n <tr>\n <th>87964911</th>\n <td>1.9e+06</td>\n <td>1.9e+06</td>\n </tr>\n <tr>\n <th>88067807</th>\n <td>1.3e+06</td>\n <td>1.3e+06</td>\n </tr>\n <tr>\n <th>88068331</th>\n <td>5.4e+06</td>\n <td>5.4e+06</td>\n </tr>\n <tr>\n <th>88124304</th>\n <td>1.3e+06</td>\n <td>1.3e+06</td>\n </tr>\n <tr>\n <th>88387402</th>\n <td>1.9e+06</td>\n <td>1.3e+06</td>\n </tr>\n <tr>\n <th>88701221</th>\n <td>8.5e+05</td>\n <td>5.1e+05</td>\n </tr>\n <tr>\n <th>88751287</th>\n <td>3.3e+06</td>\n <td>3.3e+06</td>\n </tr>\n <tr>\n <th>88799387</th>\n <td>4.0e+06</td>\n <td>3.8e+06</td>\n </tr>\n <tr>\n <th>88845907</th>\n <td>6.2e+06</td>\n <td>5.7e+06</td>\n </tr>\n <tr>\n <th>88893472</th>\n <td>8.3e+04</td>\n <td>0.0e+00</td>\n </tr>\n </tbody>\n</table>\n<p>24348 rows \u00d7 2 columns</p>\n</div>"}, "metadata": {}}], "metadata": {"collapsed": false, "trusted": false}}, {"execution_count": null, "cell_type": "code", "source": "", "outputs": [], "metadata": {"collapsed": true, "trusted": false}}, {"source": "## Agro-games\nFrom the Agrofert website we took the [annual report](http://www.agrofert.cz/?288/vyrocni-zpravy) for 2012 and pull the I\u010cO codes out of it.", "cell_type": "markdown", "metadata": {}}, {"execution_count": 117, "cell_type": "code", "source": "def getPDFContent(path, pages=None):\n    # extract text from the given page indices, or from all pages if none are given\n    content = \"\"\n    with open(path, \"rb\") as p:\n        pdf = PyPDF2.PdfFileReader(p)\n        if not pages:\n            pages = range(pdf.getNumPages())\n        for i in pages:\n            content += pdf.getPage(i).extractText() + \"\\n\"\n    # turn non-breaking spaces into spaces and collapse all whitespace runs\n    content = \" \".join(content.replace(u\"\\xa0\", \" \").strip().split())\n    return content", "outputs": [], "metadata": {"collapsed": true, "trusted": false}}, {"execution_count": 118, "cell_type": "code", "source": "ag_url = 'http://www.agrofert.cz/f/?4235/vyrocni-zprava-2012'\n\nag_fn, header = urllib.urlretrieve(ag_url)", "outputs": [], "metadata": {"collapsed": false, "trusted": false}}, {"execution_count": 119, "cell_type": "code", "source": "vz = getPDFContent(ag_fn)", "outputs": [], "metadata": {"collapsed": false, "trusted": false}}, {"execution_count": 120, "cell_type": "code", "source": "#vz = vz.encode('ascii', 'ignore')\nm = re.findall(r\"([0-9]{8})\", vz)\n\nagroico = [int(j) for j in np.unique(m)]", "outputs": [], "metadata": {"collapsed": false, "trusted": false}}, {"source": "Not even 2 billion for Andrej (Babi\u0161)", "cell_type": "markdown", "metadata": {}}, {"execution_count": 134, "cell_type": "code", "source": "round(dot.loc[dot.ico.isin(agroico),'castka_alok'].sum()/10**9, 3)", "outputs": [{"execution_count": 134, "output_type": "execute_result", "data": {"text/plain": "1.649"}, "metadata": {}}], "metadata": {"collapsed": false, "trusted": false}}, {"source": "Across 120 projects", "cell_type": "markdown", "metadata": {}}, {"execution_count": 158, "cell_type": "code", "source": "np.shape(dot.loc[dot.ico.isin(agroico),'castka_alok'])", "outputs": [{"execution_count": 158, "output_type": "execute_result", "data": {"text/plain": "(120,)"}, "metadata": {}}], "metadata": {"collapsed": false, "trusted": false}}, 
{"execution_count": null, "cell_type": "code", "source": "", "outputs": [], "metadata": {"collapsed": true, "trusted": false}}], "nbformat": 4, "metadata": {"kernelspec": {"display_name": "Python 2", "name": "python2", "language": "python"}, "language_info": {"mimetype": "text/x-python", "nbconvert_exporter": "python", "version": "2.7.3", "name": "python", "file_extension": ".py", "pygments_lexer": "ipython2", "codemirror_mode": {"version": 2, "name": "ipython"}}}}
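Several cells depend on IČO values lining up: the commented-out zero-padding cell, the `isin` joins against `stfir.ico`, and the Agrofert cell, which treats every `[0-9]{8}` match in the annual report as an IČO (that pattern also grabs the first eight digits of longer numbers). Below is a minimal Python 3 sketch covering all three. It assumes the commonly published IČO check-digit rule (weights 8 down to 2 on the first seven digits, mod 11). `normalize_ico`, `is_valid_ico` and `extract_icos` are hypothetical helpers, not part of the notebook, and the sample text is made up.

```python
import re

# exactly eight ASCII digits that are not part of a longer run of digits
ICO_PATTERN = re.compile(r"(?<![0-9])[0-9]{8}(?![0-9])")


def normalize_ico(value):
    """Return an IČO as an 8-character, zero-padded string.

    Accepts ints such as 6947 (leading zeros lost when read as a number),
    whole floats such as 6947.0 (pandas uses floats when a column has NaNs)
    and strings with surrounding whitespace.
    """
    if isinstance(value, float) and value.is_integer():
        value = int(value)
    text = str(value).strip()
    if not text.isdigit() or len(text) > 8:
        raise ValueError("not an IČO: %r" % (value,))
    return text.zfill(8)


def is_valid_ico(value):
    """Check the IČO check digit (assumed mod-11 rule, weights 8..2)."""
    ico = normalize_ico(value)
    weighted = sum(int(d) * w for d, w in zip(ico[:7], range(8, 1, -1)))
    # remainder 0 -> check digit 1, remainder 1 -> 0, otherwise 11 - remainder;
    # (11 - remainder) % 10 yields all three cases
    return int(ico[7]) == (11 - weighted % 11) % 10


def extract_icos(text):
    """Sorted unique 8-digit runs in `text` that pass the check digit."""
    return sorted({m for m in ICO_PATTERN.findall(text) if is_valid_ico(m)})


if __name__ == "__main__":
    print(normalize_ico(6947), is_valid_ico(6947))
    print(extract_icos("IC 25217968, account 123456789012, IC 48951862, total 25217969"))
```

To make the `isin` joins robust, one option is to normalize both sides the same way (for example with `Series.map(normalize_ico)` after dropping missing values), because an integer IČO never equals its string form. Filtering the Agrofert matches by check digit may change the 1.649 bn / 120-project result, so it would need to be recomputed.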
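The notebook finds the current spreadsheet by taking the first link inside the `ul` with the generated id `p_lt_zoneContent_VypisSouboru1_rTree2_ctl00_item_tree` and prefixing the site root by string concatenation. A stdlib-only sketch of the same lookup, assuming (as the notebook does) that the first link in that list is the newest file. `FirstLinkInList` and `find_download_url` are hypothetical names, and the sample HTML and file paths are invented. `urljoin` handles both relative and absolute `href` values, which plain concatenation does not.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# id of the download list on the MMR page, copied from the notebook's selector
LIST_ID = "p_lt_zoneContent_VypisSouboru1_rTree2_ctl00_item_tree"

# invented sample markup for illustration; the real page is much larger
SAMPLE = """
<ul id="menu"><li><a href="/other-page">Menu</a></li></ul>
<ul id="p_lt_zoneContent_VypisSouboru1_rTree2_ctl00_item_tree">
  <li><a href="/getmedia/latest.xlsx">Newest list</a></li>
  <li><a href="/getmedia/older.xlsx">Older list</a></li>
</ul>
"""


class FirstLinkInList(HTMLParser):
    """Remember the href of the first <a> nested inside <ul id=list_id>."""

    def __init__(self, list_id):
        super().__init__()
        self.list_id = list_id
        self.depth = 0      # > 0 while inside the target <ul>
        self.href = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.depth:
            if tag == "ul":
                self.depth += 1  # nested list inside the target list
            elif tag == "a" and self.href is None and attrs.get("href"):
                self.href = attrs["href"]
        elif tag == "ul" and attrs.get("id") == self.list_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag == "ul":
            self.depth -= 1


def find_download_url(page_html, base_url, list_id=LIST_ID):
    """Absolute URL of the first link in the download list."""
    parser = FirstLinkInList(list_id)
    parser.feed(page_html)
    parser.close()
    if parser.href is None:
        raise LookupError("download list not found; the page layout may have changed")
    return urljoin(base_url, parser.href)


if __name__ == "__main__":
    page_url = "http://www.strukturalni-fondy.cz/cs/Informace-o-cerpani/Seznamy-prijemcu"
    print(find_download_url(SAMPLE, page_url))
```

The resulting URL can then be fetched with `urllib.request.urlretrieve(url)`, the Python 3 location of the notebook's `urllib.urlretrieve`. Generated ids like this one may change when the site is redesigned, hence the explicit `LookupError` instead of a silent `None`.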
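The TODO about money per average project can be answered roughly from two outputs already printed in the notebook: project counts per program (`dot.program.value_counts()`) and allocated sums in billions (`groupby('program').sum()`), both computed after dropping cancelled projects. The sketch copies a few of those printed values. Because the sums were displayed rounded to one decimal, the averages are approximate, and the notebook does not state the currency (presumably CZK). It also works out the public-sector share from the printed 500.91 and 710.27 billion totals.

```python
# Values copied from the notebook's printed outputs (after dropping
# cancelled projects); sums are in billions and were displayed rounded.
projects = {
    "OP Doprava": 244,
    "OP Životní prostředí": 16818,
    "OP Podnikání a inovace": 11976,
    "OP Rybářství": 1182,
}
allocated_bn = {
    "OP Doprava": 152.5,
    "OP Životní prostředí": 118.3,
    "OP Podnikání a inovace": 93.6,
    "OP Rybářství": 0.7,
}


def average_per_project_mn(program):
    """Average allocation per project in millions: bn * 1000 / project count."""
    return allocated_bn[program] * 1000 / projects[program]


public_bn, total_bn = 500.91, 710.27  # public-sector recipients vs. everyone
public_share = public_bn / total_bn

if __name__ == "__main__":
    for program in sorted(projects, key=average_per_project_mn, reverse=True):
        print("%-25s %8.1f mn per project" % (program, average_per_project_mn(program)))
    print("public-sector share: %.1f %%" % (100 * public_share))
```

On these rounded figures an OP Doprava project averages about 625 million, roughly 80 to 90 times a typical OP Životní prostředí or OP Podnikání a inovace project (about 7 to 8 million), and public-sector recipients account for about 70 % of the allocated total. On the full frame, `dot.groupby('program')['castka_alok'].mean()` would give exact averages directly.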
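The "Big fish" cell groups by IČO but does not sort, so its output is ordered by IČO rather than by size. A stdlib sketch of the intended ranking, using the five rows printed by `dot.head(5)` as sample input: keep only Finalized and Ongoing projects (the same filter as `dot.stav.isin(...)` in the notebook), sum `castka_alok` per IČO and sort in descending order. `rank_recipients` is a hypothetical name; in pandas the equivalent would be `dot.groupby('ico')['castka_alok'].sum().sort_values(ascending=False)`.

```python
from collections import defaultdict

# (ico, castka_alok, stav) for the five rows printed by dot.head(5)
ROWS = [
    (48951862, 14875000, "Finalized"),
    (48951862, 1615000, "Finalized"),
    (48951862, 14869900, "Finalized"),
    (48951862, 11059350, "Cancelled"),
    (47285338, 381650, "Finalized"),
]


def rank_recipients(rows, keep=("Finalized", "Ongoing"), top=None):
    """Return [(ico, total_allocated), ...] sorted by total, largest first."""
    totals = defaultdict(int)
    for ico, amount, status in rows:
        if status in keep:  # mirrors dot.stav.isin(['Finalized', 'Ongoing'])
            totals[ico] += amount
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked if top is None else ranked[:top]


if __name__ == "__main__":
    for ico, total in rank_recipients(ROWS):
        print("%08d %15d" % (ico, total))
```

In the real output the first group is IČO 0 with about 430 million allocated; 00000000 fails the IČO check digit, so it probably stands for missing IDs and may be worth excluding before reading off the top recipients.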