@txomon
Last active August 29, 2015 14:25
EuroPython 2015 minutes

Monday 11

Keynote

Talk by the two Olas who created the Django Girls group, with an intro to how they felt through the process of becoming Django devs.

Bokeh part 1

github.com/birdsarah/gtimei or something like that

A pandas tutorial on GitHub was recommended.

Glyphs are the shapes in the plots: the shapes of the individual parts.

bokeh.models is the lowest level.

Chart class takes data and makes the graph.

Workflows with Flowy

Parallelization models usually use graphs. This one inspects the Python code, or something like that.

The idea is to write the dependencies between tasks and then let flowy manage the parallelism.
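To make the idea of "declare dependencies, let the framework parallelize" concrete, here is a minimal sketch using the standard library's concurrent.futures. It is not Flowy's API; the task names fetch and combine are hypothetical and only illustrate how independent tasks can run in parallel while a dependent task waits for their results.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch(name):
    """Hypothetical independent task."""
    return "data:" + name


def combine(left, right):
    """Hypothetical task that depends on the two fetch results."""
    return left + "|" + right


with ThreadPoolExecutor(max_workers=2) as pool:
    # The two fetches do not depend on each other, so they may run in parallel.
    left = pool.submit(fetch, "users")
    right = pool.submit(fetch, "orders")
    # combine depends on both results; .result() blocks until each one is ready.
    report = combine(left.result(), right.result())

print(report)
```

The dependency graph here is implicit in which results are passed to which call, which is roughly the style the talk described.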

Packaging problems

Presenter @ionelmc

use check-manifest to check the manifest

Use graft and exclude rules. Dirty is better than not working.

setuptools > distutils

For bdist, use MANIFEST.in + include_package_data=True. The use of package_data is discouraged.

data_files is a horrible thing to use: the files get spread all over the place.

Python packaging tools are for libraries, not for applications (configs, pre/post install actions).

pex can be used to bundle/vendor/embed stuff.

setuptools_scm handles version specification for you

If you import from __main__.py, its code gets executed twice because of python -m mypackage.

Always use console_scripts; avoid the setup(scripts='') option.

extras_require is meant for installs like pip install 'mypackage[pdf]', declared as extras_require={'pdf':['reportlab']}. It is not recommended for dev dependencies; use tox for those.

Use tox. For managing virtualenvs, you can also use vex and pew. pyenv manages complete interpreters.

extras_require also supports declarative conditional dependencies:

extras_require={ ':python_version=="2.6"': ['argparse'], ':sys_platform=="win32"': ['colorama'], }
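Pulling the packaging advice together, a setup.py could look roughly like the sketch below. The package name mypackage and the entry point mypackage.cli:main are hypothetical; the keyword arguments themselves (include_package_data, entry_points, extras_require) are standard setuptools options. The arguments are kept in a plain dict so the fragment can be inspected without running an installation.

```python
# Keyword arguments for setuptools.setup(); names below are illustrative only.
SETUP_KWARGS = {
    "name": "mypackage",                    # hypothetical package name
    "packages": ["mypackage"],
    "include_package_data": True,           # ship the files selected by MANIFEST.in
    "entry_points": {
        # Preferred over setup(scripts=...): generates a launcher per platform.
        "console_scripts": ["mypackage = mypackage.cli:main"],
    },
    "extras_require": {
        # Optional feature: pip install 'mypackage[pdf]'
        "pdf": ["reportlab"],
        # Declarative conditional dependencies, as in the notes above.
        ':python_version=="2.6"': ["argparse"],
        ':sys_platform=="win32"': ["colorama"],
    },
}

# In a real setup.py this would be followed by:
#     from setuptools import setup
#     setup(**SETUP_KWARGS)
```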

can build wheels with dependencies

Coverage for C extensions: export CFLAGS=-coverage

To upload, use twine. Twine uploads the metadata and that sort of stuff. PyPI doesn't allow re-uploading distributions anymore.

Versioning has been normalized. PEP 440 is not compatible with semver.org.

Cookiecutter template with these ideas: cookiecutter-pylibrary

Knowing the garbage collector

Explanation of how to avoid memory leaks with circular references and that kind of stuff. PyPy's implementation of garbage collection (two levels of GC), with a probability of getting halted.
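A small sketch of the circular-reference problem, assuming CPython (where reference counting alone cannot free a cycle and the cyclic collector has to step in). The Node class is hypothetical; gc and weakref are standard library modules.

```python
import gc
import weakref


class Node:
    """Hypothetical object that can point at another Node."""

    def __init__(self, name):
        self.name = name
        self.other = None


def make_cycle():
    """Build two nodes that reference each other; return a weak reference to one."""
    a = Node("a")
    b = Node("b")
    a.other = b
    b.other = a
    return weakref.ref(a)


gc.disable()  # keep the automatic collector from running in the middle of the example
ref = make_cycle()
# No outside references remain, but each node still keeps the other alive.
alive_before = ref() is not None
gc.collect()  # the cycle detector finds and frees the unreachable pair
alive_after = ref() is not None
gc.enable()

print(alive_before, alive_after)
```

Breaking the cycle by hand (for example setting a.other = None) or holding one direction through a weakref avoids relying on the cyclic collector at all.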

Bokeh 2nd part

Several examples were presented. They are working on maps. It can be hacked to use websockets.

Storage in python

dusk is for defining stuff as algorithms so that they are agnostic of the data. Need to put the library name here for the new computing resources optimization.

Lightning talks

  • Cosmic ray * Mutation testing using ast, multiprocessing etc.
  • Instructions * Utilities for iterating over sets of data
  • Fernando * How to meet internals in python * get source, compile with gdb support and symbols, execute and put breakpoints to trace calls
  • Smartfeedz * Social media aggregation, tons of current gen techs (nltk, etc.) * going to do it opensource
  • Massage * Massage for donations tuesday 16:45
  • exxtreme * Story telling RPG creator * Use esperanto for your API, that way you know you don't mix different languages
  • pygame zero * Py game with syntax parser and so

Tuesday

Python in the future

Talk by Guido van Rossum. He talks about how the creation of Django Girls and the creation of Python are similar. The talk is structured as a Q&A session.

Why to switch to Python 3?

Because it's actually better. Easier to teach. New features. Python 2 won't get any updates other than security updates.

What is the best of 3.5?

  • Faster directory listings
  • Matrix multiplication operator overriding for numpy etc.
  • PEP 492 * async for, async with, etc.
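Two of the 3.5 features mentioned above can be shown in a few lines: the overridable `@` operator (via `__matmul__`) and the `async for` syntax from PEP 492. The Vec and Countdown classes are hypothetical, and asyncio.run is used for brevity even though it only appeared in later Python versions.

```python
import asyncio


class Vec:
    """Hypothetical vector type that overrides the @ operator."""

    def __init__(self, *xs):
        self.xs = xs

    def __matmul__(self, other):
        # Dot product, as a stand-in for what numpy does with @.
        return sum(a * b for a, b in zip(self.xs, other.xs))


class Countdown:
    """Asynchronous iterator usable with `async for`."""

    def __init__(self, start):
        self.current = start

    def __aiter__(self):
        return self

    async def __anext__(self):
        if self.current <= 0:
            raise StopAsyncIteration
        await asyncio.sleep(0)  # yield control to the event loop
        self.current -= 1
        return self.current + 1


async def collect():
    seen = []
    async for value in Countdown(3):
        seen.append(value)
    return seen


dot = Vec(1, 2, 3) @ Vec(4, 5, 6)
values = asyncio.run(collect())
print(dot, values)
```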

Why so many open bugs in Python?

Shit happens and you don't have time for everything. There are also functionalities that aren't useful, changes that would break backwards compatibility, and obscure edge cases.

What will the future bring?

New stuff, no idea

Q&A from audience

PyPy may have hidden incompatibilities.

How graphs can help us understand our code. Code is not text.

neo4j, orientdb, arangodb and titandb for graph data.

We usually think of code as text. Graphs are used to optimize etc.

https://quantifiedcode.github.io/code-is-beautiful

Cyclomatic complexity measures how complex your code is. It can be used to know how many unit tests you should write.
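As a rough illustration of the metric, the sketch below approximates the complexity of a piece of code by walking its syntax tree with the ast module. The counting rule (one, plus one per if/for/while/except/conditional expression, plus one per extra boolean operand) is a common simplification and not necessarily what the tool from the talk does; SAMPLE is a made-up function.

```python
import ast

# Node types that add one decision point each.
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def cyclomatic_complexity(source):
    """Approximate complexity: 1 + number of decision points in the source."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, BRANCH_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` adds two extra paths.
            complexity += len(node.values) - 1
    return complexity


SAMPLE = '''
def classify(n):
    if n < 0:
        return "negative"
    for d in range(2, n):
        if n % d == 0 and d > 1:
            return "composite"
    return "other"
'''

score = cyclomatic_complexity(SAMPLE)
print(score)
```

The resulting number is a lower bound on the test cases needed to exercise every independent path through the function.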

There are city graphs showing how complex your code is. You can make complex checks on code not by how it looks as text but graphically.

Deploy to cloud even if you don't want to

Juju talk. It seems interesting because you can define how services upgrade between versions, apart from what you do in puppet/salt, etc.

jujucharms.com

Logging and metrics

A talk about tools. It is already on the internet: https://ox.cx/B. It covers statsd, graphite, etc.

OpenCV

Explanation of how to recognise objects, including all the internal algorithms.

OpenCV has functions to:

  • calibrate camera (make lines more linear)

Logging in python

Specify config by dict.
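A minimal sketch of dict-based configuration with logging.config.dictConfig from the standard library. The logger name myapp and the format string are arbitrary choices for the example.

```python
import logging
import logging.config

LOGGING = {
    "version": 1,  # required by the dictConfig schema
    "disable_existing_loggers": False,
    "formatters": {
        "brief": {"format": "%(levelname)s %(name)s: %(message)s"},
    },
    "handlers": {
        "console": {
            "class": "logging.StreamHandler",
            "formatter": "brief",
            "level": "INFO",
        },
    },
    "loggers": {
        # "myapp" is a hypothetical application logger.
        "myapp": {"handlers": ["console"], "level": "INFO", "propagate": False},
    },
}

logging.config.dictConfig(LOGGING)
log = logging.getLogger("myapp")
log.info("configured from a dict")
```

Because the configuration is plain data, it can be loaded from JSON or YAML and adjusted per environment without touching code.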

Non blocking io

Super basic intro. Example of blocking I/O.
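To contrast the two modes, here is a small sketch (not the example from the talk) using a local socket pair: a non-blocking read with nothing to read raises BlockingIOError instead of waiting, and select is used to find out when data is ready.

```python
import select
import socket

# Two connected sockets in the same process, standing in for a client and a server.
reader, writer = socket.socketpair()
reader.setblocking(False)

try:
    # Nothing has been sent yet: a blocking socket would wait here forever.
    reader.recv(1024)
    would_block = False
except BlockingIOError:
    would_block = True

writer.sendall(b"ping")

# Ask the OS whether the socket is readable, waiting at most one second.
ready, _, _ = select.select([reader], [], [], 1.0)
data = reader.recv(1024) if ready else b""

reader.close()
writer.close()
print(would_block, data)
```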

Lymph microservices

The idea is to create connected services that see each other and create a complete set of services by having them communicate automatically.

Wednesday

Pymongo etc.

Presentation of the MongoDB query framework, ending with tips on how to limit queries by size.

Type hints in python

Optional; PEP 484 introduces the functionality. You can put the annotations in a stub file instead of in place.
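A short sketch of what PEP 484 annotations look like in place, using the typing module. The function find_talk and its data are made up for illustration; the hints are not enforced at runtime and are meant for external checkers.

```python
from typing import Dict, List, Optional, get_type_hints


def find_talk(schedule: Dict[str, List[str]], day: str) -> Optional[str]:
    """Return the first talk of the day, or None if the day has no talks."""
    talks = schedule.get(day)
    return talks[0] if talks else None


# Annotations are stored on the function and can be inspected.
hints = get_type_hints(find_talk)

first = find_talk({"Wednesday": ["Type hints in python"]}, "Wednesday")
missing = find_talk({}, "Friday")

# The same signature could instead live in a stub file (for example a .pyi
# next to the module), leaving the implementation unannotated:
#     def find_talk(schedule: Dict[str, List[str]], day: str) -> Optional[str]: ...
print(hints["return"], first, missing)
```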

Release workflows with devpi

They make a workflow where each team has a user and uploads its packages there, and then specific indices are created mixing them.

Devpi can be used as a PyPI for several namespaces, as a mirror, etc. It supports some auth, mainly using the HTTP protocol.

Pylint

How pylint internals and astroid work.

Distributed app

Sample application. Needs for a distributed click counter:

  • Multidatacenter
  • Unique counter
  • Automatic resize
  • Distributed configuration

Tips:

  • Rely on the stack
  • Use easy to understand tools
  • Simple and small tools
  • Isolated components

Ideas:

  • Microservices * Really tiny isolated components. But if you go really crazy, you will need to communicate everything

Stack:

  • nginx + uWSGI
  • collector, gathers the http get and returns total
  • consumer, increments the counter
  • code with flask and gevent loop
  • queues with beanstalkd
  • zookeeper * consul * etcd <-> consul because k/v store and multidatacenter
    • Consul has a uWSGI consul plugin
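The collector/consumer split from the stack above can be sketched in a single process. This is only an illustration of the pattern: queue.Queue stands in for beanstalkd, threads stand in for separate services, and the Counter class and function names are hypothetical.

```python
import queue
import threading

click_queue = queue.Queue()  # in-process stand-in for a beanstalkd tube
STOP = object()              # sentinel telling a consumer to shut down


class Counter:
    """Thread-safe total, standing in for the shared counter store."""

    def __init__(self):
        self._lock = threading.Lock()
        self.total = 0

    def increment(self):
        with self._lock:
            self.total += 1


def collector(n_requests):
    """Pretend to receive n HTTP GETs and enqueue one job per click."""
    for _ in range(n_requests):
        click_queue.put("click")


def consumer(counter):
    """Take jobs off the queue and increment the counter."""
    while True:
        job = click_queue.get()
        if job is STOP:
            break
        counter.increment()


counter = Counter()
workers = [threading.Thread(target=consumer, args=(counter,)) for _ in range(2)]
for worker in workers:
    worker.start()

collector(100)
for _ in workers:
    click_queue.put(STOP)  # one sentinel per consumer
for worker in workers:
    worker.join()

print(counter.total)
```

Putting a queue between the two roles is what lets the consumers be scaled or restarted independently of the collector.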

PySpark

Spark's Resilient Distributed Datasets API; on top of the Spark core you can find Spark SQL, for example.

It can run on top of Mesos.

Use it for all bigdata

Lightning talks

  • Use scrapy, google maps and IPython to create a map of the flats they wanted
  • RAMLifications * API creator for raml. rogue.ly/ramlfications
  • attrs * Tuples without tuples to overcome the problem that they are actually not taking namedtuple name as a value
  • FUD * Do not judge projects without trying them
  • asyncio fast tutorial * zefciu/lolcats-asyncio
  • py3c.readthedocs.org * Porting c bindings to python 3
  • python unconference september hamburg 4-6
  • Python tips & tricks judy2k/stupid-python-tricks
    • Marge Simpson is valid python code
    • Do not modify your caller's env, although you can
    • -ish function to check against 'False' and stuff like that
  • Python subprocesses do not give back memory (they do, but the memory gets fragmented)
  • RinohType * LaTeX-like text processor
  • qutebrowser * WebKit-Qt browser with vim-style keyboard shortcuts for clicking links
    • vimium, vimperaTOR * Working alternatives

Thursday

Keynote

Education and Python, and how well suited it is for teaching students. @MissPhilbin basically talks about the strengths and flaws of Python.

Google SRE Classroom

Paxos, BigTable, GFS

Friday

Elasticsearch

  • boost factors on filter facilities
  • gauss for example for location
  • field value factor for taking a real score into account
  • random score to boost new stuff
  • really boost searches
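The scoring functions listed above map onto Elasticsearch's function_score query. The sketch below builds such a request body as a plain Python dict; the field names (title, location, popularity) and the numeric values are hypothetical, and the exact options accepted may differ between Elasticsearch versions, so treat it as an outline rather than a verified query.

```python
import json


def build_search_body(text, lat, lon):
    """Build a function_score request body combining several boosts."""
    return {
        "query": {
            "function_score": {
                "query": {"match": {"title": text}},
                "functions": [
                    # Decay the score with distance from the given point.
                    {"gauss": {"location": {"origin": {"lat": lat, "lon": lon},
                                            "scale": "5km"}}},
                    # Take a real per-document value into account.
                    {"field_value_factor": {"field": "popularity",
                                            "modifier": "log1p"}},
                    # Small random component so new items get a chance.
                    {"random_score": {}, "weight": 0.1},
                ],
                "score_mode": "sum",       # how the functions are combined
                "boost_mode": "multiply",  # how they combine with the query score
            }
        }
    }


body = build_search_body("python", 43.26, -2.93)
payload = json.dumps(body)  # what would be sent as the search request body
print(payload)
```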

Hadoop with python

MapReduce streaming jobs are run through external executables.
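In streaming mode the mapper and reducer are ordinary programs that read lines from stdin and write tab-separated key/value lines to stdout, with the framework sorting by key in between. The word-count sketch below shows that contract as two plain functions; the sorted() call stands in for Hadoop's shuffle step, and the input lines are made up.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' line per word, as a streaming mapper would."""
    for line in lines:
        for word in line.split():
            yield "{}\t1".format(word.lower())


def reducer(sorted_lines):
    """Sum the counts of consecutive lines sharing the same key."""
    pairs = (line.split("\t") for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield "{}\t{}".format(word, sum(int(count) for _, count in group))


# Local simulation: map, sort by key (Hadoop does this between the phases), reduce.
mapped = sorted(mapper(["Python and Hadoop", "python and Pig"]))
counts = list(reducer(mapped))
print(counts)
```

In a real job each function would sit in its own script looping over sys.stdin and printing its output lines.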

Don't use Dumbo or hadoopy. Use Pydoop < Luigi < MRJob

Mrjob has:

  • Super docs
  • Integration with Amazon EMR
  • Local testing without Hadoop
  • Automatic upload to cluster
  • Multistep jobs

Luigi:

  • Framework with real workflow
    • central scheduler
    • task history
  • automatic upload to cluster
  • integration with snakebite

Pydoop:

  • fast but slower than snakebite
  • hdfs api based on libhdfs
  • implement record reader/writer in python
  • implement partitioner in python
  • difficult to install, small community, and doesn't upload itself to the cluster

Pig MapReduce:

  • Faster than python and can be extended with python
  • uses DSL for itself
  • Pig UDF can be in pig and jython

For complex workflow organization, job chaining and hdfs manipulation: use luigi + Snakebite

For lightning-speed MR jobs, with some difficulties at the beginning: use pydoop + pig

For Amazon EMR and local testing: use MRJob

https://github.com/maxtepkeev/talks

Documentation 101

Documentation is important, and it's important for it to be portable, adoptable into the workflow, scalable, and adaptable per project.

First, who is going to be my reader?

Second, what do my readers want to know?

Third, when do my readers need this content? Look at the lifecycle of the software (at the beginning, installation or orientation on how to use it; later, configuration and problem solving).

Fourth, where do my readers consume this content? From terminal? Mobile, on the go?

Fifth, why do my readers even need this content? BAD docs are worse than no docs.

Example about gnome, missing categories. But ordered by category.

Example Archlinux wiki. Search functionality, really not wiki needed

Example RHEL Openstack. Really messy.

DevOps for docs:

  • Unified toolchain. Use dev's docs to generate docs
  • Continuous integration for docs
  • Iterative authoring
  • Curate the content, for example splitting the docs from one big file to many little ones
  • Automation: continuous deployment (not really important) and automated testing, not only for code examples but also with tools like hemingway, supply plugin.

Contribution guidelines: provide templates.

How to build a local python community

pip install pyladies, the organizer's book, etc. They are full of resources and tips.

You need to have an objective.

How to know if the event was successful?

  • First conference: There IS someone else
  • Do not overcommit to expectations.

People and resources:

  • You are the most important resource because you can make it happen. So you need to have commitment
  • Other people to help, organizers etc.

Speeding up search with locality sensitive hashing

They create a multidimensional space in which to place users and products, and they place users closer to the products they click on and farther from the ones they don't.

LSHForest with scikit-learn but it's slow

FLANN is a pain to deploy

annoy is great but you can't add points to an existing index

So they created rpforest, which overcomes these limitations
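To give a feel for locality sensitive hashing, here is a generic random-hyperplane sketch in pure Python. It is not rpforest's algorithm or API: each vector is reduced to a tuple of bits (which side of each random hyperplane it falls on), and vectors pointing in similar directions tend to land in the same bucket. All names and the toy vectors are hypothetical.

```python
import random


def make_planes(n_planes, dim, seed=0):
    """Draw random hyperplanes (as normal vectors) with a fixed seed."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_planes)]


def signature(vector, planes):
    """One bit per hyperplane: 1 if the vector is on its positive side."""
    bits = []
    for plane in planes:
        dot = sum(p * v for p, v in zip(plane, vector))
        bits.append(1 if dot >= 0 else 0)
    return tuple(bits)


def build_index(items, planes):
    """Group item names by signature so lookups only scan one bucket."""
    buckets = {}
    for name, vector in items.items():
        buckets.setdefault(signature(vector, planes), []).append(name)
    return buckets


planes = make_planes(n_planes=8, dim=3)
items = {
    "a": [1.0, 2.0, 3.0],
    "a_scaled": [2.0, 4.0, 6.0],       # same direction as "a"
    "opposite": [-1.0, -2.0, -3.0],    # opposite direction
}
index = build_index(items, planes)

# Candidate neighbours of a query are the items sharing its bucket.
candidates = index[signature([1.0, 2.0, 3.0], planes)]
print(candidates)
```

Real libraries use several such hash tables or trees and then rank the candidates by exact distance, trading a little recall for a much smaller search.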

Lightning talks

  • multiplayer game dragondemo.net
  • deep neural networks on emotions judy2k/stupid-python-tricks, 3GPU processing hours, with -ish
  • masterkey => override base classes stuff jespino/python-master-key

Ideas for lightning talk

  • Codecombat