Charles Boutaud, Bureau of Investigative Journalism https://www.thebureauinvestigates.com/profile/charlesboutaud https://pydata.org/london2018/schedule/presentation/49/ Data-driven journalism, an overview of what they do. They use a Slack channel to kick off conversations, discuss ideas and ask questions, which then get translated into potential projects.
https://catboost.yandex/ https://github.com/catboost/catboost Anna Veronika Dorogush https://pydata.org/london2018/schedule/presentation/34/ CatBoost, basically gradient boosting with native support for categorical features on top of numerical ones. Supports GPU training with a several-fold speed-up wrt XGBoost. Parameters are very important, particularly learning rate and number of iterations, to find the right balance for error convergence.
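The categorical handling CatBoost is known for builds on "ordered target statistics". A minimal pure-Python sketch of that idea, with my own function name and smoothing values, not CatBoost's actual API:

```python
# Sketch of ordered target statistics: each row's category is encoded
# using only the target values of rows that came *before* it, which
# avoids target leakage. (prior/weight are illustrative choices.)
def ordered_target_encode(categories, targets, prior=0.5, weight=1.0):
    sums, counts = {}, {}
    encoded = []
    for cat, y in zip(categories, targets):
        s, c = sums.get(cat, 0.0), counts.get(cat, 0)
        # smoothed mean of past targets for this category
        encoded.append((s + prior * weight) / (c + weight))
        sums[cat] = s + y
        counts[cat] = c + 1
    return encoded

print(ordered_target_encode(['a', 'a', 'b', 'a'], [1, 0, 1, 1]))
# → [0.5, 0.75, 0.5, 0.5]
```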
Dat Nguyen https://pydata.org/london2018/schedule/presentation/21/ Performance at Zopa, a lending platform for micro loans; NP-hard problems allocating money pots that meet investors' requirements. Mostly unreadable presentation due to tiny fonts, unfortunately. Anyway, they use https://numba.pydata.org/, an Anaconda-backed JIT compiler package that seems to give good results.
Ian Ozsvald (PyData organiser) https://pydata.org/london2018/schedule/presentation/32/ Have a look at pandas-profiling https://github.com/pandas-profiling/pandas-profiling. Testing the scikit-learn DummyClassifier on the Titanic dataset from Kaggle, then running a random forest and noticing it performs way better than the dummy baseline. Plotting a Yellowbrick confusion matrix; check that lib, quite a cool viz library for scikit-learn https://pythonhosted.org/yellowbrick/introduction.html. Also the ELI5 library, "explain like I'm 5" http://eli5.readthedocs.io/en/latest/overview.html
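The dummy-baseline comparison is easy to reproduce; here with synthetic data standing in for the Titanic CSV (which isn't bundled with scikit-learn):

```python
from sklearn.datasets import make_classification  # stand-in for Titanic
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# baseline that always predicts the most common class
baseline = DummyClassifier(strategy="most_frequent").fit(X_tr, y_tr)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

print(baseline.score(X_te, y_te), forest.score(X_te, y_te))
```

If the real model doesn't clearly beat the dummy, something is wrong with the features or the setup, which is the point of the check.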
Liam P. Kirwin https://pydata.org/london2018/schedule/presentation/37/ Big commercial insurance company; they're old school because it works well, so data science helps where traditional models can't reach. Interestingly enough, they introduced data science libraries to improve communication across levels. ELI5 / LIME (explain results from classifiers): https://github.com/marcotcr/lime / shap (probably; I might have misspelled this) as libraries for storytelling.
This is the guy from the live-coding presentation I saw at PyData 17... I'm sorry, I find live-coding sessions an incomprehensible, useless show-off; I'll avoid them in the future.
Started in 2006 from a monolithic repo shared across the world, with no dev/prod separation. Scaled enormously to 35M lines of code using Hydra, a proprietary object-oriented database with distributed optimistic writes. Source code is itself stored in Hydra, so running a command fetches the latest version directly from the db. Their viz tool, Perspective, was recently open-sourced on GitHub https://jpmorganchase.github.io/perspective/
Emmanuelle Gouillart https://pydata.org/london2018/schedule/presentation/51/ Core dev of scikit-image. Really interesting talk about empowering learning in her team: encouraging people to write documentation, adding an API gallery, and taking every action to onboard newcomers as quickly as possible. Check sphinx-gallery https://github.com/sphinx-gallery/sphinx-gallery. Also check Binder, which runs notebooks on k8s; it's also featured in sphinx-gallery https://mybinder.org/
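For reference, wiring sphinx-gallery into a project's Sphinx build is just a conf.py fragment like this (the directory names are an assumed layout, not from the talk):

```python
# conf.py (fragment): enable sphinx-gallery so example scripts under
# examples/ are executed and rendered as a browsable gallery.
extensions = [
    'sphinx_gallery.gen_gallery',
]

sphinx_gallery_conf = {
    'examples_dirs': 'examples',       # where the example scripts live
    'gallery_dirs': 'auto_examples',   # where generated pages are written
}
```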
Guillermo Christen https://pydata.org/london2018/schedule/presentation/38/ Blah blah on using neural networks as encoders/decoders to compress feature spaces. Meh, not convinced at all.
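For what it's worth, the idea is easy to sketch with scikit-learn: train a network to reconstruct its own input through a narrow hidden layer, then use the hidden activations as the compressed features (a toy sketch under my own assumptions, not the speaker's setup):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))  # 8-dimensional feature space

# autoencoder: learn to reconstruct X from X through a 3-unit bottleneck
ae = MLPRegressor(hidden_layer_sizes=(3,), activation='identity',
                  max_iter=500, random_state=0).fit(X, X)

# with an identity activation, the hidden layer is a linear projection;
# its activations are the compressed 3-d representation
Z = X @ ae.coefs_[0] + ae.intercepts_[0]
print(Z.shape)  # (200, 3)
```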
Adam Hill https://pydata.org/london2018/schedule/presentation/17/ This guy... interesting, seems a good guy, albeit being on the pompous side. Showed the dataset from Companies House loaded into Neo4j; a lot of things look really odd (e.g. some 4,000 children under 2 years of age are registered as company owners...). Check out datakind.org and his presentation http://bit.ly/pyDataLDN2018-Corporate-Ownership
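The kind of sanity check behind that finding is a one-liner once the ownership records are in a dataframe (the tiny table and column names below are made up for illustration, not the real Companies House schema):

```python
import pandas as pd

# hypothetical mini-extract of company-owner records
owners = pd.DataFrame({
    "name": ["Adult Owner", "Infant One", "Infant Two"],
    "date_of_birth": ["1975-03-01", "2017-06-01", "2018-01-15"],
})

snapshot = pd.Timestamp("2018-04-28")  # conference date used as "today"
dob = pd.to_datetime(owners["date_of_birth"])
age_years = (snapshot - dob).dt.days / 365.25

# owners under 2 years old: almost certainly data-quality or fraud signals
under_two = owners[age_years < 2]
print(len(under_two))  # 2
```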
Matti Lyra https://pydata.org/london2018/schedule/presentation/30/ A MinHash library to compare document similarity https://github.com/mattilyra/LSH. Kind of meh; focused on near-duplicate text only, e.g. multiple versions of the same article where a few things have changed: how do you quickly tell whether they're the same? The proposed solution only applies to text.
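The gist of MinHash is small enough to sketch in plain Python (seeded hashing scheme and signature length are my own choices here, not the library's internals): the fraction of matching signature slots estimates the Jaccard similarity of the two token sets.

```python
import hashlib

def minhash(tokens, num_hashes=64):
    # for each of num_hashes seeded hash functions, keep the smallest
    # hash value over the document's token set
    sig = []
    for seed in range(num_hashes):
        sig.append(min(
            int(hashlib.md5(f"{seed}:{t}".encode()).hexdigest(), 16)
            for t in set(tokens)
        ))
    return sig

def similarity(sig_a, sig_b):
    # matching slots / total slots ≈ Jaccard similarity of the token sets
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

a = minhash("the quick brown fox".split())
b = minhash("the quick brown cat".split())  # near-duplicate
c = minhash("completely different tokens here".split())
print(similarity(a, b), similarity(a, c))
```

Near-duplicates score high, unrelated documents near zero; LSH then buckets signatures so you only compare candidates that are likely similar.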