@brianspiering
Last active June 9, 2019 22:28
Brian Spiering's Data Science Principles

  1. All the data, all the time.

  2. Data is its own best model.

  3. End-to-end solution, first. Then iterate.

  4. Build a simple working system first, then add complexity (if needed).

  5. Research best practices before inventing.

  6. Choose for features; stay for community.

  7. Point estimates are always wrong.

  8. Base rates matter. Absolute values matter.

  9. Data Science is an applied field. Research and instructional solutions have only secondary value.

  10. You have assumptions. It is better if they are explicit.

  11. Random is a good baseline. Sometimes random is shockingly hard to beat.

  12. Assume data is streaming. There will always be more data tomorrow.

  13. Make only a single pass over the data.

  14. Real-time means different time-scales to different people. The business probably doesn't need "real-time" analytics, so data science shouldn't support it.

  15. An approximate answer right now is often better than an exact answer in the distant future (or never).

  16. Keep the data as raw as possible. Create views for specific use cases.

  17. Avoid lossy data compression, for example frequentist linear regression.

  18. Meta-data is as important as data.

  19. Pareto efficiency for data science techniques:

      • Hash maps and friends (sets, Counters, bloom filters, …)

      • Bayes' theorem

      • A relational database

      • Epsilon-greedy algorithm

  20. Use Python's built-in data types (and their methods) and functions as much as possible.

  21. Every function should have tests.

  22. Benchmarking and profiling trump complexity analysis.
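
Several of the principles above (a random baseline, the epsilon-greedy algorithm, Python's built-in data types) can be sketched together in a few lines. The following is a minimal illustration, not the author's implementation: the names `epsilon_greedy`, `pull`, and `RATES` are hypothetical, and the success rates are made up for the example. Setting `epsilon=1.0` makes every choice random, which gives the baseline to compare against.

```python
import random
from collections import Counter


def epsilon_greedy(pull, n_arms, n_rounds, epsilon=0.1, seed=0):
    """Run epsilon-greedy and return (pull counts, reward totals) per arm."""
    rng = random.Random(seed)
    counts = Counter()  # arm -> number of pulls
    totals = Counter()  # arm -> summed reward

    def mean(arm):
        # Unpulled arms score +inf so every arm is tried at least once.
        return totals[arm] / counts[arm] if counts[arm] else float("inf")

    for _ in range(n_rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)         # explore: pick a random arm
        else:
            arm = max(range(n_arms), key=mean)  # exploit: best average so far
        reward = pull(arm, rng)
        counts[arm] += 1
        totals[arm] += reward
    return counts, totals


# Assumption for illustration: three Bernoulli arms with made-up success rates.
RATES = [0.2, 0.5, 0.8]


def pull(arm, rng):
    """Return 1 with the arm's success probability, else 0."""
    return 1 if rng.random() < RATES[arm] else 0


greedy_counts, greedy_totals = epsilon_greedy(pull, len(RATES), 5000, epsilon=0.1)
random_counts, random_totals = epsilon_greedy(pull, len(RATES), 5000, epsilon=1.0)

print("epsilon-greedy reward:", sum(greedy_totals.values()))
print("random baseline reward:", sum(random_totals.values()))
```

Both runs keep all of their state in two `Counter` objects and make a single pass over the incoming rewards, so the same loop would work on a stream of unknown length.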
