@mikeblas
Last active September 26, 2024 07:40
FAQ: Where can I get sample data?

It's not so hard to find sample data and data sources to use for interesting side-projects, or just for practicing writing SQL.

In-product sample data

Most DBMSes come with sample databases. You can write lots of interesting queries against them, and usually a tutorial accompanies the database in the documentation.

Some websites are full of sample data sets. Why not download an interesting one, learn to load it up, and write your own interesting queries?
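As a rough sketch of what "loading it up" can look like, the example below reads a CSV into an in-memory SQLite database using only the Python standard library, then runs a query against it. The `SAMPLE_CSV` contents, the `population` table name, and the `load_csv` helper are all made up for illustration; a real data set would be opened from a downloaded file instead.

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for a downloaded CSV file.
SAMPLE_CSV = """city,year,population
Springfield,2020,1000
Springfield,2021,1100
Shelbyville,2020,500
Shelbyville,2021,450
"""


def load_csv(conn, table, csv_file):
    """Create `table` from the CSV header row and insert every data row as text."""
    reader = csv.reader(csv_file)
    header = next(reader)
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    conn.commit()


conn = sqlite3.connect(":memory:")
load_csv(conn, "population", io.StringIO(SAMPLE_CSV))

# Values were loaded as text, so cast them before doing arithmetic.
spread = conn.execute(
    """
    SELECT city,
           MAX(CAST(population AS INTEGER)) - MIN(CAST(population AS INTEGER)) AS spread
    FROM population
    GROUP BY city
    ORDER BY city
    """
).fetchall()
print(spread)
```

For a file on disk, something like `open("data.csv", newline="")` would take the place of the `io.StringIO` wrapper. Loading everything as text keeps the helper short; a real import would normally declare column types up front.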

Dataset Websites

There are many websites which host data sets.

Third-party sample data

Of course, some sample data is built for generic tutorials, by third parties:

Practice Sites

There are some sites that let you write queries interactively with canned data, rather than having you download data to play with on your own.

Regular dumps

Some sites publish data by making their backups available, or dumping the data they use to make their own reports.

Live data sources

Some data sources produce data live, as it happens. These are interesting sources because they usually represent slowly changing dimensions, and will need to be accumulated or logged before being stored or processed.
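One way to accumulate a live feed is to poll it on a schedule and append each snapshot to a table along with the time it was observed. The sketch below assumes a hypothetical `fetch_snapshot` function standing in for a real API call, and a made-up `vehicle_positions` table; the timestamps are fixed so the example is repeatable.

```python
import sqlite3


def fetch_snapshot():
    """Hypothetical stand-in for a live API call; returns the current readings."""
    return [("bus-12", 47.61, -122.33), ("bus-40", 47.66, -122.38)]


def log_snapshot(conn, rows, observed_at):
    """Append one snapshot, stamping every row with the observation time."""
    conn.executemany(
        "INSERT INTO vehicle_positions (observed_at, vehicle_id, lat, lon) "
        "VALUES (?, ?, ?, ?)",
        [(observed_at, vehicle_id, lat, lon) for vehicle_id, lat, lon in rows],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE vehicle_positions (
        observed_at TEXT NOT NULL,
        vehicle_id  TEXT NOT NULL,
        lat         REAL,
        lon         REAL,
        PRIMARY KEY (observed_at, vehicle_id)
    )
    """
)

# Fixed timestamps for the demo; a real poller would use the current UTC time
# and sleep between requests.
for observed_at in ("2024-01-01T00:00:00Z", "2024-01-01T00:01:00Z", "2024-01-01T00:02:00Z"):
    log_snapshot(conn, fetch_snapshot(), observed_at)

history = conn.execute(
    "SELECT vehicle_id, COUNT(*) FROM vehicle_positions "
    "GROUP BY vehicle_id ORDER BY vehicle_id"
).fetchall()
print(history)
```

Keeping the observation time in the key means that each poll adds rows rather than overwriting them, so the table builds up a history that can be queried later.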

Finding more

There's data everywhere! If you don't like these sources, you can try finding other data sets.

  • Once you know the protocol or format, search for it! The OneBusAway API and the GTFS format both cover public transportation data, so search for "GTFS Data {YourCity}".
  • Search for APIs for your favorite game or game server.
  • GitHub uses tags for search, so try #sample-databases, #opendata, or #datasets. What other tags can you find?