@daefresh
Last active December 13, 2024 20:17
[Advanti's Top 150+ Open Data Tools on GitHub in 2021.] This is a hand-curated list of tools 🔨 that I refer to when designing data platforms ❤️. Connect with me on LinkedIn if you'd like this! https://www.linkedin.com/in/douglaseisenstein/
Repo Name Stars Last Commit Timestamp GitHub URL Project URL Project Description
airbyte 3829 Tue 31 Aug 2021 12:27:10 GMT https://github.com/airbytehq/airbyte https://airbyte.io Airbyte is an open-source EL(T) platform that helps you replicate your data in your warehouses, lakes, and databases.
airflow 22946 Tue 31 Aug 2021 13:25:22 GMT https://github.com/apache/airflow https://airflow.apache.org/ Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
amazoncaptcha 140 Sat 17 Jul 2021 02:06:48 GMT https://github.com/a-maliarov/amazoncaptcha Pure Python, lightweight, Pillow-based solver for Amazon's text captcha.
amundsen 2572 Fri 27 Aug 2021 04:50:38 GMT https://github.com/amundsen-io/amundsen https://www.amundsen.io/amundsen/ Amundsen is a metadata driven application for improving the productivity of data analysts, data scientists, and engineers when interacting with data.
arangodb 11554 Tue 31 Aug 2021 12:03:58 GMT https://github.com/arangodb/arangodb https://www.arangodb.com 🥑 ArangoDB is a native multi-model database with flexible data models for documents, graphs, and key-values. Build high performance applications using a convenient SQL-like query language or JavaScript extensions.
ArchiveBox 11665 Wed 11 Aug 2021 15:12:58 GMT https://github.com/ArchiveBox/ArchiveBox https://archivebox.io 🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...
arctic 2389 Fri 23 Jul 2021 15:29:07 GMT https://github.com/man-group/arctic https://arctic.readthedocs.io/en/latest/ High performance datastore for time series and tick data
arrow 8311 Tue 31 Aug 2021 12:51:28 GMT https://github.com/apache/arrow https://arrow.apache.org/ Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
aws-data-wrangler 2089 Mon 23 Aug 2021 16:21:37 GMT https://github.com/awslabs/aws-data-wrangler https://aws-data-wrangler.readthedocs.io Pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer, and S3 (Parquet, CSV, JSON, and EXCEL).
backstage 12852 Tue 31 Aug 2021 12:01:20 GMT https://github.com/backstage/backstage https://backstage.io Backstage is an open platform for building developer portals
beam 4967 Mon 30 Aug 2021 22:17:18 GMT https://github.com/apache/beam https://beam.apache.org/ Apache Beam is a unified programming model for Batch and Streaming
benthos 3372 Sat 28 Aug 2021 21:01:10 GMT https://github.com/Jeffail/benthos https://www.benthos.dev Declarative stream processing for mundane tasks and data engineering
blazingsql 1567 Mon 30 Aug 2021 18:54:09 GMT https://github.com/BlazingDB/blazingsql https://blazingsql.com BlazingSQL is a lightweight GPU accelerated SQL engine for Python. Built on RAPIDS cuDF.
bonobo 1457 Wed 10 Mar 2021 15:44:00 GMT https://github.com/python-bonobo/bonobo https://www.bonobo-project.org/ Extract Transform Load for Python 3.5+
bytebase 951 Tue 31 Aug 2021 06:40:56 GMT https://github.com/bytebase/bytebase https://bytebase.com Web-based, zero-config, dependency-free database schema change and version control tool for teams. Public demo: https://demo.bytebase.com
calcite 2633 Sat 28 Aug 2021 20:19:10 GMT https://github.com/apache/calcite https://calcite.apache.org/ Apache Calcite
cayley 13917 Fri 18 Jun 2021 13:25:36 GMT https://github.com/cayleygraph/cayley https://cayley.io An open-source graph database
celery 17829 Tue 31 Aug 2021 12:21:48 GMT https://github.com/celery/celery https://docs.celeryproject.org/en/stable/index.html Distributed Task Queue (development branch)
cerberus 2569 Wed 05 May 2021 20:47:21 GMT https://github.com/pyeve/cerberus http://python-cerberus.org Lightweight, extensible data validation library for Python
chartify 2968 Fri 05 Feb 2021 18:49:02 GMT https://github.com/spotify/chartify Python library that makes it easy for data scientists to create charts.
Chronicle-Bytes 275 Tue 31 Aug 2021 08:38:09 GMT https://github.com/OpenHFT/Chronicle-Bytes http://chronicle.software Chronicle Bytes has a similar purpose to Java NIO's ByteBuffer with many extensions
ClickHouse 18864 Tue 31 Aug 2021 14:09:24 GMT https://github.com/ClickHouse/ClickHouse https://clickhouse.tech ClickHouse® is a free analytics DBMS for big data
cockroach 21505 Tue 31 Aug 2021 09:26:43 GMT https://github.com/cockroachdb/cockroach https://www.cockroachlabs.com CockroachDB - the open source cloud-native distributed SQL database.
crate 3170 Mon 30 Aug 2021 15:27:22 GMT https://github.com/crate/crate https://crate.io/products/cratedb/ CrateDB is a distributed SQL database that makes it simple to store and analyze massive amounts of machine data in real-time.
crux 1450 Thu 26 Aug 2021 08:58:48 GMT https://github.com/juxt/crux https://opencrux.com General purpose bitemporal database for SQL, Datalog & graph queries
cudf 4132 Tue 31 Aug 2021 11:54:03 GMT https://github.com/rapidsai/cudf http://rapids.ai cuDF - GPU DataFrame Library
dagster 3713 Tue 31 Aug 2021 12:54:32 GMT https://github.com/dagster-io/dagster https://dagster.io A data orchestrator for machine learning, analytics, and ETL.
dapr 14405 Tue 31 Aug 2021 14:16:23 GMT https://github.com/dapr/dapr https://dapr.io Dapr is a portable, event-driven runtime for building distributed applications across cloud and edge.
dash 15059 Thu 26 Aug 2021 14:20:20 GMT https://github.com/plotly/dash https://plotly.com/dash Analytical Web Apps for Python, R, Julia, and Jupyter. No JavaScript Required.
dask 8721 Tue 31 Aug 2021 13:59:40 GMT https://github.com/dask/dask https://dask.org Parallel computing with task scheduling
datacatalog 42 Thu 29 Jul 2021 04:16:56 GMT https://github.com/flyteorg/datacatalog https://flyte.org Data Catalog is a service for indexing parameterized, strongly-typed data artifacts across revisions. It also powers Flyte's memoization system
datafuse 1961 Tue 31 Aug 2021 03:54:15 GMT https://github.com/datafuselabs/datafuse https://datafuse.rs An elastic and scalable Cloud Warehouse, offers Blazing Fast Query and combines Elasticity, Simplicity, Low cost of the Cloud, built to make the Data Cloud easy
datahub 3468 Tue 31 Aug 2021 03:24:22 GMT https://github.com/linkedin/datahub https://datahubproject.io A Metadata Platform for the Modern Data Stack
datapane 375 Wed 25 Aug 2021 21:15:17 GMT https://github.com/datapane/datapane https://datapane.com Datapane makes it simple to build shareable reports from Python.
DataProfiler 636 Wed 25 Aug 2021 19:21:05 GMT https://github.com/capitalone/DataProfiler https://capitalone.github.io/DataProfiler What's in your data? Extract schema, statistics, and entities from datasets
dbeaver 21922 Tue 31 Aug 2021 10:43:23 GMT https://github.com/dbeaver/dbeaver https://dbeaver.io Free universal database tool and SQL client
dbt 3420 Tue 31 Aug 2021 14:08:20 GMT https://github.com/dbt-labs/dbt https://www.getdbt.com/ dbt (data build tool) enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
debezium 5281 Tue 31 Aug 2021 12:09:36 GMT https://github.com/debezium/debezium https://debezium.io Change data capture for a variety of databases. Please log issues at https://issues.redhat.com/browse/DBZ.
dedupe 3131 Mon 23 Aug 2021 22:35:07 GMT https://github.com/dedupeio/dedupe https://docs.dedupe.io 🆔 A Python library for accurate and scalable fuzzy matching, record deduplication, and entity-resolution.
delta 3596 Fri 27 Aug 2021 22:29:52 GMT https://github.com/delta-io/delta https://delta.io An open-source storage layer that brings scalable ACID transactions to Apache Spark™ and big data workloads.
delta-sharing 233 Wed 25 Aug 2021 22:06:12 GMT https://github.com/delta-io/delta-sharing https://delta.io/sharing An open protocol for secure data sharing
dgraph 16555 Tue 31 Aug 2021 07:14:11 GMT https://github.com/dgraph-io/dgraph https://dgraph.io Native GraphQL Database with graph backend
diesel 7228 Wed 25 Aug 2021 11:34:10 GMT https://github.com/diesel-rs/diesel https://diesel.rs A safe, extensible ORM and Query Builder for Rust
differential-dataflow 1683 Wed 25 Aug 2021 23:07:06 GMT https://github.com/TimelyDataflow/differential-dataflow An implementation of differential dataflow using timely dataflow on Rust.
dolt 9359 Mon 30 Aug 2021 22:51:54 GMT https://github.com/dolthub/dolt Dolt – It's Git for Data
dremio-oss 943 Tue 06 Jul 2021 16:57:27 GMT https://github.com/dremio/dremio-oss https://www.dremio.com Dremio - the missing link in modern data
druid 11097 Tue 31 Aug 2021 07:04:00 GMT https://github.com/apache/druid https://druid.apache.org/ Apache Druid: a high performance real-time analytics database.
duckdb 3415 Tue 31 Aug 2021 14:10:37 GMT https://github.com/duckdb/duckdb http://www.duckdb.org DuckDB is an in-process SQL OLAP Database Management System
dvc 8485 Tue 31 Aug 2021 03:21:28 GMT https://github.com/iterative/dvc https://dvc.org 🦉 Data Version Control, Git for Data & Models, ML Experiments Management
egeria 424 Tue 31 Aug 2021 10:17:41 GMT https://github.com/odpi/egeria https://egeria-project.org Open Metadata and Governance
etcd 37061 Mon 30 Aug 2021 12:31:00 GMT https://github.com/etcd-io/etcd https://etcd.io Distributed reliable key-value store for the most critical data of a distributed system
fastapi 35361 Fri 27 Aug 2021 08:49:40 GMT https://github.com/tiangolo/fastapi https://fastapi.tiangolo.com/ FastAPI framework, high performance, easy to learn, fast to code, ready for production
feast 2172 Tue 31 Aug 2021 08:07:57 GMT https://github.com/feast-dev/feast https://feast.dev Feature Store for Machine Learning
findatapy 949 Thu 29 Jul 2021 09:50:01 GMT https://github.com/cuemacro/findatapy Python library to download market data via Bloomberg, Eikon, Quandl, Yahoo, etc.
flink 16990 Tue 31 Aug 2021 14:09:06 GMT https://github.com/apache/flink Apache Flink
flyte 1608 Mon 30 Aug 2021 19:22:59 GMT https://github.com/flyteorg/flyte https://flyte.org Kubernetes-native workflow automation platform for complex, mission-critical data and ML processes at scale. It has been battle-tested at Lyft, Spotify, Freenome, and others and is truly open-source.
flyway 6084 Wed 18 Aug 2021 10:00:07 GMT https://github.com/flyway/flyway https://flywaydb.org Flyway by Redgate • Database Migrations Made Easy.
frictionless-py 453 Mon 30 Aug 2021 12:56:20 GMT https://github.com/frictionlessdata/frictionless-py https://framework.frictionlessdata.io Frictionless is a framework to describe, extract, validate, and transform tabular data.
fuzzywuzzy 8457 Sat 20 Feb 2021 06:36:20 GMT https://github.com/seatgeek/fuzzywuzzy http://chairnerd.seatgeek.com/fuzzywuzzy-fuzzy-string-matching-in-python/ Fuzzy String Matching in Python
Gerapy 2503 Sat 28 Aug 2021 06:24:24 GMT https://github.com/Gerapy/Gerapy https://docs.gerapy.com/ Distributed Crawler Management Framework Based on Scrapy, Scrapyd, Django, and Vue.js
getting-started 842 Thu 29 Apr 2021 14:20:17 GMT https://github.com/singer-io/getting-started https://singer.io This repository is a getting started guide to Singer.
gobblin 1955 Tue 31 Aug 2021 00:25:27 GMT https://github.com/apache/gobblin https://gobblin.apache.org/ A distributed data integration framework that simplifies common aspects of big data integration such as data ingestion, replication, organization, and lifecycle management for both streaming and batch data ecosystems.
grafana 43646 Tue 31 Aug 2021 13:01:23 GMT https://github.com/grafana/grafana https://grafana.com The open and composable observability and data visualization platform. Visualize metrics, logs, and traces from multiple sources like Prometheus, Loki, Elasticsearch, InfluxDB, Postgres, and many more.
gremlin 1894 Tue 05 Sep 2017 10:58:37 GMT https://github.com/tinkerpop/gremlin http://tinkerpop.apache.org/ A Graph Traversal Language (no longer active - see Apache TinkerPop)
hazelcast 4502 Tue 31 Aug 2021 14:08:35 GMT https://github.com/hazelcast/hazelcast https://www.hazelcast.com Open-source distributed computation and storage platform
holoviz 519 Mon 16 Aug 2021 13:25:26 GMT https://github.com/holoviz/holoviz https://holoviz.org/ High-level tools to simplify visualization in Python.
hora 2027 Thu 12 Aug 2021 12:32:50 GMT https://github.com/hora-search/hora http://horasearch.com/ 🚀 Efficient approximate nearest neighbor search algorithm collections library written in Rust 🦀.
hudi 2183 Tue 31 Aug 2021 03:28:35 GMT https://github.com/apache/hudi https://hudi.apache.org/ Upserts Deletes And Incremental Processing on Big Data.
hugegraph 1709 Wed 25 Aug 2021 10:00:27 GMT https://github.com/hugegraph/hugegraph HugeGraph Database core component, including graph engine, API, and built-in backends
iceberg 1907 Mon 30 Aug 2021 17:51:47 GMT https://github.com/apache/iceberg https://iceberg.apache.org/ Apache Iceberg
ignite 3952 Tue 31 Aug 2021 07:39:05 GMT https://github.com/apache/ignite https://ignite.apache.org/ Apache Ignite
influxdb 22008 Tue 31 Aug 2021 13:52:02 GMT https://github.com/influxdata/influxdb https://influxdata.com Scalable datastore for metrics, events, and real-time analytics
intake 611 Thu 26 Aug 2021 19:15:23 GMT https://github.com/intake/intake https://intake.readthedocs.io/ Intake is a lightweight package for finding, investigating, loading, and disseminating data.
janusgraph 4145 Sat 28 Aug 2021 19:15:49 GMT https://github.com/JanusGraph/janusgraph https://janusgraph.org JanusGraph: an open-source distributed graph database
kafka 19742 Mon 30 Aug 2021 22:39:25 GMT https://github.com/apache/kafka Mirror of Apache Kafka
Kats 2895 Mon 30 Aug 2021 19:43:37 GMT https://github.com/facebookresearch/Kats Kats, a kit to analyze time series data: a lightweight, easy-to-use, generalizable, and extendable framework to perform time series analysis, from understanding the key statistics and characteristics, detecting change points and anomalies, to forecasting future trends.
kedro 4279 Mon 23 Aug 2021 10:43:01 GMT https://github.com/quantumblacklabs/kedro https://kedro.readthedocs.io/ A Python framework for creating reproducible, maintainable, and modular data science code.
klio 633 Mon 30 Aug 2021 20:45:04 GMT https://github.com/spotify/klio https://docs.klio.io Smarter data pipelines for audio.
lakeFS 1558 Tue 31 Aug 2021 09:17:13 GMT https://github.com/treeverse/lakeFS https://lakefs.io Git-like capabilities for your object storage
marquez 755 Mon 30 Aug 2021 19:22:56 GMT https://github.com/MarquezProject/marquez https://marquezproject.ai Collect, aggregate, and visualize a data ecosystem's metadata
mars 2204 Tue 31 Aug 2021 10:40:45 GMT https://github.com/mars-project/mars https://docs.pymars.org Mars is a tensor-based unified framework for large-scale data computation which scales numpy, pandas, scikit-learn, and Python functions.
materialize 3008 Tue 31 Aug 2021 03:56:51 GMT https://github.com/MaterializeInc/materialize https://materialize.com Materialize simplifies application development with streaming data. Incrementally-updated materialized views - in PostgreSQL and in real time. Materialize is powered by Timely Dataflow.
MechanicalSoup 3810 Wed 09 Jun 2021 20:42:41 GMT https://github.com/MechanicalSoup/MechanicalSoup http://mechanicalsoup.readthedocs.io/en/stable/ A Python library for automating interaction with websites.
MeiliSearch 17931 Tue 31 Aug 2021 14:04:16 GMT https://github.com/meilisearch/MeiliSearch https://docs.meilisearch.com Powerful, fast, and easy to use search engine
meltano 37 Tue 31 Aug 2021 00:57:50 GMT https://github.com/meltano/meltano https://meltano.com ELT for the DataOps era - open source data integration tool. This is a read-only mirror of https://gitlab.com/meltano/meltano
metabase 25864 Tue 31 Aug 2021 14:19:48 GMT https://github.com/metabase/metabase https://metabase.com The simplest, fastest way to get business intelligence and analytics to everyone in your company 😋
milvus 7581 Tue 31 Aug 2021 12:45:58 GMT https://github.com/milvus-io/milvus https://milvus.io An open-source vector database for embedding similarity search and AI applications.
modin 6372 Tue 31 Aug 2021 07:28:48 GMT https://github.com/modin-project/modin http://modin.readthedocs.io Modin: Speed up your Pandas workflows by changing a single line of code
mongo 20313 Tue 31 Aug 2021 14:17:29 GMT https://github.com/mongodb/mongo https://www.mongodb.com/ The MongoDB Database
n8n 17289 Tue 31 Aug 2021 09:55:06 GMT https://github.com/n8n-io/n8n https://n8n.io Free and open fair-code licensed node based Workflow Automation Tool. Easily automate tasks across different services.
nebula-graph 698 Tue 03 Aug 2021 03:18:13 GMT https://github.com/vesoft-inc/nebula-graph https://nebula-graph.io A distributed, fast open-source graph database featuring horizontal scalability and high availability
neo4j 9264 Thu 26 Aug 2021 17:14:42 GMT https://github.com/neo4j/neo4j http://neo4j.com Graphs for Everyone
networkx 9589 Tue 31 Aug 2021 13:47:02 GMT https://github.com/networkx/networkx https://networkx.org Network Analysis in Python
nocodb 17005 Tue 31 Aug 2021 09:36:19 GMT https://github.com/nocodb/nocodb https://docs.nocodb.com 🔥 🔥 The Open Source Airtable alternative - Powered by Vue.js 🚀 🚀
nteract 5619 Mon 30 Aug 2021 18:40:42 GMT https://github.com/nteract/nteract https://nteract.io 📘 The interactive computing suite for you! ✨
OpenLineage 443 Fri 27 Aug 2021 23:19:08 GMT https://github.com/OpenLineage/OpenLineage http://openlineage.io An Open Standard for lineage metadata collection
OpenMetadata 324 Tue 31 Aug 2021 10:47:31 GMT https://github.com/open-metadata/OpenMetadata https://open-metadata.org Open Standard for Metadata. A Single place to Discover, Collaborate, and Get your data right.
orientdb 4338 Tue 31 Aug 2021 10:31:41 GMT https://github.com/orientechnologies/orientdb http://orientdb.com OrientDB is the most versatile DBMS supporting Graph, Document, Reactive, Full-Text, and Geospatial models in one Multi-Model product. OrientDB can run distributed (Multi-Master), supports SQL, ACID Transactions, Full-Text indexing, and Reactive Queries. OrientDB Community Edition is Open Source using a liberal Apache 2 license.
pandas-profiling 7853 Sun 27 Jun 2021 20:16:39 GMT https://github.com/pandas-profiling/pandas-profiling https://pandas-profiling.github.io/pandas-profiling/docs/master/rtd/ Create HTML profiling reports from pandas DataFrame objects
pandera 682 Fri 06 Aug 2021 01:57:51 GMT https://github.com/pandera-dev/pandera https://pandera.readthedocs.io A light-weight, flexible, and expressive pandas data validation library
papermill 4290 Tue 31 Aug 2021 04:39:29 GMT https://github.com/nteract/papermill http://papermill.readthedocs.io/en/latest/ 📚 Parameterize, execute, and analyze notebooks
pilosa 2197 Wed 27 Jan 2021 14:13:04 GMT https://github.com/pilosa/pilosa https://www.pilosa.com Pilosa is an open source distributed bitmap index that dramatically accelerates queries across multiple massive data sets.
pinot 3534 Mon 30 Aug 2021 20:55:02 GMT https://github.com/apache/pinot https://pinot.apache.org Apache Pinot (Incubating) - A realtime distributed OLAP datastore
polyaxon 2893 Tue 31 Aug 2021 14:17:01 GMT https://github.com/polyaxon/polyaxon https://polyaxon.com Machine Learning Platform for Kubernetes (MLOps tools for experimentation and automation)
prefect 6764 Mon 30 Aug 2021 21:05:36 GMT https://github.com/PrefectHQ/prefect https://prefect.io The easiest way to automate your data
presto 12352 Tue 31 Aug 2021 01:46:07 GMT https://github.com/prestodb/presto http://prestodb.github.io The official home of the Presto distributed SQL query engine for big data
prettier 40422 Tue 31 Aug 2021 04:40:52 GMT https://github.com/prettier/prettier https://prettier.io Prettier is an opinionated code formatter.
prisma 15789 Tue 31 Aug 2021 08:48:55 GMT https://github.com/prisma/prisma https://www.prisma.io Next-generation ORM for Node.js & TypeScript: PostgreSQL, MySQL, MariaDB, SQL Server, SQLite & MongoDB (Preview)
protobuf 50320 Mon 30 Aug 2021 20:55:43 GMT https://github.com/protocolbuffers/protobuf https://developers.google.com/protocol-buffers/ Protocol Buffers - Google's data interchange format
pulsar 9495 Tue 31 Aug 2021 12:41:43 GMT https://github.com/apache/pulsar https://pulsar.apache.org/ Apache Pulsar - distributed pub-sub messaging system
pycaret 3939 Mon 23 Aug 2021 19:58:30 GMT https://github.com/pycaret/pycaret https://www.pycaret.org An open-source, low-code machine learning library in Python
pytorch-lightning 15154 Tue 31 Aug 2021 09:30:43 GMT https://github.com/PyTorchLightning/pytorch-lightning https://pytorchlightning.ai The lightweight PyTorch wrapper for high-performance AI research. Scale your models, not the boilerplate.
questdb 4513 Tue 31 Aug 2021 13:50:25 GMT https://github.com/questdb/questdb https://questdb.io An open source SQL database designed to process time series data faster
quilt 1069 Tue 31 Aug 2021 08:37:51 GMT https://github.com/quiltdata/quilt https://quiltdata.com Quilt is a self-organizing data hub for S3
ray 17122 Tue 31 Aug 2021 13:26:25 GMT https://github.com/ray-project/ray https://ray.io An open source framework that provides a simple, universal API for building distributed applications. Ray is packaged with RLlib, a scalable reinforcement learning library, and Tune, a scalable hyperparameter tuning library.
re-data 395 Tue 31 Aug 2021 11:48:42 GMT https://github.com/re-data/re-data http://getre.io re_data - data quality framework. Built on top of dbt, re_data helps you find, debug, and resolve problems in your data.
redis 50784 Tue 31 Aug 2021 06:25:36 GMT https://github.com/redis/redis http://redis.io Redis is an in-memory database that persists on disk. The data model is key-value, but many different kinds of values are supported: Strings, Lists, Sets, Sorted Sets, Hashes, Streams, HyperLogLogs, Bitmaps.
RedisGraph 1431 Wed 25 Aug 2021 16:48:34 GMT https://github.com/RedisGraph/RedisGraph https://redisgraph.io A graph database as a Redis module
reflow 871 Tue 10 Aug 2021 22:50:01 GMT https://github.com/grailbio/reflow A language and runtime for distributed incremental data processing in the cloud
rethinkdb 24924 Sat 15 May 2021 22:30:17 GMT https://github.com/rethinkdb/rethinkdb https://rethinkdb.com The open-source database for the realtime web.
rocksdb 20630 Tue 31 Aug 2021 02:10:55 GMT https://github.com/facebook/rocksdb http://rocksdb.org A library that provides an embeddable persistent key-value store for fast storage.
RustPython 8977 Mon 30 Aug 2021 15:03:20 GMT https://github.com/RustPython/RustPython https://rustpython.github.io A Python Interpreter written in Rust
rxdb 15919 Mon 30 Aug 2021 20:52:25 GMT https://github.com/pubkey/rxdb https://rxdb.info/ 🔄 A realtime Database for JavaScript Applications
scikit-learn 46997 Tue 31 Aug 2021 14:19:48 GMT https://github.com/scikit-learn/scikit-learn https://scikit-learn.org scikit-learn: machine learning in Python
scio 2185 Tue 31 Aug 2021 09:56:30 GMT https://github.com/spotify/scio https://spotify.github.io/scio A Scala API for Apache Beam and Google Cloud Dataflow.
scrapy 41419 Tue 24 Aug 2021 10:15:29 GMT https://github.com/scrapy/scrapy https://scrapy.org Scrapy, a fast high-level web crawling & scraping framework for Python.
seata 20657 Tue 31 Aug 2021 06:47:39 GMT https://github.com/seata/seata https://seata.io 🔥 Seata is an easy-to-use, high-performance, open source distributed transaction solution.
selenium 21519 Tue 31 Aug 2021 07:30:49 GMT https://github.com/SeleniumHQ/selenium https://selenium.dev A browser automation framework and ecosystem.
selenium 1529 Thu 12 Aug 2021 18:03:10 GMT https://github.com/tebeka/selenium Selenium/Webdriver client for Go
sheetjs 27134 Sun 29 Aug 2021 21:20:15 GMT https://github.com/SheetJS/sheetjs https://sheetjs.com/ 📗 SheetJS Community Edition -- Spreadsheet Data Toolkit
snowplow 5810 Mon 30 Aug 2021 15:03:11 GMT https://github.com/snowplow/snowplow http://snowplowanalytics.com The enterprise-grade behavioral data engine (web, mobile, server-side, webhooks), running cloud-natively on AWS and GCP
solr 171 Tue 31 Aug 2021 14:03:11 GMT https://github.com/apache/solr https://solr.apache.org/ Apache Solr open-source search software
sonic 11860 Thu 22 Jul 2021 09:39:53 GMT https://github.com/valeriansaliou/sonic https://crates.io/crates/sonic-server 🦔 Fast, lightweight & schema-less search backend. An alternative to Elasticsearch that runs on a few MBs of RAM.
spaCy 21195 Tue 31 Aug 2021 10:53:51 GMT https://github.com/explosion/spaCy https://spacy.io 💫 Industrial-strength Natural Language Processing (NLP) in Python
spark 30686 Tue 31 Aug 2021 03:40:34 GMT https://github.com/apache/spark https://spark.apache.org/ Apache Spark - A unified analytics engine for large-scale data processing
sparklyr 806 Mon 30 Aug 2021 19:47:38 GMT https://github.com/sparklyr/sparklyr https://sparklyr.ai R interface for Apache Spark
sqlmodel 4267 Wed 25 Aug 2021 13:46:57 GMT https://github.com/tiangolo/sqlmodel https://sqlmodel.tiangolo.com/ SQL databases in Python, designed for simplicity, compatibility, and robustness.
strapi 39196 Tue 24 Aug 2021 09:00:41 GMT https://github.com/strapi/strapi https://strapi.io 🚀 Open source Node.js Headless CMS to easily build customisable APIs
stumpy 1900 Mon 30 Aug 2021 17:49:47 GMT https://github.com/TDAmeritrade/stumpy https://stumpy.readthedocs.io/en/latest/ STUMPY is a powerful and scalable Python library for modern time series analysis
supabase 17740 Mon 30 Aug 2021 23:03:12 GMT https://github.com/supabase/supabase https://supabase.io The open source Firebase alternative. Follow to stay updated about our public Beta.
superset 40216 Tue 31 Aug 2021 14:27:09 GMT https://github.com/apache/superset https://superset.apache.org/ Apache Superset is a Data Visualization and Data Exploration Platform
Systemizer 1032 Fri 27 Aug 2021 07:57:04 GMT https://github.com/honzaap/Systemizer https://honzaap.github.io/Systemizer/ A system design tool that allows you to simulate data flow of distributed systems.
terminusdb 1464 Mon 30 Aug 2021 13:13:47 GMT https://github.com/terminusdb/terminusdb https://terminusdb.com Open source graph database and document store. Designed for collaboratively building data-intensive applications and knowledge graphs.
tidb 28856 Tue 31 Aug 2021 09:48:13 GMT https://github.com/pingcap/tidb https://pingcap.com TiDB is an open source distributed HTAP database compatible with the MySQL protocol
TileDB 1167 Fri 27 Aug 2021 16:27:10 GMT https://github.com/TileDB-Inc/TileDB https://tiledb.com The Universal Storage Engine
timely-dataflow 2167 Wed 25 Aug 2021 23:06:34 GMT https://github.com/TimelyDataflow/timely-dataflow A modular implementation of timely dataflow in Rust
timescaledb 11615 Tue 31 Aug 2021 11:11:49 GMT https://github.com/timescale/timescaledb https://www.timescale.com/ An open-source time-series SQL database optimized for fast ingest and complex queries. Packaged as a PostgreSQL extension.
tinydb 4507 Sat 14 Aug 2021 14:18:19 GMT https://github.com/msiemens/tinydb https://tinydb.readthedocs.org TinyDB is a lightweight document oriented database optimized for your happiness :)
TorQ 234 Thu 15 Jul 2021 11:13:24 GMT https://github.com/AquaQAnalytics/TorQ http://goo.gl/8YupnC kdb+ production framework. Read the doc: http://aquaqanalytics.github.io/TorQ/. Join the group!
trino 3931 Tue 31 Aug 2021 11:50:54 GMT https://github.com/trinodb/trino https://trino.io Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
tuplex 677 Mon 23 Aug 2021 00:46:32 GMT https://github.com/tuplex/tuplex https://tuplex.cs.brown.edu Tuplex is a parallel big data processing framework that runs data science pipelines written in Python at the speed of compiled code. Tuplex has similar Python APIs to Apache Spark or Dask, but rather than invoking the Python interpreter, Tuplex generates optimized LLVM bytecode for the given pipeline and input data set.
typedb 2846 Thu 26 Aug 2021 15:47:08 GMT https://github.com/vaticle/typedb https://vaticle.com TypeDB: a strongly-typed database
vector 7877 Tue 31 Aug 2021 13:46:17 GMT https://github.com/timberio/vector https://vector.dev A high-performance, highly reliable observability data pipeline.
whale 632 Sat 12 Jun 2021 03:17:43 GMT https://github.com/hyperqueryhq/whale https://docs.whale.cx 🐳 The stupidly simple CLI workspace for your data warehouse.
yugabyte-db 5495 Tue 31 Aug 2021 06:46:09 GMT https://github.com/yugabyte/yugabyte-db https://www.yugabyte.com The high-performance distributed SQL database for global internet-scale apps.