Last active
December 13, 2024 20:17
[Advanti's Top 150+ Open Data Tools on GitHub in 2021.] This is a hand-curated list of tools 🔨 that I refer to when designing data platforms ❤️. If you like this list, connect with me on LinkedIn! https://www.linkedin.com/in/douglaseisenstein/
Repo Name | Stars | Last Commit Timestamp | GitHub URL | Project URL | Project Description
---|---|---|---|---|---
airbyte | 3829 | Tue 31 Aug 2021 12:27:10 GMT | https://github.com/airbytehq/airbyte | https://airbyte.io | Airbyte is an open-source EL(T) platform that helps you replicate your data in your warehouses, lakes, and databases. | |
airflow | 22946 | Tue 31 Aug 2021 13:25:22 GMT | https://github.com/apache/airflow | https://airflow.apache.org/ | Apache Airflow - A platform to programmatically author, schedule, and monitor workflows | |
amazoncaptcha | 140 | Sat 17 Jul 2021 02:06:48 GMT | https://github.com/a-maliarov/amazoncaptcha | | Pure Python, lightweight, Pillow-based solver for Amazon's text captcha. | |
amundsen | 2572 | Fri 27 Aug 2021 04:50:38 GMT | https://github.com/amundsen-io/amundsen | https://www.amundsen.io/amundsen/ | Amundsen is a metadata-driven application for improving the productivity of data analysts, data scientists, and engineers when interacting with data. | |
arangodb | 11554 | Tue 31 Aug 2021 12:03:58 GMT | https://github.com/arangodb/arangodb | https://www.arangodb.com | 🥑 ArangoDB is a native multi-model database with flexible data models for documents, graphs, and key-values. Build high-performance applications using a convenient SQL-like query language or JavaScript extensions. | |
ArchiveBox | 11665 | Wed 11 Aug 2021 15:12:58 GMT | https://github.com/ArchiveBox/ArchiveBox | https://archivebox.io | 🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more... | |
arctic | 2389 | Fri 23 Jul 2021 15:29:07 GMT | https://github.com/man-group/arctic | https://arctic.readthedocs.io/en/latest/ | High performance datastore for time series and tick data | |
arrow | 8311 | Tue 31 Aug 2021 12:51:28 GMT | https://github.com/apache/arrow | https://arrow.apache.org/ | Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing | |
aws-data-wrangler | 2089 | Mon 23 Aug 2021 16:21:37 GMT | https://github.com/awslabs/aws-data-wrangler | https://aws-data-wrangler.readthedocs.io | Pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer, and S3 (Parquet, CSV, JSON, and EXCEL). | |
backstage | 12852 | Tue 31 Aug 2021 12:01:20 GMT | https://github.com/backstage/backstage | https://backstage.io | Backstage is an open platform for building developer portals | |
beam | 4967 | Mon 30 Aug 2021 22:17:18 GMT | https://github.com/apache/beam | https://beam.apache.org/ | Apache Beam is a unified programming model for Batch and Streaming | |
benthos | 3372 | Sat 28 Aug 2021 21:01:10 GMT | https://github.com/Jeffail/benthos | https://www.benthos.dev | Declarative stream processing for mundane tasks and data engineering | |
blazingsql | 1567 | Mon 30 Aug 2021 18:54:09 GMT | https://github.com/BlazingDB/blazingsql | https://blazingsql.com | BlazingSQL is a lightweight, GPU-accelerated SQL engine for Python. Built on RAPIDS cuDF. | |
bonobo | 1457 | Wed 10 Mar 2021 15:44:00 GMT | https://github.com/python-bonobo/bonobo | https://www.bonobo-project.org/ | Extract Transform Load for Python 3.5+ | |
bytebase | 951 | Tue 31 Aug 2021 06:40:56 GMT | https://github.com/bytebase/bytebase | https://bytebase.com | Web-based, zero-config, dependency-free database schema change and version control tool for teams. Public demo: https://demo.bytebase.com | |
calcite | 2633 | Sat 28 Aug 2021 20:19:10 GMT | https://github.com/apache/calcite | https://calcite.apache.org/ | Apache Calcite: a dynamic data management framework providing SQL parsing, validation, and query optimization | |
cayley | 13917 | Fri 18 Jun 2021 13:25:36 GMT | https://github.com/cayleygraph/cayley | https://cayley.io | An open-source graph database | |
celery | 17829 | Tue 31 Aug 2021 12:21:48 GMT | https://github.com/celery/celery | https://docs.celeryproject.org/en/stable/index.html | Distributed Task Queue (development branch) | |
cerberus | 2569 | Wed 05 May 2021 20:47:21 GMT | https://github.com/pyeve/cerberus | http://python-cerberus.org | Lightweight, extensible data validation library for Python | |
chartify | 2968 | Fri 05 Feb 2021 18:49:02 GMT | https://github.com/spotify/chartify | | Python library that makes it easy for data scientists to create charts. | |
Chronicle-Bytes | 275 | Tue 31 Aug 2021 08:38:09 GMT | https://github.com/OpenHFT/Chronicle-Bytes | http://chronicle.software | Chronicle Bytes has a similar purpose to Java NIO's ByteBuffer, with many extensions | |
ClickHouse | 18864 | Tue 31 Aug 2021 14:09:24 GMT | https://github.com/ClickHouse/ClickHouse | https://clickhouse.tech | ClickHouse® is a free analytics DBMS for big data | |
cockroach | 21505 | Tue 31 Aug 2021 09:26:43 GMT | https://github.com/cockroachdb/cockroach | https://www.cockroachlabs.com | CockroachDB - the open source cloud-native distributed SQL database. | |
crate | 3170 | Mon 30 Aug 2021 15:27:22 GMT | https://github.com/crate/crate | https://crate.io/products/cratedb/ | CrateDB is a distributed SQL database that makes it simple to store and analyze massive amounts of machine data in real-time. | |
crux | 1450 | Thu 26 Aug 2021 08:58:48 GMT | https://github.com/juxt/crux | https://opencrux.com | General-purpose bitemporal database for SQL, Datalog & graph queries | |
cudf | 4132 | Tue 31 Aug 2021 11:54:03 GMT | https://github.com/rapidsai/cudf | http://rapids.ai | cuDF - GPU DataFrame Library | |
dagster | 3713 | Tue 31 Aug 2021 12:54:32 GMT | https://github.com/dagster-io/dagster | https://dagster.io | A data orchestrator for machine learning, analytics, and ETL. | |
dapr | 14405 | Tue 31 Aug 2021 14:16:23 GMT | https://github.com/dapr/dapr | https://dapr.io | Dapr is a portable event-driven runtime for building distributed applications across cloud and edge. | |
dash | 15059 | Thu 26 Aug 2021 14:20:20 GMT | https://github.com/plotly/dash | https://plotly.com/dash | Analytical Web Apps for Python, R, Julia, and Jupyter. No JavaScript Required. | |
dask | 8721 | Tue 31 Aug 2021 13:59:40 GMT | https://github.com/dask/dask | https://dask.org | Parallel computing with task scheduling | |
datacatalog | 42 | Thu 29 Jul 2021 04:16:56 GMT | https://github.com/flyteorg/datacatalog | https://flyte.org | Data Catalog is a service for indexing parameterized, strongly-typed data artifacts across revisions. It also powers Flyte's memoization system | |
datafuse | 1961 | Tue 31 Aug 2021 03:54:15 GMT | https://github.com/datafuselabs/datafuse | https://datafuse.rs | An elastic and scalable cloud warehouse that offers blazing-fast queries and combines the elasticity, simplicity, and low cost of the cloud, built to make the Data Cloud easy | |
datahub | 3468 | Tue 31 Aug 2021 03:24:22 GMT | https://github.com/linkedin/datahub | https://datahubproject.io | A Metadata Platform for the Modern Data Stack | |
datapane | 375 | Wed 25 Aug 2021 21:15:17 GMT | https://github.com/datapane/datapane | https://datapane.com | Datapane makes it simple to build shareable reports from Python. | |
DataProfiler | 636 | Wed 25 Aug 2021 19:21:05 GMT | https://github.com/capitalone/DataProfiler | https://capitalone.github.io/DataProfiler | What's in your data? Extract schema, statistics, and entities from datasets | |
dbeaver | 21922 | Tue 31 Aug 2021 10:43:23 GMT | https://github.com/dbeaver/dbeaver | https://dbeaver.io | Free universal database tool and SQL client | |
dbt | 3420 | Tue 31 Aug 2021 14:08:20 GMT | https://github.com/dbt-labs/dbt | https://www.getdbt.com/ | dbt (data build tool) enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications. | |
debezium | 5281 | Tue 31 Aug 2021 12:09:36 GMT | https://github.com/debezium/debezium | https://debezium.io | Change data capture for a variety of databases. Please log issues at https://issues.redhat.com/browse/DBZ. | |
dedupe | 3131 | Mon 23 Aug 2021 22:35:07 GMT | https://github.com/dedupeio/dedupe | https://docs.dedupe.io | :id: A Python library for accurate and scalable fuzzy matching, record deduplication, and entity resolution. | |
delta | 3596 | Fri 27 Aug 2021 22:29:52 GMT | https://github.com/delta-io/delta | https://delta.io | An open-source storage layer that brings scalable ACID transactions to Apache Spark™ and big data workloads. | |
delta-sharing | 233 | Wed 25 Aug 2021 22:06:12 GMT | https://github.com/delta-io/delta-sharing | https://delta.io/sharing | An open protocol for secure data sharing | |
dgraph | 16555 | Tue 31 Aug 2021 07:14:11 GMT | https://github.com/dgraph-io/dgraph | https://dgraph.io | Native GraphQL Database with graph backend | |
diesel | 7228 | Wed 25 Aug 2021 11:34:10 GMT | https://github.com/diesel-rs/diesel | https://diesel.rs | A safe, extensible ORM and Query Builder for Rust | |
differential-dataflow | 1683 | Wed 25 Aug 2021 23:07:06 GMT | https://github.com/TimelyDataflow/differential-dataflow | | An implementation of differential dataflow using timely dataflow on Rust. | |
dolt | 9359 | Mon 30 Aug 2021 22:51:54 GMT | https://github.com/dolthub/dolt | | Dolt – It's Git for Data | |
dremio-oss | 943 | Tue 06 Jul 2021 16:57:27 GMT | https://github.com/dremio/dremio-oss | https://www.dremio.com | Dremio - the missing link in modern data | |
druid | 11097 | Tue 31 Aug 2021 07:04:00 GMT | https://github.com/apache/druid | https://druid.apache.org/ | Apache Druid: a high performance real-time analytics database. | |
duckdb | 3415 | Tue 31 Aug 2021 14:10:37 GMT | https://github.com/duckdb/duckdb | http://www.duckdb.org | DuckDB is an in-process SQL OLAP Database Management System | |
dvc | 8485 | Tue 31 Aug 2021 03:21:28 GMT | https://github.com/iterative/dvc | https://dvc.org | 🦉 Data Version Control: Git for Data & Models, ML Experiments Management | |
egeria | 424 | Tue 31 Aug 2021 10:17:41 GMT | https://github.com/odpi/egeria | https://egeria-project.org | Open Metadata and Governance | |
etcd | 37061 | Mon 30 Aug 2021 12:31:00 GMT | https://github.com/etcd-io/etcd | https://etcd.io | Distributed, reliable key-value store for the most critical data of a distributed system | |
fastapi | 35361 | Fri 27 Aug 2021 08:49:40 GMT | https://github.com/tiangolo/fastapi | https://fastapi.tiangolo.com/ | FastAPI framework, high performance, easy to learn, fast to code, ready for production | |
feast | 2172 | Tue 31 Aug 2021 08:07:57 GMT | https://github.com/feast-dev/feast | https://feast.dev | Feature Store for Machine Learning | |
findatapy | 949 | Thu 29 Jul 2021 09:50:01 GMT | https://github.com/cuemacro/findatapy | | Python library to download market data via Bloomberg, Eikon, Quandl, Yahoo, etc. | |
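flink | 16990 | Tue 31 Aug 2021 14:09:06 GMT | https://github.com/apache/flink | | Apache Flink: a framework and distributed processing engine for stateful computations over unbounded and bounded data streams | |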
flyte | 1608 | Mon 30 Aug 2021 19:22:59 GMT | https://github.com/flyteorg/flyte | https://flyte.org | Kubernetes-native workflow automation platform for complex, mission-critical data and ML processes at scale. It has been battle-tested at Lyft, Spotify, Freenome, and others and is truly open-source. | |
flyway | 6084 | Wed 18 Aug 2021 10:00:07 GMT | https://github.com/flyway/flyway | https://flywaydb.org | Flyway by Redgate • Database Migrations Made Easy. | |
frictionless-py | 453 | Mon 30 Aug 2021 12:56:20 GMT | https://github.com/frictionlessdata/frictionless-py | https://framework.frictionlessdata.io | Frictionless is a framework to describe, extract, validate, and transform tabular data. | |
fuzzywuzzy | 8457 | Sat 20 Feb 2021 06:36:20 GMT | https://github.com/seatgeek/fuzzywuzzy | http://chairnerd.seatgeek.com/fuzzywuzzy-fuzzy-string-matching-in-python/ | Fuzzy String Matching in Python | |
Gerapy | 2503 | Sat 28 Aug 2021 06:24:24 GMT | https://github.com/Gerapy/Gerapy | https://docs.gerapy.com/ | Distributed Crawler Management Framework Based on Scrapy, Scrapyd, Django, and Vue.js | |
getting-started | 842 | Thu 29 Apr 2021 14:20:17 GMT | https://github.com/singer-io/getting-started | https://singer.io | This repository is a getting started guide to Singer. | |
gobblin | 1955 | Tue 31 Aug 2021 00:25:27 GMT | https://github.com/apache/gobblin | https://gobblin.apache.org/ | A distributed data integration framework that simplifies common aspects of big data integration, such as data ingestion, replication, organization, and lifecycle management, for both streaming and batch data ecosystems. | |
grafana | 43646 | Tue 31 Aug 2021 13:01:23 GMT | https://github.com/grafana/grafana | https://grafana.com | The open and composable observability and data visualization platform. Visualize metrics, logs, and traces from multiple sources like Prometheus, Loki, Elasticsearch, InfluxDB, Postgres, and many more. | |
gremlin | 1894 | Tue 05 Sep 2017 10:58:37 GMT | https://github.com/tinkerpop/gremlin | http://tinkerpop.apache.org/ | A Graph Traversal Language (no longer active - see Apache TinkerPop) | |
hazelcast | 4502 | Tue 31 Aug 2021 14:08:35 GMT | https://github.com/hazelcast/hazelcast | https://www.hazelcast.com | Open-source distributed computation and storage platform | |
holoviz | 519 | Mon 16 Aug 2021 13:25:26 GMT | https://github.com/holoviz/holoviz | https://holoviz.org/ | High-level tools to simplify visualization in Python. | |
hora | 2027 | Thu 12 Aug 2021 12:32:50 GMT | https://github.com/hora-search/hora | http://horasearch.com/ | 🚀 Efficient approximate nearest neighbor search algorithm collections library written in Rust 🦀. | |
hudi | 2183 | Tue 31 Aug 2021 03:28:35 GMT | https://github.com/apache/hudi | https://hudi.apache.org/ | Upserts, Deletes, and Incremental Processing on Big Data. | |
hugegraph | 1709 | Wed 25 Aug 2021 10:00:27 GMT | https://github.com/hugegraph/hugegraph | | HugeGraph Database core component, including graph engine, API, and built-in backends | |
iceberg | 1907 | Mon 30 Aug 2021 17:51:47 GMT | https://github.com/apache/iceberg | https://iceberg.apache.org/ | Apache Iceberg: an open table format for huge analytic datasets | |
ignite | 3952 | Tue 31 Aug 2021 07:39:05 GMT | https://github.com/apache/ignite | https://ignite.apache.org/ | Apache Ignite: a distributed database for high-performance computing with in-memory speed | |
influxdb | 22008 | Tue 31 Aug 2021 13:52:02 GMT | https://github.com/influxdata/influxdb | https://influxdata.com | Scalable datastore for metrics, events, and real-time analytics | |
intake | 611 | Thu 26 Aug 2021 19:15:23 GMT | https://github.com/intake/intake | https://intake.readthedocs.io/ | Intake is a lightweight package for finding, investigating, loading, and disseminating data. | |
janusgraph | 4145 | Sat 28 Aug 2021 19:15:49 GMT | https://github.com/JanusGraph/janusgraph | https://janusgraph.org | JanusGraph: an open-source distributed graph database | |
kafka | 19742 | Mon 30 Aug 2021 22:39:25 GMT | https://github.com/apache/kafka | | Mirror of Apache Kafka | |
Kats | 2895 | Mon 30 Aug 2021 19:43:37 GMT | https://github.com/facebookresearch/Kats | | Kats, a kit to analyze time series data: a lightweight, easy-to-use, generalizable, and extendable framework to perform time series analysis, from understanding the key statistics and characteristics, detecting change points and anomalies, to forecasting future trends. | |
kedro | 4279 | Mon 23 Aug 2021 10:43:01 GMT | https://github.com/quantumblacklabs/kedro | https://kedro.readthedocs.io/ | A Python framework for creating reproducible, maintainable, and modular data science code. | |
klio | 633 | Mon 30 Aug 2021 20:45:04 GMT | https://github.com/spotify/klio | https://docs.klio.io | Smarter data pipelines for audio. | |
lakeFS | 1558 | Tue 31 Aug 2021 09:17:13 GMT | https://github.com/treeverse/lakeFS | https://lakefs.io | Git-like capabilities for your object storage | |
marquez | 755 | Mon 30 Aug 2021 19:22:56 GMT | https://github.com/MarquezProject/marquez | https://marquezproject.ai | Collect, aggregate, and visualize a data ecosystem's metadata | |
mars | 2204 | Tue 31 Aug 2021 10:40:45 GMT | https://github.com/mars-project/mars | https://docs.pymars.org | Mars is a tensor-based unified framework for large-scale data computation which scales NumPy, pandas, scikit-learn, and Python functions. | |
materialize | 3008 | Tue 31 Aug 2021 03:56:51 GMT | https://github.com/MaterializeInc/materialize | https://materialize.com | Materialize simplifies application development with streaming data. Incrementally-updated materialized views - in PostgreSQL and in real time. Materialize is powered by Timely Dataflow. | |
MechanicalSoup | 3810 | Wed 09 Jun 2021 20:42:41 GMT | https://github.com/MechanicalSoup/MechanicalSoup | http://mechanicalsoup.readthedocs.io/en/stable/ | A Python library for automating interaction with websites. | |
MeiliSearch | 17931 | Tue 31 Aug 2021 14:04:16 GMT | https://github.com/meilisearch/MeiliSearch | https://docs.meilisearch.com | Powerful, fast, and easy-to-use search engine | |
meltano | 37 | Tue 31 Aug 2021 00:57:50 GMT | https://github.com/meltano/meltano | https://meltano.com | ELT for the DataOps era: open source data integration tool. This is a read-only mirror of https://gitlab.com/meltano/meltano | |
metabase | 25864 | Tue 31 Aug 2021 14:19:48 GMT | https://github.com/metabase/metabase | https://metabase.com | The simplest, fastest way to get business intelligence and analytics to everyone in your company :yum: | |
milvus | 7581 | Tue 31 Aug 2021 12:45:58 GMT | https://github.com/milvus-io/milvus | https://milvus.io | An open-source vector database for embedding similarity search and AI applications. | |
modin | 6372 | Tue 31 Aug 2021 07:28:48 GMT | https://github.com/modin-project/modin | http://modin.readthedocs.io | Modin: Speed up your Pandas workflows by changing a single line of code | |
mongo | 20313 | Tue 31 Aug 2021 14:17:29 GMT | https://github.com/mongodb/mongo | https://www.mongodb.com/ | The MongoDB Database | |
n8n | 17289 | Tue 31 Aug 2021 09:55:06 GMT | https://github.com/n8n-io/n8n | https://n8n.io | Free and open fair-code licensed node-based Workflow Automation Tool. Easily automate tasks across different services. | |
nebula-graph | 698 | Tue 03 Aug 2021 03:18:13 GMT | https://github.com/vesoft-inc/nebula-graph | https://nebula-graph.io | A distributed, fast open-source graph database featuring horizontal scalability and high availability | |
neo4j | 9264 | Thu 26 Aug 2021 17:14:42 GMT | https://github.com/neo4j/neo4j | http://neo4j.com | Graphs for Everyone | |
networkx | 9589 | Tue 31 Aug 2021 13:47:02 GMT | https://github.com/networkx/networkx | https://networkx.org | Network Analysis in Python | |
nocodb | 17005 | Tue 31 Aug 2021 09:36:19 GMT | https://github.com/nocodb/nocodb | https://docs.nocodb.com | 🔥 🔥 The Open Source Airtable alternative - Powered by Vue.js 🚀 🚀 | |
nteract | 5619 | Mon 30 Aug 2021 18:40:42 GMT | https://github.com/nteract/nteract | https://nteract.io | 📘 The interactive computing suite for you! ✨ | |
OpenLineage | 443 | Fri 27 Aug 2021 23:19:08 GMT | https://github.com/OpenLineage/OpenLineage | http://openlineage.io | An Open Standard for lineage metadata collection | |
OpenMetadata | 324 | Tue 31 Aug 2021 10:47:31 GMT | https://github.com/open-metadata/OpenMetadata | https://open-metadata.org | Open Standard for Metadata. A single place to Discover, Collaborate, and Get your data right. | |
orientdb | 4338 | Tue 31 Aug 2021 10:31:41 GMT | https://github.com/orientechnologies/orientdb | http://orientdb.com | OrientDB is the most versatile DBMS supporting Graph, Document, Reactive, Full-Text, and Geospatial models in one Multi-Model product. OrientDB can run distributed (Multi-Master) and supports SQL, ACID Transactions, Full-Text indexing, and Reactive Queries. OrientDB Community Edition is Open Source using a liberal Apache 2 license. | |
pandas-profiling | 7853 | Sun 27 Jun 2021 20:16:39 GMT | https://github.com/pandas-profiling/pandas-profiling | https://pandas-profiling.github.io/pandas-profiling/docs/master/rtd/ | Create HTML profiling reports from pandas DataFrame objects | |
pandera | 682 | Fri 06 Aug 2021 01:57:51 GMT | https://github.com/pandera-dev/pandera | https://pandera.readthedocs.io | A lightweight, flexible, and expressive pandas data validation library | |
papermill | 4290 | Tue 31 Aug 2021 04:39:29 GMT | https://github.com/nteract/papermill | http://papermill.readthedocs.io/en/latest/ | 📚 Parameterize, execute, and analyze notebooks | |
pilosa | 2197 | Wed 27 Jan 2021 14:13:04 GMT | https://github.com/pilosa/pilosa | https://www.pilosa.com | Pilosa is an open source distributed bitmap index that dramatically accelerates queries across multiple massive data sets. | |
pinot | 3534 | Mon 30 Aug 2021 20:55:02 GMT | https://github.com/apache/pinot | https://pinot.apache.org | Apache Pinot (Incubating) - A realtime distributed OLAP datastore | |
polyaxon | 2893 | Tue 31 Aug 2021 14:17:01 GMT | https://github.com/polyaxon/polyaxon | https://polyaxon.com | Machine Learning Platform for Kubernetes (MLOps tools for experimentation and automation) | |
prefect | 6764 | Mon 30 Aug 2021 21:05:36 GMT | https://github.com/PrefectHQ/prefect | https://prefect.io | The easiest way to automate your data | |
presto | 12352 | Tue 31 Aug 2021 01:46:07 GMT | https://github.com/prestodb/presto | http://prestodb.github.io | The official home of the Presto distributed SQL query engine for big data | |
prettier | 40422 | Tue 31 Aug 2021 04:40:52 GMT | https://github.com/prettier/prettier | https://prettier.io | Prettier is an opinionated code formatter. | |
prisma | 15789 | Tue 31 Aug 2021 08:48:55 GMT | https://github.com/prisma/prisma | https://www.prisma.io | Next-generation ORM for Node.js & TypeScript: PostgreSQL, MySQL, MariaDB, SQL Server, SQLite & MongoDB (Preview) | |
protobuf | 50320 | Mon 30 Aug 2021 20:55:43 GMT | https://github.com/protocolbuffers/protobuf | https://developers.google.com/protocol-buffers/ | Protocol Buffers - Google's data interchange format | |
pulsar | 9495 | Tue 31 Aug 2021 12:41:43 GMT | https://github.com/apache/pulsar | https://pulsar.apache.org/ | Apache Pulsar - distributed pub-sub messaging system | |
pycaret | 3939 | Mon 23 Aug 2021 19:58:30 GMT | https://github.com/pycaret/pycaret | https://www.pycaret.org | An open-source low-code machine learning library in Python | |
pytorch-lightning | 15154 | Tue 31 Aug 2021 09:30:43 GMT | https://github.com/PyTorchLightning/pytorch-lightning | https://pytorchlightning.ai | The lightweight PyTorch wrapper for high-performance AI research. Scale your models, not the boilerplate. | |
questdb | 4513 | Tue 31 Aug 2021 13:50:25 GMT | https://github.com/questdb/questdb | https://questdb.io | An open source SQL database designed to process time series data faster | |
quilt | 1069 | Tue 31 Aug 2021 08:37:51 GMT | https://github.com/quiltdata/quilt | https://quiltdata.com | Quilt is a self-organizing data hub for S3 | |
ray | 17122 | Tue 31 Aug 2021 13:26:25 GMT | https://github.com/ray-project/ray | https://ray.io | An open source framework that provides a simple, universal API for building distributed applications. Ray is packaged with RLlib, a scalable reinforcement learning library, and Tune, a scalable hyperparameter tuning library. | |
re-data | 395 | Tue 31 Aug 2021 11:48:42 GMT | https://github.com/re-data/re-data | http://getre.io | re_data - data quality framework. Built on top of dbt, re_data helps you find, debug, and resolve problems in your data. | |
redis | 50784 | Tue 31 Aug 2021 06:25:36 GMT | https://github.com/redis/redis | http://redis.io | Redis is an in-memory database that persists on disk. The data model is key-value, but many different kinds of values are supported: Strings, Lists, Sets, Sorted Sets, Hashes, Streams, HyperLogLogs, Bitmaps. | |
RedisGraph | 1431 | Wed 25 Aug 2021 16:48:34 GMT | https://github.com/RedisGraph/RedisGraph | https://redisgraph.io | A graph database as a Redis module | |
reflow | 871 | Tue 10 Aug 2021 22:50:01 GMT | https://github.com/grailbio/reflow | | A language and runtime for distributed, incremental data processing in the cloud | |
rethinkdb | 24924 | Sat 15 May 2021 22:30:17 GMT | https://github.com/rethinkdb/rethinkdb | https://rethinkdb.com | The open-source database for the realtime web. | |
rocksdb | 20630 | Tue 31 Aug 2021 02:10:55 GMT | https://github.com/facebook/rocksdb | http://rocksdb.org | A library that provides an embeddable persistent key-value store for fast storage. | |
RustPython | 8977 | Mon 30 Aug 2021 15:03:20 GMT | https://github.com/RustPython/RustPython | https://rustpython.github.io | A Python Interpreter written in Rust | |
rxdb | 15919 | Mon 30 Aug 2021 20:52:25 GMT | https://github.com/pubkey/rxdb | https://rxdb.info/ | 🔄 A realtime Database for JavaScript Applications | |
scikit-learn | 46997 | Tue 31 Aug 2021 14:19:48 GMT | https://github.com/scikit-learn/scikit-learn | https://scikit-learn.org | scikit-learn: machine learning in Python | |
scio | 2185 | Tue 31 Aug 2021 09:56:30 GMT | https://github.com/spotify/scio | https://spotify.github.io/scio | A Scala API for Apache Beam and Google Cloud Dataflow. | |
scrapy | 41419 | Tue 24 Aug 2021 10:15:29 GMT | https://github.com/scrapy/scrapy | https://scrapy.org | Scrapy, a fast high-level web crawling & scraping framework for Python. | |
seata | 20657 | Tue 31 Aug 2021 06:47:39 GMT | https://github.com/seata/seata | https://seata.io | :fire: Seata is an easy-to-use high-performance open source distributed transaction solution. | |
selenium | 21519 | Tue 31 Aug 2021 07:30:49 GMT | https://github.com/SeleniumHQ/selenium | https://selenium.dev | A browser automation framework and ecosystem. | |
selenium | 1529 | Thu 12 Aug 2021 18:03:10 GMT | https://github.com/tebeka/selenium | | Selenium/Webdriver client for Go | |
sheetjs | 27134 | Sun 29 Aug 2021 21:20:15 GMT | https://github.com/SheetJS/sheetjs | https://sheetjs.com/ | :green_book: SheetJS Community Edition -- Spreadsheet Data Toolkit | |
snowplow | 5810 | Mon 30 Aug 2021 15:03:11 GMT | https://github.com/snowplow/snowplow | http://snowplowanalytics.com | The enterprise-grade behavioral data engine (web, mobile, server-side, webhooks) running cloud-natively on AWS and GCP | |
solr | 171 | Tue 31 Aug 2021 14:03:11 GMT | https://github.com/apache/solr | https://solr.apache.org/ | Apache Solr open-source search software | |
sonic | 11860 | Thu 22 Jul 2021 09:39:53 GMT | https://github.com/valeriansaliou/sonic | https://crates.io/crates/sonic-server | 🦔 Fast, lightweight & schema-less search backend. An alternative to Elasticsearch that runs on a few MBs of RAM. | |
spaCy | 21195 | Tue 31 Aug 2021 10:53:51 GMT | https://github.com/explosion/spaCy | https://spacy.io | 💫 Industrial-strength Natural Language Processing (NLP) in Python | |
spark | 30686 | Tue 31 Aug 2021 03:40:34 GMT | https://github.com/apache/spark | https://spark.apache.org/ | Apache Spark - A unified analytics engine for large-scale data processing | |
sparklyr | 806 | Mon 30 Aug 2021 19:47:38 GMT | https://github.com/sparklyr/sparklyr | https://sparklyr.ai | R interface for Apache Spark | |
sqlmodel | 4267 | Wed 25 Aug 2021 13:46:57 GMT | https://github.com/tiangolo/sqlmodel | https://sqlmodel.tiangolo.com/ | SQL databases in Python, designed for simplicity, compatibility, and robustness. | |
strapi | 39196 | Tue 24 Aug 2021 09:00:41 GMT | https://github.com/strapi/strapi | https://strapi.io | 🚀 Open source Node.js Headless CMS to easily build customisable APIs | |
stumpy | 1900 | Mon 30 Aug 2021 17:49:47 GMT | https://github.com/TDAmeritrade/stumpy | https://stumpy.readthedocs.io/en/latest/ | STUMPY is a powerful and scalable Python library for modern time series analysis | |
supabase | 17740 | Mon 30 Aug 2021 23:03:12 GMT | https://github.com/supabase/supabase | https://supabase.io | The open source Firebase alternative. Follow to stay updated about our public Beta. | |
superset | 40216 | Tue 31 Aug 2021 14:27:09 GMT | https://github.com/apache/superset | https://superset.apache.org/ | Apache Superset is a Data Visualization and Data Exploration Platform | |
Systemizer | 1032 | Fri 27 Aug 2021 07:57:04 GMT | https://github.com/honzaap/Systemizer | https://honzaap.github.io/Systemizer/ | A system design tool that allows you to simulate data flow of distributed systems. | |
terminusdb | 1464 | Mon 30 Aug 2021 13:13:47 GMT | https://github.com/terminusdb/terminusdb | https://terminusdb.com | Open source graph database and document store. Designed for collaboratively building data-intensive applications and knowledge graphs. | |
tidb | 28856 | Tue 31 Aug 2021 09:48:13 GMT | https://github.com/pingcap/tidb | https://pingcap.com | TiDB is an open source distributed HTAP database compatible with the MySQL protocol | |
TileDB | 1167 | Fri 27 Aug 2021 16:27:10 GMT | https://github.com/TileDB-Inc/TileDB | https://tiledb.com | The Universal Storage Engine | |
timely-dataflow | 2167 | Wed 25 Aug 2021 23:06:34 GMT | https://github.com/TimelyDataflow/timely-dataflow | | A modular implementation of timely dataflow in Rust | |
timescaledb | 11615 | Tue 31 Aug 2021 11:11:49 GMT | https://github.com/timescale/timescaledb | https://www.timescale.com/ | An open-source time-series SQL database optimized for fast ingest and complex queries. Packaged as a PostgreSQL extension. | |
tinydb | 4507 | Sat 14 Aug 2021 14:18:19 GMT | https://github.com/msiemens/tinydb | https://tinydb.readthedocs.org | TinyDB is a lightweight document oriented database optimized for your happiness :) | |
TorQ | 234 | Thu 15 Jul 2021 11:13:24 GMT | https://github.com/AquaQAnalytics/TorQ | http://goo.gl/8YupnC | kdb+ production framework. Read the doc: http://aquaqanalytics.github.io/TorQ/. Join the group! | |
trino | 3931 | Tue 31 Aug 2021 11:50:54 GMT | https://github.com/trinodb/trino | https://trino.io | Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io) | |
tuplex | 677 | Mon 23 Aug 2021 00:46:32 GMT | https://github.com/tuplex/tuplex | https://tuplex.cs.brown.edu | Tuplex is a parallel big data processing framework that runs data science pipelines written in Python at the speed of compiled code. Tuplex has similar Python APIs to Apache Spark or Dask, but rather than invoking the Python interpreter, Tuplex generates optimized LLVM bytecode for the given pipeline and input data set. | |
typedb | 2846 | Thu 26 Aug 2021 15:47:08 GMT | https://github.com/vaticle/typedb | https://vaticle.com | TypeDB: a strongly-typed database | |
vector | 7877 | Tue 31 Aug 2021 13:46:17 GMT | https://github.com/timberio/vector | https://vector.dev | A high-performance, highly reliable observability data pipeline. | |
whale | 632 | Sat 12 Jun 2021 03:17:43 GMT | https://github.com/hyperqueryhq/whale | https://docs.whale.cx | 🐳 The stupidly simple CLI workspace for your data warehouse. | |
yugabyte-db | 5495 | Tue 31 Aug 2021 06:46:09 GMT | https://github.com/yugabyte/yugabyte-db | https://www.yugabyte.com | The high-performance distributed SQL database for global internet-scale apps. |