@abhioncbr
Last active September 2, 2024 09:11
Apache Superset in the production environment

Visualising data helps build a much deeper understanding of the data and speeds up the analytics around it. There are several mature paid products available in the market. Recently, I explored an open-source product named Apache Superset, which I found to be a very promising product in this space. Some prominent features of Superset are:

  • A rich set of data visualisations
  • An easy-to-use interface for exploring and visualising data
  • Create and share dashboards

After reading about Superset, I wanted to try it. Superset is a Python project, so it can be installed easily using pip, but I decided to set it up as a Docker container instead. The Apache Superset GitHub repo contains code for building and running Superset as a container. Since I wanted to run Superset in a fully distributed manner, and the existing code offers limited room for modification (in my opinion), I decided to modify the code so that it can run in several different modes. Below is a list of the specific changes and enhancements made to the code:

  • Different versions of the Superset image can be built from the same code.
  • The Superset configuration can be edited and mounted into the container, with no need to rebuild the image.
  • Asynchronous query execution through a Celery-based executor, managed through the Flower UI.
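
To make the last two points concrete, here is a minimal sketch of what a mounted superset_config.py could contain to point Celery at Redis. It is not the file shipped in the image. The REDIS_URL environment variable name is an assumption made for illustration, and the lowercase Celery setting names follow recent Superset documentation; older Superset releases may expect different names.

```python
# superset_config.py: illustrative sketch, not the file shipped in the image.
import os

# Assumption: the Redis location is passed in through an environment
# variable named REDIS_URL. The variable name is hypothetical.
REDIS_URL = os.environ.get("REDIS_URL", "redis://localhost:6379/0")


class CeleryConfig:
    # Redis acts as both the task queue (broker) and the result store.
    broker_url = REDIS_URL
    result_backend = REDIS_URL
    # Module that holds Superset's SQL Lab Celery tasks.
    imports = ("superset.sql_lab",)


# Superset looks up its Celery settings under this name.
CELERY_CONFIG = CeleryConfig
```

Because the file is mounted rather than baked into the image, changing the Redis location only needs an edit and a container restart, not a rebuild.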

Exploration made easy

Development mode is an excellent choice for exploring a project. However, it is better if the initial exploration covers all the features; in the case of Superset, that includes running queries in async mode and storing the results in a cache. You can explore Superset this way with the commands below.

  • First, pull a docker-superset image from Docker Hub:
docker pull abhioncbr/docker-superset:<tag>
  • Then start the services with docker-compose from the docker-files directory:
cd docker-files/ && SUPERSET_ENV=<local | prod> SUPERSET_VERSION=<tag> docker-compose up -d

Running Superset in a complete distributed mode

As I understand it, a Superset setup that serves thousands of end users in production should be distributed in nature and easy to scale as requirements grow. The image below depicts such a setup:

[Image: distributed-superset-setup]

The published Docker image of Superset can be used to build the setup depicted above:

  • A load balancer in front, routing each client request to one of the server containers.
  • Multiple containers in server mode, serving the Superset UI. A server container can be started with docker run as follows:
docker run -p 8088:8088 -v config:/home/superset/config/ abhioncbr/docker-superset:<tag> cluster server <db_url> <redis_url>
  • Multiple containers in worker mode, executing SQL queries asynchronously using the Celery executor. A worker container can be started with docker run as follows:
docker run -p 5555:5555 -v config:/home/superset/config/ abhioncbr/docker-superset:<tag> cluster worker <db_url> <redis_url>
  • A centralised Redis container or Redis cluster, serving as the cache layer and as the Celery task queue for the workers.
  • A centralised Superset metadata database.
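
The two docker run commands above differ only in the published port and the mode argument. The small helper below is a sketch that makes this explicit, which can be handy when scripting the launch of several server and worker containers. The function name cluster_command and its config_volume parameter are hypothetical, introduced only for illustration; the image name, ports, mount path and argument order are taken from the commands above.

```python
import shlex

IMAGE = "abhioncbr/docker-superset"

# Port published by each mode in the docker run commands above.
PORTS = {"server": 8088, "worker": 5555}


def cluster_command(mode, tag, db_url, redis_url, config_volume="config"):
    """Return the docker run argv for one container in cluster mode."""
    if mode not in PORTS:
        raise ValueError(f"unknown mode: {mode!r}")
    port = PORTS[mode]
    return [
        "docker", "run",
        "-p", f"{port}:{port}",
        "-v", f"{config_volume}:/home/superset/config/",
        f"{IMAGE}:{tag}",
        "cluster", mode, db_url, redis_url,
    ]


if __name__ == "__main__":
    # Placeholder values; substitute a real tag and connection URLs.
    for mode in ("server", "worker"):
        argv = cluster_command(mode, "<tag>", "<db_url>", "<redis_url>")
        print(shlex.join(argv))
```

Returning an argument list rather than a single string keeps the values safe to pass to subprocess.run without shell quoting issues; shlex.join is used only to print a copy-pasteable command.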

I found that setting up Superset as a Docker container is quite easy, and that the same image can be used across different environments. You can explore Superset in a similar way.
