Created October 22, 2020 12:14
# lakeFS with MinIO

lakeFS gives Git-like capabilities over your MinIO storage, allowing you to coordinate with colleagues when working on your data.

In the following example, we will use lakeFS to create a branch on your storage, commit changes to it, and then merge it to the master branch.
## Prerequisites

* Install MinIO Server from [here](https://docs.min.io/docs/minio-quickstart-guide).
* Install `mc` from [here](https://docs.min.io/docs/minio-client-quickstart-guide).
* Install docker-compose from [here](https://docs.docker.com/compose/install/).
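Before continuing, it can help to confirm that the tools are on your `$PATH`. The function below is a small sketch that assumes certain binary names: `minio` for MinIO Server, `mc`, `docker-compose`, and `curl`, which the installation step pipes into `docker-compose`. If you run the MinIO server in a container, drop `minio` from the list.

```bash
# Sketch (hypothetical helper): report prerequisite tools missing from $PATH.
# Returns 0 when every named tool is found, 1 otherwise.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
# check_tools minio mc docker-compose curl
```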
## Installation

For this example we will use a Postgres instance within a Docker container. A production-suitable installation will require a persistent Postgres installation.

We will install lakeFS locally on your development machine. For more installation options, see the lakeFS [docs](https://docs.lakefs.io/deploying/install.html).

Create a docker-compose environment file for lakeFS, replacing `<minio_access_key_id>`, `<minio_secret_access_key>` and `<minio_endpoint>` with their values in your MinIO installation.
```bash
LAKEFS_CONFIG_FILE=./.lakefs-env
echo "LAKEFS_BLOCKSTORE_S3_CREDENTIALS_ACCESS_KEY_ID=<minio_access_key_id>" > "$LAKEFS_CONFIG_FILE"
echo "LAKEFS_BLOCKSTORE_S3_CREDENTIALS_ACCESS_SECRET_KEY=<minio_secret_access_key>" >> "$LAKEFS_CONFIG_FILE"
echo "LAKEFS_BLOCKSTORE_S3_ENDPOINT=<minio_endpoint>" >> "$LAKEFS_CONFIG_FILE"
```
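A leftover placeholder is easy to miss. A quick check such as the following sketch can catch one before lakeFS starts. `check_env_file` is a hypothetical helper name, and it treats any `<name>` pattern as an unreplaced placeholder.

```bash
# Sketch (hypothetical helper): fail if the env file is unreadable or still
# contains an unreplaced <placeholder>, printing the offending lines.
check_env_file() {
  local file="$1"
  if [ ! -r "$file" ]; then
    echo "cannot read $file" >&2
    return 1
  fi
  # grep -n prints each matching line with its line number.
  if grep -nE '<[A-Za-z_]+>' "$file"; then
    echo "replace the placeholders above in $file" >&2
    return 1
  fi
  return 0
}

# Usage, in the same shell session as the commands above:
# check_env_file "$LAKEFS_CONFIG_FILE"
```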
Then, from the same shell session (so that `$LAKEFS_CONFIG_FILE` is still set), start lakeFS:

```bash
curl https://compose.lakefs.io | docker-compose --env-file "$LAKEFS_CONFIG_FILE" -f - up
```
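`docker-compose up` keeps running in the foreground, so any follow-up commands need a second terminal. The sketch below polls a URL with `curl` until it returns a non-error HTTP status. The function name, attempt count and delay are illustrative assumptions. The URL in the usage line is the setup page used in the next section.

```bash
# Sketch (hypothetical helper): poll a URL until it answers with a non-error
# HTTP status. Arguments: URL, attempts (default 30), delay in seconds
# (default 2). The defaults are arbitrary; adjust them to your machine.
wait_for_url() {
  local url="$1" attempts="${2:-30}" delay="${3:-2}" i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes HTTP errors (>= 400) fail; -s silences progress output.
    if curl -fs --max-time 5 -o /dev/null "$url"; then
      return 0
    fi
    # Sleep only if another attempt follows.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  echo "gave up waiting for $url" >&2
  return 1
}

# Usage, from a second terminal:
# wait_for_url http://127.0.0.1:8000/setup && echo "lakeFS is up"
```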
## Configuration

Open `http://127.0.0.1:8000/setup` in your browser to create an admin user, and take note of the generated access key and secret.

We will use the `lakectl` binary to perform lakeFS operations. Find the distribution for your operating system [here](https://github.com/treeverse/lakeFS/releases), and extract the `lakectl` binary from the tar.gz archive. Put it somewhere in your `$PATH` and run `lakectl --version` to verify.
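If you prefer to script this step, the sketch below extracts `lakectl` from the downloaded archive into a directory of your choice. It assumes the binary sits at the top level of the archive. `install_lakectl` and the default `$HOME/.local/bin` destination are hypothetical choices, so make sure the destination directory is in your `$PATH`.

```bash
# Sketch (hypothetical helper): extract the lakectl binary from a release
# archive into a directory. ASSUMPTION: `lakectl` is at the top level of the
# archive; adjust the member path if your archive nests it in a folder.
install_lakectl() {
  local archive="$1" dest="${2:-$HOME/.local/bin}"
  mkdir -p "$dest" || return 1
  tar -xzf "$archive" -C "$dest" lakectl || return 1
  chmod +x "$dest/lakectl"
}

# Usage (<downloaded-release> is a placeholder for the file you downloaded):
# install_lakectl ./<downloaded-release>.tar.gz "$HOME/.local/bin"
# lakectl --version
```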
Then run the following command to configure `lakectl`, using the credentials generated during setup:

```bash
lakectl config
# Example session; answer each prompt with your own values:
# Config file /home/janedoe/.lakectl.yaml will be used
# Access key ID: <LAKEFS_ACCESS_KEY_ID>
# Secret access key: <LAKEFS_SECRET_KEY>
# Server endpoint URL: http://127.0.0.1:8000/api/v1
```
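On machines where an interactive prompt is inconvenient, the same file can be written directly. This sketch assumes that the YAML keys below (`credentials.access_key_id`, `credentials.secret_access_key`, `server.endpoint_url`) match the file `lakectl config` generates. Compare them against the file on your machine before relying on them. `write_lakectl_config` is a hypothetical helper name.

```bash
# Sketch (hypothetical helper): write a lakectl config file non-interactively.
# ASSUMPTION: these YAML keys match what `lakectl config` writes; verify them
# against a file generated by your lakectl version.
write_lakectl_config() {
  local config_path="$1" key_id="$2" secret="$3" endpoint="$4"
  # Subshell so the restrictive umask (newly created files are readable only
  # by their owner) does not leak into the calling shell.
  (
    umask 077
    cat > "$config_path" <<EOF
credentials:
  access_key_id: ${key_id}
  secret_access_key: ${secret}
server:
  endpoint_url: ${endpoint}
EOF
  )
}

# Usage, replacing the placeholders with the credentials from the setup page:
# write_lakectl_config "$HOME/.lakectl.yaml" <LAKEFS_ACCESS_KEY_ID> <LAKEFS_SECRET_KEY> http://127.0.0.1:8000/api/v1
```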
Verify that `lakectl` can access lakeFS with the command:

```bash
lakectl repo list
```
If no error is displayed, you are good to go. Now let's set an `mc` alias named `lakefs` that points at the lakeFS S3 gateway:

```bash
mc alias set lakefs http://s3.local.lakefs.io:8000 <LAKEFS_ACCESS_KEY_ID> <LAKEFS_SECRET_KEY>
```
## Example

Create a bucket in MinIO. Note that this bucket is created directly in your MinIO installation; here `myminio` stands for the `mc` alias of your MinIO server, so replace it with your own alias. Later we will use lakeFS to version the data in this bucket.

```bash
mc mb myminio/example-bucket
```
Create a repository in lakeFS, using the bucket as its storage namespace:

```bash
lakectl repo create lakefs://example-repo s3://example-bucket
```
Create two example files:

```bash
echo "my first file" > myfile.txt
echo "my second file" > myfile2.txt
```
Copy the first file to the master branch and commit:

```bash
mc cp ./myfile.txt lakefs/example-repo/master/
lakectl commit lakefs://example-repo@master -m "my first commit"
```
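The copy-then-commit pattern recurs throughout this example, so it can be wrapped in a small function. `commit_file` is a hypothetical helper, not part of `lakectl` or `mc`. It stops before committing if the upload fails.

```bash
# Sketch (hypothetical helper): upload one local file to a lakeFS branch
# through the `lakefs` mc alias, then commit that branch.
# Arguments: repository, branch, local file, commit message.
commit_file() {
  local repo="$1" branch="$2" file="$3" message="$4"
  # Abort without committing if the upload fails.
  mc cp "$file" "lakefs/${repo}/${branch}/" || return 1
  lakectl commit "lakefs://${repo}@${branch}" -m "$message"
}

# Usage, equivalent to the two commands above:
# commit_file example-repo master ./myfile.txt "my first commit"
```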
Now let's create a branch named `branch1` from master, and copy the second file to it:

```bash
lakectl branch create lakefs://example-repo@branch1 --source lakefs://example-repo@master
mc cp ./myfile2.txt lakefs/example-repo/branch1/
```
List master and the branch to see that the new file is visible only in the branch, while the older file is visible in both the branch and master:

```bash
mc ls lakefs/example-repo/master
# only myfile.txt should be listed
```

```bash
mc ls lakefs/example-repo/branch1
# both files should be listed
```
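Instead of comparing the two listings by eye, a script can compute the difference. The hypothetical `branch_only_files` helper below prints names listed under one branch but not under another. It takes the last field of each `mc ls` line as the object name. This is an assumption about the listing format, and it breaks for names containing spaces.

```bash
# Sketch (hypothetical helper): print names listed under branch $3 but not
# under branch $2 of repository $1, using the `lakefs` mc alias.
# Takes the last whitespace-separated field of each `mc ls` line as the
# name, so names containing spaces are not handled.
branch_only_files() {
  local repo="$1" base="$2" branch="$3" base_list branch_list
  base_list=$(mktemp) || return 1
  branch_list=$(mktemp) || { rm -f "$base_list"; return 1; }
  mc ls "lakefs/${repo}/${base}" | awk '{print $NF}' | sort > "$base_list"
  mc ls "lakefs/${repo}/${branch}" | awk '{print $NF}' | sort > "$branch_list"
  # comm -13 keeps only lines unique to the second (branch) listing.
  comm -13 "$base_list" "$branch_list"
  rm -f "$base_list" "$branch_list"
}

# Usage; after the steps above this should print myfile2.txt:
# branch_only_files example-repo master branch1
```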
Now let's commit the branch, and merge it back to master:

```bash
lakectl commit lakefs://example-repo@branch1 -m "my second commit"
```

```bash
lakectl merge lakefs://example-repo@branch1 lakefs://example-repo@master
```
Now both files are accessible through master:

```bash
mc ls lakefs/example-repo/master
```
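The sketch below turns this final check into something a script can act on: it fails if any expected name is missing from an `mc ls` listing. `expect_listed` is a hypothetical name. It assumes the object name is the last field of each `mc ls` line.

```bash
# Sketch (hypothetical helper): succeed only if every expected name appears
# in the `mc ls` listing of the given target; report missing names on stderr.
expect_listed() {
  local target="$1" listing name missing=0
  shift
  # Keep only the last field of each line (assumed to be the object name).
  listing=$(mc ls "$target" | awk '{print $NF}')
  for name in "$@"; do
    # -x: whole-line match, -F: fixed string, -q: no output.
    if ! printf '%s\n' "$listing" | grep -qxF -- "$name"; then
      echo "not listed in $target: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage after the merge; both files should be listed:
# expect_listed lakefs/example-repo/master myfile.txt myfile2.txt
```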