@trojblue
Created October 10, 2024 12:37
# Label S3 images with Argilla [WIP]
1. Edit `docker-compose.yml` (not the Dockerfile) to allow host gateway access:

```yaml
version: '3'
services:
  argilla:
    image: argilla/argilla-server:latest
    ports:
      - "6900:6900"  # Argilla UI/API port
    extra_hosts:
      - "host.docker.internal:host-gateway"  # Lets the container refer to the host machine

  # Other services (PostgreSQL, Elasticsearch, etc.) go here
```
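Before wiring anything else up, it can help to confirm that the `extra_hosts` alias actually resolves inside the container. A quick check (a sketch, assuming `python` is available in the Argilla image) is to resolve the name from within `docker exec`:

```python
# Run inside the container: checks whether the host-gateway alias resolves.
import socket

for name in ("localhost", "host.docker.internal"):
    try:
        print(name, "->", socket.gethostbyname(name))
    except socket.gaierror:
        print(name, "-> not resolvable (check the extra_hosts entry)")
```

If `host.docker.internal` is not resolvable, the in-container requests later in this guide will need the host's IP address instead.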
2. Run the presigning service from the host:

```python
# presign_server.py
import boto3
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class S3UriRequest(BaseModel):
    s3_uri: str

def parse_s3_uri(s3_uri: str):
    """Split an s3://bucket/key URI into (bucket, key)."""
    bucket = s3_uri.split('/')[2]
    key = '/'.join(s3_uri.split('/')[3:])
    return bucket, key

@app.post("/generate-presigned-url")
async def generate_presigned_url(request: S3UriRequest):
    bucket, key = parse_s3_uri(request.s3_uri)

    # AWS credentials are picked up from the host environment
    s3_client = boto3.client('s3')

    # Generate presigned URL
    presigned_url = s3_client.generate_presigned_url(
        'get_object',
        Params={'Bucket': bucket, 'Key': key},
        ExpiresIn=21600  # 6 hours
    )

    return {"presigned_url": presigned_url}

# Run with: uvicorn presign_server:app --host 0.0.0.0 --port 7000
```
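The split-based parsing above works for plain `s3://bucket/key` URIs. A slightly more robust variant (a sketch using only the standard library, not part of the original server) also rejects malformed input:

```python
from urllib.parse import urlparse

def parse_s3_uri(s3_uri: str):
    """Split an s3://bucket/key URI into (bucket, key), validating the scheme."""
    parsed = urlparse(s3_uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not a valid S3 URI: {s3_uri}")
    return parsed.netloc, parsed.path.lstrip("/")

print(parse_s3_uri("s3://my-bucket/path/to/image.jpg"))
# → ('my-bucket', 'path/to/image.jpg')
```

This is a drop-in replacement for the `parse_s3_uri` helper in `presign_server.py`.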

Run the server:

```shell
uvicorn presign_server:app --host 0.0.0.0 --port 7000
```

Restart the Argilla containers:

```shell
docker-compose down
docker-compose up -d
```

Testing the Route:

Once your Docker container is up and running, you can test the new route by sending a POST request to /generate-presigned-url:

1. List the running containers to get the container ID or name of the container you want to access:

```shell
docker ps
```

You should see a list of running containers with their IDs and names. For example:

```
CONTAINER ID   IMAGE                           COMMAND                  CREATED       STATUS       PORTS      NAMES
07197d8d1173   argilla/argilla-server:latest   "sh -c 'python -m ar…"   3 hours ago   Up 3 hours   6900/tcp   argilla-worker-1
```

2. Use docker exec to open a bash shell inside the container:

```shell
docker exec -it <container_name_or_id> /bin/bash
```

Replace <container_name_or_id> with the actual name or ID of the container. For example:

```shell
docker exec -it argilla-argilla-1 /bin/bash
```

3. Inside the container, run the following Python one-liner (replace host.docker.internal with the host's IP if it is not resolvable):

```shell
python -c "import requests; response = requests.post('http://host.docker.internal:7000/generate-presigned-url', json={'s3_uri': 's3://your-bucket/path/to/image.jpg'}, headers={'Content-Type': 'application/json'}); print(response.text)"
```

If successful, the console prints something like this:

```
{"presigned_url":"https://your-bucket.s3.amazonaws.com/path/to/image.jpg?AWSAccessKeyId=AKIAXXXXXXZ2PIUQF5EA&Signature=v%2BWoXC5U1YnM7JxkXXXXXXXXF9M%3D&Expires=1728526769"}
```