Feedback on Rungalileo - 27 Jan 2025

Feedback Galileo Product

Context

The feedback system on the docs is not working (the Mintlify link is broken?). So here's a list of issues and enhancements I came across. Happy to have a chat about them.

Datasets

Related docs

Bugs

The functions list_datasets(), get_dataset_content(), and create_dataset() are not working as documented.

import os
import promptquality as pq

pq.login(os.environ["GALILEO_CONSOLE_URL"])

dataset = pq.create_dataset(
    {
        "virtue": ["benevolence", "trustworthiness"],
        "voice": ["Oprah Winfrey", "Barack Obama"],
    }
)

They keep asking for credentials; judging from the code, they are a pass-through to the API, but that doesn't work well with the promptquality module.
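
One thing worth trying for the credential prompts (a sketch; it assumes the underlying API client honours the GALILEO_API_KEY environment variable, which I haven't verified):

import os
import promptquality as pq

# Assumption: setting GALILEO_API_KEY up front lets the lower-level API calls
# authenticate without prompting again. <your-api-key> is a placeholder.
os.environ["GALILEO_API_KEY"] = "<your-api-key>"

pq.login(os.environ["GALILEO_CONSOLE_URL"])
datasets = pq.list_datasets()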

Workaround

  • The function upload_dataset() is available in the SDK, but not documented.
  • The template version id doesn't really seem to matter?
data = {"topic": ["Quantum Physics", "Politics", "Large Language Models"]}

from promptquality.helpers import upload_dataset

# The template version id doesn't really seem to matter ?
dataset = upload_dataset(data, project_id, template_version_id=template.selected_version.id)
print(dataset)

Useful enhancements

  • Set the name of a dataset (only possible in the UI now)
  • Update a dataset (not possible now, but useful to keep the same dataset id)
  • Set the location (e.g. EU-east) where the dataset is stored
  • Point the dataset at an S3 bucket and the like, instead of a local upload

Templates

Related docs

Bugs

Trouble reusing a template for a run:

Here's the code to create a template; that works fine.

template = create_template(
    template_name="my-template",
    project_id=project_id,
    template="""Answer the question based only on the following context:

    {context}

    Question: {question}
    """
)

Retrieving the template works too:

    template = pq.get_template(project_name="my-project", template_name="my-template")
    print(template)

The problem is reusing that template for a run:

run = pq.run(template=template.selected_version,
           template_name=template.name,
           dataset=data,
           settings=pq.Settings(model_alias='ChatGPT (16K context)',
                                temperature=0.8,
                                max_tokens=400),
)

I'm not sure what the right syntax is for this; this sample is not working and complains.
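
What I'd try instead is passing the raw prompt text; a sketch, assuming the version object exposes that text as selected_version.template (I haven't confirmed the attribute name):

import promptquality as pq

template = pq.get_template(project_name="my-project", template_name="my-template")

# Assumption: pq.run() takes the prompt text itself, and the version object
# exposes that text as .template -- both unverified on my side.
run = pq.run(
    template=template.selected_version.template,
    template_name=template.name,
    dataset=data,  # same dataset dict as above
    settings=pq.Settings(
        model_alias="ChatGPT (16K context)",
        temperature=0.8,
        max_tokens=400,
    ),
)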

Useful enhancements

  • command to list all templates
  • make templates visible in the UI
  • command to delete a template

Projects

Enhancements

  • command to list all projects via SDK: to do automated cleanup
  • command to delete a project via SDK: to do automated cleanup
  • allow for widening the description of a project in the UI
  • allow for batch deletion of projects in the UI

Annotations

Enhancements

  • command to list all annotations via SDK

  • command to create annotations type via SDK

  • command to submit an annotation via SDK (instead of only via UI)

  • command to delete an annotation via SDK

  • describe how to get the values of an annotation type via SDK:

    • I found that the annotations, keyed by their names, become part of the metrics attributes (see the sketch after this list)
  • Reuse the same name for an annotation across projects; it's hard to tell which names are in use, as annotations are not visible in the UI
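
For completeness, here is how I read an annotation value today, relying on the attribute-style access mentioned above; "Tone" is a hypothetical annotation name and row is a row object from a run, the same object used in the metrics workaround further down:

# "Tone" is a hypothetical annotation name; row.metrics is the same object
# as in the metrics workaround below.
tone = getattr(row.metrics, "Tone", None)
print(tone)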

Metrics

Enhancements

  • Ability to run non-local metrics with your own small models
  • Make the metrics visible in the UI

Bugs

  • There is no obvious way to get the value of a metric with a space in its name via the SDK

The workaround for now is:

getattr(row.metrics, "Response Length")
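
In scripts I wrap that in a tiny helper so a missing metric doesn't blow up (plain Python sugar around the same getattr call):

def metric_value(metrics, name, default=None):
    # Read a metric whose name contains spaces, e.g. "Response Length".
    return getattr(metrics, name, default)

length = metric_value(row.metrics, "Response Length")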

Security

  • Not really a bug, but I wonder about the attribute pollution of the metrics object (using the name of the metric as the attribute name)
  • When uploading Python code as a custom scorer, I'd love to know that it can't execute shell commands or pollute other function calls
  • I would want more details on the sandbox in which the code runs

Integrations

Enhancements

  • Currently only a single key can be added, potentially leading to quota issues; it would be good to be able to override this per project / user
  • Monitoring quotas

Observe

Enhancements

  • Metrics using LLM as a judge can't be created via SDK