Feedback on Rungalileo - 27 Jan 2025

Feedback Galileo Product

Context

The feedback system on the docs is not working (the Mintlify link is broken?). So here's a list of issues and enhancements I came across. Happy to have a chat about them.

Datasets

Related docs

Bugs

The functions list_datasets(), get_dataset_content(), and create_dataset() are not working as documented.

import os
import promptquality as pq

pq.login(os.environ["GALILEO_CONSOLE_URL"])

dataset = pq.create_dataset(
    {
        "virtue": ["benevolence", "trustworthiness"],
        "voice": ["Oprah Winfrey", "Barack Obama"],
    }
)

They keep asking for credentials; judging from the code, they are a pass-through to the API, but that doesn't work well with the promptquality module.
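
One thing worth trying for the credential prompts (a sketch; it assumes the underlying API client honours the GALILEO_API_KEY environment variable, which I haven't verified):

import os
import promptquality as pq

# Assumption: setting GALILEO_API_KEY up front lets the lower-level API calls
# authenticate without prompting again. <your-api-key> is a placeholder.
os.environ["GALILEO_API_KEY"] = "<your-api-key>"

pq.login(os.environ["GALILEO_CONSOLE_URL"])
datasets = pq.list_datasets()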

Workaround

  • The function upload_dataset() is available in the SDK, but not documented.
  • The template version id doesn't really seem to matter?
data = {"topic": ["Quantum Physics", "Politics", "Large Language Models"]}

from promptquality.helpers import upload_dataset

# The template version id doesn't really seem to matter ?
dataset = upload_dataset(data, project_id, template_version_id=template.selected_version.id)
print(dataset)

Useful enhancements

  • Set the name of a dataset (only possible in the UI now)
  • Update a dataset (not possible now, but useful to keep the same dataset id)
  • Set the location (e.g. EU-east) where the dataset is stored
  • Point the dataset at an S3 bucket and the like, instead of a local upload

Templates

Related docs

Bugs

Trouble reusing a template for a run:

Here's the code to create a template; that works fine.

template = create_template(
    template_name="my-template",
    project_id=project_id,
    template="""Answer the question based only on the following context:

    {context}

    Question: {question}
    """
)

Retrieving the template works too:

    template = pq.get_template(project_name="my-project", template_name="my-template")
    print(template)

The problem is reusing that template for a run:

run = pq.run(template=template.selected_version,
           template_name=template.name,
           dataset=data,
           settings=pq.Settings(model_alias='ChatGPT (16K context)',
                                temperature=0.8,
                                max_tokens=400),
)

I'm not sure what the right syntax is for this; this sample is not working and complains.
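
What I'd try instead is passing the raw prompt text; a sketch, assuming the version object exposes that text as selected_version.template (I haven't confirmed the attribute name):

import promptquality as pq

template = pq.get_template(project_name="my-project", template_name="my-template")

# Assumption: pq.run() takes the prompt text itself, and the version object
# exposes that text as .template -- both unverified on my side.
run = pq.run(
    template=template.selected_version.template,
    template_name=template.name,
    dataset=data,  # same dataset dict as above
    settings=pq.Settings(
        model_alias="ChatGPT (16K context)",
        temperature=0.8,
        max_tokens=400,
    ),
)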

Useful enhancements

  • command to list all templates
  • make templates visible in the UI
  • command to delete a template

Projects

Enhancements

  • command to list all projects via SDK: to do automated cleanup
  • command to delete a project via SDK: to do automated cleanup
  • allow for widening the description of a project in the UI
  • allow for batch deletion of projects in the UI

Annotations

Enhancements

  • command to list all annotations via SDK

  • command to create annotations type via SDK

  • command to submit an annotation via SDK (instead of only via UI)

  • command to delete an annotation via SDK

  • describe how to get the values of an annotation type via SDK:

    • I found that the annotations, keyed by their names, become part of the metrics attributes (see the sketch after this list)
  • Reuse the same name for an annotation across projects; it's hard to tell which names are in use, as annotations are not visible in the UI
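
For completeness, here is how I read an annotation value today, relying on the attribute-style access mentioned above; "Tone" is a hypothetical annotation name and row is a row object from a run, the same object used in the metrics workaround further down:

# "Tone" is a hypothetical annotation name; row.metrics is the same object
# as in the metrics workaround below.
tone = getattr(row.metrics, "Tone", None)
print(tone)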

Metrics

Enhancements

  • Ability to run non-local metrics with your own small models
  • Make the metrics visible in the UI

Bugs

  • There is no obvious way to get the value of a metric with a space in its name via the SDK

The workaround for now is:

getattr(row.metrics, "Response Length")
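
In scripts I wrap that in a tiny helper so a missing metric doesn't blow up (plain Python sugar around the same getattr call):

def metric_value(metrics, name, default=None):
    # Read a metric whose name contains spaces, e.g. "Response Length".
    return getattr(metrics, name, default)

length = metric_value(row.metrics, "Response Length")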

Security

  • Not really a bug, but I wonder about the attribute pollution of the metrics object (using the name of the metric as the attribute name)
  • When uploading Python code as a custom scorer, I'd love to know that it can't execute shell commands or pollute other function calls
  • I would want more details on the sandbox in which the code runs

Integrations

Enhancements

  • Currently only a single key can be added, potentially leading to quota issues; it would be good to be able to override this per project / user
  • Monitoring quotas

Observe

Enhancements

  • Metrics using LLM as a judge can't be created via SDK