Using DBRX with PySpark AI

This markdown shows a quick example of how to use Databricks DBRX with PySpark AI to generate and run a transform query against a sample dataset.

Requirements

Install the following:

Configure and install databricks-cli
pip install pyspark-ai langchain langchain-community mlflow setuptools
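
Before running the example, it can help to confirm that Databricks credentials are visible to Python. The following is a minimal sketch, assuming token-based authentication through the `DATABRICKS_HOST` and `DATABRICKS_TOKEN` environment variables; the helper name `missing_databricks_vars` is hypothetical, and a profile configured through databricks-cli is an alternative that this check does not inspect.

```python
import os

# Assumption: credentials are supplied through these two environment variables.
# A databricks-cli profile (~/.databrickscfg) is an alternative this check ignores.
REQUIRED_VARS = ("DATABRICKS_HOST", "DATABRICKS_TOKEN")


def missing_databricks_vars(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_databricks_vars()
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("Databricks credentials found in the environment.")
```

An empty list means both variables are set; anything else names what still needs to be configured.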

Code Example

from langchain_community.chat_models import ChatDatabricks
from pyspark_ai import SparkAI

llm = ChatDatabricks(
    endpoint="databricks-dbrx-instruct",
)

spark_ai = SparkAI(llm=llm)
spark_ai.activate()  # activate partial functions for Spark DataFrame

df = spark_ai._spark.createDataFrame(
    [
        ("Normal", "Cellphone", 6000),
        ("Normal", "Tablet", 1500),
        ("Mini", "Tablet", 5500),
        ("Mini", "Cellphone", 5000),
        ("Foldable", "Cellphone", 6500),
        ("Foldable", "Tablet", 2500),
        ("Pro", "Cellphone", 3000),
        ("Pro", "Tablet", 4000),
        ("Pro Max", "Cellphone", 4500)
    ],
    ["product", "category", "revenue"]
)

## DataFrame Transformation
df.ai.transform("What are the best-selling and the second best-selling products in every category?").show()

Query Generated

DBRX generated the following query for the .ai.transform call:

SELECT category, product, revenue
FROM (
    SELECT category, product, revenue,
           ROW_NUMBER() OVER (PARTITION BY category ORDER BY revenue DESC) as rank
    FROM spark_ai_temp_view_152383686
) tmp
WHERE rank <= 2
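
To make the query's logic concrete, here is a minimal plain-Python sketch of the same idea: group the rows by category, order each group by revenue descending, and keep the first two. It is not how Spark executes the query, and the function name `top_n_per_category` is hypothetical; it only offers a way to cross-check the generated SQL against the sample data without Spark or an LLM.

```python
from collections import defaultdict

# Same sample rows as the DataFrame: (product, category, revenue)
rows = [
    ("Normal", "Cellphone", 6000),
    ("Normal", "Tablet", 1500),
    ("Mini", "Tablet", 5500),
    ("Mini", "Cellphone", 5000),
    ("Foldable", "Cellphone", 6500),
    ("Foldable", "Tablet", 2500),
    ("Pro", "Cellphone", 3000),
    ("Pro", "Tablet", 4000),
    ("Pro Max", "Cellphone", 4500),
]


def top_n_per_category(rows, n=2):
    """Mimic ROW_NUMBER() OVER (PARTITION BY category ORDER BY revenue DESC) <= n."""
    by_category = defaultdict(list)
    for product, category, revenue in rows:
        by_category[category].append((product, revenue))

    result = []
    for category in sorted(by_category):
        # Highest revenue first within each category
        ranked = sorted(by_category[category], key=lambda pr: pr[1], reverse=True)
        for product, revenue in ranked[:n]:
            result.append((category, product, revenue))
    return result


top2 = top_n_per_category(rows)
for row in top2:
    print(row)
```

The sample data has no revenue ties within a category, so the arbitrary tie-breaking of ROW_NUMBER does not affect the comparison here.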

Results

And here are the results - enjoy!

+---------+--------+-------+
| category| product|revenue|
+---------+--------+-------+
|Cellphone|Foldable|   6500|
|Cellphone|  Normal|   6000|
|   Tablet|    Mini|   5500|
|   Tablet|     Pro|   4000|
+---------+--------+-------+

dennyglee/using-dbrx-with-pyspark-ai.md
