This markdown shows a quick example of how to use Databricks DBRX to generate and run a transform query against a sample dataset.
Install the following:
- Configure and install `databricks-cli`
- `pip install langchain langchain-community mlflow setuptools pyspark-ai`
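Before calling the endpoint, it can help to confirm that credentials are discoverable. The sketch below assumes the usual Databricks conventions, which this document does not state explicitly: either the `DATABRICKS_HOST` and `DATABRICKS_TOKEN` environment variables are set, or a `~/.databrickscfg` file exists (typically written when the CLI is configured). The helper name `find_databricks_credentials` is hypothetical and only checks for presence, not validity.

```python
import os
from pathlib import Path


def find_databricks_credentials(env=os.environ, config_path=Path.home() / ".databrickscfg"):
    """Return where credentials appear to live, or None if nothing was found.

    Assumption: credentials come from DATABRICKS_HOST/DATABRICKS_TOKEN
    or from a ~/.databrickscfg profile file.
    """
    if env.get("DATABRICKS_HOST") and env.get("DATABRICKS_TOKEN"):
        return "environment"
    if Path(config_path).is_file():
        return "config file"
    return None


if __name__ == "__main__":
    source = find_databricks_credentials()
    print(f"Databricks credentials: {source or 'not found'}")
```

If the check reports `not found`, configuring the CLI or exporting the two variables is likely needed before the rest of the example will run.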
from langchain_community.chat_models import ChatDatabricks
from pyspark_ai import SparkAI
llm = ChatDatabricks(
    endpoint="databricks-dbrx-instruct",
)
spark_ai = SparkAI(llm=llm)
spark_ai.activate()  # activate partial functions for Spark DataFrame
df = spark_ai._spark.createDataFrame(
    [
        ("Normal", "Cellphone", 6000),
        ("Normal", "Tablet", 1500),
        ("Mini", "Tablet", 5500),
        ("Mini", "Cellphone", 5000),
        ("Foldable", "Cellphone", 6500),
        ("Foldable", "Tablet", 2500),
        ("Pro", "Cellphone", 3000),
        ("Pro", "Tablet", 4000),
        ("Pro Max", "Cellphone", 4500)
    ],
    ["product", "category", "revenue"]
)
## DataFrame Transformation
df.ai.transform("What are the best-selling and the second best-selling products in every category?").show()
DBRX generated the following query for the `.ai.transform` call:
SELECT category, product, revenue
FROM (
    SELECT category, product, revenue,
        ROW_NUMBER() OVER (PARTITION BY category ORDER BY revenue DESC) as rank
    FROM spark_ai_temp_view_152383686
) tmp
WHERE rank <= 2
And here are the results - enjoy!
+---------+--------+-------+
| category| product|revenue|
+---------+--------+-------+
|Cellphone|Foldable|   6500|
|Cellphone|  Normal|   6000|
|   Tablet|    Mini|   5500|
|   Tablet|     Pro|   4000|
+---------+--------+-------+
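To sanity-check the table above without Spark, the same top-two-per-category logic can be reproduced in plain Python. This is a minimal sketch, not what pyspark-ai runs: the helper `top_n_per_category` is a hypothetical name, and sorting categories alphabetically is an assumption made only to get deterministic output (Spark does not guarantee row order here).

```python
from collections import defaultdict

# Same rows as the sample DataFrame: (product, category, revenue)
rows = [
    ("Normal", "Cellphone", 6000),
    ("Normal", "Tablet", 1500),
    ("Mini", "Tablet", 5500),
    ("Mini", "Cellphone", 5000),
    ("Foldable", "Cellphone", 6500),
    ("Foldable", "Tablet", 2500),
    ("Pro", "Cellphone", 3000),
    ("Pro", "Tablet", 4000),
    ("Pro Max", "Cellphone", 4500),
]


def top_n_per_category(rows, n=2):
    """Mimic ROW_NUMBER() OVER (PARTITION BY category ORDER BY revenue DESC) <= n."""
    by_category = defaultdict(list)
    for product, category, revenue in rows:
        by_category[category].append((product, revenue))

    result = []
    for category in sorted(by_category):  # alphabetical order is an assumption
        ranked = sorted(by_category[category], key=lambda pr: pr[1], reverse=True)
        result.extend((category, product, revenue) for product, revenue in ranked[:n])
    return result


for row in top_n_per_category(rows):
    print(row)
```

Because the sample data has no revenue ties within a category, `ROW_NUMBER()` and a plain sort-and-slice agree; with ties, the generated query would pick among tied rows arbitrarily.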