With LLMs now available in Snowflake as part of its Cortex suite of products, this piece explores what the experience is like when classifying text. First, Snowflake has a native CLASSIFY_TEXT function that does exactly what its name says when given a piece of text and an array of possible categories. Second, one could classify text using embeddings (EMBED_TEXT_768) and the similarity to each possible category, calculated by one of the distance functions such as cosine similarity (VECTOR_COSINE_SIMILARITY). Finally, when going the embeddings + similarity route, we could either use a cross join with a categories table or create a column for each category's similarity score and then assign the greatest one. So we have thre
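The embeddings + similarity route described above can be sketched outside Snowflake in plain Python. This is only an illustration under stated assumptions: the three-dimensional vectors are made-up stand-ins for real EMBED_TEXT_768 output, and the names `cosine_similarity`, `classify`, `CATEGORY_VECS`, and `TEXT_VEC` are hypothetical. It mirrors the "one score per category, then take the greatest" idea rather than reproducing any Snowflake query.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def classify(text_vec, category_vecs):
    """Score the text against every category and return the best label plus all scores."""
    scores = {
        name: cosine_similarity(text_vec, vec)
        for name, vec in category_vecs.items()
    }
    # Equivalent in spirit to "assign the greatest" similarity score.
    best = max(scores, key=scores.get)
    return best, scores


# Toy vectors (assumption): real embeddings would have 768 dimensions.
CATEGORY_VECS = {
    "billing": [1.0, 0.0, 0.0],
    "shipping": [0.0, 1.0, 0.0],
    "returns": [0.0, 0.0, 1.0],
}
TEXT_VEC = [0.1, 0.9, 0.2]

label, scores = classify(TEXT_VEC, CATEGORY_VECS)
print(label, scores)
```

Keeping every score in a dictionary corresponds to the column-per-category variant; a cross join with a categories table would instead produce one row per (text, category) pair and pick the top-ranked row per text.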
import plotly.express as px
import streamlit as st
# Sample data
df = px.data.iris()
# Create the Plotly figure
fig = px.scatter(df,
                 x="sepal_width",
                 y="sepal_length",
from pyspark.sql.functions import monotonically_increasing_id, row_number
from pyspark.sql import Window
from functools import reduce
def partitionIt(size, num):
    '''
    Create a list of partition indices each of size num where number of groups is ceiling(len(seq)/num)
    Args:
        size (int): number of rows/elements
import tableauserverclient as TSC
import pandas as pd
from io import StringIO
class Tableau_Server(object):
    """docstring for ClassName"""
    def __init__(self, username, password, site_id, url, https=False):
        super().__init__()  # http://stackoverflow.com/questions/576169/understanding-python-super-with-init-methods
<apex:page >
<html>
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.7.2/jquery.min.js"></script>
<!-- User Id in a span -->
<span id = 'user' style = 'display: none;'>
<apex:outputText label="Account Owner" value="{!$User.Id}"></apex:outputText>
</span>
<!-- Embed placeholder -->
class Reddit():
    def __init__(self, client_id, client_secret, user_agent='My agent'):
        self.reddit = praw.Reddit(client_id=client_id,
                                  client_secret=client_secret,
                                  user_agent=user_agent)
    def get_comments(self, submission):
        # get comments information using the Post as a starting comment
        comments = [RedditComment(author=submission.author,
                                  commentid=submission.postid,
library(tidyr)
setwd("~/Desktop/unnest")
fname = "file-name.csv"
# fname already ends in ".csv", so read it directly instead of appending the extension again
df = read.csv(fname, stringsAsFactors = FALSE)
df$seats =
  sapply(1:nrow(df), function(x) {
    seats = c(df[x,]$first_seat,df[x,]$last_seat)
import requests
import base64
import pprint
import pandas as pd
import json
from tqdm import tqdm
# https://stubhubapi.zendesk.com/hc/en-us/articles/220922687-Inventory-Search
# http://srome.github.io/Parsing-HTML-Tables-in-Python-with-BeautifulSoup-and-pandas/
class HTMLTableParser:
    @staticmethod
    def get_element(node):
        # for XPATH we have to count only for nodes with same type!
        length = len(list(node.previous_siblings)) + 1
        if length > 1:
            return '%s:nth-child(%s)' % (node.name, length)
        else:
            return node.name