Gist jjesusfilho/71912052b734938d46fe0353d241503f, created January 10, 2024 17:16.
Calls Python's tiktoken package to count tokens.
#' Count the tokens in texts according to OpenAI model encodings
#'
#' @param x Character vector of texts.
#' @param modelo Name of the OpenAI model whose encoding is used
#'   (default "gpt-3.5-turbo").
#'
#' @details To use this function, the Python package tiktoken must be
#'   installed; it is called via reticulate.
#'
#' @return Integer vector with the number of tokens in each text
#'   (NA for texts that could not be encoded).
#' @export
#'
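openai_contar_tokens <- function(x, modelo = "gpt-3.5-turbo"){
  # Import tiktoken through reticulate and load the encoding used by `modelo`
  tk <- reticulate::import("tiktoken")
  encoding <- tk$encoding_for_model(modelo)
  x |>
    purrr::map_int(purrr::possibly(~{
      # encode() returns the token ids of a single string; count them
      encoding$encode(.x) |>
        length()
    }, NA_integer_)) # texts that raise an error yield NA
}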
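A minimal sketch, assuming tiktoken is installed in the Python environment that reticulate uses (for example via `reticulate::py_install("tiktoken")`): a hypothetical batched variant, `openai_contar_tokens_lote()`, that passes the whole vector to tiktoken's `encode_batch()` in a single call instead of encoding each text separately. The detail that matters is wrapping `x` in `as.list()`: `encode_batch()` expects a list of strings, and reticulate converts a length-1 character vector to a plain Python string, which `encode_batch()` would then iterate character by character.

```r
# Hypothetical batched variant of openai_contar_tokens() (illustration only).
# Assumes the Python package tiktoken is importable through reticulate.
openai_contar_tokens_lote <- function(x, modelo = "gpt-3.5-turbo") {
  tk <- reticulate::import("tiktoken")
  encoding <- tk$encoding_for_model(modelo)
  # as.list() guarantees reticulate sends a Python list of strings,
  # even when x has length 1
  tokens <- encoding$encode_batch(as.list(x))
  # encode_batch() returns one sequence of token ids per text;
  # lengths() counts the ids in each one
  lengths(tokens)
}
```

With the cl100k_base encoding used by gpt-3.5-turbo, `openai_contar_tokens_lote(c("hello world", ""))` should return `c(2L, 0L)`. Unlike `openai_contar_tokens()`, this sketch has no `possibly()` fallback, so a single text that tiktoken rejects (for instance one containing the special token `<|endoftext|>`) makes the whole call fail.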