Sascha Heyer (SaschaHeyer)
import pickle

def unpickle_file(file_path):
    """Load a pickled file."""
    try:
        with open(file_path, 'rb') as file:
            data = pickle.load(file)
        return data
    except (pickle.UnpicklingError, FileNotFoundError, EOFError) as e:
        print(f"Error unpickling file: {e}")
        return None
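For a self-contained round trip, the write side that pairs with this helper looks like the following (the payload and path are illustrative, not from the gist):

```python
import os
import pickle
import tempfile

# Illustrative payload and path; any picklable object works.
payload = {"weights": [0.1, 0.2, 0.3]}
path = os.path.join(tempfile.mkdtemp(), "payload.pkl")

with open(path, "wb") as f:
    pickle.dump(payload, f)      # write side

with open(path, "rb") as f:
    restored = pickle.load(f)    # what unpickle_file() does internally

assert restored == payload
```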
@SaschaHeyer
SaschaHeyer / agent.py
Created February 19, 2025 09:01
agent
import vertexai
from vertexai.generative_models import (
    GenerativeModel,
    Tool,
    FunctionDeclaration,
)
from vertexai.preview import reasoning_engines
from google.cloud import bigquery, firestore
from langchain_google_firestore import FirestoreChatMessageHistory
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from google.cloud import storage
import cv2
import tempfile

# Initialize Cloud Storage client
storage_client = storage.Client()
bucket_name = "your-bucket-name"
blob_name = "path/to/image.jpg"

# Download to a temporary file
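The preview breaks off at the download step. A minimal sketch of that step, factored into a helper so it works with anything exposing `download_to_filename()` (the Cloud Storage `Blob` does):

```python
import tempfile

def download_blob_to_temp(blob, suffix=".jpg"):
    """Download a blob-like object to a temporary file and return its path.

    Works with google.cloud.storage.Blob or anything else exposing
    download_to_filename(); the caller is responsible for deleting the file.
    """
    tmp = tempfile.NamedTemporaryFile(suffix=suffix, delete=False)
    tmp.close()  # close first so download_to_filename can reopen it on any OS
    blob.download_to_filename(tmp.name)
    return tmp.name
```

With the client above, `path = download_blob_to_temp(storage_client.bucket(bucket_name).blob(blob_name))` followed by `cv2.imread(path)` yields the image as a NumPy array (or `None` if decoding fails).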
@SaschaHeyer
SaschaHeyer / agent.py
Created January 21, 2025 10:37
Agent + RAG APIs
import vertexai
from vertexai.preview import reasoning_engines
from vertexai.preview import rag
from vertexai.preview.generative_models import Tool
from langchain.agents.format_scratchpad.tools import format_to_tool_messages
from langchain.memory import ChatMessageHistory
from langchain_core import prompts
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
import re
import time
import streamlit as st
import pandas as pd
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin, urlparse
import vertexai
from vertexai.generative_models import GenerativeModel
import vertexai.preview.generative_models as generative_models
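These imports set up a small crawler. The link-extraction core can be sketched with just the standard library (`html.parser` standing in for BeautifulSoup here), keeping only absolute, same-domain links:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkCollector(HTMLParser):
    """Collect absolute, same-domain links from an HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.domain = urlparse(base_url).netloc
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            url = urljoin(self.base_url, href)  # resolve relative links
            if urlparse(url).netloc == self.domain:
                self.links.add(url)

parser = LinkCollector("https://example.com/docs/")
parser.feed('<a href="intro.html">Intro</a> <a href="https://other.org/x">Ext</a>')
# parser.links -> {"https://example.com/docs/intro.html"}; the off-domain link is dropped
```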
@SaschaHeyer
SaschaHeyer / gemma_vllm_tpu.md
Last active September 2, 2024 12:15
gemma vllm TPU

gcloud config set project sascha-playground-doit
export PROJECT_ID=$(gcloud config get project)
export REGION=us-central1
export CLUSTER_NAME=vllm
export HF_TOKEN=XXX

gcloud container clusters create-auto ${CLUSTER_NAME} \
  --project=${PROJECT_ID} \
  --region=${REGION} \
  --release-channel=rapid \

@SaschaHeyer
SaschaHeyer / gemini-code-assist-question2.md
Created July 30, 2024 08:39
gemini-code-assist-question2.md

Question: how can I deploy a LLM to Google cloud? The model size is around 80GB.

Answer: Deploying an 80GB LLM to Google Cloud is a great goal! It's a bit like moving a large piece of furniture: you need the right tools and a plan to make the move go smoothly. Here's a breakdown of how to do it:

  1. Choose Your Deployment Platform:

Vertex AI: This is Google Cloud's managed machine learning platform. It's designed for deploying and managing models, including large ones. Vertex AI offers:

Model Serving: You can deploy your LLM as a model endpoint, allowing you to send requests for predictions.

@SaschaHeyer
SaschaHeyer / gemini-code-assist-question1.md
Created July 30, 2024 06:35
gemini-code-assist-question1.md

Question: How can I create text embeddings with google cloud?

Answer: You can create text embeddings with Google Cloud using the Text Embeddings API on Vertex AI. This API allows you to generate numerical representations of text, which can be used for various tasks like:

Semantic Search: Finding documents similar in meaning to a given query.
Recommendation Systems: Suggesting relevant content based on user preferences.
Clustering: Grouping similar documents together.
Question Answering: Finding answers to questions based on a knowledge base.
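The semantic-search use case boils down to comparing embedding vectors by cosine similarity. A toy illustration with hand-made 3-dimensional vectors (real Vertex AI text embeddings are much higher-dimensional, e.g. 768 values):

```python
import math

def cosine_similarity(a, b):
    """Similarity between two embedding vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy "embeddings"; in practice these come from the Text Embeddings API.
query = [0.9, 0.1, 0.0]
docs = {"doc_a": [0.8, 0.2, 0.1], "doc_b": [0.0, 0.1, 0.9]}

# Semantic search = return the document whose vector is closest to the query.
best = max(docs, key=lambda d: cosine_similarity(query, docs[d]))
# best -> "doc_a"
```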

@SaschaHeyer
SaschaHeyer / compare.py
Last active December 18, 2023 11:55
gen-ai-vs-translation-api
import vertexai
from vertexai.language_models import TextGenerationModel
from google.cloud import translate_v2 as translate

vertexai.init(project="sascha-playground-doit", location="us-central1")

parameters = {
    "candidate_count": 1,
    "max_output_tokens": 1024,
    "temperature": 0.9,
    "top_p": 1
}
@SaschaHeyer
SaschaHeyer / query.py
Last active February 8, 2023 09:42
Vertex AI Matching Engine Query
response = index_endpoint.match(
    deployed_index_id="similar_text_deployed",
    queries=[embedding[0]],
    num_neighbors=5
)
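`match()` returns one list of neighbors per query, each neighbor carrying an id and a distance. A minimal post-processing sketch, with plain `(id, distance)` tuples standing in for the SDK's neighbor objects:

```python
# Hypothetical results for a single query; lower distance = closer match.
neighbors = [("doc-17", 0.12), ("doc-03", 0.34), ("doc-99", 0.81)]

# Pick the nearest neighbor by distance.
closest_id, closest_distance = min(neighbors, key=lambda n: n[1])
# closest_id -> "doc-17"
```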