Product Requirements Document: Vercel AI SDK Provider for `@google/gemini-cli-core`

Title: Vercel AI SDK Community Provider for Gemini CLI Core
Author: Gemini
Status: Proposed
Version: 1.0
Date: June 25, 2025

1. Introduction

This document outlines the product requirements for a new community provider for the Vercel AI SDK. The provider will act as a bridge between the Vercel AI SDK (the ai npm package) and the @google/gemini-cli-core library, enabling developers to use Google's Gemini models within the standardized, framework-agnostic Vercel AI SDK ecosystem.

The provider will implement the LanguageModelV1 interface specified by the Vercel AI team, ensuring seamless integration with existing AI SDK helper functions like streamText, generateText, and streamObject.
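
For illustration, the intended developer experience might look like the sketch below. The package name, factory function, and model id are placeholders; the actual contracts are defined in Objective 1 and sections 5 and 6.

```ts
import { generateText } from 'ai';
// Package and factory names are placeholders (see Objective 1 and FR1.1).
import { createGeminiCliCoreProvider } from 'gemini-cli-ai-provider';

const gemini = createGeminiCliCoreProvider({ apiKey: process.env.GEMINI_API_KEY! });

const { text } = await generateText({
  model: gemini('gemini-2.5-pro'),
  prompt: 'Explain what a Vercel AI SDK provider does in one sentence.',
});
console.log(text);
```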

2. Problem Statement

Developers using the Vercel AI SDK to build applications currently lack a direct, officially supported, or community-provided integration for the powerful features exposed by the @google/gemini-cli-core package. This includes its robust tool-use implementation, chat history management, and direct access to Gemini models.

To use these features, a developer would need to write a significant amount of custom boilerplate code to adapt the @google/gemini-cli-core interface to the Vercel AI SDK's LanguageModelV1 specification. This creates friction, duplicates effort across the community, and defeats the purpose of having a standardized SDK for building AI applications.

3. Goals and Objectives

  • Primary Goal: To enable developers to easily use Google's Gemini models via the @google/gemini-cli-core library as a language model provider within the Vercel AI SDK.
  • Objective 1: Create and publish an npm package that exports a fully compliant LanguageModelV1 provider. (The @ai-sdk npm scope is reserved for Vercel's official providers, so a community name, e.g. gemini-cli-ai-provider, would be used.)
  • Objective 2: Implement full support for the core AI SDK features, including single-shot generation (generateText), response streaming (streamText), and server-side tool/function calling.
  • Objective 3: Provide clear and comprehensive documentation with installation instructions and usage examples for all supported features.
  • Objective 4: Achieve sufficient quality and stability to be listed as an official community provider by the Vercel AI SDK team.

4. Target Audience

The primary audience for this provider is any developer using the Vercel AI SDK in their projects (e.g., Next.js, SvelteKit, or any Node.js environment) who wishes to use Google's Gemini models as their backend LLM.

5. Scope

In Scope (Version 1.0)

  • A new class/factory function that implements the LanguageModelV1 interface.
  • Implementation of the doGenerate method for non-streaming text and tool-use generation.
  • Implementation of the doStream method for streaming text, tool-use, and other stream parts.
  • Translation layer to map data structures between the Vercel AI SDK format and the @google/gemini-cli-core format.
  • Flexible authentication support, allowing instantiation with either a simple API key or a pre-configured GoogleAuth client object for more complex scenarios like OAuth2.
  • Support for multimodal inputs, specifically mapping image data (e.g., base64 encoded strings with MIME types) from the Vercel AI SDK prompt format to the format expected by @google/gemini-cli-core.
  • Mapping of @google/gemini-cli-core errors to the Vercel AI SDK's standard error types (e.g., APICallError from @ai-sdk/provider).
  • Comprehensive unit and integration test suite.
  • A README.md file detailing installation, authentication, and usage.

Out of Scope (Version 1.0)

  • Support for any language model interface other than LanguageModelV1.
  • Implementation of the interactive, user-facing OAuth2 flow. The provider will accept a configured auth client, but the end-user application is responsible for the interactive process of obtaining tokens.
  • A user interface or front-end components. This is a server-side provider only.
  • Direct interaction with the gemini CLI executable; this provider will use the @google/gemini-cli-core library directly.

6. Functional Requirements

FR1: Provider Implementation

  • FR1.1: The provider shall export a factory function (e.g., createGeminiCliCoreProvider()) that returns an object conforming to the LanguageModelV1 interface.
  • FR1.2: The provider must be stateless, with all necessary configuration passed during instantiation.
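
FR1.1 leaves the provider's exact surface open; the sketch below assumes the convention used by existing AI SDK providers, where the factory returns a function that maps a model id to a LanguageModelV1 instance. All names other than the @ai-sdk/provider types are hypothetical.

```ts
import type { LanguageModelV1, LanguageModelV1CallOptions } from '@ai-sdk/provider';

// Hypothetical options shape; FR2 defines the real authentication contract.
interface GeminiCliCoreProviderOptions {
  apiKey?: string;
  authClient?: unknown;
}

export function createGeminiCliCoreProvider(options: GeminiCliCoreProviderOptions) {
  // Returning a model factory keyed by model id mirrors the convention of
  // existing AI SDK providers (e.g., openai('gpt-4o')).
  return (modelId: string): LanguageModelV1 => ({
    specificationVersion: 'v1',
    provider: 'gemini-cli-core',
    modelId,
    defaultObjectGenerationMode: 'tool',
    async doGenerate(callOptions: LanguageModelV1CallOptions) {
      throw new Error('FR3: map callOptions.prompt to @google/gemini-cli-core and back');
    },
    async doStream(callOptions: LanguageModelV1CallOptions) {
      throw new Error('FR4: return a ReadableStream of LanguageModelV1StreamPart');
    },
  });
}
```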

FR2: Authentication

  • FR2.1: The factory function must accept an options object that allows for flexible authentication.
  • FR2.2: The options object must support providing a simple apiKey string for basic authentication.
  • FR2.3: The options object must also support providing a pre-configured authClient object (e.g., an instance of GoogleAuth from google-auth-library) for advanced authentication scenarios like OAuth2 or service accounts.
  • FR2.4: The provider must throw an error during initialization if neither an apiKey nor an authClient is provided.
  • FR2.5: The provider must securely use the provided credentials to initialize and make requests with the underlying @google/gemini-cli-core client.
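
A minimal sketch of the FR2 options contract and the FR2.4 initialization check (names are illustrative):

```ts
import { GoogleAuth } from 'google-auth-library';

// FR2: either a plain API key or a pre-configured auth client must be supplied.
interface GeminiCliCoreAuthOptions {
  apiKey?: string;
  authClient?: GoogleAuth;
}

function validateAuthOptions(options: GeminiCliCoreAuthOptions): void {
  // FR2.4: fail fast at initialization time rather than on the first request.
  if (!options.apiKey && !options.authClient) {
    throw new Error(
      'createGeminiCliCoreProvider requires either an apiKey or a pre-configured authClient.',
    );
  }
}
```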

FR3: doGenerate (Non-Streaming Generation)

  • FR3.1: The provider must implement the doGenerate method, which receives the LanguageModelV1Prompt as part of its call options (LanguageModelV1CallOptions).
  • FR3.2: It must correctly map the incoming LanguageModelV1Prompt (including system messages, user/assistant history, and tool calls/results) to the format expected by @google/gemini-cli-core.
  • FR3.3: It must correctly map the response from @google/gemini-cli-core to the result shape required by doGenerate, including text, toolCalls, finishReason, and usage token counts.
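
A sketch of the FR3.2 prompt mapping, assuming @google/gemini-cli-core accepts Gemini-style Content objects (its exact request types are an assumption here):

```ts
import type { LanguageModelV1Prompt } from '@ai-sdk/provider';

// Gemini-style content shape; the exact request type accepted by
// @google/gemini-cli-core is an assumption in this sketch.
type GeminiContent = { role: 'user' | 'model'; parts: Array<{ text: string }> };

function mapPromptToContents(prompt: LanguageModelV1Prompt): GeminiContent[] {
  const contents: GeminiContent[] = [];
  for (const message of prompt) {
    switch (message.role) {
      case 'system':
        // Gemini models take a separate systemInstruction; folding the system
        // message into a user turn is a simplification for this sketch.
        contents.push({ role: 'user', parts: [{ text: message.content }] });
        break;
      case 'user':
        contents.push({
          role: 'user',
          parts: message.content.flatMap((part) =>
            part.type === 'text' ? [{ text: part.text }] : [], // image parts: see FR6
          ),
        });
        break;
      case 'assistant':
        contents.push({
          role: 'model',
          parts: message.content.flatMap((part) =>
            part.type === 'text' ? [{ text: part.text }] : [], // tool calls: see FR5
          ),
        });
        break;
      // 'tool' messages (tool results) are omitted here for brevity.
    }
  }
  return contents;
}
```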

FR4: doStream (Streaming Generation)

  • FR4.1: The provider must implement the doStream method, which receives the same LanguageModelV1CallOptions as doGenerate.
  • FR4.2: The method must return a ReadableStream of LanguageModelV1StreamPart objects, as required by the LanguageModelV1 specification.
  • FR4.3: The provider must handle the streaming output from @google/gemini-cli-core and transform it into a stream of LanguageModelV1StreamPart objects.
  • FR4.4: The stream must support the following parts:
    • text-delta: For streaming text content.
    • tool-call: For streaming tool use requests from the model.
    • finish: To signal the end of the generation, including the finishReason and usage.
    • error: To communicate errors that occur during the stream.
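
A sketch of the FR4.3 translation, assuming the core library exposes its output as an AsyncIterable of chunks (the chunk shape is an assumption; the translation pattern is the point):

```ts
import type { LanguageModelV1StreamPart } from '@ai-sdk/provider';

// The chunk shape emitted by @google/gemini-cli-core's streaming API is an
// assumption in this sketch.
type CoreChunk = { text?: string };

function toLanguageModelStream(
  coreStream: AsyncIterable<CoreChunk>,
): ReadableStream<LanguageModelV1StreamPart> {
  return new ReadableStream<LanguageModelV1StreamPart>({
    async start(controller) {
      try {
        for await (const chunk of coreStream) {
          // FR4.4: forward text as text-delta parts as soon as it arrives.
          if (chunk.text) {
            controller.enqueue({ type: 'text-delta', textDelta: chunk.text });
          }
        }
        controller.enqueue({
          type: 'finish',
          finishReason: 'stop',
          // Real token counts should be mapped from the core response when available.
          usage: { promptTokens: NaN, completionTokens: NaN },
        });
      } catch (error) {
        // FR4.4: surface stream failures as error parts instead of throwing.
        controller.enqueue({ type: 'error', error });
      } finally {
        controller.close();
      }
    },
  });
}
```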

FR5: Tool & Function Calling

  • FR5.1: The provider must accept the tool definitions passed in the call options. (Application developers define tools with Zod schemas; the AI SDK converts them to JSON Schema before they reach the provider.)
  • FR5.2: It must translate these JSON-Schema tool definitions into the function-declaration format required by @google/gemini-cli-core to be sent to the model.
  • FR5.3: It must correctly parse toolCalls from the model's response (both streaming and non-streaming) and format them into the Vercel AI SDK's toolCalls structure.
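
A sketch of the FR5.2 translation, assuming a Gemini-style function-declaration shape on the core side:

```ts
import type { LanguageModelV1FunctionTool } from '@ai-sdk/provider';

// Gemini-style function declaration; the exact type in @google/gemini-cli-core
// is an assumption in this sketch.
type FunctionDeclaration = { name: string; description?: string; parameters?: object };

function mapTools(tools: LanguageModelV1FunctionTool[]): FunctionDeclaration[] {
  // The AI SDK has already converted the developer's Zod schemas to JSON
  // Schema by this point, which maps nearly 1:1 onto Gemini's OpenAPI-style
  // parameter schema.
  return tools.map((tool) => ({
    name: tool.name,
    description: tool.description,
    parameters: tool.parameters as object,
  }));
}
```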

FR6: Multimodal Input

  • FR6.1: The provider must correctly identify image data within the LanguageModelV1Prompt.
  • FR6.2: It must map the Vercel AI SDK's image part format (e.g., { type: 'image', image: Uint8Array | URL, mimeType?: string }) to the Part object format used by @google/gemini-cli-core ({ inlineData: { data: 'base64...', mimeType: '...' } }).
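
A sketch of the FR6.2 mapping (the core library's Part type is an assumption):

```ts
// FR6.2: translate an AI SDK image part into a Gemini inlineData part.
// The exact Part type in @google/gemini-cli-core is an assumption here.
type InlineDataPart = { inlineData: { data: string; mimeType: string } };

function mapImagePart(part: {
  type: 'image';
  image: Uint8Array | URL;
  mimeType?: string;
}): InlineDataPart {
  if (part.image instanceof URL) {
    // Fetching remote images is out of scope for this sketch.
    throw new Error('URL images must be downloaded and inlined by the caller.');
  }
  return {
    inlineData: {
      data: Buffer.from(part.image).toString('base64'),
      // Defaulting the MIME type is a simplification; real code should detect it.
      mimeType: part.mimeType ?? 'image/png',
    },
  };
}
```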

7. Non-Functional Requirements

  • NFR1 (Performance): The provider should introduce minimal latency over the direct @google/gemini-cli-core library calls. Streaming should be efficient, yielding chunks as soon as they are received.
  • NFR2 (Reliability): The provider must gracefully handle API errors from the downstream service and network interruptions, reporting them through the standard error-handling mechanisms of the Vercel AI SDK.
  • NFR3 (Security): API keys and other sensitive data must be handled securely and not be exposed in logs or error messages.
  • NFR4 (Documentation): The provider must have a README.md file that includes:
    • NPM package name and installation instructions.
    • Clear examples of how to instantiate the provider using both a simple apiKey and a pre-configured authClient for OAuth.
    • Usage examples for streamText, generateText, streamObject with tool calling, and prompts with image inputs (see the sketch after this list).
    • A note clarifying that the end-user application is responsible for the interactive part of the OAuth flow.
  • NFR5 (Testability): The provider must have a high level of unit test coverage, with tests for all core functionalities, including data mapping, streaming logic, and error handling.
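
For example, the streamText-with-tools usage that NFR4 asks the README to document might look like this (package and factory names are placeholders, and the tool is stubbed):

```ts
import { streamText, tool } from 'ai';
import { z } from 'zod';
// Package and factory names are placeholders (see Objective 1 and FR1.1).
import { createGeminiCliCoreProvider } from 'gemini-cli-ai-provider';

const gemini = createGeminiCliCoreProvider({ apiKey: process.env.GEMINI_API_KEY! });

const result = await streamText({
  model: gemini('gemini-2.5-pro'),
  tools: {
    weather: tool({
      description: 'Get the current weather for a city',
      parameters: z.object({ city: z.string() }),
      execute: async ({ city }) => ({ city, tempC: 21 }), // stubbed tool result
    }),
  },
  prompt: 'What is the weather in Lisbon right now?',
});

for await (const delta of result.textStream) {
  process.stdout.write(delta);
}
```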

8. Dependencies and Assumptions

  • Dependency 1: This project will depend on the ai (Vercel AI SDK), @ai-sdk/provider (for the LanguageModelV1 types), @google/gemini-cli-core, and zod npm packages.
  • Assumption 1: The @google/gemini-cli-core library provides a stable, programmatically accessible API for sending prompts, receiving responses (including streams), and defining tools.
  • Assumption 2: The Vercel LanguageModelV1 specification will remain stable throughout the development of version 1.0 of this provider.

9. Success Metrics

  • Adoption: The number of weekly downloads of the package on npm.
  • Quality: A low number of bug reports and issues opened on the project's GitHub repository.
  • Functionality: 100% of the functional requirements are implemented and verified by the test suite.
  • Community Recognition: The provider is successfully submitted to and accepted by the Vercel AI SDK team for inclusion in their list of community providers.

10. Future Work (Post-V1.0)

  • Provide optional helper utilities or detailed documentation/tutorials to guide end-users in implementing the interactive OAuth2 flow in common frameworks (e.g., Next.js).
  • Advanced caching strategies to reduce redundant API calls.
  • Adding support for new features as they are introduced in the Gemini API and @google/gemini-cli-core.