@glaforge
Created February 16, 2025 13:28
How to contribute a new LLM module

Let's outline the steps to create a new language model module for LangChain4j and propose its inclusion. Based on the provided file structure, you'll be creating a new module alongside the existing ones (e.g., langchain4j-open-ai, langchain4j-ollama). Here's a breakdown of the process:

Key Steps and Considerations

  1. Understand the Abstractions and SPI: LangChain4j, like its Python counterpart, is built around a set of core abstractions, and you need to understand them to implement your integration correctly. Implement the abstractions that match your provider's capabilities:

    • ChatLanguageModel / StreamingChatLanguageModel: For conversational models (like ChatGPT, Gemini). Implement ChatLanguageModel for synchronous responses, and StreamingChatLanguageModel if the model supports streaming responses token by token.
    • LanguageModel / StreamingLanguageModel: For models with a simpler text-in, text-out interface (less common these days).
    • EmbeddingModel: If the model provider offers embedding capabilities.
    • ModerationModel: If the model provider offers content moderation.
    • ScoringModel: If the model provider offers scoring/ranking capabilities.
    • Builder Factories: You'll also need to create builder factories (SPIs) for each model type you implement. These are how users and frameworks construct your model classes; see examples like AzureOpenAiChatModelBuilderFactory. They are registered using the Java ServiceLoader mechanism (the META-INF/services files), as sketched below.
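    To make the registration concrete, here is a hedged sketch (not the library's literal code) of how a model's static builder() method typically resolves these factories, assuming langchain4j-core's ServiceHelper.loadFactories helper; the MyLlm* names are placeholders used throughout this guide:

      // Inside MyLlmChatModel; import dev.langchain4j.spi.ServiceHelper
      public static Builder builder() {
          // If a builder factory is registered in META-INF/services (e.g., by a
          // framework integration), let it supply a pre-configured builder ...
          for (MyLlmChatModelBuilderFactory factory : ServiceHelper.loadFactories(MyLlmChatModelBuilderFactory.class)) {
              return factory.get();
          }
          // ... otherwise fall back to a plain builder
          return new Builder();
      }
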
  2. Choose a Module Structure (and Repository):

    • Community Repo (Preferred for new integrations): Start your integration in the langchain4j-community repository. This is the recommended approach for new contributions. It allows for easier initial review and iteration before considering a move to the core langchain4j repository. Clone this repo, don't fork the main langchain4j repo directly.
    • Main langchain4j Repo (For Core Integrations): If your integration is with a very widely used and well-established model provider (like OpenAI, Google, etc.), and you are confident in its stability and long-term maintenance, you might propose it for the main repo. However, start in langchain4j-community first.
    • Module Naming: Follow the pattern: langchain4j-{provider-name} (e.g., langchain4j-my-llm).
    • Directory Structure: Create a directory structure mirroring the existing modules (see langchain4j-open-ai or langchain4j-ollama as good examples):
      langchain4j-{provider-name}/
          pom.xml  (Your module's Maven build file)
          src/
              main/
                  java/
                      dev/
                          langchain4j/
                              model/
                                  {providername}/  (e.g., myllm)
                                      {ProviderName}ChatModel.java  (Your implementation)
                                      internal/ (API client and related classes)
                                      spi/      (Builder factory for your model)
                                          {ProviderName}ChatModelBuilderFactory.java
                  resources/
                      META-INF/
                          services/
                              (Files to register your builder factory, see examples)
              test/
                  java/
                      dev/
                          langchain4j/
                              model/
                                  {providername}/
                                      {ProviderName}ChatModelIT.java (Integration tests)
      
  3. Implement the API Client:

    • Official SDK (Preferred): If the LLM provider has an official Java SDK, use it. This is usually the best approach for stability, performance, and access to all features. See langchain4j-bedrock for an example using an official SDK.
    • HTTP Client (If no SDK): If there's no official SDK, use the JDK's built-in java.net.http.HttpClient (available since Java 11) to minimize external dependencies; avoid adding new dependencies unless absolutely necessary. See http-clients/langchain4j-http-client-jdk for how LangChain4j wraps this. Prefer langchain4j-http-client-jdk (or langchain4j-http-client-spring-restclient if building a Spring Boot starter) over using okhttp3 directly. A hedged client sketch appears after this list.
    • JSON Handling: Use Jackson for JSON serialization/deserialization, as it's already a dependency.
    • Error Handling: Make sure to handle HTTP errors (non-2xx responses) appropriately. Throw a dev.langchain4j.exception.HttpException for these.
    • Request/Response Logging: Implement logging for requests and responses (see langchain4j-anthropic for a complete example). This is very helpful for debugging.
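
    Below is a minimal sketch of such a client built on the JDK HttpClient. The /v1/chat endpoint is hypothetical, and the snippet assumes HttpException exposes a (statusCode, message) constructor; Jackson-based mapping to and from request/response DTOs would sit on top of the raw JSON strings used here:

      // MyLlmClient.java (e.g., in dev.langchain4j.model.myllm.internal), illustrative sketch
      import dev.langchain4j.exception.HttpException;

      import java.io.IOException;
      import java.net.URI;
      import java.net.http.HttpClient;
      import java.net.http.HttpRequest;
      import java.net.http.HttpResponse;
      import java.time.Duration;

      class MyLlmClient {

          private final HttpClient httpClient;
          private final String baseUrl;
          private final String apiKey;

          MyLlmClient(String baseUrl, String apiKey, Duration timeout) {
              this.baseUrl = baseUrl;
              this.apiKey = apiKey;
              this.httpClient = HttpClient.newBuilder().connectTimeout(timeout).build();
          }

          // Sends a JSON request body and returns the raw JSON response body
          String chat(String requestJson) {
              HttpRequest request = HttpRequest.newBuilder()
                      .uri(URI.create(baseUrl + "/v1/chat")) // hypothetical endpoint
                      .header("Authorization", "Bearer " + apiKey)
                      .header("Content-Type", "application/json")
                      .POST(HttpRequest.BodyPublishers.ofString(requestJson))
                      .build();
              try {
                  HttpResponse<String> response = httpClient.send(request, HttpResponse.BodyHandlers.ofString());
                  if (response.statusCode() < 200 || response.statusCode() >= 300) {
                      // Surface non-2xx responses as HttpException, as recommended above
                      throw new HttpException(response.statusCode(), response.body());
                  }
                  return response.body();
              } catch (IOException e) {
                  throw new RuntimeException(e);
              } catch (InterruptedException e) {
                  Thread.currentThread().interrupt();
                  throw new RuntimeException(e);
              }
          }
      }
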
  4. Implement the Model Interface(s):

    • Implement ChatLanguageModel, StreamingChatLanguageModel, EmbeddingModel, etc., as appropriate, based on the provider's capabilities.
    • Use the Builder pattern for your model classes to allow for flexible configuration.
    • Make sure your implementation handles request/response mapping and error handling correctly (an illustrative mapping sketch follows this list).
    • Implement TokenCountEstimator if possible, so the TokenWindowChatMemory can calculate the token usage. Implement DimensionAwareEmbeddingModel to report the output dimension from the embedding model.
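
    For example, token usage and finish reason (see Key Points below) are attached while mapping the provider's response. In this hedged sketch, MyLlmResponse and its accessors are hypothetical provider-side names:

      // Illustrative mapper to LangChain4j's Response<AiMessage>; imports:
      // dev.langchain4j.data.message.AiMessage, dev.langchain4j.model.output.*
      static Response<AiMessage> toResponse(MyLlmResponse providerResponse) {
          TokenUsage tokenUsage = new TokenUsage(
                  providerResponse.promptTokens(),       // hypothetical accessor
                  providerResponse.completionTokens());  // hypothetical accessor
          return Response.from(
                  AiMessage.from(providerResponse.text()), // hypothetical accessor
                  tokenUsage,
                  FinishReason.STOP); // map the provider's finish reason to LangChain4j's enum
      }
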
  5. Write Tests:

    • Unit Tests: Create unit tests for any complex logic, utility methods, and request/response mappers.
    • Integration Tests (ITs): Create integration tests (e.g., MyLlmChatModelIT.java) that interact with the real LLM provider's API. These are crucial for ensuring your integration works correctly; a minimal sketch follows this list.
      • Use environment variables (e.g., MYLLM_API_KEY) to store API keys and other secrets. Do not hardcode them.
      • Use @EnabledIfEnvironmentVariable to skip the tests if the required environment variables are not set.
      • Extend AbstractChatModelIT, AbstractStreamingChatModelIT, AbstractEmbeddingModelIT, and/or AbstractScoringModelIT to get a set of basic tests.
      • Test all relevant features of the model (e.g., text generation, streaming, different parameters, tool use, JSON mode).
      • Add test for concurrent requests if possible.
      • Consider adding a test for the Tokenizer interface (see examples in langchain4j-core).
      • Add @RetryingTest if the model's responses are inconsistent.
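
    A minimal integration test might look like the following hedged sketch; the model name is hypothetical, and MyLlmChatModel is the placeholder class from the snippets below:

      import dev.langchain4j.data.message.AiMessage;
      import dev.langchain4j.data.message.UserMessage;
      import dev.langchain4j.model.chat.ChatLanguageModel;
      import dev.langchain4j.model.output.Response;
      import org.junit.jupiter.api.Test;
      import org.junit.jupiter.api.condition.EnabledIfEnvironmentVariable;

      import static org.assertj.core.api.Assertions.assertThat;

      // Skipped automatically when MYLLM_API_KEY is not set
      @EnabledIfEnvironmentVariable(named = "MYLLM_API_KEY", matches = ".+")
      class MyLlmChatModelIT {

          ChatLanguageModel model = MyLlmChatModel.builder()
                  .apiKey(System.getenv("MYLLM_API_KEY"))
                  .modelName("my-llm-small") // hypothetical model name
                  .build();

          @Test
          void should_answer_simple_question() {
              Response<AiMessage> response = model.generate(UserMessage.from("What is the capital of France?"));
              assertThat(response.content().text()).containsIgnoringCase("Paris");
          }
      }
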
  6. Add to BOM (Bill of Materials): Add your new module to langchain4j-bom/pom.xml. This helps users manage dependency versions; a typical entry is shown below.
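
    A BOM entry is a standard Maven managed dependency; for a module named langchain4j-my-llm it would look like this:

      <!-- langchain4j-bom/pom.xml, under <dependencyManagement>/<dependencies> -->
      <dependency>
          <groupId>dev.langchain4j</groupId>
          <artifactId>langchain4j-my-llm</artifactId>
          <version>${project.version}</version>
      </dependency>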

  7. Documentation:

    • Update README.md: Add your integration to the list of supported models and embedding stores.
    • Create Markdown Documentation: Create Markdown files in the docs/docs/integrations/ directory, following the structure of existing integrations. You'll need:
      • A main file (e.g., my-llm.md).
      • An entry in docs/docs/integrations/language-models/index.md and in docs/sidebars.js.
      • An entry in the _category_.json files in docs/docs/integrations/language-models and docs/docs/integrations/embedding-stores.
    • Examples (Highly Recommended): Create a simple example in the langchain4j-examples repository. This is very helpful for users.
  8. General Guidelines (from CONTRIBUTING.md):

    • Java 17: Maintain compatibility with Java 17.
    • Minimal Dependencies: Avoid adding new dependencies if possible. If necessary, try to use libraries already present. Run mvn dependency:analyze to check.
    • Backwards Compatibility: Avoid breaking changes. If necessary, deprecate old methods/fields instead of removing them.
    • Naming Conventions: Follow existing naming conventions.
    • No Lombok: Avoid using Lombok in new code; remove it from existing code if you touch it.
    • Javadoc: Add Javadoc where needed.
    • Code Style: Run make lint and make format before committing.
    • Large Features: Discuss large features with maintainers (@langchain4j) before implementation.
  9. Open a Pull Request (Draft First):

    • Open a draft PR in the langchain4j-community repository.
    • Fill out all sections of the PR template.
    • Once the PR is reviewed and approved, you will be asked to finalize it (add documentation, examples, etc.).

Example Code Snippets (Illustrative)

// MyLlmChatModel.java (in the appropriate package)
import dev.langchain4j.data.message.AiMessage;
import dev.langchain4j.data.message.ChatMessage;
import dev.langchain4j.model.chat.ChatLanguageModel;
import dev.langchain4j.model.output.Response;
import java.time.Duration;
import java.util.List;

public class MyLlmChatModel implements ChatLanguageModel {

    private final MyLlmClient client; // Your custom client
    private final String modelName;

    private MyLlmChatModel(Builder builder) {
        this.client = new MyLlmClient(builder.baseUrl, builder.apiKey, builder.timeout);
        this.modelName = builder.modelName;
    }

    @Override
    public Response<AiMessage> generate(List<ChatMessage> messages) {
        // 1. Convert LangChain4j messages to MyLLM's request format
        // 2. Call client.chat(...)
        // 3. Convert MyLLM's response to LangChain4j's Response<AiMessage>
        // 4. Handle errors (throw HttpException)
        throw new UnsupportedOperationException("TODO: implement");
    }

    public static Builder builder() {
        // Real modules first consult the ServiceLoader (see the SPI sketch in step 1)
        return new Builder();
    }

    // Hand-written builder: the guidelines above disallow Lombok's @Builder
    public static class Builder {
        private String baseUrl;
        private String apiKey;
        private String modelName;
        private Duration timeout;

        public Builder baseUrl(String baseUrl) { this.baseUrl = baseUrl; return this; }
        public Builder apiKey(String apiKey) { this.apiKey = apiKey; return this; }
        public Builder modelName(String modelName) { this.modelName = modelName; return this; }
        public Builder timeout(Duration timeout) { this.timeout = timeout; return this; }

        public MyLlmChatModel build() { return new MyLlmChatModel(this); }
    }
}


// MyLlmClient.java (in a suitable package, e.g., dev.langchain4j.model.myllm.internal)
class MyLlmClient {

    // ... implementation using java.net.http.HttpClient (see the sketch in step 3) ...
}

// MyLlmChatModelBuilderFactory.java (in a suitable package, e.g., dev.langchain4j.model.myllm.spi)
// Following existing factories such as AzureOpenAiChatModelBuilderFactory, the SPI is an
// interface extending Supplier. Framework integrations (e.g., a Spring Boot starter) can
// provide an implementation; MyLlmChatModel.builder() falls back to `new Builder()` when
// none is registered (see the sketch in step 1).
import java.util.function.Supplier;

public interface MyLlmChatModelBuilderFactory extends Supplier<MyLlmChatModel.Builder> {
}

// META-INF/services/dev.langchain4j.model.myllm.spi.MyLlmChatModelBuilderFactory (resource file)
// The file is named after the fully qualified name of your factory interface; each line
// contains the fully qualified name of an implementation, e.g. (hypothetical):
// com.example.myllm.SpringMyLlmChatModelBuilderFactory

Key Points and Common Mistakes

  • Thorough Testing: Integration tests are absolutely essential. Test with real API calls.
  • Error Handling: Handle HTTP errors (non-2xx responses) from the LLM provider's API.
  • Token Usage: If the LLM provider gives you token usage information, include it in the Response.
  • Finish Reason: If the LLM provider gives you a finish reason (e.g., "stop", "length"), include it in the Response.
  • Consistency: Maintain consistency with existing LangChain4j modules in terms of naming, structure, and coding style.
  • Documentation: Clear, concise, and accurate documentation is critical. Follow the patterns you see in existing modules.
  • SPI Registration: Remember to create the src/main/resources/META-INF/services/ files to register your model implementation as a service.

This comprehensive guide, combined with the provided file structure and examples, should give you a strong foundation for contributing your new language model integration to LangChain4j. Remember to start in the langchain4j-community repository for initial development and review. Good luck!
