Okay, let's outline the steps to create a new language model module for LangChain4j and propose its inclusion. Based on the provided file structure, you'll be creating a new module similar to the existing ones (e.g., `langchain4j-open-ai` or `langchain4j-ollama`). Here's a breakdown of the process, referencing the structure you've provided:
## Key Steps and Considerations
1. **Understand the Abstractions and SPI**

   LangChain4j, like its Python counterpart, is built around core abstractions, and you need to understand them to implement your integration correctly. The abstractions to implement (as applicable to your provider) are:

   - `ChatLanguageModel` / `StreamingChatLanguageModel`: for conversational models (like ChatGPT or Gemini). Implement `ChatLanguageModel` for synchronous responses, and `StreamingChatLanguageModel` if the model supports streaming responses token by token (a sketch follows this list).
   - `LanguageModel` / `StreamingLanguageModel`: for models with a simpler text-in, text-out interface (less common these days).
   - `EmbeddingModel`: if the model provider offers embedding capabilities.
   - `ModerationModel`: if the model provider offers content moderation.
   - `ScoringModel`: if the model provider offers scoring/ranking capabilities.
   - **Builder factories:** you'll also need to create a builder factory (SPI) for each model type you implement; these are how users construct your model classes. See examples like `AzureOpenAiChatModelBuilderFactory`. They are registered using the Java `ServiceLoader` mechanism (the `META-INF/services` files).
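   For the streaming case, the core job is to forward provider tokens to LangChain4j's `StreamingResponseHandler`. A minimal sketch, assuming a hypothetical low-level `MyLlmClient` with callback-based streaming (its `streamChat` method is illustrative, not a real API):

   ```java
   import dev.langchain4j.data.message.AiMessage;
   import dev.langchain4j.data.message.ChatMessage;
   import dev.langchain4j.model.StreamingResponseHandler;
   import dev.langchain4j.model.chat.StreamingChatLanguageModel;
   import dev.langchain4j.model.output.Response;

   import java.util.List;

   public class MyLlmStreamingChatModel implements StreamingChatLanguageModel {

       private final MyLlmClient client; // hypothetical low-level client

       public MyLlmStreamingChatModel(MyLlmClient client) {
           this.client = client;
       }

       @Override
       public void generate(List<ChatMessage> messages, StreamingResponseHandler<AiMessage> handler) {
           // Forward each provider token to the handler as it arrives,
           // then signal completion with the aggregated message.
           client.streamChat(messages,
                   handler::onNext,                                                         // each partial token
                   fullText -> handler.onComplete(Response.from(AiMessage.from(fullText))), // final message
                   handler::onError);                                                       // failure
       }
   }
   ```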
2. **Choose a Module Structure (and Repository)**

   - **Community repo (preferred for new integrations):** start your integration in the `langchain4j-community` repository. This is the recommended approach for new contributions; it allows for easier initial review and iteration before considering a move to the core `langchain4j` repository. Clone this repo; don't fork the main `langchain4j` repo directly.
   - **Main `langchain4j` repo (for core integrations):** if your integration is with a very widely used and well-established model provider (like OpenAI or Google), and you are confident in its stability and long-term maintenance, you might propose it for the main repo. However, start in `langchain4j-community` first.
   - **Module naming:** follow the pattern `langchain4j-{provider-name}` (e.g., `langchain4j-my-llm`).
   - **Directory structure:** mirror the existing modules (see `langchain4j-open-ai` or `langchain4j-ollama` as good examples):

     ```
     langchain4j-{provider-name}/
         pom.xml                                      (your module's Maven build file)
         src/
             main/
                 java/
                     dev/langchain4j/model/{providername}/    (e.g., myllm)
                         {ProviderName}ChatModel.java         (your implementation)
                         internal/                            (API client and related classes)
                         spi/                                 (builder factory for your model)
                             {ProviderName}ChatModelBuilderFactory.java
                 resources/
                     META-INF/services/                       (files to register your builder factory; see examples)
             test/
                 java/
                     dev/langchain4j/model/{providername}/
                         {ProviderName}ChatModelIT.java       (integration tests)
     ```
3. **Implement the API Client**

   - **Official SDK (preferred):** if the LLM provider has an official Java SDK, use it. This is usually the best approach for stability, performance, and access to all features. See `langchain4j-bedrock` for an example using an official SDK.
   - **HTTP client (if no SDK):** if there's no official SDK, use the JDK's built-in `java.net.http.HttpClient` (available since Java 11). This minimizes external dependencies; avoid adding new ones unless absolutely necessary. See `http-clients/langchain4j-http-client-jdk` for how LangChain4j wraps this. Avoid using the older `okhttp3` directly if possible; prefer `langchain4j-http-client-jdk` (or `langchain4j-http-client-spring-restclient` if building a Spring Boot starter).
   - **JSON handling:** use Jackson for JSON serialization/deserialization, as it's already a dependency.
   - **Error handling:** handle HTTP errors (non-2xx responses) appropriately by throwing a `dev.langchain4j.exception.HttpException`.
   - **Request/response logging:** implement logging for requests and responses (see `langchain4j-anthropic` for a complete example); it's very helpful for debugging. A minimal client sketch follows this list.
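   A minimal sketch of such a client using `java.net.http.HttpClient` and Jackson. The endpoint path, the `MyLlmRequest`/`MyLlmResponse` DTO shapes, and the assumption that `HttpException` takes a `(statusCode, message)` constructor are all illustrative, not a real provider API:

   ```java
   import com.fasterxml.jackson.databind.ObjectMapper;
   import dev.langchain4j.exception.HttpException;

   import java.io.IOException;
   import java.net.URI;
   import java.net.http.HttpClient;
   import java.net.http.HttpRequest;
   import java.net.http.HttpResponse;
   import java.time.Duration;

   record MyLlmRequest(String model, String prompt) {}  // illustrative request shape
   record MyLlmResponse(String text) {}                 // illustrative response shape

   class MyLlmClient {

       private static final ObjectMapper MAPPER = new ObjectMapper();

       private final HttpClient httpClient;
       private final String baseUrl;
       private final String apiKey;

       MyLlmClient(String baseUrl, String apiKey, Duration timeout) {
           this.httpClient = HttpClient.newBuilder().connectTimeout(timeout).build();
           this.baseUrl = baseUrl;
           this.apiKey = apiKey;
       }

       MyLlmResponse chat(MyLlmRequest request) {
           try {
               HttpRequest httpRequest = HttpRequest.newBuilder()
                       .uri(URI.create(baseUrl + "/v1/chat")) // assumed endpoint
                       .header("Authorization", "Bearer " + apiKey)
                       .header("Content-Type", "application/json")
                       .POST(HttpRequest.BodyPublishers.ofString(MAPPER.writeValueAsString(request)))
                       .build();

               HttpResponse<String> response = httpClient.send(httpRequest, HttpResponse.BodyHandlers.ofString());

               if (response.statusCode() < 200 || response.statusCode() >= 300) {
                   throw new HttpException(response.statusCode(), response.body()); // non-2xx -> HttpException
               }
               return MAPPER.readValue(response.body(), MyLlmResponse.class);
           } catch (IOException e) {
               throw new RuntimeException(e);
           } catch (InterruptedException e) {
               Thread.currentThread().interrupt(); // restore interrupt status
               throw new RuntimeException(e);
           }
       }
   }
   ```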
4. **Implement the Model Interface(s)**

   - Implement `ChatLanguageModel`, `StreamingChatLanguageModel`, `EmbeddingModel`, etc., as appropriate, based on the provider's capabilities.
   - Use the builder pattern for your model classes to allow for flexible configuration (written by hand; see the sketch after this list).
   - Make sure your implementation handles request/response mapping and error handling correctly.
   - Implement `TokenCountEstimator` if possible, so that `TokenWindowChatMemory` can calculate token usage. Implement `DimensionAwareEmbeddingModel` to report the embedding model's output dimension.
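   Because Lombok is avoided (see the general guidelines below), the builder is written by hand. A minimal sketch of the pattern, with illustrative parameters:

   ```java
   import java.time.Duration;

   public class MyLlmChatModel { // would also implement ChatLanguageModel

       private final String baseUrl;
       private final String apiKey;
       private final String modelName;
       private final Duration timeout;

       private MyLlmChatModel(Builder builder) {
           this.baseUrl = builder.baseUrl;
           this.apiKey = builder.apiKey;
           this.modelName = builder.modelName;
           this.timeout = builder.timeout != null ? builder.timeout : Duration.ofSeconds(60); // default
       }

       public static Builder builder() {
           return new Builder();
       }

       public static class Builder {

           private String baseUrl;
           private String apiKey;
           private String modelName;
           private Duration timeout;

           public Builder baseUrl(String baseUrl) { this.baseUrl = baseUrl; return this; }
           public Builder apiKey(String apiKey) { this.apiKey = apiKey; return this; }
           public Builder modelName(String modelName) { this.modelName = modelName; return this; }
           public Builder timeout(Duration timeout) { this.timeout = timeout; return this; }

           public MyLlmChatModel build() {
               return new MyLlmChatModel(this);
           }
       }
   }
   ```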
5. **Write Tests**

   - **Unit tests:** create unit tests for any complex logic, utility methods, and request/response mappers.
   - **Integration tests (ITs):** create integration tests (e.g., `MyLlmChatModelIT.java`) that interact with the real LLM provider's API. These are crucial for ensuring your integration works correctly. A minimal IT sketch follows this list.
     - Use environment variables (e.g., `MYLLM_API_KEY`) to store API keys and other secrets. Do not hardcode them.
     - Use `@EnabledIfEnvironmentVariable` to skip the tests when the required environment variables are not set.
     - Extend `AbstractChatModelIT`, `AbstractStreamingChatModelIT`, `AbstractEmbeddingModelIT`, and/or `AbstractScoringModelIT` to get a set of basic tests.
     - Test all relevant features of the model (e.g., text generation, streaming, different parameters, tool use, JSON mode).
     - Add a test for concurrent requests if possible.
     - Consider adding a test for the `Tokenizer` interface (see examples in `langchain4j-core`).
     - Add `@RetryingTest` if the model's responses are inconsistent.
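   A minimal IT sketch using JUnit 5 and AssertJ; `@RetryingTest` comes from JUnit Pioneer, and the environment variable and model name are illustrative:

   ```java
   import dev.langchain4j.data.message.UserMessage;
   import org.junit.jupiter.api.condition.EnabledIfEnvironmentVariable;
   import org.junitpioneer.jupiter.RetryingTest;

   import static org.assertj.core.api.Assertions.assertThat;

   @EnabledIfEnvironmentVariable(named = "MYLLM_API_KEY", matches = ".+")
   class MyLlmChatModelIT {

       MyLlmChatModel model = MyLlmChatModel.builder()
               .apiKey(System.getenv("MYLLM_API_KEY"))
               .modelName("my-llm-small") // illustrative model name
               .build();

       @RetryingTest(3) // retry flaky live-API responses up to 3 times
       void should_generate_answer() {
           var response = model.generate(UserMessage.from("What is the capital of Germany?"));

           assertThat(response.content().text()).containsIgnoringCase("Berlin");
       }
   }
   ```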
6. **Add to the BOM (Bill of Materials)**

   Add your new module to `langchain4j-bom/pom.xml`. This helps users manage dependency versions.
7. **Documentation**

   - **Update `README.md`:** add your integration to the list of supported models and embedding stores.
   - **Create Markdown documentation:** add Markdown files in the `docs/docs/integrations/` directory, following the structure of existing integrations. You'll need:
     - a main file (e.g., `my-llm.md`);
     - an entry in `docs/docs/integrations/language-models/index.md` and in `docs/sidebars.js`;
     - an entry in the `_category_.json` files in `docs/docs/integrations/language-models` and `docs/docs/integrations/embedding-stores`.
   - **Examples (highly recommended):** create a simple example in the `langchain4j-examples` repository. This is very helpful for users.
8. **General Guidelines** (from `CONTRIBUTING.md`)

   - **Java 17:** maintain compatibility with Java 17.
   - **Minimal dependencies:** avoid adding new dependencies if possible; if one is necessary, try to use libraries that are already present. Run `mvn dependency:analyze` to check.
   - **Backwards compatibility:** avoid breaking changes; if necessary, deprecate old methods/fields instead of removing them.
   - **Naming conventions:** follow existing naming conventions.
   - **No Lombok:** avoid Lombok in new code, and remove it from existing code you touch.
   - **Javadoc:** add Javadoc where needed.
   - **Code style:** run `make lint` and `make format` before committing.
   - **Large features:** discuss large features with the maintainers (@langchain4j) before implementing them.
9. **Open a Pull Request (Draft First)**

   - Open a draft PR in the `langchain4j-community` repository.
   - Fill out all sections of the PR template.
   - Once the PR is reviewed and approved, you will be asked to finalize it (add documentation, examples, etc.).
## Example Code Snippets (Illustrative)
```java
// MyLlmChatModel.java (in the appropriate package)

import dev.langchain4j.data.message.AiMessage;
import dev.langchain4j.data.message.ChatMessage;
import dev.langchain4j.model.chat.ChatLanguageModel;
import dev.langchain4j.model.output.Response;

import java.time.Duration;
import java.util.List;

public class MyLlmChatModel implements ChatLanguageModel {

    private final MyLlmClient client; // your custom client
    private final String modelName;

    // No Lombok @Builder: write the builder by hand (see the general guidelines above)
    public MyLlmChatModel(String baseUrl, String apiKey, String modelName, Duration timeout) {
        this.client = new MyLlmClient(baseUrl, apiKey, timeout);
        this.modelName = modelName;
    }

    @Override
    public Response<AiMessage> generate(List<ChatMessage> messages) {
        // 1. Convert LangChain4j messages to MyLLM's request format
        // 2. Call client.chat(...)
        // 3. Convert MyLLM's response to LangChain4j's Response<AiMessage>
        // 4. Handle errors (throw HttpException)
        return null; // TODO: implement
    }

    // ... other methods, hand-written builder, etc. ...
}
```
```java
// MyLlmClient.java (in a suitable package, e.g., dev.langchain4j.model.myllm.internal)

class MyLlmClient {
    // ... implementation using java.net.http.HttpClient ...
}
```
```java
// MyLlmChatModelBuilderFactory.java (in a suitable package, e.g., dev.langchain4j.model.myllm.spi)

import java.util.function.Supplier;

public class MyLlmChatModelBuilderFactory implements Supplier<MyLlmChatModel.Builder> {

    @Override
    public MyLlmChatModel.Builder get() {
        return MyLlmChatModel.builder();
    }
}
```
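For illustration, a user would then construct the model through the hand-written builder like this (all values are placeholders):

```java
MyLlmChatModel model = MyLlmChatModel.builder()
        .baseUrl("https://api.my-llm.example")   // illustrative endpoint
        .apiKey(System.getenv("MYLLM_API_KEY"))
        .modelName("my-llm-small")               // illustrative model name
        .timeout(Duration.ofSeconds(30))
        .build();
```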
Register the factory in a `META-INF/services` resource file named after the SPI interface (`dev.langchain4j.model.chat.spi.ChatLanguageModelBuilderFactory` in this example), containing the fully qualified name of your factory:

```
dev.langchain4j.model.myllm.spi.MyLlmChatModelBuilderFactory
```

## Key Points and Common Mistakes
- **Thorough testing:** integration tests are absolutely essential. Test with real API calls.
- **Error handling:** handle HTTP errors (non-2xx responses) from the LLM provider's API.
- **Token usage:** if the LLM provider reports token usage information, include it in the `Response`.
- **Finish reason:** if the LLM provider reports a finish reason (e.g., "stop", "length"), include it in the `Response`.
- **Consistency:** maintain consistency with existing LangChain4j modules in terms of naming, structure, and coding style.
- **Documentation:** clear, concise, and accurate documentation is critical. Follow the patterns you see in existing modules.
- **SPI registration:** remember to create the `src/main/resources/META-INF/services/` files to register your model implementation as a service.
This comprehensive guide, combined with the provided file structure and examples, should give you a strong foundation for contributing your new language model integration to LangChain4j. Remember to start in the `langchain4j-community` repository for initial development and review. Good luck!