Okay, let's outline the steps to create a new language model module for LangChain4j and propose its inclusion. Based on the provided file structure, you'll be focusing on creating a new module similar to the existing ones (e.g., `langchain4j-open-ai`, `langchain4j-ollama`, etc.). Here's a breakdown of the process, referencing the structure you've provided:
## Key Steps and Considerations
1. **Understand the Abstractions and SPI:** LangChain4j, like its Python counterpart, is built around core abstractions. You need to understand these to implement your integration correctly. The core abstractions to implement are:
   - `ChatLanguageModel` / `StreamingChatLanguageModel`: for conversational models (like ChatGPT, Gemini). Implement `ChatLanguageModel` for synchronous responses, and `StreamingChatLanguageModel` if the model supports streaming responses token by token.
   - `LanguageModel` / `StreamingLanguageModel`: for models with a simpler text-in, text-out interface (less common these days).
   - `EmbeddingModel`: if the model provider offers embedding capabilities.
   - `ModerationModel`: if the model provider offers content moderation.
   - `ScoringModel`: if the model provider offers scoring/ranking capabilities.
   - **Builder Factories:** You'll also need to create builder factories (SPIs) for each model type you implement. These are how users will construct your model classes. See examples like `AzureOpenAiChatModelBuilderFactory`. These are registered using the Java `ServiceLoader` mechanism (the `META-INF/services` files).
2. **Choose a Module Structure (and Repository):**
   - **Community Repo (Preferred for new integrations):** Start your integration in the `langchain4j-community` repository. This is the recommended approach for new contributions. It allows for easier initial review and iteration before considering a move to the core `langchain4j` repository. Clone this repo; don't fork the main `langchain4j` repo directly.
   - **Main `langchain4j` Repo (For Core Integrations):** If your integration is with a very widely used and well-established model provider (like OpenAI, Google, etc.), and you are confident in its stability and long-term maintenance, you might propose it for the main repo. However, start in `langchain4j-community` first.
   - **Module Naming:** Follow the pattern `langchain4j-{provider-name}` (e.g., `langchain4j-my-llm`).
   - **Directory Structure:** Create a directory structure mirroring the existing modules (see `langchain4j-open-ai` or `langchain4j-ollama` as good examples):

     ```
     langchain4j-{provider-name}/
         pom.xml                                            (your module's Maven build file)
         src/
             main/
                 java/dev/langchain4j/model/{providername}/ (e.g., myllm)
                     {ProviderName}ChatModel.java           (your implementation)
                     internal/                              (API client and related classes)
                     spi/                                   (builder factory for your model)
                         {ProviderName}ChatModelBuilderFactory.java
                 resources/META-INF/services/               (files to register your builder factory, see examples)
             test/
                 java/dev/langchain4j/model/{providername}/
                     {ProviderName}ChatModelIT.java         (integration tests)
     ```
3. **Implement the API Client:**
   - **Official SDK (Preferred):** If the LLM provider has an official Java SDK, use it. This is usually the best approach for stability, performance, and access to all features. See `langchain4j-bedrock` for an example using an official SDK.
   - **HTTP Client (If no SDK):** If there's no official SDK, use the JDK's built-in `java.net.http.HttpClient` (available since Java 11). This minimizes external dependencies. Avoid adding new dependencies unless absolutely necessary. See `http-clients/langchain4j-http-client-jdk` for how LangChain4j wraps this. Avoid using `okhttp3` directly if possible; prefer `langchain4j-http-client-jdk` (or `langchain4j-http-client-spring-restclient` if building a Spring Boot starter).
   - **JSON Handling:** Use Jackson for JSON serialization/deserialization, as it's already a dependency.
   - **Error Handling:** Make sure to handle HTTP errors (non-2xx responses) appropriately. Throw a `dev.langchain4j.exception.HttpException` for these.
   - **Request/Response Logging:** Implement logging for requests and responses (see `langchain4j-anthropic` for a complete example). This is very helpful for debugging.
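To make the HTTP-client approach concrete, here is a minimal sketch. It is illustrative only: the `/v1/chat` endpoint, the header names, and the local `HttpException` class (a stand-in for `dev.langchain4j.exception.HttpException`, whose real constructor may differ) are all assumptions, not the actual MyLLM or LangChain4j API.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;

public class MyLlmHttpSketch {

    // Local stand-in for dev.langchain4j.exception.HttpException (assumed shape)
    static class HttpException extends RuntimeException {
        HttpException(int statusCode, String body) {
            super("HTTP " + statusCode + ": " + body);
        }
    }

    // Throw for any non-2xx status, as recommended above
    static void throwIfError(int statusCode, String body) {
        if (statusCode < 200 || statusCode >= 300) {
            throw new HttpException(statusCode, body);
        }
    }

    // Build a POST request with the JDK's HttpClient API;
    // the /v1/chat endpoint and header names are hypothetical
    static HttpRequest buildChatRequest(String baseUrl, String apiKey, String jsonBody) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/v1/chat"))
                .timeout(Duration.ofSeconds(30))
                .header("Content-Type", "application/json")
                .header("Authorization", "Bearer " + apiKey)
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = buildChatRequest("https://api.example.com", "sk-test", "{}");
        System.out.println(request.method() + " " + request.uri()); // POST https://api.example.com/v1/chat
        throwIfError(200, "ok"); // 2xx: no exception
    }
}
```

In a real client you would send the request with `HttpClient.send(...)`, call `throwIfError(response.statusCode(), response.body())`, and then deserialize the body with Jackson.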
4. **Implement the Model Interface(s):**
   - Implement `ChatLanguageModel`, `StreamingChatLanguageModel`, `EmbeddingModel`, etc., as appropriate, based on the provider's capabilities.
   - Use the builder pattern for your model classes to allow for flexible configuration.
   - Make sure your implementation handles request/response mapping and error handling correctly.
   - Implement `TokenCountEstimator` if possible, so that `TokenWindowChatMemory` can calculate token usage. Implement `DimensionAwareEmbeddingModel` to report the output dimension of the embedding model.
5. **Write Tests:**
   - **Unit Tests:** Create unit tests for any complex logic, utility methods, and request/response mappers.
   - **Integration Tests (ITs):** Create integration tests (e.g., `MyLlmChatModelIT.java`) that interact with the real LLM provider's API. These are crucial for ensuring your integration works correctly.
     - Use environment variables (e.g., `MYLLM_API_KEY`) to store API keys and other secrets. Do not hardcode them.
     - Use `@EnabledIfEnvironmentVariable` to skip the tests if the required environment variables are not set.
     - Extend `AbstractChatModelIT`, `AbstractStreamingChatModelIT`, `AbstractEmbeddingModelIT`, and/or `AbstractScoringModelIT` to get a set of basic tests.
     - Test all relevant features of the model (e.g., text generation, streaming, different parameters, tool use, JSON mode).
     - Add a test for concurrent requests if possible.
     - Consider adding a test for the `Tokenizer` interface (see examples in `langchain4j-core`).
     - Add `@RetryingTest` if the model's responses are inconsistent.
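The effect of `@EnabledIfEnvironmentVariable` can be sketched in plain Java (the `MYLLM_API_KEY` variable name is the hypothetical example from above; in real ITs you would use the JUnit annotation itself rather than this manual guard):

```java
public class EnvGuardDemo {

    // Mirrors what @EnabledIfEnvironmentVariable achieves:
    // run the integration tests only when the secret is actually present.
    static boolean shouldRunIntegrationTests(String apiKey) {
        return apiKey != null && !apiKey.isBlank();
    }

    public static void main(String[] args) {
        String apiKey = System.getenv("MYLLM_API_KEY"); // hypothetical variable name
        System.out.println(shouldRunIntegrationTests(apiKey)
                ? "running integration tests"
                : "skipping integration tests (MYLLM_API_KEY not set)");
    }
}
```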
6. **Add to BOM (Bill of Materials):** Add your new module to `langchain4j-bom/pom.xml`. This helps users manage dependencies.
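A BOM entry is a standard Maven `<dependency>` element. A hedged sketch (the artifact name is the hypothetical `langchain4j-my-llm` from above; check the neighboring entries in `langchain4j-bom/pom.xml` for the exact version property the project uses):

```xml
<!-- in langchain4j-bom/pom.xml, inside <dependencyManagement><dependencies> -->
<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-my-llm</artifactId>
    <version>${project.version}</version>
</dependency>
```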
7. **Documentation:**
   - **Update `README.md`:** Add your integration to the list of supported models and embedding stores.
   - **Create Markdown Documentation:** Create Markdown files in the `docs/docs/integrations/` directory, following the structure of existing integrations. You'll need:
     - A main file (e.g., `my-llm.md`).
     - An entry in `docs/docs/integrations/language-models/index.md` and in `docs/sidebars.js`.
     - An entry in the `_category_.json` files in `docs/docs/integrations/language-models` and `docs/docs/integrations/embedding-stores`.
   - **Examples (Highly Recommended):** Create a simple example in the `langchain4j-examples` repository. This is very helpful for users.
8. **General Guidelines (from CONTRIBUTING.md):**
   - **Java 17:** Maintain compatibility with Java 17.
   - **Minimal Dependencies:** Avoid adding new dependencies if possible. If necessary, try to use libraries already present. Run `mvn dependency:analyze` to check.
   - **Backwards Compatibility:** Avoid breaking changes. If necessary, deprecate old methods/fields instead of removing them.
   - **Naming Conventions:** Follow existing naming conventions.
   - **No Lombok:** Avoid using Lombok in new code; remove it from existing code if you touch it.
   - **Javadoc:** Add Javadoc where needed.
   - **Code Style:** Run `make lint` and `make format` before committing.
   - **Large Features:** Discuss large features with maintainers (@langchain4j) before implementation.
9. **Open a Pull Request (Draft First):**
   - Open a draft PR in the `langchain4j-community` repository.
   - Fill out all sections of the PR template.
   - Once the PR is reviewed and approved, you will be asked to finalize it (add documentation, examples, etc.).
## Example Code Snippets (Illustrative)
```java
// MyLlmChatModel.java (in the appropriate package)
public class MyLlmChatModel implements ChatLanguageModel {

    private final MyLlmClient client; // your custom client
    private final String modelName;

    public MyLlmChatModel(String baseUrl, String apiKey, String modelName, Duration timeout) {
        this.client = new MyLlmClient(baseUrl, apiKey, timeout); // your custom client
        this.modelName = modelName;
    }

    @Override
    public Response<AiMessage> generate(List<ChatMessage> messages) {
        // 1. Convert LangChain4j messages to MyLLM's request format
        // 2. Call client.chat(...)
        // 3. Convert MyLLM's response to LangChain4j's Response<AiMessage>
        // 4. Handle errors (throw HttpException)
        return null; // TODO: implement
    }

    // Hand-written builder (no Lombok, per the contribution guidelines)
    public static Builder builder() {
        return new Builder();
    }

    public static class Builder {
        private String baseUrl;
        private String apiKey;
        private String modelName;
        private Duration timeout;

        public Builder baseUrl(String baseUrl) { this.baseUrl = baseUrl; return this; }
        public Builder apiKey(String apiKey) { this.apiKey = apiKey; return this; }
        public Builder modelName(String modelName) { this.modelName = modelName; return this; }
        public Builder timeout(Duration timeout) { this.timeout = timeout; return this; }

        public MyLlmChatModel build() {
            return new MyLlmChatModel(baseUrl, apiKey, modelName, timeout);
        }
    }
}
```

```java
// MyLlmClient.java (in a suitable package, e.g., dev.langchain4j.model.myllm.internal)
class MyLlmClient {
    MyLlmClient(String baseUrl, String apiKey, Duration timeout) {
        // ... implementation using java.net.http.HttpClient ...
    }
}
```

```java
// MyLlmChatModelBuilderFactory.java (in a suitable package, e.g., dev.langchain4j.model.myllm.spi)
public class MyLlmChatModelBuilderFactory implements Supplier<MyLlmChatModel.Builder> {

    @Override
    public MyLlmChatModel.Builder get() {
        return MyLlmChatModel.builder();
    }
}
```

```
# META-INF/services/dev.langchain4j.model.chat.spi.ChatLanguageModelBuilderFactory (resource file)
# Add a line with the fully qualified name of your factory:
dev.langchain4j.model.myllm.spi.MyLlmChatModelBuilderFactory
```
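At runtime, factories registered this way are discovered via `java.util.ServiceLoader`. The following self-contained sketch shows the mechanism; the `ChatModelBuilderFactory` interface here is a local stand-in, not LangChain4j's real factory interface, and since no provider file exists for it on the classpath, the lookup comes back empty:

```java
import java.util.ServiceLoader;
import java.util.function.Supplier;

public class ServiceLoaderSketch {

    // Local stand-in for the real factory SPI interface
    interface ChatModelBuilderFactory extends Supplier<Object> { }

    public static void main(String[] args) {
        // ServiceLoader scans META-INF/services entries on the classpath;
        // your factory is found only if its registration file is present.
        ServiceLoader<ChatModelBuilderFactory> loader =
                ServiceLoader.load(ChatModelBuilderFactory.class);
        loader.findFirst().ifPresentOrElse(
                f -> System.out.println("found factory: " + f.getClass().getName()),
                () -> System.out.println("no factory registered"));
    }
}
```

If you forget the `META-INF/services` file, your module compiles fine but users' builder lookups silently find nothing, which is one of the most common integration mistakes.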
## Key Points and Common Mistakes
- **Thorough Testing:** Integration tests are absolutely essential. Test with real API calls.
- **Error Handling:** Handle HTTP errors (non-2xx responses) from the LLM provider's API.
- **Token Usage:** If the LLM provider gives you token usage information, include it in the `Response`.
- **Finish Reason:** If the LLM provider gives you a finish reason (e.g., "stop", "length"), include it in the `Response`.
- **Consistency:** Maintain consistency with existing LangChain4j modules in terms of naming, structure, and coding style.
- **Documentation:** Clear, concise, and accurate documentation is critical. Follow the patterns you see in existing modules.
- **SPI Registration:** Remember to create the `src/main/resources/META-INF/services/` files to register your model implementation as a service.
This comprehensive guide, combined with the provided file structure and examples, should give you a strong foundation for contributing your new language model integration to LangChain4j. Remember to start in the `langchain4j-community` repository for initial development and review. Good luck!