Objective: To provide readers with a foundational understanding of the concepts, components, and tools that constitute modern AI applications, from simple generative models to complex autonomous agents. This document can serve as a reference guide for building the next generation of AI-powered products.
This section introduces the fundamental building block of the current AI revolution: the Large Language Model (LLM). Almost every modern AI application starts here.
What are LLMs?
- Large Language Models are advanced neural networks, most commonly based on the Transformer architecture, trained on massive amounts of text and code. They are not sentient, nor do they "understand" in a human sense. Instead, they are incredibly sophisticated pattern-recognition engines.
- Their core function is to predict the next most probable word (or "token") in a sequence. By doing this repeatedly, they can generate coherent and contextually relevant sentences, paragraphs, and even entire documents. This simple mechanism, scaled up, gives rise to complex emergent abilities like reasoning, translation, and code generation.
- Key Concept: LLMs
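- To make the "predict the next token, then repeat" mechanic concrete, here is a minimal, purely illustrative sketch. `predictNextToken` is a hypothetical placeholder for a real model's forward pass, not an actual API:

```javascript
// Illustrative only: autoregressive generation as a loop.
// `predictNextToken` is a hypothetical stand-in for a real model call.
function generate(predictNextToken, promptTokens, maxNewTokens = 50) {
  const tokens = [...promptTokens];
  for (let i = 0; i < maxNewTokens; i++) {
    const next = predictNextToken(tokens); // most probable next token given everything so far
    if (next === '<end>') break;           // stop when the model predicts an end-of-sequence token
    tokens.push(next);                     // append and feed the longer sequence back in
  }
  return tokens.join(' ');
}
```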
Types of LLMs and How to Choose One
- A common misconception is that a single, massive LLM can solve every problem. In reality, the landscape is diverse, and choosing the right model is a critical engineering decision.
- A Spectrum of Models:
- Foundational / Base Models: These are the raw output of the initial, massive training process. They are excellent at understanding and predicting text but don't inherently know how to follow instructions or hold a conversation. They are a blank slate, often used as the starting point for further customization.
- Instruction-Tuned Models: This is the most common type of model we interact with (e.g., ChatGPT, Gemini). These are base models that have undergone an additional training phase (fine-tuning) to become helpful "assistants" that can answer questions, follow commands, and engage in dialogue.
- Specialized Models: These are models fine-tuned for a specific domain, such as Code Llama for programming, BloombergGPT for finance, or models for legal and medical fields. They trade general-purpose ability for expert-level performance in their niche.
- The "No Silver Bullet" Rule: How to Choose the Right LLM
- There is no single "best" model. The choice is a trade-off based on your specific needs, often revolving around a triangle of Performance vs. Speed vs. Cost.
- Key Selection Criteria:
- Task Complexity & Performance: Does your task require cutting-edge reasoning (e.g., a complex agent) or simple generation (e.g., summarizing text)? Use powerful models like GPT-4o or Gemini 2.5 Pro for the former, and smaller, faster models for the latter.
- Context Window: This is the model's "short-term memory." How much information (prompt + history) does your application need to consider at once? A chatbot analyzing a 300-page document needs a model with a very large context window.
- Speed & Cost: High-volume, user-facing applications often prioritize low latency and low cost. Smaller models like Llama 3 8B or Gemini Flash are much faster and cheaper per query.
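- As a rough sketch of how this trade-off can show up in code, a simple router might pick a model tier based on the criteria above. The model identifiers are the examples mentioned in this section and are placeholders for whatever your provider actually exposes:

```javascript
// Illustrative routing logic only; model identifiers are placeholders.
function pickModel({ needsDeepReasoning, contextTokens, latencySensitive }) {
  if (needsDeepReasoning) return 'gpt-4o';               // complex reasoning / agents
  if (contextTokens > 100_000) return 'gemini-2.5-pro';  // very large context windows
  if (latencySensitive) return 'gemini-flash';           // high-volume, user-facing traffic
  return 'llama-3-8b';                                   // cheap default for simple generation
}
```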
- Multi-modality: The Fusion of Senses
- What it is: Multi-modality is the ability of an AI model to understand, process, and reason about information from multiple data types (or "modalities") simultaneously. Instead of just text, these models can natively handle images, audio, and even video as inputs.
- How it benefits users: It allows for more intuitive and powerful interactions. Users can "show" the AI what they're talking about instead of just describing it. This unlocks new use cases that are impossible with text-only models.
- Use Case: A developer wants to get feedback on a new UI mockup.
- Sample Multimodal Prompt:
- Image Input: [A screenshot of a web application's user profile page is provided.]
- Text Prompt: "Analyze this UI mockup. As a UX expert, provide three specific suggestions to improve the visual hierarchy and reduce clutter. Present your feedback in a numbered list."
- Key Concept: Multi-modality
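- As a sketch of how this prompt could be sent programmatically, here is one way to combine an image and text in a single request using the OpenAI Node SDK. The model name and image URL are placeholders; other providers expose similar multimodal message formats.

```javascript
import OpenAI from 'openai';

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

// One message, two modalities: the text instruction plus the mockup screenshot.
const response = await client.chat.completions.create({
  model: 'gpt-4o', // any vision-capable model
  messages: [
    {
      role: 'user',
      content: [
        {
          type: 'text',
          text: 'Analyze this UI mockup. As a UX expert, provide three specific suggestions to improve the visual hierarchy and reduce clutter. Present your feedback in a numbered list.',
        },
        { type: 'image_url', image_url: { url: 'https://example.com/profile-page-mockup.png' } },
      ],
    },
  ],
});

console.log(response.choices[0].message.content);
```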
- Fine-Tuning: The High-Effort Alternative
- Fine-tuning is the process of taking a pre-trained model and training it further on a smaller, curated dataset specific to your task. This adapts the model's behavior and style.
- Pros:
- Deep Specialization: It's the best way to teach a model a specific style, tone, or format. For example, you could fine-tune a model to always respond in perfectly formatted JSON or to adopt the persona of a specific character.
- Can Improve Performance: For highly specific, repetitive tasks, a fine-tuned model can outperform a general-purpose model.
- Reduces Prompt Complexity: Because the desired behavior is "baked in," you can often use shorter, simpler prompts to get the output you need.
- Cons:
- Does NOT Impart New Knowledge: This is the most common misconception. Fine-tuning does not teach the model new facts. It teaches it new skills. To add knowledge, you must use RAG (discussed in Section 3).
- Expensive & Time-Consuming: The process requires creating a high-quality dataset of hundreds or thousands of examples, which is a significant engineering effort. The training process itself also incurs compute costs.
- Risk of "Catastrophic Forgetting": If not done carefully, fine-tuning can degrade the model's general reasoning abilities, making it worse at tasks outside its specific domain.
- RAG is Often Better: For most use cases involving proprietary information, RAG is a faster, cheaper, and more effective solution than fine-tuning.
- Key Concepts: LLM Context, LLMs, Fine-tuning
The Art of Conversation: Prompting
Prompting is the process of designing the input given to an LLM to elicit a desired output. It is the primary interface for developers to guide and control the model's behavior.
Prompt Engineering: The craft of refining prompts to be as clear and effective as possible. Let's look at an example.
- Scenario: A user wants to find a suitable dog for their family.
- Bad Prompt: "Tell me about dogs." (Too vague, will produce a generic, unhelpful summary).
- Better Prompt: "What are some good dog breeds for families?" (Better, but still lacks crucial context about the family's lifestyle).
- Best Prompt: "Generate a list of the top 5 dog breeds for families with small children living in an apartment. For each breed, include its temperament, size, and exercise needs." (Specific, provides context, constrains the request, and defines the desired output format).
- Key Concept: Prompt Engineering
Role-Based Prompting: Assigning a persona to the AI within the "system prompt" (a high-level instruction). This is a powerful way to frame the AI's knowledge and shape its response style. Let's demonstrate with a code review scenario.
Scenario: We ask an LLM to review a simple JavaScript function.
function add(a, b) {
  return a + b;
}
Example 1: Prompt without a Role
Prompt:
"Review this JavaScript code and suggest improvements."
Sample LLM Response (Generic and Polite):
"This is a simple function that adds two numbers. It works correctly for numeric inputs. For improvement, you could add comments to explain what it does. Overall, it's a good starting point."
Example 2: Prompt with a Role
System Prompt (The Role):
"You are a Senior Software Engineer at Adobe, known for your attention to detail and commitment to writing clean, maintainable, and robust code. Review the following code from the perspective of ensuring it meets Adobe's high standards for quality. Focus on documentation, type safety, and modern best practices."
User Prompt:
"Please review this code."
Sample LLM Response (Professional and Constructive):
"This function is functionally correct for its intended purpose but does not meet our standards for production code. Here are the required improvements:
- Missing Documentation: The function lacks a JSDoc block. All functions must be documented to explain their purpose, parameters, and return value for maintainability.
- Lack of Type Safety: JavaScript's dynamic typing is a risk here. If non-numeric types are passed, the + operator will perform string concatenation, leading to unexpected behavior. The function should include runtime checks to ensure both arguments are numbers.
- Error Handling: The function does not handle invalid input gracefully. It should throw a TypeError if the arguments are not numbers, so the calling code can manage the error.
- Modern Syntax: While a standard function declaration is acceptable, consider using an ES6 arrow function for conciseness in modern codebases.
Please add a JSDoc block and input validation before this can be approved. Here is an example of a more robust implementation:
/**
* Adds two numbers together.
* @param {number} a The first number.
* @param {number} b The second number.
* @returns {number} The sum of the two numbers.
* @throws {TypeError} If either argument is not a number.
*/
const add = (a, b) => {
  if (typeof a !== 'number' || typeof b !== 'number') {
    throw new TypeError('Both arguments must be numbers.');
  }
  return a + b;
};
"
By defining a professional, standards-driven role, we transformed a generic comment into a constructive, actionable, and educational code review that enforces team standards.
- Key Concept: Role-Based Prompting
Chain of Thought (CoT) Prompting: A technique that dramatically improves an LLM's reasoning ability on complex problems. By instructing the model to "think step-by-step," you force it to break down a problem into intermediate steps before giving a final answer, reducing errors and improving the quality of the output.
Scenario: We need a JavaScript function to calculate the total price of a shopping cart, including discounts and taxes. This involves several steps, and the order of operations matters.
Example 1: Standard Prompt
Prompt:
"Write a JavaScript function that takes an array of items, a discount percentage, and a tax rate, and returns the final price."
Sample LLM Response (Potentially Correct, but Opaque):
function calculateTotal(items, discount, taxRate) {
  const subtotal = items.reduce((sum, item) => sum + item.price, 0);
  return subtotal * (1 - discount) * (1 + taxRate);
}
This response might be mathematically correct, but it's a dense one-liner that's hard to debug. It's unclear if the tax is applied correctly to the discounted price.
Example 2: Chain of Thought Prompt
Prompt:
"Write a JavaScript function to calculate a final shopping cart price. Let's think step by step:
- First, calculate the subtotal by summing the price of all items.
- Second, calculate the discount amount from the subtotal.
- Third, subtract the discount from the subtotal to get the discounted total.
- Fourth, calculate the tax amount based on the discounted total.
- Finally, add the tax to the discounted total for the final price.
Please include comments in the code that correspond to these steps."
Sample LLM Response (Clear, Correct, and Debuggable):
/**
* Calculates the final price of a shopping cart with discounts and taxes.
* @param {Array<Object>} items - Array of items, each with a 'price' property.
* @param {number} discount - The discount percentage (e.g., 0.10 for 10%).
* @param {number} taxRate - The tax rate (e.g., 0.08 for 8%).
* @returns {number} The final calculated price.
*/
function calculateTotal(items, discount, taxRate) {
  // Step 1: Calculate the subtotal.
  const subtotal = items.reduce((sum, item) => sum + item.price, 0);
  // Step 2: Calculate the discount amount.
  const discountAmount = subtotal * discount;
  // Step 3: Apply the discount.
  const discountedTotal = subtotal - discountAmount;
  // Step 4: Calculate the tax amount on the discounted total.
  const taxAmount = discountedTotal * taxRate;
  // Step 5: Add tax to get the final price.
  const finalPrice = discountedTotal + taxAmount;
  return finalPrice;
}
By forcing the model to follow a logical sequence, we get code that is not only correct but also self-documenting and easy for any developer to understand and verify.
- Key Concept: Chain of Thought
With an understanding of LLMs, we can now explore the different types of applications we can build.
- Paradigm 1: Generative AI
- Definition: The most straightforward AI application. Its primary purpose is to generate new content based on a user's prompt. The interaction is typically one-shot; it takes an input and produces an output.
- Simple Architecture: UI -> Backend -> LLM API Call.
- Adobe Commerce Use Case: Automatically generating compelling product descriptions. A merchant provides a few keywords (e.g., "men's running shoe, lightweight, breathable, blue"), and the AI generates a full, SEO-friendly description for the Adobe Commerce catalog.
- Key Concept: Generative AI
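- A minimal sketch of the UI -> Backend -> LLM API Call architecture for this use case, using the OpenAI Node SDK as one possible backend. The model name, system prompt, and keywords are illustrative.

```javascript
import OpenAI from 'openai';

const client = new OpenAI();

// One-shot generation: keywords in, SEO-friendly description out.
async function generateProductDescription(keywords) {
  const response = await client.chat.completions.create({
    model: 'gpt-4o-mini', // a small, cheap model is usually enough for this task
    messages: [
      {
        role: 'system',
        content: 'You are a copywriter for an e-commerce catalog. Write concise, SEO-friendly product descriptions.',
      },
      { role: 'user', content: `Write a product description for: ${keywords}` },
    ],
  });
  return response.choices[0].message.content;
}

console.log(await generateProductDescription("men's running shoe, lightweight, breathable, blue"));
```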
- Paradigm 2: Agentic AI
- Definition: An evolution from simple generation. An AI Agent is an autonomous system that can reason, plan, and use tools to achieve a complex goal. It can make decisions and take actions without human intervention at every step.
- Core Components:
- Planning: The agent breaks down a high-level goal into a sequence of executable steps.
- Memory: It maintains both short-term (in-context) and long-term (often using a database) memory to track its progress.
- Tool Use: This is the critical component. An agent can use external tools (APIs, databases, code interpreters) to gather information or perform actions.
- Adobe Commerce Use Case: An agent tasked with the goal: "Optimize my store's summer sale performance."
- Plan:
- Analyze last year's sales data for the same period (Tool: get_sales_report).
- Identify underperforming products in the current sale (Tool: get_product_performance).
- Propose a new discount strategy for those products (Reasoning).
- Generate new promotional copy for the top 3 underperforming items (Tool: generate_product_description).
- Stage the pricing changes for merchant approval (Tool: update_product_pricing_staging).
- Key Concept: Agentic AI
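- To show how Planning, Memory, and Tool Use fit together, here is a minimal agent loop, sketched under the assumption of a hypothetical `callLLM` helper and stubbed Commerce tools; real frameworks (covered later) handle these mechanics for you.

```javascript
// Illustrative agent loop. `callLLM` and the tool implementations are
// hypothetical stand-ins for a real model API and real Commerce integrations.
const tools = {
  get_sales_report: async ({ period }) => ({ period, revenue: 125000 }),        // stub
  get_product_performance: async ({ sku }) => ({ sku, conversionRate: 0.012 }), // stub
};

async function runAgent(goal, callLLM, maxSteps = 10) {
  const memory = [{ role: 'user', content: goal }]; // short-term, in-context memory

  for (let step = 0; step < maxSteps; step++) {
    // Planning: the model decides the next action given the goal and memory so far.
    const decision = await callLLM(memory, Object.keys(tools));
    if (decision.type === 'final_answer') return decision.content;

    // Tool Use: execute the requested tool and feed the result back into memory.
    const result = await tools[decision.tool](decision.args);
    memory.push({ role: 'tool', name: decision.tool, content: JSON.stringify(result) });
  }
  throw new Error('Agent did not finish within the step budget.');
}
```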
- Paradigm 3: AI Workflows
- Definition: A more structured and deterministic approach than agents. Workflows define a predictable, often graph-based path for information to flow through. They are less autonomous than agents but more reliable and controllable. Think of them as a state machine in which individual steps are handled by an LLM while the overall path stays fixed in code.
- Adobe Commerce Use Case: A "Customer Review to Action" workflow.
- Step 1 (Trigger): A new 1-star or 2-star product review is submitted.
- Step 2 (LLM - Classification): An LLM reads the review and classifies the complaint (e.g., "Shipping Damage," "Product Defect," "Sizing Issue").
- Step 3 (Tool - Ticketing): Based on the classification, automatically create a support ticket in the correct department.
- Step 4 (LLM - Generation): Generate a personalized, empathetic email response to the customer, acknowledging their specific issue and informing them that a support ticket has been created.
- Key Concept: Workflows
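- A sketch of this workflow as plain orchestration code. `classifyReview`, `createSupportTicket`, and `draftCustomerReply` are hypothetical helpers wrapping an LLM call and your ticketing/email systems; the point is that the path through the steps is fixed, even though individual steps use an LLM.

```javascript
// Deterministic workflow: the steps and their order are defined in code.
// All helper functions here are hypothetical placeholders.
async function handleNegativeReview(review) {
  // Step 2 (LLM - Classification): label the complaint with a known category.
  const category = await classifyReview(review.text); // e.g. "Shipping Damage"

  // Step 3 (Tool - Ticketing): deterministic action based on the classification.
  const ticket = await createSupportTicket({ category, reviewId: review.id });

  // Step 4 (LLM - Generation): personalized, empathetic customer response.
  const email = await draftCustomerReply({ review, category, ticketId: ticket.id });

  return { category, ticket, email };
}
```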
- Comparison: Gen AI vs. Agentic AI vs. Workflows
| Paradigm | Primary Function | Autonomy | Complexity | Ideal Use Case |
|---|---|---|---|---|
| Generative AI | Answers Questions | Low | Low | Content creation, Q&A |
| Agentic AI | Achieves Goals | High | High | Complex problem-solving, research |
| Workflows | Executes Processes | Medium | Medium | Business process automation, ETL |
- Key Concept: Gen AI vs Agentic AI vs Workflows
This section covers how we make AI applications smarter, more accurate, and more useful by connecting them to proprietary data and external systems.
The Knowledge Problem: Giving LLMs a Brain Extension
- Retrieval-Augmented Generation (RAG): The most important pattern in applied AI today. LLMs only know what they were trained on, which can be outdated or generic. RAG solves this by giving the LLM an "open-book exam."
- How it works: When a user asks a question, the system first retrieves relevant documents from a knowledge base (like our internal Confluence pages or codebase). Then, it augments the user's original prompt with this retrieved information before sending it to the LLM to generate an answer. This grounds the LLM in factual, up-to-date data, dramatically reducing hallucinations.
- Key Concept: RAG
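- A minimal sketch of the Retrieve -> Augment -> Generate loop, assuming hypothetical `searchKnowledgeBase` (vector search) and `callLLM` helpers:

```javascript
// Naive RAG in three steps. Both helpers are hypothetical placeholders.
async function answerWithRAG(question) {
  // 1. Retrieve: find the chunks most relevant to the question.
  const chunks = await searchKnowledgeBase(question, { topK: 3 });

  // 2. Augment: prepend the retrieved context to the user's question.
  const prompt = [
    'Answer using ONLY the context below.',
    '[CONTEXT]',
    ...chunks,
    '[/CONTEXT]',
    `Question: ${question}`,
  ].join('\n');

  // 3. Generate: the LLM answers grounded in the retrieved documents.
  return callLLM(prompt);
}
```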
RAG Architectures
- Naive RAG: The simple Retrieve -> Augment -> Generate flow described above.
- Advanced RAG: Modern RAG pipelines are more complex and may include steps like:
- Query Transformation: Rewriting the user's query to be more effective for database retrieval.
- Re-ranking: Using a secondary, more precise (and often smaller/faster) model to re-rank the initial retrieved documents for relevance before sending them to the main LLM. This helps combat the "lost in the middle" problem, where models pay less attention to information in the middle of a long context. By placing the most relevant information at the top, you significantly increase the quality of the final answer.
Example: The Impact of Re-ranking
Scenario: A merchant asks our Adobe Commerce support bot: "How do I set up a 'buy one, get one free' promotion?"
Step 1: Initial Retrieval (from Vector DB)
The system retrieves 3 documents that are semantically similar to the query:
- Doc A (Broad): A general guide titled "Configuring Cart Price Rules".
- Doc B (Related but wrong): A tutorial on "Creating a 15% Off Coupon".
- Doc C (Perfect Match): A specific guide titled "Setting up a BOGO 'Buy X Get Y Free' Promotion".
Step 2 (Without Re-ranking): Context Sent to LLM
The documents are simply concatenated. The perfect match is buried in the middle.
[CONTEXT]
Doc A: To set up a cart price rule, navigate to Marketing > Promotions...
Doc B: To create a 15% off coupon, you must first...
Doc C: For a BOGO promotion, set the Action to 'Buy X get Y free'...
[/CONTEXT]
Result: The LLM might give a generic answer based on Doc A or get confused by Doc B.
Step 2 (With Re-ranking): The Re-ranker's Job
The re-ranker model evaluates all 3 docs against the specific query "How do I set up a 'buy one, get one free' promotion?" and scores Doc C as the most relevant.
Step 3 (With Re-ranking): Final Context Sent to LLM
The documents are re-ordered to place the most relevant one first.
[CONTEXT]
Doc C: For a BOGO promotion, set the Action to 'Buy X get Y free'...
Doc A: To set up a cart price rule, navigate to Marketing > Promotions...
Doc B: To create a 15% off coupon, you must first...
[/CONTEXT]
Result: The LLM now has the most relevant information at the beginning of its context, making it far more likely to provide a direct, accurate, and helpful answer based on Doc C.
Key Concept: RAG Architectures
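A short sketch of the re-ranking step itself, assuming a hypothetical `scoreRelevance` function backed by a cross-encoder or other re-ranker model:

```javascript
// Score every (query, document) pair, then put the best match first.
// `scoreRelevance` is a hypothetical re-ranker call.
async function rerank(query, docs) {
  const scored = await Promise.all(
    docs.map(async (doc) => ({ doc, score: await scoreRelevance(query, doc) }))
  );
  return scored.sort((a, b) => b.score - a.score).map((s) => s.doc);
}
```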
Databases for RAG: Storing and Finding Relevant Information
- Vector Databases: The engine behind modern RAG. To make text searchable by meaning, we first use an embedding model to convert text into a vector (a list of numbers). Vector databases are specifically designed to store these vectors and perform incredibly fast "semantic search" to find the most similar vectors (and thus the most relevant text chunks) to a user's query vector.
- Examples: Pinecone, Weaviate, ChromaDB, Milvus, Google Vector Search.
- Key Concept: Vector DBs
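- To make "semantic search" concrete, here is a toy sketch: embed the query with the OpenAI embeddings API (any embedding model would do), then rank pre-embedded chunks by cosine similarity. A real vector database does this at scale with approximate-nearest-neighbor indexes rather than a linear scan.

```javascript
import OpenAI from 'openai';

const client = new OpenAI();

// Convert text into a vector using an embedding model.
async function embed(text) {
  const res = await client.embeddings.create({ model: 'text-embedding-3-small', input: text });
  return res.data[0].embedding;
}

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a, b) {
  const dot = a.reduce((sum, x, i) => sum + x * b[i], 0);
  const norm = (v) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return dot / (norm(a) * norm(b));
}

// Toy semantic search over chunks of the form { text, vector }.
async function semanticSearch(query, indexedChunks, topK = 3) {
  const queryVector = await embed(query);
  return indexedChunks
    .map((chunk) => ({ ...chunk, score: cosineSimilarity(queryVector, chunk.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}
```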
- Graph Databases: While vector databases find relevant documents, graph databases understand the relationships between them. They are excellent for knowledge graphs where connections are key (e.g., this team owns this service, which depends on this other API).
- Examples: Neo4j.
- Key Concept: Graph DBs
The Action Problem: Allowing LLMs to Do Things
- Tools & Function Calling: This is the core mechanism that enables Agentic AI, allowing an LLM to interact with external systems. The model is given a description of available tools (functions). When it determines an action is needed, it pauses generation and outputs a structured request to call a specific function with certain arguments. Our code executes this function and returns the result to the LLM, which then uses that new information to continue its reasoning.
- How it Works in Practice:
- OpenAI API: You define your tools as a JSON schema and pass it in the tools parameter of your API call. The model will return a tool_calls object in its response when it wants to use a tool, which you then execute.
- LangChain / CrewAI: These frameworks provide helpful abstractions. You can often define a tool using a simple Python decorator (@tool) or a Pydantic class. The framework handles the complex work of formatting the tool definition for the specific model being used (OpenAI, Gemini, etc.) and parsing the model's response.
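- Here is a hedged sketch of that round trip with the OpenAI Node SDK: define a tool schema, let the model request a call, execute it in our code, and return the result so the model can finish its answer. The `get_sales_report` tool and its implementation are illustrative.

```javascript
import OpenAI from 'openai';

const client = new OpenAI();

// Tool definition as a JSON schema (illustrative tool).
const tools = [
  {
    type: 'function',
    function: {
      name: 'get_sales_report',
      description: 'Fetch aggregated sales figures for a date range.',
      parameters: {
        type: 'object',
        properties: {
          start_date: { type: 'string', description: 'ISO date, e.g. 2024-06-01' },
          end_date: { type: 'string', description: 'ISO date, e.g. 2024-08-31' },
        },
        required: ['start_date', 'end_date'],
      },
    },
  },
];

const question = { role: 'user', content: "How did last summer's sale perform?" };

// First call: the model may respond with a tool_calls request instead of text.
const first = await client.chat.completions.create({ model: 'gpt-4o', messages: [question], tools });
const toolCall = first.choices[0].message.tool_calls?.[0];

if (toolCall) {
  // Our code executes the requested function with the model's arguments...
  const args = JSON.parse(toolCall.function.arguments);
  const report = await getSalesReport(args); // hypothetical implementation

  // ...and returns the result so the model can continue its reasoning.
  const second = await client.chat.completions.create({
    model: 'gpt-4o',
    messages: [
      question,
      first.choices[0].message,
      { role: 'tool', tool_call_id: toolCall.id, content: JSON.stringify(report) },
    ],
    tools,
  });
  console.log(second.choices[0].message.content);
}
```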
- The Interoperability Challenge:
- This abstraction highlights a key problem: tools are not platform-agnostic. A tool formatted for OpenAI's API will not work with Google's Gemini API without being translated into Gemini's specific format. This creates significant maintenance overhead and vendor lock-in for teams that want to use multiple model providers. Every new tool or model provider adds to this complexity.
- The Solution: Towards a Standard Protocol (e.g., Model Context Protocol)
- The industry needs a universal standard for defining tools—a "Model Context Protocol" (MCP). The idea is simple: define a tool once in a standard, model-agnostic format.
- Under this paradigm, your application would provide the tool definition in the standard MCP format. The framework (like LangChain) or the model provider itself (like Vertex AI) would be responsible for translating that standard definition into its proprietary internal format.
- This would allow developers to seamlessly switch between model providers without rewriting all their tool integrations, fostering a more competitive and innovative ecosystem. While a single, universally adopted standard like this is still an emerging concept, it represents the clear future for building scalable and maintainable agentic systems.
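- The exact shape of such a standard is still settling, but the idea can be sketched as one canonical tool definition plus thin adapters per provider. The formats below are simplified illustrations, not the actual MCP or vendor specifications.

```javascript
// One canonical, provider-agnostic tool definition (illustrative shape only).
const canonicalTool = {
  name: 'get_product_performance',
  description: 'Return sales metrics for a product SKU.',
  inputSchema: {
    type: 'object',
    properties: { sku: { type: 'string' } },
    required: ['sku'],
  },
};

// Adapters translate the canonical definition into each provider's own format.
// These target shapes are approximations for illustration.
function toOpenAITool(tool) {
  return {
    type: 'function',
    function: { name: tool.name, description: tool.description, parameters: tool.inputSchema },
  };
}

function toGeminiTool(tool) {
  return {
    functionDeclarations: [
      { name: tool.name, description: tool.description, parameters: tool.inputSchema },
    ],
  };
}
```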
- Key Concepts: Tools, Function calling
This section introduces the frameworks that help us build these sophisticated systems and explores the cutting edge.
- Agentic AI & Workflow Frameworks
- Purpose: To provide a high-level abstraction over the complex, low-level tasks of building agents and workflows, such as managing prompts, parsing LLM outputs, handling tool calls, and maintaining state.
- Examples:
- LangChain: The "Swiss Army knife." A very broad and popular library with integrations for almost every model, database, and tool imaginable.
- LangGraph: Built by the LangChain team, it's specifically for creating cyclical, stateful applications (which agents often are). It allows for more control and reliability than traditional agent loops.
- CrewAI: A framework focused on orchestrating collaborative multi-agent systems. You define different agents with specific roles and goals, and CrewAI manages their interaction to solve a problem.
- Key Concept: Agentic AI Frameworks
- Multi-Agent Systems
- Concept: The idea that for very complex tasks, a single agent is not enough. Instead, you create a "team" of specialized agents that collaborate, breaking down a problem into sub-tasks, much like a human team.
- Agent-to-Agent (A2A) Communication: This is the nervous system of a multi-agent system. It's not just about one agent sending a message to another; it's the protocol and content of that communication. A2A protocols define how agents discover each other, negotiate tasks, share context, and report results. This can be implemented via direct API calls, a shared message bus, or by writing to a common memory space.
- Collaboration Patterns & Strategy: This is the "playbook" for your agent team. It's the high-level strategy you design to orchestrate your agents. Common patterns include:
- Hierarchical: A "manager" agent decomposes a task and assigns sub-tasks to "worker" agents. It then synthesizes the results.
- Sequential / Assembly Line: Agents operate in a sequence, where one agent's output becomes the input for the next.
- Round-Table: Agents work in a shared context, each contributing their expertise to refine a solution iteratively, often with a "moderator" agent guiding the process.
- A2A in Practice: A Use Case
- Goal: A merchant wants to launch a targeted marketing campaign in Adobe Commerce for a new product line.
- The Agent Team:
- Campaign Manager Agent (Orchestrator): Receives the initial goal from the merchant.
- Data Analyst Agent: Specializes in querying sales and customer data.
- Content Strategist Agent: Specializes in generating marketing copy and creative ideas.
- Commerce Operations Agent: Specializes in configuring promotions and segments in Adobe Commerce.
- The A2A Protocol in Action:
- Manager to Analyst: Task: Identify top 10% of customers who purchased 'hiking boots' in the last 6 months. Output: Customer Segment ID.
- Analyst to Manager: Result: Customer Segment ID '789' created.
- Manager to Strategist: Task: Generate 3 email subject lines for a new 'TrailPro Hiking Sandal' targeting Customer Segment ID '789'. Context: These are repeat customers who value durability.
- Strategist to Manager: Result: ["Ready for Your Next Adventure?", "Trail-Tested, Hiker-Approved.", "Your Boots Were Just the Beginning."]
- Manager to Commerce Ops: Task: Create a 15% off promotion for 'TrailPro Hiking Sandal' applicable only to Customer Segment ID '789'.
- Commerce Ops to Manager: Result: Promotion 'B4G23' created successfully.
- Manager to Merchant: Final Report: Campaign is ready. Customer segment '789' has been identified. Email subject lines are drafted. A 15% discount is staged. Ready for your approval to launch.
- Key Concepts: A2A, Collaboration Patterns
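- Reduced to code, the use case above follows the hierarchical pattern: a manager function delegates to specialist agents and synthesizes their results. Each agent here is just a hypothetical async function; in practice these would be LLM-backed agents built with a framework such as CrewAI or LangGraph.

```javascript
// Hierarchical collaboration in miniature; all agent functions are hypothetical.
async function runCampaign(goal) {
  // Manager decomposes the goal and delegates sub-tasks to specialists.
  const segment = await dataAnalystAgent({
    task: "Identify top 10% of customers who purchased 'hiking boots' in the last 6 months",
  });
  const subjectLines = await contentStrategistAgent({
    task: "Generate 3 email subject lines for the 'TrailPro Hiking Sandal'",
    context: { segment },
  });
  const promotion = await commerceOpsAgent({
    task: 'Create a 15% off promotion for the new product line',
    context: { segment },
  });

  // Manager synthesizes the workers' results into a report for the merchant.
  return { goal, segment, subjectLines, promotion, status: 'Ready for merchant approval' };
}
```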
- The Frontier: Specialized Agents
- Coding Agents: AI agents that are not just capable of writing code snippets but can interact with an entire codebase. They can be given access to a file system, a terminal, and a list of goals, and can then read files, write new code, run tests, and even debug based on the output, all in service of a high-level task like "implement this new API endpoint."
- Key Concept: Coding Agents
- "Vibe Coding": A forward-looking term describing a more intuitive style of development. Instead of writing precise, line-by-line code, a developer might give a coding agent a high-level, "vibe-based" goal like: "Refactor this service to be more scalable and resilient, use the repository pattern for data access." The agent would then handle the implementation details.
- Key Concept: Vibe Coding
This section brings it all together by looking at where and how these AI applications are hosted and managed.
- Cloud Service Providers
- The major cloud platforms have become one-stop shops for AI development, offering integrated services that remove the need to stitch together dozens of different vendors.
- Examples:
- Google Cloud: Vertex AI is a unified platform offering access to Google's Gemini models, a Model Garden with dozens of open-source models, integrated Vector Search, and tools for MLOps.
- AWS: Amazon Bedrock provides easy API access to a wide range of models from providers like Anthropic, Meta, and Amazon's own Titan. SageMaker is their more traditional, powerful platform for building, training, and deploying ML models.
- Azure: Azure AI Studio brings together models from OpenAI, Meta, and others, along with services for prompt engineering, safety, and deployment.
- Key Concept: Cloud Service Providers
- Summary: We've journeyed from the fundamental concept of an LLM, through the different ways to build applications (Generative, Agentic, Workflows), to the tools that enhance them (RAG, Vector DBs), and finally to the frameworks and platforms that enable us to build and deploy them at scale.
- Key Takeaway: Modern AI development is an act of composition. Our value as engineers comes not from building models from scratch, but from skillfully selecting, orchestrating, and integrating these powerful components to solve real-world problems.