awesome-copilot/instructions/langchain-python.instructions.md

description: Instructions for using LangChain with Python
applyTo: **/*.py

LangChain with Python — Development Instructions

Purpose: concise, opinionated guidance for building reliable, secure, and maintainable LangChain applications in Python. Includes setup, patterns, tests, CI, and links to authoritative docs.

Quick setup

  • Python: 3.10+
  • Suggested install (adjust to your needs; provider integrations ship as separate langchain-<provider> packages):
pip install langchain langchain-openai chromadb faiss-cpu
  • Use virtualenv/venv, Poetry, or pip-tools for dependency isolation and reproducible installs. Prefer a lockfile for CI builds (poetry.lock or requirements.txt).
  • Keep provider API keys and credentials in environment variables or a secret store (Azure Key Vault, HashiCorp Vault). Do not commit secrets.
  • Official docs (bookmark): https://python.langchain.com/

Pinning and versions

LangChain evolves quickly. Pin a tested minor version in pyproject.toml or requirements.txt.

Example (requirements.txt):

langchain==0.3.<your-tested>
chromadb>=0.3,<1.0
faiss-cpu==1.7.3

Adjust versions after local verification — changes between minor releases can be breaking.

Suggested repo layout

src/
  app/            # application entry points
  models/         # domain models / dataclasses
  agents/         # agent definitions and tool adapters
  prompts/        # canonical prompt templates (files)
  services/       # LLM wiring, retrievers, adapters
  tests/          # unit & integration tests
examples/         # minimal examples and notebooks
scripts/
docker/
pyproject.toml or requirements.txt
README.md

Core concepts & patterns

  • LLM client factory: centralize provider configs (API keys), timeouts, retries, and telemetry. Provide a single place to switch providers or client settings (see the factory sketch after this list).
  • Prompt templates: store templates under prompts/ and load via a safe helper. Keep templates small and testable.
  • Chains vs Agents: prefer Chains for deterministic pipelines (RAG, summarization). Use Agents when you require planning or dynamic tool selection.
  • Tools: implement typed adapter interfaces for tools; validate inputs and outputs strictly.
  • Memory: default to stateless design. When memory is needed, store minimal context and document retention/erasure policies.
  • Retrievers: build retrieval + rerank pipelines. Keep vectorstore schema stable (id, text, metadata).
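
A minimal factory sketch, assuming the langchain-openai integration; LLMSettings and its defaults are hypothetical, so wire them to your own config system:

import os
from dataclasses import dataclass
from langchain_openai import ChatOpenAI

@dataclass(frozen=True)
class LLMSettings:
    # Hypothetical settings object; adapt to your config system.
    model: str = "gpt-4o-mini"
    temperature: float = 0.0
    timeout: int = 30
    max_retries: int = 2

def make_chat_model(settings: LLMSettings = LLMSettings()) -> ChatOpenAI:
    """Single place to switch providers, credentials, and client settings."""
    return ChatOpenAI(
        model=settings.model,
        temperature=settings.temperature,
        timeout=settings.timeout,
        max_retries=settings.max_retries,
        api_key=os.getenv("OPENAI_API_KEY"),  # never hard-code secrets
    )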

Patterns

  • Callbacks & tracing: use LangChain callbacks and integrate with LangSmith or your tracing system to capture the request/response lifecycle (see the handler sketch below).
  • Separation of concerns: keep prompt construction, LLM wiring, and business logic separate to simplify testing and reduce accidental prompt changes.
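
As a sketch, a custom callback handler for your own logging (LangSmith tracing itself is configured via environment variables rather than handlers):

import logging
from langchain_core.callbacks import BaseCallbackHandler

logger = logging.getLogger(__name__)

class LoggingHandler(BaseCallbackHandler):
    """Logs start/end of model calls; extend for latency or token metrics."""

    def on_chat_model_start(self, serialized, messages, **kwargs):
        logger.info("chat model call started")

    def on_llm_end(self, response, **kwargs):
        logger.info("model call finished")

# Attach per invocation:
# model.invoke(messages, config={"callbacks": [LoggingHandler()]})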

Embeddings & vectorstores

  • Use consistent chunking and metadata fields (source, page, chunk_index).
  • Cache embeddings to avoid repeated cost for unchanged documents (see the caching sketch after this list).
  • Local/dev: Chroma or FAISS. Production: managed vector DBs (Pinecone, Qdrant, Milvus, Weaviate) depending on scale and SLAs.
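
A sketch of embedding caching with LangChain's CacheBackedEmbeddings, assuming OpenAI embeddings — swap in your provider:

from langchain.embeddings import CacheBackedEmbeddings
from langchain.storage import LocalFileStore
from langchain_openai import OpenAIEmbeddings

underlying = OpenAIEmbeddings()
store = LocalFileStore("./.embedding_cache/")  # any byte store works
cached_embeddings = CacheBackedEmbeddings.from_bytes_store(
    underlying, store, namespace=underlying.model  # namespace avoids cross-model collisions
)
# Use cached_embeddings anywhere an Embeddings instance is expected,
# e.g. when building a Chroma or FAISS vectorstore.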

Prompt engineering & governance

  • Store canonical prompts under prompts/ and reference them by filename from code.
  • Write unit tests that assert required placeholders exist and that rendered prompts fit expected patterns (length, variables present); see the test sketch below.
  • Maintain a CHANGELOG for prompt and schema changes that affect behavior.
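
A minimal pytest sketch; prompts/answer.txt and its variable names are hypothetical, so substitute your own templates:

from pathlib import Path
from langchain_core.prompts import PromptTemplate

def test_answer_prompt_placeholders():
    # Hypothetical template file; adjust path and variables to your repo.
    text = Path("prompts/answer.txt").read_text()
    prompt = PromptTemplate.from_template(text)
    assert set(prompt.input_variables) == {"question", "context"}
    rendered = prompt.format(question="q", context="c")
    assert 0 < len(rendered) < 4000  # guard against runaway template growth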

Chat models

Overview

Large Language Models (LLMs) power a wide range of language tasks (generation, summarization, QA, etc.). Modern LLMs are commonly exposed via a chat model interface that accepts a list of messages and returns a message or list of messages.

Newer chat models include advanced capabilities:

  • Tool calling: native APIs that allow models to call external tools/services (see tool calling guides).
  • Structured output: ask models to emit JSON or schema-shaped responses (use with_structured_output where available).
  • Multimodality: support for non-text inputs (images, audio) in some models — consult provider docs for support and limits.

Features & benefits

LangChain offers a consistent interface for chat models with additional features for monitoring, debugging, and optimization:

  • Integrations with many providers (OpenAI, Anthropic, Ollama, Azure, Google Vertex, Amazon Bedrock, Hugging Face, Cohere, Groq, etc.). See the chat model integrations in the official docs for the current list.
  • Support for LangChain's message format and OpenAI-style message format.
  • Standardized tool-calling API for binding tools and handling tool requests/results.
  • with_structured_output helper for structured responses.
  • Async, streaming, and optimized batching support.
  • LangSmith integration for tracing/monitoring.
  • Standardized token usage reporting, rate limiting hooks, and caching support.

Integrations

Integrations are either:

  1. Official: packaged langchain-<provider> integrations maintained by the LangChain team or provider.
  2. Community: contributed integrations (in langchain-community).

Chat models typically follow a naming convention with a Chat prefix (e.g., ChatOpenAI, ChatAnthropic, ChatOllama). Models without the Chat prefix (or with an LLM suffix) often implement the older string-in/string-out interface and are less preferred for modern chat workflows.

Interface

Chat models implement BaseChatModel and support the Runnable interface: streaming, async, batching, and more. Many operations accept and return LangChain messages (roles like system, user, assistant). See the BaseChatModel API reference for details.

Key methods include:

  • invoke(messages, ...) — send a list of messages and receive a response.
  • stream(messages, ...) — stream partial outputs as tokens arrive.
  • batch(inputs, ...) — batch multiple requests.
  • bind_tools(tools) — attach tool adapters for tool calling.
  • with_structured_output(schema) — helper to request structured responses.

Inputs and outputs

  • LangChain supports its own message format and OpenAI's message format; pick one consistently in your codebase.
  • Messages include a role and content blocks; content can include structured or multimodal payloads where supported.
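
A short sketch tying the key methods above to LangChain's message format (the model id is illustrative):

from langchain_core.messages import HumanMessage, SystemMessage
from langchain_openai import ChatOpenAI

model = ChatOpenAI(model="gpt-4o-mini")  # illustrative model id
messages = [
    SystemMessage("You answer in one sentence."),
    HumanMessage("What is a retriever?"),
]

print(model.invoke(messages).content)   # full response

for chunk in model.stream(messages):    # partial outputs as tokens arrive
    print(chunk.content, end="", flush=True)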

Standard parameters

Commonly supported parameters (provider-dependent):

  • model: model identifier (e.g., gpt-4o, gpt-3.5-turbo).
  • temperature: randomness control (0.0 near-deterministic, 1.0 more creative).
  • timeout: seconds to wait before canceling.
  • max_tokens: response token limit.
  • stop: stop sequences.
  • max_retries: retry attempts for network/limit failures.
  • api_key, base_url: provider auth and endpoint configuration.
  • rate_limiter: optional BaseRateLimiter to space requests and avoid provider quota errors.

Note: Not all parameters are implemented by every provider. Always consult the provider integration docs.
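
For example, with the OpenAI integration (a sketch; parameter support varies by provider):

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-4o",        # model identifier
    temperature=0.0,       # low randomness
    timeout=30,            # seconds before the request is cancelled
    max_tokens=512,        # cap on response length
    stop=["\n\n"],         # optional stop sequences
    max_retries=2,         # retry transient network/limit failures
)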

Tool calling

Chat models can call tools (APIs, DBs, system adapters). Use LangChain's tool-calling APIs to:

  • Register tools with strict input/output typing.
  • Observe and log tool call requests and results.
  • Validate tool outputs before passing them back to the model or executing side effects.

See the tool-calling guide in the LangChain docs for examples and safe patterns.
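
A minimal sketch with LangChain's @tool decorator and bind_tools; the weather tool is a stand-in:

from langchain_core.tools import tool
from langchain_openai import ChatOpenAI

@tool
def get_weather(city: str) -> str:
    """Return a short weather summary for a city."""  # docstring becomes the tool description
    return f"Sunny in {city}"  # stand-in for a real API call

llm_with_tools = ChatOpenAI(model="gpt-4o").bind_tools([get_weather])
msg = llm_with_tools.invoke("What's the weather in Paris?")

# The model emits tool-call requests; validate args before executing anything.
for call in msg.tool_calls:
    print(call["name"], call["args"])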

Structured outputs

Use with_structured_output or schema-enforced methods to request JSON or typed outputs from the model. Structured outputs are essential for reliable extraction and downstream processing (parsers, DB writes, analytics).
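
A sketch using a Pydantic schema with with_structured_output:

from pydantic import BaseModel, Field
from langchain_openai import ChatOpenAI

class Person(BaseModel):
    """Schema the model must fill in."""
    name: str = Field(description="Full name")
    birth_year: int

structured_llm = ChatOpenAI(model="gpt-4o").with_structured_output(Person)
person = structured_llm.invoke("Ada Lovelace was born in 1815.")
print(person.name, person.birth_year)  # a validated Person instance, not raw text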

Multimodality

Some models support multimodal inputs (images, audio). Check provider docs for supported input types and limitations. Multimodal outputs are rare — treat them as experimental and validate rigorously.

Context window

Models have a finite context window measured in tokens. When designing conversational flows:

  • Keep messages concise and prioritize important context.
  • Trim old context (summarize or archive) outside the model when it exceeds the window.
  • Use a retriever + RAG pattern to surface relevant long-form context instead of pasting large documents into the chat.
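
Recent langchain-core releases include a trim_messages helper for the trimming step; a sketch, assuming a version that ships it:

from langchain_core.messages import AIMessage, HumanMessage, SystemMessage, trim_messages
from langchain_openai import ChatOpenAI

model = ChatOpenAI(model="gpt-4o-mini")
history = [
    SystemMessage("You are terse."),
    HumanMessage("Hi!"),
    AIMessage("Hello."),
    HumanMessage("Summarize our chat."),
]
trimmed = trim_messages(
    history,
    strategy="last",       # keep the most recent messages
    max_tokens=1000,
    token_counter=model,   # a chat model can count its own tokens
    include_system=True,   # keep the system message even when trimming
)
print(model.invoke(trimmed).content)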

Advanced topics

Rate-limiting

  • Use rate_limiter when initializing chat models to space calls.
  • Implement retry with exponential backoff and consider fallback models or degraded modes when throttled.
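
A sketch with the in-memory rate limiter from langchain-core:

from langchain_core.rate_limiters import InMemoryRateLimiter
from langchain_openai import ChatOpenAI

limiter = InMemoryRateLimiter(
    requests_per_second=0.5,    # one request every two seconds
    check_every_n_seconds=0.1,  # how often the limiter wakes up to check
    max_bucket_size=5,          # allow short bursts
)
llm = ChatOpenAI(model="gpt-4o-mini", rate_limiter=limiter)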

Caching

  • Exact-input caching for conversations is often ineffective. Consider semantic caching (embedding-based) for repeated meaning-level queries.
  • Semantic caching introduces dependency on embeddings and is not universally suitable.
  • Cache only where it reduces cost and meets correctness requirements (e.g., FAQ bots).
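
Where exact-input caching does fit (e.g., an FAQ bot replaying identical prompts), LangChain's global LLM cache is a one-liner; a sketch:

from langchain_core.caches import InMemoryCache
from langchain_core.globals import set_llm_cache

set_llm_cache(InMemoryCache())  # identical prompts now hit the cache, not the provider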

Best practices

  • Use type hints and dataclasses for public APIs.
  • Validate inputs before calling LLMs or tools.
  • Load secrets from secret managers; never log secrets or unredacted model outputs.
  • Deterministic tests: mock LLMs and embedding calls.
  • Cache embeddings and frequent retrieval results.
  • Observability: log request_id, model name, latency, and sanitized token counts.
  • Implement exponential backoff and idempotency for external calls.
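
For the backoff bullet, a sketch using tenacity (already a common dependency in LangChain stacks); the wrapped function is a placeholder for your own call:

from tenacity import retry, stop_after_attempt, wait_exponential

@retry(wait=wait_exponential(multiplier=1, min=1, max=30), stop=stop_after_attempt(5))
def call_external_service(payload: dict) -> dict:
    """Wrap any flaky external call; make the operation idempotent before retrying."""
    ...  # perform the request here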

Security & privacy

  • Treat model outputs as untrusted. Sanitize before executing generated code or system commands.
  • Validate any user-supplied URLs and inputs to avoid SSRF and injection attacks.
  • Document data retention and add an API to erase user data on request.
  • Limit stored PII and encrypt sensitive fields at rest.

Testing

  • Unit tests: mock LLM and embedding clients; assert prompt rendering and chain wiring (see the sketch after this list).
  • Integration tests: use sandboxed providers or local mocks to keep costs low.
  • Regression tests: snapshot prompt outputs with mocked LLM responses; update fixtures intentionally and with review.
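
A deterministic unit-test sketch using langchain-core's fake chat model:

from langchain_core.language_models import FakeListChatModel
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate

def test_chain_wiring_is_deterministic():
    fake_llm = FakeListChatModel(responses=["stub answer"])  # no network, no cost
    chain = PromptTemplate.from_template("Answer concisely: {q}") | fake_llm | StrOutputParser()
    assert chain.invoke({"q": "anything"}) == "stub answer"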

Suggested libraries:

  • pytest, pytest-mock for testing
  • responses or requests-mock for HTTP provider mocks

CI: add a low-cost job that runs prompt-template tests using mocks to detect silent regressions.

Example — minimal chain

import os
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

# Build the chain with LCEL (prompt | model | parser), the idiomatic style since langchain 0.1.
llm = ChatOpenAI(api_key=os.getenv("OPENAI_API_KEY"), temperature=0.0)
template = PromptTemplate.from_template("Answer concisely: {q}")
chain = template | llm | StrOutputParser()

resp = chain.invoke({"q": "What is LangChain?"})
print(resp)

Note: LangChain provides both legacy string-in/string-out LLM APIs and chat-model APIs (e.g., ChatOpenAI, used above). Prefer the chat interface for modern providers and message semantics; the legacy top-level imports (e.g., from langchain import OpenAI, LLMChain) are deprecated or removed in recent releases.

Agents & tools

  • Use Agents (Agent, AgentExecutor) only when dynamic planning or tool orchestration is required (see the sketch after this list).
  • Sandbox and scope tools: avoid arbitrary shell or filesystem operations from model outputs. Validate and restrict tool inputs.
  • Follow the official agents tutorial: https://python.langchain.com/docs/tutorials/agents/
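
A minimal sketch with create_tool_calling_agent and AgentExecutor (newer releases also offer LangGraph-based agents; the weather tool is a stand-in):

from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI

@tool
def get_weather(city: str) -> str:
    """Return a short weather summary for a city."""
    return f"Sunny in {city}"  # stand-in for a real, validated API call

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a careful assistant. Use tools when needed."),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),  # required by tool-calling agents
])

tools = [get_weather]
agent = create_tool_calling_agent(ChatOpenAI(model="gpt-4o"), tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools, max_iterations=5)  # bound iterations
print(executor.invoke({"input": "What's the weather in Paris?"})["output"])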

CI / deployment

  • Pin dependencies and run pip-audit or safety in CI.
  • Run tests (unit + lightweight integration) on PRs.
  • Containerize with resource limits and provide secrets via your platform's secret manager (do not commit .env files).

Example Dockerfile (minimal):

FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "-m", "your_app.entrypoint"]

Observability & cost control

  • Trace requests end-to-end (LangSmith or your own tracing) and log request_id, model name, latency, and sanitized token counts.
  • Track token usage per request and per feature so cost can be attributed; alert on anomalous spend.
  • Prefer smaller/cheaper models and cached results where quality allows, and revisit model choice as provider pricing changes.

Documentation & governance

  • Keep prompts and templates under version control in prompts/.
  • Add examples/ with Jupyter notebooks or scripts that demonstrate RAG, a simple agent, and callback handlers.
  • Add README sections explaining local run, tests, and secret configuration.