
Context Management API

PatchPal's context management system handles token estimation, context window limits, and automatic compaction.

TokenEstimator

patchpal.context.TokenEstimator(model_id)

Estimate tokens in messages for context management.

Source code in patchpal/context.py
def __init__(self, model_id: str):
    self.model_id = model_id
    self._encoder = self._get_encoder()

estimate_tokens(text)

Estimate tokens in text.

Parameters:

  Name    Type    Description                    Default
  text    str     Text to estimate tokens for    required

Returns:

  Type    Description
  int     Estimated token count

Source code in patchpal/context.py
def estimate_tokens(self, text: str) -> int:
    """Estimate tokens in text.

    Args:
        text: Text to estimate tokens for

    Returns:
        Estimated token count
    """
    if not text:
        return 0

    if self._encoder:
        try:
            return len(self._encoder.encode(str(text)))
        except Exception:
            pass

    # Fallback: ~3 chars per token (conservative for code-heavy content)
    # This is more accurate than 4 chars/token for technical content
    return len(str(text)) // 3
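
For a quick check, the estimator can be used directly. The model id below is only an illustrative value; any LiteLLM-style identifier works:

from patchpal.context import TokenEstimator

estimator = TokenEstimator("gpt-4")  # illustrative model id

# Uses the tiktoken encoder when one is available; otherwise falls back
# to the ~3 characters-per-token heuristic shown above.
print(estimator.estimate_tokens("def add(a, b):\n    return a + b"))
print(estimator.estimate_tokens(""))  # empty text -> 0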

estimate_message_tokens(message)

Estimate tokens in a single message.

Parameters:

  Name       Type              Description                                          Default
  message    Dict[str, Any]    Message dict with role, content, tool_calls, etc.    required

Returns:

  Type    Description
  int     Estimated token count

Source code in patchpal/context.py
def estimate_message_tokens(self, message: Dict[str, Any]) -> int:
    """Estimate tokens in a single message.

    Args:
        message: Message dict with role, content, tool_calls, etc.

    Returns:
        Estimated token count
    """
    tokens = 0

    # Role and content
    if "role" in message:
        tokens += 4  # Role overhead

    if "content" in message and message["content"]:
        tokens += self.estimate_tokens(str(message["content"]))

    # Tool calls
    if message.get("tool_calls"):
        for tool_call in message["tool_calls"]:
            tokens += 10  # Tool call overhead
            if hasattr(tool_call, "function"):
                tokens += self.estimate_tokens(tool_call.function.name)
                tokens += self.estimate_tokens(tool_call.function.arguments)

    # Tool call ID
    if message.get("tool_call_id"):
        tokens += 5

    # Name field
    if message.get("name"):
        tokens += self.estimate_tokens(message["name"])

    return tokens
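
As an illustration (the message contents are invented for the example), a user message and a tool-result message can be sized like this:

from patchpal.context import TokenEstimator

estimator = TokenEstimator("gpt-4")  # illustrative model id

user_msg = {"role": "user", "content": "Please refactor utils.py"}
tool_msg = {"role": "tool", "tool_call_id": "call_1", "content": "file contents here"}

# A role adds ~4 tokens of overhead, a tool_call_id adds ~5, and the content
# field is estimated with estimate_tokens.
print(estimator.estimate_message_tokens(user_msg))
print(estimator.estimate_message_tokens(tool_msg))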

estimate_messages_tokens(messages)

Estimate tokens in a list of messages.

Parameters:

  Name        Type                    Description              Default
  messages    List[Dict[str, Any]]    List of message dicts    required

Returns:

  Type    Description
  int     Total estimated token count

Source code in patchpal/context.py
def estimate_messages_tokens(self, messages: List[Dict[str, Any]]) -> int:
    """Estimate tokens in a list of messages.

    Args:
        messages: List of message dicts

    Returns:
        Total estimated token count
    """
    return sum(self.estimate_message_tokens(msg) for msg in messages)
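
Continuing with the estimator from the previous example, the total for a conversation is simply the per-message sum:

conversation = [
    {"role": "user", "content": "What does context.py do?"},
    {"role": "assistant", "content": "It estimates tokens and manages the context window."},
]
print(estimator.estimate_messages_tokens(conversation))  # sum of both message estimates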

ContextManager

patchpal.context.ContextManager(model_id, system_prompt)

Manage context window with auto-compaction and pruning.

Initialize context manager.

Parameters:

  Name             Type    Description                 Default
  model_id         str     LiteLLM model identifier    required
  system_prompt    str     System prompt text          required
Source code in patchpal/context.py
def __init__(self, model_id: str, system_prompt: str):
    """Initialize context manager.

    Args:
        model_id: LiteLLM model identifier
        system_prompt: System prompt text
    """
    self.model_id = model_id
    self.system_prompt = system_prompt
    self.estimator = TokenEstimator(model_id)
    self.context_limit = self._get_context_limit()
    self.output_reserve = 4_096  # Reserve tokens for model output

needs_compaction(messages)

Check if context window needs compaction.

Parameters:

  Name        Type                    Description                Default
  messages    List[Dict[str, Any]]    Current message history    required

Returns:

  Type    Description
  bool    True if compaction is needed

Source code in patchpal/context.py
def needs_compaction(self, messages: List[Dict[str, Any]]) -> bool:
    """Check if context window needs compaction.

    Args:
        messages: Current message history

    Returns:
        True if compaction is needed
    """
    # Estimate total tokens
    system_tokens = self.estimator.estimate_tokens(self.system_prompt)
    message_tokens = self.estimator.estimate_messages_tokens(messages)
    total_tokens = system_tokens + message_tokens + self.output_reserve

    # Check threshold
    usage_ratio = total_tokens / self.context_limit
    return usage_ratio >= self.COMPACT_THRESHOLD
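
A typical pattern is to check this before each model call, for example (arguments illustrative):

from patchpal.context import ContextManager

cm = ContextManager("gpt-4", "You are PatchPal.")  # illustrative arguments
history = [{"role": "user", "content": "Summarize this repository"}]

if cm.needs_compaction(history):
    # system prompt + history + output reserve crossed COMPACT_THRESHOLD of the limit
    print("Context nearly full - compaction will trigger")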

get_usage_stats(messages)

Get current context usage statistics.

Parameters:

  Name        Type                    Description                Default
  messages    List[Dict[str, Any]]    Current message history    required

Returns:

  Type              Description
  Dict[str, Any]    Dict with usage statistics

Source code in patchpal/context.py
def get_usage_stats(self, messages: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Get current context usage statistics.

    Args:
        messages: Current message history

    Returns:
        Dict with usage statistics
    """
    system_tokens = self.estimator.estimate_tokens(self.system_prompt)
    message_tokens = self.estimator.estimate_messages_tokens(messages)
    total_tokens = system_tokens + message_tokens + self.output_reserve

    return {
        "system_tokens": system_tokens,
        "message_tokens": message_tokens,
        "output_reserve": self.output_reserve,
        "total_tokens": total_tokens,
        "context_limit": self.context_limit,
        "usage_ratio": total_tokens / self.context_limit,
        "usage_percent": int((total_tokens / self.context_limit) * 100),
    }

Usage Example

from patchpal.agent import create_agent

agent = create_agent()

# Check context usage
stats = agent.context_manager.get_usage_stats(agent.messages)
print(f"Token usage: {stats['total_tokens']:,} / {stats['context_limit']:,}")
print(f"Usage: {stats['usage_percent']}%")
print(f"Output budget remaining: {stats['output_budget_remaining']:,} tokens")

# Check if compaction is needed
if agent.context_manager.needs_compaction(agent.messages):
    print("Context window getting full - compaction will trigger soon")

# Manually trigger compaction (usually automatic)
agent._perform_auto_compaction()

How Context Management Works

  1. Token Estimation: Uses tiktoken (or fallback character estimation) to estimate message tokens
  2. Context Limits: Tracks model-specific context window sizes (e.g., 200K for Claude Sonnet)
  3. Automatic Compaction: When context reaches 70% full, summarizes old messages to free space
  4. Output Budget: Reserves tokens for model output based on context window size
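
For instance, with a 200,000-token context window, the 70% threshold, and the 4,096-token output reserve, compaction triggers once the system prompt plus message history reach roughly 135,900 estimated tokens. A minimal sketch of that arithmetic:

context_limit = 200_000        # e.g. Claude 3.5 Sonnet
output_reserve = 4_096         # reserved for model output
compact_threshold = 0.70       # per step 3 above

conversation_tokens = 136_000  # system prompt + message history (illustrative)
total = conversation_tokens + output_reserve

print(total / context_limit)                         # ~0.70
print(total / context_limit >= compact_threshold)    # True -> compaction triggers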

Context Limits by Model Family

The context manager automatically detects limits for common models:

  • Claude 3.5 Sonnet: 200,000 tokens
  • Claude 3 Opus: 200,000 tokens
  • GPT-4 Turbo: 128,000 tokens
  • GPT-4: 8,192 tokens
  • GPT-3.5: 16,385 tokens

For unknown models, the manager falls back to 128,000 tokens.
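
The lookup itself lives in ContextManager._get_context_limit, whose source is not shown above. A hypothetical sketch of a substring-based mapping consistent with the limits listed here:

# Hypothetical illustration only - not PatchPal's actual implementation.
KNOWN_LIMITS = {
    "claude-3-5-sonnet": 200_000,
    "claude-3-opus": 200_000,
    "gpt-4-turbo": 128_000,   # more specific keys must come before "gpt-4"
    "gpt-4": 8_192,
    "gpt-3.5": 16_385,
}

def lookup_context_limit(model_id: str) -> int:
    for key, limit in KNOWN_LIMITS.items():  # insertion order preserved
        if key in model_id:
            return limit
    return 128_000  # fallback for unknown models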