
Context Management API

PatchPal's context management system handles token estimation, context window limits, and automatic compaction.

TokenEstimator

patchpal.context.TokenEstimator(model_id)

Estimate tokens in messages for context management.

Source code in patchpal/context.py
def __init__(self, model_id: str):
    self.model_id = model_id
    self._encoder = self._get_encoder()

estimate_tokens(text)

Estimate tokens in text.

Parameters:

  Name    Type    Description                    Default
  text    str     Text to estimate tokens for    required

Returns:

  Type    Description
  int     Estimated token count

Source code in patchpal/context.py
def estimate_tokens(self, text: str) -> int:
    """Estimate tokens in text.

    Args:
        text: Text to estimate tokens for

    Returns:
        Estimated token count
    """
    if not text:
        return 0

    if self._encoder:
        try:
            return len(self._encoder.encode(str(text)))
        except Exception:
            pass

    # Fallback: ~3 chars per token (conservative for code-heavy content)
    # This is more accurate than 4 chars/token for technical content
    return len(str(text)) // 3
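
For a quick check, the estimator can be used directly. The model id below is only an illustrative value; any LiteLLM-style identifier works:

from patchpal.context import TokenEstimator

estimator = TokenEstimator("gpt-4")  # illustrative model id

# Uses the tiktoken encoder when one is available; otherwise falls back
# to the ~3 characters-per-token heuristic shown above.
print(estimator.estimate_tokens("def add(a, b):\n    return a + b"))
print(estimator.estimate_tokens(""))  # empty text -> 0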

estimate_message_tokens(message)

Estimate tokens in a single message.

Parameters:

  Name       Type              Description                                          Default
  message    Dict[str, Any]    Message dict with role, content, tool_calls, etc.    required

Returns:

  Type    Description
  int     Estimated token count

Source code in patchpal/context.py
def estimate_message_tokens(self, message: Dict[str, Any]) -> int:
    """Estimate tokens in a single message.

    Args:
        message: Message dict with role, content, tool_calls, etc.

    Returns:
        Estimated token count
    """
    tokens = 0

    # Role and content
    if "role" in message:
        tokens += 4  # Role overhead

    if "content" in message and message["content"]:
        tokens += self.estimate_tokens(str(message["content"]))

    # Tool calls
    if message.get("tool_calls"):
        for tool_call in message["tool_calls"]:
            tokens += 10  # Tool call overhead
            if hasattr(tool_call, "function"):
                tokens += self.estimate_tokens(tool_call.function.name)
                tokens += self.estimate_tokens(tool_call.function.arguments)

    # Tool call ID
    if message.get("tool_call_id"):
        tokens += 5

    # Name field
    if message.get("name"):
        tokens += self.estimate_tokens(message["name"])

    return tokens
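
As an illustration (the message contents are invented for the example), a user message and a tool-result message can be sized like this:

from patchpal.context import TokenEstimator

estimator = TokenEstimator("gpt-4")  # illustrative model id

user_msg = {"role": "user", "content": "Please refactor utils.py"}
tool_msg = {"role": "tool", "tool_call_id": "call_1", "content": "file contents here"}

# A role adds ~4 tokens of overhead, a tool_call_id adds ~5, and the content
# field is estimated with estimate_tokens.
print(estimator.estimate_message_tokens(user_msg))
print(estimator.estimate_message_tokens(tool_msg))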

estimate_messages_tokens(messages)

Estimate tokens in a list of messages.

Parameters:

  Name        Type                    Description              Default
  messages    List[Dict[str, Any]]    List of message dicts    required

Returns:

  Type    Description
  int     Total estimated token count

Source code in patchpal/context.py
def estimate_messages_tokens(self, messages: List[Dict[str, Any]]) -> int:
    """Estimate tokens in a list of messages.

    Args:
        messages: List of message dicts

    Returns:
        Total estimated token count
    """
    return sum(self.estimate_message_tokens(msg) for msg in messages)
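
Continuing with the estimator from the previous example, the total for a conversation is simply the per-message sum:

conversation = [
    {"role": "user", "content": "What does context.py do?"},
    {"role": "assistant", "content": "It estimates tokens and manages the context window."},
]
print(estimator.estimate_messages_tokens(conversation))  # sum of both message estimates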

ContextManager

patchpal.context.ContextManager(model_id, system_prompt)

Manage context window with auto-compaction and pruning.

Initialize context manager.

Parameters:

  Name             Type    Description                 Default
  model_id         str     LiteLLM model identifier    required
  system_prompt    str     System prompt text          required
Source code in patchpal/context.py
def __init__(self, model_id: str, system_prompt: str):
    """Initialize context manager.

    Args:
        model_id: LiteLLM model identifier
        system_prompt: System prompt text
    """
    self.model_id = model_id
    self.system_prompt = system_prompt
    self.estimator = TokenEstimator(model_id)
    self.context_limit = self._get_context_limit()
    self.output_reserve = 4_096  # Reserve tokens for model output

needs_compaction(messages)

Check if context window needs compaction.

Parameters:

  Name        Type                    Description                Default
  messages    List[Dict[str, Any]]    Current message history    required

Returns:

  Type    Description
  bool    True if compaction is needed

Source code in patchpal/context.py
def needs_compaction(self, messages: List[Dict[str, Any]]) -> bool:
    """Check if context window needs compaction.

    Args:
        messages: Current message history

    Returns:
        True if compaction is needed
    """
    # Estimate total tokens
    system_tokens = self.estimator.estimate_tokens(self.system_prompt)
    message_tokens = self.estimator.estimate_messages_tokens(messages)
    total_tokens = system_tokens + message_tokens + self.output_reserve

    # Check threshold
    usage_ratio = total_tokens / self.context_limit
    return usage_ratio >= self.COMPACT_THRESHOLD
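
A typical pattern is to check this before each model call, for example (arguments illustrative):

from patchpal.context import ContextManager

cm = ContextManager("gpt-4", "You are PatchPal.")  # illustrative arguments
history = [{"role": "user", "content": "Summarize this repository"}]

if cm.needs_compaction(history):
    # system prompt + history + output reserve crossed COMPACT_THRESHOLD of the limit
    print("Context nearly full - compaction will trigger")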

get_usage_stats(messages)

Get current context usage statistics.

Parameters:

  Name        Type                    Description                Default
  messages    List[Dict[str, Any]]    Current message history    required

Returns:

  Type              Description
  Dict[str, Any]    Dict with usage statistics

Source code in patchpal/context.py
def get_usage_stats(self, messages: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Get current context usage statistics.

    Args:
        messages: Current message history

    Returns:
        Dict with usage statistics
    """
    system_tokens = self.estimator.estimate_tokens(self.system_prompt)
    message_tokens = self.estimator.estimate_messages_tokens(messages)
    total_tokens = system_tokens + message_tokens + self.output_reserve

    return {
        "system_tokens": system_tokens,
        "message_tokens": message_tokens,
        "output_reserve": self.output_reserve,
        "total_tokens": total_tokens,
        "context_limit": self.context_limit,
        "usage_ratio": total_tokens / self.context_limit,
        "usage_percent": int((total_tokens / self.context_limit) * 100),
    }

Usage Example

from patchpal.agent import create_agent

agent = create_agent()

# Check context usage
stats = agent.context_manager.get_usage_stats(agent.messages)
print(f"Token usage: {stats['total_tokens']:,} / {stats['context_limit']:,}")
print(f"Usage: {stats['usage_percent']}%")
print(f"Output budget remaining: {stats['output_budget_remaining']:,} tokens")

# Check if compaction is needed
if agent.context_manager.needs_compaction(agent.messages):
    print("Context window getting full - compaction will trigger soon")

# Manually trigger compaction (usually automatic)
agent._perform_auto_compaction()

How Context Management Works

  1. Token Estimation: Uses tiktoken (or fallback character estimation) to estimate message tokens
  2. Context Limits: Tracks model-specific context window sizes (e.g., 200K for Claude Sonnet)
  3. Automatic Compaction: When context reaches 70% full, summarizes old messages to free space
  4. Output Budget: Reserves tokens for model output based on context window size
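
For instance, with a 200,000-token context window, the 70% threshold, and the 4,096-token output reserve, compaction triggers once the system prompt plus message history reach roughly 135,900 estimated tokens. A minimal sketch of that arithmetic:

context_limit = 200_000        # e.g. Claude 3.5 Sonnet
output_reserve = 4_096         # reserved for model output
compact_threshold = 0.70       # per step 3 above

conversation_tokens = 136_000  # system prompt + message history (illustrative)
total = conversation_tokens + output_reserve

print(total / context_limit)                         # ~0.70
print(total / context_limit >= compact_threshold)    # True -> compaction triggers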

Context Limits by Model Family

The context manager automatically detects limits for common models:

  • Claude 3.5 Sonnet: 200,000 tokens
  • Claude 3 Opus: 200,000 tokens
  • GPT-4 Turbo: 128,000 tokens
  • GPT-4: 8,192 tokens
  • GPT-3.5: 16,385 tokens

For unknown models, the manager falls back to 128,000 tokens.
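
The lookup itself lives in ContextManager._get_context_limit, whose source is not shown above. A hypothetical sketch of a substring-based mapping consistent with the limits listed here:

# Hypothetical illustration only - not PatchPal's actual implementation.
KNOWN_LIMITS = {
    "claude-3-5-sonnet": 200_000,
    "claude-3-opus": 200_000,
    "gpt-4-turbo": 128_000,   # more specific keys must come before "gpt-4"
    "gpt-4": 8_192,
    "gpt-3.5": 16_385,
}

def lookup_context_limit(model_id: str) -> int:
    for key, limit in KNOWN_LIMITS.items():  # insertion order preserved
        if key in model_id:
            return limit
    return 128_000  # fallback for unknown models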