Context Compaction
As conversations grow, the message history can exceed the model's context window. Context compaction summarizes older messages to keep the conversation within the token budget while preserving important information.
Why Compaction Is Needed
The SDK operates within a 200K token context budget (CONTEXT_SIZE = 200_000). Long conversations with tool calls, attachments, and team consultations can quickly approach this limit. Without compaction, the runtime would need to truncate messages, losing important context.
Threshold and Trigger
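Compaction triggers when the estimated token count of the conversation reaches the compaction threshold. The threshold is 70% of the context budget or the budget left after the output reserve and safety margin, whichever is smaller: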
Threshold = min(
    contextSize * 0.7,
    contextSize - outputReserve - safetyMargin
)
With default settings (200K context):
- Output reserve: 32,000 tokens
- Safety margin: 8,000 tokens
- Threshold: 140,000 tokens, the smaller of 200,000 * 0.7 = 140,000 and 200,000 - 32,000 - 8,000 = 160,000
The check runs at the end of each turn via finalizeTurnRun(). If the threshold is exceeded, a context-compaction BullMQ job is enqueued.
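The threshold formula above can be sketched as follows. This is a minimal illustration, not the SDK's code: only CONTEXT_SIZE is named in this page, while OUTPUT_RESERVE, SAFETY_MARGIN, compactionThreshold, and shouldCompact are hypothetical names, and the inclusive comparison is an assumption.

```typescript
// CONTEXT_SIZE comes from this page; the other names are hypothetical.
const CONTEXT_SIZE = 200_000;
const OUTPUT_RESERVE = 32_000;
const SAFETY_MARGIN = 8_000;

// Threshold = min(contextSize * 0.7, contextSize - outputReserve - safetyMargin)
function compactionThreshold(
  contextSize: number = CONTEXT_SIZE,
  outputReserve: number = OUTPUT_RESERVE,
  safetyMargin: number = SAFETY_MARGIN,
): number {
  return Math.min(contextSize * 0.7, contextSize - outputReserve - safetyMargin);
}

// Assumption: "reaches" is treated as an inclusive comparison.
function shouldCompact(
  estimatedTokens: number,
  threshold: number = compactionThreshold(),
): boolean {
  return estimatedTokens >= threshold;
}
```

With the defaults, the 70% term (140,000) is the smaller one. The second term only wins when the reserve and margin together exceed 30% of the context size.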
How Compaction Works
1. Assessment
The runtime estimates the token count of the full conversation payload (summary + live messages) using a simple character-based heuristic (chars / 3).
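A minimal sketch of the character-based heuristic, assuming the payload is available as plain strings. The function names and the choice to round up are illustrative assumptions. The page states only the chars / 3 ratio.

```typescript
// Hypothetical helper: roughly one token per three characters.
// Rounding up is an assumption; the source only gives "chars / 3".
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 3);
}

// Estimates the full payload: the existing summary (if any) plus live messages.
function estimatePayloadTokens(summary: string | null, messages: string[]): number {
  let chars = summary ? summary.length : 0;
  for (const message of messages) {
    chars += message.length;
  }
  return Math.ceil(chars / 3);
}
```

A heuristic like this avoids running a tokenizer on every turn, at the cost of accuracy.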
2. Boundary Selection
Messages are split into two groups:
- Prefix -- older messages to be compacted (everything before the tail)
- Tail -- the most recent 6 messages (WORKSTREAM_RAW_TAIL_MESSAGES) that are kept verbatim
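The prefix/tail split can be illustrated with a small generic helper. The name splitForCompaction is hypothetical, and the sketch assumes messages are ordered oldest first.

```typescript
const WORKSTREAM_RAW_TAIL_MESSAGES = 6;

// Hypothetical helper: everything before the last `tailSize` messages is the prefix.
function splitForCompaction<T>(
  messages: T[],
  tailSize: number = WORKSTREAM_RAW_TAIL_MESSAGES,
): { prefix: T[]; tail: T[] } {
  const cut = Math.max(0, messages.length - tailSize);
  return { prefix: messages.slice(0, cut), tail: messages.slice(cut) };
}
```

When the conversation has six or fewer messages, the prefix is empty and there is nothing to compact.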
3. Content Extraction
For each message in the prefix:
- Text parts are extracted directly
- Tool parts are selectively included based on an allow-list
- Reasoning parts are excluded
- The result is formatted as a numbered transcript
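The extraction rules above might look like this in code. The message and part shapes, the transcript line format, and the buildTranscript name are assumptions for illustration. The real SDK types are not shown on this page.

```typescript
// Hypothetical message shapes; the SDK's actual types may differ.
type Part =
  | { type: "text"; text: string }
  | { type: "tool"; toolName: string; output: string }
  | { type: "reasoning"; text: string };

interface Message {
  role: string;
  parts: Part[];
}

function buildTranscript(
  prefix: Message[],
  isToolIncluded: (toolName: string) => boolean,
): string {
  const lines: string[] = [];
  for (const message of prefix) {
    const pieces: string[] = [];
    for (const part of message.parts) {
      if (part.type === "text") {
        pieces.push(part.text); // text is kept as-is
      } else if (part.type === "tool" && isToolIncluded(part.toolName)) {
        pieces.push(`[tool ${part.toolName}] ${part.output}`); // allow-listed tools only
      }
      // reasoning parts and other tool calls are dropped
    }
    if (pieces.length > 0) {
      lines.push(`${lines.length + 1}. ${message.role}: ${pieces.join("\n")}`);
    }
  }
  return lines.join("\n");
}
```

Messages that end up with no surviving parts are skipped in this sketch, so the numbering stays consecutive.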
4. LLM Summarization
The transcript is sent to the context-compacter system agent with the prompt:
<context-compaction-input>
<previous-summary>...</previous-summary>
<existing-workstream-state>...</existing-workstream-state>
<new-messages>...</new-messages>
</context-compaction-input>
The agent produces:
- A replacement summary with structured sections (KEY FACTS, CONVERSATION FLOW, OPEN THREADS)
- A WorkstreamStateDelta with updates to decisions, tasks, constraints, etc.
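A sketch of how the input envelope could be assembled, assuming all three sections are already serialized to strings. The buildCompactionInput name is hypothetical, and the sketch does no escaping of the section contents.

```typescript
// Hypothetical builder for the prompt envelope shown above.
// Assumption: contents are inserted verbatim, without XML escaping.
function buildCompactionInput(
  previousSummary: string,
  existingState: string,
  newMessages: string,
): string {
  return [
    "<context-compaction-input>",
    `<previous-summary>${previousSummary}</previous-summary>`,
    `<existing-workstream-state>${existingState}</existing-workstream-state>`,
    `<new-messages>${newMessages}</new-messages>`,
    "</context-compaction-input>",
  ].join("\n");
}
```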
5. Chunked Processing
If the transcript exceeds 120,000 characters (COMPACTION_CHUNK_MAX_CHARS), it is split into chunks and processed sequentially. Each chunk builds on the previous summary.
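One way to implement chunked processing is sketched below. It assumes chunks are cut on line boundaries where possible, which the page does not specify. The chunkTranscript and compactInChunks names and the summarize callback, which stands in for the call to the context-compacter agent, are hypothetical.

```typescript
const COMPACTION_CHUNK_MAX_CHARS = 120_000;

// Splits on line boundaries; a single over-long line is hard-split.
function chunkTranscript(
  transcript: string,
  maxChars: number = COMPACTION_CHUNK_MAX_CHARS,
): string[] {
  const chunks: string[] = [];
  let current = "";
  for (const line of transcript.split("\n")) {
    for (let i = 0; i < Math.max(line.length, 1); i += maxChars) {
      const piece = line.slice(i, i + maxChars);
      const candidate = current === "" ? piece : `${current}\n${piece}`;
      if (candidate.length > maxChars && current !== "") {
        chunks.push(current); // current chunk is full, start a new one
        current = piece;
      } else {
        current = candidate;
      }
    }
  }
  if (current !== "") chunks.push(current);
  return chunks;
}

// Each chunk is summarized together with the summary produced so far.
async function compactInChunks(
  transcript: string,
  previousSummary: string,
  summarize: (summarySoFar: string, chunk: string) => Promise<string>,
): Promise<string> {
  let summary = previousSummary;
  for (const chunk of chunkTranscript(transcript)) {
    summary = await summarize(summary, chunk);
  }
  return summary;
}
```

Processing the chunks sequentially rather than in parallel is what lets each chunk build on the previous summary.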
6. Summary Rollup
If the resulting summary exceeds 80,000 tokens (SUMMARY_ROLLUP_MAX_TOKENS), a rollup pass compacts the summary itself.
7. Persistence
After compaction:
- compactionSummary is updated on the workstream record
- lastCompactedMessageId is set to the last compacted message
- state is updated with the merged state delta
- isCompacting is cleared
- Compacted messages are marked with isCompacted: true in metadata
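The persistence step can be pictured as a pure update over in-memory records. The field names follow the list above, but the record shapes and the applyCompactionResult function are assumptions. The actual storage layer is not described here.

```typescript
// Hypothetical record shapes using the field names from this page.
interface MessageRecord {
  id: string;
  metadata: { isCompacted?: boolean };
}

interface WorkstreamRecord {
  compactionSummary: string | null;
  lastCompactedMessageId: string | null;
  state: Record<string, unknown>;
  isCompacting: boolean;
}

function applyCompactionResult(
  workstream: WorkstreamRecord,
  compacted: MessageRecord[],
  summary: string,
  mergedState: Record<string, unknown>,
): { workstream: WorkstreamRecord; messages: MessageRecord[] } {
  const last = compacted[compacted.length - 1];
  return {
    workstream: {
      compactionSummary: summary,
      // Keep the old marker if nothing was compacted in this run.
      lastCompactedMessageId: last ? last.id : workstream.lastCompactedMessageId,
      state: mergedState,
      isCompacting: false,
    },
    messages: compacted.map((m) => ({
      ...m,
      metadata: { ...m.metadata, isCompacted: true },
    })),
  };
}
```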
Tools Preserved Through Compaction
Only specific tool calls survive compaction. All others are stripped from the transcript:
Included by name:
- userQuestions
- proceedInOnboarding
Included by prefix:
- linear* (all Linear tools)
To add new tool families to the compaction allow-list, update CONTEXT_COMPACTION_INCLUDED_TOOL_NAMES or CONTEXT_COMPACTION_INCLUDED_TOOL_PREFIXES in src/runtime/context-compaction-constants.ts.
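A minimal sketch of the allow-list check. The two constant names come from this page, but their shape (arrays of strings) and the isToolIncludedInCompaction helper are assumptions.

```typescript
// Constant names are from this page; the array-of-strings shape is assumed.
const CONTEXT_COMPACTION_INCLUDED_TOOL_NAMES: string[] = [
  "userQuestions",
  "proceedInOnboarding",
];
const CONTEXT_COMPACTION_INCLUDED_TOOL_PREFIXES: string[] = ["linear"];

// Hypothetical predicate: exact name match first, then prefix match.
function isToolIncludedInCompaction(toolName: string): boolean {
  return (
    CONTEXT_COMPACTION_INCLUDED_TOOL_NAMES.includes(toolName) ||
    CONTEXT_COMPACTION_INCLUDED_TOOL_PREFIXES.some((prefix) =>
      toolName.startsWith(prefix),
    )
  );
}
```

Under this scheme, adding a new tool family means appending one entry to the prefix list. A hypothetical tool named linearCreateIssue would already match the existing linear prefix.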
Summary Injection
On subsequent turns, the summary is injected as the first message in the conversation when both compactionSummary and lastCompactedMessageId are present:
[system] Compacted context summary:
KEY FACTS:
- ...
CONVERSATION FLOW:
- ...
OPEN THREADS:
- ...
This is followed by the uncompacted tail messages, giving the agent both historical context and recent conversation detail.
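The injection rule can be sketched as a guard plus a prepend. The ChatMessage shape and the buildModelMessages name are illustrative assumptions. Only the two field names and the "Compacted context summary:" header come from this page.

```typescript
// Hypothetical message shape for the model request.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

function buildModelMessages(
  workstream: { compactionSummary: string | null; lastCompactedMessageId: string | null },
  tail: ChatMessage[],
): ChatMessage[] {
  // Inject only when both fields are present.
  if (!workstream.compactionSummary || !workstream.lastCompactedMessageId) {
    return tail;
  }
  const summaryMessage: ChatMessage = {
    role: "system",
    content: `Compacted context summary:\n${workstream.compactionSummary}`,
  };
  return [summaryMessage, ...tail];
}
```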
Memory Block Compaction
Separate from context compaction, each workstream's memory block has its own compaction cycle:
- Memory block entries accumulate as the conversation progresses.
- When entries reach 15 (MEMORY_BLOCK_COMPACTION_TRIGGER_ENTRIES), the oldest 10 are compacted.
- A buildMemoryBlockCompactionPrompt call blends the previous summary with the new entries.
- The result is stored as memoryBlockSummary on the workstream.
- Only the remaining un-compacted entries stay in memoryBlock.
Both the summary and raw entries are injected into the agent context.
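The trigger and batch selection for the memory block might be sketched like this, assuming entries are stored oldest first. MEMORY_BLOCK_COMPACTION_BATCH_SIZE and splitMemoryBlock are hypothetical names. Only the trigger constant is named on this page.

```typescript
const MEMORY_BLOCK_COMPACTION_TRIGGER_ENTRIES = 15;
// Hypothetical name for the "oldest 10" batch size.
const MEMORY_BLOCK_COMPACTION_BATCH_SIZE = 10;

// Assumes `entries` is ordered oldest first.
function splitMemoryBlock(entries: string[]): { toCompact: string[]; remaining: string[] } {
  if (entries.length < MEMORY_BLOCK_COMPACTION_TRIGGER_ENTRIES) {
    return { toCompact: [], remaining: entries };
  }
  return {
    toCompact: entries.slice(0, MEMORY_BLOCK_COMPACTION_BATCH_SIZE),
    remaining: entries.slice(MEMORY_BLOCK_COMPACTION_BATCH_SIZE),
  };
}
```

With exactly 15 entries, the oldest 10 go to the summarizer and 5 stay in memoryBlock.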
Workstream State Updates
During compaction, the workstream state is updated from the compaction delta. This ensures that decisions, tasks, constraints, and other tracked items discovered during compaction are preserved in the structured state, even though the source messages have been summarized away.
State limits are enforced after each merge:
| Field | Max Items |
|---|---|
| Key decisions | 8 |
| Active constraints | 6 |
| Tasks | 10 |
| Open questions | 5 |
| Risks | 5 |
| Artifacts | 10 |
| Agent contributions | 6 |
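Enforcing the limits in the table could look like the sketch below. The camelCase field names, the string-array item type, and the choice to keep the most recent items (those at the end of each array) are all assumptions. The page states only the per-field maximums.

```typescript
// Maximums from the table above; field names are hypothetical.
const STATE_LIMITS = {
  keyDecisions: 8,
  activeConstraints: 6,
  tasks: 10,
  openQuestions: 5,
  risks: 5,
  artifacts: 10,
  agentContributions: 6,
} as const;

type StateField = keyof typeof STATE_LIMITS;
type WorkstreamState = Record<StateField, string[]>;

// Assumption: when a field overflows, the newest items (array end) are kept.
function enforceStateLimits(state: WorkstreamState): WorkstreamState {
  const limited = {} as WorkstreamState;
  for (const field of Object.keys(STATE_LIMITS) as StateField[]) {
    limited[field] = state[field].slice(-STATE_LIMITS[field]);
  }
  return limited;
}
```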
Compaction Queue
Compaction runs as a BullMQ job on the context-compaction queue:
- Concurrency: 2 workers
- Lock duration: 5 minutes
- Retry: 2 attempts with exponential backoff (3s base)
- Deduplication: by key compact:{domain}:{entityId}
While compaction is running, the workstream's isCompacting flag is true. New chat requests wait for compaction to finish before proceeding.
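The queue settings above map naturally onto BullMQ's worker options (concurrency, lockDuration) and job options (attempts, backoff). The sketch below expresses them as plain objects so it runs without a Redis connection. The constant and function names are hypothetical, and it assumes "2 attempts" corresponds to attempts: 2. How the deduplication key is handed to BullMQ is not stated on this page, so it is returned separately.

```typescript
const COMPACTION_QUEUE_NAME = "context-compaction";

// Mirrors BullMQ worker options; lockDuration is in milliseconds.
const COMPACTION_WORKER_OPTIONS = {
  concurrency: 2,
  lockDuration: 5 * 60 * 1000, // 5 minutes
};

// Hypothetical helper building the per-job settings described above.
function compactionJobSettings(domain: string, entityId: string) {
  return {
    // Deduplication key format from this page.
    dedupeKey: `compact:${domain}:${entityId}`,
    jobOptions: {
      attempts: 2,
      backoff: { type: "exponential" as const, delay: 3_000 }, // 3s base
    },
  };
}
```

Keying deduplication on domain and entity means repeated threshold hits for the same workstream collapse into a single pending job.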