Context Compaction

As conversations grow, the message history can exceed the model's context window. Context compaction summarizes older messages to keep the conversation within the token budget while preserving important information.

Why Compaction Is Needed

The SDK operates within a 200K token context budget (CONTEXT_SIZE = 200_000). Long conversations with tool calls, attachments, and team consultations can quickly approach this limit. Without compaction, the runtime would need to truncate messages, losing important context.

Threshold and Trigger

Compaction triggers when the estimated token count of the conversation reaches 70% of the context budget:

Threshold = min(
  contextSize * 0.7,
  contextSize - outputReserve - safetyMargin
)

With default settings (200K context):

  • Output reserve: 32,000 tokens
  • Safety margin: 8,000 tokens
  • Threshold: ~140,000 tokens
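
The threshold computation can be sketched as follows. The numeric values come from the docs above; the identifier names are assumptions for illustration.

```typescript
const CONTEXT_SIZE = 200_000;
const OUTPUT_RESERVE = 32_000;
const SAFETY_MARGIN = 8_000;

// min of the 70% rule and the reserve-based bound, as described above.
function compactionThreshold(
  contextSize: number = CONTEXT_SIZE,
  outputReserve: number = OUTPUT_RESERVE,
  safetyMargin: number = SAFETY_MARGIN,
): number {
  return Math.min(
    contextSize * 0.7,
    contextSize - outputReserve - safetyMargin,
  );
}
```

With the defaults this returns 140,000; for smaller context windows the reserve-based bound can become the binding term (e.g. a 100K context yields min(70,000, 60,000) = 60,000).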

The check runs at the end of each turn via finalizeTurnRun(). If the threshold is exceeded, a context-compaction BullMQ job is enqueued.

How Compaction Works

1. Assessment

The runtime estimates the token count of the full conversation payload (summary + live messages) using a simple character-based heuristic (chars / 3).
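
A minimal sketch of the heuristic; rounding up to a whole token is an assumption, the docs specify only chars / 3.

```typescript
// Rough token estimate: one token per three characters.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 3);
}
```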

2. Boundary Selection

Messages are split into two groups:

  • Prefix -- older messages to be compacted (everything before the tail)
  • Tail -- the most recent 6 messages (WORKSTREAM_RAW_TAIL_MESSAGES) that are kept verbatim
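
The split can be sketched like this; the Message shape is a placeholder for illustration.

```typescript
const WORKSTREAM_RAW_TAIL_MESSAGES = 6;

interface Message {
  id: string;
  role: string;
}

function splitAtBoundary(messages: Message[]): { prefix: Message[]; tail: Message[] } {
  const tailStart = Math.max(0, messages.length - WORKSTREAM_RAW_TAIL_MESSAGES);
  return {
    prefix: messages.slice(0, tailStart), // to be compacted
    tail: messages.slice(tailStart),      // kept verbatim
  };
}
```

Conversations with six or fewer messages produce an empty prefix, so there is nothing to compact.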

3. Content Extraction

For each message in the prefix:

  • Text parts are extracted directly
  • Tool parts are selectively included based on an allow-list
  • Reasoning parts are excluded
  • The result is formatted as a numbered transcript
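
The extraction step might look like the sketch below. The part shapes are hypothetical; the real SDK types are not shown in the docs.

```typescript
type Part =
  | { type: 'text'; text: string }
  | { type: 'tool'; name: string; output: string }
  | { type: 'reasoning'; text: string };

function toTranscript(parts: Part[], isToolAllowed: (name: string) => boolean): string {
  const lines: string[] = [];
  for (const part of parts) {
    if (part.type === 'text') {
      lines.push(part.text); // text parts pass through directly
    } else if (part.type === 'tool' && isToolAllowed(part.name)) {
      lines.push(`[tool ${part.name}] ${part.output}`); // allow-listed tools only
    }
    // reasoning parts are excluded entirely
  }
  // format as a numbered transcript
  return lines.map((line, i) => `${i + 1}. ${line}`).join('\n');
}
```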

4. LLM Summarization

The transcript is sent to the context-compacter system agent with the prompt:

<context-compaction-input>
  <previous-summary>...</previous-summary>
  <existing-workstream-state>...</existing-workstream-state>
  <new-messages>...</new-messages>
</context-compaction-input>

The agent produces:

  • A replacement summary with structured sections (KEY FACTS, CONVERSATION FLOW, OPEN THREADS)
  • A WorkstreamStateDelta with updates to decisions, tasks, constraints, etc.

5. Chunked Processing

If the transcript exceeds 120,000 characters (COMPACTION_CHUNK_MAX_CHARS), it is split into chunks and processed sequentially. Each chunk builds on the previous summary.
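
The chunked pass can be sketched as a fixed-size split followed by a sequential fold, where each call receives the running summary. Here summarize stands in for the LLM call; fixed-size splitting is an assumption (boundary-aware splitting would be a refinement).

```typescript
const COMPACTION_CHUNK_MAX_CHARS = 120_000;

// Split a long transcript into fixed-size chunks.
function chunkTranscript(transcript: string, maxChars = COMPACTION_CHUNK_MAX_CHARS): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < transcript.length; i += maxChars) {
    chunks.push(transcript.slice(i, i + maxChars));
  }
  return chunks;
}

// Each chunk builds on the summary produced from the previous chunk.
function summarizeSequentially(
  chunks: string[],
  summarize: (previousSummary: string, chunk: string) => string,
  initialSummary = '',
): string {
  return chunks.reduce((summary, chunk) => summarize(summary, chunk), initialSummary);
}
```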

6. Summary Rollup

If the resulting summary exceeds 80,000 tokens (SUMMARY_ROLLUP_MAX_TOKENS), a rollup pass compacts the summary itself.

7. Persistence

After compaction:

  • compactionSummary is updated on the workstream record
  • lastCompactedMessageId is set to the last compacted message
  • state is updated with the merged state delta
  • isCompacting is cleared
  • Compacted messages are marked with isCompacted: true in metadata

Tools Preserved Through Compaction

Only specific tool calls survive compaction. All others are stripped from the transcript:

Included by name:

  • userQuestions
  • proceedInOnboarding

Included by prefix:

  • linear* (all Linear tools)

To add new tool families to the compaction allow-list, update CONTEXT_COMPACTION_INCLUDED_TOOL_NAMES or CONTEXT_COMPACTION_INCLUDED_TOOL_PREFIXES in src/runtime/context-compaction-constants.ts.
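
The allow-list check might look like this; the constants' shapes (a Set and an array) are assumptions, the docs give only their names and contents.

```typescript
const CONTEXT_COMPACTION_INCLUDED_TOOL_NAMES = new Set([
  'userQuestions',
  'proceedInOnboarding',
]);
const CONTEXT_COMPACTION_INCLUDED_TOOL_PREFIXES = ['linear'];

// A tool survives compaction if its name is listed or matches a prefix.
function isToolIncluded(toolName: string): boolean {
  return (
    CONTEXT_COMPACTION_INCLUDED_TOOL_NAMES.has(toolName) ||
    CONTEXT_COMPACTION_INCLUDED_TOOL_PREFIXES.some((prefix) =>
      toolName.startsWith(prefix),
    )
  );
}
```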

Summary Injection

On subsequent turns, the summary is injected as the first message in the conversation when both compactionSummary and lastCompactedMessageId are present:

[system] Compacted context summary:
KEY FACTS:
- ...
CONVERSATION FLOW:
- ...
OPEN THREADS:
- ...

This is followed by the uncompacted tail messages, giving the agent both historical context and recent conversation detail.
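
The injection rule can be sketched as below, assuming a minimal chat-message shape; the real assembly code is not shown in the docs.

```typescript
interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

// Inject the summary only when both markers are present; otherwise the
// conversation passes through unchanged.
function buildModelInput(
  compactionSummary: string | null,
  lastCompactedMessageId: string | null,
  tail: ChatMessage[],
): ChatMessage[] {
  if (!compactionSummary || !lastCompactedMessageId) return tail;
  return [
    { role: 'system', content: `Compacted context summary:\n${compactionSummary}` },
    ...tail,
  ];
}
```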

Memory Block Compaction

Separate from context compaction, each workstream's memory block has its own compaction cycle:

  1. Memory block entries accumulate as the conversation progresses.
  2. When entries reach 15 (MEMORY_BLOCK_COMPACTION_TRIGGER_ENTRIES), the oldest 10 are compacted.
  3. A buildMemoryBlockCompactionPrompt call blends the previous summary with the new entries.
  4. The result is stored as memoryBlockSummary on the workstream.
  5. Only the remaining uncompacted entries stay in memoryBlock.

Both the summary and raw entries are injected into the agent context.
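
The trigger logic above can be sketched as follows. The batch-size constant name is an assumption; the docs give only the values 15 and 10.

```typescript
const MEMORY_BLOCK_COMPACTION_TRIGGER_ENTRIES = 15;
const MEMORY_BLOCK_COMPACTION_BATCH_SIZE = 10; // name assumed

function selectEntriesToCompact<T>(entries: T[]): { toCompact: T[]; remaining: T[] } {
  if (entries.length < MEMORY_BLOCK_COMPACTION_TRIGGER_ENTRIES) {
    return { toCompact: [], remaining: entries }; // below trigger: no-op
  }
  return {
    toCompact: entries.slice(0, MEMORY_BLOCK_COMPACTION_BATCH_SIZE), // oldest 10
    remaining: entries.slice(MEMORY_BLOCK_COMPACTION_BATCH_SIZE),
  };
}
```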

Workstream State Updates

During compaction, the workstream state is updated from the compaction delta. This ensures that decisions, tasks, constraints, and other tracked items discovered during compaction are preserved in the structured state, even though the source messages have been summarized away.

State limits are enforced after each merge:

Field                  Max Items
Key decisions          8
Active constraints     6
Tasks                  10
Open questions         5
Risks                  5
Artifacts              10
Agent contributions    6
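
Cap enforcement after a merge might look like the sketch below. The field keys mirror the table, and keeping the most recent items when a cap is exceeded is an assumed retention policy.

```typescript
const STATE_LIMITS = {
  keyDecisions: 8,
  activeConstraints: 6,
  tasks: 10,
  openQuestions: 5,
  risks: 5,
  artifacts: 10,
  agentContributions: 6,
} as const;

// Drop the oldest items once a field exceeds its cap.
function enforceLimit<T>(items: T[], max: number): T[] {
  return items.length <= max ? items : items.slice(items.length - max);
}
```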

Compaction Queue

Compaction runs as a BullMQ job on the context-compaction queue:

  • Concurrency: 2 workers
  • Lock duration: 5 minutes
  • Retry: 2 attempts with exponential backoff (3s base)
  • Deduplication: by compact:{domain}:{entityId}

While compaction is running, the workstream's isCompacting flag is true. New chat requests wait for compaction to finish before proceeding.
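
The deduplication key follows the documented compact:{domain}:{entityId} format; a small helper sketch:

```typescript
// Builds the dedup id so only one compaction job per entity is queued at a time.
function compactionJobId(domain: string, entityId: string): string {
  return `compact:${domain}:${entityId}`;
}
```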