SurrealDB

Overview

SurrealDB is the primary persistence layer for the Lota SDK. It stores workstreams, messages, memories, execution plans, learned skills, and runtime bootstrap state. SurrealDB was chosen for its native support for vector search (HNSW indexes for semantic retrieval), full-text search (BM25 with configurable analyzers), graph relations (used for memory relation edges), and flexible schemaful/schemaless table modes — all within a single database engine.

Schema Files

Built-in schema files live in core/infrastructure/schema/ and are applied at startup. Each file defines tables, fields, and indexes using SurrealQL.

00_workstream.surql

Defines the workstream and message persistence layer.

Tables:

| Table | Description |
| --- | --- |
| workstream | Persistent conversation threads |
| workstreamMessage | AI SDK UIMessage persistence |
| workstreamAttachment | File attachment metadata |

Key workstream fields:

| Field | Type | Description |
| --- | --- | --- |
| organizationId | option&lt;string&gt; | Owning organization |
| userId | option&lt;string&gt; | Owning user |
| agentId | option&lt;string&gt; | Agent for direct workstreams |
| mode | string | 'direct' or 'group' |
| core | bool | Whether this is a core workstream |
| coreType | option&lt;string&gt; | Core workstream type |
| title | option&lt;string&gt; | Display title |
| status | string | 'regular' or 'archived' |
| memoryBlock | option&lt;string&gt; | Serialized memory block entries (JSON) |
| memoryBlockSummary | option&lt;string&gt; | Compacted memory block summary |
| activeRunId | option&lt;string&gt; | Current agent run ID |
| compactionSummary | option&lt;string&gt; | Compacted conversation summary |
| lastCompactedMessageId | option&lt;string&gt; | Compaction boundary cursor |
| isCompacting | bool | Whether compaction is in progress |
| state | option&lt;object&gt; | Structured workstream state (flexible) |

Key indexes:

  • workstreamOrgIdx on organizationId
  • workstreamMessageWorkstreamCreatedIdx on (workstreamId, createdAt) for cursor pagination
  • workstreamMessageWorkstreamMessageUniqueIdx on (workstreamId, messageId) (UNIQUE)
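The (workstreamId, createdAt) index supports keyset (cursor) pagination. A minimal sketch of the client-side shape — the Message interface and function names here are illustrative, not the SDK's API:

```typescript
// Hypothetical sketch: cursor pagination over messages ordered by createdAt,
// mirroring what the (workstreamId, createdAt) index serves server-side:
//   SELECT * FROM workstreamMessage
//   WHERE workstreamId = $id AND createdAt > $cursor
//   ORDER BY createdAt ASC LIMIT $limit
interface Message {
  messageId: string
  createdAt: number // epoch millis here; the real field may be a datetime
}

// Return the next page of messages strictly after the cursor, oldest first.
function nextPage(messages: Message[], cursor: number | null, limit: number): Message[] {
  return messages
    .filter((m) => cursor === null || m.createdAt > cursor)
    .sort((a, b) => a.createdAt - b.createdAt)
    .slice(0, limit)
}

// The cursor for the following request is the last row's createdAt.
function nextCursor(page: Message[]): number | null {
  return page.length > 0 ? page[page.length - 1].createdAt : null
}
```

Because each page resumes from a value covered by the index, cost stays constant regardless of how deep into the thread the reader is.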

01_memory.surql

Defines the memory system with vector and full-text search.

Tables:

| Table | Type | Description |
| --- | --- | --- |
| memory | SCHEMAFULL | Memory records with embeddings |
| memoryRelation | RELATION | Semantic edges between memories |
| memoryHistory | SCHEMAFULL | Audit trail of memory changes |

Key memory fields:

| Field | Type | Description |
| --- | --- | --- |
| content | string | The fact or preference text |
| embedding | array&lt;float&gt; | 1536-dimensional vector |
| hash | string | Content hash for deduplication |
| scopeId | string | Namespace (org or agent scope) |
| memoryType | string | fact, preference, interaction, etc. |
| durability | string | core, standard, or ephemeral |
| importance | float | 0–1 importance score |
| accessCount | int | Number of times accessed |
| needsReview | bool | Flagged for review (parent superseded) |

Vector Index (HNSW)

```sql
DEFINE INDEX memoryVectorIdx
  ON TABLE memory FIELDS embedding
  HNSW DIMENSION 1536 DIST COSINE TYPE F32;
```

This index enables approximate nearest-neighbor search for semantic memory retrieval. The 1536 dimension matches the text-embedding-3-small model output.
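For intuition, the metric the HNSW index approximates is plain cosine similarity. This is an illustration only — real queries run inside SurrealDB, and the memory table uses 1536 dimensions rather than the toy 3-d vectors here:

```typescript
// Cosine similarity: dot product over the product of vector magnitudes.
// Returns 1 for identical directions, 0 for orthogonal vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}
```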

Full-Text Search Index (BM25)

```sql
DEFINE ANALYZER memoryAnalyzer
  TOKENIZERS blank, class, punct
  FILTERS lowercase, snowball(english);

DEFINE INDEX memoryContentFts
  ON TABLE memory FIELDS content
  FULLTEXT ANALYZER memoryAnalyzer BM25 HIGHLIGHTS;
```

The BM25 index provides keyword-based search with English stemming. It complements vector search for hybrid retrieval.
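The document does not specify how the SDK merges the two result lists; reciprocal rank fusion is one common way to combine a vector-ranked and a BM25-ranked list without comparing their incompatible scores. A sketch, not the SDK's actual merge logic:

```typescript
// Reciprocal rank fusion: each list contributes 1 / (k + rank + 1) per id,
// so ids that rank well in both lists rise to the top. k=60 is conventional.
function reciprocalRankFusion(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>()
  for (const ranking of rankings) {
    ranking.forEach((id, rank) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1))
    })
  }
  return [...scores.entries()].sort((a, b) => b[1] - a[1]).map(([id]) => id)
}
```

Rank-based fusion sidesteps the need to normalize cosine similarity against BM25 scores, which live on different scales.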

Other Memory Indexes

| Index | Columns | Purpose |
| --- | --- | --- |
| memoryScopeIdIdx | scopeId | Fast scope filtering |
| memoryHashIdx | hash (UNIQUE) | Deduplication |
| memoryArchivedIdx | archivedAt | Archived memory filtering |
| memoryScopeTypeArchivedIdx | (scopeId, memoryType, archivedAt) | Scoped type queries |
| memoryScopeTemporalIdx | (scopeId, archivedAt, validFrom, validUntil) | Temporal queries |
| memoryNeedsReviewIdx | (scopeId, needsReview, archivedAt, updatedAt) | Review queue |
| memoryTopLookupIdx | Multiple columns | Pre-seeded memory lookup |
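The unique memoryHashIdx implies each memory carries a deterministic content hash. A hypothetical sketch of how such a hash could be derived — the SDK's actual normalization rules are not documented here:

```typescript
import { createHash } from "node:crypto"

// Hypothetical: trim + lowercase normalization before hashing, so trivially
// different phrasings of the same fact collide on the unique hash index.
function memoryHash(content: string): string {
  const normalized = content.trim().toLowerCase()
  return createHash("sha256").update(normalized).digest("hex")
}
```

Inserting a memory whose hash already exists would then violate the unique index, which is how duplicates get rejected at the database layer.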

02_execution_plan.surql

Defines the execution plan runtime.

Tables:

| Table | Description |
| --- | --- |
| planSpec | Compiled execution plan specifications |
| planNodeSpec | Node contracts within a compiled specification |
| planRun | Execution run instances |
| planNodeRun | Per-node runtime state inside a run |
| planNodeAttempt | Submitted node attempts |
| planArtifact | First-class node output artifacts |
| planApproval | Human approval records |
| planCheckpoint | Restart-safe run snapshots |
| planValidationIssue | Blocking and warning validation records |
| planEvent | Execution audit events |

Key indexes:

  • planSpecWorkstreamStatusIdx on (workstreamId, status) for spec lookup
  • planNodeSpecPlanPositionIdx on (planSpecId, position) for ordered node retrieval
  • planRunWorkstreamStatusIdx on (workstreamId, status) for active run lookup
  • planNodeRunRunStatusIdx on (runId, status) for executor state traversal
  • planCheckpointRunSequenceIdx on (runId, sequence) for latest checkpoint lookup
  • planEventRunTimeIdx on (runId, createdAt) for recent event queries
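The "latest checkpoint lookup" that planCheckpointRunSequenceIdx serves is just a max-by-sequence selection per run; server-side this would be ORDER BY sequence DESC LIMIT 1. A minimal in-memory sketch with an illustrative Checkpoint shape:

```typescript
// Illustrative subset of a checkpoint row; the real table has more fields.
interface Checkpoint {
  runId: string
  sequence: number
}

// Return the checkpoint with the highest sequence for a run, or null.
function latestCheckpoint(rows: Checkpoint[], runId: string): Checkpoint | null {
  let latest: Checkpoint | null = null
  for (const row of rows) {
    if (row.runId !== runId) continue
    if (latest === null || row.sequence > latest.sequence) latest = row
  }
  return latest
}
```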

03_learned_skill.surql

Stores procedures learned from conversation history.

04_runtime_bootstrap.surql

Stores the database schema readiness marker used by waitForDatabaseBootstrap() to coordinate multi-process startup.

Schema Application

Schemas are applied at startup by reading each .surql file and executing it against SurrealDB:

```ts
for (const schemaFile of runtime.schemaFiles) {
  const sql = await Bun.file(schemaFile).text()
  await runtime.services.database.query(sql)
}
```

After application, publishDatabaseBootstrap() writes a fingerprint record so that other processes (workers) can wait for schema readiness.

Consumer Schema Files

Consumers and plugins can contribute additional schema files via extraSchemaFiles in the config or contributions.schemaFiles in plugins. These are appended to the built-in schema file list.
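A sketch of what the consumer side could look like — the surrounding config shape beyond extraSchemaFiles is assumed, not confirmed by this document:

```typescript
// Hypothetical consumer config fragment. Consumer files are appended after
// the built-in 00_–04_ schemas, so they may reference built-in tables.
const config = {
  database: {
    url: 'ws://localhost:8003',
    namespace: 'lotasdk',
    username: 'root',
    password: 'root',
  },
  extraSchemaFiles: ['./schema/10_my_tables.surql'],
}
```

Using a numeric prefix above 04_ for consumer files keeps the alphabetical ordering guarantee intact.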

Connection Configuration

```ts
database: {
  url: 'ws://localhost:8003',     // WebSocket URL
  namespace: 'lotasdk',          // SurrealDB namespace
  username: 'root',
  password: 'root',
}
```

The SDK database name is fixed to lotasdk and cannot be overridden.

The SDK uses the SurrealDB JavaScript driver (surrealdb package) with WebSocket transport.

Startup Flow

Schema application and database readiness follow a coordinated multi-step process:

  1. Schema file ordering. Schema files are sorted alphabetically before execution. The numeric prefix convention (00_, 01_, 02_, etc.) ensures dependencies are satisfied — workstream tables exist before execution plan tables reference them, and so on. Consumer-provided extraSchemaFiles are appended after built-in schemas.

  2. Connection with retry. connectWithStartupRetry attempts to establish the WebSocket connection to SurrealDB with exponential backoff. This handles the common case where the database container is still starting when the application boots.

  3. Schema execution. Each .surql file is read and executed sequentially as a single query against the database. All statements use IF NOT EXISTS guards, making schema application idempotent — safe to run repeatedly without side effects.

  4. Bootstrap publication. After all schema files execute successfully, publishDatabaseBootstrap() writes a fingerprint record to the runtimeBootstrap table. This record includes a hash of the applied schema files, serving as proof that the database is ready.

  5. Worker coordination. Worker processes (BullMQ workers running in separate processes) call waitForDatabaseBootstrap() on startup. This function polls for the bootstrap fingerprint record, blocking the worker until the main process has finished applying the schema. This prevents workers from executing jobs against incomplete schema.
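The retry pattern described in step 2 can be sketched as a generic backoff wrapper; the names and defaults here are illustrative, not the SDK's connectWithStartupRetry signature:

```typescript
// Exponential backoff around a connect attempt: wait 250ms, 500ms, 1000ms, ...
// between failures, rethrowing the last error once attempts are exhausted.
async function connectWithRetry<T>(
  connect: () => Promise<T>,
  attempts = 5,
  baseDelayMs = 250,
): Promise<T> {
  let lastError: unknown
  for (let i = 0; i < attempts; i++) {
    try {
      return await connect()
    } catch (err) {
      lastError = err
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** i))
    }
  }
  throw lastError
}
```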

```
Main Process                     Worker Process
─────────────                    ──────────────
connect (with retry)             connect (with retry)
apply schema files               waitForDatabaseBootstrap() ← polls
publishDatabaseBootstrap()       ← bootstrap record found
                                 start processing jobs
```
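The worker-side wait in the diagram above amounts to polling with a deadline. A sketch under assumed names — checkBootstrap stands in for the real fingerprint query, and the intervals are illustrative:

```typescript
// Poll until the bootstrap check succeeds or the deadline passes.
async function waitForBootstrap(
  checkBootstrap: () => Promise<boolean>,
  pollMs = 500,
  timeoutMs = 60_000,
): Promise<void> {
  const deadline = Date.now() + timeoutMs
  while (Date.now() < deadline) {
    if (await checkBootstrap()) return
    await new Promise((r) => setTimeout(r, pollMs))
  }
  throw new Error("database bootstrap not observed before timeout")
}
```

Failing loudly on timeout is preferable to starting a worker against a half-applied schema.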

Querying

Direct Queries

Use the database service for raw SurrealQL queries:

```ts
const results = await runtime.services.database.query(
  'SELECT * FROM workstream WHERE organizationId = $orgId AND status = $status',
  { orgId: 'org:abc', status: 'regular' }
)
```

Always use parameterized queries ($param syntax) to prevent injection and ensure proper type handling.

Transactions

Group multiple operations into an atomic transaction:

```ts
await runtime.services.database.transaction(async (tx) => {
  await tx.query('CREATE memory SET content = $content, hash = $hash, scopeId = $scopeId', {
    content: 'User prefers concise responses',
    hash: 'abc123',
    scopeId: 'org:xyz',
  })
  await tx.query('RELATE $from -> memoryRelation -> $to SET relationType = $type', {
    from: 'memory:a',
    to: 'memory:b',
    type: 'supersedes',
  })
})
```

Transactions ensure that all operations succeed or none do. This is particularly important for memory operations where a record and its relations must be created atomically.

Query memories by semantic similarity using the HNSW index:

```sql
SELECT *, vector::similarity::cosine(embedding, $queryEmbedding) AS score
FROM memory
WHERE scopeId = $scopeId AND archivedAt IS NONE
ORDER BY score DESC
LIMIT 10
```

Query memories by keyword using the BM25 index:

```sql
SELECT *, search::score(1) AS score
FROM memory
WHERE content @1@ $searchTerms AND scopeId = $scopeId
ORDER BY score DESC
LIMIT 10
```

Best Practices

  • Always use IF NOT EXISTS in schema files. Every DEFINE TABLE, DEFINE FIELD, and DEFINE INDEX statement must include the IF NOT EXISTS guard. This makes schema application idempotent and safe to run on every startup.

  • Never use OVERWRITE, REMOVE, or migration DML. The schema assumes a fresh database. There are no migration scripts, no data transforms, and no backward-compatibility shims. If a schema change is needed, update the definition directly.

  • Keep record-id inputs structural at the DB boundary. Pass record IDs as structured values (table + id) rather than relying on the nominal RecordId class. This avoids identity-matching issues across package boundaries in Bun workspace/link installs.

  • Use cursor pagination for large result sets. The workstreamMessageWorkstreamCreatedIdx index supports cursor-based pagination using createdAt as the cursor. Avoid LIMIT/OFFSET pagination for large datasets as it degrades with page depth.

  • Parameterize all queries. Always use $param placeholders instead of string interpolation. This prevents SurrealQL injection and ensures correct type handling for record IDs, dates, and arrays.

  • Scope queries by organization. Most tables include an organizationId or scopeId field. Always filter by scope to ensure tenant isolation and take advantage of scoped indexes.
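The scoping rule above can be enforced mechanically with a small wrapper that refuses to build a query without a scope. This helper is hypothetical, not part of the SDK API:

```typescript
// Hypothetical helper: every memory query gets the scope filter and
// parameter attached, so tenant isolation cannot be forgotten at call sites.
function scopedQuery(
  baseWhere: string,
  params: Record<string, unknown>,
  scopeId: string,
): { sql: string; params: Record<string, unknown> } {
  return {
    sql: `SELECT * FROM memory WHERE scopeId = $scopeId AND (${baseWhere})`,
    params: { ...params, scopeId },
  }
}
```

Leading with scopeId in the WHERE clause also lines the query up with the scoped composite indexes listed earlier.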