SurrealDB

Overview

SurrealDB is the primary persistence layer for the Lota SDK. It stores workstreams, messages, memories, execution plans, learned skills, and runtime bootstrap state. SurrealDB was chosen for its native support for vector search (HNSW indexes for semantic retrieval), full-text search (BM25 with configurable analyzers), graph relations (used for memory relation edges), and flexible schemaful/schemaless table modes — all within a single database engine.

Schema Files

Built-in schema files live in core/infrastructure/schema/ and are applied at startup. Each file defines tables, fields, and indexes using SurrealQL.

00_workstream.surql

Defines the workstream and message persistence layer.

Tables:

| Table | Description |
| --- | --- |
| workstream | Persistent conversation threads |
| workstreamMessage | AI SDK UIMessage persistence |
| workstreamAttachment | File attachment metadata |

Key workstream fields:

| Field | Type | Description |
| --- | --- | --- |
| organizationId | option&lt;string&gt; | Owning organization |
| userId | option&lt;string&gt; | Owning user |
| agentId | option&lt;string&gt; | Agent for direct workstreams |
| mode | string | 'direct' or 'group' |
| core | bool | Whether this is a core workstream |
| coreType | option&lt;string&gt; | Core workstream type |
| title | option&lt;string&gt; | Display title |
| status | string | 'regular' or 'archived' |
| memoryBlock | option&lt;string&gt; | Serialized memory block entries (JSON) |
| memoryBlockSummary | option&lt;string&gt; | Compacted memory block summary |
| activeRunId | option&lt;string&gt; | Current agent run ID |
| compactionSummary | option&lt;string&gt; | Compacted conversation summary |
| lastCompactedMessageId | option&lt;string&gt; | Compaction boundary cursor |
| isCompacting | bool | Whether compaction is in progress |
| state | option&lt;object&gt; | Structured workstream state (flexible) |

Key indexes:

  • workstreamOrgIdx on organizationId
  • workstreamMessageWorkstreamCreatedIdx on (workstreamId, createdAt) for cursor pagination
  • workstreamMessageWorkstreamMessageUniqueIdx on (workstreamId, messageId) (UNIQUE)
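The (workstreamId, createdAt) index supports keyset (cursor) pagination. A minimal sketch of the client-side shape — the Message interface and function names here are illustrative, not the SDK's API:

```typescript
// Hypothetical sketch: cursor pagination over messages ordered by createdAt,
// mirroring what the (workstreamId, createdAt) index serves server-side:
//   SELECT * FROM workstreamMessage
//   WHERE workstreamId = $id AND createdAt > $cursor
//   ORDER BY createdAt ASC LIMIT $limit
interface Message {
  messageId: string
  createdAt: number // epoch millis here; the real field may be a datetime
}

// Return the next page of messages strictly after the cursor, oldest first.
function nextPage(messages: Message[], cursor: number | null, limit: number): Message[] {
  return messages
    .filter((m) => cursor === null || m.createdAt > cursor)
    .sort((a, b) => a.createdAt - b.createdAt)
    .slice(0, limit)
}

// The cursor for the following request is the last row's createdAt.
function nextCursor(page: Message[]): number | null {
  return page.length > 0 ? page[page.length - 1].createdAt : null
}
```

Because each page resumes from a value covered by the index, cost stays constant regardless of how deep into the thread the reader is.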

01_memory.surql

Defines the memory system with vector and full-text search.

Tables:

| Table | Type | Description |
| --- | --- | --- |
| memory | SCHEMAFULL | Memory records with embeddings |
| memoryRelation | RELATION | Semantic edges between memories |
| memoryHistory | SCHEMAFULL | Audit trail of memory changes |

Key memory fields:

| Field | Type | Description |
| --- | --- | --- |
| content | string | The fact or preference text |
| embedding | array&lt;float&gt; | 1536-dimensional vector |
| hash | string | Content hash for deduplication |
| scopeId | string | Namespace (org or agent scope) |
| memoryType | string | fact, preference, interaction, etc. |
| durability | string | core, standard, or ephemeral |
| importance | float | 0–1 importance score |
| accessCount | int | Number of times accessed |
| needsReview | bool | Flagged for review (parent superseded) |

Vector Index (HNSW)

```sql
DEFINE INDEX memoryVectorIdx
  ON TABLE memory FIELDS embedding
  HNSW DIMENSION 1536 DIST COSINE TYPE F32;
```

This index enables approximate nearest-neighbor search for semantic memory retrieval. The 1536 dimension matches the text-embedding-3-small model output.
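For intuition, the metric the HNSW index approximates is plain cosine similarity. This is an illustration only — real queries run inside SurrealDB, and the memory table uses 1536 dimensions rather than the toy 3-d vectors here:

```typescript
// Cosine similarity: dot product over the product of vector magnitudes.
// Returns 1 for identical directions, 0 for orthogonal vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}
```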

Full-Text Search Index (BM25)

```sql
DEFINE ANALYZER memoryAnalyzer
  TOKENIZERS blank, class, punct
  FILTERS lowercase, snowball(english);

DEFINE INDEX memoryContentFts
  ON TABLE memory FIELDS content
  FULLTEXT ANALYZER memoryAnalyzer BM25 HIGHLIGHTS;
```

The BM25 index provides keyword-based search with English stemming. It complements vector search for hybrid retrieval.
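The document does not specify how the SDK merges the two result lists; reciprocal rank fusion is one common way to combine a vector-ranked and a BM25-ranked list without comparing their incompatible scores. A sketch, not the SDK's actual merge logic:

```typescript
// Reciprocal rank fusion: each list contributes 1 / (k + rank + 1) per id,
// so ids that rank well in both lists rise to the top. k=60 is conventional.
function reciprocalRankFusion(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>()
  for (const ranking of rankings) {
    ranking.forEach((id, rank) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1))
    })
  }
  return [...scores.entries()].sort((a, b) => b[1] - a[1]).map(([id]) => id)
}
```

Rank-based fusion sidesteps the need to normalize cosine similarity against BM25 scores, which live on different scales.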

Other Memory Indexes

| Index | Columns | Purpose |
| --- | --- | --- |
| memoryScopeIdIdx | scopeId | Fast scope filtering |
| memoryHashIdx | hash (UNIQUE) | Deduplication |
| memoryArchivedIdx | archivedAt | Archived memory filtering |
| memoryScopeTypeArchivedIdx | (scopeId, memoryType, archivedAt) | Scoped type queries |
| memoryScopeTemporalIdx | (scopeId, archivedAt, validFrom, validUntil) | Temporal queries |
| memoryNeedsReviewIdx | (scopeId, needsReview, archivedAt, updatedAt) | Review queue |
| memoryTopLookupIdx | Multiple columns | Pre-seeded memory lookup |
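The unique memoryHashIdx implies each memory carries a deterministic content hash. A hypothetical sketch of how such a hash could be derived — the SDK's actual normalization rules are not documented here:

```typescript
import { createHash } from "node:crypto"

// Hypothetical: trim + lowercase normalization before hashing, so trivially
// different phrasings of the same fact collide on the unique hash index.
function memoryHash(content: string): string {
  const normalized = content.trim().toLowerCase()
  return createHash("sha256").update(normalized).digest("hex")
}
```

Inserting a memory whose hash already exists would then violate the unique index, which is how duplicates get rejected at the database layer.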

02_execution_plan.surql

Defines the execution plan runtime.

Tables:

| Table | Description |
| --- | --- |
| planSpec | Compiled execution plan specifications |
| planNodeSpec | Node contracts within a compiled specification |
| planRun | Execution run instances |
| planNodeRun | Per-node runtime state inside a run |
| planNodeAttempt | Submitted node attempts |
| planArtifact | First-class node output artifacts |
| planApproval | Human approval records |
| planCheckpoint | Restart-safe run snapshots |
| planValidationIssue | Blocking and warning validation records |
| planEvent | Execution audit events |

Key indexes:

  • planSpecWorkstreamStatusIdx on (workstreamId, status) for spec lookup
  • planNodeSpecPlanPositionIdx on (planSpecId, position) for ordered node retrieval
  • planRunWorkstreamStatusIdx on (workstreamId, status) for active run lookup
  • planNodeRunRunStatusIdx on (runId, status) for executor state traversal
  • planCheckpointRunSequenceIdx on (runId, sequence) for latest checkpoint lookup
  • planEventRunTimeIdx on (runId, createdAt) for recent event queries
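The "latest checkpoint lookup" that planCheckpointRunSequenceIdx serves is just a max-by-sequence selection per run; server-side this would be ORDER BY sequence DESC LIMIT 1. A minimal in-memory sketch with an illustrative Checkpoint shape:

```typescript
// Illustrative subset of a checkpoint row; the real table has more fields.
interface Checkpoint {
  runId: string
  sequence: number
}

// Return the checkpoint with the highest sequence for a run, or null.
function latestCheckpoint(rows: Checkpoint[], runId: string): Checkpoint | null {
  let latest: Checkpoint | null = null
  for (const row of rows) {
    if (row.runId !== runId) continue
    if (latest === null || row.sequence > latest.sequence) latest = row
  }
  return latest
}
```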

03_learned_skill.surql

Stores procedures learned from conversation history.

04_runtime_bootstrap.surql

Stores the database schema readiness marker used by waitForDatabaseBootstrap() to coordinate multi-process startup.

Schema Application

Schemas are applied at startup by reading each .surql file and executing it against SurrealDB:

```ts
for (const schemaFile of runtime.schemaFiles) {
  const sql = await Bun.file(schemaFile).text()
  await runtime.services.database.query(sql)
}
```

After application, publishDatabaseBootstrap() writes a fingerprint record so that other processes (workers) can wait for schema readiness.

Consumer Schema Files

Consumers and plugins can contribute additional schema files via extraSchemaFiles in the config or contributions.schemaFiles in plugins. These are appended to the built-in schema file list.
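A sketch of what the consumer side could look like — the surrounding config shape beyond extraSchemaFiles is assumed, not confirmed by this document:

```typescript
// Hypothetical consumer config fragment. Consumer files are appended after
// the built-in 00_–04_ schemas, so they may reference built-in tables.
const config = {
  database: {
    url: 'ws://localhost:8003',
    namespace: 'lotasdk',
    username: 'root',
    password: 'root',
  },
  extraSchemaFiles: ['./schema/10_my_tables.surql'],
}
```

Using a numeric prefix above 04_ for consumer files keeps the alphabetical ordering guarantee intact.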

Connection Configuration

```ts
database: {
  url: 'ws://localhost:8003',     // WebSocket URL
  namespace: 'lotasdk',          // SurrealDB namespace
  username: 'root',
  password: 'root',
}
```

The SDK database name is fixed to lotasdk and cannot be overridden.

The SDK uses the SurrealDB JavaScript driver (surrealdb package) with WebSocket transport.

Startup Flow

Schema application and database readiness follow a coordinated multi-step process:

  1. Schema file ordering. Schema files are sorted alphabetically before execution. The numeric prefix convention (00_, 01_, 02_, etc.) ensures dependencies are satisfied — workstream tables exist before execution plan tables reference them, and so on. Consumer-provided extraSchemaFiles are appended after built-in schemas.

  2. Connection with retry. connectWithStartupRetry attempts to establish the WebSocket connection to SurrealDB with exponential backoff. This handles the common case where the database container is still starting when the application boots.

  3. Schema execution. Each .surql file is read and executed sequentially as a single query against the database. All statements use IF NOT EXISTS guards, making schema application idempotent — safe to run repeatedly without side effects.

  4. Bootstrap publication. After all schema files execute successfully, publishDatabaseBootstrap() writes a fingerprint record to the runtimeBootstrap table. This record includes a hash of the applied schema files, serving as proof that the database is ready.

  5. Worker coordination. Worker processes (BullMQ workers running in separate processes) call waitForDatabaseBootstrap() on startup. This function polls for the bootstrap fingerprint record, blocking the worker until the main process has finished applying the schema. This prevents workers from executing jobs against incomplete schema.
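The retry pattern described in step 2 can be sketched as a generic backoff wrapper; the names and defaults here are illustrative, not the SDK's connectWithStartupRetry signature:

```typescript
// Exponential backoff around a connect attempt: wait 250ms, 500ms, 1000ms, ...
// between failures, rethrowing the last error once attempts are exhausted.
async function connectWithRetry<T>(
  connect: () => Promise<T>,
  attempts = 5,
  baseDelayMs = 250,
): Promise<T> {
  let lastError: unknown
  for (let i = 0; i < attempts; i++) {
    try {
      return await connect()
    } catch (err) {
      lastError = err
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** i))
    }
  }
  throw lastError
}
```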

```
Main Process                     Worker Process
─────────────                    ──────────────
connect (with retry)             connect (with retry)
apply schema files               waitForDatabaseBootstrap() ← polls
publishDatabaseBootstrap()       ← bootstrap record found
                                 start processing jobs
```
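The worker-side wait in the diagram above amounts to polling with a deadline. A sketch under assumed names — checkBootstrap stands in for the real fingerprint query, and the intervals are illustrative:

```typescript
// Poll until the bootstrap check succeeds or the deadline passes.
async function waitForBootstrap(
  checkBootstrap: () => Promise<boolean>,
  pollMs = 500,
  timeoutMs = 60_000,
): Promise<void> {
  const deadline = Date.now() + timeoutMs
  while (Date.now() < deadline) {
    if (await checkBootstrap()) return
    await new Promise((r) => setTimeout(r, pollMs))
  }
  throw new Error("database bootstrap not observed before timeout")
}
```

Failing loudly on timeout is preferable to starting a worker against a half-applied schema.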

Querying

Direct Queries

Use the database service for raw SurrealQL queries:

```ts
const results = await runtime.services.database.query(
  'SELECT * FROM workstream WHERE organizationId = $orgId AND status = $status',
  { orgId: 'org:abc', status: 'regular' }
)
```

Always use parameterized queries ($param syntax) to prevent injection and ensure proper type handling.

Transactions

Group multiple operations into an atomic transaction:

```ts
await runtime.services.database.transaction(async (tx) => {
  await tx.query('CREATE memory SET content = $content, hash = $hash, scopeId = $scopeId', {
    content: 'User prefers concise responses',
    hash: 'abc123',
    scopeId: 'org:xyz',
  })
  await tx.query('RELATE $from -> memoryRelation -> $to SET relationType = $type', {
    from: 'memory:a',
    to: 'memory:b',
    type: 'supersedes',
  })
})
```

Transactions ensure that all operations succeed or none do. This is particularly important for memory operations where a record and its relations must be created atomically.

Query memories by semantic similarity using the HNSW index:

```sql
SELECT *, vector::similarity::cosine(embedding, $queryEmbedding) AS score
FROM memory
WHERE scopeId = $scopeId AND archivedAt IS NONE
ORDER BY score DESC
LIMIT 10
```

Query memories by keyword using the BM25 index:

```sql
SELECT *, search::score(1) AS score
FROM memory
WHERE content @1@ $searchTerms AND scopeId = $scopeId
ORDER BY score DESC
LIMIT 10
```

Best Practices

  • Always use IF NOT EXISTS in schema files. Every DEFINE TABLE, DEFINE FIELD, and DEFINE INDEX statement must include the IF NOT EXISTS guard. This makes schema application idempotent and safe to run on every startup.

  • Never use OVERWRITE, REMOVE, or migration DML. The schema assumes a fresh database. There are no migration scripts, no data transforms, and no backward-compatibility shims. If a schema change is needed, update the definition directly.

  • Keep record-id inputs structural at the DB boundary. Pass record IDs as structured values (table + id) rather than relying on the nominal RecordId class. This avoids identity-matching issues across package boundaries in Bun workspace/link installs.

  • Use cursor pagination for large result sets. The workstreamMessageWorkstreamCreatedIdx index supports cursor-based pagination using createdAt as the cursor. Avoid LIMIT/OFFSET pagination for large datasets as it degrades with page depth.

  • Parameterize all queries. Always use $param placeholders instead of string interpolation. This prevents SurrealQL injection and ensures correct type handling for record IDs, dates, and arrays.

  • Scope queries by organization. Most tables include an organizationId or scopeId field. Always filter by scope to ensure tenant isolation and take advantage of scoped indexes.
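The scoping rule above can be enforced mechanically with a small wrapper that refuses to build a query without a scope. This helper is hypothetical, not part of the SDK API:

```typescript
// Hypothetical helper: every memory query gets the scope filter and
// parameter attached, so tenant isolation cannot be forgotten at call sites.
function scopedQuery(
  baseWhere: string,
  params: Record<string, unknown>,
  scopeId: string,
): { sql: string; params: Record<string, unknown> } {
  return {
    sql: `SELECT * FROM memory WHERE scopeId = $scopeId AND (${baseWhere})`,
    params: { ...params, scopeId },
  }
}
```

Leading with scopeId in the WHERE clause also lines the query up with the scoped composite indexes listed earlier.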