# SurrealDB

## Overview
SurrealDB is the primary persistence layer for the Lota SDK. It stores workstreams, messages, memories, execution plans, learned skills, and runtime bootstrap state. SurrealDB was chosen for its native support for vector search (HNSW indexes for semantic retrieval), full-text search (BM25 with configurable analyzers), graph relations (used for memory relation edges), and flexible schemaful/schemaless table modes — all within a single database engine.
## Schema Files

Built-in schema files live in `core/infrastructure/schema/` and are applied at startup. Each file defines tables, fields, and indexes using SurrealQL.
### 00_workstream.surql
Defines the workstream and message persistence layer.
Tables:
| Table | Description |
|---|---|
| `workstream` | Persistent conversation threads |
| `workstreamMessage` | AI SDK `UIMessage` persistence |
| `workstreamAttachment` | File attachment metadata |
Key workstream fields:
| Field | Type | Description |
|---|---|---|
| `organizationId` | `option<string>` | Owning organization |
| `userId` | `option<string>` | Owning user |
| `agentId` | `option<string>` | Agent for direct workstreams |
| `mode` | `string` | `'direct'` or `'group'` |
| `core` | `bool` | Whether this is a core workstream |
| `coreType` | `option<string>` | Core workstream type |
| `title` | `option<string>` | Display title |
| `status` | `string` | `'regular'` or `'archived'` |
| `memoryBlock` | `option<string>` | Serialized memory block entries (JSON) |
| `memoryBlockSummary` | `option<string>` | Compacted memory block summary |
| `activeRunId` | `option<string>` | Current agent run ID |
| `compactionSummary` | `option<string>` | Compacted conversation summary |
| `lastCompactedMessageId` | `option<string>` | Compaction boundary cursor |
| `isCompacting` | `bool` | Whether compaction is in progress |
| `state` | `option<object>` | Structured workstream state (flexible) |
Key indexes:
- `workstreamOrgIdx` on `organizationId`
- `workstreamMessageWorkstreamCreatedIdx` on `(workstreamId, createdAt)` for cursor pagination
- `workstreamMessageWorkstreamMessageUniqueIdx` on `(workstreamId, messageId)` (UNIQUE)
### 01_memory.surql
Defines the memory system with vector and full-text search.
Tables:
| Table | Type | Description |
|---|---|---|
| `memory` | SCHEMAFULL | Memory records with embeddings |
| `memoryRelation` | RELATION | Semantic edges between memories |
| `memoryHistory` | SCHEMAFULL | Audit trail of memory changes |
Key memory fields:
| Field | Type | Description |
|---|---|---|
| `content` | `string` | The fact or preference text |
| `embedding` | `array<float>` | 1536-dimensional vector |
| `hash` | `string` | Content hash for deduplication |
| `scopeId` | `string` | Namespace (org or agent scope) |
| `memoryType` | `string` | `fact`, `preference`, `interaction`, etc. |
| `durability` | `string` | `core`, `standard`, or `ephemeral` |
| `importance` | `float` | 0-1 importance score |
| `accessCount` | `int` | Number of times accessed |
| `needsReview` | `bool` | Flagged for review (parent superseded) |
#### Vector Index (HNSW)
```sql
DEFINE INDEX memoryVectorIdx
  ON TABLE memory FIELDS embedding
  HNSW DIMENSION 1536 DIST COSINE TYPE F32;
```

This index enables approximate nearest-neighbor search for semantic memory retrieval. The 1536 dimension matches the `text-embedding-3-small` model output.
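For intuition about what the index computes, here is plain cosine similarity (the `COSINE` distance is derived from it) as a minimal sketch — tiny 3-d vectors stand in for the real 1536-d embeddings:

```ts
// Cosine similarity between two vectors: dot product divided by the
// product of their magnitudes. The HNSW index approximates nearest
// neighbors under the corresponding cosine distance.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

const same = cosineSimilarity([1, 0, 0], [1, 0, 0]) // identical direction → 1
const ortho = cosineSimilarity([1, 0, 0], [0, 1, 0]) // orthogonal → 0
```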
#### Full-Text Search Index (BM25)
```sql
DEFINE ANALYZER memoryAnalyzer
  TOKENIZERS blank, class, punct
  FILTERS lowercase, snowball(english);

DEFINE INDEX memoryContentFts
  ON TABLE memory FIELDS content
  FULLTEXT ANALYZER memoryAnalyzer BM25 HIGHLIGHTS;
```

The BM25 index provides keyword-based search with English stemming. It complements vector search for hybrid retrieval.
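The SDK's exact fusion strategy is not specified here; one common way to combine the vector and BM25 rankings is reciprocal rank fusion (RRF), sketched below with illustrative memory ids (the `k = 60` constant is the value conventionally used with RRF):

```ts
// Reciprocal Rank Fusion: each result contributes 1 / (k + rank) from
// every list it appears in, so records ranked by BOTH searches rise to
// the top of the merged ordering.
function rrfMerge(vectorIds: string[], keywordIds: string[], k = 60): string[] {
  const scores = new Map<string, number>()
  for (const [rank, id] of vectorIds.entries()) {
    scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1))
  }
  for (const [rank, id] of keywordIds.entries()) {
    scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1))
  }
  return [...scores.entries()].sort((a, b) => b[1] - a[1]).map(([id]) => id)
}

// memory:b appears in both rankings, so it beats the single-list leaders.
const merged = rrfMerge(['memory:a', 'memory:b', 'memory:c'], ['memory:b', 'memory:d'])
```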
#### Other Memory Indexes
| Index | Columns | Purpose |
|---|---|---|
| `memoryScopeIdIdx` | `scopeId` | Fast scope filtering |
| `memoryHashIdx` | `hash` (UNIQUE) | Deduplication |
| `memoryArchivedIdx` | `archivedAt` | Archived memory filtering |
| `memoryScopeTypeArchivedIdx` | `(scopeId, memoryType, archivedAt)` | Scoped type queries |
| `memoryScopeTemporalIdx` | `(scopeId, archivedAt, validFrom, validUntil)` | Temporal queries |
| `memoryNeedsReviewIdx` | `(scopeId, needsReview, archivedAt, updatedAt)` | Review queue |
| `memoryTopLookupIdx` | Multiple columns | Pre-seeded memory lookup |
### 02_execution_plan.surql
Defines the execution plan runtime.
Tables:
| Table | Description |
|---|---|
| `planSpec` | Compiled execution plan specifications |
| `planNodeSpec` | Node contracts within a compiled specification |
| `planRun` | Execution run instances |
| `planNodeRun` | Per-node runtime state inside a run |
| `planNodeAttempt` | Submitted node attempts |
| `planArtifact` | First-class node output artifacts |
| `planApproval` | Human approval records |
| `planCheckpoint` | Restart-safe run snapshots |
| `planValidationIssue` | Blocking and warning validation records |
| `planEvent` | Execution audit events |
Key indexes:
- `planSpecWorkstreamStatusIdx` on `(workstreamId, status)` for spec lookup
- `planNodeSpecPlanPositionIdx` on `(planSpecId, position)` for ordered node retrieval
- `planRunWorkstreamStatusIdx` on `(workstreamId, status)` for active run lookup
- `planNodeRunRunStatusIdx` on `(runId, status)` for executor state traversal
- `planCheckpointRunSequenceIdx` on `(runId, sequence)` for latest checkpoint lookup
- `planEventRunTimeIdx` on `(runId, createdAt)` for recent event queries
### 03_learned_skill.surql
Stores procedures learned from conversation history.
### 04_runtime_bootstrap.surql

Stores the database schema readiness marker used by `waitForDatabaseBootstrap()` to coordinate multi-process startup.
## Schema Application

Schemas are applied at startup by reading each `.surql` file and executing it against SurrealDB:
```ts
for (const schemaFile of runtime.schemaFiles) {
  const sql = await Bun.file(schemaFile).text()
  await runtime.services.database.query(sql)
}
```

After application, `publishDatabaseBootstrap()` writes a fingerprint record so that other processes (workers) can wait for schema readiness.
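The fingerprint idea can be sketched as a hash over the ordered schema file contents. This is illustrative only — the actual record format written by `publishDatabaseBootstrap()` is internal to the SDK:

```ts
import { createHash } from 'node:crypto'

// Hash the concatenation of the schema files in their applied order, so
// the bootstrap record identifies exactly which schema version is live.
function schemaFingerprint(schemaContents: string[]): string {
  const hash = createHash('sha256')
  for (const sql of schemaContents) {
    hash.update(sql)
  }
  return hash.digest('hex')
}

const fp = schemaFingerprint(['DEFINE TABLE IF NOT EXISTS workstream;'])
```

Because the hash is deterministic, a worker comparing fingerprints can detect when it is running against a different schema version than the one the main process applied.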
## Consumer Schema Files

Consumers and plugins can contribute additional schema files via `extraSchemaFiles` in the config or `contributions.schemaFiles` in plugins. These are appended to the built-in schema file list.
## Connection Configuration
```ts
database: {
  url: 'ws://localhost:8003', // WebSocket URL
  namespace: 'lotasdk',       // SurrealDB namespace
  username: 'root',
  password: 'root',
}
```

The SDK database name is fixed to `lotasdk` and cannot be overridden.
The SDK uses the SurrealDB JavaScript driver (surrealdb package) with WebSocket transport.
## Startup Flow
Schema application and database readiness follow a coordinated multi-step process:
1. **Schema file ordering.** Schema files are sorted alphabetically before execution. The numeric prefix convention (`00_`, `01_`, `02_`, etc.) ensures dependencies are satisfied — workstream tables exist before execution plan tables reference them, and so on. Consumer-provided `extraSchemaFiles` are appended after built-in schemas.
2. **Connection with retry.** `connectWithStartupRetry` attempts to establish the WebSocket connection to SurrealDB with exponential backoff. This handles the common case where the database container is still starting when the application boots.
3. **Schema execution.** Each `.surql` file is read and executed sequentially as a single query against the database. All statements use `IF NOT EXISTS` guards, making schema application idempotent — safe to run repeatedly without side effects.
4. **Bootstrap publication.** After all schema files execute successfully, `publishDatabaseBootstrap()` writes a fingerprint record to the `runtimeBootstrap` table. This record includes a hash of the applied schema files, serving as proof that the database is ready.
5. **Worker coordination.** Worker processes (BullMQ workers running in separate processes) call `waitForDatabaseBootstrap()` on startup. This function polls for the bootstrap fingerprint record, blocking the worker until the main process has finished applying the schema. This prevents workers from executing jobs against incomplete schema.
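The worker-side wait described above can be sketched as a generic polling helper. The real `waitForDatabaseBootstrap()` queries the `runtimeBootstrap` table; here the readiness check is any async predicate, so the sketch stays self-contained:

```ts
// Poll a readiness check until it passes or a deadline is reached.
// intervalMs and timeoutMs defaults are illustrative, not the SDK's.
async function waitUntil(
  check: () => Promise<boolean>,
  { intervalMs = 250, timeoutMs = 30_000 } = {},
): Promise<void> {
  const deadline = Date.now() + timeoutMs
  while (true) {
    if (await check()) return // readiness record found
    if (Date.now() >= deadline) throw new Error('timed out waiting for readiness')
    await new Promise((resolve) => setTimeout(resolve, intervalMs))
  }
}

// Example: the check succeeds on the third poll.
let polls = 0
await waitUntil(async () => ++polls >= 3, { intervalMs: 1 })
```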
```
Main Process                    Worker Process
─────────────                   ──────────────
connect (with retry)            connect (with retry)
apply schema files              waitForDatabaseBootstrap()  ← polls
publishDatabaseBootstrap()      ← bootstrap record found
                                start processing jobs
```

## Querying
### Direct Queries
Use the database service for raw SurrealQL queries:
```ts
const results = await runtime.services.database.query(
  'SELECT * FROM workstream WHERE organizationId = $orgId AND status = $status',
  { orgId: 'org:abc', status: 'regular' }
)
```

Always use parameterized queries (`$param` syntax) to prevent injection and ensure proper type handling.
### Transactions
Group multiple operations into an atomic transaction:
```ts
await runtime.services.database.transaction(async (tx) => {
  await tx.query('CREATE memory SET content = $content, hash = $hash, scopeId = $scopeId', {
    content: 'User prefers concise responses',
    hash: 'abc123',
    scopeId: 'org:xyz',
  })
  await tx.query('RELATE $from -> memoryRelation -> $to SET relationType = $type', {
    from: 'memory:a',
    to: 'memory:b',
    type: 'supersedes',
  })
})
```

Transactions ensure that all operations succeed or none do. This is particularly important for memory operations where a record and its relations must be created atomically.
### Vector Search
Query memories by semantic similarity using the HNSW index:
```sql
SELECT *, vector::similarity::cosine(embedding, $queryEmbedding) AS score
FROM memory
WHERE scopeId = $scopeId AND archivedAt IS NONE
ORDER BY score DESC
LIMIT 10
```

### Full-Text Search
Query memories by keyword using the BM25 index:
```sql
SELECT *, search::score(1) AS score
FROM memory
WHERE content @1@ $searchTerms AND scopeId = $scopeId
ORDER BY score DESC
LIMIT 10
```

## Best Practices
- **Always use `IF NOT EXISTS` in schema files.** Every `DEFINE TABLE`, `DEFINE FIELD`, and `DEFINE INDEX` statement must include the `IF NOT EXISTS` guard. This makes schema application idempotent and safe to run on every startup.
- **Never use `OVERWRITE`, `REMOVE`, or migration DML.** The schema assumes a fresh database. There are no migration scripts, no data transforms, and no backward-compatibility shims. If a schema change is needed, update the definition directly.
- **Keep record-id inputs structural at the DB boundary.** Pass record IDs as structured values (table + id) rather than relying on the nominal `RecordId` class. This avoids identity-matching issues across package boundaries in Bun workspace/link installs.
- **Use cursor pagination for large result sets.** The `workstreamMessageWorkstreamCreatedIdx` index supports cursor-based pagination using `createdAt` as the cursor. Avoid `LIMIT`/`OFFSET` pagination for large datasets as it degrades with page depth.
- **Parameterize all queries.** Always use `$param` placeholders instead of string interpolation. This prevents SurrealQL injection and ensures correct type handling for record IDs, dates, and arrays.
- **Scope queries by organization.** Most tables include an `organizationId` or `scopeId` field. Always filter by scope to ensure tenant isolation and take advantage of scoped indexes.
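The cursor-pagination practice can be sketched as follows. The query text mirrors the schema above, but both it and the `nextCursor` helper are illustrative, not SDK APIs:

```ts
// Cursor pagination over workstreamMessage, backed by the
// (workstreamId, createdAt) index. Pages are fetched newest-first and the
// createdAt of the last row becomes the cursor for the next page.
const PAGE_QUERY = `
  SELECT * FROM workstreamMessage
  WHERE workstreamId = $workstreamId
    AND ($cursor IS NONE OR createdAt < $cursor)
  ORDER BY createdAt DESC
  LIMIT $pageSize
`

interface MessageRow {
  messageId: string
  createdAt: string
}

// A short page means the result set is exhausted; otherwise continue
// from the createdAt of the last row on this page.
function nextCursor(page: MessageRow[], pageSize: number): string | null {
  if (page.length < pageSize) return null
  return page[page.length - 1].createdAt
}
```

Unlike `LIMIT`/`OFFSET`, each page here is an index range scan starting at the cursor, so cost stays constant regardless of how deep the reader has paged.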