IdentityDB/docs/plans/2026-05-11-identitydb-memory-expansion.md


# IdentityDB Memory Expansion Implementation Plan
> **For Hermes:** Use the `subagent-driven-development` skill to execute this plan task-by-task. Enforce strict TDD for every production behavior.

**Goal:** Extend IdentityDB with explicit topic hierarchy, topic alias/canonicalization controls, and portable semantic fact search with embedding-backed similarity APIs.

**Architecture:** Keep the relational core portable across SQLite, PostgreSQL, MySQL, and MariaDB by introducing dedicated extension tables: `topic_relations` for abstract/concrete hierarchy, `topic_aliases` for canonical topic resolution, and `fact_embeddings` for semantic indexing. Expose high-level APIs from `IdentityDB` while preserving DB-agnostic behavior by doing semantic scoring in the application layer first.

**Tech Stack:** TypeScript, Bun, Node.js, Kysely, better-sqlite3, pg, mysql2, Vitest, tsup.
---
## Scope and interpretation
- Topic hierarchy must be explicit rather than inferred only from shared facts.
- Canonical topics must remain first-class records in `topics`; aliases should resolve into those topics without duplicating canonical rows.
- Semantic search must stay provider-agnostic through a pluggable `EmbeddingProvider` interface.
- The first semantic-search release should favor portability and deterministic testing over ANN/vector-extension optimization.
- Ingestion should be able to detect likely duplicate facts by semantic similarity without forcing automatic merges.
---
## Data model additions
### `topic_relations`
- `parent_topic_id`
- `child_topic_id`
- `relation` — initially `parent_of`
- `created_at`
- composite primary key on (`parent_topic_id`, `child_topic_id`, `relation`)
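The composite key keeps each parent/child/relation triple unique, but the plan does not say whether self-links or cycles are rejected. The following is a minimal in-memory sketch that assumes both are rejected before insert. `tryLink`, `hasAncestor`, `relationKey`, and the string IDs are hypothetical names for illustration, not part of the planned API.

```ts
// Hypothetical in-memory stand-in for `topic_relations` rows.
interface TopicRelationRow {
  parent_topic_id: string;
  child_topic_id: string;
  relation: 'parent_of';
  created_at: string;
}

type LinkResult = 'inserted' | 'duplicate' | 'rejected_self_link' | 'rejected_cycle';

// Mirrors the composite primary key (parent_topic_id, child_topic_id, relation).
function relationKey(parentId: string, childId: string, relation: string): string {
  return JSON.stringify([parentId, childId, relation]);
}

// True when `ancestorId` is reachable by walking parent links upward from `startId`.
function hasAncestor(rows: TopicRelationRow[], startId: string, ancestorId: string): boolean {
  const seen = new Set<string>();
  const stack: string[] = [startId];
  while (stack.length > 0) {
    const current = stack.pop() as string;
    if (current === ancestorId) return true;
    if (seen.has(current)) continue;
    seen.add(current);
    for (const row of rows) {
      if (row.child_topic_id === current) stack.push(row.parent_topic_id);
    }
  }
  return false;
}

// Assumption: self-links and cycles are rejected; duplicates are reported, not re-inserted.
function tryLink(rows: TopicRelationRow[], parentId: string, childId: string): LinkResult {
  if (parentId === childId) return 'rejected_self_link';
  const key = relationKey(parentId, childId, 'parent_of');
  const exists = rows.some(
    (r) => relationKey(r.parent_topic_id, r.child_topic_id, r.relation) === key,
  );
  if (exists) return 'duplicate';
  // Linking parent -> child closes a loop if the child is already an ancestor of the parent.
  if (hasAncestor(rows, parentId, childId)) return 'rejected_cycle';
  rows.push({
    parent_topic_id: parentId,
    child_topic_id: childId,
    relation: 'parent_of',
    created_at: new Date().toISOString(),
  });
  return 'inserted';
}
```

In the real schema the duplicate case would be caught by the primary key itself; the cycle check has to live in application code because none of the target databases enforce it declaratively.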
### `topic_aliases`
- `id`
- `topic_id`
- `alias`
- `normalized_alias`
- `is_primary`
- `created_at`
- `updated_at`
- unique key on `normalized_alias`
### `fact_embeddings`
- `fact_id`
- `model`
- `dimensions`
- `embedding`
- `content_hash`
- `created_at`
- `updated_at`
- composite primary key on (`fact_id`, `model`)
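One portable way to fill these columns, sketched under assumptions the plan does not fix: `embedding` is stored as JSON text so that all four databases can hold it, `dimensions` is validated on read, and `content_hash` is compared against the current statement to decide whether a fact needs re-embedding. The helper names are hypothetical, and the FNV-1a hash is only a dependency-free stand-in for whatever hash the implementation settles on.

```ts
// Row shape following the column list above; the column types are assumptions.
interface FactEmbeddingRow {
  fact_id: string;
  model: string;
  dimensions: number;
  embedding: string; // assumption: JSON-encoded number[] stored as text
  content_hash: string;
  created_at: string;
  updated_at: string;
}

// Stand-in hash: 32-bit FNV-1a over UTF-16 code units, rendered as 8 hex digits.
function contentHash(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, '0');
}

function encodeEmbedding(vector: number[]): string {
  return JSON.stringify(vector);
}

// Parses the stored text and rejects rows whose length disagrees with `dimensions`.
function decodeEmbedding(row: Pick<FactEmbeddingRow, 'embedding' | 'dimensions'>): number[] {
  const parsed: unknown = JSON.parse(row.embedding);
  const valid =
    Array.isArray(parsed) &&
    parsed.length === row.dimensions &&
    parsed.every((v) => typeof v === 'number' && Number.isFinite(v));
  if (!valid) throw new Error('Stored embedding does not match its declared dimensions');
  return parsed as number[];
}

// A fact needs (re)indexing when no row exists for the model or the statement changed.
function needsReindex(row: FactEmbeddingRow | undefined, statement: string): boolean {
  return row === undefined || row.content_hash !== contentHash(statement);
}
```

Because the primary key is (`fact_id`, `model`), a check like `needsReindex` would run once per model, which lets two providers index the same fact side by side.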
---
## Public API additions
### Topic hierarchy
```ts
await db.linkTopics({
  parentName: 'programming language',
  childName: 'TypeScript',
});
await db.getTopicChildren('programming language');
await db.getTopicParents('TypeScript');
await db.getTopicLineage('TypeScript');
```
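The plan does not specify what `getTopicLineage` returns. As a sketch, assume it returns every ancestor of a topic, nearest first, and that topics may have several parents. The functions below work on an in-memory edge list with hypothetical names (`TopicEdge`, `getChildren`, `getParents`, `getLineage`); the real methods would read `topic_relations` instead.

```ts
// Hypothetical edge shape; topic names double as IDs to keep the example short.
interface TopicEdge {
  parentId: string;
  childId: string;
}

function getChildren(edges: TopicEdge[], topicId: string): string[] {
  return edges.filter((e) => e.parentId === topicId).map((e) => e.childId);
}

function getParents(edges: TopicEdge[], topicId: string): string[] {
  return edges.filter((e) => e.childId === topicId).map((e) => e.parentId);
}

// Breadth-first walk upward: direct parents first, then grandparents, and so on.
// The `seen` set keeps shared ancestors from repeating and stops on cyclic data.
function getLineage(edges: TopicEdge[], topicId: string): string[] {
  const lineage: string[] = [];
  const seen = new Set<string>([topicId]);
  let frontier: string[] = [topicId];
  while (frontier.length > 0) {
    const next: string[] = [];
    for (const current of frontier) {
      for (const parent of getParents(edges, current)) {
        if (seen.has(parent)) continue;
        seen.add(parent);
        lineage.push(parent);
        next.push(parent);
      }
    }
    frontier = next;
  }
  return lineage;
}
```

Walking level by level in application code avoids recursive SQL, which keeps the query layer identical across the supported databases at the cost of one round trip per level.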
### Topic aliases
```ts
await db.addTopicAlias('TypeScript', 'TS');
await db.resolveTopic('ts');
await db.getTopicAliases('TypeScript');
```
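The resolution flow behind these calls might look like the sketch below. It assumes that aliases are normalized by NFKC, trimming, lower-casing, and whitespace collapsing, that the canonical name is itself registered as the primary alias, and that an alias already owned by another topic is rejected (mirroring the unique key on `normalized_alias`). `AliasIndex` and `normalizeAlias` are hypothetical names, not the planned implementation.

```ts
interface Topic {
  id: string;
  name: string;
}

interface TopicAlias {
  topicId: string;
  alias: string;
  normalizedAlias: string;
  isPrimary: boolean;
}

// Assumed normalization rules; the plan only names the `normalized_alias` column.
function normalizeAlias(raw: string): string {
  return raw.normalize('NFKC').trim().toLowerCase().replace(/\s+/g, ' ');
}

class AliasIndex {
  private topics = new Map<string, Topic>();
  // Keyed by normalized alias, which models the unique constraint.
  private byNormalized = new Map<string, TopicAlias>();

  addTopic(topic: Topic): void {
    this.topics.set(topic.id, topic);
    // Assumption: the canonical name is stored as the topic's primary alias.
    this.addAlias(topic.id, topic.name, true);
  }

  addAlias(topicId: string, alias: string, isPrimary = false): void {
    if (!this.topics.has(topicId)) throw new Error(`Unknown topic: ${topicId}`);
    const normalizedAlias = normalizeAlias(alias);
    const existing = this.byNormalized.get(normalizedAlias);
    if (existing) {
      if (existing.topicId !== topicId) {
        throw new Error(`Alias "${alias}" already resolves to another topic`);
      }
      return; // re-adding the same alias to the same topic is a no-op
    }
    this.byNormalized.set(normalizedAlias, { topicId, alias, normalizedAlias, isPrimary });
  }

  // Resolves a canonical name or any alias to the single canonical topic.
  resolve(nameOrAlias: string): Topic | undefined {
    const hit = this.byNormalized.get(normalizeAlias(nameOrAlias));
    return hit ? this.topics.get(hit.topicId) : undefined;
  }

  // Lists non-primary aliases in insertion order.
  aliasesOf(topicId: string): string[] {
    return Array.from(this.byNormalized.values())
      .filter((a) => a.topicId === topicId && !a.isPrimary)
      .map((a) => a.alias);
  }
}
```

Routing both canonical names and aliases through one normalized lookup means `resolveTopic('ts')` and `resolveTopic(' TypeScript ')` land on the same `topics` row without a second canonical record.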
### Semantic indexing and search
```ts
await db.indexFactEmbeddings({ provider });
await db.searchFacts({ query: 'When did I start using TS?', provider, limit: 5 });
await db.findSimilarFacts({ statement: 'I started using TypeScript in 2025.', provider, threshold: 0.9 });
```
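Application-layer scoring, as the architecture describes it, can be sketched as plain cosine similarity over stored vectors. The `EmbeddingProvider` shape below (`model` plus a batch `embed` method) is an assumption, since the plan names the interface without defining it. `KeywordProvider` is a toy, deterministic provider of the kind a test suite might use, and `rankFacts` and `searchFactsSketch` are hypothetical helpers.

```ts
// Assumed provider contract; the real interface may differ.
interface EmbeddingProvider {
  model: string;
  embed(texts: string[]): Promise<number[][]>;
}

interface IndexedFact {
  id: string;
  statement: string;
  embedding: number[];
}

interface SearchHit {
  id: string;
  statement: string;
  score: number;
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('Embedding dimensions differ');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // zero vectors match nothing
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Scores every fact, sorts by descending score (ties broken by id), keeps `limit` hits.
function rankFacts(queryVector: number[], facts: IndexedFact[], limit: number): SearchHit[] {
  return facts
    .map((f) => ({
      id: f.id,
      statement: f.statement,
      score: cosineSimilarity(queryVector, f.embedding),
    }))
    .sort((x, y) => y.score - x.score || x.id.localeCompare(y.id))
    .slice(0, limit);
}

// Toy provider for deterministic tests: counts occurrences of fixed vocabulary terms.
class KeywordProvider implements EmbeddingProvider {
  readonly model = 'toy-keyword-v1';
  private readonly vocabulary: string[];

  constructor(vocabulary: string[]) {
    this.vocabulary = vocabulary;
  }

  async embed(texts: string[]): Promise<number[][]> {
    return texts.map((text) => {
      const words = text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
      return this.vocabulary.map((term) => words.filter((w) => w === term).length);
    });
  }
}

async function searchFactsSketch(
  facts: IndexedFact[],
  query: string,
  provider: EmbeddingProvider,
  limit: number,
): Promise<SearchHit[]> {
  const [queryVector] = await provider.embed([query]);
  return rankFacts(queryVector, facts, limit);
}
```

A full scan like this is linear in the number of indexed facts, which matches the plan's preference for portability and deterministic tests over ANN or vector-extension optimization in the first release.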
### Dedup-aware ingestion
```ts
await db.ingestStatement(statement, {
  extractor,
  dedup: {
    provider,
    threshold: 0.9,
  },
});
```
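Since ingestion should flag likely duplicates without forcing merges, one possible shape is to store the new fact regardless and return the near matches as hints. The sketch below assumes cosine similarity against already-embedded facts and a result field named `possibleDuplicates`; every type and function name here is hypothetical.

```ts
interface StoredFact {
  id: string;
  statement: string;
  embedding: number[];
}

interface DedupHint {
  factId: string;
  statement: string;
  similarity: number;
}

interface IngestOutcome {
  storedFactId: string;
  possibleDuplicates: DedupHint[];
}

function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('Embedding dimensions differ');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Collects existing facts at or above the threshold, then stores the new fact anyway.
function ingestWithDedupHints(
  store: StoredFact[],
  statement: string,
  embedding: number[],
  threshold: number,
): IngestOutcome {
  const possibleDuplicates = store
    .map((f) => ({ factId: f.id, statement: f.statement, similarity: cosine(embedding, f.embedding) }))
    .filter((hint) => hint.similarity >= threshold)
    .sort((x, y) => y.similarity - x.similarity);

  // Hints only: nothing is merged or dropped, so the caller decides what to do.
  const storedFactId = `fact-${store.length + 1}`;
  store.push({ id: storedFactId, statement, embedding });
  return { storedFactId, possibleDuplicates };
}
```

Returning hints instead of merging keeps ingestion lossless; a caller that wants stricter behavior can inspect `possibleDuplicates` and decide whether to merge or discard.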
---
## Execution plan
### Task 1: Lock the extension schema and APIs with failing tests
**Objective:** Define tests for hierarchy, aliases, and semantic search before production code changes.
**Files:**
- Modify: `tests/migrations.test.ts`
- Modify: `tests/identity-db.test.ts`
- Modify: `tests/queries.test.ts`
- Create: `tests/semantic-search.test.ts`
- Modify: `src/types/api.ts`
- Modify: `src/types/domain.ts`
- Modify: `src/types/database.ts`
- Modify: `src/core/schema.ts`
**Verification:**
- Run focused test commands and confirm they fail for missing behavior.
### Task 2: Implement topic hierarchy storage and query APIs
**Objective:** Add `topic_relations` schema support plus parent/child/lineage APIs.
**Files:**
- Modify: `src/core/migrations.ts`
- Modify: `src/core/identity-db.ts`
- Modify: `src/core/utils.ts`
- Modify: `src/queries/topics.ts`
- Modify: `src/types/api.ts`
- Modify: `src/types/domain.ts`
- Modify: `src/types/database.ts`
**Verification:**
- Run hierarchy-focused tests until green.
### Task 3: Implement canonical topic aliases
**Objective:** Add alias storage, alias-aware resolution, and canonical topic lookup semantics.
**Files:**
- Modify: `src/core/migrations.ts`
- Modify: `src/core/identity-db.ts`
- Modify: `src/queries/topics.ts`
- Modify: `src/core/utils.ts`
- Modify: `src/types/api.ts`
- Modify: `src/types/domain.ts`
- Modify: `src/types/database.ts`
**Verification:**
- Run alias-focused tests until green.
### Task 4: Implement embedding-backed indexing and semantic search
**Objective:** Add `EmbeddingProvider`, embedding storage, search APIs, and similarity ranking.
**Files:**
- Create: `src/embeddings/provider.ts`
- Create: `src/queries/embeddings.ts`
- Modify: `src/core/migrations.ts`
- Modify: `src/core/identity-db.ts`
- Modify: `src/core/utils.ts`
- Modify: `src/types/api.ts`
- Modify: `src/types/domain.ts`
- Modify: `src/types/database.ts`
- Modify: `src/index.ts`
- Modify: `tests/semantic-search.test.ts` (created in Task 1)
**Verification:**
- Run semantic-search tests until green.
### Task 5: Add dedup-aware ingestion, docs, and full verification
**Objective:** Surface semantic dedup hints during ingestion, document the new APIs, and run the full suite.
**Files:**
- Modify: `src/ingestion/types.ts`
- Modify: `src/core/identity-db.ts`
- Modify: `README.md`
- Modify: `src/index.ts`
**Verification:**
- Run `bun run test && bun run check && bun run build`
- Confirm that `README.md` reflects the new public surface.