diff --git a/docs/plans/2026-05-11-identitydb-memory-expansion.md b/docs/plans/2026-05-11-identitydb-memory-expansion.md
new file mode 100644
index 0000000..9f7a0d7
--- /dev/null
+++ b/docs/plans/2026-05-11-identitydb-memory-expansion.md
@@ -0,0 +1,181 @@
+# IdentityDB Memory Expansion Implementation Plan
+
+> **For Hermes:** Use the `subagent-driven-development` skill to execute this plan task-by-task. Enforce strict TDD for every production behavior.
+
+**Goal:** Extend IdentityDB with explicit topic hierarchy, topic alias/canonicalization controls, and portable semantic fact search with embedding-backed similarity APIs.
+
+**Architecture:** Keep the relational core portable across SQLite, PostgreSQL, MySQL, and MariaDB by introducing dedicated extension tables: `topic_relations` for the abstract/concrete hierarchy, `topic_aliases` for canonical topic resolution, and `fact_embeddings` for semantic indexing. Expose high-level APIs from `IdentityDB` and keep behavior DB-agnostic by computing semantic similarity scores in the application layer for the first release, rather than relying on database-specific vector extensions.
+
+**Tech Stack:** TypeScript, Bun, Node.js, Kysely, better-sqlite3, pg, mysql2, Vitest, tsup.
+
+---
+
+## Scope and interpretation
+
+- Topic hierarchy must be explicit rather than inferred only from shared facts.
+- Canonical topics must remain first-class records in `topics`; aliases should resolve to those topics without duplicating canonical rows.
+- Semantic search must stay provider-agnostic through a pluggable `EmbeddingProvider` interface.
+- The first semantic-search release should favor portability and deterministic testing over ANN/vector-extension optimization.
+- Ingestion should be able to detect likely duplicate facts by semantic similarity without forcing automatic merges.
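A pluggable provider contract could look like the minimal sketch below. It assumes a batch `embed(texts)` method that returns one vector per input, in input order. The names `EmbeddingProvider`, `model`, `dimensions`, `hashEmbed`, and `createFakeProvider` are all hypothetical. The deterministic fake provider shows one way to keep semantic-search tests reproducible without calling a real embedding model.

```typescript
// Hypothetical provider contract (the plan puts the real one in src/embeddings/provider.ts).
interface EmbeddingProvider {
  /** Model identifier, stored in fact_embeddings.model. */
  readonly model: string;
  /** Vector length, stored in fact_embeddings.dimensions. */
  readonly dimensions: number;
  /** Returns one embedding per input text, in input order. */
  embed(texts: string[]): Promise<number[][]>;
}

// Deterministic bag-of-words embedding for tests: each lowercase alphanumeric
// token is hashed with 32-bit FNV-1a into a bucket, then the vector is
// L2-normalized. Not semantically meaningful, but stable across runs.
function hashEmbed(text: string, dimensions: number): number[] {
  const vector = new Array<number>(dimensions).fill(0);
  const tokens = text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  for (const token of tokens) {
    let hash = 0x811c9dc5; // FNV-1a offset basis
    for (let i = 0; i < token.length; i++) {
      hash ^= token.charCodeAt(i);
      hash = Math.imul(hash, 0x01000193) >>> 0; // FNV prime, kept unsigned
    }
    vector[hash % dimensions] += 1;
  }
  const norm = Math.sqrt(vector.reduce((sum, x) => sum + x * x, 0));
  // Text with no tokens yields the zero vector rather than NaNs.
  return norm === 0 ? vector : vector.map((x) => x / norm);
}

// Hypothetical fake provider for Vitest suites such as tests/semantic-search.test.ts.
function createFakeProvider(dimensions = 64): EmbeddingProvider {
  return {
    model: `fake-hash-${dimensions}`,
    dimensions,
    embed: async (texts) => texts.map((t) => hashEmbed(t, dimensions)),
  };
}
```

With a fake like this, a given statement always maps to the same vector, so threshold assertions in tests do not drift between runs.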
+
+---
+
+## Data model additions
+
+### `topic_relations`
+- `parent_topic_id`
+- `child_topic_id`
+- `relation`: initially `parent_of`
+- `created_at`
+- composite primary key on (`parent_topic_id`, `child_topic_id`, `relation`)
+
+### `topic_aliases`
+- `id`
+- `topic_id`
+- `alias`
+- `normalized_alias`
+- `is_primary`
+- `created_at`
+- `updated_at`
+- unique key on `normalized_alias`
+
+### `fact_embeddings`
+- `fact_id`
+- `model`
+- `dimensions`
+- `embedding`
+- `content_hash`
+- `created_at`
+- `updated_at`
+- composite primary key on (`fact_id`, `model`)
+
+---
+
+## Public API additions
+
+### Topic hierarchy
+
+```ts
+await db.linkTopics({
+  parentName: 'programming language',
+  childName: 'TypeScript',
+});
+
+await db.getTopicChildren('programming language');
+await db.getTopicParents('TypeScript');
+await db.getTopicLineage('TypeScript');
+```
+
+### Topic aliases
+
+```ts
+await db.addTopicAlias('TypeScript', 'TS');
+await db.resolveTopic('ts');
+await db.getTopicAliases('TypeScript');
+```
+
+### Semantic indexing and search
+
+```ts
+await db.indexFactEmbeddings({ provider });
+await db.searchFacts({ query: 'When did I start using TS?', provider, limit: 5 });
+await db.findSimilarFacts({ statement: 'I started using TypeScript in 2025.', provider, threshold: 0.9 });
+```
+
+### Dedup-aware ingestion
+
+```ts
+await db.ingestStatement(statement, {
+  extractor,
+  dedup: {
+    provider,
+    threshold: 0.9,
+  },
+});
+```
+
+---
+
+## Execution plan
+
+### Task 1: Lock the extension schema and APIs with failing tests
+
+**Objective:** Define tests for hierarchy, aliases, and semantic search before any production code changes.
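Before writing any SQL, the hierarchy and alias semantics that these tests lock in can be pinned down as a small in-memory reference model. The sketch rests on several assumptions. Aliases are normalized by trimming, lowercasing, and collapsing internal whitespace. `getTopicLineage` returns ancestors nearest-first. Self-links and cycles are rejected. These rules are hypothetical, as are `normalizeAlias` and `TopicModel`; the tests should confirm or replace them.

```typescript
// Hypothetical rule for deriving topic_aliases.normalized_alias.
function normalizeAlias(alias: string): string {
  return alias.trim().toLowerCase().replace(/\s+/g, " ");
}

// Hypothetical in-memory model of topic_aliases + topic_relations semantics.
class TopicModel {
  // normalized alias or canonical name -> canonical topic name
  private aliases = new Map<string, string>();
  // child topic -> its parent topics (relation "parent_of")
  private parents = new Map<string, Set<string>>();

  addTopic(name: string): void {
    // Canonical names resolve to themselves, so no duplicate canonical rows are needed.
    this.aliases.set(normalizeAlias(name), name);
  }

  addAlias(topic: string, alias: string): void {
    const key = normalizeAlias(alias);
    const existing = this.aliases.get(key);
    // Mirrors the unique key on normalized_alias: one alias maps to one topic.
    if (existing !== undefined && existing !== topic) {
      throw new Error(`alias "${alias}" already resolves to "${existing}"`);
    }
    this.aliases.set(key, topic);
  }

  resolve(nameOrAlias: string): string | undefined {
    return this.aliases.get(normalizeAlias(nameOrAlias));
  }

  link(parent: string, child: string): void {
    // Assumed rule: a cycle exists if the child is already an ancestor of the parent.
    if (parent === child || this.lineage(parent).includes(child)) {
      throw new Error(`linking "${parent}" -> "${child}" would create a cycle`);
    }
    const set = this.parents.get(child) ?? new Set<string>();
    set.add(parent);
    this.parents.set(child, set);
  }

  // Breadth-first walk over parent_of edges: ancestors nearest-first, no duplicates.
  lineage(topic: string): string[] {
    const result: string[] = [];
    const seen = new Set<string>([topic]);
    const queue = [topic];
    while (queue.length > 0) {
      const current = queue.shift()!;
      for (const parent of this.parents.get(current) ?? []) {
        if (!seen.has(parent)) {
          seen.add(parent);
          result.push(parent);
          queue.push(parent);
        }
      }
    }
    return result;
  }
}
```

Assertions written against a model like this can be ported almost directly into `tests/identity-db.test.ts` and `tests/queries.test.ts` once the real APIs exist.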
+
+**Files:**
+- Modify: `tests/migrations.test.ts`
+- Modify: `tests/identity-db.test.ts`
+- Modify: `tests/queries.test.ts`
+- Create: `tests/semantic-search.test.ts`
+- Modify: `src/types/api.ts`
+- Modify: `src/types/domain.ts`
+- Modify: `src/types/database.ts`
+- Modify: `src/core/schema.ts`
+
+**Verification:**
+- Run the focused test commands and confirm each new test fails because the behavior is not yet implemented.
+
+### Task 2: Implement topic hierarchy storage and query APIs
+
+**Objective:** Add `topic_relations` schema support plus parent/child/lineage APIs.
+
+**Files:**
+- Modify: `src/core/migrations.ts`
+- Modify: `src/core/identity-db.ts`
+- Modify: `src/core/utils.ts`
+- Modify: `src/queries/topics.ts`
+- Modify: `src/types/api.ts`
+- Modify: `src/types/domain.ts`
+- Modify: `src/types/database.ts`
+
+**Verification:**
+- Run hierarchy-focused tests until green.
+
+### Task 3: Implement canonical topic aliases
+
+**Objective:** Add alias storage, alias-aware resolution, and canonical topic lookup semantics.
+
+**Files:**
+- Modify: `src/core/migrations.ts`
+- Modify: `src/core/identity-db.ts`
+- Modify: `src/queries/topics.ts`
+- Modify: `src/core/utils.ts`
+- Modify: `src/types/api.ts`
+- Modify: `src/types/domain.ts`
+- Modify: `src/types/database.ts`
+
+**Verification:**
+- Run alias-focused tests until green.
+
+### Task 4: Implement embedding-backed indexing and semantic search
+
+**Objective:** Add `EmbeddingProvider`, embedding storage, search APIs, and similarity ranking.
+
+**Files:**
+- Create: `src/embeddings/provider.ts`
+- Create: `src/queries/embeddings.ts`
+- Modify: `src/core/migrations.ts`
+- Modify: `src/core/identity-db.ts`
+- Modify: `src/core/utils.ts`
+- Modify: `src/types/api.ts`
+- Modify: `src/types/domain.ts`
+- Modify: `src/types/database.ts`
+- Modify: `src/index.ts`
+- Modify: `tests/semantic-search.test.ts` (created in Task 1)
+
+**Verification:**
+- Run semantic-search tests until green.
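Because the first release scores similarity in the application layer, `searchFacts` could reduce to four steps: embed the query, load stored vectors for the same `model`, score each one with cosine similarity, and keep the top `limit`. The minimal sketch below assumes the `embedding` column has already been decoded into a `number[]`. It also assumes `content_hash` holds a SHA-256 hex digest of the fact statement, which is one plausible way to detect stale rows. `StoredEmbedding`, `rankFacts`, `contentHash`, and `needsReindex` are hypothetical names.

```typescript
import { createHash } from "node:crypto";

// Hypothetical decoded row from fact_embeddings.
interface StoredEmbedding {
  factId: string;
  model: string;
  embedding: number[];
  contentHash: string;
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // zero vectors match nothing
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Scores every stored vector produced by `model` and returns the top `limit`, best first.
function rankFacts(query: number[], rows: StoredEmbedding[], model: string, limit: number) {
  return rows
    .filter((row) => row.model === model)
    .map((row) => ({ factId: row.factId, score: cosineSimilarity(query, row.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, limit);
}

// Assumed content_hash scheme: SHA-256 hex digest of the statement text.
function contentHash(statement: string): string {
  return createHash("sha256").update(statement, "utf8").digest("hex");
}

// An embedding needs (re)indexing if it is missing or the fact text changed since indexing.
function needsReindex(row: StoredEmbedding | undefined, statement: string): boolean {
  return row === undefined || row.contentHash !== contentHash(statement);
}
```

A full scan like this is linear in the number of stored facts. That matches the plan's choice of portability and determinism over ANN or vector-extension optimization in the first release.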
+
+### Task 5: Add dedup-aware ingestion, docs, and full verification
+
+**Objective:** Surface semantic dedup hints during ingestion, document the new APIs, and run the full suite.
+
+**Files:**
+- Modify: `src/ingestion/types.ts`
+- Modify: `src/core/identity-db.ts`
+- Modify: `README.md`
+- Modify: `src/index.ts`
+
+**Verification:**
+- Run `bun run test && bun run check && bun run build`.
+- Confirm `README.md` documents the new public API surface.
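The key property of dedup-aware ingestion is that similarity only produces hints. The new fact is still stored, and the caller decides whether to merge. The minimal sketch below makes three assumptions: the statement has already been embedded with the provider's model, the hints are returned alongside the ingestion result, and stored vectors from a different model or with a different dimension count are skipped rather than compared. `CandidateEmbedding`, `DedupHint`, and `findDedupHints` are hypothetical names.

```typescript
// Hypothetical shape of an existing fact's stored embedding.
interface CandidateEmbedding {
  factId: string;
  model: string;
  embedding: number[];
}

// Hypothetical hint surfaced to the caller; no merge is performed.
interface DedupHint {
  factId: string;
  score: number;
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] ** 2;
    nb += b[i] ** 2;
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Returns likely duplicates scoring at or above `threshold`, most similar first.
// Read-only: it never merges, updates, or deletes facts.
function findDedupHints(
  statementVector: number[],
  model: string,
  existing: CandidateEmbedding[],
  threshold: number,
): DedupHint[] {
  // Cosine similarity lies in [-1, 1], so thresholds outside that range are mistakes.
  if (threshold < -1 || threshold > 1) throw new RangeError("threshold must be in [-1, 1]");
  return existing
    .filter((c) => c.model === model && c.embedding.length === statementVector.length)
    .map((c) => ({ factId: c.factId, score: cosine(statementVector, c.embedding) }))
    .filter((hint) => hint.score >= threshold)
    .sort((x, y) => y.score - x.score);
}
```

With the plan's example `threshold: 0.9`, only near-identical statements produce hints. The ingestion result could carry these hints without changing what gets persisted.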