docs: add IdentityDB memory expansion plan
181
docs/plans/2026-05-11-identitydb-memory-expansion.md
Normal file
@@ -0,0 +1,181 @@
# IdentityDB Memory Expansion Implementation Plan

> **For Hermes:** Use the `subagent-driven-development` skill to execute this plan task-by-task. Enforce strict TDD for every production behavior.

**Goal:** Extend IdentityDB with explicit topic hierarchy, topic alias/canonicalization controls, and portable semantic fact search with embedding-backed similarity APIs.

**Architecture:** Keep the relational core portable across SQLite, PostgreSQL, MySQL, and MariaDB by introducing dedicated extension tables: `topic_relations` for abstract/concrete hierarchy, `topic_aliases` for canonical topic resolution, and `fact_embeddings` for semantic indexing. Expose high-level APIs from `IdentityDB` while preserving DB-agnostic behavior by doing semantic scoring in the application layer first.

**Tech Stack:** TypeScript, Bun, Node.js, Kysely, better-sqlite3, pg, mysql2, Vitest, tsup.

---

## Scope and interpretation

- Topic hierarchy must be explicit rather than inferred only from shared facts.
- Canonical topics must remain first-class records in `topics`; aliases should resolve into those topics without duplicating canonical rows.
- Semantic search must stay provider-agnostic through a pluggable `EmbeddingProvider` interface.
- The first semantic-search release should favor portability and deterministic testing over ANN/vector-extension optimization.
- Ingestion should be able to detect likely duplicate facts by semantic similarity without forcing automatic merges.
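
To make the provider-agnostic and deterministic-testing points concrete, here is one possible shape for `EmbeddingProvider` together with a fake provider for tests. The plan names the interface but does not define it, so the `model`, `dimensions`, and `embed` members, the `hashEmbed` helper, and `FakeEmbeddingProvider` are all assumptions made for illustration. The fake uses token hashing rather than a real model, so its vectors are stable across runs but carry no real semantics.

```typescript
// Hypothetical interface shape; the plan only names `EmbeddingProvider`.
interface EmbeddingProvider {
  /** Identifier that could be stored in `fact_embeddings.model`. */
  readonly model: string;
  /** Length of every vector this provider returns. */
  readonly dimensions: number;
  /** Embed a batch of texts; result order matches input order. */
  embed(texts: string[]): Promise<number[][]>;
}

// Deterministic bag-of-words hashing: each token increments one bucket,
// chosen by a 32-bit FNV-1a hash of the lowercased token.
function hashEmbed(text: string, dimensions: number): number[] {
  const vector = new Array<number>(dimensions).fill(0);
  const tokens = text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  for (const token of tokens) {
    let hash = 2166136261; // FNV-1a offset basis
    for (let i = 0; i < token.length; i++) {
      hash ^= token.charCodeAt(i);
      hash = Math.imul(hash, 16777619) >>> 0; // FNV prime, kept unsigned
    }
    const bucket = hash % dimensions;
    vector[bucket] = (vector[bucket] ?? 0) + 1;
  }
  return vector;
}

// Test double (assumed name): no network, no model weights, same output every run.
class FakeEmbeddingProvider implements EmbeddingProvider {
  readonly model = 'fake-hash-v1';
  readonly dimensions: number;

  constructor(dimensions = 64) {
    this.dimensions = dimensions;
  }

  async embed(texts: string[]): Promise<number[][]> {
    return texts.map((text) => hashEmbed(text, this.dimensions));
  }
}
```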

---

## Data model additions

### `topic_relations`
- `parent_topic_id`
- `child_topic_id`
- `relation` (initially `parent_of`)
- `created_at`
- composite primary key on (`parent_topic_id`, `child_topic_id`, `relation`)

### `topic_aliases`
- `id`
- `topic_id`
- `alias`
- `normalized_alias`
- `is_primary`
- `created_at`
- `updated_at`
- unique key on `normalized_alias`

### `fact_embeddings`
- `fact_id`
- `model`
- `dimensions`
- `embedding`
- `content_hash`
- `created_at`
- `updated_at`
- composite primary key on (`fact_id`, `model`)
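
The plan does not say how the `embedding` column is encoded. One portable option, sketched below purely as an assumption, is to store the vector as JSON text so the same column definition works on SQLite, PostgreSQL, MySQL, and MariaDB without vector extensions. The column types in `FactEmbeddingRow` and the `encodeEmbedding` / `decodeEmbedding` helpers are hypothetical; `dimensions` doubles as a sanity check when reading a row back.

```typescript
// Assumed row shape for `fact_embeddings`; column types are not given in the plan.
interface FactEmbeddingRow {
  fact_id: string;
  model: string;
  dimensions: number;
  embedding: string; // assumed encoding: JSON array of numbers stored as text
  content_hash: string;
  created_at: string;
  updated_at: string;
}

// Serialize a vector for storage, rejecting NaN and Infinity,
// which JSON.stringify would otherwise silently turn into null.
function encodeEmbedding(vector: number[]): string {
  if (vector.some((value) => !Number.isFinite(value))) {
    throw new Error('embedding contains a non-finite value');
  }
  return JSON.stringify(vector);
}

// Parse a stored vector and verify it matches the recorded dimensions.
function decodeEmbedding(
  row: Pick<FactEmbeddingRow, 'embedding' | 'dimensions'>,
): number[] {
  const parsed: unknown = JSON.parse(row.embedding);
  if (
    !Array.isArray(parsed) ||
    parsed.length !== row.dimensions ||
    !parsed.every((value) => typeof value === 'number')
  ) {
    throw new Error('stored embedding does not match its recorded dimensions');
  }
  return parsed as number[];
}
```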

---

## Public API additions

### Topic hierarchy

```ts
await db.linkTopics({
  parentName: 'programming language',
  childName: 'TypeScript',
});

await db.getTopicChildren('programming language');
await db.getTopicParents('TypeScript');
await db.getTopicLineage('TypeScript');
```
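
The plan does not spell out what `getTopicLineage` returns. The sketch below assumes it means "all ancestors, nearest first" and shows how that could be computed in the application layer from `parent_of` rows. `TopicRelation` and `getLineage` are illustrative names operating on an in-memory array, not the planned `IdentityDB` internals.

```typescript
// In-memory stand-in for rows of `topic_relations` (names are assumptions).
interface TopicRelation {
  parent: string;
  child: string;
}

// Breadth-first walk over parent links starting at `topic`.
// Returns ancestors nearest-first; the `seen` set guards against
// cycles and against visiting a shared ancestor twice.
function getLineage(relations: TopicRelation[], topic: string): string[] {
  const parentsOf = new Map<string, string[]>();
  for (const { parent, child } of relations) {
    const list = parentsOf.get(child) ?? [];
    list.push(parent);
    parentsOf.set(child, list);
  }

  const seen = new Set<string>([topic]);
  const lineage: string[] = [];
  const queue: string[] = [topic];
  while (queue.length > 0) {
    const current = queue.shift() as string;
    for (const parent of parentsOf.get(current) ?? []) {
      if (seen.has(parent)) continue;
      seen.add(parent);
      lineage.push(parent);
      queue.push(parent);
    }
  }
  return lineage;
}
```

Because the hierarchy is stored as explicit edges, nothing in the schema prevents a cycle, so a traversal like this needs the visited-set guard (or `linkTopics` needs to reject cyclic links up front).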

### Topic aliases

```ts
await db.addTopicAlias('TypeScript', 'TS');
await db.resolveTopic('ts');
await db.getTopicAliases('TypeScript');
```
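
For `resolveTopic('ts')` to find an alias stored as `'TS'`, both sides have to pass through the same normalization before they hit the unique key on `normalized_alias`. The rules below (Unicode NFKC, trim, lowercase, collapse whitespace) are an assumption, as are the `normalizeAlias` function and the in-memory `AliasIndex` class that stands in for the `topic_aliases` table.

```typescript
// Assumed normalization rules; the plan only names the `normalized_alias` column.
function normalizeAlias(alias: string): string {
  return alias.normalize('NFKC').trim().toLowerCase().replace(/\s+/g, ' ');
}

// In-memory stand-in for `topic_aliases`, keyed by normalized alias.
class AliasIndex {
  private readonly canonicalByAlias = new Map<string, string>();

  // Register an alias; mirrors the unique key by refusing to let one
  // normalized alias point at two different canonical topics.
  add(canonicalName: string, alias: string): void {
    const key = normalizeAlias(alias);
    const existing = this.canonicalByAlias.get(key);
    if (existing !== undefined && existing !== canonicalName) {
      throw new Error(`alias "${alias}" already resolves to "${existing}"`);
    }
    this.canonicalByAlias.set(key, canonicalName);
  }

  // Resolve any spelling of an alias to its canonical topic name.
  resolve(name: string): string | undefined {
    return this.canonicalByAlias.get(normalizeAlias(name));
  }
}
```

With this sketch, adding `'TS'` for `'TypeScript'` makes `' ts '` resolve to `'TypeScript'`, while a later attempt to register `'ts'` for a different topic throws instead of silently re-pointing the alias.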

### Semantic indexing and search

```ts
await db.indexFactEmbeddings({ provider });
await db.searchFacts({ query: 'When did I start using TS?', provider, limit: 5 });
await db.findSimilarFacts({ statement: 'I started using TypeScript in 2025.', provider, threshold: 0.9 });
```
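
Since the architecture calls for semantic scoring in the application layer first, `searchFacts` could reduce to loading stored vectors and ranking them in memory. The sketch below assumes cosine similarity as the metric (the plan does not name one) and uses illustrative types `IndexedFact` and `ScoredFact` plus a hypothetical `rankFacts` helper.

```typescript
// Illustrative shapes; not the planned public types.
interface IndexedFact {
  factId: string;
  statement: string;
  embedding: number[];
}

interface ScoredFact {
  factId: string;
  statement: string;
  score: number;
}

// Cosine similarity in [-1, 1]; defined as 0 when either vector is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('embedding dimensions differ');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// Score every fact against the query vector and keep the top `limit`.
// Ties break on factId so results are stable across runs.
function rankFacts(query: number[], facts: IndexedFact[], limit: number): ScoredFact[] {
  return facts
    .map((fact) => ({
      factId: fact.factId,
      statement: fact.statement,
      score: cosineSimilarity(query, fact.embedding),
    }))
    .sort((left, right) => right.score - left.score || left.factId.localeCompare(right.factId))
    .slice(0, limit);
}
```

A linear scan like this is O(n) per query, which matches the stated preference for portability over ANN optimization in the first release.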

### Dedup-aware ingestion

```ts
await db.ingestStatement(statement, {
  extractor,
  dedup: {
    provider,
    threshold: 0.9,
  },
});
```
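
One way to read "detect likely duplicates without forcing automatic merges" is that ingestion returns hints and leaves the decision to the caller. The sketch below is an assumption about that behavior: `collectDuplicateHints`, `ExistingFact`, and `DuplicateHint` are hypothetical names, and the default `dotProduct` scorer is only equivalent to cosine similarity if the provider returns unit-length vectors.

```typescript
// Illustrative shapes; not the planned ingestion result types.
interface ExistingFact {
  id: string;
  embedding: number[];
}

interface DuplicateHint {
  factId: string;
  similarity: number;
}

type Similarity = (a: number[], b: number[]) => number;

// Assumes unit-normalized vectors, where dot product equals cosine similarity.
const dotProduct: Similarity = (a, b) =>
  a.reduce((sum, value, i) => sum + value * (b[i] ?? 0), 0);

// Return every stored fact at or above `threshold`, most similar first.
// Nothing is merged or dropped here; the caller decides what to do with the hints.
function collectDuplicateHints(
  candidate: number[],
  existing: ExistingFact[],
  threshold: number,
  similarity: Similarity = dotProduct,
): DuplicateHint[] {
  return existing
    .map((fact) => ({ factId: fact.id, similarity: similarity(candidate, fact.embedding) }))
    .filter((hint) => hint.similarity >= threshold)
    .sort((left, right) => right.similarity - left.similarity);
}
```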

---

## Execution plan

### Task 1: Lock the extension schema and APIs with failing tests

**Objective:** Define tests for hierarchy, aliases, and semantic search before production code changes.

**Files:**
- Modify: `tests/migrations.test.ts`
- Modify: `tests/identity-db.test.ts`
- Modify: `tests/queries.test.ts`
- Create: `tests/semantic-search.test.ts`
- Modify: `src/types/api.ts`
- Modify: `src/types/domain.ts`
- Modify: `src/types/database.ts`
- Modify: `src/core/schema.ts`

**Verification:**
- Run focused test commands and confirm they fail for missing behavior.

### Task 2: Implement topic hierarchy storage and query APIs

**Objective:** Add `topic_relations` schema support plus parent/child/lineage APIs.

**Files:**
- Modify: `src/core/migrations.ts`
- Modify: `src/core/identity-db.ts`
- Modify: `src/core/utils.ts`
- Modify: `src/queries/topics.ts`
- Modify: `src/types/api.ts`
- Modify: `src/types/domain.ts`
- Modify: `src/types/database.ts`

**Verification:**
- Run hierarchy-focused tests until green.

### Task 3: Implement canonical topic aliases

**Objective:** Add alias storage, alias-aware resolution, and canonical topic lookup semantics.

**Files:**
- Modify: `src/core/migrations.ts`
- Modify: `src/core/identity-db.ts`
- Modify: `src/queries/topics.ts`
- Modify: `src/core/utils.ts`
- Modify: `src/types/api.ts`
- Modify: `src/types/domain.ts`
- Modify: `src/types/database.ts`

**Verification:**
- Run alias-focused tests until green.

### Task 4: Implement embedding-backed indexing and semantic search

**Objective:** Add `EmbeddingProvider`, embedding storage, search APIs, and similarity ranking.

**Files:**
- Create: `src/embeddings/provider.ts`
- Create: `src/queries/embeddings.ts`
- Modify: `src/core/migrations.ts`
- Modify: `src/core/identity-db.ts`
- Modify: `src/core/utils.ts`
- Modify: `src/types/api.ts`
- Modify: `src/types/domain.ts`
- Modify: `src/types/database.ts`
- Modify: `src/index.ts`
- Modify: `tests/semantic-search.test.ts` (created in Task 1)

**Verification:**
- Run semantic-search tests until green.

### Task 5: Add dedup-aware ingestion, docs, and full verification

**Objective:** Surface semantic dedup hints during ingestion, document the new APIs, and run the full suite.

**Files:**
- Modify: `src/ingestion/types.ts`
- Modify: `src/core/identity-db.ts`
- Modify: `README.md`
- Modify: `src/index.ts`

**Verification:**
- Run `bun run test && bun run check && bun run build`
- Confirm that `README.md` reflects the new public surface.