docs: add IdentityDB foundation plan

# IdentityDB Foundation Implementation Plan
> **For Hermes:** Use the `subagent-driven-development` skill to execute this plan task-by-task. Enforce strict TDD for every production behavior.

**Goal:** Build the first usable version of `IdentityDB`, a TypeScript package that wraps relational databases and exposes a structured API for storing topics, facts, and their many-to-many graph relationships.

**Architecture:** IdentityDB will use a layered architecture: a storage layer based on Kysely + dialect adapters, a domain layer for topics/facts/links, and a service layer that exposes ergonomic high-level APIs for querying and ingesting memory. Schema initialization will be automatic and idempotent. AI-assisted ingestion will be abstracted behind a pluggable extractor interface so callers can use a small LLM or a deterministic extractor without coupling the core package to a specific model provider.

**Tech Stack:** TypeScript, Bun, Node.js, Kysely, better-sqlite3, pg, mysql2, Vitest, tsup.

---
## Product constraints and interpretation

- The package must support SQLite, PostgreSQL, MySQL, and MariaDB.
- The database model must treat a single fact as a connector between multiple topics.
- Topics can represent concrete entities (`TypeScript`), abstract concepts (`programming language`), or temporal anchors (`2025`).
- Topic abstraction should be explicit in the schema so broad topics can store broad facts while specific topics store specific facts.
- The initial release should prioritize correctness, portability, and ergonomic API design over advanced search or embedding features.
- AI-assisted topic extraction should be implemented as an integration point in the v1 foundation, not as a hardcoded provider-specific dependency.

---
## Target repository structure

```text
IdentityDB/
├── src/
│   ├── adapters/
│   │   ├── dialect.ts
│   │   └── index.ts
│   ├── core/
│   │   ├── errors.ts
│   │   ├── identity-db.ts
│   │   ├── migrations.ts
│   │   └── schema.ts
│   ├── ingestion/
│   │   ├── extractor.ts
│   │   ├── naive-extractor.ts
│   │   └── types.ts
│   ├── queries/
│   │   ├── topics.ts
│   │   └── facts.ts
│   ├── types/
│   │   ├── api.ts
│   │   ├── domain.ts
│   │   └── database.ts
│   └── index.ts
├── tests/
│   ├── identity-db.test.ts
│   ├── migrations.test.ts
│   ├── queries.test.ts
│   └── ingestion.test.ts
├── docs/
│   └── plans/
│       └── 2026-05-11-identitydb-foundation.md
├── package.json
├── tsconfig.json
├── tsup.config.ts
├── vitest.config.ts
├── .gitignore
└── README.md
```

---
## Data model proposal

### Tables

#### `topics`
- `id` — string UUID
- `name` — canonical display name, unique
- `normalized_name` — lowercase normalized unique key
- `category` — `entity | concept | temporal | custom`
- `granularity` — `abstract | concrete | mixed`
- `description` — nullable text
- `metadata` — JSON / JSON-text depending on dialect
- `created_at`
- `updated_at`

#### `facts`
- `id` — string UUID
- `statement` — original fact text
- `summary` — optional normalized/clean summary
- `source` — optional source identifier
- `confidence` — nullable numeric confidence
- `metadata` — JSON / JSON-text depending on dialect
- `created_at`
- `updated_at`

#### `fact_topics`
- `fact_id`
- `topic_id`
- `role` — optional semantic label (`subject`, `object`, `time`, etc.)
- `position` — stable order for fact-topic relationships
- composite unique key on (`fact_id`, `topic_id`, `role`)

### Notes
- The graph is modeled through `fact_topics`; facts are the connective tissue between topics.
- No separate topic-to-topic edge table is needed in the initial version because relationships are derived from shared facts.
- JSON portability should be implemented through small helpers so SQLite stores stringified JSON while Postgres/MySQL can still use text-compatible serialization safely.
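
The tables above map naturally onto TypeScript domain types. A minimal sketch of what `src/types/domain.ts` might contain — field names follow the proposal, but the exact shapes (e.g. timestamps as ISO strings) are assumptions, not the final API:

```typescript
// Sketch of the domain model implied by the tables above; not the final API.
type TopicCategory = 'entity' | 'concept' | 'temporal' | 'custom';
type TopicGranularity = 'abstract' | 'concrete' | 'mixed';

interface Topic {
  id: string;                         // UUID
  name: string;                       // canonical display name, unique
  normalizedName: string;             // lowercase normalized unique key
  category: TopicCategory;
  granularity: TopicGranularity;
  description: string | null;
  metadata: Record<string, unknown>;  // serialized per dialect in storage
  createdAt: string;                  // assumed: ISO-8601 string
  updatedAt: string;
}

interface Fact {
  id: string;
  statement: string;                  // original fact text
  summary: string | null;
  source: string | null;
  confidence: number | null;
  metadata: Record<string, unknown>;
  createdAt: string;
  updatedAt: string;
}

interface FactTopic {
  factId: string;
  topicId: string;
  role: string | null;                // 'subject', 'object', 'time', ...
  position: number;                   // stable order within the fact
}
```

The camelCase fields would map to the snake_case columns at the storage layer.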

---
## Public API proposal

### Construction and lifecycle

```ts
const db = await IdentityDB.connect({
  client: 'sqlite',
  filename: ':memory:',
});

await db.initialize();
await db.close();
```
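
The options passed to `connect()` could be modeled as a discriminated union over the supported dialects. A sketch under assumed field names (the Postgres/MySQL fields are illustrative, not a committed surface):

```typescript
// Hypothetical connection-config union; field names are illustrative.
type SqliteConfig = { client: 'sqlite'; filename: string };
type PostgresConfig = {
  client: 'postgres';
  host: string; port?: number; database: string; user: string; password?: string;
};
type MysqlFamilyConfig = {
  client: 'mysql' | 'mariadb';
  host: string; port?: number; database: string; user: string; password?: string;
};
type ConnectionConfig = SqliteConfig | PostgresConfig | MysqlFamilyConfig;

// Narrowing on `client` gives dialect-specific fields without casts.
function describeTarget(config: ConnectionConfig): string {
  switch (config.client) {
    case 'sqlite':
      return `sqlite:${config.filename}`;
    case 'postgres':
      return `postgres://${config.host}/${config.database}`;
    case 'mysql':
    case 'mariadb':
      return `${config.client}://${config.host}/${config.database}`;
  }
}
```

The same discriminant would drive the dialect factory in Task 3.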

### Core write APIs

```ts
await db.upsertTopic({
  name: 'TypeScript',
  category: 'entity',
  granularity: 'concrete',
});

await db.addFact({
  statement: 'I have worked with TypeScript since 2025.',
  topics: [
    { name: 'I', category: 'entity', granularity: 'concrete', role: 'subject' },
    { name: 'TypeScript', category: 'entity', granularity: 'concrete', role: 'object' },
    { name: '2025', category: 'temporal', granularity: 'concrete', role: 'time' },
  ],
});
```

### Query APIs

```ts
await db.getTopicByName('TypeScript', { includeFacts: true });
await db.getTopicFacts('TypeScript');
await db.getTopicFactsLinkedTo('TypeScript', '2025');
await db.listTopics();
await db.listTopics({ includeFacts: false, limit: 100 });
await db.findConnectedTopics('TypeScript');
await db.findFactsConnectingTopics(['I', 'TypeScript', '2025']);
```
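
The intended semantics of `findFactsConnectingTopics` are worth pinning down: a fact qualifies when it is linked to *every* requested topic. A plain in-memory reference model (the real implementation would be a SQL join over `fact_topics`; names here are illustrative):

```typescript
// Reference semantics for findFactsConnectingTopics over in-memory links.
interface Link { factId: string; topicName: string }

function factsConnecting(links: Link[], topics: string[]): string[] {
  // Group linked topic names by fact.
  const byFact = new Map<string, Set<string>>();
  for (const { factId, topicName } of links) {
    if (!byFact.has(factId)) byFact.set(factId, new Set());
    byFact.get(factId)!.add(topicName);
  }
  // A fact connects the topics when its topic set covers all requested names.
  return Array.from(byFact.entries())
    .filter(([, names]) => topics.every((t) => names.has(t)))
    .map(([factId]) => factId);
}
```

`findConnectedTopics` is the transpose of the same grouping: collect every topic that shares at least one fact with the given topic.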

### AI-assisted ingestion API

```ts
await db.ingestStatement('I have worked with TypeScript since 2025.', {
  extractor,
});
```

Where `extractor` implements:

```ts
interface FactExtractor {
  extract(input: string): Promise<ExtractedFact>;
}
```

The package will ship a simple `NaiveExtractor` for tests/examples, while real deployments can inject an LLM-backed extractor.
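
One possible shape for that deterministic extractor — the capitalized-word and four-digit-year heuristics below are purely illustrative, chosen only so tests stay reproducible:

```typescript
interface ExtractedTopic {
  name: string;
  category: 'entity' | 'concept' | 'temporal' | 'custom';
  granularity: 'abstract' | 'concrete' | 'mixed';
  role?: string;
}

interface ExtractedFact {
  statement: string;
  topics: ExtractedTopic[];
}

// Naive heuristic extractor: four-digit years become temporal topics,
// capitalized words become entity topics. For tests/examples only.
class NaiveExtractor {
  async extract(input: string): Promise<ExtractedFact> {
    const topics: ExtractedTopic[] = [];
    for (const word of input.split(/\s+/)) {
      const token = word.replace(/[^\w]/g, '');
      if (/^\d{4}$/.test(token)) {
        topics.push({ name: token, category: 'temporal', granularity: 'concrete', role: 'time' });
      } else if (/^[A-Z]/.test(token)) {
        topics.push({ name: token, category: 'entity', granularity: 'concrete' });
      }
    }
    return { statement: input, topics };
  }
}
```

An LLM-backed extractor would implement the same `extract()` contract, which is what keeps the core package provider-agnostic.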

---
## Execution plan

### Task 1: Scaffold package tooling and baseline configuration

**Objective:** Create a clean TypeScript package foundation with build and test tooling.

**Files:**
- Create: `package.json`
- Create: `tsconfig.json`
- Create: `tsup.config.ts`
- Create: `vitest.config.ts`
- Create: `.gitignore`
- Modify: `README.md`

**Steps:**
1. Add package metadata, scripts, dependency placeholders, and ESM export configuration.
2. Add TypeScript config for library output.
3. Add tsup config for bundling ESM + type declarations.
4. Add Vitest config targeting Node.
5. Expand README with project direction and current scope.
6. Install dependencies and confirm `bun test` starts correctly.

**Verification:**
- Run: `bun install`
- Run: `bun test`
- Expected: test runner executes successfully even if there are zero or placeholder tests.

**Commit:**
```bash
git add package.json tsconfig.json tsup.config.ts vitest.config.ts .gitignore README.md bun.lock
git commit -m "chore: scaffold IdentityDB package tooling"
```

---
### Task 2: Define domain types and write migration tests first

**Objective:** Lock down the domain model and schema contract before implementing migrations.

**Files:**
- Create: `src/types/domain.ts`
- Create: `src/types/database.ts`
- Create: `src/types/api.ts`
- Create: `src/core/schema.ts`
- Create: `tests/migrations.test.ts`

**Steps:**
1. Write tests that describe the required tables and columns after initialization.
2. Write tests for idempotent initialization (calling twice should not fail).
3. Add domain and API type definitions that match the product model.
4. Add schema description constants used by migrations.

**Verification:**
- Run: `bun test tests/migrations.test.ts`
- Expected before implementation: FAIL because initialization does not exist yet.

**Commit:**
```bash
git add src/types src/core/schema.ts tests/migrations.test.ts
git commit -m "test: define schema contract for topic fact graph"
```

---
### Task 3: Implement dialect adapters and automatic schema initialization

**Objective:** Make the package connect to supported databases and create its schema automatically.

**Files:**
- Create: `src/adapters/dialect.ts`
- Create: `src/adapters/index.ts`
- Create: `src/core/migrations.ts`
- Create: `src/core/errors.ts`
- Modify: `src/core/schema.ts`
- Modify: `tests/migrations.test.ts`

**Steps:**
1. Implement a connection config union for SQLite/Postgres/MySQL-family.
2. Build a dialect factory returning a Kysely instance.
3. Implement `initializeSchema()` with idempotent table creation.
4. Add lightweight helpers for JSON serialization/deserialization portability.
5. Re-run migration tests until green.

**Verification:**
- Run: `bun test tests/migrations.test.ts`
- Expected: PASS

**Commit:**
```bash
git add src/adapters src/core tests/migrations.test.ts
git commit -m "feat: add multi-dialect schema initialization"
```
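
The idempotency requirement in step 3 mostly falls out of `IF NOT EXISTS` DDL. A sketch of the SQLite-flavored statements — the real code would go through Kysely's schema builder and adjust column types per dialect, and these column definitions are an assumption based on the data model above:

```typescript
// Illustrative DDL only; initializeSchema() would emit this via Kysely.
// "IF NOT EXISTS" keeps repeated initialization calls safe.
const CREATE_TOPICS = `
CREATE TABLE IF NOT EXISTS topics (
  id TEXT PRIMARY KEY,
  name TEXT NOT NULL UNIQUE,
  normalized_name TEXT NOT NULL UNIQUE,
  category TEXT NOT NULL,
  granularity TEXT NOT NULL,
  description TEXT,
  metadata TEXT,
  created_at TEXT NOT NULL,
  updated_at TEXT NOT NULL
)`;

const CREATE_FACTS = `
CREATE TABLE IF NOT EXISTS facts (
  id TEXT PRIMARY KEY,
  statement TEXT NOT NULL,
  summary TEXT,
  source TEXT,
  confidence REAL,
  metadata TEXT,
  created_at TEXT NOT NULL,
  updated_at TEXT NOT NULL
)`;

const CREATE_FACT_TOPICS = `
CREATE TABLE IF NOT EXISTS fact_topics (
  fact_id TEXT NOT NULL REFERENCES facts(id),
  topic_id TEXT NOT NULL REFERENCES topics(id),
  role TEXT,
  position INTEGER NOT NULL DEFAULT 0,
  UNIQUE (fact_id, topic_id, role)
)`;
```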

---
### Task 4: Write failing query tests for topic/fact operations

**Objective:** Specify the behavior of the high-level memory APIs before implementation.

**Files:**
- Create: `tests/identity-db.test.ts`
- Create: `tests/queries.test.ts`

**Steps:**
1. Write tests for `upsertTopic` deduplication by normalized name.
2. Write tests for `addFact` linking multiple topics to one fact.
3. Write tests for `getTopicByName(..., { includeFacts: true })`.
4. Write tests for `getTopicFactsLinkedTo(topicA, topicB)`.
5. Write tests for `listTopics({ includeFacts: false })` returning topic-only records.
6. Write tests for `findConnectedTopics(name)`.

**Verification:**
- Run: `bun test tests/identity-db.test.ts tests/queries.test.ts`
- Expected before implementation: FAIL because `IdentityDB` methods are not implemented.

**Commit:**
```bash
git add tests/identity-db.test.ts tests/queries.test.ts
git commit -m "test: specify memory graph query APIs"
```

---
### Task 5: Implement `IdentityDB` core service and query helpers

**Objective:** Deliver the first usable high-level API for writing and reading memory graph data.

**Files:**
- Create: `src/core/identity-db.ts`
- Create: `src/queries/topics.ts`
- Create: `src/queries/facts.ts`
- Create: `src/index.ts`
- Modify: `src/types/api.ts`
- Modify: `tests/identity-db.test.ts`
- Modify: `tests/queries.test.ts`

**Steps:**
1. Implement `IdentityDB.connect()` and `initialize()`.
2. Implement topic upsert with normalized key handling.
3. Implement fact insertion plus topic linking transactionally.
4. Implement topic lookup with optional fact expansion.
5. Implement topic-to-topic and multi-topic fact queries.
6. Implement topic listing and connected-topic discovery.
7. Re-run the full test suite.

**Verification:**
- Run: `bun test`
- Expected: PASS

**Commit:**
```bash
git add src tests
git commit -m "feat: add IdentityDB core memory graph APIs"
```

---
### Task 6: Add ingestion abstractions and a naive extractor

**Objective:** Support automatic topic/fact ingestion through a pluggable extraction pipeline.

**Files:**
- Create: `src/ingestion/types.ts`
- Create: `src/ingestion/extractor.ts`
- Create: `src/ingestion/naive-extractor.ts`
- Create: `tests/ingestion.test.ts`
- Modify: `src/core/identity-db.ts`
- Modify: `src/index.ts`

**Steps:**
1. Write failing tests for `ingestStatement()` using a fake extractor.
2. Define the extraction contracts and validation rules.
3. Implement `ingestStatement()` by piping extractor output into `addFact()`.
4. Add a deterministic `NaiveExtractor` for examples/tests.
5. Add tests proving extractor-driven topic creation works.

**Verification:**
- Run: `bun test tests/ingestion.test.ts`
- Run: `bun test`
- Expected: PASS

**Commit:**
```bash
git add src/ingestion src/core/identity-db.ts src/index.ts tests/ingestion.test.ts
git commit -m "feat: add pluggable fact ingestion pipeline"
```
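
The pipeline in this task is a thin pipe from extractor output into the write API. A sketch under assumed types (the free-standing `ingestStatement` function and its `addFact` callback are illustrative stand-ins for the real methods on `IdentityDB`):

```typescript
// Sketch: ingestStatement validates extractor output, then delegates to addFact.
interface ExtractedFact {
  statement: string;
  topics: { name: string; category: string; granularity: string; role?: string }[];
}

interface FactExtractor {
  extract(input: string): Promise<ExtractedFact>;
}

async function ingestStatement(
  input: string,
  extractor: FactExtractor,
  addFact: (fact: ExtractedFact) => Promise<void>,
): Promise<ExtractedFact> {
  const extracted = await extractor.extract(input);
  // Defensive validation: never trust extractor (especially LLM) output blindly.
  if (extracted.topics.length === 0) {
    throw new Error('extractor returned no topics');
  }
  for (const topic of extracted.topics) {
    if (!topic.name.trim()) throw new Error('extractor returned an unnamed topic');
  }
  await addFact(extracted);
  return extracted;
}
```

Because validation happens before the write, a misbehaving extractor fails loudly instead of polluting the graph.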

---
### Task 7: Polish package docs and publish-ready ergonomics

**Objective:** Make the repository understandable and usable after the foundation lands.

**Files:**
- Modify: `README.md`
- Optionally create: `docs/examples/basic-usage.md`

**Steps:**
1. Document supported databases and the current API surface.
2. Document the topic/fact graph model with a concrete example.
3. Add example code for initialization, querying, and AI-assisted ingestion.
4. Call out current limitations and near-term roadmap.

**Verification:**
- Manually review the README examples against actual exports.
- Run: `bun run build`
- Expected: PASS

**Commit:**
```bash
git add README.md docs/examples/basic-usage.md
git commit -m "docs: document IdentityDB foundation usage"
```

---
## Test strategy

- Use SQLite in-memory for the main automated tests.
- Treat PostgreSQL/MySQL/MariaDB support as adapter-compatibility in the code path, with optional future integration tests behind environment variables.
- Keep all public behavior covered through unit/integration-style tests against the public `IdentityDB` API.
- Add regression tests for normalization, many-to-many fact linking, and topic filtering by connected topic.

---
## Risks and tradeoffs

1. **Cross-dialect JSON handling** — JSON support differs between engines. The initial version should serialize metadata defensively for portability.
2. **Case normalization semantics** — topic uniqueness depends on normalization. The first version should use a simple lowercase-trim normalization and document it.
3. **Temporal topic modeling** — time can be a topic, but richer interval modeling should wait until a later phase.
4. **Abstract vs concrete topic boundaries** — this is partly editorial, so the API should store explicit `granularity` rather than trying to infer it automatically.
5. **LLM extraction variability** — extractor output can be messy. The core package should validate extractor results before writing them.
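
Risks 1 and 2 both reduce to small pure helpers. A sketch of the lowercase-trim normalization and defensive JSON round-tripping (helper names are illustrative, not the final exports):

```typescript
// Risk 2: simple, documented normalization — lowercase, trim, collapse whitespace.
function normalizeTopicName(name: string): string {
  return name.trim().toLowerCase().replace(/\s+/g, ' ');
}

// Risk 1: store metadata as a JSON string on every dialect; parse defensively.
function serializeMetadata(metadata: Record<string, unknown>): string {
  return JSON.stringify(metadata ?? {});
}

function deserializeMetadata(raw: string | null): Record<string, unknown> {
  if (!raw) return {};
  try {
    const parsed = JSON.parse(raw);
    return typeof parsed === 'object' && parsed !== null
      ? (parsed as Record<string, unknown>)
      : {};
  } catch {
    return {}; // tolerate corrupt rows rather than failing reads
  }
}
```

Keeping normalization in one function also makes the documented semantics trivially testable.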

---
## Out of scope for this foundation pass

- Embeddings or semantic vector search
- Ranking/relevance algorithms
- Full-text search indices
- Topic merging/synonym resolution workflows
- Multi-user authorization / remote HTTP service layer
- Hosted API server package

---
## Immediate execution target

For the first automated execution pass, implement Tasks 1 through 7 in order, but treat SQLite-backed functionality as the required tested path and the other SQL engines as supported adapter targets in the library surface.