From bf1495a4d0d136a3b6c88c79039475f769dd118d Mon Sep 17 00:00:00 2001 From: Shinwoo PARK Date: Mon, 11 May 2026 10:41:45 +0900 Subject: [PATCH] docs: add IdentityDB foundation plan --- .../plans/2026-05-11-identitydb-foundation.md | 419 ++++++++++++++++++ 1 file changed, 419 insertions(+) create mode 100644 docs/plans/2026-05-11-identitydb-foundation.md diff --git a/docs/plans/2026-05-11-identitydb-foundation.md b/docs/plans/2026-05-11-identitydb-foundation.md new file mode 100644 index 0000000..87a74cc --- /dev/null +++ b/docs/plans/2026-05-11-identitydb-foundation.md @@ -0,0 +1,419 @@ +# IdentityDB Foundation Implementation Plan + +> **For Hermes:** Use the `subagent-driven-development` skill to execute this plan task-by-task. Enforce strict TDD for every production behavior. + +**Goal:** Build the first usable version of `IdentityDB`, a TypeScript package that wraps relational databases and exposes a structured API for storing topics, facts, and their many-to-many graph relationships. + +**Architecture:** IdentityDB will use a layered architecture: a storage layer based on Kysely + dialect adapters, a domain layer for topics/facts/links, and a service layer that exposes ergonomic high-level APIs for querying and ingesting memory. Schema initialization will be automatic and idempotent. AI-assisted ingestion will be abstracted behind a pluggable extractor interface so callers can use a small LLM or a deterministic extractor without coupling the core package to a specific model provider. + +**Tech Stack:** TypeScript, Bun, Node.js, Kysely, better-sqlite3, pg, mysql2, Vitest, tsup. + +--- + +## Product constraints and interpretation + +- The package must support SQLite, PostgreSQL, MySQL, and MariaDB. +- The database model must treat a single fact as a connector between multiple topics. +- Topics can represent concrete entities (`TypeScript`), abstract concepts (`programming language`), or temporal anchors (`2025`). 
- Topic abstraction should be explicit in the schema so broad topics can store broad facts while specific topics store specific facts.
- The initial release should prioritize correctness, portability, and ergonomic API design over advanced search or embedding features.
- AI-assisted topic extraction should be implemented as an integration point in v1 foundation, not as a hardcoded provider-specific dependency.

---

## Target repository structure

```text
IdentityDB/
├── src/
│   ├── adapters/
│   │   ├── dialect.ts
│   │   └── index.ts
│   ├── core/
│   │   ├── errors.ts
│   │   ├── identity-db.ts
│   │   ├── migrations.ts
│   │   └── schema.ts
│   ├── ingestion/
│   │   ├── extractor.ts
│   │   ├── naive-extractor.ts
│   │   └── types.ts
│   ├── queries/
│   │   ├── topics.ts
│   │   └── facts.ts
│   ├── types/
│   │   ├── api.ts
│   │   ├── domain.ts
│   │   └── database.ts
│   └── index.ts
├── tests/
│   ├── identity-db.test.ts
│   ├── migrations.test.ts
│   ├── queries.test.ts
│   └── ingestion.test.ts
├── docs/
│   └── plans/
│       └── 2026-05-11-identitydb-foundation.md
├── package.json
├── tsconfig.json
├── tsup.config.ts
├── vitest.config.ts
├── .gitignore
└── README.md
```

---

## Data model proposal

### Tables

#### `topics`
- `id` — string UUID
- `name` — canonical display name, unique
- `normalized_name` — lowercase normalized unique key
- `category` — `entity | concept | temporal | custom`
- `granularity` — `abstract | concrete | mixed`
- `description` — nullable text
- `metadata` — JSON / JSON-text depending on dialect
- `created_at`
- `updated_at`

#### `facts`
- `id` — string UUID
- `statement` — original fact text
- `summary` — optional normalized/clean summary
- `source` — optional source identifier
- `confidence` — nullable numeric confidence
- `metadata` — JSON / JSON-text depending on dialect
- `created_at`
- `updated_at`

#### `fact_topics`
- `fact_id`
- `topic_id`
- `role` — optional semantic label (`subject`, `object`, `time`, etc.)
- `position` — stable order for fact-topic relationships
- composite unique key on (`fact_id`, `topic_id`, `role`)

### Notes
- The graph is modeled through `fact_topics`; facts are the connective tissue between topics.
- No separate topic-to-topic edge table is needed in the initial version because relationships are derived from shared facts.
- JSON portability should be implemented through small helpers so SQLite stores stringified JSON while Postgres/MySQL can still use text-compatible serialization safely.

---

## Public API proposal

### Construction and lifecycle

```ts
const db = await IdentityDB.connect({
  client: 'sqlite',
  filename: ':memory:',
});

await db.initialize();
await db.close();
```

### Core write APIs

```ts
await db.upsertTopic({
  name: 'TypeScript',
  category: 'entity',
  granularity: 'concrete',
});

await db.addFact({
  statement: 'I have worked with TypeScript since 2025.',
  topics: [
    { name: 'I', category: 'entity', granularity: 'concrete', role: 'subject' },
    { name: 'TypeScript', category: 'entity', granularity: 'concrete', role: 'object' },
    { name: '2025', category: 'temporal', granularity: 'concrete', role: 'time' },
  ],
});
```

### Query APIs

```ts
await db.getTopicByName('TypeScript', { includeFacts: true });
await db.getTopicFacts('TypeScript');
await db.getTopicFactsLinkedTo('TypeScript', '2025');
await db.listTopics();
await db.listTopics({ includeFacts: false, limit: 100 });
await db.findConnectedTopics('TypeScript');
await db.findFactsConnectingTopics(['I', 'TypeScript', '2025']);
```

### AI-assisted ingestion API

```ts
await db.ingestStatement('I have worked with TypeScript since 2025.', {
  extractor,
});
```

Where `extractor` implements:

```ts
interface FactExtractor {
  extract(input: string): Promise<ExtractedFact[]>;
}
```

Here `ExtractedFact` mirrors the `addFact()` input: a statement plus the topic descriptors to link to it.

The package will ship a simple `NaiveExtractor` for tests/examples, while real deployments can inject an LLM-backed extractor.
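The extractor contract can be sketched end to end. Everything below is illustrative: the type and class names are proposals from this plan, not a shipped API, and the naive heuristics exist only to make tests deterministic.

```typescript
// Illustrative sketch of the pluggable extractor contract plus a deterministic
// extractor suitable for tests and examples. Names are proposals, not final API.
type TopicCategory = 'entity' | 'concept' | 'temporal' | 'custom';
type TopicGranularity = 'abstract' | 'concrete' | 'mixed';

interface ExtractedTopic {
  name: string;
  category: TopicCategory;
  granularity: TopicGranularity;
  role?: string;
}

interface ExtractedFact {
  statement: string;
  topics: ExtractedTopic[];
}

interface FactExtractor {
  extract(input: string): Promise<ExtractedFact[]>;
}

// Deterministic stand-in: four-digit tokens become temporal topics,
// capitalized tokens become entity topics, everything else is ignored.
class NaiveExtractor implements FactExtractor {
  async extract(input: string): Promise<ExtractedFact[]> {
    const topics: ExtractedTopic[] = [];
    for (const token of input.match(/[A-Za-z0-9]+/g) ?? []) {
      if (/^\d{4}$/.test(token)) {
        topics.push({ name: token, category: 'temporal', granularity: 'concrete' });
      } else if (/^[A-Z]/.test(token)) {
        topics.push({ name: token, category: 'entity', granularity: 'concrete' });
      }
    }
    return [{ statement: input, topics }];
  }
}
```

With this sketch, extracting `'I have worked with TypeScript since 2025.'` produces a single fact linking `I`, `TypeScript`, and `2025`, which is the shape `addFact()` expects to consume.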
+ +--- + +## Execution plan + +### Task 1: Scaffold package tooling and baseline configuration + +**Objective:** Create a clean TypeScript package foundation with build and test tooling. + +**Files:** +- Create: `package.json` +- Create: `tsconfig.json` +- Create: `tsup.config.ts` +- Create: `vitest.config.ts` +- Create: `.gitignore` +- Modify: `README.md` + +**Steps:** +1. Add package metadata, scripts, dependency placeholders, and ESM export configuration. +2. Add TypeScript config for library output. +3. Add tsup config for bundling ESM + type declarations. +4. Add Vitest config targeting Node. +5. Expand README with project direction and current scope. +6. Install dependencies and confirm `bun test` starts correctly. + +**Verification:** +- Run: `bun install` +- Run: `bun test` +- Expected: test runner executes successfully even if there are zero or placeholder tests. + +**Commit:** +```bash +git add package.json tsconfig.json tsup.config.ts vitest.config.ts .gitignore README.md bun.lock +git commit -m "chore: scaffold IdentityDB package tooling" +``` + +--- + +### Task 2: Define domain types and write migration tests first + +**Objective:** Lock down the domain model and schema contract before implementing migrations. + +**Files:** +- Create: `src/types/domain.ts` +- Create: `src/types/database.ts` +- Create: `src/types/api.ts` +- Create: `src/core/schema.ts` +- Create: `tests/migrations.test.ts` + +**Steps:** +1. Write tests that describe the required tables and columns after initialization. +2. Write tests for idempotent initialization (calling twice should not fail). +3. Add domain and API type definitions that match the product model. +4. Add schema description constants used by migrations. + +**Verification:** +- Run: `bun test tests/migrations.test.ts` +- Expected before implementation: FAIL because initialization does not exist yet. 
+ +**Commit:** +```bash +git add src/types src/core/schema.ts tests/migrations.test.ts +git commit -m "test: define schema contract for topic fact graph" +``` + +--- + +### Task 3: Implement dialect adapters and automatic schema initialization + +**Objective:** Make the package connect to supported databases and create its schema automatically. + +**Files:** +- Create: `src/adapters/dialect.ts` +- Create: `src/adapters/index.ts` +- Create: `src/core/migrations.ts` +- Create: `src/core/errors.ts` +- Modify: `src/core/schema.ts` +- Modify: `tests/migrations.test.ts` + +**Steps:** +1. Implement a connection config union for SQLite/Postgres/MySQL-family. +2. Build a dialect factory returning a Kysely instance. +3. Implement `initializeSchema()` with idempotent table creation. +4. Add lightweight helpers for JSON serialization/deserialization portability. +5. Re-run migration tests until green. + +**Verification:** +- Run: `bun test tests/migrations.test.ts` +- Expected: PASS + +**Commit:** +```bash +git add src/adapters src/core tests/migrations.test.ts +git commit -m "feat: add multi-dialect schema initialization" +``` + +--- + +### Task 4: Write failing query tests for topic/fact operations + +**Objective:** Specify the behavior of the high-level memory APIs before implementation. + +**Files:** +- Create: `tests/identity-db.test.ts` +- Create: `tests/queries.test.ts` + +**Steps:** +1. Write tests for `upsertTopic` deduplication by normalized name. +2. Write tests for `addFact` linking multiple topics to one fact. +3. Write tests for `getTopicByName(..., { includeFacts: true })`. +4. Write tests for `getTopicFactsLinkedTo(topicA, topicB)`. +5. Write tests for `listTopics({ includeFacts: false })` returning topic-only records. +6. Write tests for `findConnectedTopics(name)`. + +**Verification:** +- Run: `bun test tests/identity-db.test.ts tests/queries.test.ts` +- Expected before implementation: FAIL because `IdentityDB` methods are not implemented. 
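The deduplication tests in step 1 hinge on the normalization rule, which is worth pinning down early. A minimal sketch, assuming the simple lowercase-trim rule proposed in the risks section (the helper name is illustrative):

```typescript
// Proposed normalization rule for topic uniqueness: trim, collapse internal
// whitespace, lowercase. upsertTopic would deduplicate on this derived key.
export function normalizeTopicName(name: string): string {
  return name.trim().replace(/\s+/g, ' ').toLowerCase();
}
```

Under this rule, `'  TypeScript '` and `'typescript'` map to the same `normalized_name` key, so upserting both yields one topic row.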
+ +**Commit:** +```bash +git add tests/identity-db.test.ts tests/queries.test.ts +git commit -m "test: specify memory graph query APIs" +``` + +--- + +### Task 5: Implement `IdentityDB` core service and query helpers + +**Objective:** Deliver the first usable high-level API for writing and reading memory graph data. + +**Files:** +- Create: `src/core/identity-db.ts` +- Create: `src/queries/topics.ts` +- Create: `src/queries/facts.ts` +- Create: `src/index.ts` +- Modify: `src/types/api.ts` +- Modify: `tests/identity-db.test.ts` +- Modify: `tests/queries.test.ts` + +**Steps:** +1. Implement `IdentityDB.connect()` and `initialize()`. +2. Implement topic upsert with normalized key handling. +3. Implement fact insertion plus topic linking transactionally. +4. Implement topic lookup with optional fact expansion. +5. Implement topic-to-topic and multi-topic fact queries. +6. Implement topic listing and connected-topic discovery. +7. Re-run the full test suite. + +**Verification:** +- Run: `bun test` +- Expected: PASS + +**Commit:** +```bash +git add src tests +git commit -m "feat: add IdentityDB core memory graph APIs" +``` + +--- + +### Task 6: Add ingestion abstractions and a naive extractor + +**Objective:** Support automatic topic/fact ingestion through a pluggable extraction pipeline. + +**Files:** +- Create: `src/ingestion/types.ts` +- Create: `src/ingestion/extractor.ts` +- Create: `src/ingestion/naive-extractor.ts` +- Create: `tests/ingestion.test.ts` +- Modify: `src/core/identity-db.ts` +- Modify: `src/index.ts` + +**Steps:** +1. Write failing tests for `ingestStatement()` using a fake extractor. +2. Define the extraction contracts and validation rules. +3. Implement `ingestStatement()` by piping extractor output into `addFact()`. +4. Add a deterministic `NaiveExtractor` for examples/tests. +5. Add tests proving extractor-driven topic creation works. 
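The piping in step 3 can be sketched independently of the real database, with a fake extractor and a minimal write surface standing in for `IdentityDB`. All names below are proposals, not implemented APIs:

```typescript
// Sketch of ingestStatement(): pipe extractor output into addFact(),
// validating each extracted fact first because extractor output can be messy.
interface ExtractedFact {
  statement: string;
  topics: Array<{ name: string; role?: string }>;
}

interface FactExtractor {
  extract(input: string): Promise<ExtractedFact[]>;
}

// Minimal stand-in for the IdentityDB write surface used by ingestion.
interface FactSink {
  addFact(input: ExtractedFact): Promise<void>;
}

async function ingestStatement(
  db: FactSink,
  input: string,
  opts: { extractor: FactExtractor },
): Promise<number> {
  const extracted = await opts.extractor.extract(input);
  let written = 0;
  for (const fact of extracted) {
    // Skip facts with empty statements or no linked topics (see Risks).
    if (!fact.statement.trim() || fact.topics.length === 0) continue;
    await db.addFact(fact);
    written += 1;
  }
  return written;
}
```

Returning the count of persisted facts gives callers a cheap way to detect that an extractor produced nothing usable.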
**Verification:**
- Run: `bun test tests/ingestion.test.ts`
- Run: `bun test`
- Expected: PASS

**Commit:**
```bash
git add src/ingestion src/core/identity-db.ts src/index.ts tests/ingestion.test.ts
git commit -m "feat: add pluggable fact ingestion pipeline"
```

---

### Task 7: Polish package docs and publish-ready ergonomics

**Objective:** Make the repository understandable and usable after the foundation lands.

**Files:**
- Modify: `README.md`
- Optionally create: `docs/examples/basic-usage.md`

**Steps:**
1. Document supported databases and the current API surface.
2. Document the topic/fact graph model with a concrete example.
3. Add example code for initialization, querying, and AI-assisted ingestion.
4. Call out current limitations and near-term roadmap.

**Verification:**
- Manually review the README examples against actual exports.
- Run: `bun run build`
- Expected: PASS

**Commit:**
```bash
git add README.md docs/examples/basic-usage.md
git commit -m "docs: document IdentityDB foundation usage"
```

---

## Test strategy

- Use SQLite in-memory for the main automated tests.
- Treat PostgreSQL/MySQL/MariaDB support as adapter-compatibility in the code path, with optional future integration tests behind environment variables.
- Keep all public behavior covered through unit/integration-style tests against the public `IdentityDB` API.
- Add regression tests for normalization, many-to-many fact linking, and topic filtering by connected topic.

---

## Risks and tradeoffs

1. **Cross-dialect JSON handling** — JSON support differs between engines. The initial version should serialize metadata defensively for portability.
2. **Case normalization semantics** — topic uniqueness depends on normalization. The first version should use a simple lowercase-trim normalization and document it.
3. **Temporal topic modeling** — time can be a topic, but richer interval modeling should wait until a later phase.
4. **Abstract vs concrete topic boundaries** — this is partly editorial, so the API should store explicit `granularity` rather than trying to infer it automatically.
5. **LLM extraction variability** — extractor output can be messy. The core package should validate extractor results before writing them.

---

## Out of scope for this foundation pass

- Embeddings or semantic vector search
- Ranking/relevance algorithms
- Full-text search indices
- Topic merging/synonym resolution workflows
- Multi-user authorization / remote HTTP service layer
- Hosted API server package

---

## Immediate execution target

For the first automated execution pass, implement Tasks 1 through 7 in order, but treat SQLite-backed functionality as the required tested path and the other SQL engines as supported adapter targets in the library surface.
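As a concrete starting point for the cross-dialect JSON risk noted above, the defensive metadata helpers could look like this sketch. Function names and the null-on-garbage policy are assumptions for illustration:

```typescript
// Sketch of defensive JSON helpers: metadata is always stored as text so
// SQLite, Postgres, and the MySQL family round-trip the same payload.
function serializeMetadata(value: Record<string, unknown> | null): string | null {
  if (value === null) return null;
  return JSON.stringify(value);
}

function deserializeMetadata(raw: string | null): Record<string, unknown> | null {
  if (raw === null || raw === '') return null;
  try {
    const parsed = JSON.parse(raw);
    // Guard against scalar or array payloads sneaking in from older rows.
    return typeof parsed === 'object' && parsed !== null && !Array.isArray(parsed)
      ? (parsed as Record<string, unknown>)
      : null;
  } catch {
    return null;
  }
}
```

Mapping unreadable metadata to `null` instead of throwing keeps a single corrupt row from breaking whole-table queries; stricter callers can layer validation on top.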