docs: add IdentityDB foundation plan

# IdentityDB Foundation Implementation Plan
> **For Hermes:** Use the `subagent-driven-development` skill to execute this plan task-by-task. Enforce strict TDD for every production behavior.

**Goal:** Build the first usable version of `IdentityDB`, a TypeScript package that wraps relational databases and exposes a structured API for storing topics, facts, and their many-to-many graph relationships.

**Architecture:** IdentityDB will use a layered architecture: a storage layer based on Kysely + dialect adapters, a domain layer for topics/facts/links, and a service layer that exposes ergonomic high-level APIs for querying and ingesting memory. Schema initialization will be automatic and idempotent. AI-assisted ingestion will be abstracted behind a pluggable extractor interface so callers can use a small LLM or a deterministic extractor without coupling the core package to a specific model provider.

**Tech Stack:** TypeScript, Bun, Node.js, Kysely, better-sqlite3, pg, mysql2, Vitest, tsup.

---
## Product constraints and interpretation

- The package must support SQLite, PostgreSQL, MySQL, and MariaDB.
- The database model must treat a single fact as a connector between multiple topics.
- Topics can represent concrete entities (`TypeScript`), abstract concepts (`programming language`), or temporal anchors (`2025`).
- Topic abstraction should be explicit in the schema so broad topics can store broad facts while specific topics store specific facts.
- The initial release should prioritize correctness, portability, and ergonomic API design over advanced search or embedding features.
- AI-assisted topic extraction should be implemented as an integration point in the v1 foundation, not as a hardcoded provider-specific dependency.

---
## Target repository structure

```text
IdentityDB/
├── src/
│   ├── adapters/
│   │   ├── dialect.ts
│   │   └── index.ts
│   ├── core/
│   │   ├── errors.ts
│   │   ├── identity-db.ts
│   │   ├── migrations.ts
│   │   └── schema.ts
│   ├── ingestion/
│   │   ├── extractor.ts
│   │   ├── naive-extractor.ts
│   │   └── types.ts
│   ├── queries/
│   │   ├── topics.ts
│   │   └── facts.ts
│   ├── types/
│   │   ├── api.ts
│   │   ├── domain.ts
│   │   └── database.ts
│   └── index.ts
├── tests/
│   ├── identity-db.test.ts
│   ├── migrations.test.ts
│   ├── queries.test.ts
│   └── ingestion.test.ts
├── docs/
│   └── plans/
│       └── 2026-05-11-identitydb-foundation.md
├── package.json
├── tsconfig.json
├── tsup.config.ts
├── vitest.config.ts
├── .gitignore
└── README.md
```

---
## Data model proposal

### Tables

#### `topics`
- `id` — string UUID
- `name` — canonical display name, unique
- `normalized_name` — lowercase normalized unique key
- `category` — `entity | concept | temporal | custom`
- `granularity` — `abstract | concrete | mixed`
- `description` — nullable text
- `metadata` — JSON / JSON-text depending on dialect
- `created_at`
- `updated_at`

#### `facts`
- `id` — string UUID
- `statement` — original fact text
- `summary` — optional normalized/clean summary
- `source` — optional source identifier
- `confidence` — nullable numeric confidence
- `metadata` — JSON / JSON-text depending on dialect
- `created_at`
- `updated_at`

#### `fact_topics`
- `fact_id`
- `topic_id`
- `role` — optional semantic label (`subject`, `object`, `time`, etc.)
- `position` — stable order for fact-topic relationships
- composite unique key on (`fact_id`, `topic_id`, `role`)

### Notes
- The graph is modeled through `fact_topics`; facts are the connective tissue between topics.
- No separate topic-to-topic edge table is needed in the initial version because relationships are derived from shared facts.
- JSON portability should be implemented through small helpers so SQLite stores stringified JSON while Postgres/MySQL can still use text-compatible serialization safely.
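
The tables above map naturally onto TypeScript domain types. A minimal sketch of what `src/types/domain.ts` might contain — field names follow the proposal, but the exact shapes (e.g. timestamps as ISO strings) are assumptions, not the final API:

```typescript
// Sketch of the domain model implied by the tables above; not the final API.
type TopicCategory = 'entity' | 'concept' | 'temporal' | 'custom';
type TopicGranularity = 'abstract' | 'concrete' | 'mixed';

interface Topic {
  id: string;                         // UUID
  name: string;                       // canonical display name, unique
  normalizedName: string;             // lowercase normalized unique key
  category: TopicCategory;
  granularity: TopicGranularity;
  description: string | null;
  metadata: Record<string, unknown>;  // serialized per dialect in storage
  createdAt: string;                  // assumed: ISO-8601 string
  updatedAt: string;
}

interface Fact {
  id: string;
  statement: string;                  // original fact text
  summary: string | null;
  source: string | null;
  confidence: number | null;
  metadata: Record<string, unknown>;
  createdAt: string;
  updatedAt: string;
}

interface FactTopic {
  factId: string;
  topicId: string;
  role: string | null;                // 'subject', 'object', 'time', ...
  position: number;                   // stable order within the fact
}
```

The camelCase fields would map to the snake_case columns at the storage layer.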

---
## Public API proposal

### Construction and lifecycle

```ts
const db = await IdentityDB.connect({
  client: 'sqlite',
  filename: ':memory:',
});

await db.initialize();
await db.close();
```
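
The options passed to `connect()` could be modeled as a discriminated union over the supported dialects. A sketch under assumed field names (the Postgres/MySQL fields are illustrative, not a committed surface):

```typescript
// Hypothetical connection-config union; field names are illustrative.
type SqliteConfig = { client: 'sqlite'; filename: string };
type PostgresConfig = {
  client: 'postgres';
  host: string; port?: number; database: string; user: string; password?: string;
};
type MysqlFamilyConfig = {
  client: 'mysql' | 'mariadb';
  host: string; port?: number; database: string; user: string; password?: string;
};
type ConnectionConfig = SqliteConfig | PostgresConfig | MysqlFamilyConfig;

// Narrowing on `client` gives dialect-specific fields without casts.
function describeTarget(config: ConnectionConfig): string {
  switch (config.client) {
    case 'sqlite':
      return `sqlite:${config.filename}`;
    case 'postgres':
      return `postgres://${config.host}/${config.database}`;
    case 'mysql':
    case 'mariadb':
      return `${config.client}://${config.host}/${config.database}`;
  }
}
```

The same discriminant would drive the dialect factory in Task 3.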

### Core write APIs

```ts
await db.upsertTopic({
  name: 'TypeScript',
  category: 'entity',
  granularity: 'concrete',
});

await db.addFact({
  statement: 'I have worked with TypeScript since 2025.',
  topics: [
    { name: 'I', category: 'entity', granularity: 'concrete', role: 'subject' },
    { name: 'TypeScript', category: 'entity', granularity: 'concrete', role: 'object' },
    { name: '2025', category: 'temporal', granularity: 'concrete', role: 'time' },
  ],
});
```

### Query APIs

```ts
await db.getTopicByName('TypeScript', { includeFacts: true });
await db.getTopicFacts('TypeScript');
await db.getTopicFactsLinkedTo('TypeScript', '2025');
await db.listTopics();
await db.listTopics({ includeFacts: false, limit: 100 });
await db.findConnectedTopics('TypeScript');
await db.findFactsConnectingTopics(['I', 'TypeScript', '2025']);
```
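
The intended semantics of `findFactsConnectingTopics` are worth pinning down: a fact qualifies when it is linked to *every* requested topic. A plain in-memory reference model (the real implementation would be a SQL join over `fact_topics`; names here are illustrative):

```typescript
// Reference semantics for findFactsConnectingTopics over in-memory links.
interface Link { factId: string; topicName: string }

function factsConnecting(links: Link[], topics: string[]): string[] {
  // Group linked topic names by fact.
  const byFact = new Map<string, Set<string>>();
  for (const { factId, topicName } of links) {
    if (!byFact.has(factId)) byFact.set(factId, new Set());
    byFact.get(factId)!.add(topicName);
  }
  // A fact connects the topics when its topic set covers all requested names.
  return Array.from(byFact.entries())
    .filter(([, names]) => topics.every((t) => names.has(t)))
    .map(([factId]) => factId);
}
```

`findConnectedTopics` is the transpose of the same grouping: collect every topic that shares at least one fact with the given topic.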

### AI-assisted ingestion API

```ts
await db.ingestStatement('I have worked with TypeScript since 2025.', {
  extractor,
});
```

Where `extractor` implements:

```ts
interface FactExtractor {
  extract(input: string): Promise<ExtractedFact>;
}
```

The package will ship a simple `NaiveExtractor` for tests/examples, while real deployments can inject an LLM-backed extractor.
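
One possible shape for that deterministic extractor — the capitalized-word and four-digit-year heuristics below are purely illustrative, chosen only so tests stay reproducible:

```typescript
interface ExtractedTopic {
  name: string;
  category: 'entity' | 'concept' | 'temporal' | 'custom';
  granularity: 'abstract' | 'concrete' | 'mixed';
  role?: string;
}

interface ExtractedFact {
  statement: string;
  topics: ExtractedTopic[];
}

// Naive heuristic extractor: four-digit years become temporal topics,
// capitalized words become entity topics. For tests/examples only.
class NaiveExtractor {
  async extract(input: string): Promise<ExtractedFact> {
    const topics: ExtractedTopic[] = [];
    for (const word of input.split(/\s+/)) {
      const token = word.replace(/[^\w]/g, '');
      if (/^\d{4}$/.test(token)) {
        topics.push({ name: token, category: 'temporal', granularity: 'concrete', role: 'time' });
      } else if (/^[A-Z]/.test(token)) {
        topics.push({ name: token, category: 'entity', granularity: 'concrete' });
      }
    }
    return { statement: input, topics };
  }
}
```

An LLM-backed extractor would implement the same `extract()` contract, which is what keeps the core package provider-agnostic.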

---
## Execution plan

### Task 1: Scaffold package tooling and baseline configuration

**Objective:** Create a clean TypeScript package foundation with build and test tooling.

**Files:**
- Create: `package.json`
- Create: `tsconfig.json`
- Create: `tsup.config.ts`
- Create: `vitest.config.ts`
- Create: `.gitignore`
- Modify: `README.md`

**Steps:**
1. Add package metadata, scripts, dependency placeholders, and ESM export configuration.
2. Add TypeScript config for library output.
3. Add tsup config for bundling ESM + type declarations.
4. Add Vitest config targeting Node.
5. Expand README with project direction and current scope.
6. Install dependencies and confirm `bun test` starts correctly.

**Verification:**
- Run: `bun install`
- Run: `bun test`
- Expected: test runner executes successfully even if there are zero or placeholder tests.

**Commit:**
```bash
git add package.json tsconfig.json tsup.config.ts vitest.config.ts .gitignore README.md bun.lock
git commit -m "chore: scaffold IdentityDB package tooling"
```

---
### Task 2: Define domain types and write migration tests first

**Objective:** Lock down the domain model and schema contract before implementing migrations.

**Files:**
- Create: `src/types/domain.ts`
- Create: `src/types/database.ts`
- Create: `src/types/api.ts`
- Create: `src/core/schema.ts`
- Create: `tests/migrations.test.ts`

**Steps:**
1. Write tests that describe the required tables and columns after initialization.
2. Write tests for idempotent initialization (calling twice should not fail).
3. Add domain and API type definitions that match the product model.
4. Add schema description constants used by migrations.

**Verification:**
- Run: `bun test tests/migrations.test.ts`
- Expected before implementation: FAIL because initialization does not exist yet.

**Commit:**
```bash
git add src/types src/core/schema.ts tests/migrations.test.ts
git commit -m "test: define schema contract for topic fact graph"
```

---
### Task 3: Implement dialect adapters and automatic schema initialization

**Objective:** Make the package connect to supported databases and create its schema automatically.

**Files:**
- Create: `src/adapters/dialect.ts`
- Create: `src/adapters/index.ts`
- Create: `src/core/migrations.ts`
- Create: `src/core/errors.ts`
- Modify: `src/core/schema.ts`
- Modify: `tests/migrations.test.ts`

**Steps:**
1. Implement a connection config union for SQLite/Postgres/MySQL-family.
2. Build a dialect factory returning a Kysely instance.
3. Implement `initializeSchema()` with idempotent table creation.
4. Add lightweight helpers for JSON serialization/deserialization portability.
5. Re-run migration tests until green.

**Verification:**
- Run: `bun test tests/migrations.test.ts`
- Expected: PASS

**Commit:**
```bash
git add src/adapters src/core tests/migrations.test.ts
git commit -m "feat: add multi-dialect schema initialization"
```
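
The idempotency requirement in step 3 mostly falls out of `IF NOT EXISTS` DDL. A sketch of the SQLite-flavored statements — the real code would go through Kysely's schema builder and adjust column types per dialect, and these column definitions are an assumption based on the data model above:

```typescript
// Illustrative DDL only; initializeSchema() would emit this via Kysely.
// "IF NOT EXISTS" keeps repeated initialization calls safe.
const CREATE_TOPICS = `
CREATE TABLE IF NOT EXISTS topics (
  id TEXT PRIMARY KEY,
  name TEXT NOT NULL UNIQUE,
  normalized_name TEXT NOT NULL UNIQUE,
  category TEXT NOT NULL,
  granularity TEXT NOT NULL,
  description TEXT,
  metadata TEXT,
  created_at TEXT NOT NULL,
  updated_at TEXT NOT NULL
)`;

const CREATE_FACTS = `
CREATE TABLE IF NOT EXISTS facts (
  id TEXT PRIMARY KEY,
  statement TEXT NOT NULL,
  summary TEXT,
  source TEXT,
  confidence REAL,
  metadata TEXT,
  created_at TEXT NOT NULL,
  updated_at TEXT NOT NULL
)`;

const CREATE_FACT_TOPICS = `
CREATE TABLE IF NOT EXISTS fact_topics (
  fact_id TEXT NOT NULL REFERENCES facts(id),
  topic_id TEXT NOT NULL REFERENCES topics(id),
  role TEXT,
  position INTEGER NOT NULL DEFAULT 0,
  UNIQUE (fact_id, topic_id, role)
)`;
```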

---
### Task 4: Write failing query tests for topic/fact operations

**Objective:** Specify the behavior of the high-level memory APIs before implementation.

**Files:**
- Create: `tests/identity-db.test.ts`
- Create: `tests/queries.test.ts`

**Steps:**
1. Write tests for `upsertTopic` deduplication by normalized name.
2. Write tests for `addFact` linking multiple topics to one fact.
3. Write tests for `getTopicByName(..., { includeFacts: true })`.
4. Write tests for `getTopicFactsLinkedTo(topicA, topicB)`.
5. Write tests for `listTopics({ includeFacts: false })` returning topic-only records.
6. Write tests for `findConnectedTopics(name)`.

**Verification:**
- Run: `bun test tests/identity-db.test.ts tests/queries.test.ts`
- Expected before implementation: FAIL because `IdentityDB` methods are not implemented.

**Commit:**
```bash
git add tests/identity-db.test.ts tests/queries.test.ts
git commit -m "test: specify memory graph query APIs"
```

---
### Task 5: Implement `IdentityDB` core service and query helpers

**Objective:** Deliver the first usable high-level API for writing and reading memory graph data.

**Files:**
- Create: `src/core/identity-db.ts`
- Create: `src/queries/topics.ts`
- Create: `src/queries/facts.ts`
- Create: `src/index.ts`
- Modify: `src/types/api.ts`
- Modify: `tests/identity-db.test.ts`
- Modify: `tests/queries.test.ts`

**Steps:**
1. Implement `IdentityDB.connect()` and `initialize()`.
2. Implement topic upsert with normalized key handling.
3. Implement fact insertion plus topic linking transactionally.
4. Implement topic lookup with optional fact expansion.
5. Implement topic-to-topic and multi-topic fact queries.
6. Implement topic listing and connected-topic discovery.
7. Re-run the full test suite.

**Verification:**
- Run: `bun test`
- Expected: PASS

**Commit:**
```bash
git add src tests
git commit -m "feat: add IdentityDB core memory graph APIs"
```

---
### Task 6: Add ingestion abstractions and a naive extractor

**Objective:** Support automatic topic/fact ingestion through a pluggable extraction pipeline.

**Files:**
- Create: `src/ingestion/types.ts`
- Create: `src/ingestion/extractor.ts`
- Create: `src/ingestion/naive-extractor.ts`
- Create: `tests/ingestion.test.ts`
- Modify: `src/core/identity-db.ts`
- Modify: `src/index.ts`

**Steps:**
1. Write failing tests for `ingestStatement()` using a fake extractor.
2. Define the extraction contracts and validation rules.
3. Implement `ingestStatement()` by piping extractor output into `addFact()`.
4. Add a deterministic `NaiveExtractor` for examples/tests.
5. Add tests proving extractor-driven topic creation works.

**Verification:**
- Run: `bun test tests/ingestion.test.ts`
- Run: `bun test`
- Expected: PASS

**Commit:**
```bash
git add src/ingestion src/core/identity-db.ts src/index.ts tests/ingestion.test.ts
git commit -m "feat: add pluggable fact ingestion pipeline"
```
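
The pipeline in this task is a thin pipe from extractor output into the write API. A sketch under assumed types (the free-standing `ingestStatement` function and its `addFact` callback are illustrative stand-ins for the real methods on `IdentityDB`):

```typescript
// Sketch: ingestStatement validates extractor output, then delegates to addFact.
interface ExtractedFact {
  statement: string;
  topics: { name: string; category: string; granularity: string; role?: string }[];
}

interface FactExtractor {
  extract(input: string): Promise<ExtractedFact>;
}

async function ingestStatement(
  input: string,
  extractor: FactExtractor,
  addFact: (fact: ExtractedFact) => Promise<void>,
): Promise<ExtractedFact> {
  const extracted = await extractor.extract(input);
  // Defensive validation: never trust extractor (especially LLM) output blindly.
  if (extracted.topics.length === 0) {
    throw new Error('extractor returned no topics');
  }
  for (const topic of extracted.topics) {
    if (!topic.name.trim()) throw new Error('extractor returned an unnamed topic');
  }
  await addFact(extracted);
  return extracted;
}
```

Because validation happens before the write, a misbehaving extractor fails loudly instead of polluting the graph.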

---
### Task 7: Polish package docs and publish-ready ergonomics

**Objective:** Make the repository understandable and usable after the foundation lands.

**Files:**
- Modify: `README.md`
- Optionally create: `docs/examples/basic-usage.md`

**Steps:**
1. Document supported databases and the current API surface.
2. Document the topic/fact graph model with a concrete example.
3. Add example code for initialization, querying, and AI-assisted ingestion.
4. Call out current limitations and near-term roadmap.

**Verification:**
- Manually review the README examples against actual exports.
- Run: `bun run build`
- Expected: PASS

**Commit:**
```bash
git add README.md docs/examples/basic-usage.md
git commit -m "docs: document IdentityDB foundation usage"
```

---
## Test strategy

- Use SQLite in-memory for the main automated tests.
- Treat PostgreSQL/MySQL/MariaDB support as adapter-compatibility in the code path, with optional future integration tests behind environment variables.
- Keep all public behavior covered through unit/integration-style tests against the public `IdentityDB` API.
- Add regression tests for normalization, many-to-many fact linking, and topic filtering by connected topic.

---
## Risks and tradeoffs

1. **Cross-dialect JSON handling** — JSON support differs between engines. The initial version should serialize metadata defensively for portability.
2. **Case normalization semantics** — topic uniqueness depends on normalization. The first version should use a simple lowercase-trim normalization and document it.
3. **Temporal topic modeling** — time can be a topic, but richer interval modeling should wait until a later phase.
4. **Abstract vs concrete topic boundaries** — this is partly editorial, so the API should store explicit `granularity` rather than trying to infer it automatically.
5. **LLM extraction variability** — extractor output can be messy. The core package should validate extractor results before writing them.
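
Risks 1 and 2 both reduce to small pure helpers. A sketch of the lowercase-trim normalization and defensive JSON round-tripping (helper names are illustrative, not the final exports):

```typescript
// Risk 2: simple, documented normalization — lowercase, trim, collapse whitespace.
function normalizeTopicName(name: string): string {
  return name.trim().toLowerCase().replace(/\s+/g, ' ');
}

// Risk 1: store metadata as a JSON string on every dialect; parse defensively.
function serializeMetadata(metadata: Record<string, unknown>): string {
  return JSON.stringify(metadata ?? {});
}

function deserializeMetadata(raw: string | null): Record<string, unknown> {
  if (!raw) return {};
  try {
    const parsed = JSON.parse(raw);
    return typeof parsed === 'object' && parsed !== null
      ? (parsed as Record<string, unknown>)
      : {};
  } catch {
    return {}; // tolerate corrupt rows rather than failing reads
  }
}
```

Keeping normalization in one function also makes the documented semantics trivially testable.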

---
## Out of scope for this foundation pass

- Embeddings or semantic vector search
- Ranking/relevance algorithms
- Full-text search indices
- Topic merging/synonym resolution workflows
- Multi-user authorization / remote HTTP service layer
- Hosted API server package

---
## Immediate execution target

For the first automated execution pass, implement Tasks 1 through 7 in order, but treat SQLite-backed functionality as the required tested path and the other SQL engines as supported adapter targets in the library surface.