
IdentityDB Foundation Implementation Plan

For Hermes: Use the subagent-driven-development skill to execute this plan task-by-task. Enforce strict TDD for every production behavior.

Goal: Build the first usable version of IdentityDB, a TypeScript package that wraps relational databases and exposes a structured API for storing topics, facts, and their many-to-many graph relationships.

Architecture: IdentityDB will use a layered architecture: a storage layer based on Kysely + dialect adapters, a domain layer for topics/facts/links, and a service layer that exposes ergonomic high-level APIs for querying and ingesting memory. Schema initialization will be automatic and idempotent. AI-assisted ingestion will be abstracted behind a pluggable extractor interface so callers can use a small LLM or a deterministic extractor without coupling the core package to a specific model provider.

Tech Stack: TypeScript, Bun, Node.js, Kysely, better-sqlite3, pg, mysql2, Vitest, tsup.


Product constraints and interpretation

  • The package must support SQLite, PostgreSQL, MySQL, and MariaDB.
  • The database model must treat a single fact as a connector between multiple topics.
  • Topics can represent concrete entities (TypeScript), abstract concepts (programming language), or temporal anchors (2025).
  • Topic abstraction should be explicit in the schema so broad topics can store broad facts while specific topics store specific facts.
  • The initial release should prioritize correctness, portability, and ergonomic API design over advanced search or embedding features.
  • AI-assisted topic extraction should be implemented as an integration point in v1 foundation, not as a hardcoded provider-specific dependency.

Target repository structure

IdentityDB/
├── src/
│   ├── adapters/
│   │   ├── dialect.ts
│   │   └── index.ts
│   ├── core/
│   │   ├── errors.ts
│   │   ├── identity-db.ts
│   │   ├── migrations.ts
│   │   └── schema.ts
│   ├── ingestion/
│   │   ├── extractor.ts
│   │   ├── naive-extractor.ts
│   │   └── types.ts
│   ├── queries/
│   │   ├── topics.ts
│   │   └── facts.ts
│   ├── types/
│   │   ├── api.ts
│   │   ├── domain.ts
│   │   └── database.ts
│   └── index.ts
├── tests/
│   ├── identity-db.test.ts
│   ├── migrations.test.ts
│   ├── queries.test.ts
│   └── ingestion.test.ts
├── docs/
│   └── plans/
│       └── 2026-05-11-identitydb-foundation.md
├── package.json
├── tsconfig.json
├── tsup.config.ts
├── vitest.config.ts
├── .gitignore
└── README.md

Data model proposal

Tables

topics

  • id — string UUID
  • name — canonical display name, unique
  • normalized_name — lowercase normalized unique key
  • category — entity | concept | temporal | custom
  • granularity — abstract | concrete | mixed
  • description — nullable text
  • metadata — JSON / JSON-text depending on dialect
  • created_at
  • updated_at

facts

  • id — string UUID
  • statement — original fact text
  • summary — optional normalized/clean summary
  • source — optional source identifier
  • confidence — nullable numeric confidence
  • metadata — JSON / JSON-text depending on dialect
  • created_at
  • updated_at

fact_topics

  • fact_id
  • topic_id
  • role — optional semantic label (subject, object, time, etc.)
  • position — stable order for fact-topic relationships
  • composite unique key on (fact_id, topic_id, role)

Notes

  • The graph is modeled through fact_topics; facts are the connective tissue between topics.
  • No separate topic-to-topic edge table is needed in the initial version because relationships are derived from shared facts.
  • JSON portability should be handled through small helpers: SQLite stores stringified JSON, and Postgres/MySQL read and write the same text-serialized form, so metadata round-trips identically across dialects.
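
As a concrete starting point, the table contracts in src/types/database.ts might look like the sketch below. The column shapes follow the tables above, but the exact types (and the text-serialized metadata columns) are assumptions until Task 2 locks the contract down.

// Sketch of src/types/database.ts; shapes are assumed, not final.
export interface TopicsTable {
  id: string;                 // UUID generated in application code
  name: string;               // canonical display name, unique
  normalized_name: string;    // lowercase-trimmed unique key
  category: 'entity' | 'concept' | 'temporal' | 'custom';
  granularity: 'abstract' | 'concrete' | 'mixed';
  description: string | null;
  metadata: string | null;    // JSON serialized to text for cross-dialect portability
  created_at: string;
  updated_at: string;
}

export interface FactsTable {
  id: string;
  statement: string;          // original fact text
  summary: string | null;
  source: string | null;
  confidence: number | null;
  metadata: string | null;
  created_at: string;
  updated_at: string;
}

export interface FactTopicsTable {
  fact_id: string;
  topic_id: string;
  role: string | null;        // subject, object, time, ...
  position: number;           // stable order of a fact's topic links
}

// The schema Kysely is parameterized over: Kysely<Database>.
export interface Database {
  topics: TopicsTable;
  facts: FactsTable;
  fact_topics: FactTopicsTable;
}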

Public API proposal

Construction and lifecycle

const db = await IdentityDB.connect({
  client: 'sqlite',
  filename: ':memory:',
});

await db.initialize();
await db.close();

Core write APIs

await db.upsertTopic({
  name: 'TypeScript',
  category: 'entity',
  granularity: 'concrete',
});

await db.addFact({
  statement: 'I have worked with TypeScript since 2025.',
  topics: [
    { name: 'I', category: 'entity', granularity: 'concrete', role: 'subject' },
    { name: 'TypeScript', category: 'entity', granularity: 'concrete', role: 'object' },
    { name: '2025', category: 'temporal', granularity: 'concrete', role: 'time' },
  ],
});

Query APIs

await db.getTopicByName('TypeScript', { includeFacts: true });
await db.getTopicFacts('TypeScript');
await db.getTopicFactsLinkedTo('TypeScript', '2025');
await db.listTopics();
await db.listTopics({ includeFacts: false, limit: 100 });
await db.findConnectedTopics('TypeScript');
await db.findFactsConnectingTopics(['I', 'TypeScript', '2025']);

AI-assisted ingestion API

await db.ingestStatement('I have worked with TypeScript since 2025.', {
  extractor,
});

Where extractor implements:

interface FactExtractor {
  extract(input: string): Promise<ExtractedFact>;
}

The package will ship a simple NaiveExtractor for tests/examples, while real deployments can inject an LLM-backed extractor.
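
The exact ExtractedFact contract is defined in Task 6; the sketch below is one plausible shape, with the topic input assumed to mirror addFact's and the extraction heuristics purely illustrative.

// Assumed contract; the real one lives in src/ingestion/types.ts after Task 6.
export interface ExtractedTopic {
  name: string;
  category: 'entity' | 'concept' | 'temporal' | 'custom';
  granularity: 'abstract' | 'concrete' | 'mixed';
  role?: string;
}

export interface ExtractedFact {
  statement: string;
  topics: ExtractedTopic[];
}

export interface FactExtractor {
  extract(input: string): Promise<ExtractedFact>;
}

// Deterministic stand-in for tests/examples: capitalized words become entity
// topics, four-digit numbers become temporal topics. Real deployments inject
// an LLM-backed FactExtractor instead.
export class NaiveExtractor implements FactExtractor {
  async extract(input: string): Promise<ExtractedFact> {
    const topics: ExtractedTopic[] = [];
    for (const word of input.match(/[A-Z][A-Za-z]*/g) ?? []) {
      topics.push({ name: word, category: 'entity', granularity: 'concrete' });
    }
    for (const year of input.match(/\b\d{4}\b/g) ?? []) {
      topics.push({ name: year, category: 'temporal', granularity: 'concrete' });
    }
    return { statement: input, topics };
  }
}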


Execution plan

Task 1: Scaffold package tooling and baseline configuration

Objective: Create a clean TypeScript package foundation with build and test tooling.

Files:

  • Create: package.json
  • Create: tsconfig.json
  • Create: tsup.config.ts
  • Create: vitest.config.ts
  • Create: .gitignore
  • Modify: README.md

Steps:

  1. Add package metadata, scripts, dependency placeholders, and ESM export configuration.
  2. Add TypeScript config for library output.
  3. Add tsup config for bundling ESM + type declarations (sketched after this list).
  4. Add Vitest config targeting Node.
  5. Expand README with project direction and current scope.
  6. Install dependencies and confirm bun test starts correctly.
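
For step 3, a plausible tsup.config.ts; the options are a starting sketch, not final.

import { defineConfig } from 'tsup';

export default defineConfig({
  entry: ['src/index.ts'],
  format: ['esm'],  // ESM output matching the package exports field
  dts: true,        // emit type declarations alongside the bundle
  sourcemap: true,
  clean: true,      // clear dist/ before each build
});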

Verification:

  • Run: bun install
  • Run: bun test
  • Expected: test runner executes successfully even if there are zero or placeholder tests.

Commit:

git add package.json tsconfig.json tsup.config.ts vitest.config.ts .gitignore README.md bun.lock
git commit -m "chore: scaffold IdentityDB package tooling"

Task 2: Define domain types and write migration tests first

Objective: Lock down the domain model and schema contract before implementing migrations.

Files:

  • Create: src/types/domain.ts
  • Create: src/types/database.ts
  • Create: src/types/api.ts
  • Create: src/core/schema.ts
  • Create: tests/migrations.test.ts

Steps:

  1. Write tests that describe the required tables and columns after initialization.
  2. Write tests for idempotent initialization (calling twice should not fail); see the sketch after this list.
  3. Add domain and API type definitions that match the product model.
  4. Add schema description constants used by migrations.
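
A sketch of the step 2 idempotency test, assuming the IdentityDB.connect/initialize surface proposed above:

import { describe, it } from 'vitest';
import { IdentityDB } from '../src';

describe('schema initialization', () => {
  it('can run twice without failing', async () => {
    const db = await IdentityDB.connect({ client: 'sqlite', filename: ':memory:' });
    await db.initialize();
    await db.initialize(); // must be a no-op, not a "table already exists" error
    await db.close();
  });
});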

Verification:

  • Run: bun test tests/migrations.test.ts
  • Expected before implementation: FAIL because initialization does not exist yet.

Commit:

git add src/types src/core/schema.ts tests/migrations.test.ts
git commit -m "test: define schema contract for topic fact graph"

Task 3: Implement dialect adapters and automatic schema initialization

Objective: Make the package connect to supported databases and create its schema automatically.

Files:

  • Create: src/adapters/dialect.ts
  • Create: src/adapters/index.ts
  • Create: src/core/migrations.ts
  • Create: src/core/errors.ts
  • Modify: src/core/schema.ts
  • Modify: tests/migrations.test.ts

Steps:

  1. Implement a connection config union for SQLite/Postgres/MySQL-family.
  2. Build a dialect factory returning a Kysely instance (see the sketch after this list).
  3. Implement initializeSchema() with idempotent table creation.
  4. Add lightweight helpers for JSON serialization/deserialization portability.
  5. Re-run migration tests until green.
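
For steps 1 and 2, the config union and factory could take roughly this shape. Option names like connectionString and uri are placeholders, and MariaDB is assumed to ride Kysely's MySQL dialect.

import { Kysely, SqliteDialect, PostgresDialect, MysqlDialect } from 'kysely';
import SQLite from 'better-sqlite3';
import { Pool } from 'pg';
import { createPool } from 'mysql2';
import type { Database } from '../types/database';

// Discriminated union over the supported engines.
export type ConnectionConfig =
  | { client: 'sqlite'; filename: string }
  | { client: 'postgres'; connectionString: string }
  | { client: 'mysql' | 'mariadb'; uri: string };

export function createDb(config: ConnectionConfig): Kysely<Database> {
  switch (config.client) {
    case 'sqlite':
      return new Kysely<Database>({
        dialect: new SqliteDialect({ database: new SQLite(config.filename) }),
      });
    case 'postgres':
      return new Kysely<Database>({
        dialect: new PostgresDialect({
          pool: new Pool({ connectionString: config.connectionString }),
        }),
      });
    case 'mysql':
    case 'mariadb': // MariaDB speaks the MySQL protocol, so it shares the dialect
      return new Kysely<Database>({
        dialect: new MysqlDialect({ pool: createPool(config.uri) }),
      });
  }
}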

Verification:

  • Run: bun test tests/migrations.test.ts
  • Expected: PASS

Commit:

git add src/adapters src/core tests/migrations.test.ts
git commit -m "feat: add multi-dialect schema initialization"

Task 4: Write failing query tests for topic/fact operations

Objective: Specify the behavior of the high-level memory APIs before implementation.

Files:

  • Create: tests/identity-db.test.ts
  • Create: tests/queries.test.ts

Steps:

  1. Write tests for upsertTopic deduplication by normalized name.
  2. Write tests for addFact linking multiple topics to one fact.
  3. Write tests for getTopicByName(..., { includeFacts: true }).
  4. Write tests for getTopicFactsLinkedTo(topicA, topicB).
  5. Write tests for listTopics({ includeFacts: false }) returning topic-only records.
  6. Write tests for findConnectedTopics(name).

Verification:

  • Run: bun test tests/identity-db.test.ts tests/queries.test.ts
  • Expected before implementation: FAIL because IdentityDB methods are not implemented.

Commit:

git add tests/identity-db.test.ts tests/queries.test.ts
git commit -m "test: specify memory graph query APIs"

Task 5: Implement IdentityDB core service and query helpers

Objective: Deliver the first usable high-level API for writing and reading memory graph data.

Files:

  • Create: src/core/identity-db.ts
  • Create: src/queries/topics.ts
  • Create: src/queries/facts.ts
  • Create: src/index.ts
  • Modify: src/types/api.ts
  • Modify: tests/identity-db.test.ts
  • Modify: tests/queries.test.ts

Steps:

  1. Implement IdentityDB.connect() and initialize().
  2. Implement topic upsert with normalized key handling.
  3. Implement fact insertion plus topic linking transactionally (sketched with the upsert after this list).
  4. Implement topic lookup with optional fact expansion.
  5. Implement topic-to-topic and multi-topic fact queries.
  6. Implement topic listing and connected-topic discovery.
  7. Re-run the full test suite.
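
A sketch of steps 2 and 3, assuming the Kysely instance and table types from earlier tasks; the select-then-insert upsert is backstopped by the unique index on normalized_name.

import { randomUUID } from 'node:crypto';
import type { Kysely } from 'kysely';
import type { Database } from '../types/database';

type TopicInput = {
  name: string;
  category: 'entity' | 'concept' | 'temporal' | 'custom';
  granularity: 'abstract' | 'concrete' | 'mixed';
  role?: string;
};

// Accepts the root instance or a transaction handle (Transaction extends
// Kysely), so the same upsert works inside addFact's transaction below.
async function upsertTopic(db: Kysely<Database>, input: TopicInput): Promise<string> {
  const normalized = input.name.trim().toLowerCase();
  const existing = await db
    .selectFrom('topics')
    .select('id')
    .where('normalized_name', '=', normalized)
    .executeTakeFirst();
  if (existing) return existing.id;

  const id = randomUUID();
  const now = new Date().toISOString();
  await db
    .insertInto('topics')
    .values({
      id,
      name: input.name,
      normalized_name: normalized,
      category: input.category,
      granularity: input.granularity,
      description: null,
      metadata: null,
      created_at: now,
      updated_at: now,
    })
    .execute();
  return id;
}

// Fact insert plus topic links run in one transaction, so a failed link never
// leaves an orphaned fact behind.
async function addFact(
  db: Kysely<Database>,
  input: { statement: string; topics: TopicInput[] },
): Promise<string> {
  return db.transaction().execute(async (trx) => {
    const factId = randomUUID();
    const now = new Date().toISOString();
    await trx
      .insertInto('facts')
      .values({
        id: factId,
        statement: input.statement,
        summary: null,
        source: null,
        confidence: null,
        metadata: null,
        created_at: now,
        updated_at: now,
      })
      .execute();
    // Link every topic in stable order; position preserves the extractor's ordering.
    for (const [position, topic] of input.topics.entries()) {
      const topicId = await upsertTopic(trx, topic);
      await trx
        .insertInto('fact_topics')
        .values({ fact_id: factId, topic_id: topicId, role: topic.role ?? null, position })
        .execute();
    }
    return factId;
  });
}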

Verification:

  • Run: bun test
  • Expected: PASS

Commit:

git add src tests
git commit -m "feat: add IdentityDB core memory graph APIs"

Task 6: Add ingestion abstractions and a naive extractor

Objective: Support automatic topic/fact ingestion through a pluggable extraction pipeline.

Files:

  • Create: src/ingestion/types.ts
  • Create: src/ingestion/extractor.ts
  • Create: src/ingestion/naive-extractor.ts
  • Create: tests/ingestion.test.ts
  • Modify: src/core/identity-db.ts
  • Modify: src/index.ts

Steps:

  1. Write failing tests for ingestStatement() using a fake extractor (see the sketch after this list).
  2. Define the extraction contracts and validation rules.
  3. Implement ingestStatement() by piping extractor output into addFact().
  4. Add a deterministic NaiveExtractor for examples/tests.
  5. Add tests proving extractor-driven topic creation works.
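
Step 1 might start from a fake extractor like the one below; the asserted result shape (a facts array on the expanded topic) is an assumption about the API, not a settled contract.

import { describe, it, expect } from 'vitest';
import { IdentityDB } from '../src';
import type { FactExtractor } from '../src';

// Fixed output makes the assertion deterministic.
const fakeExtractor: FactExtractor = {
  async extract(input) {
    return {
      statement: input,
      topics: [{ name: 'TypeScript', category: 'entity', granularity: 'concrete', role: 'object' }],
    };
  },
};

describe('ingestStatement', () => {
  it('creates topics and facts from extractor output', async () => {
    const db = await IdentityDB.connect({ client: 'sqlite', filename: ':memory:' });
    await db.initialize();
    await db.ingestStatement('I have worked with TypeScript since 2025.', {
      extractor: fakeExtractor,
    });
    // Assumed result shape: includeFacts expands the topic with its linked facts.
    const topic = await db.getTopicByName('TypeScript', { includeFacts: true });
    expect(topic?.facts).toHaveLength(1);
    await db.close();
  });
});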

Verification:

  • Run: bun test tests/ingestion.test.ts
  • Run: bun test
  • Expected: PASS

Commit:

git add src/ingestion src/core/identity-db.ts src/index.ts tests/ingestion.test.ts
git commit -m "feat: add pluggable fact ingestion pipeline"

Task 7: Polish package docs and publish-ready ergonomics

Objective: Make the repository understandable and usable after the foundation lands.

Files:

  • Modify: README.md
  • Optionally create: docs/examples/basic-usage.md

Steps:

  1. Document supported databases and the current API surface.
  2. Document the topic/fact graph model with a concrete example.
  3. Add example code for initialization, querying, and AI-assisted ingestion.
  4. Call out current limitations and near-term roadmap.

Verification:

  • Manually review the README examples against actual exports.
  • Run: bun run build
  • Expected: PASS

Commit:

git add README.md docs/examples/basic-usage.md
git commit -m "docs: document IdentityDB foundation usage"

Test strategy

  • Use SQLite in-memory for the main automated tests.
  • Treat PostgreSQL/MySQL/MariaDB support as adapter-level compatibility in the main code path, with optional integration tests gated behind environment variables in a later pass (see the sketch after this list).
  • Keep all public behavior covered through unit/integration-style tests against the public IdentityDB API.
  • Add regression tests for normalization, many-to-many fact linking, and topic filtering by connected topic.
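
For the env-gated dialect runs, Vitest's skipIf keeps them out of the default suite. A sketch assuming a POSTGRES_URL variable and the connection shape from Task 3:

import { describe, it } from 'vitest';
import { IdentityDB } from '../src';

// Runs only when a live server URL is provided (e.g. in CI with services enabled).
describe.skipIf(!process.env.POSTGRES_URL)('postgres adapter', () => {
  it('initializes the schema against a live server', async () => {
    const db = await IdentityDB.connect({
      client: 'postgres',
      connectionString: process.env.POSTGRES_URL!,
    });
    await db.initialize();
    await db.close();
  });
});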

Risks and tradeoffs

  1. Cross-dialect JSON handling — JSON support differs between engines. The initial version should serialize metadata defensively for portability (sketched after this list).
  2. Case normalization semantics — topic uniqueness depends on normalization. The first version should use a simple lowercase-trim normalization and document it.
  3. Temporal topic modeling — time can be a topic, but richer interval modeling should wait until a later phase.
  4. Abstract vs concrete topic boundaries — this is partly editorial, so the API should store explicit granularity rather than trying to infer it automatically.
  5. LLM extraction variability — extractor output can be messy. The core package should validate extractor results before writing them.
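
Minimal sketches for risks 1 and 2; the helper names are assumptions, not settled API.

// Risk 1: always store metadata as text so every dialect round-trips it identically.
export function serializeMetadata(value: Record<string, unknown> | null): string | null {
  return value === null ? null : JSON.stringify(value);
}

export function deserializeMetadata(raw: string | null): Record<string, unknown> | null {
  if (raw === null) return null;
  try {
    return JSON.parse(raw) as Record<string, unknown>;
  } catch {
    return null; // tolerate a corrupt row rather than failing the whole read
  }
}

// Risk 2: the documented v1 normalization rule, nothing fancier.
export function normalizeName(name: string): string {
  return name.trim().toLowerCase();
}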

Out of scope for this foundation pass

  • Embeddings or semantic vector search
  • Ranking/relevance algorithms
  • Full-text search indices
  • Topic merging/synonym resolution workflows
  • Multi-user authorization / remote HTTP service layer
  • Hosted API server package

Immediate execution target

For the first automated execution pass, implement Tasks 1 through 7 in order, but treat SQLite-backed functionality as the required tested path and the other SQL engines as supported adapter targets in the library surface.