
IdentityDB Foundation Implementation Plan

For Hermes: Use the subagent-driven-development skill to execute this plan task-by-task. Enforce strict TDD for every production behavior.

Goal: Build the first usable version of IdentityDB, a TypeScript package that wraps relational databases and exposes a structured API for storing topics, facts, and their many-to-many graph relationships.

Architecture: IdentityDB will use a layered architecture: a storage layer based on Kysely + dialect adapters, a domain layer for topics/facts/links, and a service layer that exposes ergonomic high-level APIs for querying and ingesting memory. Schema initialization will be automatic and idempotent. AI-assisted ingestion will be abstracted behind a pluggable extractor interface so callers can use a small LLM or a deterministic extractor without coupling the core package to a specific model provider.

Tech Stack: TypeScript, Bun, Node.js, Kysely, better-sqlite3, pg, mysql2, Vitest, tsup.


Product constraints and interpretation

  • The package must support SQLite, PostgreSQL, MySQL, and MariaDB.
  • The database model must treat a single fact as a connector between multiple topics.
  • Topics can represent concrete entities (TypeScript), abstract concepts (programming language), or temporal anchors (2025).
  • Topic abstraction should be explicit in the schema so broad topics can store broad facts while specific topics store specific facts.
  • The initial release should prioritize correctness, portability, and ergonomic API design over advanced search or embedding features.
  • AI-assisted topic extraction should be implemented as an integration point in v1 foundation, not as a hardcoded provider-specific dependency.

Target repository structure

IdentityDB/
├── src/
│   ├── adapters/
│   │   ├── dialect.ts
│   │   └── index.ts
│   ├── core/
│   │   ├── errors.ts
│   │   ├── identity-db.ts
│   │   ├── migrations.ts
│   │   └── schema.ts
│   ├── ingestion/
│   │   ├── extractor.ts
│   │   ├── naive-extractor.ts
│   │   └── types.ts
│   ├── queries/
│   │   ├── topics.ts
│   │   └── facts.ts
│   ├── types/
│   │   ├── api.ts
│   │   ├── domain.ts
│   │   └── database.ts
│   └── index.ts
├── tests/
│   ├── identity-db.test.ts
│   ├── migrations.test.ts
│   ├── queries.test.ts
│   └── ingestion.test.ts
├── docs/
│   └── plans/
│       └── 2026-05-11-identitydb-foundation.md
├── package.json
├── tsconfig.json
├── tsup.config.ts
├── vitest.config.ts
├── .gitignore
└── README.md

Data model proposal

Tables

topics

  • id — string UUID
  • name — canonical display name, unique
  • normalized_name — lowercase normalized unique key
  • category — entity | concept | temporal | custom
  • granularity — abstract | concrete | mixed
  • description — nullable text
  • metadata — JSON / JSON-text depending on dialect
  • created_at
  • updated_at

facts

  • id — string UUID
  • statement — original fact text
  • summary — optional normalized/clean summary
  • source — optional source identifier
  • confidence — nullable numeric confidence
  • metadata — JSON / JSON-text depending on dialect
  • created_at
  • updated_at

fact_topics

  • fact_id
  • topic_id
  • role — optional semantic label (subject, object, time, etc.)
  • position — stable order for fact-topic relationships
  • composite unique key on (fact_id, topic_id, role)

Notes

  • The graph is modeled through fact_topics; facts are the connective tissue between topics.
  • No separate topic-to-topic edge table is needed in the initial version because relationships are derived from shared facts.
  • JSON portability should be handled through small helpers: SQLite stores stringified JSON, and Postgres/MySQL read and write the same text-serialized form, so metadata round-trips identically across dialects.
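
As a concrete starting point, the table contracts in src/types/database.ts might look like the sketch below. The column shapes follow the tables above, but the exact types (and the text-serialized metadata columns) are assumptions until Task 2 locks the contract down.

// Sketch of src/types/database.ts; shapes are assumed, not final.
export interface TopicsTable {
  id: string;                 // UUID generated in application code
  name: string;               // canonical display name, unique
  normalized_name: string;    // lowercase-trimmed unique key
  category: 'entity' | 'concept' | 'temporal' | 'custom';
  granularity: 'abstract' | 'concrete' | 'mixed';
  description: string | null;
  metadata: string | null;    // JSON serialized to text for cross-dialect portability
  created_at: string;
  updated_at: string;
}

export interface FactsTable {
  id: string;
  statement: string;          // original fact text
  summary: string | null;
  source: string | null;
  confidence: number | null;
  metadata: string | null;
  created_at: string;
  updated_at: string;
}

export interface FactTopicsTable {
  fact_id: string;
  topic_id: string;
  role: string | null;        // subject, object, time, ...
  position: number;           // stable order of a fact's topic links
}

// The schema Kysely is parameterized over: Kysely<Database>.
export interface Database {
  topics: TopicsTable;
  facts: FactsTable;
  fact_topics: FactTopicsTable;
}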

Public API proposal

Construction and lifecycle

const db = await IdentityDB.connect({
  client: 'sqlite',
  filename: ':memory:',
});

await db.initialize();
await db.close();

Core write APIs

await db.upsertTopic({
  name: 'TypeScript',
  category: 'entity',
  granularity: 'concrete',
});

await db.addFact({
  statement: 'I have worked with TypeScript since 2025.',
  topics: [
    { name: 'I', category: 'entity', granularity: 'concrete', role: 'subject' },
    { name: 'TypeScript', category: 'entity', granularity: 'concrete', role: 'object' },
    { name: '2025', category: 'temporal', granularity: 'concrete', role: 'time' },
  ],
});

Query APIs

await db.getTopicByName('TypeScript', { includeFacts: true });
await db.getTopicFacts('TypeScript');
await db.getTopicFactsLinkedTo('TypeScript', '2025');
await db.listTopics();
await db.listTopics({ includeFacts: false, limit: 100 });
await db.findConnectedTopics('TypeScript');
await db.findFactsConnectingTopics(['I', 'TypeScript', '2025']);

AI-assisted ingestion API

await db.ingestStatement('I have worked with TypeScript since 2025.', {
  extractor,
});

Where extractor implements:

interface FactExtractor {
  extract(input: string): Promise<ExtractedFact>;
}

The package will ship a simple NaiveExtractor for tests/examples, while real deployments can inject an LLM-backed extractor.
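
The exact ExtractedFact contract is defined in Task 6; the sketch below is one plausible shape, with the topic input assumed to mirror addFact's and the extraction heuristics purely illustrative.

// Assumed contract; the real one lives in src/ingestion/types.ts after Task 6.
export interface ExtractedTopic {
  name: string;
  category: 'entity' | 'concept' | 'temporal' | 'custom';
  granularity: 'abstract' | 'concrete' | 'mixed';
  role?: string;
}

export interface ExtractedFact {
  statement: string;
  topics: ExtractedTopic[];
}

export interface FactExtractor {
  extract(input: string): Promise<ExtractedFact>;
}

// Deterministic stand-in for tests/examples: capitalized words become entity
// topics, four-digit numbers become temporal topics. Real deployments inject
// an LLM-backed FactExtractor instead.
export class NaiveExtractor implements FactExtractor {
  async extract(input: string): Promise<ExtractedFact> {
    const topics: ExtractedTopic[] = [];
    for (const word of input.match(/[A-Z][A-Za-z]*/g) ?? []) {
      topics.push({ name: word, category: 'entity', granularity: 'concrete' });
    }
    for (const year of input.match(/\b\d{4}\b/g) ?? []) {
      topics.push({ name: year, category: 'temporal', granularity: 'concrete' });
    }
    return { statement: input, topics };
  }
}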


Execution plan

Task 1: Scaffold package tooling and baseline configuration

Objective: Create a clean TypeScript package foundation with build and test tooling.

Files:

  • Create: package.json
  • Create: tsconfig.json
  • Create: tsup.config.ts
  • Create: vitest.config.ts
  • Create: .gitignore
  • Modify: README.md

Steps:

  1. Add package metadata, scripts, dependency placeholders, and ESM export configuration.
  2. Add TypeScript config for library output.
  3. Add tsup config for bundling ESM + type declarations (sketched after this list).
  4. Add Vitest config targeting Node.
  5. Expand README with project direction and current scope.
  6. Install dependencies and confirm bun test starts correctly.
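
For step 3, a plausible tsup.config.ts; the options are a starting sketch, not final.

import { defineConfig } from 'tsup';

export default defineConfig({
  entry: ['src/index.ts'],
  format: ['esm'],  // ESM output matching the package exports field
  dts: true,        // emit type declarations alongside the bundle
  sourcemap: true,
  clean: true,      // clear dist/ before each build
});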

Verification:

  • Run: bun install
  • Run: bun test
  • Expected: test runner executes successfully even if there are zero or placeholder tests.

Commit:

git add package.json tsconfig.json tsup.config.ts vitest.config.ts .gitignore README.md bun.lock
git commit -m "chore: scaffold IdentityDB package tooling"

Task 2: Define domain types and write migration tests first

Objective: Lock down the domain model and schema contract before implementing migrations.

Files:

  • Create: src/types/domain.ts
  • Create: src/types/database.ts
  • Create: src/types/api.ts
  • Create: src/core/schema.ts
  • Create: tests/migrations.test.ts

Steps:

  1. Write tests that describe the required tables and columns after initialization.
  2. Write tests for idempotent initialization (calling twice should not fail); see the sketch after this list.
  3. Add domain and API type definitions that match the product model.
  4. Add schema description constants used by migrations.
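
A sketch of the step 2 idempotency test, assuming the IdentityDB.connect/initialize surface proposed above:

import { describe, it } from 'vitest';
import { IdentityDB } from '../src';

describe('schema initialization', () => {
  it('can run twice without failing', async () => {
    const db = await IdentityDB.connect({ client: 'sqlite', filename: ':memory:' });
    await db.initialize();
    await db.initialize(); // must be a no-op, not a "table already exists" error
    await db.close();
  });
});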

Verification:

  • Run: bun test tests/migrations.test.ts
  • Expected before implementation: FAIL because initialization does not exist yet.

Commit:

git add src/types src/core/schema.ts tests/migrations.test.ts
git commit -m "test: define schema contract for topic fact graph"

Task 3: Implement dialect adapters and automatic schema initialization

Objective: Make the package connect to supported databases and create its schema automatically.

Files:

  • Create: src/adapters/dialect.ts
  • Create: src/adapters/index.ts
  • Create: src/core/migrations.ts
  • Create: src/core/errors.ts
  • Modify: src/core/schema.ts
  • Modify: tests/migrations.test.ts

Steps:

  1. Implement a connection config union for SQLite/Postgres/MySQL-family.
  2. Build a dialect factory returning a Kysely instance (see the sketch after this list).
  3. Implement initializeSchema() with idempotent table creation.
  4. Add lightweight helpers for JSON serialization/deserialization portability.
  5. Re-run migration tests until green.
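
For steps 1 and 2, the config union and factory could take roughly this shape. Option names like connectionString and uri are placeholders, and MariaDB is assumed to ride Kysely's MySQL dialect.

import { Kysely, SqliteDialect, PostgresDialect, MysqlDialect } from 'kysely';
import SQLite from 'better-sqlite3';
import { Pool } from 'pg';
import { createPool } from 'mysql2';
import type { Database } from '../types/database';

// Discriminated union over the supported engines.
export type ConnectionConfig =
  | { client: 'sqlite'; filename: string }
  | { client: 'postgres'; connectionString: string }
  | { client: 'mysql' | 'mariadb'; uri: string };

export function createDb(config: ConnectionConfig): Kysely<Database> {
  switch (config.client) {
    case 'sqlite':
      return new Kysely<Database>({
        dialect: new SqliteDialect({ database: new SQLite(config.filename) }),
      });
    case 'postgres':
      return new Kysely<Database>({
        dialect: new PostgresDialect({
          pool: new Pool({ connectionString: config.connectionString }),
        }),
      });
    case 'mysql':
    case 'mariadb': // MariaDB speaks the MySQL protocol, so it shares the dialect
      return new Kysely<Database>({
        dialect: new MysqlDialect({ pool: createPool(config.uri) }),
      });
  }
}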

Verification:

  • Run: bun test tests/migrations.test.ts
  • Expected: PASS

Commit:

git add src/adapters src/core tests/migrations.test.ts
git commit -m "feat: add multi-dialect schema initialization"

Task 4: Write failing query tests for topic/fact operations

Objective: Specify the behavior of the high-level memory APIs before implementation.

Files:

  • Create: tests/identity-db.test.ts
  • Create: tests/queries.test.ts

Steps:

  1. Write tests for upsertTopic deduplication by normalized name.
  2. Write tests for addFact linking multiple topics to one fact.
  3. Write tests for getTopicByName(..., { includeFacts: true }).
  4. Write tests for getTopicFactsLinkedTo(topicA, topicB).
  5. Write tests for listTopics({ includeFacts: false }) returning topic-only records.
  6. Write tests for findConnectedTopics(name).

Verification:

  • Run: bun test tests/identity-db.test.ts tests/queries.test.ts
  • Expected before implementation: FAIL because IdentityDB methods are not implemented.

Commit:

git add tests/identity-db.test.ts tests/queries.test.ts
git commit -m "test: specify memory graph query APIs"

Task 5: Implement IdentityDB core service and query helpers

Objective: Deliver the first usable high-level API for writing and reading memory graph data.

Files:

  • Create: src/core/identity-db.ts
  • Create: src/queries/topics.ts
  • Create: src/queries/facts.ts
  • Create: src/index.ts
  • Modify: src/types/api.ts
  • Modify: tests/identity-db.test.ts
  • Modify: tests/queries.test.ts

Steps:

  1. Implement IdentityDB.connect() and initialize().
  2. Implement topic upsert with normalized key handling.
  3. Implement fact insertion plus topic linking transactionally (sketched with the upsert after this list).
  4. Implement topic lookup with optional fact expansion.
  5. Implement topic-to-topic and multi-topic fact queries.
  6. Implement topic listing and connected-topic discovery.
  7. Re-run the full test suite.
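
A sketch of steps 2 and 3, assuming the Kysely instance and table types from earlier tasks; the select-then-insert upsert is backstopped by the unique index on normalized_name.

import { randomUUID } from 'node:crypto';
import type { Kysely } from 'kysely';
import type { Database } from '../types/database';

type TopicInput = {
  name: string;
  category: 'entity' | 'concept' | 'temporal' | 'custom';
  granularity: 'abstract' | 'concrete' | 'mixed';
  role?: string;
};

// Accepts the root instance or a transaction handle (Transaction extends
// Kysely), so the same upsert works inside addFact's transaction below.
async function upsertTopic(db: Kysely<Database>, input: TopicInput): Promise<string> {
  const normalized = input.name.trim().toLowerCase();
  const existing = await db
    .selectFrom('topics')
    .select('id')
    .where('normalized_name', '=', normalized)
    .executeTakeFirst();
  if (existing) return existing.id;

  const id = randomUUID();
  const now = new Date().toISOString();
  await db
    .insertInto('topics')
    .values({
      id,
      name: input.name,
      normalized_name: normalized,
      category: input.category,
      granularity: input.granularity,
      description: null,
      metadata: null,
      created_at: now,
      updated_at: now,
    })
    .execute();
  return id;
}

// Fact insert plus topic links run in one transaction, so a failed link never
// leaves an orphaned fact behind.
async function addFact(
  db: Kysely<Database>,
  input: { statement: string; topics: TopicInput[] },
): Promise<string> {
  return db.transaction().execute(async (trx) => {
    const factId = randomUUID();
    const now = new Date().toISOString();
    await trx
      .insertInto('facts')
      .values({
        id: factId,
        statement: input.statement,
        summary: null,
        source: null,
        confidence: null,
        metadata: null,
        created_at: now,
        updated_at: now,
      })
      .execute();
    // Link every topic in stable order; position preserves the extractor's ordering.
    for (const [position, topic] of input.topics.entries()) {
      const topicId = await upsertTopic(trx, topic);
      await trx
        .insertInto('fact_topics')
        .values({ fact_id: factId, topic_id: topicId, role: topic.role ?? null, position })
        .execute();
    }
    return factId;
  });
}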

Verification:

  • Run: bun test
  • Expected: PASS

Commit:

git add src tests
git commit -m "feat: add IdentityDB core memory graph APIs"

Task 6: Add ingestion abstractions and a naive extractor

Objective: Support automatic topic/fact ingestion through a pluggable extraction pipeline.

Files:

  • Create: src/ingestion/types.ts
  • Create: src/ingestion/extractor.ts
  • Create: src/ingestion/naive-extractor.ts
  • Create: tests/ingestion.test.ts
  • Modify: src/core/identity-db.ts
  • Modify: src/index.ts

Steps:

  1. Write failing tests for ingestStatement() using a fake extractor (see the sketch after this list).
  2. Define the extraction contracts and validation rules.
  3. Implement ingestStatement() by piping extractor output into addFact().
  4. Add a deterministic NaiveExtractor for examples/tests.
  5. Add tests proving extractor-driven topic creation works.
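
Step 1 might start from a fake extractor like the one below; the asserted result shape (a facts array on the expanded topic) is an assumption about the API, not a settled contract.

import { describe, it, expect } from 'vitest';
import { IdentityDB } from '../src';
import type { FactExtractor } from '../src';

// Fixed output makes the assertion deterministic.
const fakeExtractor: FactExtractor = {
  async extract(input) {
    return {
      statement: input,
      topics: [{ name: 'TypeScript', category: 'entity', granularity: 'concrete', role: 'object' }],
    };
  },
};

describe('ingestStatement', () => {
  it('creates topics and facts from extractor output', async () => {
    const db = await IdentityDB.connect({ client: 'sqlite', filename: ':memory:' });
    await db.initialize();
    await db.ingestStatement('I have worked with TypeScript since 2025.', {
      extractor: fakeExtractor,
    });
    // Assumed result shape: includeFacts expands the topic with its linked facts.
    const topic = await db.getTopicByName('TypeScript', { includeFacts: true });
    expect(topic?.facts).toHaveLength(1);
    await db.close();
  });
});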

Verification:

  • Run: bun test tests/ingestion.test.ts
  • Run: bun test
  • Expected: PASS

Commit:

git add src/ingestion src/core/identity-db.ts src/index.ts tests/ingestion.test.ts
git commit -m "feat: add pluggable fact ingestion pipeline"

Task 7: Polish package docs and publish-ready ergonomics

Objective: Make the repository understandable and usable after the foundation lands.

Files:

  • Modify: README.md
  • Optionally create: docs/examples/basic-usage.md

Steps:

  1. Document supported databases and the current API surface.
  2. Document the topic/fact graph model with a concrete example.
  3. Add example code for initialization, querying, and AI-assisted ingestion.
  4. Call out current limitations and near-term roadmap.

Verification:

  • Manually review the README examples against actual exports.
  • Run: bun run build
  • Expected: PASS

Commit:

git add README.md docs/examples/basic-usage.md
git commit -m "docs: document IdentityDB foundation usage"

Test strategy

  • Use SQLite in-memory for the main automated tests.
  • Treat PostgreSQL/MySQL/MariaDB support as adapter-level compatibility in the main code path, with optional integration tests gated behind environment variables in a later pass (see the sketch after this list).
  • Keep all public behavior covered through unit/integration-style tests against the public IdentityDB API.
  • Add regression tests for normalization, many-to-many fact linking, and topic filtering by connected topic.
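
For the env-gated dialect runs, Vitest's skipIf keeps them out of the default suite. A sketch assuming a POSTGRES_URL variable and the connection shape from Task 3:

import { describe, it } from 'vitest';
import { IdentityDB } from '../src';

// Runs only when a live server URL is provided (e.g. in CI with services enabled).
describe.skipIf(!process.env.POSTGRES_URL)('postgres adapter', () => {
  it('initializes the schema against a live server', async () => {
    const db = await IdentityDB.connect({
      client: 'postgres',
      connectionString: process.env.POSTGRES_URL!,
    });
    await db.initialize();
    await db.close();
  });
});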

Risks and tradeoffs

  1. Cross-dialect JSON handling — JSON support differs between engines. The initial version should serialize metadata defensively for portability (sketched after this list).
  2. Case normalization semantics — topic uniqueness depends on normalization. The first version should use a simple lowercase-trim normalization and document it.
  3. Temporal topic modeling — time can be a topic, but richer interval modeling should wait until a later phase.
  4. Abstract vs concrete topic boundaries — this is partly editorial, so the API should store explicit granularity rather than trying to infer it automatically.
  5. LLM extraction variability — extractor output can be messy. The core package should validate extractor results before writing them.
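
Minimal sketches for risks 1 and 2; the helper names are assumptions, not settled API.

// Risk 1: always store metadata as text so every dialect round-trips it identically.
export function serializeMetadata(value: Record<string, unknown> | null): string | null {
  return value === null ? null : JSON.stringify(value);
}

export function deserializeMetadata(raw: string | null): Record<string, unknown> | null {
  if (raw === null) return null;
  try {
    return JSON.parse(raw) as Record<string, unknown>;
  } catch {
    return null; // tolerate a corrupt row rather than failing the whole read
  }
}

// Risk 2: the documented v1 normalization rule, nothing fancier.
export function normalizeName(name: string): string {
  return name.trim().toLowerCase();
}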

Out of scope for this foundation pass

  • Embeddings or semantic vector search
  • Ranking/relevance algorithms
  • Full-text search indices
  • Topic merging/synonym resolution workflows
  • Multi-user authorization / remote HTTP service layer
  • Hosted API server package

Immediate execution target

For the first automated execution pass, implement Tasks 1 through 7 in order, but treat SQLite-backed functionality as the required tested path and the other SQL engines as supported adapter targets in the library surface.