From 7a02621e407014c450da3c2a200ee6dcb2582c07 Mon Sep 17 00:00:00 2001
From: Shinwoo PARK
Date: Mon, 11 May 2026 12:14:55 +0900
Subject: [PATCH] docs: add LLM extractor adapter plan

---
 ...-05-11-identitydb-llm-extractor-adapter.md | 87 +++++++++++++++++++
 1 file changed, 87 insertions(+)
 create mode 100644 docs/plans/2026-05-11-identitydb-llm-extractor-adapter.md

diff --git a/docs/plans/2026-05-11-identitydb-llm-extractor-adapter.md b/docs/plans/2026-05-11-identitydb-llm-extractor-adapter.md
new file mode 100644
index 0000000..26febee
--- /dev/null
+++ b/docs/plans/2026-05-11-identitydb-llm-extractor-adapter.md
@@ -0,0 +1,87 @@

# IdentityDB LLM Extractor Adapter Implementation Plan

> **For Hermes:** Use the `subagent-driven-development` skill to execute this plan task-by-task. Enforce strict TDD for every production behavior.

**Goal:** Add a provider-agnostic, LLM-backed fact extractor adapter so callers can plug a small language model into IdentityDB ingestion without coupling the package to a specific SDK.

**Architecture:** Keep `FactExtractor` as the stable ingestion contract, then add an `LlmFactExtractor` adapter that delegates prompting and text generation to a narrow model interface. The adapter should build a deterministic JSON-only extraction prompt, parse structured JSON from the model response, validate the shape, and return `ExtractedFact` objects that flow through the existing ingestion validation path.

**Tech Stack:** TypeScript, Bun, Node.js, Kysely, Vitest, tsup.

---

## Scope and interpretation

- The new adapter must remain provider-agnostic and must not depend on OpenAI, Anthropic, or any other SDK.
- The adapter should accept a minimal language-model interface that returns text so package consumers can bridge any LLM client they want.
- Structured output must be validated in the adapter before it is returned to `extractFact()`.

- The adapter should tolerate common model formatting noise, such as a fenced `` ```json `` block wrapped around the payload.
- The initial release should focus on correctness and predictable integration, not prompt optimization or retries.

---

## Public API additions

```ts
const extractor = new LlmFactExtractor({
  model: {
    async generateText(prompt) {
      return jsonStringFromSomeLlm(prompt);
    },
  },
});

const fact = await db.ingestStatement('I have worked with Bun and TypeScript since 2025.', {
  extractor,
});
```

Optional customization:

```ts
const extractor = new LlmFactExtractor({
  model,
  instructions: 'Prefer product and technology topics over generic nouns.',
});
```

---

## Execution plan

### Task 1: Lock the adapter behavior with failing tests

**Objective:** Define the LLM adapter contract before implementation.

**Files:**
- Modify: `tests/ingestion.test.ts`
- Modify: `src/ingestion/types.ts`
- Modify: `src/index.ts`

**Verification:**
- Run the focused ingestion tests and confirm they fail for the missing adapter behavior.

### Task 2: Implement the LLM adapter and response parsing

**Objective:** Add a reusable `LlmFactExtractor` implementation plus robust JSON extraction helpers.

**Files:**
- Create: `src/ingestion/llm-extractor.ts`
- Modify: `src/ingestion/types.ts`
- Modify: `src/ingestion/extractor.ts`
- Modify: `src/index.ts`

**Verification:**
- Run the focused ingestion tests until green.

### Task 3: Document the adapter and run the full suite

**Objective:** Expose the new adapter in docs and ensure the whole package still passes verification.

**Files:**
- Modify: `README.md`
- Modify: `src/index.ts`

**Verification:**
- Run `bun run test && bun run check && bun run build`
- Confirm the README shows how to bridge an arbitrary LLM client into the adapter.
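
---

## Appendix: parsing sketch

The Task 2 behavior (deterministic JSON-only prompt, fence-tolerant parsing, shape validation) can be sketched as below. `LlmFactExtractor` and `generateText` come from this plan's public API; the `topic` and `value` fields on `ExtractedFact` are hypothetical stand-ins for the real shape defined in `src/ingestion/types.ts`, and the prompt wording is illustrative only.

```typescript
// Minimal language-model bridge: any client that turns a prompt into text.
interface LanguageModel {
  generateText(prompt: string): Promise<string>;
}

// Hypothetical fact shape; the real one lives in src/ingestion/types.ts.
interface ExtractedFact {
  topic: string;
  value: string;
}

// Strip an optional ```json fence the model may wrap around the payload.
function extractJsonPayload(raw: string): string {
  const fenced = raw.match(/```(?:json)?\s*([\s\S]*?)```/);
  return (fenced ? fenced[1] : raw).trim();
}

class LlmFactExtractor {
  constructor(
    private readonly opts: { model: LanguageModel; instructions?: string },
  ) {}

  async extractFact(statement: string): Promise<ExtractedFact> {
    // Deterministic, JSON-only prompt; caller guidance is appended verbatim.
    const prompt = [
      'Return ONLY a JSON object with string fields "topic" and "value".',
      this.opts.instructions ?? '',
      `Statement: ${statement}`,
    ].join('\n');

    const raw = await this.opts.model.generateText(prompt);
    const parsed: unknown = JSON.parse(extractJsonPayload(raw));

    // Validate the shape before the fact flows into ingestion validation.
    if (
      typeof parsed !== 'object' ||
      parsed === null ||
      typeof (parsed as { topic?: unknown }).topic !== 'string' ||
      typeof (parsed as { value?: unknown }).value !== 'string'
    ) {
      throw new Error('LLM response did not match the expected fact shape');
    }
    return parsed as ExtractedFact;
  }
}
```

Keeping the fence-stripping and shape check inside the adapter means `extractFact()` callers only ever see a well-formed fact or a thrown error, which matches the plan's goal of predictable integration without retries.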