docs: add LLM extractor adapter plan

2026-05-11 12:14:55 +09:00
parent 4c418dc39a
commit 7a02621e40


# IdentityDB LLM Extractor Adapter Implementation Plan
> **For Hermes:** Use the `subagent-driven-development` skill to execute this plan task-by-task. Enforce strict TDD for every production behavior.
**Goal:** Add a provider-agnostic LLM-backed fact extractor adapter so callers can plug a small language model into IdentityDB ingestion without coupling the package to a specific SDK.
**Architecture:** Keep `FactExtractor` as the stable ingestion contract, then add an `LlmFactExtractor` adapter that delegates prompting and text generation to a narrow model interface. The adapter should build a deterministic JSON-only extraction prompt, parse structured JSON from the model response, validate the shape, and return `ExtractedFact` objects that flow through the existing ingestion validation path.
**Tech Stack:** TypeScript, Bun, Node.js, Kysely, Vitest, tsup.
---
## Scope and interpretation
- The new adapter must remain provider-agnostic and must not depend on OpenAI, Anthropic, or any other SDK.
- The adapter should accept a minimal language-model interface that returns text so package consumers can bridge any LLM client they want.
- Structured output must be validated in the adapter before returning it to `extractFact()`.
- The adapter should tolerate common model formatting noise, such as the JSON payload being wrapped in a fenced `json` code block.
- Initial release should focus on correctness and predictable integration, not prompt optimization or retries.
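The narrow, provider-agnostic contract described above could look like the following sketch. The names (`LanguageModel`, `LlmFactExtractorOptions`, `generateText`) mirror the usage examples in this plan but are illustrative, not the final public API:

```typescript
// Hypothetical sketch of the minimal model contract the adapter accepts.
// The package never imports a vendor SDK; consumers bridge their own client.
export interface LanguageModel {
  /** Turn a fully rendered extraction prompt into raw model text. */
  generateText(prompt: string): Promise<string>;
}

export interface LlmFactExtractorOptions {
  model: LanguageModel;
  /** Optional domain guidance appended to the base extraction prompt. */
  instructions?: string;
}

// Any client that can map a prompt to text satisfies the contract:
export const echoModel: LanguageModel = {
  async generateText(prompt) {
    return JSON.stringify({ promptLength: prompt.length });
  },
};
```

Keeping the interface this small is what makes the adapter SDK-agnostic: the only capability it demands is prompt-in, text-out.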
---
## Public API additions
```ts
const extractor = new LlmFactExtractor({
  model: {
    async generateText(prompt) {
      return jsonStringFromSomeLlm(prompt);
    },
  },
});

const fact = await db.ingestStatement('I have worked with Bun and TypeScript since 2025.', {
  extractor,
});
```
Optional customization:
```ts
const extractor = new LlmFactExtractor({
  model,
  instructions: 'Prefer product and technology topics over generic nouns.',
});
```
---
## Execution plan
### Task 1: Lock the adapter behavior with failing tests
**Objective:** Define the LLM adapter contract before implementation.
**Files:**
- Modify: `tests/ingestion.test.ts`
- Modify: `src/ingestion/types.ts`
- Modify: `src/index.ts`
**Verification:**
- Run focused ingestion tests and confirm they fail for the missing adapter behavior.
### Task 2: Implement the LLM adapter and response parsing
**Objective:** Add a reusable `LlmFactExtractor` implementation plus robust JSON extraction helpers.
**Files:**
- Create: `src/ingestion/llm-extractor.ts`
- Modify: `src/ingestion/types.ts`
- Modify: `src/ingestion/extractor.ts`
- Modify: `src/index.ts`
**Verification:**
- Run the focused ingestion tests until green.
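The "robust JSON extraction" part of this task could be sketched as a standalone helper. The function name and error messages are hypothetical; the behavior shown is the one the scope section requires, i.e. tolerating a fenced code block or stray prose around the payload and rejecting non-object results before validation:

```typescript
// Hypothetical helper: pull a single JSON object out of a model response
// that may be wrapped in a fenced ```json block or surrounded by prose.
export function extractJsonObject(response: string): Record<string, unknown> {
  // Prefer the contents of a fenced block when one is present.
  const fenced = response.match(/```(?:json)?\s*([\s\S]*?)```/);
  const candidate = fenced ? fenced[1] : response;

  // Fall back to the first {...} span so leading/trailing prose is tolerated.
  const start = candidate.indexOf('{');
  const end = candidate.lastIndexOf('}');
  if (start === -1 || end <= start) {
    throw new Error('Model response did not contain a JSON object');
  }

  const parsed: unknown = JSON.parse(candidate.slice(start, end + 1));
  if (typeof parsed !== 'object' || parsed === null || Array.isArray(parsed)) {
    throw new Error('Model response JSON was not an object');
  }
  return parsed as Record<string, unknown>;
}
```

The adapter would call this before shape validation, so malformed responses fail loudly inside `extractFact()` rather than leaking into the ingestion path.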
### Task 3: Document the adapter and run the full suite
**Objective:** Expose the new adapter in docs and ensure the whole package still passes verification.
**Files:**
- Modify: `README.md`
- Modify: `src/index.ts`
**Verification:**
- Run `bun run test && bun run check && bun run build`
- Confirm the README shows how to bridge an arbitrary LLM client into the adapter.
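The README bridging example could follow this shape. The `ChatClient` interface stands in for an arbitrary third-party SDK and is purely illustrative; the point is that any prompt-to-text client collapses into the adapter's narrow model interface:

```typescript
// Hypothetical third-party client shape; substitute any real SDK here.
interface ChatClient {
  complete(input: { prompt: string }): Promise<{ text: string }>;
}

// Bridge an arbitrary client to the prompt-in, text-out contract the
// LlmFactExtractor adapter expects.
function bridgeToLanguageModel(client: ChatClient) {
  return {
    async generateText(prompt: string): Promise<string> {
      const { text } = await client.complete({ prompt });
      return text;
    },
  };
}
```

The bridged object can then be passed as `model` to `new LlmFactExtractor({ model })`, keeping the SDK dependency entirely on the consumer's side.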