From 7a02621e407014c450da3c2a200ee6dcb2582c07 Mon Sep 17 00:00:00 2001
From: Shinwoo PARK
Date: Mon, 11 May 2026 12:14:55 +0900
Subject: [PATCH] docs: add LLM extractor adapter plan

---
 ...-05-11-identitydb-llm-extractor-adapter.md | 87 +++++++++++++++++++
 1 file changed, 87 insertions(+)
 create mode 100644 docs/plans/2026-05-11-identitydb-llm-extractor-adapter.md

diff --git a/docs/plans/2026-05-11-identitydb-llm-extractor-adapter.md b/docs/plans/2026-05-11-identitydb-llm-extractor-adapter.md
new file mode 100644
index 0000000..26febee
--- /dev/null
+++ b/docs/plans/2026-05-11-identitydb-llm-extractor-adapter.md
@@ -0,0 +1,87 @@

# IdentityDB LLM Extractor Adapter Implementation Plan

> **For Hermes:** Use the `subagent-driven-development` skill to execute this plan task-by-task. Enforce strict TDD for every production behavior.

**Goal:** Add a provider-agnostic, LLM-backed fact extractor adapter so callers can plug a small language model into IdentityDB ingestion without coupling the package to a specific SDK.

**Architecture:** Keep `FactExtractor` as the stable ingestion contract, then add an `LlmFactExtractor` adapter that delegates prompting and text generation to a narrow model interface. The adapter should build a deterministic JSON-only extraction prompt, parse structured JSON from the model response, validate the shape, and return `ExtractedFact` objects that flow through the existing ingestion validation path.

**Tech Stack:** TypeScript, Bun, Node.js, Kysely, Vitest, tsup.

---

## Scope and interpretation

- The new adapter must remain provider-agnostic and must not depend on OpenAI, Anthropic, or any other SDK.
- The adapter should accept a minimal language-model interface that returns text so package consumers can bridge any LLM client they want.
- Structured output must be validated in the adapter before it is returned to `extractFact()`.

- The adapter should tolerate common model formatting noise, such as a fenced `` ```json `` block wrapped around the payload.
- The initial release should focus on correctness and predictable integration, not prompt optimization or retries.

---

## Public API additions

```ts
const extractor = new LlmFactExtractor({
  model: {
    async generateText(prompt) {
      return jsonStringFromSomeLlm(prompt);
    },
  },
});

const fact = await db.ingestStatement('I have worked with Bun and TypeScript since 2025.', {
  extractor,
});
```

Optional customization:

```ts
const extractor = new LlmFactExtractor({
  model,
  instructions: 'Prefer product and technology topics over generic nouns.',
});
```

---

## Execution plan

### Task 1: Lock the adapter behavior with failing tests

**Objective:** Define the LLM adapter contract before implementation.

**Files:**
- Modify: `tests/ingestion.test.ts`
- Modify: `src/ingestion/types.ts`
- Modify: `src/index.ts`

**Verification:**
- Run the focused ingestion tests and confirm they fail for the missing adapter behavior.

### Task 2: Implement the LLM adapter and response parsing

**Objective:** Add a reusable `LlmFactExtractor` implementation plus robust JSON extraction helpers.

**Files:**
- Create: `src/ingestion/llm-extractor.ts`
- Modify: `src/ingestion/types.ts`
- Modify: `src/ingestion/extractor.ts`
- Modify: `src/index.ts`

**Verification:**
- Run the focused ingestion tests until green.

### Task 3: Document the adapter and run the full suite

**Objective:** Expose the new adapter in docs and ensure the whole package still passes verification.

**Files:**
- Modify: `README.md`
- Modify: `src/index.ts`

**Verification:**
- Run `bun run test && bun run check && bun run build`
- Confirm the README shows how to bridge an arbitrary LLM client into the adapter.
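
---

## Appendix: parsing sketch

The Task 2 behavior (deterministic JSON-only prompt, fence-tolerant parsing, shape validation) can be sketched as below. `LlmFactExtractor` and `generateText` come from this plan's public API; the `topic` and `value` fields on `ExtractedFact` are hypothetical stand-ins for the real shape defined in `src/ingestion/types.ts`, and the prompt wording is illustrative only.

```typescript
// Minimal language-model bridge: any client that turns a prompt into text.
interface LanguageModel {
  generateText(prompt: string): Promise<string>;
}

// Hypothetical fact shape; the real one lives in src/ingestion/types.ts.
interface ExtractedFact {
  topic: string;
  value: string;
}

// Strip an optional ```json fence the model may wrap around the payload.
function extractJsonPayload(raw: string): string {
  const fenced = raw.match(/```(?:json)?\s*([\s\S]*?)```/);
  return (fenced ? fenced[1] : raw).trim();
}

class LlmFactExtractor {
  constructor(
    private readonly opts: { model: LanguageModel; instructions?: string },
  ) {}

  async extractFact(statement: string): Promise<ExtractedFact> {
    // Deterministic, JSON-only prompt; caller guidance is appended verbatim.
    const prompt = [
      'Return ONLY a JSON object with string fields "topic" and "value".',
      this.opts.instructions ?? '',
      `Statement: ${statement}`,
    ].join('\n');

    const raw = await this.opts.model.generateText(prompt);
    const parsed: unknown = JSON.parse(extractJsonPayload(raw));

    // Validate the shape before the fact flows into ingestion validation.
    if (
      typeof parsed !== 'object' ||
      parsed === null ||
      typeof (parsed as { topic?: unknown }).topic !== 'string' ||
      typeof (parsed as { value?: unknown }).value !== 'string'
    ) {
      throw new Error('LLM response did not match the expected fact shape');
    }
    return parsed as ExtractedFact;
  }
}
```

Keeping the fence-stripping and shape check inside the adapter means `extractFact()` callers only ever see a well-formed fact or a thrown error, which matches the plan's goal of predictable integration without retries.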