docs: add LLM extractor adapter plan

2026-05-11 12:14:55 +09:00
parent 4c418dc39a
commit 7a02621e40


# IdentityDB LLM Extractor Adapter Implementation Plan
> **For Hermes:** Use the `subagent-driven-development` skill to execute this plan task-by-task. Enforce strict TDD for every production behavior.
**Goal:** Add a provider-agnostic LLM-backed fact extractor adapter so callers can plug a small language model into IdentityDB ingestion without coupling the package to a specific SDK.
**Architecture:** Keep `FactExtractor` as the stable ingestion contract, then add an `LlmFactExtractor` adapter that delegates prompting and text generation to a narrow model interface. The adapter should build a deterministic JSON-only extraction prompt, parse structured JSON from the model response, validate the shape, and return `ExtractedFact` objects that flow through the existing ingestion validation path.
**Tech Stack:** TypeScript, Bun, Node.js, Kysely, Vitest, tsup.
---
## Scope and interpretation
- The new adapter must remain provider-agnostic and must not depend on OpenAI, Anthropic, or any other SDK.
- The adapter should accept a minimal language-model interface that returns text so package consumers can bridge any LLM client they want.
- Structured output must be validated in the adapter before returning it to `extractFact()`.
- The adapter should tolerate common model formatting noise, such as the JSON payload being wrapped in a fenced `json` code block.
- Initial release should focus on correctness and predictable integration, not prompt optimization or retries.
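The narrow, provider-agnostic contract described above could look like the following sketch. The names (`LanguageModel`, `LlmFactExtractorOptions`, `generateText`) mirror the usage examples in this plan but are illustrative, not the final public API:

```typescript
// Hypothetical sketch of the minimal model contract the adapter accepts.
// The package never imports a vendor SDK; consumers bridge their own client.
export interface LanguageModel {
  /** Turn a fully rendered extraction prompt into raw model text. */
  generateText(prompt: string): Promise<string>;
}

export interface LlmFactExtractorOptions {
  model: LanguageModel;
  /** Optional domain guidance appended to the base extraction prompt. */
  instructions?: string;
}

// Any client that can map a prompt to text satisfies the contract:
export const echoModel: LanguageModel = {
  async generateText(prompt) {
    return JSON.stringify({ promptLength: prompt.length });
  },
};
```

Keeping the interface this small is what makes the adapter SDK-agnostic: the only capability it demands is prompt-in, text-out.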
---
## Public API additions
```ts
const extractor = new LlmFactExtractor({
  model: {
    async generateText(prompt) {
      return jsonStringFromSomeLlm(prompt);
    },
  },
});

const fact = await db.ingestStatement('I have worked with Bun and TypeScript since 2025.', {
  extractor,
});
```
Optional customization:
```ts
const extractor = new LlmFactExtractor({
  model,
  instructions: 'Prefer product and technology topics over generic nouns.',
});
```
---
## Execution plan
### Task 1: Lock the adapter behavior with failing tests
**Objective:** Define the LLM adapter contract before implementation.
**Files:**
- Modify: `tests/ingestion.test.ts`
- Modify: `src/ingestion/types.ts`
- Modify: `src/index.ts`
**Verification:**
- Run focused ingestion tests and confirm they fail for the missing adapter behavior.
### Task 2: Implement the LLM adapter and response parsing
**Objective:** Add a reusable `LlmFactExtractor` implementation plus robust JSON extraction helpers.
**Files:**
- Create: `src/ingestion/llm-extractor.ts`
- Modify: `src/ingestion/types.ts`
- Modify: `src/ingestion/extractor.ts`
- Modify: `src/index.ts`
**Verification:**
- Run the focused ingestion tests until green.
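The "robust JSON extraction" part of this task could be sketched as a standalone helper. The function name and error messages are hypothetical; the behavior shown is the one the scope section requires, i.e. tolerating a fenced code block or stray prose around the payload and rejecting non-object results before validation:

```typescript
// Hypothetical helper: pull a single JSON object out of a model response
// that may be wrapped in a fenced ```json block or surrounded by prose.
export function extractJsonObject(response: string): Record<string, unknown> {
  // Prefer the contents of a fenced block when one is present.
  const fenced = response.match(/```(?:json)?\s*([\s\S]*?)```/);
  const candidate = fenced ? fenced[1] : response;

  // Fall back to the first {...} span so leading/trailing prose is tolerated.
  const start = candidate.indexOf('{');
  const end = candidate.lastIndexOf('}');
  if (start === -1 || end <= start) {
    throw new Error('Model response did not contain a JSON object');
  }

  const parsed: unknown = JSON.parse(candidate.slice(start, end + 1));
  if (typeof parsed !== 'object' || parsed === null || Array.isArray(parsed)) {
    throw new Error('Model response JSON was not an object');
  }
  return parsed as Record<string, unknown>;
}
```

The adapter would call this before shape validation, so malformed responses fail loudly inside `extractFact()` rather than leaking into the ingestion path.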
### Task 3: Document the adapter and run the full suite
**Objective:** Expose the new adapter in docs and ensure the whole package still passes verification.
**Files:**
- Modify: `README.md`
- Modify: `src/index.ts`
**Verification:**
- Run `bun run test && bun run check && bun run build`
- Confirm the README shows how to bridge an arbitrary LLM client into the adapter.
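The README bridging example could follow this shape. The `ChatClient` interface stands in for an arbitrary third-party SDK and is purely illustrative; the point is that any prompt-to-text client collapses into the adapter's narrow model interface:

```typescript
// Hypothetical third-party client shape; substitute any real SDK here.
interface ChatClient {
  complete(input: { prompt: string }): Promise<{ text: string }>;
}

// Bridge an arbitrary client to the prompt-in, text-out contract the
// LlmFactExtractor adapter expects.
function bridgeToLanguageModel(client: ChatClient) {
  return {
    async generateText(prompt: string): Promise<string> {
      const { text } = await client.complete({ prompt });
      return text;
    },
  };
}
```

The bridged object can then be passed as `model` to `new LlmFactExtractor({ model })`, keeping the SDK dependency entirely on the consumer's side.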