# IdentityDB LLM Extractor Adapter Implementation Plan

> **For Hermes:** Use the `subagent-driven-development` skill to execute this plan task-by-task. Enforce strict TDD for every production behavior.

**Goal:** Add a provider-agnostic LLM-backed fact extractor adapter so callers can plug a small language model into IdentityDB ingestion without coupling the package to a specific SDK.

**Architecture:** Keep `FactExtractor` as the stable ingestion contract, then add an `LlmFactExtractor` adapter that delegates prompting and text generation to a narrow model interface. The adapter should build a deterministic JSON-only extraction prompt, parse structured JSON from the model response, validate the shape, and return `ExtractedFact` objects that flow through the existing ingestion validation path.

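As a sketch of the shape this architecture implies — the `LanguageModel` name, the option bag, the prompt text, and the fake model below are illustrative assumptions, not an existing API:

```typescript
// Narrow, provider-agnostic model contract: the adapter only needs a
// function that turns a prompt into generated text.
interface LanguageModel {
  generateText(prompt: string): Promise<string>;
}

interface LlmFactExtractorOptions {
  model: LanguageModel;
  instructions?: string; // optional extra guidance folded into the prompt
}

class LlmFactExtractor {
  constructor(private readonly options: LlmFactExtractorOptions) {}

  // Builds a JSON-only prompt, delegates generation, and parses the reply.
  async extractFact(statement: string): Promise<unknown> {
    const prompt = [
      'Extract one fact from the statement below.',
      'Respond with a single JSON object and nothing else.',
      this.options.instructions ?? '',
      `Statement: ${statement}`,
    ].join('\n');
    const raw = await this.options.model.generateText(prompt);
    return JSON.parse(raw); // the real adapter must validate the shape here
  }
}

// Any LLM client can be bridged with a tiny object literal; this fake
// model just returns a canned payload.
const fakeModel: LanguageModel = {
  async generateText() {
    return '{"topic":"Bun"}';
  },
};

const extractor = new LlmFactExtractor({ model: fakeModel });
extractor.extractFact('I use Bun.').then((fact) => {
  console.log(JSON.stringify(fact)); // prints {"topic":"Bun"}
});
```

Returning `Promise<unknown>` keeps the contract honest until validation is wired in; the parsing helpers planned below would replace the bare `JSON.parse`.
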
**Tech Stack:** TypeScript, Bun, Node.js, Kysely, Vitest, tsup.

---

## Scope and interpretation

- The new adapter must remain provider-agnostic and must not depend on OpenAI, Anthropic, or any other SDK.
- The adapter should accept a minimal language-model interface that returns text, so package consumers can bridge any LLM client they want.
- Structured output must be validated in the adapter before returning it to `extractFact()`.
- The adapter should tolerate common model formatting noise, such as fenced ```json blocks around the payload.
- The initial release should focus on correctness and predictable integration, not prompt optimization or retries.
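One way the fence-tolerance requirement could be satisfied — `extractJsonPayload` is a hypothetical helper name, not existing code:

```typescript
// Fence-tolerant JSON extraction: models often wrap the payload in a
// fenced ```json block, so strip the fence before parsing and fall
// back to parsing the raw text.
function extractJsonPayload(text: string): unknown {
  const fenced = text.match(/```(?:json)?\s*([\s\S]*?)```/);
  const candidate = (fenced ? fenced[1] : text).trim();
  return JSON.parse(candidate);
}
```

With this shape, a bare `{"a":1}` and a fenced `` ```json `` variant of the same payload parse to the same object.
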

---

## Public API additions

```ts
const extractor = new LlmFactExtractor({
  model: {
    // Bridge to any LLM client: return the model's text for a prompt.
    async generateText(prompt) {
      return jsonStringFromSomeLlm(prompt);
    },
  },
});

const fact = await db.ingestStatement('I have worked with Bun and TypeScript since 2025.', {
  extractor,
});
```

Optional customization:

```ts
const extractor = new LlmFactExtractor({
  model,
  instructions: 'Prefer product and technology topics over generic nouns.',
});
```

---

## Execution plan

### Task 1: Lock the adapter behavior with failing tests

**Objective:** Define the LLM adapter contract before implementation.

**Files:**

- Modify: `tests/ingestion.test.ts`
- Modify: `src/ingestion/types.ts`
- Modify: `src/index.ts`

**Verification:**

- Run focused ingestion tests and confirm they fail for the missing adapter behavior.

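The failing contract tests could start from something like the sketch below, added to `tests/ingestion.test.ts`. The import path anticipates the file created in the next task, and the payload fields (`statement`, `topics`) are assumptions to be aligned with the real `ExtractedFact` type — these tests are expected to fail until the adapter exists:

```typescript
import { describe, expect, it } from 'vitest';
// Hypothetical path: this module is created in Task 2.
import { LlmFactExtractor } from '../src/ingestion/llm-extractor';

describe('LlmFactExtractor', () => {
  it('parses a JSON payload from the model response', async () => {
    const extractor = new LlmFactExtractor({
      model: {
        async generateText() {
          return '{"statement":"I use Bun.","topics":["bun"]}';
        },
      },
    });

    const fact = await extractor.extractFact('I use Bun.');
    expect(fact.topics).toContain('bun');
  });

  it('tolerates a fenced ```json block around the payload', async () => {
    const extractor = new LlmFactExtractor({
      model: {
        async generateText() {
          return '```json\n{"statement":"I use Bun.","topics":["bun"]}\n```';
        },
      },
    });

    const fact = await extractor.extractFact('I use Bun.');
    expect(fact.topics).toContain('bun');
  });
});
```
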
### Task 2: Implement the LLM adapter and response parsing

**Objective:** Add a reusable `LlmFactExtractor` implementation plus robust JSON extraction helpers.

**Files:**

- Create: `src/ingestion/llm-extractor.ts`
- Modify: `src/ingestion/types.ts`
- Modify: `src/ingestion/extractor.ts`
- Modify: `src/index.ts`

**Verification:**

- Run the focused ingestion tests until green.

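The shape-validation step in this task could take the form of a type guard. The `CandidateFact` fields below are placeholders for illustration; the real field list must mirror `ExtractedFact` in `src/ingestion/types.ts`:

```typescript
// Hypothetical intermediate shape for the parsed model payload.
interface CandidateFact {
  statement: string;
  topics: string[];
}

// User-defined type guard: rejects anything that is not an object with
// a string statement and a string array of topics.
function isCandidateFact(value: unknown): value is CandidateFact {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.statement === 'string' &&
    Array.isArray(v.topics) &&
    v.topics.every((t) => typeof t === 'string')
  );
}
```

Running parsed payloads through a guard like this before returning from `extractFact()` keeps malformed model output from reaching the ingestion path.
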
### Task 3: Document the adapter and run the full suite

**Objective:** Expose the new adapter in docs and ensure the whole package still passes verification.

**Files:**

- Modify: `README.md`
- Modify: `src/index.ts`

**Verification:**

- Run `bun run test && bun run check && bun run build`.
- Confirm the README shows how to bridge an arbitrary LLM client into the adapter.