IdentityDB/docs/plans/2026-05-11-identitydb-llm-extractor-adapter.md

IdentityDB LLM Extractor Adapter Implementation Plan

For Hermes: Use the subagent-driven-development skill to execute this plan task-by-task. Enforce strict TDD for every production behavior.

Goal: Add a provider-agnostic LLM-backed fact extractor adapter so callers can plug a small language model into IdentityDB ingestion without coupling the package to a specific SDK.

Architecture: Keep FactExtractor as the stable ingestion contract, then add an LlmFactExtractor adapter that delegates prompting and text generation to a narrow model interface. The adapter should build a deterministic JSON-only extraction prompt, parse structured JSON from the model response, validate the shape, and return ExtractedFact objects that flow through the existing ingestion validation path.
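A rough sketch of that flow follows. The `ExtractedFact` shape, prompt wording, and validation rules here are assumptions for illustration; the real contracts live in src/ingestion/types.ts.

```ts
// Illustrative sketch only. LanguageModel is the narrow model interface the
// plan calls for; the ExtractedFact fields below are assumed, not confirmed.
interface LanguageModel {
  generateText(prompt: string): Promise<string>;
}

interface ExtractedFact {
  topic: string;
  statement: string;
}

class LlmFactExtractor {
  constructor(private readonly opts: { model: LanguageModel; instructions?: string }) {}

  async extractFact(input: string): Promise<ExtractedFact | null> {
    // 1. Build a deterministic, JSON-only prompt.
    const prompt = [
      'Extract the most salient fact from the statement below.',
      'Respond with JSON only: {"topic": string, "statement": string}.',
      ...(this.opts.instructions ? [this.opts.instructions] : []),
      `Statement: ${input}`,
    ].join('\n');

    // 2. Delegate text generation to the injected model.
    const raw = await this.opts.model.generateText(prompt);

    // 3. Tolerate fenced json code-block wrappers, then parse and shape-check.
    const cleaned = raw
      .trim()
      .replace(/^```(?:json)?\s*\n?/i, '')
      .replace(/\n?```\s*$/, '')
      .trim();
    try {
      const parsed = JSON.parse(cleaned);
      if (typeof parsed?.topic === 'string' && typeof parsed?.statement === 'string') {
        return { topic: parsed.topic, statement: parsed.statement };
      }
    } catch {
      // Malformed output falls through to null rather than throwing.
    }
    return null;
  }
}
```

Returning the parsed object as a plain `ExtractedFact` keeps the adapter a drop-in for the existing ingestion validation path.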

Tech Stack: TypeScript, Bun, Node.js, Kysely, Vitest, tsup.


Scope and interpretation

  • The new adapter must remain provider-agnostic and must not depend on OpenAI, Anthropic, or any other SDK.
  • The adapter should accept a minimal language-model interface that returns text so package consumers can bridge any LLM client they want.
  • Structured output must be validated in the adapter before returning it to extractFact().
  • The adapter should tolerate common model formatting noise such as fenced ```json blocks around the payload.
  • Initial release should focus on correctness and predictable integration, not prompt optimization or retries.
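For the formatting-noise bullet above, one possible helper looks like this. The name `extractJsonPayload` and the exact regexes are illustrative, not part of the plan.

```ts
// Hypothetical helper: strip an optional fenced json code-block wrapper
// from a model response, then parse it.
function extractJsonPayload(raw: string): unknown | null {
  const cleaned = raw
    .trim()
    .replace(/^```(?:json)?\s*\n?/i, '') // drop an opening fence marker
    .replace(/\n?```\s*$/, '')           // drop the closing fence marker
    .trim();
  try {
    return JSON.parse(cleaned);
  } catch {
    return null; // malformed output is rejected, never guessed at
  }
}
```

Returning `null` rather than throwing lets the adapter surface "no fact extracted" through the same path as a model that declines to answer.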

Public API additions

```ts
const extractor = new LlmFactExtractor({
  model: {
    async generateText(prompt) {
      return jsonStringFromSomeLlm(prompt);
    },
  },
});

const fact = await db.ingestStatement('I have worked with Bun and TypeScript since 2025.', {
  extractor,
});
```

Optional customization:

```ts
const extractor = new LlmFactExtractor({
  model,
  instructions: 'Prefer product and technology topics over generic nouns.',
});
```
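One way the optional `instructions` could fold into the deterministic base prompt is sketched below; the function name and wording are assumptions, but the key property is determinism: the same statement and instructions always yield the same prompt.

```ts
// Hypothetical prompt builder for the adapter; exact wording is an assumption.
function buildExtractionPrompt(statement: string, instructions?: string): string {
  const lines = [
    'Extract the most salient fact from the statement below.',
    'Respond with JSON only: {"topic": string, "statement": string}.',
  ];
  if (instructions) {
    lines.push(`Additional guidance: ${instructions}`);
  }
  lines.push(`Statement: ${statement}`);
  return lines.join('\n');
}
```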

Execution plan

Task 1: Lock the adapter behavior with failing tests

Objective: Define the LLM adapter contract before implementation.

Files:

  • Modify: tests/ingestion.test.ts
  • Modify: src/ingestion/types.ts
  • Modify: src/index.ts

Verification:

  • Run focused ingestion tests and confirm they fail for the missing adapter behavior.

Task 2: Implement the LLM adapter and response parsing

Objective: Add a reusable LlmFactExtractor implementation plus robust JSON extraction helpers.

Files:

  • Create: src/ingestion/llm-extractor.ts
  • Modify: src/ingestion/types.ts
  • Modify: src/ingestion/extractor.ts
  • Modify: src/index.ts

Verification:

  • Run the focused ingestion tests until green.

Task 3: Document the adapter and run the full suite

Objective: Expose the new adapter in docs and ensure the whole package still passes verification.

Files:

  • Modify: README.md
  • Modify: src/index.ts

Verification:

  • Run `bun run test && bun run check && bun run build`.
  • Confirm the README shows how to bridge an arbitrary LLM client into the adapter.
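The README bridging example could take a shape like the following; the `bridgeLlm` helper and the stand-in client are made up for illustration, not part of the package API.

```ts
// Hypothetical bridge: wrap any (prompt) => Promise<string> function into
// the minimal model interface the adapter expects.
function bridgeLlm(complete: (prompt: string) => Promise<string>) {
  return {
    async generateText(prompt: string): Promise<string> {
      return complete(prompt);
    },
  };
}

// Example with a stand-in client; a real README would call an actual SDK here.
const model = bridgeLlm(async () =>
  JSON.stringify({ topic: 'demo', statement: 'Bridged response.' })
);
```

Because the bridge only needs a prompt-in, text-out function, any SDK with a completion or chat call can be adapted in a few lines without the package depending on it.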