Add two-stage IdentityDB memory extraction for BoxBrain conversations #1

Open
opened 2026-05-11 22:50:29 +09:00 by p-sw · 0 comments
Owner

Summary

When BoxBrain conversations happen, we want to update IdentityDB from the inbound and outbound content, but without sending every message directly to the extractor.

Instead, add a two-stage pipeline:

  1. Stage 1: memory-worthiness classification
    • Evaluate each inbound user message and each first-pass outbound BoxBrain message.
    • Decide whether the message contains information worth remembering.
    • Skip extractor work for messages that are not memory-worthy.
  2. Stage 2: IdentityDB LLM extractor ingestion
    • Send only memory-worthy messages to the IdentityDB LLM extractor.
    • Persist the extracted facts into the persona space (or another clearly defined space if the design requires it).
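
The two stages above could be wired together roughly as follows. This is only a sketch to make the control flow concrete: `Message`, `Fact`, `process_message`, and the injected `is_memory_worthy`, `extract_facts`, and `persist` callables are all hypothetical names, not existing BoxBrain or IdentityDB APIs. The real classifier and extractor would presumably be LLM calls behind the runtime adapters.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Message:
    """Hypothetical shape of one conversation message."""
    turn_id: str    # identifies the conversation turn, for traceability
    direction: str  # "inbound" or "outbound"
    text: str


@dataclass(frozen=True)
class Fact:
    """Hypothetical extracted fact, tagged with the turn that produced it."""
    text: str
    source_turn_id: str


def process_message(
    message: Message,
    is_memory_worthy: Callable[[Message], bool],   # Stage 1 (assumed interface)
    extract_facts: Callable[[Message], List[str]],  # Stage 2 (assumed interface)
    persist: Callable[[Fact], None],                # write into the chosen space
) -> List[Fact]:
    # Stage 1: skip extractor work entirely for messages that are not memory-worthy.
    if not is_memory_worthy(message):
        return []

    # Stage 2: only approved candidates reach the extractor.
    facts = [
        Fact(text=fact_text, source_turn_id=message.turn_id)
        for fact_text in extract_facts(message)
    ]
    for fact in facts:
        persist(fact)
    return facts
```

Passing the classifier, extractor, and persistence step in as callables keeps the pipeline testable with stubs and leaves the model choice configurable.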

Goals

  • Reduce noise and unnecessary fact extraction.
  • Preserve only durable, user-relevant, or persona-relevant memories.
  • Keep the extraction workflow compatible with BoxBrain conversation history.

Suggested implementation notes

  • Run Stage 1 on both inbound user messages and first-pass outbound BoxBrain messages.
  • Store enough metadata to trace which conversation turn produced a memory candidate.
  • Make the classifier and extractor models configurable/reusable through the existing BoxBrain runtime adapters where possible.
  • Add tests for:
    • memory-worthy vs non-memory-worthy filtering
    • extractor invocation only for approved candidates
    • persisted facts landing in the expected IdentityDB space
  • Consider deduplication / similarity checks so repeated conversations do not spam IdentityDB.
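
As one possible starting point for the deduplication note, a plain string-similarity check could be run before persisting a fact. The sketch below uses Python's standard `difflib.SequenceMatcher`; the function names and the `0.9` threshold are assumptions chosen for illustration, and an embedding-based comparison may turn out to be a better fit for paraphrased facts.

```python
from difflib import SequenceMatcher
from typing import Iterable


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variations compare equal."""
    return " ".join(text.lower().split())


def is_duplicate(
    candidate: str,
    existing_facts: Iterable[str],
    threshold: float = 0.9,  # assumed cutoff; would need tuning on real data
) -> bool:
    """Return True if the candidate is near-identical to an already stored fact."""
    normalized_candidate = normalize(candidate)
    for fact in existing_facts:
        ratio = SequenceMatcher(None, normalized_candidate, normalize(fact)).ratio()
        if ratio >= threshold:
            return True
    return False
```

This compares the candidate against every stored fact, so it only suits small fact sets; a larger space would need an index or a narrower candidate lookup first.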

Out of scope for this issue

  • Changing the existing proactive, schedule, or status behaviors.
  • Bulk backfilling old conversation logs unless explicitly planned later.
Reference: p-sw/BoxBrain#1