Add two-stage IdentityDB memory extraction for BoxBrain conversations #1

p-sw · 2026-05-11T22:50:29+09:00

Summary

When BoxBrain conversations happen, we want to update IdentityDB based on the inbound/outbound content, but without sending every message directly into the extractor.

Instead, add a two-stage pipeline:

1. Stage 1: memory-worthiness classification
   - Evaluate each first-pass inbound/outbound BoxBrain message.
   - Decide whether the message contains information worth remembering.
   - Skip extractor work for messages that are not memory-worthy.
2. Stage 2: IdentityDB LLM extractor ingestion
   - Send only memory-worthy messages to the IdentityDB LLM extractor.
   - Persist the extracted facts into the persona space (or another clearly defined space if design requires it).
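
The two stages above could be wired together roughly as follows. This is only a minimal sketch to make the control flow concrete: `Message`, `Fact`, `InMemoryIdentityStore`, `process_message`, and `keyword_classifier` are hypothetical names, not existing BoxBrain or IdentityDB APIs, and the classifier and extractor are plain callables standing in for whatever models the runtime adapters provide.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Message:
    """Hypothetical shape of one conversation turn."""
    conversation_id: str
    turn_index: int
    direction: str  # "inbound" or "outbound"
    text: str


@dataclass(frozen=True)
class Fact:
    """Hypothetical extracted fact, with enough metadata to trace its source turn."""
    text: str
    space: str
    source_conversation_id: str
    source_turn_index: int


# Assumed interfaces: Stage 1 returns a yes/no decision,
# Stage 2 returns zero or more fact statements.
Classifier = Callable[[Message], bool]
Extractor = Callable[[Message], List[str]]


class InMemoryIdentityStore:
    """Stand-in for IdentityDB; it only collects facts in a list."""

    def __init__(self) -> None:
        self.facts: List[Fact] = []

    def persist(self, fact: Fact) -> None:
        self.facts.append(fact)


def process_message(
    message: Message,
    classify: Classifier,
    extract: Extractor,
    store: InMemoryIdentityStore,
    space: str = "persona",
) -> int:
    """Run both stages for one message and return the number of facts persisted."""
    # Stage 1: skip all extractor work for messages that are not memory-worthy.
    if not classify(message):
        return 0
    # Stage 2: only approved messages reach the extractor.
    statements = extract(message)
    for statement in statements:
        store.persist(
            Fact(
                text=statement,
                space=space,
                source_conversation_id=message.conversation_id,
                source_turn_index=message.turn_index,
            )
        )
    return len(statements)


def keyword_classifier(message: Message) -> bool:
    """Toy Stage 1 stub for illustration; a real classifier would likely be a model call."""
    return "my name is" in message.text.lower()
```

Passing the classifier and extractor in as arguments keeps both swappable, and makes it easy to assert in tests that the extractor is never called for rejected messages.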

Goals

- Reduce noise and unnecessary fact extraction.
- Preserve only durable, user-relevant, or persona-relevant memories.
- Keep the extraction workflow compatible with BoxBrain conversation history.

Suggested implementation notes

- Run Stage 1 on both inbound user messages and first-pass outbound BoxBrain messages.
- Store enough metadata to trace which conversation turn produced a memory candidate.
- Make the classifier and extractor models configurable/reusable through the existing BoxBrain runtime adapters where possible.
- Add tests for:
  - memory-worthy vs non-memory-worthy filtering
  - extractor invocation only for approved candidates
  - persisted facts landing in the expected IdentityDB space
- Consider deduplication / similarity checks so repeated conversations do not spam IdentityDB.
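
One possible shape for the deduplication check, as a rough sketch only: it compares a candidate fact against already stored fact texts using the standard library's `difflib.SequenceMatcher`. The helper names `normalize` and `is_near_duplicate` and the `0.9` threshold are assumptions for illustration; an embedding-based similarity check may well be a better fit for the real implementation.

```python
import re
from difflib import SequenceMatcher
from typing import Iterable


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variations compare equal."""
    return re.sub(r"\s+", " ", text.strip().lower())


def is_near_duplicate(
    candidate: str,
    existing: Iterable[str],
    threshold: float = 0.9,  # assumed cutoff, would need tuning on real data
) -> bool:
    """Return True if the candidate is at least `threshold` similar to any stored fact."""
    normalized_candidate = normalize(candidate)
    return any(
        SequenceMatcher(None, normalized_candidate, normalize(stored)).ratio() >= threshold
        for stored in existing
    )
```

A check like this would run after extraction and before persistence, so a fact repeated across conversations is written to IdentityDB only once.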

Out of scope for this issue

- Changing the existing proactive, schedule, or status behaviors.
- Bulk backfilling old conversation logs unless explicitly planned later.

Reference: p-sw/BoxBrain#1