
Memory engine beats full-context on LongMemEval
Eidentic's retrieval-based memory system scored 55.2% on the LongMemEval benchmark versus 41.0% for a full-context baseline, using up to 39× fewer tokens per query [Dev.to].
The study ran two configurations against the same 500-question set. The full-context baseline loaded the entire conversation history, about 115k tokens spread across roughly 50 sessions, into the prompt for every query. Eidentic's memory engine instead ingested that history into a four-tier retrieval system and fetched only the snippets relevant to each question, averaging 2,550 tokens per question. The memory engine led in every question category, for example 84.3% vs 67.1% on single-session user questions and 92.9% vs 73.2% on single-session assistant questions [Dev.to].
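To make the two configurations concrete, here is a minimal Python sketch that contrasts full-context prompt assembly with retrieval under a token budget. It is not Eidentic's four-tier engine, whose internals the source does not describe. `Snippet`, `approx_tokens`, `retrieve` and `memory_prompt` are hypothetical names, token counts are approximated by whitespace-split words, and relevance is plain keyword overlap rather than whatever ranking the real system uses.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Snippet:
    """One stored piece of conversation history (hypothetical structure)."""
    session_id: int
    text: str


def tokenize(text: str) -> list[str]:
    # Lowercased whitespace split; a stand-in for a real tokenizer.
    return text.lower().split()


def approx_tokens(text: str) -> int:
    # Crude token estimate: one token per whitespace-delimited word.
    return len(tokenize(text))


def full_context_prompt(history: list[Snippet], question: str) -> str:
    # Baseline: concatenate the entire history in front of every question.
    return "\n".join(s.text for s in history) + "\n\nQ: " + question


def retrieve(history: list[Snippet], question: str, budget: int) -> list[Snippet]:
    # Score each snippet by keyword overlap with the question, then greedily
    # keep the highest-scoring ones that still fit in the token budget.
    q_counts = Counter(tokenize(question))
    scored = []
    for s in history:
        s_counts = Counter(tokenize(s.text))
        overlap = sum(min(n, s_counts[w]) for w, n in q_counts.items())
        if overlap > 0:
            scored.append((overlap, s))
    scored.sort(key=lambda pair: pair[0], reverse=True)

    chosen, used = [], 0
    for _, s in scored:
        cost = approx_tokens(s.text)
        if used + cost <= budget:
            chosen.append(s)
            used += cost
    return chosen


def memory_prompt(history: list[Snippet], question: str, budget: int = 2550) -> str:
    # Retrieval memory: only the selected snippets go into the prompt.
    # The default budget mirrors the reported 2,550-token average, purely as an illustration.
    snippets = retrieve(history, question, budget)
    return "\n".join(s.text for s in snippets) + "\n\nQ: " + question
```

As the history grows, `full_context_prompt` grows with every session while `memory_prompt` stays bounded by `budget`. That bound is where the token savings measured by the benchmark come from. Retrieval quality, not the budget, then decides accuracy.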
The memory engine's advantage is most pronounced on long conversations: on LongMemEval it delivered a 14.2-point lift overall and won all six question categories [Dev.to]. On LoCoMo, a small-haystack benchmark, the result flips. Full context led by 7.8 points there, but it consumed roughly 19,030 tokens per query versus 893 for memory, about 21× more. On long dialogues, LongMemEval's up-to-39× token reduction translates directly into lower inference spend for any production agent.
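The headline deltas and the LoCoMo token ratio can be rechecked with a few lines of Python, using only the figures quoted above. The helper names `point_lift` and `token_ratio` are illustrative.

```python
def point_lift(memory_pct: float, full_pct: float) -> float:
    # Accuracy difference in percentage points, rounded to the reported precision.
    return round(memory_pct - full_pct, 1)


def token_ratio(full_tokens: int, memory_tokens: int) -> float:
    # How many times more tokens the full-context prompt consumes per query.
    return round(full_tokens / memory_tokens, 1)


# LongMemEval accuracy (memory vs full context), as reported.
overall_lift = point_lift(55.2, 41.0)
single_session_user_lift = point_lift(84.3, 67.1)
single_session_assistant_lift = point_lift(92.9, 73.2)

# LoCoMo tokens per query (full context vs memory), as reported.
locomo_ratio = token_ratio(19_030, 893)

print(overall_lift, single_session_user_lift, single_session_assistant_lift, locomo_ratio)
# prints 14.2 17.2 19.7 21.3
```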
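Taken together, the two benchmarks give engineers data for locating the crossover point where retrieval memory outweighs the marginal accuracy of full context. On LoCoMo's small haystacks, full context still bought 7.8 points of accuracy at roughly 21× the token cost. On LongMemEval's ~115k-token histories, retrieval memory won on both accuracy and cost. The source's rule of thumb is that once a session surpasses a few thousand tokens, the combined cost-accuracy advantage of retrieval memory becomes clear [Dev.to].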
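One way to turn the two benchmark points into a decision rule is to price the extra tokens full context spends for each point of accuracy it buys. The sketch below assumes a hypothetical flat rate of $3 per million input tokens; substitute your provider's actual pricing. It reuses the reported token counts and accuracy gaps, and the ~115k LongMemEval figure is approximate. `cost_per_query` and `extra_cost_per_point` are illustrative names.

```python
import math

# Assumed flat input price in USD per million tokens; hypothetical, not from the source.
HYPOTHETICAL_PRICE_PER_MTOK = 3.00


def cost_per_query(tokens: int, price_per_mtok: float = HYPOTHETICAL_PRICE_PER_MTOK) -> float:
    # Input-token cost of a single query at a flat per-million-token rate.
    return tokens * price_per_mtok / 1_000_000


def extra_cost_per_point(full_tokens: int, memory_tokens: int,
                         full_minus_memory_pts: float,
                         price_per_mtok: float = HYPOTHETICAL_PRICE_PER_MTOK) -> float:
    """Extra spend per query for each accuracy point full context gains over memory.

    Returns infinity when full context is no more accurate: it then costs more
    and buys nothing, so retrieval memory dominates.
    """
    if full_minus_memory_pts <= 0:
        return math.inf
    extra = cost_per_query(full_tokens, price_per_mtok) - cost_per_query(memory_tokens, price_per_mtok)
    return extra / full_minus_memory_pts


# LoCoMo: full context is 7.8 points more accurate but uses 19,030 vs 893 tokens.
locomo = extra_cost_per_point(19_030, 893, 7.8)
# LongMemEval: full context is 14.2 points less accurate at ~115k vs 2,550 tokens.
longmemeval = extra_cost_per_point(115_000, 2_550, 41.0 - 55.2)

print(f"LoCoMo: ${locomo * 1000:.2f} per 1,000 queries per accuracy point")
print(f"LongMemEval: {longmemeval}")
```

Expressing the trade-off as dollars per accuracy point, rather than a fixed token threshold, lets each team compare it against what a point of accuracy is worth to their product. An infinite value, as on LongMemEval, means memory wins outright.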


