
$exists and non-sparse indexes in MongoDB and DocumentDB
In MongoDB, an $exists query on a non-sparse index triggers a residual fetch. Amazon DocumentDB ignores the non-sparse index entirely, while Microsoft's DocumentDB on PostgreSQL answers the same query from a standard index with no residual filter. On the first two platforms, a sparse index eliminates the wasted work [Dev.to].
In MongoDB, a non-sparse B-tree index stores a null key for every document that lacks the indexed field. An $exists:true filter therefore cannot be satisfied by the index alone: the plan adds a FETCH stage that reads each document and discards those where the field is absent [Dev.to]. In an eight-document test set, the engine examines 8 keys and 8 documents but returns only 6 results, so two fetches are wasted.
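The counts above can be reproduced with a toy model. This is a minimal sketch, not MongoDB's actual storage engine: the eight documents, the field name "a", and the function name are all assumptions made for illustration.

```python
# Toy model of a non-sparse single-field index (illustrative only).
# Eight hypothetical documents: six carry the field "a", two omit it.
docs = [{"_id": i, "a": i * 10} for i in range(6)] + [{"_id": 6}, {"_id": 7}]
by_id = {d["_id"]: d for d in docs}

# Non-sparse: one entry per document; a missing field is keyed as None (null).
non_sparse_index = [(d.get("a"), d["_id"]) for d in docs]


def exists_true_non_sparse(index, collection):
    """Scan every index entry, fetch the document, then apply the residual filter."""
    keys_examined = 0
    docs_examined = 0
    results = []
    for _key, doc_id in index:
        keys_examined += 1
        doc = collection[doc_id]  # FETCH: a null key may mean "missing" or "explicit null"
        docs_examined += 1
        if "a" in doc:  # residual $exists:true predicate
            results.append(doc)
    return keys_examined, docs_examined, results


keys, fetched, matched = exists_true_non_sparse(non_sparse_index, by_id)
print(keys, fetched, len(matched))
```

The two documents without "a" are fetched only to be thrown away, which is the overhead the test measures.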
In contrast, a sparse index in MongoDB only includes documents that contain the indexed field, eliminating the need for a residual predicate. The plan shows 6 keys examined and 6 docs examined.
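The sparse case can be sketched with the same kind of toy model. Again, the documents, the field name "a", and the function name are hypothetical, and the code only mimics the counting, not the real index structure.

```python
# Toy model of a sparse single-field index (illustrative only).
# Eight hypothetical documents: six carry the field "a", two omit it.
docs = [{"_id": i, "a": i * 10} for i in range(6)] + [{"_id": 6}, {"_id": 7}]
by_id = {d["_id"]: d for d in docs}

# Sparse: documents that lack the field get no index entry at all.
sparse_index = [(d["a"], d["_id"]) for d in docs if "a" in d]


def exists_true_sparse(index, collection):
    """Every entry in a sparse index already implies the field exists."""
    keys_examined = 0
    docs_examined = 0
    results = []
    for _key, doc_id in index:
        keys_examined += 1
        results.append(collection[doc_id])  # fetch to return the document, no filter needed
        docs_examined += 1
    return keys_examined, docs_examined, results


sparse_keys, sparse_fetched, sparse_matched = exists_true_sparse(sparse_index, by_id)
print(sparse_keys, sparse_fetched, len(sparse_matched))
```

Every document that is read is also returned, so keys examined, documents examined, and results all agree.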
Amazon DocumentDB (v8) ignores a non-sparse index and falls back to a COLLSCAN, scanning all eight documents. Adding { sparse: true } forces an IXSCAN that returns six keys, matching the MongoDB sparse behavior [Dev.to].
Microsoft DocumentDB on PostgreSQL uses a RUM index that stores path-based entries. A standard non-sparse index already satisfies $exists:true, returning six keys with no residual filter. The underlying RUM operator class excludes missing-field entries, making a sparse flag unnecessary.
The choice of index strategy has direct performance implications. In MongoDB, every document that lacks the field is still fetched in full and then discarded, which adds I/O and latency on large collections. Amazon DocumentDB needs a sparse index to serve an $exists:true query from an index at all; otherwise the planner chooses a full collection scan. PostgreSQL-backed DocumentDB shows that index semantics are not uniform across engines, so assuming MongoDB-style behavior can lead to sub-optimal indexing choices.
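Because the engines differ, it can help to check the plan on each platform rather than assume. The helper below is a minimal sketch that assumes the winning plan is a nested dictionary in the MongoDB explain shape, where each node has a "stage" and an optional "inputStage" or "inputStages"; the sample plans are hypothetical and trimmed, and other engines may report their plans differently.

```python
def plan_stages(plan):
    """Return stage names from the root of a plan down to its leaves."""
    stages = [plan.get("stage", "UNKNOWN")]
    children = []
    if "inputStage" in plan:
        children.append(plan["inputStage"])
    children.extend(plan.get("inputStages", []))
    for child in children:
        stages.extend(plan_stages(child))
    return stages


def uses_index(plan):
    """True if any node in the plan is an index scan."""
    return "IXSCAN" in plan_stages(plan)


# Hypothetical, trimmed plan shapes for illustration only.
fetch_over_ixscan = {
    "stage": "FETCH",
    "filter": {"a": {"$exists": True}},  # residual predicate applied after the fetch
    "inputStage": {"stage": "IXSCAN", "indexName": "a_1"},
}
collscan = {"stage": "COLLSCAN"}

print(plan_stages(fetch_over_ixscan), uses_index(collscan))
```

In practice the input would be the winning plan taken from a query's explain output on the platform being tested.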
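Sparse indexes are the safest default for existence checks. In the tests described here, they remove the residual filter in MongoDB and replace the collection scan with an index scan in Amazon DocumentDB, so no work is spent on documents that lack the field; on Microsoft's DocumentDB the flag is simply unnecessary. Given the hidden fetch cost in MongoDB and the outright index abandonment in Amazon DocumentDB, a sparse index is a worthwhile optimization for reducing latency and cloud costs.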


