fix(libmdbx): fix TOCTOU race in mdbx_txn_clone MVCC validation

The previous implementation checked cached_oldest BEFORE binding the
reader slot and registering the txnid. This created a race window where:

1. Check: source->txnid >= cached_oldest (passes)
2. Gap: Allocate memory, bind reader slot
3. Register: Write txnid to reader slot

During the gap, if the source transaction was the only reader holding
that snapshot and was aborted, cached_oldest could advance and GC could
reclaim the snapshot's pages before the clone's reader slot was
registered. The clone would then start on a snapshot that nothing had
been protecting.

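The fix follows the same pattern used in mdbx_txn_begin: register the
reader slot first, then verify that the snapshot is still valid. This
ensures the GC sees our registered txnid before we check validity, so a
check that passes after registration cannot be invalidated later. If
the check fails, the registration is rolled back and
MDBX_MVCC_RETARDED is returned.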
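
The two orderings can be sketched with a small single-threaded model.
This is only an illustration under stated assumptions, not libmdbx code:
toy_env, the toy_* functions and the two-slot reader table are all
hypothetical, and the model assumes GC derives the oldest snapshot by
scanning the registered reader slots. The "gap" callback stands in for
another thread aborting the source transaction and GC running at the
worst possible moment.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model; none of these names exist in libmdbx. */
#define TOY_SLOTS 2
#define TOY_FREE UINT64_MAX /* marks an unused reader slot */

typedef struct {
  _Atomic uint64_t slots[TOY_SLOTS]; /* txnid pinned by each reader */
  _Atomic uint64_t cached_oldest;    /* oldest snapshot GC must keep */
  uint64_t head_txnid;               /* most recent committed txnid */
} toy_env;

typedef void (*toy_gap_fn)(toy_env *);

/* Source reader sits in slot 0 at txnid 5; head is at txnid 10. */
static void toy_env_reset(toy_env *env) {
  atomic_store(&env->slots[0], (uint64_t)5);
  atomic_store(&env->slots[1], TOY_FREE);
  atomic_store(&env->cached_oldest, (uint64_t)5);
  env->head_txnid = 10;
}

/* GC side (assumed behaviour): oldest = min over registered readers. */
static void toy_gc_advance(toy_env *env) {
  uint64_t oldest = env->head_txnid;
  for (size_t i = 0; i < TOY_SLOTS; i++) {
    uint64_t t = atomic_load(&env->slots[i]);
    if (t < oldest) /* TOY_FREE is never smaller than head */
      oldest = t;
  }
  if (oldest > atomic_load(&env->cached_oldest))
    atomic_store(&env->cached_oldest, oldest);
}

/* The race: the source reader aborts, then GC recomputes the oldest. */
static void toy_source_aborts_then_gc(toy_env *env) {
  atomic_store(&env->slots[0], TOY_FREE);
  toy_gc_advance(env);
}

static bool toy_snapshot_valid(toy_env *env, uint64_t txnid) {
  return txnid >= atomic_load(&env->cached_oldest);
}

/* Old order: check, gap, register. */
static bool toy_clone_check_then_register(toy_env *env, uint64_t txnid,
                                          size_t slot, toy_gap_fn gap) {
  if (txnid < atomic_load(&env->cached_oldest))
    return false;
  if (gap)
    gap(env); /* allocation and slot binding happen here */
  atomic_store(&env->slots[slot], txnid);
  return true; /* may be stale: nothing was re-checked after the gap */
}

/* New order: gap, register, check, roll back on failure. */
static bool toy_clone_register_then_check(toy_env *env, uint64_t txnid,
                                          size_t slot, toy_gap_fn gap) {
  if (gap)
    gap(env);
  atomic_store(&env->slots[slot], txnid);
  if (txnid < atomic_load(&env->cached_oldest)) {
    atomic_store(&env->slots[slot], TOY_FREE); /* undo registration */
    return false;
  }
  return true;
}

static void toy_demo(void) {
  toy_env env;

  /* Old order reports success although the snapshot is gone. */
  toy_env_reset(&env);
  bool old_ok =
      toy_clone_check_then_register(&env, 5, 1, toy_source_aborts_then_gc);
  assert(old_ok && !toy_snapshot_valid(&env, 5));

  /* New order detects the same interleaving and rolls back. */
  toy_env_reset(&env);
  bool new_ok =
      toy_clone_register_then_check(&env, 5, 1, toy_source_aborts_then_gc);
  assert(!new_ok && atomic_load(&env.slots[1]) == TOY_FREE);

  /* Once registered, a later abort plus GC cannot pass the clone. */
  toy_env_reset(&env);
  bool pinned = toy_clone_register_then_check(&env, 5, 1, NULL);
  toy_source_aborts_then_gc(&env);
  assert(pinned && toy_snapshot_valid(&env, 5));
}
```

Calling toy_demo() from a test harness exercises all three cases. The
model is deterministic on purpose; in the real code the same guarantee
additionally depends on the memory ordering of the reader-slot write
and the cached_oldest load.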
Alexey Shekhirin
2025-12-19 01:10:21 +00:00
parent a8e6fc5855
commit e7339aed84

@@ -12958,10 +12958,6 @@ int mdbx_txn_clone(const MDBX_txn *source, MDBX_txn **dest) {
   if (unlikely(!source->to.reader))
     return LOG_IFERR(MDBX_BAD_TXN);
-  const txnid_t snap_oldest = atomic_load64(&env->lck->cached_oldest, mo_AcquireRelease);
-  if (unlikely(source->txnid < snap_oldest))
-    return LOG_IFERR(MDBX_MVCC_RETARDED);
   const uint32_t snapshot_pages_used = atomic_load32(&source->to.reader->snapshot_pages_used, mo_Relaxed);
   const uint64_t snapshot_pages_retired = atomic_load64(&source->to.reader->snapshot_pages_retired, mo_Relaxed);
@@ -13047,6 +13043,17 @@ int mdbx_txn_clone(const MDBX_txn *source, MDBX_txn **dest) {
   safe64_write(&r->txnid, source->txnid);
   atomic_store32(&env->lck->rdt_refresh_flag, true, mo_AcquireRelease);
+  const txnid_t snap_oldest = atomic_load64(&env->lck->cached_oldest, mo_AcquireRelease);
+  if (unlikely(source->txnid < snap_oldest)) {
+    safe64_reset(&r->txnid, true);
+    if ((env->flags & ENV_TXKEY) == 0)
+      atomic_store32(&r->pid, 0, mo_Relaxed);
+    txn->to.reader = nullptr;
+    if (!reuse)
+      osal_free(txn);
+    return LOG_IFERR(MDBX_MVCC_RETARDED);
+  }
   txn->flags = MDBX_TXN_RDONLY | (env->flags & MDBX_NOSTICKYTHREADS);
 #if defined(_WIN32) || defined(_WIN64)
   const size_t used_bytes = pgno2bytes(env, txn->geo.first_unallocated);