fix: [2.6] protect tbb concurrent_map emplace to avoid race condition deadlock (#45682)

Cherry-pick from master
pr: #45681
Related to #44974

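The emplace() operation on tbb::concurrent_hash_map was not protected
by mutex_, so another thread could erase the entry between an emplace()
that found the key already present and the subsequent double-check
lookup. The lookup then missed and Get() returned nullptr.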
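To make the window concrete, here is a single-threaded replay of the pre-fix control flow. It is a sketch under stated assumptions, not the Milvus code. A plain std::map stands in for concurrent_map_, and an int stands in for the filesystem handle. The hypothetical `between` callback plays the other thread: it erases the key right after emplace() reports that the key is already present.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <optional>
#include <string>

// Hypothetical, single-threaded replay of the pre-fix Get() control flow.
// `cache` stands in for concurrent_map_, and `between` stands in for another
// thread that runs exactly in the gap after a failed emplace().
std::optional<int> PreFixGet(std::map<std::string, int>& cache,
                             const std::string& key,
                             const std::function<void()>& between) {
    auto [iter, inserted] = cache.emplace(key, 42);
    if (inserted) {
        return iter->second;  // this caller created the entry
    }
    between();  // race window: nothing stops an eraser from running here
    auto it = cache.find(key);
    if (it != cache.end()) {
        return it->second;
    }
    return std::nullopt;  // the pre-fix code returned nullptr at this point
}
```

With a no-op hook, the call returns the cached value. With a hook that erases the key, the follow-up find() misses, which is exactly the case where the old code returned nullptr.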

Solution:
1. Add shared_lock protection around the emplace() operation to prevent
concurrent erasure during insertion
2. When the double-check lookup does not find the key, recursively
call Get(key) to retry the entire operation instead of returning nullptr
3. Fix typo: "earsed" -> "erased"
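The steps above can be sketched as follows. This is an illustrative reconstruction, not the Milvus implementation, and it rests on several assumptions:

- ToyConcurrentMap is a hypothetical stand-in for tbb::concurrent_hash_map; each call is individually thread-safe.
- FsCacheSketch and its Erase() are hypothetical.
- Erasers are assumed to take mutex_ exclusively.
- An int computed from the key stands in for creating the filesystem.

```cpp
#include <cassert>
#include <future>
#include <map>
#include <mutex>
#include <optional>
#include <shared_mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for tbb::concurrent_hash_map: every call is
// thread-safe on its own, but separate calls are not atomic together.
template <typename V>
class ToyConcurrentMap {
 public:
    bool emplace(const std::string& key, V value) {
        std::lock_guard<std::mutex> guard(m_);
        return map_.emplace(key, std::move(value)).second;
    }
    std::optional<V> find(const std::string& key) {
        std::lock_guard<std::mutex> guard(m_);
        auto it = map_.find(key);
        if (it == map_.end()) return std::nullopt;
        return it->second;
    }
    void erase(const std::string& key) {
        std::lock_guard<std::mutex> guard(m_);
        map_.erase(key);
    }

 private:
    std::mutex m_;
    std::map<std::string, V> map_;
};

// Hypothetical cache following the fixed Get(): shared lock around
// emplace(), shared lock around the double-check find(), retry on a miss.
class FsCacheSketch {
 public:
    int Get(const std::string& key) {
        std::promise<int> p;
        std::shared_future<int> f = p.get_future();
        std::shared_lock<std::shared_mutex> lck(mutex_);
        bool inserted = map_.emplace(key, f);
        lck.unlock();
        if (!inserted) {
            std::shared_lock<std::shared_mutex> check(mutex_);
            std::optional<std::shared_future<int>> existing = map_.find(key);
            check.unlock();
            if (existing) {
                return existing->get();  // wait for whoever inserted the entry
            }
            return Get(key);  // erased in the meantime: retry the whole operation
        }
        p.set_value(static_cast<int>(key.size()));  // stand-in for creating the FS
        return f.get();
    }

    // Erasers are assumed to take mutex_ exclusively, so an erase never
    // overlaps a shared-locked region; it can still land between two of them.
    void Erase(const std::string& key) {
        std::unique_lock<std::shared_mutex> lck(mutex_);
        map_.erase(key);
    }

 private:
    std::shared_mutex mutex_;
    ToyConcurrentMap<std::shared_future<int>> map_;
};
```

The exclusive lock only keeps an erase from overlapping each locked region. An eraser can still run between the emplace() and the double-check lookup, and that is the case the retry covers.

One deliberate difference from the diff concerns when the caller waits. The sketch copies the shared_future out and waits on it after releasing the shared lock. The diff instead calls get() while its shared lock is still held.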

This ensures that concurrent Get() calls are properly synchronized
with erasure: a call that loses the race retries instead of returning
nullptr, and keeps retrying until it either finds the entry or inserts
it itself.
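The retry in step 2 recurses into Get(key) once per miss. Under sustained erasure this may keep adding stack frames. If that ever matters, the same retry can be written as a loop.

Below is a minimal sketch built around a hypothetical RetryUntilFound helper. Its `attempt` callback is assumed to perform one emplace-or-find round and to return std::nullopt when the entry disappeared in between.

```cpp
#include <cassert>
#include <functional>
#include <optional>

// Hypothetical loop form of "retry if already delete": `attempt` performs one
// emplace-or-find round and returns std::nullopt when the entry vanished in
// between. Looping replaces the recursive Get(key) call, so repeated misses
// reuse one stack frame instead of adding a new one per retry.
template <typename T>
T RetryUntilFound(const std::function<std::optional<T>()>& attempt) {
    for (;;) {
        if (std::optional<T> result = attempt()) {
            return *result;
        }
    }
}
```

The loop behaves the same as the recursive form; only the stack usage differs. Like the recursion, it has no retry limit.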

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
congqixia 2025-11-20 14:23:12 +08:00 committed by GitHub
parent 3741fcdd78
commit 29c9132e55

@@ -37,16 +37,21 @@ StorageV2FSCache::Get(const Key& key) {
     std::promise<milvus_storage::ArrowFileSystemPtr> p;
     std::shared_future<milvus_storage::ArrowFileSystemPtr> f = p.get_future();
+    std::shared_lock lck(mutex_);
     auto [iter, inserted] =
         concurrent_map_.emplace(key, Value(std::move(p), f));
+    lck.unlock();
     if (!inserted) {
         std::shared_lock lck(mutex_);
-        // double check: avoid iter has been earsed by other thread
+        // double check: avoid iter has been erased by other thread
         auto it = concurrent_map_.find(key);
         if (it != concurrent_map_.end()) {
             return it->second.second.get();
         }
-        return nullptr;
+        lck.unlock();
+        // retry if already delete
+        return Get(key);
     }
     try {