fix: [2.6] protect tbb concurrent_map emplace to avoid race condition deadlock (#45682)

Cherry-pick from master pr: #45681
Related to #44974

The emplace() operation on tbb::concurrent_hash_map was not protected, allowing other threads to erase entries between the emplace attempt and the subsequent lookup.

Solution:
1. Add shared_lock protection around the emplace() operation to prevent concurrent erasure during insertion
2. Instead of returning nullptr when the key is not found on retry, recursively call Get(key) to retry the entire operation
3. Fix typo: "earsed" -> "erased"

This ensures that concurrent Get() operations are properly synchronized and will eventually succeed even under high contention.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
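
The locking pattern described in this message can be sketched in isolation roughly as follows. This is a minimal illustration under stated assumptions, not the actual StorageV2FSCache code: `SketchCache`, `inner_mutex_`, `Erase()` and the placeholder value computation are hypothetical names introduced here, a plain `std::unordered_map` guarded by `inner_mutex_` stands in for the TBB container, and erasure is assumed to take the exclusive side of the same mutex that `Get()` takes in shared mode.

```cpp
#include <future>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <unordered_map>

// Hypothetical cache illustrating the pattern; not the real StorageV2FSCache.
class SketchCache {
 public:
    int Get(const std::string& key) {
        std::promise<int> p;
        std::shared_future<int> f = p.get_future().share();

        bool inserted = false;
        {
            // Step 1: hold the shared lock across the insertion so that
            // Erase(), which needs the exclusive lock, cannot interleave.
            std::shared_lock lck(mutex_);
            // Stand-in for the concurrent container's internal synchronization.
            std::lock_guard inner(inner_mutex_);
            inserted = map_.emplace(key, f).second;
        }

        if (!inserted) {
            std::shared_future<int> existing;  // invalid until assigned
            {
                std::shared_lock lck(mutex_);
                std::lock_guard inner(inner_mutex_);
                // Double check: the entry may have been erased in between.
                auto it = map_.find(key);
                if (it != map_.end()) {
                    existing = it->second;
                }
            }
            if (existing.valid()) {
                // Wait outside the locks (a choice made for this sketch).
                return existing.get();
            }
            // Step 2: the entry vanished, so retry the whole operation
            // instead of returning an empty result.
            return Get(key);
        }

        // This thread won the insertion and produces the value.
        // Placeholder work (assumption): the value is the key length.
        p.set_value(static_cast<int>(key.size()));
        return f.get();
    }

    // Assumed eviction path: exclusive lock, so no Get() is mid-insert.
    void Erase(const std::string& key) {
        std::unique_lock lck(mutex_);
        map_.erase(key);
    }

 private:
    std::shared_mutex mutex_;
    std::mutex inner_mutex_;
    std::unordered_map<std::string, std::shared_future<int>> map_;
};
```

With this shape, a second `Get()` for the same key either finds the published future or, if the entry was erased in between, starts over and inserts a fresh one.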
2025-12-06 17:18:35 +08:00 · 2025-11-20 14:23:12 +08:00 · 29c9132e55
commit 29c9132e55
parent 3741fcdd78
1 changed file with 7 additions and 2 deletions
--- a/internal/core/src/storage/StorageV2FSCache.cpp
+++ b/internal/core/src/storage/StorageV2FSCache.cpp
@@ -37,16 +37,21 @@ StorageV2FSCache::Get(const Key& key) {
    std::promise<milvus_storage::ArrowFileSystemPtr> p;
    std::shared_future<milvus_storage::ArrowFileSystemPtr> f = p.get_future();

+    std::shared_lock lck(mutex_);
    auto [iter, inserted] =
        concurrent_map_.emplace(key, Value(std::move(p), f));
+    lck.unlock();
+
    if (!inserted) {
        std::shared_lock lck(mutex_);
-        // double check: avoid iter has been earsed by other thread
+        // double check: avoid iter has been erased by other thread
        auto it = concurrent_map_.find(key);
        if (it != concurrent_map_.end()) {
            return it->second.second.get();
        }
-        return nullptr;
+        lck.unlock();
+        // retry if already delete
+        return Get(key);
    }

    try {