This commit optimizes std::vector usage across segcore by adding
reserve() calls where the size is known in advance, reducing memory
reallocations during push_back operations.
Changes:
- TimestampIndex.cpp: Reserve space for prefix_sums and timestamp_barriers
- SegmentGrowingImpl.cpp: Reserve space for binlog info vectors
- ChunkedSegmentSealedImpl.cpp: Reserve space for futures and field data
  vectors
- storagev2translator/GroupChunkTranslator.cpp: Reserve space for metadata
  vectors
This improves performance by avoiding multiple memory reallocations when
the vector size is predictable.
issue: https://github.com/milvus-io/milvus/issues/45679
pr: https://github.com/milvus-io/milvus/pull/45757
Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
This avoids endlessly retrying CreateDatabase/DropDatabase when
cipherPlugin fails in the new DDL framework.
See also: #45826
pr: #45827
---------
Signed-off-by: yangxuan <xuan.yang@zilliz.com>
1. Array.h: Add output_data(ScalarFieldProto&) overload for both Array
and ArrayView classes
2. Use std::string_view instead of std::string for VARCHAR and GEOMETRY
types to avoid extra string copies
3. Call Reserve(length_) before writing to proto objects to reduce
memory reallocations
A simple test shows that these optimizations improve Array of Varchar
bulk_subscript performance by 20%.
issue: https://github.com/milvus-io/milvus/issues/45679
pr: https://github.com/milvus-io/milvus/pull/45743
Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
issue: #45782
pr: #45787
- In proto, the zero value of a repeated or bytes field is ignored or
treated as an empty value rather than a nil pointer, so the recovery
info of a broadcast task restored from proto must be fixed up to keep
the in-memory state consistent.
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #45718
pr: #45719
Logging complete segment ID arrays caused excessive log volume (3-6 TB
for 200k segments). Remove arrays from logger fields and keep only
segment counts for observability.
Changes:
- Remove requestSegments/preparedSegments arrays from Load logger
- Remove segmentIDs from BM25 stats logs
- Remove entries structure from sync distribution log
This reduces log volume by 99.99% for large-scale operations.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Cherry-pick from master
pr: #45681
Related to #44974
The emplace() operation on tbb::concurrent_hash_map was not protected,
allowing other threads to erase entries between the emplace attempt and
the subsequent lookup.
Solution:
1. Add shared_lock protection around the emplace() operation to prevent
concurrent erasure during insertion
2. Instead of returning nullptr when the key is not found on retry,
recursively call Get(key) to retry the entire operation
3. Fix typo: "earsed" -> "erased"
This ensures that concurrent Get() operations are properly synchronized
and will eventually succeed even under high contention.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #45608
pr: #45609
When component.Prepare() fails (e.g., net listener creation error), the
sign channel was never closed, causing runComponent to block
indefinitely at <-sign. This resulted in the entire process hanging
after logging the error message.
Changes:
- Move close(sign) to defer statement in runComponent goroutine
- Ensures sign channel is always closed regardless of success/failure
- Allows proper error propagation through future.Await() mechanism
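A minimal sketch of the fix, assuming a simplified `runComponent`: the `component` interface, `failingComponent`, and the buffered `errCh` (standing in for the future.Await() mechanism) are hypothetical and only illustrate why the deferred close matters.

```go
package main

import (
	"errors"
	"fmt"
)

// component is a hypothetical stand-in for a Milvus component.
type component interface {
	Prepare() error
}

// failingComponent simulates a Prepare() failure such as a listener error.
type failingComponent struct{}

func (failingComponent) Prepare() error {
	return errors.New("failed to create net listener")
}

// runComponent starts the component in a goroutine and waits on sign.
func runComponent(c component) error {
	sign := make(chan struct{})
	errCh := make(chan error, 1) // buffered so the goroutine never blocks on send

	go func() {
		// Deferred, so sign is closed on every exit path,
		// including the early return when Prepare() fails.
		defer close(sign)
		if err := c.Prepare(); err != nil {
			errCh <- err
			return
		}
		errCh <- nil
	}()

	// Without the deferred close, this receive would block forever
	// whenever Prepare() returned an error.
	<-sign
	return <-errCh
}

func main() {
	if err := runComponent(failingComponent{}); err != nil {
		fmt.Println("component failed:", err)
	}
}
```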
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #45210
pr: #45606
If the underlying WAL fails to open, the recovery info that the
streaming coord stores at streamingcoord-meta/pchannel grows quickly
until it reaches the etcd size limit.
To reduce the size of streamingcoord-meta/pchannel, compact the
assignment history by serverID.
Signed-off-by: chyezh <chyezh@outlook.com>
Cherry-pick from master
pr: #45615
Related to #45614
This commit fixes a bug where certain collection attributes were not
properly updated during collection modification, causing metadata errors
after cluster restart and collection reload failures.
When altering a collection, the `EnableDynamicField` and `SchemaVersion`
attributes were not being persisted to the catalog. This caused
inconsistencies between the in-memory collection metadata and the
persisted state, leading to:
- Dynamic field validation failures after restart
- Collection loading errors
- Metadata state mismatches
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Cherry-pick from master
pr: #45572
Related to #45543
When a field with a default value is added to a collection, the default
value becomes null after compaction instead of retaining the expected
default value.
**Root Cause**
The `appendValueAt` function in `internal/storage/arrow_util.go`
incorrectly checked if the entire arrow.Array was nil before handling
default values. This meant that default values were only applied when
the array itself was nil, not when individual field values were null
(which is the correct condition).
**Changes**
1. **Early nil check**: Added a guard at the function entry to detect
nil arrow.Array and return an error immediately, as this is an
unexpected condition that should not occur during normal operation.
2. **Refactored default value handling**: Removed the per-type nil array
checks and moved default value logic to handle individual null values
within the array (when `IsNull(idx)` returns true).
3. **Applied to all types**: Updated the logic consistently across all
builder types:
- BooleanBuilder
- Int8Builder, Int16Builder, Int32Builder, Int64Builder
- Float32Builder
- StringBuilder
- BinaryBuilder (added default value support for internal $meta json)
- ListBuilder (removed unnecessary nil check)
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #44800
pr: #44846
This commit enhances the upsert and validation logic to properly handle
nullable Geometry (WKT/WKB) and Timestamptz data types:
- Add ToCompressedFormatNullable support for TimestamptzData,
GeometryWktData, and GeometryData to filter out null values during data
compression
- Implement GenNullableFieldData for Timestamptz and Geometry types to
generate nullable field data structures
- Update FillWithNullValue to handle both GeometryData and
GeometryWktData with null value filling logic
- Add UpdateFieldData support for Timestamptz, GeometryData, and
GeometryWktData field updates
- Add comprehensive unit tests covering all new data type handling scenarios
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #45227
pr: #45228
Increase the default session TTL to 30 seconds to tolerate etcd failover
time. This prevents session expiration during etcd cluster failover,
improving system stability.
When etcd undergoes failover (leader election or node restart), the
previous 10s TTL was too short to survive the failover window, causing
unnecessary session expiration and component restarts. The new 30s TTL
provides sufficient buffer for etcd to complete failover while
maintaining session liveness.
Changes:
- Update DefaultSessionTTL constant from 10 to 30
- Update SessionTTL ParamItem DefaultValue from "10" to "30"
Signed-off-by: Wei Liu <wei.liu@zilliz.com>