issue: #45718
pr: #45719
Logging complete segment ID arrays caused excessive log volume (3-6 TB
for 200k segments). Remove arrays from logger fields and keep only
segment counts for observability.
Changes:
- Remove requestSegments/preparedSegments arrays from Load logger
- Remove segmentIDs from BM25 stats logs
- Remove entries structure from sync distribution log
This reduces log volume by 99.99% for large-scale operations.
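
A minimal sketch of the idea, using only the Go standard library. The
helper `loadLogFields` and the field names `requestSegmentNum` and
`preparedSegmentNum` are hypothetical and are not taken from the actual
Load logger code:

```go
package main

import "fmt"

// loadLogFields builds the fields attached to a load log line.
// Hypothetical helper: it records only how many segments were involved,
// never the segment IDs, so the log line size does not grow with the
// number of segments.
func loadLogFields(requestSegments, preparedSegments []int64) map[string]int {
	return map[string]int{
		"requestSegmentNum":  len(requestSegments),
		"preparedSegmentNum": len(preparedSegments),
	}
}

func main() {
	requested := make([]int64, 200000)
	prepared := requested[:150000]
	fields := loadLogFields(requested, prepared)
	fmt.Printf("load segments requestSegmentNum=%d preparedSegmentNum=%d\n",
		fields["requestSegmentNum"], fields["preparedSegmentNum"])
}
```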
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Cherry-pick from master
pr: #45681
Related to #44974
The emplace() operation on tbb::concurrent_hash_map was not protected,
allowing other threads to erase entries between the emplace attempt and
the subsequent lookup.
Solution:
1. Add shared_lock protection around the emplace() operation to prevent
concurrent erasure during insertion
2. Instead of returning nullptr when the key is not found on retry,
recursively call Get(key) to retry the entire operation
3. Fix typo: "earsed" -> "erased"
This ensures that concurrent Get() operations are properly synchronized
and will eventually succeed even under high contention.
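
The fix itself is in C++; the following is only a Go analog of the
pattern, under stated assumptions. `sync.Map` stands in for
tbb::concurrent_hash_map, `sync.RWMutex` for the shared lock, and the
`registry` and `entry` types are hypothetical:

```go
package main

import (
	"fmt"
	"sync"
)

// entry is a hypothetical cached value.
type entry struct{ key string }

// registry is a hypothetical analog of the C++ structure.
type registry struct {
	mu      sync.RWMutex
	entries sync.Map // string -> *entry
}

// Get returns the entry for key, creating it if it does not exist.
func (r *registry) Get(key string) *entry {
	r.mu.RLock()
	// Insert-if-absent and the lookup share one read-locked section, so
	// Erase (which needs the write lock) cannot run between them.
	r.entries.LoadOrStore(key, &entry{key: key})
	v, ok := r.entries.Load(key)
	r.mu.RUnlock()
	if !ok {
		// Defensive in this sketch: retry the whole operation instead of
		// returning nil when the key is missing.
		return r.Get(key)
	}
	return v.(*entry)
}

// Erase removes the entry while holding the lock exclusively.
func (r *registry) Erase(key string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.entries.Delete(key)
}

func main() {
	r := &registry{}
	first := r.Get("segment-1")
	r.Erase("segment-1")
	second := r.Get("segment-1")
	fmt.Println(first != second) // a fresh entry is created after the erase
}
```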
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #45608
pr: #45609
When component.Prepare() fails (e.g., net listener creation error), the
sign channel was never closed, causing runComponent to block
indefinitely at <-sign. This resulted in the entire process hanging
after logging the error message.
Changes:
- Move close(sign) to defer statement in runComponent goroutine
- Ensures sign channel is always closed regardless of success/failure
- Allows proper error propagation through future.Await() mechanism
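
A simplified sketch of the pattern, assuming a hypothetical `component`
type. The real code propagates the error through future.Await(), which
is not modeled here; a plain variable is used instead:

```go
package main

import (
	"errors"
	"fmt"
)

// component is a hypothetical stand-in for a real component.
type component struct{ failPrepare bool }

func (c *component) Prepare() error {
	if c.failPrepare {
		return errors.New("failed to create net listener")
	}
	return nil
}

// runComponent starts the component in a goroutine and waits on sign.
// The deferred close(sign) runs on every return path of the goroutine,
// so the receive below cannot block forever when Prepare fails.
func runComponent(c *component) error {
	sign := make(chan struct{})
	var prepareErr error
	go func() {
		defer close(sign)
		if err := c.Prepare(); err != nil {
			prepareErr = err
			return
		}
		// ... the component would start serving here ...
	}()
	<-sign // the close happens after prepareErr is written
	return prepareErr
}

func main() {
	fmt.Println(runComponent(&component{failPrepare: true}))
	fmt.Println(runComponent(&component{}))
}
```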
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #45210
pr: #45606
If the underlying WAL fails to open, the recovery info that the
streaming coord stores under streamingcoord-meta/pchannel grows quickly
until it reaches the etcd size limit.
Compact the assignment history by serverID to reduce the size of
streamingcoord-meta/pchannel.
Signed-off-by: chyezh <chyezh@outlook.com>
Cherry-pick from master
pr: #45615
Related to #45614
This commit fixes a bug where certain collection attributes were not
properly updated during collection modification, causing metadata errors
after cluster restart and collection reload failures.
When altering a collection, the `EnableDynamicField` and `SchemaVersion`
attributes were not being persisted to the catalog. This caused
inconsistencies between the in-memory collection metadata and the
persisted state, leading to:
- Dynamic field validation failures after restart
- Collection loading errors
- Metadata state mismatches
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Cherry-pick from master
pr: #45572
Related to #45543
When a field with a default value is added to a collection, the default
value becomes null after compaction instead of retaining the expected
default value.
**Root Cause**
The `appendValueAt` function in `internal/storage/arrow_util.go`
incorrectly checked if the entire arrow.Array was nil before handling
default values. This meant that default values were only applied when
the array itself was nil, not when individual field values were null
(which is the correct condition).
**Changes**
1. **Early nil check**: Added a guard at the function entry to detect
nil arrow.Array and return an error immediately, as this is an
unexpected condition that should not occur during normal operation.
2. **Refactored default value handling**: Removed the per-type nil array
checks and moved default value logic to handle individual null values
within the array (when `IsNull(idx)` returns true).
3. **Applied to all types**: Updated the logic consistently across all
builder types:
- BooleanBuilder
- Int8Builder, Int16Builder, Int32Builder, Int64Builder
- Float32Builder
- StringBuilder
- BinaryBuilder (added default value support for internal $meta json)
- ListBuilder (removed unnecessary nil check)
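
The corrected condition can be sketched without the Arrow library. This
is not the code in `internal/storage/arrow_util.go`: `int64Column` is a
hypothetical stand-in for an arrow.Array, and the output slice uses a
nil pointer to represent null:

```go
package main

import (
	"errors"
	"fmt"
)

// int64Column is a hypothetical, simplified stand-in for an Arrow array:
// values[i] is meaningful only when valid[i] is true.
type int64Column struct {
	values []int64
	valid  []bool
}

func (c *int64Column) IsNull(i int) bool { return !c.valid[i] }

// appendValueAt copies element idx of src into dst.
// A nil column is rejected up front; the default value is applied per
// element, when that element is null, not when the whole column is nil.
func appendValueAt(dst *[]*int64, src *int64Column, idx int, defaultValue *int64) error {
	if src == nil {
		return errors.New("unexpected nil column")
	}
	if src.IsNull(idx) {
		// A nil defaultValue keeps the element null.
		*dst = append(*dst, defaultValue)
		return nil
	}
	v := src.values[idx]
	*dst = append(*dst, &v)
	return nil
}

func main() {
	def := int64(7)
	col := &int64Column{values: []int64{1, 0}, valid: []bool{true, false}}
	var out []*int64
	for i := range col.values {
		if err := appendValueAt(&out, col, i, &def); err != nil {
			fmt.Println("error:", err)
			return
		}
	}
	fmt.Println(*out[0], *out[1]) // prints "1 7"
}
```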
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #44800
pr: #44846
This commit enhances the upsert and validation logic to properly handle
nullable Geometry (WKT/WKB) and Timestamptz data types:
- Add ToCompressedFormatNullable support for TimestamptzData,
GeometryWktData, and GeometryData to filter out null values during data
compression
- Implement GenNullableFieldData for Timestamptz and Geometry types to
generate nullable field data structures
- Update FillWithNullValue to handle both GeometryData and
GeometryWktData with null value filling logic
- Add UpdateFieldData support for Timestamptz, GeometryData, and
GeometryWktData field updates
- Add comprehensive unit tests covering all new data type handling scenarios
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #45227
pr: #45228
Increase the default session TTL to 30 seconds to tolerate etcd failover
time. This prevents session expiration during etcd cluster failover,
improving system stability.
When etcd undergoes failover (leader election or node restart), the
previous 10s TTL was too short to survive the failover window, causing
unnecessary session expiration and component restarts. The new 30s TTL
provides sufficient buffer for etcd to complete failover while
maintaining session liveness.
Changes:
- Update DefaultSessionTTL constant from 10 to 30
- Update SessionTTL ParamItem DefaultValue from "10" to "30"
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
This causes collection schema properties to be empty in datacoord
caches, so compaction and indexing cannot read properties from the
schema.
See also: #45053, #45159
pr: #45502
Signed-off-by: yangxuan <xuan.yang@zilliz.com>
issue: #45452
pr: #45506
- alias/rename related DDL should use a database-level exclusive lock
- an alias cannot be used as the resource key of a lock; use the
collection name instead
- transfer replica should use the WAL-based framework
Signed-off-by: chyezh <chyezh@outlook.com>
This PR changes the config layout according to the latest design, and
adds two external credential configs for AWS KMS.
See also: #45169
pr: #45170
Signed-off-by: yangxuan <xuan.yang@zilliz.com>
issue: #45397, #45403, #45463
pr: #45461
also pick pr: #45447
- fix altering a collection through an alias failing with collection
not found
- fix the Nightly E2E failures.
- fix the wrong timetick update when altering a collection, which
caused the related load failure.
---------
Signed-off-by: sijie-ni-0214 <sijie.ni@zilliz.com>
Signed-off-by: chyezh <chyezh@outlook.com>
Co-authored-by: sijie-ni-0214 <sijie.ni@zilliz.com>
Cherry-pick from master
pr: #45455
Related to #45445
Previously, FillFieldData for JSON fields would assert and fail when a
default_value was provided, blocking index creation for JSON fields with
default values (including dynamic fields like $meta).
This change enables JSON default value support by:
- Removing the assertion that blocked default values
- Parsing bytes_data into Json objects when default_value is present
- Properly filling data_ array and setting valid_data_ bitset to true
- Maintaining null behavior when no default_value is provided
Impact:
- Fixes index creation failure for JSON fields with default values
- Resolves upgrade issues from 2.5 to 2.6.5 where dynamic fields with
default values couldn't be indexed
- Index builds that were stuck in InProgress state can now complete
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Cherry-pick from master
pr: #45444
Related to #45338
When using bulk vector search in hybrid search with rerank functions,
the output field values for different queries were all equal to the
values returned by the first query, instead of the correct values
belonging to each document ID. The document IDs were correct, but the
entity field values were wrong.
In rerank functions (RRF, weighted, decay, model), when processing
multiple queries in a batch, the `idLocations` stored only the relative
offset within each result set (`idx`), not accounting for the absolute
position within the entire batch. This caused `FillFieldData` to
retrieve field data from the wrong positions, always using offsets
relative to the first query.
This fix ensures that when processing bulk searches with rerank
functions, each result correctly retrieves its corresponding field data
based on the absolute offset within the entire batch, resolving the
issue where all queries returned the first query's field values.
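
The offset computation can be sketched as follows. This is an
illustration under assumptions, not the rerank code itself:
`buildIDLocations` and its input layout (one ID slice per query, field
data flattened across the batch in query order) are hypothetical:

```go
package main

import "fmt"

// buildIDLocations maps every result ID to its position in the field
// data flattened across the whole batch. perQueryIDs[q] holds the IDs
// returned for query q, in result order.
func buildIDLocations(perQueryIDs [][]int64) []map[int64]int {
	locations := make([]map[int64]int, len(perQueryIDs))
	base := 0 // number of results that belong to earlier queries
	for q, ids := range perQueryIDs {
		locations[q] = make(map[int64]int, len(ids))
		for idx, id := range ids {
			// Storing idx alone would point every query at the first
			// query's rows; base + idx is the absolute position.
			locations[q][id] = base + idx
		}
		base += len(ids)
	}
	return locations
}

func main() {
	fieldData := []string{"a", "b", "c", "d"} // rows for query 0, then query 1
	locs := buildIDLocations([][]int64{{10, 11}, {20, 21}})
	fmt.Println(fieldData[locs[1][20]]) // prints "c"
}
```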
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>