Cherry pick from 2.6
pr: 45283
Related to #45282
Initialize `tsFrom` and `tsTo` fields in the composite binlog record
writer constructor to prevent timestamp range information loss in stats
tasks.
The composite binlog writer now properly initializes the timestamp range
fields, ensuring that:
1. The first timestamp update will correctly set the minimum (`tsFrom`)
2. The first timestamp update will correctly set the maximum (`tsTo`)
3. All subsequent data writes will maintain accurate timestamp range
tracking
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #44909
pr: #44917
When requery optimization is enabled, search results contain IDs but
empty FieldsData. During reduce/rerank operations, if the first shard
has empty FieldsData while others have data, PrepareResultFieldData
initializes an empty array, causing AppendFieldData to panic when
accessing array indices.
Changes:
- Find first non-empty FieldsData as template in 3 functions:
reduceAdvanceGroupBy, reduceSearchResultDataWithGroupBy,
reduceSearchResultDataNoGroupBy
- Add length check before 2 AppendFieldData calls in reduce functions to
prevent panic
- Improve newRerankOutputs to find first non-empty fieldData using
len(FieldsData) check instead of GetSizeOfIDs
- Add length check in appendResult before AppendFieldData
- Add comprehensive unit tests for empty and partial empty FieldsData
scenarios in both reduce and rerank functions
This fix handles both pure requery (all empty) and mixed scenarios (some
empty, some with data) without breaking normal search flow. The key
improvement is checking FieldsData length directly rather than IDs, as
requery may have IDs but empty FieldsData.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Co-authored-by: wei liu <wei.liu@zilliz.com>
fix: [2.6] add null check for packed_writer_ in
JsonStatsParquetWriter::Close() (#45158) (#45176)
Cherry-pick from master
Related to #45157
Fix a bug where DataNode panics when building json stats index throws an
exception before the writer is initialized. The destructor would call
Close() on an uninitialized packed_writer_ pointer, causing a null
pointer dereference.
fix: fix bug for shredding json when empty but not null json (#45214)
pr: #45221
fix: not use json_shredding for json path is null (#45311)
pr: #45310
Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: congqixia <congqi.xia@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
When there're no growing segments in the collection, L0 Compaction will
try to choose all L0 segments that hits all L1/L2 segments.
However, if there's Sealed Segment still under flushing in DataNode at
the same time L0 Compaction selects satisfied L1/L2 segments, L0
Compaction will ignore this Segment because it's not in "FlushState",
which is wrong, causing missing deletes on the Sealed Segment.
This quick solution here is to fail this L0 compaction task once
selected a Sealed segment.
See also: #45339
pr: #45340
Signed-off-by: yangxuan <xuan.yang@zilliz.com>
Cherry-pick from master
pr: #45334
Related to #45333
Fix segment loading failure when adding fields with text match enabled.
The issue occurred because text indexes were being loaded before
FinishLoad() was called, meaning raw data was not properly available
when text index creation attempted to access it, resulting in "failed to
create text index, neither raw data nor index are found" errors.
Solution is to move the FinishLoad() call to execute after raw data
loading but before text index loading. This ensures that:
1. Raw data is properly loaded and available in memory
2. Text indexes can access the raw data they need during creation
3. The segment is in the correct state before any index operations
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Cherry-pick from master
pr: #45330
Related to #45329
Fix compaction failure when handling newly added dynamic fields with
storage v1 binlogs. The issue occurred because the
`GenerateEmptyArrayFromSchema` function did not support JSON data type
default values, causing "Unexpected default value" errors during
compaction.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Cherry-pick from master
pr: #45062
Update milvus-storage version from aa189ad to e5f5b4c to include the fix
for duplicate AWS SDK initialization that was causing init conflicts.
This update removes the redundant arrow::fs::InitializeS3() call that
was resulting in duplicate Aws::InitAPI() initialization. The duplicate
initialization was causing AWS SDK to ignore custom configurations,
particularly affecting GCP Workload Identity authentication.
Changes in milvus-storage e5f5b4c:
- Remove redundant arrow::fs::InitializeS3() call
- Keep only the extended S3 initialization with custom AWS SDK options
- Ensure GCP IAM authentication via custom HTTP client factory works
correctly
Relates to #44745
Reference: milvus-io/milvus-storage#285
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #43858
pr: #44834
Fix the issue introduced in PR #43992 where deactivating the balance
checker incorrectly stops stopping balance operations.
Changes:
- Move IsActive() check after stopping balance logic
- Only skip normal balance when checker is inactive
- Allow stopping balance to proceed regardless of checker state
This ensures stopping balance can execute even when the balance checker
is deactivated.
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Cherry-pick from master
pr: #44912
Related to #44897
Add missing JSON data type handling in CreateArrowScalarFromDefaultValue
to fix query failures when dynamic fields are enabled. JSON default
values are now properly converted to arrow::BinaryScalar using
bytes_data().
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #44648
master pr: #44911
If the value is `null` during insertion, it will be omitted instead of
being filled with nil. Therefore, when performing checks, there’s no
need to retrieve data based on the valid offset.
Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
issue: #44802
master pr: #44889
After a Geometry object is serialized into WKB, the resulting binary may
contain '\0' bytes.
When growing mmap is enabled, the append data logic uses strcpy, which
stops copying at the first '\0' bytes.
This causes only part of the WKB---typically the portion up to the
geometry type field to be copied, leading to corrupted data.
As a result, during parsing, all POINT geometries are incorrectly
interperted as POINT(0 0).
To fix this issue, memcpy will be used instead of strcpy.
Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
Cherry-pick from master
pr: #44870
Related to #44819
This fix addresses an issue(#44819) where the offset parameter did not
work correctly during searches when multiple results had identical
scores. The problem occurred because results with equal scores were not
consistently ordered, leading to unpredictable pagination behavior.
The solution adds a new sorting step (SortEqualScoresByPks) in the
reduce phase that sorts results with identical scores by their primary
keys in ascending order. This ensures deterministic ordering and enables
proper offset functionality.
Changes:
- Add SortEqualScoresByPks() to sort results with equal scores by PK
- Add SortEqualScoresOneNQ() to handle per-query sorting logic
- Invoke sorting step after FillPrimaryKey() in Reduce() workflow
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
related issue: https://github.com/milvus-io/milvus/issues/44425
pr: #44801
split insert.py into a few files: upsert.py, insert.py,
partial_upsert.py ...
add test for allow insert auto id
---------
Signed-off-by: yanliang567 <yanliang.qiao@zilliz.com>
Cherry-pick from master
pr: #44789
This commit addresses issue #44788 where the
`datacoord_stored_binlog_size` metric could become inaccurate when
multiple concurrent `GetMetrics` calls arrived at DataCoord.
### Problem
The original implementation called `Reset()` followed by `Add()`
operations on Prometheus metrics within the `GetQuotaInfo()` method.
When multiple goroutines invoked this method concurrently, race
conditions occurred:
- Thread 1: Reset() → Add(value1)
- Thread 2: Reset() → Add(value2)
- Result: Metrics could be reset multiple times and values added in an
interleaved fashion, leading to inaccurate and inflated metric values
### Solution
Changed the approach from `Reset() + Add()` to aggregating metric values
in local maps first, then using `Set()` to update metrics atomically:
1. Collect segment size data into local maps:
- `storedBinlogSize`: tracks size per collection per segment state
- `binlogFileSize`: tracks total file count per collection
- `coll2DbName`: maps collection IDs to database names
2. After aggregation is complete, use `Set()` (instead of `Add()`) to
update metrics in a single operation per label combination
This ensures that concurrent `GetMetrics` calls don't interfere with
each other, as each invocation works with its own local state and only
updates the final metric value atomically.
---------
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>