22824 Commits

Author SHA1 Message Date
tinswzy
173efe2b98
enhance: wp metrics and update deps to v0.1.0 (#43569)
#43574   #43604 #43431  #43603 
Fix wp metrics not registered bug;
Update the version dependent on wp to v0.1.2-rc1;
improve advanced reader with concurrent prefetch blks;
add the segment rolling policy based on the number of blocks;
improve concurrent compaction
release lock failed bug

Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
2025-07-29 14:51:35 +08:00
congqixia
268f1cdace
fix: Hold field shared_ptr in case of being released (#43614)
Related to #43584

Directly accessing `fields_` in `get_raw_data` may have race if load vec
index happens concurrently during getting raw data.

This PR make `bulk_subscript` hold shared_ptr of field column prevent
field column being release during reading it.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-07-29 12:15:36 +08:00
Chun Han
4ee9f63f72
fix: return id by default(#43595) (#43601)
related: #43595

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2025-07-29 12:07:36 +08:00
zhenshan.cao
4835ef9db8
enhance: update committers (#43620)
Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>
2025-07-29 12:05:43 +08:00
congqixia
18d8dc82b8
feat: [GoSDK] Support search iterator v2 (#43612)
Related to #37548

Also link #43122

This patch implements basic functions of search iterator v2.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-07-29 11:21:35 +08:00
aoiasd
c9412434c8
enhance: add char group tokenizer (#42793)
relate: https://github.com/milvus-io/milvus/issues/42792
Add char group tokenizer which support use costum char group or use some
build-in char group as delimiters.

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-07-29 11:11:35 +08:00
congqixia
f666d89919
fix: [StorageV2] Access future result to get exception if any (#43613)
Related to #43584

When `LoadWithStrategy` throw exception, the ex was wrapped in the
returned future. If the future is not handled, this exception would be
ignored.

This patch add `future.get()` to get exception if any.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-07-28 22:33:35 +08:00
Xiaofan
bd31b32167
fix: hybridsearch should support offset param in restful api (#43586)
Add support of offset param for reqeustful. api and refine some constant
usage related #43556

Signed-off-by: xiaofanluan <xiaofan.luan@zilliz.com>
2025-07-28 22:15:36 +08:00
yihao.dai
a29b3272b0
fix: Improve import memory management to prevent OOM (#43568)
1. Use blocking memory allocation to wait until memory becomes available
2. Perform memory allocation at the file level instead of per task
3. Limit Parquet file reader batch size to prevent excessive memory
consumption
4. Limit import buffer size from 20% to 10% of total memory

issue: https://github.com/milvus-io/milvus/issues/43387,
https://github.com/milvus-io/milvus/issues/43131

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-07-28 21:25:35 +08:00
Zhen Ye
5b9b895cb0
fix: get schema panics when recover from channel checkpoint (#43605)
issue: #43597, #43598

Signed-off-by: chyezh <chyezh@outlook.com>
2025-07-28 16:42:56 +08:00
Spade A
864d1b93b1
enhance: enable stlsort with mmap support (#43359)
issue: https://github.com/milvus-io/milvus/issues/43358

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-07-28 15:32:55 +08:00
zhagnlu
9bf1cb02d5
fix: add array_contains_all int to float converter (#43593)
#43334

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-07-28 14:14:55 +08:00
Zhen Ye
648994182f
fix: pulsar use more memory for queue (#43565)
issue: #43564

Signed-off-by: chyezh <chyezh@outlook.com>
2025-07-28 14:00:56 +08:00
yihao.dai
192521c6bd
enhance: Fix unbalanced task scheduling (#43581)
Make scheduler always pick the node with the most available slots.

issue: https://github.com/milvus-io/milvus/issues/43580

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-07-28 12:58:55 +08:00
congqixia
34d3f0c0f8
enhance: Reserve builder space for ValueSerializer (#43570)
Add `arrowBuild.Reserve` call for `ValueSerializer` to reduce repeated
resizing buffer when write size is large

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-07-28 11:02:55 +08:00
yanliang567
10ec3ce2bf
test: Upgrade pymilvus to 2.6.0rc166 (#43543)
related issue: #40698

Signed-off-by: yanliang567 <yanliang.qiao@zilliz.com>
2025-07-28 10:30:55 +08:00
wei liu
7b8bf6393b
enhance: Improve partial result evaluation with row count based strategy (#43361)
issue: #43360
Enhance the partial result evaluation mechanism in delegator to use row
count based data ratio instead of simple segment count ratio for better
accuracy.

Key improvements:
- Introduce PartialResultEvaluator interface for flexible evaluation
strategy
- Implement NewRowCountBasedEvaluator using sealed segment row count
data
- Replace segment count based ratio with row count based data ratio
calculation
- Update PinReadableSegments to return sealedRowCount information
- Modify executeSubTasks to use configurable evaluator for partial
result decisions
- Add comprehensive unit tests for the new row count based evaluation
logic

This change provides more accurate partial result evaluation by
considering the actual data volume rather than just segment quantity,
leading to better query performance and consistency when some segments
are unavailable.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-07-28 10:18:55 +08:00
Zhen Ye
7877aaa96c
fix: dirty cp metrics after drop (#43567)
issue: #42688

- The channel cp is dropped by garbage collector
- The channel is dropped and the cp is marked as math.Uint64
- If we drop it here, the update channel checkpoints will write the
dirty cp back.

Signed-off-by: chyezh <chyezh@outlook.com>
2025-07-27 23:22:55 +08:00
Zhen Ye
feb5db60f2
fix: make flush save binlog paths idempotent (#43579)
issue: #43574

Signed-off-by: chyezh <chyezh@outlook.com>
2025-07-27 23:14:55 +08:00
Spade A
faeb7fd410
feat: impl StructArray -- create schema, insert, and retrieve data (#42855)
Ref https://github.com/milvus-io/milvus/issues/42148

https://github.com/milvus-io/milvus/pull/42406 impls the segcore part of
storage for handling with VectorArray.
This PR:
1. impls the go part of storage for VectorArray
2. impls the collection creation with StructArrayField and VectorArray
3. insert and retrieve data from the collection.

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <u6748471@anu.edu.au>
2025-07-27 01:30:55 +08:00
Buqian Zheng
b497d3d7a4
fix: call promise->setValue only after released the ListNode mtx (#43547)
issue: #43261

`promise->setValue(folly::Unit());` may run callbacks inline and some of
them may attempt to grab `mtx_`. So we should not call
`promise->setValue(folly::Unit());` while holding the lock.

---------

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-07-26 18:34:55 +08:00
Ted Xu
9041bf1b9a
fix: including shouldCopy parameter in file readers (#43578)
This parameter determines whether the returned value should be a copy or
a reference from the arrow array. The updates enhance memory management
and provide more control over data handling during deserialization.

See #43186

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2025-07-26 17:30:55 +08:00
Bingyi Sun
742d72a6c2
fix: Fix wrong null offsets for json path index (#43390)
issue: https://github.com/milvus-io/milvus/issues/43315

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-07-26 17:26:54 +08:00
Bingyi Sun
a89e579485
fix: use tantivy version to make json index compatible with milvus 2.5 (#43563)
issue: https://github.com/milvus-io/milvus/issues/43562

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-07-26 17:18:55 +08:00
congqixia
0b860b4aec
fix: Revert "enhance: DataCodec to release ownership of input_data after initialization (#43542)" (#43571) 2025-07-25 20:48:16 +08:00
Zhen Ye
070aabd27e
enhance: fix remove flushing state of segment (#43560)
issue: #43559, #42884

- also fix the data lost when streaming resuming from old arch message.

Signed-off-by: chyezh <chyezh@outlook.com>
2025-07-25 18:08:54 +08:00
congqixia
2a7b7a811a
fix: [StorageV2] Throw exception when read rg fails (#43561)
Related to #43261

Read error with catched in `LoadWithStrategy`. Caller could not detect
read failure when some error occurred. This patch make
`LoadWithStrategy` throw ex instead of swallowing it.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-07-25 17:40:55 +08:00
yihao.dai
0e1f367164
enhance: Fail compaction task to prevent data loss (#43545)
We’ve frequently observed data loss caused by broken mutual exclusion in
compaction tasks. This PR introduces a post-check: before modifying
metadata upon compaction task completion, it verifies the state of the
input segments. If any input segment has been dropped, the compaction
task will be marked as failed.

issue: https://github.com/milvus-io/milvus/issues/43513

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-07-25 16:24:54 +08:00
Ted Xu
078ccf5e08
fix: the underlying record got released in clustering compaction (#43551)
See: #43186

In this PR:

1. Flush renamed to FlushChunk, while a new Flush primitive is
introduced to serialize values to records.
2. Segment mapping in clustering compaction now process data by records
instead of values, it calls flush to all buffers after each record is
processed.

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2025-07-25 15:04:54 +08:00
Buqian Zheng
d23205b718
enhance: DataCodec to release ownership of input_data after initialization (#43542)
issue: https://github.com/milvus-io/milvus/issues/43088
issue: https://github.com/milvus-io/milvus/issues/43038

see also https://github.com/milvus-io/milvus/pull/43533.

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-07-25 14:24:54 +08:00
wei liu
369a811ae1
fix: only clear exclude node list after refresh shard leader cache (#43553)
issue: #43511

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-07-25 14:18:54 +08:00
sthuang
5cebc9f7f6
fix: [StorageV2] handle correct cid with multiple files and add storage v2 prefix logs (#43539)
related: #43372

---------

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-07-25 11:22:54 +08:00
Shuyoou
87326a5a64
fix: [skip e2e] webui collection filter params error (#42969)
Fix Issue: #40929

Signed-off-by: Shuyoou <shuyoou@outlook.com>
2025-07-25 10:40:53 +08:00
tinswzy
83f6811dbd
fix: local fs incomplete block read bug (#43444)
#43340 fix log reader bug
#43370 list object goroutine leak ; block flush bug
#43431 #43356 improve read latency 
other fix: local FS block CRC fix; incomplete block read bugfix;
multi-segment rolling not complete bug; local fs concurent flush bug
other enhance: log reader EOF-based segment end detection ; revisioned
log/segment meta updates.

Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
2025-07-25 10:30:54 +08:00
Spade A
10fe53ff59
feat: support json for ngram (#43170)
Ref https://github.com/milvus-io/milvus/issues/42053

This PR enable ngram to support json data type.

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-07-25 10:28:54 +08:00
sthuang
a0c9f499ee
fix: [StorageV2] sync panic with nullable add field (#43142)
related: https://github.com/milvus-io/milvus/pull/42932
fix: https://github.com/milvus-io/milvus/issues/43072

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-07-25 10:08:53 +08:00
zhagnlu
c86307aef0
enhance: forbid two column comparison with json type in parser stage (#43382)
#43381

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-07-24 19:42:54 +08:00
congqixia
fe16de702b
test: [GoSDK] Use strong consistency level for hybrid search cases (#43536)
There are some unstable cases in go sdk e2e cases, which used default
bounded consistency level. This patch make these cases use strong level
to avoid unstable test results

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-07-24 15:26:54 +08:00
yihao.dai
804a7692a6
fix: Fix delete loss caused by missing mutual exclusion in sort compaction (#43540)
issue: https://github.com/milvus-io/milvus/issues/43513

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-07-24 14:53:34 +08:00
qixuan
3d8f728091
test: modify add field case about skipped cases (#43461)
related issue: #42126

Signed-off-by: qixuan <673771573@qq.com>
2025-07-24 14:26:54 +08:00
Buqian Zheng
d367770649
enhance: greatly reduce the loading memory overhead - by up to 25% (#43533)
issue: #43088
issue: #43038

The current loading process:

* When loading an index, we first download the index files into a list
of buffers, say A
* then constructing(copying) them into a vector of FieldDatas(each file
is a FieldData), say B
* assembles them together as a huge BinarySet, say C
* lastly, copy into the actual index data structure, say D

The problem:

* We can see that, after each step, we don't need the data in previous
step.
* But currently, we release the memory of A, B, C only after we have
finished constructing D
* This leads to a up to 4x peak memory usage comparing with the raw
index size, during the loading process
* This PR allows timely releasing of B after we assembled C. So after
this PR, the peak memory usage during loading will be up to 3x of the
raw index size.

I will create another PR to release A after we created B, that seems
more complicated and need more work.

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-07-24 11:26:54 +08:00
congqixia
4bdb5ccafa
fix: Close segment writer when reader returns error (#43531)
Realted #43520

Datanode may have memory leakage when reader returns error. In
previously mention issue, datanodes got OOM killed due to continueous
error in read path.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-07-24 11:18:54 +08:00
Jean-Francois Weber-Marx
1bd66b09e3
enhance: allow '.' and '-' characters in usernames (#42417) (#42588)
related: #42417

- update the isValidUsername function to accept dots and hyphens in
addition to letters, digits, and underscores
- this change improves compatibility with common username formats and
addresses feedback in issue #42417

Signed-off-by: Jean-Francois Weber-Marx <jfwm@hotmail.com>
Signed-off-by: Jean-Francois Weber-Marx <jf.webermarx@criteo.com>
2025-07-24 09:54:54 +08:00
wei liu
990a25e51a
fix: Prevent delete records loss during slow segment loading [QueryNodeV2] (#43527)
issue: #42884
Fixes an issue where delete records for a segment are lost from the
delete buffer if `load segment` execution on the delegator is too slow,
causing `syncTargetVersion` or other cleanup operations to clear them
prematurely.

Changes include:
- Introduced `Pin` and `Unpin` methods in `DeleteBuffer` interface and
its implementations (`doubleCacheBuffer`, `listDeleteBuffer`).
- Added a `pinnedTimestamps` map to track timestamps protected from
cleanup by specific segments.
- Modified `LoadSegments` in `shardDelegator` to `Pin` relevant segment
delete records before loading and `Unpin` them afterwards.
- Added `isPinned` check in `UnRegister` and `TryDiscard` methods of
`listDeleteBuffer` to skip cleanup if corresponding timestamps are
pinned.
- Added comprehensive unit tests for `Pin`, `Unpin`, and `isPinned`
functionality, covering basic, multiple pins, concurrent, and edge
cases.

This ensures the integrity of delete records by preventing their
premature removal from the delete buffer during segment loading.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-07-24 01:00:54 +08:00
congqixia
1cf8ed505f
fix: Implement NeededFields feature in RecordReader (#43523)
Related to #43522

Currently, passing partial schema to storage v2 packed reader may
trigger SEGV during clustering compaction unit test.

This patch implement `NeededFields` differently in each `RecordReader`
imlementation. For now, v2 will implemented as no-op. This will be
supported after packed reader support this API.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-07-24 00:22:54 +08:00
yanliang567
abb3aeacdf
test: Refactor diskann and hsnw index, and update gen data functions (#43452)
related issue #40698
1. add diskann and hnsw index test
2. update gen_row_data and gen_column_data functions

---------

Signed-off-by: yanliang567 <yanliang.qiao@zilliz.com>
2025-07-23 22:04:54 +08:00
Zhen Ye
e9ab73e93d
enhance: add schema version at recovery storage (#43500)
issue: #43072, #43289

- manage the schema version at recovery storage.
- update the schema when creating collection or alter schema.
- get schema at write buffer based on version.
- recover the schema when upgrading from 2.5.

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-07-23 21:38:54 +08:00
yihao.dai
9fbd41a97d
fix: Adjust binlog and parquet reader buffer size for import (#43495)
1. Modify the binlog reader to stop reading a fixed 4096 rows and
instead use the calculated bufferSize to avoid generating small binlogs.
2. Use a fixed bufferSize (32MB) for the Parquet reader to prevent OOM.

issue: https://github.com/milvus-io/milvus/issues/43387

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-07-23 21:28:54 +08:00
foxspy
ed57650b52
fix: remove invalid restrictions on dim for int8 vector (#43469)
issue: #43466

Signed-off-by: xianliang.li <xianliang.li@zilliz.com>
2025-07-23 20:22:54 +08:00
cai.zhang
74c08069ef
fix: Set result storage version for sort compaction (#43521)
issue: #43520

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-07-23 19:04:53 +08:00