128 Commits

Author SHA1 Message Date
aoiasd
7d19c40e3c
feat: support search highlight with queries (#45736)
Previously, search with highlight only supported using BM25 search text
as the highlight target.
This PR adds support for highlighting with user-defined queries.
relate: https://github.com/milvus-io/milvus/issues/42589

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-12-01 10:17:09 +08:00
aoiasd
5efb0cedc8
feat: support use fragment config for highlight (#45099)
relate: https://github.com/milvus-io/milvus/issues/42589

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-11-24 17:07:06 +08:00
aoiasd
947c8855f3
feat: support search bm25 with highlight (#44923)
relate: https://github.com/milvus-io/milvus/issues/42589

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-11-18 16:09:39 +08:00
aoiasd
754997ac2b
enhance: update some annotations (#44769)
relate: https://github.com/milvus-io/milvus/issues/43114

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-10-17 16:22:02 +08:00
jiaqizho
338ed2fed4
enhance: Introduce sparse filter in query (#44347)
issue: #44373

The current commit implements sparse filtering in query tasks using the
statistical information (Bloom filter/MinMax) of the Primary Key (PK).

The statistical information of the PK is bound to the segment during the
segment loading phase. A new filter has been added to the segment filter
to enable the sparse filtering functionality.
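
The pruning idea above can be pictured with a small Go sketch. It models only the MinMax half of the PK statistics (the Bloom filter is left out), and `segmentStats`, `mayContain`, and `filterSegments` are hypothetical names chosen for illustration, not the identifiers used in the patch.

```go
package main

import "fmt"

// segmentStats is a hypothetical stand-in for the PK statistics bound to a
// segment at load time. Only the MinMax part is modeled here.
type segmentStats struct {
	segmentID    int64
	minPK, maxPK int64
}

// mayContain reports whether pk can fall inside the segment's PK range.
// A false result is definitive; a true result only means "possibly".
func (s segmentStats) mayContain(pk int64) bool {
	return pk >= s.minPK && pk <= s.maxPK
}

// filterSegments keeps the IDs of segments that may hold at least one of the
// requested primary keys, so the query can skip all other segments.
func filterSegments(segments []segmentStats, pks []int64) []int64 {
	var kept []int64
	for _, seg := range segments {
		for _, pk := range pks {
			if seg.mayContain(pk) {
				kept = append(kept, seg.segmentID)
				break
			}
		}
	}
	return kept
}

func main() {
	segments := []segmentStats{
		{segmentID: 1, minPK: 0, maxPK: 99},
		{segmentID: 2, minPK: 100, maxPK: 199},
		{segmentID: 3, minPK: 200, maxPK: 299},
	}
	// Only segments 2 and 3 can contain PKs 150 and 250.
	fmt.Println(filterSegments(segments, []int64{150, 250}))
}
```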

Signed-off-by: jiaqizho <jiaqi.zhou@zilliz.com>
2025-09-23 09:58:09 +08:00
Bingyi Sun
0c0630cc38
feat: support dropping index without releasing collection (#42941)
issue: #42942

This PR includes the following changes:
1. Added checks for index checker in querycoord to generate drop index
tasks
2. Added drop index interface to querynode
3. To avoid search failure after dropping the index, the querynode
allows the use of lazy mode (warmup=disable) to load raw data even when
indexes contain raw data.
4. In segcore, loading the index no longer deletes raw data; instead, it
evicts it.
5. In expr, the index is pinned to prevent concurrent errors.

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-09-02 16:17:52 +08:00
Zhen Ye
5551d99425
enhance: remove old non-streaming arch code (#43651)
issue: #41609

- remove all dml dead code at proxy
- remove dead code at l0_write_buffer
- remove msgstream dependency at proxy
- remove timetick reporter from proxy
- remove replicate stream implementation

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-08-06 14:41:40 +08:00
sparknack
bdd65871ea
enhance: tiered storage: estimate segment loading resource usage while considering eviction (#43323)
issue: #41435 

After introducing the caching layer's lazy loading and eviction
mechanisms, most parts of a segment won't be loaded into memory or disk
immediately, even if the segment is marked as LOADED. This means
physical resource usage may be very low. However, we still need to
reserve enough resources for the segments marked as LOADED. Thus, the
logic of resource usage estimation during segment loading, which is
currently based on physical resource usage only, should be changed.

To address this issue, we introduced the concept of logical resource
usage in this patch. This can be thought of as the base reserved
resource for each LOADED segment.

A segment’s logical resource usage is derived from its final evictable
and inevictable resource usage and calculated as follows:

```
SLR = SFPIER + evitable_cache_ratio * SFPER
```

This is equivalent to:

```
SLR = (SFPIER + SFPER) - (1.0 - evitable_cache_ratio) * SFPER
```

`SLR`: The logical resource usage of a segment.
`SFPIER`: The final physical inevictable resource usage of a segment.
`SFPER`: The final physical evictable resource usage of a segment.
`evitable_cache_ratio`: The ratio of a segment's evictable resources
that can be cached locally. The higher the ratio, the more physical
memory is reserved for evictable memory.
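
As a quick numeric check of the two forms of the formula, here is a minimal Go sketch. The function names and the sample sizes are assumptions made up for illustration; only the formulas come from the text above.

```go
package main

import (
	"fmt"
	"math"
)

// logicalUsage computes SLR = SFPIER + evitable_cache_ratio * SFPER.
func logicalUsage(sfpier, sfper, ratio float64) float64 {
	return sfpier + ratio*sfper
}

// logicalUsageAlt computes the second form:
// SLR = (SFPIER + SFPER) - (1.0 - evitable_cache_ratio) * SFPER.
func logicalUsageAlt(sfpier, sfper, ratio float64) float64 {
	return (sfpier + sfper) - (1.0-ratio)*sfper
}

func main() {
	// Made-up sizes in MB, chosen only for illustration.
	sfpier, sfper, ratio := 100.0, 400.0, 0.25
	a := logicalUsage(sfpier, sfper, ratio)
	b := logicalUsageAlt(sfpier, sfper, ratio)
	// Both forms should agree up to floating-point rounding.
	fmt.Println(a, b, math.Abs(a-b) < 1e-9)
}
```

With a ratio of 0.25, a segment with 100 MB of inevictable and 400 MB of evictable data reserves 200 MB, that is, all of the inevictable part plus a quarter of the evictable part.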

When loading a segment, two types of resource usage are taken into
account.

First is the estimated maximum physical resource usage:

```
PPR = HPR + CPR + SMPR - SFPER
```

`PPR`: The predicted physical resource usage after the current segment
is allowed to load.
`HPR`: The physical resource usage obtained from hardware information.  
`CPR`: The total physical resource usage of segments that have been
committed but not yet loaded. When one new segment is allowed to load,
`CPR' = CPR + (SMPR - SFPER)`. When one of the committed segments is
loaded, `CPR' = CPR - (SMPR - SFPER)`.
`SMPR`: The maximum physical resource usage of the current segment.
`SFPER`: The final physical evictable resource usage of the current
segment.

Second is the estimated logical resource usage. This check is only valid
when eviction is enabled:

```
PLR = LLR + CLR + SLR
```

`PLR`: The predicted logical resource usage after the current segment is
allowed to load.
`LLR`: The total logical resource usage of all loaded segments. When a
new segment is loaded, `LLR` should be updated to `LLR' = LLR + SLR`.
`CLR`: The total logical resource usage of segments that have been
committed but not yet loaded. When one new segment is allowed to load,
`CLR' = CLR + SLR`. When one of the committed segments is loaded,
`CLR' = CLR - SLR`.
`SLR`: The logical resource usage of the current segment.

The segment is allowed to load only when `PPR < PRL && PLR < PRL`, where
`PRL` is the physical resource limit of the querynode.
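
The admission check can be sketched in Go as follows. The struct, the function name `canLoad`, and the sample numbers are hypothetical. Skipping the logical check when eviction is disabled is an interpretation of "this check is only valid when eviction is enabled", not something confirmed by the patch itself.

```go
package main

import "fmt"

// loadEstimate collects the quantities named in the commit message.
// All values share one arbitrary unit (for example MB).
type loadEstimate struct {
	HPR, CPR, SMPR, SFPER float64 // physical terms
	LLR, CLR, SLR         float64 // logical terms
}

// canLoad applies PPR < PRL and, when eviction is enabled, PLR < PRL.
func canLoad(e loadEstimate, prl float64, evictionEnabled bool) bool {
	ppr := e.HPR + e.CPR + e.SMPR - e.SFPER
	if ppr >= prl {
		return false
	}
	if !evictionEnabled {
		// Assumption: without eviction only the physical check applies.
		return true
	}
	plr := e.LLR + e.CLR + e.SLR
	return plr < prl
}

func main() {
	e := loadEstimate{
		HPR: 4000, CPR: 1000, SMPR: 800, SFPER: 500, // PPR = 5300
		LLR: 6000, CLR: 1500, SLR: 400, // PLR = 7900
	}
	fmt.Println(canLoad(e, 8000, true))  // both checks pass
	fmt.Println(canLoad(e, 7500, true))  // logical check fails
	fmt.Println(canLoad(e, 7500, false)) // logical check skipped
}
```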

---------

Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
2025-08-01 21:31:37 +08:00
wei liu
990a25e51a
fix: Prevent delete records loss during slow segment loading [QueryNodeV2] (#43527)
issue: #42884
Fixes an issue where delete records for a segment are lost from the
delete buffer if `load segment` execution on the delegator is too slow,
causing `syncTargetVersion` or other cleanup operations to clear them
prematurely.

Changes include:
- Introduced `Pin` and `Unpin` methods in `DeleteBuffer` interface and
its implementations (`doubleCacheBuffer`, `listDeleteBuffer`).
- Added a `pinnedTimestamps` map to track timestamps protected from
cleanup by specific segments.
- Modified `LoadSegments` in `shardDelegator` to `Pin` relevant segment
delete records before loading and `Unpin` them afterwards.
- Added `isPinned` check in `UnRegister` and `TryDiscard` methods of
`listDeleteBuffer` to skip cleanup if corresponding timestamps are
pinned.
- Added comprehensive unit tests for `Pin`, `Unpin`, and `isPinned`
functionality, covering basic, multiple pins, concurrent, and edge
cases.

This ensures the integrity of delete records by preventing their
premature removal from the delete buffer during segment loading.
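
A simplified Go model of the pinning idea is sketched below. The `pinTracker` type and the exact rule inside `isPinned` (any pin below the cleanup timestamp blocks cleanup) are assumptions for illustration; the real `DeleteBuffer` implementations may differ in signatures and details.

```go
package main

import (
	"fmt"
	"sync"
)

// pinTracker is a hypothetical, reduced model of the pinnedTimestamps map:
// for each timestamp it records which segments still need the delete
// records at or after that timestamp.
type pinTracker struct {
	mu     sync.Mutex
	pinned map[uint64]map[int64]struct{}
}

func newPinTracker() *pinTracker {
	return &pinTracker{pinned: make(map[uint64]map[int64]struct{})}
}

// Pin protects delete records from ts onward on behalf of segmentID.
func (p *pinTracker) Pin(ts uint64, segmentID int64) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.pinned[ts] == nil {
		p.pinned[ts] = make(map[int64]struct{})
	}
	p.pinned[ts][segmentID] = struct{}{}
}

// Unpin drops the protection once the segment has finished loading.
func (p *pinTracker) Unpin(ts uint64, segmentID int64) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if segs, ok := p.pinned[ts]; ok {
		delete(segs, segmentID)
		if len(segs) == 0 {
			delete(p.pinned, ts)
		}
	}
}

// isPinned reports whether cleaning up records older than cleanupTs would
// drop records that a pin still protects (assumed rule for this sketch).
func (p *pinTracker) isPinned(cleanupTs uint64) bool {
	p.mu.Lock()
	defer p.mu.Unlock()
	for ts := range p.pinned {
		if ts < cleanupTs {
			return true
		}
	}
	return false
}

func main() {
	p := newPinTracker()
	p.Pin(100, 1)                // segment 1 starts loading
	fmt.Println(p.isPinned(150)) // cleanup up to 150 must be skipped
	p.Unpin(100, 1)              // loading finished
	fmt.Println(p.isPinned(150)) // cleanup may proceed
}
```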

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-07-24 01:00:54 +08:00
aoiasd
54cc0b60f2
fix: dropped segment in excluded segments uses wrong excluded ts (#43115)
This caused some excluded growing data to be inserted again.
relate: https://github.com/milvus-io/milvus/issues/43114

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-07-08 18:04:46 +08:00
congqixia
7bc7b18ed5
fix: [AddField] Prevent concurrent load during UpdateSchema (#43043)
Related to #43028

This PR:
- Add a mutex to prevent concurrent segment load and schema change
- Add a schema version field in load meta
- Update schema in PutOrRef if the schema version is larger

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-07-02 17:38:44 +08:00
Zhen Ye
4bad293655
enhance: reduce downtime when upgrading from 2.5.x (#42082)
issue: #40532

- start timeticksync at rootcoord if the streaming service is not
available
- stop timeticksync if the streaming service is available
- open a read-only wal if some nodes in the cluster have not been upgraded
to 2.6
- allow opening a read-write wal after all nodes in the cluster have been
upgraded to 2.6

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-29 23:02:29 +08:00
wei liu
54619eaa2c
feat: Implement partial result support on node down (#42009)
issue: https://github.com/milvus-io/milvus/issues/41690
This commit implements partial search result functionality when query
nodes go down, improving system availability during node failures. The
changes include:

- Enhanced load balancing in proxy (lb_policy.go) to handle node
failures with retry support
- Added partial search result capability in querynode delegator and
distribution logic
- Implemented tests for various partial result scenarios when nodes go
down
- Added metrics to track partial search results in querynode_metrics.go
- Updated parameter configuration to support partial result required
data ratio
- Replaced old partial_search_test.go with more comprehensive
partial_result_on_node_down_test.go
- Updated proto definitions and improved retry logic

These changes improve query resilience by returning partial results to
users when some query nodes are unavailable, ensuring that queries don't
completely fail when a portion of data remains accessible.
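
One way to picture the "required data ratio" decision is the Go sketch below. The `shardResult` type, the row-based ratio, and the `>=` comparison are assumptions for illustration only; the actual parameter semantics live in the proxy and querynode code.

```go
package main

import "fmt"

// shardResult is a hypothetical per-shard outcome of a search request.
type shardResult struct {
	rows int64 // rows the shard is responsible for
	ok   bool  // whether its query node answered
}

// accessibleRatio returns the fraction of rows that were reachable.
func accessibleRatio(shards []shardResult) float64 {
	var total, reachable int64
	for _, s := range shards {
		total += s.rows
		if s.ok {
			reachable += s.rows
		}
	}
	if total == 0 {
		return 1.0
	}
	return float64(reachable) / float64(total)
}

// acceptPartial decides whether a partial result may be returned instead of
// failing the whole request.
func acceptPartial(shards []shardResult, requiredRatio float64) bool {
	return accessibleRatio(shards) >= requiredRatio
}

func main() {
	shards := []shardResult{
		{rows: 600, ok: true},
		{rows: 300, ok: true},
		{rows: 100, ok: false}, // this query node is down
	}
	fmt.Println(acceptPartial(shards, 0.8)) // 90% of the data is reachable
	fmt.Println(acceptPartial(shards, 1.0)) // full data required, so reject
}
```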

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-05-28 00:12:28 +08:00
aoiasd
0fafb706ba
enhance: add segment bm25 stats local cache (#41775)
relate: https://github.com/milvus-io/milvus/issues/41424

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-05-26 18:44:27 +08:00
Zhen Ye
38c804fb01
fix: more stable recovery graceful closing and stable unittest (#42013)
issue: #41544

Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-23 17:52:26 +08:00
wei liu
78010262f0
enhance: Optimize shard serviceable mechanism (#41937)
issue: https://github.com/milvus-io/milvus/issues/41690
- Merge leader view and channel management into ChannelDistManager,
allowing a channel to have multiple delegators.
- Improve shard leader switching to ensure a single replica only has one
shard leader per channel. The shard leader handles all resource loading
and query requests.
- Refine the serviceable mechanism: after QC completes loading, sync the
query view to the delegator. The delegator then determines its
serviceable status based on the query view.
- When a delegator encounters forwarding query or deletion failures,
mark the corresponding segment as offline and transition it to an
unserviceable state.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-05-22 11:38:24 +08:00
Buqian Zheng
ff5c2770e5
feat: cachinglayer: various improvements (#41546)
issue: https://github.com/milvus-io/milvus/issues/41435

this PR is based on https://github.com/milvus-io/milvus/pull/41436. 

Improvements include:

- Lazy Load support for Storage v1
- Use Low/High watermark to control eviction
- Caching Layer related config changes
- Removed ChunkCache related configs and code in golang
- Add `PinAllCells` helper method to CacheSlot class
- Modified ValueAt, RawAt, PrimitiveRawAt to Bulk version, to reduce
caching layer overhead
- Removed some unclear templated bulk_subscript methods
- CachedSearchIterator now stores PinWrapper when searching on
ChunkedColumn; removed an unused constructor.
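
The low/high watermark item in the list above follows a common pattern: eviction starts once usage exceeds the high watermark and continues until usage falls to the low watermark. The Go sketch below shows that general pattern only. The `cell` type, the oldest-first order, and the function name are assumptions, not the caching layer's actual code.

```go
package main

import "fmt"

// cell is a hypothetical cached unit with a size in arbitrary units.
type cell struct {
	id   int
	size int64
}

// evictToLowWatermark does nothing while used <= high. Once usage exceeds
// the high watermark it evicts cells from the front of the slice (assumed
// to be the oldest) until usage drops to the low watermark or below.
func evictToLowWatermark(cells []cell, used, low, high int64) ([]cell, int64, []int) {
	if used <= high {
		return cells, used, nil
	}
	var evicted []int
	i := 0
	for i < len(cells) && used > low {
		used -= cells[i].size
		evicted = append(evicted, cells[i].id)
		i++
	}
	return cells[i:], used, evicted
}

func main() {
	cells := []cell{{1, 30}, {2, 30}, {3, 30}, {4, 30}}
	rest, used, evicted := evictToLowWatermark(cells, 120, 60, 100)
	fmt.Println(len(rest), used, evicted)
}
```

Using two thresholds instead of one avoids evicting a single cell on every insertion when usage hovers around the limit.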

---------

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-05-10 09:19:16 +08:00
Zhen Ye
de8f0af20d
enhance: use dispatcher at delegator when enable streaming (#41266)
issue: #38399

- add an adaptor type to adapt the streaming service client and
msgstream client to reuse the msgdispatcher.

Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-06 01:12:53 +08:00
aoiasd
3892451880
fix: bm25 search failed when avgdl == nan (#41502)
relate: https://github.com/milvus-io/milvus/issues/41490

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-04-27 17:34:38 +08:00
aoiasd
f52c2909c4
feat: support multi analyzer for bm25 function (#41351)
relate: https://github.com/milvus-io/milvus/issues/41213

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-04-23 18:22:38 +08:00
aoiasd
655cc7fe06
fix: bm25 stats idf oracle leak (#41425)
relate: https://github.com/milvus-io/milvus/issues/41424

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-04-23 14:28:37 +08:00
SimFG
91d40fa558
fix: Update logging context and upgrade dependencies (#41318)
- issue: #41291

---------

Signed-off-by: SimFG <bang.fu@zilliz.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2025-04-23 10:52:38 +08:00
wei liu
57212e5376
enhance: Optimize log output for L0 segment deletions (#40975)
related to: #40884 #39552
Reduce log frequency by aggregating deletion logs for L0 segments:
- Add segment count statistics in rangeHitL0Deletions function
- Change individual segment logs to a single consolidated log entry
- Include total number of processed L0 segments in log output

This change significantly reduces log volume while maintaining essential
visibility into deletion operations.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-04-08 12:04:26 +08:00
wei liu
06310a5994
fix: Fix L0 segment retention and improve delete buffer logging (#40884)
issue: #40207
related to https://github.com/milvus-io/milvus/pull/39552

- Correct comparison operator in UnRegister from > to >= to prevent
premature release of L0 segments with matching timestamps
- Add detailed logging for segment retention decisions during
unregistration
- Enhance error logging for buffer cleanup operations
- Add trace logs for segment registration/release lifecycle
- Include timestamp comparisons in debug logs for future troubleshooting
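
The effect of the `>` to `>=` change can be illustrated with a reduced Go sketch. The `l0Segment` type and the `unregister` helper are hypothetical; the retention rule shown (keep a segment whose start timestamp is at or after the cleanup timestamp) is an assumption based on the description above.

```go
package main

import "fmt"

// l0Segment is a hypothetical, reduced view of an L0 segment entry.
type l0Segment struct {
	id      int64
	startTs uint64
}

// unregister splits segments into retained and released IDs for a cleanup
// timestamp ts. Using >= keeps a segment whose timestamp equals ts; with
// a plain > that segment would be released too early.
func unregister(segments []l0Segment, ts uint64) (retained, released []int64) {
	for _, s := range segments {
		if s.startTs >= ts {
			retained = append(retained, s.id)
		} else {
			released = append(released, s.id)
		}
	}
	return retained, released
}

func main() {
	segments := []l0Segment{{1, 90}, {2, 100}, {3, 110}}
	retained, released := unregister(segments, 100)
	// Segment 2 sits exactly on the boundary and must be retained.
	fmt.Println(retained, released)
}
```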

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-03-27 11:24:21 +08:00
Buqian Zheng
c12abf4e2a
enhance: improve sparse query nnz metric (#40713)
add query type and field id label; add metric for hybrid search

issue: https://github.com/milvus-io/milvus/issues/35853

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-03-18 17:20:16 +08:00
congqixia
94a859c028
enhance: Add buffer forwarder for stream delta loading (#40559)
See also #40558
Related to #35303 & #38066 as well

This PR:
- Add `BufferedForward` to limit memory usage when forwarding streaming
deletes
- Add `UseLoad` flag to determine whether `Delete` uses `segment.Delete`
or `segment.LoadDelta`
- Fix delegator accidentally using an always-true candidate while
loading streaming delta
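
The buffering idea behind `BufferedForward` can be sketched in Go as below. The `bufferedForwarder` type, its fields, the fixed 16-byte entry size, and the callback signature are all assumptions for illustration, not the patch's actual types.

```go
package main

import "fmt"

// entrySize is an assumed per-record cost: an 8-byte PK plus an 8-byte
// timestamp.
const entrySize = 16

// bufferedForwarder is a hypothetical buffer that forwards deletes in
// batches so that at most about `limit` bytes are held in memory at once.
type bufferedForwarder struct {
	limit   int
	size    int
	pks     []int64
	tss     []uint64
	forward func(pks []int64, tss []uint64) error
}

// Buffer appends one delete record and flushes when the limit is reached.
func (b *bufferedForwarder) Buffer(pk int64, ts uint64) error {
	b.pks = append(b.pks, pk)
	b.tss = append(b.tss, ts)
	b.size += entrySize
	if b.size >= b.limit {
		return b.Flush()
	}
	return nil
}

// Flush forwards whatever is buffered and resets the buffer.
func (b *bufferedForwarder) Flush() error {
	if len(b.pks) == 0 {
		return nil
	}
	err := b.forward(b.pks, b.tss)
	b.pks, b.tss, b.size = nil, nil, 0
	return err
}

func main() {
	var batches []int
	f := &bufferedForwarder{
		limit: 3 * entrySize,
		forward: func(pks []int64, tss []uint64) error {
			batches = append(batches, len(pks))
			return nil
		},
	}
	for i := 0; i < 7; i++ {
		if err := f.Buffer(int64(i), uint64(100+i)); err != nil {
			panic(err)
		}
	}
	if err := f.Flush(); err != nil { // forward the remaining tail
		panic(err)
	}
	fmt.Println(batches)
}
```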

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-03-17 15:24:10 +08:00
wei liu
0420dc1eb1
fix: use correct delete checkpoint to prevent premature data cleanup (#40366)
issue: #40292
related to #39552

- Fix incorrect delete checkpoint usage in SyncDistribution
- Change checkpoint parameter from action.GetCheckpoint() to
action.GetDeleteCP() in SyncTargetVersion call
- This resolves the issue where delete buffer data was being cleaned
prematurely due to wrong checkpoint reference

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-03-12 15:00:08 +08:00
wei liu
69b8b89369
enhance: Remove QueryCoord's scheduling of L0 segments (#39552)
issue: #39551
This PR removes querycoord's scheduling of l0 segments:
- only load l0 segments when watching a channel
- only release l0 segments when releasing a channel or syncing data
distribution

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-02-26 21:38:00 +08:00
congqixia
cb7f2fa6fd
enhance: Use v2 package name for pkg module (#39990)
Related to #39095

https://go.dev/doc/modules/version-numbers

Update pkg version according to golang dep version convention

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-02-22 23:15:58 +08:00
Patrick Weizhi Xu
04fff74a56
feat: introduce Text data type (#39874)
issue: https://github.com/milvus-io/milvus/issues/39818

This PR mimics Varchar data type, allows insert, search, query, delete,
full-text search and others.
Functionalities related to filter expressions are disabled temporarily. 

Storage changes for Text data type will be in the following PRs.

Signed-off-by: Patrick Weizhi Xu <weizhi.xu@zilliz.com>
2025-02-19 11:04:51 +08:00
aoiasd
24d2bbc441
enhance: unmarshal ts msg in dispatcher instead of in msgstream (#38656)
relate: https://github.com/milvus-io/milvus/issues/38655

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-02-14 12:04:13 +08:00
Zhen Ye
bb8d1ab3bf
enhance: make new go package to manage proto (#39114)
issue: #39095

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-01-10 10:49:01 +08:00
Bingyi Sun
aa0a87eda7
fix: Block warmup submit if pool full in sync mode (#38690)
https://github.com/milvus-io/milvus/issues/38692

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-01-02 15:04:58 +08:00
Zhen Ye
69a9fd6ead
enhance: enable rmq for streaming (#38669)
issue: #38399

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2024-12-24 20:24:48 +08:00
congqixia
9c8c1b3bb7
enhance: Remove levelZeroMut totally (#38473)
The level zero mutex can be removed since all operations are guarded by
the segment manager mutex.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-12-16 14:38:45 +08:00
Buqian Zheng
75e64b993f
enhance: add metrics for counting number of non-zeros/tokens of sparse/FTS search (#38329)
Sparse vectors may have an arbitrary number of non-zeros, and it is hard
to optimize without knowing the actual distribution of nnz. This PR adds
a metric for analyzing that.

issue: https://github.com/milvus-io/milvus/issues/35853

Compared with https://github.com/milvus-io/milvus/pull/38328, this
also includes a metric for FTS in the query node delegator.

Also fixes a sparse search bug when searching by pk.

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2024-12-12 16:22:43 +08:00
congqixia
051bc280dd
enhance: Make dynamic load/release partition follow targets (#38059)
Related to #37849

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-12-05 16:24:40 +08:00
congqixia
1ed686783f
enhance: Use PrimaryKeys to replace interface slice for segment delete (#37880)
Related to #35303

Reduce temporary memory usage for PK interface for segment delete.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-11-22 11:52:33 +08:00
wei liu
5f3601a6a5
fix: unstable integration test caused by paramtable.GetNodeID (#37909)
issue: #37908
Because paramtable is a global singleton, paramtable.GetNodeID may
return the wrong server ID in integration tests.

This PR uses node.GetNodeID to replace paramtable.GetNodeID.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-11-21 22:16:33 +08:00
congqixia
c79fbd5eab
fix: Load l0 delta for growings when using RemoteLoad (#37771)
Related to #37574

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-11-19 10:38:31 +08:00
wei liu
351463b67e
fix: L0 segment has been loaded to worker during channel balance (#37748)
issue: #37703

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-11-18 20:58:30 +08:00
congqixia
f54cf41830
enhance: Move forward l0 logic out of delta lock (#37337)
Related to #35303

`deleteMut` should only protect the streaming delete buffer, so the
forward l0 logic can be moved out of the rlock section to reduce the
tsafe impact from loading segments.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-11-04 10:42:22 +08:00
congqixia
f87acdf2a2
fix: Ref collection meta when load l0 segment meta only (#37178)
Related to #37177

Previous PR #37160

Collection meta is not ref-ed when loading l0 segments with the
`RemoteLoad` policy, which causes the collection meta to be released
when many l0 segments are released.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-28 15:49:38 +08:00
congqixia
05f880708d
enhance: Make skip load work for all branches (#37160)
Related to #37112

Skip load logic used to work only when there were multiple segment load
info entries in the load request. In the continuous delete case, the
delegator still loaded l0 segments, which occupied a lot of memory.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-25 23:37:29 +08:00
Buqian Zheng
088d5d7d76
fix: optimize BM25 err message (#37074)
issue: https://github.com/milvus-io/milvus/issues/37022

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2024-10-25 14:35:45 +08:00
congqixia
b086ef6b19
enhance: Skip load delta data in delegator when using RemoteLoad (#37082)
Related to #35303

Delta data is not needed when using `RemoteLoad` l0 forward policy. By
skipping load delta data, memory pressure could be eased if l0 segment
size/number is large.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-24 16:21:37 +08:00
Zhen Ye
ac178eeea5
enhance: make delegator lock critical smaller (#36997)
issue: #36804

Signed-off-by: chyezh <chyezh@outlook.com>
2024-10-21 11:33:25 +08:00
aoiasd
fbe177d6e7
fix: avoid panic when load segment with pkoracle and idforacle already exist (#36959)
relate: https://github.com/milvus-io/milvus/issues/36949

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2024-10-18 11:57:24 +08:00
aoiasd
72dc07ba48
fix: bm25 search failed when nq > 1 and remove idf oracle when no bm25 field exist. (#36886)
relate: https://github.com/milvus-io/milvus/issues/35853

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2024-10-16 12:51:23 +08:00
Buqian Zheng
383350c120
feat: added more checks for function creation check (#36766)
issue: https://github.com/milvus-io/milvus/issues/35853

* BM25 Function now takes no params; k1 and b should be passed via index
params
* support BM25 full text search when metric type is not present in
search request
* add stricter validation of functions at collection creation time

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2024-10-13 17:43:22 +08:00