424 Commits

Author SHA1 Message Date
wei liu
3fbee154f6
enhance: Remove large segment ID arrays from QueryNode logs (#45719)
issue: #45718

Logging complete segment ID arrays caused excessive log volume (3-6 TB
for 200k segments). Remove arrays from logger fields and keep only
segment counts for observability.

Changes:
- Remove requestSegments/preparedSegments arrays from Load logger
- Remove segmentIDs from BM25 stats logs
- Remove entries structure from sync distribution log

This reduces log volume by 99.99% for large-scale operations.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-11-20 17:18:14 +08:00
congqixia
0a208d7224
enhance: Move segment loading logic from Go layer to segcore for self-managed loading (#45488)
Related to #45060

Refactor segment loading architecture to make segments autonomously
manage their own loading process, moving the orchestration logic from Go
(segment_loader.go) to C++ (segcore).

**C++ Layer (segcore):**
- Added `SetLoadInfo()` and `Load()` methods to `SegmentInterface` and
implementations
- Implemented `ChunkedSegmentSealedImpl::Load()` with parallel loading
strategy:
  - Separates indexed fields from non-indexed fields
  - Loads indexes concurrently using thread pools
  - Loads field data for non-indexed fields in parallel
- Implemented `SegmentGrowingImpl::Load()` to convert and load field
data
- Extracted `LoadIndexData()` as a reusable utility function in
`Utils.cpp`
- Added `SegmentLoad()` C binding in `segment_c.cpp`

**Go Layer:**
- Added `Load()` method to segment interfaces
- Updated mock implementations and test interfaces
- Integrated new C++ `SegmentLoad()` binding in Go segment wrapper

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-11-14 11:21:37 +08:00
sparknack
f815f57b82
enhance: check both eviction and warmup when estimate segment loading size (#45222)
issue: #44857

Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
2025-11-10 14:15:36 +08:00
congqixia
e284733399
fix: Move FinishLoad before text index creation to ensure raw data availability (#45334)
Related to #45333

Fix segment loading failure when adding fields with text match enabled.
The issue occurred because text indexes were being loaded before
FinishLoad() was called, meaning raw data was not properly available
when text index creation attempted to access it, resulting in "failed to
create text index, neither raw data nor index are found" errors.

Solution is to move the FinishLoad() call to execute after raw data
loading but before text index loading. This ensures that:
1. Raw data is properly loaded and available in memory
2. Text indexes can access the raw data they need during creation
3. The segment is in the correct state before any index operations

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-11-06 14:49:34 +08:00
Lior Friedman
a4d69031f1
fix: Add AiSAQ index type RAM estimation implementation on the query node. (#45246)
Currently, the index type AiSAQ RAM usage estimation is not being
calculated correctly.
AiSAQ index type consumes less RAM usage while loading the index than
DISKANN does, and the query node module is missing the implementation of
the RAM usage estimation for that AiSAQ index type.
We suggest that the AiSAQ RAM usage estimation calculation should be as
follows:
 
UsedDiskMemoryRatioAisaq = 1024 (contrary to the UsedDiskMemoryRatio,
which is 4)
neededMemSize = indexInfo.IndexSize / UsedDiskMemoryRatioAisaq 
neededDiskSize = indexInfo.IndexSize

Reported issue is #45247

---------

Signed-off-by: Lior Friedman <lior.friedman@il.kioxia.com>
Signed-off-by: friedl <lior.friedman@kioxia.com>
Co-authored-by: friedl <lior.friedman@kioxia.com>
2025-11-06 08:53:34 +08:00
congqixia
e6be590b97
enhance: set schema version when creating new collection (#45263)
Related to #43028

Initialize the schema version field when creating a new collection
instance in QueryNode. The schema version is extracted from loadMetaInfo
and assigned to the collection, ensuring proper schema version tracking
and consistency across the distributed system.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-11-04 10:15:32 +08:00
congqixia
36a887b38b
enhance: add NewSegmentWithLoadInfo API to support segment self-managed loading (#45061)
This commit introduces the foundation for enabling segments to manage
their own loading process by passing load information during segment
creation.

Changes:

C++ Layer:
- Add NewSegmentWithLoadInfo() C API to create segments with serialized
load info
- Add SetLoadInfo() method to SegmentInterface for storing load
information
- Refactor segment creation logic into shared CreateSegment() helper
function
- Add comprehensive documentation for the new API

Go Layer:
- Extend CreateCSegmentRequest to support optional LoadInfo field
- Update segment creation in querynode to pass SegmentLoadInfo when
available
- Add ConvertToSegcoreSegmentLoadInfo() and helper converters for proto
translation

Proto Definitions:
- Add segcorepb.SegmentLoadInfo message with essential loading metadata
- Add supporting messages: Binlog, FieldBinlog, FieldIndexInfo,
TextIndexStats, JsonKeyStats
- Remove dependency on data_coord.proto by creating segcore-specific
definitions

Testing:
- Add comprehensive unit tests for proto conversion functions
- Test edge cases including nil inputs, empty data, and nil array/map
elements

This is the first step toward issue #45060 - enabling segments to
autonomously manage their loading process, which will:
- Clarify responsibilities between Go and C++ layers
- Reduce cross-language call overhead
- Enable precise resource management at the C++ level
- Support better integration with caching layer
- Enable proactive schema evolution handling

Related to #45060

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-10-27 15:28:12 +08:00
sparknack
935160840c
enhance: add a disk quota for the loaded binlog size to prevent load failure of querynode (#44893)
issue: #41435

---------

Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
2025-10-19 19:44:01 +08:00
sparknack
c8a4d6e2ef
enhance: add cachinglayer management for TextMatchIndex (#44741)
issue: #41435, #44502

Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
2025-10-13 14:37:58 +08:00
sparknack
6d5b41644b
enhance: remove logical usage checks during segment loading (#44743)
issue: #41435

Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
2025-10-13 14:21:58 +08:00
congqixia
5ece760d73
fix: Pass fs via FileManagerContext when loading index (#44733)
Related to #44615

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-10-11 09:55:57 +08:00
zhagnlu
eac16a577c
enhance:support cachelayer for json stats (#44446)
#42533

Signed-off-by: zhagnlu <lu.zhang@zilliz.com>
2025-09-24 15:30:04 +08:00
Tianx
2c0c5ef41e
feat: timestamptz expression & index & timezone (#44080)
issue: https://github.com/milvus-io/milvus/issues/27467

>My plan is as follows.
>- [x] M1 Create collection with timestamptz field
>- [x] M2 Insert timestamptz field data
>- [x] M3 Retrieve timestamptz field data
>- [x] M4 Implement handoff
>- [x] M5 Implement compare operator
>- [x] M6 Implement extract operator
 >- [x] M8 Support database/collection level default timezone
>- [x] M7 Support STL-SORT index for datatype timestamptz

---

The third PR of issue: https://github.com/milvus-io/milvus/issues/27467,
which completes M5, M6, M7, M8 described above.

## M8 Default Timezone

We will be able to use alter_collection() and alter_database() in a
future Python SDK release to modify the default timezone at the
collection or database level.

For insert requests, the timezone will be resolved using the following
order of precedence: String Literal-> Collection Default -> Database
Default.
For retrieval requests, the timezone will be resolved in this order:
Query Parameters -> Collection Default -> Database Default.
In both cases, the final fallback timezone is UTC.


## M5: Comparison Operators

We can now use the following expression format to filter on the
timestamptz field:

- `timestamptz_field [+/- INTERVAL 'interval_string'] {comparison_op}
ISO 'iso_string' `

- The interval_string follows the ISO 8601 duration format, for example:
P1Y2M3DT1H2M3S.

- The iso_string follows the ISO 8601 timestamp format, for example:
2025-01-03T00:00:00+08:00.

- Example expressions: "tsz + INTERVAL 'P0D' != ISO
'2025-01-03T00:00:00+08:00'" or "tsz != ISO
'2025-01-03T00:00:00+08:00'".

## M6: Extract

We will be able to extract sepecific time filed by kwargs in a future
Python SDK release.
The key is `time_fields`, and value should be one or more of "year,
month, day, hour, minute, second, microsecond", seperated by comma or
space. Then the result of each record would be an array of int64.



## M7: Indexing Support

Expressions without interval arithmetic can be accelerated using an
STL-SORT index. However, expressions that include interval arithmetic
cannot be indexed. This is because the result of an interval calculation
depends on the specific timestamp value. For example, adding one month
to a date in February results in a different number of added days than
adding one month to a date in March.

--- 

After this PR, the input / output type of timestamptz would be iso
string. Timestampz would be stored as timestamptz data, which is int64_t
finally.

> for more information, see https://en.wikipedia.org/wiki/ISO_8601

---------

Signed-off-by: xtx <xtianx@smail.nju.edu.cn>
2025-09-23 10:24:12 +08:00
jiaqizho
338ed2fed4
enhance: Introduce sparse filter in query (#44347)
issue: #44373

The current commit implements sparse filtering in query tasks using the
statistical information (Bloom filter/MinMax) of the Primary Key (PK).

The statistical information of the PK is bound to the segment during the
segment loading phase. A new filter has been added to the segment filter
to enable the sparse filtering functionality.

Signed-off-by: jiaqizho <jiaqi.zhou@zilliz.com>
2025-09-23 09:58:09 +08:00
Gao
d3784c6515
enhance: add storage resource usage for vector search (#44308)
issue: #44212 

Implement search/query storage usage statistics in go side(result
reduce), only record storage usage in vector search C++ path. Need to be
implemented in query c++ path in next prs.

---------

Signed-off-by: chasingegg <chao.gao@zilliz.com>
Signed-off-by: marcelo.chen <marcelo.chen@zilliz.com>
Co-authored-by: marcelo.chen <marcelo.chen@zilliz.com>
2025-09-19 20:20:02 +08:00
zhenshan.cao
691a8df953
feat: Add RESTful api for rolling upgrade support (#44381)
issue: https://github.com/milvus-io/milvus/issues/43968

Co-authored-by: chyezh <ye.zhen@zilliz.com>
2025-09-16 20:08:00 +08:00
congqixia
aa861f55e6
enhance: [StorageV2] Reverts #44232 bucket name change (#44390)
Related to #39173

- Put bucket name concatenation logic back for azure support

This reverts commit 8f97eb355fde6b86cf37f166d2191750b4210ba3.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-09-16 10:10:00 +08:00
congqixia
abe22b95c7
enhance: Utilize group info estimating logic usage as well (#44356)
Related to #44257 #44334

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-09-12 22:47:57 +08:00
congqixia
9d2ff48d63
enhance: Utilize group split info to estimate usage (#44338)
Related to #44257 #44334

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-09-12 14:49:57 +08:00
congqixia
8f97eb355f
enhance: [StorageV2] Make bucket name concatenation transparent to user (#44232)
Related to #39173

This PR:
- Bump milvus-storage commit to handle bucket name concatenation logic
in multipart s3 fs
- Remove all user-side bucket name concatenation code

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-09-08 10:15:55 +08:00
Gao
2e98cb0103
enhance: load resource estimation for tiered index (#44171)
issue: https://github.com/milvus-io/milvus/issues/42032

- Use bytes to estimate load resource in the whole estimation procedure
- Add num_rows and dim info for vector index to better estimate
- Disable eviction for tiered index's meta

---------

Signed-off-by: chasingegg <chao.gao@zilliz.com>
2025-09-04 19:41:53 +08:00
Bingyi Sun
0c0630cc38
feat: support dropping index without releasing collection (#42941)
issue: #42942

This pr includes the following changes:
1. Added checks for index checker in querycoord to generate drop index
tasks
2. Added drop index interface to querynode
3. To avoid search failure after dropping the index, the querynode
allows the use of lazy mode (warmup=disable) to load raw data even when
indexes contain raw data.
4. In segcore, loading the index no longer deletes raw data; instead, it
evicts it.
5. In expr, the index is pinned to prevent concurrent errors.

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-09-02 16:17:52 +08:00
congqixia
7721edf32a
enhance: Add mutex and range check preventing concurrent del (#44128)
This PR adds a mutex prevent concurrent applying delete on same segment
and check latestDeltaTimestamp to skip overlapping delete range

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-09-01 14:29:52 +08:00
zhagnlu
fc876639cf
enhance: support json stats with shredding design (#42534)
#42533

Co-authored-by: luzhang <luzhang@zilliz.com>
2025-09-01 10:49:52 +08:00
sparknack
70c8114e85
enhance: cachinglayer: resource management for segment loading (#43846)
issue: #41435

---------

Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
2025-08-29 11:37:50 +08:00
Buqian Zheng
6420d72391
enhance: print as storage size unit MB with 2 digits only, so the log is easier to read (#44085)
issue: https://github.com/milvus-io/milvus/issues/41435

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-08-27 19:47:50 +08:00
Chun Han
da156981c6
feat: milvus support posix-compatible mode(milvus-io#43942) (#43944)
related: #43942

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2025-08-27 16:29:50 +08:00
XuanYang-cn
37a447d166
feat: Add CMEK cipher plugin (#43722)
1. Enable Milvus to read cipher configs
2. Enable cipher plugin in binlog reader and writer
3. Add a testCipher for unittests
4. Support pooling for datanode
5. Add encryption in storagev2

See also: #40321 
Signed-off-by: yangxuan <xuan.yang@zilliz.com>

---------

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2025-08-27 11:15:52 +08:00
Spade A
8456f824be
feat: impl StructArray -- miscellaneous staffs for struct array (#43960)
Ref https://github.com/milvus-io/milvus/issues/42148

1. enable storage v2
2. implement some missing staffs
3. fix some bugs and add tests

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-08-26 21:35:53 +08:00
Tianx
c0d62268ac
feat: add timesatmptz data type (#44005)
issue: https://github.com/milvus-io/milvus/issues/27467
>
https://github.com/milvus-io/milvus/issues/27467#issuecomment-3092211420
> * [x]  M1 Create collection with timestamptz field
> * [x]  M2 Insert timestamptz field data
> * [x]  M3 Retrieve timestamptz field data
> * [x]  M4 Implement handoff[ ]  

The second PR of issue:
https://github.com/milvus-io/milvus/issues/27467, which completes M1-M4
described above.

---------

Signed-off-by: xtx <xtianx@smail.nju.edu.cn>
2025-08-26 15:59:53 +08:00
zhagnlu
8934c18792
enhance: support cache result cache for expr (#43923)
issue: #43878

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-08-26 10:55:52 +08:00
wei liu
399f63300c
enhance: Implement dynamic interval updates for ticker components (#43865)
issue: #43858

Enable dynamic configuration updates for ticker intervals without
restart. This enhancement allows runtime configuration changes to take
effect immediately for better operational flexibility.

Changes include:
- Apply "drain+Reset only when interval changed" pattern across all
ticker components to preserve existing timing phases
- Fix goroutine variable capture issue in CheckerController.Start()
- Remove unnecessary ticker.Stop() in manual trigger paths
- Add dynamic interval checking in QueryCoordV2 components:
  * checkers/controller.go: Various checker intervals
  * dist/dist_handler.go: DistPullInterval, CheckExecutedFlagInterval
  * session/cluster.go: CheckNodeSessionInterval
  * server.go: CheckAutoBalanceConfigInterval
  * observers/target_observer.go: UpdateNextTargetInterval
  * observers/collection_observer.go: CollectionObserverInterval
- Add dynamic interval checking in QueryNodeV2 components:
  * segments/disk_usage_fetcher.go: DiskSizeFetchInterval
- Ensure thread safety by performing all ticker operations in same
goroutine with proper drain before Reset to avoid spurious triggers

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-08-21 10:07:47 +08:00
Spade A
d6a428e880
feat: impl StructArray -- support create index for vector array (embedding list) and search on it (#43726)
Ref https://github.com/milvus-io/milvus/issues/42148

This PR supports create index for vector array (now, only for
`DataType.FLOAT_VECTOR`) and search on it.
The index type supported in this PR is `EMB_LIST_HNSW` and the metric
type is `MAX_SIM` only.

The way to use it:
```python
milvus_client = MilvusClient("xxx:19530")
schema = milvus_client.create_schema(enable_dynamic_field=True, auto_id=True)
...
struct_schema = milvus_client.create_struct_array_field_schema("struct_array_field")
...
struct_schema.add_field("struct_float_vec", DataType.ARRAY_OF_VECTOR, element_type=DataType.FLOAT_VECTOR, dim=128, max_capacity=1000)
...
schema.add_struct_array_field(struct_schema)
index_params = milvus_client.prepare_index_params()
index_params.add_index(field_name="struct_float_vec", index_type="EMB_LIST_HNSW", metric_type="MAX_SIM", index_params={"nlist": 128})
...
milvus_client.create_index(COLLECTION_NAME, schema=schema, index_params=index_params)
```

Note: This PR uses `Lims` to convey offsets of the vector array to
knowhere where vectors of multiple vector arrays are concatenated and we
need offsets to specify which vectors belong to which vector array.

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>
2025-08-20 10:27:46 +08:00
Xianhui Lin
c7d8dc100a
fix: add segment lock in LoadTextIndex and LoadJSONKeyIndex (#43811)
fix: add segment lock in LoadTextIndex and LoadJSONKeyIndex
issue:https://github.com/milvus-io/milvus/issues/43572

Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>
2025-08-18 01:17:52 +08:00
sparknack
544c7c0600
enhance: update cachinglayer default cache ratio to 0.3 (#43723)
issue: #41435

---------

Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
2025-08-05 01:35:39 +08:00
zhagnlu
f14c7d598c
fix: skip load raw data when loading index for storagev2 (#43720)
#43653

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-08-04 21:17:39 +08:00
Chun Han
d826d6ac91
fix: try to get span raw data for variable length data type(#43544) (#43705)
related: #43544

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2025-08-04 11:15:38 +08:00
sparknack
bdd65871ea
enhance: tiered storage: estimate segment loading resource usage while considering eviction (#43323)
issue: #41435 

After introducing the caching layer's lazy loading and eviction
mechanisms, most parts of a segment won't be loaded into memory or disk
immediately, even if the segment is marked as LOADED. This means
physical resource usage may be very low. However, we still need to
reserve enough resources for the segments marked as LOADED. Thus, the
logic of resource usage estimation during segment loading, which based
on physcial resource usage only for now, should be changed.

To address this issue, we introduced the concept of logical resource
usage in this patch. This can be thought of as the base reserved
resource for each LOADED segment.

A segment’s logical resource usage is derived from its final evictable
and inevictable resource usage and calculated as follows:

```
SLR = SFPIER + evitable_cache_ratio * SFPER
```

it also equals to

```
SLR = (SFPIER + SFPER) - (1.0 - evitable_cache_ratio) * SFPER
```

`SLR`: The logical resource usage of a segment.
`SFPIER`: The final physical inevictable resource usage of a segment.
`SFPER`: The final physical evictable resource usage of a segment.
`evitable_cache_ratio`: The ratio of a segment's evictable resources
that can be cached locally. The higher the ratio, the more physical
memory is reserved for evictable memory.

When loading a segment, two types of resource usage are taken into
account.

First is the estimated maximum physical resource usage:

```
PPR = HPR + CPR + SMPR - SFPER
```

`PPR`: The predicted physical resource usage after the current segment
is allowed to load.
`HPR`: The physical resource usage obtained from hardware information.  
`CPR`: The total physical resource usage of segments that have been
committed but not yet loaded. When one new segment is allow to load,
`CPR' = CPR + (SMR - SER)`. When one of the committed segments is
loaded, `CPR' = CPR - (SMR - SER)`.
`SMPR`: The maximum physical resource usage of the current segment.
`SFPER`: The final physical evictable resource usage of the current
segment.

Second is the estimated logical resource usage, this check is only valid
when eviction is enabled:

```
PLR = LLR + CLR + SLR
```

`PLR`: The predicted logical resource usage after the current segment is
allowed to load.
`LLR`: The total logical resource usage of all loaded segments. When a
new segment is loaded, `LLR` should be updated to `LLR' = LLR + SLR`.
`CLR`: The total logical resource usage of segments that have been
committed but not yet loaded. When one new segment is allow to load,
`CLR' = CLR + SLR`. When one of the committed segments is loaded, `CLR'
= CLR - SLR`.
`SLR`: The logical resource usage of the current segment.

Only when `PPR < PRL && PLR < PRL` (`PRL`: Physical resource limit of
the querynode), the segment is allowed to be loaded.

---------

Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
2025-08-01 21:31:37 +08:00
zhagnlu
2594250906
fix: fix miss loading index for storagev2 (#43674)
#43653

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-08-01 13:07:36 +08:00
Chun Han
d72c0357ff
fix: empty hybridsearch result due to one-sub-search empty(#43537) (#43647)
related: #43537

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2025-07-31 19:47:37 +08:00
Spade A
faeb7fd410
feat: impl StructArray -- create schema, insert, and retrieve data (#42855)
Ref https://github.com/milvus-io/milvus/issues/42148

https://github.com/milvus-io/milvus/pull/42406 impls the segcore part of
storage for handling with VectorArray.
This PR:
1. impls the go part of storage for VectorArray
2. impls the collection creation with StructArrayField and VectorArray
3. insert and retrieve data from the collection.

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <u6748471@anu.edu.au>
2025-07-27 01:30:55 +08:00
Buqian Zheng
f7b262a702
feat: make storagev1 to support eviction (#43219)
issue: https://github.com/milvus-io/milvus/issues/41435

turns out we have per file binlog size in golang code, by passing it
into segcore we can support eviction in storage v1

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-07-19 02:02:52 +08:00
Zhen Ye
3aacd179f7
fix: balance channel before balance segment when upgrading (#43346)
issue: #43117, #42966, #43373

- also fix channel balance may not work at 2.6.
- fix error lost at delete path
- add mvcc into s/q log
- change the log level for TestCoordDownSearch

Signed-off-by: chyezh <chyezh@outlook.com>
2025-07-17 20:16:52 +08:00
Chun Han
07745439b5
fix: empty search groupby result causing crash(#43137) (#43214)
related: #43137

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2025-07-10 12:04:48 +08:00
congqixia
f027eea545
enhance: [AddField] Add log for segcore segment schema change (#43215)
Related to #39178

This PR add logs for segment schema change operations.

Also fixes the nit comments from PR #42490

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-07-10 10:22:47 +08:00
congqixia
7bc7b18ed5
fix: [AddField] Prevent concurrent load during UpdateSchema (#43043)
Related to #43028

This PR:
- Add mutex prevent concurrent load segment & schema change
- Add schema verison field in load meta
- Update schema in PutOrRef if schema verison is larger

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-07-02 17:38:44 +08:00
wei liu
c381bf3e41
enhance: add logs for count(*) (#43001)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-07-01 19:36:43 +08:00
Spade A
26ec841feb
feat: optimize Like query with n-gram (#41803)
Ref #42053

This is the first PR for optimizing `LIKE` with ngram inverted index.
Now, only VARCHAR data type is supported and only InnerMatch LIKE
(%xxx%) query is supported.


How to use it:
```
milvus_client = MilvusClient("http://localhost:19530")
schema = milvus_client.create_schema()
...
schema.add_field("content_ngram", DataType.VARCHAR, max_length=10000)
...
index_params = milvus_client.prepare_index_params()
index_params.add_index(field_name="content_ngram", index_type="NGRAM", index_name="ngram_index", min_gram=2, max_gram=3)
milvus_client.create_collection(COLLECTION_NAME, ...)
```

min_gram and max_gram controls how we tokenize the documents. For
example, for min_gram=2 and max_gram=4, we will tokenize each document
with 2-gram, 3-gram and 4-gram.

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>
2025-07-01 10:08:44 +08:00
sthuang
0d57acb13a
enhance: [StorageV2] field id as meta path for wide column when load (#42863)
related: #42862 #39173

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-06-25 11:08:48 +08:00
congqixia
4ba177cd2c
enhance: [StorageV2] Handle narrow column group resource estimation (#42842)
Related to #39173

In storage v2, "narrow" column group could have group id not mapped
schema, which causing loading fails or resource estimation result
inaccurate.

This PR handles this case by mapping binlog from index instead of vice
versa.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-06-19 14:44:39 +08:00