845 Commits

Author SHA1 Message Date
aoiasd
55feb7ded8
feat: set related resource ids in collection schema (#46423)
Support crate analyzer with file resource info, and return used file
resource ids when validate analyzer.
Save the related resource ids in collection schema.
relate: https://github.com/milvus-io/milvus/issues/43687

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: analyzer file-resource resolution is deterministic and
traceable by threading a FileResourcePathHelper (collecting used
resource IDs in a HashSet) through all tokenizer/analyzer construction
and validation paths; validate_analyzer(params, extra_info) returns the
collected Vec<i64) which is propagated through C/Rust/Go layers to
callers (CValidateResult → RustResult::from_vec_i64 → Go []int64 →
querypb.ValidateAnalyzerResponse.ResourceIds →
CollectionSchema.FileResourceIds).

- Logic removed/simplified: ad‑hoc, scattered resource-path lookups and
per-filter file helpers (e.g., read_synonyms_file and other inline
file-reading logic) were consolidated into ResourceInfo +
FileResourcePathHelper and a centralized get_resource_path(helper, ...)
API; filter/tokenizer builder APIs now accept &mut
FileResourcePathHelper so all file path resolution and ID collection use
the same path and bookkeeping logic (redundant duplicated lookups
removed).

- Why no data loss or behavior regression: changes are additive and
default-preserving — existing call sites pass extra_info = "" so
analyzer creation/validation behavior and error paths remain unchanged;
new Collection.FileResourceIds is populated from resp.ResourceIds in
validateSchema and round‑tripped through marshal/unmarshal
(model.Collection ↔ schemapb.CollectionSchema) so schema persistence
uses the new list without overwriting other schema fields; proto change
adds a repeated field (resource_ids) which is wire‑compatible (older
clients ignore extra field). Concrete code paths: analyzer creation
still uses create_analyzer (now with extra_info ""), tokenizer
validation still returns errors as before but now also returns IDs via
CValidateResult/RustResult, and rootcoord.validateSchema assigns
resp.ResourceIds → schema.FileResourceIds.

- New capability added: end‑to‑end discovery, return, and persistence of
file resource IDs used by analyzers — validate flows now return resource
IDs and the system stores them in collection schema (affects tantivy
analyzer binding, canalyzer C bindings, internal/util analyzer APIs,
querynode ValidateAnalyzer response, and rootcoord/create_collection
flow).
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-12-26 22:49:19 +08:00
zhagnlu
9ba0c4e501
fix:add json stats version because previous change #46130 (#46467)
#42533

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-12-24 19:17:18 +08:00
congqixia
6452d146af
enhance: move jemalloc_stats from pkg to internal/util/segcore (#46560)
Related to #46133

Move jemalloc_stats.go and its test file from pkg/util/hardware to
internal/util/segcore. This is a more appropriate location because:
- jemalloc_stats depends on milvus_core C++ library via cgo
- The pkg directory should remain independent of internal C++
dependencies
- segcore is the natural home for core memory allocator utilities

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Refactor**
* Improved internal code organization by reorganizing memory statistics
collection infrastructure for better maintainability and modularity. No
impact on end-user functionality or behavior.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-12-24 19:03:18 +08:00
marcelo-cjl
3b599441fd
feat: Add nullable vector support for proxy and querynode (#46305)
related: #45993 

This commit extends nullable vector support to the proxy layer,
querynode,
and adds comprehensive validation, search reduce, and field data
handling
    for nullable vectors with sparse storage.
    
    Proxy layer changes:
- Update validate_util.go checkAligned() with getExpectedVectorRows()
helper
      to validate nullable vector field alignment using valid data count
- Update checkFloatVectorFieldData/checkSparseFloatVectorFieldData for
      nullable vector validation with proper row count expectations
- Add FieldDataIdxComputer in typeutil/schema.go for logical-to-physical
      index translation during search reduce operations
- Update search_reduce_util.go reduceSearchResultData to use
idxComputers
      for correct field data indexing with nullable vectors
- Update task.go, task_query.go, task_upsert.go for nullable vector
handling
    - Update msg_pack.go with nullable vector field data processing
    
    QueryNode layer changes:
    - Update segments/result.go for nullable vector result handling
- Update segments/search_reduce.go with nullable vector offset
translation
    
    Storage and index changes:
- Update data_codec.go and utils.go for nullable vector serialization
- Update indexcgowrapper/dataset.go and index.go for nullable vector
indexing
    
    Utility changes:
- Add FieldDataIdxComputer struct with Compute() method for efficient
      logical-to-physical index mapping across multiple field data
- Update EstimateEntitySize() and AppendFieldData() with fieldIdxs
parameter
    - Update funcutil.go with nullable vector support functions

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Full support for nullable vector fields (float, binary, float16,
bfloat16, int8, sparse) across ingest, storage, indexing, search and
retrieval; logical↔physical offset mapping preserves row semantics.
  * Client: compaction control and compaction-state APIs.

* **Bug Fixes**
* Improved validation for adding vector fields (nullable + dimension
checks) and corrected search/query behavior for nullable vectors.

* **Chores**
  * Persisted validity maps with indexes and on-disk formats.

* **Tests**
  * Extensive new and updated end-to-end nullable-vector tests.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: marcelo-cjl <marcelo.chen@zilliz.com>
2025-12-24 10:13:19 +08:00
XuanYang-cn
0507db2015
feat: Add force merge (#45556)
See also: #46043

---------

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2025-12-19 18:03:18 +08:00
Spade A
ad8aba7cb4
feat: impl ComputePhraseMatchSlop for compute min slop for phrase match query (#45892)
issue: https://github.com/milvus-io/milvus/issues/45890

ComputePhraseMatchSlop accepts three pararms:
1. A string: query text
2. Some trings: data texts
3. Analyzer params,

Slop will be calculated for the query text with each data text in the
context of phrase match where they are tokenized with tokenizer with
analyzer params.

So two array will be returned:
1. is_match: is phrase match can sucess
2. slop: the related slop if phrase match can sucess, or -1 is cannot.

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-12-19 16:03:18 +08:00
aoiasd
7e4f87e351
fix: Init analyzer at delegator for all field with enable analyzer (#46361)
To support text match highlight
relate: https://github.com/milvus-io/milvus/issues/46308

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-12-19 10:23:18 +08:00
congqixia
21ed1fabfd
feat: support reopen segment for data/schema changes (#46359)
issue: #46358

This PR implements segment reopening functionality on query nodes,
enabling the application of data or schema changes to already-loaded
segments without requiring a full reload.

### Core (C++)

**New SegmentLoadInfo class**
(`internal/core/src/segcore/SegmentLoadInfo.h/cpp`):
- Encapsulates segment load configuration with structured access
- Implements `ComputeDiff()` to calculate differences between old and
new load states
- Tracks indexes, binlogs, and column groups that need to be loaded or
dropped
- Provides `ConvertFieldIndexInfoToLoadIndexInfo()` for index loading

**ChunkedSegmentSealedImpl modifications**:
- Added `Reopen(const SegmentLoadInfo&)` method to apply incremental
changes based on computed diff
- Refactored `LoadColumnGroups()` and `LoadColumnGroup()` to support
selective loading via field ID map
- Extracted `LoadBatchIndexes()` and `LoadBatchFieldData()` for reusable
batch loading logic
- Added `LoadManifest()` for manifest-based loading path
- Updated all methods to use `SegmentLoadInfo` wrapper instead of direct
proto access

**SegmentGrowingImpl modifications**:
- Added `Reopen()` stub method for interface compliance

**C API additions** (`segment_c.h/cpp`):
- Added `ReopenSegment()` function exposing reopen to Go layer

### Go Side

**QueryNode handlers** (`internal/querynodev2/`):
- Added `HandleReopen()` in handlers.go
- Added `ReopenSegments()` RPC in services.go

**Segment interface** (`internal/querynodev2/segments/`):
- Extended `Segment` interface with `Reopen()` method
- Implemented `Reopen()` in LocalSegment
- Added `Reopen()` to segment loader

**Segcore wrapper** (`internal/util/segcore/`):
- Added `Reopen()` method in segment.go
- Added `ReopenSegmentRequest` in requests.go

### Proto

- Added new fields to support reopen in `query_coord.proto`

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-12-17 15:49:16 +08:00
congqixia
efa7ccdf81
fix: pass manifest path when loading growing segments (#46378)
Related to #44956

Pass ManifestPath field to SegmentLoadInfo when loading growing segments
in loadGrowingSegments function. This ensures storage v2 can properly
locate segment data via manifest path, consistent with other segment
loading paths.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-12-17 10:19:15 +08:00
XuanYang-cn
0bbb134e39
feat: Enable to backup and reload ez (#46332)
see also: #40013

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2025-12-16 17:19:16 +08:00
Spade A
f6f716bcfd
feat: impl StructArray -- support embedding searches embeddings in embedding list with element level filter expression (#45830)
issue: https://github.com/milvus-io/milvus/issues/42148

For a vector field inside a STRUCT, since a STRUCT can only appear as
the element type of an ARRAY field, the vector field in STRUCT is
effectively an array of vectors, i.e. an embedding list.
Milvus already supports searching embedding lists with metrics whose
names start with the prefix MAX_SIM_.

This PR allows Milvus to search embeddings inside an embedding list
using the same metrics as normal embedding fields. Each embedding in the
list is treated as an independent vector and participates in ANN search.

Further, since STRUCT may contain scalar fields that are highly related
to the embedding field, this PR introduces an element-level filter
expression to refine search results.
The grammar of the element-level filter is:

element_filter(structFieldName, $[subFieldName] == 3)

where $[subFieldName] refers to the value of subFieldName in each
element of the STRUCT array structFieldName.

It can be combined with existing filter expressions, for example:

"varcharField == 'aaa' && element_filter(struct_field, $[struct_int] ==
3)"

A full example:
```
struct_schema = milvus_client.create_struct_field_schema()
struct_schema.add_field("struct_str", DataType.VARCHAR, max_length=65535)
struct_schema.add_field("struct_int", DataType.INT32)
struct_schema.add_field("struct_float_vec", DataType.FLOAT_VECTOR, dim=EMBEDDING_DIM)

schema.add_field(
    "struct_field",
    datatype=DataType.ARRAY,
    element_type=DataType.STRUCT,
    struct_schema=struct_schema,
    max_capacity=1000,
)
...

filter = "varcharField == 'aaa' && element_filter(struct_field, $[struct_int] == 3 && $[struct_str] == 'abc')"
res = milvus_client.search(
    COLLECTION_NAME,
    data=query_embeddings,
    limit=10,
    anns_field="struct_field[struct_float_vec]",
    filter=filter,
    output_fields=["struct_field[struct_int]", "varcharField"],
)

```
TODO:
1. When an `element_filter` expression is used, a regular filter
expression must also be present. Remove this restriction.
2. Implement `element_filter` expressions in the `query`.

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-12-15 12:01:15 +08:00
aoiasd
0c54875832
enhance: ValidateAnalyzer return ValidateAnalyzerResponse instead common.Status (#46292)
Prepare for return more info when validate analyzer.
relate: https://github.com/milvus-io/milvus/issues/43687

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-12-12 10:35:14 +08:00
wgcn
6e2872c982
fix: wrong reduce lantency metric (#46233)
#46248

Signed-off-by: wgcn <wangg48@chinatelecom.cn>
Co-authored-by: wgcn <wangg48@chinatelecom.cn>
2025-12-10 14:17:13 +08:00
zhagnlu
8f0b7983ec
enhance: add jemalloc cached monitor (#46041)
#46133

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-12-09 19:53:13 +08:00
wei liu
046693eaf7
test: [skip e2e] fix race condition in TestQueryNodePipeline/TestBasic (#46218)
issue: #46217
The test was failing intermittently because it didn't wait for the
pipeline to finish processing messages before exiting. The test sent a
message to the pipeline and immediately returned, causing the deferred
Close() to execute before ProcessInsert, ProcessDelete, and UpdateTSafe
could be called.

Fix by:
- Moving message construction before mock expectations setup
- Adding a done channel to synchronize on UpdateTSafe completion
- Waiting for the signal before test exits

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-12-09 17:57:14 +08:00
Zhen Ye
459425ac84
fix: wrong context using by session of grpc client (#46183)
issue: #46182

Signed-off-by: chyezh <chyezh@outlook.com>
2025-12-08 21:47:12 +08:00
aoiasd
354ab2f55e
enhance: sync file resource to querynode and datanode (#44480)
relate:https://github.com/milvus-io/milvus/issues/43687
Support use file resource with sync mode.
Auto download or remove file resource to local when user add or remove
file resource.
Sync file resource to node when find new node session.

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-12-04 16:23:11 +08:00
wei liu
e70c01362d
enhance: Add resource exhaustion querynode penalty policy (#45808)
issue: #40513
for querynode which return resource exhausted error, add a penalty
duration on it, and suspend loading new resource until penalty duration
expired.

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-12-02 16:59:11 +08:00
Zhen Ye
2ef18c5b4f
enhance: remove watch at session liveness check (#45968)
issue: #45724

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-12-01 17:55:10 +08:00
zhagnlu
3901f112ae
enhance: make estimate json stats size more accurate (#45875)
#42533

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-12-01 15:31:10 +08:00
aoiasd
7d19c40e3c
feat: support search highlight with queries (#45736)
Previously, search with highlight only supported using BM25 search text
as the highlight target.
This PR adds support for highlighting with user-defined queries.
relate: https://github.com/milvus-io/milvus/issues/42589

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-12-01 10:17:09 +08:00
congqixia
a7275e190e
fix: populate index info after segment loading to prevent redundant load tasks (#45803)
After segments gained self-management capabilities for loading, the
index information from the initial load was not being preserved in the
Go-side segment metadata. This caused QueryCoord to repeatedly dispatch
load index tasks, which would fail in segcore since the indexes were
already loaded.

**Root Cause:**
The segment's `fieldIndexes` map was not being populated with index
metadata after calling `FinishLoad`, leading to a mismatch between the
Go-side metadata and segcore's internal state.

**Solution:**
After successfully loading a sealed segment, iterate through
`loadInfo.IndexInfos` and insert each index entry into the segment's
`fieldIndexes` map. This ensures the Go-side metadata stays in sync with
segcore and prevents redundant load index operations.

Fixes #45802
Related to #45060

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-11-24 19:55:07 +08:00
XuanYang-cn
c082317681
fix: Use base64 to encode not utf-8 bytes (#45655)
See also: #45654

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2025-11-24 18:23:06 +08:00
aoiasd
5efb0cedc8
feat: support use fragment config for highlight (#45099)
relate: https://github.com/milvus-io/milvus/issues/42589

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-11-24 17:07:06 +08:00
wei liu
3fbee154f6
enhance: Remove large segment ID arrays from QueryNode logs (#45719)
issue: #45718

Logging complete segment ID arrays caused excessive log volume (3-6 TB
for 200k segments). Remove arrays from logger fields and keep only
segment counts for observability.

Changes:
- Remove requestSegments/preparedSegments arrays from Load logger
- Remove segmentIDs from BM25 stats logs
- Remove entries structure from sync distribution log

This reduces log volume by 99.99% for large-scale operations.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-11-20 17:18:14 +08:00
aoiasd
947c8855f3
feat: support search bm25 with highlight (#44923)
relate: https://github.com/milvus-io/milvus/issues/42589

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-11-18 16:09:39 +08:00
congqixia
0a208d7224
enhance: Move segment loading logic from Go layer to segcore for self-managed loading (#45488)
Related to #45060

Refactor segment loading architecture to make segments autonomously
manage their own loading process, moving the orchestration logic from Go
(segment_loader.go) to C++ (segcore).

**C++ Layer (segcore):**
- Added `SetLoadInfo()` and `Load()` methods to `SegmentInterface` and
implementations
- Implemented `ChunkedSegmentSealedImpl::Load()` with parallel loading
strategy:
  - Separates indexed fields from non-indexed fields
  - Loads indexes concurrently using thread pools
  - Loads field data for non-indexed fields in parallel
- Implemented `SegmentGrowingImpl::Load()` to convert and load field
data
- Extracted `LoadIndexData()` as a reusable utility function in
`Utils.cpp`
- Added `SegmentLoad()` C binding in `segment_c.cpp`

**Go Layer:**
- Added `Load()` method to segment interfaces
- Updated mock implementations and test interfaces
- Integrated new C++ `SegmentLoad()` binding in Go segment wrapper

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-11-14 11:21:37 +08:00
cai.zhang
216c576da2
fix: Retain collection early to prevent it from being released before query completion (#45413)
issue: #45314

This PR only ensures that no panic occurs. However, we still need to
provide protection for the delegator handling ongoing query tasks.

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-11-11 20:29:37 +08:00
sparknack
f815f57b82
enhance: check both eviction and warmup when estimate segment loading size (#45222)
issue: #44857

Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
2025-11-10 14:15:36 +08:00
congqixia
e284733399
fix: Move FinishLoad before text index creation to ensure raw data availability (#45334)
Related to #45333

Fix segment loading failure when adding fields with text match enabled.
The issue occurred because text indexes were being loaded before
FinishLoad() was called, meaning raw data was not properly available
when text index creation attempted to access it, resulting in "failed to
create text index, neither raw data nor index are found" errors.

Solution is to move the FinishLoad() call to execute after raw data
loading but before text index loading. This ensures that:
1. Raw data is properly loaded and available in memory
2. Text indexes can access the raw data they need during creation
3. The segment is in the correct state before any index operations

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-11-06 14:49:34 +08:00
Lior Friedman
a4d69031f1
fix: Add AiSAQ index type RAM estimation implementation on the query node. (#45246)
Currently, the index type AiSAQ RAM usage estimation is not being
calculated correctly.
AiSAQ index type consumes less RAM usage while loading the index than
DISKANN does, and the query node module is missing the implementation of
the RAM usage estimation for that AiSAQ index type.
We suggest that the AiSAQ RAM usage estimation calculation should be as
follows:
 
UsedDiskMemoryRatioAisaq = 1024 (contrary to the UsedDiskMemoryRatio,
which is 4)
neededMemSize = indexInfo.IndexSize / UsedDiskMemoryRatioAisaq 
neededDiskSize = indexInfo.IndexSize

Reported issue is #45247

---------

Signed-off-by: Lior Friedman <lior.friedman@il.kioxia.com>
Signed-off-by: friedl <lior.friedman@kioxia.com>
Co-authored-by: friedl <lior.friedman@kioxia.com>
2025-11-06 08:53:34 +08:00
congqixia
e6be590b97
enhance: set schema version when creating new collection (#45263)
Related to #43028

Initialize the schema version field when creating a new collection
instance in QueryNode. The schema version is extracted from loadMetaInfo
and assigned to the collection, ensuring proper schema version tracking
and consistency across the distributed system.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-11-04 10:15:32 +08:00
Zhen Ye
576084fe86
enhance: support alter collection/database with WAL-based DDL framework (#45266)
issue: #43897

- Alter collection/database is implemented by WAL-based DDL framework
now.
- Support AlterCollection/AlterDatabase in wal now.
- Alter operation can be synced by new CDC now.
- Refactor some UT for alter DDL.

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-11-04 09:59:33 +08:00
congqixia
8c98adfeb3
fix: update QueryNode NumEntities metrics when collection has no segments (#45147)
Related to #44509

Fix a bug where QueryNodeNumEntities metrics were not updated for
collections with zero segments, causing stale metrics when all segments
are flushed or compacted.

The previous implementation used separate loops: one to update size
metrics for all collections, and another to update num entities metrics
only for collections present in the grouped segments map. Collections
with no segments were skipped in the second loop, leaving their
NumEntities metrics stale.

Changes:
- Consolidate size and num entities metric updates into single loop
- Iterate over all collections instead of grouped segments
- Get collection metadata from manager instead of segment instances
- Correctly set NumEntities to 0 for collections with no segments
- Apply the same fix to both growing and sealed segment processing
- Add nil check for collection metadata before processing

This ensures all collection metrics are updated consistently, even when
segment count drops to zero.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-10-30 10:12:08 +08:00
aoiasd
ad9a0cae48
enhance: add global analyzer options (#44684)
relate: https://github.com/milvus-io/milvus/issues/43687
Add global analyzer options, avoid having to merge some milvus params
into user's analyzer params.

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-10-28 14:52:10 +08:00
congqixia
36a887b38b
enhance: add NewSegmentWithLoadInfo API to support segment self-managed loading (#45061)
This commit introduces the foundation for enabling segments to manage
their own loading process by passing load information during segment
creation.

Changes:

C++ Layer:
- Add NewSegmentWithLoadInfo() C API to create segments with serialized
load info
- Add SetLoadInfo() method to SegmentInterface for storing load
information
- Refactor segment creation logic into shared CreateSegment() helper
function
- Add comprehensive documentation for the new API

Go Layer:
- Extend CreateCSegmentRequest to support optional LoadInfo field
- Update segment creation in querynode to pass SegmentLoadInfo when
available
- Add ConvertToSegcoreSegmentLoadInfo() and helper converters for proto
translation

Proto Definitions:
- Add segcorepb.SegmentLoadInfo message with essential loading metadata
- Add supporting messages: Binlog, FieldBinlog, FieldIndexInfo,
TextIndexStats, JsonKeyStats
- Remove dependency on data_coord.proto by creating segcore-specific
definitions

Testing:
- Add comprehensive unit tests for proto conversion functions
- Test edge cases including nil inputs, empty data, and nil array/map
elements

This is the first step toward issue #45060 - enabling segments to
autonomously manage their loading process, which will:
- Clarify responsibilities between Go and C++ layers
- Reduce cross-language call overhead
- Enable precise resource management at the C++ level
- Support better integration with caching layer
- Enable proactive schema evolution handling

Related to #45060

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-10-27 15:28:12 +08:00
aoiasd
cfeb095ad7
enhance: forbid build analyzer at proxy (#44067)
relate: https://github.com/milvus-io/milvus/issues/43687
We used to run the temporary analyzer and validate analyzer on the
proxy, but the proxy should not be a computation-heavy node. This PR
move all analyzer calculations to the streaming node.

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-10-23 10:58:12 +08:00
aoiasd
ac82bad0b3
enhance: optimize idf oracle sync logic (#44628)
Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-10-20 15:42:08 +08:00
sparknack
935160840c
enhance: add a disk quota for the loaded binlog size to prevent load failure of querynode (#44893)
issue: #41435

---------

Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
2025-10-19 19:44:01 +08:00
aoiasd
754997ac2b
enhance: update some annotations (#44769)
relate: https://github.com/milvus-io/milvus/issues/43114

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-10-17 16:22:02 +08:00
sparknack
c8a4d6e2ef
enhance: add cachinglayer management for TextMatchIndex (#44741)
issue: #41435, #44502

Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
2025-10-13 14:37:58 +08:00
sparknack
6d5b41644b
enhance: remove logical usage checks during segment loading (#44743)
issue: #41435

Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
2025-10-13 14:21:58 +08:00
congqixia
5ece760d73
fix: Pass fs via FileManagerContext when loading index (#44733)
Related to #44615

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-10-11 09:55:57 +08:00
wei liu
33d1e7de83
fix: Replace incorrect log import with milvus v2 log package (#44731)
issue: #44730
Fix the issue where logs were not outputting as expected due to
incorrect log package imports across multiple components.

Changes include:
- Add golangci-lint rule to forbid github.com/pingcap/log usage
- Replace github.com/pingcap/log with
github.com/milvus-io/milvus/pkg/v2/log

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-10-10 20:27:57 +08:00
Zhen Ye
a110d8cc49
fix: don't use logical resource for metrics of quota center on streaming node (#44613)
issue: #44599

Signed-off-by: chyezh <chyezh@outlook.com>
2025-09-29 21:34:13 +08:00
aoiasd
78ee76f018
enhance: support preload sealed segment bm25 stats and optimize bm25 stats serialize (#44279)
relate: https://github.com/milvus-io/milvus/issues/41424

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-09-29 16:35:05 +08:00
Zhen Ye
b6b59bd222
fix: remove redundant initialization of storage v2 (#44597)
issue: #44596

- querynode already init the storage v2 and segcore, so streamingnode
should not do this again.
- It also fix the gcp object storage access denied.

Signed-off-by: chyezh <chyezh@outlook.com>
2025-09-29 10:17:04 +08:00
zhagnlu
eac16a577c
enhance:support cachelayer for json stats (#44446)
#42533

Signed-off-by: zhagnlu <lu.zhang@zilliz.com>
2025-09-24 15:30:04 +08:00
Tianx
2c0c5ef41e
feat: timestamptz expression & index & timezone (#44080)
issue: https://github.com/milvus-io/milvus/issues/27467

>My plan is as follows.
>- [x] M1 Create collection with timestamptz field
>- [x] M2 Insert timestamptz field data
>- [x] M3 Retrieve timestamptz field data
>- [x] M4 Implement handoff
>- [x] M5 Implement compare operator
>- [x] M6 Implement extract operator
 >- [x] M8 Support database/collection level default timezone
>- [x] M7 Support STL-SORT index for datatype timestamptz

---

The third PR of issue: https://github.com/milvus-io/milvus/issues/27467,
which completes M5, M6, M7, M8 described above.

## M8 Default Timezone

We will be able to use alter_collection() and alter_database() in a
future Python SDK release to modify the default timezone at the
collection or database level.

For insert requests, the timezone will be resolved using the following
order of precedence: String Literal-> Collection Default -> Database
Default.
For retrieval requests, the timezone will be resolved in this order:
Query Parameters -> Collection Default -> Database Default.
In both cases, the final fallback timezone is UTC.


## M5: Comparison Operators

We can now use the following expression format to filter on the
timestamptz field:

- `timestamptz_field [+/- INTERVAL 'interval_string'] {comparison_op}
ISO 'iso_string' `

- The interval_string follows the ISO 8601 duration format, for example:
P1Y2M3DT1H2M3S.

- The iso_string follows the ISO 8601 timestamp format, for example:
2025-01-03T00:00:00+08:00.

- Example expressions: "tsz + INTERVAL 'P0D' != ISO
'2025-01-03T00:00:00+08:00'" or "tsz != ISO
'2025-01-03T00:00:00+08:00'".

## M6: Extract

We will be able to extract sepecific time filed by kwargs in a future
Python SDK release.
The key is `time_fields`, and value should be one or more of "year,
month, day, hour, minute, second, microsecond", seperated by comma or
space. Then the result of each record would be an array of int64.



## M7: Indexing Support

Expressions without interval arithmetic can be accelerated using an
STL-SORT index. However, expressions that include interval arithmetic
cannot be indexed. This is because the result of an interval calculation
depends on the specific timestamp value. For example, adding one month
to a date in February results in a different number of added days than
adding one month to a date in March.

--- 

After this PR, the input / output type of timestamptz would be iso
string. Timestampz would be stored as timestamptz data, which is int64_t
finally.

> for more information, see https://en.wikipedia.org/wiki/ISO_8601

---------

Signed-off-by: xtx <xtianx@smail.nju.edu.cn>
2025-09-23 10:24:12 +08:00
jiaqizho
338ed2fed4
enhance: Introduce sparse filter in query (#44347)
issue: #44373

The current commit implements sparse filtering in query tasks using the
statistical information (Bloom filter/MinMax) of the Primary Key (PK).

The statistical information of the PK is bound to the segment during the
segment loading phase. A new filter has been added to the segment filter
to enable the sparse filtering functionality.

Signed-off-by: jiaqizho <jiaqi.zhou@zilliz.com>
2025-09-23 09:58:09 +08:00