milvus

mirror of https://gitee.com/milvus-io/milvus.git synced 2026-02-04 11:18:44 +08:00

Author	SHA1	Message	Date
aoiasd	55feb7ded8	feat: set related resource ids in collection schema (#46423 ) Support crate analyzer with file resource info, and return used file resource ids when validate analyzer. Save the related resource ids in collection schema. relate: https://github.com/milvus-io/milvus/issues/43687 <!-- This is an auto-generated comment: release notes by coderabbit.ai --> - Core invariant: analyzer file-resource resolution is deterministic and traceable by threading a FileResourcePathHelper (collecting used resource IDs in a HashSet) through all tokenizer/analyzer construction and validation paths; validate_analyzer(params, extra_info) returns the collected Vec<i64) which is propagated through C/Rust/Go layers to callers (CValidateResult → RustResult::from_vec_i64 → Go []int64 → querypb.ValidateAnalyzerResponse.ResourceIds → CollectionSchema.FileResourceIds). - Logic removed/simplified: ad‑hoc, scattered resource-path lookups and per-filter file helpers (e.g., read_synonyms_file and other inline file-reading logic) were consolidated into ResourceInfo + FileResourcePathHelper and a centralized get_resource_path(helper, ...) API; filter/tokenizer builder APIs now accept &mut FileResourcePathHelper so all file path resolution and ID collection use the same path and bookkeeping logic (redundant duplicated lookups removed). - Why no data loss or behavior regression: changes are additive and default-preserving — existing call sites pass extra_info = "" so analyzer creation/validation behavior and error paths remain unchanged; new Collection.FileResourceIds is populated from resp.ResourceIds in validateSchema and round‑tripped through marshal/unmarshal (model.Collection ↔ schemapb.CollectionSchema) so schema persistence uses the new list without overwriting other schema fields; proto change adds a repeated field (resource_ids) which is wire‑compatible (older clients ignore extra field). Concrete code paths: analyzer creation still uses create_analyzer (now with extra_info ""), tokenizer validation still returns errors as before but now also returns IDs via CValidateResult/RustResult, and rootcoord.validateSchema assigns resp.ResourceIds → schema.FileResourceIds. - New capability added: end‑to‑end discovery, return, and persistence of file resource IDs used by analyzers — validate flows now return resource IDs and the system stores them in collection schema (affects tantivy analyzer binding, canalyzer C bindings, internal/util analyzer APIs, querynode ValidateAnalyzer response, and rootcoord/create_collection flow). <!-- end of auto-generated comment: release notes by coderabbit.ai --> Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>	2025-12-26 22:49:19 +08:00
Zhen Ye	7c575a18b0	enhance: support AckSyncUp for broadcaster, and enable it in truncate api (#46313 ) issue: #43897 also for issue: #46166 add ack_sync_up flag into broadcast message header, which indicates that whether the broadcast operation is need to be synced up between the streaming node and the coordinator. If the ack_sync_up is false, the broadcast operation will be acked once the recovery storage see the message at current vchannel, the fast ack operation can be applied to speed up the broadcast operation. If the ack_sync_up is true, the broadcast operation will be acked after the checkpoint of current vchannel reach current message. The fast ack operation can not be applied to speed up the broadcast operation, because the ack operation need to be synced up with streaming node. e.g. if truncate collection operation want to call ack once callback after the all segment are flushed at current vchannel, it should set the ack_sync_up to be true. TODO: current implementation doesn't promise the ack sync up semantic, it only promise FastAck operation will not be applied, wait for 3.0 to implement the ack sync up semantic. only for truncate api now. --------- Signed-off-by: chyezh <chyezh@outlook.com>	2025-12-17 16:55:17 +08:00
Zhen Ye	1cd0ef943e	fix: use latest timetick to expire cache (#45717 ) issue: #45697 --------- Signed-off-by: chyezh <chyezh@outlook.com>	2025-11-20 21:39:04 +08:00
zhenshan.cao	a3b8bcb198	fix: correct default value backfill during AddField (#45634 ) issue: https://github.com/milvus-io/milvus/issues/44585 Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>	2025-11-18 23:05:42 +08:00
Zhen Ye	b7fb8ed38c	fix: use the right resource key lock for ddl and use new ddl in transfer replica (#45506 ) issue: #45452 - alias/rename related DDL should use database level exclusive lock - alias cannot use as the resource key of lock, use collection name instead - transfer replica should use WAL-based framework Signed-off-by: chyezh <chyezh@outlook.com>	2025-11-12 19:01:38 +08:00
Zhen Ye	309d564796	enhance: support collection and index with WAL-based DDL framework (#45033 ) issue: #43897 - Part of collection/index related DDL is implemented by WAL-based DDL framework now. - Support following message type in wal, CreateCollection, DropCollection, CreatePartition, DropPartition, CreateIndex, AlterIndex, DropIndex. - Part of collection/index related DDL can be synced by new CDC now. - Refactor some UT for collection/index DDL. - Add Tombstone scheduler to manage the tombstone GC for collection or partition meta. - Move the vchannel allocation into streaming pchannel manager. --------- Signed-off-by: chyezh <chyezh@outlook.com>	2025-10-30 14:24:08 +08:00

6 Commits