milvus

mirror of https://gitee.com/milvus-io/milvus.git synced 2026-01-07 19:31:51 +08:00

Author	SHA1	Message	Date
marcelo-cjl	3c2cf2c066	feat: Add nullable vector support in import utility layer (#46142 ) related: #45993 Add nullable vector support in import utility layer Key changes: ImportV2 util: - Add nullable vector types (FloatVector, Float16Vector, BFloat16Vector, BinaryVector, SparseFloatVector, Int8Vector) to AppendNullableDefaultFieldsData() - Add tests for nullable vector field data appending CSV/JSON/Numpy readers: - Add nullPercent parameter to test data generation for better null coverage - Mark vector fields as nullable in test schemas - Add test cases for nullable vector field parsing - Refactor tests to use loop-based approach with 0%, 50%, 100% null percentages Parquet field reader: - Add ReadNullableBinaryData() for nullable BinaryVector/Float16Vector/BFloat16Vector - Add ReadNullableFloatVectorData() for nullable FloatVector - Add ReadNullableSparseFloatVectorData() for nullable SparseFloatVector - Add ReadNullableInt8VectorData() for nullable Int8Vector - Add ReadNullableStructData() for generic nullable struct data - Update Next() to use nullable read methods when field is nullable - Add null data validation for non-nullable fields - Core invariant: import must preserve per-row alignment and validity for every field — nullable vector fields are expected to be encoded with per-row validity masks and all readers/writers must emit arrays aligned to original input rows (null entries represented explicitly). - New feature & scope: adds end-to-end nullable-vector support in the import utility layer — AppendNullableDefaultFieldsData in internal/datanode/importv2/util.go now appends nil placeholders for nullable vectors (FloatVector, Float16Vector, BFloat16Vector, BinaryVector, SparseFloatVector, Int8Vector); parquet reader (internal/util/importutilv2/parquet/field_reader.go) adds ReadNullableBinaryData, ReadNullableFloatVectorData, ReadNullableSparseFloatVectorData, ReadNullableInt8VectorData, ReadNullableStructData and routes nullable branches to these helpers; CSV/JSON/Numpy readers and test utilities updated to generate and validate 0/50/100% null scenarios and mark vector fields as nullable in test schemas. - Logic removed / simplified: eliminates ad-hoc "parameter-invalid" rejections for nullable vectors inside FieldReader.Next by centralizing nullable handling into ReadNullable* helpers and shared validators (getArrayDataNullable, checkNullableVectorAlignWithDim/checkNullableVectorAligned), simplifying control flow and removing scattered special-case checks. - No data loss / no regression (concrete code paths): nulls are preserved end-to-end — AppendNullableDefaultFieldsData explicitly inserts nil entries per null row (datanode import append path); ReadNullable*Data helpers return both data and []bool validity masks so callers in field_reader.go and downstream readers receive exact per-row validity; testutil.BuildSparseVectorData was extended to accept validData so sparse vectors are materialized only for valid rows while null rows are represented as missing. These concrete paths ensure null rows are represented rather than dropped, preventing data loss or behavioral regression. Signed-off-by: marcelo-cjl <marcelo.chen@zilliz.com>	2025-12-29 10:51:21 +08:00
junjiejiangjjj	617a77b0bd	enhance: Add embedding model and schema field type checks (#46421 ) https://github.com/milvus-io/milvus/issues/46415 - Add output type validation when creating functions - Fix improper error handling in bulk insert tasks Signed-off-by: junjie.jiang <junjie.jiang@zilliz.com>	2025-12-19 11:05:19 +08:00
congqixia	46c14781be	enhance: support useLoonFFI flag in import workflow (#46363 ) Related to #44956 This change propagates the useLoonFFI configuration through the import pipeline to enable LOON FFI usage during data import operations. Key changes: - Add use_loon_ffi field to ImportRequest protobuf message - Add manifest_path field to ImportSegmentInfo for tracking manifest - Initialize manifest path when creating segments (both import and growing) - Pass useLoonFFI flag through NewSyncTask in import tasks - Simplify pack_writer_v2 by removing GetManifestInfo method and relying on pre-initialized manifest path from segment creation - Update segment meta with manifest path after import completion This allows the import workflow to use the LOON FFI based packed writer when the common.useLoonFFI configuration is enabled. --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-12-17 16:35:16 +08:00
junjiejiangjjj	50f198e346	feat: Support zilliz models (#45168 ) https://github.com/milvus-io/milvus/issues/35856 Signed-off-by: junjie.jiang <junjie.jiang@zilliz.com>	2025-11-13 12:55:37 +08:00
Bingyi Sun	c25166a202	fix: Fix bulk import with autoid (#44604 ) issue: #44424 Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-10-09 12:09:56 +08:00
junjiejiangjjj	f07979f91d	enhance: add support for controlling function output field insertion (#44162 ) #44053 Signed-off-by: junjie.jiang <junjie.jiang@zilliz.com>	2025-09-24 17:26:04 +08:00
Bingyi Sun	96e1de4e22	feat: allow users to write pk field when autoid is enabled (#44424 ) https://github.com/milvus-io/milvus/issues/44425 --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-09-23 16:10:04 +08:00
Spade A	eb793531b9	feat: impl StructArray -- support import for CSV/JSON/PARQUET/BINLOG (#44201 ) Ref https://github.com/milvus-io/milvus/issues/42148 --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-09-15 20:41:59 +08:00
Tianx	c0d62268ac	feat: add timestamptz data type (#44005 ) issue: https://github.com/milvus-io/milvus/issues/27467 > https://github.com/milvus-io/milvus/issues/27467#issuecomment-3092211420 > * [x] M1 Create collection with timestamptz field > * [x] M2 Insert timestamptz field data > * [x] M3 Retrieve timestamptz field data > * [x] M4 Implement handoff[ ] The second PR of issue: https://github.com/milvus-io/milvus/issues/27467, which completes M1-M4 described above. --------- Signed-off-by: xtx <xtianx@smail.nju.edu.cn>	2025-08-26 15:59:53 +08:00
junjiejiangjjj	f3d7e47227	feat: Supports more rerankers (#43270 ) https://github.com/milvus-io/milvus/issues/35856 Signed-off-by: junjiejiangjjj <junjie.jiang@zilliz.com>	2025-08-22 17:29:47 +08:00
yihao.dai	a29b3272b0	fix: Improve import memory management to prevent OOM (#43568 ) 1. Use blocking memory allocation to wait until memory becomes available 2. Perform memory allocation at the file level instead of per task 3. Limit Parquet file reader batch size to prevent excessive memory consumption 4. Reduce import buffer size from 20% to 10% of total memory issue: https://github.com/milvus-io/milvus/issues/43387, https://github.com/milvus-io/milvus/issues/43131 --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2025-07-28 21:25:35 +08:00
Spade A	faeb7fd410	feat: impl StructArray -- create schema, insert, and retrieve data (#42855 ) Ref https://github.com/milvus-io/milvus/issues/42148 https://github.com/milvus-io/milvus/pull/42406 impls the segcore part of storage for handling with VectorArray. This PR: 1. impls the go part of storage for VectorArray 2. impls the collection creation with StructArrayField and VectorArray 3. insert and retrieve data from the collection. --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com> Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com> Signed-off-by: SpadeA-Tang <u6748471@anu.edu.au>	2025-07-27 01:30:55 +08:00
Zhen Ye	e9ab73e93d	enhance: add schema version at recovery storage (#43500 ) issue: #43072, #43289 - manage the schema version at recovery storage. - update the schema when creating collection or alter schema. - get schema at write buffer based on version. - recover the schema when upgrading from 2.5. --------- Signed-off-by: chyezh <chyezh@outlook.com>	2025-07-23 21:38:54 +08:00
cai.zhang	3ffd44f302	fix: Fix remaining issues with Datanode pooling and StorageV2 (#43147 ) issue: #43146 --------- Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2025-07-10 14:26:48 +08:00
yihao.dai	9cbd194c6b	fix: Prevent import from generating small binlogs (#43132 ) - Introduce dynamic buffer sizing to avoid generating small binlogs during import - Refactor import slot calculation based on CPU and memory constraints - Implement dynamic pool sizing for sync manager and import tasks according to CPU core count issue: https://github.com/milvus-io/milvus/issues/43131 --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2025-07-07 21:32:47 +08:00
groot	1ee8cea35b	enhance: bulkinsert handle nullable/defaultValue/functionOutput fields (#42956 ) issue: https://github.com/milvus-io/milvus/issues/42173 Signed-off-by: yhmo <yihua.mo@zilliz.com>	2025-07-04 14:20:44 +08:00
aoiasd	2eb24fbe7c	fix: analyzer memory leak because function runner not close (#41839 ) relate: https://github.com/milvus-io/milvus/issues/41213 --------- Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>	2025-06-05 14:24:40 +08:00
Zhen Ye	66cc194ab2	enhance: add partition gc at streaming arch (#42179 ) issue: #41976 - make drop partition message as a broadcast message. - add gc when drop partition message is acked. - add a call back to handle the broadcast message when ack. - the ack operation of broadcast message will retry until success. Signed-off-by: chyezh <chyezh@outlook.com>	2025-05-29 23:20:30 +08:00
groot	14563ad2b3	enhance: bulkinsert handles nullable/default (#42127 ) issue: https://github.com/milvus-io/milvus/issues/42096, https://github.com/milvus-io/milvus/issues/42130 Signed-off-by: yhmo <yihua.mo@zilliz.com>	2025-05-28 18:02:28 +08:00
congqixia	b8d7045539	enhance: [Add Field] Use consistent schema for single buffer (#41891 ) Related to #41873 --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-05-17 19:46:22 +08:00
aoiasd	9166c77a72	fix: bulk insert should use function runner's input field list instead schema's (#41560 ) relate: https://github.com/milvus-io/milvus/issues/41213 Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>	2025-05-12 19:14:56 +08:00
congqixia	cb7f2fa6fd	enhance: Use v2 package name for pkg module (#39990 ) Related to #39095 https://go.dev/doc/modules/version-numbers Update pkg version according to golang dep version convention --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-02-22 23:15:58 +08:00
yihao.dai	5d89838ad9	fix: Fix import failed due to 0 row num (#39886 ) issue: https://github.com/milvus-io/milvus/issues/39885 --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2025-02-14 19:42:13 +08:00
junjiejiangjjj	16cbdfb3b1	feat: Add Text Embedding Function (#36366 ) https://github.com/milvus-io/milvus/issues/35856 Signed-off-by: junjie.jiang <junjie.jiang@zilliz.com>	2025-01-24 14:23:06 +08:00
Ted Xu	56659bacbb	enhance: make serialization be part of sync task to support file format change (#38946 ) See #38945 --------- Signed-off-by: Ted Xu <ted.xu@zilliz.com>	2025-01-23 15:49:05 +08:00
aoiasd	9cb4c4e8ac	fix: bm25 import segment without bm25 stats meta (#38855 ) relate: https://github.com/milvus-io/milvus/issues/38854 Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>	2025-01-21 11:09:04 +08:00
Zhen Ye	bb8d1ab3bf	enhance: make new go package to manage proto (#39114 ) issue: #39095 --------- Signed-off-by: chyezh <chyezh@outlook.com>	2025-01-10 10:49:01 +08:00
jaime	9d16b972ea	feat: add tasks page into management WebUI (#37002 ) issue: #36621 1. Add API to access task runtime metrics, including: - build index task - compaction task - import task - balance (including load/release of segments/channels and some leader tasks on querycoord) - sync task 2. Add a debug mode to the webpage; set debug=true or debug=false in the URL query parameters to enable or disable it. Signed-off-by: jaime <yun.zhang@zilliz.com>	2024-10-28 10:13:29 +08:00
Buqian Zheng	82c5cf2fa2	feat: add bulk insert support for Functions (#36715 ) issue: https://github.com/milvus-io/milvus/issues/35853 and https://github.com/milvus-io/milvus/issues/35856 Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>	2024-10-12 17:19:20 +08:00
yihao.dai	80f25d497f	enhance: Add metrics to monitor import throughput and imported rows (#36519 ) issue: https://github.com/milvus-io/milvus/issues/36518 Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2024-09-28 17:31:15 +08:00
aoiasd	139787371e	feat: support embedding bm25 sparse vector and flush bm25 stats log (#36036 ) relate: https://github.com/milvus-io/milvus/issues/35853 --------- Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>	2024-09-19 10:57:12 +08:00
yihao.dai	9868fe4e6c	fix: Fix panic due to empty candidate import segments (#35673 ) issue: https://github.com/milvus-io/milvus/issues/35662 Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2024-08-27 17:08:59 +08:00
congqixia	ab532ae199	enhance: Add back BF lazy load logic for datanode watch channel (#35646 ) Add back lazy loading statslog when watch dml channel on datanode. Related to #22994 #27675 Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-08-22 19:42:57 +08:00
zhenshan.cao	aa247f192d	enhance: remove unused code for StorageV2 (#35132 ) issue: https://github.com/milvus-io/milvus/issues/34168 Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>	2024-08-01 12:08:13 +08:00
yihao.dai	8aab6cbfac	enhance: Organize the common modules of streamingNode and dataNode (#34773 ) 1. Move the common modules of streamingNode and dataNode to flushcommon 2. Add new GetVChannels interface for rootcoord issue: https://github.com/milvus-io/milvus/issues/33285 --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2024-07-22 11:33:51 +08:00
yihao.dai	4e5f1d5f75	enhance: Pre-allocate ids for import (#33958 ) The import is dependent on syncTask, which in turn relies on the allocator. This PR pre-allocates the necessary IDs for import syncTask. issue: https://github.com/milvus-io/milvus/issues/33957 --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2024-07-07 21:26:14 +08:00
congqixia	512ea6be5f	enhance: Avoid merging insert data when buffering insert msgs (#33562 ) See also #33561 This PR: - Use zero copy when buffering insert messages - Make `storage.InsertCodec` support serializing multiple insert data chunks into the same batch of binlog files Signed-off-by: Congqi Xia <congqi.xia@zilliz.com> --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-06-13 11:15:56 +08:00
yihao.dai	3540eee977	enhance: Support L0 import (#33514 ) issue: https://github.com/milvus-io/milvus/issues/33157 --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2024-06-07 14:17:20 +08:00
yihao.dai	bbdf99a45e	fix: Fix import segment size is uneven (#33605 ) The data coordinator computed the appropriate number of import segments, thus when importing in the data node, one can randomly select a segment. issue: https://github.com/milvus-io/milvus/issues/33604 --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2024-06-05 15:41:51 +08:00
yihao.dai	558feed5ed	fix: Use pk from binlog during import (#32118 ) During binlog import, even if the primary key's autoID is set to true, the primary key from the binlog should be used instead of being reassigned. issue: https://github.com/milvus-io/milvus/discussions/31943, https://github.com/milvus-io/milvus/issues/28521 --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2024-04-16 14:51:20 +08:00
yihao.dai	31cf849f68	enhance: Support retrieving file size from importutilv2.Reader (#31533 ) To reduce the overhead caused by listing the S3 objects, add an interface to importutil.Reader to retrieve file sizes. issue: https://github.com/milvus-io/milvus/issues/31532, https://github.com/milvus-io/milvus/issues/28521 --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2024-03-25 20:29:07 +08:00
yihao.dai	f65a796d18	enhance: Add max file num limit and max file size limit for import (#31497 ) The max number of import files per request should not exceed 1024 by default (configurable). The import file size allowed for importing should not exceed 16GB by default (configurable). issue: https://github.com/milvus-io/milvus/issues/28521 Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2024-03-22 18:13:06 +08:00
yihao.dai	776709e5ff	fix: Fix binlog import (#31310 ) Fix binlog import functionality by removing the existing check and refining the size retrieval process. issue: https://github.com/milvus-io/milvus/issues/31221, https://github.com/milvus-io/milvus/issues/28521 --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2024-03-17 20:59:04 +08:00
yihao.dai	b5c67948b7	enhance: Enhance and modify the return content of ImportV2 (#31192 ) 1. The Import APIs now provide detailed progress information for each imported file, including details such as file name, file size, progress, and more. 2. The APIs now return the collection name and the completion time. 3. Other modifications include changing jobID to jobId and other similar adjustments. issue: https://github.com/milvus-io/milvus/issues/28521 --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2024-03-13 19:51:03 +08:00
yihao.dai	a434d33e75	feat: Add import scheduler and manager (#29367 ) This PR introduces novel managerial roles for importv2: 1. ImportMeta: To manage all the import tasks; 2. ImportScheduler: To process tasks and modify their states; 3. ImportChecker: To ascertain the completion of all tasks and instigate relevant operations. issue: https://github.com/milvus-io/milvus/issues/28521 --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2024-03-01 18:31:02 +08:00
yihao.dai	18b979d9b4	enhance: Extend support for varchar autoID to BulkInsertV2 (#30477 ) issue: https://github.com/milvus-io/milvus/issues/30476 Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2024-02-04 16:57:05 +08:00
yihao.dai	7ce876a072	fix: Decoupling importing segment from flush process (#30402 ) This PR decouples the importing segment from the flush process by: 1. Excluding the importing segment from the flush policy; this approach avoids notifying the datanode to flush the importing segment, which may not exist. 2. When RootCoord calls Flush, DataCoord directly sets the importing segment state to `Flushed`. issue: https://github.com/milvus-io/milvus/issues/30359 --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2024-02-03 13:01:12 +08:00
yihao.dai	c5918290e6	feat: Add import executor and manager for datanode (#29438 ) This PR introduces novel importv2 roles for datanode: 1. Executor: To execute tasks, an import task will be divided into the following steps: read data -> hash data -> sync data; 2. Manager: To manage all the tasks; issue: https://github.com/milvus-io/milvus/issues/28521 --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2024-01-31 20:45:04 +08:00

48 Commits
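Commit 3c2cf2c066 states the core invariant that nullable vector fields travel with a per-row validity mask aligned to the original input rows. The Go program that follows is a minimal sketch of that idea, assuming a compact layout in which vector values are stored only for valid rows plus one `bool` per input row. The type `nullableFloatVector` and its methods are hypothetical illustrations; they are not the Milvus `ReadNullable*Data` helpers or `checkNullableVectorAlignWithDim`, and the real storage layout may differ.

```go
package main

import (
	"errors"
	"fmt"
)

// nullableFloatVector is a HYPOTHETICAL column layout (not a Milvus type):
// Data holds Dim floats for each valid row only; ValidData has one flag
// per input row, so it stays aligned with the original rows.
type nullableFloatVector struct {
	Dim       int
	Data      []float32
	ValidData []bool
}

// appendRow appends one input row; a nil vector marks the row as null.
func (v *nullableFloatVector) appendRow(vec []float32) error {
	if vec == nil {
		v.ValidData = append(v.ValidData, false)
		return nil
	}
	if len(vec) != v.Dim {
		return fmt.Errorf("dim mismatch: got %d, want %d", len(vec), v.Dim)
	}
	v.Data = append(v.Data, vec...)
	v.ValidData = append(v.ValidData, true)
	return nil
}

// checkAligned verifies that Data holds exactly Dim floats per valid row.
func (v *nullableFloatVector) checkAligned() error {
	valid := 0
	for _, ok := range v.ValidData {
		if ok {
			valid++
		}
	}
	if len(v.Data) != valid*v.Dim {
		return errors.New("data length does not match valid rows times dim")
	}
	return nil
}

// row returns the vector for input row i, or nil if that row is null.
// The offset into Data is the number of valid rows before i.
func (v *nullableFloatVector) row(i int) []float32 {
	if !v.ValidData[i] {
		return nil
	}
	offset := 0
	for j := 0; j < i; j++ {
		if v.ValidData[j] {
			offset++
		}
	}
	return v.Data[offset*v.Dim : (offset+1)*v.Dim]
}

func main() {
	col := &nullableFloatVector{Dim: 2}
	for _, r := range [][]float32{{1, 2}, nil, {3, 4}} {
		if err := col.appendRow(r); err != nil {
			panic(err)
		}
	}
	// Prints: true 3 true [3 4]
	fmt.Println(col.checkAligned() == nil, len(col.ValidData), col.row(1) == nil, col.row(2))
}
```

In this sketch a null row costs one `bool` and no vector storage, at the price of an O(n) offset scan in `row`; a production reader would more likely precompute offsets.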