15 Commits

Author SHA1 Message Date
marcelo-cjl
3c2cf2c066
feat: Add nullable vector support in import utility layer (#46142)
related: #45993 

Key changes:

ImportV2 util:
- Add nullable vector types (FloatVector, Float16Vector, BFloat16Vector,
BinaryVector, SparseFloatVector, Int8Vector) to
AppendNullableDefaultFieldsData()
- Add tests for nullable vector field data appending

CSV/JSON/Numpy readers:
- Add nullPercent parameter to test data generation for better null
coverage
- Mark vector fields as nullable in test schemas
- Add test cases for nullable vector field parsing
- Refactor tests to use loop-based approach with 0%, 50%, 100% null
percentages
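
The loop over 0%, 50% and 100% null ratios could look roughly like the sketch below. `genValidData` and `countNulls` are hypothetical helpers written for this illustration, not the functions in the Milvus test utilities; the even spreading of nulls is an assumption.

```go
package main

import "fmt"

// genValidData is a hypothetical test helper: it returns a validity mask in
// which floor(rows*nullPercent/100) rows are null (false), spread evenly.
func genValidData(rows, nullPercent int) []bool {
	valid := make([]bool, rows)
	for i := range valid {
		// Row i is null exactly when the running null quota increases at i.
		isNull := (i+1)*nullPercent/100 > i*nullPercent/100
		valid[i] = !isNull
	}
	return valid
}

// countNulls counts the false entries of a validity mask.
func countNulls(valid []bool) int {
	n := 0
	for _, v := range valid {
		if !v {
			n++
		}
	}
	return n
}

func main() {
	// Loop over the three null ratios named in the commit message.
	for _, p := range []int{0, 50, 100} {
		valid := genValidData(10, p)
		fmt.Printf("nullPercent=%d nulls=%d of %d\n", p, countNulls(valid), len(valid))
	}
}
```

A deterministic mask keeps the three cases reproducible; a randomized generator would work as well if the tests only assert on alignment.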

Parquet field reader:
- Add ReadNullableBinaryData() for nullable
BinaryVector/Float16Vector/BFloat16Vector
- Add ReadNullableFloatVectorData() for nullable FloatVector
- Add ReadNullableSparseFloatVectorData() for nullable SparseFloatVector
- Add ReadNullableInt8VectorData() for nullable Int8Vector
- Add ReadNullableStructData() for generic nullable struct data
- Update Next() to use nullable read methods when field is nullable
- Add null data validation for non-nullable fields
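
As an illustration of the nullable read path described above, here is a minimal sketch that returns vector data together with a per-row validity mask and rejects nulls for non-nullable fields. `readNullableFloatVectors` is a hypothetical function, not the actual `ReadNullableFloatVectorData` signature. Representing a null row as a nil slice and packing only the valid rows into the output are assumptions made for the example.

```go
package main

import "fmt"

// readNullableFloatVectors is a hypothetical helper, not the Milvus API.
// A nil row stands for a null entry in the input column. It returns the
// flattened values of the valid rows plus one validity flag per input row.
func readNullableFloatVectors(rows [][]float32, dim int, nullable bool) ([]float32, []bool, error) {
	data := make([]float32, 0, len(rows)*dim)
	valid := make([]bool, len(rows))
	for i, row := range rows {
		if row == nil {
			if !nullable {
				return nil, nil, fmt.Errorf("row %d: null value in non-nullable field", i)
			}
			continue // valid[i] stays false, so the row remains represented
		}
		if len(row) != dim {
			return nil, nil, fmt.Errorf("row %d: got %d values, want dim %d", i, len(row), dim)
		}
		data = append(data, row...)
		valid[i] = true
	}
	return data, valid, nil
}

func main() {
	rows := [][]float32{{1, 2}, nil, {3, 4}}

	data, valid, err := readNullableFloatVectors(rows, 2, true)
	fmt.Println(data, valid, err)

	// The same input is rejected when the field is declared non-nullable.
	_, _, err = readNullableFloatVectors(rows, 2, false)
	fmt.Println(err)
}
```

The mask always has one entry per input row, which is what keeps the output aligned with the original rows even when some vectors are missing.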

- Core invariant: import must preserve per-row alignment and validity
for every field — nullable vector fields are expected to be encoded with
per-row validity masks and all readers/writers must emit arrays aligned
to original input rows (null entries represented explicitly).
- New feature & scope: adds end-to-end nullable-vector support in the
import utility layer — AppendNullableDefaultFieldsData in
internal/datanode/importv2/util.go now appends nil placeholders for
nullable vectors (FloatVector, Float16Vector, BFloat16Vector,
BinaryVector, SparseFloatVector, Int8Vector); parquet reader
(internal/util/importutilv2/parquet/field_reader.go) adds
ReadNullableBinaryData, ReadNullableFloatVectorData,
ReadNullableSparseFloatVectorData, ReadNullableInt8VectorData,
ReadNullableStructData and routes nullable branches to these helpers;
CSV/JSON/Numpy readers and test utilities updated to generate and
validate 0/50/100% null scenarios and mark vector fields as nullable in
test schemas.
- Logic removed / simplified: eliminates ad-hoc "parameter-invalid"
rejections for nullable vectors inside FieldReader.Next by centralizing
nullable handling into ReadNullable* helpers and shared validators
(getArrayDataNullable,
checkNullableVectorAlignWithDim/checkNullableVectorAligned), simplifying
control flow and removing scattered special-case checks.
- No data loss / no regression (concrete code paths): nulls are
preserved end-to-end — AppendNullableDefaultFieldsData explicitly
inserts nil entries per null row (datanode import append path);
ReadNullable*Data helpers return both data and []bool validity masks so
callers in field_reader.go and downstream readers receive exact per-row
validity; testutil.BuildSparseVectorData was extended to accept
validData so sparse vectors are materialized only for valid rows while
null rows are represented as missing. These concrete paths ensure null
rows are represented rather than dropped, preventing data loss or
behavioral regression.

Signed-off-by: marcelo-cjl <marcelo.chen@zilliz.com>
2025-12-29 10:51:21 +08:00
Bingyi Sun
96e1de4e22
feat: allow users to write pk field when autoid is enabled (#44424)
https://github.com/milvus-io/milvus/issues/44425

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-09-23 16:10:04 +08:00
groot
1ee8cea35b
enhance: bulkinsert handle nullable/defaultValue/functionOutput fields (#42956)
issue: https://github.com/milvus-io/milvus/issues/42173

Signed-off-by: yhmo <yihua.mo@zilliz.com>
2025-07-04 14:20:44 +08:00
groot
14563ad2b3
enhance: bulkinsert handles nullable/default (#42127)
issue: https://github.com/milvus-io/milvus/issues/42096,
https://github.com/milvus-io/milvus/issues/42130

Signed-off-by: yhmo <yihua.mo@zilliz.com>
2025-05-28 18:02:28 +08:00
congqixia
cb7f2fa6fd
enhance: Use v2 package name for pkg module (#39990)
Related to #39095

https://go.dev/doc/modules/version-numbers

Update the pkg module version according to the Go module version-numbering convention

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-02-22 23:15:58 +08:00
yihao.dai
5d89838ad9
fix: Fix import failed due to 0 row num (#39886)
issue: https://github.com/milvus-io/milvus/issues/39885

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-02-14 19:42:13 +08:00
Zhen Ye
bb8d1ab3bf
enhance: make new go package to manage proto (#39114)
issue: #39095

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-01-10 10:49:01 +08:00
yihao.dai
9868fe4e6c
fix: Fix panic due to empty candidate import segments (#35673)
issue: https://github.com/milvus-io/milvus/issues/35662

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-08-27 17:08:59 +08:00
yihao.dai
4e5f1d5f75
enhance: Pre-allocate ids for import (#33958)
The import depends on syncTask, which in turn relies on the
allocator. This PR pre-allocates the necessary IDs for the import syncTask.
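
The idea of pre-allocation can be sketched as a local allocator that serves IDs from a range reserved up front, so no remote call is needed while the task runs. `idRange` and its `Alloc` method are hypothetical names for this illustration and do not reflect the Milvus allocator interface.

```go
package main

import (
	"errors"
	"fmt"
)

// idRange hands out IDs from a half-open range [next, end) reserved in advance.
// It is a hypothetical stand-in and is not safe for concurrent use.
type idRange struct{ next, end int64 }

// Alloc returns the half-open range [start, end) of count fresh IDs,
// or an error when the reserved range cannot satisfy the request.
func (r *idRange) Alloc(count int64) (int64, int64, error) {
	if count <= 0 {
		return 0, 0, errors.New("count must be positive")
	}
	if r.next+count > r.end {
		return 0, 0, errors.New("pre-allocated ID range exhausted")
	}
	start := r.next
	r.next += count
	return start, r.next, nil
}

func main() {
	ids := &idRange{next: 1000, end: 1010} // range reserved before the task starts

	start, end, err := ids.Alloc(4)
	fmt.Println(start, end, err)

	_, _, err = ids.Alloc(100) // larger than what was reserved
	fmt.Println(err)
}
```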

issue: https://github.com/milvus-io/milvus/issues/33957

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-07-07 21:26:14 +08:00
yihao.dai
eb5d4de390
fix: Check if the import job exists (#33672)
issue: https://github.com/milvus-io/milvus/issues/33671

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-06-10 21:51:55 +08:00
yihao.dai
3540eee977
enhance: Support L0 import (#33514)
issue: https://github.com/milvus-io/milvus/issues/33157

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-06-07 14:17:20 +08:00
yihao.dai
bbdf99a45e
fix: Fix import segment size is uneven (#33605)
The data coordinator already computes the appropriate number of import segments,
so the data node can pick one of them at random when importing.
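
A minimal sketch of random selection among candidate segments follows. `pickSegment` is a hypothetical function and the segment IDs are made up. The empty-slice guard is included because `rand.Intn` panics when its argument is 0.

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
)

// pickSegment returns one of the candidate segment IDs chosen uniformly at
// random. It is a hypothetical helper, not the function in the data node.
func pickSegment(candidates []int64) (int64, error) {
	if len(candidates) == 0 {
		return 0, errors.New("no candidate segments")
	}
	return candidates[rand.Intn(len(candidates))], nil
}

func main() {
	segments := []int64{101, 102, 103} // illustrative segment IDs
	id, err := pickSegment(segments)
	fmt.Println(id, err)
}
```

With enough batches, uniform selection spreads rows across the segments roughly evenly, which is the property the fix is after.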

issue: https://github.com/milvus-io/milvus/issues/33604

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-06-05 15:41:51 +08:00
yihao.dai
558feed5ed
fix: Use pk from binlog during import (#32118)
During binlog import, even if the primary key's autoID is set to true,
the primary key from the binlog should be used instead of being
reassigned.
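
The rule can be sketched as follows. `resolvePrimaryKeys` and its parameters are hypothetical and only illustrate the decision: keys read from a binlog are kept as they are, and new keys are generated only for a non-binlog import with autoID enabled.

```go
package main

import "fmt"

// resolvePrimaryKeys decides which primary keys to store for a batch.
// Hypothetical sketch: source holds the keys read from the input (if any),
// and nextID stands in for whatever generates auto IDs.
func resolvePrimaryKeys(source []int64, rowCount int, autoID, binlogImport bool, nextID func() int64) []int64 {
	if binlogImport || !autoID {
		// Binlog data already carries its keys; do not reassign them.
		return source
	}
	pks := make([]int64, rowCount)
	for i := range pks {
		pks[i] = nextID()
	}
	return pks
}

func main() {
	next := int64(100)
	alloc := func() int64 { next++; return next }

	// Binlog import with autoID enabled: the keys from the binlog are kept.
	fmt.Println(resolvePrimaryKeys([]int64{7, 8, 9}, 3, true, true, alloc))

	// Regular import with autoID enabled: fresh keys are generated.
	fmt.Println(resolvePrimaryKeys(nil, 3, true, false, alloc))
}
```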

issue: https://github.com/milvus-io/milvus/discussions/31943,
https://github.com/milvus-io/milvus/issues/28521

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-04-16 14:51:20 +08:00
yihao.dai
a434d33e75
feat: Add import scheduler and manager (#29367)
This PR introduces new management components for importv2:
1. ImportMeta: manages all import tasks;
2. ImportScheduler: processes tasks and updates their states;
3. ImportChecker: checks whether all tasks have completed and triggers the
relevant follow-up operations.
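
The division of responsibilities can be illustrated with a toy, in-memory version. The state names, the `importMeta` struct and the `schedule` and `allDone` functions are assumptions made for this sketch and are not the types introduced by the PR.

```go
package main

import "fmt"

// taskState is a hypothetical, simplified task lifecycle.
type taskState int

const (
	pending taskState = iota
	inProgress
	completed
)

// importMeta tracks the state of every import task (meta role).
type importMeta struct {
	tasks map[int64]taskState
}

// schedule advances each unfinished task by one state (scheduler role).
func schedule(m *importMeta) {
	for id, s := range m.tasks {
		if s != completed {
			m.tasks[id] = s + 1
		}
	}
}

// allDone reports whether every task has completed (checker role).
func allDone(m *importMeta) bool {
	for _, s := range m.tasks {
		if s != completed {
			return false
		}
	}
	return true
}

func main() {
	m := &importMeta{tasks: map[int64]taskState{1: pending, 2: inProgress}}
	rounds := 0
	for !allDone(m) {
		schedule(m)
		rounds++
	}
	fmt.Println("all import tasks completed after", rounds, "rounds")
}
```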

issue: https://github.com/milvus-io/milvus/issues/28521

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-03-01 18:31:02 +08:00
yihao.dai
18b979d9b4
enhance: Extend support for varchar autoID to BulkInsertV2 (#30477)
issue: https://github.com/milvus-io/milvus/issues/30476

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-02-04 16:57:05 +08:00