1. Optimize the import process: skip subsequent steps and mark the task
as complete if the number of imported rows is 0.
2. Improve import integration tests:
a. Add a test to verify that autoIDs are not duplicated
b. Add a test for the corner case where all data is deleted
c. Shorten test execution time
3. Enhance import logging:
a. Print imported segment information upon completion
b. Include file name in failure logs
issue: https://github.com/milvus-io/milvus/issues/42488,
https://github.com/milvus-io/milvus/issues/42518
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Related to #39173
Like logic in #41919, storage v2 fs shall use complete paths with
bucketName prefix to be compatible with its definition. This PR fills
bucket name from config when creating reader for compaction tasks.
NOTE: the bucket name shall be read from task params config for
compaction task pooling.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Increase insert buffer size from 16MB to 64MB, while keeping delete
buffer size at 16MB.
issue: https://github.com/milvus-io/milvus/issues/42518
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Related to #42530
The cluster id is missing when drop worker drop causing redoing task on
report duplicated task error.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Remove the unlimited logID mechanism and switch to redundantly
allocating a large number of IDs.
issue: https://github.com/milvus-io/milvus/issues/42518
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Remove the hardcoded batchSize of 100,000 and instead trigger a write
every 64MB based on actual data size. This prevents sort stats from
generating excessively large binlog files.
issue: https://github.com/milvus-io/milvus/issues/42400
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
issue: #41976
- make drop partition message as a broadcast message.
- add gc when drop partition message is acked.
- add a call back to handle the broadcast message when ack.
- the ack operation of broadcast message will retry until success.
Signed-off-by: chyezh <chyezh@outlook.com>
This PR fixes a minor typo in a log message in the `PreExecute` method
of `internal/datanode/index/task_index.go`.
Corrected "dimesion" to "dimension".
Signed-off-by: hckex <33862757+hckex@users.noreply.github.com>
This PR fixes a minor typo in the README file of the datanode module.
Corrected "imformation" to "information".
Signed-off-by: hckex <33862757+hckex@users.noreply.github.com>
1. Add global scheduler for datacoord.
2. Define and implement new CreateTask, QueryTask, DropTask interfaces.
3. Refine Import, Compaction, Stats, Index task.
issue: https://github.com/milvus-io/milvus/issues/41123
Co-authored-by: Cai Zhang <cai.zhang@zilliz.com>
Related to #39173
This PR
- Use updated path with bucketName for packedReader
- Update milvus-storage commit to report reader/writer initialization
failure, see also milvus-io/milvus-storage#192
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Close the chunk manager's reader after the import completes to prevent
goroutine leaks.
issues: https://github.com/milvus-io/milvus/issues/41868
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Related to #39173
This PR:
- Upgrade milvus-storage commit to fix filesystem finalized issue
- Add bucket-name as prefix for all fs style access io
- Initial arrow fs on querynodes startup
- Fix timestamp access when loading sealed segment
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
When autoID is enabled, the preimport task estimates row distribution by
evenly dividing the total row count (numRows) across all vchannels:
`estimatedCount = numRows / vchannelNum`.
However, the actual import task hashes real auto-generated IDs to
determine
the target vchannel. This mismatch can lead to inaccurate row
distribution estimation
in such corner cases:
- Importing 1 row into 2 vchannels:
• Preimport: 1 / 2 = 0 → both v0 and v1 are estimated to have 0 rows
• Import: real autoID (e.g., 457975852966809057) hashes to v1
→ actual result: v0 = 0, v1 = 1
To resolve such corner case, we now allocate at least one segment for
each vchannel
when autoID is enabled, ensuring all vchannels are prepared to receive
data even
if no rows are estimated for them.
issue: https://github.com/milvus-io/milvus/issues/41759
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
issue: #41544
- simplify the proto message for flush and create segment.
- simplify the msg handler for flowgraph.
---------
Signed-off-by: chyezh <chyezh@outlook.com>
Make DataNode use compaction parameters from request instead of
configuration.
issue: https://github.com/milvus-io/milvus/issues/41123
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Merge RootCoord, DataCoord And QueryCoord into MixCoord
Make Session into one
issue : https://github.com/milvus-io/milvus/issues/37764
---------
Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>
after the pr merged, we can support to insert, upsert, build index,
query, search in the added field.
can only do the above operates in added field after add field request
complete, which is a sync operate.
compact will be supported in the next pr.
#39718
---------
Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
With concurrenct L0 compaction
(https://github.com/milvus-io/milvus/pull/36816), delta logs might be
written to the same L1 segment, causing logID duplication when using the
incremental beginLogID. This PR removes the beginLogID mechanism and
instead passes a log ID range, where the number of IDs in the range
equals the number of compaction segment binlogs multiplied by an
expansion factor.
issue: https://github.com/milvus-io/milvus/issues/40207
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
- Feat: Support Mix compaction. Covering tests include compatibility and
rollback ability.
- Read v1 segments and compact with v2 format.
- Read both v1 and v2 segments and compact with v2 format.
- Read v2 segments and compact with v2 format.
- Compact with duplicate primary key test.
- Compact with bm25 segments.
- Compact with merge sort segments.
- Compact with no expiration segments.
- Compact with lack binlog segments.
- Compact with nullable field segments.
- Feat: Support Clustering compaction. Covering tests include
compatibility and rollback ability.
- Read v1 segments and compact with v2 format.
- Read both v1 and v2 segments and compact with v2 format.
- Read v2 segments and compact with v2 format.
- Compact bm25 segments with v2 format.
- Compact with memory limit.
- Enhance: Use serdeMap serialize in BuildRecord function to support all
Milvus data types.
related: #39173
Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>