- Introduce dynamic buffer sizing to avoid generating small binlogs
during import
- Refactor import slot calculation based on CPU and memory constraints
- Implement dynamic pool sizing for sync manager and import tasks
according to CPU core count
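A rough sketch of the sizing idea above: the slot count is bounded by both CPU and memory, and the pool size scales with the core count. The function names, per-slot costs, and multiplier are hypothetical and not taken from the Milvus code.

```python
def import_slots(cpu_cores: int, memory_bytes: int,
                 cores_per_slot: int = 2,
                 bytes_per_slot: int = 8 * 1024**3) -> int:
    """Return the number of import slots allowed by both CPU and memory.

    cores_per_slot and bytes_per_slot are assumed costs, for illustration only.
    """
    cpu_limit = cpu_cores // cores_per_slot
    mem_limit = memory_bytes // bytes_per_slot
    # The tighter constraint wins; always keep at least one slot.
    return max(1, min(cpu_limit, mem_limit))


def pool_size(cpu_cores: int, multiplier: int = 2) -> int:
    """Scale a worker pool with the core count (multiplier is an assumption)."""
    return max(1, cpu_cores * multiplier)
```

For example, a node with 16 cores but only 16 GiB of memory would be limited by memory rather than CPU under these assumed costs.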
issue: https://github.com/milvus-io/milvus/issues/43131
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Ref #42053
This is the first PR for optimizing `LIKE` with an ngram inverted index.
Currently, only the VARCHAR data type and inner-match LIKE queries
(`%xxx%`) are supported.
How to use it:
```python
from pymilvus import MilvusClient, DataType

milvus_client = MilvusClient("http://localhost:19530")
schema = milvus_client.create_schema()
...
schema.add_field("content_ngram", DataType.VARCHAR, max_length=10000)
...
index_params = milvus_client.prepare_index_params()
index_params.add_index(field_name="content_ngram", index_type="NGRAM", index_name="ngram_index", min_gram=2, max_gram=3)
milvus_client.create_collection(COLLECTION_NAME, ...)
```
`min_gram` and `max_gram` control how documents are tokenized. For
example, with min_gram=2 and max_gram=4, each document is tokenized
into 2-grams, 3-grams and 4-grams.
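To make the tokenization concrete, here is a minimal character-level sketch. The `ngrams` helper is hypothetical and only illustrates the idea; the tokenizer inside the index may differ in details such as how it handles bytes versus characters.

```python
from typing import List


def ngrams(text: str, min_gram: int, max_gram: int) -> List[str]:
    """Return every substring of length min_gram..max_gram, shortest first."""
    tokens = []
    for n in range(min_gram, max_gram + 1):
        # Slide a window of width n across the text.
        for i in range(len(text) - n + 1):
            tokens.append(text[i:i + n])
    return tokens


print(ngrams("abcd", 2, 4))
# ['ab', 'bc', 'cd', 'abc', 'bcd', 'abcd']
```

An inner-match pattern such as `%bc%` can then be answered by looking up the gram `bc` instead of scanning every document.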
---------
Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>
In corner cases where there are many fields but only a small number of
rows to import, the default preallocated IDs may be insufficient. To
address this, consider the number of fields when preallocating IDs.
issue: https://github.com/milvus-io/milvus/issues/42518
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
This PR moves the deltalog file count check inside the
hasTooManyDeletions check, unifying the logic that decides whether a
segment has too many deletions: deltalog count, deleted rows ratio and
deltalog size.
This change removes several unnecessary traversals of a segment's
binlogs and deltalogs, and adds clearer trigger logs.
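A minimal sketch of what a unified check of this kind could look like. The `SegmentStats` type, the threshold values, and the returned reason strings are assumptions for illustration, not the actual Milvus implementation.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class SegmentStats:
    num_rows: int
    deleted_rows: int
    deltalog_count: int
    deltalog_bytes: int


def has_too_many_deletions(seg: SegmentStats,
                           max_deltalog_count: int = 200,
                           max_deleted_ratio: float = 0.2,
                           max_deltalog_bytes: int = 16 * 1024**2
                           ) -> Tuple[bool, str]:
    """Check all three deletion criteria in one place.

    Returns (triggered, reason) so the caller can log which rule fired.
    All thresholds are hypothetical defaults.
    """
    if seg.deltalog_count > max_deltalog_count:
        return True, "deltalog count"
    ratio = seg.deleted_rows / seg.num_rows if seg.num_rows else 0.0
    if ratio > max_deleted_ratio:
        return True, "deleted rows ratio"
    if seg.deltalog_bytes > max_deltalog_bytes:
        return True, "deltalog size"
    return False, ""
```

Because the statistics are gathered once and passed in, the three criteria share a single pass over the segment's logs.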
Signed-off-by: yangxuan <xuan.yang@zilliz.com>
1. Optimize the import process: skip subsequent steps and mark the task
as complete if the number of imported rows is 0.
2. Improve import integration tests:
a. Add a test to verify that autoIDs are not duplicated
b. Add a test for the corner case where all data is deleted
c. Shorten test execution time
3. Enhance import logging:
a. Print imported segment information upon completion
b. Include file name in failure logs
issue: https://github.com/milvus-io/milvus/issues/42488,
https://github.com/milvus-io/milvus/issues/42518
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Defer clone and decompress operations until just before removing from
meta, instead of eagerly applying them to all segments in advance.
issue: https://github.com/milvus-io/milvus/issues/42592
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
issue: #42498
- fix: sealed segment cannot be flushed after upgrading
- fix: get mvcc panic when upgrading
- ignore L0 segments during graceful stop of querynode.
---------
Signed-off-by: chyezh <chyezh@outlook.com>
Remove the unlimited logID mechanism and switch to redundantly
allocating a large number of IDs.
issue: https://github.com/milvus-io/milvus/issues/42518
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
issue: #42457
Replace unsafe ExpectedCalls modification with atomic.Int32 state
tracking to avoid race conditions in concurrent test execution. Changes
include:
- Use atomic counters instead of direct mock ExpectedCalls manipulation
- Add RunAndReturn with atomic state transitions for thread safety
- Remove github.com/samber/lo dependency
This prevents data races when the mock framework and test goroutines
access ExpectedCalls concurrently.
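The change itself is Go test code. As an analogous illustration in Python, the sketch below drives a mock's return value from a lock-protected counter instead of mutating the mock's expectations while other threads call it. `AtomicCounter`, `make_mock`, and the state names are hypothetical.

```python
import threading
from unittest.mock import Mock


class AtomicCounter:
    """A small thread-safe counter standing in for Go's atomic.Int32."""

    def __init__(self) -> None:
        self._value = 0
        self._lock = threading.Lock()

    def increment(self) -> int:
        with self._lock:
            self._value += 1
            return self._value


def make_mock() -> Mock:
    calls = AtomicCounter()

    def get_state() -> str:
        # First call reports "loading"; every later call reports "loaded".
        return "loading" if calls.increment() == 1 else "loaded"

    mock_client = Mock()
    # The state transition lives in the callback, so the mock's
    # configuration is never modified after setup.
    mock_client.get_state.side_effect = get_state
    return mock_client
```

The mock is configured once before any thread starts, and only the counter changes afterwards.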
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #41976
- make the drop partition message a broadcast message.
- add gc when the drop partition message is acked.
- add a callback to handle the broadcast message on ack.
- the ack operation of a broadcast message retries until it succeeds.
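The retry-until-success behavior can be sketched as below. `ack_until_success`, its parameters, and the fixed backoff are hypothetical; the real code may order the callback and the ack differently.

```python
import time
from typing import Callable


def ack_until_success(ack: Callable[[], None],
                      on_acked: Callable[[], None],
                      backoff_seconds: float = 0.1,
                      sleep: Callable[[float], None] = time.sleep) -> int:
    """Retry ack() until it stops raising, then run the callback once.

    Returns the number of attempts that were needed.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            ack()
        except Exception:
            # Assumed policy: wait a fixed interval and try again.
            sleep(backoff_seconds)
            continue
        # The ack succeeded; run follow-up work such as gc.
        on_acked()
        return attempts
```

The `sleep` parameter is injectable so tests can run without waiting.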
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #42162
- enhance: add read ahead buffer size issue #42129
- fix: rocksmq consumer's close operation may get stuck
- fix: growing segment from old arch is not flushed after upgrading
---------
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #42147
- return the sync task's error when one occurs, so that the checkpoint
is not pushed forward.
- fix the node ID checker of UpdateChannelCheckpoint in streaming.
Signed-off-by: chyezh <chyezh@outlook.com>
Return `false` from the `Process()` function when an `l0Compaction` task
is in the `executing` or `pipelining` state. This prevents the task
from being removed from the `CompactionInspector`'s executing queue,
thereby avoiding concurrent execution of `l0Compaction` and `Stats`.
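A simplified model of the queue behavior described above, assuming the inspector drops a task only when its process step reports completion. The class and function names here are hypothetical stand-ins, not the Go types.

```python
from typing import List


class L0CompactionTask:
    def __init__(self, state: str) -> None:
        self.state = state

    def process(self) -> bool:
        """Return True only when the task is finished and can be dropped."""
        if self.state in ("pipelining", "executing"):
            # Still in flight: stay in the executing queue.
            return False
        return True


def inspect(executing: List[L0CompactionTask]) -> List[L0CompactionTask]:
    """Keep every task whose process() returned False."""
    return [task for task in executing if not task.process()]
```

While the task stays in the queue, the inspector still sees it as running, so conflicting work is not scheduled alongside it.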
issue: https://github.com/milvus-io/milvus/issues/42008
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Rename `imeta` to `importMeta` to improve readability, and improve
import-related context usage.
issue: https://github.com/milvus-io/milvus/issues/41123
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
1. Add global scheduler for datacoord.
2. Define and implement new CreateTask, QueryTask, DropTask interfaces.
3. Refine Import, Compaction, Stats, Index task.
issue: https://github.com/milvus-io/milvus/issues/41123
Co-authored-by: Cai Zhang <cai.zhang@zilliz.com>