issue: #43072, #43289
- manage the schema version at recovery storage.
- update the schema when creating collection or alter schema.
- get schema at write buffer based on version.
- recover the schema when upgrading from 2.5.
---------
Signed-off-by: chyezh <chyezh@outlook.com>
Previous code uses diskSegmentMaxSize if and only if all of the
collection's vector fields are indexed with DiskANN index.
When introducing sparse vectors, since sparse vector cannot be indexed
with DiskANN index, collections with both dense and sparse vectors will
use maxSize instead.
This PR changes the requirments of using diskSegmentMaxSize to all dense
vectors are indexed with DiskANN indexs, ignoring sparse vector fields.
See also: #43193
Signed-off-by: yangxuan <xuan.yang@zilliz.com>
- Introduce dynamic buffer sizing to avoid generating small binlogs
during import
- Refactor import slot calculation based on CPU and memory constraints
- Implement dynamic pool sizing for sync manager and import tasks
according to CPU core count
issue: https://github.com/milvus-io/milvus/issues/43131
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Ref #42053
This is the first PR for optimizing `LIKE` with ngram inverted index.
Now, only VARCHAR data type is supported and only InnerMatch LIKE
(%xxx%) query is supported.
How to use it:
```
milvus_client = MilvusClient("http://localhost:19530")
schema = milvus_client.create_schema()
...
schema.add_field("content_ngram", DataType.VARCHAR, max_length=10000)
...
index_params = milvus_client.prepare_index_params()
index_params.add_index(field_name="content_ngram", index_type="NGRAM", index_name="ngram_index", min_gram=2, max_gram=3)
milvus_client.create_collection(COLLECTION_NAME, ...)
```
min_gram and max_gram controls how we tokenize the documents. For
example, for min_gram=2 and max_gram=4, we will tokenize each document
with 2-gram, 3-gram and 4-gram.
---------
Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>
In corner cases where there are many fields but only a small number of
rows to import, the default preallocated IDs may be insufficient. To
address this, consider the number of fields when preallocating IDs.
issue: https://github.com/milvus-io/milvus/issues/42518
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
This PR moves the deltalog file count check inside hasTooManyDeletions
check. Unifies the logic on checking if a segment has too many deletions
including: delta log count, deleted rows ratio and deltalog size.
This change removes several uncessary traverse through segment's binlogs
and deltalogs. And add more clear trigger logs
Signed-off-by: yangxuan <xuan.yang@zilliz.com>
1. Optimize the import process: skip subsequent steps and mark the task
as complete if the number of imported rows is 0.
2. Improve import integration tests:
a. Add a test to verify that autoIDs are not duplicated
b. Add a test for the corner case where all data is deleted
c. Shorten test execution time
3. Enhance import logging:
a. Print imported segment information upon completion
b. Include file name in failure logs
issue: https://github.com/milvus-io/milvus/issues/42488,
https://github.com/milvus-io/milvus/issues/42518
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Defer clone and decompress operations until just before removing from
meta, instead of eagerly applying them to all segments in advance.
issue: https://github.com/milvus-io/milvus/issues/42592
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
issue: #42498
- fix: sealed segment cannot be flushed after upgrading
- fix: get mvcc panic when upgrading
- ignore the L0 segment when graceful stop of querynode.
---------
Signed-off-by: chyezh <chyezh@outlook.com>
Remove the unlimited logID mechanism and switch to redundantly
allocating a large number of IDs.
issue: https://github.com/milvus-io/milvus/issues/42518
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
issue: #42457
Replace unsafe ExpectedCalls modification with atomic.Int32 state
tracking to avoid race conditions in concurrent test execution. Changes
include:
- Use atomic counters instead of direct mock ExpectedCalls manipulation
- Add RunAndReturn with atomic state transitions for thread safety
- Remove github.com/samber/lo dependency
This prevents data race when mock framework and test goroutines access
ExpectedCalls concurrently.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #41976
- make drop partition message as a broadcast message.
- add gc when drop partition message is acked.
- add a call back to handle the broadcast message when ack.
- the ack operation of broadcast message will retry until success.
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #42162
- enhance: add read ahead buffer size issue #42129
- fix: rocksmq consumer's close operation may get stucked
- fix: growing segment from old arch is not flushed after upgrading
---------
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #42147
- error of sync task should be returned if error is returned to avoid
checkpoint is push forward.
- fix up node id checker of UpdateChannelCheckpoint in streaming.
Signed-off-by: chyezh <chyezh@outlook.com>
Return `false` in the `Process()` function for `executing` or
`pipelining` state `l0Compaction`. This prevents the `l0Compaction` task
from being removed from the `CompactionInspector`'s executing queue,
thereby avoiding concurrent execution of `l0Compaction` and `Stats`.
issue: https://github.com/milvus-io/milvus/issues/42008
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>