issue: https://github.com/milvus-io/milvus/issues/41435
turns out we have per file binlog size in golang code, by passing it
into segcore we can support eviction in storage v1
Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
issue: #43117, #42966, #43373
- also fix channel balance may not work at 2.6.
- fix error lost at delete path
- add mvcc into s/q log
- change the log level for TestCoordDownSearch
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #43040
This patch introduces a disk file writer that supports Direct IO.
Currently, it is exclusively utilized during the QueryNode load process.
Below is its parameters:
1. `common.diskWriteMode`
This parameter controls the write mode of the local disk, which is used
to write temporary data downloaded from remote storage.
Currently, only QueryNode uses 'common.diskWrite*' parameters. Support
for other components will be added in the future.
The options include 'direct' and 'buffered'. The default value is
'buffered'.
2. `common.diskWriteBufferSizeKb`
Disk write buffer size in KB, only used when disk write mode is
'direct', default is 64KB.
Current valid range is [4, 65536]. If the value is not aligned to 4KB,
it will be rounded up to the nearest multiple of 4KB.
3. `common.diskWriteNumThreads`
This parameter controls the number of writer threads used for disk write
operations. The valid range is [0, hardware_concurrency].
It is designed to limit the maximum concurrency of disk write operations
to reduce the impact on disk read performance.
For example, if you want to limit the maximum concurrency of disk write
operations to 1, you can set this parameter to 1.
The default value is 0, which means the caller will perform write
operations directly without using an additional writer thread pool.
In this case, the maximum concurrency of disk write operations is
determined by the caller's thread pool size.
Both parameters can be updated during runtime.
---------
Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
Ref #42053
This is the first PR for optimizing `LIKE` with ngram inverted index.
Now, only VARCHAR data type is supported and only InnerMatch LIKE
(%xxx%) query is supported.
How to use it:
```
milvus_client = MilvusClient("http://localhost:19530")
schema = milvus_client.create_schema()
...
schema.add_field("content_ngram", DataType.VARCHAR, max_length=10000)
...
index_params = milvus_client.prepare_index_params()
index_params.add_index(field_name="content_ngram", index_type="NGRAM", index_name="ngram_index", min_gram=2, max_gram=3)
milvus_client.create_collection(COLLECTION_NAME, ...)
```
min_gram and max_gram controls how we tokenize the documents. For
example, for min_gram=2 and max_gram=4, we will tokenize each document
with 2-gram, 3-gram and 4-gram.
---------
Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>
issue: #41609
- add env `MILVUS_NODE_ID_FOR_TESTING` to set up a node id for milvus
process.
- add env `MILVUS_CONFIG_REFRESH_INTERVAL` to set up the refresh
interval of paramtable.
- Init paramtable when calling `paramtable.Get()`.
- add new multi process framework for integration test.
- change all integration test into multi process.
- merge some test case into one suite to speed up it.
- modify some test, which need to wait for issue #42966, #42685.
- remove the waittssync for delete collection to fix issue: #42989
---------
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #42028
- limit the concurrency of zstd compression.
- zstd.go modified from
`github.com/apache/arrow/go/v17/parquet/compress/ztsd.go`
- may be related to #42129
Signed-off-by: chyezh <chyezh@outlook.com>
the manager's logging lambda should not capture the pipeline object
this creates a circular reference between the manager and the pipeline
object, making it impossible for both to be GC-ed.
issue: https://github.com/milvus-io/milvus/issues/42581
Signed-off-by: Buqian Zheng <buqianzheng@Buqians-MacBook-Air.local>
Co-authored-by: Buqian Zheng <buqianzheng@Buqians-MacBook-Air.local>
Related to #42489
See also #41435
This PR's main target is to make partial load field list work as caching
layer warmup policy hint. If user specify load field list, the fields
not included in the list shall use `disabled` warmup policy and be able
to lazily loaded if any read op uses them.
The major changes are listed here:
- Pass load list to segcore and creating collection&schema
- Add util functions to check field shall be proactively loaded
- Adapt storage v2 column group, which may lead to hint fail if columns
share same group
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #40532
- start timeticksync at rootcoord if the streaming service is not
available
- stop timeticksync if the streaming service is available
- open a read-only wal if some nodes in cluster is not upgrading to 2.6
- allow to open read-write wal after all nodes in cluster is upgrading
to 2.6
---------
Signed-off-by: chyezh <chyezh@outlook.com>
1. Add global scheduler for datacoord.
2. Define and implement new CreateTask, QueryTask, DropTask interfaces.
3. Refine Import, Compaction, Stats, Index task.
issue: https://github.com/milvus-io/milvus/issues/41123
Co-authored-by: Cai Zhang <cai.zhang@zilliz.com>
https://github.com/milvus-io/milvus/issues/35856
1. Optimizing decay function
2. Since the decay function is larger, the more similar it is, the
smaller the L2/JACCARD/HAMMING metrics scores the more similar they are.
For these metrics, the decay function regenerates new scores.
Signed-off-by: junjie.jiang <junjie.jiang@zilliz.com>
Close the chunk manager's reader after the import completes to prevent
goroutine leaks.
issues: https://github.com/milvus-io/milvus/issues/41868
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Related to #39173
This PR:
- Upgrade milvus-storage commit to fix filesystem finalized issue
- Add bucket-name as prefix for all fs style access io
- Initial arrow fs on querynodes startup
- Fix timestamp access when loading sealed segment
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #41544
- add lock interceptor into wal.
- use recovery and shardmanager to replace the original implementation
of segment assignment.
- remove redundant implementation and unittest.
- remove redundant proto definition.
- use 2 streamingnode in e2e.
---------
Signed-off-by: chyezh <chyezh@outlook.com>
issue: https://github.com/milvus-io/milvus/issues/41435
this PR is based on https://github.com/milvus-io/milvus/pull/41436.
Improvements include:
- Lazy Load support for Storage v1
- Use Low/High watermark to control eviction
- Caching Layer related config changes
- Removed ChunkCache related configs and code in golang
- Add `PinAllCells` helper method to CacheSlot class
- Modified ValueAt, RawAt, PrimitiveRawAt to Bulk version, to reduce
caching layer overhead
- Removed some unclear templated bulk_subscript methods
- CachedSearchIterator to store PinWrapper when searching on
ChunkedColumn, and removed unused contrustor.
---------
Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>