issue: #42688
- The channel cp is dropped by garbage collector
- The channel is dropped and the cp is marked as math.Uint64
- If we drop it here, the update channel checkpoints will write the
dirty cp back.
Signed-off-by: chyezh <chyezh@outlook.com>
Ref https://github.com/milvus-io/milvus/issues/42148https://github.com/milvus-io/milvus/pull/42406 impls the segcore part of
storage for handling with VectorArray.
This PR:
1. impls the go part of storage for VectorArray
2. impls the collection creation with StructArrayField and VectorArray
3. insert and retrieve data from the collection.
---------
Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <u6748471@anu.edu.au>
issue: #43261
`promise->setValue(folly::Unit());` may run callbacks inline and some of
them may attempt to grab `mtx_`. So we should not call
`promise->setValue(folly::Unit());` while holding the lock.
---------
Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
This parameter determines whether the returned value should be a copy or
a reference from the arrow array. The updates enhance memory management
and provide more control over data handling during deserialization.
See #43186
---------
Signed-off-by: Ted Xu <ted.xu@zilliz.com>
Related to #43261
Read error with catched in `LoadWithStrategy`. Caller could not detect
read failure when some error occurred. This patch make
`LoadWithStrategy` throw ex instead of swallowing it.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
We’ve frequently observed data loss caused by broken mutual exclusion in
compaction tasks. This PR introduces a post-check: before modifying
metadata upon compaction task completion, it verifies the state of the
input segments. If any input segment has been dropped, the compaction
task will be marked as failed.
issue: https://github.com/milvus-io/milvus/issues/43513
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
See: #43186
In this PR:
1. Flush renamed to FlushChunk, while a new Flush primitive is
introduced to serialize values to records.
2. Segment mapping in clustering compaction now process data by records
instead of values, it calls flush to all buffers after each record is
processed.
Signed-off-by: Ted Xu <ted.xu@zilliz.com>
There are some unstable cases in go sdk e2e cases, which used default
bounded consistency level. This patch make these cases use strong level
to avoid unstable test results
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #43088
issue: #43038
The current loading process:
* When loading an index, we first download the index files into a list
of buffers, say A
* then constructing(copying) them into a vector of FieldDatas(each file
is a FieldData), say B
* assembles them together as a huge BinarySet, say C
* lastly, copy into the actual index data structure, say D
The problem:
* We can see that, after each step, we don't need the data in previous
step.
* But currently, we release the memory of A, B, C only after we have
finished constructing D
* This leads to a up to 4x peak memory usage comparing with the raw
index size, during the loading process
* This PR allows timely releasing of B after we assembled C. So after
this PR, the peak memory usage during loading will be up to 3x of the
raw index size.
I will create another PR to release A after we created B, that seems
more complicated and need more work.
Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
Realted #43520
Datanode may have memory leakage when reader returns error. In
previously mention issue, datanodes got OOM killed due to continueous
error in read path.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
related: #42417
- update the isValidUsername function to accept dots and hyphens in
addition to letters, digits, and underscores
- this change improves compatibility with common username formats and
addresses feedback in issue #42417
Signed-off-by: Jean-Francois Weber-Marx <jfwm@hotmail.com>
Signed-off-by: Jean-Francois Weber-Marx <jf.webermarx@criteo.com>
issue: #42884
Fixes an issue where delete records for a segment are lost from the
delete buffer if `load segment` execution on the delegator is too slow,
causing `syncTargetVersion` or other cleanup operations to clear them
prematurely.
Changes include:
- Introduced `Pin` and `Unpin` methods in `DeleteBuffer` interface and
its implementations (`doubleCacheBuffer`, `listDeleteBuffer`).
- Added a `pinnedTimestamps` map to track timestamps protected from
cleanup by specific segments.
- Modified `LoadSegments` in `shardDelegator` to `Pin` relevant segment
delete records before loading and `Unpin` them afterwards.
- Added `isPinned` check in `UnRegister` and `TryDiscard` methods of
`listDeleteBuffer` to skip cleanup if corresponding timestamps are
pinned.
- Added comprehensive unit tests for `Pin`, `Unpin`, and `isPinned`
functionality, covering basic, multiple pins, concurrent, and edge
cases.
This ensures the integrity of delete records by preventing their
premature removal from the delete buffer during segment loading.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Related to #43522
Currently, passing partial schema to storage v2 packed reader may
trigger SEGV during clustering compaction unit test.
This patch implement `NeededFields` differently in each `RecordReader`
imlementation. For now, v2 will implemented as no-op. This will be
supported after packed reader support this API.
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
related issue #40698
1. add diskann and hnsw index test
2. update gen_row_data and gen_column_data functions
---------
Signed-off-by: yanliang567 <yanliang.qiao@zilliz.com>
issue: #43072, #43289
- manage the schema version at recovery storage.
- update the schema when creating collection or alter schema.
- get schema at write buffer based on version.
- recover the schema when upgrading from 2.5.
---------
Signed-off-by: chyezh <chyezh@outlook.com>
1. Modify the binlog reader to stop reading a fixed 4096 rows and
instead use the calculated bufferSize to avoid generating small binlogs.
2. Use a fixed bufferSize (32MB) for the Parquet reader to prevent OOM.
issue: https://github.com/milvus-io/milvus/issues/43387
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
currently we multiplied the requesting size when adding to loading, but
did not do so when estimating projected usage.
issue: #43088
Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
Related to #43113
This PR:
- Change member of FieldIndex from `FieldMeta &` to needed `DataType`
and dim member resolving dangling reference after schema change
- Add double check after acquiring lock to reduce multiple assignment
- Change `auto schema` to `auto& schema` to reduce schema copy
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
The root cause of the issue lies in the fact that when a sealed segment
contains multiple row groups, the get_cells function may receive
unordered cids. This can result in row groups being written into
incorrect cells during data retrieval.
Previously, this issue was hard to reproduce because the old Storage V2
writer had a bug that caused it to write row groups larger than 1MB.
These large row groups could lead to uncontrolled memory usage and
eventually an OOM (Out of Memory) error. Additionally, compaction
typically produced a single large row group, which avoided the incorrect
cell-filling issue during query execution.
related: https://github.com/milvus-io/milvus/issues/43388,
https://github.com/milvus-io/milvus/issues/43372,
https://github.com/milvus-io/milvus/issues/43464, #43446, #43453
---------
Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
This patch makes `PackedReader` return EOF when try to calling
`ReadNext` after closing it.
This behavior make importv2.binlog reader could retry after EOF reached
and act normally.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
This PR fill default value for `PackedBinlogRecordWriter` timestamp
range so target segment meta will contains correct timestamp range
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>