issue: #44212
Implement search/query storage usage statistics on the Go side (result
reduce); for now, storage usage is recorded only in the vector search C++
path. The query C++ path will be covered in upcoming PRs.
---------
Signed-off-by: chasingegg <chao.gao@zilliz.com>
Signed-off-by: marcelo.chen <marcelo.chen@zilliz.com>
Co-authored-by: marcelo.chen <marcelo.chen@zilliz.com>
Related to #39173
According to the current design, the datanode shall create the fs from the
storage config in the request instead of using the singleton fs. This PR
upgrades milvus-storage and makes the packed reader/writer compose a new fs
from the storage config.
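As a rough sketch of the intended wiring (the `StorageConfig`, `FileSystem`,
and `MakeFileSystem` names below are assumptions, not the actual
milvus-storage API):
```cpp
#include <memory>
#include <string>

// Assumed shapes, for illustration only.
struct StorageConfig {
    std::string address;
    std::string bucket_name;
    std::string access_key_id;
    std::string access_key_value;
};
struct FileSystem {};

// Hypothetical factory composing a fs from the config carried in the request.
std::shared_ptr<FileSystem> MakeFileSystem(const StorageConfig& cfg) {
    // Stub: the real factory would build an S3/MinIO/local client from cfg.
    (void)cfg;
    return std::make_shared<FileSystem>();
}

struct PackedWriterLike {
    // Each writer builds its own fs from the request's storage config,
    // instead of grabbing a process-wide singleton.
    explicit PackedWriterLike(const StorageConfig& cfg)
        : fs_(MakeFileSystem(cfg)) {
    }
    std::shared_ptr<FileSystem> fs_;
};
```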
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: https://github.com/milvus-io/milvus/issues/42148
Optimized from
Go VectorArray → VectorArray Proto → Binary → C++ VectorArray Proto →
C++ VectorArray local impl → Memory
to
Go VectorArray → Arrow ListArray → Memory
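For illustration, building an Arrow `ListArray` directly from float vectors
with the Arrow C++ builders could look like the sketch below (a minimal
example of the new path, not the PR's actual conversion code):
```cpp
#include <arrow/api.h>

#include <memory>
#include <vector>

// Build a ListArray of float vectors directly; each Append() starts a new
// list entry and AppendValues() writes that vector's elements.
arrow::Result<std::shared_ptr<arrow::Array>> BuildVectorList(
    const std::vector<std::vector<float>>& vectors) {
    auto pool = arrow::default_memory_pool();
    auto value_builder = std::make_shared<arrow::FloatBuilder>(pool);
    arrow::ListBuilder list_builder(pool, value_builder);
    for (const auto& vec : vectors) {
        ARROW_RETURN_NOT_OK(list_builder.Append());
        ARROW_RETURN_NOT_OK(value_builder->AppendValues(vec));
    }
    std::shared_ptr<arrow::Array> out;
    ARROW_RETURN_NOT_OK(list_builder.Finish(&out));
    return out;
}
```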
---------
Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
1. Enable Milvus to read cipher configs
2. Enable the cipher plugin in the binlog reader and writer (sketched below)
3. Add a testCipher for unit tests
4. Support pooling for the datanode
5. Add encryption in storage v2
See also: #40321
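A purely illustrative sketch of item 2; the `Cipher` and `BinlogWriter`
shapes below are assumptions, not the actual plugin interface:
```cpp
#include <string>

// Payload bytes pass through the cipher plugin before the binlog hits object
// storage, and back through it on read.
struct Cipher {
    virtual ~Cipher() = default;
    virtual std::string Encrypt(const std::string& plain) = 0;
    virtual std::string Decrypt(const std::string& encrypted) = 0;
};

struct BinlogWriter {
    void Write(const std::string& bytes) {
        // Stub: the real writer serializes bytes into the binlog file.
        (void)bytes;
    }
};

void WriteBinlog(BinlogWriter& writer, Cipher* cipher,
                 const std::string& payload) {
    // Encrypt only when a cipher is configured; plaintext path is unchanged.
    writer.Write(cipher != nullptr ? cipher->Encrypt(payload) : payload);
}
```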
---------
Signed-off-by: yangxuan <xuan.yang@zilliz.com>
#42032
Also fixes the cacheoptfield method to work in storage v2.
Also changes the sparse-related interface for the Knowhere version bump
(#43974).
Also includes https://github.com/milvus-io/milvus/pull/44046 for the lost
metric issue.
---------
Signed-off-by: chasingegg <chao.gao@zilliz.com>
Signed-off-by: marcelo.chen <marcelo.chen@zilliz.com>
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Co-authored-by: marcelo.chen <marcelo.chen@zilliz.com>
Co-authored-by: Congqi Xia <congqi.xia@zilliz.com>
Ref https://github.com/milvus-io/milvus/issues/42148
This PR supports creating an index on vector arrays (currently only for
`DataType.FLOAT_VECTOR`) and searching on it.
The only index type supported in this PR is `EMB_LIST_HNSW`, and the only
metric type is `MAX_SIM`.
Usage:
```python
from pymilvus import DataType, MilvusClient

milvus_client = MilvusClient("xxx:19530")
schema = milvus_client.create_schema(enable_dynamic_field=True, auto_id=True)
...
struct_schema = milvus_client.create_struct_array_field_schema("struct_array_field")
...
struct_schema.add_field("struct_float_vec", DataType.ARRAY_OF_VECTOR, element_type=DataType.FLOAT_VECTOR, dim=128, max_capacity=1000)
...
schema.add_struct_array_field(struct_schema)
index_params = milvus_client.prepare_index_params()
index_params.add_index(field_name="struct_float_vec", index_type="EMB_LIST_HNSW", metric_type="MAX_SIM", index_params={"nlist": 128})
...
milvus_client.create_index(COLLECTION_NAME, schema=schema, index_params=index_params)
```
Note: this PR uses `Lims` to convey the offsets of each vector array to
Knowhere. Vectors from multiple vector arrays are concatenated, so offsets
are needed to specify which vectors belong to which vector array.
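A small illustration of the `lims` layout (dims and counts invented for the
example):
```cpp
#include <cstdint>
#include <vector>

int main() {
    // Three vector arrays holding 2, 3 and 1 vectors respectively, dim = 4.
    const int64_t dim = 4;
    std::vector<float> concatenated((2 + 3 + 1) * dim);  // vectors back to back
    // lims[i]..lims[i+1] delimit the vectors of the i-th vector array, so
    // array i starts at concatenated.data() + lims[i] * dim.
    std::vector<int64_t> lims = {0, 2, 5, 6};
    (void)concatenated;
    (void)lims;
    return 0;
}
```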
---------
Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>
Related to #43936
This PR:
- Use `folly::SharedMutex` instead of `std::shared_mutex` to prevent writer
starvation
- Use `folly::SharedMutex::WriteHolder`/`ReadHolder` instead of
`std::shared_lock` and `std::unique_lock` for better performance (see the
sketch below)
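For reference, the holder usage pattern looks roughly like this (a minimal
sketch, not an excerpt of the changed call sites):
```cpp
#include <folly/SharedMutex.h>

folly::SharedMutex mutex_;
int shared_state_ = 0;

int Read() {
    folly::SharedMutex::ReadHolder guard(mutex_);   // shared (reader) access
    return shared_state_;
}

void Write(int value) {
    folly::SharedMutex::WriteHolder guard(mutex_);  // exclusive (writer) access
    shared_state_ = value;
}
```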
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Related to #43660
This patch reduces the unwanted offset & ts entries that share the same
timestamp as the delete record. Under a large volume of upserts, these false
hits could significantly increase memory usage while applying deletes.
A next step could be passing a callback to `search_pk_func_` to handle hit
entries in a streaming fashion.
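A hedged sketch of the filtering idea with illustrative names (not the
actual segcore code):
```cpp
#include <cstdint>
#include <utility>
#include <vector>

// When applying a delete at del_ts, a PK hit inserted at that same timestamp
// (the upsert's own insert) is a false hit and must not be collected.
std::vector<std::pair<int64_t, uint64_t>>  // (offset, insert timestamp)
CollectHits(const std::vector<std::pair<int64_t, uint64_t>>& pk_hits,
            uint64_t del_ts) {
    std::vector<std::pair<int64_t, uint64_t>> deleted;
    for (const auto& [offset, ts] : pk_hits) {
        if (ts < del_ts) {  // keep only entries strictly older than the delete
            deleted.emplace_back(offset, ts);
        }
    }
    return deleted;
}
```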
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Related to #43592
When there are many delete records, searching PKs one by one results in many
`PinCells` calls, each of which creates lots of futures.
This patch makes PK search execute in batches to reduce this cost.
It also adds a `GetAllChunks` API that utilizes `PinAllCells` to reduce the
number of pins.
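Roughly, the batching looks like this (`Slot`, `GetAllChunks`, and `SearchPk`
usage here are illustrative, not the exact signatures):
```cpp
#include <vector>

// Pin every needed chunk once up front, then probe all PKs against the
// pinned chunks, instead of paying one pin round trip (plus its futures)
// per PK.
template <typename Slot, typename PK>
void BatchSearchPk(Slot& slot, const std::vector<PK>& pks) {
    auto chunks = slot.GetAllChunks();  // internally uses PinAllCells
    for (const auto& pk : pks) {
        for (auto& chunk : chunks) {
            chunk.SearchPk(pk);         // no per-PK pinning cost
        }
    }
}
```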
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Related to #43113
This PR:
- Change the member of FieldIndex from `FieldMeta &` to the needed
`DataType` and dim members, resolving a dangling reference after schema
change
- Add a double check after acquiring the lock to avoid repeated assignment
(see the sketch below)
- Change `auto schema` to `auto& schema` to avoid copying the schema
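A minimal sketch of the double-check in the second bullet; `Index`,
`BuildIndex`, and the members here are illustrative stand-ins for the real
ones:
```cpp
#include <memory>
#include <mutex>
#include <shared_mutex>

struct Index {};

std::shared_ptr<Index> BuildIndex(int data_type, int dim) {
    // Stub: the real code builds a field index for the given type and dim.
    (void)data_type;
    (void)dim;
    return std::make_shared<Index>();
}

struct FieldIndexHolder {
    std::shared_ptr<Index> GetOrBuild() {
        {
            std::shared_lock lock(mutex_);
            if (index_) return index_;   // fast path: already built
        }
        std::unique_lock lock(mutex_);
        if (!index_) {                   // double check after acquiring the lock
            index_ = BuildIndex(data_type_, dim_);
        }
        return index_;
    }

    std::shared_mutex mutex_;
    std::shared_ptr<Index> index_;
    int data_type_ = 0;
    int dim_ = 0;
};
```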
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
The root cause of the issue is that when a sealed segment contains multiple
row groups, the get_cells function may receive unordered cids. This can
result in row groups being written into incorrect cells during data
retrieval.
Previously, this issue was hard to reproduce because the old Storage V2
writer had a bug that caused it to write row groups larger than 1 MB. These
large row groups could lead to uncontrolled memory usage and eventually an
OOM (out-of-memory) error. Additionally, compaction typically produced a
single large row group, which avoided the incorrect cell-filling issue
during query execution.
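The fix class can be illustrated as follows (names are invented; the point
is that cells must be placed by cid, not by arrival order):
```cpp
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Fetched cells must land in the slot of their cid, because get_cells may
// deliver results for the requested cids in any order.
template <typename Cell>
std::vector<Cell> PlaceByCid(std::vector<std::pair<int64_t, Cell>> fetched,
                             const std::vector<int64_t>& cids) {
    std::unordered_map<int64_t, Cell> by_cid;
    for (auto& [cid, cell] : fetched) {
        by_cid[cid] = std::move(cell);
    }
    std::vector<Cell> out(cids.size());
    for (size_t i = 0; i < cids.size(); ++i) {
        out[i] = std::move(by_cid[cids[i]]);  // right slot regardless of order
    }
    return out;
}
```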
related: https://github.com/milvus-io/milvus/issues/43388,
https://github.com/milvus-io/milvus/issues/43372,
https://github.com/milvus-io/milvus/issues/43464, #43446, #43453
---------
Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
issue: #41435
This is to prevent AI tools from interpreting our exception throwing as a
dangerous PANIC operation that terminates the program.
Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
Related to #43113
When a schema change happens, inserts shall not happen; otherwise:
- A data race may occur, causing insertion failure
- The data schema may become inconsistent
This PR adds a shared_lock to prevent this data race, as sketched below.
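A sketch of the locking scheme with illustrative names (the PR itself may
use a different mutex type):
```cpp
#include <memory>
#include <mutex>
#include <shared_mutex>

struct Schema {};

struct SegmentLike {
    void Insert(/* insert data */) {
        std::shared_lock lock(schema_mutex_);  // inserts may run concurrently
        // ... write rows under the current schema ...
    }

    void ChangeSchema(std::shared_ptr<Schema> new_schema) {
        std::unique_lock lock(schema_mutex_);  // waits for in-flight inserts
        schema_ = std::move(new_schema);
    }

    std::shared_mutex schema_mutex_;
    std::shared_ptr<Schema> schema_;
};
```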
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Related to #39178
This PR adds logs for segment schema change operations.
It also fixes the nit comments from PR #42490.
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Fix issues in end-to-end tests:
1. **Split column groups based on schema**, rather than estimating by
average chunk row size. **Ensure column group consistency within a
segment**, to avoid errors caused by loading multiple column group
chunks simultaneously.
2. **Use sorted segmentId** when generating the stats binlog path, to
ensure consistent and correct file path resolution.
3. **Determine field IDs as follows** (see the sketch after this list):
For multi-column column groups, retrieve the field ID list from
metadata.
For single-column column groups, use the column group ID directly as the
field ID.
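A hedged sketch of rule 3, with assumed names:
```cpp
#include <cstdint>
#include <vector>

struct ColumnGroup {
    int64_t id;
    int num_columns;
    std::vector<int64_t> metadata_field_ids;  // set for multi-column groups
};

std::vector<int64_t> ResolveFieldIds(const ColumnGroup& group) {
    if (group.num_columns > 1) {
        return group.metadata_field_ids;  // multi-column: from metadata
    }
    return {group.id};                    // single-column: group id is the field id
}
```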
related: #39173
fix: #42862
---------
Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
Related to #42856
Data under a mmap-ed growing segment shall be treated respecting the
growingMmap setting. Otherwise, the varchar data type could hit a logic
error.
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Related to #42773
The growing segment fills all known meta into the `InsertRecord` data, so
even when a field is missing, its field data still exists.
This PR updates the logic that runs when a growing segment finishes loading
to check whether the field data is empty instead of whether it exists.
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Related to #42640
The search/query plan held a reference to the schema, which could be
destructed after a schema change. This PR makes the plan hold a shared ptr
to it, fixing the dangling-reference problem under concurrent read & schema
change (see the sketch below).
This PR also removes the field binlog check when loading an index, since an
old segment with an old schema may lack some binlogs.
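The ownership change, sketched with illustrative types:
```cpp
#include <memory>
#include <utility>

struct Schema {};

struct Plan {
    std::shared_ptr<Schema> schema;  // was: const Schema& schema
};

Plan CreatePlan(std::shared_ptr<Schema> schema) {
    // The plan co-owns the schema, so a concurrent schema change cannot
    // destroy it while the plan is still in use.
    return Plan{std::move(schema)};
}
```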
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Ref https://github.com/milvus-io/milvus/issues/42148
This PR mainly enables segcore to support array of vector (read and write,
but not indexing). Currently, only float vector is supported as the element
type.
---------
Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>
Related to #42489
See also #41435
This PR's main target is to make the partial load field list work as a
caching layer warmup policy hint. If the user specifies a load field list,
the fields not included in the list shall use the `disabled` warmup policy
and be lazily loaded when any read op uses them (see the sketch after this
list).
The major changes are listed here:
- Pass the load list to segcore when creating the collection & schema
- Add util functions to check whether a field shall be proactively loaded
- Adapt the storage v2 column group, where the hint may fail if columns
share the same group
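A hedged sketch of the policy decision; `WarmupPolicy` and the helper below
are illustrative names, not the actual segcore API:
```cpp
#include <cstdint>
#include <set>

enum class WarmupPolicy { kSync, kDisabled };

WarmupPolicy PolicyForField(int64_t field_id,
                            const std::set<int64_t>& load_list) {
    if (load_list.empty() || load_list.count(field_id) > 0) {
        return WarmupPolicy::kSync;   // proactively loaded at load time
    }
    return WarmupPolicy::kDisabled;   // lazily loaded on first read access
}
```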
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: https://github.com/milvus-io/milvus/issues/41435
This PR is based on https://github.com/milvus-io/milvus/pull/41436.
Improvements include:
- Lazy load support for Storage v1
- Use low/high watermarks to control eviction (see the sketch after this
list)
- Caching layer related config changes
- Removed ChunkCache related configs and code in golang
- Add a `PinAllCells` helper method to the CacheSlot class
- Modified ValueAt, RawAt, PrimitiveRawAt to bulk versions, to reduce
caching layer overhead
- Removed some unclear templated bulk_subscript methods
- CachedSearchIterator now stores a PinWrapper when searching on
ChunkedColumn; also removed an unused constructor.
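A minimal sketch of the watermark mechanism in the second bullet, with
assumed names:
```cpp
#include <cstdint>

// Evict only after usage crosses the high watermark, and keep evicting
// until it drops below the low one.
struct CacheLike {
    int64_t used = 0;
    int64_t high = 0;
    int64_t low = 0;
    void EvictOne() {
        // Stub: a real cache would drop the least recently used cell.
        used -= 1;
    }
};

void MaybeEvict(CacheLike& cache) {
    if (cache.used < cache.high) {
        return;  // below the high watermark: nothing to do
    }
    while (cache.used > cache.low) {
        cache.EvictOne();
    }
}
```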
---------
Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
Storage v2 chunked sealed segment loading is based on the caching layer. A
cell unit in storage v2 is a Parquet row group in remote object storage,
containing all fields. Therefore, each field needs a proxy to perform its
single-field operations.
<img width="965" alt="Screenshot 2025-04-28 at 10 59 30"
src="https://github.com/user-attachments/assets/83e93a10-3b1d-4066-ac17-b996d5650416"
/>
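The proxy idea can be sketched as follows (`PinCell` and `Column` are
illustrative names; the real cell access goes through the caching layer
APIs):
```cpp
#include <cstdint>

// One cached cell holds a whole row group with all fields, so each field
// goes through a proxy that pins the cell and projects out its own column.
template <typename CacheSlotT, typename ColumnT>
struct FieldProxy {
    int64_t field_id;
    CacheSlotT* slot;  // shared by all field proxies of the segment

    ColumnT ReadChunk(int64_t row_group_id) {
        auto cell = slot->PinCell(row_group_id);  // pins the full row group
        return cell->Column(field_id);            // this field's column only
    }
};
```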
related: #39173
---------
Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
Related to #39718
Fixes milvus-io/pymilvus#2771
This PR:
- Make the AsyncRetrieve task trigger the "schema check" logic as well
- Rename `AddField`-related methods to align with the code standard
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Related to #39718
This PR:
- Add reopen logic for growing & sealed segments
- Lazily reopen when the schema version increases (see the sketch below)
- Add a FinishLoad API for loading progress
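A sketch of the lazy reopen in the second bullet, with illustrative names:
```cpp
#include <cstdint>

struct SegmentLike {
    // A read path compares the segment's schema version against the latest
    // one and reopens only when it lags behind.
    void LazyReopenIfNeeded(int64_t latest_version) {
        if (schema_version_ >= latest_version) {
            return;              // already on the newest schema
        }
        Reopen(latest_version);  // reload fields under the new schema
    }

    void Reopen(int64_t version) {
        // Stub: the real code rebuilds segment state for the new schema.
        schema_version_ = version;
    }

    int64_t schema_version_ = 0;
};
```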
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Support parallel loading of sealed and growing segments in the storage v2
format by asynchronously reading row groups.
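A hedged sketch of the async pattern (`Reader` and `RowGroup` are assumed
names):
```cpp
#include <future>
#include <vector>

// Kick off one async read per row group, then collect results in order.
template <typename Reader, typename RowGroup>
std::vector<RowGroup> LoadAllRowGroups(Reader& reader, int num_row_groups) {
    std::vector<std::future<RowGroup>> tasks;
    tasks.reserve(num_row_groups);
    for (int i = 0; i < num_row_groups; ++i) {
        tasks.push_back(std::async(std::launch::async, [&reader, i] {
            return reader.ReadRowGroup(i);
        }));
    }
    std::vector<RowGroup> out;
    out.reserve(num_row_groups);
    for (auto& task : tasks) {
        out.push_back(task.get());  // order of row groups is preserved
    }
    return out;
}
```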
related: #39173
---------
Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
After this PR is merged, insert, upsert, index building, query, and search
are supported on the added field.
These operations can only be performed on the added field after the
add-field request completes, which is a synchronous operation.
Compaction will be supported in the next PR.
#39718
---------
Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>