after the pr merged, we can support to insert, upsert, build index,
query, search in the added field.
can only do the above operates in added field after add field request
complete, which is a sync operate.
compact will be supported in the next pr.
#39718
---------
Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
- Feat: Support Mix compaction. Covering tests include compatibility and
rollback ability.
- Read v1 segments and compact with v2 format.
- Read both v1 and v2 segments and compact with v2 format.
- Read v2 segments and compact with v2 format.
- Compact with duplicate primary key test.
- Compact with bm25 segments.
- Compact with merge sort segments.
- Compact with no expiration segments.
- Compact with lack binlog segments.
- Compact with nullable field segments.
- Feat: Support Clustering compaction. Covering tests include
compatibility and rollback ability.
- Read v1 segments and compact with v2 format.
- Read both v1 and v2 segments and compact with v2 format.
- Read v2 segments and compact with v2 format.
- Compact bm25 segments with v2 format.
- Compact with memory limit.
- Enhance: Use serdeMap serialize in BuildRecord function to support all
Milvus data types.
related: #39173
Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
two point:
(1) reoder conjucts expr's subexpr, postpone heavy operations
sequence: int(column) -> index(column) -> string(column) -> light
conjuct
...... -> json(column) -> heavy conjuct -> two_column_compare
(2) support pre filter for expr execute, skip scan raw data that had
been skipped
because of preceding expr result.
#39869
Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
issue: #40730
also see: https://github.com/milvus-io/cgosymbolizer/pull/2
After these PR, at linux:
- the milvus will always enable jemalloc by default.
- jemalloc will always compiled with --enable-prof options.
- all image will always enable the jemalloc prof by default.
- a pprof http service for jemalloc at `/debug/jemalloc/` will be
registered into restful.
- `jeprof` can remote profile the memory of milvus.
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #40607
tantivy change: https://github.com/zilliztech/tantivy/pull/3
Benchmarks:
Test Envrioment: CPU 9900K
The data is insert by:
```
for i in 0..N {
for j in 0..UNIQUE {
let key = format!("hello{}", j);
index_writer.add_string(&key, i * UNIQUE + j).unwrap();
}
}
```
So the unique influences the locality of the matched docs.
The latency is the avg latency over 1000 repeate quries.
The result shows 22.5%-34.8% latency reduction.

---------
Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
issue: #40308
This issue fixes these two concurrent issues:
1. element in null_offset is used to set bitset where the size of bitset
is initialized by tantivy document count. However, there may still be
some documents that are not committed in tantivy but are null in
null_offset. So array out of range occurs.
2. null_offset can be read and write concurrently but there's no
synchronization protection.
---------
Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
issue: #40198
Fix random sample does not consider empty input, that is no data is hit
by filter expression.
---------
Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
issue: https://github.com/milvus-io/milvus/issues/35528
If the query data type does not match the index type, fall back to a
brute-force search
---------
Signed-off-by: sunby <sunbingyi1992@gmail.com>
* use the new packed reader and writer api to be compatible with current
etcd meta
* For the new packed writer API: column groups and paths are explicitly
defined by users and won't split column groups by memory in storage v2.
Packed writer follows the user-defined column groups to split arrow
record and write into the corresponding file path.
* For the new packed reader API: read paths are explicitly defined by
users.
related: #39173
Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>