issue: #43427
pr: #37417
Support R-Tree index for geometry datatype.
---------
Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
Co-authored-by: ZhuXi <150327960+Yinwei-Yu@users.noreply.github.com>
This PR's main goal is to merge #37417 into Milvus 2.5 without conflicts.
# Main Goals
1. Create and describe collections with geospatial type
2. Insert geospatial data into the insert binlog
3. Load segments containing geospatial data into memory
4. Enable query and search to return geospatial data
5. Support GIS functions such as ST_EQUALS in queries
# Solution
1. **Add Type**: Modify the Milvus core by adding a Geospatial type in
both the C++ and Go code layers, defining the Geospatial data structure
and the corresponding interfaces.
2. **Dependency Libraries**: Introduce necessary geospatial data
processing libraries. In the C++ source code, use Conan package
management to include the GDAL library. In the Go source code, add the
go-geom library to the go.mod file.
3. **Protocol Interface**: Revise the Milvus protocol to provide
mechanisms for Geospatial message serialization and deserialization.
4. **Data Pipeline**: Facilitate interaction between the client and
proxy using the WKT format for geospatial data. The proxy converts
all data into WKB format for downstream processing, providing column
data interfaces, segment encapsulation, segment loading, payload
writing, and cache block management (a minimal conversion sketch
follows this list).
5. **Query Operators**: Implement simple display and support for filter
queries. Initially, focus on filtering based on spatial relationships
for a single column of geospatial literal values, providing parsing and
execution for query expressions. Currently only brute-force search is
supported.
6. **Client Modification**: Enable the client to handle user input for
geospatial data and facilitate end-to-end testing. See the corresponding
changes in pymilvus.
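
As a rough, standalone illustration of the WKT-to-WKB conversion in
step 4 (not Milvus's actual proxy code), the sketch below uses the
go-geom library mentioned above; the helper names (`wkt.Unmarshal`,
`wkb.Marshal`) are assumed from go-geom's encoding packages:

```go
package main

import (
	"fmt"

	"github.com/twpayne/go-geom/encoding/wkb"
	"github.com/twpayne/go-geom/encoding/wkt"
)

// wktToWKB mirrors the proxy-side conversion from step 4: clients speak
// WKT, downstream components consume WKB.
func wktToWKB(s string) ([]byte, error) {
	g, err := wkt.Unmarshal(s) // parse the client-facing WKT string
	if err != nil {
		return nil, err
	}
	return wkb.Marshal(g, wkb.NDR) // little-endian WKB for storage
}

func main() {
	b, err := wktToWKB("POINT (30.5 -10.25)")
	if err != nil {
		panic(err)
	}
	fmt.Printf("WKB payload: % x\n", b)
}
```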
---------
Signed-off-by: Yinwei Li <yinwei.li@zilliz.com>
Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
Co-authored-by: cai.zhang <cai.zhang@zilliz.com>
issue: #43040
pr: #42665
This patch introduces a disk file writer that supports Direct IO.
Currently, it is exclusively utilized during the QueryNode load process.
Its parameters are described below:
1. `common.diskWriteMode` This parameter controls the write mode of the
local disk, which is used to write temporary data downloaded from remote
storage. Currently, only QueryNode uses 'common.diskWrite*' parameters.
Support for other components will be added in the future.
The options include 'direct' and 'buffered'. The default value is
'buffered'.
2. `common.diskWriteBufferSizeKb` Disk write buffer size in KB, only
used when disk write mode is 'direct', default is 64KB.
Current valid range is [4, 65536]. If the value is not aligned to 4KB,
it will be rounded up to the nearest multiple of 4KB.
3. `common.diskWriteNumThreads` This parameter controls the number of
writer threads used for disk write operations. The valid range is [0,
hardware_concurrency]. It is designed to limit the maximum concurrency
of disk write operations to reduce the impact on disk read performance.
For example, if you want to limit the maximum concurrency of disk write
operations to 1, you can set this parameter to 1.
The default value is 0, which means the caller will perform write
operations directly without using an additional writer thread pool. In
this case, the maximum concurrency of disk write operations is
determined by the caller's thread pool size.
These parameters can be updated at runtime.
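
A minimal Linux-only Go sketch of the 'direct' write mode described
above (not the actual Milvus writer): it opens a file with O_DIRECT and
writes from a block-aligned buffer, and the `roundUp` helper mirrors
the documented 4KB rounding of `common.diskWriteBufferSizeKb`:

```go
//go:build linux

package main

import (
	"os"
	"syscall"
	"unsafe"
)

const blockSize = 4096 // O_DIRECT requires block-aligned buffers and sizes

// roundUp rounds n up to the next multiple of blockSize, mirroring the
// documented rounding of common.diskWriteBufferSizeKb.
func roundUp(n int) int { return (n + blockSize - 1) / blockSize * blockSize }

// alignedBuf returns a size-byte slice whose base address is block-aligned.
func alignedBuf(size int) []byte {
	raw := make([]byte, size+blockSize)
	off := blockSize - int(uintptr(unsafe.Pointer(&raw[0]))%blockSize)
	return raw[off : off+size]
}

func main() {
	// 'direct' mode: writes bypass the OS page cache.
	f, err := os.OpenFile("/tmp/direct_io_demo",
		os.O_WRONLY|os.O_CREATE|syscall.O_DIRECT, 0o644)
	if err != nil {
		panic(err)
	}
	defer f.Close()

	buf := alignedBuf(roundUp(10 * 1024)) // 10KB requested -> 12KB aligned
	if _, err := f.Write(buf); err != nil {
		panic(err)
	}
}
```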
---------
Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
Previously, diskSegmentMaxSize was used if and only if all of the
collection's vector fields were indexed with a DiskANN index.
After sparse vectors were introduced, since sparse vectors cannot be
indexed with DiskANN, collections with both dense and sparse vectors
fell back to maxSize.
This PR changes the requirement for using diskSegmentMaxSize to: all
dense vector fields are indexed with DiskANN, ignoring sparse vector
fields (sketched below).
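A hypothetical sketch of the new rule (the types and names below are
illustrative, not the actual datacoord code):

```go
package main

import "fmt"

// FieldInfo is a simplified, hypothetical view of a vector field.
type FieldInfo struct {
	Name      string
	IsSparse  bool
	IndexType string
}

// useDiskSegmentMaxSize sketches the new rule: diskSegmentMaxSize applies
// when every *dense* vector field uses a DiskANN index; sparse vector
// fields are ignored.
func useDiskSegmentMaxSize(fields []FieldInfo) bool {
	for _, f := range fields {
		if f.IsSparse {
			continue // sparse vectors cannot use DiskANN; skip them
		}
		if f.IndexType != "DISKANN" {
			return false
		}
	}
	return true
}

func main() {
	fields := []FieldInfo{
		{Name: "dense", IsSparse: false, IndexType: "DISKANN"},
		{Name: "sparse", IsSparse: true, IndexType: "SPARSE_INVERTED_INDEX"},
	}
	fmt.Println(useDiskSegmentMaxSize(fields)) // true under the new rule
}
```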
See also: #43193
pr: #43194
---------
Signed-off-by: yangxuan <xuan.yang@zilliz.com>
related: #42803
1. Add a new thread pool using folly::CPUThreadPoolExecutor, named
FThreadPools
2. Reading vectors from the chunk cache now uses the separate
CHUNKCACHE_POOL to avoid interference from collection loading (see the
sketch after this list)
3. Note: for safety on the cloud side in 2.5.x, only read-chunk-cache
operations use this newly created thread pool; other thread-pool call
sites will be migrated in the near future
4. The master branch does not need this PR, as the caching layer has
unified the chunk cache behaviour
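
The actual change is in C++ using folly::CPUThreadPoolExecutor; the Go
sketch below only illustrates the isolation idea from item 2: with a
dedicated pool, chunk-cache reads never queue behind load tasks:

```go
package main

import (
	"fmt"
	"sync"
)

// pool is a tiny fixed-size worker pool, standing in for a dedicated
// folly::CPUThreadPoolExecutor.
type pool struct {
	tasks chan func()
	wg    sync.WaitGroup
}

func newPool(workers int) *pool {
	p := &pool{tasks: make(chan func(), 64)}
	for i := 0; i < workers; i++ {
		p.wg.Add(1)
		go func() {
			defer p.wg.Done()
			for task := range p.tasks {
				task()
			}
		}()
	}
	return p
}

func (p *pool) submit(f func()) { p.tasks <- f }
func (p *pool) close()          { close(p.tasks); p.wg.Wait() }

func main() {
	// Separate pools: chunk-cache reads are isolated from load work.
	chunkCachePool := newPool(4)
	loadPool := newPool(4)
	chunkCachePool.submit(func() { fmt.Println("read vector from chunk cache") })
	loadPool.submit(func() { fmt.Println("load collection segment") })
	chunkCachePool.close()
	loadPool.close()
}
```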
Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
1. Optimize the import process: skip subsequent steps and mark the task
as complete if the number of imported rows is 0 (sketched after this
list).
2. Improve import integration tests:
a. Add a test to verify that autoIDs are not duplicated
b. Add a test for the corner case where all data is deleted
c. Shorten test execution time
3. Enhance import logging:
a. Print imported segment information upon completion
b. Include file name in failure logs
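A trivial sketch of the shortcut in item 1 (illustrative names, not the
actual import code):

```go
package main

import "fmt"

// finishIfEmpty sketches the zero-row shortcut: when an import produced
// no rows, mark the task complete instead of running the remaining steps.
func finishIfEmpty(importedRows int64) bool {
	if importedRows == 0 {
		fmt.Println("no rows imported; marking task complete, skipping later steps")
		return true
	}
	return false
}

func main() {
	_ = finishIfEmpty(0) // true: task finishes immediately
}
```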
issue: https://github.com/milvus-io/milvus/issues/42488,
https://github.com/milvus-io/milvus/issues/42518
pr: https://github.com/milvus-io/milvus/pull/42612
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
issue: #41874
pr: #41875
- Optimize balance_checker to support balancing multiple collections
simultaneously
- Add new parameters for segment and channel balancing batch sizes
- Add enableBalanceOnMultipleCollections parameter
- Update tests for balance checker
This change improves resource utilization by allowing the system to
balance multiple collections in a single trigger with configurable batch
sizes.
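A minimal sketch of the batched trigger (the function and parameter
names are illustrative, not the actual balance_checker API):

```go
package main

import "fmt"

// pickBatch sketches the batched trigger: take up to batchSize
// collections per balance round instead of one at a time.
func pickBatch(sortedCollections []int64, batchSize int) []int64 {
	if batchSize <= 0 || batchSize > len(sortedCollections) {
		batchSize = len(sortedCollections) // non-positive: no limit here
	}
	return sortedCollections[:batchSize]
}

func main() {
	fmt.Println(pickBatch([]int64{7, 3, 9, 1}, 3)) // 3 collections this round
}
```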
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Cherry-pick from master
pr: #42109
Related to #40756
Large nq naturally increases query time, which produces lots of
slow-query logs when a user's nq is very large.
This PR makes the slow-search check use the per-nq latency (the average
value) to decide whether a request is slow.
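A minimal sketch of the per-nq check (illustrative names, not the
actual proxy code):

```go
package main

import (
	"fmt"
	"time"
)

// isSlow compares the average latency per nq against the slow-query
// threshold, rather than the raw request latency.
func isSlow(totalLatency time.Duration, nq int64, threshold time.Duration) bool {
	if nq <= 0 {
		nq = 1
	}
	return totalLatency/time.Duration(nq) > threshold
}

func main() {
	// 5s across nq=100 averages 50ms per nq: not slow under a 1s threshold.
	fmt.Println(isSlow(5*time.Second, 100, time.Second)) // false
	fmt.Println(isSlow(5*time.Second, 1, time.Second))   // true
}
```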
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Assign the import task to the worker with the most available slots, even
if availableSlots < requiredSlots. This ensures tasks won’t be blocked
indefinitely.
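A minimal sketch of the assignment rule (types are illustrative, not
the actual datacoord scheduler):

```go
package main

import "fmt"

// pickWorker sketches the scheduling change: choose the worker with the
// most available slots, even when none can fully satisfy requiredSlots,
// so tasks are never blocked indefinitely.
func pickWorker(available map[string]int) (string, int) {
	best, bestSlots := "", -1
	for worker, slots := range available {
		if slots > bestSlots {
			best, bestSlots = worker, slots
		}
	}
	return best, bestSlots
}

func main() {
	workers := map[string]int{"dn-1": 2, "dn-2": 5}
	w, slots := pickWorker(workers)
	fmt.Printf("assign import task to %s (%d slots, even if < required)\n", w, slots)
}
```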
issue: https://github.com/milvus-io/milvus/issues/41981
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
issue: #41816
pr: #41817
PR #37983 introduced an issue: if `defaultRootPassword` is not specified
in milvus.yaml, then `"Milvus"` (with literal quotes) is used as the
root user's default password instead of `Milvus`.
This PR fixes the unexpected root password and adds a comment noting
that large numeric passwords must be wrapped in double quotes.
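A standalone sketch of why the quotes matter, using gopkg.in/yaml.v3
rather than Milvus's actual config loader:

```go
package main

import (
	"fmt"

	"gopkg.in/yaml.v3"
)

func main() {
	var unquoted, quoted map[string]any
	// Without quotes, YAML types a numeric-looking scalar as a number.
	_ = yaml.Unmarshal([]byte(`defaultRootPassword: 12345678901234567890`), &unquoted)
	// With quotes, the same value stays a string, preserved verbatim.
	_ = yaml.Unmarshal([]byte(`defaultRootPassword: "12345678901234567890"`), &quoted)
	fmt.Printf("unquoted: %T\n", unquoted["defaultRootPassword"]) // a numeric type
	fmt.Printf("quoted:   %T\n", quoted["defaultRootPassword"])   // string
}
```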
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
This PR fixes https://github.com/milvus-io/milvus/issues/41384 on 2.5.
Related to #41448.
When building the Milvus client on Windows, compilation failed with an
undefined RSS error.
On Windows, memory usage is obtained the same way as on Darwin.
Signed-off-by: Julien Salleyron <julien.salleyron@gmail.com>
Cherry-pick from master
pr: #41211
The RESTful API default timeout was hard-coded. This PR makes the
timeout value configurable via paramtable.
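A minimal sketch of the idea (the key name and default below are
hypothetical; Milvus resolves the real value through paramtable):

```go
package main

import (
	"fmt"
	"time"
)

// requestTimeout replaces a compile-time constant with a configured
// value plus a fallback default.
func requestTimeout(cfg map[string]string) time.Duration {
	if v, ok := cfg["restful.requestTimeout"]; ok { // hypothetical key
		if d, err := time.ParseDuration(v); err == nil {
			return d
		}
	}
	return 30 * time.Second // fallback instead of a hard-coded constant
}

func main() {
	fmt.Println(requestTimeout(map[string]string{"restful.requestTimeout": "90s"}))
	fmt.Println(requestTimeout(nil)) // default
}
```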
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Cherry-pick from master
pr: #41089
The traceID is not initialized from client_request_id in the context.
If the client sends a valid traceID, the Milvus log prints two
different traceIDs, which is confusing.
This PR adds logic that tries to parse the incoming `client_request_id`
as a traceID. If parsing succeeds, it is used as the request traceID;
otherwise, it is recorded in a separate field named `client_request_id`.
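A minimal sketch of the fallback logic using OpenTelemetry's trace-ID
parser (illustrative, not Milvus's actual interceptor):

```go
package main

import (
	"fmt"

	"go.opentelemetry.io/otel/trace"
)

// resolveTraceID tries to parse client_request_id as a hex trace ID;
// if that fails, it is kept as a separate log field instead.
func resolveTraceID(clientRequestID string) (traceID string, extraField string) {
	if tid, err := trace.TraceIDFromHex(clientRequestID); err == nil {
		return tid.String(), "" // valid: use it as the request traceID
	}
	return "", clientRequestID // invalid: log it as client_request_id
}

func main() {
	fmt.Println(resolveTraceID("4bf92f3577b34da6a3ce929d0e0e4736")) // parses
	fmt.Println(resolveTraceID("my-custom-id"))                     // falls back
}
```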
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #37651
pr: #40297
This PR balances the collection with the largest row count first, to
avoid temporarily migrating small-table data to new nodes during their
onboarding only to move it out again after the large tables are
balanced, which would cause unnecessary load.
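A minimal sketch of the largest-first ordering (types are illustrative,
not the actual balancer):

```go
package main

import (
	"fmt"
	"sort"
)

// collectionRows is an illustrative stand-in for the balancer's view of
// a collection; the real checker works on collection metadata.
type collectionRows struct {
	ID   int64
	Rows int64
}

func main() {
	colls := []collectionRows{{1, 100}, {2, 1_000_000}, {3, 5_000}}
	// Balance the largest collection first so small tables are not moved
	// onto new nodes only to be evicted after the big table is balanced.
	sort.Slice(colls, func(i, j int) bool { return colls[i].Rows > colls[j].Rows })
	for _, c := range colls {
		fmt.Printf("balance collection %d (%d rows)\n", c.ID, c.Rows)
	}
}
```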
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>