92 Commits

Author SHA1 Message Date
yihao.dai
f0b3942a08
enhance: Limit import job number (#36891)
issue: https://github.com/milvus-io/milvus/issues/36890

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-10-23 16:01:28 +08:00
yihao.dai
0fc2a4aa53
enhance: Optimize import scheduling and add time cost metric (#36601)
1. Optimize import scheduling strategic:
a. Revise slot weights, calculating them based on the number of files
and segments for both import and pre-import tasks.
b. Ensure that the DN executes tasks in ascending order of task ID.
2. Add time cost metric and log.

issue: https://github.com/milvus-io/milvus/issues/36600,
https://github.com/milvus-io/milvus/issues/36518

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-10-09 14:41:20 +08:00
Zhen Ye
a6545b2e29
fix: refactor milvus config and change default txn timeout (#36522)
issue: #36498

Signed-off-by: chyezh <chyezh@outlook.com>
2024-09-29 11:01:15 +08:00
SimFG
c50fe71163
fix: long buffering causes mq to be unable to receive messages. (#36420)
- issue: #36397

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-09-23 16:33:18 +08:00
wei liu
3b10085f61
enhance: Optimize workload based replica selection policy (#36181)
issue: #35859

This PR introduce two new param: toleranceFactor and checkRequestNum,
after every checkRequestNum request has been assigned, try to compute
querynode's workload score.

if the diff is less than the toleranceFactor, replica selection policy
will fallback to round_robin, which reduce the average cost to about
500ns.

if the diff is larger than the toleranceFactor, replica selection policy
will compute querynode's score to select the target node with smallest
score in every assigment.

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-09-20 12:33:11 +08:00
yihao.dai
763fd0dfc5
enhance: Use a separate mmap config for chunk cache (#36276)
issue: https://github.com/milvus-io/milvus/issues/35273

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-09-15 16:23:09 +08:00
Ted Xu
d9a40784a2
fix: fallback params may be overridden (#35972)
See #35756

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2024-09-05 16:19:04 +08:00
wei liu
cf242f9e09
fix: fix dynamic update config doesn't works for some param (#35572)
issue: #35570
milvus support config cache to spped up config access, but only evict
param's cache when param has been updated. but milvus's param may rely
on other param's value, let's say ParamsA relys on paramsB, when paramsB
updated, it will evict paramB's cache, but the paramA's cache still keep
the old value.

This PR evict all config cache to solve the above issue, cause dynamic
update config won't be much frequetly.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-08-21 11:02:56 +08:00
wei liu
a570567644
enhance: Enable ReadOnly/ReadWrite/Admin Privilege Group to simplify RBAC grant progress (#35472)
issue: #35471

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-08-16 14:18:54 +08:00
wei liu
344dc6a9f8
enhance: enable to set load config in cluster level (#35169)
issue: #35170
This PR enable to set load configs in cluster level, such as replicas
and resource groups. then when load collections will use the load
config.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-08-07 12:38:21 +08:00
cai.zhang
6542c1ab0e
enhance: Add monitoring metrics for task execution time in datacoord (#35139)
issue: #35138

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-08-05 16:26:17 +08:00
jaime
fcec4c21b9
fix: check collection health(queryable) fail for releasing collection (#34947)
issue: #34946

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-08-02 17:20:15 +08:00
wei liu
3b735b4b02
enhance: Refine param init for MmapDirPath (#35181)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-08-02 11:12:14 +08:00
cai.zhang
196a7986b3
enhance: Change the fixed value to a ratio for clustering segment size (#35076)
issue: #34495

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-08-01 22:04:14 +08:00
wei liu
e9d61daa3f
enhance: Reduce delegator memory overloaded factor to 0.1 (#35092)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-08-01 10:21:50 +08:00
wayblink
ce3f836876
fix: compaction task not be cleaned correctly (#34765)
1.fix compaction task not be cleaned correctly
2.add a new parameter to control compaction gc loop interval
3.remove some useless configs of clustering compaction

bug: #34764

Signed-off-by: wayblink <anyang.wang@zilliz.com>
2024-07-30 20:21:56 +08:00
cai.zhang
2372452fac
enhance: Optimized the GC logic to ensure that memory is released in time (#34949)
issue: #34703

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-07-28 23:53:47 +08:00
wei liu
166fc902b0
enhance: Limit collection's normal balance speed (#34810)
issue: #34798

after we remove the task priority on query coord, to avoid load/release
segment blocked by too much balance task, we limit the balance task size
in each round. at same time, we reduce the balance interval to trigger
balance more frequently.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-07-24 19:11:44 +08:00
yihao.dai
b22e549844
enhance: Rename config of sealing by growing segmetns size (#34787)
/kind improvement

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-07-19 20:27:41 +08:00
wayblink
c79d1af390
enhance: Add compaction task slot usage logic (#34581)
#34544

Signed-off-by: wayblink <anyang.wang@zilliz.com>
2024-07-18 10:27:41 +08:00
yihao.dai
4939f82d4f
enhance: Seal by total growing segments size (#34692)
Seals the largest growing segment if the total size of growing segments
of each shard exceeds the size threshold(default 4GB). Introducing this
policy can help keep the size of growing segments within a suitable
level, alleviating the pressure on the delegator.

issue: https://github.com/milvus-io/milvus/issues/34554

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-07-17 21:45:41 +08:00
SimFG
203fb554a4
enhance: support to config root user's password (#34752)
- issue: #33058

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-07-17 20:19:42 +08:00
wei liu
acb33bba4d
enhance: Preserve fixed-size memory in delegator node for growing segment. (#34596)
issue: #34595
When consuming insert data on the delegator node, QueryCoord will move
out some sealed segments to manage its memory usage. After the growing
segment gets flushed, some sealed segments from other workers will be
moved back to the delegator node. To avoid the frequent movement of
segments, we estimate the maximum growing row count and preserve a
fixed-size memory in the delegator node.

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-07-15 20:51:46 +08:00
congqixia
8b5754f7fe
enhance: Add segment seal proportion jitter (#34636)
See also #34574

Add jitter for segment seal proportion to avoid seal operation burst in
short period of time.

This PR also fix license header in paramtable pkg.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-07-15 14:47:39 +08:00
chyezh
1bc3c0b925
enhance: implement balancer at streaming coord (#34435)
issue: #33285

- add balancer implementation
- add channel count fair balance policy
- add channel assignment discover grpc service

Signed-off-by: chyezh <chyezh@outlook.com>
2024-07-11 09:58:48 +08:00
wayblink
f9a0f7bb25
Add an option to enable/disable vector field clustering key (#34097)
#30633

Signed-off-by: wayblink <anyang.wang@zilliz.com>
2024-06-25 18:52:04 +08:00
wei liu
4987067375
enhance: Execute bloom filter apply in parallel to speed up segment predict (#33792)
issue: #33610

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-06-14 11:37:56 +08:00
wei liu
ab93d9c23d
enhance: Use BatchPkExist to reduce bloom filter func call cost (#33611)
issue:#33610

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-06-13 17:57:56 +08:00
SimFG
ecee7d90d4
enhance: try to speed up the loading of small collections (#33570)
- issue: #33569

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-06-07 08:25:53 +08:00
cai.zhang
27cc9f2630
enhance: Support analyze data (#33651)
issue: #30633

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
Co-authored-by: chasingegg <chao.gao@zilliz.com>
2024-06-06 17:37:51 +08:00
wei liu
c6a1c49e02
enhance: Use Blocked Bloom Filter instead of basic bloom fitler impl. (#33405)
issue: #32995
To speed up the construction and querying of Bloom filters, we chose a
blocked Bloom filter instead of a basic Bloom filter implementation.

WARN: This PR is compatible with old version bf impl, but if fall back
to old milvus version, it may causes bloom filter deserialize failed.

In single Bloom filter test cases with a capacity of 1,000,000 and a
false positive rate (FPR) of 0.001, the blocked Bloom filter is 5 times
faster than the basic Bloom filter in both querying and construction, at
the cost of a 30% increase in memory usage.

- Block BF construct time	{"time": "54.128131ms"}
- Block BF size	                {"size": 3021578}
- Block BF Test cost	        {"time": "55.407352ms"}
- Basic BF construct time	{"time": "210.262183ms"}
- Basic BF size	                {"size": 2396308}
- Basic BF Test cost	        {"time": "192.596229ms"}

In multi Bloom filter test cases with a capacity of 100,000, an FPR of
0.001, and 100 Bloom filters, we reuse the primary key locations for all
Bloom filters to avoid repeated hash computations. As a result, the
blocked Bloom filter is also 5 times faster than the basic Bloom filter
in querying.

- Block BF TestLocation cost    {"time": "529.97183ms"}
- Basic BF TestLocation cost	{"time": "3.197430181s"}

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-05-31 17:49:45 +08:00
yihao.dai
bbb69980ac
enhance: Replace 'off' with 'disable' (#33433)
YAML will automatically parse "off" as a boolean variable. We should
avoid using "off" in the future.

issue: https://github.com/milvus-io/milvus/issues/32772

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-05-29 12:17:43 +08:00
yihao.dai
760223f80a
fix: use seperate warmup pool and disable warmup by default (#33348)
1. use a small warmup pool to reduce the impact of warmup
2. change the warmup pool to nonblocking mode
3. disable warmup by default
4. remove the maximum size limit of 16 for the load pool

issue: https://github.com/milvus-io/milvus/issues/32772

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Co-authored-by: xiaofanluan <xiaofan.luan@zilliz.com>
2024-05-27 01:25:40 +08:00
yihao.dai
32560263fa
enhance: Query slot for compaction task (#32881)
Query slot of compaction in datanode, and transfer the control logic for
limiting compaction tasks from datacoord to the datanode.

issue: https://github.com/milvus-io/milvus/issues/32809

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-05-17 18:19:38 +08:00
foxspy
f6777267e3
enhance: add score compute consistency config for knowhere (#32997)
issue: https://github.com/milvus-io/milvus/issues/32583
related: #32584

Signed-off-by: xianliang.li <xianliang.li@zilliz.com>
2024-05-13 14:21:31 +08:00
Bingyi Sun
4724779b3b
enhance: remove fallback keys for config generator (#32946)
Signed-off-by: sunby <sunbingyi1992@gmail.com>
2024-05-13 13:33:31 +08:00
wei liu
e2332bdc17
enhance: Enable channel exclusive balance policy (#32911)
issue: #32910  
* split replica's node list to channels when create replicas
 * balance nodes among channels when node change happens
 * implement channel level balance, let balance happens in channel level

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-05-10 17:27:31 +08:00
chyezh
641f702f64
fix: add request resource timeout for lazy load, refactor context usage in cache (#32709)
issue: #32663

- Use new param to control request resource timeout for lazy load.

- Remove the timeout parameter of `Do`, remove `DoWait`. use `context`
to control the timeout.

- Use `VersionedNotifier` to avoid notify event lost and broadcast,
remove the redundant goroutine in cache.

related dev pr: #32684

Signed-off-by: chyezh <chyezh@outlook.com>
2024-05-07 16:33:30 +08:00
SimFG
09cd56d44f
enhance: add the skip auto id and partition key check config (#32592)
/kind improvement
issue: #32591

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-04-29 10:29:26 +08:00
SimFG
8594b55ad5
enhance: add max insert request size and must use partition key configs (#32433)
issue: https://github.com/milvus-io/milvus/issues/30577
/kind improvement

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-04-19 10:31:20 +08:00
yihao.dai
49d109de18
enhance: Use an individual buffer size parameter for imports (#31833)
Use an individual buffer size parameter for imports and set buffer size
to 64MB.

issue: https://github.com/milvus-io/milvus/issues/28521

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-04-08 21:07:18 +08:00
chyezh
7b400252ff
fix: add configuration disk capacity config for lru and fix some bug (#31977)
issue: #30361

- Add configurable disk capacity limit

- fix bitset reset logic

- make insert record reinsert after clear

Signed-off-by: chyezh <chyezh@outlook.com>
2024-04-08 15:55:16 +08:00
yihao.dai
4e264003bf
enhance: Ensure ImportV2 waits for the index to be built and refine some logic (#31629)
Feature Introduced:
1. Ensure ImportV2 waits for the index to be built

Enhancements Introduced:
1. Utilization of local time for timeout ts instead of allocating ts
from rootcoord.
3. Enhanced input file length check for binlog import.
4. Removal of duplicated manager in datanode.
5. Renaming of executor to scheduler in datanode.
6. Utilization of a thread pool in the scheduler in datanode.

issue: https://github.com/milvus-io/milvus/issues/28521

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-04-01 20:09:13 +08:00
yihao.dai
f65a796d18
enhance: Add max file num limit and max file size limit for import (#31497)
The max number of import files per request should not exceed 1024 by
default (configurable).
The import file size allowed for importing should not exceed 16GB by
default (configurable).

issue: https://github.com/milvus-io/milvus/issues/28521

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-03-22 18:13:06 +08:00
yihao.dai
0fe5e90e8b
enhance: Remove import v1 (#31403)
Remove all code and logic related to import v1.

issue: https://github.com/milvus-io/milvus/issues/28521

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-03-22 15:29:09 +08:00
wei liu
06b191b164
fix: Balance channel stuck forever due to logic dead lock (#31202)
issue: #30816

cause balance channel will stuck until leader view catch up the current
target, then start to unsub the old delegator. which make sure that the
new delegator can provide search before release old delegator. but
another logic in segment_checker skip loading segment during balance
channel. so during balance channel, if query node crash, new delegator
can't catch up target forever, then stuck forever.

This PR remove the rule that skip loading segment during balance channel
to avoid the logic dead lock here.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-03-13 15:05:04 +08:00
Chun Han
3298e64bd3
enhance: cache config values for saving cpu cycles to parse config item (#30947)
related: #30958

Signed-off-by: MrPresent-Han <chun.han@zilliz.com>
2024-03-12 11:09:04 +08:00
yihao.dai
c411cb4a49
enhance: Prevent the backlog of channelCP update tasks, perform batch updates of channelCPs (#30941)
This PR includes the following adjustments:
1. To prevent channelCP update task backlog, only one task with the same
vchannel is retained in the updater. Additionally, the lastUpdateTime is
refreshed after the flowgraph submits the update task, rather than in
the callBack function.
2. Batch updates of multiple vchannel checkpoints are performed in the
UpdateChannelCheckpoint RPC (default batch size is 128). Additionally,
the lock for channelCPs in DataCoord meta has been switched from key
lock to global lock.
3. The concurrency of UpdateChannelCheckpoint RPCs in the datanode has
been reduced from 1000 to 10.

issue: https://github.com/milvus-io/milvus/issues/30004

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Co-authored-by: jaime <yun.zhang@zilliz.com>
Co-authored-by: congqixia <congqi.xia@zilliz.com>
2024-03-07 20:39:02 +08:00
yihao.dai
a434d33e75
feat: Add import scheduler and manager (#29367)
This PR introduces novel managerial roles for importv2:
1. ImportMeta: To manage all the import tasks;
2. ImportScheduler: To process tasks and modify their states;
3. ImportChecker: To ascertain the completion of all tasks and instigate
relevant operations.

issue: https://github.com/milvus-io/milvus/issues/28521

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-03-01 18:31:02 +08:00
chyezh
dd957cf9e3
enhance: add configurable memory index load predict memory usage factor (#30561)
related pr: https://github.com/milvus-io/milvus/pull/30475

Signed-off-by: chyezh <chyezh@outlook.com>
2024-03-01 15:23:00 +08:00