issue: #36686
master pr: #36799
The core of this change is to **ensure that the many-to-many lineage
derivation logic is correct, making sure that both the parent and child
cannot simultaneously exist in the target segment view.**
feature:
- Clustering compaction no longer marks the input segments as L2.
- Add a new field `is_invisible` to `segmentInfo`, and mark segments
that have completed clustering but have not yet built indexes as
`is_invisible` to prevent them from being loaded prematurely.
- Do not mark the input segment as `Dropped` before the clustering
compaction is completed.
- After compaction fails, only the result segment needs to be marked as
Dropped.
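
The effect of `is_invisible` on the target segment view can be pictured with a minimal Go sketch. The `segment` struct and the `loadable` helper are hypothetical stand-ins for illustration, not the actual `segmentInfo` type or the datacoord code:

```go
package main

import "fmt"

// segment is a hypothetical, trimmed-down stand-in for segmentInfo.
type segment struct {
	ID          int64
	IsInvisible bool // set while a clustering result has no index yet
}

// loadable returns the IDs of segments that may appear in the target
// view, skipping every segment that is still flagged invisible.
func loadable(segments []segment) []int64 {
	ids := make([]int64, 0, len(segments))
	for _, s := range segments {
		if s.IsInvisible {
			continue
		}
		ids = append(ids, s.ID)
	}
	return ids
}

func main() {
	segs := []segment{{ID: 1}, {ID: 2, IsInvisible: true}, {ID: 3}}
	fmt.Println(loadable(segs))
}
```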
compatibility:
- If the upgraded task has not failed, there are no compatibility
issues.
- If the status after the upgrade is `MetaSaved`, then skip the stats
task based on whether TmpSegments is empty.
- If the failure occurs before `MetaSaved`:
  - There are no ResultSegments, and InputSegments have not been marked
    as dropped yet.
  - The level of the input segments needs to be reverted to LastLevel.
- If the failure occurs after `MetaSaved`:
  - ResultSegments have already been generated, and InputSegments have
    been marked as Dropped. At this point, simply make the ResultSegments
    visible.
  - The level of ResultSegments needs to be set to L1 (in order to
    participate in mixCompaction).
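
The two failure cases for upgraded tasks could be sketched roughly as follows. All names here (`seg`, `level`, `recoverFailedTask`) are hypothetical and only mirror the rules listed above; the real recovery code is not shown in this message:

```go
package main

import "fmt"

// Hypothetical names throughout; these are not the real datacoord types.
type level int

const (
	L1 level = 1
	L2 level = 2
)

type seg struct {
	ID        int64
	Level     level
	LastLevel level
	Invisible bool
}

// recoverFailedTask applies the compatibility rules to a failed task.
// reachedMetaSaved reports whether the failure happened after MetaSaved.
func recoverFailedTask(reachedMetaSaved bool, inputs, results []*seg) {
	if !reachedMetaSaved {
		// No result segments exist yet; restore the input levels.
		for _, s := range inputs {
			s.Level = s.LastLevel
		}
		return
	}
	// Inputs are already dropped; expose the results as L1 segments.
	for _, s := range results {
		s.Invisible = false
		s.Level = L1
	}
}

func main() {
	in := []*seg{{ID: 1, Level: L2, LastLevel: L1}}
	recoverFailedTask(false, in, nil)
	fmt.Println(in[0].Level == L1)
}
```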
Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
Cherry-pick from master
pr: #38157
`File.Write` and `File.WriteInt` use `write`, which may be a direct
syscall on some systems. When mapping field data and writing it line by
line, this can cost a lot of CPU time when the row count is large.
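
To illustrate the cost, here is a minimal Go sketch that counts how often the underlying writer is hit with and without a buffer. It assumes buffering as the remedy, which this message does not state explicitly; `countingWriter`, `writeRows`, and `writeRowsBuffered` are hypothetical names, and the original `File` helpers are not Go code:

```go
package main

import (
	"bufio"
	"encoding/binary"
	"fmt"
	"io"
)

// countingWriter stands in for a file whose Write is a direct syscall;
// it records how many times Write is invoked and how many bytes arrive.
type countingWriter struct {
	calls int
	bytes int
}

func (c *countingWriter) Write(p []byte) (int, error) {
	c.calls++
	c.bytes += len(p)
	return len(p), nil
}

// writeRows writes every value as 8 little-endian bytes, one Write per row.
func writeRows(w io.Writer, rows []int64) error {
	var buf [8]byte
	for _, v := range rows {
		binary.LittleEndian.PutUint64(buf[:], uint64(v))
		if _, err := w.Write(buf[:]); err != nil {
			return err
		}
	}
	return nil
}

// writeRowsBuffered batches the same rows through a 64 KiB buffer so the
// underlying writer is only called when the buffer fills or is flushed.
func writeRowsBuffered(w io.Writer, rows []int64) error {
	bw := bufio.NewWriterSize(w, 64*1024)
	if err := writeRows(bw, rows); err != nil {
		return err
	}
	return bw.Flush()
}

func main() {
	rows := make([]int64, 100000)
	direct, buffered := &countingWriter{}, &countingWriter{}
	_ = writeRows(direct, rows)
	_ = writeRowsBuffered(buffered, rows)
	fmt.Println("direct calls:", direct.calls, "buffered calls:", buffered.calls)
}
```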
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
1. A collection should observe the channel only once.
2. A collection should check the CollectionLoadPercent for updates only
once.
3. Skip saving coll/partition meta if there are no changes, primarily to
accelerate collection observation after recovery.
issue: https://github.com/milvus-io/milvus/issues/37630
pr: https://github.com/milvus-io/milvus/pull/38028
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Cherry pick from master
pr: #36026
Relate: #36025
Fix the datanode watch-channel timeout when the number of segments is
too large.
The previous timeout applied to the whole process of fetching segment
info in batches. When the number of segments is large, a single RPC
timeout does not work well across multiple rounds of RPCs.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Co-authored-by: aoiasd <zhicheng.yue@zilliz.com>
1. A taskQueueCapacity of 256 is too small for production when we want
to rewrite the entire collection.
2. Tasks should be cleaned up when they cannot be recovered; otherwise
their meta will remain in etcd forever.
pr: #37896
Signed-off-by: yangxuan <xuan.yang@zilliz.com>
issue: #38031
The call to `cli.SyncSegments` used a ctx that had already been
overridden and canceled, so the SyncSegments RPC would always fail.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
When there are many loaded collections, they occupy the target
observer scheduler’s pool. This prevents loading collections from
updating the current target in time, slowing down the load process. This
PR adds a separate target dispatcher for loading collections.
issue: https://github.com/milvus-io/milvus/issues/37166
pr: https://github.com/milvus-io/milvus/pull/37454
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
issue: #33550
pr: #37850
Segment balancing and channel balancing could execute at the same time,
which caused a bunch of corner cases.
This PR disables simultaneous balancing of segments and channels.
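
One way to keep the two kinds of balancing apart is to emit only one kind of task per round. The sketch below assumes channel tasks take priority, which this message does not specify; `task` and `planRound` are hypothetical names:

```go
package main

import "fmt"

// task is a hypothetical balance task; kind is "channel" or "segment".
type task struct {
	kind string
	id   int64
}

// planRound returns the tasks for one balance round. Channel tasks take
// priority here (an assumption of this sketch); segment tasks are only
// produced in a round that has no channel tasks.
func planRound(channelPlan, segmentPlan func() []task) []task {
	if tasks := channelPlan(); len(tasks) > 0 {
		return tasks
	}
	return segmentPlan()
}

func main() {
	channels := func() []task { return []task{{kind: "channel", id: 1}} }
	segments := func() []task { return []task{{kind: "segment", id: 7}} }
	fmt.Println(planRound(channels, segments))
}
```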
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Cherry-pick from master
pr: #37906
Related to #37630
This PR adds a new utility secondary index, `coll2Replicas`, to reduce
map access and iteration when getting replicas by collection.
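
A minimal sketch of such a secondary index, assuming a primary map keyed by replica ID. Apart from the `coll2Replicas` name, every type and method here is hypothetical rather than the real replica manager:

```go
package main

import (
	"fmt"
	"sort"
)

type replica struct {
	ID           int64
	CollectionID int64
}

type replicaManager struct {
	replicas      map[int64]*replica           // primary index: replica ID -> replica
	coll2Replicas map[int64]map[int64]struct{} // secondary index: collection ID -> replica IDs
}

func newReplicaManager() *replicaManager {
	return &replicaManager{
		replicas:      make(map[int64]*replica),
		coll2Replicas: make(map[int64]map[int64]struct{}),
	}
}

// put stores the replica and keeps the secondary index in sync.
func (m *replicaManager) put(r *replica) {
	if old, ok := m.replicas[r.ID]; ok {
		m.unindex(old)
	}
	m.replicas[r.ID] = r
	ids, ok := m.coll2Replicas[r.CollectionID]
	if !ok {
		ids = make(map[int64]struct{})
		m.coll2Replicas[r.CollectionID] = ids
	}
	ids[r.ID] = struct{}{}
}

// remove deletes the replica from both indexes.
func (m *replicaManager) remove(id int64) {
	if r, ok := m.replicas[id]; ok {
		delete(m.replicas, id)
		m.unindex(r)
	}
}

func (m *replicaManager) unindex(r *replica) {
	ids := m.coll2Replicas[r.CollectionID]
	delete(ids, r.ID)
	if len(ids) == 0 {
		delete(m.coll2Replicas, r.CollectionID)
	}
}

// getByCollection reads the secondary index instead of scanning every replica.
func (m *replicaManager) getByCollection(collectionID int64) []int64 {
	ids := make([]int64, 0, len(m.coll2Replicas[collectionID]))
	for id := range m.coll2Replicas[collectionID] {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })
	return ids
}

func main() {
	m := newReplicaManager()
	m.put(&replica{ID: 1, CollectionID: 100})
	m.put(&replica{ID: 2, CollectionID: 100})
	m.put(&replica{ID: 3, CollectionID: 200})
	fmt.Println(m.getByCollection(100))
}
```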
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #37115
pr: #37371 #37646 #37729
The old implementation updated the shard cache and the shard client
manager at the same time, which caused many corner cases due to
concurrent access without a lock.
This PR decouples the shard client manager from the shard cache, so only
the shard cache is updated when a delegator changes. It also makes sure
the shard client manager always returns the right client and creates a
new client if none exists. To guard against client leaks, the shard
client manager purges clients asynchronously every 10 minutes.
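
The get-or-create and purge behaviour might look roughly like the sketch below. The `client` and `clientManager` types are hypothetical, and the purge rule (drop clients for nodes that are no longer in use) is an assumption of this sketch rather than something stated here:

```go
package main

import (
	"fmt"
	"sync"
)

// client is a hypothetical stand-in for a query node client.
type client struct {
	nodeID int64
	closed bool
}

// clientManager owns clients independently of any shard cache.
type clientManager struct {
	mu      sync.Mutex
	clients map[int64]*client
}

func newClientManager() *clientManager {
	return &clientManager{clients: make(map[int64]*client)}
}

// get returns the client for nodeID, creating one if none exists.
func (m *clientManager) get(nodeID int64) *client {
	m.mu.Lock()
	defer m.mu.Unlock()
	if c, ok := m.clients[nodeID]; ok {
		return c
	}
	c := &client{nodeID: nodeID}
	m.clients[nodeID] = c
	return c
}

// purge closes and removes clients whose node is not in inUse, and
// returns how many clients were removed.
func (m *clientManager) purge(inUse map[int64]bool) int {
	m.mu.Lock()
	defer m.mu.Unlock()
	removed := 0
	for id, c := range m.clients {
		if !inUse[id] {
			c.closed = true
			delete(m.clients, id)
			removed++
		}
	}
	return removed
}

func main() {
	m := newClientManager()
	m.get(1)
	m.get(2)
	fmt.Println(m.purge(map[int64]bool{1: true}))
}
```

In a long-running process, `purge` would typically be called from a background goroutine driven by a `time.Ticker`.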
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Co-authored-by: congqixia <congqi.xia@zilliz.com>
issue: #37830
pr: #37862
Because the dist handler doesn't set the channel's version, the channel
checker may release the new delegator after balancing finishes when it
tries to dedup the channel.
This PR fixes this by setting the proper version for the channel.
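
Why a missing version matters can be shown with a small sketch. It assumes that dedup keeps the copy with the highest version and releases the rest; `delegator` and `staleCopies` are hypothetical names, not the channel checker's code:

```go
package main

import "fmt"

// delegator is a hypothetical view of one channel copy in the distribution.
type delegator struct {
	NodeID  int64
	Channel string
	Version int64
}

// staleCopies returns the copies that dedup would release: for every
// channel, all copies except the one with the highest version. On a
// version tie the first copy seen is kept.
func staleCopies(dist []delegator) []delegator {
	newest := make(map[string]delegator)
	for _, d := range dist {
		if cur, ok := newest[d.Channel]; !ok || d.Version > cur.Version {
			newest[d.Channel] = d
		}
	}
	var stale []delegator
	for _, d := range dist {
		if newest[d.Channel] != d {
			stale = append(stale, d)
		}
	}
	return stale
}

func main() {
	// Node 1 holds the old delegator, node 2 the new one after balance.
	withVersion := []delegator{{NodeID: 1, Channel: "ch", Version: 1}, {NodeID: 2, Channel: "ch", Version: 2}}
	unset := []delegator{{NodeID: 1, Channel: "ch"}, {NodeID: 2, Channel: "ch"}}
	fmt.Println(staleCopies(withVersion), staleCopies(unset))
}
```

With versions left unset, both copies tie and the sketch releases the copy on node 2, which is the new delegator.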
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #37908
pr: #37909
Because paramtable is a global singleton, `paramtable.GetNodeID` may
return the wrong server ID in integration tests.
This PR uses `node.GetNodeID` in place of `paramtable.GetNodeID`.
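
The singleton problem can be reproduced in miniature. In this sketch `globalNodeID` mimics a process-wide setting and `node` is a hypothetical type; neither is the real paramtable or node implementation:

```go
package main

import "fmt"

// globalNodeID mimics a process-wide singleton setting.
var globalNodeID int64

// node is a hypothetical server that also remembers its own ID.
type node struct {
	id int64
}

// newNode registers the ID globally and on the instance. When several
// nodes share one process, each call overwrites the global value.
func newNode(id int64) *node {
	globalNodeID = id
	return &node{id: id}
}

// GetNodeID returns the ID that belongs to this node instance.
func (n *node) GetNodeID() int64 {
	return n.id
}

func main() {
	n1 := newNode(1)
	n2 := newNode(2)
	fmt.Println("global:", globalNodeID, "n1:", n1.GetNodeID(), "n2:", n2.GetNodeID())
}
```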
Signed-off-by: Wei Liu <wei.liu@zilliz.com>