issue: #32995
To speed up the construction and querying of Bloom filters, we chose a
blocked Bloom filter instead of a basic Bloom filter implementation.
WARN: This PR is compatible with the old BF implementation, but if you fall
back to an older Milvus version, bloom filter deserialization may fail.
In single Bloom filter test cases with a capacity of 1,000,000 and a
false positive rate (FPR) of 0.001, the blocked Bloom filter is 5 times
faster than the basic Bloom filter in both querying and construction, at
the cost of a 30% increase in memory usage.
- Block BF construct time {"time": "54.128131ms"}
- Block BF size {"size": 3021578}
- Block BF Test cost {"time": "55.407352ms"}
- Basic BF construct time {"time": "210.262183ms"}
- Basic BF size {"size": 2396308}
- Basic BF Test cost {"time": "192.596229ms"}
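For illustration only, here is a minimal from-scratch Go sketch of the blocked layout (hypothetical names, not the Milvus implementation): a key's hash selects one cache-line-sized block and all k probe bits stay inside that block, which is why both construction and querying are cheaper than a classic filter whose k probes touch scattered cache lines.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

const blockBits = 512 // one 64-byte, cache-line-sized block

// BlockedBloom is a toy blocked Bloom filter: a key's hash picks one block,
// and all k probe bits are set inside that block, so each query touches a
// single cache line instead of k scattered words.
type BlockedBloom struct {
	blocks [][8]uint64 // 8 * 64 bits = 512 bits per block
	k      int
}

func NewBlockedBloom(numBlocks, k int) *BlockedBloom {
	return &BlockedBloom{blocks: make([][8]uint64, numBlocks), k: k}
}

func hash64(key []byte) uint64 {
	h := fnv.New64a()
	h.Write(key)
	return h.Sum64()
}

func (b *BlockedBloom) Add(key []byte) {
	h := hash64(key)
	blk := &b.blocks[h%uint64(len(b.blocks))]
	for i := 0; i < b.k; i++ {
		bit := (h >> (8 * uint(i))) % blockBits // derive k probes from one hash
		blk[bit/64] |= 1 << (bit % 64)
	}
}

func (b *BlockedBloom) Test(key []byte) bool {
	h := hash64(key)
	blk := &b.blocks[h%uint64(len(b.blocks))]
	for i := 0; i < b.k; i++ {
		bit := (h >> (8 * uint(i))) % blockBits
		if blk[bit/64]&(1<<(bit%64)) == 0 {
			return false
		}
	}
	return true
}

func main() {
	bf := NewBlockedBloom(1<<14, 4)
	bf.Add([]byte("pk-42"))
	fmt.Println(bf.Test([]byte("pk-42")), bf.Test([]byte("pk-43"))) // true, (almost certainly) false
}
```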
In multi Bloom filter test cases with a capacity of 100,000, an FPR of
0.001, and 100 Bloom filters, we reuse the primary key locations for all
Bloom filters to avoid repeated hash computations. As a result, the
blocked Bloom filter is also 5 times faster than the basic Bloom filter
in querying.
- Block BF TestLocation cost {"time": "529.97183ms"}
- Basic BF TestLocation cost {"time": "3.197430181s"}
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
See also #32642
`LocationCache` used a map to store the locations for each different K,
which could cost a lot of CPU time when locations are fetched many times.
This PR changes the implementation of `LocationCache` to store only the
locations for the largest K used, completely removing the map access.
See the pprof result from @XuanYang-cn's test.
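A minimal sketch of the "largest K only" idea, assuming hypothetical names and a double-hashing scheme (the real derivation may differ): locations are computed once for the biggest K requested, and any smaller K is served as a prefix slice, so the per-call map lookup disappears.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
)

type LocationCache struct {
	pk        int64
	locations []uint64 // locations for the largest K requested so far
}

func (lc *LocationCache) Locations(k uint) []uint64 {
	if int(k) > len(lc.locations) {
		lc.locations = computeLocations(lc.pk, k)
	}
	return lc.locations[:k] // a smaller K is just a prefix, no map access
}

// computeLocations derives k hash locations from the primary key
// (double hashing shown here; the actual scheme may differ).
func computeLocations(pk int64, k uint) []uint64 {
	var buf [8]byte
	binary.LittleEndian.PutUint64(buf[:], uint64(pk))
	h := fnv.New64a()
	h.Write(buf[:])
	h1 := h.Sum64()
	h2 := h1>>33 | h1<<31
	locs := make([]uint64, k)
	for i := uint(0); i < k; i++ {
		locs[i] = h1 + uint64(i)*h2
	}
	return locs
}

func main() {
	lc := &LocationCache{pk: 12345}
	fmt.Println(len(lc.Locations(7)), len(lc.Locations(4))) // 7 4
}
```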

---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
See also #32165
There were some frequent scans in metacache:
- Listing all segments whose start positions are not synced
- Listing compacted segments

These scans cost a lot of CPU time when the number of flushed segments is
large, while `Flushed` segments can be skipped in both scenarios.
This PR makes the following changes (a minimal sketch follows the list):
- Add a segment state shortcut in metacache
- List start positions only for segments in states before `Flushed`
- Mark compacted segments as `Dropped` and use the `Dropped` state
when scanning them
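A minimal sketch of the state shortcut, with hypothetical types rather than the actual metacache code: an index from segment state to segment IDs turns "list segments not yet Flushed" into a few map lookups instead of a scan over every segment.

```go
package main

import "fmt"

type SegmentState int

const (
	Growing SegmentState = iota
	Sealed
	Flushed
	Dropped
)

type MetaCache struct {
	segments map[int64]SegmentState
	byState  map[SegmentState]map[int64]struct{} // the state shortcut
}

func NewMetaCache() *MetaCache {
	return &MetaCache{
		segments: make(map[int64]SegmentState),
		byState:  make(map[SegmentState]map[int64]struct{}),
	}
}

func (mc *MetaCache) UpdateState(segmentID int64, state SegmentState) {
	if old, ok := mc.segments[segmentID]; ok {
		delete(mc.byState[old], segmentID)
	}
	mc.segments[segmentID] = state
	if mc.byState[state] == nil {
		mc.byState[state] = make(map[int64]struct{})
	}
	mc.byState[state][segmentID] = struct{}{}
}

// ListByStates returns segment IDs in any of the given states without
// iterating all segments, so Flushed segments are never touched.
func (mc *MetaCache) ListByStates(states ...SegmentState) []int64 {
	var ids []int64
	for _, s := range states {
		for id := range mc.byState[s] {
			ids = append(ids, id)
		}
	}
	return ids
}

func main() {
	mc := NewMetaCache()
	mc.UpdateState(1, Growing)
	mc.UpdateState(2, Flushed)
	mc.UpdateState(3, Sealed)
	fmt.Println(mc.ListByStates(Growing, Sealed)) // [1 3] (order not guaranteed)
}
```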
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
See also #32642
This PR reuses hash locations for bloom filter prediction utilizing
`storage.Location`, like enhancement #32642.
It also adds a utility struct in storage, `LocationCache`, to store
locations for variable K (number of hash functions).
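A hedged sketch of the reuse pattern (toy filter and invented names, not `storage.Location` itself): the probe positions for a primary key are computed once and then tested against every segment's filter, so the expensive hashing is not repeated per filter.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

type bloomFilter struct {
	bits []uint64
	k    uint
}

func newBloomFilter(m uint64, k uint) *bloomFilter {
	return &bloomFilter{bits: make([]uint64, (m+63)/64), k: k}
}

func (bf *bloomFilter) set(loc uint64) {
	loc %= uint64(len(bf.bits)) * 64
	bf.bits[loc/64] |= 1 << (loc % 64)
}

// testLocations checks precomputed probe positions; no hashing happens here.
func (bf *bloomFilter) testLocations(locs []uint64) bool {
	for _, loc := range locs[:bf.k] {
		loc %= uint64(len(bf.bits)) * 64
		if bf.bits[loc/64]&(1<<(loc%64)) == 0 {
			return false
		}
	}
	return true
}

// locations derives k probe positions from a key via double hashing.
func locations(key string, k uint) []uint64 {
	h := fnv.New64a()
	h.Write([]byte(key))
	h1 := h.Sum64()
	h2 := h1>>33 | h1<<31
	locs := make([]uint64, k)
	for i := uint(0); i < k; i++ {
		locs[i] = h1 + uint64(i)*h2
	}
	return locs
}

func main() {
	const k = 5
	segments := make([]*bloomFilter, 100)
	for i := range segments {
		segments[i] = newBloomFilter(1<<20, k)
	}
	// insert "pk-1" into segment 42 only
	for _, loc := range locations("pk-1", k) {
		segments[42].set(loc)
	}

	// hash once, test every segment's filter with the same locations
	locs := locations("pk-1", k)
	hits := 0
	for _, bf := range segments {
		if bf.testLocations(locs) {
			hits++
		}
	}
	fmt.Println("candidate segments:", hits)
}
```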
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
"-1" as `InvalidPartitionID` previously used as All partition place
holder in delete cases. It's confusing and hard to maintain when a const
var has more than one meaning.
This PR add `AllPartitionsID` to replace these usages in delete
scenarios.
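A tiny illustration of the intent, with assumed constant values (the real definitions live in the Milvus codebase):

```go
package main

import "fmt"

// Illustrative constants only; the actual names are defined in Milvus' common package.
const (
	InvalidPartitionID int64 = -1 // "no valid partition" (error / unset cases)
	AllPartitionsID    int64 = -1 // explicit "delete from every partition" marker
)

func main() {
	// Same numeric value, but the intent at each call site is now unambiguous.
	partitionID := AllPartitionsID
	fmt.Println(partitionID == AllPartitionsID) // true
}
```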
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
See also #31143
This PR adds a shortcut for the datanode metacache `WithSegmentIDs` filter,
which fetches segments directly from the map by the provided segment IDs.
It also adds a benchmark comparing the new implementation with the old one.
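A minimal sketch of the shortcut, with hypothetical types: when the caller passes explicit segment IDs, the filter can fetch them straight from the map instead of scanning every entry and applying a predicate.

```go
package main

import "fmt"

type SegmentInfo struct{ ID int64 }

type MetaCache struct {
	segments map[int64]*SegmentInfo
}

// GetSegmentsByIDs is the O(len(ids)) shortcut: it looks up each requested
// ID directly instead of iterating the whole map with a filter predicate.
func (mc *MetaCache) GetSegmentsByIDs(ids ...int64) []*SegmentInfo {
	result := make([]*SegmentInfo, 0, len(ids))
	for _, id := range ids {
		if info, ok := mc.segments[id]; ok {
			result = append(result, info)
		}
	}
	return result
}

func main() {
	mc := &MetaCache{segments: map[int64]*SegmentInfo{
		1: {ID: 1}, 2: {ID: 2}, 3: {ID: 3},
	}}
	for _, s := range mc.GetSegmentsByIDs(2, 3) {
		fmt.Println(s.ID)
	}
}
```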
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
This PR decouples importing segments from the flush process by:
1. Excluding importing segments from the flush policy (see the sketch after
this list). This avoids notifying the datanode to flush an importing
segment, which may not exist there.
2. When RootCoord calls Flush, DataCoord directly sets the importing
segment's state to `Flushed`.
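A rough sketch of the flush-policy side of point 1, with invented names: importing segments are filtered out before flush targets are selected, so a flush request never references a segment the datanode does not know about.

```go
package main

import "fmt"

type segment struct {
	ID          int64
	IsImporting bool
	Growing     bool
}

// selectFlushTargets returns the growing segments of a channel that are safe
// to flush, excluding importing segments entirely.
func selectFlushTargets(segments []*segment) []int64 {
	var targets []int64
	for _, s := range segments {
		if s.IsImporting {
			continue // importing segments are marked Flushed by DataCoord directly
		}
		if s.Growing {
			targets = append(targets, s.ID)
		}
	}
	return targets
}

func main() {
	segs := []*segment{
		{ID: 1, Growing: true},
		{ID: 2, Growing: true, IsImporting: true},
		{ID: 3, Growing: false},
	}
	fmt.Println(selectFlushTargets(segs)) // [1]
}
```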
issue: https://github.com/milvus-io/milvus/issues/30359
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
See also #27675
`BloomFilterSet.current` shall be reset after `RollStats`; otherwise it keeps
tracking the whole segment's data, making the false positive ratio larger
than expected.
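A minimal sketch of the fix, with hypothetical types: once the running statistics are rolled into history, `current` is cleared so the next filter only tracks the next batch rather than the whole segment.

```go
package main

import "fmt"

type PrimaryKeyStats struct{ rows int }

type BloomFilterSet struct {
	current *PrimaryKeyStats
	history []*PrimaryKeyStats
}

func (s *BloomFilterSet) Update(rows int) {
	if s.current == nil {
		s.current = &PrimaryKeyStats{}
	}
	s.current.rows += rows
}

// Roll moves the current stats into history and clears current; without the
// final reset, current would keep accumulating the whole segment and exceed
// the false positive rate it was sized for.
func (s *BloomFilterSet) Roll() {
	if s.current != nil {
		s.history = append(s.history, s.current)
		s.current = nil
	}
}

func main() {
	s := &BloomFilterSet{}
	s.Update(1000)
	s.Roll()
	s.Update(500)
	s.Roll()
	fmt.Println(len(s.history)) // 2 independent stats, not one ever-growing filter
}
```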
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
See also: #27675
The bloom filter set initialized each new BF with the fixed configured `n`.
This value is always larger than the actual batch size and causes the
generated BF to use more memory than necessary.
This PR makes the write buffer initialize the BF with a batch size estimated
from the schema and configuration values.
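A small sketch of the sizing idea, assuming invented helper names and example values: the per-batch row count is estimated from the configured buffer budget and an approximate row width derived from the schema, instead of the fixed capacity.

```go
package main

import "fmt"

// estimateBatchRows guesses how many rows fit in one sync batch given the
// buffer size budget (bytes) and an estimated per-row size from the schema.
func estimateBatchRows(bufferSizeBytes, rowSizeBytes int64) uint {
	if rowSizeBytes <= 0 {
		return 0
	}
	return uint(bufferSizeBytes / rowSizeBytes)
}

func main() {
	const configuredCapacity = 1_000_000 // the previous fixed `n`
	rows := estimateBatchRows(16<<20, 256)
	fmt.Printf("BF sized for %d rows instead of %d\n", rows, configuredCapacity)
}
```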
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
See also #27675
Since serializing the segment buffer is not related to the sync manager, it
can be done before submitting to the sync manager. This makes the PK
statistics file more accurate and reduces complex logic inside the sync
manager.
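A rough sketch of the new ordering, with invented names: the buffer is serialized (including PK statistics) before submission, so the sync manager only schedules uploads of already-built blobs.

```go
package main

import "fmt"

type syncTask struct {
	segmentID int64
	blobs     [][]byte // already-serialized binlogs + pk stats
}

// serializeBuffer runs before the sync manager is involved, so the pk stats
// reflect exactly what is in this buffer.
func serializeBuffer(segmentID int64, rows []string) *syncTask {
	blobs := make([][]byte, 0, len(rows))
	for _, r := range rows {
		blobs = append(blobs, []byte(r))
	}
	return &syncTask{segmentID: segmentID, blobs: blobs}
}

// submit hands an opaque, ready-to-upload task to the sync manager.
func submit(t *syncTask) {
	fmt.Printf("segment %d: uploading %d blobs\n", t.segmentID, len(t.blobs))
}

func main() {
	submit(serializeBuffer(100, []string{"row-1", "row-2"}))
}
```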
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
See also #29092
`FlushSegments` transfers only `Growing` segments to flushing. If a segment
was already in the `Sealed` state before the Datanode watched the channel,
the condition is never satisfied and the segment is never selected to be
flushed.
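A hypothetical illustration of the stuck-state problem and one plausible direction for the fix (accepting `Sealed` as well); the actual change in this PR may differ.

```go
package main

import "fmt"

type state int

const (
	growing state = iota
	sealed
	flushing
)

// toFlushing models the state transfer; previously only `growing` was
// accepted, so a segment sealed before the channel was watched stayed stuck.
func toFlushing(s state) (state, bool) {
	switch s {
	case growing, sealed:
		return flushing, true
	default:
		return s, false
	}
}

func main() {
	next, ok := toFlushing(sealed)
	fmt.Println(next == flushing, ok) // true true
}
```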
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Related to #28736 #28748
See also #27675
Previous PR: #28646
This PR fixes the `SegmentNotFound` issue that occurs when compaction happens
multiple times and the buffer of a first-generation segment is synced due to
the stale policy.
Now the `CompactSegments` API of metacache updates the compactTo field of
segmentInfo when the compactTo segment is itself compacted, keeping the
lineage clean.
It also adds the `CompactedSegment` SyncPolicy to sync compacted segments as
soon as possible, keeping the metacache clean.
`SyncPolicy` is now an interface instead of a function type, so that when it
selects segments to sync, we can log the reason and the target segments.
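A sketch of the interface shape with illustrative signatures (not the exact datanode definitions): a policy selects segments and can report why, so the chosen IDs and the reason can be logged together.

```go
package main

import "fmt"

type SegmentBuffer struct {
	SegmentID int64
	Compacted bool
}

// SyncPolicy replaces the old plain function type; Reason() lets the caller
// log why a segment was picked for sync.
type SyncPolicy interface {
	SelectSegments(buffers []*SegmentBuffer) []int64
	Reason() string
}

type compactedSegmentPolicy struct{}

func (compactedSegmentPolicy) SelectSegments(buffers []*SegmentBuffer) []int64 {
	var ids []int64
	for _, b := range buffers {
		if b.Compacted {
			ids = append(ids, b.SegmentID)
		}
	}
	return ids
}

func (compactedSegmentPolicy) Reason() string { return "segment compacted" }

func main() {
	buffers := []*SegmentBuffer{{SegmentID: 1}, {SegmentID: 2, Compacted: true}}
	var p SyncPolicy = compactedSegmentPolicy{}
	if ids := p.SelectSegments(buffers); len(ids) > 0 {
		fmt.Printf("sync segments %v, reason: %s\n", ids, p.Reason())
	}
}
```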
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
See also #27675
Compacted segment info shall be removed after all buffers belonging to it
are synced.
This PR adds a cleanup step after the triggerSyncTask logic (a minimal
sketch follows the list):
- The buffer is stable and protected by a mutex
- Cleanup fetches compacted & non-synced segments
- Segment info is removed only when no buffer for it remains in the manager
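A hedged sketch of the cleanup step, with invented types: compacted segments whose buffers have all been synced are removed from the metacache, while anything still holding buffered data is kept.

```go
package main

import "fmt"

type segmentMeta struct {
	ID        int64
	Compacted bool
}

type bufferManager struct {
	segments map[int64]*segmentMeta
	buffered map[int64]int // segmentID -> number of un-synced buffers
}

// cleanupCompacted removes compacted segment info only when no buffer for it
// remains in the manager (the caller holds the mutex protecting both maps).
func (m *bufferManager) cleanupCompacted() {
	for id, seg := range m.segments {
		if seg.Compacted && m.buffered[id] == 0 {
			delete(m.segments, id)
		}
	}
}

func main() {
	m := &bufferManager{
		segments: map[int64]*segmentMeta{
			1: {ID: 1, Compacted: true},
			2: {ID: 2, Compacted: true},
			3: {ID: 3},
		},
		buffered: map[int64]int{2: 1}, // segment 2 still has an un-synced buffer
	}
	m.cleanupCompacted()
	fmt.Println(len(m.segments)) // 2: segment 1 removed, segments 2 and 3 kept
}
```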
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
See also #27675
- Fix LevelZero segments that could not be flushed
- Add a level option for syncTask
- Invoke `AddSegment` when a new LevelZero segment is allocated
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
See also #27675
This PR brings the previously merged datanode refactoring online:
- Use the write node to replace the insert/delete nodes
- Use the write buffer manager to control all buffers
- Use the sync manager to control sync tasks instead of the flush manager
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>