milvus

mirror of https://gitee.com/milvus-io/milvus.git synced 2026-01-05 18:31:59 +08:00

Author	SHA1	Message	Date
Spade A	3fc309bdfc	fix: add more logs related to tantivy upload/cache (#46019 ) issue: https://github.com/milvus-io/milvus/issues/45590 Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-12-03 10:47:09 +08:00
Spade A	b8df1c0cc5	enhance: improve observability in trace for segcore scalar expression (#44260 ) Ref https://github.com/milvus-io/milvus/issues/44259 This PR connects the trace between go and segcore, and adds full traces for the scalar expression calling chain. --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-10-14 17:15:59 +08:00
Buqian Zheng	75557f3eb8	enhance: Use std::shared_lock and std::unique_lock for mutexes (#44459 ) issue: https://github.com/milvus-io/milvus/issues/44452 Signed-off-by: zhengbuqian <zhengbuqian@gmail.com> Co-authored-by: buqian.zheng <buqian.zheng@zilliz.com>	2025-09-22 18:02:09 +08:00
zhagnlu	8934c18792	enhance: support cache result cache for expr (#43923 ) issue: #43878 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-08-26 10:55:52 +08:00
sparknack	4fae074d56	enhance: add write rate limit for disk file writer (#43912 ) issue: #43040 --------- Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>	2025-08-25 10:27:47 +08:00
Bingyi Sun	742d72a6c2	fix: Fix wrong null offsets for json path index (#43390 ) issue: https://github.com/milvus-io/milvus/issues/43315 --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-07-26 17:26:54 +08:00
Buqian Zheng	389104d200	enhance: rename PanicInfo to ThrowInfo (#43384 ) issue: #41435 this is to prevent AI from thinking of our exception throwing as a dangerous PANIC operation that terminates the program. Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>	2025-07-19 20:22:52 +08:00
Spade A	8612a2c946	enhance: optimize in by batch-in (#43268 ) fix: https://github.com/milvus-io/milvus/issues/43267 --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-07-17 19:40:52 +08:00
Bingyi Sun	1b8c958cff	enhance: fix tantivy wrapper is freed after json flat executor is destructed (#43233 ) Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-07-16 10:58:50 +08:00
Spade A	fce0bbe2ae	fix: remove redundant locks for null_offset (#43103 ) Ref: https://github.com/milvus-io/milvus/issues/40308 https://github.com/milvus-io/milvus/pull/40363 added a lock to protect concurrent reads/writes of null offset, but this is not needed for sealed segments. --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-07-04 10:10:45 +08:00
Spade A	50f7579d8f	fix: fix some bugs discovered by chaos tests (#42906 ) fix: https://github.com/milvus-io/milvus/issues/42870 This PR fixes: 1. The SetBitset fn should account for growing segments with concurrent writes. 2. Avoid using from_raw_parts directly. --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-06-24 16:32:42 +08:00
Spade A	e2c85eec81	fix: load stats index based on mmap config (#42788 ) ref https://github.com/milvus-io/milvus/issues/42626 This PR makes text match index and json key stats index be loaded based on mmap config. --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-06-19 10:10:39 +08:00
Spade A	80f1d707f7	fix: tidy up path for scalar index (#42676 ) Ref #42626 This PR tidies up the paths for scalar index, including the path for loading an index from remote storage and the temporary path for building an index. --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-06-18 00:42:38 +08:00
Chun Han	001619aef9	feat: supporing load priority for loading (#42413 ) related: #40781 Signed-off-by: MrPresent-Han <chun.han@gmail.com> Co-authored-by: MrPresent-Han <chun.han@gmail.com>	2025-06-17 15:22:38 +08:00
Bingyi Sun	fbf5cb4e62	feat: Add json flat index (#39917 ) issue: https://github.com/milvus-io/milvus/issues/35528 This PR introduces a JSON flat index that allows indexing JSON fields and dynamic fields in the same way as other field types. In a previous PR (#36750), we implemented a JSON index that requires specifying a JSON path and casting a type. The only distinction lies in the json_cast_type parameter. When json_cast_type is set to JSON type, Milvus automatically creates a JSON flat index. For details on how Tantivy interprets JSON data, refer to the [tantivy documentation](https://github.com/quickwit-oss/tantivy/blob/main/doc/src/json.md#pitfalls-limitation-and-corner-cases). Limitations Array handling: Arrays do not function as nested objects. See the [limitations section](https://github.com/quickwit-oss/tantivy/blob/main/doc/src/json.md#arrays-do-not-work-like-nested-object) for more details. --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-06-10 19:14:35 +08:00
congqixia	cc42d49769	fix: [StorageV2][AddField] Handle lack binlog rows in storage v2 (#42186 ) Related to #39173 #39718 In storage v2, `lack_bin_rows` cannot be used because the field id is not the column group id, so the two will never match. --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-05-31 02:44:30 +08:00
zhagnlu	39e7ad33d7	enhance: add optimize for like expr (#41066 ) #41065 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-05-08 14:28:52 +08:00
Buqian Zheng	73bbf4c674	fix: error when lack_binlog_rows = 0 (#41644 ) issue: https://github.com/milvus-io/milvus/issues/41643 Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>	2025-05-04 00:24:56 +08:00
sthuang	6c377b6e86	feat: Storage v2 index and stats raw data (#41534 ) related: #39173 --------- Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>	2025-04-30 08:48:54 +08:00
Spade A	5b1430f27e	enhance: tantivy collector set bitset directly (#39748 ) fix: #39755 The following shows a simple benchmark that inserts 1M docs in which all rows are "hello". The latency is measured at the segcore level and the CPU is a 9900K: master: 2.62ms, this PR: 2.11ms. Benchmark code:
```
TEST(TextMatch, TestPerf) {
    auto schema = GenTestSchema({}, true);
    auto seg = CreateSealedSegment(schema, empty_index_meta);
    int64_t N = 1000000;
    uint64_t seed = 19190504;
    auto raw_data = DataGen(schema, N, seed);
    auto str_col = raw_data.raw_->mutable_fields_data()
                       ->at(1)
                       .mutable_scalars()
                       ->mutable_string_data()
                       ->mutable_data();
    for (int64_t i = 0; i < N - 1; i++) {
        str_col->at(i) = "hello";
    }
    SealedLoadFieldData(raw_data, *seg);
    seg->CreateTextIndex(FieldId(101));

    auto now = std::chrono::high_resolution_clock::now();
    auto expr = GetMatchExpr(schema, "hello", OpType::TextMatch);
    auto final = ExecuteQueryExpr(expr, seg.get(), N, MAX_TIMESTAMP);
    auto end = std::chrono::high_resolution_clock::now();
    auto duration =
        std::chrono::duration_cast<std::chrono::microseconds>(end - now);
    // duration is in microseconds, so the count is labeled "us" rather than "ms".
    std::cout << "TextMatch query time: " << duration.count() << "us"
              << std::endl;
}
```
--------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-04-20 23:02:41 +08:00
Chun Han	59b14d38f5	enhance: Optimize index format for improved load performance(#40838 ) (#40839 ) related: https://github.com/milvus-io/milvus/issues/40838 Signed-off-by: MrPresent-Han <chun.han@gmail.com> Co-authored-by: MrPresent-Han <chun.han@gmail.com>	2025-04-15 03:10:30 +08:00
Bingyi Sun	bf617115ca	enhance: Remove single chunk segment related codes (#39249 ) https://github.com/milvus-io/milvus/issues/39112 --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-04-11 18:56:29 +08:00
Spade A	9ce3e3cb44	enhance: add documents in batch for json key stats (#41228 ) issue: https://github.com/milvus-io/milvus/issues/40897 After this, the document add operations scheduling duration is decreased roughly from 6s to 0.9s for the case in the issue. --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-04-11 14:08:26 +08:00
Bingyi Sun	b9b8419cbf	fix: Use int32 when creating array index for element type int8/int16 (#41185 ) issue: #41172 Elements of type int8 or int16 in an Array are encoded using int32, so we should parse them as int32 when creating the index. Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-04-11 13:18:25 +08:00
Xianhui Lin	3bc24c264f	enhance: Add json key inverted index in stats for optimization (#38039 ) Add json key inverted index in stats for optimization https://github.com/milvus-io/milvus/issues/36995 --------- Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-04-10 15:20:28 +08:00
zhagnlu	3ed23a5f48	fix: fix remove index type failed when remote storage is local mode (#41164 ) #41142 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-04-09 16:42:26 +08:00
Spade A	f552ec67dd	fix: support building tantivy index with low version(5) (#40822 ) fix: https://github.com/milvus-io/milvus/issues/40823 To solve the problem in the issue, we have to support building tantivy index with low version for those query nodes with low tantivy version. This PR does two things: 1. refactor codes for IndexWriterWrapper to make it concise 2. enable IndexWriterWrapper to build tantivy index by different tantivy crate --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-04-02 18:46:20 +08:00
smellthemoon	cb1e86e17c	enhance: support add field (#39800 ) After this PR is merged, insert, upsert, index building, query, and search are supported on the added field. These operations can only be performed on the added field after the add field request completes, which is a synchronous operation. Compaction will be supported in the next PR. #39718 --------- Signed-off-by: lixinguo <xinguo.li@zilliz.com> Co-authored-by: lixinguo <xinguo.li@zilliz.com>	2025-04-02 14:24:31 +08:00
cai.zhang	e5f50076ec	enhance: Only check element type with not null array (#40446 ) Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2025-03-11 14:58:07 +08:00
Bingyi Sun	0a7e692b6f	fix: Fix null offset loading in inverted index (#40523 ) issue: #40516 --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-03-10 22:12:04 +08:00
smellthemoon	faae8ee518	fix: store wrong offset when build tantivy in nullable field (#40452 ) #40454 Signed-off-by: lixinguo <xinguo.li@zilliz.com> Co-authored-by: lixinguo <xinguo.li@zilliz.com>	2025-03-09 09:34:04 +08:00
Spade A	3db56560fb	fix: fix concurrent issues in null offset (#40363 ) issue: #40308 This PR fixes two concurrency issues: 1. Elements in null_offset are used to set the bitset, whose size is initialized from the tantivy document count. However, some documents may not yet be committed in tantivy while already being recorded as null in null_offset, so an out-of-range array access occurs. 2. null_offset can be read and written concurrently, but there is no synchronization protection. --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-03-05 17:48:00 +08:00
Spade A	d34d70582d	fix: fix misleading name _add_multi_ (#39997 ) fix: #39995 Signed-off-by: SpadeA <tangchenjie1210@gmail.com>	2025-02-21 16:45:55 +08:00
Bingyi Sun	b59555057d	feat: support json index (#36750 ) https://github.com/milvus-io/milvus/issues/35528 This PR adds json index support for json and dynamic fields. For now, this index only supports unary queries like 'a["b"] > 1'. More filter types will be supported later. Basic usage: ``` collection.create_index("json_field", {"index_type": "INVERTED", "params": {"json_cast_type": DataType.STRING, "json_path": 'json_field["a"]["b"]'}}) ``` There are some limitations when using this index: 1. If a record does not have the json path you specify, it is ignored and no error is raised. 2. If a value at the json path fails to be cast to the type you specify, it is ignored and no error is raised. 3. A specific json path can have only one json index. 4. If you try to create more than one json index for one json field, the sdk (pymilvus<=2.4.7) may return immediately because of its internal implementation. This will be fixed in a later version. --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-02-15 14:06:15 +08:00
Spade A	8c4ba70a4c	fix: enable to build index with single segment (#39233 ) fix https://github.com/milvus-io/milvus/issues/39232 --------- Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>	2025-01-16 11:01:06 +08:00
Zhen Ye	3e788f0fbd	enhance: record memory size (uncompressed) item for index (#38770 ) issue: #38715 - Milvus currently uses the serialized (compressed) index size to estimate resources for loading. - Add a new field `MemSize` (size before compression) to the index for resource estimation. --------- Signed-off-by: chyezh <chyezh@outlook.com>	2025-01-14 10:33:06 +08:00
Bingyi Sun	f0cddfd160	fix: Fix panic caused by removing directory (#38622 ) https://github.com/milvus-io/milvus/issues/38604 --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-01-06 10:54:54 +08:00
cai.zhang	ba3c2e6fb1	fix: Only generate the index_null_offset file when the field support null value (#38833 ) issue: #38832 Signed-off-by: cai.zhang <cai.zhang@zilliz.com>	2024-12-30 18:02:52 +08:00
zhagnlu	9afcc5bc5c	fix:fix incorrect dir operations when create or load inverted index (#38359 ) #37944 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2024-12-17 20:06:45 +08:00
smellthemoon	9b6dd23f8e	fix: wrong path spelling when use rootpath in segcore (#37453 ) #36532 Signed-off-by: lixinguo <xinguo.li@zilliz.com> Co-authored-by: lixinguo <xinguo.li@zilliz.com>	2024-11-07 11:22:25 +08:00
Bingyi Sun	23b95aeba3	fix: remove element type check (#35828 ) https://github.com/milvus-io/milvus/issues/36275 The Array's element type is not the same as the schema's: it is INT32 for INT16 and INT8. Signed-off-by: sunby <sunbingyi1992@gmail.com>	2024-09-18 11:37:10 +08:00
Jiquan Long	89bf226f0b	feat: support keyword text match (#35923 ) fix: #35922 --------- Signed-off-by: longjiquan <jiquan.long@zilliz.com>	2024-09-10 15:11:08 +08:00
smellthemoon	b51b4a2838	fix: try get not exist file after upgrade (#35740 ) https://github.com/milvus-io/milvus/issues/35741 Signed-off-by: lixinguo <xinguo.li@zilliz.com> Co-authored-by: lixinguo <xinguo.li@zilliz.com>	2024-08-29 11:09:01 +08:00
Jiquan Long	a52ba3d09d	enhance: allow many segments for inverted index (#35616 ) fix: https://github.com/milvus-io/milvus/issues/35615 --------- Signed-off-by: longjiquan <jiquan.long@zilliz.com>	2024-08-28 11:30:59 +08:00
zhagnlu	4d2f96c760	enhance: support bitmap mmap (#35399 ) #32900 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2024-08-27 16:34:59 +08:00
smellthemoon	80dbe87759	enhance: support null value in index (#35238 ) #31728 --------- Signed-off-by: lixinguo <xinguo.li@zilliz.com> Co-authored-by: lixinguo <xinguo.li@zilliz.com>	2024-08-16 15:30:54 +08:00
zhagnlu	c19fe95154	fix: support string match for hybrid and bitmap index (#35294 ) #34841 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2024-08-07 09:54:22 +08:00
Jiquan Long	91df03afe8	feat: put inverted index into ram (#35222 ) fix: https://github.com/milvus-io/milvus/issues/35224 --------- Signed-off-by: longjiquan <jiquan.long@zilliz.com>	2024-08-06 11:54:16 +08:00
zhenshan.cao	aa247f192d	enhance: remove unused code for StorageV2 (#35132 ) issue: https://github.com/milvus-io/milvus/issues/34168 Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>	2024-08-01 12:08:13 +08:00
smellthemoon	5616b7e8d2	enhance: support null in c data_datacodec and load null value (#32183 ) 1. Support reading and writing null in segcore: valid_data is stored in fieldData (using the uint8_t type to save memory). 2. Support loading null: the binlog reader reads and writes data into column (sealed segment) or insertRecord (growing segment). In a sealed segment, valid_data is stored directly. In a growing segment, for consistency with the prior implementation and for code readability, it converts uint8_t to fbvector<bool>, which may be optimized in the future. 3. Retrieve valid_data: parse valid_data in search/query. #31728 --------- Signed-off-by: lixinguo <xinguo.li@zilliz.com> Co-authored-by: lixinguo <xinguo.li@zilliz.com>	2024-07-23 16:07:51 +08:00

62 Commits