276 Commits

Author SHA1 Message Date
congqixia
d3b15ac136
enhance: support pk isolation optional field data loading from manifest for index build (#46480)
### **User description**
Related to #44956

Add manifest-based data loading path for optional fields in
`cache_opt_field_memory_v2`. When a manifest file is provided in the
config, the function now retrieves field data directly from the manifest
using `GetFieldDatasFromManifest` instead of reading from segment insert
files. This enables storage v2 compatibility for building indexes with
optional fields.


___

### **PR Type**
Enhancement


___

### **Description**
- Add manifest-based data loading for optional fields in index building

- Support storage v2 compatibility via `GetFieldDatasFromManifest`
function

- Enable PK isolation optional field handling without segment insert
files


___

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-12-23 14:55:21 +08:00
Chun Han
f0265dde18
fix: catch exception from LoadWithStrategy(#46380) (#46381)
related: #46380

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2025-12-17 11:37:17 +08:00
zhagnlu
a86b8b7a12
enhance: move jsonshredding meta from parquet to meta.json (#46130)
#42533

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-12-11 14:01:13 +08:00
cai.zhang
bb486c0db3
fix: Fix path concatenation error when rootPath = "." in minio (#46220)
issue: #46219

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-12-10 13:53:13 +08:00
congqixia
728cdc15b2
fix: fill partition_id in load index info and close RemoteOutputStream properly (#46203)
This PR fixes two issues related to segment loading and index
deserialization:

1. Fill partition_id in LoadIndexInfo when converting field index info,
which is required by cardinal (DiskANN) index deserialization.

2. Close RemoteOutputStream in destructor to ensure the buffer is flushed
and resources are released properly.

issue: #46141

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-12-09 13:27:13 +08:00
congqixia
d4450b2f57
enhance: [StorageV2] Integrate CMEK support into Loon FFI interface (#46123)
This PR adds Customer Managed Encryption Keys (CMEK) support to the
StorageV2 FFI layer, enabling data encryption/decryption through the
cipher plugin system.

Changes:
- Add ffi_writer_c.cpp/h with GetEncParams() to retrieve encryption
parameters (key and metadata) from cipher plugin for data encryption
- Extend GetLoonReader() in ffi_reader_c.cpp to support CMEK decryption
by configuring KeyRetriever when plugin context is provided
- Add encryption property constants in ffi_common.go for writer config
- Integrate CMEK encryption in NewFFIPackedWriter() to pass encryption
parameters to the underlying storage writer

issue: #44956

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-12-05 17:59:12 +08:00
congqixia
3daff1ab2b
enhance: use specified manifest version in loon ffi reader (#46101)
Related to #44956

Use the exact manifest version from the path parameter instead of always
fetching the latest manifest. This ensures data consistency by reading
from the specific version that was requested.

Changes:
- Update GetColumnGroups to use transaction.begin(version) with the
specified version from the path JSON
- Replace get_latest_manifest() with get_current_manifest() after
beginning transaction at the target version
- Update Go FFI binding to call get_column_groups_by_version instead of
get_latest_column_groups
- Remove unused GetManifest function from util.cpp/util.h
- Bump milvus-storage version from 5fff4f5 to 33bf815

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-12-05 11:45:11 +08:00
Ted Xu
20ce9fdc23
feat: bump loon version (#46029)
See: #44956

This PR upgrades loon to the latest version and resolves build
conflicts.

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Co-authored-by: Congqi Xia <congqi.xia@zilliz.com>
2025-12-04 10:57:12 +08:00
Spade A
3fc309bdfc
fix: add more logs related to tantivy upload/cache (#46019)
issue: https://github.com/milvus-io/milvus/issues/45590

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-12-03 10:47:09 +08:00
sparknack
8ef35de7ca
enhance: always use buffered io for high load priority (#45900)
issue: #43040

Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
2025-11-29 00:03:08 +08:00
congqixia
d2e4278b18
enhance: use milvus-storage internal C++ Reader API for Loon FFI (#45897)
This PR refactors the Loon FFI reader implementation to use
milvus-storage's internal C++ Reader API directly instead of the
external FFI interface.

Key changes:
- Replace external FFI calls (get_record_batch_reader, reader_destroy)
with direct C++ Reader API calls
- Add GetLoonReader() helper function to create Reader instances using
milvus-storage::api::Reader::create()
- Use MakeInternalPropertiesFromStorageConfig() instead of
MakePropertiesFromStorageConfig() to get internal properties
- Update NewPackedFFIReaderWithManifest() to deserialize column groups
from JSON manifest content directly
- Simplify GetFFIReaderStream() to use Reader::get_record_batch_reader()
and arrow::ExportRecordBatchReader() for Arrow stream export
- Change CFFIPackedReader typedef from ReaderHandle to void* for
flexibility
- Update milvus-storage dependency version to ba7df7b

This change improves code maintainability by using the native C++ API
directly and eliminates the overhead of going through the external FFI
layer.

issue: #44956

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-11-28 18:55:07 +08:00
zhagnlu
1b58844319
enhance: support mmap for jsonstats shared key index (#44914)
#42533

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-11-27 16:01:08 +08:00
Xiaofan
f455910bee
fix: support azure blob storage with federated token (#45632)
fix #44582 
related to #44583
Co-authored-by: DuMinhLe<https://github.com/ducminhle>

Signed-off-by: xiaofanluan <xiaofan.luan@zilliz.com>
2025-11-27 14:29:07 +08:00
congqixia
3f8c146831
enhance: support manifest-based index building with Loon FFI reader (#45726)
This PR adds support for reading data from StorageV2 using manifest
files and the Loon FFI interface during index building, providing an
alternative to the traditional segment insert files approach.

Key changes:

Core C++ changes:
- Add SEGMENT_MANIFEST_KEY and LOON_FFI_PROPERTIES_KEY constants for
manifest handling
- Extend FileManagerContext to carry loon_ffi_properties for FFI
operations
- Update index_c.cpp to pass manifest and loon properties to file
managers for all index types (vector, JSON key, text)
- Implement GetFieldDatasFromManifest() in Util.cpp using Arrow C Stream
interface:
  * Create Arrow schema from field metadata
  * Initialize FFI reader with manifest content and storage properties
  * Import record batches from C data interface
  * Convert to FieldData for index building
- Update DiskFileManagerImpl and MemFileManagerImpl to support
manifest-based data reading with fallback to traditional paths

Loon FFI utilities (internal/core/src/storage/loon_ffi/):
- Add ToCStorageConfig() to convert StorageConfig to C-compatible
structure
- Implement GetManifest() to parse manifest JSON and retrieve column
groups via FFI
- Enhance MakePropertiesFromStorageConfig() integration

Storage V2 integration:
- Update milvus-storage dependency from 0883026 to 302143c for latest
FFI support

Protobuf changes:
- Add manifest field to BuildIndexInfo for passing manifest path to C++
layer

Configuration:
- Add common.storageV2.useLoonFFI config option (default: false) for
feature toggle

This change is part of issue #44956 to integrate the StorageV2 FFI
interface as the unified storage layer. The implementation maintains
backward compatibility by checking for manifest presence and falling
back to existing segment insert files approach when manifest is not
provided.

Related issue: #44956

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-11-26 12:43:08 +08:00
congqixia
03f5d7c0a5
enhance: integrate StorageV2 FFI interface for manifest-based segment loading (#45798)
Related to #44956

**New Translator (C++)**
- Added `ManifestGroupTranslator`
(`internal/core/src/segcore/storagev2translator/`)
  - Translates manifest-based column groups to Milvus internal format
  - Implements `GroupCTMeta` interface for chunk-based column access
  - Supports both memory and mmap storage modes
  - Handles cache warmup policies for vector and scalar data

**ChunkedSegmentSealedImpl**
(`internal/core/src/segcore/ChunkedSegmentSealedImpl.cpp:333`)
- Added `LoadColumnGroups(const std::string& manifest_path)`: Main entry
point for manifest-based loading
  - Creates milvus-storage Reader from manifest file
  - Parallelizes column group loading using thread pool
  - Aggregates loading exceptions and reports errors
- Added `LoadColumnGroup()`: Loads individual column group
  - Extracts field IDs from column group metadata
  - Creates ManifestGroupTranslator for each column group
  - Builds ProxyChunkColumn for field access
  - Special handling for timestamp field index construction

**SegmentGrowingImpl**
(`internal/core/src/segcore/SegmentGrowingImpl.cpp`)
- Added similar `LoadColumnGroups()` and `LoadColumnGroup()` methods for
growing segments
- Maintains consistency with sealed segment loading path

Storage FFI Utilities

**loon_ffi/util** (`internal/core/src/storage/loon_ffi/util.cpp`)
- Added `MakeInternalPropertiesFromStorageConfig()`: Converts C storage
config to internal Properties
  - Maps all storage configuration fields (S3, GCS, Azure, local)
  - Handles SSL, IAM, virtual host settings
  - Configures connection timeouts and max connections
- Added `MakeInternalLocalProperies()`: Creates local filesystem
properties
- Added `ToCStorageConfig()`: Converts Go StorageConfig to C
representation
- Added `GetColumnGroups()`: Extracts column groups from manifest file
using Transaction API

Protocol Buffer Changes

**segcore.proto** (`pkg/proto/segcore.proto:121`)
- Added `manifest_path` field to `SegmentLoadInfo` message
- Enables passing manifest file path from Go layer to C++ core

Go Integration

**segment.go** (`internal/util/segcore/segment.go:372`)
- Updated `ConvertToSegcoreSegmentLoadInfo()` to propagate
`ManifestPath` field
- Bridges QueryNode segment load info to Segcore format

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-11-25 17:27:07 +08:00
congqixia
c01fd94a6a
enhance: integrate Storage V2 FFI interface for unified storage access (#45723)
Related #44956
This commit integrates the Storage V2 FFI (Foreign Function Interface)
interface throughout the Milvus codebase, enabling unified storage
access through the Loon FFI layer. This is a significant step towards
standardizing storage operations across different storage versions.

1. Configuration Support
- **configs/milvus.yaml**: Added `useLoonFFI` configuration flag under
`common.storage.file.splitByAvgSize` section
- Allows runtime toggle between traditional binlog readers and new
FFI-based manifest readers
  - Default: `false` (maintains backward compatibility)

2. Core FFI Infrastructure

Enhanced Utilities (internal/core/src/storage/loon_ffi/util.cpp/h)
- **ToCStorageConfig()**: Converts Go's `StorageConfig` to C's
`CStorageConfig` struct for FFI calls
- **GetManifest()**: Parses manifest JSON and retrieves latest column
groups using FFI
  - Accepts manifest path with `base_path` and `ver` fields
  - Calls `get_latest_column_groups()` FFI function
  - Returns column group information as string
  - Comprehensive error handling for JSON parsing and FFI errors
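
A minimal sketch of the parsing step described above, written in Python for brevity rather than the C++ used in `util.cpp`. Only the `base_path` and `ver` field names come from this commit message; the function name, the field types (string and integer), and the error type are assumptions for illustration.

```python
import json


def parse_manifest_path(manifest_path):
    """Split a JSON manifest path into (base_path, version).

    Hypothetical helper: field types are assumed, not taken from the PR.
    """
    try:
        doc = json.loads(manifest_path)
    except json.JSONDecodeError as exc:
        raise ValueError(f"manifest path is not valid JSON: {exc}") from exc
    if not isinstance(doc, dict) or "base_path" not in doc or "ver" not in doc:
        raise ValueError("manifest path must contain 'base_path' and 'ver'")
    return str(doc["base_path"]), int(doc["ver"])
```

For example, `parse_manifest_path('{"base_path": "files/seg-1", "ver": 3}')` yields the base path and version separately; the path value here is made up.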

3. Dependency Updates
- **internal/core/thirdparty/milvus-storage/CMakeLists.txt**:
  - Updated milvus-storage version from `0883026` to `302143c`
  - Ensures compatibility with latest FFI interfaces

4. Data Coordinator Changes

All compaction task builders now include manifest path in segment
binlogs:

- **compaction_task_clustering.go**: Added `Manifest:
segInfo.GetManifestPath()` to segment binlogs
- **compaction_task_l0.go**: Added manifest path to both L0 segment
selection and compaction plan building
- **compaction_task_mix.go**: Added manifest path to mixed compaction
segment binlogs
- **meta.go**: Updated metadata completion logic:
  - `completeClusterCompactionMutation()`: Set `ManifestPath` in new
segment info
  - `completeMixCompactionMutation()`: Preserve manifest path in compacted
segments
  - `completeSortCompactionMutation()`: Include manifest path in sorted
segments

5. Data Node Compactor Enhancements

All compactors updated to support dual-mode reading (binlog vs
manifest):

6. Flush & Sync Manager Updates

Pack Writer V2 (pack_writer_v2.go)
- **BulkPackWriterV2.Write()**: Extended return signature to include
`manifest string`
- Implementation:
  - Generate manifest path: `path.Join(pack.segmentID, "manifest.json")`
  - Write packed data using FFI-based writer
  - Return manifest path along with binlogs, deltas, and stats

Task Handling (task.go)
- Updated all sync task result handling to accommodate new manifest
return value
- Ensured backward compatibility for callers not using manifest

7. Go Storage Layer Integration

New Interfaces and Implementations
- **record_reader.go**: Interface for unified record reading across
storage versions
- **record_writer.go**: Interface for unified record writing across
storage versions
- **binlog_record_writer.go**: Concrete implementation for traditional
binlog-based writing

Enhanced Schema Support (schema.go, schema_test.go)
- Schema conversion utilities to support FFI-based storage operations
- Ensures proper Arrow schema mapping for V2 storage

Serialization Updates
- **serde.go, serde_events.go, serde_events_v2.go**: Updated to work
with new reader/writer interfaces
- Test files updated to validate dual-mode serialization

8. Storage V2 Packed Format

FFI Common (storagev2/packed/ffi_common.go)
- Common FFI utilities and type conversions for packed storage format

Packed Writer FFI (storagev2/packed/packed_writer_ffi.go)
- FFI-based implementation of packed writer
- Integrates with Loon storage layer for efficient columnar writes

Packed Reader FFI (storagev2/packed/packed_reader_ffi.go)
- Already existed, now complemented by writer implementation

9. Protocol Buffer Updates

data_coord.proto & datapb/data_coord.pb.go
- Added `manifest` field to compaction segment messages
- Enables passing manifest metadata through compaction pipeline

worker.proto & workerpb/worker.pb.go
- Added compaction parameter for `useLoonFFI` flag
- Allows workers to receive FFI configuration from coordinator

10. Parameter Configuration

component_param.go
- Added `UseLoonFFI` parameter to compaction configuration
- Reads from `common.storage.file.useLoonFFI` config path
- Default: `false` for safe rollout

11. Test Updates
- **clustering_compactor_storage_v2_test.go**: Updated signatures to
handle manifest return value
- **mix_compactor_storage_v2_test.go**: Updated test helpers for
manifest support
- **namespace_compactor_test.go**: Adjusted writer calls to expect
manifest
- **pack_writer_v2_test.go**: Validated manifest generation in pack
writing

This integration follows a **dual-mode approach**:
1. **Legacy Path**: Traditional binlog-based reading/writing (when
`useLoonFFI=false` or no manifest)
2. **FFI Path**: Manifest-based reading/writing through Loon FFI (when
`useLoonFFI=true` and manifest exists)
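
The selection rule can be sketched as a single predicate. This is an illustrative Python sketch, not code from this PR; the function name and the string return values are hypothetical.

```python
def choose_storage_path(use_loon_ffi, manifest):
    """Return which path a reader or writer would take under the dual-mode rule."""
    # The FFI path needs both the feature flag and a non-empty manifest.
    if use_loon_ffi and manifest:
        return "ffi"
    # Flag off, or no manifest available: fall back to binlogs.
    return "legacy"
```

Requiring both conditions means a segment written before the flag was enabled (no manifest) still takes the legacy path.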

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-11-24 19:57:07 +08:00
congqixia
79926b412c
fix: protect tbb concurrent_map emplace to avoid race condition deadlock (#45681)
Related to #44974

The emplace() operation on tbb::concurrent_hash_map was not protected,
allowing other threads to erase entries between the emplace attempt and
the subsequent lookup.

Solution:
1. Add shared_lock protection around the emplace() operation to prevent
concurrent erasure during insertion
2. Instead of returning nullptr when the key is not found on retry,
recursively call Get(key) to retry the entire operation
3. Fix typo: "earsed" -> "erased"

This ensures that concurrent Get() operations are properly synchronized
and will eventually succeed even under high contention.
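
The retry pattern can be sketched as follows. This is a Python illustration, not the C++ change itself: Python's standard library has no shared lock, so a plain mutex stands in for it, a loop replaces the recursive `Get(key)` call, and the class and method names are hypothetical.

```python
import threading


class RetryingCache:
    """Get-or-create cache whose entries may be erased by other threads."""

    def __init__(self, factory):
        self._factory = factory
        self._entries = {}
        # A plain mutex stands in for the shared/exclusive lock in the commit.
        self._lock = threading.Lock()

    def erase(self, key):
        with self._lock:
            self._entries.pop(key, None)

    def get(self, key):
        while True:  # a loop plays the role of the recursive Get(key) retry
            with self._lock:
                # Guarded insert-if-absent: erase() cannot run during this step.
                if key not in self._entries:
                    self._entries[key] = self._factory(key)
            with self._lock:
                # The entry may have been erased between the two steps.
                if key in self._entries:
                    return self._entries[key]
            # Not found: retry the whole operation instead of returning None.
```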

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-11-20 11:57:06 +08:00
Gao
09a3195867
enhance: support max_connections config for remote storage (#45225)
related: https://github.com/milvus-io/milvus/issues/45344

Signed-off-by: chasingegg <chao.gao@zilliz.com>
2025-11-13 15:37:38 +08:00
Chun Han
69f3aab229
feat: milvus support huawei cloud iam verification(#45298) (#45457)
related: #45298

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2025-11-11 14:41:41 +08:00
sparknack
9032bb7668
enhance: unify the aligned buffer for both buffered and direct I/O (#45323)
issue: #43040

Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
2025-11-06 10:53:33 +08:00
yihao.dai
121eb912ba
fix: Fix load segment failed due to get disk usage error (#45255)
When getting disk usage, files or directories may be removed
concurrently due to segment release. This PR ignores “file or directory
does not exist” errors in such cases.
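
The idea can be sketched in Python (the actual fix is in the Milvus codebase, and the function name here is hypothetical): while summing file sizes, a file that vanishes between listing and stat is skipped instead of failing the whole computation.

```python
import os


def disk_usage(root):
    """Sum file sizes under root, tolerating concurrent removals."""
    total = 0
    # os.walk ignores directory-listing errors by default, so a directory
    # removed mid-walk is skipped rather than raising.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                total += os.path.getsize(os.path.join(dirpath, name))
            except FileNotFoundError:
                # The file was removed after it was listed; ignore it.
                continue
    return total
```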

issue: https://github.com/milvus-io/milvus/issues/45239

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-11-06 08:51:33 +08:00
congqixia
55bfd610b6
enhance: [StorageV2] Integrate FFI interface for packed reader (#45132)
Related to #44956

Integrate the StorageV2 FFI interface as the unified storage layer for
reading packed columnar data, replacing the custom iterative reader with
a manifest-based approach using the milvus-storage library.

Changes:
- Add C++ FFI reader implementation (ffi_reader_c.cpp/h) with Arrow C
Stream interface
- Implement utility functions to convert CStorageConfig to
milvus-storage Properties
- Create ManifestReader in Go that generates manifests from binlogs
- Add FFI packed reader CGO bindings (packed_reader_ffi.go)
- Refactor NewBinlogRecordReader to use ManifestReader for V2 storage
- Support both manifest file paths and direct manifest content
- Enable configurable buffer sizes and column projection

Technical improvements:
- Zero-copy data exchange using Arrow C Data Interface
- Optimized I/O operations through milvus-storage library
- Simplified code path with manifest-based reading
- Better performance with batched streaming reads

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-11-05 19:57:34 +08:00
sparknack
40b5e6b134
fix: avoid potential race conditions when updating the executor (#45230)
issue: #43040

Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
2025-11-04 14:25:33 +08:00
Spade A
cd0b36c39e
feat: impl StructArray -- support diskann index (#45223)
issue: https://github.com/milvus-io/milvus/issues/42148

---------

Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-11-04 11:57:33 +08:00
congqixia
199f6d936e
fix: Update milvus-storage to fix duplicate AWS SDK initialization (#45062)
Update milvus-storage version from aa189ad to e5f5b4c to include the fix
for duplicate AWS SDK initialization that was causing init conflicts.

This update removes the redundant arrow::fs::InitializeS3() call that
was resulting in duplicate Aws::InitAPI() initialization. The duplicate
initialization was causing AWS SDK to ignore custom configurations,
particularly affecting GCP Workload Identity authentication.

Changes in milvus-storage e5f5b4c:
- Remove redundant arrow::fs::InitializeS3() call
- Keep only the extended S3 initialization with custom AWS SDK options
- Ensure GCP IAM authentication via custom HTTP client factory works
correctly

Relates to #44745
Reference: milvus-io/milvus-storage#285

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-10-24 11:32:05 +08:00
cai.zhang
3d11ba06ef
fix: Double check to avoid iter being erased by another thread (#45013)
issue: #44974

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-10-21 23:36:04 +08:00
cai.zhang
a35a3b7c69
fix: Ensure fulfill promise when CreateArrowFileSystem throw an exception (#44975)
issue: #44974

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-10-20 23:32:03 +08:00
congqixia
27dbb8e75d
fix: support JSON default value in CreateArrowScalarFromDefaultValue (#44912)
Related to #44897

Add missing JSON data type handling in CreateArrowScalarFromDefaultValue
to fix query failures when dynamic fields are enabled. JSON default
values are now properly converted to arrow::BinaryScalar using
bytes_data().

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-10-17 18:22:00 +08:00
Spade A
c4f3f0ce4c
feat: impl StructArray -- support more types of vector in STRUCT (#44736)
ref: https://github.com/milvus-io/milvus/issues/42148

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>
2025-10-15 10:25:59 +08:00
congqixia
5ece760d73
fix: Pass fs via FileManagerContext when loading index (#44733)
Related to #44615

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-10-11 09:55:57 +08:00
congqixia
8a443c699e
fix: Make aws credential provider singleton (#44687)
Related to #44647

This patch makes milvus-storage use a singleton credential provider to
avoid data races when concurrent index build tasks are received.

See also milvus-io/milvus-storage#44647

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-10-09 16:11:58 +08:00
congqixia
1d85b83215
enhance: [backlog] Fix unittest and remove fs fallback logic (#44615)
Related to #44535

This PR:
- Fix the unittest creating `DiskFileManagerImpl` without `filesystem`
- Add comments for methods that need `fs_`
- Remove fallback creation and add assertion for nullptr fs

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-10-09 10:41:57 +08:00
cai.zhang
19346fa389
feat: Geospatial Data Type and GIS Function support for milvus (#44547)
issue: #43427

The main goal of this PR is to merge #37417 into Milvus 2.5 without conflicts.

# Main Goals

1. Create and describe collections with geospatial type
2. Insert geospatial data into the insert binlog
3. Load segments containing geospatial data into memory
4. Enable query and search to display geospatial data
5. Support using GIS functions like ST_EQUALS in query
6. Support R-Tree index for geometry type

# Solution

1. **Add Type**: Modify the Milvus core by adding a Geospatial type in
both the C++ and Go code layers, defining the Geospatial data structure
and the corresponding interfaces.
2. **Dependency Libraries**: Introduce necessary geospatial data
processing libraries. In the C++ source code, use Conan package
management to include the GDAL library. In the Go source code, add the
go-geom library to the go.mod file.
3. **Protocol Interface**: Revise the Milvus protocol to provide
mechanisms for Geospatial message serialization and deserialization.
4. **Data Pipeline**: Facilitate interaction between the client and
proxy using the WKT format for geospatial data. The proxy will convert
all data into WKB format for downstream processing, providing column
data interfaces, segment encapsulation, segment loading, payload
writing, and cache block management.
5. **Query Operators**: Implement simple display and support for filter
queries. Initially, focus on filtering based on spatial relationships
for a single column of geospatial literal values, providing parsing and
execution for query expressions. Currently only brute-force search is
supported.
6. **Client Modification**: Enable the client to handle user input for
geospatial data and facilitate end-to-end testing. See the corresponding
changes in pymilvus.
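
To illustrate the WKT-to-WKB conversion mentioned in the data pipeline step, here is a minimal Python sketch for 2D points only. It is not the proxy's implementation (which is Go and, per the dependency step, uses go-geom); the function name is hypothetical, and the byte layout follows the standard WKB encoding: a byte-order flag (1 for little-endian), a 32-bit geometry type (1 for Point), then two 64-bit floats.

```python
import re
import struct

_POINT = re.compile(r"^\s*POINT\s*\(\s*(\S+)\s+(\S+)\s*\)\s*$", re.IGNORECASE)


def wkt_point_to_wkb(wkt):
    """Encode a 2D WKT point such as 'POINT (1 2)' as little-endian WKB."""
    match = _POINT.match(wkt)
    if match is None:
        raise ValueError(f"not a 2D WKT point: {wkt!r}")
    x, y = float(match.group(1)), float(match.group(2))
    # "<" = little-endian without padding: flag byte, uint32 type, two doubles.
    return struct.pack("<BIdd", 1, 1, x, y)
```

The binary form is fixed-width (21 bytes for a point), which is one reason a binary encoding is convenient for downstream storage.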

---------

Signed-off-by: Yinwei Li <yinwei.li@zilliz.com>
Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
Co-authored-by: ZhuXi <150327960+Yinwei-Yu@users.noreply.github.com>
2025-09-28 19:43:05 +08:00
foxspy
13c3b0b909
enhance: add autoindex configuration for the int8 vector type (#44554)
issue: #38666 

Add int8 support for autoindex to ensure it can be independently
configured. At the same time, remove the restriction on int8 type for
vectorDiskIndex (note that vectorDiskIndex only determines the building
and loading method of the index, not the index type).

Signed-off-by: xianliang.li <xianliang.li@zilliz.com>
2025-09-24 17:48:04 +08:00
congqixia
ea307ea3c9
fix: [StorageV2] Make DiskFileManager use fs from context (#44535)
Related to #44534

Datanode shall not use the singleton fs from 2.6 onward. This patch makes
the disk file manager use the filesystem passed by fileManagerContext
instead of the erroneous singleton one.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-09-24 10:12:03 +08:00
Tianx
2c0c5ef41e
feat: timestamptz expression & index & timezone (#44080)
issue: https://github.com/milvus-io/milvus/issues/27467

>My plan is as follows.
>- [x] M1 Create collection with timestamptz field
>- [x] M2 Insert timestamptz field data
>- [x] M3 Retrieve timestamptz field data
>- [x] M4 Implement handoff
>- [x] M5 Implement compare operator
>- [x] M6 Implement extract operator
>- [x] M7 Support STL-SORT index for datatype timestamptz
>- [x] M8 Support database/collection level default timezone

---

The third PR of issue: https://github.com/milvus-io/milvus/issues/27467,
which completes M5, M6, M7, M8 described above.

## M8 Default Timezone

We will be able to use alter_collection() and alter_database() in a
future Python SDK release to modify the default timezone at the
collection or database level.

For insert requests, the timezone will be resolved using the following
order of precedence: String Literal -> Collection Default -> Database
Default.
For retrieval requests, the timezone will be resolved in this order:
Query Parameters -> Collection Default -> Database Default.
In both cases, the final fallback timezone is UTC.
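
The precedence rule can be sketched as a first-non-empty lookup. This is an illustrative Python sketch, not the implementation in this PR; the function and parameter names are hypothetical, and `explicit` stands for the string-literal timezone on insert or the query-parameter timezone on retrieval.

```python
def resolve_timezone(explicit=None, collection_default=None, database_default=None):
    """Pick the first configured timezone, falling back to UTC."""
    for candidate in (explicit, collection_default, database_default):
        if candidate:
            return candidate
    # Final fallback when nothing is configured at any level.
    return "UTC"
```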


## M5: Comparison Operators

We can now use the following expression format to filter on the
timestamptz field:

- `timestamptz_field [+/- INTERVAL 'interval_string'] {comparison_op}
ISO 'iso_string'`

- The interval_string follows the ISO 8601 duration format, for example:
P1Y2M3DT1H2M3S.

- The iso_string follows the ISO 8601 timestamp format, for example:
2025-01-03T00:00:00+08:00.

- Example expressions: "tsz + INTERVAL 'P0D' != ISO
'2025-01-03T00:00:00+08:00'" or "tsz != ISO
'2025-01-03T00:00:00+08:00'".
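
As an illustration of the interval format, the following Python sketch splits an ISO 8601 duration such as `P1Y2M3DT1H2M3S` into its components. It is not the parser used by Milvus: the function name is hypothetical, and it handles only integer year/month/day/hour/minute/second designators (no weeks or fractional values).

```python
import re

_DURATION = re.compile(
    r"^P(?:(?P<years>\d+)Y)?(?:(?P<months>\d+)M)?(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def parse_duration(text):
    """Parse a simple ISO 8601 duration into a dict of integer components."""
    match = _DURATION.match(text)
    # Reject a bare "P" and a trailing "T" with no time components.
    if match is None or text == "P" or text.endswith("T"):
        raise ValueError(f"unsupported ISO 8601 duration: {text!r}")
    return {name: int(value or 0) for name, value in match.groupdict().items()}
```

Note that `M` means months before the `T` separator and minutes after it, which is why the two halves are matched separately.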

## M6: Extract

We will be able to extract specific time fields via kwargs in a future
Python SDK release.
The key is `time_fields`, and the value should be one or more of "year,
month, day, hour, minute, second, microsecond", separated by commas or
spaces. The result for each record is then an array of int64.
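
A rough Python sketch of what such an extraction could look like on the client side. This is an assumption-laden illustration, not the SDK or server code: the function name is hypothetical, and only the `time_fields` key and the list of field names come from the description above.

```python
import re
from datetime import datetime

_FIELDS = ("year", "month", "day", "hour", "minute", "second", "microsecond")


def extract_time_fields(ts, time_fields):
    """Return the requested components of ts as a list of ints."""
    # Field names may be separated by commas, spaces, or both.
    names = [n for n in re.split(r"[,\s]+", time_fields.strip()) if n]
    for name in names:
        if name not in _FIELDS:
            raise ValueError(f"unknown time field: {name!r}")
    return [getattr(ts, name) for name in names]
```

The output order follows the order in which the fields were requested.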



## M7: Indexing Support

Expressions without interval arithmetic can be accelerated using an
STL-SORT index. However, expressions that include interval arithmetic
cannot be indexed. This is because the result of an interval calculation
depends on the specific timestamp value. For example, adding one month
to a date in February results in a different number of added days than
adding one month to a date in March.
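
The dependence on the timestamp value can be checked with a short Python sketch. The helper below is hypothetical and uses calendar arithmetic with the day clamped to the end of the target month, which is an assumption about month-addition semantics rather than a statement about how Milvus evaluates intervals.

```python
import calendar
from datetime import datetime


def add_one_month(ts):
    """Add one calendar month, clamping the day to the target month's length."""
    year = ts.year + ts.month // 12
    month = ts.month % 12 + 1
    day = min(ts.day, calendar.monthrange(year, month)[1])
    return ts.replace(year=year, month=month, day=day)


feb = datetime(2025, 2, 1)
mar = datetime(2025, 3, 1)
# The same "one month" interval spans a different number of days.
days_from_feb = (add_one_month(feb) - feb).days
days_from_mar = (add_one_month(mar) - mar).days
```

Because the shift is not a constant offset, a comparison like `tsz + INTERVAL 'P1M' > ISO '...'` cannot be rewritten as a fixed range over the sorted raw values, so the sorted index does not help.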

--- 

After this PR, the input/output type of timestamptz is an ISO
string. Timestamptz values are stored as timestamptz data, which is
ultimately an int64_t.

> for more information, see https://en.wikipedia.org/wiki/ISO_8601

---------

Signed-off-by: xtx <xtianx@smail.nju.edu.cn>
2025-09-23 10:24:12 +08:00
congqixia
7b83314bf3
enhance: [StorageV2] Make datanode use non-singleton fs (#44418)
Related to #39173

According to the current design, datanode shall create fs from storage
config in request instead of using singleton fs. This PR upgrades
milvus-storage and makes the packed reader/writer compose a new fs from the
storage config.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-09-18 20:06:00 +08:00
sthuang
2f70a73258
fix: turn on azure by default (#44377)
related: #44354, #44138, #43869

---------

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-09-17 10:12:01 +08:00
sthuang
b38013352d
enhance: [StorageV2] enable build with azure (#44177)
related: #43869

---------

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-09-14 08:05:58 +08:00
congqixia
f5618d5153
enhance: [StorageV2] Utilize advanced split policy and persist it in meta (#44282)
Related to #44257

This PR:
- Utilize configurable split policy for storage v2, enabling system
field policy
- Store split result in field binlog struct
- Adapt legacy binlog without child fields

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-09-10 14:47:57 +08:00
sparknack
4a01c726f3
enhance: cachinglayer: some metric and params update (#44276)
issue: #41435

---------

Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
2025-09-10 11:03:57 +08:00
Buqian Zheng
9bf2b5c10c
enhance: moved more segcore unit test files (#44210)
issue: https://github.com/milvus-io/milvus/issues/43931

---------

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-09-08 10:21:55 +08:00
Buqian Zheng
b76bf13fc3
enhance: move C++ unit test files alongside the production code (#43932)
issue: https://github.com/milvus-io/milvus/issues/43931

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-09-03 23:45:53 +08:00
Spade A
7cb15ef141
feat: impl StructArray -- optimize vector array serialization (#44035)
issue: https://github.com/milvus-io/milvus/issues/42148

Optimized from
Go VectorArray → VectorArray Proto → Binary → C++ VectorArray Proto →
C++ VectorArray local impl → Memory
to
Go VectorArray → Arrow ListArray  → Memory

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-09-03 16:39:53 +08:00
zhagnlu
fc876639cf
enhance: support json stats with shredding design (#42534)
#42533

Co-authored-by: luzhang <luzhang@zilliz.com>
2025-09-01 10:49:52 +08:00
congqixia
e3b3502287
fix: Use correct regex for cppcheck (#44077)
Related to #44076

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-08-27 20:57:50 +08:00
marcelo-cjl
e13e19cd2c
enhance: add sparse_u32_f32 data type for sparse vector (#43974)
issue: #43973

Signed-off-by: marcelo.chen <marcelo.chen@zilliz.com>
2025-08-27 16:47:50 +08:00
Chun Han
da156981c6
feat: milvus support posix-compatible mode(milvus-io#43942) (#43944)
related: #43942

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2025-08-27 16:29:50 +08:00
XuanYang-cn
37a447d166
feat: Add CMEK cipher plugin (#43722)
1. Enable Milvus to read cipher configs
2. Enable cipher plugin in binlog reader and writer
3. Add a testCipher for unittests
4. Support pooling for datanode
5. Add encryption in storagev2

See also: #40321 

---------

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2025-08-27 11:15:52 +08:00
Tianx
c0d62268ac
feat: add timestamptz data type (#44005)
issue: https://github.com/milvus-io/milvus/issues/27467
> https://github.com/milvus-io/milvus/issues/27467#issuecomment-3092211420
> * [x]  M1 Create collection with timestamptz field
> * [x]  M2 Insert timestamptz field data
> * [x]  M3 Retrieve timestamptz field data
> * [x]  M4 Implement handoff

The second PR of issue:
https://github.com/milvus-io/milvus/issues/27467, which completes M1-M4
described above.

---------

Signed-off-by: xtx <xtianx@smail.nju.edu.cn>
2025-08-26 15:59:53 +08:00