issue: https://github.com/milvus-io/milvus/issues/27467
>My plan is as follows.
>- [x] M1 Create collection with timestamptz field
>- [x] M2 Insert timestamptz field data
>- [x] M3 Retrieve timestamptz field data
>- [x] M4 Implement handoff
>- [x] M5 Implement compare operator
>- [x] M6 Implement extract operator
>- [x] M8 Support database/collection level default timezone
>- [x] M7 Support STL-SORT index for datatype timestamptz
---
The third PR of issue: https://github.com/milvus-io/milvus/issues/27467,
which completes M5, M6, M7, M8 described above.
## M8 Default Timezone
We will be able to use alter_collection() and alter_database() in a
future Python SDK release to modify the default timezone at the
collection or database level.
For insert requests, the timezone will be resolved using the following
order of precedence: String Literal-> Collection Default -> Database
Default.
For retrieval requests, the timezone will be resolved in this order:
Query Parameters -> Collection Default -> Database Default.
In both cases, the final fallback timezone is UTC.
## M5: Comparison Operators
We can now use the following expression format to filter on the
timestamptz field:
- `timestamptz_field [+/- INTERVAL 'interval_string'] {comparison_op}
ISO 'iso_string' `
- The interval_string follows the ISO 8601 duration format, for example:
P1Y2M3DT1H2M3S.
- The iso_string follows the ISO 8601 timestamp format, for example:
2025-01-03T00:00:00+08:00.
- Example expressions: "tsz + INTERVAL 'P0D' != ISO
'2025-01-03T00:00:00+08:00'" or "tsz != ISO
'2025-01-03T00:00:00+08:00'".
## M6: Extract
We will be able to extract sepecific time filed by kwargs in a future
Python SDK release.
The key is `time_fields`, and value should be one or more of "year,
month, day, hour, minute, second, microsecond", seperated by comma or
space. Then the result of each record would be an array of int64.
## M7: Indexing Support
Expressions without interval arithmetic can be accelerated using an
STL-SORT index. However, expressions that include interval arithmetic
cannot be indexed. This is because the result of an interval calculation
depends on the specific timestamp value. For example, adding one month
to a date in February results in a different number of added days than
adding one month to a date in March.
---
After this PR, the input / output type of timestamptz would be iso
string. Timestampz would be stored as timestamptz data, which is int64_t
finally.
> for more information, see https://en.wikipedia.org/wiki/ISO_8601
---------
Signed-off-by: xtx <xtianx@smail.nju.edu.cn>
issue: #42032#44212
- special case for warmup param and cell storage size for tiered index
- add a config to enable/disable storage usage tracking
---------
Signed-off-by: chasingegg <chao.gao@zilliz.com>
issue: #44123
- support replicate message in wal of milvus.
- support CDC-replicate recovery from wal.
- fix some CDC replicator bugs
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #44212
Implement search/query storage usage statistics in go side(result
reduce), only record storage usage in vector search C++ path. Need to be
implemented in query c++ path in next prs.
---------
Signed-off-by: chasingegg <chao.gao@zilliz.com>
Signed-off-by: marcelo.chen <marcelo.chen@zilliz.com>
Co-authored-by: marcelo.chen <marcelo.chen@zilliz.com>
relate: https://github.com/milvus-io/milvus/issues/41035
This PR adds support for a gRPC-based tokenizer.
- The protobuf definition was added in
[milvus-proto#445](https://github.com/milvus-io/milvus-proto/pull/445).
- Based on this, the corresponding Rust client code was generated and
added under `tantivi-binding`.
- The generated file is `milvus.proto.tokenizer.rs`.
I'm not very experienced with Rust, so there might be parts of the code
that could be improved.
I’d appreciate any suggestions or improvements.
---------
Signed-off-by: park.sanghee <park.sanghee@navercorp.com>
issue: #42942
This pr includes the following changes:
1. Added checks for index checker in querycoord to generate drop index
tasks
2. Added drop index interface to querynode
3. To avoid search failure after dropping the index, the querynode
allows the use of lazy mode (warmup=disable) to load raw data even when
indexes contain raw data.
4. In segcore, loading the index no longer deletes raw data; instead, it
evicts it.
5. In expr, the index is pinned to prevent concurrent errors.
---------
Signed-off-by: sunby <sunbingyi1992@gmail.com>
issue: #43897
- The original broadcast ack operation need to recover message from
etcd, which can not support cdc.
- immutable message will set as the ack parameter to fix it.
Signed-off-by: chyezh <chyezh@outlook.com>
1. Enable Milvus to read cipher configs
2. Enable cipher plugin in binlog reader and writer
3. Add a testCipher for unittests
4. Support pooling for datanode
5. Add encryption in storagev2
See also: #40321
Signed-off-by: yangxuan <xuan.yang@zilliz.com>
---------
Signed-off-by: yangxuan <xuan.yang@zilliz.com>
issue: #43785
- pulsar client will print log into milvus logger now.
- pulsar client open the metric by default.
- upgrade the pulsar client to v0.15.1, and use offical repo.
- the fixing of milvus-io/pulsar-client-go is already covered by
official v0.15.1.
Signed-off-by: chyezh <chyezh@outlook.com>
issue: https://github.com/milvus-io/milvus/issues/43819
Before this fix: null elements are converted to zero or empty strings
After this fix: import job will return error "array element is not
allowed to be null value for field xxx"
Signed-off-by: yhmo <yihua.mo@zilliz.com>
- Add enable configuration for all model service providers
- Migrate environment variables from MILVUSAI_* to MILVUS_* prefix with
backward compatibility
- Unify model service enable/disable logic using configuration
- Add tests for environment variable parsing with fallback support
#35856
Signed-off-by: junjie.jiang <junjie.jiang@zilliz.com>
Ref https://github.com/milvus-io/milvus/issues/42148
This PR supports create index for vector array (now, only for
`DataType.FLOAT_VECTOR`) and search on it.
The index type supported in this PR is `EMB_LIST_HNSW` and the metric
type is `MAX_SIM` only.
The way to use it:
```python
milvus_client = MilvusClient("xxx:19530")
schema = milvus_client.create_schema(enable_dynamic_field=True, auto_id=True)
...
struct_schema = milvus_client.create_struct_array_field_schema("struct_array_field")
...
struct_schema.add_field("struct_float_vec", DataType.ARRAY_OF_VECTOR, element_type=DataType.FLOAT_VECTOR, dim=128, max_capacity=1000)
...
schema.add_struct_array_field(struct_schema)
index_params = milvus_client.prepare_index_params()
index_params.add_index(field_name="struct_float_vec", index_type="EMB_LIST_HNSW", metric_type="MAX_SIM", index_params={"nlist": 128})
...
milvus_client.create_index(COLLECTION_NAME, schema=schema, index_params=index_params)
```
Note: This PR uses `Lims` to convey offsets of the vector array to
knowhere where vectors of multiple vector arrays are concatenated and we
need offsets to specify which vectors belong to which vector array.
---------
Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>
Related to #43230
This PR
- Move segcore setup function to `initcore` package to remove cgo
dependency from pkg
- Register core callback only for components depends on segcore
- Rectify `UpdateLogLevel` implementation
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Each ColumnReader consumes ReaderProperties.BufferSize memory
independently. Therefore, the bufferSize should be divided by the number
of columns to ensure total memory usage stays within the intended limit.
issue: https://github.com/milvus-io/milvus/issues/43755
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
issue: #43745
Add timestamp filtering capability to L0Reader to match the
functionality available in the regular Reader. This enhancement allows
filtering delete records based on timestamp range during L0 import
operations.
Changes include:
- Add tsStart and tsEnd fields to l0Reader struct for timestamp
filtering
- Modify NewL0Reader function signature to accept tsStart and tsEnd
parameters
- Implement timestamp filtering logic in Read method to skip records
outside the specified range
- Update L0ImportTask and L0PreImportTask to parse timestamp parameters
from request options and pass them to NewL0Reader
- Add comprehensive test case TestL0Reader_ReadWithTsFilter to verify ts
filtering functionality using mockey framework
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
1. Use blocking memory allocation to wait until memory becomes available
2. Perform memory allocation at the file level instead of per task
3. Limit Parquet file reader batch size to prevent excessive memory
consumption
4. Limit import buffer size from 20% to 10% of total memory
issue: https://github.com/milvus-io/milvus/issues/43387,
https://github.com/milvus-io/milvus/issues/43131
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>