related issue: #44425
1. Split insert.py into a few files: upsert.py, insert.py,
partial_upsert.py ...
2. Add tests for allowing insert with auto id
---------
Signed-off-by: yanliang567 <yanliang.qiao@zilliz.com>
Remove stopTask.
Replace the multiple task-tracking maps with a single unified taskState map.
Fix slot tracking, improve state transitions, and add comprehensive tests.
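A minimal Python sketch of the single-map pattern (illustrative only; the actual implementation is Go, and the names and slot accounting here are assumptions):

```python
from enum import Enum, auto

class TaskState(Enum):
    PENDING = auto()
    RUNNING = auto()
    DONE = auto()
    FAILED = auto()

class TaskTracker:
    """One map from task ID to state, replacing the per-state maps."""

    def __init__(self, total_slots: int):
        self.task_state: dict[int, TaskState] = {}
        self.total_slots = total_slots
        self.used_slots = 0

    def transition(self, task_id: int, new: TaskState) -> None:
        old = self.task_state.get(task_id)
        # Slots are held only while a task is RUNNING.
        if old != TaskState.RUNNING and new == TaskState.RUNNING:
            if self.used_slots >= self.total_slots:
                raise RuntimeError("no free slots")
            self.used_slots += 1
        elif old == TaskState.RUNNING and new != TaskState.RUNNING:
            self.used_slots -= 1
        self.task_state[task_id] = new
```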
See also: #44714
---------
Signed-off-by: yangxuan <xuan.yang@zilliz.com>
The collection-not-found error could contain the db id in its message,
which is not meaningful to users.
This patch makes the error message wrap the db name instead of the db id.
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #43427
The main goal of this PR is to merge #37417 into Milvus 2.5 without conflicts.
# Main Goals
1. Create and describe collections with geospatial type
2. Insert geospatial data into the insert binlog
3. Load segments containing geospatial data into memory
4. Enable query and search to display geospatial data
5. Support using GIS functions like ST_EQUALS in query
6. Support R-Tree index for geometry type
# Solution
1. **Add Type**: Modify the Milvus core by adding a Geospatial type in
both the C++ and Go code layers, defining the Geospatial data structure
and the corresponding interfaces.
2. **Dependency Libraries**: Introduce necessary geospatial data
processing libraries. In the C++ source code, use Conan package
management to include the GDAL library. In the Go source code, add the
go-geom library to the go.mod file.
3. **Protocol Interface**: Revise the Milvus protocol to provide
mechanisms for Geospatial message serialization and deserialization.
4. **Data Pipeline**: Facilitate interaction between the client and
proxy using the WKT format for geospatial data. The proxy will convert
all data into WKB format for downstream processing, providing column
data interfaces, segment encapsulation, segment loading, payload
writing, and cache block management.
5. **Query Operators**: Implement simple display and support for filter
queries. Initially, focus on filtering based on spatial relationships
for a single column against geospatial literal values, providing parsing
and execution for query expressions. Currently only brute-force search
is supported.
6. **Client Modification**: Enable the client to handle user input for
geospatial data and facilitate end-to-end testing; see the corresponding
changes in pymilvus and the usage sketch after this list.
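A minimal end-to-end pymilvus sketch of the flow described above (the geometry type name and the exact filter syntax are assumptions based on this description; index creation and load are omitted for brevity):

```python
from pymilvus import MilvusClient, DataType

client = MilvusClient(uri="http://localhost:19530")

# Assumed enum name for the new geospatial type.
schema = client.create_schema()
schema.add_field("id", DataType.INT64, is_primary=True)
schema.add_field("vec", DataType.FLOAT_VECTOR, dim=8)
schema.add_field("geo", DataType.GEOMETRY)
client.create_collection("poi", schema=schema)

# The client speaks WKT; the proxy converts it to WKB for downstream use.
client.insert("poi", [{"id": 1, "vec": [0.1] * 8, "geo": "POINT (30 10)"}])

# Filter on a spatial relationship against a geometry literal.
rows = client.query(
    collection_name="poi",
    filter="ST_EQUALS(geo, 'POINT (30 10)')",
    output_fields=["id", "geo"],
)
```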
---------
Signed-off-by: Yinwei Li <yinwei.li@zilliz.com>
Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
Co-authored-by: ZhuXi <150327960+Yinwei-Yu@users.noreply.github.com>
issue: https://github.com/milvus-io/milvus/issues/27467
>My plan is as follows.
>- [x] M1 Create collection with timestamptz field
>- [x] M2 Insert timestamptz field data
>- [x] M3 Retrieve timestamptz field data
>- [x] M4 Implement handoff
>- [x] M5 Implement compare operator
>- [x] M6 Implement extract operator
>- [x] M8 Support database/collection level default timezone
>- [x] M7 Support STL-SORT index for datatype timestamptz
---
The third PR of issue: https://github.com/milvus-io/milvus/issues/27467,
which completes M5, M6, M7, M8 described above.
## M8: Default Timezone
We will be able to use alter_collection() and alter_database() in a
future Python SDK release to modify the default timezone at the
collection or database level.
For insert requests, the timezone will be resolved using the following
order of precedence: String Literal -> Collection Default -> Database
Default.
For retrieval requests, the timezone will be resolved in this order:
Query Parameters -> Collection Default -> Database Default.
In both cases, the final fallback timezone is UTC.
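A sketch of the configuration side (the method names follow the current MilvusClient API, but the property keys are assumptions until the SDK release mentioned above):

```python
from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")

# Hypothetical property keys for the default-timezone settings.
client.alter_database_properties(
    db_name="default",
    properties={"database.default.timezone": "UTC"},
)
client.alter_collection_properties(
    collection_name="events",
    properties={"collection.default.timezone": "Asia/Shanghai"},
)
```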
## M5: Comparison Operators
We can now use the following expression format to filter on the
timestamptz field:
- `timestamptz_field [+/- INTERVAL 'interval_string'] {comparison_op} ISO 'iso_string'`
- The interval_string follows the ISO 8601 duration format, for example:
P1Y2M3DT1H2M3S.
- The iso_string follows the ISO 8601 timestamp format, for example:
2025-01-03T00:00:00+08:00.
- Example expressions: "tsz + INTERVAL 'P0D' != ISO
'2025-01-03T00:00:00+08:00'" or "tsz != ISO
'2025-01-03T00:00:00+08:00'".
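For example, a query using this expression format (a minimal sketch with standard MilvusClient calls; the collection and field names are placeholders):

```python
from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")

# Shift tsz forward by the ISO 8601 duration, then compare to an ISO timestamp.
rows = client.query(
    collection_name="events",
    filter="tsz + INTERVAL 'P1Y2M3DT1H2M3S' >= ISO '2025-01-03T00:00:00+08:00'",
    output_fields=["id", "tsz"],
)
```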
## M6: Extract
We will be able to extract specific time fields via kwargs in a future
Python SDK release.
The key is `time_fields`, and the value should be one or more of "year,
month, day, hour, minute, second, microsecond", separated by commas or
spaces. The result for each record is then an array of int64.
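A hypothetical call shape (how `time_fields` is plumbed through the SDK may differ in the final release):

```python
from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")

# "time_fields" as described above, treated here as a query kwarg.
rows = client.query(
    collection_name="events",
    filter="id >= 0",
    output_fields=["tsz"],
    time_fields="year, month, day",
)
# Each record's tsz entry would then be an int64 array, e.g. [2025, 1, 3].
```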
## M7: Indexing Support
Expressions without interval arithmetic can be accelerated using an
STL-SORT index. However, expressions that include interval arithmetic
cannot be indexed. This is because the result of an interval calculation
depends on the specific timestamp value. For example, adding one month
to a date in February results in a different number of added days than
adding one month to a date in March.
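Creating the index with the standard index-params API (a sketch; collection and field names are placeholders):

```python
from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")

# STL_SORT is the scalar index named above; it accelerates only
# expressions without interval arithmetic.
index_params = client.prepare_index_params()
index_params.add_index(field_name="tsz", index_type="STL_SORT")
client.create_index(collection_name="events", index_params=index_params)
```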
---
After this PR, the input/output type of timestamptz is an ISO string.
Timestamptz is stored as timestamptz data, which is ultimately an int64_t.
> for more information, see https://en.wikipedia.org/wiki/ISO_8601
---------
Signed-off-by: xtx <xtianx@smail.nju.edu.cn>
issue: #44373
This commit implements sparse filtering in query tasks using the
statistical information (Bloom filter/MinMax) of the primary key (PK).
The PK statistics are bound to the segment during the segment-loading
phase, and a new filter has been added to the segment filter to enable
the sparse-filtering functionality.
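The pruning idea, sketched in Python (illustrative only; the real filter lives in the segment-loading/query path, and a plain set stands in for the Bloom filter):

```python
from dataclasses import dataclass

@dataclass
class SegmentStats:
    minmax: tuple[int, int]
    bloom: set[int]  # stand-in for the Bloom filter

def may_contain(pk: int, s: SegmentStats) -> bool:
    lo, hi = s.minmax
    if pk < lo or pk > hi:  # MinMax rules a segment out cheaply
        return False
    # Bloom filters allow false positives but never false negatives.
    return pk in s.bloom

segments = [SegmentStats((0, 99), {1, 5}), SegmentStats((100, 199), {150})]
hits = [s for s in segments if may_contain(150, s)]  # only the second segment
```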
Signed-off-by: jiaqizho <jiaqi.zhou@zilliz.com>
issue: #44429
Fix the issue where dynamic modification of collection replica count
doesn't take effect due to incorrect parameter usage in load config
parsing.
Changes include:
- Replace DatabaseLevelReplicaNumber with CollectionLevelReplicaNumber
- Replace DatabaseLevelResourceGroups with CollectionLevelResourceGroups
- Update test cases to use correct collection-level constants
- Ensure dynamic replica changes are properly applied to collections
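For reference, changing the replica count dynamically at the collection level looks roughly like this (the property key is an assumption for illustration):

```python
from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")

# Collection-level load config; with this fix the change takes effect
# without reloading the collection.
client.alter_collection_properties(
    collection_name="my_collection",
    properties={"collection.replica.number": 2},
)
```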
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #44212
Implement search/query storage usage statistics on the Go side (result
reduce); storage usage is currently recorded only in the vector search
C++ path. The query C++ path will be covered in follow-up PRs.
---------
Signed-off-by: chasingegg <chao.gao@zilliz.com>
Signed-off-by: marcelo.chen <marcelo.chen@zilliz.com>
Co-authored-by: marcelo.chen <marcelo.chen@zilliz.com>
issue: #44156
Enhance FlushAll functionality to support targeting specific collections
within databases instead of only database-level flushing.
Changes include:
- Add FlushAllTarget message in data_coord.proto for granular targeting
- Support collection-specific flush operations within databases
- Maintain backward compatibility with deprecated db_name field
This enhancement allows users to flush specific collections without
affecting other collections in the same database, providing more precise
control over data persistence operations.
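A sketch of the request payload implied by the new message (field names beyond those described above are assumptions):

```python
# FlushAllTarget-style payload: target specific collections per database.
flush_all_request = {
    "db_name": "db1",  # deprecated field, kept for backward compatibility
    "targets": [
        {"db_name": "db1", "collection_names": ["orders", "users"]},
        {"db_name": "db2", "collection_names": []},  # empty -> whole database
    ],
}
```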
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
test: add test cases for complex JSON expressions
---------
Signed-off-by: Eric Hou <eric.hou@zilliz.com>
Co-authored-by: Eric Hou <eric.hou@zilliz.com>
issue: #43980
This commit optimizes the partial update merge logic by standardizing
nullable field representation before merge operations to avoid corner
cases during the merge process.
Key changes:
- Unify nullable field data format to FULL FORMAT before merge execution
- Add extensive unit tests for bounds checking and edge cases
The optimization ensures:
- Consistent nullable field representation across SDK and internal
- Proper handling of null values during merge operations
- Prevention of index out-of-bounds errors in vector field updates
- Better error handling and validation for partial update scenarios
This resolves issues where different nullable field formats could cause
merge failures or data corruption during partial update operations.
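The normalization step, sketched in Python (illustrative; the real merge logic is internal Go code):

```python
from typing import Any

def to_full_format(values: list[Any], valid: list[bool]) -> list[Any]:
    """Expand compact nullable data (values stored only for valid rows)
    into FULL FORMAT: one slot per row, None at invalid positions."""
    out, it = [], iter(values)
    for ok in valid:
        out.append(next(it) if ok else None)
    return out

# Two stored values for four rows -> one slot per row, so merge logic
# can index rows uniformly without bounds errors.
assert to_full_format([10, 20], [True, False, True, False]) == [10, None, 20, None]
```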
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Bumps [deepdiff](https://github.com/seperman/deepdiff) from 6.7.1 to
8.6.1.
Release notes (sourced from [deepdiff's releases](https://github.com/seperman/deepdiff/releases)):

**8.5.0**
- Updating deprecated pydantic calls
- Switching to pyproject.toml
- Fix for moving nested tables when using iterable_compare_func
- Fix recursion depth limit when hashing numpy.datetime64
- Moving from legacy setuptools use to pyproject.toml

**8.4.1**
- pytz is not required.

**8.4.0**
- Adding BaseOperatorPlus base class for custom operators
- default_timezone can now be passed to set your default timezone to something other than UTC.
- New summarization algorithm that produces valid JSON
- Better type hint support

**8.1.1**
- Adding Python 3.13 to setup.py

**8.1.0**
- Removing deprecated lines from setup.py
- Added `prefix` option to `pretty()`
- Fixes hashing of numpy boolean values
- Fixes `__slots__` comparison when the attribute doesn't exist
- Relaxing orderly-set requirements
- Added Python 3.13 support
- Only lower if clean_key is an instance of str ([#504](https://redirect.github.com/seperman/deepdiff/issues/504))
- Fixes issue where the deep_distance key is not returned when both compared items are equal ([#510](https://redirect.github.com/seperman/deepdiff/issues/510))
- Fixes exclude_paths failing to work in certain cases ([#509](https://redirect.github.com/seperman/deepdiff/issues/509))
- Fixes to_json() choking on standard json.dumps() kwargs such as sort_keys ([#490](https://redirect.github.com/seperman/deepdiff/issues/490))
- Fixes accessing the affected_root_keys property on the diff object returned by DeepDiff failing when one of the dicts is empty ([#508](https://redirect.github.com/seperman/deepdiff/issues/508))

**8.0.1**
- Extra import of numpy is removed

**8.0.0**
With the introduction of `threshold_to_diff_deeper`, the values returned differ from previous versions of DeepDiff. You can still get the older values by setting `threshold_to_diff_deeper=0`. However, to signify that enough has changed in this release that users need to update the parameters passed to DeepDiff, this is a major version update.
- `use_enum_value=True` makes diffing enums use the enum's value, so comparing an enum to a string or any other value is not reported as a type change.
- `threshold_to_diff_deeper=float` is a number between 0 and 1. When comparing dictionaries that have a small intersection of keys, the dictionary is reported as a `new_value` instead of reporting individual keys changed. Setting it to zero gives the same results as DeepDiff 7.0.1 and earlier, i.e. the feature is disabled. The new default is 0.33, which means that if less than one third of the keys between dictionaries intersect, the dictionary is reported as a new object.
- Deprecated `ordered-set` and switched to `orderly-set`. The `ordered-set` package was no longer maintained, and starting with Python 3.6 there were better options for ordered sets; one of the new implementations was forked, modified, and published as `orderly-set`.
- Added `use_log_scale:bool` and `log_scale_similarity_threshold:float`. They can be used to ignore small changes in numbers by comparing their differences in logarithmic space. This is different from ignoring differences based on significant digits.
- JSON serialization of reversed lists.
- Fix for iterable moved items when `iterable_compare_func` is used.
- Pandas and Polars support.

**7.0.1**
- When verbose=2, return `new_path` when `path` and `new_path` are different (for example when ignore_order=True and the index of items has changed).

(Release notes truncated; see the [full diff](https://github.com/seperman/deepdiff/commits) in the compare view.)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
issue: https://github.com/milvus-io/milvus/issues/42148
Optimized the data path from:
Go VectorArray → VectorArray Proto → Binary → C++ VectorArray Proto → C++ VectorArray local impl → Memory
to:
Go VectorArray → Arrow ListArray → Memory
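For illustration, the Arrow side of the new path in Python with pyarrow (a sketch; the actual code uses the Go and C++ Arrow bindings):

```python
import pyarrow as pa

# Vector-array data lands directly in an Arrow ListArray, skipping the
# proto -> binary -> proto round trip of the old path.
vectors = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
list_array = pa.array(vectors, type=pa.list_(pa.float32()))
print(list_array.type)  # list<item: float>
```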
---------
Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
issue: #42942
This PR includes the following changes:
1. Added checks to the index checker in querycoord to generate
drop-index tasks.
2. Added a drop index interface to querynode.
3. To avoid search failures after dropping an index, the querynode
allows lazy mode (warmup=disable) to load raw data even when indexes
contain raw data.
4. In segcore, loading an index no longer deletes raw data; instead, it
evicts the raw data.
5. In expr, the index is pinned to prevent concurrency errors.
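Client-side, dropping an index on a loaded collection then looks like this (a minimal sketch; collection and index names are placeholders):

```python
from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")

# With warmup=disable, the querynode can fall back to lazily loaded raw
# data, so search keeps working after the index is dropped.
client.drop_index(collection_name="my_collection", index_name="vec_index")
```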
---------
Signed-off-by: sunby <sunbingyi1992@gmail.com>