Commit 2c52f2d

Merge pull request #70 from Build5Nines/dev
v2.1.0
2 parents 4177d5a + 2d03826 commit 2c52f2d

20 files changed: +839 -172 lines changed

CHANGELOG.md

Lines changed: 17 additions & 0 deletions
@@ -5,6 +5,23 @@ All notable changes to this project will be documented in this file.
55
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
66
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
77

8+
## v2.1.0
9+
10+
Add:
11+
12+
- Added `VectorTextResultItem.Id` property so it's easy to get the database ID for search results if necessary.
13+
- `IVectorDatabase` now inherits from `IEnumerable` so you can easily loop through the text documents that have been added to the database.
14+
15+
Fixed:
16+
17+
- Fixed text tokenization to correctly remove special characters
18+
- Update `BasicTextPreprocessor` to support Emoji characters too
19+
- Refactorings for more Clean Code
20+
21+
Breaking Changes:
22+
23+
- The `.Search` and `.SearchAsync` methods now return an `IVectorTextResultItem<TId, TDocument, TMetadata>` instead of a `VectorTextResultItem<TDocument, TMetadata>`. If you use the API the way the documentation shows, you won't see any changes or have any issues with this update.
24+
825
## v2.0.4 (2025-04-16)
926

1027
Add:
Lines changed: 107 additions & 0 deletions
@@ -0,0 +1,107 @@
1+
---
2+
title: Data Management
3+
4+
---
5+
# Data Management
6+
7+
Since `Build5Nines.SharpVector` is a database, it also has data management methods available. These methods enable you to add, remove, and update the text documents that are vectorized and indexed within the semantic database.
8+
9+
## Get Text Item ID
10+
11+
Every text item within a `Build5Nines.SharpVector` database is assigned a unique identifier (ID). There are a few ways to get access to the ID of the text items.
12+
13+
=== ".AddText()"
14+
15+
When adding an individual text item to the vector database, the ID value will be returned:
16+
17+
```csharp
18+
var id = vdb.AddText(txt, metadata);
19+
20+
var id = await vdb.AddTextAsync(txt, metadata);
21+
```
22+
23+
=== ".Search()"
24+
25+
When you perform a semantic search, the search results contain the list of matching texts, each of which has an `Id` property.
26+
27+
```csharp
28+
var results = vdb.Search("query text");
29+
30+
foreach(var item in results.Texts) {
31+
var id = item.Id;
32+
var text = item.Text;
33+
var metadata = item.Metadata;
34+
// do something here
35+
}
36+
```
37+
38+
=== "Enumerator"
39+
40+
The `IVectorDatabase` classes implement `IEnumerable` so you can easily loop through all the text items that have been added to the database.
41+
42+
```csharp
43+
foreach(var item in vdb) {
44+
var id = item.Id;
45+
var text = item.Text;
46+
var metadata = item.Metadata;
47+
var vector = item.Vector;
48+
49+
// do something here
50+
}
51+
```
52+
53+
## Get
54+
55+
If you know the `id` of a Text item in the database, you can retrieve it directly.
56+
57+
### Get By Id
58+
59+
The `.GetText` method can be used to retrieve a text item from the vector database directly.
60+
61+
```csharp
62+
vdb.GetText(id);
63+
```
64+
65+
## Update
66+
67+
Once text items have been added to the database, the "Update" methods can be used to modify them.
68+
69+
### Update Text
70+
71+
The `.UpdateText` method can be used to update the `Text` value, and associated vectors will be updated.
72+
73+
```csharp
74+
vdb.UpdateText(id, newTxt);
75+
```
76+
77+
When the `Text` is updated, new vector embeddings are generated for the new text.
78+
79+
### Update Metadata
80+
81+
The `.UpdateTextMetadata` method can be used to update the `Metadata` for a given text item by `Id`.
82+
83+
```csharp
84+
vdb.UpdateTextMetadata(id, newMetadata);
85+
```
86+
87+
When `Metadata` is updated, the vector embeddings are not updated.
88+
89+
### Update Text and Metadata
90+
91+
The `.UpdateTextAndMetadata` method can be used to update both the `Text` and the `Metadata` of the text item with the given `Id`.
92+
93+
```csharp
94+
vdb.UpdateTextAndMetadata(id, newTxt, newMetadata);
95+
```
96+
97+
## Delete
98+
99+
The vector database supports the ability to delete text items.
100+
101+
### Delete Text
102+
103+
The `.DeleteText` method can be used to delete a text item from the database for the given `Id`.
104+
105+
```csharp
106+
vdb.DeleteText(id);
107+
```
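
The Update sections above state that changing `Text` regenerates the vector embeddings, while changing `Metadata` leaves them alone. The following is a toy model of that behavior, not SharpVector's implementation: the dictionary-backed store, the letter-count `Embed` function, and the local functions (named after the documented methods only for readability) are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy model only: a hypothetical in-memory store, not the library's code.
var items = new Dictionary<int, (string Text, string Metadata, float[] Vector)>();
var nextId = 1;

// Stand-in "embedding": counts of each letter a-z in the text.
float[] Embed(string text)
{
    var vector = new float[26];
    foreach (var c in text.ToLowerInvariant())
    {
        if (c >= 'a' && c <= 'z') vector[c - 'a']++;
    }
    return vector;
}

// Adding a text assigns an ID and computes its vector once.
int AddText(string text, string metadata)
{
    var newId = nextId++;
    items[newId] = (text, metadata, Embed(text));
    return newId;
}

// Changing the text invalidates the old vector, so it is recomputed.
void UpdateText(int key, string text) =>
    items[key] = (text, items[key].Metadata, Embed(text));

// Metadata is not part of the embedding, so the existing vector is reused.
void UpdateTextMetadata(int key, string metadata) =>
    items[key] = (items[key].Text, metadata, items[key].Vector);

var itemId = AddText("hello", "v1");
var before = items[itemId].Vector;

UpdateTextMetadata(itemId, "v2");
var sameVector = ReferenceEquals(before, items[itemId].Vector);

UpdateText(itemId, "goodbye");
var vectorChanged = !before.SequenceEqual(items[itemId].Vector);

// The DeleteText equivalent in this toy model.
items.Remove(itemId);

Console.WriteLine($"{sameVector} {vectorChanged} {items.Count}");
```

In this sketch the metadata update keeps the very same vector array, while the text update produces a different one. That matches the documented difference between `.UpdateTextMetadata` and `.UpdateText`.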

docs/mkdocs.yml

Lines changed: 6 additions & 0 deletions
@@ -137,6 +137,12 @@ nav:
137137
- Prerequisites: get-started/#prerequisites
138138
- Install Nuget Package: get-started/#install-nuget-package
139139
- Basic Example: get-started/#basic-example
140+
- Data Management:
141+
- get-started/data-management/index.md
142+
- Get Text Item Id: get-started/data-management/#get-text-item-id
143+
- Get Item By Id: get-started/data-management/#get
144+
- Update Item: get-started/data-management/#update
145+
- Delete Item: get-started/data-management/#delete
140146
- Concepts:
141147
- concepts/index.md
142148
- What is a Vector Database?: concepts/#what-is-a-vector-database

src/Build5Nines.SharpVector/Build5Nines.SharpVector.csproj

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@
99
<PackageId>Build5Nines.SharpVector</PackageId>
1010
<PackageProjectUrl>https://sharpvector.build5nines.com</PackageProjectUrl>
1111
<RepositoryUrl>https://github.com/Build5Nines/SharpVector</RepositoryUrl>
12-
<Version>2.0.4</Version>
12+
<Version>2.1.0</Version>
1313
<Description>Lightweight In-memory Vector Database to embed in any .NET Applications</Description>
1414
<Copyright>Copyright (c) 2025 Build5Nines LLC</Copyright>
1515
<PackageReadmeFile>README.md</PackageReadmeFile>

src/Build5Nines.SharpVector/DatabaseFile.cs

Lines changed: 38 additions & 3 deletions
@@ -11,9 +11,9 @@ namespace Build5Nines.SharpVector;
1111
/// </summary>
1212
public static class DatabaseFile
1313
{
14-
private const string databaseInfoFilename = "database.json";
15-
private const string vectorStoreFilename = "vectorstore.json";
16-
private const string vocabularyStoreFilename = "vocabularystore.json";
14+
internal const string databaseInfoFilename = "database.json";
15+
internal const string vectorStoreFilename = "vectorstore.json";
16+
internal const string vocabularyStoreFilename = "vocabularystore.json";
1717

1818
/// <summary>
1919
/// Load the vector database from a stream
@@ -210,6 +210,41 @@ public static async Task LoadVocabularyStoreAsync<TVocabularyKey, TVocabularyVal
210210
}
211211
}
212212

213+
public static async Task SaveDatabaseToZipArchiveAsync(
214+
Stream stream,
215+
DatabaseInfo databaseInfo,
216+
Func<ZipArchive, Task> saveVectorStore
217+
)
218+
{
219+
if (stream == null)
220+
{
221+
throw new ArgumentNullException(nameof(stream));
222+
}
223+
224+
using (var archive = new ZipArchive(stream, ZipArchiveMode.Create, true))
225+
{
226+
var entryDatabaseType = archive.CreateEntry(databaseInfoFilename);
227+
using(var entryStream = entryDatabaseType.Open())
228+
{
229+
// Save the database info
230+
var databaseInfoJson = JsonSerializer.Serialize(databaseInfo);
231+
if (databaseInfoJson != null)
232+
{
233+
var databaseTypeBytes = System.Text.Encoding.UTF8.GetBytes(databaseInfoJson);
234+
await entryStream.WriteAsync(databaseTypeBytes);
235+
await entryStream.FlushAsync();
236+
}
237+
else
238+
{
239+
throw new InvalidOperationException("Type name cannot be null.");
240+
}
241+
}
242+
243+
await saveVectorStore(archive);
244+
}
245+
}
246+
247+
213248
public static async Task<DatabaseInfo> LoadDatabaseFromZipArchiveAsync(
214249
Stream stream, string? dbClassType,
215250
Func<ZipArchive, Task> loadVectorStore
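
As a rough, self-contained illustration of the pattern `SaveDatabaseToZipArchiveAsync` uses, the sketch below writes a `database.json` entry to a `MemoryStream` and reads it back. The pattern is a `ZipArchive` opened in `Create` mode with `leaveOpen` set to `true`, a JSON entry written first, then further entries added by a callback. The dictionary standing in for `DatabaseInfo` and its `Version` key are assumptions; the real type's shape is not shown in this diff.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Text;
using System.Text.Json;

// Hypothetical stand-in for the library's DatabaseInfo type.
var databaseInfo = new Dictionary<string, string> { ["Version"] = "2.1.0" };

using var stream = new MemoryStream();

// leaveOpen: true keeps `stream` usable after the archive is disposed.
using (var archive = new ZipArchive(stream, ZipArchiveMode.Create, leaveOpen: true))
{
    var entry = archive.CreateEntry("database.json");
    using (var entryStream = entry.Open())
    {
        var bytes = Encoding.UTF8.GetBytes(JsonSerializer.Serialize(databaseInfo));
        await entryStream.WriteAsync(bytes);
    }
    // A saveVectorStore-style callback would add its own entries here.
}

// Rewind the still-open stream and read the entry back.
stream.Position = 0;
string json;
using (var archive = new ZipArchive(stream, ZipArchiveMode.Read, leaveOpen: true))
using (var reader = new StreamReader(archive.GetEntry("database.json")!.Open()))
{
    json = await reader.ReadToEndAsync();
}

Console.WriteLine(json);
```

Disposing the archive is what finalizes the zip's central directory. This is why the write happens inside a `using` block and the stream is only rewound and read afterwards.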

src/Build5Nines.SharpVector/IVectorDatabase.cs

Lines changed: 3 additions & 2 deletions
@@ -9,6 +9,7 @@ namespace Build5Nines.SharpVector;
99
/// <typeparam name="TMetadata"></typeparam>
1010
/// <typeparam name="TDocument"></typeparam>
1111
public interface IVectorDatabase<TId, TMetadata, TDocument>
12+
: IEnumerable<IVectorTextDatabaseItem<TId, TDocument, TMetadata>>
1213
where TId : notnull
1314
{
1415
/// <summary>
@@ -80,7 +81,7 @@ public interface IVectorDatabase<TId, TMetadata, TDocument>
8081
/// <param name="pageIndex">The page index of the search results. Default is 0.</param>
8182
/// <param name="pageCount">The number of search results per page. Default is Null and returns all results.</param>
8283
/// <returns></returns>
83-
IVectorTextResult<TDocument, TMetadata> Search(TDocument queryText, float? threshold = null, int pageIndex = 0, int? pageCount = null);
84+
IVectorTextResult<TId, TDocument, TMetadata> Search(TDocument queryText, float? threshold = null, int pageIndex = 0, int? pageCount = null);
8485

8586
/// <summary>
8687
/// Performs an asynchronous vector search to find the top N most similar texts to the given text
@@ -90,7 +91,7 @@ public interface IVectorDatabase<TId, TMetadata, TDocument>
9091
/// <param name="pageIndex">The page index of the search results. Default is 0.</param>
9192
/// <param name="pageCount">The number of search results per page. Default is Null and returns all results.</param>
9293
/// <returns></returns>
93-
Task<IVectorTextResult<TDocument, TMetadata>> SearchAsync(TDocument queryText, float? threshold = null, int pageIndex = 0, int? pageCount = null);
94+
Task<IVectorTextResult<TId, TDocument, TMetadata>> SearchAsync(TDocument queryText, float? threshold = null, int pageIndex = 0, int? pageCount = null);
9495

9596

9697
[Obsolete("Use SerializeToBinaryStreamAsync Instead")]
