Vectors & Metadata

What is a Vector?

A vector is an array of floating-point numbers that represents the semantic meaning of a piece of content. Two pieces of content with similar meaning will have vectors that are close together in the vector space.

# The sentence "I love machine learning" might produce:
vector = [0.123, -0.456, 0.789, ..., 0.321] # 384 floats
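Closeness in vector space is typically measured with cosine similarity. A minimal sketch of the metric (VectorDB's actual distance function may be configured per collection):

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of the vectors' magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Identical vectors score 1.0; orthogonal vectors score 0.0.
print(cosine_similarity([0.1, 0.2, 0.3], [0.1, 0.2, 0.3]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))            # 0.0
```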

You generate vectors using an embedding model. VectorDB stores and searches them — it does not generate embeddings.

Upserting Vectors

Every vector needs an external_id — a string that uniquely identifies it within the collection.

client.vectors.upsert(
    collection="my-collection",
    external_id="article-42",
    vector=[0.1, 0.2, ..., 0.9],
    metadata={"title": "How to train a model", "author": "Alice"},
)

If the external_id already exists, the vector and metadata are updated (upsert semantics). The response indicates "inserted" or "updated".
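The upsert semantics can be illustrated with a plain dict standing in for the collection — a local sketch, not the server implementation:

```python
store = {}

def upsert(external_id, vector, metadata=None):
    # An existing id is overwritten; a new id is inserted.
    status = "updated" if external_id in store else "inserted"
    store[external_id] = {"vector": vector, "metadata": metadata or {}}
    return status

print(upsert("article-42", [0.1, 0.2]))  # inserted
print(upsert("article-42", [0.3, 0.4]))  # updated — same id overwrites
```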

Bulk Upsert

For high-throughput ingestion, use bulk upsert. It batches database writes and index insertions.

items = [
    {"external_id": "doc-1", "vector": [...], "metadata": {"tag": "ml"}},
    {"external_id": "doc-2", "vector": [...], "metadata": {"tag": "nlp"}},
    {"external_id": "doc-3", "vector": [...]},
]
result = client.vectors.bulk_upsert("my-collection", items)
print(f"{len(result.inserted)} inserted, {len(result.updated)} updated")
Note: The default max batch size is 1000 items. Configure MAX_BATCH_SIZE to change it.
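If your ingest set exceeds the batch limit, you can split it client-side before calling bulk upsert. A hypothetical helper (the 1000 default matches the limit above):

```python
def chunked(items, batch_size=1000):
    # Yield successive slices of at most batch_size items.
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# Hypothetical usage:
# for batch in chunked(items, 1000):
#     client.vectors.bulk_upsert("my-collection", batch)
```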

Metadata

Metadata is a JSON object attached to a vector. Use it to store any structured data alongside your embeddings.

metadata = {
    "title": "Getting started with embeddings",
    "author": "Bob",
    "published_at": "2024-01-15",
    "tags": ["tutorial", "embeddings"],
    "view_count": 1523,
}

Metadata filtering in search lets you narrow results to vectors that match specific metadata values:

results = client.search.search(
    "articles",
    vector=query_vector,
    k=10,
    filters={"author": "Bob"},
)
Note: Metadata is stored as JSON. Keys and values must be JSON-serializable. The default max metadata size is 10KB per vector.
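You can check a payload against the size limit before upserting by measuring its serialized JSON. A client-side sketch — the server's exact byte accounting may differ:

```python
import json

MAX_METADATA_BYTES = 10 * 1024  # default 10KB limit noted above

def metadata_size_ok(metadata):
    # Measure the metadata as UTF-8-encoded JSON, since that is how it is stored.
    size = len(json.dumps(metadata).encode("utf-8"))
    return size <= MAX_METADATA_BYTES

print(metadata_size_ok({"title": "Getting started with embeddings"}))  # True
```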

Deleting Vectors

# Delete one
client.vectors.delete("my-collection", "article-42")

# Delete many
client.vectors.delete_batch("my-collection", ["doc-1", "doc-2", "doc-3"])

Generating Embeddings

VectorDB is embedding-model agnostic. A common way to generate vectors is with sentence-transformers:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2") # dim=384
vector = model.encode("Hello world").tolist()