Vectors & Metadata
What is a Vector?
A vector is an array of floating-point numbers that represents the semantic meaning of a piece of content. Two pieces of content with similar meaning will have vectors that are close together in the vector space.
# The sentence "I love machine learning" might produce:
vector = [0.123, -0.456, 0.789, ..., 0.321] # 384 floats
You generate vectors using an embedding model. VectorDB stores and searches them — it does not generate embeddings.
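"Close together" is typically measured with cosine similarity (the exact distance metric depends on your collection settings). A toy illustration with hand-made 3-dimensional vectors — real embeddings have hundreds of dimensions:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity: 1.0 for identical directions, near 0 for unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings": cat and kitten point in similar directions, invoice does not.
cat = [0.9, 0.1, 0.0]
kitten = [0.85, 0.2, 0.05]
invoice = [0.0, 0.1, 0.95]

print(cosine_similarity(cat, kitten))   # high: similar meaning
print(cosine_similarity(cat, invoice))  # low: unrelated meaning
```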
Upserting Vectors
Every vector needs an external_id — a string that uniquely identifies it within the collection.
client.vectors.upsert(
    collection="my-collection",
    external_id="article-42",
    vector=[0.1, 0.2, ..., 0.9],
    metadata={"title": "How to train a model", "author": "Alice"},
)
If the external_id already exists, the vector and metadata are updated (upsert semantics). The response indicates "inserted" or "updated".
Bulk Upsert
For high-throughput ingestion, use bulk upsert. It batches database writes and index insertions.
items = [
    {"external_id": "doc-1", "vector": [...], "metadata": {"tag": "ml"}},
    {"external_id": "doc-2", "vector": [...], "metadata": {"tag": "nlp"}},
    {"external_id": "doc-3", "vector": [...]},
]
result = client.vectors.bulk_upsert("my-collection", items)
print(f"{len(result.inserted)} inserted, {len(result.updated)} updated")
The default max batch size is 1000 items. Configure MAX_BATCH_SIZE to change it.
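If your dataset exceeds the batch limit, split it into chunks client-side before calling bulk upsert. A minimal chunking sketch (the commented loop is illustrative, matching the call above):

```python
def batched(items, size=1000):
    """Yield consecutive chunks of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# Usage sketch, assuming `client` and `items` as above:
# for batch in batched(items, size=1000):
#     client.vectors.bulk_upsert("my-collection", batch)
```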
Metadata
Metadata is a JSON object attached to a vector. Use it to store any structured data alongside your embeddings.
metadata = {
    "title": "Getting started with embeddings",
    "author": "Bob",
    "published_at": "2024-01-15",
    "tags": ["tutorial", "embeddings"],
    "view_count": 1523,
}
Metadata filtering in search lets you narrow results to vectors that match specific metadata values:
results = client.search.search(
    "articles",
    vector=query_vector,
    k=10,
    filters={"author": "Bob"},
)
Metadata is stored as JSON. Keys and values must be JSON-serializable. The default max metadata size is 10KB per vector.
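To stay under the limit, you can measure metadata before upserting. A sketch — assuming the limit is enforced on the UTF-8 serialized JSON, which is a reasonable but unverified guess:

```python
import json

MAX_METADATA_BYTES = 10 * 1024  # assumed interpretation of the 10KB limit

def metadata_size(metadata):
    """Size of the metadata as serialized JSON, in bytes."""
    return len(json.dumps(metadata, ensure_ascii=False).encode("utf-8"))

metadata = {"title": "Getting started with embeddings", "tags": ["tutorial"]}
assert metadata_size(metadata) <= MAX_METADATA_BYTES
```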
Deleting Vectors
# Delete one
client.vectors.delete("my-collection", "article-42")
# Delete many
client.vectors.delete_batch("my-collection", ["doc-1", "doc-2", "doc-3"])
Generating Embeddings
VectorDB is embedding-model agnostic. Here are common ways to generate vectors:
- sentence-transformers (local)
- OpenAI
- Ollama (local)
# sentence-transformers (local)
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # dim=384
vector = model.encode("Hello world").tolist()

# OpenAI
from openai import OpenAI

openai = OpenAI()
response = openai.embeddings.create(
    model="text-embedding-3-small",
    input="Hello world",
)
vector = response.data[0].embedding  # dim=1536

# Ollama (local)
import ollama

response = ollama.embeddings(model="nomic-embed-text", prompt="Hello world")
vector = response["embedding"]  # dim=768
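Whichever model you pick, use it consistently: every vector in a collection should have the same dimension. A defensive check before upserting — the helper and the model-to-dimension mapping are illustrative, with the dimensions taken from the snippets above:

```python
# Guard against accidentally mixing embedding models within one collection.
MODEL_DIMS = {
    "all-MiniLM-L6-v2": 384,
    "text-embedding-3-small": 1536,
    "nomic-embed-text": 768,
}

def check_dimension(vector, model_name):
    """Raise if `vector` does not match the model's expected dimension."""
    expected = MODEL_DIMS[model_name]
    if len(vector) != expected:
        raise ValueError(
            f"{model_name} produces {expected}-dim vectors, got {len(vector)}"
        )
    return vector
```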