Vectors: Realtime Ad Personalization
Anton A. Nesterov, an (at) vski.sh
Version 1.0
This is a simple example demonstrating a common use case of vector databases.
In this example we'll use SQLite with the sqlite-vec extension.
Our goal is to recommend the most relevant ad to a user based on their real-time context. We'll combine four key pieces of information to form a user's profile:
- User's Recent Search Term: A string representing their current interest (here, one of the ad categories, such as 'Electronics').
- User's Location: A broad category like 'Urban', 'Suburban', or 'Rural'.
- Time of Day: 'Morning', 'Afternoon', or 'Evening'.
- User's Persona: A simple category like 'Budget_Shopper' or 'Luxury_Seeker'; in the code it is expressed as one of the ads' price ranges ('Budget', 'Mid-Range', or 'Luxury').
Set Up a Database with the Vector Extension
%pip install sqlite-vec numpy pandas scikit-learn
In reality the data and the model would be more complex, but for this example it is enough to generate synthetic data:
import numpy as np
import pandas as pd
import struct
import sqlite3
import sqlite_vec
from sklearn.preprocessing import OneHotEncoder
def serialize_f32(vector):
    """serializes a list of floats into a compact "raw bytes" format"""
    return struct.pack("%sf" % len(vector), *vector)
# --- 1. Create a synthetic ad dataset ---
ad_data = {
    'ad_id': np.arange(100),
    'title': [f'Ad for Product {i}' for i in range(100)],
    'category': np.random.choice(['Electronics', 'Fashion', 'HomeGoods', 'Automotive', 'Travel'], 100),
    'price_range': np.random.choice(['Budget', 'Mid-Range', 'Luxury'], 100),
    'target_audience': np.random.choice(['Urban', 'Suburban', 'Rural'], 100),
    'time_of_day': np.random.choice(['Morning', 'Afternoon', 'Evening'], 100)
}
ads_df = pd.DataFrame(ad_data)
# --- 2. Vectorize the ad features ---
features_to_vectorize = ['category', 'price_range', 'target_audience', 'time_of_day']
ohe = OneHotEncoder(sparse_output=False, handle_unknown='ignore')
ad_vectors = ohe.fit_transform(ads_df[features_to_vectorize])
# --- 3. Create and populate a SQLite database with the vector extension ---
conn = sqlite3.connect('vski_ads_vec.db')
conn.enable_load_extension(True)
sqlite_vec.load(conn) # Load the vector extension
conn.enable_load_extension(False)
cursor = conn.cursor()
sqlite_version, vec_version = conn.execute(
"select sqlite_version(), vec_version()"
).fetchone()
print(f"sqlite_version={sqlite_version}, vec_version={vec_version}")
try:
    # Start from a clean slate so the cell can be re-run against an existing database file
    cursor.execute("DROP TABLE IF EXISTS ads")
    cursor.execute("DROP TABLE IF EXISTS ad_vectors")
    # Create table for original ad data
    cursor.execute('''
        CREATE TABLE ads (
            ad_id INTEGER PRIMARY KEY,
            title TEXT,
            category TEXT,
            price_range TEXT,
            target_audience TEXT,
            time_of_day TEXT
        )
    ''')
    # Populate ads table; 'append' keeps the schema above, while 'replace' would
    # drop it and let pandas recreate the table without the primary key
    ads_df.to_sql('ads', conn, if_exists='append', index=False)
    # Create a virtual table for vectors, specifying the dimension of our vectors
    print("Dimensions:", ad_vectors.shape)
    cursor.execute(f"CREATE VIRTUAL TABLE ad_vectors USING vec0(vector float[{ad_vectors.shape[1]}])")
    # Insert vectors; rowid i equals ad_id i because ad_id was generated with np.arange
    for i, vector in enumerate(ad_vectors):
        cursor.execute("INSERT INTO ad_vectors(rowid, vector) VALUES (?, ?)", (i, serialize_f32(vector)))
    conn.commit()
except Exception:
    conn.rollback()
    raise
finally:
    conn.close()
print("Database 'vski_ads_vec.db' created and populated with ad data and their vectorized representations.")
sqlite_version=3.37.2, vec_version=v0.1.6
Dimensions: (100, 14)
Database 'vski_ads_vec.db' created and populated with ad data and their vectorized representations.
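sqlite-vec accepts a float vector as a compact BLOB of packed 32-bit floats, which is what `serialize_f32` produces. As a quick sanity check, the sketch below packs a short vector, compares the bytes with NumPy's float32 buffer, and unpacks them again. `deserialize_f32` is a hypothetical helper added only for this illustration. At 4 bytes per component, each 14-dimensional ad vector becomes a 56-byte blob.

```python
import struct

import numpy as np


def serialize_f32(vector):
    """Pack a sequence of floats into raw float32 bytes (native byte order), as in the notebook."""
    return struct.pack("%sf" % len(vector), *vector)


def deserialize_f32(blob):
    """Hypothetical inverse helper: decode every 4 bytes back into one float."""
    return list(struct.unpack("%sf" % (len(blob) // 4), blob))


vec = np.array([0.0, 1.0, 0.0, 1.0])  # float64, like the rows of ad_vectors
blob = serialize_f32(vec)

print(len(blob))                                 # 4 components x 4 bytes = 16
print(blob == vec.astype(np.float32).tobytes())  # same bytes as NumPy's float32 buffer
print(deserialize_f32(blob))                     # round-trips to the original values
```

An equivalent shortcut for NumPy input is `vec.astype(np.float32).tobytes()`, which avoids unpacking the array into Python floats.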
Vectorize the User's Profile
For each ad request, we'll collect real-time user data and convert it into a vector using the same One-Hot Encoder we used for the ads. This ensures the user vector can be directly compared to the ad vectors.
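To see what reusing "the same encoder" guarantees, here is a minimal pure-Python sketch of the layout `OneHotEncoder` produces for these four columns. It assumes the learned vocabularies are the sorted unique values of each ad column (how scikit-learn orders categories by default) and that every value occurs in the synthetic data; `VOCAB` and `one_hot` are illustrative names, not part of the notebook. An unseen value, such as the persona label 'Budget_Shopper', leaves its block all zeros, mirroring `handle_unknown='ignore'`.

```python
# Vocabularies as OneHotEncoder(categories='auto') learns them: sorted unique values per column.
# Assumption: every value occurs at least once in the synthetic ads.
VOCAB = {
    "category": sorted(["Electronics", "Fashion", "HomeGoods", "Automotive", "Travel"]),
    "price_range": sorted(["Budget", "Mid-Range", "Luxury"]),
    "target_audience": sorted(["Urban", "Suburban", "Rural"]),
    "time_of_day": sorted(["Morning", "Afternoon", "Evening"]),
}


def one_hot(row):
    """Concatenate one block per feature, in VOCAB order; an unseen value leaves its block all zeros."""
    vector = []
    for feature, values in VOCAB.items():
        vector.extend(1.0 if row[feature] == value else 0.0 for value in values)
    return vector


user = {
    "category": "Electronics",   # recent search
    "price_range": "Mid-Range",  # persona
    "target_audience": "Urban",  # location
    "time_of_day": "Evening",
}
print(one_hot(user))
# A persona label outside the ads' vocabulary encodes to an all-zero price_range block
print(one_hot(dict(user, price_range="Budget_Shopper"))[5:8])
```

For the example profile this reproduces the 14-element user vector that the notebook prints after this step, with blocks of 5, 3, 3 and 3 columns.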
# Function to get a user vector based on their real-time context
def get_user_vector(recent_search, location, time_of_day, persona, one_hot_encoder, feature_names):
    # Combine user data into a single-row DataFrame with the same columns as the ads
    user_data = pd.DataFrame({
        'category': [recent_search],
        'price_range': [persona],
        'target_audience': [location],
        'time_of_day': [time_of_day]
    })
    # Use the same encoder; selecting feature_names keeps the column order used at fit time
    user_vector = one_hot_encoder.transform(user_data[feature_names])
    return user_vector
# Example user profile
user_profile = {
    'recent_search': 'Electronics',
    'location': 'Urban',
    'time_of_day': 'Evening',
    'persona': 'Mid-Range'
}
# Re-create the encoder exactly as in step 2 and fit it on the same ads, so its
# categories (and therefore the column order) match the stored ad vectors
features_to_vectorize = ['category', 'price_range', 'target_audience', 'time_of_day']
ohe = OneHotEncoder(sparse_output=False, handle_unknown='ignore')
ohe.fit(ads_df[features_to_vectorize])
# Get the user's vector
user_vector = get_user_vector(
    user_profile['recent_search'],
    user_profile['location'],
    user_profile['time_of_day'],
    user_profile['persona'],
    ohe,
    features_to_vectorize
)
print(f"User's vector:\n{user_vector}")
User's vector:
[[0. 1. 0. 0. 0. 0. 0. 1. 0. 0. 1. 0. 1. 0.]]
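Every ad vector, and every user vector whose four values are all known to the encoder, contains exactly one 1 in each feature block. Writing $u$ for the user vector, $a$ for an ad vector and $m$ for the number of the four features on which they agree, the Euclidean (L2) distance $d$ that sqlite-vec's vec0 tables use by default maps directly to $m$:

```latex
\begin{aligned}
\|u\|^2 &= \|a\|^2 = 4, \qquad u \cdot a = m \in \{0, 1, 2, 3, 4\},\\
\|u - a\|^2 &= \|u\|^2 + \|a\|^2 - 2\, u \cdot a = 8 - 2m,\\
d(u, a) &= \sqrt{2(4 - m)}, \qquad
\cos(u, a) = \frac{u \cdot a}{\|u\|\,\|a\|} = \frac{m}{4} = 1 - \frac{d^2}{8}.
\end{aligned}
```

So a perfect match ($m = 4$) has distance 0 and cosine similarity 1, and each mismatched feature increases the distance. If a user value is unknown to the encoder, its block is all zeros and $\|u\|^2$ drops below 4, so these formulas no longer hold exactly.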
Find the Most Relevant Ad
With the user's vector ready, we'll query the database for the most similar ad vector, then look up the matching ad:
# Connect to the database
conn = sqlite3.connect('vski_ads_vec.db')
conn.enable_load_extension(True)
sqlite_vec.load(conn)
conn.enable_load_extension(False)
cursor = conn.cursor()
# KNN query on the vec0 table: `vector MATCH ? AND k = 3` returns the 3 ad vectors
# nearest to the user vector, ordered by distance (L2 / Euclidean by default)
query = user_vector[0]
cursor.execute(
    "SELECT rowid, distance FROM ad_vectors WHERE vector MATCH ? AND k = 3 ORDER BY distance",
    [serialize_f32(query)],
)
result = cursor.fetchone()
recommended_ad_id = result[0]
most_similar_score = result[1]
# Retrieve the full details of the most similar ad
cursor.execute("SELECT * FROM ads WHERE ad_id = ?", [recommended_ad_id,])
recommended_ad = cursor.fetchone()
conn.close()
print(f"Recommended Ad ID: {recommended_ad[0]}")
print(f"Distance to User Vector (L2, lower is closer): {most_similar_score:.4f}")  # vec0 ranks by L2 distance by default, so this is a distance, not a cosine similarity
print("Recommended Ad Details:")
print(f" Title: {recommended_ad[1]}")
print(f" Category: {recommended_ad[2]}")
print(f" Price Range: {recommended_ad[3]}")
print(f" Target Audience: {recommended_ad[4]}")
print(f" Time of Day: {recommended_ad[5]}")
Recommended Ad ID: 47
Distance to User Vector (L2, lower is closer): 0.0000
Recommended Ad Details:
 Title: Ad for Product 47
 Category: Electronics
 Price Range: Mid-Range
 Target Audience: Urban
 Time of Day: Evening
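The KNN query ranks ad vectors by their L2 distance to the user vector, vec0's default metric. For cross-checking results on small data, an exact NumPy equivalent can be sketched as follows; `knn_l2` is a hypothetical helper, and the three-ad catalogue is made-up toy data with two one-hot feature blocks of two columns each.

```python
import numpy as np


def knn_l2(query, vectors, k=3):
    """Exact k-nearest-neighbour search by Euclidean (L2) distance.

    Returns (row indices, distances), closest first; a stable sort keeps
    lower row indices first when distances tie.
    """
    distances = np.linalg.norm(vectors - query, axis=1)
    order = np.argsort(distances, kind="stable")[:k]
    return order, distances[order]


# Toy catalogue: two feature blocks, [dims 0-1] and [dims 2-3], one-hot within each
ads = np.array([
    [1, 0, 1, 0],  # matches the user on both features
    [1, 0, 0, 1],  # matches on the first feature only
    [0, 1, 0, 1],  # matches on neither
], dtype=np.float32)
user = np.array([1, 0, 1, 0], dtype=np.float32)

ids, dists = knn_l2(user, ads, k=2)
print(ids, dists)
```

On the notebook's data, `knn_l2(user_vector[0], ad_vectors)` should return the same ad ids as the SQL query up to the ordering of ties, because each vector's rowid was set to its row index.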
Although oversimplified, this example demonstrates a complete, end-to-end personalization flow. By converting both ads and user profiles into a common vector format, we can use a single similarity measure (here, the distance computed by sqlite-vec) to pick the most relevant ad. With richer features or learned embeddings in place of one-hot vectors, the same approach forms the foundation of many modern recommendation and personalization engines.