Vectors: Realtime Ad Personalization


Anton A. Nesterov	an (at) vski.sh
Version	1.0

This is a simple example demonstrating a common use case of vector databases.

In this example we'll use SQLite with the sqlite-vec extension.

Our goal is to recommend the most relevant ad to a user based on their real-time context. We'll combine four key pieces of information to form a user's profile:

User's Recent Search Term: A string representing their current interest; in this example it is assumed to already match one of the ad categories, such as 'Electronics'.
User's Location: A broad category like 'Urban', 'Suburban', or 'Rural'.
Time of Day: 'Morning', 'Afternoon', or 'Evening'.
User's Persona: A simple spending category, such as a budget shopper or a luxury seeker, expressed here as one of the ads' price ranges ('Budget', 'Mid-Range', or 'Luxury').
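The persona has to be expressed in the same vocabulary as the ads' price_range column before it can be encoded. A minimal sketch of that mapping step, using a hypothetical PERSONA_TO_PRICE_RANGE table and a hypothetical UserContext class (neither is part of the original example):

```python
from dataclasses import dataclass

# Hypothetical mapping from persona labels to the ads' price_range values.
# The labels and the mapping are illustrative assumptions.
PERSONA_TO_PRICE_RANGE = {
    "Budget_Shopper": "Budget",
    "Mid_Range_Shopper": "Mid-Range",
    "Luxury_Seeker": "Luxury",
}


@dataclass(frozen=True)
class UserContext:
    recent_search: str  # assumed to already be one of the ad categories
    location: str       # 'Urban', 'Suburban' or 'Rural'
    time_of_day: str    # 'Morning', 'Afternoon' or 'Evening'
    persona: str        # e.g. 'Budget_Shopper' or 'Luxury_Seeker'

    def to_features(self) -> dict:
        """Map the raw context onto the ad feature columns used by the encoder."""
        return {
            "category": self.recent_search,
            # Unknown personas pass through unchanged; an encoder created with
            # handle_unknown='ignore' would then encode them as all zeros.
            "price_range": PERSONA_TO_PRICE_RANGE.get(self.persona, self.persona),
            "target_audience": self.location,
            "time_of_day": self.time_of_day,
        }


ctx = UserContext("Electronics", "Urban", "Evening", "Luxury_Seeker")
print(ctx.to_features())
# {'category': 'Electronics', 'price_range': 'Luxury', 'target_audience': 'Urban', 'time_of_day': 'Evening'}
```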

Set Up a Database with the Vector Extension

%pip install sqlite-vec numpy pandas scikit-learn

In reality, the data model would be more complex, but for this example synthetic data is enough:

import struct
import sqlite3

import numpy as np
import pandas as pd
import sqlite_vec
from sklearn.preprocessing import OneHotEncoder

def serialize_f32(vector):
    """Serialize a list of floats into the compact raw float32 bytes format sqlite-vec accepts."""
    return struct.pack("%sf" % len(vector), *vector)

# --- 1. Create a synthetic ad dataset ---
ad_data = {
    'ad_id': np.arange(100),
    'title': [f'Ad for Product {i}' for i in range(100)],
    'category': np.random.choice(['Electronics', 'Fashion', 'HomeGoods', 'Automotive', 'Travel'], 100),
    'price_range': np.random.choice(['Budget', 'Mid-Range', 'Luxury'], 100),
    'target_audience': np.random.choice(['Urban', 'Suburban', 'Rural'], 100),
    'time_of_day': np.random.choice(['Morning', 'Afternoon', 'Evening'], 100)
}
ads_df = pd.DataFrame(ad_data)

# --- 2. Vectorize the ad features with one-hot encoding ---
features_to_vectorize = ['category', 'price_range', 'target_audience', 'time_of_day']
# sparse_output requires scikit-learn >= 1.2 (older versions call it sparse)
ohe = OneHotEncoder(sparse_output=False, handle_unknown='ignore')
ad_vectors = ohe.fit_transform(ads_df[features_to_vectorize])

# --- 3. Create and populate a SQLite database with the vector extension ---
conn = sqlite3.connect('vski_ads_vec.db')
conn.enable_load_extension(True)
sqlite_vec.load(conn)  # Load the vector extension
conn.enable_load_extension(False)

cursor = conn.cursor()

sqlite_version, vec_version = conn.execute(
    "select sqlite_version(), vec_version()"
).fetchone()
print(f"sqlite_version={sqlite_version}, vec_version={vec_version}")

try:
    # Drop existing tables so the cell can be re-run without errors
    cursor.execute("DROP TABLE IF EXISTS ads")
    cursor.execute("DROP TABLE IF EXISTS ad_vectors")

    # Create table for original ad data
    cursor.execute('''
        CREATE TABLE ads (
            ad_id INTEGER PRIMARY KEY,
            title TEXT,
            category TEXT,
            price_range TEXT,
            target_audience TEXT,
            time_of_day TEXT
        )
    ''')

    # Populate the ads table; if_exists='append' keeps the schema above,
    # whereas 'replace' would drop it and let pandas recreate the table
    ads_df.to_sql('ads', conn, if_exists='append', index=False)

    # Create a virtual table for vectors, specifying the dimension of our vectors
    print("Dimensions:", ad_vectors.shape)
    cursor.execute(f"CREATE VIRTUAL TABLE ad_vectors USING vec0(vector float[{ad_vectors.shape[1]}])")

    # Insert vectors into the virtual table; rowid i matches ad_id i in the ads table
    for i, vector in enumerate(ad_vectors):
        cursor.execute("INSERT INTO ad_vectors(rowid, vector) VALUES (?, ?)", (i, serialize_f32(vector)))

    conn.commit()
except Exception:
    conn.rollback()
    raise
finally:
    conn.close()

print("Database 'vski_ads_vec.db' created and populated with ad data and their vectorized representations.")

sqlite_version=3.37.2, vec_version=v0.1.6
Dimensions: (100, 14)
Database 'vski_ads_vec.db' created and populated with ad data and their vectorized representations.
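The serialize_f32 helper packs each value as a 4-byte float32, so every 14-dimensional ad vector is stored as a 56-byte blob. A small standalone check of that format, using only the standard library and a hand-built one-hot vector (the vector itself is illustrative, not taken from the database):

```python
import struct


def serialize_f32(vector):
    """Pack a sequence of floats into raw float32 bytes (same helper as above)."""
    return struct.pack("%sf" % len(vector), *vector)


# A 14-dimensional one-hot vector shaped like the rows stored in ad_vectors.
vec = [0.0] * 14
for i in (1, 7, 10, 12):
    vec[i] = 1.0

blob = serialize_f32(vec)
print(len(blob))  # 56: 14 floats * 4 bytes each

# Round trip: unpacking recovers the original values exactly,
# because 0.0 and 1.0 are exactly representable in float32.
restored = list(struct.unpack("%sf" % (len(blob) // 4), blob))
print(restored == vec)  # True
```

Values that are not exactly representable in float32 (for example 0.1) would come back slightly changed, which is harmless for similarity search.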

Vectorize the User's Profile

For each ad request, we'll collect real-time user data and convert it into a vector using the same fitted one-hot encoder we used for the ads. This guarantees that the user vector has the same 14 columns, in the same order, as the ad vectors, so the two can be compared directly.

# Build a user vector from the real-time context, using the encoder fitted on the ads
def get_user_vector(recent_search, location, time_of_day, persona, one_hot_encoder, feature_names):
    # Map the user's context onto the same feature columns the ads use
    user_data = pd.DataFrame({
        'category': [recent_search],
        'price_range': [persona],
        'target_audience': [location],
        'time_of_day': [time_of_day]
    })

    # Use the same fitted encoder so the user vector lines up with the ad vectors;
    # selecting feature_names keeps the column order the encoder was fitted with
    user_vector = one_hot_encoder.transform(user_data[feature_names])
    return user_vector

# Example user profile; the persona is already expressed as a price range
user_profile = {
    'recent_search': 'Electronics',
    'location': 'Urban',
    'time_of_day': 'Evening',
    'persona': 'Mid-Range'
}

# Reuse the encoder fitted on the ads in Step 1 (no need to create and fit a new one)
features_to_vectorize = ['category', 'price_range', 'target_audience', 'time_of_day']

# Get the user's vector
user_vector = get_user_vector(
    user_profile['recent_search'],
    user_profile['location'],
    user_profile['time_of_day'],
    user_profile['persona'],
    ohe,
    features_to_vectorize
)
print(f"User's vector:\n{user_vector}")

User's vector:
[[0. 1. 0. 0. 0. 0. 0. 1. 0. 0. 1. 0. 1. 0.]]
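The vector is a concatenation of one block per feature, and within each block the encoder's categories appear in sorted order. A pure-Python sketch that reproduces this layout, assuming every category value occurs at least once in the synthetic ad data (with 100 random ads this is almost certain, but not guaranteed):

```python
# Categories per feature in sorted order. OneHotEncoder sorts the categories
# it learns, so this mirrors its column layout (assuming all values were seen).
FEATURES = {
    "category": sorted(["Electronics", "Fashion", "HomeGoods", "Automotive", "Travel"]),
    "price_range": sorted(["Budget", "Mid-Range", "Luxury"]),
    "target_audience": sorted(["Urban", "Suburban", "Rural"]),
    "time_of_day": sorted(["Morning", "Afternoon", "Evening"]),
}


def one_hot(profile):
    """Concatenate one block per feature; an unknown value leaves its block all zeros."""
    vector = []
    for feature, categories in FEATURES.items():
        block = [0.0] * len(categories)
        if profile[feature] in categories:
            block[categories.index(profile[feature])] = 1.0
        vector.extend(block)
    return vector


user = {"category": "Electronics", "price_range": "Mid-Range",
        "target_audience": "Urban", "time_of_day": "Evening"}
v = one_hot(user)
print(len(v))                                    # 14
print([i for i, x in enumerate(v) if x == 1.0])  # [1, 7, 10, 12]
```

Positions 1, 7, 10 and 12 are exactly where the encoder's output has ones. An unknown value, such as an unmapped persona like 'Budget_Shopper', leaves its block all zeros, which mirrors handle_unknown='ignore'.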

Find the Most Relevant Ad

With the user's vector ready, we query the vector table for the nearest ad vectors and then look up the full details of the closest ad in the ads table:
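Conceptually, a MATCH ... AND k = 3 query on a vec0 table returns the k rows with the smallest distance to the query vector. The sketch below does the same thing by brute force in plain Python on hypothetical toy vectors, assuming L2 distance (to our understanding, vec0's default for float vectors); it illustrates the semantics of the query only:

```python
import math


def l2(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn(query, vectors, k):
    """Exact k-nearest-neighbour search: score every row, keep the k closest.

    vectors maps rowid -> vector, mirroring the rowid/vector pairs in ad_vectors.
    Ties on distance are broken by the smaller rowid.
    """
    scored = sorted((l2(query, vec), rowid) for rowid, vec in vectors.items())
    return [(rowid, dist) for dist, rowid in scored[:k]]


# Hypothetical 4-dimensional toy vectors, not the real ad table.
vectors = {0: [1, 0, 0, 1], 1: [0, 1, 0, 1], 2: [1, 0, 1, 0]}
print(knn([1, 0, 0, 1], vectors, k=2))  # [(0, 0.0), (1, 1.4142135623730951)]
```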

 
# Connect to the database and load the vector extension again
conn = sqlite3.connect('vski_ads_vec.db')
conn.enable_load_extension(True)
sqlite_vec.load(conn)
conn.enable_load_extension(False)

cursor = conn.cursor()

# KNN query: find the k ad vectors closest to user_vector.
# vec0 uses L2 (Euclidean) distance by default. The k = ? constraint works on any
# SQLite version; a bare LIMIT is only accepted for KNN queries on SQLite 3.41+.
query = user_vector[0]
cursor.execute(
    "SELECT rowid, distance FROM ad_vectors WHERE vector MATCH ? AND k = 3 ORDER BY distance",
    [serialize_f32(query)],
)
top_matches = cursor.fetchall()

# The closest match (smallest distance) is the recommendation
recommended_ad_id, distance = top_matches[0]

# Retrieve the full details of the most similar ad
cursor.execute("SELECT * FROM ads WHERE ad_id = ?", [recommended_ad_id])
recommended_ad = cursor.fetchone()
conn.close()

print(f"Recommended Ad ID: {recommended_ad[0]}")
print(f"Distance (L2, 0 = exact match): {distance:.4f}")
print("Recommended Ad Details:")
print(f"  Title: {recommended_ad[1]}")
print(f"  Category: {recommended_ad[2]}")
print(f"  Price Range: {recommended_ad[3]}")
print(f"  Target Audience: {recommended_ad[4]}")
print(f"  Time of Day: {recommended_ad[5]}")

Recommended Ad ID: 47
Distance (L2, 0 = exact match): 0.0000
Recommended Ad Details:
  Title: Ad for Product 47
  Category: Electronics
  Price Range: Mid-Range
  Target Audience: Urban
  Time of Day: Evening
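Assuming vec0's default L2 metric, the distance column is not a cosine distance. For the one-hot vectors in this example the two are still tied together: each vector has exactly one 1 per feature, hence four ones and a squared norm of 4, and expanding ||a - b||² gives d² = 8 - 2a·b, so cos = a·b / 4 = 1 - d²/8. A sketch that checks this identity on a hypothetical ad vector:

```python
import math


def cosine_similarity(a, b):
    """Plain cosine similarity: dot product divided by the product of norms."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def cosine_from_l2(distance, active_features=4):
    """Cosine similarity of two one-hot profiles that each have `active_features` ones.

    With ||a||^2 = ||b||^2 = n, d^2 = 2n - 2 a.b, so cos = a.b / n = 1 - d^2 / (2n).
    """
    return 1 - distance ** 2 / (2 * active_features)


user = [0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0]
# Hypothetical ad: same category and time of day, different price range and audience.
ad = [0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0]

d = math.dist(user, ad)  # L2 distance, 2.0 here
print(round(cosine_similarity(user, ad), 4), round(cosine_from_l2(d), 4))  # 0.5 0.5
print(1 - d)  # -1.0
```

Note that 1 - d is not a similarity score: for this pair it is -1.0 while the cosine similarity is 0.5; the two only agree at an exact match. The identity also assumes all of the user's values are known categories, since an unknown value drops a 1 and changes the norm.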

This example, though oversimplified, demonstrates a complete, end-to-end personalization flow. By converting both ads and user profiles into a common vector representation, we can use a simple distance metric to pick the most relevant ad for the current context. The same pattern forms the foundation of many modern recommendation and personalization engines; at larger scales, the exact search over every stored vector used here is usually replaced by an approximate nearest-neighbor index.