Phase 2 · 3

AI roadmap

A path from rules-based ranking (Stage 0) through three ML stages to an embedding-based resonance engine, the 'Spotify Discover Weekly for energy-based travel,' and finally a conversational LLM guide.

Stage 0 · MVP rules (week 14)

Hand-tuned weighted ranking for each of the 5 discovery rails.
Personalization limited to geo (nearby) and explicit category filters.
Goal: collect interaction data — likes, saves, visits, ratings — to seed ML.
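A minimal sketch of one hand-tuned rail scorer, to show what "weighted ranking" means in practice. The features (average rating, save count, distance), the weights, and the 25 km proximity scale are illustrative assumptions, not the actual rail definitions.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Marker:
    id: str
    category: str
    avg_rating: float   # assumed 0-10 scale
    saves: int
    distance_km: float  # distance from the user's current location


# Hypothetical hand-tuned weights for one rail (e.g. "Nearby"); they sum to 1.
WEIGHTS = {"rating": 0.5, "popularity": 0.2, "proximity": 0.3}
PROXIMITY_SCALE_KM = 25.0  # assumed decay scale


def rail_score(m: Marker) -> float:
    rating = m.avg_rating / 10.0                                   # normalize to [0, 1]
    popularity = min(1.0, math.log1p(m.saves) / math.log1p(1000))  # log-scaled, capped at 1000 saves
    proximity = math.exp(-m.distance_km / PROXIMITY_SCALE_KM)      # 1.0 at 0 km, decays with distance
    return (WEIGHTS["rating"] * rating
            + WEIGHTS["popularity"] * popularity
            + WEIGHTS["proximity"] * proximity)


def rank_rail(markers: Iterable[Marker],
              categories: Optional[set] = None,
              limit: int = 20) -> list:
    # Explicit category filters are the only other personalization signal in Stage 0.
    pool = [m for m in markers if categories is None or m.category in categories]
    return sorted(pool, key=rail_score, reverse=True)[:limit]
```

Each of the 5 rails would get its own weight table; keeping the scorer this simple makes it easy to log which features drove each impression, which is useful later as ML training signal.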

Stage 1 · Collaborative filtering (month 5)

Model

Implicit-feedback matrix factorization (ALS) on the user × marker interaction matrix. Hosted as a daily batch job; results cached in Postgres.

Pipeline (nightly):
  1. Export interactions → Parquet files on S3
  2. Train ALS (implicit library, k = 64 factors) on Modal/Replicate
  3. Compute top-100 candidates per active user
  4. Write to user_recommendations table
  5. Serve at /functions/v1/ai-recommendations
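To make steps 2 and 3 concrete, here is a dependency-free toy version of implicit-feedback ALS: confidence c = 1 + α·count, a binary preference p, and alternating regularized least-squares solves for user and marker factors, followed by top-N selection that skips already-seen markers. The hyperparameters (α = 40, λ = 0.1, k = 2) are assumptions chosen for a tiny example; the production job would use the implicit library's sparse, multithreaded `AlternatingLeastSquares` with 64 factors as stated above.

```python
import random


def solve(A, b):
    """Solve the small linear system A x = b (Gauss-Jordan with partial pivoting)."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= f * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]


def update_row(other, counts, k, reg, alpha):
    """One ALS row solve: x = (Y^T C Y + reg*I)^-1 Y^T C p, summed over all columns."""
    A = [[reg if a == d else 0.0 for d in range(k)] for a in range(k)]
    b = [0.0] * k
    for j, y in enumerate(other):
        r = counts.get(j, 0.0)
        c = 1.0 + alpha * r          # confidence grows with interaction count
        p = 1.0 if r > 0 else 0.0    # binary preference
        for a in range(k):
            b[a] += c * p * y[a]
            for d in range(k):
                A[a][d] += c * y[a] * y[d]
    return solve(A, b)


def train_als(interactions, n_users, n_items, k=2, reg=0.1, alpha=40.0, iters=15, seed=0):
    """interactions: iterable of (user_idx, marker_idx, count) rows from the nightly export."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Y = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    by_user = [{} for _ in range(n_users)]
    by_item = [{} for _ in range(n_items)]
    for u, i, count in interactions:
        by_user[u][i] = by_user[u].get(i, 0.0) + count
        by_item[i][u] = by_item[i].get(u, 0.0) + count
    for _ in range(iters):
        X = [update_row(Y, by_user[u], k, reg, alpha) for u in range(n_users)]
        Y = [update_row(X, by_item[i], k, reg, alpha) for i in range(n_items)]
    return X, Y, by_user


def top_n(u, X, Y, seen, n=100):
    """Step 3: highest-scoring markers the user has not interacted with yet."""
    scored = [(sum(a * b for a, b in zip(X[u], y)), i) for i, y in enumerate(Y) if i not in seen]
    scored.sort(reverse=True)
    return [i for _, i in scored[:n]]
```

The resulting lists would then be written to user_recommendations (step 4) so the edge function only does a lookup at request time.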

Stage 2 · Embeddings & resonance vectors (month 9)

Marker embeddings

Text (title + description + comments) → sentence-transformers (all-MiniLM-L6-v2) → 384-dim vector. Stored in pgvector. Categorical fields are prepended to the input text as plain-text tags rather than encoded separately.
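A sketch of the embedding job, assuming a hypothetical marker_embeddings table and a marker dict with id, title, description, comments and categories keys. all-MiniLM-L6-v2 outputs 384-dimensional vectors and truncates long inputs, so the comment cap below (an assumed value) mainly avoids wasted tokenization. The encoder is passed in as a function so the text-building and pgvector formatting can be checked without downloading the model.

```python
from typing import Callable, Sequence

EMBED_DIM = 384        # output size of all-MiniLM-L6-v2
MAX_COMMENTS = 20      # assumed cap on comments per marker

# Hypothetical table; vector(384) is pgvector's fixed-dimension column type.
DDL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS marker_embeddings (
    marker_id  text PRIMARY KEY,
    embedding  vector(384) NOT NULL
);
"""
UPSERT_SQL = (
    "INSERT INTO marker_embeddings (marker_id, embedding) VALUES (%s, %s::vector) "
    "ON CONFLICT (marker_id) DO UPDATE SET embedding = EXCLUDED.embedding"
)


def marker_text(marker: dict) -> str:
    """Build encoder input: categorical tags first, then title, description, comments."""
    tags = " ".join(f"[{c}]" for c in sorted(marker.get("categories", [])))
    parts = [marker["title"], marker.get("description", ""),
             *marker.get("comments", [])[:MAX_COMMENTS]]
    return " ".join(p for p in [tags, *parts] if p)


def to_pgvector(vec: Sequence[float]) -> str:
    """pgvector accepts text literals of the form '[x1,x2,...]'."""
    return "[" + ",".join(f"{float(x):.6f}" for x in vec) + "]"


def embedding_rows(markers: list, encode: Callable[[list], Sequence[Sequence[float]]]) -> list:
    """Return (marker_id, vector_literal) rows ready for UPSERT_SQL."""
    vectors = encode([marker_text(m) for m in markers])
    rows = []
    for m, v in zip(markers, vectors):
        if len(v) != EMBED_DIM:
            raise ValueError(f"expected {EMBED_DIM}-dim vector, got {len(v)}")
        rows.append((m["id"], to_pgvector(v)))
    return rows
```

In production, `encode` could be `SentenceTransformer("all-MiniLM-L6-v2").encode`, and the rows passed to `cursor.executemany(UPSERT_SQL, rows)` with a psycopg connection.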

User resonance vector

Weighted mean of embeddings for markers the user rated ≥ 8, with recency decay and category co-occurrence boost. Updated incrementally per interaction.
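One way to keep this vector updatable per interaction is to store a running weighted sum and total weight, decaying both whenever a new rating arrives. The 90-day half-life and the form of the category boost (weight grows with the share of the user's loved markers already in that category) are assumptions; the spec only names the ingredients.

```python
import math
from collections import Counter

MIN_RATING = 8          # from the spec: only markers rated >= 8 contribute
HALF_LIFE_DAYS = 90.0   # assumed recency half-life
CATEGORY_BOOST = 0.5    # assumed strength of the category co-occurrence boost


class ResonanceVector:
    """Incrementally maintained, decayed, weighted mean of loved-marker embeddings."""

    def __init__(self, dim: int):
        self.weighted_sum = [0.0] * dim
        self.total_weight = 0.0
        self.last_day = None
        self.category_counts = Counter()

    def add_rating(self, embedding, rating, category, day):
        """Fold one rating in; `day` is days since epoch, events assumed in time order."""
        if rating < MIN_RATING:
            return
        if self.last_day is not None:
            # Exponential decay: older contributions lose half their weight every HALF_LIFE_DAYS.
            decay = 0.5 ** ((day - self.last_day) / HALF_LIFE_DAYS)
            self.weighted_sum = [s * decay for s in self.weighted_sum]
            self.total_weight *= decay
        self.last_day = day
        # Boost markers whose category already recurs among the user's loved markers.
        seen = sum(self.category_counts.values())
        share = self.category_counts[category] / seen if seen else 0.0
        weight = 1.0 + CATEGORY_BOOST * share
        self.category_counts[category] += 1
        self.weighted_sum = [s + weight * e for s, e in zip(self.weighted_sum, embedding)]
        self.total_weight += weight

    def vector(self):
        """L2-normalized mean, ready for cosine search; zero vector if nothing qualifies."""
        if self.total_weight == 0.0:
            return [0.0] * len(self.weighted_sum)
        mean = [s / self.total_weight for s in self.weighted_sum]
        norm = math.sqrt(sum(x * x for x in mean)) or 1.0
        return [x / norm for x in mean]
```

Decay only needs to be applied when a rating arrives: at query time every stored term would be scaled by the same factor, which cancels out in the normalized mean.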

Retrieval

pgvector HNSW index, cosine similarity. Hybrid score = 0.6·semantic + 0.3·CF + 0.1·geo.
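A sketch of the retrieval and blending step. The SQL uses pgvector's HNSW index with `vector_cosine_ops` and the `<=>` cosine-distance operator (similarity = 1 − distance); table and column names are hypothetical. Because ALS scores are unbounded dot products, this sketch min-max normalizes CF scores over the candidate set and turns distance into a geo score with an assumed 25 km exponential decay before applying the 0.6 / 0.3 / 0.1 weights.

```python
import math

# pgvector: HNSW index plus cosine-distance operator (<=>). Names are hypothetical.
CREATE_INDEX_SQL = (
    "CREATE INDEX IF NOT EXISTS marker_embeddings_hnsw "
    "ON marker_embeddings USING hnsw (embedding vector_cosine_ops);"
)
CANDIDATES_SQL = """
SELECT marker_id, 1 - (embedding <=> %(user_vec)s::vector) AS semantic
FROM marker_embeddings
ORDER BY embedding <=> %(user_vec)s::vector
LIMIT 200;
"""

W_SEMANTIC, W_CF, W_GEO = 0.6, 0.3, 0.1


def min_max(scores: dict) -> dict:
    """Rescale scores to [0, 1]; all-equal scores map to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 1.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


def hybrid_rank(semantic: dict, cf: dict, distance_km: dict,
                geo_scale_km: float = 25.0, limit: int = 20) -> list:
    """semantic: marker_id -> cosine similarity from CANDIDATES_SQL; cf: marker_id -> ALS score."""
    cf_norm = min_max(cf)
    ranked = []
    for mid, sem in semantic.items():
        # Unknown distance means no geo credit (exp(-inf) == 0.0).
        geo = math.exp(-distance_km.get(mid, float("inf")) / geo_scale_km)
        score = W_SEMANTIC * sem + W_CF * cf_norm.get(mid, 0.0) + W_GEO * geo
        ranked.append((score, mid))
    ranked.sort(reverse=True)
    return [(mid, round(score, 4)) for score, mid in ranked[:limit]]
```

Semantic retrieval supplies the candidate pool and the other two signals only re-rank it, so a marker with no embedding never appears; that is a design choice worth revisiting for cold-start markers.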

Explainability

Each recommendation carries 2–3 reason chips ("Similar to Sedona Vortex you loved", "High creativity score · matches your pattern") to build user trust.
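A rule-based sketch of how such chips could be generated from signals the recommender already has. It assumes each marker has per-dimension energy scores on a 0 to 1 scale (e.g. "creativity"), and the thresholds (0.8 cosine, 0.7 score, 10 km) are illustrative assumptions.

```python
import math

SIMILARITY_THRESHOLD = 0.8   # assumed cosine cut-off for "Similar to ..."
HIGH_SCORE = 0.7             # assumed threshold on a 0-1 energy-dimension score
NEARBY_KM = 10.0             # assumed radius for a "Nearby" chip


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def reason_chips(candidate, loved, top_dimension, max_chips=3):
    """candidate: dict with 'embedding', 'scores' (dimension -> 0..1), 'distance_km'.
    loved: list of (name, embedding) for markers the user rated >= 8.
    top_dimension: the energy dimension the user rates highest, e.g. 'creativity'."""
    chips = []
    if loved:
        # Explain via the single most similar loved marker.
        name, sim = max(((n, cosine(candidate["embedding"], e)) for n, e in loved),
                        key=lambda t: t[1])
        if sim >= SIMILARITY_THRESHOLD:
            chips.append(f"Similar to {name} you loved")
    if candidate["scores"].get(top_dimension, 0.0) >= HIGH_SCORE:
        chips.append(f"High {top_dimension} score · matches your pattern")
    if candidate.get("distance_km", math.inf) <= NEARBY_KM:
        chips.append("Nearby")
    return chips[:max_chips]
```

Because every chip is tied to a concrete signal, the chip text never claims more than the model actually used; candidates that yield fewer than two chips would need a fallback rule.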

Stage 3 · LLM Energy Guide (year 2)

Architecture

User query  ──►  Intent classifier (small model)
                       │
              ┌────────┴─────────┐
              ▼                  ▼
   Conversational LLM    Tool calls:
   (GPT-4o or Claude)     - search_markers(category, radius, score)
              │           - get_user_journal_themes()
              │           - get_seasonal_recommendations(date)
              ▼
   Streaming response in chat + carousel of marker cards
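A provider-agnostic sketch of the three tools in the diagram: JSON Schema definitions, converters to the OpenAI (`{"type": "function", "function": {...}}`) and Anthropic (`{"name", "description", "input_schema"}`) tool formats, and a dispatcher that validates required arguments before calling a backend handler. Parameter units (km, 0 to 10 score) and the handler functions themselves are assumptions.

```python
import json
from typing import Any, Callable

TOOLS = {
    "search_markers": {
        "description": "Find markers by category within a radius, above a minimum score.",
        "parameters": {
            "type": "object",
            "properties": {
                "category": {"type": "string"},
                "radius": {"type": "number", "description": "Search radius in km (assumed unit)"},
                "score": {"type": "number", "description": "Minimum energy score (assumed 0-10)"},
            },
            "required": ["category", "radius", "score"],
        },
    },
    "get_user_journal_themes": {
        "description": "Summarize recurring themes in the user's consented journal entries.",
        "parameters": {"type": "object", "properties": {}, "required": []},
    },
    "get_seasonal_recommendations": {
        "description": "Markers suited to a given date (ISO 8601).",
        "parameters": {"type": "object",
                       "properties": {"date": {"type": "string"}},
                       "required": ["date"]},
    },
}


def openai_tools() -> list:
    return [{"type": "function", "function": {"name": n, **spec}} for n, spec in TOOLS.items()]


def anthropic_tools() -> list:
    return [{"name": n, "description": s["description"], "input_schema": s["parameters"]}
            for n, s in TOOLS.items()]


def dispatch(name: str, arguments_json: str, handlers: dict[str, Callable[..., Any]]) -> str:
    """Run one model-requested tool call and return a JSON string to send back to the LLM."""
    if name not in TOOLS or name not in handlers:
        return json.dumps({"error": f"unknown tool: {name}"})
    args = json.loads(arguments_json or "{}")
    missing = [p for p in TOOLS[name]["parameters"]["required"] if p not in args]
    if missing:
        return json.dumps({"error": f"missing arguments: {missing}"})
    return json.dumps(handlers[name](**args))
```

Returning errors as tool results (rather than raising) lets the model correct its own call on the next turn.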

RAG over the user's own journal entries (with explicit consent).
Memory store of declared preferences, so a statement like 'I love coastal places' carries across sessions.
Premium-only; rate-limited to 30 queries/day.
Safety: no medical or spiritually prescriptive language; harmful requests are refused.
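A minimal sketch of the premium gate and the 30-queries/day limit, assuming the quota resets at midnight UTC. The in-memory dict stands in for whatever shared store the real service would use (e.g. a Postgres or Redis counter) so that limits hold across instances.

```python
from datetime import datetime, timezone
from typing import Optional

DAILY_LIMIT = 30  # from the spec


class DailyQuota:
    """Per-user daily counter keyed by (user_id, UTC date)."""

    def __init__(self, limit: int = DAILY_LIMIT):
        self.limit = limit
        self._counts = {}  # (user_id, date) -> queries used; old days are never pruned here

    def try_consume(self, user_id: str, is_premium: bool, now: Optional[datetime] = None) -> bool:
        """Return True and count the query if allowed; False if non-premium or over the limit."""
        if not is_premium:
            return False
        now = now or datetime.now(timezone.utc)
        key = (user_id, now.astimezone(timezone.utc).date())
        used = self._counts.get(key, 0)
        if used >= self.limit:
            return False
        self._counts[key] = used + 1
        return True
```

Checking the quota before calling the LLM also enforces the per-user daily cost cap in the cost envelope.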

Data & ethics

All user-derived embeddings and models are built from opt-in data only, with a clear toggle in Settings → Privacy.
Journal entries never used for model training without explicit per-entry consent.
Quarterly bias audits measure the diversity of recommendations across geography and creator demographics.
Open research API ships with strict differential privacy guarantees.
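The roadmap does not name a mechanism; as an illustration only, the textbook baseline for releasing aggregate counts (e.g. visits per marker) is the Laplace mechanism. This sketch assumes sensitivity 1, meaning each user can change any released count by at most one, and does not show privacy-budget accounting across repeated queries.

```python
import random


def dp_count(true_count: int, epsilon: float, rng: random.Random, sensitivity: float = 1.0) -> int:
    """Release a count with epsilon-DP via the Laplace mechanism (noise scale = sensitivity / epsilon)."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    scale = sensitivity / epsilon
    # The difference of two i.i.d. Exponential(rate = 1/scale) draws is Laplace(0, scale).
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    # Rounding and clamping are post-processing, so they do not weaken the guarantee.
    return max(0, round(true_count + noise))
```

Smaller epsilon means more noise and stronger privacy; each released statistic spends budget, so a real API would also cap how many queries a researcher can make.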

Cost envelope

Stage 1 CF: ~$40/mo (Modal batch).
Stage 2 embeddings + pgvector: ~$120/mo at 100k markers.
Stage 3 LLM: variable, capped at ~$0.04 per premium user per day → ~$1.20 per premium user per month (30 days), recoverable within the $9.99 subscription.
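The Stage 3 arithmetic, made explicit: $0.04 × 30 days = $1.20, about 12% of the $9.99 price, and spreading the daily cap over the 30-query limit gives an average budget of roughly $0.0013 per query. The 30-day month is an assumption implied by the $1.20 figure.

```python
DAILY_LLM_CAP_USD = 0.04   # from the cost envelope
SUBSCRIPTION_USD = 9.99    # monthly premium price
DAYS_PER_MONTH = 30        # implied by the ~$1.20/month figure
DAILY_QUERY_LIMIT = 30     # Stage 3 rate limit


def monthly_llm_cap(daily_cap: float = DAILY_LLM_CAP_USD, days: int = DAYS_PER_MONTH) -> float:
    """Worst-case LLM spend per premium user per month."""
    return daily_cap * days


def llm_share_of_subscription() -> float:
    """Fraction of the subscription price consumed at the cap."""
    return monthly_llm_cap() / SUBSCRIPTION_USD


def budget_per_query(daily_cap: float = DAILY_LLM_CAP_USD, queries: int = DAILY_QUERY_LIMIT) -> float:
    """Average spend per query if a user exhausts the daily limit."""
    return daily_cap / queries
```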