Embedding Visualizer

Visualize semantic similarity between texts using dimensionality reduction

Frequently Asked Questions

What are embeddings?

Embeddings are numerical representations of text that capture semantic meaning. Similar texts produce similar embeddings (close in vector space).
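
As a toy illustration (the vectors and dimensions below are invented; real models produce hundreds to thousands of dimensions), texts with related meanings end up closer together than unrelated ones:

```python
import numpy as np

# Toy 4-dimensional "embeddings"; the numbers are invented for illustration.
cat = np.array([0.9, 0.1, 0.8, 0.2])
kitten = np.array([0.85, 0.15, 0.75, 0.25])
spaceship = np.array([0.1, 0.9, 0.2, 0.8])

# Semantically similar texts sit closer together in vector space.
print(np.linalg.norm(cat - kitten))     # small distance
print(np.linalg.norm(cat - spaceship))  # large distance
```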

What is UMAP?

UMAP (Uniform Manifold Approximation and Projection) reduces high-dimensional embeddings to 2D/3D for visualization while preserving meaningful relationships.
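
The call shape is simple: a reducer takes an (n_texts, n_dims) array and returns an (n_texts, 2) array of plot coordinates. UMAP itself ships in the third-party umap-learn package; to keep this sketch self-contained it uses PCA from scikit-learn, which plays the same reduce-to-2D role (UMAP typically preserves local neighborhood structure better than PCA):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Stand-in for 20 texts embedded at 256 dimensions.
embeddings = rng.normal(size=(20, 256))

# Project down to 2D for plotting; with umap-learn the call shape is the
# same: umap.UMAP(n_components=2).fit_transform(embeddings).
points_2d = PCA(n_components=2).fit_transform(embeddings)
print(points_2d.shape)  # (20, 2)
```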

Which embedding models will be supported?

We plan to support text-embedding-3-small/large (OpenAI), embedding-001 (Google), and embed-english-v3.0 (Cohere).

Is this tool free?

The visualization engine runs in your browser inside a Web Worker. Generating embeddings through an API may require a free-tier API key from your preferred provider.

Can I paste pre-computed embeddings?

Yes, we'll support CSV/JSON import of pre-computed embedding vectors for visualization without requiring API calls.
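
The exact import schema is still to be finalized; a hypothetical JSON payload pairing each text with its pre-computed vector, parsed with the standard library, might look like:

```python
import json

# Hypothetical import payload: each record pairs a label with its
# pre-computed embedding vector (the schema shown is illustrative only).
payload = """
[
  {"text": "The cat sat on the mat", "embedding": [0.12, -0.48, 0.33]},
  {"text": "A kitten rested on the rug", "embedding": [0.10, -0.45, 0.31]}
]
"""

records = json.loads(payload)
labels = [r["text"] for r in records]
vectors = [r["embedding"] for r in records]
print(len(labels), len(vectors[0]))  # 2 3
```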

What are common use cases?

Semantic clustering, document similarity analysis, RAG retrieval debugging, and understanding how AI models group related concepts.

Does embedding dimension matter?

Higher-dimensional embeddings capture more nuance but cost more to generate and store. text-embedding-3-small (1536D) is sufficient for most tasks; use text-embedding-3-large (3072D) for precision work.

When will this launch?

The embedding visualizer is in development and will launch with Web Worker-powered UMAP computation for privacy-first visualization.

How many texts can I visualize?

In-browser UMAP handles up to roughly 1,000 points smoothly. For larger datasets, we recommend running the reduction offline with Python tools such as umap-learn.

What about cosine similarity?

We'll display pairwise cosine similarity scores alongside the plot. Cosine similarity, which measures the angle between two vectors while ignoring their magnitudes, is the standard metric for comparing embeddings.
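
The metric is the dot product of the two vectors divided by the product of their norms; a minimal numpy implementation (with invented example vectors) looks like:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: a.b / (|a| * |b|)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])     # same direction, different magnitude
c = np.array([-1.0, -2.0, -3.0])  # opposite direction

print(cosine_similarity(a, b))  # approximately 1.0 (same direction)
print(cosine_similarity(a, c))  # approximately -1.0 (opposite direction)
```

Because the magnitudes cancel out, a and b score 1.0 even though b is twice as long; only direction matters.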