
Introducing Aspects: A New Vector Search Paradigm

A unified vector search approach that embeds both meaning and metadata into a single representation, making relevance more natural, expressive, and efficient.

Mervin de Jong, Software Engineer
Jorn Verhoeven, Staff Engineer
Gerk-Jan Huisma, Software Engineer

TL;DR

Vector search made it possible to retrieve documents by meaning rather than keywords, but most systems still embed only unstructured text and treat metadata as external filters. Historically, metadata was introduced to compensate for the limits of keyword search, yet it never truly became part of relevance itself. With vector search, the same happened again. Aspects search bridges this gap by transforming meaningful document properties into vectors and combining them with semantic embeddings into a single representation. The result is a unified, weighted, multi-dimensional search that runs in one query, expresses relevance more naturally, and scales efficiently.

The evolution of search

As digital information exploded in volume and diversity, search systems had to evolve quickly. Early information retrieval relied on lexical techniques: inverted indexes, keyword matching, and boolean logic. These approaches worked well for a long time, but struggled with ambiguity, synonymy, context, and human error. In an attempt to fix these problems, reliance on metadata started to grow.

With the adoption of content management systems, structured fields like content type, author, department, creation date, category, and sensitivity labels became common. This metadata allowed systems to constrain the search space and restore precision where keywords failed. For a long time, this worked well enough: keyword search for recall, metadata filters for control.

However, this approach still treats metadata as adjacent to search, not as a core part of it. As datasets expanded and started to include a wider variety of content types, such as documents, images, audio, and semi-structured content, limitations of this approach became clear. Metadata helped but required careful curation, strict schemas, and constant governance. It improved precision but struggled with properly encapsulating meaning. Then vector search arrived.

Meaning as geometry: The rise of vector databases

Vector search changed the game by allowing systems to retrieve documents based on semantic similarity rather than exact matches. Documents, queries, and other entities are encoded as numerical vectors in high-dimensional space, where proximity reflects meaning, rather than a high overlap in textual content.

With advances in large language models, embeddings became good enough to serve as a foundation for semantic search, recommendations, question answering, and retrieval-augmented generation (RAG). Suddenly, search systems could answer questions like “find documents similar to this idea” rather than “find documents containing these words.” This was a major leap forward, but it also quietly repeated an old pattern.

Despite their power, vector embeddings typically capture only unstructured content, usually text. Structured metadata such as timestamps, file types, ownership, sensitivity labels, and source systems is once again pushed aside, reintroducing problems with ambiguity and context.

Fixing (vector) search with metadata (again)

A somewhat common but naive approach to incorporating metadata in vector search is appending it as plain text to the unstructured content before embedding. However, this method lacks control over the importance of each structured property and often results in “muddy vectors,” where the metadata is drowned out by the unstructured content itself.
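A minimal sketch of this naive approach, to make the "muddy vector" problem concrete. The `embed` function below is a toy stand-in (a bag-of-characters vector), not a real embedding model, and the function names are illustrative:

```python
def embed(text: str, dim: int = 8) -> list[float]:
    """Toy stand-in for a real embedding model: a normalized
    bag-of-characters vector. Real systems would call an LLM-based
    embedding model here."""
    vec = [0.0] * dim
    for i, ch in enumerate(text):
        vec[i % dim] += ord(ch)
    norm = sum(v * v for v in vec) ** 0.5 or 1.0
    return [v / norm for v in vec]


def naive_embed(body: str, metadata: dict[str, str]) -> list[float]:
    # Metadata is flattened into plain text and prepended to the body.
    # There is no control over how much weight each property gets: a
    # long body simply drowns out the metadata in the resulting vector.
    meta_text = " ".join(f"{k}: {v}" for k, v in metadata.items())
    return embed(meta_text + "\n" + body)
```

Because the metadata contributes only a handful of tokens against an arbitrarily long body, its influence on the final vector shrinks as documents grow, which is exactly the lack of control described above.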

A more refined and widely used approach is to apply metadata-based filters on the index, either before (pre-filtering) or after (post-filtering) the actual vector search.

A comparison flowchart of Pre Filter and Post Filter strategies showing how documents are narrowed down by metadata first then text embedding, versus text embedding first then metadata.

With pre-filtering, structured constraints are applied to the index prior to running the vector search, excluding vectors that do not match the filters. While straightforward, this acts as a hard gate and can eliminate semantically relevant results that fall just outside the filter criteria. In addition, pre-filtering prevents the use of common vector search optimizations such as Hierarchical Navigable Small World (HNSW) graphs, which require access to the full index. As a result, searches often degrade to brute-force comparisons, which is only feasible when the filters are restrictive enough to keep the search space small. With broader or more general filters, large portions of the index must still be searched, causing performance to degrade to the point where real-time use becomes infeasible.

Post-filtering takes the opposite approach: the vector search is performed over the full index, and structured filters are applied only to the retrieved results. This ensures that all items are considered semantically and allows the use of optimizations like HNSW, since the search space remains fixed and optimizable. However, because filtering happens only after retrieval, the final result count becomes non-deterministic, which means potentially far fewer documents are returned than initially requested. To compensate, systems typically over-fetch results during the vector search to ensure enough items remain after filtering. This overshooting is inefficient, can significantly impact performance, and still risks missing important results, especially when filters are highly restrictive or the queried topic is semantically dominant across the index.
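To make the two strategies concrete, here is a minimal sketch of both over a brute-force cosine index. This is NumPy-only illustration code, not an actual vector database API; `predicate` stands in for an arbitrary metadata filter:

```python
import numpy as np


def search(query: np.ndarray, vectors: np.ndarray, top_k: int) -> list:
    """Brute-force top-k by dot product over row-normalized vectors."""
    scores = vectors @ query
    return list(np.argsort(-scores)[:top_k])


def pre_filter_search(query, vectors, meta, predicate, top_k):
    # Hard gate first: only matching rows are searched. Note that an
    # ANN structure like HNSW built over the full index cannot be
    # reused on this ad-hoc subset, so the subset is scanned directly.
    keep = [i for i, m in enumerate(meta) if predicate(m)]
    sub = vectors[keep]
    return [keep[i] for i in search(query, sub, top_k)]


def post_filter_search(query, vectors, meta, predicate, top_k, overfetch=4):
    # Search the full index (ANN-friendly), then filter the results.
    # Over-fetching compensates for the non-deterministic number of
    # survivors, at the cost of extra retrieval work.
    candidates = search(query, vectors, top_k * overfetch)
    hits = [i for i in candidates if predicate(meta[i])]
    return hits[:top_k]
```

Even in this toy form, the trade-off is visible: pre-filtering shrinks the search space but forfeits the index structure, while post-filtering keeps the index but may return fewer than `top_k` results despite over-fetching.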

With both the pre- and post-filtering techniques, metadata once again serves as a corrective mechanism layered on top of the search, rather than being part of it. This architectural separation introduces a set of familiar problems:

  • Metadata acts as a hard constraint, not a graded signal
  • Important context (time, type, classification) cannot smoothly influence relevance
  • Multiple query stages increase latency and operational complexity

We have effectively fallen into the same trap as before: metadata is essential, yet it remains external to the search itself. Instead of becoming part of the solution, metadata is still being treated as a fix. As a result, existing solutions are not always ideal, introducing inefficiencies that make it hard to scale and are not resilient to imperfect metadata or human error. To mitigate these problems, we have developed a new vector search paradigm called Aspects search.

Historical note: labeled search data, Kaggle, and the push to make search measurable

The idea of turning search relevance into a supervised learning problem (where query and result pairs are labeled, and models are trained to predict relevance) has a real history in industry. Several webshops and platforms put entire datasets of search queries and manually scored (query, document) pairs into public competitions to crowdsource better models. A prominent example is the Home Depot Product Search Relevance competition hosted on Kaggle, where participants were given thousands of real query and product pairs labeled for relevance, and asked to build models that predict which product best matches a user query.

These competitions carried substantial prize pools ($40,000 in Home Depot’s case), which indicates how seriously the problem was taken, and how hard it is.

Aspects: unifying meaning and structure

The core idea behind Aspects search is straightforward: rather than forcing a choice between semantic embeddings and structured metadata, all meaningful properties of a document are treated as first-class components of the search vector. This concept emerged from our work at Xillio, a data migration company with more than two decades of experience handling large volumes of both structured and unstructured data. As organizations increasingly began using their data for RAG systems, a consistent pattern emerged: metadata was rarely missing or inadequate; the real issue was that it was treated as secondary to semantic content. As a result, their AI agents often received the wrong or outdated documents as context.

In Aspects search, these document properties are simply called aspects. They encompass everything that provides meaningful context about a document, such as its content type, when it was created or last modified, its sensitivity or classification level, where it is stored or which system it originated from, and even geographic information like coordinates or postal codes.

A flowchart showing a document being split into Text Content and Metadata, which are processed through LLM Embedding and Resolvers respectively to create a single Unified Embedding vector
A simplified overview of how a document can be prepared into a full unified embedding, in aspects search.

Each aspect is encoded into its own vector representation, designed to reflect how that property behaves in the real world. Categorical attributes are projected onto continuous ranges, temporal values are normalized along a linear scale, and semantic content is represented using high-dimensional embeddings from large language models. The goal is not to flatten all information into a single homogeneous signal, but to preserve the distinct characteristics of each property while making them comparable within a shared vector space.
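As a rough illustration of these per-aspect encoders (a sketch under simplifying assumptions, not the Aspected resolvers: both encoders here produce one-dimensional vectors, and the categorical encoder assumes the levels carry a natural order, such as sensitivity tiers):

```python
import numpy as np
from datetime import datetime


def encode_temporal(ts: datetime, lo: datetime, hi: datetime) -> np.ndarray:
    """Normalize a timestamp onto a linear 0..1 scale between the
    oldest (lo) and newest (hi) timestamps in the corpus."""
    span = (hi - lo).total_seconds() or 1.0
    x = (ts - lo).total_seconds() / span
    return np.array([min(max(x, 0.0), 1.0)])


def encode_categorical(value: str, levels: list[str]) -> np.ndarray:
    """Project an ordered categorical value onto a continuous 0..1
    range, so that nearby levels end up close in vector space."""
    return np.array([levels.index(value) / (len(levels) - 1)])
```

The point of such encoders is that distance now means something per aspect: a document from last week sits closer to "recent" queries than one from 2015, and "internal" sits between "public" and "confidential" rather than being an opaque label.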

A side-by-side comparison of "Vector Index" and "Aspects Index" structures, showing how standard metadata blocks can be transformed into specific, searchable "Aspect" dimensions like Date, Creation, and Sensitivity.
On the left: a traditional vector index, where metadata and the embedding are separate. On the right: an Aspects index in which vectors are stitched and stored as a single entry.

These individual aspect vectors are then combined into a single, unified full-aspect embedding that represents the document as a whole. Unlike traditional vector databases, where individual dimensions are typically opaque and lack human-interpretable meaning, the full-aspect embedding is structurally meaningful. Each contiguous segment corresponds to a specific aspect of the document, making the vector not just a numerical representation, but a coherent, multidimensional description of the document itself.
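The stitching step itself can be sketched in a few lines. The `stitch` function below is an illustrative assumption about how per-aspect vectors could be concatenated while keeping track of which contiguous segment belongs to which aspect:

```python
import numpy as np


def stitch(aspects: dict[str, np.ndarray]) -> tuple[np.ndarray, dict[str, slice]]:
    """Concatenate per-aspect vectors into one full-aspect embedding,
    returning the vector plus a layout mapping each aspect name to its
    contiguous segment. The layout is what makes the unified vector
    structurally meaningful rather than opaque."""
    layout, parts, offset = {}, [], 0
    for name, vec in aspects.items():
        layout[name] = slice(offset, offset + len(vec))
        parts.append(vec)
        offset += len(vec)
    return np.concatenate(parts), layout
```

Because the layout is explicit, any segment of the stored vector can later be read back, reweighted, or masked per query, which is exactly what the hard-filter approaches above cannot do.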

Metadata as signal, not constraint

Historically, metadata has been used to compensate for the limitations of search technologies rather than to enhance them directly. In keyword-based systems, where queries lacked semantic understanding, metadata was introduced to improve precision and narrow results. In vector search, the situation reversed: meaning was captured effectively, but structure and control were missing, so metadata reappeared as a set of external constraints layered on top of the search.

Aspects search represents a natural evolution beyond this trade-off. Instead of treating semantic meaning and metadata as fundamentally different mechanisms, both are modeled as signals of the same kind, contributing jointly to relevance when searching. Instead of asking “should this metadata filter apply?”, the system asks “how much should this aspect matter right now?”. This shift transforms metadata from a rigid gatekeeper into an active participant in ranking, allowing relevance to emerge smoothly from multiple dimensions rather than being enforced through hard constraints, while maintaining the efficiency of single-step optimized vector search.

Why this changes how search behaves

By embedding metadata into the vector space, rather than filtering around it, Aspects enables a more expressive and efficient retrieval model:

  • Single-query, multi-dimensional search
    Semantic similarity and structural context are evaluated together.
  • Weighted relevance instead of binary filters
    Metadata can influence scoring proportionally, not just include/exclude results.
  • Aspect masking
    Queries can selectively ignore dimensions without biasing similarity math.
  • Cleaner system architecture
    Fewer pipelines, fewer queries, less glue code.
  • Better performance at scale
    One optimized ANN search instead of multiple sequential operations.
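The weighting and masking behavior above can be sketched at query time. `build_query` and the segment layout are illustrative assumptions, not the Aspected API; the key idea is that with dot-product scoring, scaling a query segment changes how much that aspect contributes, and zeroing it makes the score ignore the aspect entirely, all in a single pass over the index:

```python
import numpy as np


def build_query(aspects, layout, dim, weights=None, mask=()):
    """Assemble a full-aspect query vector: each aspect segment is
    scaled by its weight, and masked aspects are left at zero so a
    dot-product score simply ignores those dimensions."""
    weights = weights or {}
    q = np.zeros(dim)
    for name, vec in aspects.items():
        if name in mask:
            continue  # masked aspect: its segment stays zero
        q[layout[name]] = np.asarray(vec) * weights.get(name, 1.0)
    return q
```

Scoring then stays a single operation, `index @ q`, so the usual ANN optimizations still apply; weighting and masking are just properties of the query vector, not extra pipeline stages.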

In practice, this means search results feel more intent-aware and less brittle, especially in enterprise and knowledge-heavy environments. The broader vector search ecosystem is already slowly moving in this direction, where concepts such as hybrid representations are implemented. Aspects search fits naturally into this evolution by unifying semantics and structure into a single representation. This way, vector search becomes not just a tool for semantic text retrieval, but a general-purpose foundation for intelligent information access.

Closing thoughts

Vector search taught us how to make meaning computable, and metadata taught us how to impose structure and intent. Aspects search is a natural extension of both, emerging when we stop treating them as competing approaches and instead allow them to work together.

For developers, this means fewer workarounds, clearer intent, and search systems that behave more like how humans actually think about documents. Documents are not just what they say, but what they are and why they matter.

If this way of thinking about search resonates with you, whether you’re building search systems, exploring new retrieval paradigms, or simply curious about where vector search is heading, you can learn more at aspected.com. We’re actively building, experimenting, and refining these ideas, and we’re always interested in talking to developers, researchers, and potential collaborators who want to help shape what comes next, as users, team members, or partners.

Thank you for your time!

- Team @ Aspected