Aspected Enrichment
Turn raw documents into AI-ready content
Most enterprise content is not ready for search or AI the moment it is uploaded.
PDFs, Word files, spreadsheets, presentations, scanned documents, and images all contain valuable information, but that information is often locked inside mixed formats, inconsistent structures, or unreadable scans.
Aspected Enrichment prepares this content for discovery, analysis, and AI.
Documents go in.
Clean, safe, structured text comes out.
From Raw Files to Usable Knowledge
Modern knowledge platforms depend on understanding what is inside every document.
Without enrichment:
- Search results are incomplete
- AI answers are less accurate
- Scanned documents remain invisible
- Sensitive information may be exposed
- Valuable legacy content stays underused
Aspected Enrichment solves this by extracting, protecting, and structuring document content before it is used by search or AI services.
The result is a stronger foundation for discovery, migration, compliance, and AI-powered workflows.
Extract. Protect. Structure.
Aspected Enrichment processes documents in three core steps.
1. Text Extraction
The platform reads each document and extracts the available text.
For digital files such as Word documents, PowerPoint presentations, and spreadsheets, text can be extracted directly. For scanned PDFs or image-based documents, OCR can be used to read the page visually.
Customers can choose the right extraction level for each document set:
- Text: the fastest option for files that already contain readable text
- Fast: suitable for scanned documents when speed matters most
- Medium / High: higher-quality OCR for difficult, sensitive, or business-critical documents
This gives organizations control over the balance between cost, speed, and accuracy.
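As a rough illustration of that trade-off, the sketch below picks an extraction level from two facts about a document. The class names, fields, and the selection rule itself are hypothetical and are not the product's API; only the level names come from the list above.

```python
from dataclasses import dataclass
from enum import Enum


class ExtractionLevel(Enum):
    TEXT = "text"      # direct extraction, no OCR
    FAST = "fast"      # quick OCR for scanned pages
    MEDIUM = "medium"  # higher-quality OCR
    HIGH = "high"      # highest-quality OCR


@dataclass
class DocumentInfo:
    filename: str
    has_text_layer: bool            # True if text can be read directly
    business_critical: bool = False


def choose_extraction_level(doc: DocumentInfo) -> ExtractionLevel:
    """Pick the cheapest level likely to give usable text (illustrative rule only)."""
    if doc.has_text_layer:
        return ExtractionLevel.TEXT
    if doc.business_critical:
        return ExtractionLevel.HIGH
    return ExtractionLevel.FAST
```

In practice the level would usually be set once per document set rather than decided per file; the function only makes the cost, speed, and accuracy reasoning explicit.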
2. Sensitive Data Protection
Before enriched content is stored, searched, or used by AI, Aspected Enrichment can detect sensitive information.
This includes personally identifiable information (PII) such as names, email addresses, phone numbers, bank details, and similar data.
Detected information can be flagged or redacted so it is not exposed in search results, downstream systems, or AI-generated responses.
Customers can combine different detection methods:
- Pattern-based detection for structured data such as emails and phone numbers
- AI-based detection for names, organizations, and sensitive terms in free text
- Combined detection for stronger privacy coverage
This helps organizations use their content more safely while supporting compliance and data-protection expectations.
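To make pattern-based detection concrete, here is a minimal sketch using regular expressions to flag or redact email addresses and phone numbers. The patterns, labels, and function names are assumptions chosen for illustration; they are deliberately simple and do not represent the detectors the platform actually uses.

```python
import re

# Deliberately simple, hypothetical patterns; real detectors need broader coverage.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def find_pii(text: str) -> list[tuple[str, str]]:
    """Flag mode: return (label, matched_text) pairs for every pattern match."""
    return [
        (label, match.group())
        for label, pattern in PATTERNS.items()
        for match in pattern.finditer(text)
    ]


def redact(text: str) -> str:
    """Redact mode: replace each match with a placeholder such as [EMAIL]."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```

Names and organizations in free text rarely follow a fixed pattern, which is why the AI-based and combined methods exist alongside this approach.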
3. Chunking
Long documents are split into smaller, meaningful sections called chunks.
Instead of treating a 50-page report as one large block of text, Aspected Enrichment creates focused sections that can be searched, summarized, ranked, and used by AI more effectively.
Chunking can follow simple size limits or smarter boundaries such as:
- Paragraphs
- Headings
- Sections
- Document structure
This makes it easier to surface the exact paragraph, page, or section that answers a question.
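The following sketch combines the two ideas above: it respects paragraph boundaries while keeping each chunk under a size limit. The function name, the blank-line paragraph rule, and the default limit are assumptions for illustration, not the platform's chunking algorithm.

```python
def chunk_by_paragraph(text: str, max_chars: int = 1000) -> list[str]:
    """Group whole paragraphs into chunks of at most max_chars characters.

    A single paragraph longer than max_chars is kept as its own chunk
    rather than being cut in the middle.
    """
    # Assumption: paragraphs are separated by blank lines.
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars or not current:
            # Still fits, or this is the first paragraph of a new chunk.
            current = candidate
        else:
            # Adding this paragraph would exceed the limit: close the chunk.
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks
```

Because chunks never split a paragraph, a search hit or AI citation can point back to a complete, readable passage.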
The Right Processing for Every Document
Not every document needs the same level of enrichment.
A collection of clean text exports can be processed quickly and cost-effectively. A set of scanned legal contracts may require high-quality OCR, stronger privacy controls, and more careful chunking.
Aspected Enrichment can be configured at multiple levels, including organization, dataset, and ruleset.
This allows customers to decide which documents receive which type of processing.
They pay only for the depth they need while still getting the quality required for search, AI, and compliance-critical use cases.
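One way to picture multi-level configuration is as layered settings. The sketch below assumes that more specific levels override broader ones (ruleset over dataset over organization); that precedence, the setting names, and the values are all hypothetical and are not taken from the product.

```python
from typing import Optional

# Hypothetical organization-wide defaults, for illustration only.
ORG_DEFAULTS = {"extraction": "text", "pii": "pattern", "chunking": "size"}


def resolve_settings(
    org: dict,
    dataset: Optional[dict] = None,
    ruleset: Optional[dict] = None,
) -> dict:
    """Merge settings so that more specific levels override broader ones."""
    merged = dict(org)  # copy, so the organization defaults stay untouched
    for layer in (dataset, ruleset):
        if layer:
            merged.update(layer)
    return merged


# Scanned legal contracts get deeper processing than the organization default.
legal_contracts = resolve_settings(
    ORG_DEFAULTS,
    dataset={"extraction": "high", "pii": "combined"},
    ruleset={"chunking": "headings"},
)
```

Datasets without overrides simply inherit the cheaper organization defaults, which is how the cost stays tied to the depth each document set needs.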
Where Enrichment Fits
Aspected Enrichment sits at the center of the document lifecycle.
First, documents are ingested into the content store. Then, enrichment prepares the text.
Finally, the enriched content is published to search and AI services, such as Aspects.
From the customer’s perspective, enrichment works quietly in the background. They configure it once, and the platform applies the right processing as documents move through the system.
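The ingest, enrich, and publish sequence can be sketched in a few lines. Everything here is a stand-in: plain dictionaries play the role of the content store and the search index, and the `enrich` function is a placeholder rather than the real enrichment service.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    doc_id: str
    raw: str
    chunks: list[str] = field(default_factory=list)


def enrich(doc: Document) -> Document:
    """Placeholder enrichment: normalize whitespace, then split on blank lines."""
    parts = [" ".join(p.split()) for p in doc.raw.split("\n\n")]
    doc.chunks = [p for p in parts if p]
    return doc


def run_lifecycle(incoming: list[Document]) -> dict[str, list[str]]:
    """Ingest, enrich, publish, with dicts standing in for real services."""
    content_store = {d.doc_id: d for d in incoming}        # 1. ingest
    search_index: dict[str, list[str]] = {}
    for doc in content_store.values():
        enriched = enrich(doc)                             # 2. enrich
        search_index[enriched.doc_id] = enriched.chunks    # 3. publish
    return search_index
```

The point of the shape is that enrichment sits between storage and publication, so downstream search and AI services only ever see prepared text.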
Built for Real-World Content
Enterprise content is messy.
It lives in old archives, shared drives, collaboration platforms, scanned PDFs, exported reports, email attachments, and business applications. It comes in different formats, quality levels, and languages. Some of it is clean. Some of it is hard to read. Some of it contains sensitive information.
Aspected Enrichment is designed for that reality.
It helps organizations unlock the value of mixed-format content without forcing every document through the same expensive process.
Key Benefits
Unlock Document Value
Transform legacy and mixed-format content into searchable, AI-ready information.
Handle Real-World Messiness
Support common business formats, scanned files, and image-based documents.
Improve AI Accuracy
Give AI systems cleaner, more focused, and better-structured source content.
Protect Sensitive Data
Detect, flag, or redact personal and sensitive information before it reaches search or AI.
Control Cost and Quality
Choose the right enrichment depth for each dataset, use case, or document type.
Scale Automatically
Process large document collections consistently as part of the platform workflow.
Better Content In. Better Answers Out.
Search and AI are only as strong as the content they rely on.
Aspected Enrichment turns raw documents into clean, safe, and structured text that downstream systems can trust. It improves discovery, strengthens data protection, and prepares enterprise knowledge for the next generation of AI-powered workflows.