Komprise launches AI-focused ingest tool to clean up unstructured data

Data management business Komprise has launched a generally available Intelligent AI Ingest product as part of its Smart Data Workflow ingestion engine.

Komprise Intelligent Data Management is a single platform for analyzing, migrating, transparently tiering, and managing the lifecycle of petabytes of file and object data across hybrid environments. It uses file and object metadata to manage unstructured data estates, with policy-driven workflows governing data placement and accessibility. Komprise says it automatically builds metadata and delivers a single view of all file data within the enterprise at scale, so that customers “can find precisely the right data for your AI use case with simple queries.” A recent Komprise AI Data and Enterprise Risk survey found that IT leaders cited two major challenges: getting the right unstructured data into AI systems and ensuring proper AI data governance.


CEO Kumar Goswami stated: “Our mission is to help organizations untangle the mess of unstructured data to gain the greatest competitive advantage with AI. Komprise Intelligent AI Ingest is the latest advancement in Smart Data Workflows to solve a critical customer pain point of efficiently finding and moving the right data to AI.”

The company says unstructured data is typically disorganized and contains large quantities of irrelevant, outdated, and duplicate files, which reduces precision, clutters context windows, and adds latency in AI pipelines. Studies show a 10 percent efficiency drop for every 10,000 additional unstructured documents in typical retrieval-augmented generation (RAG), leading to reduced accuracy and poor outcomes. Irrelevant unstructured data also wastes expensive AI processing resources, drives up costs, and ultimately erodes return on investment.

There is also a risk of sensitive data leakage. Ingesting data in bulk can inadvertently expose sensitive data in AI tools, violating privacy, security, and compliance policies. Intelligent AI Ingest applies filters during ingest to exclude low-quality and sensitive data arriving from data sources via connectors. Komprise claims it doubles ingest performance compared to the AWS DataSync data transfer tool in benchmark tests, which it attributes to a massively parallel architecture and minimized file overhead.
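To make the filtering idea concrete, here is a minimal sketch of an ingest-time filter that drops outdated, duplicate, and sensitive files before they reach an AI pipeline. It is not Komprise's implementation or API: the `FileRecord` type, the `select_for_ingest` function, the one-year age cutoff, and the two regex patterns are all assumptions made for illustration, and real sensitive-data detection is far broader than two patterns.

```python
import hashlib
import re
from dataclasses import dataclass
from datetime import datetime, timedelta


# Hypothetical record type for illustration; not part of any Komprise API.
@dataclass
class FileRecord:
    path: str
    modified: datetime
    content: str


# Deliberately simplistic patterns (assumption): a US SSN-like number and an email address.
PII_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
]


def select_for_ingest(records, now, max_age_days=365):
    """Return (kept, rejected), where rejected maps each path to a reason."""
    cutoff = now - timedelta(days=max_age_days)
    seen_hashes = set()  # content hashes of files already accepted
    kept, rejected = [], {}
    for rec in records:
        digest = hashlib.sha256(rec.content.encode("utf-8")).hexdigest()
        if rec.modified < cutoff:
            rejected[rec.path] = "outdated"
        elif digest in seen_hashes:
            rejected[rec.path] = "duplicate"
        elif any(p.search(rec.content) for p in PII_PATTERNS):
            rejected[rec.path] = "sensitive"
        else:
            kept.append(rec)
            seen_hashes.add(digest)
    return kept, rejected
```

Recording a reason for each rejected file, rather than silently dropping it, keeps the filter's decisions reviewable later.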

Intelligent AI Ingest has a sensitive data classification feature with built-in PII (Personally Identifiable Information) and sensitive data handling. It automatically maintains an audit trail of each ingestion workflow for data governance and auditing, documenting the who, what, and when, plus data lineage for compliance reporting. 
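As a rough illustration of what a per-workflow audit record covering the who, what, when, and lineage might look like, the sketch below builds one entry and appends it to a log as a JSON line. The field names, the `audit_entry` and `append_audit` helpers, and the in-memory list standing in for an append-only log are all hypothetical; the article does not describe Komprise's actual audit format.

```python
import json
from datetime import datetime, timezone


def audit_entry(user, workflow, source, destination, files, when=None):
    """Build one audit record: who, what, when, and source-to-destination lineage."""
    when = when or datetime.now(timezone.utc)
    return {
        "who": user,
        "what": {
            "workflow": workflow,
            "file_count": len(files),
            "files": sorted(files),
        },
        "when": when.isoformat(),
        "lineage": {"source": source, "destination": destination},
    }


def append_audit(log_lines, entry):
    """Append the entry as one JSON line; log_lines stands in for an append-only log."""
    log_lines.append(json.dumps(entry, sort_keys=True))
    return log_lines
```

One JSON object per line keeps each ingestion run independently parseable, which suits compliance reporting that filters by user, time window, or destination.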

Komprise told us it can ingest the right data for AI model training or inferencing into Nvidia GPUDirect and NeMo DataStores, then move this data out when the compute-intensive processing is complete. Essentially, Komprise provides a way to move data into AI-ready storage and lifecycle it back out afterward. Read a blog to find out more.