Solix Technologies To Deliver Data Governance, Privacy and Security with Third Generation Cloud Data Platform

Meet the Authors

  • Mark Vigoroso

    CEO, ERP Today & Chief Content Officer, Wellesley Information Services

Key Takeaways

⇨ SOLIXCloud Enterprise Data Lake is a third-generation cloud data platform designed to manage, mine, and monetize enterprise data assets, featuring real-time data integration, governance, and support for both structured and unstructured data.

⇨ The new update includes advanced features such as Apache Hudi with ACID transactions, a unified data catalog, and fast data processing capabilities, which enhance data governance and streamline data workflows for increased efficiency and analytics power.

⇨ The platform supports multi-cloud environments and offers federated data governance, ensuring compliance and security while providing scalable solutions tailored to the evolving demands of enterprise data management and analytics.

Solix Technologies, Inc., SAP solution build partner and provider of enterprise data management solutions, recently unveiled a major update to SOLIXCloud Enterprise Data Lake.

Historically, companies have struggled to manage, understand and take advantage of the data assets spread across their enterprise. First-generation data platforms were encumbered by limited real-time reporting, complex and expensive ETL loading, and a lack of real-time updating. Even second-generation data platforms suffered from an absence of key management and governance features, slow and cumbersome queries, and poor metadata and catalog management.

As a result, from operational data stores and data warehouses to previous generation data lakes, customers have invested time, money and talent in building data platforms that have fallen short in turning enterprise data assets into revenue.

Recent SAPinsider research points to a prominence of data lakes in leading companies’ enterprise data management strategies. The research report Data Management Strategies uncovered that 83% of SAP organizations currently use or are implementing data lakes, are evaluating data lakes, or plan to implement a data lake within 12 to 24 months. Data lakes are increasingly important for SAP customers as they provide a unified, scalable repository for storing large volumes of structured, semi-structured, and unstructured data.

Demand is growing not only for real-time enterprise data to power generative AI, machine learning, and enterprise intelligence application revenue streams, but also for a data governance framework that supports data engineering and data integration solutions throughout the data lifecycle.

SOLIXCloud Enterprise Data Lake is a third-generation cloud data platform that helps customers manage, mine and monetize their enterprise data assets. It is highly scalable and now delivers end-to-end data integration and data engineering solutions for both new and existing customers, in addition to the best-of-breed cloud data management and secure governance that SOLIXCloud customers rely upon.

The new data lake adds Apache Hudi with ACID transactions and a unified data catalog. Together they are intended to change how businesses manage and analyze their growing volumes of structured and unstructured enterprise data, so that they can make better just-in-time decisions and critical predictions about future business.

Along with robust governance for safety, security, compliance, and the lifecycle of data, the SOLIXCloud Enterprise Data Lake is a multi-cloud solution supporting AWS, Azure, IBM Cloud, Oracle Cloud, Google Cloud, and hybrid on-premise deployment.

“Data warehouses and first generation data lakes have not met the business requirement of storing, organizing, and analyzing enterprise data to maximize profitability,” said Dr. James Short, Lead Scientist at the San Diego Supercomputer Center. “Solix’s Enterprise Data Lake makes historical and real time data available, organized to meet the needs of data scientists, data engineers and business analysts, so they can discover new insights and build new profit-centric AI and machine learning solutions.”

Running on the cloud-native Solix Common Data Platform (CDP), SOLIXCloud Enterprise Data Lake is a third-generation, transactional, streaming data lake that brings core data warehouse and database functionality directly to a data lake. Designed for high-performance, real-time cloud database workloads, the SOLIXCloud Enterprise Data Lake supports streaming data ingestion and data pipelining, and delivers transactional guarantees to the data lake with consistent atomic writes and concurrency controls tailored for longer-running data lake transactions. To ensure data infrastructures are not tied to any one vendor, the SOLIXCloud Enterprise Data Lake supports Apache Hudi at customer early access, with support for the Apache Iceberg and Delta open table formats planned to follow.

SOLIXCloud Enterprise Data Lake is a response to the growing demand from customers for end-to-end data fabric solutions that support serverless, low-latency transactions for intelligent enterprise applications such as generative AI, streaming analytics and machine learning operations (MLOps). The SOLIXCloud Enterprise Data Lake collects any data, including metadata, from any source, and delivers real-time data pipeline solutions with federated data governance controls including data security, consumer data privacy, compliance and Information Lifecycle Management (ILM). SOLIXCloud Enterprise Data Lake may also be added to existing SOLIXCloud solution landscapes to quickly expand data platform capabilities.

“Cloud data platforms are a cornerstone to any digital transformation strategy,” said John Ottman, Executive Chairman of Solix Technologies, Inc. “SOLIXCloud Enterprise Data Lake delivers data streaming, data governance, data integration and data engineering capabilities so customers can fully capitalize and monetize their data assets.”

Key SOLIXCloud Enterprise Data Lake database features with Open Table Formats for Apache Hudi (Hadoop-Upserts-Deletes-Incrementals) include:

  • ACID Transactions – The Apache Hudi data lake framework provides real-time, ACID transactional guarantees to your data lake with consistent, Atomic Writes and Isolated Reads for Concurrency controls tailored for longer-running data lake transactions. These features include Tables, Transactions, Upserts/Deletes, Advanced Indexing methods to manage and query large datasets, Clustering/Compaction, Performance Optimizations to scale writes and reads independently and optimize infrastructure, Bulk Inserts and Transactional Writes, Snapshots so readers don’t block writers and writers don’t block readers, and Time Travel to enable querying past versions of the dataset useful for audit trails or rollbacks.
  • Fast Data Processing – Evolve from slow, batch data processing jobs to a new incremental approach of reading and writing data using Streaming Ingestion. Fast Data Processing runs alongside batch data processing and gives customers a way to rethink and re-engineer ETL processes for Hive and Spark jobs that run too slowly and consume too many resources. Incremental data processing handles only the data that is new or updated since the last batch, improving the efficiency of data pipelines.
  • High-performance Loading – Even moderately big NoSQL database installations store billions of rows, making full bulk loads infeasible and a more efficient approach necessary to ingest such data volume. Replace costly and inefficient bulk loads with managed ingestion via Upserts and incremental streaming to keep your data up to date.
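
The upsert, incremental read and time travel ideas in the list above can be illustrated with a minimal, in-memory sketch. This is not the Apache Hudi API or anything from SOLIXCloud: the class ToyTable and its methods upsert, snapshot and incremental are hypothetical names invented for this example, commits are plain integers, and deletes, indexing and concurrency are left out.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple

Row = Dict[str, Any]


@dataclass
class ToyTable:
    """Hypothetical in-memory stand-in for a versioned, keyed table."""

    # Append-only log of (commit_id, record_key, row).
    log: List[Tuple[int, str, Row]] = field(default_factory=list)
    last_commit: int = 0

    def upsert(self, rows: Dict[str, Row]) -> int:
        """Insert new keys or overwrite existing ones under one commit id."""
        commit_id = self.last_commit + 1
        staged = [(commit_id, key, dict(row)) for key, row in rows.items()]
        self.log.extend(staged)       # the whole batch is published together
        self.last_commit = commit_id  # readers only see completed commits
        return commit_id

    def snapshot(self, as_of: Optional[int] = None) -> Dict[str, Row]:
        """Latest row per key, optionally as of an earlier commit (time travel)."""
        limit = self.last_commit if as_of is None else as_of
        state: Dict[str, Row] = {}
        for commit_id, key, row in self.log:
            if commit_id <= limit:
                state[key] = row      # later commits overwrite earlier ones
        return state

    def incremental(self, since: int) -> Dict[str, Row]:
        """Only the keys written after commit `since`, with their latest rows."""
        changed: Dict[str, Row] = {}
        for commit_id, key, row in self.log:
            if commit_id > since:
                changed[key] = row
        return changed
```

A downstream job that remembers the last commit it processed can call incremental with that commit id and receive only the new or updated rows, rather than reloading the full table, while snapshot with an earlier commit id returns the table as it stood at that point.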

Additional SOLIXCloud Enterprise Data Lake features include:

  • Data Catalog – Data scientists and data professionals require a detailed inventory of all data assets to help quickly find the most appropriate data for any analytical or enterprise intelligence purpose. Features include role-based access control, business glossary, data classification, metadata repository and data lineage.
  • Low-code, Incremental Data Pipelines – To create real-time, incremental data pipelines from source to target that are fit-for-use by artificial intelligence (AI), machine learning (ML) and advanced analytics, data engineers require drag and drop tools to collect data from any source and apply data cleansing, data enrichment or any other preparation. By transforming files, removing erroneous records, masking sensitive data, tagging and labeling, or combining data objects into enterprise business records, Solix Data Pipeline improves data quality and the accuracy of data warehouse, machine learning and advanced analytics applications. A Continuous Data Delivery processing framework is ideal for low latency, minute-level analytics and change data capture workloads. Create declarative templates for incremental ingestion and transformation, and provision continuous data delivery pipelines for machine learning operations. Automate the operational burdens of scheduling, monitoring and moving enterprise data.
  • Change Data Capture (CDC) – Change data capture enables seamless, efficient database ingestion into your data lake. SOLIXCloud Enterprise Data Lake is designed to support fast Upserts and Deletes of data suitable for CDC and streaming use cases.
  • Apache Spark – Apache Spark, with its parallel in-memory data processing, is the world’s most widely used engine for scalable computations against structured and unstructured data. Thousands of companies, including 80% of the Fortune 500, use Apache Spark™ today.
  • Federated Data Governance – Federated Data Governance provides a centralized control framework for when several groups have authority over the data. Through delegated authorities, virtual policy enforcement and audit management, Federated Data Governance enables compliance control over remote tables and data, reducing risk, and improving security for decentralized, multi-cloud data operations.
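
As a rough illustration of the change data capture item in the list above, the sketch below replays a stream of change events against a keyed table. The event shape (a dict with "op", "key" and "row" fields) and the function name apply_changes are assumptions made for this example only; real CDC tools and the SOLIXCloud platform define their own formats.

```python
from typing import Any, Dict, Iterable

Row = Dict[str, Any]


def apply_changes(table: Dict[str, Row], events: Iterable[Dict[str, Any]]) -> Dict[str, Row]:
    """Apply insert/update/delete events, in order, to a table keyed by record key."""
    for event in events:
        op, key = event["op"], event["key"]
        if op in ("insert", "update"):
            # Upsert: merge the changed columns into the existing row, if any.
            table[key] = {**table.get(key, {}), **event["row"]}
        elif op == "delete":
            # Deleting a key that is already absent is treated as a no-op.
            table.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return table
```

Because inserts and updates are both handled as upserts and deletes of missing keys are ignored, replaying the same batch of events twice leaves the table in the same state, which is a convenient property when a pipeline retries after a failure.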

What this means for SAPinsiders

Share your data management strategies. The focus on enterprise data management is intensifying with the proliferation of 5G, IoT, AI/ML and other transformative technologies. SAP customers are increasingly looking for new data management models for the storage, migration, integration, governance, protection, transformation, and processing of all kinds of data ranging from transactional to analytical. Balancing the risks, compliance needs, and costs of data management in SAP HANA on-premise and on the cloud while also providing reliable, secure data to the organization is increasingly important to the business. We will be releasing the 2025 Data Management Strategies research report in February 2025. Contribute to the research by completing this survey: https://www.research.net/r/DataMgt25.

Account for diversity when organizing data. Third-generation data lakes should be capable of handling structured, semi-structured, and unstructured data. SAP customers need solutions that can seamlessly integrate SAP data (usually structured) with external data sources like IoT data, social media, and documents to support diverse analytics needs. Further, robust metadata management enables users to locate, understand, and utilize data more effectively. Look for a data lake with integrated data cataloging, which facilitates data discoverability and governance, improving usability across departments.

Look for lake performance and scalability. Data lakes should provide elasticity to scale storage and compute independently, which is cost-effective for SAP customers dealing with large data volumes that vary over time. This capability allows companies to manage fluctuating workloads without incurring high infrastructure costs. For SAP environments where real-time insights drive business actions, third-generation data lakes should offer low-latency processing. Look for capabilities that support streaming data and real-time analytics to enable quick, data-driven decisions.
