The Koverse Data Lake gives organizations ubiquitous access to their data across the enterprise in a pre-packaged, ready-to-deploy platform that runs on well-known and usually pre-existing IT infrastructure. Data scientists, analysts, and business users can work together quickly and iteratively to deliver the key data capabilities the organization needs.
One of the fundamental issues with big data technology deployments (projects that require complex integration of multiple datasets to address a specific business case) is that they are too complex, take too long, and require risky technology integrations. When aggregated across the many solutions an enterprise can require, they are also very expensive. The data lake is a powerful concept in enterprise IT infrastructure because it offers a solution to all of these issues. First, a data lake allows enterprises to store huge amounts of data in its raw form. Second, a data lake gives organizations the ability to quickly apply that data to a given use case from within the system. The result is that solutions that previously took months or years can be delivered in days or weeks. Delivering a solution requires no additional infrastructure or architecture design because the entire use case can be hosted within the data lake. Finally, reusing infrastructure and data to support multiple use cases with a single data lake greatly reduces overall cost, allowing organizations to shed their costly and rigid stove-piped solutions. Data lakes allow this shift because they achieve economies of scale in the reuse of architecture, data, system resources, and analytics. The Koverse Data Lake in a Box gives organizations all the benefits of a data lake without the technical risk or significant up-front investment.
How Koverse Works
The biggest challenge of building a data lake is integration. Teams must either combine a range of complex infrastructure systems to address many use cases generically, or use those same technologies to handcraft specific solutions, which makes it difficult and expensive to operationalize multiple use cases. Koverse has solved this problem in a mature, mission-proven platform built on an open source core that includes Spark, Accumulo, and Hadoop. Koverse integrates data storage, transformation, interrogation, and security features to create a complete solution that allows organizations to see big successes with big data.
Hold data at scale and in its raw form
The Koverse Data Lake is able to hold heterogeneous data in its raw form in a single system at any scale, enabling quick application to any use case.
- Koverse can simultaneously consume data in a variety of formats, including JSON, XML, and CSV, and can pull from a range of systems, enabling a complete operational view.
- The Koverse internal data structure is schema free, allowing the system to tag, track, govern, and index the data while retaining the data’s native schema.
- The Koverse SDK allows custom data formats and systems to be quickly and reliably ingested.
- Koverse can ingest data in both streaming and bulk modes, supporting real-time and bulk use cases.
- Koverse supports automatic data profiling and sampling, enabling quick assessment of data quality and value.
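The idea of holding heterogeneous data in one schema-free collection can be illustrated with a minimal sketch. It uses only the Python standard library and is not the Koverse SDK; the function names, the `source` tag, and the sample inputs are all hypothetical. Each record keeps its native field names, and the collection imposes no shared schema.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def records_from_json(text):
    """Parse a JSON object or array of objects into a list of dict records."""
    data = json.loads(text)
    return data if isinstance(data, list) else [data]


def records_from_csv(text):
    """Parse CSV with a header row; every value stays a string."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def records_from_xml(text):
    """Treat each child of the root element as one flat record."""
    root = ET.fromstring(text)
    return [{child.tag: child.text for child in item} for item in root]


def ingest(collection, source_name, records):
    """Append records unchanged, tagging each with where it came from."""
    for record in records:
        collection.append({"source": source_name, "fields": record})


# Hypothetical inputs: three formats, three unrelated schemas, one collection.
collection = []
ingest(collection, "orders.json", records_from_json('[{"id": 1, "total": 9.5}]'))
ingest(collection, "users.csv", records_from_csv("name,city\nAda,London\n"))
ingest(collection, "events.xml",
       records_from_xml("<events><event><type>login</type></event></events>"))
```

Because nothing is coerced into a common schema at ingest time, a new source with different fields can be added without changing the records already stored.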
Transform data at scale
The Koverse Data Lake can transform data in situ using a variety of well-known analytic frameworks. It stores the output of these transforms in exactly the same manner as raw data, so analytic output inherits all the functionality of raw data.
- Koverse transforms data at scale and in situ using the well-known MapReduce or Spark engines.
- Transformations are deployed via an SDK or UI and are executed by the Koverse server to ensure that data lineage, auditing and access controls are maintained.
- Transformations are semantically flexible and can be reused without code changes.
- Analytic output is reusable, so it can be leveraged for multiple use cases.
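The pattern described above (a parameterized transform whose output is stored like any other collection, with lineage recorded by the system that runs it) can be sketched as follows. This is a plain-Python illustration under stated assumptions, not Koverse's MapReduce or Spark integration; `store`, `lineage`, `total_by`, and `run_transform` are hypothetical names.

```python
from datetime import datetime, timezone

# Hypothetical in-memory stand-in for a data lake: collection name -> records.
store = {
    "raw_orders": [
        {"customer": "a", "region": "east", "total": 10.0},
        {"customer": "b", "region": "west", "total": 5.0},
        {"customer": "a", "region": "west", "total": 2.5},
    ]
}
lineage = []  # one entry per transform run


def total_by(records, key_field, value_field):
    """Reusable transform: field names are parameters, not hard-coded."""
    totals = {}
    for record in records:
        key = record[key_field]
        totals[key] = totals.get(key, 0.0) + record[value_field]
    return [{key_field: k, value_field: v} for k, v in sorted(totals.items())]


def run_transform(name, source, target, func, **params):
    """Run func over one collection, store the output as a new collection,
    and record which input produced which output."""
    store[target] = func(store[source], **params)
    lineage.append({
        "transform": name,
        "input": source,
        "output": target,
        "params": params,
        "ran_at": datetime.now(timezone.utc).isoformat(),
    })


# The same transform serves two use cases without code changes.
run_transform("total_by", "raw_orders", "totals_by_customer", total_by,
              key_field="customer", value_field="total")
run_transform("total_by", "raw_orders", "totals_by_region", total_by,
              key_field="region", value_field="total")
```

Routing every run through a single function such as `run_transform` is what makes lineage tracking reliable in this sketch: a transform has no other way to write its output.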
Interrogation and Search
The Koverse Data Lake is able to serve thousands of users simultaneously with interactive responsiveness.
- Koverse provides an internet-scale query capability over all the data and content it holds by implementing an efficient, proven, secure, and scalable indexing technology.
- Queries are executed via API or REST calls using the well-known Lucene syntax.
- Different use cases can apply different semantic models at query time.
- Data can be searchable within seconds of ingest, allowing for real-time applications.
- The API enables integration with a range of existing tools and applications.
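As a rough illustration of what a Lucene-syntax query might look like when sent over REST, the sketch below builds a query string from a field term and an inclusive range, then URL-encodes it. The Lucene forms used (`field:value`, quoted phrases, `[low TO high]`, `AND`) are standard classic query syntax. The host, path, `query` parameter name, and the field names are hypothetical and are not taken from the Koverse API.

```python
from urllib.parse import urlencode


def term(field, value):
    """field:value, quoting the value as a phrase when it contains a space."""
    text = str(value)
    if " " in text:
        text = '"' + text.replace('"', '\\"') + '"'
    return f"{field}:{text}"


def inclusive_range(field, low, high):
    """field:[low TO high], Lucene's inclusive range form."""
    return f"{field}:[{low} TO {high}]"


def all_of(*clauses):
    """Join clauses with AND, parenthesized so the result nests safely."""
    return "(" + " AND ".join(clauses) + ")"


query = all_of(term("city", "New York"), inclusive_range("total", 10, 50))

# Hypothetical endpoint and parameter name, for illustration only.
url = "https://koverse.example.com/search?" + urlencode({"query": query})
```

Encoding the query with `urlencode` matters here because Lucene syntax relies on characters such as `:`, `[`, and `"` that are not safe to place in a URL unescaped.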
Security and Governance
The Koverse Data Lake has best-of-breed multi-level security mechanisms that allow sensitive data and multiple use cases to share a single system, which enables the reuse of data, analytics, and infrastructure.
- Koverse enables individual access control on every record and every collection.
- Koverse maintains audit information for all system interactions by all users.
- Koverse tracks the analytic lineage of all data in the system.
- Koverse integrates with existing access control systems via LDAP, Kerberos, and AD.
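Record-level access control combined with auditing can be sketched in a few lines. The model below is a deliberate simplification and an assumption, not Koverse's actual mechanism: each record carries a set of required labels, a user sees a record only when holding all of them, and every read is logged. `Record`, `AuditedStore`, and the label names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    data: dict
    # Labels a user must hold to see this record; empty means visible to all.
    required_labels: frozenset = field(default_factory=frozenset)


@dataclass
class AuditedStore:
    records: list
    audit_log: list = field(default_factory=list)

    def read(self, user, user_labels):
        """Return only the records the user may see, and log the access."""
        held = frozenset(user_labels)
        visible = [r for r in self.records if r.required_labels <= held]
        self.audit_log.append(
            {"user": user, "action": "read", "returned": len(visible)}
        )
        return visible


store = AuditedStore([
    Record({"id": 1}),
    Record({"id": 2}, frozenset({"finance"})),
    Record({"id": 3}, frozenset({"finance", "pii"})),
])

analyst_view = store.read("analyst", {"finance"})
guest_view = store.read("guest", set())
```

Filtering inside the store's read path, rather than in each application, is what lets several use cases with different clearance levels share the same data in this sketch.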
With all of these features integrated, the Koverse Data Lake eliminates the complexity of putting big data into production and dramatically reduces the time it takes. Not only can you execute faster on your first use case, but follow-on use cases are even faster and more useful because analytics and data can be securely reused. Eventually, the incremental cost of additional data and applications becomes negligible.