Over the last decade, with the advent of big data, data lakes have become part of the conversation. However, many Chief Data Officers have been reluctant to undertake the initiative. A few large enterprises have committed investments but are struggling to justify the spend so far. Apart from a handful of isolated use-case implementations, the key to a successful data lake strategy lies in understanding what a data lake includes and how it is structured.
What is a Data Lake?
In the new world of big data (billions and trillions of records), a data lake is a centralized layer for all the data in the enterprise. Put another way, a data lake can be thought of as the enterprise's data bus. It serves different purposes depending on the unique needs of each enterprise.
1. Inward Data Lake:
An inward data lake operates in receiving mode: it ingests data objects, ERP feeds, transactional data and social media data, then combines and aggregates them into one store. Such a data lake is the ideal place to run self-service BI and analytics, and it helps provide a democratized view of the data ecosystem.
2. Outward Data Lake:
An outward data lake acts as an emanating source: it becomes the origin of data feeds for many types of internal and external systems. Good examples are the data lakes of connected organizations such as Facebook, Google Mail, Bloomberg and stock exchanges. Such data lakes are in the business of data monetisation, funnelling data out to their consumers.
Designing the Optimum and Profitable Data Lake:
The optimum data lake design for an enterprise is one with a transient contour. The design considerations for such a transient data lake are:
- A combination of inward and outward data lakes, according to the degrees of data freedom
- Modular and extensible
- Searchable and indexed
- Object-Path record memory
- Governed by an information catalogue mechanism
- Possible Indexed Architecture
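To make the "searchable and indexed", "object-path record memory" and "information catalogue" considerations more concrete, here is a minimal Python sketch of what such a catalogue might record. It is illustrative only: the `CatalogEntry` and `InformationCatalogue` classes, their fields and the example paths are all hypothetical, and everything is held in memory, whereas a real enterprise lake would typically rely on a dedicated catalogue service. Each object is remembered by its path along with its source feed and schema, and a keyword index over sources, column names and tags makes the schema of each business data unit discoverable.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One object in the lake, remembered by its path (hypothetical structure)."""
    path: str                 # object path, e.g. "raw/erp/orders.parquet" (illustrative)
    source: str               # originating feed: ERP, transactional, social media, ...
    schema: dict[str, str]    # column name -> type, captured at ingestion time
    tags: set[str] = field(default_factory=set)


class InformationCatalogue:
    """Hypothetical in-memory catalogue that keeps lake objects searchable."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}   # path -> entry
        self._index: dict[str, set[str]] = {}         # keyword -> set of paths

    @staticmethod
    def _keywords(entry: CatalogEntry) -> set[str]:
        # Index the source, every column name and every tag, case-insensitively.
        return {entry.source.lower(),
                *(col.lower() for col in entry.schema),
                *(tag.lower() for tag in entry.tags)}

    def register(self, entry: CatalogEntry) -> None:
        # If the path is already catalogued, remove its old keywords before re-indexing.
        old = self._entries.get(entry.path)
        if old is not None:
            for kw in self._keywords(old):
                self._index[kw].discard(entry.path)
        self._entries[entry.path] = entry
        for kw in self._keywords(entry):
            self._index.setdefault(kw, set()).add(entry.path)

    def search(self, keyword: str) -> list[CatalogEntry]:
        # Return matching entries sorted by path so results are stable.
        paths = self._index.get(keyword.lower(), set())
        return [self._entries[p] for p in sorted(paths)]

    def schema_of(self, path: str) -> dict[str, str]:
        # Self-discoverable schema: look it up by object path (returns a copy).
        return dict(self._entries[path].schema)
```

For example, registering an ERP orders feed and a social media feed that both carry a `customer_id` column, then calling `search("customer_id")`, returns both objects, so an analyst can find related business data without knowing in advance where it was landed.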
This transient data lake design helps achieve the following objectives:
- Making the schema of each business data unit self-discoverable
- Enabling self-service BI through data democratization
- Providing an innovation layer in and around the core data systems
- Leaving room to leverage business and process maps to deliver real business value
Conceptual Technology Design:
A data lake can be implemented using one or a combination of the following technology platform options:
- Hadoop data lake distributions
- Cloud-based Data Blocks
- On-premise big data warehouse
Please write to us at firstname.lastname@example.org for a detailed discussion on designing your enterprise data lake journey.