Operational Data Lake

Data Lake

The "data lake" concept was first proposed in 2010. Its originator likened a data lake to a natural reservoir in an unprocessed, unpackaged state: water flows into the lake from many different sources, which gives the enterprise a wide range of possibilities for analysis and exploration. Under this concept, data can be loaded directly onto the big data platform without prior processing or integration, and end users process it according to their own requirements. The traditional enterprise database, by contrast, emphasizes integration, subject orientation, and hierarchical design. In that sense, the data lake idea essentially overturns the established methodology for building databases.

Business Value of Data Lake

It is well known that designing a database around hierarchical data structures brings many problems: data is complex to use, information may be lost, and the structure is hard to adjust quickly. The data lake leaves these problems to end users, who resolve them according to their own requirements. To some extent, the data lake is less a technical concept than an alternative approach to data management. It suits users with strong IT skills and flexible data usage requirements, as well as users who take unconventional approaches to studying business questions. In practice, a data lake is a set of methods and technologies that use low-cost technology to capture, refine, store, and explore large volumes of raw data over the long term.

Major features of the data lake include:

1) The variety of data types is the main driver for building a data lake. An enterprise data lake can store business data of many different structures:

· Large volumes of structured data

· Semi-structured data (logs, XML files, etc.)

· Unstructured data (files, images, audio, video, etc.)

2) Storing the full volume of historical data with all of its attributes:

A data lake needs to store large volumes of business data:

· Persistence of real-time service data

· Storage of online business system data

· Historical data offloaded from the enterprise database and data marts

· Online storage of historical data that would otherwise sit in offline tape libraries and CD libraries

3) Flexibility in data design patterns:

The traditional enterprise database usually adopts Schema on Write: data is written into a predefined E-R table structure. The data lake also uses Schema on Read: the format of the data is analyzed and determined by the reader at access time, and the writer does not need to care whether the data follows a consistent, uniform format.

Schema on Read has the following advantages:

· Reduces data storage cost, because data can be saved without any development work.

· Reduces the latency between data generation and data usage.

· Gives end users maximum flexibility in processing data.

· Allows users to store unstructured and semi-structured data.

· Preserves the original data for future use when it does not need to be processed yet or cannot be processed yet.

· Allows different users to interpret the same original data in different ways.
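The Schema on Read idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, field names, and the two reader functions (`read_as_payments`, `read_as_channels`) are hypothetical and do not come from any SequoiaDB API. The writer stores raw JSON text without enforcing a format, and each reader decides at access time how to interpret it.

```python
import json

# Raw events are kept exactly as they arrived; the writer enforces no common format.
# (Hypothetical sample data for illustration only.)
RAW_EVENTS = [
    '{"user": "u1", "amount": "12.50", "channel": "mobile"}',
    '{"user": "u2", "amount": 7, "note": "refund"}',
    '{"user": "u1", "device": {"os": "android"}}',
]


def read_as_payments(raw_lines):
    """Reader A's schema: (user, amount as float). Records without an amount are skipped."""
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        if "amount" in record:
            rows.append((record["user"], float(record["amount"])))
    return rows


def read_as_channels(raw_lines):
    """Reader B's schema: (user, channel). A missing channel becomes 'unknown'."""
    return [
        (record["user"], record.get("channel", "unknown"))
        for record in map(json.loads, raw_lines)
    ]


payments = read_as_payments(RAW_EVENTS)
channels = read_as_channels(RAW_EVENTS)
```

Both readers work from the same untouched raw records, yet each sees a different table, which is the "different interpretations of the same original data" point from the list.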

Schema on Read and Schema on Write each have their own advantages and disadvantages, and which data management mode to use depends on the requirements. In general, original data is stored under Schema on Read, while analytical data for highly stable, relatively fixed applications is stored under Schema on Write. Combining the two methods is a feasible approach.
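A minimal sketch of that combination, assuming an in-memory list as a stand-in for the raw zone and SQLite (from the Python standard library) as a stand-in for the analytical store. The names `raw_zone`, `ingest`, `build_analytical_table`, and the `payment` table are hypothetical and chosen only for illustration; they are not part of SequoiaDB.

```python
import json
import sqlite3

raw_zone = []  # Schema on Read side: raw records kept verbatim


def ingest(raw_line):
    """Store the original record untouched, whatever its shape."""
    raw_zone.append(raw_line)


def build_analytical_table(conn):
    """Schema on Write side: load only records that fit the fixed table structure."""
    conn.execute(
        "CREATE TABLE payment (user_id TEXT NOT NULL, amount REAL NOT NULL)"
    )
    loaded = 0
    for line in raw_zone:
        record = json.loads(line)
        if "user" in record and "amount" in record:
            conn.execute(
                "INSERT INTO payment (user_id, amount) VALUES (?, ?)",
                (record["user"], float(record["amount"])),
            )
            loaded += 1
    conn.commit()
    return loaded


# Hypothetical sample input: the third record does not fit the payment schema.
for line in (
    '{"user": "u1", "amount": "12.50"}',
    '{"user": "u2", "amount": 7}',
    '{"user": "u3", "clicks": 4}',
):
    ingest(line)

conn = sqlite3.connect(":memory:")
loaded = build_analytical_table(conn)
total = conn.execute("SELECT SUM(amount) FROM payment").fetchone()[0]
```

The record that does not fit the fixed schema is left out of the analytical table but still survives in the raw zone, so a later application with different requirements can reinterpret it.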

4) Improved usage and sharing of data:

The data lake serves as a data source for multiple downstream systems: it provides rich and complete business data to the enterprise database, data marts, online query services, mobile apps, and other downstream systems.

Data Lake Solution

Based on the SequoiaDB distributed database, it is possible to build a distributed data lake that supports batch analysis as well as online query and transactional workloads, meeting the growing demands placed on big data platforms. As shown in the figure, the data lake is divided by function into an online area and an analytical area. The online area corresponds to operational services with OLTP-style high-concurrency, real-time requirements, and the analytical area corresponds to traditional batch processing.

Figure: Big data analysis and operation domain

The SequoiaDB distributed database, with its shared-nothing distributed MPP architecture, flexible data type definitions, and dual-engine mechanism of JSON storage and block storage, is well suited to the requirements of enterprises building a data lake.

Figure: SequoiaDB Data Lake Structure

With its flexibility, independence, agility, and timeliness, an enterprise data lake built on the SequoiaDB distributed database adapts better to rapid business growth and to the rapid iteration of data analysis applications. Therefore, when enterprise users need flexible, independent data usage on flat, source-oriented data structures, the enterprise data lake should be constructed according to the data lake model.

Building an enterprise data lake on SequoiaDB, with a design pattern that combines hierarchical and flat data structures, lets business personnel in the big data era use data more quickly and conveniently. It also addresses the enterprise's varied analysis requirements, delivers higher business value, and achieves the best balance between input and output.
