Historical Data Platform for the Financial Industry



Integrating and evolving a new generation of big data architecture has become a top priority for IT in traditional industries. Massive structured and unstructured historical data is now the most important data asset that traditional enterprises hold. How to activate this data, using it to enhance customer experience, strengthen risk analysis, and discover patterns in past operations, has become the focus of big data construction. Historical data usually includes historical transaction data, historical management (process) data, historical image data, historical customer interaction data, and so on. The main drivers are:

• Enhancing the customer experience. Demand from online customers on mobile apps, online direct marketing, and e-commerce channels is growing rapidly, especially demand for flexible queries and statistics over massive historical data.

• Customer labels and profiles. Analyzing customers' historical data reveals the relationship between the preference attributes and the basic attributes of each customer segment.

• Big-data-driven operations management. More long-term historical management data and image data must be preserved, and business problems must be detected faster.

• Big-data-assisted risk analysis. Risk analysis needs historical data covering a longer time span, especially the original data. For example, audit or judicial departments may need to see a table in a business system exactly as it was at a certain point in time (at day granularity) many years ago, which is called a point-in-time snapshot.

In the traditional enterprise IT architecture, the ODS/DW system is the core of data processing and storage. It usually holds 2-3 years of structured data and serves queries, statistics, and analysis on it. Unstructured historical data, and structured data older than 3 years, is too large to keep there. It can only be saved to archive systems on optical discs and tapes, where it cannot be fully used. The ODS/DW system itself is not suited to handling massive historical data:

1. ODS/DW systems are generally built on traditional relational database technology. Once the data exceeds a certain volume, performance drops sharply and special-purpose machines are needed at high cost, so the cost of processing massive data often limits an enterprise's ambition to process more of it.

2. ODS/DW systems based on relational database technology cannot handle large amounts of unstructured data.

3. ODS/DW systems focus on integrating and cleansing data, turning production data into subject-oriented data from the perspective of business management needs. They are often overwhelmed by changing customer query and statistics workloads, audit and judicial investigation workloads, and data requests that change quickly and flexibly.

4. ODS/DW systems are model-based, but Internet services develop so quickly that a complete model often cannot be defined in advance for the data to be stored and analyzed.

Driven by the business needs above, building a dedicated system for managing massive historical data has become inevitable. It focuses on solving the following technical problems:

• Massive structured and unstructured archival data used to sit unused on tapes and discs. A historical data platform must first solve distributed storage for massive data of all types, with a low-cost distributed cluster providing an efficient and stable storage platform.

• The existing ODS/DW system has accumulated many years of data and runs less and less efficiently. The historical data platform can take large amounts of historical data off it, providing relatively inexpensive storage and relieving its computing pressure. Workloads such as bulk and business-subject data, ad hoc queries, classification analysis, multi-table join queries, and interactive analysis can be split off to the historical data platform, so that the data warehouse and the historical data platform each do their own job and complement each other.

• As mobile clients become more widely used, demand from app users for long-term historical data queries and statistics grows ever stronger, causing a sudden increase in query pressure on production systems. A mature and stable historical data platform not only offloads historical data query and analysis, but can also absorb the query pressure of highly concurrent online clients, thereby serving as a read/write separation platform for heavily loaded production systems. The following example of a commercial bank's historical data platform illustrates read/write separation and simultaneous access to massive historical data by various types of applications.


In a typical case, a large or medium-sized enterprise has ten business systems that generate hundreds of GB of incremental data per day for downstream use, and it needs to offer a full range of data query and analysis on a regular basis. The historical data platform stores the regular daily increments of all systems (these increments are the basis for synthesizing full snapshots at day granularity). Planned over a data retention period of more than 10 years, the structured data volume will reach nearly 10 billion records, while unstructured image data (such as images, documents, and video) can reach 300TB-500TB. The platform must keep the storage cost of this data low, keep it accessible online for fast direct queries, retain all traces of changes to data and metadata, and support interactive analysis and summarization, thereby achieving so-called data lifecycle management.
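The idea of synthesizing a full day-level snapshot from stored daily increments can be sketched as follows. This is a minimal illustration only: the function name `snapshot_as_of`, the `(op, key, row)` increment format, and the sample account data are assumptions made for the example, not part of any actual platform.

```python
from datetime import date

def snapshot_as_of(increments, as_of):
    """Rebuild the full table state for a given day by replaying daily increments.

    increments: dict mapping a date to a list of (op, key, row) tuples,
                where op is "upsert" or "delete" (hypothetical format).
    as_of:      the day whose end-of-day snapshot is wanted.
    """
    state = {}
    for day in sorted(increments):
        if day > as_of:
            break  # later increments do not belong to this snapshot
        for op, key, row in increments[day]:
            if op == "delete":
                state.pop(key, None)
            else:
                state[key] = row
    return state

# Hypothetical daily increments for an account table
increments = {
    date(2016, 1, 1): [("upsert", "A001", {"balance": 100}),
                       ("upsert", "A002", {"balance": 50})],
    date(2016, 1, 2): [("upsert", "A001", {"balance": 80})],
    date(2016, 1, 3): [("delete", "A002", None)],
}

day2 = snapshot_as_of(increments, date(2016, 1, 2))
day3 = snapshot_as_of(increments, date(2016, 1, 3))
```

Because only the increments are stored, any day in the retention period can be reconstructed on demand, at the cost of replaying changes. A real platform would likely checkpoint periodic full snapshots to bound the replay length.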

This is a real requirement of next-generation enterprise data management. It is an important part of new-generation IT construction projects and a key element in planning the new generation of enterprise data architecture.

Technical challenges

Massive historical data is archival in nature. The data files must be self-describing and clearly organized at the pure storage level, independent of the computing nodes: so-called "offline storage, online access", with the storage layer loosely coupled to the computing layer. In other words, we need to know exactly which time, system, and table each piece of stored data belongs to, which is essential for the long-term management of enterprise historical data.

As mentioned above, traditional ODS/DW systems that rely on general-purpose relational and MPP database technology can no longer meet the challenges of a massive historical data platform. The new generation of open-source Hadoop technology is also not well suited to traditional enterprises, whose historical data processing is based on formatted data. Hadoop was designed and developed mainly for massive computing, and its storage management follows a simple strategy that serves the computing platform. For archival and long-term data storage management, it is a platform choice with "congenital deficiencies". Hadoop's weaknesses in storage management show in the following:

•   HDFS/HBase does not support multiple indexes or transactions, and support for online interactive SQL is incomplete.

•   HDFS is a file storage system and is not suited to online operations. For example, it does not support modifying data in place; data is generally deleted and then re-added.

•   HDFS is not suitable for storing massive numbers of small files, because distributed storage must read data from different nodes, which is less efficient than centralized storage.

•   HBase has no foreign-key relationships, but a single table can support millions of columns, so an HBase table is designed with all related information stored in the same row:

•   For business data such as historical data, this does not reflect the inherent relationships between the data.

•   Defining too many columns is a nightmare for business people and data managers.

•   The efficiency of scanning data in HBase depends heavily on the design of the row key.

•   Building secondary indexes on HBase has become the norm, but adding a secondary index is not a lightweight operation: every write to the main table also updates all of its secondary indexes, so the cost of indexing is very high.
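The write cost of secondary indexes mentioned in the last point can be illustrated with a toy model. The sketch below is plain Python and does not use the HBase API; the class `IndexedTable`, its counter, and the sample transaction rows are hypothetical and exist only to show that each logical write turns into one main-table write plus one write per maintained index.

```python
class IndexedTable:
    """Toy key-value table that maintains secondary indexes on every write."""

    def __init__(self, indexed_columns):
        self.rows = {}                                    # row key -> row dict
        self.indexes = {col: {} for col in indexed_columns}
        self.physical_writes = 0                          # main + index writes

    def put(self, row_key, row):
        old = self.rows.get(row_key)
        self.rows[row_key] = row
        self.physical_writes += 1                         # write to the main table
        for col, index in self.indexes.items():
            # Remove the stale index entry left by the previous row version
            if old is not None and col in old:
                index.get(old[col], set()).discard(row_key)
            if col in row:
                index.setdefault(row[col], set()).add(row_key)
            self.physical_writes += 1                     # one write per index

    def lookup(self, col, value):
        """Return the row keys whose indexed column equals value."""
        return sorted(self.indexes[col].get(value, set()))

table = IndexedTable(["customer_id", "branch"])
table.put("tx1", {"customer_id": "C1", "branch": "B1"})
table.put("tx2", {"customer_id": "C1", "branch": "B2"})
writes_after_two_puts = table.physical_writes   # 2 puts * (1 main + 2 indexes)
table.put("tx1", {"customer_id": "C2", "branch": "B1"})
```

With two indexes, every `put` costs three physical writes in this model, so write amplification grows linearly with the number of indexes, while lookups by a non-key column avoid a full scan.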


Therefore, under the principle of separating the computing layer from the storage layer, Hadoop is a high-concurrency processing technology that is not suitable as the primary storage for historical data, but it can serve as one of the advanced data processing methods in the computing layer.

Unique advantages of SequoiaDB

For massive historical data, a new generation of distributed database has unique advantages in flexible modeling and distributed storage. SequoiaDB is an independently developed NewSQL distributed database that provides customers with a flexible data model, powerful distributed SQL, and rich enterprise database management functions:

Flexible JSON data structure with a wider range of application

• Can describe relational structures, preserving existing SQL application assets as far as possible; can also describe non-relational structures, such as key-value and wide-table-like structures.

• Can store unstructured files together with structured descriptors, instead of using index + file storage.

• Appropriately denormalizes data to reduce the complexity of JOIN operations.

Powerful distributed SQL engine

• Improved SQL support, including high-concurrency, low-latency SQL and batch-computing SQL capabilities

• Efficient random reads, writes, and updates

• ACID and transaction support

Richer enterprise database management capabilities

• Primary index support

• Strong data compression capability

• Support remote cluster data recovery

The advantages of building a historical data platform on SequoiaDB include:

• The same business application has different versions of its data structure in different periods. The historical data platform must store the data schemas of different periods in one business table and allow unified access through SQL. With SequoiaDB as the storage layer, this is easy to do.

• The original data must be kept as point-in-time snapshots at day granularity, and the statistical data, the conversion process, and the data lineage must also be preserved. Through data partitioning, SequoiaDB can preserve the data produced at each stage of denormalization.

• SequoiaDB's distinctive LOB (large object) storage mechanism handles large amounts of structured and unstructured data well, and is especially capable with large numbers of small files.

• Convenient data loading and synchronization tools are provided, including high-speed full and incremental import tools, as well as support for common data synchronization tools such as CDC and GoldenGate.

• Support for a complete SQL feature set, in particular high-concurrency, low-latency SQL queries and efficient multi-table join queries.

• Solves distributed storage for massive structured data, with horizontal scaling to hundreds of nodes and dynamic expansion.

• Enterprise-grade data management capabilities: simple architecture and development, stable and reliable operation, and a small maintenance workload.
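The first point above, storing several schema versions of one business table and reading them uniformly, can be sketched with JSON documents in plain Python. This is an illustrative model only and does not use the SequoiaDB API; the field names, the `schema_version` marker, and the `select` helper are assumptions made for the example.

```python
import json

# Records of the same business table written under two schema versions.
# The newer version adds "currency" and "channel" (hypothetical fields).
records = [
    json.loads('{"acct": "A001", "amount": 100, "schema_version": 1}'),
    json.loads('{"acct": "A002", "amount": 250, "currency": "USD", '
               '"channel": "mobile", "schema_version": 2}'),
]

def select(docs, fields, where=lambda d: True):
    """Project the requested fields from every matching document.

    Fields that a document's schema version does not have come back as None,
    much as a SQL projection would return NULL for a missing column.
    """
    return [{f: d.get(f) for f in fields} for d in docs if where(d)]

all_rows = select(records, ["acct", "currency"])
large_rows = select(records, ["acct", "amount"], where=lambda d: d["amount"] > 200)
```

Because each document carries its own structure, old and new records can share one collection without a migration, and a single query shape covers both versions.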
