History Data Management



In recent years, big data has become popular in the domestic banking industry, and more and more enterprises are trying to drive internal and external innovation with big data technology. However, because the concepts and technologies of big data differ from those of traditional systems, many banks and enterprises have run into a variety of problems when implementing a big data strategy. How to adopt big data technology with the right method and steps is therefore the first problem many financial service organizations face.

The history data service platform is a platform-type application of big data technology in the banking industry. Its core concept is to copy all data from the offline and history systems into a single history platform built on big data technology, so that the platform holds all of the enterprise's core data. Its business value objectives are "near-lining of offline data" and "downsizing of history data".

Main business needs of the history data platform

Near-lining of offline data: Offline data in banks typically consists of historical data that is two years old or more. Once an online or history system no longer needs to access this data, it is unloaded from the production database and stored on static media such as tape or an optical disc jukebox. When internal users later need the data, finding it and restoring it to a temporary access environment generally takes considerable effort and time. Near-lining uses the large storage capacity and computing power of a big data platform to make this previously inaccessible data directly accessible at relatively low cost, and provides historical data query and search services to internal and external users.

Downsizing of history data: Many banks store their history data in an ODS or data store. As the business grows, enterprises must keep archiving history data while continuously expanding these systems. However, expanding an ODS or data store built on a traditional relational database is expensive. By using distributed big data storage and computing as the platform and moving part of the functions of the ODS or data store to the history data platform, the existing ODS or data store systems can be downsized.

In addition to the two business goals of "near-lining of offline data" and "downsizing of history data", the history data service platform is built around three goals: small initial investment, quick results, and safety and reliability.

History data platform architecture

The overall architecture of the history data service platform includes four modules: "history data archive area", "fixed-mode query area", "free query data area" and "data processing and scheduling area".


Typical History Data Platform Architecture


History data archive area: The history data archive area holds a copy of external data inside the history data platform. Besides serving as the data source for the data processing and scheduling area, it also archives key business data. Once business data enters the history data archive area, it cannot be changed in any way, so this area can replace part of the function of traditional tape.
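The write-once rule described above can be illustrated with a minimal in-memory sketch. This is not SequoiaDB's API; the class name `ArchiveArea` and its methods are hypothetical and only show the idea that a record can be added and read but never overwritten or modified.

```python
from types import MappingProxyType


class ArchiveArea:
    """Hypothetical write-once store: records can be added and read, never changed."""

    def __init__(self):
        self._records = {}

    def archive(self, key, record):
        # Reject any attempt to overwrite a record that is already archived.
        if key in self._records:
            raise ValueError(f"record {key!r} is already archived and cannot be changed")
        # Keep a private copy behind a read-only view so callers cannot mutate it.
        self._records[key] = MappingProxyType(dict(record))

    def read(self, key):
        # Returns the read-only view; item assignment on it raises TypeError.
        return self._records[key]


area = ArchiveArea()
area.archive("txn-1", {"account": "A1", "amount": 100})
print(area.read("txn-1")["amount"])
```

In a real deployment the same guarantee would normally be enforced by the storage layer and its permissions rather than by application code.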

Data processing and scheduling area: The data processing and scheduling area connects the other three data storage areas. On the one hand, it processes, cleans, and denormalizes the data in the history data archive area and delivers the result to the fixed-mode query area for customized queries. On the other hand, when requested data is missing from the free query data area or has been deleted from it, it dynamically copies that data to the designated area in real time.

Fixed-mode query area: The fixed-mode query area serves fixed queries from applications inside and outside the bank. For query workloads with a relatively fixed pattern, such as ECIF and receipts, the data processing and scheduling area regularly denormalizes the original archived data, which fully meets the bank's need for online search and query of historical data. The denormalized wide-table data is stored in the fixed-mode query area, which runs on independent hardware and network to serve highly concurrent external queries, so that free queries and offline analysis have no impact on the business in this area.
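As an illustration of the denormalization step, the sketch below joins three normalized record sets into wide rows that a fixed query can read without further joins. The table layout and field names (`customer_id`, `account_id`, `account_type`, and so on) are assumptions made for the example, not a schema taken from any bank system or from SequoiaDB.

```python
def denormalize(transactions, customers, accounts):
    """Join normalized records into wide rows (hypothetical schema)."""
    # Index the lookup tables once so each transaction is resolved in O(1).
    customers_by_id = {c["customer_id"]: c for c in customers}
    accounts_by_id = {a["account_id"]: a for a in accounts}

    wide_rows = []
    for txn in transactions:
        account = accounts_by_id[txn["account_id"]]
        customer = customers_by_id[account["customer_id"]]
        # Copy the transaction and attach the account and customer attributes.
        wide_rows.append({
            **txn,
            "account_type": account["account_type"],
            "customer_id": customer["customer_id"],
            "customer_name": customer["name"],
        })
    return wide_rows


customers = [{"customer_id": "C1", "name": "Li Wei"}]
accounts = [{"account_id": "A1", "customer_id": "C1", "account_type": "savings"}]
transactions = [{"txn_id": "T1", "account_id": "A1", "amount": 250}]
print(denormalize(transactions, customers, accounts))
```

The trade-off is the usual one for wide tables: more storage and a periodic rebuild in exchange for single-lookup reads under high concurrency.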

Free query data area: The free query data area is a subset of the history data archive area. It contains the definitions of all tables in the history data archive area and all or part of the data of each table. The data in this area can be opened to internal users for free-form query and analysis. Through the data processing and scheduling area, it dynamically identifies the data range of the table being accessed and copies any data that is missing from the free query data area from the history data archive area. This area is isolated from the history data archive area so that no data access can affect data that has already been archived. When the area takes up too much space, a script can clear data from tables that are not often accessed to release space.
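The copy-on-demand and space-release behavior described above resembles a cache in front of the archive. The following sketch assumes a simplified model: whole tables are copied rather than data ranges, the archive is a plain dictionary, and "not often accessed" is approximated by least recently accessed. The class `FreeQueryArea` and its parameters are hypothetical.

```python
from collections import OrderedDict


class FreeQueryArea:
    """Hypothetical on-demand copy of archive tables with a row budget."""

    def __init__(self, archive, max_rows):
        self._archive = archive        # table name -> list of row dicts (never modified)
        self._max_rows = max_rows      # space budget, counted in rows
        self._tables = OrderedDict()   # copied tables, least recently accessed first

    def query(self, table, predicate):
        if table not in self._tables:
            # Copy the rows so queries never touch the archived originals.
            self._tables[table] = [dict(row) for row in self._archive[table]]
        self._tables.move_to_end(table)  # mark as most recently accessed
        self._release_space()
        return [row for row in self._tables[table] if predicate(row)]

    def _release_space(self):
        # Drop the least recently accessed tables until the budget is met.
        # The table just queried sits at the end, so it is never dropped.
        while self._row_count() > self._max_rows and len(self._tables) > 1:
            self._tables.popitem(last=False)

    def _row_count(self):
        return sum(len(rows) for rows in self._tables.values())

    def cached_tables(self):
        return list(self._tables)


archive = {
    "payments": [{"id": 1, "amount": 50}, {"id": 2, "amount": 900}],
    "receipts": [{"id": 7, "amount": 20}, {"id": 8, "amount": 30}],
}
area = FreeQueryArea(archive, max_rows=3)
print(area.query("payments", lambda row: row["amount"] > 100))
area.query("receipts", lambda row: True)
print(area.cached_tables())
```

A dropped table is simply copied again from the archive the next time it is queried, which is why clearing space in this area is safe.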

Business Value of the History Data Platform

Through the archive area and free query data area of the history data service platform, enterprises can achieve near-lining and downsizing of traditional offline and history data. In addition, the fixed-mode query area can serve this data to the bank's end-user applications. For example, banks can build innovative applications on the history data platform in these four areas:

1) Near-lining of offline data: unified archiving of business system data, online inquiry of historical transactions;

2) Free inquiry: internal self-service report system, judicial inquiry system;

3) Downsizing of the production system: data store and ODS downsizing, T+0 real-time view of user assets;

4) Distributed image platform: image certificate management, remote account opening video-recording and so on.

Combined with Spark big data technology, SequoiaDB can support the end-to-end construction of a history data service platform. SequoiaDB's distributed architecture offers distribution, high availability, high performance, and easy maintenance, and features such as multi-dimensional partitioning, flexible indexing, a dual-engine core, and standard SQL support provide a strong data storage and computing foundation for enterprise-class history data service platforms.
