A real case: some real-world misconceptions about database selection

A bank plans to deploy a distributed database to replace the centralized database at the core of its business. It will first run a pilot on one core business system and then decide, based on the results, whether to proceed with a large-scale rollout.




The pilot core business system ran on an "O" database: a 3-node RAC on 3 minicomputers, 2 serving the business system and 1 in a same-city disaster recovery center for off-site data backup. Its replacement is a distributed database that uses up to 600 X86 servers.




The pilot has now been deployed. Compared with the pre-replacement system, peak throughput (TPS) is up 50% and average throughput (TPS) is up 33.3%; the average transaction response time is unknown.




Ultimately, the bank decided not to proceed with the rollout and to keep the status quo.




By now, I suspect you can already see the problem.




Why does it take so many X86 servers to do what a 3-node RAC can do?




Yes, core business complexity varies from bank to bank, and yes, performance did improve by 50% after the replacement. But! But! But! (Important things are worth saying three times.) Old Fish consulted a number of experts, and we all agree: this many machines is not necessary, and it is simply burning money. Even if money is no object, you still have to consider the extra workload and complexity this will impose on the development and O&M teams down the road.
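To put a number on that objection, here is a minimal sketch that uses only the figures quoted in the case. It counts all 3 RAC machines, including the disaster recovery node, and takes 600 as the new server count, so treat the result as a rough order of magnitude rather than a measurement.

```python
# All three inputs come from the case described above.
old_servers = 3         # 3-node RAC on 3 machines (DR node included)
new_servers = 600       # upper bound quoted for the distributed deployment
peak_tps_gain = 1.5     # peak TPS rose by 50%

# How many times more machines the new deployment uses.
server_ratio = new_servers / old_servers

# Peak TPS delivered per machine, relative to the old per-machine figure.
per_server_ratio = peak_tps_gain / server_ratio

print(f"server count grew {server_ratio:.0f}x")
print(f"peak TPS per server is {per_server_ratio:.2%} of the old per-node figure")
```

On these numbers, 200 times the machines buy 1.5 times the peak throughput, so each new server carries under 1% of what each old node carried.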




If the extra servers are needed to offset the network latency of distributed transactions, that only shows this distributed database architecture is poorly designed. And why is the average transaction response time unknown? Is it because transactions are implemented in the application layer? If so, the network-latency explanation falls apart as well.




The 5-year TCO of this project is not hard to estimate, and the investment is sure to break a billion. Hardware costs (servers, switches, and so on) and software costs (operating systems, database licenses), including maintenance fees for the fourth and fifth years, are all easy to quantify.




One cost is easily overlooked, however: operations and maintenance (O&M), which may amount to 20-30% of the overall purchase cost. O&M is a pain point for distributed databases, and the workload of both operations and development is bound to surge after a distributed transformation.




From the O&M perspective, the hardware count increases massively. Assuming the original 3 minicomputers needed only 2 O&M engineers, the 600 or so X86 servers will need at least twice as many, or even more. At an average annual salary of 200,000 yuan, each additional engineer costs 1 million yuan over 5 years, so 3 more engineers cost 3 million yuan. In addition, the large number of extra machines will inevitably raise the cost of electricity, cooling, and server room space.
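The staffing arithmetic above can be written as a small helper. This is only a sketch of the salary line item: the function name `extra_staff_cost` is made up for illustration, and the calculation ignores raises, benefits, and the power and cooling costs mentioned above.

```python
ANNUAL_SALARY_YUAN = 200_000   # average annual salary assumed in the text
YEARS = 5                      # TCO horizon used in the text

def extra_staff_cost(extra_engineers, salary=ANNUAL_SALARY_YUAN, years=YEARS):
    """Salary cost of additional O&M engineers over the whole horizon."""
    return extra_engineers * salary * years

print(extra_staff_cost(1))   # 1000000
print(extra_staff_cost(3))   # 3000000
```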




From the development perspective, an architecture change rarely leaves the application untouched, and a complete rewrite of the application may even be necessary.




Therefore, this case is not cost-effective in terms of either performance or cost.




Although this case is a bit extreme, it reflects some real potential problems or misconceptions in the market that are worth exploring in depth.




In recent years, distributed databases have become a trend. Wreathed in halos, they look like a beacon in the dark, and they have inevitably been mythologized somewhat.




Vendors, whether willingly or under pressure, have released their own distributed database products, and the market is swarming with them. Many traditional enterprises are also testing the waters. The prevailing mood is that anyone who does not adopt a distributed database will be left behind by the times, and that not using one is decidedly low-end.




Among the enterprises testing the waters, there have been both successes and failures. Because there is no unified industry standard for distributed databases, everyone has their own opinion.




Old Fish believes there is no universal database, only the most suitable one.




Distributed databases are good, but they are not a universal "silver bullet": they suit some scenarios and not others, and they have shortcomings of their own. To understand where today's distributed databases fit best, we need to start with their historical background and the reasons for their popularity.




    Historical background




Distributed databases appeared early in the history of databases. Research began in the 1970s, and the world's first distributed database system, SDD-1, was implemented by the Computer Corporation of America (CCA) in 1979 on DEC computers.




However, distributed databases did not receive much attention until recent years, for several reasons.




Before the Internet era, enterprise data volumes were small, and traditional databases represented by "O" met most data management needs, so distributed databases had little room to prove themselves. In addition, distributed databases inevitably have some inherent shortcomings.




In the Internet era, however, faced with constantly growing data volumes and huge numbers of simultaneously online users, traditional databases struggled to support business growth. So companies, led by the Internet industry, began to explore effective solutions.




First came NoSQL, which gave up some guarantees of relational databases, such as strong consistency, in exchange for scalability in data processing. Then came NewSQL, a new class of database that combines scalability with traditional relational features, most typically represented by systems based on Google's Spanner/F1 papers.




Meanwhile, traditional databases have been trying to save themselves. The most common approach is sharding across databases and tables, but it requires substantial changes to the application, which must know where the data is stored, and it increases O&M complexity.
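To make "the application must know where the data is stored" concrete, here is a minimal sketch of hash-based shard routing. Everything named in it is hypothetical: the shard counts, the `bank_db_N` and `account_N` naming scheme, and the `locate` helper are illustrations, not any particular product's layout.

```python
import zlib

NUM_DATABASES = 4    # hypothetical number of physical databases
TABLES_PER_DB = 8    # hypothetical number of split tables per database

def locate(account_id):
    """Map a sharding key to a (database, table) pair using a stable hash."""
    total_slots = NUM_DATABASES * TABLES_PER_DB
    slot = zlib.crc32(account_id.encode("utf-8")) % total_slots
    db_index, table_index = divmod(slot, TABLES_PER_DB)
    return f"bank_db_{db_index}", f"account_{table_index}"

# The application has to resolve the location before it can build the query.
db, table = locate("6222000012345678")
query = f"SELECT balance FROM {db}.{table} WHERE account_id = %s"
print(query)
```

Every query path in the application needs this lookup, and changing `NUM_DATABASES` or `TABLES_PER_DB` remaps existing keys, which is one reason the approach adds development and O&M work.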




Thus, distributed databases follow two technical routes. The first is an open source database plus distributed middleware, such as MyCat. Its advantage is that it builds on the mature, stable features of an existing open source database; its disadvantage is that middleware is ultimately a workaround and is limited by the capabilities of the underlying stand-alone database.




The other route is the native distributed database, which is designed for a distributed architecture from the outset.




At present, the general consensus is that if the data volume has not reached a certain scale, and there are no ultra-high peaks or high-concurrency scenarios, there is no need for a distributed database. Most likely you will gain no obvious advantage, and you will give up the scale-up path of a centralized single machine along with its simple, convenient development and O&M.




If the only goal is localization (replacing foreign products with domestic ones), some domestic relational databases can also meet the need, and there is no need to jump straight to a distributed database.




In general, the advantage of a distributed database is good scalability: data can be stored across multiple nodes, and capacity can be expanded horizontally. Multiple nodes can also execute in parallel, which improves overall throughput.




The disadvantage is the high cost of distributed transaction processing. This cost stems mainly from the heavy message traffic of two-phase commit, from increased lock contention, and from multi-replica replication and high-availability mechanisms. In addition, product maturity still needs to improve, and O&M is complicated.
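As a rough illustration of the message overhead, the sketch below counts the network messages in one textbook two-phase commit with a single coordinator. It assumes the basic protocol with no optimizations (such as presumed abort or read-only participants) and ignores log writes and retries; the function name is made up for illustration.

```python
def two_phase_commit_messages(participants):
    """Count messages for one basic 2PC round with one coordinator."""
    if participants < 1:
        raise ValueError("need at least one participant")
    prepare = participants    # phase 1: coordinator asks each node to prepare
    votes = participants      # phase 1: each node votes yes or no
    decision = participants   # phase 2: coordinator sends commit or abort
    acks = participants       # phase 2: each node acknowledges the decision
    return prepare + votes + decision + acks

for n in (1, 3, 10):
    print(n, "participants ->", two_phase_commit_messages(n), "messages")
```

A transaction that commits on a single node needs none of these inter-node messages, which is why cross-node transactions are the expensive case.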




    Common misconceptions




1、Distributed transformation is only a technical problem




Moving from a traditional centralized architecture to a distributed one is not simply a technical problem; it is a switch of the entire technology ecosystem.




A distributed database means reworking not only the database system but also the application system. Distributed databases generally do not support stored procedures and execute SQL less efficiently, and these problems can usually be solved only on the application side.




Compared with a traditional database, a distributed database raises the technical bar for both development and O&M. The technical lead at Minsheng Bank has said that it is very important to build an intelligent O&M platform as part of a distributed transformation.




Therefore, before adopting a distributed database, it is necessary to plan the whole effort and be fully prepared in terms of funding, environment changes, staff skills, management automation, and technical reserves.




2、Distributed transformation will reduce TCO




Distributed databases follow two technical routes: open source database plus distributed middleware, and native distributed database. Because the companies developing these products are mostly Internet companies and startups, and the products are generally built on MySQL, many people assume that deploying a distributed database will cut software procurement costs, and that replacing RISC with X86 will cut hardware unit prices significantly, so TCO will come down. In reality, this may not be the case.




The case at the beginning of this article is one example. That case is admittedly a bit extreme, so here is another.




Consider a national bank's credit card system. The original database ran on 4 minicomputers; after the distributed transformation, it needed 120 database servers. Hardware and software procurement costs fell by 50%, but O&M staff grew by 66% and developers grew fivefold, so the 5-year total cost of ownership came out more than 60% higher than before. The procurement savings did not cover the later increase in O&M and development costs.
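The case does not say how the original TCO was split between procurement, O&M, and development, so the sketch below plugs in purely hypothetical shares to show how a 50% procurement saving can still produce a higher total. The multipliers come from the case, reading "5 times" as a fivefold headcount and assuming staff cost scales with headcount; the baseline shares are assumptions, not data from the bank.

```python
# Hypothetical shares of the original 5-year TCO (they sum to 1.0).
BASELINE_SHARES = {"procurement": 0.50, "operations": 0.30, "development": 0.20}

# Multipliers taken from the credit card case above.
MULTIPLIERS = {"procurement": 0.50, "operations": 1.66, "development": 5.00}

def tco_ratio(shares, multipliers):
    """Return the new TCO divided by the original TCO."""
    return sum(shares[item] * multipliers[item] for item in shares)

ratio = tco_ratio(BASELINE_SHARES, MULTIPLIERS)
print(f"new TCO is {ratio - 1:.1%} higher than the original")
```

With these assumed shares the total rises by roughly three quarters, which is in line with the "more than 60% higher" reported in the case; a different baseline split would give a different figure.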




In this case, the distributed transformation reduced only the initial acquisition cost (TCA); the total cost of ownership (TCO) did not go down.




3、Ignoring the potential of the existing system and blindly copying the Internet model




There is a saying that "technology follows business", and the evolution and upgrade of IT architecture needs to be synchronized with business transformation.




Traditional enterprises and Internet companies are fundamentally different businesses. Internet companies have three distinctive features: massive user bases, high-frequency user access and transactions, and high-frequency business innovation. Douyin, Kuaishou, and Toutiao, for example, can each gain tens of millions of users within a year, and each user logs in several times a day. Their IT infrastructure must therefore put scalability and flexibility first.




The core business of a traditional enterprise is relatively stable: the number of users is limited, transaction frequency is not high, developers are few, and IT spending is low, so its demands on IT are fundamentally different from those of Internet companies. Such enterprises can hardly bear the economic cost and technical risk of a distributed transformation and usually have to rely on complete solutions and services from third parties. For them, the traditional centralized database is still the best choice.




Take the case at the beginning of this article: to improve database performance, it would be enough to upgrade the hardware platform from two-way or four-way servers to eight-way, sixteen-way, or other large servers, which covers the vast majority of business needs. The cost may be no higher than going straight to a distributed database, and may even be lower. Domestic servers and minicomputers now come in a full range, and their prices are quite transparent. If that is still not enough, scale out with a RAC cluster; many domestic relational databases also offer RAC-like cluster expansion options.




Overall, what database to use depends entirely on the business needs.




What does the business use the database for: analytics, transactions, or both? What kind of data does it need to process? What are the performance requirements?




If it is a traditional core business system that deals with "money", such as ERP, CRM, or finance, and needs transactional integrity and ACID transactions, then the traditional centralized relational database is without doubt the best choice.




What kind of data does the business need to handle: structured, semi-structured, or unstructured? The answer determines the data model that must be supported. In principle, "use the database that fits the data model."




If you are storing and processing unstructured data such as images, audio, and video, a NoSQL database is the best choice. Further, if the business needs to store game data such as character information, experience, props, and friend rankings, which are generally linked to IDs (keys), a key-value database is a good choice.




How much data, what concurrent throughput, and what response times does the business need to handle? The answers determine the performance requirements for the database.




If the business involves flash sales, Spring Festival train ticket sales, or similar scenarios with ultra-high peaks and high concurrency, a distributed database is a good choice.
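The rules of thumb above can be condensed into a small decision helper. This is only a sketch of the article's guidance, not a selection tool: the `Workload` fields and the order in which the rules are checked are assumptions made for illustration, and real selection needs the cost analysis discussed earlier.

```python
from dataclasses import dataclass

@dataclass
class Workload:
    needs_acid_core: bool = False            # ERP, CRM, finance, other "money" systems
    unstructured_media: bool = False         # images, audio, video
    keyed_lookups: bool = False              # data addressed by an ID (key)
    extreme_peak_concurrency: bool = False   # flash sales, holiday ticket rushes

def suggest_database(w):
    """Return the database category the article's rules of thumb point to."""
    # Rule order is an assumption; the article does not rank the rules.
    if w.extreme_peak_concurrency:
        return "distributed database"
    if w.needs_acid_core:
        return "centralized relational database"
    if w.keyed_lookups:
        return "key-value database"
    if w.unstructured_media:
        return "NoSQL database"
    # Without scale or peak pressure, the centralized option remains the default.
    return "centralized relational database"

print(suggest_database(Workload(needs_acid_core=True)))
print(suggest_database(Workload(extreme_peak_concurrency=True)))
```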




To sum up, the debate over architecture and database selection has always existed, but the core point must be clear: "business needs drive technological innovation." Analyze architecture and distributed database choices rationally; choosing the architecture and database that best fit the business scenario is what matters most.

