In 2007 I worked for a company which provided data warehousing solutions. Our biggest client database was about 2-3 TB which was not so much even five years ago, but because of extremely complicated ETL and complex reports we had a lot of troubles.
We spent hundreds of hours monthly on code optimization, but every 30 minutes saved on ETL and processing were compensated with monthly data growth and we had to start over and over.
Finally it comes the time when we couldn’t improve anything else in programming code and company decided to drastically upgrade the hardware. But extremely expensive servers and disk storages gives us only several percent performance improvement. We thought about migration to another platform but preliminary estimations showed that every traditional RDBMS has the same bottleneck – disk storage. Maybe, if it was happening today, we probably would turn to fundamentally new technologies.
The whole software development industry is highly innovation-oriented, but RDMBSs always are perceived as something very conservative. I still have record about ANSI-92 SQL knowledge in my CV and almost all fundamental works from 70s are up to date, but in spite of that data management software industry is very fast developing and great new products comes to market each year.
Our blog already contains articles written by Dale Anderson about NoSQL and column-oriented databases, now it’s time to talk about in-memory databases.
IMDB – It’s Not About Movies
In-memory database (IMDB) is a database management system that stores data entirely in main memory. It is contrasted with database management systems which employ a disk storage mechanism.
The idea is not so fresh. Actually, the first IMDB was created in 1991 (to they are even older than ANSI-92 from my CV), but in twenty years something has changed.
Comparing with 1991 RAM become 10000 cheaper and way faster. On this evidence, in-memory databases systems (which for a long time were used primary in RTOSs and other specific embedded systems) now can serve as a database for e-commerce, social networking and of course business intelligence solutions. You can keep terabytes of data in main memory – it’s a low end of Big Data scale.
So, now IMDBs still don’t have a big market share, but last year’s industry’s major companies pay more attention to this technology. One of them, SAP AG, recently acquired Sybase, in 2010 released in-memory database called HANA, which is now in front and center of SAP’s agenda according to several industry experts. And it is the main subject of my post.
What Problems Does SAP HANA Solve?
What is SAP HANA, Anyway?
First of all, it is an in-memory database, with all corresponding benefits and disadvantages. As for benefits, obviously, working with all the data in memory is way faster than writing to and reading from file system (in certain circumstances – 100x or even 1000x times). Well, regular memory-mapped files are very fast too, but in-memory database is still a database. HANA is full ACID-compliant, supports multi-user access, high level data definition language, several programming interfaces, SQL and MDX, stored procedures and triggers, and more other traditional RDBMs tools and mechanisms.
The common problem with ACID for most IMDBs is “D”. Main memory is volatile storage and loses its content when it is out of power. To avoid this HANA has Persistence Level component. Each committed transaction generates a log entry that is written to non-volatile storage. HANA stores changed pages in save points, which are asynchronously written to persistent storage in regular intervals (by default every 5 minutes). Persistence level is also used for whole database backup and restore.
I am not going to describe here whole IMDBs architecture and common principles, so let’s focus on specific HANA features:
- HANA comes shipped as a pre-configured appliance from hardware vendor and the license is bought from SAP.
- HANA is positioned as single system for both OLTP and OLAP purpose and supports both columnar and row-based storage organization, so you can use benefits of each approach simultaneously. HANA even allows joining row-based tables with column-based tables. You can also alter an existing table from row-base to columnar and vice versa.
- HANA supports parallel execution.
- HANA is fully integrated with SAP business objects and other client tools like MS Excel, Dashboard Design Tool etc can also access HANA directly.
- HANA supports ODBC, JDBC and ODBO drivers.
- HANA has its own scripting language named SQLScript that is designed to enable optimizations and parallelization and is a collection of data, functional and procedural extensions.
I can continue the list, but maybe it would be better just to give you some useful links:
- http://help.sap.com/hana/hana_dev_en.pdf – SAP HANA Database Development Guide.
- HANA – Frequently Asked Questions
- http://www.saphana.com/docs/DOC-1647 – Performance Test Whitepaper.
- http://scn.sap.com/community/hana-in-memory/blog – The official blog community.
SAP provides 30 days trial access to a hosted HANA test and evaluation environment. Access is provided through remote desktops (by default you are offered to connect directly from web-browser but it’s much more convenient to use traditional way), so you don’t have to install anything on your desktop. You can also read a great blog about it before you start.
I am using this trial access for about a week. I’ve tried to reproduce some problems we have to solve with our client database from the beginning of my post and the results sometimes were very impressive. But of course it’s too early to make conclusions.