Implementing an SAP HANA system involves more than just selecting a standalone server. Organizations must also consider the different options available in terms of high availability (HA) and disaster recovery (DR). Depending on the organization’s service level requirements, multiple servers, storage devices, backup devices, and network devices might be required. This article gives an overview of the different ways organizations can achieve SAP HANA HA and DR. Equipped with this information, you can develop the right SAP HANA architecture and business continuity strategy more easily.
Key Concept
True SAP HANA high availability (HA) requires a scale-out configuration or system-to-system replication. System backups are critical to both the recovery and health of your SAP HANA system.
SAP HANA’s architecture has revolutionized how software, hardware, and in-memory storage can be combined to eliminate many of the data management issues that have historically plagued businesses and organizations. Introduced in 2010, it can serve as both an online transaction processing (OLTP) and online analytical processing (OLAP) engine in a single platform, which allows SAP HANA to operate an ERP system and simultaneously provide operational analytics to ERP users.
Because it holds data in-memory, data processing tasks are completed much faster when compared with relational databases that rely on disk drives for storage. SAP HANA can store data in a columnar orientation to bolster the speed of analytic queries and reduce the storage footprint of repeated column values. Because of these speed advantages, scalable real-time query access to data is a reality with SAP HANA. For many organizations, SAP HANA can meet data management needs by reducing data management complexity while simultaneously granting users actionable access to their data.
As organizations begin to adopt SAP HANA, it is important to take a moment to review its capabilities in the areas of high availability (HA) and disaster recovery (DR). If organizations fail to plan properly for SAP HANA business continuity, many of its excellent features will be overshadowed by an inability to keep the system available or to recover it during a disaster. I provide you with an overview of the different HA and DR options that are available with SAP HANA as of Support Package stack 10 (SPS10). There are many options available. As I explain these options, my hope is that you discover a solution that accommodates the service level agreements (SLAs) that have been established by your organization.
HA Overview
The term HA in general describes a system’s ability to maintain operations and virtually eliminate downtime. There are different levels of HA that are typically defined by an organization’s established SLAs. For example, if the system is required to maintain two nines (99 percent) availability, that system can only accumulate approximately 3.65 days of downtime per calendar year. That equates to only 1.68 hours per week of downtime.
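The arithmetic behind these figures is easy to reproduce for other availability targets. The following is a minimal sketch; the function name and the 365-day year are my own assumptions for illustration.

```python
HOURS_PER_WEEK = 7 * 24
DAYS_PER_YEAR = 365  # assumption: a non-leap calendar year


def downtime_budget(availability_percent: float) -> dict:
    """Return the downtime allowed by an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    unavailable_fraction = 1 - availability_percent / 100
    return {
        "days_per_year": unavailable_fraction * DAYS_PER_YEAR,
        "hours_per_week": unavailable_fraction * HOURS_PER_WEEK,
    }


if __name__ == "__main__":
    # Compare two, three, and four nines.
    for target in (99.0, 99.9, 99.99):
        budget = downtime_budget(target)
        print(f"{target}%: {budget['days_per_year']:.3f} days/year, "
              f"{budget['hours_per_week']:.3f} hours/week")
```

Running the same calculation for three or four nines shows how quickly the downtime budget shrinks, which is why the SLA should drive the choice of HA option.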
Downtime can be caused by both expected events and unexpected events. For example, you expect to restart the system or apply patches that require downtime. However, you also have to account for the unexpected. Unexpected events can include hardware failures, network outages, software bugs, poor maintenance practices, and many other types of events. Therefore, the overall goal of HA is to reduce both expected and unexpected outages by introducing layers of redundancy, developing processes to quickly identify failures, and developing methods to quickly resume the availability of SAP HANA.
To achieve HA, you need to examine many aspects of SAP HANA. First, you need to examine the hardware aspects. When you purchase certified SAP HANA hardware, you discover that SAP has very high standards. Hardware vendors are expected to certify their system designs based on these standards.
Most of these systems are equipped with redundant power supplies, disk arrays, cooling, and networking. While these items reduce common points of failure within a single-server chassis, other points of failure would require a secondary SAP HANA system. For example, to survive a CPU failure, the SAP HANA software needs to be configured across a cluster of servers that provides failover. Clustering is a technique that allows multiple server chassis to work together. SAP refers to SAP HANA clustering as scale-out. Working together in this way, the team of servers can provide both HA and scalability.
Figure 1 shows how an active SAP HANA node and one standby HANA node can be clustered, assuming they share the same certified storage layer. Note that the shared storage arrays, such as storage area network (SAN) devices and network attached storage (NAS) devices, must be certified by SAP. If there is a major failure on node 1, node 2 could reload the data from a shared disk array and resume operations. This is a simple example of how complete hardware redundancy can be achieved.

Figure 1
A simple SAP HANA cluster
Understanding the SAP HANA storage layer is very important with regard to HA and DR. While SAP HANA is an in-memory database, it still needs to maintain regular save points to persistent disk drives or disk arrays. This is a requirement because server memory is volatile. If the server were to lose power, all the data stored in random access memory (RAM) would be discarded. SAP HANA overcomes this issue by storing information simultaneously to memory and to disk.
The disk layout is divided into two primary partitions. There is the data partition and the log partition. The data partition is used to house a copy of the data stored in-memory. SAP HANA incorporates regular asynchronous automatic save points to this partition. At start-up or during restart situations, SAP HANA loads this data back into memory. The logging partition is used to house the redo logs. These logs can be used in conjunction with the data save points to rebuild the in-memory storage of data for all committed transactions.
SAP HANA provides atomicity, consistency, isolation, and durability, otherwise known as ACID compliance. This means that all committed transactions are guaranteed to be reliable and recoverable. With this in mind, you can fully expect that every committed transaction is held in-memory and is also persisted to disk: its redo log entry is written to the logging partition at commit time, and the changed data reaches the data partition at the next save point. Figure 2 shows how data in-memory is also stored to disk drives.

Figure 2
How SAP HANA saves data to disk within its data and log partition
Every five minutes, data that is stored in-memory undergoes a process that makes sure it is also saved to disk. The save point interval is configurable and can be reduced or increased as needed. Every committed transaction is instantly saved to disk within the logging partition. If the server were to lose power unexpectedly, a committed transaction can be recreated from the redo logs. Without these persistent disk processes, SAP HANA would not be able to provide reliable HA.
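To make the interplay between save points and redo logs concrete, here is a toy model. It is a sketch of the general technique only, not SAP HANA's implementation; the class and method names are hypothetical.

```python
class ToyStore:
    """Toy key/value store with a save point image and a redo log."""

    def __init__(self):
        self.memory = {}     # volatile working copy (RAM)
        self.data_area = {}  # last save point image (data partition)
        self.redo_log = []   # committed changes since the last save point

    def commit(self, key, value):
        # The redo log entry is persisted at commit time.
        self.redo_log.append((key, value))
        self.memory[key] = value

    def savepoint(self):
        # Periodic save point: persist the current in-memory image.
        # Assumption: this toy truncates the log here. A real system
        # has additional rules about when log space can be reused.
        self.data_area = dict(self.memory)
        self.redo_log.clear()

    def crash(self):
        # Power loss: everything held in RAM disappears.
        self.memory = {}

    def restart(self):
        # Load the last save point, then replay the redo log on top.
        self.memory = dict(self.data_area)
        for key, value in self.redo_log:
            self.memory[key] = value


if __name__ == "__main__":
    store = ToyStore()
    store.commit("order-1", "created")
    store.savepoint()
    store.commit("order-2", "created")  # committed after the save point
    store.crash()
    store.restart()
    print(store.memory)
```

The second order was never part of a save point, yet it survives the crash because its redo log entry was written when the transaction committed.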
To overcome software crashes and to bolster HA, SAP HANA incorporates a watchdog service that restarts any failed SAP HANA process. For example, if the SAP HANA index service were to crash, the watchdog service would attempt to restart it. In a clustered SAP HANA environment, automatic failover services are used to migrate operations from a failed server node to a standby server node. Because the storage is shared, failover nodes can reload data from the shared data and logging partitions. Neither of these operations is instantaneous or seamless, meaning that a brief outage will occur as the services or servers are recovered. However, because they are automated, these options help to greatly reduce downtime and comply with an established SLA.
SAP HANA also incorporates system replication that can be used to provide both HA and DR. As shown in Figure 3, system replication allows two independent SAP HANA systems to be configured in a master and slave configuration.

Figure 3
SAP HANA system replication
Because the two systems are independent, shared storage is not required to achieve failover. Data is replicated from the master system to the slave system in real time using a technique similar to log shipping. The data is replicated to the slave system’s memory and disk layers. SAP HANA SPS10 supports four primary methods for replication: synchronous, synchronous full sync, synchronous in-memory, and asynchronous.
Synchronous replication ensures that data has been replicated to both memory and disk on the slave system. It is an acknowledgment that the data was received but not necessarily committed. In addition, there is no guarantee that all records committed in the master system are also committed to the slave system. Synchronous, with the full sync option enabled, guarantees that data has been replicated. With this option the master system does not consider a transaction committed until it has been acknowledged and committed on the slave system.
While this seems like the most ideal option, it can lead to master system data-loading performance issues if the slave system is hosted at a remote data center with limited wide area network bandwidth or frequent outages. Synchronous in-memory replication ensures that data has been replicated, but only to the memory of the secondary system. This replication option might be helpful if the slave system is experiencing disk performance issues that could lead to slow acknowledgments and replication latency.
Finally, asynchronous replication is likely the fastest option, but there are no guarantees the transactions were received or committed in the target system. It is an option that might only be wise to use if there is a secondary restore method also available. Like most SAP HANA HA options, failover is not instantaneous. For the slave host to act as the master host, a series of commands has to be executed to invoke takeover. However, because the replication was performed to both memory and disk, the takeover time is minimal. As a point of reference, when failover occurs in a clustered SAP HANA environment, data has to be reloaded from disk to memory. Depending on the volume of data, this cluster node reload process can take several minutes.
Because SAP HANA can also be virtualized, it is important to examine the HA options available from VMware. VMware is one of the leading virtualization vendors supported by SAP HANA. With virtualization, the SAP HANA software and its operating system can be deployed on a virtual server host or hypervisor. These hypervisors are generally called VMware ESXi hosts. Virtualization offers numerous benefits, but in terms of HA, it allows an SAP HANA instance to be more portable. In short, an organization can migrate SAP HANA’s software, data, and operating system to another physical server. VMware offers two features that can be helpful in providing HA.
VMware vSphere vMotion allows an administrator to migrate an SAP HANA virtual server from one physical VMware host to another VMware physical host. This migration can be performed without disruption even during normal operating hours. This is an exceptional feature when stringent SLAs are required. It allows SAP HANA hardware maintenance to occur without any disruption.
VMware vSphere High Availability is another feature that can be used to provide HA. Two physical VMware ESXi 5.5 hosts can be clustered together to manage one SAP HANA virtual instance. The virtual SAP HANA instance can physically be hosted on only one of the servers at any given time. However, in the event of a hardware failure of the primary ESXi host, VMware automatically starts the virtual SAP HANA instance on the secondary ESXi host.
To support both VMware vSphere vMotion and VMware vSphere High Availability, the ESXi hosts require SAP-certified shared storage. When a VMware HA failover occurs, there will be a brief outage as the virtualized SAP HANA instance is recovered. However, because VMware is recovering the exact same SAP HANA image, the host name, IP address, and MAC address of the system will be identical.
In contrast, native SAP HANA clustering and SAP HANA system replication rely on separate physical hosts. When either of these technologies fails over, the network information for the SAP HANA instance changes. In the event of a failover, extra time may be required to reconfigure the network and Domain Name System (DNS) records and for dependent systems to recognize the change. The SAP HANA client has an option in which a secondary failover host can be configured. This option can reduce downtime when configured correctly. The appropriate options and setup depend on your landscape and on the clients that connect to it. See two generic examples in this administrator guide: https://help.sap.com/saphelp_hanaplatform/helpdata/en/27/eddf5616ae449a8db32653a98f24e4/content.htm?frameset=/en/27/eddf5616ae449a8db32653a98f24e4/frameset.htm&current_toc=/en/00/0ca1e3486640ef8b884cdf1a050fbb/plain.htm&node_id=466.
DR Overview
DR typically refers to a plan of action in the event of a complete loss of services. The goal of the plan is to restore vital systems when a disaster occurs. HA and DR are both components of a proper business continuity plan. In many cases the HA options you incorporate can also aid in recovering a system during a disaster. For example, SAP HANA system replication can provide both HA and DR in a single solution assuming that the master and slave systems are each hosted in different data centers.
Most disasters are caused by natural or manmade events. In either case, a proper SAP HANA DR plan should be considered. There are a few practical events that can occur that lead to the need for SAP HANA DR. For example, a flood can lead to the complete loss of a data center. Users deleting data can lead to a complete loss of vital data. Disk array failures can also lead to the loss of data. With these events in mind, there are three main strategies to discuss: SAP HANA backups, off-site system replication, and off-site data replication.
In my experience, SAP HANA backups are among the least discussed and least planned activities when companies implement SAP HANA. However, SAP HANA backups are critical to both DR and the proper health of the system.
SAP HANA has two major types of backups: data backups and log segment backups. When a complete backup of the database is performed, it is typically referred to as a data backup. As of SAP HANA SPS10, there are three major types of data backups that can be performed: full backups, delta incremental backups, and delta differential backups.
A full system backup captures the complete state of all data in the SAP HANA system at a given point in time (Figure 4). When you recover from a particular full backup, the SAP HANA system contains only the data from that point in time.

Figure 4
A full data backup and restore
Depending on the volume of data stored in SAP HANA, the traditional full backup can take several minutes to complete. The size of the full backup is typically proportionate to the size of the SAP HANA data files. Ideally, the backup files are stored external to the SAP HANA system.
For example, you can mount a Network File System (NFS) share on the SAP HANA operating system. You can then create scripts to perform a full system backup and store it on this share. SAP HANA also supports full backups using third-party tools. SAP HANA contains an application program interface (API) called BACKINT that third-party backup agents can leverage.
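As an illustration of such a script, the sketch below builds a file-based BACKUP DATA statement and the hdbsql command line that would run it. The share path, the file naming scheme, and the user store key name are assumptions made for this example; verify the statement syntax against the SQL reference for your revision before you rely on it.

```python
import shlex
from datetime import datetime

# SQL keyword placed after BACKUP DATA for each data backup type.
BACKUP_KEYWORDS = {
    "full": "",
    "incremental": "INCREMENTAL ",
    "differential": "DIFFERENTIAL ",
}


def backup_statement(backup_type: str, share: str, when: datetime) -> str:
    """Build a file-based data backup statement with a timestamped prefix."""
    if backup_type not in BACKUP_KEYWORDS:
        raise ValueError(f"unknown backup type: {backup_type}")
    if "'" in share:
        raise ValueError("share path must not contain a single quote")
    # Hypothetical naming scheme: <share>/<timestamp>_<type>
    prefix = f"{share.rstrip('/')}/{when:%Y%m%d_%H%M}_{backup_type}"
    return f"BACKUP DATA {BACKUP_KEYWORDS[backup_type]}USING FILE ('{prefix}')"


def hdbsql_command(statement: str, userstore_key: str) -> list:
    """Return the argument list for hdbsql.

    -U selects a key from the secure user store, so no password
    appears on the command line.
    """
    return ["hdbsql", "-U", userstore_key, statement]


if __name__ == "__main__":
    # /mnt/hana_backup and BACKUPKEY are placeholders for this example.
    stmt = backup_statement("full", "/mnt/hana_backup", datetime.now())
    print(shlex.join(hdbsql_command(stmt, "BACKUPKEY")))
```

The script only prints the command. Passing the argument list to a scheduler or to subprocess.run is left out on purpose so that the example has no side effects.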
SAP HANA also supports storage snapshots. This is a process in which an incremental version of the SAP HANA data partition is created and stored in the SAP HANA data partition. At this time, there are no options to store the snapshots to an independent partition. SAP HANA can be restored to any storage snapshot and then the logs can be replayed to a set point in time.
An incremental system backup can be implemented to support smaller and more frequent backups of the system. Each incremental backup contains only the data changes made since the preceding data backup, whether that was the full backup or an earlier incremental backup. Ideally, you would perform a daily full backup and then perform multiple incremental system backups throughout the day (Figure 5). When restoring incremental backups, you need the full backup and each associated incremental backup. The restore process uses the full data backup and each incremental backup to restore the database to a point in time.

Figure 5
The incremental delta backup and restore process
The recovery point in time depends on which incremental backup files are used during the restore. Because each incremental file is assumed to be small, this is an excellent option to choose when backups need to be replicated off-site or over a slower network. However, the restore process can take time as it works through each incremental file set.
Differential backups are similar to incremental backups. However, each differential backup contains all the changes made since the last full backup. Figure 6 shows how each differential backup contains the same data as the prior differential backup plus any new data changes made after it. If you are performing a daily full backup and then multiple differential backups, only the full backup and a single differential backup file are needed to restore the system to a particular point in time.

Figure 6
The differential delta backup and restore process
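The difference between the two delta types is easiest to see in the set of files a restore needs. The sketch below selects that set from a list of completed backups; the Backup record and the selection logic are illustrative assumptions, not the logic SAP HANA applies internally.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Backup:
    kind: str           # "full", "incremental", or "differential"
    finished: datetime  # completion time of the backup


def restore_chain(backups, target):
    """Return the data backups needed to restore as close to target as possible."""
    usable = sorted((b for b in backups if b.finished <= target),
                    key=lambda b: b.finished)
    fulls = [b for b in usable if b.kind == "full"]
    if not fulls:
        raise ValueError("no full backup exists before the target time")
    base = fulls[-1]
    later = [b for b in usable if b.finished > base.finished]
    chain = [base]
    diffs = [b for b in later if b.kind == "differential"]
    if diffs:
        # One differential covers every change since the full backup.
        last_diff = diffs[-1]
        chain.append(last_diff)
        later = [b for b in later if b.finished > last_diff.finished]
    # Every remaining incremental is required, in order.
    chain.extend(b for b in later if b.kind == "incremental")
    return chain


if __name__ == "__main__":
    day = datetime(2015, 9, 1)
    incremental_day = [Backup("full", day.replace(hour=1))] + [
        Backup("incremental", day.replace(hour=h)) for h in (7, 13, 19)]
    differential_day = [Backup("full", day.replace(hour=1))] + [
        Backup("differential", day.replace(hour=h)) for h in (7, 13, 19)]
    target = day.replace(hour=20)
    print(len(restore_chain(incremental_day, target)), "files with incrementals")
    print(len(restore_chain(differential_day, target)), "files with differentials")
```

With the same schedule, the incremental strategy needs the full backup plus every incremental, while the differential strategy needs only the full backup and the latest differential. That is the trade-off between smaller backup files and a shorter restore.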
Regardless of the type of data backup that is performed, it is important that a daily or a series of intraday backups be performed. The backup files would ideally be stored on a file system that is different from the persistent data partition used by SAP HANA. As a secondary precaution, these files should also be stored off-site or at a secondary data center. Because SAP HANA offers both full and delta backup options in SAP HANA SPS10, a variety of off-site and on-site backup options can be devised.
Log segment backups are the second type of backup that SAP HANA performs. By default, log segment backups are completed every 15 minutes. They are stored on the local disks in the $(DIR_INSTANCE)/backup/log mount by default. Log segment backups continue to fill that small partition until they are pruned from the backup catalog and file system.
Because the standard $(DIR_INSTANCE) partition is small, it is important to configure SAP HANA to use a larger mount point to store the log segment backups. Again, it is important to determine a more suitable storage location and retention policy for the SAP HANA log segment backups. Ideally log segment backups would be stored on a network file system and only retained for a few days or weeks. They should also be pruned from the SAP HANA file systems and backup catalog when they are no longer required. This catalog is used to keep track of each SAP HANA data and log segment backup.
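A retention policy for the log segment backup files can be checked with a small script. The following sketch only reports files that are older than the retention period and deletes nothing. The directory and the retention value are assumptions, and removing the matching backup catalog entries is a separate step performed in SAP HANA itself.

```python
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def is_expired(modified: float, now: float, retention_days: int) -> bool:
    """True if a file modified at `modified` is older than the retention period."""
    return modified < now - retention_days * SECONDS_PER_DAY


def expired_log_backups(directory: str, retention_days: int, now=None):
    """List files in `directory` that fall outside the retention period."""
    now = time.time() if now is None else now
    return sorted(
        path for path in Path(directory).iterdir()
        if path.is_file() and is_expired(path.stat().st_mtime, now, retention_days)
    )


if __name__ == "__main__":
    # Placeholder location and retention; adjust to your own policy.
    log_backup_dir = Path("/mnt/hana_backup/log")
    if log_backup_dir.is_dir():
        for path in expired_log_backups(str(log_backup_dir), retention_days=14):
            print("outside retention:", path)
```

Reporting first and deleting in a second, deliberate step reduces the risk of removing a log segment backup that a point-in-time recovery still needs.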
I have outlined this process in more detail on a blog posting (https://bobj.sapbiblog.com/2013/03/20/sap-hana-backup-notes). The backup catalog is used during the restore process to locate and manage backup files. It is also backed up with the log segments. Therefore, keeping it pruned and small in size is important.
Log segment backups are important to the normal redo logging mechanism on SAP HANA. The SAP HANA redo log files cannot be reused under normal logging conditions unless all the data in the log file has been backed up to a log segment backup file. For example, if you were to disable the log segment backup mechanism on SAP HANA, eventually the redo log files would consume all the storage on the log partition. Because the log partition is sized in proportion to the RAM on the system, it does not take long for the logs to outgrow that partition. Once the log partition is full, the SAP HANA server crashes and does not restart until the log partition is expanded or the files are relocated to a larger disk. Therefore, it is extremely important that you devise a backup plan that accommodates proper log segment backups.
Log segment backups also have specific throughput requirements. SAP HANA needs to be able to write these backup files to a location that can sustain about 200-300 GB per hour write speed. This means that you may not be able to write them to tape or to a disk location located over a high-latency network connection. Based on observation, failure to comply with this can actually result in delta merge engine performance problems. In turn, this can affect data load and data query performance. Log segment backups can also be managed with BACKINT or third-party backup tools. However, the same throughput requirement applies.
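Storage throughput is usually quoted per second, so it helps to convert the hourly figure before you compare it with a measured write speed. This is a minimal sketch that assumes decimal units (1 GB = 1,000 MB); the function names are my own.

```python
def gb_per_hour_to_mb_per_s(gb_per_hour: float) -> float:
    """Convert GB per hour to MB per second (decimal units)."""
    return gb_per_hour * 1000 / 3600


def meets_log_backup_target(measured_mb_per_s: float,
                            required_gb_per_hour: float = 300) -> bool:
    """Check a measured sustained write speed against the hourly target."""
    return measured_mb_per_s >= gb_per_hour_to_mb_per_s(required_gb_per_hour)


if __name__ == "__main__":
    for target in (200, 300):
        print(f"{target} GB/hour = {gb_per_hour_to_mb_per_s(target):.1f} MB/s")
```

The range therefore corresponds to a sustained write speed of roughly 56 to 83 MB per second, which a tape drive or a high-latency network link may not deliver consistently.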
When restoring SAP HANA from backup, the system can use the log segment backups in conjunction with the data backups to restore the system to a specific point in time. Therefore, the log segment backups should be retained between full data backups to provide point-in-time recovery. For example, you could schedule a daily data backup at 1 a.m. and then configure SAP HANA to complete log segment backups every 15 minutes. In the event of a complete loss of the SAP HANA system, you can use the full backup and completed log segment backups to rebuild the data to a specific point in time.
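The schedule in this example can be turned into a rough estimate of how much work could be lost. The sketch below assumes an idealized cadence in which log segment backups complete exactly every 15 minutes, counted from the daily data backup, and are all available off-system after the failure. Real systems also back up a log segment when it fills, so treat the result as a worst case.

```python
from datetime import datetime, timedelta


def recovery_plan(failure: datetime,
                  data_backup_hour: int = 1,
                  log_interval: timedelta = timedelta(minutes=15)):
    """Return (data backup time, log backups to replay, recoverable point)."""
    base = failure.replace(hour=data_backup_hour, minute=0,
                           second=0, microsecond=0)
    if base > failure:
        # The failure happened before today's backup; use yesterday's.
        base -= timedelta(days=1)
    completed = (failure - base) // log_interval
    recoverable_to = base + completed * log_interval
    return base, completed, recoverable_to


if __name__ == "__main__":
    failure_time = datetime(2015, 9, 1, 14, 37)
    base, count, point = recovery_plan(failure_time)
    print(f"restore data backup from {base}, replay {count} log backups")
    print(f"recoverable to {point}, exposure {failure_time - point}")
```

Under these assumptions the exposure never exceeds one log backup interval, which is why the interval should be chosen with the recovery point objective in mind.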
Both the data backup and log segment backups are essential to devising a proper SAP HANA DR strategy. A proper backup plan should define a backup tape rotation policy, backup retention policy, an off-site location policy, a plan to test the recovery, and a plan to facilitate the requirements unique to SAP HANA. It is important to remember that you need to protect SAP HANA data from disasters that affect the data center and those disasters that affect the data. Without a proper SAP HANA backup in place, no DR plan can be complete.
As discussed earlier, SAP HANA system replication can also be used to facilitate DR planning. However, it should only be used for DR purposes in conjunction with a proper SAP HANA database backup plan. As shown in Figure 7, for replication to meet the requirements of DR, the slave SAP system should be hosted in an off-site location.

Figure 7
SAP HANA system replication to a secondary data center
As with most HA and DR plans, you need to have a secondary SAP HANA system to quickly recover your system. However, most DR plans host the secondary system in a secondary data center. If the system were hosted in the same data center, your plan would fail to account for a complete loss of a data center.
Many hardware vendors also offer a different type of replication. The solutions vary by vendor. Therefore, you should ask your SAP HANA hardware vendor for details on its specific solution. However, this type of hardware replication typically occurs at the network storage device layers. SAP HANA does not directly manage this type of replication, but it is a supported solution for SAP HANA DR. It can be implemented for a scale-out system or single-node SAP HANA system using supported network storage devices. Figure 8 is a generic example of how storage replication can be used to move SAP HANA data to an off-site location. As each data bit is committed to the local network storage device, it is asynchronously written to the off-site network storage device.

Figure 8
Network file system replication
In the event of a disaster, the secondary site systems can be started from the replicated file systems. In an ideal situation, the data backups and log segment backups would also be replicated. This would provide an extra layer of recovery protection in the event that the replicated data was corrupted. VMware-based solutions can also leverage a similar technique to provide seamless site recovery in the event of disaster.
With both SAP HANA system replication and network storage replication, a network failover plan should also be included. The remote systems might have different host names, IP addresses, MAC addresses, and network routes. This might require that DNS records be changed, network routing tables be updated, or that clients use different connection information to access SAP HANA. You should also consider the ramifications to the systems that rely on SAP HANA. For example, if you are running SAP Business Suite on SAP HANA, you need to make sure that the application servers are also replicated and that they know how to find the secondary SAP HANA system on the network.
SAP HANA HA and SAP HANA DR are two separate and sometimes overlapping subcomponents of a proper business continuity plan. One of the most important components of an HA plan is to make sure that you have the proper level of redundancy to achieve the expected SLA. In terms of DR, it is essential that you perform SAP HANA backups and that you store them off-site.
Jonathan Haun
Jonathan Haun, director of data and analytics at Protiviti, has more than 15 years of information technology experience and has served as a manager, developer, and administrator covering a diverse set of technologies. Over the past 10 years, he has served as a full-time SAP BusinessObjects, SAP HANA, and SAP Data Services consulting manager for Decision First Technologies. He has gained valuable experience with SAP HANA based on numerous projects and his management of the Decision First Technologies SAP HANA lab in Atlanta, GA. He holds multiple SAP HANA certifications and is a lead contributing author to the book Implementing SAP HANA. He also writes the All Things BOBJ BI blog at https://bobj.sapbiblog.com. You can follow Jonathan on Twitter @jdh2n.
You may contact the author at jonathan.haun@protiviti.com.