Creating a SQL Server Disaster Recovery Plan – Part 1

There are a few steps in creating a disaster recovery plan. First, we need to decide how often we are going to back up our database. Some points to consider are: how much time do we have to make backups, how much data (database size) do we need to back up, how long we are willing to wait for a database restore, how often the data changes, and whether we are willing to lose some work in exchange for speed. Let's look at a scenario.

In this scenario, the database we are creating a backup plan for is used by our company's sales department. The database contains customer order information, and any loss of this information would be devastating to the company. The database must give maximum performance during most hours of use. Looking at the performance logs, we can see that server utilization is lower from 2-3 AM Monday through Saturday and for most of Sunday. When testing the time needed to make backups, you note that a full backup takes 4 hours and a differential backup takes up to 30 minutes.

Before even looking at what options we have, notice that I list performance logs and the time needed to make the two types of backups. This is important information you will need to collect or estimate.

Because the data in this database is so important to the company, we need to back it up frequently and use the Full recovery model. A full backup takes 4 hours, so the only day we can make one (without impacting the system too much) is Sunday. You don't have to use differential backups, but in this scenario they fit in well: we can make a differential backup Monday through Saturday mornings, reducing the time needed to recover from a failure.

The last type of backup we have to consider is the log backup. Again, because the data in this database is so important to the company, we should perform log backups frequently throughout the day. Log backups normally don't take a whole lot of resources to make, so we can perform them without impacting the system too much. In this scenario we will back up the transaction log every 30 minutes.

Here is what the backup plan would look like without the log backups:

SUN at 12:00 AM FULL BACKUP

MON THROUGH SAT 2:00 AM DIFFERENTIAL BACKUP

This plan is not perfect; it has its benefits and drawbacks. On the plus side, this strategy only makes full/differential backups when the server is at its lowest utilization. On the down side, if the database fails late in the day, we would have to restore a long chain of transaction log backups. Also, you may find that a loss of up to 30 minutes' worth of data (the time between transaction log backups) is too much; it all depends on what you're willing to risk.
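To make the recovery side of this plan concrete, here is a hedged sketch of the restore sequence after a failure late on a Wednesday. The database name and backup file paths are hypothetical; yours will differ.

```sql
-- Hypothetical restore sequence for a Sales database (names/paths are examples).
-- 1. Restore Sunday's full backup, leaving the database ready for more restores.
RESTORE DATABASE Sales
    FROM DISK = 'E:\Backups\Sales_Full_Sun.bak'
    WITH NORECOVERY

-- 2. Restore the most recent differential (Wednesday 2:00 AM).
RESTORE DATABASE Sales
    FROM DISK = 'E:\Backups\Sales_Diff_Wed.bak'
    WITH NORECOVERY

-- 3. Restore each 30-minute log backup taken since the differential, in order.
RESTORE LOG Sales
    FROM DISK = 'E:\Backups\Sales_Log_Wed_0230.trn'
    WITH NORECOVERY
-- ...repeat for each log backup in sequence...

-- 4. The final restore brings the database back online.
RESTORE LOG Sales
    FROM DISK = 'E:\Backups\Sales_Log_Wed_1400.trn'
    WITH RECOVERY
```

Notice that every restore except the last uses WITH NORECOVERY, which tells SQL Server that more backups are still coming.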

Another thing you must keep in mind is that backup times will vary. Using our scenario as an example: because a differential backup backs up all the data that has changed since the last full backup, Friday's differential backup will probably contain more changed data than Monday's. Why? If you change the same exact data over and over again, a differential backup stays about the same size no matter whether you take it today or two weeks from now. On the other hand, if you modify or add different data, the size of the differential backup will continue to grow until you take a full backup.

So, could we improve our backup strategy? With a full backup taking 4 hours and a lot of server resources…not a whole lot. The next article in this series outlines some steps to improve the plan.

Introduction to SQL Server Disaster Recovery

Something you will start to notice as we get further into SQL Server is that most settings and options are more of a balancing act than they are right or wrong. In other words, you will need to weigh the benefits and drawbacks of using one option over another. A good example of this is the recovery models that we looked at in the last article. Some models allow for fast backups and use less disk space, but they don't provide the recovery options of the other models. Because this is an important part of your backup strategy, I would like to do a quick review of the three recovery models available with SQL Server 2000.

Simple:

  • Uses full and differential backups
  • Recommended for development only, not for production databases
  • Bulk copy operations are fast and do not require large amounts of log space
  • Once log entries are no longer needed for recovery (after a checkpoint), log file space can be reused to keep log files small
  • If a failure occurs, you can recover only up to the point of the last full or differential backup; all work done after that backup will have to be redone

Full:

  • Uses full, differential, and log backups
  • Recommended for production databases
  • Bulk copy operations must be logged row by row resulting in slower bulk operations and the requirement of more log file space
  • Log file space can’t be reclaimed until the log file is backed up
  • If a failure occurs you can recover to any point in time
  • If a data file is lost or damaged, no work is lost

Bulk_Logged:

  • Uses full, differential, and log backups
  • Recommended for production databases when you need to perform many bulk operations
  • Bulk copy operations are not logged row by row resulting in faster bulk operations and lower log file space requirements
  • Log file space can’t be reclaimed until the log file is backed up
  • If a failure occurs and no bulk operations have occurred since the last full/differential backup, you can recover to any point in time covered by a log backup. If bulk operations have occurred, you can recover only to the end of a log backup, not to a point in time within it.
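In SQL Server 2000, the recovery model is set per database with the ALTER DATABASE statement. A minimal sketch, assuming a database named Sales:

```sql
-- Switch a database between the three recovery models (one at a time).
ALTER DATABASE Sales SET RECOVERY FULL
-- ALTER DATABASE Sales SET RECOVERY BULK_LOGGED
-- ALTER DATABASE Sales SET RECOVERY SIMPLE
```

Some shops switch a production database to Bulk_Logged just before a large bulk load and back to Full immediately afterwards, taking a log backup on each side of the switch.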

SQL Server Recovery Models

Before we talk about the recovery models, let's jump back to our discussion of transaction logs from last week. Remember that all modifications made to the database are recorded in the transaction log. In the event of a failure (such as a power outage or blue screen), the transaction log can be used to reapply the changes to the database. Additionally, checkpoints are used to write all pages in memory back to the hard disk, lowering the time needed to recover the database. So once a checkpoint occurs and all the data pages are written to disk, why do we need to store information about transactions? This is where the recovery model plays a role.

Each database running on a SQL Server can have one of three recovery models: Full, Bulk_Logged, and Simple.

In the full recovery model, every change made to the database is logged: all UPDATE, DELETE, and INSERT statements. Additionally, bulk operations such as BULK INSERT, which are used to make many modifications quickly, are also logged in their entirety (i.e., each individual row added by the BULK INSERT command is logged).

The full recovery model provides the most options in the event a data file is corrupted. When a transaction is logged and the database is in full recovery mode, the transaction stays in the log until the log is backed up. Once the log is backed up, the space used by the old transactions is freed and can then be used to log new transactions. Because all transactions are backed up, the full recovery model makes it possible to restore to a "point in time" by applying only the transactions up to that point. For example, we could restore a full backup and then restore all our log backups up to a point just before data was deleted.

If the full recovery model tracks all changes made to the database and allows us to restore transactions to a point in time, why not just use it all the time? Because all operations are logged in their entirety, you can end up with some very large log files. Also, commands like BULK INSERT will be slowed down because every modification must be logged.

The bulk_logged recovery model is a lot like the full recovery model, with a few benefits and tradeoffs. Like the full recovery model, the bulk_logged recovery model logs all UPDATE, DELETE, and INSERT statements. However, bulk_logged only records that the operation took place for certain commands. These commands include BULK INSERT, bcp, CREATE INDEX, SELECT INTO, WRITETEXT, and UPDATETEXT. The bulk_logged recovery model is also like the full recovery model in that it does not reuse (or overwrite) log space until the transactions have been backed up.

Unlike the full recovery model, if a transaction log includes bulk operations you cannot restore that log to a point in time; you must restore to the end of the log. Also, a log backup of the database may be much larger, because in the bulk_logged recovery model log backups must also copy all the extents that the bulk operations changed.

The benefit of the bulk_logged model is that the log file(s) for the database can be smaller if you use many bulk operations. Also, bulk operations are much faster because only the fact that the operation occurred needs to be recorded, not every modification to the database.

The last recovery model is the simple recovery model. Unlike the full and bulk_logged recovery models, simple recovery does not back up transaction logs. In this mode the transaction log is frequently truncated automatically (truncation is the process of removing old transactions from the log). The simple recovery model can use full and differential backups.
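You can check which recovery model a database is currently using with the DATABASEPROPERTYEX function (the database name here is just an example):

```sql
-- Returns FULL, BULK_LOGGED, or SIMPLE for the named database
SELECT DATABASEPROPERTYEX('Sales', 'Recovery') AS RecoveryModel
```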

SQL Server Database Backups

One of the most important tasks you will ever perform as a DBA is making backups. Although backups are certainly not the most interesting part of the job, they are probably the most important. If something goes wrong, it's the DBA's job to get the server back up and running as quickly as possible. Loss of productivity, or even worse, loss of data, can be very expensive for a company. Let's look at the most common question I get when talking about backups.

Why not just use a RAID configuration with mirroring to provide protection? RAID is certainly the first line of defense against data loss. Depending on the RAID configuration you use, one or even several hard drives can fail before any data is lost. Additionally, hot-swappable and hot-standby drives can allow the server to continue running, without ever being taken offline, when a hard drive fails. The key thing to notice here is that RAID can protect you if a hard drive fails...but what happens if a fire or natural disaster occurs? What do you do if your database files become corrupted due to hardware or software errors? Or what happens if a user deletes data that they later need? A RAID configuration will not help you if any of these events occurs.

Remember…You can always replace the server but the data on that server is extremely difficult, if not impossible, to recover.

Database Backup Types

SQL Server has three main backup types: Full, Differential, and Log. In addition to the main three, which back up the whole database, there are also a few additional backup types that can be used to back up a single file or a filegroup.

Full Backups – a full backup copies all the extents from the database. If you need to restore your database and are using full backups, you only need to apply the last full backup. However, full backups are the slowest of all the backup types to make.

Differential Backups – a differential backup only backs up the extents that have changed since the last full backup. If you need to restore your database and are using differential backups, you will need to apply the last full backup and then the last differential backup you made. Differential backups are faster to create, but the restore takes longer than restoring a full backup alone because you must apply the full backup and then the differential backup.

Log Backups – a log backup backs up the transaction log from the point of the last full or log backup. Whether you can make log backups depends on the recovery model you are using. If you need to restore your database and you are using full and log backups, you would need to restore the last full backup and then all the transaction log backups, in order.
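The three backup types map directly onto T-SQL BACKUP statements. A minimal sketch, assuming a database named Sales and backup files on a local drive (both names are hypothetical):

```sql
-- Full backup: copies all the extents in the database.
BACKUP DATABASE Sales
    TO DISK = 'E:\Backups\Sales_Full.bak'

-- Differential backup: only the extents changed since the last full backup.
BACKUP DATABASE Sales
    TO DISK = 'E:\Backups\Sales_Diff.bak'
    WITH DIFFERENTIAL

-- Log backup: the transaction log since the last log (or first full) backup.
-- Not available under the Simple recovery model.
BACKUP LOG Sales
    TO DISK = 'E:\Backups\Sales_Log.trn'
```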

One thing to note about backups is that they are done with the database online. This is called a “fuzzy backup” because the backup is not from a single point in time. The backup copies extents from the database and if any modifications are made, the backup just continues copying. To keep consistency, full and differential backups capture the portion of the log file from the point the backup started to the end of the backup.

SQL Server can backup to a file on your hard drive, a file on a network drive, a tape drive, or a named pipe.

Introduction to SQL Server Transaction Logs

In my last article I gave a brief description of what the transaction log is. However, we did not go over what a transaction is. The simplest definition of a transaction is a "logical unit of work." In other words, it is a grouping of SQL statements that perform one logical task. The classic example used to describe transactions is a bank transfer. Say you have two bank accounts (Account A and Account B) and you want to transfer funds from one to the other. If you execute the two SQL statements individually (one to remove the funds from Account A and one to add funds to Account B), it is possible for the first statement to remove funds from Account A successfully but for the second statement to fail and never add the funds to Account B. In the real world this would make for some very unhappy customers. To overcome this problem we can group statements into transactions. When statements are grouped into a transaction, either they all complete successfully and are "committed," or, if any one of the statements fails, they are all "rolled back" and the data is left in the condition it was in before the transaction began. Until we get into data integrity, this definition will do.
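The bank transfer could be sketched in T-SQL like this. The table and column names are made up for the example; in SQL Server 2000 we check @@ERROR after each statement:

```sql
-- Transfer $100 from Account A to Account B as one logical unit of work.
BEGIN TRANSACTION

UPDATE Accounts SET Balance = Balance - 100 WHERE AccountID = 'A'
IF @@ERROR <> 0
BEGIN
    ROLLBACK TRANSACTION  -- undo everything; Account A keeps its funds
    RETURN
END

UPDATE Accounts SET Balance = Balance + 100 WHERE AccountID = 'B'
IF @@ERROR <> 0
BEGIN
    ROLLBACK TRANSACTION  -- the withdrawal from A is undone as well
    RETURN
END

COMMIT TRANSACTION  -- both changes succeed, or neither does
```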

The transaction log stores all changes as they are made to the database. Once all the statements in the transaction have made their modifications and the commit is recorded in the transaction log, the transaction is said to be committed. But once the transaction is committed, why does it need to stay in the log? To understand, you need to know what is going on under the hood of SQL Server.

As data is requested from a database, the information is stored in memory. Because accessing memory is so much faster than accessing the disk subsystem, SQL Server can improve performance dramatically.

First request for data from a database (request is made, pages are loaded into memory, data is sent to the client).

Request for data that is already in memory (pages are cached in memory, so they are used).

When you make a modification to the data in a database, SQL Server makes the changes to the data stored in memory (not on the disk) and an entry is made in the transaction log. Pages in memory that contain data that has been changed are called “dirty pages.”

In order to have pages written back to disk and release memory, SQL Server uses a thread called the lazywriter. When the lazywriter writes pages to disk, it uses the least recently used (LRU) page list to determine which pages should be written. Pages that have not been used recently are at the beginning of the LRU list, while recently used pages are at the end. Because of this, very heavily accessed pages may not be written to disk for a long time (or, in theory, ever).

Because pages that have been recently changed are only stored in memory, they are susceptible to system crashes and power outages. To overcome this problem, SQL Server can use the log information to restore the modifications to the database if one of these events occurs. For example, if the power went out (and our UPS failed!) we would lose the updated pages stored in memory. In order to recover these changes, SQL Server starts up in recovery mode. By using the transaction log, SQL Server can reapply the changes to the copy of the database stored on the hard drive.

SQL and T-SQL

Structured Query Language, also known as SQL, is a query and programming language. It can be used for accessing, updating, deleting, and adding data in a database. SQL can also be used for managing the RDBMS (Relational Database Management System) itself. Different databases may use versions of SQL that vary slightly, but most comply with the standard ANSI SQL-92 implementation, commonly called ANSI SQL. You can group SQL statements into two main categories: Data Definition Language (or DDL) and Data Manipulation Language (or DML).

DDL statements, as the name suggests, allow you to define the database structure. Most DDL statements begin with CREATE, ALTER, or DROP. The two DDL statements we are going to cover today are CREATE DATABASE (used for creating new databases) and ALTER DATABASE (used for altering existing databases). We will look at the exact syntax of these two statements later on in this article.

DML statements, on the other hand, are used for manipulating the data inside database objects. For example, the SELECT statement allows you to query the data inside a database, the INSERT statement allows for the addition of new data, the UPDATE statement updates selected data, and the DELETE statement allows you to remove data. As this series progresses we will cover these statements as well as many more DDL and DML statements in greater detail.
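As a quick taste of DML, here are the four statements against a hypothetical Customers table:

```sql
-- Query existing rows
SELECT CustomerID, Name FROM Customers WHERE City = 'Boston'

-- Add a new row
INSERT INTO Customers (CustomerID, Name, City)
VALUES (42, 'Acme Corp', 'Boston')

-- Change existing data
UPDATE Customers SET City = 'Chicago' WHERE CustomerID = 42

-- Remove data
DELETE FROM Customers WHERE CustomerID = 42
```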

We now know what SQL is, but what is T-SQL? Simply, T-SQL is SQL Server’s enhanced version of the standard SQL programming language. T-SQL in SQL Server 2000 allows for such things as stored procedures, IF and WHILE statements, and additional functions/data types (we will cover data types when we start creating tables) that are not available in standard SQL.

Creating a SQL Server Database

Ok, enough with that stuff...let's create our first SQL Server 2000 database! Start by opening Enterprise Manager and expanding the tree view until you see the "Databases" folder of your server.

Right-click the Databases folder and select "New Database…"

In the Name textbox enter “FirstDB”

One thing to note on this screen is the ability to set the collation for this database other than the server default. If you don’t remember what the collation is, go back and look over my “Basic Installations” article. For this example we will leave it as the server default.

Click the “Data Files” tab.

The top portion of the Data Files tab shows you all the data files that make up this database. As you can see, Enterprise Manager has already added the Primary data file for us. The logical file name is FirstDB_Data and the physical file is named FirstDB_Data.MDF, located in the Data folder that was specified when we installed SQL Server. The logical file name is the name that SQL Server uses internally to reference the file, whereas the physical file name is the name of the operating system file.

In addition to the logical file name and physical name/location, we also can set the Initial size for each file. If you know you will be loading data into your database and have a general idea of its size you can set this value to avoid automatic file growth (more on this in a second). The last column allows the selection and creation of file groups.

The lower half of the Data Files tab allows you to set the Automatic file growth for each file that makes up the database. When a database has used all available space in a data file, you can have SQL Server automatically expand the file as needed. The File Growth option allows you to set how much SQL Server will expand the data file. You can select a fixed amount in megabytes or enter a percentage of the current file size to grow the file by. You can also set a maximum file size for the data file. Once a file reaches its maximum size, no more automatic file growth will take place.

As a general rule, it is best to have files expand as few times as possible because expanding files causes a performance hit to SQL Server.
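The same growth settings can be changed later in T-SQL with ALTER DATABASE ... MODIFY FILE. A sketch using our example database's logical file name (the specific sizes are illustrative only):

```sql
-- Give the data file a larger size up front and a fixed growth increment,
-- so it expands as few times as possible.
ALTER DATABASE FirstDB
MODIFY FILE (
    NAME = FirstDB_Data,   -- logical file name
    SIZE = 100MB,          -- expand the file now, in one operation
    FILEGROWTH = 25MB,     -- fixed increment instead of a percentage
    MAXSIZE = 500MB        -- cap automatic growth
)
```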

Click the “Transaction Log” tab.

The options on the Transaction Log tab are very similar to those on the Data Files tab, the only exception being the lack of a File Group option. Filegroups are only used for data files, not log files. All other options on the Transaction Log tab are the same as on the Data Files tab.

Click OK to accept the defaults.

Our new database now appears in the Databases folder. If you go exploring in the database we have just created, you will notice it looks a lot like the model database. This is because every new database starts as a copy of the model database. All the system objects, and any other objects we create in the model database, are automatically added to each new database we create.

Another way to create a database is by using the Create Database Wizard. To start the Create Database Wizard, in Enterprise Manager Select “Wizards…” from the “Tools” menu.

Then expand “Database”, select “Create Database Wizard”, and click OK.

The wizard then walks you through the steps to create a database.
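The same FirstDB database could also be created with the CREATE DATABASE statement instead of Enterprise Manager. A sketch; the file paths assume a default installation and will likely differ on your server:

```sql
CREATE DATABASE FirstDB
ON PRIMARY (
    NAME = FirstDB_Data,   -- logical file name
    FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL\Data\FirstDB_Data.MDF',
    SIZE = 1MB,
    FILEGROWTH = 10%
)
LOG ON (
    NAME = FirstDB_Log,
    FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL\Data\FirstDB_Log.LDF',
    SIZE = 1MB,
    FILEGROWTH = 10%
)
```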

SQL Server Database Filegroups

In order to manage data files they need to be placed into “filegroups.” By placing files into filegroups, we can control how data is spread over multiple files and we can also direct specific tables to specific files or sets of files.

When SQL Server allocates extents to tables, it does so by the proportional amount of the free space in all files in a filegroup. For example, let’s say we had a filegroup that was made up of one file that had 50 MB free and another file that had 100 MB free. For every extent that was allocated from the first file, two extents would be allocated from the second file. Therefore the two data files would fill up at approximately the same time.

Filegroups also allow control over which files a table will be stored in. For example, say we had 4 hard disks with 1 secondary data file on each disk. In our database, let's say we have two large tables that we need to perform frequent join operations on. In order to get the best performance we can set up two filegroups, each containing two of the four data files. Next we can place each table in one of the filegroups to give us maximum performance. Note that we will see how to place tables and indexes in specific filegroups when we start creating tables in an upcoming article.

There are a few specific filegroups we need to know about before we start creating our database. First, the Primary data file is always created as part of the “Primary filegroup” and it can’t be removed from this group. The Primary filegroup contains all the system objects (system objects are objects that store information about the database) in the database. The system objects stored in the Primary filegroup can’t be removed from this group.

When you create a new object it is automatically placed into the “Default” filegroup unless you specify otherwise. The Default filegroup can be set to any group you wish, but by default, the Default filegroup is set to the Primary filegroup (got that?). For example, if you created a new group called “MainStore” and set it as the Default filegroup, any new tables you create would automatically be placed into the “MainStore” filegroup.
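The MainStore example could be sketched in T-SQL like this (the file name and path are hypothetical):

```sql
-- Add a new filegroup to the database
ALTER DATABASE FirstDB ADD FILEGROUP MainStore

-- Add a secondary data file to the new filegroup
ALTER DATABASE FirstDB
ADD FILE (
    NAME = FirstDB_Main1,
    FILENAME = 'D:\Data\FirstDB_Main1.NDF',
    SIZE = 10MB
) TO FILEGROUP MainStore

-- Make MainStore the Default filegroup; new objects now land there
ALTER DATABASE FirstDB MODIFY FILEGROUP MainStore DEFAULT
```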

We will come back to the concept of filegroups when we start creating objects. I will also show you how to perform tasks such as moving objects between filegroups then. So if you don't have a total grasp on how filegroups fit in...don't worry. In a later article we will also take a more in-depth look at using files and filegroups to optimize database performance.

SQL Server Database Files

Databases in SQL Server 2000 have an underlying data storage structure made up of two or more system files. Each database must have at least one data file, used for storing database objects (tables, views, stored procedures, etc.), and one log file, used for storing transaction information. In addition, each database must have its own files; you can't share files between databases.

Data files store objects and the data inside objects. The first data file you create is known as the “Primary data file.” If you need to create additional data files for your database, they must be created as “Secondary data files.”

Log files, on the other hand, are used to track changes in the database and have only one file type. In the event of a system failure, the log files can be used to repair the database. We will look at how SQL Server tracks database modifications when we start talking about transactions.

The following list summarizes the three file types.

  • Primary data file: Each database must have one, and only one, Primary data file. This file type stores the system tables and data, and also keeps track of the other files that make up the database. Primary data files use the extension .MDF
  • Secondary data files: A database can have one or more Secondary data files, but none are required. This file type stores data and uses the extension .NDF
  • Transaction log files: Each database must have one or more Transaction log files. This file type is used to store Transaction information and can be used to recover the database. Transaction log files use the extension .LDF

A simple database may have all its data stored in the Primary data file and may only have one log file. On the other hand, a more complex database may have the Primary data file, three Secondary data files, and two log files. By using multiple files a more complex database can spread its load across as many physical disks as possible.

Now that we know about the files that makeup our database, let’s look at how our data is being stored inside the files.

When data is stored, it is placed in an 8-kilobyte contiguous block of disk space known as a page. Once you remove the overhead needed to manage the page, 8060 bytes are left for storing data. This brings up an important point: rows can't span multiple pages, so the maximum size of a row is 8060 bytes.
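You can see the 8060-byte limit in action. If you create a table whose maximum possible row size exceeds it, SQL Server 2000 creates the table but issues a warning; the table name here is made up for the example:

```sql
-- The maximum row size here is roughly 2 x 8000 bytes plus overhead,
-- well over the 8060-byte limit. The CREATE succeeds with a warning,
-- but any INSERT producing a row longer than 8060 bytes will fail.
CREATE TABLE WideRowDemo (
    Col1 varchar(8000),
    Col2 varchar(8000)
)
```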

To manage the pages in a data file, pages are grouped into “extents.” Extents are made up of a grouping of 8 contiguous data pages (64 kilobytes). Extents can then be broken into two categories: mixed extents, which contain data from up to eight objects in the database, or uniform extents, which contain data from only one object in the database. When you create a new object it is allocated space from a mixed extent. Once a table has grown larger than eight data pages it is from then on allocated uniform extents. By using mixed and uniform extents SQL Server can save disk space when an object is very small.

SQL Server System Databases

When you install SQL Server, several databases are created automatically. Two of them (Northwind and pubs) are sample user databases; you can use them as examples or delete them safely (they don't take up much hard disk space either way). The other four are special system databases (master, model, msdb, and tempdb) that are used by the system and should not be removed. To view the databases on your SQL Server, in Enterprise Manager expand the group your server is located in, then expand the server, and last expand the Databases folder. Remember, if you have the "Show system databases and system objects" checkbox unchecked in the SQL Server's registration properties, you will be unable to see the system databases in Enterprise Manager.

master Database – Contains information about the server overall and all databases created. The information in this database includes: file allocations, disk space usage, system-wide settings, user accounts, and the existence of other databases and other SQL Servers.

model Database – This database provides a template for all newly created databases. If you ever need to add a table or stored procedures to all the new databases you create, simply modify the model database.

msdb Database – Used by the SQL Server Agent service to store information such as scheduled jobs, alerts, and operators.

tempdb Database – A workspace used for holding temporary information.
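The model database's template behavior is easy to see for yourself: any object added to model shows up in every database created afterwards. A sketch (the table and database names are made up for the example):

```sql
-- Add a table to the model database...
USE model
CREATE TABLE AuditLog (LogID int, LogText varchar(255))
GO

-- ...and every database created from now on contains a copy of it.
CREATE DATABASE TemplateTest
GO
USE TemplateTest
SELECT * FROM AuditLog   -- the table already exists in the new database
```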