Exchange Message Archive Management

You remember Steve, right? That’s Steve from Sales, the guy who absolutely needed access to an email that he deleted over a year ago. You probably clearly remember the hassles of having to first find and then restore his mailbox. Perhaps you even recall sifting through his messages, trying to find that single, solitary email, and dedicating a whole afternoon to the process. Not exactly a fun experience, regardless of which way you cut it.

Thankfully, things don’t have to be this way. While there’s no question that organizations need to store and retrieve old email messages, there are much simpler ways to go about the process if your organization is running Exchange Server 2000 or 2003. I’m talking about a product called GFI MailArchiver, a message archiving solution that makes searching through internal and external email as simple as firing up your web browser and searching for the message in question.

How it works

Without a solution like GFI MailArchiver, digging up old email messages is usually a long and arduous process that involves restoring old mailboxes and searching through messages manually. While there’s no question that different techniques exist to speed up the process, most are involved and cumbersome to say the least.

SQL Server Tables Check Constraints

So just how do you add a check constraint to a table or column? As with just about everything in SQL Server, you can use SQL statements or Enterprise Manager to make the modifications. I’m going to show you how to add a constraint using Enterprise Manager, but feel free to look at the CREATE TABLE and ALTER TABLE syntax in Books Online for more information.

What we are going to do is add a check constraint, similar to the country/state one above, to the Customers table of the Northwind database. Start out by opening Enterprise Manager, select your server, expand the Databases folder, expand the Northwind database, and select Tables. Next, right-click the Customers table and select Design Table. The Design Table window should appear.

Next click the Manage Constraints icon on the toolbar.

Once the Properties window appears, click the New button and enter the following expression for Constraint expression (again, this is just a shortened state list):

Country <> 'USA' OR Region IN ('AL', 'AK', 'CA', 'FL', 'GA', 'ID', 'MT', 'NM', 'OR', 'WA', 'WY')

Last, for Constraint name, enter CK_Customers_USA_Region. Your screen should look like this.
Before clicking Close, let’s look at what the three options at the bottom do:

  • Check existing data on creation will check all existing data in the table to see if any rows violate the check constraint. Note that the check is not performed until the table is saved; if there are any invalid rows, an error is shown and the table modifications are not saved. You can leave this unchecked to apply the constraint only to new/updated rows.
  • Enforce constraint for replication will enforce this constraint if the table is replicated to another database. We will be getting into replication later in the series.
  • Enforce constraint for INSERT and UPDATE will enforce the constraint for any INSERT and UPDATE operations to the table.

Close the Properties window and save the table. If you receive the error “Unable to add constraint ‘CK_Customers_USA_Region’”, you probably have rows that were not part of the original Northwind database with a Country of USA and a Region abbreviation that is not part of the check constraint we created.

One last thing to note about creating check constraints in Enterprise Manager is that the only place you can add them is on the Check Constraints tab of the table’s properties window (see above). In other words, there is no way in Enterprise Manager to specify that a check constraint should be tied to a particular column, whereas with the CREATE TABLE statement you can add a constraint to a specific column. However, the good news is that Enterprise Manager will automatically decide whether to add a column-level or table-level check constraint based on the expression you enter.
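If you prefer T-SQL, here is a rough sketch of what the equivalent ALTER TABLE statement might look like (again using the shortened state list; omit WITH NOCHECK if you want existing rows validated, which corresponds to the “Check existing data on creation” option):

ALTER TABLE Customers WITH NOCHECK    -- skip validation of existing rows
ADD CONSTRAINT CK_Customers_USA_Region
CHECK (Country <> 'USA' OR Region IN ('AL', 'AK', 'CA', 'FL', 'GA', 'ID', 'MT', 'NM', 'OR', 'WA', 'WY'))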

SQL Server Table Logical Expressions

To put it simply, a check constraint is a Boolean expression that evaluates to TRUE or FALSE (a note about NULL in a second). When a row is inserted or updated, each check constraint is evaluated. If any of the check constraints evaluates to FALSE, the transaction is rolled back. If all of the check constraints evaluate to any value other than FALSE, the transaction continues. I say any other value because NULL introduces three-valued logic into the mix. A NULL or unknown value will not cause the transaction to fail – only a check constraint that evaluates to FALSE will.

For example, say we had a table that was used by employees to track mileage to and from job sites. We can use the following check constraint to limit the maximum number of miles to 100:

miles <= 100

This expression will return TRUE if miles (the column name) is less than or equal to 100. The <= part of the check constraint is called a comparison operator and should be familiar to anyone who has done basic programming. SQL Server supports the following comparison operators, which can be used on all data types except text, ntext, and image:

  • =   Equal to
  • >   Greater than
  • <   Less than
  • >=  Greater than or equal to
  • <=  Less than or equal to
  • <>  Not equal to
  • !=  Not equal to*
  • !<  Not less than*
  • !>  Not greater than*

* Not standard SQL comparison operators, but implemented by SQL Server.

Another type of operator you can use is the logical operator, such as AND and OR. For example, we can place further restrictions on the number of miles by using the AND operator: let’s add the requirement that each trip must be more than 0 miles:

miles > 0 AND miles <= 100

Our new check constraint now only allows miles to be in the range 1 to 100.

So far we have looked at column-level check constraints, that is, constraints that only involve one column. In addition to that kind of constraint, we can also write constraints that involve multiple columns from the *same* table and the *same* row – known as table-level check constraints. The word same in the last sentence is very important, because check constraints can’t reference columns from other tables, only the one they are created in. Additionally, a check constraint can only reference the current row in the table. For example, say we have a table that tracks customer address information with the columns country and region, and we want to add the constraint that if country is USA, region must contain a two-letter state abbreviation. For this we can use the IN operator instead of having to type an OR comparison for each state abbreviation (state abbreviation list shortened for readability):

country <> 'USA' OR region IN ('AL', 'FL', 'GA')

There are at least a half dozen other ways you can write the above constraint that will accomplish the same thing.
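To tie the two kinds of constraint together, here is a rough sketch (the table and column names are made up purely for illustration) of how a column-level and a table-level check constraint look in a CREATE TABLE statement:

CREATE TABLE MileageLog (
    TripID  int IDENTITY(1,1) PRIMARY KEY,
    miles   int NOT NULL CHECK (miles > 0 AND miles <= 100),   -- column-level: involves only this column
    country nvarchar(15),
    region  nvarchar(15),
    -- table-level: involves more than one column of the same row
    CONSTRAINT CK_MileageLog_USA_Region
        CHECK (country <> 'USA' OR region IN ('AL', 'FL', 'GA'))
)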

Although understanding the basics of logical expressions is important, writing them is really more of a database programming issue than an administration one. There are many more operators you can use in your check constraints, and I recommend checking out the topic “operator precedence” in Books Online if you are not used to writing logical expressions in SQL Server and want to learn more. Order of precedence and NULLs can cause unexpected results unless you have a good amount of experience.

SQL Server Tables and Referential Integrity

So all the tables have been added to the database, most if not all with logical relationships to one another, e.g. each Order has many Order Details, each Salesperson has many Orders, etc. (if you’re not following, I would suggest you stop here and read up on Logical Database Design)…what more do we need to do? Validate, validate, validate! Here is an example:

Say I have two tables called Order and OrderDetail. Each Order can have many OrderDetail rows. Each OrderDetail describes a single line item sold for a given Order. Now what happens if a program adds several OrderDetail rows and then the application freezes before adding the corresponding row to the Order table? Or what happens if an application deletes a row from Order without deleting all the rows from OrderDetail? In short, we end up with orphaned child rows (in the foreign key table) that have no corresponding parent (in the primary key table).

In a perfect world all applications would modify tables in the proper order. But this is not a perfect world and there are cases where situations like the above can (and will!) occur. For a small project, where the programmer is also the DBA, it is possible to diminish the odds of violating any relationships with careful application design…but what happens if a database is accessed by several dozen applications all written by multiple programmers? Or what if someone accesses the tables directly (gasp!) and makes some changes? A simple mistake could end up violating the integrity of the database with no one ever knowing.

Repeat after me: “It is the job of the RDBMS, not the end application, to ensure referential integrity.”
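In SQL Server that job is handled largely by foreign key constraints. As a rough sketch (the Order/OrderDetail column names are assumed for illustration, and Order.OrderID must already be the primary key), declaring the relationship looks something like this:

ALTER TABLE OrderDetail
ADD CONSTRAINT FK_OrderDetail_Order
FOREIGN KEY (OrderID) REFERENCES [Order] (OrderID)

Once the constraint is in place, SQL Server will reject any OrderDetail row whose OrderID has no matching parent in Order, and will refuse to delete an Order row that still has OrderDetail children.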

SQL Server Tables and Transactions

Back in More Database Creation Topics I gave an example of a transaction that moved money from one bank account to another. As the example illustrated, a transaction can prevent modification A from being made to the database unless modification B is also made. In other words, in a transaction, either all the modifications are made to the database (commit) or none of the modifications are made (roll back). Let’s take our definition of a transaction and expand on it.

The one line definition of a transaction is “a sequence of operations performed as a single logical unit of work.” To be considered a “transaction,” a single logical unit of work must exhibit the ACID (Atomicity, Consistency, Isolation, and Durability) properties:

Atomicity – A transaction must be an atomic unit of work where either all of its modifications are made, or none of them are made.

Consistency – When finished, a transaction must leave all data in a consistent state. In this case “consistent state” means that all rules, or constraints, are applied to a transaction’s modifications.

Isolation – Changes made by simultaneous transactions must be isolated from the changes made by any and all other simultaneous transactions. In other words, a transaction will either “see” the data in the state it was in before another simultaneous transaction modified it, or it will see the data after the second transaction has completed, but it will not see an intermediate state. This is known as serializability because it is possible to load the data back to a starting point, “replay” the transactions, and end up with data in the same state as it was after the original transactions were made.

Durability – Once a transaction is committed, the data modifications are permanent and can withstand system failure.

The good news is that from a physical/technical standpoint SQL Server does all of the work of implementing the ACID properties. However, the logic behind what is considered a transaction, and whether a modification is or is not valid, is the job of the database administrator and/or programmer.
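For instance, here is a minimal sketch of the bank-account transfer written as an explicit transaction (the Accounts table, column names, and account IDs are made up for illustration); either both UPDATE statements are committed together or neither is:

BEGIN TRANSACTION

UPDATE Accounts SET Balance = Balance - 100 WHERE AccountID = 'A'   -- modification A
IF @@ERROR <> 0
BEGIN
    ROLLBACK TRANSACTION   -- undo everything and stop
    RETURN
END

UPDATE Accounts SET Balance = Balance + 100 WHERE AccountID = 'B'   -- modification B
IF @@ERROR <> 0
BEGIN
    ROLLBACK TRANSACTION
    RETURN
END

COMMIT TRANSACTION         -- make both changes permanent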

Because choosing when a transaction begins and ends is more of a SQL Server programming issue than an administration issue, it will not be covered. Instead, the rest of this article is going to focus on the administration side: how to implement data validation.

There are many types of constraints that we can use to control the validation of data, including some that we have already used, such as Allow Nulls and Data Types. There are, however, many more that we will look at over the next few articles. The rest of this article will focus on setting up foreign key constraints, with check constraints following in my next article.

Creating SQL Server Tables

Once you have a logical design to work from, the process of creating your tables is quite simple. Start by opening Enterprise Manager, expand the server and database you would like to add a table to, then right-click “Tables” and select “New Table…” from the menu.

The New Table window appears.

Each row represents an individual column of the table being created/edited.

“Column Name” is used to enter the name of the column.

“Data Type” sets the type of data the column will contain.

“Length” specifies the length (or ‘n’) of the column. For some data types, such as char and varchar, this value can be changed and specifies the maximum number of characters the column will store. For other data types, such as int and decimal, that don’t have a length, “Length” shows the size (in bytes) the column will need.

“Allow Nulls” determines whether a particular column can be left blank. A check indicates nulls are allowed in that column. That is, if a row is added/edited, the row is not required to have a value in that field. If there is no checkmark in Allow Nulls, any rows that are added or edited are required to have a value entered for that field. — Remember: 1, 0, and NULL (or ‘ABC’, ‘   ‘, and NULL) are three different things. For example, while 0 and NULL may seem similar, 0 typically indicates “False” while NULL indicates “No Value.” It is important to keep this concept in mind; it will have implications for us further down the road.
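If the distinction still feels abstract, this tiny sketch (a throwaway table, purely for illustration) shows the difference in practice:

CREATE TABLE NullDemo (Val varchar(10) NULL)   -- hypothetical table that allows NULLs

INSERT INTO NullDemo (Val) VALUES ('0')        -- a value
INSERT INTO NullDemo (Val) VALUES ('')         -- an empty string is still a value
INSERT INTO NullDemo (Val) VALUES (NULL)       -- no value at all

SELECT * FROM NullDemo WHERE Val = ''          -- returns only the empty-string row; the NULL row does not match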

A table that holds employee data may look something like this.

There are a few more things to note here.

First, if you have ever used Access before, you should recognize the key icon to the left of the EmployeeID row. This indicates which column(s) of the table make up the primary key (i.e. the column(s) used to uniquely identify every row in the table). To set the primary key, select the appropriate row(s) in the designer (you can hold down the Ctrl key to select more than one row at a time) and then click the “Set primary key” icon in the toolbar.

Next, as you move from row to row, you will notice that the available options on the “Columns” tab change. Let’s take a look at these options individually:

Description – is a space for you to enter any comments about what the column is used for. Anything entered in this box will have no effect on SQL Server or the table.

Default Value – is the value that will be entered if no other value is specified when a row is added to the table. For example, I could set the default value of the Title column to “staff.” Each time an employee gets added and no title is entered, “staff” will automatically be substituted in place of NULL. In addition, you are not limited to just text for the default value; you can also use functions. Some common ones include GETDATE(), which returns the current system date/time, and NEWID(), which returns a new globally unique identifier. Note that if you set the column’s “Identity” property to yes, the default value is unavailable.
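As a rough T-SQL sketch (the table and column names here are assumptions, just for illustration), defaults can also be declared right in the column definition:

CREATE TABLE Employee2 (
    EmployeeID int IDENTITY(1,1) PRIMARY KEY,
    Title      varchar(50) DEFAULT 'staff',   -- literal default
    HireDate   datetime DEFAULT GETDATE()     -- function default: current system date/time
)

INSERT INTO Employee2 DEFAULT VALUES          -- Title becomes 'staff', HireDate becomes the current date/time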

Precision/Scale – used to set the precision and scale values of the decimal and numeric data types. These options are not available for other data types. For more information on precision and scale see the decimal data type in the data types table at the beginning of this article.

Identity/Seed/Increment – Similar to the AutoNumber option in Access – if a column’s Identity option is set to yes, SQL Server will automatically generate a new number in this column for each row added to the table. A table can have only one column with the Identity property set to yes, and that column must use the decimal, int, numeric, smallint, bigint, or tinyint data type. The Seed value specifies what number SQL Server should start at. The Increment value specifies the number that is added to the previous identity value to determine the next one. Note that Identity columns are only guaranteed to generate unique numbers within one table – two tables, each with an Identity column, may (or may not) generate the same numbers. Another thing to keep in mind is that if the table experiences many deletions, large numbering gaps may occur because deleted identity values are not reused.
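A quick sketch of Seed and Increment in T-SQL (the table and values are invented for illustration):

CREATE TABLE Invoice (
    InvoiceID int IDENTITY(1000, 1),   -- Seed = 1000, Increment = 1
    Amount    money
)

INSERT INTO Invoice (Amount) VALUES (19.95)   -- InvoiceID 1000 is generated automatically
INSERT INTO Invoice (Amount) VALUES (24.50)   -- InvoiceID 1001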

Is RowGuid – Specifies that this column contains the GUID for the rows in this table. Only one column per table can have its Is RowGuid value set to yes, and that column’s data type must be uniqueidentifier. In addition, you will typically also set the default value for this column to the NEWID() function so a new GUID is generated for each row. Unlike an Identity value, GUID values are guaranteed to (aka “should”) be unique for every row, in every table, in every database, on every computer in the world.

Formula – used to enter a formula to make this a computed column. Instead of just storing data, a computed column takes a formula (like LastName + ‘, ‘ + FirstName) and generates a value based on the formula you entered. In addition, computed columns are generated on the fly, so updating a column that the formula references will update the computed column automatically.
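In T-SQL a computed column is declared with the AS keyword; note that + (not &) is the T-SQL string concatenation operator. A minimal sketch, with made-up table and column names:

CREATE TABLE Person (
    FirstName varchar(30),
    LastName  varchar(30),
    FullName AS (LastName + ', ' + FirstName)   -- computed on the fly whenever the row is read
)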

Collation – gives the ability to set the collation for each individual column of a table. Selecting <database default> will set the column to use the same collation setting as the database the table is located in.

Once you have added all the columns, you can click the save icon (leftmost on the toolbar; it looks like a disk) and you will be prompted to enter a name for the table.

Once you enter a name and click OK, you can close the New Table screen by clicking the lower, inner, X at the upper right. You can also use the “Window” menu to switch between windows or arrange them.

After refreshing the Tables list (F5) the table we just created appears in SQL Server Enterprise Manager.

You can always go back and edit a table by right clicking it in Enterprise Manager and selecting “Design Table”.

One last thing: using Enterprise Manager is not the only way to create a table – you can also use the CREATE TABLE statement. For more information see CREATE TABLE and ALTER TABLE in the SQL Server Books Online.
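For the curious, a rough CREATE TABLE sketch of an employee table along these lines might look like the following (the exact column names and sizes are assumptions, not a copy of the screenshot above):

CREATE TABLE Employee (
    EmployeeID int IDENTITY(1,1) NOT NULL,
    FirstName  varchar(30) NOT NULL,
    LastName   varchar(30) NOT NULL,
    Title      varchar(50) NULL,     -- Allow Nulls checked
    HireDate   datetime NULL,
    CONSTRAINT PK_Employee PRIMARY KEY (EmployeeID)
)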

SQL Server Data Types

Every column in your table must have a “data type,” which is simply a property that defines what type of data is stored in that column. In addition, a data type will reject data that is not of the correct type (i.e. attempting to store a letter in a data type designed for numbers). SQL Server has over 25 different data types – some with more options than others. Let’s look at the different data types and the options for each (a lot of this is copied and pasted from the SQL Server Books Online):

Data Type (Size) Description

Integers:

bigint (8 bytes) Holds integer (whole number) data from -2^63 (-9,223,372,036,854,775,808) through 2^63-1 (9,223,372,036,854,775,807).
int (4 bytes) Holds integer (whole number) data from -2^31 (-2,147,483,648) through 2^31 – 1 (2,147,483,647).

smallint (2 bytes) Holds integer data from -2^15 (-32,768) through 2^15 – 1 (32,767).

tinyint (1 byte) Holds integer data from 0 through 255.

bit (1 byte for up to 8 bit columns) Holds integer data with either a 1 or 0 value. Each set of up to 8 bit columns requires 1 byte. So if there are anywhere from 1 to 8 bit columns in a table, the storage space you will need is 1 byte. If there are anywhere from 9 to 16 bit columns in a table, the storage space you will need is 2 bytes. And so on…

Decimal:

decimal (Anywhere from 5 to 17 bytes depending on the precision) Holds fixed precision and scale numbers. When maximum precision is used, valid values are from -10^38 + 1 through 10^38 – 1. The Precision specifies the maximum total number of decimal digits that can be stored, both to the left and to the right of the decimal point. The precision must be a value from 1 through the maximum precision. The maximum precision is 38. The Scale specifies the maximum number of decimal digits that can be stored to the right of the decimal point. Scale must be a value from 0 through Precision. Examples:

  • If precision is set to 10 and scale is set to 3, the smallest (other than 0) / largest number we could store would be 0.001 / 9999999.999
  • If precision is set to 8 and scale is set to 6, the smallest (other than 0) / largest number we could store would be 0.000001 / 99.999999

numeric (Same as decimal data type)

Money:

money (8 bytes) Holds monetary data values from -2^63 (-922,337,203,685,477.5808) through 2^63 – 1 (922,337,203,685,477.5807), with accuracy to one ten-thousandth of a monetary unit.

smallmoney (4 bytes) Holds monetary data values from -214,748.3648 through +214,748.3647, with accuracy to a ten-thousandth of a monetary unit.

Approximate:

float(n) (Anywhere from 4 to 8 bytes depending on the precision) Holds floating precision number data from -1.79E + 308 through 1.79E + 308. The value n is the number of bits used to store the mantissa of the float number and can range from 1 to 53.

real (4 bytes) Holds floating precision number data from -3.40E + 38 through 3.40E + 38. Real is the same as float(24).

Date and Time:

datetime (8 bytes) Holds date and time data from January 1, 1753, through December 31, 9999, with an accuracy of one three-hundredth of a second, or 3.33 milliseconds.
smalldatetime (4 bytes) Holds date and time data from January 1, 1900, through June 6, 2079, with accuracy to the minute.

Strings (non-Unicode):

char(n) (n bytes) Holds fixed-length non-Unicode character data with a length of n characters, where n is a value from 1 through 8000. If fewer than n characters are entered, n bytes are still required because the entry is padded with trailing spaces to a length of n.

varchar(n) (Depends on the length of data entered, 1 byte per character) Holds variable-length non-Unicode character data with a length of n characters, where n is a value from 1 through 8000. The storage size is the actual length in bytes of the data entered, not n bytes.

text (16 bytes for the pointer) Holds variable-length non-Unicode data with a maximum length of 2^31 – 1 (2,147,483,647) characters. The column entry itself is a pointer to the location of the data value; the data is stored separately from the table data.

Strings (Unicode):

nchar(n) (2 bytes * n) Holds fixed-length Unicode character data of n characters, where n is a value from 1 through 4000. Unicode characters use 2 bytes per character and can support all international characters. If fewer than n characters are entered, 2 * n bytes are still required because the entry is padded with trailing spaces to a length of n characters.

nvarchar(n) (Depends on the length of data entered, 2 bytes per character) Holds variable-length Unicode data of n characters, where n is a value from 1 through 4000. Unicode characters use 2 bytes per character and can support all international characters. The storage size is 2 bytes times the number of characters actually entered, not 2 * n bytes.
ntext (16 bytes for the pointer) Holds variable-length Unicode data with a maximum length of 2^30 – 1 (1,073,741,823) characters. The column entry for ntext is a pointer to the location of the data; the data is stored separately from the table data.

Binary:

binary(n) (n + 4 bytes) Holds fixed-length binary data of n bytes, where n is a value from 1 through 8000. Use binary when column data entries are consistent in size.

varbinary(n) (Depends on the length of data entered + 4 bytes) Holds variable-length binary data of n bytes, where n is a value from 1 through 8000. Use varbinary when column data entries are inconsistent in size.

image (16 bytes for the pointer) Used for variable-length binary data longer than 8000 bytes, with a maximum of 2^31 – 1 bytes. An image column entry is a pointer to the location of the image data value; the data is stored separately from the table data.

Other:

sql_variant (size varies) A column of type sql_variant may contain values of different data types. For example, a column defined as sql_variant can store int, binary, and char values. The only types of values that cannot be stored using sql_variant are text, ntext, image, timestamp, and sql_variant.

timestamp (8 bytes) Timestamp is a data type that exposes automatically generated binary numbers, which are guaranteed to be unique within a database. Timestamp is used typically as a mechanism for version-stamping table rows. Each table can have only one timestamp column.

uniqueidentifier (16 bytes) Stores a 16-byte binary value that is a globally unique identifier (GUID).

That about covers it for the data types. There are two data types I left out (cursor and table), but they are really not applicable here because you can’t have a column that is made up of those data types.

Relational Database Design Concepts

Up to this point we have focused on the physical design of a database, which includes things like files and filegroups. The physical design is concerned with how data is stored and what the most efficient way to access it is. Now we are going to start looking at implementing the logical design of a database, which includes objects like tables and the relationships between them. The logical design is only concerned with modeling a real-world data scenario – it is unaware of, and not dependent on, any particular Database Management System. For example, I could use the same logical design to create a database in both SQL Server and Access, and I would come up with the exact same database as far as the logical design is concerned. However, the physical design for the two databases could be very different – the Access database would be made up of only one file, whereas the SQL Server database could be made up of many different files spread across several hard drives. Another example is a database on SQL Server that gets another data file added to it. While the physical design has changed by adding another file, the logical design is still the same. The important point here is that the physical implementation of a database and the logical implementation of a database are two distinct concepts that are, for the most part, independent of one another.

I think I’ve said this already, but if not, SQL Server is a Relational Database Management System (RDBMS). So it makes sense that the databases that we are going to be working with are relational databases.
If this is your first time working with a relational database of any kind or you never got a good grasp on the design concepts, I encourage you to stop here and get a book on designing relational databases. Check out the following links to get started learning about logical database design:

Support WebCast: Database Normalization Basics

Understanding Relational Database Design
ACC2000: Database Normalization Basics (Q209534)

Recovering and Restoring a SQL Server Database

Restoring a database from a file or tape backup is quite simple, but there are a few things you must think about when using the Full and Bulk_Logged recovery models. Note that we are not going to cover restoring a database that uses the simple recovery model. The first thing to understand is that restoring a database and recovering a database are two different operations entirely. Restoring is simply the process of copying data from backups into the database. Recovering, on the other hand, is the process of using the transaction log to bring the data in the database to a consistent state. Rather than just give you a one sentence description, let’s take a closer look at exactly what the recovery process is.

If you remember back a few articles (Understanding Transaction Logs), I talked about how the recovery process was used to recover modifications if SQL Server was improperly shut down, known as restart recovery. Well, a process similar to the one SQL Server uses when it is improperly shut down is also used when we restore a database – known as restore recovery. If SQL Server is improperly shut down, we end up with a chunk of data in an inconsistent state; that is, we don’t know which modifications were saved and which were not before the unexpected shutdown. In addition to a chunk of inconsistent data, we also have a log file that contains a list of all the changes that were made to the data – this is where the recovery process comes in.

In order to bring the data into a consistent state, all transactions in the log that are marked as having been completed are re-applied to the data, known as “rolling forward,” whereas all transactions in the log that had not been completed at the time of the unexpected shutdown are undone, known as “rolling back.” By rolling forward completed transactions and rolling back uncompleted transactions, we are left with data that is in a “consistent state”…meaning we do not end up with half a transaction completed, which could result in, for example, funds being deducted from account A and not being added to account B (i.e. a big mess).
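To make the restore/recover distinction concrete, here is a hedged sketch (the database name and backup file paths are made up): every restore except the last one uses WITH NORECOVERY so that recovery is postponed, and the final restore uses WITH RECOVERY to roll transactions forward and back and bring the database to a consistent state:

RESTORE DATABASE Sales
    FROM DISK = 'D:\Backups\Sales_Full.bak'
    WITH NORECOVERY       -- more backups still to apply; do not recover yet

RESTORE LOG Sales
    FROM DISK = 'D:\Backups\Sales_Log1.trn'
    WITH NORECOVERY

RESTORE LOG Sales
    FROM DISK = 'D:\Backups\Sales_Log2.trn'
    WITH RECOVERY         -- last backup: run recovery and bring the database online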

Creating a SQL Server Disaster Recovery Plan – Part 2

Last week we decided how often to back up our database; now we must decide where to back it up to. The first option available is to back up directly to tape from SQL Server. On the plus side, a backup to tape allows for off-site storage of backups. On the down side, tape backups are slow and therefore can impact the server for a longer amount of time than a backup to a file. The other common place to back up to is a file. File backups are much faster than tape backups, for both backup and restore operations; however, backups to files don’t allow for quick off-site storage (unless you happen to have a high-speed remote link).

A third option is to use a combination of making backups to a file and then using another backup utility, such as NT Backup, to copy the file backups to tape. By making backups to files on another server nearby and then copying the files to tape, you can minimize the time a backup will impact your SQL Server while still allowing for off-site storage of tapes. Also, if you need to make multiple copies of tape backups, using another computer for copying backups to multiple tapes can help even more.
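A minimal sketch of the file half of that approach (the database name and UNC path are assumptions); the resulting .bak file can then be copied to tape by NT Backup or any other file-level backup utility:

BACKUP DATABASE Sales
    TO DISK = '\\BackupServer\SQLBackups\Sales_Full.bak'
    WITH INIT,                          -- overwrite any existing backup set in this file
    NAME = 'Sales full database backup'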

Another thing to think about when you are choosing where to make a backup is the time it takes to restore it. For example, a backup that is stored on another computer could be restored much faster over a high-speed network than it could be from a tape drive. To take advantage of this faster recovery, you may consider keeping the week’s file backups on another computer (in addition to your tape backups). In the event your SQL Server crashes, you have a current backup on hand and available at a faster speed than tape.

File and tape backups do provide a lot of flexibility in designing your disaster recovery plan, but there are still many options available from third-party vendors. For example, you can find utilities that make the process of backing up multiple SQL Servers very simple. You may also consider a Storage Area Network for large mission-critical systems. Although I would love to cover every option available, other backup utilities and hardware options are outside the scope of this series.