In my last article I gave a brief description of what the transaction log is. However, we did not go over what a transaction is. The simplest definition of a transaction is a “logical unit of work.” In other words, it is a grouping of SQL statements that perform one logical task. The classic example that is used most of the time to describe transactions is that of a bank transfer. Say you have to bank accounts (Account A and Account B) and you wanted to transfer funds from one to the other. If you execute the two SQL statements individually (one to remove the funds from Account A and one to add funds to Account B), it is possible to remove funds from Account A successfully but have the second statement fail and not add the funds to Account B. In the real world this would make for some very unhappy customers. To overcome this problem we can group statements into transactions. When statements are grouped into transactions, either they all complete successfully and are “committed,” or if any one of the statements fails they are all “rolled back” and the data is left in the condition it was in before the transaction began. Until we get into data integrity, this definition will do.
The transaction log stores all the changes as they are made to the database. Once all the statements in the transaction have made their modifications to the database, the transaction is said to be committed once the commit is recorded in the transaction log. Once the transaction is committed, why does it need to stay in the log? To understand, you need to know what is going on under the hood of SQL Server.
As data is requested from a database, the information is stored in memory. Because accessing memory is so much faster than accessing the disk subsystem, SQL Server can improve performance dramatically.
Request for data that is already in memory (pages are cached in memory, so they are used).
When you make a modification to the data in a database, SQL Server makes the changes to the data stored in memory (not on the disk) and an entry is made in the transaction log. Pages in memory that contain data that has been changed are called “dirty pages.”
In order to have pages written back to disk and release memory, SQL Server uses a thread called the lazywriter. When the lazywriter writes pages to disk, it uses the least recently used (LRU) page list to determine what pages should be written to disk. Pages that have not been used recently are at the beginning of the LRU list, however recently modified pages are at the end of the LRU list. Because of this, very heavily accessed pages may not be written to disk for a large amount of time (or never, in theory).
Because pages that have been recently changed are only stored in memory, they are susceptible to system crashes and power outages. To overcome this problem, SQL Server can use the log information to restore the modifications to the database if one of these events occurs. For example, if the power when out (and our UPS failed!) we would have lost updated pages stored in memory. In order to recover these pages, SQL Server starts in recovery mode. By using the transaction log, SQL Server can reapply the changes to the copy of the database stored on the hard drive.