After three and a half weeks with DB Best, things are starting to settle in. After attending the SQL Server 2016 Workshop in Bellevue, and delivering one section on In-Memory OLTP and columnstore indexes, I flew to Houston and delivered the entire Workshop on Friday, May 12. I did have local support from Peter DeBetta, also from DB Best, and John Cook, from Microsoft.
The Houston Workshop reached capacity by the Monday before the event, but with John Cook’s help, along with his colleague Sean Kelley, we found a larger room and ended up with 19 attendees from 15 Houston area companies in attendance.
We will be scheduling more of these Hands-On Lab workshops in the future, and I’ll be announcing them here and on Twitter. My next public event will be SQL Saturday Los Angeles on June 10th. I hope to see many of you there! http://www.sqlsaturday.com/640/eventhome.aspx
There were lots of great questions from the folks in attendance, mostly centered around In-Memory OLTP, one of my favorite topics. Maybe I should do a whole blog series about this topic, like the awesome Niko Neugebauer has done for columnstore indexes? http://www.nikoport.com/columnstore/
In this post, I’m going to address one of the questions that came up in the Houston workshop. However, before I can answer the question, a bit of background information is needed.
During the workshop, I discussed the fact that with In-Memory OLTP, logging is optimized in several ways, and one of the optimizations is that SQL Server will not log any changes to rows until the change is committed. So, there is nothing that needs to be rolled back during database recovery, because only committed changes are ever persisted. But, if a transaction makes multiple changes, those changes are made to rows in memory. The question that came up is: How does SQL Server know that a change made to a row in memory is committed or not? What happens to the rows if the change is rolled back instead of committed?
To answer this, let’s see what a row in memory looks like for a memory-optimized table. Every row has a header followed by the payload (the data that the row contains). In this post, we’ll just be talking about the header. The first two values in the header are timestamps and these are the most important pieces of information to answer the question at hand. (The other values in the header will not be discussed in the post, but I’ll use this diagram anyway and future posts can describe these other values.)
The Begin Ts (or “Begin Timestamp”) is a timestamp that indicates the time a row was inserted. The End Ts (or “End Timestamp) is a timestamp that indicates when a row was deleted. For rows that haven’t been deleted, there is a special value ∞ (infinity) that indicates the row is still active. The End Ts is needed because rows are not actually removed as soon as they are deleted, partly for performance reasons, and partly to support snapshot isolation. A running transaction, for its entire lifetime, continues to see the rows that were valid at the time the transaction started. Even if a row is deleted, transactions that started before the DELETE will continue to be able to read and return the row. The two timestamps define a row’s period of validity. If a row was inserted at timestamp 100 and deleted at timestamp 200, any other transaction that started between time 100 and 200 will be able to read that row.
To get back to the original question of what happens for a transaction that is rolled back, we need to know that every database that supports memory-optimized tables manages two internal counters:
- The Transaction-ID counter – this identifies each active transaction. It is a global, unique transaction id value (xtp_transaction_id) for the in-memory OLTP portion of a transaction. It is incremented every time a new transaction starts, and is reset whenever the SQL Server instance restarts.
- The Global Transaction Timestamp – this is the “commit” timestamp. It is also a global and unique value, but not reset on a restart. SQL Server increments the value monotonically each time a transaction issues a COMMIT. I’ll use the simple term timestamp, instead of Global Transaction Timestamp, to refer to this value. (Note that although this value is called a timestamp, it contains no information reflecting an actual time or date. It is an internal counter value which has no inherent meaning except in relation to other timestamp values.)
A simple way to differential these two values is that the Transaction-ID counter is used while a transaction is in progress to keep track of rows that it has changed, and the Global Transaction Timestamp is used for transactions that have been committed.
SQL Server keeps track of each active transaction in an internal transaction table. When a transaction starts, SQL Server increments the Transaction-ID counter, and assigns a unique transaction ID to the transaction. When the transaction issues a COMMIT, SQL Server generates a commit timestamp, which it stores in the internal table initially, and then writes the value to the header of affected rows, once it validates the transaction and hardens the changes to the log.
So, let’s see what this looks like.
Assume we have a row for a customer named Jane Wiggins in Kirkland, that was inserted at timestamp 100. (This means the transaction that inserted the row received a timestamp value of 100 when it committed.) The row might look something like this:
Assume Jane has now moved to Redmond, so we need to update the row. All updates on memory-optimized tables are performed as a two-step process: delete the original row and insert a new one. When this update transaction starts, it receives a Transaction-ID counter, let’s say TX1. So during the transaction, before it’s committed, the End Ts of the original row, and the Begin Ts of the new row, both contain this internal value TX1. There are now two rows in memory, which look like the following:
Any other transactions that run while this TX1 transaction is in progress will read the original version of the row. The internal transaction table will indicate that TX1 has not committed yet, so it is known that no changes have actually happened to this row yet.
If this transaction was aborted, SQL Server sets the Begin Ts value of its new versions to infinity, thereby making them invisible to all transactions, and attempts to reset the End Ts fields of its old versions to infinity. The rows would then look like the following. You can see the original version now looks like it did before the update, and the new (aborted) version would not be readable by any other query.
In case another transaction has already detected the abort, it might have created another new version and reset the End field of the old version. If so, the aborted transaction leaves the End field value unchanged. But in any case, the new version is not available to anyone after a rollback or abort of any kind.
So if you’ve read and understood this, you might just have even more questions now about In-Memory OLTP! Hopefully, I’ll answer more of them in upcoming posts. Feel free to ask your follow-up questions as a comment. I’d love to write about what you want to hear about.