How to Perform SQL Server Row-by-Row Operations Without Cursors
SQL cursors have been a curse to database programming for many years because of their poor performance. On the other hand, they are extremely useful because of their flexibility in allowing very detailed data manipulations at the row level. Using cursors against SQL Server tables can often be avoided by employing other methods, such as using derived tables, set-based queries, and temp tables. A discussion of all these methods is beyond the scope of this article, and there are already many well-written articles discussing these techniques.The focus of this article is directed at using non-cursor-based techniques for situations in which row-by-row operations are the only, or the best method available, to solve a problem. Here, I will demonstrate a few programming methods that provide a majority of the cursor’s flexibility, but without the dramatic performance hit.
Let’s begin by reviewing a simple cursor procedure that loops through a table. Then we’ll examine a non-cursor procedure that performs the same task.
if exists (select * from sysobjects where name = N’prcCursorExample’)
drop procedure prcCursorExample
go
CREATE PROCEDURE prcCursorExample
AS
/*
** Cursor method to cycle through the Customer table and get Customer Info for each iRowId.
**
** Revision History:
** —————————————————
** Date Name Description Project
** —————————————————
** 08/12/03 DVDS Create —-
**
*/
SET NOCOUNT ON
– declare all variables!
DECLARE @iRowId int,
@vchCustomerName nvarchar(255),
@vchCustomerNmbr nvarchar(10)
– declare the cursor
DECLARE Customer CURSOR FOR
SELECT iRowId,
vchCustomerNmbr,
vchCustomerName
FROM CustomerTable
OPEN Customer
FETCH Customer INTO @iRowId,
@vchCustomerNmbr,
@vchCustomerName
– start the main processing loop.
WHILE @@Fetch_Status = 0
BEGIN
– This is where you perform your detailed row-by-row processing.
– Get the next row.
FETCH Customer INTO @iRowId,
@vchCustomerNmbr,
@vchCustomerName
END
CLOSE Customer
DEALLOCATE Customer
RETURN
As you can see, this is a very straight-forward cursor procedure that loops through a table called CustomerTable and retrieves iRowId, vchCustomerNmbr and vchCustomerName for every row. Now we will examine a non-cursor version that does the exact same thing:
if exists (select * from sysobjects where name = N’prcLoopExample’)
drop procedure prcLoopExample
go
CREATE PROCEDURE prcLoopExample
AS
/*
** Non-cursor method to cycle through the Customer table and get Customer Info for each iRowId.
**
** Revision History:
** ——————————————————
** Date Name Description Project
** ——————————————————
** 08/12/03 DVDS Create —–
**
*/
SET NOCOUNT ON
– declare all variables!
DECLARE @iReturnCode int,
@iNextRowId int,
@iCurrentRowId int,
@iLoopControl int,
@vchCustomerName nvarchar(255),
@vchCustomerNmbr nvarchar(10)
@chProductNumber nchar(30)
– Initialize variables!
SELECT @iLoopControl = 1
SELECT @iNextRowId = MIN(iRowId)
FROM CustomerTable
– Make sure the table has data.
IF ISNULL(@iNextRowId,0) = 0
BEGIN
SELECT ‘No data in found in table!’
RETURN
END
– Retrieve the first row
SELECT @iCurrentRowId = iRowId,
@vchCustomerNmbr = vchCustomerNmbr,
@vchCustomerName = vchCustomerName
FROM CustomerTable
WHERE iRowId = @iNextRowId
– start the main processing loop.
WHILE @iLoopControl = 1
BEGIN
– This is where you perform your detailed row-by-row processing.
– Reset looping variables.
SELECT @iNextRowId = NULL
– get the next iRowId
SELECT @iNextRowId = MIN(iRowId)
FROM CustomerTable
WHERE iRowId > @iCurrentRowId
– did we get a valid next row id?
IF ISNULL(@iNextRowId,0) = 0
BEGIN
BREAK
END
– get the next row.
SELECT @iCurrentRowId = iRowId,
@vchCustomerNmbr = vchCustomerNmbr,
@vchCustomerName = vchCustomerName
FROM CustomerTable
WHERE iRowId = @iNextRowId
END
RETURN
There are several things to note about the above procedure.
For performance reasons, you will generally want to use a column like “iRowId” as your basis for looping and row retrieval. It should be an auto-incrementing integer data type, along with being the primary key column with a clustered index.
There may be times in which the column containing the primary key and/or clustered index is not the appropriate choice for looping and row retrieval. For example, the primary key and/or clustered index may have already been built on a column using uniqueindentifier as the data type. In such a case, you can usually add an auto-increment integer data column to the table and build a unique index or constraint on it.
The MIN function is used in conjunction with greater than “>” to retrieve the next available iRowId. You could also use the MAX function in conjunction with less than “<” to achieve the same result:
SELECT @iNextRowId = MAX(iRowId)
FROM CustomerTable
WHERE iRowId < @iCurrentRowId
Be sure to reset your looping variable(s) to NULL before retrieving the next @iNextRowId value. This is critical because the SELECT statement used to get the next iRowId will not set the @iNextRowId variable to NULL when it reaches the end of the table. Instead, it will fail to return any new values and @iNextRowId will keep the last valid, non-NULL, value it received, throwing your procedure into an endless loop. This brings us to the next point, exiting the loop.
When @iNextRowId is NULL, meaning the loop has reached the end of the table, you can use the BREAK command to exit the WHILE loop. There are other ways of exiting from a WHILE loop, but the BREAK command is sufficient for this example.
You will notice that in both procedures I have included the comments listed below in order to illustrate the area in which you would perform your detailed, row-level processing.
– This is where you perform your detailed row-by-row
– processing.
Quite obviously, your row level processing will vary greatly, depending upon what you need to accomplish. This variance will have the most profound impact on performance.
For example, suppose you have a more complex task which requires a nested loop. This is equivalent to using nested cursors; the inner cursor, being dependent upon values retrieved from the outer one, is declared, opened, closed and deallocated for every row in the outer cursor. (Please reference the DECLARE CURSOR section in SQL Server Books Online for an example of this.) In such a case, you will achieve much better performance by using the non-cursor looping method because SQL is not burdened by the cursor activity.
Here is an example procedure with a nested loop and no cursors:
if exists (select * from sysobjects where name = N’prcNestedLoopExample’)
drop procedure prcNestedLoopExample
go
CREATE PROCEDURE prcNestedLoopExample
AS
/*
** Non-cursor method to cycle through the Customer table ** and get Customer Name for each iCustId. Get all
** products for each iCustid.
**
** Revision History:
** —————————————————–
** Date Name Description Project
** —————————————————–
** 08/12/03 DVDS Create —–
**
*/
SET NOCOUNT ON
– declare all variables!
DECLARE @iReturnCode int,
@iNextCustRowId int,
@iCurrentCustRowId int,@iCustLoopControl int,
@iNextProdRowId int,
@iCurrentProdRowId int,
@vchCustomerName nvarchar(255),
@chProductNumber nchar(30),
@vchProductName nvarchar(255)– Initialize variables!
SELECT @iCustLoopControl = 1
SELECT @iNextCustRowId = MIN(iCustId)
FROM Customer– Make sure the table has data.
IF ISNULL(@iNextCustRowId,0) = 0
BEGIN
SELECT ‘No data in found in table!’
RETURN
END
– Retrieve the first row
SELECT @iCurrentCustRowId = iCustId,
@vchCustomerName = vchCustomerName
FROM Customer
WHERE iCustId = @iNextCustRowId
– Start the main processing loop.
WHILE @iCustLoopControl = 1
BEGIN
– Begin the nested(inner) loop.
– Get the first product id for current customer.
SELECT @iNextProdRowId = MIN(iProductId)
FROM CustomerProduct
WHERE iCustId = @iCurrentCustRowId
– Make sure the product table has data for current customer.
IF ISNULL(@iNextProdRowId,0) = 0
BEGIN
SELECT ‘No products found for this customer.’
END
ELSE
BEGIN
– retrieve the first full product row for current customer.
SELECT @iCurrentProdRowId = iProductId,
@chProductNumber = chProductNumber,
@vchProductName = vchProductName
FROM CustomerProduct
WHERE iProductId = @iNextProdRowId
ENDWHILE ISNULL(@iNextProdRowId,0) <> 0
BEGIN
– Do the inner loop row-level processing here.– Reset the product next row id.
SELECT @iNextProdRowId = NULL– Get the next Product id for the current customer
SELECT @iNextProdRowId = MIN(iProductId)
FROM CustomerProduct
WHERE iCustId = @iCurrentCustRowId
AND iProductId > @iCurrentProdRowId– Get the next full product row for current customer.
SELECT @iCurrentProdRowId = iProductId,
@chProductNumber = chProductNumber,
@vchProductName = vchProductName
FROM CustomerProduct
WHERE iProductId = @iNextProdRowId
END
– Reset inner loop variables.
SELECT @chProductNumber = NULL
SELECT @vchProductName = NULL
SELECT @iCurrentProdRowId = NULL
– Reset outer looping variables.
SELECT @iNextCustRowId = NULL
– Get the next iRowId.
SELECT @iNextCustRowId = MIN(iCustId)
FROM Customer
WHERE iCustId > @iCurrentCustRowId
– Did we get a valid next row id?
IF ISNULL(@iNextCustRowId,0) = 0
BEGIN
BREAK
END
– Get the next row.
SELECT @iCurrentCustRowId = iCustId,
@vchCustomerName = vchCustomerName
FROM Customer
WHERE iCustId = @iNextCustRowId
END
RETURN
In the above example we are looping through a customer table and, for each customer id, we are then looping through a customer product table, retrieving all existing product records for that customer. Notice that a different technique is used to exit from the inner loop. Instead of using a BREAK statement, the WHILE loop depends directly on the value of @iNextProdRowId. When it becomes NULL, having no value, the WHILE loop ends.
Conclusion
SQL Cursors are very useful and powerful because they offer a high degree of row-level data manipulation, but this power comes at a price: negative performance. In this article I have demonstrated an alternative that offers much of the cursor’s flexibility, but without the negative impact to performance. I have used this alternative looping method several times in my professional career to the benefit of cutting many hours of processing time on production SQL Servers.
Source: David VanDeSompele
About the Author
David VanDeSompele has worked directly with database systems for 11 years. He is currently the senior DBA for a Seattle-based company that provides firewall and VPN solutions.
Published with the express permission of the author. Copyright 2003.
Performance Tuning SQL Server Cursors
If possible, avoid using SQL Server cursors. They generally use a lot of SQL Server resources and reduce the performance and scalability of your applications. If you need to perform row-by-row operations, try to find another method to perform the task.
Here are some alternatives to using a cursor:
Use WHILE LOOPS
Use temp tables
Use derived tables
Use correlated sub-queries
Use the CASE statement
Perform multiple queries
More often than not, there are non-cursor techniques that can be used to perform the same tasks as a SQL Server cursor.
If you do find you must use a cursor, try to reduce the number of records to process.
One way to do this is to move the records that need to be processed into a temp table first, then create the cursor to use the records in the temp table, not from the original table. This of course assumes that the subset of records to be inserted into the temp table are substantially less than those in the original table.
The lower the number of records to process, the faster the cursor will finish.
If the number of rows you need to return from a query is small, and you need to perform row-by-row operations on them, don’t use a server-side cursor. Instead, consider returning the entire rowset to the client and have the client perform the necessary action on each row, then return any updated rows to the server.
If you have no choice but to use a server-side cursor in your application, try to use a FORWARD-ONLY or FAST-FORWARD, READ-ONLY cursor. When working with unidirectional, read-only data, use the FAST_FORWARD option instead of the FORWARD_ONLY option, as it has some internal performance optimizations to speed performance. This type of cursor produces the least amount of overhead on SQL Server.
If you are unable to use a fast-forward cursor, then try the following cursors, in this order, until you find one that meets your needs. They are listed in the order of their performance characteristics, from fastest to slowest: dynamic, static, and keyset.
Avoid using static/insensitive and keyset cursors, unless you have no other choice. This is because they cause a temporary table to be created in TEMPDB, which increases overhead and can cause resource contention issues.
If you have no choice but to use cursors in your application, try to locate the SQL Server tempdb database on its own physical device for best performance. This is because cursors use the tempdb for temporary storage of cursor data. The faster your disk array, the faster your cursor will be.
Using cursors can reduce concurrency and lead to unnecessary locking and blocking. To help avoid this, use the READ_ONLY cursor option if applicable, or if you need to perform updates, try to use the OPTIMISTIC cursor option to reduce locking. Try to avoid the SCROLL_LOCKS cursor option, which reduces concurrency.
When you are done using a cursor, don’t just CLOSE it, you must also DEALLOCATE it. Deallocation is required to free up the SQL Server resources used by the cursor. If you only CLOSE the cursor, locks are freed, but SQL Server resources are not. If you don’t DEALLOCATE your cursors, the resources used by the cursor will stay allocated, degrading the performance of your server until they are released.
If it is appropriate for your application, try to load the cursor as soon as possible by moving to the last row of the result set. This releases the share locks created when the cursor was built, freeing up SQL Server resources.
If you have to use a cursor because your application needs to manually scroll through records and update them, try to avoid client-side cursors, unless the number of rows is small or the data is static. If the number of rows is large, or the data is not static, consider using a server-side keyset cursor instead of a client-side cursor. Performance is usually boosted because of a reduction in network traffic between the client and the server. For optimum performance, you may have to try both types of cursors under realistic loads to determine which is best for your particular environment.
When using a server-side cursor, always try to fetch as small a result set as possible. This includes fetching only those rows and columns the client needs immediately. The smaller the cursor, no matter what type of server-side cursor it is, the fewer resources it will use, and performance will benefit.
If you need to perform a JOIN as part of your cursor, keyset and static cursors are generally faster than dynamic cursors, and should be used when possible.
If a transaction you have created contains a cursor (try to avoid this if at all possible), ensure that the number of rows being modified by the cursor is small. This is because the modified rows may be locked until the transaction completes or aborts. The greater the number of rows being modified, the greater the locks, and the higher the likelihood of lock contention on the server, hurting performance.
In earlier versions of SQL Server, Transact-SQL cursors were only global to the connection. But in SQL Server 7 and 2000, there are two options to define the scope of a cursor. LOCAL and GLOBAL keywords in the DECLARE CURSOR statement are used to specify the scope of a cursor. A GLOBAL cursor can be referenced in any stored procedure or batch executed by a connection. LOCAL cursors are more secure as they cannot be referenced outside the procedure or trigger unless they are passed back to the calling procedure or trigger, or by using an output parameter. GLOBAL cursors must be explicitly deallocated or they will be available until the connection is closed. For optimum performance, you should always explicitly deallocate a cursor when you are done using it. LOCAL cursors are implicitly deallocated when the stored procedure, the trigger, or the batch in which they were created terminates. We can use LOCAL cursors for more security and better scope of the cursor in our application, which also helps to reduce resources on the server, and boosting performance.
Consider using asynchronous cursors if you expect your result set to be very large. This allows you to continue processing while the cursor is still being populated. While it may not actually speed up your application, it should give the appearance to your end users that something is happening sooner that if they have to wait until the entire cursor is populated.
If you have to use a cursor, break out of the cursor loop as soon as you can. If you find that a problem has occurred, or processing has ended before the full cursor has been processed, then exit immediately.
If you are using the same cursor more than once in a batch of work, (or within more than one stored procedure), then define the cursor as a global cursor by using the GLOBAL keyword. By not closing or deallocating the cursor until the whole process is finished, a fair amount of time will be saved, as the cursor and the data contained will already be defined, ready for you to use.
Sometimes, it is handy to be able to perform some calculation on one or more columns of a record, and then take the result of that calculation and then add it to similar calculations performed on other related records to find a grand total.
For example, let’s say you want to find the total dollar cost of an invoice. An invoice will generally involve a header record and one or more detail records. Each detail record will represent a line item on the invoice. In order to calculate the total dollar cost of an invoice, based on two or more line items, you would need to multiply the quantity of each item sold times the price of each item. Then, you would need to add the total price of each line item together in order to get the total dollar cost of the entire invoice. To keep this example simple, let’s ignore things like discounts, taxes, shipping, etc.
One way to accomplish this task would be to use a cursor like the one we see below (we are using the Northwind database for this example code):
DECLARE @LineTotal money –Declare variables
DECLARE @InvoiceTotal money
SET @LineTotal = 0 –Set variables to 0
SET @InvoiceTotal = 0
DECLARE Line_Item_Cursor CURSOR FOR –Declare the cursor
SELECT UnitPrice*Quantity –Multiply unit price times quantity ordered
FROM [order details]
WHERE orderid = 10248 –We are only concerned with invoice 10248
OPEN Line_Item_Cursor –Open the cursor
FETCH NEXT FROM Line_Item_Cursor INTO @LineTotal –Fetch next record
WHILE @@FETCH_STATUS = 0
BEGIN
SET @InvoiceTotal = @InvoiceTotal + @LineTotal –Summarize line items
FETCH NEXT FROM Line_Item_Cursor INTO @LineTotal
END
CLOSE Line_Item_Cursor –Close cursor
DEALLOCATE Line_Item_Cursor –Deallocate cursor
SELECT @InvoiceTotal InvoiceTotal –Display total value of invoice
The result for invoice number 10248 is $440.00.
What the cursor does is to select all of the line items for invoice number 10248, then multiply the quantity ordered times the price to get a line item total, and then it takes each of the line item totals for each record and then adds them all up in order to calculate the total dollar amount for the invoice.
This all works well, but the code is long and hard to read, and performance is not great because a cursor is used. Ideally, for best performance, we need to find another way to accomplish the same goal as above, but without using a cursor.
Instead of using a cursor, let’s rewrite the above code using set-based Transact-SQL instead of a cursor. Here’s what the code looks like:
DECLARE @InvoiceTotal money
SELECT @InvoiceTotal = sum(UnitPrice*Quantity)
FROM [order details]
WHERE orderid = 10248
SELECT @InvoiceTotal InvoiceTotal
The result for invoice number 10248 is $440.00.
Right away, it is obvious that this is a lot less code and that is it more readable. What may not be obvious is that it uses less server resources and performs faster. In our example — with few rows — the time difference is very small, but if many rows are involved, the time difference between the techniques can be substantial.
The secret here is to use the Transact-SQL “sum” function to summarize the line item totals for you, instead of relying on a cursor. You can use this same technique to help reduce your dependency on using resource-hogging cursors in much of your Transact-SQL code.
-
Archives
- July 2008 (1)
- June 2008 (2)
- May 2008 (4)
- April 2008 (3)
- January 2008 (29)
-
Categories
-
RSS
Entries RSS
Comments RSS