Things in the “software development” world have changed a fair bit over the last 10 years. In the past, developers used to have to think carefully about the SQL they wrote, making sure that they used well optimized stored procedures on well designed databases with the right indexes on the right columns because let’s face it, the database was often a bottleneck.
Nowadays, with tools like ORMs being used more and more liberally, I get the feeling that some of us are getting a bit lazy and forgetting that the database is still a potential bottleneck. I bet that a lot of developers reading this won’t have had to hand-roll any SQL for a while and instead rely on what the ORM of choice generates for them. Mature frameworks like NHibernate and Entity Framework do a great job at providing us with CRUD statements but if we have a complex domain, a lot of rows, a heavy load or unique requirements then perhaps they aren’t good enough.
Although I’m by no means a DBA or SQL / SQL Server expert, I think it’s important for developers to take some responsibility over data retrieval when writing applications. To get the most out of your application and hardware, you should be aware of which queries are run most often and which are taking the longest to complete. In a perfect world, your DBAs could be providing you with this information on a weekly basis but there are some tools that you as a developer can use to preempt performance problems as well. In this post I’m going to list out some of the tools that I use to help me performance tune and monitor SQL Server, the DBMS that I am most familiar with.
Use stored procedures over executing SQL statements
In previous versions of SQL Server (version 6.5 and earlier), one of the main advantages of using stored procedures over executing SQL statements was that SQL Server would partially precompile a single execution plan for stored procedures upon their creation and cache it for reuse. However, the last couple of versions of SQL Server no longer precompile execution plans for stored procedures upon creation but instead upon their first execution, like is done for any other SQL statement. Also, execution plans for all T-SQL statements (stored procedures and SQL statements) are now stored in the procedure cache, not just stored procedure execution plans. These changes reduce the overall performance benefit that used to be gained by using stored procedures.
There are still some other performance gains to be made by using stored procedures though. For example, when you call a stored procedure from code, you only have to send the EXECUTE stored_procedure_name arguments down the wire instead of the whole statement. You can also perform a lot more logic within one stored procedure thereby saving you many round trips between the client and the server.
There are also other non-performance-related benefits to using stored procedures over SQL statements such as maintainability, abstraction, security.
Before releasing a new or updated stored procedure, it’s a good idea to have a look at the planned and actual execution plans for it. If you’re making a change, you should look at the execution plan before and after your change to make sure you haven’t made anything worse! The main things to look for here are:
- Table or clustered index scans. If the table over which the scan is being performed is small, this isn’t generally a problem. If it is a large table then it may mean that your table is missing indexes, or you need additional indexes, or that the optimizer has ignored your index for some reason. If you really need to (and you know what you’re doing!), you could use query hints to force SQL Server to use a particular index.
- Repetitive scans. Try to avoid any repetitive table or index scans by rewriting your queries. There is usually more than one way of writing any batch of SQL statements.
- Highest cost queries. Concentrate on optimizing the queries with the highest relative cost.
Traces – SQL Profiler vs Server Side
Traces record what’s going on in your database and are invaluable when it comes to analyzing what’s going on under the covers, especially when there is a performance issue.
SQL Profiler is a great GUI for running traces and analyzing how your code is actually communicating with your database(s). Even if the queries themselves are well optimized and running smoothly, if you’re repeatedly calling them hundreds of times in a row then you may encounter some performance issues. However, as a developer, you should only ever SQL Profiler to run traces on development and test databases. Running a trace on a production database from the SQL Profiler GUI can seriously overload it. You should also make sure you’re capturing the information you need and not too much more otherwise it becomes impossible to find what you’re looking for in all the trace output. I generally set mine up like this:
According to a very talented SQL Server DBA ex-colleague and friend of mine, @trademe_dave you can also run traces on the server side which apparently have much less impact than running them through the SQL Profiler GUI, but it really all depends on how many events you trace (3 or 4 should do the trick) and how long you leave it running for. Safest option is to leave this sort of thing up to the DBAs :)
Dynamic Management Views
If you can’t get your hands on a trace, you can try using Dynamic Management Views to show you some execution statistics (including CPU time, physical and logical reads, etc) on all queries. From what I understand, it isn’t perfect – for example, it only shows you aggregated data on queries that have actually finished executing, but that’s still much better than nothing! Here is a great article on how to use sys.dm_exec_query_stats to show you the top (most often run, longest execution time) queries.
Analyze Statistical Information
If you execute the statement SET STATISTICS IO ON before running a query or executing a stored proc (in SQL Server Management Studio), SQL Server will output some information regarding the amount of disk activity generated by the SQL statements being executed. To turn it off again, you simply execute SET STATISTICS IO OFF. These statements affect the existing connection so if you don’t execute a SET STATISTICS IO OFF command, any other queries you run using the same connection will also produce this performance tuning data.
The syntax looks a little like this:
1 2 3 4 5
SET STATISTICS IO ON SELECT * FROM dbo.Articles SET STATISTICS IO OFF
The information it outputs regarding the disk activity can be found in the ‘messages’ table, right next to the ‘results’ tab in mgmt studio:
What you’re looking for in particular are the number of scan counts and logical reads. For a simple query or stored proc that selects data out of only one table, you’ll generally find a scan count of 1. However, if you’re joining tables then you’ll see that some of the tables involved might have much larger scan counts. I don’t know off the top of my head what a good value for ‘scan counts’ is but your DBA should be able to help with that. More importantly, while you fine-tune your query / stored proc you should keep an eye out on this value to see if it goes up or down with the changes you make – the higher the number, the worse the performance.
With respect to the logical reads, they are a measure of the number of pages that SQL Server had to read the data cache from in order to produce the results specified by your query. Again, a higher number is worse than a lower number so while you are tuning your query / stored proc, you should make sure that you don’t end up with a higher number of logical reads than you initially had!
I’ve also read that it’s a good idea to issue the following two commands before executing your query / stored proc with SET STATISTICS IO ON / OFF around it so that you clear out SQL Server’s data and procedure caches:
DBCC DROPCLEANBUFFERS DBCC FREEPROCCACHE
This article has a lot more detail on SET STATISTICS IO ON / OFF (and SET STATISTICS TIME ON / OFF which I personally haven’t used). And here is an MSDN article on SET STATISTICS IO, just for good measure.
That’s all for now folks!
I’m sure there are many other tools that developers can and frequently do use to performance tune their SQL queries and stored procedures. Those listed above are just the ones I use often – if you know of any others please feel free to leave a comment below!
But please, developers, don’t forget that under all your awesome code there is still a database struggling to keep up with the load you’re throwing at it. Be smart about how you access your data and don’t stop analyzing, benchmarking and optimizing your SQL statements.