Monday, November 15, 2010

SQL Query optimization - Part-4 (Myths)

  1. If I have an index on every column of my table, all my queries will run fast. This is the equivalent of saying that adding all the good ingredients in a bowl and mixing away to glory will result in the world's best dish. Doesn't happen. The sooner you snap out of it, the better it is for you.
  2. Joins are expensive. Everything should be completely denormalized. This isn't an SQL optimization myth, just a SQL myth in general. Loosely speaking though, if your joins are all natural joins and the join columns are mostly unique and all your filter conditions are connected with a conjunction (AND clause) and you don't go crazy with GROUP BY, ORDER BY and HAVING, you will have made yourself a pretty fast query. The key to remember is to try and (anyhow) maximize the ratio of Rows returned to Rows visited by the DB engine to produce the result set.
  3. If I have an index on a column that I want to ORDER BY then that index will be used and the DB won't use a temporary table to sort the result set. Not really. If the JOIN is not on the first column that you want to ORDER BY, then the DB will most likely use a temporary table and sort. Having said that, even if you join on a column that you will later ORDER BY, the execution planner might decide to not order intermediate results by that column because it thinks that that would be the most efficient query execution plan. This can happen when exactly one table in the JOIN doesn't have an index on the join column and that will be the column that the DB will decide to iterate over.
  4. Streaming result sets. I really don't know how many DBs actually do this, but it is a killer feature to have. Of course, you need to craft your queries very smartly to take advantage of this.

I you've gotten this far and don't feel lost, you are into this shit big time, and it's time to read some stuff by the big boys (here and here).

No comments: