posted by Roland Bouman
on
Fri 10 Jul 2009 19:59 UTC
Tags:
There is a popular myth about the SQL GROUP BY clause. The myth holds that 'standard SQL' requires columns referenced in the SELECT list of a query to also appear in the GROUP BY clause, unless these columns appear exclusively in an aggregated expression. MySQL is often accused of violating this standard.
In this article I will attempt to debunk this myth, and to provide a more balanced view regarding MySQL's treatment of GROUP BY at the same time.
To do that, I will first demonstrate that MySQL can be instructed to only accept GROUP BY clauses that include all non-aggregated expressions referred to in the SELECT list, thus making MySQL's behaviour conform more to that of other well-known rdbms-products.
Second, I will show that it is very important to clearly define which version of the SQL-standard is being referred to. The two most recent versions use a rather sophisticated way of defining the required relationships between expressions appearing in the GROUP BY clause and the SELECT list. Contrary to a popular belief, these standards do not literally require that all non-aggregated SELECT list columns appear in the GROUP BY clause.
Third, I will use a simple yet realistic example to illustrate in an informal manner what I believe is the intent expressed in the more recent versions of the SQL standard. Hopefully I will be able to convince you why it may even be better to not blindly include all non-aggregated columns from the SELECT list in the GROUP BY clause.