Blag o' dkam

p0wning tubez

Archive for the ‘Development’ Category

MySQL’s handling of GROUP BY

without comments

In the year 2000, I was working on migrating a web application from MySQL to Oracle. One of the issues we had was a number of queries which took advantage of slackness in MySQLs handling of the GROUP BY clause. Oracle took exception to our poor SQL and forced the developers to rewrite queries to be a higher standard. To illustrate the issue, have a look at the following example.

mysql> create table `test1` (`a` int(11), `b` int(11) );
Query OK, 0 rows affected (0.14 sec)

mysql> insert into test1 values (1,2), (1,3);
Query OK, 2 rows affected (0.09 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> select * from test1;
+------+------+
| a    | b    |
+------+------+
|    1 |    2 |
|    1 |    3 |
+------+------+
2 rows in set (0.00 sec)

mysql> select * from test1 group by a;
+------+------+
| a    | b    |
+------+------+
|    1 |    2 |
+------+------+
1 row in set (0.00 sec)

The issue at hand, is the value for the b column. You’ll note that we’ve grouped by a, so we correctly have a single row result. If you’re not sure what the issue is, let’s repopulate the table in a different order and retry.

mysql> truncate test1;
Query OK, 0 rows affected (0.10 sec)

mysql> insert into test1 values (1,3), (1,2);
Query OK, 2 rows affected (0.00 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> select * from test1;
+------+------+
| a    | b    |
+------+------+
|    1 |    3 |
|    1 |    2 |
+------+------+
2 rows in set (0.00 sec)

mysql> select * from test1 group by a;
+------+------+
| a    | b    |
+------+------+
|    1 |    3 |
+------+------+
1 row in set (0.00 sec)

So, the value for b was 2 in the first example, and is now 3, with the same data in the table and the same SQL. The problem is that there is no aggregation or GROUP BY clause for b. There’s only one possible value for a, but b is not aggregated – there are two possible values: 2 or 3. MySQL just picks one of the two possible values. Let’s see what PostgreSQL does.

database=# create temporary table test1 (a integer, b integer);
CREATE TABLE
database=# insert into test1 values (1,2), (1,3);
INSERT 0 2
database=# select * from test1;
 a | b 
---+---
 1 | 2
 1 | 3
(2 rows)

database=# select * from test1 group by a;
ERROR:  column "test1.b" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: select * from test1 group by a;

PostgreSQL decides you don’t want undefined / random data in your result set and lets you know. If you’re using GROUP BY queries in MySQL, and rely on the result being, you know, correct, it may be time for a quick audit of your SQL.

A quick search of Stack Overflow questions shows many solutions which include this flaw. The solutions work fine on MySQL, but fail on PostgreSQL.

I’m surprised that 11 years after first learning of MySQL’s handling of the GROUP BY clause, it has the same behaviour enabled by default. MySQL’s behaviour can be modified to behave in a stricter manner. Worth doing if you want to ensure you receive results without including randomly selected data.

Written by dkam

September 28th, 2011 at 11:25 pm

Posted in Development,Geeky,Ruby

On Daemons

without comments

New post on Daemons over on the Booko blog.

Written by dkam

August 7th, 2010 at 10:59 pm

On Users and Passwords

without comments

Just posted on the Booko Blogo an article about implementing Users and Passwords.

Written by dkam

September 23rd, 2009 at 11:24 pm

Making Booko work better with Google

without comments

Over on the Booko Blog.

Written by dkam

August 24th, 2009 at 11:57 am

No more onload=”javascript…” for Booko

with one comment

I’ve taken Phil’s advice and removed the Javascript from the body’s onload event handler, replacing it with Prototype’s event handler.

    <script type="text/javascript">
    Event.observe(
             window,
             "load",
         <%=  remote_function :url => { :action => "get_prices", :isbn13 => @book.isbn13  },:method => :get %>
        );
    </script>

Written by dkam

July 2nd, 2008 at 11:15 pm

Frustrating

without comments

I spent several hours yesterday fighting with RubyGems — I’d even written a vitriolic post about it — but I … did something … and bam, like that it was gone. RubyGem is Ruby’s version of Perl’s CPAN. It’s got a very annoying trait — its prodigious use of memory. For each gem ( a gem is a Ruby module — like rails or hpricot for example ) RubyGem would load the spec into memory in order ( I’m guessing after reading a bunch of forum posts ) to build a dependency tree. On a 256MB slice host, this pushes you into swap hell. On a 512MB host it would use up to 68% of memory.

So what can you do but rent a bigger slicehost? Moe Sizlack said it best: “I’m choking on my own rage over here!”

Naturally, it’s been fixed. Today.

If only I’d done something more constructive yesterday. Like played COD.

Written by dkam

June 22nd, 2008 at 1:55 pm

XMPP

without comments

This Simple XMPP client for Ruby looks like a great way to get some IM loving into an application. I’d love a monitoring system which used IM to alert me to failures. Since IM is not exactly a perfect medium, sending a copy of the message via Email for critical alerts is probably a good idea too.

Could be fun to ask your monitoring IM buddy “Status?” and get a status report. X Emails delivered, Y Emails rejected, Z books looked up on Booko.

Fun. :-)

Written by dkam

June 11th, 2008 at 10:57 pm

Posted in Booko,Development,Geeky

Consistency

without comments

Well, apparently I’ve not been consistent enough — an important quality when one bags out others for being inconsistent. So — I’ve updated all references to “Bookie” to the now correct “Booko”.

While I was at it I updated the scraping code for Fishpond who have again updated their site. At least they’ve updated it for the better. Compare and contrast the old and new Hpricot XPath code for grabbing the book title and author.

The Old:

book.title     = (doc/"table/tr/td/div/h1").inner_html
book.author    = (doc/"table/tr/td/p[2]/a/font/u").inner_html

The New:

book.title     = (doc/"h1#product_title").first.inner_html
book.author    = (doc/"p#product_author/a").first.inner_html

Much nicer!

Written by dkam

May 28th, 2008 at 11:20 pm

Fragment Caching is go

without comments

I added fragment caching to Booko the other day. I’ve added caching to four sections, the Recent Searchs, Most Popular, Shopping Cart and the section which displays the prices of a book.

Now, instead of calculating the most popular books every time someone views a page, we look to see if that part of the site has been created recently — if it has, use it again. The Shopping cart is another good example — it only changes when you add or remove a book from it. Now Booko avoids having to calculate the cost of your shopping trolley at all shops every time you view a page — it regenerates that part of the page only when you add or remove a book.

Kinda wishing I’d bench marked it before and after. I think it’s faster now — but that’s because I’m looking for it. It feels snappier. Anyway — let me know what you think, faster or slower?

Written by dkam

April 25th, 2008 at 12:07 am

Booko updates

with 2 comments

Well, I’ve added another feature. You can now add books to a cart, which will calculate the price to buy all books in your cart at the various online book stores, including calculating shipping.

It might be slightly problematic for collecting referrer fees as people will probably not be clicking on each of the book links. I’ll have to have a think on that.

Next job is to make it look better.

Written by dkam

March 13th, 2008 at 10:32 pm

Posted in Booko,Development,Ruby