Archive for the ‘Development’ Category
In the year 2000, I was working on migrating a web application from MySQL to Oracle. One of the issues we had was a number of queries which took advantage of slackness in MySQLs handling of the GROUP BY clause. Oracle took exception to our poor SQL and forced the developers to rewrite queries to be a higher standard. To illustrate the issue, have a look at the following example.
mysql> create table `test1` (`a` int(11), `b` int(11) ); Query OK, 0 rows affected (0.14 sec) mysql> insert into test1 values (1,2), (1,3); Query OK, 2 rows affected (0.09 sec) Records: 2 Duplicates: 0 Warnings: 0 mysql> select * from test1; +------+------+ | a | b | +------+------+ | 1 | 2 | | 1 | 3 | +------+------+ 2 rows in set (0.00 sec) mysql> select * from test1 group by a; +------+------+ | a | b | +------+------+ | 1 | 2 | +------+------+ 1 row in set (0.00 sec)
The issue at hand, is the value for the b column. You’ll note that we’ve grouped by a, so we correctly have a single row result. If you’re not sure what the issue is, let’s repopulate the table in a different order and retry.
mysql> truncate test1; Query OK, 0 rows affected (0.10 sec) mysql> insert into test1 values (1,3), (1,2); Query OK, 2 rows affected (0.00 sec) Records: 2 Duplicates: 0 Warnings: 0 mysql> select * from test1; +------+------+ | a | b | +------+------+ | 1 | 3 | | 1 | 2 | +------+------+ 2 rows in set (0.00 sec) mysql> select * from test1 group by a; +------+------+ | a | b | +------+------+ | 1 | 3 | +------+------+ 1 row in set (0.00 sec)
So, the value for b was 2 in the first example, and is now 3, with the same data in the table and the same SQL. The problem is that there is no aggregation or GROUP BY clause for b. There’s only one possible value for a, but b is not aggregated – there are two possible values: 2 or 3. MySQL just picks one of the two possible values. Let’s see what PostgreSQL does.
database=# create temporary table test1 (a integer, b integer); CREATE TABLE database=# insert into test1 values (1,2), (1,3); INSERT 0 2 database=# select * from test1; a | b ---+--- 1 | 2 1 | 3 (2 rows) database=# select * from test1 group by a; ERROR: column "test1.b" must appear in the GROUP BY clause or be used in an aggregate function LINE 1: select * from test1 group by a;
PostgreSQL decides you don’t want undefined / random data in your result set and lets you know. If you’re using GROUP BY queries in MySQL, and rely on the result being, you know, correct, it may be time for a quick audit of your SQL.
A quick search of Stack Overflow questions shows many solutions which include this flaw. The solutions work fine on MySQL, but fail on PostgreSQL.
I’m surprised that 11 years after first learning of MySQL’s handling of the GROUP BY clause, it has the same behaviour enabled by default. MySQL’s behaviour can be modified to behave in a stricter manner. Worth doing if you want to ensure you receive results without including randomly selected data.
New post on Daemons over on the Booko blog.
Just posted on the Booko Blogo an article about implementing Users and Passwords.
Over on the Booko Blog.
I spent several hours yesterday fighting with RubyGems — I’d even written a vitriolic post about it — but I … did something … and bam, like that it was gone. RubyGem is Ruby’s version of Perl’s CPAN. It’s got a very annoying trait — its prodigious use of memory. For each gem ( a gem is a Ruby module — like rails or hpricot for example ) RubyGem would load the spec into memory in order ( I’m guessing after reading a bunch of forum posts ) to build a dependency tree. On a 256MB slice host, this pushes you into swap hell. On a 512MB host it would use up to 68% of memory.
So what can you do but rent a bigger slicehost? Moe Sizlack said it best: “I’m choking on my own rage over here!”
Naturally, it’s been fixed. Today.
If only I’d done something more constructive yesterday. Like played COD.
This Simple XMPP client for Ruby looks like a great way to get some IM loving into an application. I’d love a monitoring system which used IM to alert me to failures. Since IM is not exactly a perfect medium, sending a copy of the message via Email for critical alerts is probably a good idea too.
Could be fun to ask your monitoring IM buddy “Status?” and get a status report. X Emails delivered, Y Emails rejected, Z books looked up on Booko.
Well, apparently I’ve not been consistent enough — an important quality when one bags out others for being inconsistent. So — I’ve updated all references to “Bookie” to the now correct “Booko”.
While I was at it I updated the scraping code for Fishpond who have again updated their site. At least they’ve updated it for the better. Compare and contrast the old and new Hpricot XPath code for grabbing the book title and author.
book.title = (doc/"table/tr/td/div/h1").inner_html book.author = (doc/"table/tr/td/p/a/font/u").inner_html
book.title = (doc/"h1#product_title").first.inner_html book.author = (doc/"p#product_author/a").first.inner_html
Now, instead of calculating the most popular books every time someone views a page, we look to see if that part of the site has been created recently — if it has, use it again. The Shopping cart is another good example — it only changes when you add or remove a book from it. Now Booko avoids having to calculate the cost of your shopping trolley at all shops every time you view a page — it regenerates that part of the page only when you add or remove a book.
Kinda wishing I’d bench marked it before and after. I think it’s faster now — but that’s because I’m looking for it. It feels snappier. Anyway — let me know what you think, faster or slower?
Well, I’ve added another feature. You can now add books to a cart, which will calculate the price to buy all books in your cart at the various online book stores, including calculating shipping.
It might be slightly problematic for collecting referrer fees as people will probably not be clicking on each of the book links. I’ll have to have a think on that.
Next job is to make it look better.