Blag o' dkam

p0wning tubez

Archive for the ‘Ruby’ Category

MySQL’s handling of GROUP BY

without comments

In the year 2000, I was working on migrating a web application from MySQL to Oracle. One of the issues we had was a number of queries which took advantage of slackness in MySQLs handling of the GROUP BY clause. Oracle took exception to our poor SQL and forced the developers to rewrite queries to be a higher standard. To illustrate the issue, have a look at the following example.

mysql> create table `test1` (`a` int(11), `b` int(11) );
Query OK, 0 rows affected (0.14 sec)

mysql> insert into test1 values (1,2), (1,3);
Query OK, 2 rows affected (0.09 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> select * from test1;
+------+------+
| a    | b    |
+------+------+
|    1 |    2 |
|    1 |    3 |
+------+------+
2 rows in set (0.00 sec)

mysql> select * from test1 group by a;
+------+------+
| a    | b    |
+------+------+
|    1 |    2 |
+------+------+
1 row in set (0.00 sec)

The issue at hand, is the value for the b column. You’ll note that we’ve grouped by a, so we correctly have a single row result. If you’re not sure what the issue is, let’s repopulate the table in a different order and retry.

mysql> truncate test1;
Query OK, 0 rows affected (0.10 sec)

mysql> insert into test1 values (1,3), (1,2);
Query OK, 2 rows affected (0.00 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> select * from test1;
+------+------+
| a    | b    |
+------+------+
|    1 |    3 |
|    1 |    2 |
+------+------+
2 rows in set (0.00 sec)

mysql> select * from test1 group by a;
+------+------+
| a    | b    |
+------+------+
|    1 |    3 |
+------+------+
1 row in set (0.00 sec)

So, the value for b was 2 in the first example, and is now 3, with the same data in the table and the same SQL. The problem is that there is no aggregation or GROUP BY clause for b. There’s only one possible value for a, but b is not aggregated – there are two possible values: 2 or 3. MySQL just picks one of the two possible values. Let’s see what PostgreSQL does.

database=# create temporary table test1 (a integer, b integer);
CREATE TABLE
database=# insert into test1 values (1,2), (1,3);
INSERT 0 2
database=# select * from test1;
 a | b 
---+---
 1 | 2
 1 | 3
(2 rows)

database=# select * from test1 group by a;
ERROR:  column "test1.b" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: select * from test1 group by a;

PostgreSQL decides you don’t want undefined / random data in your result set and lets you know. If you’re using GROUP BY queries in MySQL, and rely on the result being, you know, correct, it may be time for a quick audit of your SQL.

A quick search of Stack Overflow questions shows many solutions which include this flaw. The solutions work fine on MySQL, but fail on PostgreSQL.

I’m surprised that 11 years after first learning of MySQL’s handling of the GROUP BY clause, it has the same behaviour enabled by default. MySQL’s behaviour can be modified to behave in a stricter manner. Worth doing if you want to ensure you receive results without including randomly selected data.

Written by dkam

September 28th, 2011 at 11:25 pm

Posted in Development,Geeky,Ruby

On Daemons

without comments

New post on Daemons over on the Booko blog.

Written by dkam

August 7th, 2010 at 10:59 pm

On Users and Passwords

without comments

Just posted on the Booko Blogo an article about implementing Users and Passwords.

Written by dkam

September 23rd, 2009 at 11:24 pm

Automating host provisioning

without comments

When testing new stuff for Booko, I sometimes create a new slicehost and build myself a test box. In the past I’ve done this manually, which really isn’t very difficult, but it turns out that Slicehost have an API to allow you to automate this stuff.  With the API you can create, destroy, rebuild and reboot your VPS which is pretty cool.  It also lets you manipulate all your DNS settings. Check out the API for complete details. 

Here’s a script I put together to build a new dev host for me and set it up so it’s ready to use.  It performs the following steps:

  1. Check the Domain name I’ve selected for the host is managed by Slicehost.
  2. Create a new 256MB VPS with Ubuntu 8.04 installed.
  3. Create the domain names for the host (including on the internal interface if required)
  4. Wait for host to build and for Networking and SSH to startup.
  5. Add my SSH key to the root user’s account for passwordless login.
  6. Update and upgrade the host.
  7. Install Puppet  

I plan on having a puppet server setup soon which will take over the rest of the setup, so I’ve installed that. I’ll probably use the Net::SSH stuff to get the host added to puppet (signing certificates and such.)

#!/usr/bin/env ruby

require 'rubygems'
require 'activeresource'
require 'net/ssh'
require 'net/scp'

API_KEY="your_key_goes_here"
SITE="https://#{API_KEY}@api.slicehost.com/"
DEFAULT_TTL = 300

host_name="hostname"
int_host_name="hostname-int"
domain_name="mydomain.com.au"

fqdn = host_name + "." + domain_name
origin = domain_name + "."

##
# Required definitions to access the Slicehost stuff
##
class Slice < ActiveResource::Base
  self.site = SITE
end 

# Address class is required for Slice class
class Address < String; end 

class Zone < ActiveResource::Base
  self.site = SITE
end 

class Record < ActiveResource::Base
  self.site = SITE
end 

##
# Create or update a DNS record
##
def create_host_record(zone_id, host_name, ip_address, ttl = DEFAULT_TTL)
	host_record = Record.find(:first, :params => { :name => host_name, :zone_id => zone_id  } )
	unless host_record.nil?
		host_record.data = ip_address
		host_record.record_type = "A"
	else
		host_record = Record.new(:ttl => ttl, :record_type => 'A', :zone_id => zone_id, :name => host_name, :data => ip_address)
	end
	host_record.save
end

puts "Getting Zone data for \"#{domain_name}\"."
dom = Zone.find(:first, :params => { :o rigin => origin } )

raise "Domain \"#{domain_name}\" not found. Won't be able to create host record." if dom.nil?

puts "Domain exists. Creating slice."

slice = Slice.new(:image_id => 10, :flavor_id => 1, :name => host_name)
slice.save

puts "Slice created. Creating DNS records while it builds."

create_host_record(dom.id, host_name, slice.addresses[0])
create_host_record(dom.id, int_host_name, slice.addresses[1]) unless int_host_name.nil?

puts "DNS created. Waiting for host to build and become active."

while slice.progress != 100 && slice.status != "active"
	puts "Host is #{slice.progress}% complete - host status: #{slice.status}"
	sleep 10
	slice.reload
end

puts "Host built. Waiting for host to startup."

begin
	Net::SSH.start(slice.ip_address, 'root', {:auth_methods => ["password"], :password => slice.root_password}) do |ssh|
		puts "Connected to new host. Bootstrapping."

		puts "Creating .ssh directory and uploading public key."
		ssh.exec "/bin/mkdir -p /root/.ssh/"
		ssh.scp.upload!("/Users/dkam/.ssh/id_dsa_omena.pub", "/root/.ssh/authorized_keys")
		puts "Done!"

		puts "Updating apt and upgrading system."
		ssh.exec!("/usr/bin/aptitude update")
		ssh.exec!("/usr/bin/aptitude dist-upgrade -y")
		puts "Host has been upgraded and updated."

    		puts "Installing puppet."
		ssh.exec!("/usr/bin/aptitude install puppet -y")
		puts "Puppet Installed."
	end
rescue Errno::ENETUNREACH
	puts "Host network not up. Waiting 10 seconds, then retrying"
	sleep 10
	retry
rescue Errno::ECONNREFUSED
	puts "SSH not ready. Waiting 10 seconds, then retrying"
	sleep 10
	retry
end

puts "Done! Enjoy your new host #{fqdn}. You can now ssh root@#{slice.ip_address}"</pre>

Written by dkam

January 26th, 2009 at 5:48 pm

Posted in Booko,Ruby

Tagged with

No more onload=”javascript…” for Booko

with one comment

I’ve taken Phil’s advice and removed the Javascript from the body’s onload event handler, replacing it with Prototype’s event handler.

    <script type="text/javascript">
    Event.observe(
             window,
             "load",
         <%=  remote_function :url => { :action => "get_prices", :isbn13 => @book.isbn13  },:method => :get %>
        );
    </script>

Written by dkam

July 2nd, 2008 at 11:15 pm

Migration to Passenger ( mod_rails )

without comments

Ruby On Rails apps are finally easy to install — mod_rails is here. Just installed it for the Blag ( Typo 5.0.1 on Gentoo Unstable ) and it looks to be working quite nicely. Only roadbump I encountered was problems getting static content to be served. Kept getting:

[error] [client 124.168.83.214] client denied by server configuration: /var/www/da.nmilne.com/htdocs/public/javascripts/typo.js

The problem was pretty straight forward — now that mongrel isn’t serving up the static content, I had to make sure that apache was configured to allow access. Added a this stanza:

  <Directory "/var/www/da.nmilne.com/htdocs/public">
        Options FollowSymLinks
        AllowOverride None
        Order allow,deny
        Allow from all
  </Directory>

Naturally it feels snappier. Nice.

Written by dkam

June 22nd, 2008 at 3:49 pm

Posted in Ruby,SysAdmin

Frustrating

without comments

I spent several hours yesterday fighting with RubyGems — I’d even written a vitriolic post about it — but I … did something … and bam, like that it was gone. RubyGem is Ruby’s version of Perl’s CPAN. It’s got a very annoying trait — its prodigious use of memory. For each gem ( a gem is a Ruby module — like rails or hpricot for example ) RubyGem would load the spec into memory in order ( I’m guessing after reading a bunch of forum posts ) to build a dependency tree. On a 256MB slice host, this pushes you into swap hell. On a 512MB host it would use up to 68% of memory.

So what can you do but rent a bigger slicehost? Moe Sizlack said it best: “I’m choking on my own rage over here!”

Naturally, it’s been fixed. Today.

If only I’d done something more constructive yesterday. Like played COD.

Written by dkam

June 22nd, 2008 at 1:55 pm

Consistency

without comments

Well, apparently I’ve not been consistent enough — an important quality when one bags out others for being inconsistent. So — I’ve updated all references to “Bookie” to the now correct “Booko”.

While I was at it I updated the scraping code for Fishpond who have again updated their site. At least they’ve updated it for the better. Compare and contrast the old and new Hpricot XPath code for grabbing the book title and author.

The Old:

book.title     = (doc/"table/tr/td/div/h1").inner_html
book.author    = (doc/"table/tr/td/p[2]/a/font/u").inner_html

The New:

book.title     = (doc/"h1#product_title").first.inner_html
book.author    = (doc/"p#product_author/a").first.inner_html

Much nicer!

Written by dkam

May 28th, 2008 at 11:20 pm

Booko now with better sessions!

without comments

  • Bug fixed which stopped Booko remembering which Book searching place you last used. ( Bug reported by teh timo )
  • Re-factored the functions which go and get the prices from the stores.
  • Added an attribute to “Book” such that it now remembers when its prices were last looked up. Accidentally introduced a bug where any new book initially had no idea when it was last looked up and crashed the “Recently Viewed” method.
  • Todo — better testing :-(

Written by dkam

May 19th, 2008 at 10:57 pm

Posted in Booko,Geeky,Ruby

Booko improvements

without comments

I was contacted by a chap from The Nile to let me know that they not only have an Affiliate program I could join, but they have an API to access their site! I’ve spent the last couple of days adding the affiliate links. You’ll see that several stores now appear to link to clixGalore — but fear not, the links will still get you to the page you’re after.

You’ll also notice there’s now a new store to search — The Nile. It’s the fastest of the search engines by a considerable margin. ( Which isn’t really fair — at least one of the sites which is searched is just a fancy screen scraper.)

Additionally, I’ve cleaned up the CSS and layout some. Hopefully you won’t have the horizontal scroll bar on your browser, regardless of how wide you open the window.

You’ll also notice that there’s a context sensitive help message in the top right-hand corner. This should make it clear what you can do at which ever page you happen to be on.

Try out the updated Booko and let me know what you think.

EDIT: Added a tool tip style mouse over for the Most Popular/Recent list so you can see the full title. Thanks to Niall & Riina for their help & ideas.

Written by dkam

May 3rd, 2008 at 3:56 pm

Posted in Booko,Geeky,Ruby