Anatomy of a Failure: CFA Website

January 6th, 2013

 

As a result of the disastrous 2009 fires in our state, known as Black Saturday, there was a Royal Commission into the fires, their management and the deaths that occurred as a result.  Out of this came a set of recommendations, and changes were instigated by a number of parties.  One of the main parties of interest in the handling of fire services in the state is the CFA (Country Fire Authority).

The CFA has to be congratulated for at least the intent of some of their initiatives post Black Saturday.  One of these is their Fire Ready app for mobile phones, and another was a revamp of their website.  A lot of time, and presumably public funds, went into the developments that resulted in a catastrophic failure of the website and app. So what went wrong?

Initial reports suggest that capacity planning was the issue, with the statement made that they had planned for up to 350 hits per second, but the site received 700 hits per second. Much of the blame was placed on the fact that the app was connecting to the same server as the main site, and part of the solution was to put the app on its own server. Note the singular 'server' there.  I'm not sure if this was just a reporting problem, trying to make it easier for journalists to understand, or whether we are really talking about an under-resourced system that is supposedly meant to assist people in the times of greatest need and stress.  So, aside from the fact that both were on the same infrastructure (let us assume there is actually more than one server there), what can we determine about the possible problems that these 700 hits per second were causing?

Firstly, looking at the site details, the app now uses osom.cfa.vic.gov.au, which appears to be fronted by a Squid proxy cache, often used to act as a web site accelerator, so there could be more than one server behind that.  The main website, www.cfa.vic.gov.au, is behind an F5 BIG IP appliance, which in its base form is a load balancer, meaning that unless they are spending money for the sake of it, there is more than one server providing the service.  The BIG IP appliance is quite a useful beasty, and can provide web site acceleration, using compression and other techniques to make life easier.  So why was there a problem?  And apologies for getting a bit technical from here on in.

Let's first define what a "hit" might be.  When you load a web page, like www.vic.gov.au, you might think that by the time it finishes loading that is one "hit".  And you'd be wrong.  A web page is made up of a number of files. Each image on the site is likely to be a file, as is each stylesheet and JavaScript file. Each of these files is requested separately and each of these requests results in a "hit".  Taking a look at the structure of the CFA site shows that there are a number of unnecessary files on there.  For instance there are 6 JavaScript files that could be combined into one.  There are images called from CSS that could be combined into "sprites".  Doing just that would reduce the number of hits needed to load a single page, and in doing so increase the site's capacity - without resorting to extra hardware or tweaking networking stacks.  Indeed it looks like you could increase the number of page views supported by a factor of at least three, simply by removing these extraneous hits.
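As a rough sketch of that arithmetic: the number of page views a site can serve is its hit capacity divided by the number of requests each page needs.  The request counts below are hypothetical, chosen only to illustrate a threefold reduction; they were not measured from the CFA site.  The 350 hits per second is the planning figure quoted in the reports.

```python
def page_views_per_second(hits_per_second: float, requests_per_page: int) -> float:
    """Convert a raw hit rate into full page loads served per second."""
    return hits_per_second / requests_per_page


# Hypothetical request counts, for illustration only (not measured from the site).
REQUESTS_BEFORE = 30  # e.g. HTML, 6 scripts, stylesheets and many small images
REQUESTS_AFTER = 10   # scripts merged into one file, CSS images merged into sprites

PLANNED_HITS_PER_SECOND = 350  # planning figure quoted in the reports

before = page_views_per_second(PLANNED_HITS_PER_SECOND, REQUESTS_BEFORE)
after = page_views_per_second(PLANNED_HITS_PER_SECOND, REQUESTS_AFTER)

print(f"{before:.1f} page views/s before, {after:.1f} after ({after / before:.1f}x)")
```

The same hit capacity serves three times as many visitors once each page needs a third of the requests.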

There is more interesting stuff under the covers.  There is no compression used on any of the assets (page data, images, scripts, stylesheets), so the amount of data having to be sent down the wire is far greater than it needs to be.  A conservative estimate suggests that the site could handle twice the current load simply by turning on compression.  This is usually just a configuration change in the base web server software.  All modern browsers support compression, and it is a rookie mistake not to be using it.
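To get a feel for the effect, here is a minimal sketch using Python's standard gzip module on a made-up fragment of HTML.  The sample markup is an assumption and is far more repetitive than a real page, so it overstates the saving; the point is only that text assets shrink considerably before they go down the wire.

```python
import gzip

# A made-up stand-in for page markup (an assumption, not real CFA content).
page = (
    "<div class='incident'><span class='label'>Status</span>"
    "<span class='value'>Under control</span></div>\n" * 200
).encode("utf-8")

compressed = gzip.compress(page)
ratio = len(compressed) / len(page)

print(f"original {len(page)} bytes, gzipped {len(compressed)} bytes "
      f"({ratio:.1%} of original)")
```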

On the same front, most browsers also try to help by actively caching items that don't change often, meaning the next time you go to the site it is far quicker to load as the browser already has them.  For this to work you need to make sure your web server is set up to help out.  The CFA site is not.  There are two factors here, ETags and Expires headers.  ETags are supposedly unique ids for assets that allow the browser to check whether the copy it already has is still current and therefore doesn't need to be downloaded again (providing it hasn't expired).  The trouble with these is that if you have more than one server supplying the same data - as you would behind, say, a BIG IP appliance - these ETags are likely to be different for the same resource, meaning that the browser thinks the item has changed and downloads it again.  The CFA site has ETags turned on.
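One way this mismatch can arise, assuming the servers build their ETags from file metadata that includes the inode number (some web server configurations do this; whether the CFA's servers do is not known), is sketched below.  The inode numbers, timestamp and file content are all hypothetical.

```python
import hashlib


def metadata_etag(inode: int, size: int, mtime: int) -> str:
    """ETag built from file metadata, including the inode number."""
    return f'"{inode:x}-{size:x}-{mtime:x}"'


def content_etag(body: bytes) -> str:
    """ETag built only from the bytes being served."""
    return '"' + hashlib.sha1(body).hexdigest() + '"'


body = b"body { color: #333; }\n"  # the same stylesheet on both servers
mtime = 1357430400                 # same deployment time on both servers

# The same file copied to two hypothetical servers lands on different inodes.
etag_server_a = metadata_etag(inode=1048601, size=len(body), mtime=mtime)
etag_server_b = metadata_etag(inode=2359312, size=len(body), mtime=mtime)

print(etag_server_a == etag_server_b)            # False: browser re-downloads
print(content_etag(body) == content_etag(body))  # True: identical on any server
```

If the load balancer sends the browser's revalidation request to a different server than the one that served the original, the metadata-based ETag will not match and the whole file is sent again.  Leaving the inode out of the ETag, or turning ETags off, avoids this.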

Expiry headers tell the browser how long it should hold an item in its cache before checking for a new version.  For items that don't change much (like images and stylesheets and scripts) it is common to set a "far future" expiry header, say a month or even a year into the future.  This means the browser doesn't need to worry about these items after the first load.  The CFA site doesn't use Expiry headers.  Fixing these two problems could see at least a 50% increase, if not a doubling, of the capacity of the servers.
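As an illustration of what a "far future" expiry looks like on the wire, this sketch builds the two relevant response headers for a one-year lifetime.  The function name is hypothetical and the date is just this post's date; in practice the web server sets these headers through its own configuration rather than through code like this.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def far_future_headers(now: datetime, days: int = 365) -> dict[str, str]:
    """Build caching headers that let browsers keep an asset for `days` days."""
    lifetime = timedelta(days=days)
    return {
        # HTTP dates are always expressed in GMT.
        "Expires": format_datetime(now + lifetime, usegmt=True),
        "Cache-Control": f"public, max-age={int(lifetime.total_seconds())}",
    }


headers = far_future_headers(datetime(2013, 1, 6, tzinfo=timezone.utc))
for name, value in headers.items():
    print(f"{name}: {value}")
```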

Now, without looking at hardware, operating system or web server software, it looks like the site could have handled between 9 and 12 times the traffic it received on Friday.  All it would have taken was someone with experience in developing high capacity websites, or even a sysadmin with capacity planning skills to have foreseen this and averted what was an obviously avoidable calamity.  What a pity.  I hope someone in CFA is reading this, as despite the assurances made about things being done - I don't see any evidence of even the basics being covered.
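The 9 to 12 times range is consistent with multiplying the three estimates above together, assuming the gains are independent and compound: a factor of three from fewer requests, two from compression, and between one and a half and two from caching.

```python
from math import prod

# Estimates taken from the discussion above.
fewer_requests = 3.0                  # combining scripts and sprites
compression = 2.0                     # turning on compression
caching_low, caching_high = 1.5, 2.0  # fixing ETags and expiry headers

low = prod([fewer_requests, compression, caching_low])
high = prod([fewer_requests, compression, caching_high])

print(f"combined capacity gain: {low:.0f}x to {high:.0f}x")
```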

Update 2013-01-15:

The minister has announced today that the site, and the app, have been improved such that there will be no repeat of the problems previously seen.  The details of what they've changed were unclear, however a quick check shows that the only thing addressed on the site itself was the compression.  As mentioned above, this was an easy fix and should have been applied before the site went live.  However, the other points were not addressed.
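A check of this kind only needs the response headers for a page or asset.  The sketch below is one way it could be done, not necessarily how it was done here; the function name and the sample headers are hypothetical and are not a capture from the CFA site.

```python
def audit_headers(headers: dict[str, str]) -> list[str]:
    """Report which of the basic issues discussed above still apply."""
    h = {name.lower(): value for name, value in headers.items()}
    findings = []
    if h.get("content-encoding", "").lower() not in ("gzip", "deflate"):
        findings.append("no compression")
    if "etag" in h:
        findings.append("ETag present")
    if "expires" not in h and "max-age" not in h.get("cache-control", ""):
        findings.append("no expiry information")
    return findings


# Hypothetical response headers: compression is on, the rest is unchanged.
sample = {
    "Content-Type": "text/css",
    "Content-Encoding": "gzip",
    "ETag": '"1a2b3c-4d5-6e7f"',
}

print(audit_headers(sample))
```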

The wording of the minister's announcement suggests that more hardware has been added.  Perhaps the compression module on the BIG IP box?  I haven't run any tests to identify if there have been extra servers added, but I suspect that has happened as well.  Yet compression and the other suggestions I've made would take a few minutes of a competent sysadmin's time, so why is it that it took more than a week for compression to be turned on?  Why aren't the other problems being addressed?  Why are we getting vague assurances that the problems are resolved with absolutely no detail as to what was addressed?  My suspicion is that there is a salesperson somewhere driving around in a very expensive car funded by the commission they have received on upselling the CFA.

License confusion and the stripping of rights.

January 4th, 2013

I really thought that the time of license confusion was well past, and we had all settled down and understood the basis, and ramifications of each of the open source licenses.  It seems I was wrong.

Having a look at a piece of code recently I noticed some very familiar code - it was my code, originally released under GPL with the copyright notice clearly stating that it was "part of the collected works of Adam Donnison".  I had done this deliberately and done so with a number of pieces of code that I had built over the years, knowing that they were useful and could be used in other projects.  Imagine my surprise to see that same code, only slightly modified, in another project with the copyright notice removed and the explanation:

 * Note: Previously, this class was mis-licensed as GPL in an otherwise BSD
 *   application. The GPL attempt was in 2003 while the project itself was not
 *   relicensed from BSD to GPL in 2005. In 2007, all further development was
 *   done under the Clear BSD license and all GPL modifications were removed.

Really?  Since when did the BSD license become viral?  Wasn't that the entire reason people complained about GPL and wanted to move to BSD?  There is nothing in the BSD license, or the Clear BSD license, that demands that all code in a project be covered by the same license.  Indeed even prior to 2001 dotProject had code that was under the Voxel Public License (ticketsmith), which was more restrictive than BSD, so there were precedents for differently licensed parts of the code.  Indeed none of the BSD licenses even has the concept of a "project" or "greater work". Dropping the copyright notice is also a violation of the BSD license, as it is of the GPL, so no matter how you slice it, this action was against both the spirit and the letter of the licenses it purports to uphold.

As to "all GPL modifications were removed", an interesting and, on the face of it, erroneous statement.

I believe I am within my rights to demand that the copyright notice be reinstated, or the code removed.  Now I don't want to get heavy with anyone, but these licenses only work based on strong copyright protection.  My copyright has been violated, and I am now considering what action to take.

This is not the first run-in I've had with this project's developers on their cavalier attitude to copyright notices, but this is by far the most egregious.  I believe their actions were to allow them to make money on the project - in which case I also believe that damages could be sought.  I have no problem with them making money - only not by stripping me of my rights.

Update: I've since spoken with the project lead on the project in question and we've come to an understanding on the issue.

2013 - Promise and Challenges

January 2nd, 2013

Well it's over 4 years now since we went "bush" and whilst we've achieved a hell of a lot on farm in that time, the idea of keeping an ongoing record of what's happening has been less successful. New Year's resolutions are next to useless for us as there's something in the joint intelligence around here that says as soon as it becomes some sort of mandated activity it doesn't happen. It would be nice to be able to use this blog as we'd wanted - to look back at what's changed... but there's a few photos tucked around on various hard drives around the place and we'd better stick with that for the moment, with the vague idea that we might actually blog a bit more frequently on what's happening in these parts.

But a bit of a retrospective think about things, and where we are at the moment, and it's all pretty good. Weather aside, as this bloody drought is a nightmare of increasingly frightening proportions, there has been very little regret about the move.

Sure the climate up here is harsh, and difficult and very very challenging, and absolutely this idea that the drought has ended is irritating, insulting and frustrating as hell, as we've had little / no rain now all winter and spring, and are now dealing with near enough to empty dams, dry as tinder grasslands and a serious management problem with livestock water, but there are plenty of things on the upside.

The air is clean and clear and wonderful - there's no persistence of smoke lingering over us every day. We come and go, hang washing, sit around outside, open and close windows and generally live free of that constant cloud of toxic fumes that ultimately drove us from the Dandenong Ranges. Obviously people up here do fire preparations, and are more fire aware than we've come across in years, but they do that within reason, and with consideration to everyone around them. There is burning off, but there's a general consideration of wind direction, and impact. Even the DSE, when burning off the bush reserves, do it quickly, efficiently and with some thought to where the smoke is going to go and for how long. Although the Longest Lunch and a burnoff became a close run thing a year or so ago, common sense and co-operation prevailed in the end.

Obviously it's helped by the size of the properties, but mostly, it's an issue of care and consideration and awareness of neighbours. There's also a very nice little warning that comes with your welcome pack from the local council - no point in moving into an agricultural area and then whinging about agricultural activities (occasional headers in paddocks of a night / gas guns firing in paddocks / roosters crowing!).

And that's probably the thing that makes all the hard environmental stuff easier to take. The community up here is fantastic. The combination of artists, farmers, small-holders and broad acre growers, graziers, croppers, grape growers, wineries, workers, labourers, professionals, retired, young, old, just kind of works. This is a community in which, in the main, everyone is accepted for who they are / what they do / and their foibles and individual quirks might be commented on, but mostly with affection and acceptance.

So after 4 years, and in the face of a desperate weather situation, no regrets. A few desires - but pretty simple ones really. Wish it would BLOODY RAIN. A few weeks worth would be good. But not on the Australia Day long weekend - there's a party in town and everyone will want to show up.

She's Behind You! Pantomime comes to Moonambel

November 23rd, 2012

"Once upon a time..."  So began the pantomime "Sleeping Beauty" at the Moonambel hall.

 

 

The words spoken by yours truly in an Irish accent while dressed as a frog.  This was my fifth appearance in a Steve Lane/Gwynnyth van den Bergen production with the ensemble cast that is starting to get quite a reputation around the district.  And for good reason.  The team all works incredibly hard and there is a wealth of talent that Steve and Gwyn manage to winkle out and polish up.

 

 

With evil witches (Salli Argall) and the three fairies, Fairy Floss (Jan Curtis), Fairy Nuff (Tanya Miles) and Fairy Vazion (Ruth Searle) there was great banter and plenty of boos, hisses and "She's behind you" throughout the night.

 

 

And just to ensure a bit of gender balance in the magical stakes, the wizard (Graeme Akers) makes an appearance with the clownish "Boots" (Bernard Abadie) and myself in a musical interlude.

 

 

Of course you can't have Sleeping Beauty without, well, Sleeping Beauty.  The princess (Claire Farrell) in repose waiting the 100 years for her prince charming (Sanne Malkaer) made quite a sight.

 

 

And in true pantomime tradition we had our own Grand Dame, the queen Popsi Wopsi (Michael Matthews), whose comic monologue got the crowd going and fully immersed in the proceedings.

 

 

"On the day of the christening, the court were all busy..."   And you can't have a court without a king (Adrian van den Bergen) - centre stage.

In the wings we had Steve and Gwyn, and Claire's brother Ben did a great job as prompt/stage hand.

The entire proceedings were very well received and we even got a write up in a few local papers, along with Facebook and Twitter appearances.

Predictions of a Long Hot Summer

November 23rd, 2012

Needless to say, after the desperately dry conditions we've had all winter and spring, predictions of a long hot Victorian summer have done nothing to increase comfort zones in these parts.

Fire preparations are lurching into shape, and this year we'll be facing the possibility of a really bad grass fire year with reduced dam levels, and a lot more livestock. So that's not going to be any fun at all. Add to that the sorts of weather predictions that are being chucked around, and I doubt there'll be any relaxing hereabouts until sometime in April next year.

Mind you, if one more person tells me the drought's obviously over then I've got a plan for a spot of stress relief. (Where did I put that shovel, must get out and do a bit of swinging practice....)

We're currently considering where we'll be moving the pigs to in the event of the dangerous weeks to come - it will have to be somewhere where we can quickly protect them if necessary and/or allow them to run if things get really bad. Which is an interesting challenge given they are absolute buggers to move when they don't want to move! Then there's the Alpacas, who you can move, as long as you don't mind a bit of a sprint around the paddock at the time. Not my favourite activity at the best of times, but when it's 40 degrees....

Given all the talk of "good seasons" and too much rain in the Southern parts of Victoria, I'm assuming that people don't get out much. There is a distinct difference north and south of the Divide - we see it all the time when we haul up and down the Highway.

But out here it's as dry as a chip - and has been all winter. We've had a long, freezing cold winter with very little rain and a long, hot, dry spring already - with temperatures soaring again on Sunday. Neither of our dams is close to full - one is down to around half capacity.

They are talking about the vague possibility of a sprinkling of rain on Monday, so let's hope like hell. We're just this side of mildly desperate for some.