I finished reading The Curious Incident of the Dog in the Night-time less than twenty-four hours ago, so it was a pleasant surprise when, this morning, I was presented with my very own puzzle-mystery to solve. Better still for one who has just read “The Curious Incident”, the players in this puzzle-mystery are not people, but computer programs!
We have a production process, logsrv, that takes log records from IBM MQ, interprets each one and then inserts it into a database table. A second process, publisher, provides a SOAP web service that reads the database when requested by clients.
+------+ +--------+ +-----------+ +-----------+
| MQ | --> | logsrv | --> | SQLServer | --> | publisher |
+------+ +--------+ +-----------+ +-----------+
|
v
+--------+
| client |
+--------+
This is all written in Java and deployed to Windows. logsrv and publisher run as Windows services (using JavaService), and the database is Microsoft SQLServer. While it is a little complex – and there are good reasons for that, trust me – this system is rock-solid stable. The publisher process, for instance, has a record up-time of 89 days, with no sign of memory leaks or performance. (I think it could run for a year or longer, but unfortunately had to take it down recently, to put in a one-line code change. Oh well.)
All was going fine, until this morning, when I got The Email That Changed The Course Of My Day. Log messages shown at the client were timestamped for ten hours later than they should have been.
If you’re a Java developer, and you live on the east coast of Australia, then that ten hours ought to make you suspicious. Ten hours is the time difference between Australian Eastern Standard Time and UTC.
So the Simon The Sys Admin and I had a look in the database and found A Clue. The database records both WHEN_OCCURRED and WHEN_RECORDED. WHEN_OCCURRED is the timestamp from the record that taken from MQ, while WHEN_RECORDED is the time that the record was received at the logsrv process. According to the database, the logs were being RECORDED ten hours before they OCCURRED.
We checked the Windows system time and timezone, and found they were set correctly. A quick check of the log4j debug output confirmed that the logsrv process was ten hours behind normal time.
So now we had two problems. First, logsrv had the wrong timezone, and second, the web service was publishing the WHEN_RECORDED timestamp instead of the WHEN_OCCURRED timestamp.1
In accordance with the ancient and mystic precepts of Issue Resolution, I asked the Simon The Sys Admin to stop and restart logsrv. This did not correct the issue. The clients still showed the wrong timestamp. Again in accordance with the precepts, the sysadmin elected to Reboot The Server. This time, when the server came up, the clients began to show correct timestamps once more.
Feeling a little better about it all, I wrote an email to the IT manager, reporting that all was now well, and went hunting through the publisher code for this showing-the-wrong-timestamp bug.
Funny. I couldn’t find a bug in the publisher code. The publisher code correctly ignored the WHEN_RECORDED value, only using the WHEN_OCCURRED value.
There was only place where the two values could have been confused: in Hibernate. Now, this wasn’t quite as silly as it sounds. The timestamps had been stored using a Hibernate UserType, something which is not as common as straight data types. We were also using an older version (2.1.6), and hadn’t upgraded. I downloaded 2.1.8 and read through the release notes, but there was nothing appropriate there.
I decided it was time for a cuppa. While waiting for the jug to boil, I realised that I was grasping at straws, and decided to put this particular theory away and see if I could think of something a little more likely.
Just then, a good thing happenned. Simon The Sys Admin emailed me the log4j debug output for the whole day so far, plus some of yesterday. Looking at the logsrv output, I could see that, after restarting the service, it had the wrong idea of system time. However, after the reboot, logsrv suddenly had the right time. Curious. How the service was started should have no effect on the timezone.
I happenned to have a copy of the SQLServer database on hand – one that I’d made only yesterday. A quick couple of queries showed that logsrv only started to put the wrong WHEN_RECORDED timestamp starting yesterday at – doh! – approximately the time I’d manually restarted the logsrv service, breaking the 89 day uptime record.
Still, the logsrv was working fine, apart from putting in the wrong WHEN_RECORDED date.
Then it occurred to me that the publisher likely had the same problem. Sure enough, I had had to restart it yesterday too, and after the restart it had the wrong timezone.
Fortunately, these applications dump the contents of System.properties() to log4j on startup. I could see right there on every start up the value of the user.timezone property. It was set to GMT whenever the service was restarted manually, and Australia/Sydney after the reboot. GMT is what Java defaults to when it can’t determine the timezone, that explained the ten hour difference.
The simple fix is to configure the logsrv and publisher services to pass -Duser.timezone=Australia/Sydney to java.exe on startup. Easy. I can do that next week, since I’ll be deploying a new release then anyway.
Production issue resolved, I went back to my development task.
However, something about this just didn’t make sense. Sure, it works fine when all the processes have the correct timezone, but still I wondered what exactly was causing the client to show the wrong times when the timezone is wrong. logsrv was inserting the correct timestamp into the WHEN_OCCURRED column of the database table. publisher was not supposed to be modifying any dates or times at all.
I nutted it out while I was at the shops. This is what happens when the timezone in the logsrv and publisher is wrongly set to GMT:
logsrv needs to know the timezone that applies to any given point in time. Since there is no explicit timezone, it interprets the “occurred time” as a GMT time.logsrv then writes the occurred time to the SQLServer database.logsrv verbatim, so WHEN_OCCURRED ends up with exactly the same time as was in the MQ record. So far so good.publisher then reads the data from the WHEN_OCCURRED column, assuming it to be a GMT time.publisher then constructs a SOAP message to send to the client. It marshals the “occurred time” as an XMLSchema dateTime. XMLSchema dateTimes include the timezone.Now I’m much happier, because I know that if I have the correct timezone on logsrv, publisher and client, this problem is derinitely solved. I also have a better understanding of how this part of the application will behave if the timezones ever get of sync in future.
——
1 Observant readers will note that I was reasoning backwards. WHEN_RECORDED was certainly off by ten hours, but it was ten hours early, whereas the reported symptom was that times where ten hours late. I’m kind of glad I didn’t notice this, though, as then I would have been really stumped, and may not have persevered past rebooting the PC.
I keep seeing Americans write, “I could care less” in reference to things they don’t care about. This annoys me. I was going to write a rambling article about it, but Paul Brians already wrote a pithy one, Michael Quinion wrote a learned one, and Kevin MacLeod drew a chart. Nice to know I’m not on my own.
I just nuked the last NT4.0 PC in my life, replacing it with shiny new Windows XP. Am I the last person in the world to do this? Who else is still using NT4.0?
Robert X. Cringley doesn’t have a high opinion of IBM. Last week, he wrote, ”...what is IBM? IBM is a disaster-in-the-making”. This week he has another crack at diagnosing the 120 billion dollar company.
If he’s after a consulting job, he’s going about it all wrong.
An interesting read, though, given the wide applicability of his criticisms, I wonder why he’s singling out IBM.
If you can read this, you’re seeing my new web host.
I’ve just moved over to a VPS, hosted at RapidVPS, because I wanted the flexibility to run multiple server processes, and to be able to reconfigure the environment – the firewall, Apache and so forth – at will.
Many thanks to my previous hosting company, python-hosting.com for many years of great service.
By the way, if you’d like to move to open an account at RapidVPS, and you feel like being nice to me, let me know, and I’ll send you a referral email.
public List getImageItems() {
List result = new ArrayList();
for (Iterator iter = items.iterator(); iter.hasNext();) {
ReportItem item = (ReportItem) iter.next();
if (item instanceof ReportImageItem) {
result.add(item);
}
}
return result;
}
@property
def imageItems(self):
return [i for i in self.items if isinstance(i, ReportImageItem)]
I’m just saying. And, no, I’m not putting this on Javablogs.
Someone is emailing me J.R.R. Tolkien’s The Hobbit. This last week, thousands of advertisements for prescription drugs have arrived in my inbox, and, at the bottom of each, is a little bit of the The Hobbit. It makes me smile in a most un-spamly manner. Here’s a sample quotation:
fire, till he talks! I wont have it, said William. I caught him anyway. Youre a fat fool, William, said Bert, as Ive said afore this evening. And youre a lout! And I wont take that from you. Bill Huggins, says Bert, and puts his fist in Williams eye. Then there was a gorgeous row. Bilbo had just enough wits left, when Bert dropped him on the ground, to scramble out of the way of their feet, before they were fighting like dogs, and calling one another all sorts of perfectly true and applicable names in very loud voices. Soon they were locked in one anothers arms, and rolling nearly into the fire kicking and thumping, while Tom whacked at then both with a branch to bring them to their senses-and that of course only made them madder than
I hope they do Dr Suess next. With pictures.
Jeff Croft reckons that Django, a Python web application framework, has a niche with web designers who aren’t nescessarily programmers.
(via GvR.)
For marketing purposes, Australian fast-food outlets propagate the fiction that kebabs are an exotic dish, imported from the north-west Mediterranean. There is however, some dispute as to the kebab’s precise nationality.
Here in my home town of Sydney, kebabs are Turkish. Eight hundred kilometres away in Melbourne, they call “kebabs”, “souvlaki” and think they are Greek. Somewhere between here and there, kebabs change provenance. A colleague from Albury-Wodonga confirmed that kebabs are still Greek on the New South Welsh/Victorian border, so where is the changeover point? And what about north and west of Sydney?
Update: I forgot to say. This post is Andy’s fault.
MAKEMSI makes Windows .msi files. While MAKEMSI does everything it says on the tin, and seems stable and usable, it makes my brain turn inside out.
Last time I came across it, I spent quite a few hours trying to figure out how it works. Let’s see if I remember this correctly. You write a script with a text editor, then right click and tell MAKEMSI to do its magic:
When the VBScript has finished poking and prodding the .msi file, your installer is done. Clever. Very, very clever.