Thanks to all who left an answer for Mitchell’s survey. It turns out this was not so much a maths assignment, as one for a subject called “Human Society and its Environment.”
We had a good discussion about the answers and the write-in comments as we tallied the results and drew the graph. Amongst other things, we talked about why so many people in this sample might be answering ‘E’ and ‘A’, rather than any of the other options (thanks, Laurent), why self-reporting and self-selecting samples are biased (thanks, Charles), why somebody might use the word ‘hate’ in their answer (thanks, “no two”) and why Windows might be considered a religion (thanks, Mickey).
Unfortunately, “leanings toward” Buddhism went straight over Mitchell’s head (he’s maybe a year or two away from that level of abstract thought), but I find it interesting, so thank you for sharing.
Could you please take two minutes to help my son, Mitchell, with his homework?
Update: The poll is now closed. Thanks to all!
Mitchell has an elementary statistics project: poll a number of people on a set question and write up the results. Some children would just grab paper and pencil and start running around the neighbourhood. But, being the introvert he is,1 Mitchell finds that kind of thing hard. Another solution might be to ask people at Church on Sunday, but that Just Won't Work for Reasons That Shall Become Clear.
However, at 9 years old, Mitchell is completely in touch with his inner geek and hatched a cunning plan: get Dad to put the poll on his website. He cleared it with the teacher, and then asked me very, very nicely. So here it is:
Please answer in a comment - single letter and short write-in responses are fine. The fact that you reply is more important than the accuracy of your answer :).
Mitchell sends many thanks in advance.
1 Wonder where he got that trait?
“Revenge First. World Peace Second.” So reads the only clever graffiti I see on my commute.
There are some parts of the world—like Palestine, Ireland, and half of Africa—where revenge is a way of life. A perceived initial wrong is revenged with violence. The violence of the revenge is revenged with violence, and the cycle begins. Everybody involved has had wrong done to them, and so everyone involved feels justified having their revenge.
Assuming we all participate, the results of revenge would definitely be world peace—the kind of peace that is only interrupted by the scuffle of cockroach feet.
Getting even is instinctive for the majority of humanity. Why is it that some nations and races can restrain themselves, even in the face of being deeply wronged? I’m thinking here of Australian aborigines, South Africa after Apartheid and how post-Holocaust Jews didn’t arbitrarily round up German citizens and execute them.
I think the difference is character: it requires more character to forgive than it takes to stand up for your rights.
Today I’m looking at retro-web. The Symantec brand content filter has decided that stylesheets are potentially evil, and won’t allow them in. If I type in the URL of a stylesheet document, Symantec gives me this message—
Error: Internal proxy error: Unable to scan file due to disk full or other disk device error
I suppose blocking CSS in these circumstances makes sense, since IE allows JavaScript to be embedded in stylesheets, but I wonder why the proxy is still serving web pages.
Surprisingly, the web is usable without stylesheets. Ugly, but usable. It could be a lot worse.
I’ve settled on using SQLite as the database for my blog/mini-CMS. The blog is going to be running complex queries over all the data on a regular basis, and SQLite seems up to the task.
Other bonuses for SQLite include:
Since it only impacts the code at a few points, going with SQLite is a safe decision. If it turns out that SQLite isn’t up to the task it will be simple to replace it with another RDBMS—or even something else.
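To illustrate the “few points” idea, here is a minimal sketch. It assumes the sqlite3 module from today’s Python standard library, and the ArticleStore class, table layout and method names are all made up for the example rather than taken from the blog’s real schema.

```python
import sqlite3


class ArticleStore:
    """Hypothetical wrapper: the only place in the code that knows about SQLite."""

    def __init__(self, path=":memory:"):
        # Swapping in another RDBMS would mean changing this class only.
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS article ("
            "id INTEGER PRIMARY KEY, title TEXT, body TEXT)"
        )

    def add(self, title, body):
        # Insert one article and return its row id.
        cur = self.conn.execute(
            "INSERT INTO article (title, body) VALUES (?, ?)", (title, body)
        )
        self.conn.commit()
        return cur.lastrowid

    def titles_matching(self, word):
        # A simple query over all the data; LIKE ignores case for ASCII text.
        rows = self.conn.execute(
            "SELECT title FROM article WHERE body LIKE ? ORDER BY id",
            ("%" + word + "%",),
        )
        return [title for (title,) in rows]
```

If the rest of the code only ever talks to a class like this, replacing the database should mean rewriting one small module.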
This morning I was writing a script that operates on every html file on my hard disk. I was just going to have my script read the output of "find . -name '*.html'", but then I remembered Python 2.3’s os.walk() function. Here’s the code to find my html files:
    import os

    def htmlFiles():
        # Walk every directory under C:\ and yield the full path of each .html file.
        for (path, dnames, fnames) in os.walk("C:\\"):
            for fn in fnames:
                if fn.endswith(".html"):
                    yield os.path.join(path, fn)
Very neat.
Update: Added the import and the os.path.join().
Charles’s recent post on pain caused by bugs and feature creep put me in mind of a conversation I once had with an architect. [For all us IT grunts: note that this was a real, designs-buildings architect, not a software architect.] This architect found that each line he drew on a blank page meant discarding thousands of possibilities. The start of the creative process filled him with a sense of loss.
Writing software is somewhat similar. When you sit down to a blank screen, your new program can do anything. Then, as soon as you type the first line of code, there are thousands of things your program can’t do. And there isn’t time to write each program that you could.
Why bother with all the pain? So that when the last line is drawn—or typed—you have a real thing, not just a bunch of ideas.
Casey asked me why I want to rewrite my blogging software, and why I don’t plan to use a CMS. Good question. The answer is that, in the face of current circumstances, it seems the “right” thing to do. Here are my current circumstances:
For the past few months though, I’ve had a hankering to create content that isn’t suited to a blog format—content such as thousand word essays, searchable notes on movies I’ve watched and a personal information manager.1
A CMS is probably exactly the right thing to solve this problem, but the size of CMS software frightens me. Zope looks about the same complexity as WebSphere Application Server—meaning about 40 hours of reading manuals, followed by six months hands-on before I knew all the nooks and crannies.
1 The first thing I’m going to do is get blogging working, of course. The second will be to get the Cardboard Checklists working again, and the third will be to write some documentation on the software, starting with collating all these “Blog Replacement” blog entries.
David says that he is, “A Planter of Ideas.”
To which I say, “Sew What?”
I’ve developed a theory of commercial software development that has made me more accepting of imperfection, less stressed and happier with my job. I say:
If you try to do your software development exactly right, the project will fail. If you don’t try to do anything right, the project will fail. If you pick just a few areas and do them well, the project has a chance.
Trying to do everything exactly right will eventually fail because doing everything exactly right (total planning, precise requirements, detailed designs, thorough reviews, three types of testing) is a huge effort. For all but the most complex of projects, a lot of that effort is wasted. Wasted effort means whoever is paying for the software is paying too much money for what they get. <yoda-voice>Wasted-money leads to tension. Tension leads to disrespect. Disrespect leads to distrust. Distrust leads to failure. Failure leads to suffering.</yoda-voice>1
At the other extreme, not doing anything right is a shortcut to failure. People don’t study software engineering for years and years because it is simple. If a team goes at a software project without planning, requirements, design or testing, they will probably produce a lot of code, but not code that does what the person who is paying for the software wants it to do.
The successful projects I have worked on have taken a middle road. They take limited resources and a non-trivial problem and work out how best to apply one to the other. My theory is that, if you do just a few things really well, it alleviates the pressure in a whole range of areas.
Here are some practices I’ve used, and their effect on the project.2
Each practice affects many areas of the project; picking a good range of mostly non-overlapping practices markedly improves a project’s chance of success.
While there is no magic recipe, exactly which practices the team chooses to do doesn’t matter as much as that they do choose some and then do them.
I don’t stress when the project isn’t doing everything right, or even when it isn’t doing all the things that I think it should be doing, so long as some things are being done right and nobody is trying to do everything right. I have a much better tolerance for software development process “imperfections” than I did a few years ago. In fact, I always welcome a few.
1 It’s rare for a team of more than two people to agree on what ‘exactly right’ means anyway. Disagreement leads to tension. Tension leads to…
2 I’m not recommending any or all of these for your project. You decide what’s right for your project, and why.
I’ve been toying with an object model for my blog replacement. Here it is, complete with compulsory Poseidon logo background:

The model centres around ‘Article’. Articles are content provided by a user. An Article can be a blog entry, a comment, a longer essay or a note on a movie that I’ve seen.
Articles have a type, which is defined by their ArticleTemplate. ArticleTemplates are arranged in a hierarchy.
The attributes of an article are held as key-value pairs. The AttributeDefinitions of the article’s template define what these attributes are. AttributeDefinition objects can be used to validate individual attributes on an Article.
With this mechanism, the attributes of a blog entry can differ from a longer article, although they will share some common attributes – an author name, a title, body text.
Finally, articles can belong to multiple categories. This is to help classify blog entries and essays, although it may be less useful to other types of article.
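The model above could be sketched roughly as follows. This is only an illustration: the class names come from the description, but the constructor arguments, the errors() method, and the assumption that a template inherits the attribute definitions of its parent are all mine.

```python
class AttributeDefinition:
    """Describes one attribute and how to validate its value."""

    def __init__(self, name, required=True, validator=None):
        self.name = name
        self.required = required
        # With no validator supplied, any value is accepted.
        self.validator = validator or (lambda value: True)

    def validate(self, value):
        return bool(self.validator(value))


class ArticleTemplate:
    """An article type; templates form a hierarchy through `parent`."""

    def __init__(self, name, definitions=(), parent=None):
        self.name = name
        self.parent = parent
        self.own_definitions = list(definitions)

    def definitions(self):
        # Assumption: inherited definitions come first, then this template's own.
        inherited = self.parent.definitions() if self.parent else []
        return inherited + self.own_definitions


class Article:
    """User-provided content: key-value attributes plus categories."""

    def __init__(self, template, attributes, categories=()):
        self.template = template
        self.attributes = dict(attributes)
        self.categories = set(categories)

    def errors(self):
        # Return the names of attributes that are missing or invalid.
        bad = []
        for d in self.template.definitions():
            if d.name not in self.attributes:
                if d.required:
                    bad.append(d.name)
            elif not d.validate(self.attributes[d.name]):
                bad.append(d.name)
        return bad
```

A blog entry template could then hang off a base template holding author, title and body, and add only the attributes that are specific to blog entries.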
Looking back, I seem to have created a generic object model, somewhat similar in purpose to Roundup’s, though much simpler. I have a nagging feeling that getting to grips with one or more real content management systems would be helpful right now, if I had the time.
David Pinn on positive feedback and user interface design:
You like the computer to go “clunk” when you choose something.
Nicely put.
The other day I wrote that I planned to use Twisted to build my own blogging software.
This means using Python. There were a few different reasons for choosing Python:
Having decided on Python, Ian Bicking’s Web Framework Shootout was my next stop. Because I’d like to use Python commercially, I was looking for a framework that a) is already in wide use, b) is well documented, c) can be used to build functionality stunningly quickly (at least, compared to J2EE) and d) would be acceptable to the clients and other developers of the company I work for.
The two frameworks that stood out for me were Twisted and Webware. I suspect that I will find these two frameworks to be approximately equivalent in terms of utility, with Webware more palatable to our developers due to their Servlet/JSP/EJB background. However, I didn’t want to pass up Twisted without having tried it at least once, so Twisted got the guernsey for my first big project, and I’ll use Webware for the next one.
The Nokia 6600 is on sale in Australia. Apparently it will soon run Python, but I can’t find it on Nokia’s forums, yet.
I suppose this gives me time to convince my wife that I need a new phone, and for the phone’s price to drop from 999AUD.
When I read a blog entry written by someone I know from “real life”, I perceive it as being read in their voice, complete with their inflections, phrasing, facial expressions and hand gestures.1 If you’ve met me, you might “hear” cardboard.nu in the same way.
But what about readers who don’t know me? What do they hear when they read my blog?
My sister Kate recently had a snippet from her blog voiced by none other than Diver Dan. She accomplished this simply by placing faux-XML ‘ddv’ (meaning Diver Dan Voice) tags around the text. Worked for me. Pretty clever, really.
Well, if Kate can have a celebrity voice, then so can I. But who? It has to be somebody who (a) has a solid, pleasant voice, (b) is well known to the typical cardboard.nu reader, (c) has a reputation worthy of the weighty topics discussed here and (d) won’t charge royalties.
Any suggestions?
1 Charles Miller once did an audio blog entry. Casey’s blog cracks me up, because I hear it in his Pirate Song voice.
We don’t need no steenkin’ Refactorin’ Browsa. We got perl -i -p -e.
Be afraid. Be very afraid.
Tedious already quoted this bit from the Feb 11 White House Press Briefing, but it bears repeating.
This is a work of art. I stand in awe, pondering how many reporters will be ‘on-message’, and how many of their readers and viewers will believe it.
Q Coming back to John’s question real briefly. One of the questions that remain after the release of the documents yesterday involves the President’s physical in 1972. Are you guys talking about what happened there and why he didn’t take—
MR. McCLELLAN: I think this was all addressed previously. I think that, again, this goes to show that some are not interested in the facts of whether or not he served; they’re interested in trolling for trash and using this issue for partisan political gain.
Q What was the answer previous to this?
MR. McCLELLAN: What’s the question?
Q On the question of—
MR. McCLELLAN: See, I mean, there are some that want us to engage in gutter politics. I’m not going to engage in gutter politics. I’m going to focus on what we’re doing—
Q But you were suggesting you’d answered the question previously.
MR. McCLELLAN:—to address the priorities for the American people. We went through this in 1994, I believe again in ‘98, 2000. Now some are trying to bring it up again in 2004.
Q Scott, can I ask, in 2004, just again, why did the President miss his physical?
MR. McCLELLAN: I’m sorry?
Q Why did the President miss his physical?
MR. McCLELLAN: Are you talking about when he—whether or not he—I put out a response to that question yesterday, about whether or not he was rated by his commanders as a pilot.
Q Can I just ask you today, in 2004—
MR. McCLELLAN: No.
Q—why he missed his physical?
MR. McCLELLAN: Elisabeth, there are some that—again, this is a question of whether or not he served. That question has been answered through the documents that were released yesterday, and released previously.
Q I just want to hear from the White House Press Secretary—
MR. McCLELLAN: I’m not—no, there are some—Elisabeth, we’ve already addressed this issue. I’m not going to engage in gutter politics. I’m going to focus on what we’re doing to make the world safer, to make the world a better place, and to make America more prosperous. If others want to engage in gutter politics, that’s their choice. But I think that—
Q How is asking that question engaging in gutter politics?
MR. McCLELLAN: But I think the American people—I think the American people deserve better.
Q Scott, how does that engage in gutter politics if I ask that question?
MR. McCLELLAN: Well, we’ve been through these issues. I wasn’t accusing you. I’m accusing some—(Laughter.) But, you see, we went through—
Q—the answer to that question today?
MR. McCLELLAN: No, we went through these—no, we went—we’ve already addressed this issue. We went through it previously. We went through it four years ago, for sure.
I tried to list all the ways in which Scott McClellan’s spin offended my sensibilities, but eventually gave up and came to the conclusion that he was reading the script for a Seinfeld episode that was filmed but cancelled in post-production, on account of being “too bizarre.”
I’m going to build a web application—a replacement for my Movable Type blog. I have a rough set of requirements and some vague plans for using Twisted. The next step is to choose an architecture.
The reliability and availability requirements for my blog are very light. There is no need for fail-over or even warm-backup. Likewise, a single machine ought to be able to handle the load from one obscure blog, so a rack of load-balanced quad-SPARCs is not required.
The software side is more interesting. Three basic approaches present themselves:
For sites with highly dynamic content, every page served must be dynamically generated, each time it is to be served. Most Servlet/JSP/J2EE applications work in this manner, as do Wikis.
While this approach offers the ultimate in flexibility and ensures that each page is served from the very latest data, it is costly in terms of computing resources. To serve a high-volume application using this approach requires distributing the application across multiple machines.
A common variation on this architecture is to place the application server ‘behind’ a web server. In this configuration, the web server handles requests for static content such as images and stylesheets, leaving the application server to serve HTML.
Another approach is to have an essentially static site, served by a web server, with pages regenerated as required. Movable Type is a good example of this approach.
The web server can handle the majority of requests directly from the filesystem (big red arrow). When the web server receives a request to update content (smaller, purple arrow), it invokes the application code. The application code updates its private data store, then regenerates the pages that have been modified.
This approach generally results in lighter use of server resources than the 100% dynamic approach, though the page regeneration process can be costly.
To make the page regeneration process as efficient as possible, the application regenerates only those pages affected by a change. If the application cannot determine precisely which pages are affected, it must update every page. The mechanism to determine exactly which pages are affected by a change is potentially complex to implement.
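One way to picture the “which pages are affected” mechanism is a reverse index from articles to the pages that include them. The sketch below is hypothetical: PageIndex, its methods and the page names are invented for illustration, and a real system would also have to cope with pages whose contents depend on queries rather than fixed lists of articles.

```python
class PageIndex:
    """Hypothetical record of which pages draw on which articles."""

    def __init__(self):
        self.pages_by_article = {}   # article id -> set of page paths
        self.all_pages = set()

    def register(self, page, article_ids):
        # Called whenever a page is generated from the given articles.
        self.all_pages.add(page)
        for article_id in article_ids:
            self.pages_by_article.setdefault(article_id, set()).add(page)

    def pages_to_regenerate(self, article_id):
        # Unknown article: we cannot tell what is affected, so rebuild everything.
        if article_id not in self.pages_by_article:
            return set(self.all_pages)
        return set(self.pages_by_article[article_id])
```

Editing an article that appears on the front page and in one archive page would then trigger two regenerations rather than a rebuild of the whole site.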
A third approach is to serve the application dynamically, but cache the most frequently requested content. The cache can be invalidated either by the application – when the application determines that the content for that page has changed – or on a timer.
The aim is that most requests (big red arrow) can be served from cache. Requests for resources that are not, or cannot be, cached are passed through to the application server.
Compared with a 100% dynamic configuration, this approach is more complicated, but uses fewer resources, meaning that more users can be supported on a single machine. Compared with Regenerated Static pages, caching is more flexible in terms of the kinds of application it can support.
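A cache with both kinds of invalidation (by the application, or on a timer) might look something like this. It is a sketch under assumptions: PageCache, the render callback standing in for the application server, and the max_age parameter are names I have invented, and it ignores concurrency and memory limits.

```python
import time


class PageCache:
    """Serve pages from memory; fall back to a render function on a miss."""

    def __init__(self, render, max_age=60.0, clock=time.monotonic):
        self.render = render      # stands in for the application server
        self.max_age = max_age    # seconds before an entry expires
        self.clock = clock        # injectable so expiry can be tested
        self.entries = {}         # path -> (time stored, content)

    def get(self, path):
        entry = self.entries.get(path)
        now = self.clock()
        if entry is not None and now - entry[0] < self.max_age:
            return entry[1]       # cache hit
        content = self.render(path)   # miss or expired: ask the application
        self.entries[path] = (now, content)
        return content

    def invalidate(self, path):
        # Called by the application when it knows the page has changed.
        self.entries.pop(path, None)
```

The application would call invalidate() after an update, while the max_age timer covers pages whose staleness the application cannot easily detect.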
For my blog replacement, I decided to use the 100% dynamic approach, because it gives me the most flexibility, and load is not a big issue.
Further reading: Roy Fielding’s Architectural Styles and the Design of Network-based Software Architectures.
The little badge sensor on the security door near my desk is broken. It goes:
Beep… Beep… Beep… * Pause *
Beep… Beep… Beep… * Pause *
Beep… Beep… Beep… * Pause *
And so on.
It was annoyingly loud until a group of thoughtful security dudes came and put six cardboard boxes and some bubble wrap up against the door frame. Now it is still annoyingly loud, but in a muffled sort of way.
Beep… Beep… Beep… * Pause *
Apparently, we can’t take the sensor apart because it is under warranty, but a technician will be out tomorrow morning.
Beep… Beep… Beep… * Pause *
Beep… Beep… Beep… * Pause *
Update: Woohoo! One of the web guys went and shook the doorframe violently, and the beeping stopped. We shall tiptoe around the door.
On our project, we have two kinds of classes, those that are fully documented, and those that are not.
For some reason, it really ticks me off when somebody adds a method to one of the documented classes and doesn’t document it. Sometimes I add the documentation myself, other times I issue unsubtle reminders.
I know why I am so fascist about documentation: a key objective of this project is that the code be maintainable by novice C++ programmers. Nevertheless, I find the role uncomfortable.
Yesterday, while discussing the usefulness of Purify with Peter, he spake a Fundamental Truth of C++:
C++ has no imperfections. It’s C++ programmers that have the problem.
I’ve been programming by the cut-and-paste, search-and-replace method for the last week. It’s boring, but I’m not sure it’s a bad thing.
We’ve been going for two months now, building the infrastructure as we’ve been building the application. With just one month left to go, we still have more than half the screens1 outstanding, but there are no longer any big surprises. With the pattern set by the infrastructure, each screen follows a simple, three-class recipe; the easiest way to make a new screen is to cut-and-paste an existing screen, search and replace the type names, fiddle with the contents of three or four key methods, and recompile.
We could have spent more time extracting commonality from the code, but chose not to. If we put our minds to it, we could have cut the per-screen line count by half, at the cost of making the infrastructure components more complex. However, one of the project goals is that the code be maintainable by programmers with minimal C++ experience, and we have had to strike a balance between absence of boiler-plate and understandability.
Even with all this boiler-plate, the code is still quite malleable. Adding a single extra field to an existing screen takes maybe an hour, including testing. Adding four fields only takes an hour and a half. This compares favourably with some big, Java-based web-apps I have worked on.
So, despite misgivings and the boredom, it looks like I’ll be cutting and pasting for another few weeks. Sigh.
1 Screens, as in 80×25 and green.
There’s a neat article about Twisted over at OnLamp.com. It gives the following snippet to make and run a Twisted web server:
$ mktap web --path=/var/www
$ twistd -f web.tap
So I tried it out on Windows, and, amazingly, it Just Works. Definitely simpler than setting up IIS or Apache, once you find out what to do. Impressive.
I have a small server application in mind, and I’ll definitely be giving Twisted a try.
Over on Slashdot, someone wrote that he had $7000, wanted to spend it providing Internet access to his neighbours, and asked for advice on how to do it.
This discussion quickly turned to ethics. Some hold that giving away this amount of money is a waste – that it is somehow stupid to be so generous. Others thought that, if this guy was going to be generous like this, then there are more worthy causes than Internet access.
I wonder why this topic provokes such strong reactions.
Picked up my new laptop today. It is busted. The most obvious symptom is that the touchpad doesn’t work, and it reboots at odd moments, too.
I suspect a memory problem, but I’m not going to put too much effort into tracking it down, because H-P have committed to replacing the entire machine. The dealer kindly loaned me a USB mouse until the new machine comes, which works around the touchpad problem.
The upside of all this is that I now have a machine that I can freely trash over the course of the next week. I intend to do this by trying any and every software package I can lay my greedy little hands on.
The maximum speed that I could possibly download data from the USA via FTP or HTTP is a little over 200kbytes/second.
Both HTTP and FTP are based on the TCP/IP protocol, which provides a reliable stream of data between two points. One of the reliability mechanisms built into TCP is that a receiver will tell the sender when it has received a chunk of data. If the sender doesn’t get this acknowledgement, it resends the data.
Because of the way that TCP packets are structured, one computer can send up to 64k bytes of data to another without acknowledgement. When the sender has received an acknowledgement for some or all of that 64k, it can send some more data, but not before.
From here in Sydney, the round-trip time to servers in the USA, as reported by ping, is 300 milliseconds, give or take. Some of this – 60 or 70 milliseconds – is due to the speed of light. The rest of it is due to the routers and switches between here and there.
Therefore, a TCP connection between my server and one of those in the USA can transmit at most 64kbytes per 300 milliseconds, or 213 kbytes per second. If I wanted to receive data any faster than this I would need to look at multiple TCP connections, or a different protocol altogether.
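The arithmetic is easy to check. The snippet below is just a worked version of the calculation above, using the 64 kbyte window and 300 millisecond round trip from the text. The function names are mine, and it assumes exactly one full window is sent per round trip.

```python
import math


def max_tcp_throughput(window_bytes, round_trip_seconds):
    """Upper bound on throughput when one window is sent per round trip."""
    return window_bytes / round_trip_seconds


def connections_needed(target_kbytes_per_second, per_connection_kbytes):
    """How many parallel connections it takes to reach a target rate."""
    return math.ceil(target_kbytes_per_second / per_connection_kbytes)


# Figures from the text: a 64 kbyte window and a 300 ms round trip.
window = 64 * 1024
rtt = 0.300
kbytes_per_second = max_tcp_throughput(window, rtt) / 1024

# Parallel connections needed to match the 500 kbytes/second seen locally.
parallel = connections_needed(500, kbytes_per_second)
```

Here kbytes_per_second comes out at a little over 213, and matching a 500 kbytes/second local download would take three parallel connections.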
In reality, the fastest connection I have ever had to the USA was about 150kbytes/second. In contrast, I’ve been able to download at 500 kbytes/second from local servers.
After months of choosing, I finally settled on the HP nx7010. Ordered it today with a 1.6GHz Centrino, 512MB, 60GB and a WSXGA+ (1680×1050) screen. Hope to pick it up tomorrow or Thursday.
Other notebooks I considered:
In the week from last Sunday morning to yesterday evening, in no particular order, I:
Oh, and I almost forgot: