The last project I worked on suffered a lot of cache penetration on its first day online. The same thing has now happened again, though this time I was not on the front line of development. Well, we just shouldn’t expect a perfect launch without any downtime, not if we have not spent enough resources to ensure the system’s quality in terms of availability, performance and functionality.
It’s highly tempting to slack on testing and ship more functionality instead. We try only one or two paths, making some optimistic assumptions. If everything is OK, we assume the testing is finished. And we rarely rehearse disaster recovery. When disaster does happen, the measures we put in place beforehand just don’t work.
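To make the "one or two paths" point concrete, here is a minimal sketch of what testing the pessimistic paths could look like. Everything in it is made up for illustration: `ReadThroughCache`, `BackendDown` and the `check_*` functions are hypothetical names, and the cache is a toy, not what either project actually ran. The happy-path check is the one we usually stop at; the other two cover a key that does not exist (the cache penetration case) and a backend that is down.

```python
class BackendDown(Exception):
    """Raised by the hypothetical backend when it is unavailable."""


class ReadThroughCache:
    """Toy read-through cache that also remembers keys known to be missing."""

    _MISSING = object()  # sentinel stored for keys the backend does not have

    def __init__(self, load):
        self._load = load        # function: key -> value, or None if absent
        self._data = {}
        self.backend_calls = 0   # counted so the checks can observe backend load

    def get(self, key):
        if key in self._data:
            cached = self._data[key]
            return None if cached is self._MISSING else cached
        self.backend_calls += 1
        value = self._load(key)  # may raise; a failure is deliberately not cached
        self._data[key] = self._MISSING if value is None else value
        return value


def check_happy_path():
    # The path we usually test: the key exists and the second read is a hit.
    cache = ReadThroughCache({"a": 1}.get)
    assert cache.get("a") == 1
    assert cache.get("a") == 1
    assert cache.backend_calls == 1


def check_missing_key_does_not_hammer_backend():
    # The path we usually skip: repeated reads of a key that does not exist.
    cache = ReadThroughCache({"a": 1}.get)
    for _ in range(100):
        assert cache.get("nope") is None
    assert cache.backend_calls == 1


def check_backend_failure_is_not_cached():
    # Another skipped path: the backend is down, then recovers.
    state = {"down": True}

    def load(key):
        if state["down"]:
            raise BackendDown(key)
        return 42

    cache = ReadThroughCache(load)
    try:
        cache.get("a")
    except BackendDown:
        pass
    else:
        raise AssertionError("expected BackendDown")
    state["down"] = False
    assert cache.get("a") == 42


if __name__ == "__main__":
    check_happy_path()
    check_missing_key_does_not_hammer_backend()
    check_backend_failure_is_not_cached()
```

A real cache would also need expiry for the "missing" entries and some limit on their number, which this sketch leaves out on purpose.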
And there is the simplicity problem. It is so easy to end up with a complicated architecture, only because we think it is totally under our control at the time. It no longer is once the complexity builds up and an emergency happens.