ServerUpgradesChapter (r3 - 11 Jan 2007 - 20:35:46 - ThomasLimoncelli)
NO MORE UPDATES TO THIS PAGE PLEASE. SUBMIT ALL FUTURE COMMENTS TO tposana-bugs_at_wingfoot_org.gif

11.1 pg 214

Another question to ask before upgrading: can the service be run from a different server, e.g. a contingency server, or can it be "swung" over to a temporary server ("swing server") during the upgrade, so that the service is still available even if the machine is not?

11.1.1 Step 1 pg 216

Note that sometimes the "service" is invisible. For example, a network health check script might simply "ping" the server and raise an alert if there is no response or if the response takes too long.

11.1.1 Step 3 pg 217

As much as it's against automation rules, sometimes the human eyeball is the best fuzzy difference detector. You could spend hours mangling an old report so that it looks like the new one, or spend 3 minutes checking line by line by eye.

11.1.1 Step 3 pg 218

This section can also apply to testing after external changes, for example Y2K tests.

Anecdote: A large business area had 400 Unix servers that they wanted to test just after midnight Y2K to ensure that core functionality of the operating system and associated infrastructure was working correctly (application correctness was for the application support teams to deal with). A series of non-invasive tests was created, each with a PASS/FAIL response (e.g. is the box up, can we log in, can it see the YP servers, is the time correct, can it resolve DNS, can it mount from the NFS servers and read a file, is the automounter working, and so on). Using a central administration point, the tests could be fired off on multiple boxes at a time and the results pulled back. All 400 boxes were tested within 20 minutes, and the team was able to report their PASS to the Y2K tracking management team well in advance of other, smaller, units. So popular did the tests become with the SA team that they became part of the daily monitoring of the environment. An extra benefit was found after a network outage. Solaris 2.5.1 and Solaris 2.6 automounters had a habit of freezing in obscure cases, and a network outage was likely to cause this problem on at least 1 of the servers. The health checks allowed the team to test their whole environment on demand and find servers that needed remedial action. (Why, yes, I am pretty proud of that code; it's still in use today!)
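
For anyone wanting to build something similar, here is a minimal sketch of the idea in Python. It is not the code from the anecdote. The function names, the thread pool and the two stand-in checks are all assumptions made for illustration; real checks would log in, query YP/DNS, read a file over NFS, and so on.

```python
from concurrent.futures import ThreadPoolExecutor


def run_checks(host, checks):
    """Run every check against one host; return {check_name: "PASS" or "FAIL"}."""
    results = {}
    for name, check in checks.items():
        try:
            ok = bool(check(host))
        except Exception:
            # A check that blows up counts as a FAIL rather than stopping the run.
            ok = False
        results[name] = "PASS" if ok else "FAIL"
    return results


def run_fleet(hosts, checks, max_workers=20):
    """Fire the checks at many hosts in parallel and pull the results back."""
    hosts = list(hosts)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_host = list(pool.map(lambda h: run_checks(h, checks), hosts))
    return dict(zip(hosts, per_host))


def failing_hosts(report):
    """Hosts with at least one FAIL, i.e. the ones needing remedial action."""
    return sorted(h for h, res in report.items() if "FAIL" in res.values())


# Hypothetical stand-in checks, for illustration only.
def is_up(host):
    return True


def automounter_ok(host):
    return host != "box17"  # pretend box17 has a frozen automounter


if __name__ == "__main__":
    checks = {"up": is_up, "automounter": automounter_ok}
    report = run_fleet(["box01", "box17", "box42"], checks)
    for host in failing_hosts(report):
        print(host, report[host])
```

Keeping every check to a plain PASS/FAIL answer is what makes the results easy to roll up across hundreds of boxes and rerun on demand.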

11.1.1 Step 5 pg 220

The "Scotty" quote: the "Relics" episode was 1992 (coincidentally, SpikeTV showed it today at 5pm). However, the joke goes back to "Star Trek III: The Search for Spock" in 1984.
Kirk: How long to re-fit?
Scotty: Eight weeks. But you don't have eight weeks, so I'll do it for you in two.
Kirk: Do you always multiply your repair estimates by a factor of four?
Scotty: How else to maintain my reputation as a miracle worker?
Kirk: Your reputation is safe with me.

(geek geek geek)

11.1.1 Step 11 pg 223

A good "change management" system will also have a "close" section where scheduled changes can be closed with predefined results such as "implemented according to plan" or "implemented but exceeded change window" or "partial implementation; more work to be done" or "failed; change backed out" or "failed; service unusable; end of world predicted". Well, OK, not quite the last one :-) The change management team can be tasked to review the changes that were scheduled, follow up on abnormal results, and verify that the customer is at least informed and aware of the issues and that plans are in place to correct the problem.
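
As a rough illustration of predefined close results, here is a small Python sketch. The enum members mirror the results listed above; the names `CloseResult`, `needs_follow_up` and `review`, and the change IDs in the example, are hypothetical and not taken from any particular change management product.

```python
from enum import Enum


class CloseResult(Enum):
    AS_PLANNED = "implemented according to plan"
    EXCEEDED_WINDOW = "implemented but exceeded change window"
    PARTIAL = "partial implementation; more work to be done"
    BACKED_OUT = "failed; change backed out"


def needs_follow_up(result):
    """Anything other than a clean implementation is an abnormal result."""
    return result is not CloseResult.AS_PLANNED


def review(closed_changes):
    """Given (change_id, CloseResult) pairs, return the IDs to follow up on."""
    return [cid for cid, result in closed_changes if needs_follow_up(result)]


if __name__ == "__main__":
    closed = [
        ("CHG-1", CloseResult.AS_PLANNED),
        ("CHG-2", CloseResult.EXCEEDED_WINDOW),
        ("CHG-3", CloseResult.BACKED_OUT),
    ]
    print(review(closed))
```

Restricting the close field to a fixed set of values is what lets the review step be a simple filter instead of someone reading free-text notes.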

11.2.1 pg 224

Note that some services may be used only once a year (e.g. financial year-end reports). It is vitally important that decommissioning services is done via the change management process, as part of the "cover your arse" protection for the SA.

Case Study pg 227

Just FYI, in case you need to do this again: most OSes send out a "gratuitous ARP" when an interface is configured, so a down/up of the interface would cause the packet to be sent. Machines in the broadcast domain pick it up and update their ARP caches immediately. This is, basically, how IP address migration works in a cluster when a node fails.
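
To show what such a packet looks like on the wire, here is a small Python sketch that builds (but does not send) a gratuitous ARP frame in its request form: an Ethernet broadcast in which the sender IP and target IP are the same address. The function names and the example MAC and IP are made up for illustration. Actually transmitting the frame needs a raw socket and root privileges, which this sketch leaves out, and some systems announce with an ARP reply instead of a request.

```python
import socket
import struct

BROADCAST_MAC = b"\xff" * 6
ETHERTYPE_ARP = 0x0806


def mac_to_bytes(mac):
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    return bytes.fromhex(mac.replace(":", ""))


def gratuitous_arp_frame(mac, ip):
    """Build a 42-byte Ethernet frame carrying a gratuitous ARP request."""
    sender_mac = mac_to_bytes(mac)
    sender_ip = socket.inet_aton(ip)
    ethernet = BROADCAST_MAC + sender_mac + struct.pack("!H", ETHERTYPE_ARP)
    arp = struct.pack(
        "!HHBBH",
        1,       # hardware type: Ethernet
        0x0800,  # protocol type: IPv4
        6,       # hardware address length
        4,       # protocol address length
        1,       # operation: request
    )
    # Sender and target IP are identical; that is what makes it "gratuitous".
    arp += sender_mac + sender_ip + b"\x00" * 6 + sender_ip
    return ethernet + arp


if __name__ == "__main__":
    frame = gratuitous_arp_frame("02:00:00:aa:bb:cc", "192.0.2.10")
    print(len(frame), frame.hex())
```

Hosts that already have an ARP cache entry for that IP overwrite it with the sender MAC from the frame, which is why traffic follows the address to the new machine.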

-- StephenHarris - 16 Aug 2006
