Data Center Maintenance
- Continual uptime requires constant maintenance of the overall environment
- maintain individual components on both a scheduled and an unscheduled basis
General Maintenance Concepts
- devices in the data center are in one of two modes:
- production mode
- maintenance mode
- When placing a device in maintenance mode, the operator should:
- Remove all operational instances from the system/device before entering maintenance mode
- must migrate all virtualized instances off the affected systems before beginning maintenance
- to avoid affecting customers
- Prevent all new logins
- to prevent customer usage
- Ensure logging continues, and begin enhanced logging
- administrative activities are more powerful, so they require more logging
- Test device functionality before restoring it to production
- Ensure proper maintenance documentation
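The maintenance-mode steps above can be sketched as a small Python model. Everything here is hypothetical: the `Host` class, its fields, and the `enter_maintenance`/`exit_maintenance` functions are illustrative names, and a real environment would call its own hypervisor and identity tooling to migrate instances and block logins.

```python
from collections.abc import Callable
from dataclasses import dataclass, field


# Hypothetical model of a host; not the API of any real hypervisor.
@dataclass
class Host:
    name: str
    instances: list[str] = field(default_factory=list)
    mode: str = "production"
    logins_allowed: bool = True
    log_level: str = "normal"
    maintenance_log: list[str] = field(default_factory=list)


def enter_maintenance(host: Host, target: Host) -> None:
    """Drain the host, block new logins, and raise logging before maintenance."""
    # Migrate every virtualized instance to another host so customers are unaffected.
    while host.instances:
        vm = host.instances.pop()
        target.instances.append(vm)
        host.maintenance_log.append(f"migrated {vm} to {target.name}")
    host.logins_allowed = False   # prevent new (customer) logins
    host.log_level = "enhanced"   # logging continues, with more detail
    host.mode = "maintenance"
    host.maintenance_log.append("entered maintenance mode")


def exit_maintenance(host: Host, functional_test: Callable[[Host], bool]) -> bool:
    """Return the host to production only if the functional test passes."""
    if not functional_test(host):
        host.maintenance_log.append("functional test failed; staying in maintenance")
        return False
    host.logins_allowed = True
    host.log_level = "normal"
    host.mode = "production"
    host.maintenance_log.append("functional test passed; returned to production")
    return True
```

The functional test is passed in as a callable, so the host returns to production only after it succeeds; on failure it stays in maintenance mode. The `maintenance_log` list stands in for the maintenance documentation.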
Updates
- comply with vendor-specific guidance on updates
- demonstrates due diligence
- the update process should be formalized in governance policies
- elements:
- Document how, when, and why the update was initiated
- if the update originated from the vendor, record the date, update code or number, explanation, and justification
- Move the update through the change management (CM) process
- all modifications should go through CM and be documented
- put the device into maintenance mode
- apply the update
- verify the update
- validate the modifications
- return to normal operations
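The update elements above can be sketched as a record plus an ordered walk through the steps. This is a minimal sketch, assuming a hypothetical `UpdateRecord` whose fields mirror the documentation items listed in the notes, and a `step_results` dictionary that stands in for the real outcome of each step; none of these names come from a real CM tool.

```python
from dataclasses import dataclass, field
from datetime import date


# Hypothetical record of a vendor-originated update.
@dataclass
class UpdateRecord:
    initiated: date
    update_code: str
    explanation: str
    justification: str
    cm_approved: bool = False
    completed_steps: list[str] = field(default_factory=list)


# Steps in the order the notes list them.
STEPS = ["maintenance_mode", "apply", "verify", "validate", "return_to_production"]


def run_update(record: UpdateRecord, step_results: dict[str, bool]) -> bool:
    """Walk the update through each step, stopping at the first failure.

    step_results is a hypothetical stand-in for each step's real outcome.
    """
    if not record.cm_approved:
        raise PermissionError("update has not been approved through change management")
    for step in STEPS:
        if not step_results.get(step, False):
            return False  # device remains in maintenance mode for investigation
        record.completed_steps.append(step)  # documented trail of what was done
    return True
```

Refusing to run an unapproved record reflects the rule that all modifications go through CM; `completed_steps` shows how far a failed update got.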
Upgrades
- updates are applied to existing systems and components
- upgrades are the replacement of older elements with new ones
- upgrades follow a process similar to that for updates
- but changes must also be documented in the asset inventory
- removal and addition of elements
- includes secure disposal if relevant
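One way to keep the asset inventory accurate during an upgrade is sketched below, assuming a simple in-memory dictionary keyed by asset ID. The function name, field names, and status values are hypothetical. The design choice here is to mark the replaced asset as retired (or pending secure disposal) instead of deleting it, so the inventory keeps a record of both the removal and the addition.

```python
def record_upgrade(inventory: dict[str, dict], old_id: str, new_id: str,
                   new_record: dict, needs_disposal: bool) -> list[str]:
    """Replace an old asset with a new one in a hypothetical inventory.

    Returns a change log suitable for attaching to the CM ticket.
    """
    if old_id not in inventory:
        raise KeyError(f"unknown asset {old_id}")
    if new_id in inventory:
        raise ValueError(f"asset {new_id} already recorded")
    changes = []
    # Removal: keep the entry, but mark it so it cannot be mistaken for live hardware.
    old = inventory[old_id]
    old["status"] = "pending_secure_disposal" if needs_disposal else "retired"
    old["replaced_by"] = new_id
    changes.append(f"removed {old_id} ({old['status']})")
    # Addition: record the new element and link it back to what it replaced.
    inventory[new_id] = {**new_record, "status": "in_service", "replaces": old_id}
    changes.append(f"added {new_id}")
    return changes
```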
Patch Management
- patches are a variety of update most commonly associated with software
- distinguished from other updates by their frequency
- vendors issue patches on a regular basis for:
- immediate response to a given need (e.g., vulnerabilities)
- routine purposes like bug fixes and added functionality
- the patch management process is similar to that for updates and upgrades
- but carries additional risks in a cloud data center
Timing
- There is risk either way when a vendor issues a new patch:
- if the CSP fails to apply the patch
- it can be seen as failing to provide due care for customers
- if the CSP applies the patch in haste
- it can adversely affect the production environment (customer operations)
- it may introduce other vulnerabilities, break systems, etc.
- mitigation: schedule patching as part of the change control process
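A minimal sketch of scheduled patching, assuming a hypothetical policy of one weekly maintenance window and a fixed testing period after each vendor release. The function and parameter names are illustrative. A longer testing period reduces the risk of applying a patch in haste; a shorter one narrows the time a known vulnerability stays unpatched.

```python
from datetime import datetime, timedelta


def next_patch_window(released: datetime, test_days: int,
                      window_weekday: int, window_hour: int) -> datetime:
    """Return the first weekly maintenance window that starts at least
    test_days after the vendor released the patch.

    window_weekday uses datetime.weekday() numbering (Monday == 0).
    """
    earliest = released + timedelta(days=test_days)
    # Candidate window on the same calendar day as `earliest`, at the window hour.
    candidate = earliest.replace(hour=window_hour, minute=0, second=0, microsecond=0)
    # Move forward to the configured weekday.
    candidate += timedelta(days=(window_weekday - candidate.weekday()) % 7)
    # If that window starts before the testing period ends, wait one more week.
    if candidate < earliest:
        candidate += timedelta(days=7)
    return candidate
```

For example, `next_patch_window(datetime(2024, 1, 9, 10), 3, 6, 2)` returns Sunday 14 January 2024 at 02:00: testing ends Friday 12 January at 10:00, and the next Sunday 02:00 window follows it.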
Implementation: Automated or Manual
- Automated
- allows for much faster delivery to more targets
- tools can include useful functions
- reporting that identifies which targets received the patch
- cross-referenced against the asset inventory
- alerting to identify missed targets
- risk: patches may be misapplied without a human observer
- Manual
- trained personnel can be more trustworthy than automated tools
- but the repetitive, tedious nature of the process can wear on personnel
- much slower, and thoroughness may decline over time
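The reporting and alerting functions of automated tools come down to cross-referencing two lists. A minimal sketch, assuming the tool's report and the asset inventory can each be reduced to a set of host names (the function names and the `PATCH-42` identifier are hypothetical):

```python
def find_missed_targets(inventory: set[str], patched: set[str]) -> dict[str, set[str]]:
    """Cross-reference a patch tool's report against the asset inventory."""
    return {
        # In the inventory but not reported as patched: these need an alert.
        "missed": inventory - patched,
        # Reported as patched but absent from the inventory: the inventory may be stale.
        "unknown": patched - inventory,
    }


def alert_messages(report: dict[str, set[str]], patch_id: str) -> list[str]:
    """One alert line per missed target, sorted for stable output."""
    return [f"{patch_id} not applied to {host}" for host in sorted(report["missed"])]
```

Checking both directions matters: a missed target is a security gap, while an unknown target suggests the asset inventory itself needs correcting.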
Dates
- if targets are configured with differing dates/times (time zones, etc.)
- then the same patch schedule may run at different moments on different targets
- this widens the window during which some systems remain unpatched
- virtualization compounds this issue
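The size of that risk window can be worked out directly. The sketch below assumes each target runs the patch at the same local clock time and models time zones as fixed UTC offsets (real zones also shift for daylight saving time, which this ignores). The function name and offsets are illustrative.

```python
from datetime import date, datetime, timedelta, timezone


def patch_spread(local_hour: int, day: date, offsets_hours: list[float]) -> timedelta:
    """Gap between the first and last target to apply a patch scheduled
    for `local_hour` on `day`, when each target uses its own UTC offset."""
    instants = [
        # Same wall-clock time, but a different absolute instant per offset.
        datetime(day.year, day.month, day.day, local_hour,
                 tzinfo=timezone(timedelta(hours=offset)))
        for offset in offsets_hours
    ]
    # Aware datetimes compare and subtract by their UTC instant.
    return max(instants) - min(instants)
```

For example, `patch_spread(2, date(2024, 3, 1), [9, 0, -8])` gives 17 hours: 02:00 at UTC+9 is 17:00 UTC the previous day, while 02:00 at UTC-8 is 10:00 UTC, so the last target stays unpatched 17 hours longer than the first.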