Data Center Maintenance


  • Continual uptime requires constant maintenance of the overall environment
    • maintain individual components on both a scheduled and an unscheduled basis

General Maintenance Concepts

  • devices in a data center are in one of two modes:
    • production mode
    • maintenance mode
  • When a device is placed in maintenance mode, the operator must:
    • Remove all operational instances from the system/device before entering maintenance mode
      • migrate all virtualized instances off the specific systems before beginning maintenance
        • to avoid affecting customers
    • Prevent all new logins
      • to prevent customer usage
    • Ensure logging continues, and begin enhanced logging
      • admin activities are more powerful, so they require more logging
        • and at greater detail
  • test device functionality before restoring to production
  • ensure proper maintenance documentation
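
The maintenance-mode checklist above can be sketched in code. This is a minimal illustration only: the `Device` class, its fields, and both functions are hypothetical names invented for this example, not any real orchestration API.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    """Hypothetical model of a data center host; every field is illustrative."""
    name: str
    instances: list = field(default_factory=list)  # running virtualized instances
    accepts_logins: bool = True
    log_level: str = "standard"
    mode: str = "production"


def enter_maintenance(device, standby):
    """Move a device into maintenance mode in the order the notes describe."""
    # Migrate every instance to another host so customers are unaffected.
    standby.instances.extend(device.instances)
    device.instances.clear()
    # Block new logins.
    device.accepts_logins = False
    # Keep logging on and raise the level of detail for admin activity.
    device.log_level = "enhanced"
    device.mode = "maintenance"


def return_to_production(device, tests_passed):
    """Restore the device only after its functionality has been tested."""
    if not tests_passed:
        raise RuntimeError(f"{device.name} failed testing; staying in maintenance")
    device.accepts_logins = True
    device.log_level = "standard"
    device.mode = "production"


host = Device("host-a", instances=["vm-1", "vm-2"])
spare = Device("host-b")
enter_maintenance(host, spare)
return_to_production(host, tests_passed=True)
```

Refusing to leave maintenance mode when testing fails mirrors the rule that functionality is tested before the device is restored to production.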

Updates

  • comply with vendor-specific guidance on updates
    • demonstrates due diligence
  • update process should be formalized in governance policies
    • elements:
      • Document how, when, and why the update was initiated
        • if originated from vendor, include date, update code or number, explanation, and justification
      • Move the update through the change management (CM) process
        • all modifications should go through CM and be documented
        1. put the system into maintenance mode
        2. apply the updates
        3. verify update
        4. validate modifications
        5. return to normal operations
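
One way to picture the formalized update process is a change record that captures the documentation fields and enforces the five steps in order. This is a sketch under stated assumptions: `UpdateRecord`, `STEPS`, and the sample values (including the ID `UPD-0001`) are hypothetical and not taken from any real CM tool.

```python
from dataclasses import dataclass, field
from datetime import date

# The five steps from the notes, in the order they must be performed.
STEPS = [
    "enter maintenance mode",
    "apply update",
    "verify update",
    "validate modifications",
    "return to normal operations",
]


@dataclass
class UpdateRecord:
    """Hypothetical change-management record for a single update."""
    update_id: str       # vendor update code or number
    vendor_date: date    # date the vendor issued the update
    explanation: str     # what the update does
    justification: str   # why it is being applied
    completed: list = field(default_factory=list)

    def complete(self, step):
        """Mark a step as done, rejecting any step taken out of order."""
        if len(self.completed) == len(STEPS):
            raise ValueError("all steps are already complete")
        expected = STEPS[len(self.completed)]
        if step != expected:
            raise ValueError(f"expected {expected!r}, got {step!r}")
        self.completed.append(step)

    @property
    def closed(self):
        """True once every step has been completed in order."""
        return self.completed == STEPS


record = UpdateRecord("UPD-0001", date(2024, 1, 15),
                      "fixes a firmware fault", "recommended by the vendor")
for step in STEPS:
    record.complete(step)
```

Because `complete` rejects out-of-order steps, the record cannot show an update returned to normal operations before it was verified and validated.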

Upgrades

  • updates are applied to existing systems and components
  • upgrades are the replacement of older elements with new ones
  • upgrades follow a process similar to that for updates
    • but changes must also be documented in the asset inventory
      • removal and addition of elements
    • includes secure disposal if relevant

Patch Management

  • patches are a variety of update most commonly associated with software
  • distinguished from other updates by their frequency
    • vendors issue patches on a regular basis for:
      • immediate response to a given need (e.g., vulnerabilities)
      • routine purposes like bug fixes and added functionality
  • the patch management process is similar to that for updates and upgrades
    • but carries additional risks in a cloud data center

Timing

  • There is a risk either way when a vendor issues a new patch:
    • if the CSP fails to apply the patch
      • it can be seen as failing to provide due care for customers
    • if the CSP applies the patch in haste
      • the patch can adversely affect the production environment (customer operations)
        • e.g., cause other vulnerabilities or break systems
  • scheduled patching within a change control process can be used to manage timing

Implementation: Automated or Manual

  • Automated
    • allows for much faster delivery to more targets
    • tools can include useful functions
      • reporting that identifies which targets received the patch
        • cross-referenced against the asset inventory
      • alerting to identify missed targets
    • risk of misapplication of patches without a human observer
  • Manual
    • trained personnel can be more trustworthy than an automated tool
    • but the repetitiveness and boredom of the process can affect personnel
    • much slower, and thoroughness may decline over time
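
The reporting and alerting functions described for automated tools amount to comparing the set of patched targets with the asset inventory. The sketch below assumes both are available as plain lists of host names; `patch_coverage` and the host names are hypothetical.

```python
def patch_coverage(asset_inventory, patched_targets):
    """Cross-reference patched targets against the asset inventory."""
    inventory = set(asset_inventory)
    patched = set(patched_targets)
    return {
        "patched": sorted(inventory & patched),  # in inventory and patched
        "missed": sorted(inventory - patched),   # in inventory, not patched
        "unknown": sorted(patched - inventory),  # patched, not in inventory
    }


report = patch_coverage(
    asset_inventory=["db-01", "web-01", "web-02"],
    patched_targets=["web-01", "web-02", "lab-09"],
)

# Alert on any inventoried target that the patch run missed.
if report["missed"]:
    print("ALERT: unpatched targets:", ", ".join(report["missed"]))
```

The "unknown" bucket is an addition for illustration: a patched host that is absent from the inventory suggests the inventory itself needs correcting.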

Dates

  • if targets are configured with varying dates (time zones, etc.)
    • then the same scheduled patch may be applied at different actual times
  • this lengthens the risk window during which some systems remain unpatched
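  • virtualization compounds this issue (e.g., instances that are offline or stored as images when the patch is applied may not receive it)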
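
A short worked example shows how differing clock settings widen the risk window. It assumes three hypothetical targets with fixed UTC offsets (daylight saving ignored for simplicity) that each apply the patch at 02:00 local time.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical targets and their fixed UTC offsets (illustrative only).
TARGET_OFFSETS = {
    "host-east": timezone(timedelta(hours=-5)),
    "host-utc": timezone.utc,
    "host-asia": timezone(timedelta(hours=9)),
}


def patch_times_utc(local_start, offsets):
    """Return the UTC instant at which each target reaches the same local clock time."""
    return {
        name: local_start.replace(tzinfo=tz).astimezone(timezone.utc)
        for name, tz in offsets.items()
    }


def risk_window(times):
    """Time between the first and the last target being patched."""
    return max(times.values()) - min(times.values())


schedule = patch_times_utc(datetime(2024, 3, 1, 2, 0), TARGET_OFFSETS)
window = risk_window(schedule)
print(window)  # 14:00:00
```

With these offsets the first target is patched 14 hours before the last, so the unpatched exposure lasts that long even though every host ran the job at "the same" scheduled time.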