DCS Daily - Compute
All checks must be completed on the day!
vSphere & UCS
This info is available under the "Clusters with HA/DRS Turned Off" tab of the daily health report @ http://dcsaklmgmt1.dcs.local
AK Cisco DPAY 1, WLG Cisco DPAY 1 and HLZ Cisco BYOL 4 should be listed as Partially Automated for DRS.
No other clusters should be listed.
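The rule above can be expressed as a small comparison against an allowlist. This is a minimal sketch, assuming you have copied the tab into a list of (cluster name, DRS automation level) pairs; that row format is hypothetical, not something the health report exports.

```python
# Sketch only: the (cluster, DRS level) row format is an assumption about how
# you copy the "Clusters with HA/DRS Turned Off" tab, not a real report export.
EXPECTED_PARTIAL_DRS = {"AK Cisco DPAY 1", "WLG Cisco DPAY 1", "HLZ Cisco BYOL 4"}


def check_ha_drs_report(rows):
    """Compare report rows against the expected state.

    rows: list of (cluster_name, drs_automation_level) tuples.
    Returns (unexpected, missing):
      unexpected: rows for any other cluster, or an expected cluster
                  that is not showing "Partially Automated"
      missing:    expected clusters that do not appear in the report
    """
    seen = set()
    unexpected = []
    for cluster, drs_level in rows:
        seen.add(cluster)
        if cluster not in EXPECTED_PARTIAL_DRS or drs_level != "Partially Automated":
            unexpected.append((cluster, drs_level))
    missing = sorted(EXPECTED_PARTIAL_DRS - seen)
    return unexpected, missing


rows = [
    ("AK Cisco DPAY 1", "Partially Automated"),
    ("WLG Cisco DPAY 1", "Partially Automated"),
    ("HLZ Cisco BYOL 4", "Partially Automated"),
]
print(check_ha_drs_report(rows))  # ([], [])
```

Anything in either returned list is worth a closer look in vSphere before signing off the check.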
This info is available under the "Zombie Files" tab of the daily health report @ http://dcsaklmgmt1.dcs.local
Make sure each file or folder is safe to delete, then remove it.
Check the recommendations and apply them as required:
1) Before applying a balance datastore space usage recommendation, please check the SDRS cluster view. If all datastores are 80% or less utilized, you can simply clear the alert.
2) Some (not all) affinity rules are purposely violated to avoid large disks bursting too high within the same datastore, etc.
3) Before applying a balance datastore I/O workload recommendation, please check that it is not suggesting separating VMDKs that currently reside in the same datastore for burst performance. Other than this, all I/O workload balance recommendations should be applied, as long as they will not fill the datastore above 80%.
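Rules 1 and 3 boil down to two yes/no decisions around the 80% threshold. The sketch below assumes you read the utilization figures and the "does it split co-located VMDKs" answer off the SDRS cluster view yourself; the function names are hypothetical and nothing here calls the vSphere API. Rule 2 (deliberate affinity violations) still needs human judgement.

```python
# Sketch only: inputs are values read manually from the SDRS cluster view.
SPACE_THRESHOLD_PCT = 80.0


def space_alert_can_be_cleared(datastore_used_pct):
    """Rule 1: clear a space-balance alert if every datastore is at or below 80%."""
    return all(pct <= SPACE_THRESHOLD_PCT for pct in datastore_used_pct)


def should_apply_io_recommendation(splits_colocated_vmdks, target_used_pct_after_move):
    """Rule 3: apply an I/O-balance recommendation unless it separates VMDKs
    that share a datastore for burst performance, or it would push the
    target datastore above 80%."""
    if splits_colocated_vmdks:
        return False
    return target_used_pct_after_move <= SPACE_THRESHOLD_PCT


print(space_alert_can_be_cleared([55.0, 72.5, 80.0]))  # True
print(should_apply_io_recommendation(False, 85.0))     # False
```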
- Please put the appropriate VMs into their related groups.
Talk to Elsa or Sam D initially to understand how this works.
Please log in to Spectrum (see KeePass for details) and check "Datacom Cloud Services - DCS All Devices" for any alarms.
Please action each alarm or log an incident in R12, then clear the alerts.
Note: We recommend using the Java console (Start Console) rather than the web client.
Please also check for any brown (Maintenance Mode = Yes) devices under each DC. You can take a device out of maintenance mode by clicking the device, then on the Information tab changing "In Maintenance" to "No".
Zerto Virtual Manager (access from the jump server)
From each URL below, click Dashboard and check for any issues. If there are no issues, the bottom left will report SITE IS OK.
- DCSAKLZVM1 - https://dcsaklzvm1.dcs.local:9669/
- DCSAKLZVM2 (VCD) - https://dcsaklzvm2.dcs.local:9669/
- DCSCHCZVM1 - https://dcschczvm1.dcs.local:9669/
- DCSHLZZVM1 (VCD) - https://dcshlzzvm1.dcs.local:9669/
- DCSHLZZVM2 - https://dcshlzzvm2.dcs.local:9669
- DCSWLGZVM1 - https://dcswlgzvm1.dcs.local:9669
For details on any issues, click the Monitoring tab on the relevant ZVM.
For any issues with RAAS customers' on-premises environments, use the contacts in this spreadsheet for customer escalation (scroll to the Zerto columns): https://www.onedatacom.com/client/Datacom/DCS%20Internal/Customer%20Folders%20(DCS%20Sales)/_Customer%20List/Customer%20Listing%20Updated.xlsx
Note: All alerts can also be checked via Zerto Cloud Manager, DCSZCM1 - https://dcszcm1.dcs.local:9989 (see ALERTS in the menu, or check the top right for any alerts in red).
Daily Diff Check: Check for any Failed Backups - Watch out for Time Since Last Backup
This info is available under the "Daily Backup Report - VMs That Failed Last Backup" tab of the daily health report @ http://dcsaklmgmt1.dcs.local. Inform the on-call person about any re-runs needed.
Investigate any backups that have failed more than once or show a pattern of failures.
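One way to spot repeat offenders is to keep the failed-VM list from each day's report and count. This is a minimal sketch, assuming a hypothetical list of (VM name, failure date) pairs collected by hand from recent reports; a "pattern" is narrowed here to repeated failures on the same weekday, which is only one kind of pattern worth checking.

```python
from collections import Counter
from collections.abc import Iterable
from datetime import date


def repeat_failures(failures: Iterable[tuple[str, date]], min_count: int = 2) -> dict[str, int]:
    """Return {vm: failure_count} for VMs that failed at least min_count times."""
    counts = Counter(vm for vm, _ in failures)
    return {vm: n for vm, n in counts.items() if n >= min_count}


def same_weekday_pattern(failures: Iterable[tuple[str, date]], vm: str) -> list[int]:
    """Return weekday numbers (Monday=0) on which vm failed more than once."""
    weekdays = Counter(d.weekday() for name, d in failures if name == vm)
    return sorted(day for day, n in weekdays.items() if n > 1)


# Hypothetical history: VM-A failed on two consecutive Mondays.
history = [
    ("VM-A", date(2024, 1, 1)),
    ("VM-A", date(2024, 1, 8)),
    ("VM-B", date(2024, 1, 2)),
]
print(repeat_failures(history))               # {'VM-A': 2}
print(same_weekday_pattern(history, "VM-A"))  # [0]
```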
Check Tape Libraries and any Tape Requirements
Check Replication Status
This check needs to be performed from two locations:
1) https://DCSHLZDDMC1.dcs.local - check replication status
2) Sign in to a master/media server in each location to check for any failed replications in NBU.
Check DataDomain Capacity and Cleaning Schedule and investigate any Alerts
Log in to https://DCSHLZDDMC1.dcs.local and check the status of all the DDs:
WLG-EMCDD2500-1 (soon to be replaced by WLG-EMCDD3300-1)
CHC-EMCDD7200-1 - (pending decom)
AKL-EMCDD2500-2 (soon to be replaced by AKL-EMCDD6300-2)
Please notify your team leader if any DD exceeds 60% capacity.
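The 60% rule is simple arithmetic on used versus total capacity. This sketch assumes you note each DD's used and total capacity (in the same unit) from the DCSHLZDDMC1 dashboard; the dictionary input and the figures in the example are made up for illustration.

```python
# Sketch only: capacity figures are entered by hand; used and total must share a unit.
CAPACITY_ALERT_PCT = 60.0


def dds_over_capacity(dd_usage):
    """dd_usage: {dd_name: (used, total)}.

    Returns a sorted list of (dd_name, percent_used) for every DD above 60%.
    """
    over = []
    for name, (used, total) in dd_usage.items():
        pct = used / total * 100
        if pct > CAPACITY_ALERT_PCT:
            over.append((name, round(pct, 1)))
    return sorted(over)


# Hypothetical figures, not real capacities.
usage = {
    "WLG-EMCDD2500-1": (30.0, 60.0),   # 50% used
    "AKL-EMCDD2500-2": (45.0, 60.0),   # 75% used
}
print(dds_over_capacity(usage))  # [('AKL-EMCDD2500-2', 75.0)]
```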
Check all Tape Libraries have Cleaning Tapes and 10+ cleans remaining (if not please order more)
Please make sure EVERY tape library has a cleaning tape, and check in NBU that it has more than 10 "Cleanings Remaining".
If we need a new cleaning tape, please email the Datacenter Ops team in the first instance and Cc DCSEngineers (they usually have spares).
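The cleaning-tape check can be sketched as a pass over each library's "Cleanings Remaining" value. This assumes a hypothetical mapping you fill in from NBU, with None meaning no cleaning tape was found; it follows the "more than 10" wording, so a library showing exactly 10 is flagged. The library names are placeholders.

```python
# Sketch only: values are read manually from NBU; None means no cleaning tape.
MIN_CLEANINGS = 10


def libraries_needing_cleaning_tapes(libraries):
    """libraries: {library_name: cleanings_remaining or None}.

    Returns sorted names of libraries with no cleaning tape or with
    10 or fewer cleanings remaining.
    """
    needs = []
    for name, remaining in libraries.items():
        if remaining is None or remaining <= MIN_CLEANINGS:
            needs.append(name)
    return sorted(needs)


# Hypothetical library names and counts.
libs = {"AKL-LIB1": 25, "WLG-LIB1": 10, "HLZ-LIB1": None}
print(libraries_needing_cleaning_tapes(libs))  # ['HLZ-LIB1', 'WLG-LIB1']
```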
Weekly/Monthly Backups Check: check for any failed backups that have not been re-run successfully
Please sign into the master/media server in each location and check for any weekly/monthly full backups that have failed and have not been re-run successfully
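"Failed and not re-run successfully" means the most recent attempt for a client's weekly or monthly full is still a failure. This sketch assumes a hypothetical list of job records you compile from the NBU Activity Monitor on each master/media server; the FullBackupJob class and the "weekly"/"monthly" labels are illustrative, not NBU objects.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class FullBackupJob:
    client: str
    schedule: str      # hypothetical label, e.g. "weekly" or "monthly"
    succeeded: bool
    ended: datetime


def outstanding_full_failures(jobs):
    """Return sorted (client, schedule) pairs whose latest attempt failed."""
    latest = {}
    for job in jobs:
        key = (job.client, job.schedule)
        # Keep only the most recent attempt for each client/schedule pair.
        if key not in latest or job.ended > latest[key].ended:
            latest[key] = job
    return sorted(key for key, job in latest.items() if not job.succeeded)


jobs = [
    FullBackupJob("VM-A", "weekly", False, datetime(2024, 1, 6, 10, 0)),
    FullBackupJob("VM-A", "weekly", True, datetime(2024, 1, 6, 14, 0)),   # re-run OK
    FullBackupJob("VM-B", "monthly", False, datetime(2024, 1, 1, 2, 0)),  # still failed
]
print(outstanding_full_failures(jobs))  # [('VM-B', 'monthly')]
```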