Kaseya Community

Weekly Chkdsk of Servers & Workstations

  • Hi Guys,

    I am just after some opinions on running scheduled chkdsks every week on servers & workstations as part of a weekly maintenance cycle.

    Can you think of any reasons why this would be a good, bad, or indifferent idea?

    For most of our sites we have a nice window every weekend where we can conduct weekly maintenance without causing interruption to the clients. Are there any risks / reasons anyone can think of for / against doing this?

    Cheers!

    chris021

    Legacy Forum Name: Weekly Chkdsk of Servers & Workstations,
    Legacy Posted By Username: chris021
  • If you have a long enough window to do it, then I don't see a good reason not to on the workstations. It's pretty safe to run, can improve performance, and reduce the risk of a system failing due to something silly like one or two bad sectors.

    However, chkdsk has certain risks associated with it as it pre-empts the OS. I have read of cases where a chkdsk fails, and the PC is then stuck in an endless loop. If chkdsk can never complete, the OS never loads, and nothing you can do remotely will fix that. While we have never experienced this ourselves, our rule is never to remotely chkdsk a server unless we really have no other choice. So for servers, I wouldn't say the risk is acceptable. But for workstations (which are quick to restore/repair in the event of a failure), go for it.

    Legacy Forum Name: IT Procedures,
    Legacy Posted By Username: arobar
  • Thanks Alex,

    That was pretty much my thinking. It would be interesting to see if anyone has any examples of what can go wrong when chkdsk'ing servers :-) I believe a few guys on here do it monthly and 6-monthly as part of a schedule.

    Legacy Forum Name: IT Procedures,
    Legacy Posted By Username: chris021
  • I have had one machine which I was cloning to a larger drive; it refused to clone due to bad sectors and suggested a chkdsk.

    I ran chkdsk and the machine then would not boot (BSOD). I ended up using Ghost and fixing the MBR and something else.

    It was a crappy server, but still, it can cause issues, and I don't like working until the really early morning to sort out those sorts of issues =\

    Legacy Forum Name: IT Procedures,
    Legacy Posted By Username: philipj@itwest.biz
  • The key here is to run chkdsk (note, no /f) and get the results. Then act based on the results. I have a script that runs chkdsk on all workstations weekly, along with other maintenance items, and spits out a report. Kaseya reads the report, generates a ticket, and logs the data. A technician then manually handles the errors. I may in the future have Kaseya auto-schedule the chkdsk /f and prompt the user to reboot and let it run.

    Legacy Forum Name: IT Procedures,
    Legacy Posted By Username: boostmr2
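
A minimal sketch of the read-only approach described above (run chkdsk with no /f, collect the results, act on them later), written in Python rather than as a Kaseya script. The names `chkdsk_command`, `check_drive`, and `format_report`, and the report layout, are hypothetical and are not boostmr2's actual script. It assumes chkdsk's documented convention that exit code 0 means no errors were found, so any other exit code is only flagged for a technician to review.

```python
import subprocess
from typing import Callable, Dict, Iterable, List


def chkdsk_command(drive: str) -> List[str]:
    """Build a read-only chkdsk command: the drive only, deliberately no /f."""
    return ["chkdsk", drive]


def check_drive(drive: str, runner: Callable = subprocess.run) -> Dict[str, object]:
    """Run a read-only chkdsk and return a small report record.

    `runner` is injectable so the logic can be exercised without Windows.
    """
    result = runner(chkdsk_command(drive), capture_output=True, text=True)
    return {
        "drive": drive,
        "exit_code": result.returncode,
        # Assumption: exit code 0 means chkdsk found no errors.
        "needs_review": result.returncode != 0,
        "output": result.stdout,
    }


def format_report(records: Iterable[Dict[str, object]]) -> str:
    """One line per drive, suitable for a monitoring tool to read."""
    lines = []
    for record in records:
        status = "REVIEW" if record["needs_review"] else "OK"
        lines.append(f"{record['drive']} {status} (exit code {record['exit_code']})")
    return "\n".join(lines)
```

On a real Windows machine, `check_drive("C:")` would typically need an elevated session. Nothing here ever passes /f, so the repair step stays a manual decision, as in the post above.
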
  • boostmr2
    The key here is to run chkdsk (note, no /f) and get the results. Then act based on the results. I have a script that runs chkdsk on all workstations weekly, along with other maintenance items, and spits out a report. Kaseya reads the report, generates a ticket, and logs the data. A technician then manually handles the errors. I may in the future have Kaseya auto-schedule the chkdsk /f and prompt the user to reboot and let it run.


    That seems a very roundabout way of doing it. Why don't you just create an event set to read the event log and let event log monitoring create tickets / run scripts appropriately?

    Andrew

    Legacy Forum Name: IT Procedures,
    Legacy Posted By Username: andrew.doull@computer-care.com.au
  • That seems more roundabout than my solution. I have a single script that executes and generates a ticket where an error occurs in the chkdsk results. You are saying: run a script, set up event log monitoring to read errors (I'm not even sure if chkdsk logs errors in the event log?), and then generate a ticket.

    Either way, my solution works ok. It's amazing how many machines have errors.

    Legacy Forum Name: IT Procedures,
    Legacy Posted By Username: boostmr2
  • boostmr2
    It's amazing how many machines have errors.


    That's the scary thing. I'd say close to 90% of servers and workstations have NTFS errors that Chkdsk picks up. Given the level of NTFS corruption that I see, I have difficulty trusting Microsoft as an operating system vendor.

    It's a great way to pick up customers though. You can almost guarantee that there'll be problems here.

    Andrew

    Legacy Forum Name: IT Procedures,
    Legacy Posted By Username: andrew.doull@computer-care.com.au
  • I've put together a decent process for this.

    Currently, I watch for System warnings 7, 11, and 52. If these are generated, a script runs chkdsk in read-only mode in response. Occasionally, the chkdsk can't finish due to a process such as indexing, so if the chkdsk is incomplete I have the script reschedule itself for an hour later and log an error event with ID 999 in the System log. If this happens three times in a 7-day period (that is, the chkdsk fails to finish three times), then the event monitoring set generates an alarm and someone will manually look at it. If at any time in this process the chkdsk completes and reports that problems were found in the filesystem, a ticket is created requesting a technician to arrange for chkdsk /f to be performed to correct it.

    The Event Log monitoring set should be assigned to check for Event Source ChkDsk and ID 999 being logged three times in 7 days. I have the counter set up to reset itself after another 3 days, but that value is debatable and may be better set at 1. I'm also a bit neurotic and define my variables in every script instead of using the system variables, but you could easily replace the variable steps in the scripts and use the system variables in their place.
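
As an aside, the "three times in 7 days" rule can be sketched outside of Kaseya as a simple sliding-window count. This is only an illustration in Python: `should_alarm` and its parameters are hypothetical names, and how Kaseya's event sets actually count and re-arm (the 3-day reset mentioned above) is not modelled here.

```python
from datetime import datetime, timedelta
from typing import Iterable


def should_alarm(failure_times: Iterable[datetime],
                 now: datetime,
                 threshold: int = 3,
                 window: timedelta = timedelta(days=7)) -> bool:
    """Return True when at least `threshold` incomplete-chkdsk events
    (the ID 999 entries) fall inside the `window` ending at `now`."""
    cutoff = now - window
    recent = [t for t in failure_times if cutoff <= t <= now]
    return len(recent) >= threshold
```

For example, three ID 999 events logged on different days of the same week would trip the alarm, while two events this week plus one from ten days ago would not.
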

    Here are the scripts:

    Script Name: Chkdsk Step1
    Script Description: This script runs chkdsk in read-only mode and dumps a log file to the agent temp directory called chkdsklog.txt. A second script is called to read the results of this script.

    IF True
    THEN
    Get Variable
    Parameter 1 : 10
    Parameter 2 :
    Parameter 3 : AgentTemp
    OS Type : 0
    Execute Shell Command
    Parameter 1 : chkdsk c: >> #AgentTemp#\chkdsklog.txt
    Parameter 2 : 1
    OS Type : 0
    Get Variable
    Parameter 1 : 1
    Parameter 2 : #AgentTemp#\chkdsklog.txt
    Parameter 3 : chkdsklog
    OS Type : 0
    Execute Script
    Parameter 1 : Chkdsk Step2 (NOTE: Script reference is NOT imported. Correct manually in script editor.)
    Parameter 2 :
    Parameter 3 : 0
    OS Type : 0
    ELSE



    Script Name: Chkdsk Step2
    Script Description: Part 2 of 2. This script parses the chkdsk log looking for the existence of 'total disk space.' and re-runs the script in an hour if it is not discovered. If the value is found, Chkdsk Step3 is executed to find mention that 'Windows found problems with the file system.' The chkdsklog.txt file is deleted after the script executes.

    IF Check Variable
    Parameter 1 : #chkdsklog#
    Not Contains :total disk space.
    THEN
    Get Variable
    Parameter 1 : 6
    Parameter 2 :
    Parameter 3 : MachineID
    OS Type : 0
    Write Script Log Entry
    Parameter 1 : Chkdsk has found issues with indexes and security descriptors and did not finish. Scheduling chkdsk to run again in read-only mode in 60 minutes.
    OS Type : 0
    Schedule Script
    Parameter 1 : 10380010
    Parameter 2 : 60
    Parameter 3 : #MachineID#
    OS Type : 0
    Execute Shell Command
    Parameter 1 : eventcreate /L System /T Error /SO ChkDsk /ID 999 /D "Chkdsk has found issues with indexes and security descriptors and did not finish. Scheduling chkdsk to run again in read-only mode in 60 minutes."
    Parameter 2 : 1
    OS Type : 0
    ELSE
    Execute Script
    Parameter 1 : Chkdsk Step3 (NOTE: Script reference is NOT imported. Correct manually in script editor.)
    Parameter 2 :
    Parameter 3 : 0
    OS Type : 0



    Script Name: Chkdsk Step3
    Script Description: This script checks the log for indications that check disk should be run against the drive with the /f parameter.

    IF Check Variable
    Parameter 1 : #chkdsklog#
    Contains :Windows found problems with the file system.
    THEN
    Get Variable
    Parameter 1 : 10
    Parameter 2 :
    Parameter 3 : AgentTemp
    OS Type : 0
    Get Variable
    Parameter 1 : 6
    Parameter 2 :
    Parameter 3 : MachineID
    OS Type : 0
    Get Variable
    Parameter 1 : 1
    Parameter 2 : #AgentTemp#\chkdsklog.txt
    Parameter 3 : chkdsklog
    OS Type : 0
    Send Email
    Parameter 1 : (alert/monitoring email address)
    Parameter 2 : #MachineID# - Chkdsk found problems with the filesystem. Please arrange for 'chkdsk /f' to be run.
    Parameter 3 : #chkdsklog#
    OS Type : 0
    Delete File
    Parameter 1 : #AgentTemp#\chkdsklog.txt
    OS Type : 0
    ELSE
    Get Variable
    Parameter 1 : 10
    Parameter 2 :
    Parameter 3 : AgentTemp
    OS Type : 0
    Write Script Log Entry
    Parameter 1 : Chkdsk found no errors in the filesystem.
    OS Type : 0
    Delete File
    Parameter 1 : #AgentTemp#\chkdsklog.txt
    OS Type : 0



    Legacy Forum Name: IT Procedures,
    Legacy Posted By Username: drodden
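
The decision logic spread across Chkdsk Step2 and Chkdsk Step3 boils down to a three-way classification of the log text. A rough Python sketch follows, assuming the two marker strings quoted in the script descriptions appear verbatim in the chkdsk output. The `Outcome` enum and `classify` function are hypothetical names, and the matching here is case-sensitive, which may or may not match how Kaseya's "Contains" test behaves.

```python
from enum import Enum

# Marker strings taken from the script descriptions above; their exact
# wording may differ between Windows versions and languages.
COMPLETED_MARKER = "total disk space."
PROBLEM_MARKER = "Windows found problems with the file system."


class Outcome(Enum):
    INCOMPLETE = "retry in 60 minutes and log event 999"
    PROBLEMS = "email a technician to arrange chkdsk /f"
    CLEAN = "log that no errors were found"


def classify(log_text: str) -> Outcome:
    """Map a read-only chkdsk log to the action the scripts would take."""
    # Step2: a run that finished prints a summary containing COMPLETED_MARKER.
    if COMPLETED_MARKER not in log_text:
        return Outcome.INCOMPLETE
    # Step3: a finished run either reports problems or it does not.
    if PROBLEM_MARKER in log_text:
        return Outcome.PROBLEMS
    return Outcome.CLEAN
```

Keeping the "did it finish?" test ahead of the "did it find problems?" test mirrors the Step2 then Step3 order, so a truncated log is never reported as clean.
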
  • Thanks for sharing your chkdsk script. I am trying it out and I have a few questions:

    In the Step 2 script, do you reschedule Step 2 or Step 3 after 60 minutes?

    In Step 3, does it only read the log from Step 2 and email you to schedule chkdsk /f manually?

    Thanks,
    MK

    Legacy Forum Name: IT Procedures,
    Legacy Posted By Username: Matthew@eSudo.com