{"id":928,"date":"2022-01-28T15:30:42","date_gmt":"2022-01-28T15:30:42","guid":{"rendered":"https:\/\/www.geekmungus.co.uk\/?p=928"},"modified":"2022-11-05T10:53:18","modified_gmt":"2022-11-05T10:53:18","slug":"dell-ecs-hardware-monitoring","status":"publish","type":"post","link":"https:\/\/geekmungus.co.uk\/?p=928","title":{"rendered":"Dell ECS Hardware Monitoring &#8211; BASH Script"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\">IMPORTANT! The scripts below are not supported by Dell, nor is their use on a Dell ECS node, so you follow these steps at your own risk. We have these scripts running on Dell ECS nodes in production to collect hardware information and report it back periodically, and no issues have been identified yet.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">At the time of writing, the Dell ECS platform does not appear to expose hardware fault information to end users via the Dell ECS Web GUI, REST API or Email\/Syslog notifications, which limits your ability to see certain types of problem. 
However, these events\/issues are sent to Dell via ESRS, so Dell should have visibility of them even if you do not (directly).<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The script(s) below should be placed on each node and set to run periodically. They check the hardware status, process the result and report it back to NagiosXI using a passive check with NRDP.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The script can be found here: <a href=\"https:\/\/github.com\/tristanhself\/general\/blob\/9ee1151e2617db036bbd5fcc4a26d401c3c92a1f\/check_racadm_hardware.sh\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/github.com\/tristanhself\/general\/blob\/9ee1151e2617db036bbd5fcc4a26d401c3c92a1f\/check_racadm_hardware.sh<\/a><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The <strong>send_nrdp.sh<\/strong> script is supplied with NagiosXI, so I cannot distribute it here, but you can obtain it directly from NagiosXI.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The process requires you to deploy the scripts on each node, set up the cron job, then create a &#8220;Passive Check&#8221; within your NagiosXI configuration that will update with the status posted from the node. 
This also involves enabling freshness checking in the NagiosXI configuration, so that if a node fails to report in within a particular time you are alerted and can take action.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"DellECSMonitoringandSyslog(Splunk)Configuration-Step1-PuttheMonitoringScriptinPlace\">Step 1 &#8211; Put the Monitoring Script in Place<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Log on to the node with SSH; this needs to be performed on each node you wish to monitor.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Put the <strong>check_racadm_hardware.sh<\/strong> file and the <strong>send_nrdp.sh<\/strong> file into the <strong>\/tmp<\/strong> directory.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Make them executable with:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>sudo -i chmod +x \/tmp\/check_racadm_hardware.sh\nsudo -i chmod +x \/tmp\/send_nrdp.sh<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"DellECSMonitoringandSyslog(Splunk)Configuration-Step2-SettheCrontoRunAutomatically\">Step 2 &#8211; Set the Cron to Run Automatically<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">You also need to perform this step on each node you wish to monitor.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Create a file in the <strong>\/etc\/cron.d<\/strong> directory with the contents shown below; this will run the script every 12 hours. 
It is recommended to stagger this by a few minutes across the nodes.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Create and edit with:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>sudo -i vi \/etc\/cron.d\/hardware_racadm<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">With the contents:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>SHELL=\/bin\/bash\nPATH=\/sbin:\/usr\/sbin:\/bin:\/usr\/bin\n# Minute 0 of every 12th hour (00:00 and 12:00); vary the minute field to stagger nodes\n0 *\/12 * * *    admin  \/tmp\/check_racadm_hardware.sh<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"DellECSMonitoringandSyslog(Splunk)Configuration-Step3-ConfigureNagiosXI\">Step 3 &#8211; Configure NagiosXI<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Create a passive check for each ECS node host in the NagiosXI configuration.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Then wait for the script to report in periodically. Ideally, the checks run at staggered times across all the hosts and sites. It is also recommended to set the freshness check to at least twice the period of the check, i.e. if the check sends every 12 hours (43,200 seconds), you should set the freshness check to 24 hours (86,400 seconds).<\/p>\n","protected":false},"excerpt":{"rendered":"<p>IMPORTANT! The scripts below are not supported by Dell, nor is their use on a Dell ECS node, so you follow these steps at your own risk. We have these scripts running on Dell ECS nodes in production to collect hardware information and report it back periodically, and no issues have been identified yet. 
The &#8230; <a title=\"Dell ECS Hardware Monitoring &#8211; BASH Script\" class=\"read-more\" href=\"https:\/\/geekmungus.co.uk\/?p=928\" aria-label=\"Read more about Dell ECS Hardware Monitoring &#8211; BASH Script\">Read more<\/a><\/p>\n","protected":false},"author":4,"featured_media":918,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[6,20],"tags":[],"class_list":["post-928","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-general","category-random"],"_links":{"self":[{"href":"https:\/\/geekmungus.co.uk\/index.php?rest_route=\/wp\/v2\/posts\/928","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/geekmungus.co.uk\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/geekmungus.co.uk\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/geekmungus.co.uk\/index.php?rest_route=\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/geekmungus.co.uk\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=928"}],"version-history":[{"count":1,"href":"https:\/\/geekmungus.co.uk\/index.php?rest_route=\/wp\/v2\/posts\/928\/revisions"}],"predecessor-version":[{"id":1346,"href":"https:\/\/geekmungus.co.uk\/index.php?rest_route=\/wp\/v2\/posts\/928\/revisions\/1346"}],"wp:attachment":[{"href":"https:\/\/geekmungus.co.uk\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=928"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/geekmungus.co.uk\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=928"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/geekmungus.co.uk\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=928"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}