Building an Enterprise Monitoring System with Nagios, Part 1: Building the Monitoring Server

Building a Server Monitoring System with Nagios: Monitoring Server Setup

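Introduction

This blog has so far covered building virtual infrastructure with Chef and Vagrant. This time, over two articles, we explain how to build a monitoring system that watches that virtual infrastructure for failures. Part 1 covers everything from installing Nagios, a server monitoring tool, to registering the servers to be monitored.
To install the software required for the build, refer to "How to Build a LAMP Development Environment with Chef: Virtual Environment Setup". All build steps are carried out with Chef.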

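Building the Monitoring Server

Create the virtual machine that serves as the base of the monitoring server, and install an HTTP server on it.

  1. Initialize the Box

    Initialize the base virtual machine (Box).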

    $ mkdir -p ~/vagrant/nagios-server && cd ~/vagrant/nagios-server
    
    $ vagrant init nagios-server
    A `Vagrantfile` has been placed in this directory. You are now
    ready to `vagrant up` your first virtual environment! Please read
    the comments in the Vagrantfile as well as documentation on
    `vagrantup.com` for more information on using Vagrant.
    
  2. Copy the base Cookbooks

    From the Cookbooks created in the virtual environment setup article, copy three: base, repo, and httpd.

    $ mkdir -p site-cookbooks
    $ cp -r ../chef-lamp/site-cookbooks/{base,repo,httpd} site-cookbooks
    
  3. Configure the Vagrantfile

    Specify the recipes of the copied Cookbooks and set the variables that the recipes reference.

    $ vi Vagrantfile
    
    VAGRANTFILE_API_VERSION = "2"
    
    Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
      config.vm.box = "nagios-server"
      config.vm.box_url = "http://files.brianbirkinbine.com/vagrant-centos-65-i386-minimal.box"
      config.vm.network :public_network, ip: "192.168.0.50"
      config.vm.synced_folder ".", "/vagrant"
      config.omnibus.chef_version = :latest
    
      config.vm.provision :chef_solo do |chef|
        chef.cookbooks_path = "./site-cookbooks"
        chef.add_recipe "base"
        chef.add_recipe "repo"
        chef.add_recipe "httpd"
    
        chef.json = {
          base: {
            fqdn: "nagios-server.vagrantup.com",
            name: "nagios-server"
          },
          httpd: {
            admin: "root@nagios-server.vagrantup.com",
            fqdn: "nagios-server.vagrantup.com",
            port: 80
          },
        }
      end
    end
    
  4. Start the Box

    Starting the Box runs the specified recipes and brings up the HTTP server.

    $ vagrant up
    Bringing machine 'default' up with 'virtualbox' provider...
    [default] Box 'nagios-server' was not found. Fetching box from specified URL for
    the provider 'virtualbox'. Note that if the URL does not have
    a box for this provider, you should interrupt Vagrant now and add
    the box yourself. Otherwise Vagrant will attempt to download the
    full box prior to discovering this error.
    Downloading box from URL: http://files.brianbirkinbine.com/vagrant-centos-65-i386-minimal.box
    Extracting box...
    Successfully added box 'nagios-server' with provider 'virtualbox'!
    [default] Importing base box 'nagios-server'...
    [default] Matching MAC address for NAT networking...
    [default] Setting the name of the VM...
    [default] Clearing any previously set forwarded ports...
    [default] Clearing any previously set network interfaces...
    [default] Preparing network interfaces based on configuration...
    [default] Forwarding ports...
    [default] -- 22 => 2222 (adapter 1)
    [default] Booting VM...
    [default] Waiting for machine to boot. This may take a few minutes...
    DL is deprecated, please use Fiddle
    [default] Machine booted and ready!
    [default] Configuring and enabling network interfaces...
    [default] Mounting shared folders...
    [default] -- /vagrant
    [default] -- /tmp/vagrant-chef-1/chef-solo-1/cookbooks
    [default] Installing Chef 11.8.2 Omnibus package...
    [default] Downloading Chef 11.8.2 for el...
    
    [2014-01-27T03:16:08-05:00] INFO: Forking chef instance to converge...
    [2014-01-27T17:16:53+09:00] INFO: remote_file[/tmp/rpmforge-release-0.5.3-1.el6.rf.i686.rpm] updated file contents /tmp/rpmforge-release-0.5.3-1.el6.rf.i686.rpm
    [2014-01-27T17:16:54+09:00] INFO: bash[repo] ran successfully
    [2014-01-27T17:17:11+09:00] INFO: package[httpd] installing httpd-2.2.15-29.el6.centos from base repository
    [2014-01-27T17:17:23+09:00] INFO: service[httpd] started
    [2014-01-27T17:17:23+09:00] INFO: service[httpd] enabled
    [2014-01-27T17:17:23+09:00] INFO: package[httpd] sending create action to template[httpd.conf] (delayed)
    [2014-01-27T17:17:23+09:00] INFO: template[httpd.conf] backed up to /var/chef/backup/etc/httpd/conf/httpd.conf.chef-20140127171723.460652
    [2014-01-27T17:17:23+09:00] INFO: template[httpd.conf] updated file contents /etc/httpd/conf/httpd.conf
    [2014-01-27T17:17:23+09:00] INFO: Chef Run complete in 73.969582632 seconds
    [2014-01-27T17:17:23+09:00] INFO: Running report handlers
    [2014-01-27T17:17:23+09:00] INFO: Report handlers complete
    

Installing Nagios

Create a Cookbook that installs and configures Nagios, and apply it to the server.

  1. Create the Nagios Cookbook
    $ knife cookbook create nagios -o site-cookbooks
    ** Creating cookbook nagios
    ** Creating README for cookbook: nagios
    ** Creating CHANGELOG for cookbook: nagios
    ** Creating metadata for cookbook: nagios
    
  2. Create the recipe
    $ vi site-cookbooks/nagios/recipes/default.rb
    
    %w{
      nagios
      nagios-plugins-all
      perl-Params-Validate
      perl-Nagios-Plugin
      nagios-plugins-nrpe
      mutt
    }.each do |p|
      package p do
        action :install
        options "--enablerepo=epel"
      end
    end
    
    # Change the contact e-mail address
    script "contact" do
      interpreter 'bash'
      user "root"
      code <<-EOC
    sed -i -e 's|nagios@localhost|#{node[:nagios][:contact]}|' /etc/nagios/objects/contacts.cfg
      EOC
      not_if "grep -q #{node[:nagios][:contact]}  /etc/nagios/objects/contacts.cfg"
    end
    
    # Change the date display format
    script "format" do
      interpreter 'bash'
      user "root"
      code <<-EOC
    sed -i -e 's|^date_format.*$|date_format = #{node[:nagios][:format]}|' /etc/nagios/nagios.cfg
      EOC
      not_if "grep -q #{node[:nagios][:format]} /etc/nagios/nagios.cfg"
    end
    
    # Enable the configuration directory for monitored servers
    script "enable remote server config" do
      interpreter 'bash'
      user "root"
      code <<-EOC
    sed -i -e 's|^#cfg_dir=/etc/nagios/servers|cfg_dir=/etc/nagios/servers|' /etc/nagios/nagios.cfg
    mkdir -p /etc/nagios/servers
      EOC
      only_if "grep -q ^'#cfg_dir=/etc/nagios/servers' /etc/nagios/nagios.cfg"
    end
    
    # Set the administrator password
    file "/tmp/.nagios_passwd_done" do
      action :nothing
    end
    script "passwd" do
      interpreter 'bash'
      user "root"
      code <<-EOC
    htpasswd -b /etc/nagios/passwd #{node[:nagios][:user]} #{node[:nagios][:passwd]}
      EOC
      not_if "test -f /tmp/.nagios_passwd_done"
      notifies :touch, "file[/tmp/.nagios_passwd_done]", :immediately
    end
    
    # Enable state-change notifications for SSH/HTTP
    file "/tmp/.nofitications_enabled_done" do
      action :nothing
    end
    script "notifications" do
      interpreter 'bash'
      user "root"
      code <<-EOC
    sed -i -e 's|notifications_enabled.*$|notifications_enabled 1|' /etc/nagios/objects/localhost.cfg
      EOC
      not_if "test -f /tmp/.nofitications_enabled_done"
      notifies :touch, "file[/tmp/.nofitications_enabled_done]", :immediately
    end
    
    # Create a dummy file for the HTTP health check
    template "index.html" do
      path "/var/www/html/index.html"
      source "index.html.erb"
      mode 0644
      action :create
      # Create /var/www/html/index.html only if it does not already exist
      not_if "test -f /var/www/html/index.html"
    end
    
    # Install the memory usage check plugin
    remote_file "check_mem" do
      path "/usr/lib/nagios/plugins/check_mem"
      source "https://raw.github.com/jasonhancock/nagios-memory/master/plugins/check_mem"
      mode 0755
      action :create
      not_if "test -f /usr/lib/nagios/plugins/check_mem"
    end
    
    # Define the memory check command for the local host
    script "set local mem usage" do
      interpreter 'bash'
      user "root"
      code <<-EOC
    echo '' >> /etc/nagios/objects/commands.cfg
    echo 'define command {' >> /etc/nagios/objects/commands.cfg
    echo '  command_name  check_mem' >> /etc/nagios/objects/commands.cfg
    echo '  command_line  $USER1$/check_mem -w $ARG1$ -c $ARG2$' >> /etc/nagios/objects/commands.cfg
    echo '}' >> /etc/nagios/objects/commands.cfg
      EOC
      not_if "grep -q check_mem /etc/nagios/objects/commands.cfg"
    end
    
    # Add the memory check service for the local host
    script "check local mem service" do
      interpreter 'bash'
      user "root"
      code <<-EOC
    cat<<EOF>>/etc/nagios/objects/localhost.cfg
    
    define service {
      use                 local-service
      host_name           localhost
      service_description Memory Usage
      check_command       check_mem!80!90
    }
    EOF
      EOC
      not_if "grep -q check_mem /etc/nagios/objects/localhost.cfg"
    end
    
    # Fix the swap check service for the local host
    script "check local swap service" do
      interpreter 'bash'
      user "root"
      code <<-EOC
    sed -i -e 's/check_local_swap\!20\!10/check_local_swap\!20\%\!10\%/' /etc/nagios/objects/localhost.cfg
      EOC
      not_if "grep -q 'check_local_swap\!20\%\!10\%' /etc/nagios/objects/localhost.cfg"
    end
    
    # Register the NRPE check command
    script "set nrpe command" do
      interpreter 'bash'
      user "root"
      code <<-EOC
    echo '' >> /etc/nagios/objects/commands.cfg
    echo 'define command {' >> /etc/nagios/objects/commands.cfg
    echo '  command_name  check_nrpe' >> /etc/nagios/objects/commands.cfg
    echo '  command_line  $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$' >> /etc/nagios/objects/commands.cfg
    echo '}' >> /etc/nagios/objects/commands.cfg
      EOC
      not_if "grep -q check_nrpe /etc/nagios/objects/commands.cfg"
    end
    
    # Start the service and enable it at boot
    service "nagios" do
      action [:start, :enable]
    end
    
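  3. Create the template for the HTTP health check

    The check_http command from the Nagios plugins monitors the system by checking whether the HTTP server's top page can be fetched, so create a template for the top page. If a top page already exists, it is not overwritten.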

    $ vi site-cookbooks/nagios/templates/default/index.html.erb
    
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
    <html>
    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    <title>NAGIOS</title>
    </head>
    <body>
    </body>
    </html>
    
  4. Add the recipe

    Specify the recipe of the Cookbook you created and set the variables that the recipe references.

    $ vi Vagrantfile
    
        # Add after the existing chef.add_recipe lines
        chef.add_recipe "nagios"
        # Add inside the chef.json hash
          nagios: {
            contact: "root@nagios-server.vagrantup.com",
            format: "iso8601",
            user: "nagiosadmin",
            passwd: "ro0tUser"
          },
    
  5. Provision

    Apply the recipe you created to the Box.

    $ vagrant provision
    DL is deprecated, please use Fiddle
    [default] Chef 11.8.2 Omnibus package is already installed.
    [default] Running provisioner: chef_solo...
    Generating chef JSON and uploading...
    Running chef-solo...
    [2014-01-27T17:28:39+09:00] INFO: Forking chef instance to converge...
    [2014-01-27T17:28:39+09:00] INFO: *** Chef 11.8.2 ***
    [2014-01-27T17:28:39+09:00] INFO: Chef-client pid: 3088
    [2014-01-27T17:28:40+09:00] INFO: Setting the run_list to ["recipe[base]", "recipe[repo]", "recipe[httpd]", "recipe[nagios]"] from JSON
    [2014-01-27T17:28:40+09:00] INFO: Run List is [recipe[base], recipe[repo], recipe[httpd], recipe[nagios]]
    [2014-01-27T17:28:40+09:00] INFO: Run List expands to [base, repo, httpd, nagios]
    [2014-01-27T17:28:40+09:00] INFO: Starting Chef Run for nagios-server.vagrantup.com
    [2014-01-27T17:28:40+09:00] INFO: Running start handlers
    [2014-01-27T17:28:40+09:00] INFO: Start handlers complete.
    [2014-01-27T17:29:11+09:00] INFO: package[nagios] installing nagios-3.5.1-1.el6 from epel repository
    [2014-01-27T17:29:39+09:00] INFO: package[nagios-plugins-all] installing nagios-plugins-all-1.4.16-10.el6 from epel repository
    [2014-01-27T17:30:09+09:00] INFO: package[perl-Params-Validate] installing perl-Params-Validate-0.92-3.el6 from base repository
    [2014-01-27T17:30:17+09:00] INFO: package[perl-Nagios-Plugin] installing perl-Nagios-Plugin-0.35-1.el6 from epel repository
    [/tmp/.nagios_passwd_done] (immediate)
    [2014-01-27T17:30:49+09:00] INFO: file[/tmp/.nagios_passwd_done] created file /tmp/.nagios_passwd_done
    [2014-01-27T17:30:50+09:00] INFO: file[/tmp/.nagios_passwd_done] updated atime and mtime to 2014-01-27 17:30:50 +0900
    [2014-01-27T17:30:50+09:00] INFO: script[notifications] ran successfully
    [2014-01-27T17:30:50+09:00] INFO: script[notifications] sending touch action to file[/tmp/.nofitications_enabled_done] (immediate)
    [2014-01-27T17:30:50+09:00] INFO: file[/tmp/.nofitications_enabled_done] created file /tmp/.nofitications_enabled_done
    [2014-01-27T17:30:50+09:00] INFO: file[/tmp/.nofitications_enabled_done] updated atime and mtime to 2014-01-27 17:30:50 +0900
    [2014-01-27T17:30:50+09:00] INFO: template[index.html] created file /var/www/html/index.html
    [2014-01-27T17:30:50+09:00] INFO: template[index.html] updated file contents /var/www/html/index.html
    [2014-01-27T17:30:50+09:00] INFO: template[index.html] mode changed to 644
    [2014-01-27T17:30:50+09:00] INFO: remote_file[check_mem] created file /usr/lib/nagios/plugins/check_mem
    [2014-01-27T17:30:56+09:00] INFO: remote_file[check_mem] updated file contents /usr/lib/nagios/plugins/check_mem
    [2014-01-27T17:30:56+09:00] INFO: remote_file[check_mem] mode changed to 755
    [2014-01-27T17:30:57+09:00] INFO: script[set local mem usage] ran successfully
    [2014-01-27T17:30:57+09:00] INFO: script[check local mem service] ran successfully
    [2014-01-27T17:30:57+09:00] INFO: script[check local swap service] ran successfully
    [2014-01-27T17:30:57+09:00] INFO: script[set nrpe command] ran successfully
    [2014-01-27T17:30:58+09:00] INFO: service[nagios] started
    [2014-01-27T17:30:58+09:00] INFO: service[nagios] enabled
    [2014-01-27T17:30:58+09:00] INFO: Chef Run complete in 138.358505198 seconds
    [2014-01-27T17:30:58+09:00] INFO: Running report handlers
    [2014-01-27T17:30:58+09:00] INFO: Report handlers complete
    
  6. Access the Nagios admin interface

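    The Nagios admin interface is served at the following URL. The first time you access it you are prompted to authenticate, so enter the user ID and password specified in the Vagrantfile.

    http://192.168.0.50/nagios/

    Selecting "Service" from the menu on the left shows the status of the monitored servers.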
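
The recipe in step 2 pairs almost every sed edit with a not_if or only_if guard, so that running vagrant provision again does not apply the same change twice. The following is a minimal Ruby sketch of that guard-then-edit idea, not the recipe's actual code: set_directive is a hypothetical helper, and the key=value text is made-up sample data rather than a real nagios.cfg.

```ruby
# Sketch of the "guard, then edit" pattern used by the script resources.
# set_directive is a hypothetical helper; it is not part of Chef or Nagios.

# Returns [new_text, changed]. The text is rewritten only when the desired
# line is not already present, which mirrors `not_if "grep -q ..."`
# followed by `sed -i`.
def set_directive(text, key, value)
  desired = "#{key}=#{value}"
  lines = text.lines.map(&:chomp)
  return [text, false] if lines.include?(desired) # guard: nothing to do

  replaced = false
  new_lines = lines.map do |line|
    if line.start_with?("#{key}=")
      replaced = true
      desired
    else
      line
    end
  end
  new_lines << desired unless replaced # append when the key was missing
  [new_lines.join("\n") + "\n", true]
end

# Sample data (an assumption for illustration only).
config = "log_file=/var/log/nagios/nagios.log\ndate_format=us\n"
first, changed_first = set_directive(config, "date_format", "iso8601")
_second, changed_second = set_directive(first, "date_format", "iso8601")
puts first
puts "first run changed: #{changed_first}, second run changed: #{changed_second}"
```

The second call leaves the text untouched, which is the property the guards in the recipe are aiming for.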
    

This completes the monitoring server build.
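
Step 3 above relies on check_http judging the server by whether its top page can be fetched. As a rough illustration of that idea, the sketch below fetches a page with Ruby's standard Net::HTTP library and maps the result to a Nagios-style state. The helper names http_state and check_top_page are hypothetical, and the status mapping is a simplification that only approximates what the real plugin does.

```ruby
require "net/http"
require "uri"

# Simplified mapping from an HTTP status code to a Nagios-style state.
# This is an approximation for illustration; check_http has many more options.
def http_state(code)
  case code
  when 200..399 then "OK"
  when 400..499 then "WARNING"
  else "CRITICAL"
  end
end

# Fetches the page at `url` and returns the resulting state.
# Connection failures and timeouts are reported as CRITICAL.
def check_top_page(url, timeout: 10)
  uri = URI(url)
  response = Net::HTTP.start(uri.host, uri.port,
                             open_timeout: timeout, read_timeout: timeout) do |http|
    http.get(uri.path.empty? ? "/" : uri.path)
  end
  http_state(response.code.to_i)
rescue StandardError
  "CRITICAL"
end

puts http_state(200)
puts http_state(404)
puts http_state(503)
```

Calling check_top_page("http://192.168.0.50/") from a machine that can reach the monitoring server would exercise the dummy index.html created by the recipe.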

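Configuring the Monitored Servers

Immediately after installation, Nagios monitors only its own host (localhost), so add the Box created in the virtual environment setup article as a monitored server.
Remote servers are monitored through NRPE (Nagios Remote Plugin Executor).
Note that NRPE is a monitoring agent designed for Linux/Unix servers, so it does not cover other platforms such as Windows.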
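
The check_nrpe command registered in the Nagios recipe earlier has the command line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$, and a service definition refers to it in the form check_nrpe!check_load. The sketch below is a simplified model of how those macros get filled in. expand_check_command is a hypothetical helper, and the value used for $USER1$ is assumed to be /usr/lib/nagios/plugins, the directory the recipe installs check_mem into.

```ruby
# Simplified model of Nagios macro expansion for a check_command.
# expand_check_command is a hypothetical helper written for illustration.

COMMANDS = {
  "check_nrpe" => "$USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$"
}.freeze

def expand_check_command(check_command, host_address, plugin_dir)
  # "check_nrpe!check_load" -> command name plus "!"-separated arguments
  name, *args = check_command.split("!")
  line = COMMANDS.fetch(name)
  line = line.gsub("$USER1$", plugin_dir).gsub("$HOSTADDRESS$", host_address)
  # $ARG1$, $ARG2$, ... are taken from the arguments in order
  line.gsub(/\$ARG(\d+)\$/) { args[Regexp.last_match(1).to_i - 1].to_s }
end

puts expand_check_command("check_nrpe!check_load", "192.168.0.20",
                          "/usr/lib/nagios/plugins")
```

Because this command definition uses only $ARG1$, any further !-separated arguments in a check_command are not passed on to check_nrpe.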

  1. Create the Cookbook that registers NRPE-monitored servers
    $ knife cookbook create nrpe_config -o site-cookbooks
    ** Creating cookbook nrpe_config
    ** Creating README for cookbook: nrpe_config
    ** Creating CHANGELOG for cookbook: nrpe_config
    ** Creating metadata for cookbook: nrpe_config
    
  2. Create the recipe
    $ vi site-cookbooks/nrpe_config/recipes/default.rb
    
    # Register the monitored server with Nagios
    template "nrpe config" do
      path "/etc/nagios/servers/#{node[:nrpe_config][:target_name]}.cfg"
      source "nrpe_config.erb"
      mode 0644
      action :create
      # Restart the Nagios server after registration
      notifies :restart, "service[nagios]", :immediately
    end
    
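  3. Create the configuration template for NRPE-monitored servers

    Create a template that defines the items to monitor on the target server and the commands used to check them.
    Using a template lets you monitor many different Linux servers with the same set of checks.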

    $ vi site-cookbooks/nrpe_config/templates/default/nrpe_config.erb
    
    define host {
      use       linux-server
      host_name <%= node[:nrpe_config][:target_fqdn] %>
      alias     <%= node[:nrpe_config][:target_name] %>
      address   <%= node[:nrpe_config][:target_addr] %>
    }
    
    define service {
      use                    generic-service
      host_name              <%= node[:nrpe_config][:target_fqdn] %>
      service_description    Current Load
      check_command          check_nrpe!check_load
    }
    
    define service {
      use                    generic-service
      host_name              <%= node[:nrpe_config][:target_fqdn] %>
      service_description    Current Users
      check_command          check_nrpe!check_users
    }
    
    define service {
      use                    generic-service
      host_name              <%= node[:nrpe_config][:target_fqdn] %>
      service_description    HTTP
      check_command          check_nrpe!check_http
      notifications_enabled  1
    }
    
    define service {
      use                    generic-service
      host_name              <%= node[:nrpe_config][:target_fqdn] %>
      service_description    MySQL
      check_command          check_nrpe!check_mysql
      notifications_enabled  1
    }
    
    define service {
      use                    generic-service
      host_name              <%= node[:nrpe_config][:target_fqdn] %>
      service_description    Memory Usage
      check_command          check_nrpe!check_mem!20!10
    }
    
    define service {
      use                    generic-service
      host_name              <%= node[:nrpe_config][:target_fqdn] %>
      service_description    PING
      check_command          check_ping!100.0,20%!500.0,60%
    }
    
    define service {
      use                    generic-service
      host_name              <%= node[:nrpe_config][:target_fqdn] %>
      service_description    Root Partition
      check_command          check_nrpe!check_disk
    }
    
    define service {
      use                    generic-service
      host_name              <%= node[:nrpe_config][:target_fqdn] %>
      service_description    SSH
      check_command          check_nrpe!check_ssh
      notifications_enabled  1
    }
    
    define service {
      use                    generic-service
      host_name              <%= node[:nrpe_config][:target_fqdn] %>
      service_description    Swap Usage
      check_command          check_nrpe!check_swap
    }
    
    define service {
      use                    generic-service
      host_name              <%= node[:nrpe_config][:target_fqdn] %>
      service_description    Total Processes
      check_command          check_nrpe!check_total_procs
    }
    
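  4. Add the recipe

    Specify the recipe of the Cookbook you created and set the variables that the recipe references.
    The variables beginning with target hold the FQDN, alias, and IP address of the monitored server.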

    $ vi Vagrantfile
    
        # Add after the existing chef.add_recipe lines
        chef.add_recipe "nrpe_config"
        # Add inside the chef.json hash
          nrpe_config: {
            target_fqdn: "chef-lamp.vagrantup.com",
            target_name: "chef-lamp",
            target_addr: "192.168.0.20"
          },
    
  5. Provision

    Apply the recipe you created to the Box.

    $ vagrant provision
    DL is deprecated, please use Fiddle
    [default] Chef 11.8.2 Omnibus package is already installed.
    [default] Running provisioner: chef_solo...
    Generating chef JSON and uploading...
    Running chef-solo...
    [2014-01-27T17:48:42+09:00] INFO: Forking chef instance to converge...
    [2014-01-27T17:48:42+09:00] INFO: *** Chef 11.8.2 ***
    [2014-01-27T17:48:42+09:00] INFO: Chef-client pid: 4153
    [2014-01-27T17:48:43+09:00] INFO: Setting the run_list to ["recipe[base]", "recipe[repo]", "recipe[httpd]", "recipe[nagios]", "recipe[nrpe_config]"] from JSON
    [2014-01-27T17:48:43+09:00] INFO: Run List is [recipe[base], recipe[repo], recipe[httpd], recipe[nagios], recipe[nrpe_config]]
    [2014-01-27T17:48:43+09:00] INFO: Run List expands to [base, repo, httpd, nagios, nrpe_config]
    [2014-01-27T17:48:43+09:00] INFO: Starting Chef Run for nagios-server.vagrantup.com
    [2014-01-27T17:48:43+09:00] INFO: Running start handlers
    [2014-01-27T17:48:43+09:00] INFO: Start handlers complete.
    [2014-01-27T17:49:00+09:00] INFO: template[nrpe config] created file /etc/nagios/servers/chef-lamp.cfg
    [2014-01-27T17:49:00+09:00] INFO: template[nrpe config] updated file contents /etc/nagios/servers/chef-lamp.cfg
    [2014-01-27T17:49:00+09:00] INFO: template[nrpe config] mode changed to 644
    [2014-01-27T17:49:00+09:00] INFO: template[nrpe config] sending restart action to service[nagios] (immediate)
    [2014-01-27T17:49:03+09:00] INFO: service[nagios] restarted
    [2014-01-27T17:49:03+09:00] INFO: Chef Run complete in 19.795843861 seconds
    [2014-01-27T17:49:03+09:00] INFO: Running report handlers
    [2014-01-27T17:49:03+09:00] INFO: Report handlers complete

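This completes the configuration of the monitored server.
At this stage the monitored server is not running, so the Nagios admin interface shows the added server's status as PENDING/CRITICAL. Once the NRPE agent is installed on the monitored server, monitoring works normally.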
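
The nrpe_config.erb template in step 3 is filled in from the nrpe_config values set in the Vagrantfile. To show what that substitution produces, here is a small sketch that uses Ruby's standard ERB library outside of Chef. The plain Hash named node only stands in for Chef's node object, and render is a hypothetical helper; only the host definition from the template is reproduced.

```ruby
require "erb"

# Stand-in for Chef's node object: a plain nested Hash holding the same
# keys that the Vagrantfile passes in through chef.json.
node = {
  nrpe_config: {
    target_fqdn: "chef-lamp.vagrantup.com",
    target_name: "chef-lamp",
    target_addr: "192.168.0.20"
  }
}

# The host definition from nrpe_config.erb.
template = <<~ERB_SOURCE
  define host {
    use       linux-server
    host_name <%= node[:nrpe_config][:target_fqdn] %>
    alias     <%= node[:nrpe_config][:target_name] %>
    address   <%= node[:nrpe_config][:target_addr] %>
  }
ERB_SOURCE

# Renders the template with `node` visible to the ERB tags.
def render(template, node)
  ERB.new(template).result(binding)
end

puts render(template, node)
```

Registering another server then only requires a different set of target values, which is the benefit of keeping the checks in a template.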

Summary

This article covered everything from installing the server monitoring tool Nagios to configuring the monitored servers. Part 2 installs the NRPE agent on the monitored server.
