Switch
In SKOOR 6.2 the semantics of the switch command changed significantly. In previous versions, the switch command completely reversed the roles of primary and standby. Since SKOOR 6.2, the switch command only performs a failover from the primary to the standby. To get a working replication again, a createslave must be performed manually on the new primary. This covers the common use cases better because failovers complete much faster.
Failover
A failover is performed by running the following command on the current standby or primary. If the primary is no longer accessible, the command must be run on the standby:
# /opt/eranger/bin/eranger-server-replication.pl switch
10.1.0.89
10.1.0.89 checking ssh for user reranger
localhost (10.1.0.89) is slave, master 10.1.0.88 is up
will convert localhost to master and 10.1.0.88 to slave
press ENTER to continue, Ctrl-C to abort >
Press Enter to continue. The output should look like this:
10.1.0.89 calling script /opt/eranger/sbin/vip-switch.sh 10.1.0.89 pre s2m
10.1.0.88 calling eranger-server-syncfs.sh 10.1.0.89
10.1.0.88 calling eranger-server-sync-collector-bin.pl
10.1.0.88 calling script /opt/eranger/sbin/vip-switch.sh 10.1.0.89 pre m2s
10.1.0.88 OK> 10.1.0.90 removed from eth0
10.1.0.89 from slave to master
10.1.0.89 current master: 10.1.0.88
10.1.0.89 eranger start httpd at /opt/eranger/bin/eranger-server-replication.pl line 1717.
10.1.0.89 httpd already running (not starting)
10.1.0.89 done
10.1.0.89 eranger start eranger-report at /opt/eranger/bin/eranger-server-replication.pl line 1717.
10.1.0.89 start eranger-report (service eranger-report )..
10.1.0.89 done
10.1.0.89 eranger start eranger-server at /opt/eranger/bin/eranger-server-replication.pl line 1717.
10.1.0.89 start eranger-server (service eranger-server )..
10.1.0.89 done
10.1.0.89 eranger start eranger-ethd at /opt/eranger/bin/eranger-server-replication.pl line 1717.
10.1.0.89 start eranger-ethd (service eranger-ethd )..
10.1.0.89 done
10.1.0.89 eranger start eranger-eth-alerter at /opt/eranger/bin/eranger-server-replication.pl line 1717.
10.1.0.89 start eranger-eth-alerter (service eranger-eth-alerter )..
10.1.0.89 done
10.1.0.89 eranger start eranger-collector at /opt/eranger/bin/eranger-server-replication.pl line 1717.
10.1.0.89 start eranger-collector (service eranger-collector )..
10.1.0.89 delete /opt/eranger/collector/ringbuffer1.bin
10.1.0.89 done
10.1.0.89 eranger start eranger-agent at /opt/eranger/bin/eranger-server-replication.pl line 1717.
10.1.0.89 eranger-agent already running (not starting)
10.1.0.89 done
10.1.0.89 eranger start eranger-ic-alerter at /opt/eranger/bin/eranger-server-replication.pl line 1717.
10.1.0.89 start eranger-ic-alerter (service eranger-ic-alerter )..
10.1.0.89 done
10.1.0.89 eranger start httpd at /opt/eranger/bin/eranger-server-replication.pl line 1717.
10.1.0.89 httpd already running (not starting)
10.1.0.89 done
10.1.0.89 eranger start eranger-report at /opt/eranger/bin/eranger-server-replication.pl line 1717.
10.1.0.89 eranger-report already running (not starting)
10.1.0.89 done
10.1.0.89 eranger start eranger-server at /opt/eranger/bin/eranger-server-replication.pl line 1717.
10.1.0.89 eranger-server already running (not starting)
10.1.0.89 done
10.1.0.89 eranger start eranger-ethd at /opt/eranger/bin/eranger-server-replication.pl line 1717.
10.1.0.89 start eranger-ethd (service eranger-ethd )..
10.1.0.89 done
10.1.0.89 eranger start eranger-eth-alerter at /opt/eranger/bin/eranger-server-replication.pl line 1717.
10.1.0.89 start eranger-eth-alerter (service eranger-eth-alerter )..
10.1.0.89 done
10.1.0.89 eranger start eranger-collector at /opt/eranger/bin/eranger-server-replication.pl line 1717.
10.1.0.89 eranger-collector already running (not starting)
10.1.0.89 done
10.1.0.89 eranger start eranger-agent at /opt/eranger/bin/eranger-server-replication.pl line 1717.
10.1.0.89 eranger-agent already running (not starting)
10.1.0.89 done
10.1.0.89 eranger start eranger-ic-alerter at /opt/eranger/bin/eranger-server-replication.pl line 1717.
10.1.0.89 start eranger-ic-alerter (service eranger-ic-alerter )..
10.1.0.89 done
10.1.0.89 calling script /opt/eranger/sbin/vip-switch.sh 10.1.0.89 post s2m
10.1.0.89 OK> 10.1.0.90 added to eth0.
10.1.0.89 OK> 10.1.0.90 configured in /etc/sysconfig/network-scripts/ifcfg-eth0:0.
10.1.0.88 calling script /opt/eranger/sbin/vip-switch.sh 10.1.0.89 post m2s
10.1.0.89
10.1.0.89 checking ssh for user reranger
The script will:
make the current standby the new primary
run any pre-scripts if defined in eranger-replication.cfg
deactivate filesystem synchronization (syncfs service) on the old primary
stop the eranger-server, eranger-collector and eranger-report services on the old primary
start the eranger-server, eranger-collector and eranger-report services on the new primary
start httpd and eranger-agent if not running already on primary and standby
log in to all external collectors and change eranger-collector.cfg so that they deliver their data to the new primary server (this does not work if the server<n>_address parameter is set to an HTTP address)
run any post-scripts if defined in eranger-replication.cfg
Activate replication between new and old primary
This step should only be performed when the old primary is still available and it is planned to switch back to the old system again.
To get a running replication from the new primary to the new standby, a createslave has to be performed on the new primary:
# /opt/eranger/bin/eranger-server-replication.pl createslave
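After createslave has finished, it can be useful to confirm that replication is reported as running. The following is a minimal sketch, not part of the SKOOR tooling: the function name replication_started is hypothetical, and it assumes that a working replication shows up as "Status postgresql replication: started" in the output of eRanger.sh status, as in the status listings in this document.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with SKOOR): reads status text on stdin and
# succeeds if PostgreSQL replication is reported as started.
# Assumption: the line reads "Status postgresql replication: started".
replication_started() {
    grep -Eq 'postgresql replication:[[:space:]]*started'
}
```

A possible use on the new primary would be: /opt/eranger/bin/eRanger.sh status | replication_started && echo "replication running"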
Virtual IP
The above example was carried out while the old primary was still running and available. After a successful switch, end users have to be notified to use the IP address or DNS name of the new primary instead.
Assigning a virtual IP and moving it along to the active primary with IP-aliasing is supported (see Setup virtual IP address using Pre- and Post scripts) and the corresponding pre- and post-script vip-switch.sh is used in the above example.
Switching back
Before switching back to the old primary, a createslave has to be performed on the new primary.
The current primary does not need to be running for the standby to take over. The switch can also be issued on the standby when the current primary is broken (unavailable). Care must be taken when bringing the old primary back online, as it is still configured as primary. Stop the processes eranger-server, eranger-collector and eranger-scheduler on the old primary and issue the createslave command on the currently active primary. To return to the default roles, issue the switch command again.
It is assumed that the switch is only temporary and that, once the original primary is back up, another switch back to it is performed. Therefore the IP addresses of primary and standby in eranger-replication.cfg are not swapped by a switch. Switching back still works with this setup; the IP addresses in eranger-replication.cfg only have to be swapped before running a new createslave operation from the new primary.
SKOOR Engine Status after switch
After the switch, the eRanger.sh status command shows the following output on the new primary:
# /opt/eranger/bin/eRanger.sh status
Running /opt/eranger/bin/eRanger.sh with root privileges...
eRanger Server installation...
Current eRanger Status:
Status postgresql: started
Status postgresql replication: started
Status postfix: started
Status rsyslog: started
Status snmptrapd: stopped
Status http server: started
Status eRanger Server: started
Status eRanger Collector: started
Status eRanger Report: started
Status eRanger Agent: started
Status eRanger Webservice: started
and the following output on the new standby:
# /opt/eranger/bin/eRanger.sh status
Running /opt/eranger/bin/eRanger.sh with root privileges...
eRanger Server installation...
Current eRanger Status:
Status postgresql: started
Status postgresql replication: started
Status postfix: started
Status rsyslog: started
Status snmptrapd: stopped
Status http server: started
Status smsd: stopped
Status eRanger Server: stopped (postgresql slave)
Status eRanger Collector: stopped (postgresql slave)
Status eRanger Report: stopped (postgresql slave)
Status eRanger Agent: stopped
Status eRanger Webservice: started
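The two status listings differ mainly in the "eRanger Server" line, which a script can use to tell which role a node currently has. The following is a minimal sketch under that assumption. The function name skoor_role_from_status is hypothetical, and the matching relies only on the wording shown in the sample output above; other SKOOR versions may print different text.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with SKOOR): classify a node from the text
# printed by "eRanger.sh status", read on stdin.
# Assumption: a standby prints "eRanger Server: stopped (postgresql slave)"
# and a primary prints "eRanger Server: started", as in the samples above.
skoor_role_from_status() {
    local status_text
    status_text=$(cat)
    if printf '%s\n' "$status_text" | grep -Eq 'eRanger Server:[[:space:]]*stopped \(postgresql slave\)'; then
        echo standby
    elif printf '%s\n' "$status_text" | grep -Eq 'eRanger Server:[[:space:]]*started'; then
        echo primary
    else
        # Neither pattern matched, for example when the server is stopped for another reason.
        echo unknown
    fi
}
```

A possible use would be: /opt/eranger/bin/eRanger.sh status | skoor_role_from_status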
Non-Interactive mode
The switch can be run in non-interactive mode by adding the option -f. Use this if the script needs to run unattended, or if one or more of the collectors run a SKOOR version that is not identical to the one on the primary and standby. The following shows the output of running the switch with -f on the current standby, i.e. the original primary. This second switch command restores the original roles of primary and standby.
# /opt/eranger/bin/eranger-server-replication.pl -f switch
10.1.0.88
10.1.0.88 checking ssh for user reranger
localhost (10.1.0.88) is slave, master 10.1.0.89 is up
will convert localhost to master and 10.1.0.89 to slave
10.1.0.88 calling script /opt/eranger/sbin/vip-switch.sh 10.1.0.88 pre s2m
10.1.0.89 calling eranger-server-syncfs.sh 10.1.0.88
10.1.0.89 calling eranger-server-sync-collector-bin.pl
10.1.0.89 calling script /opt/eranger/sbin/vip-switch.sh 10.1.0.88 pre m2s
10.1.0.89 OK> 10.1.0.90 removed from eth0
10.1.0.88 from slave to master
10.1.0.88 current master: 10.1.0.89
10.1.0.88 eranger start httpd at /opt/eranger/bin/eranger-server-replication.pl line 1717.
10.1.0.88 httpd already running (not starting)
10.1.0.88 done
10.1.0.88 eranger start eranger-report at /opt/eranger/bin/eranger-server-replication.pl line 1717.
10.1.0.88 start eranger-report (service eranger-report )..
10.1.0.88 done
10.1.0.88 eranger start eranger-server at /opt/eranger/bin/eranger-server-replication.pl line 1717.
10.1.0.88 start eranger-server (service eranger-server )..
10.1.0.88 done
10.1.0.88 eranger start eranger-ethd at /opt/eranger/bin/eranger-server-replication.pl line 1717.
10.1.0.88 start eranger-ethd (service eranger-ethd )..
10.1.0.88 done
10.1.0.88 eranger start eranger-eth-alerter at /opt/eranger/bin/eranger-server-replication.pl line 1717.
10.1.0.88 start eranger-eth-alerter (service eranger-eth-alerter )..
10.1.0.88 done
10.1.0.88 eranger start eranger-collector at /opt/eranger/bin/eranger-server-replication.pl line 1717.
10.1.0.88 start eranger-collector (service eranger-collector )..
10.1.0.88 delete /opt/eranger/collector/ringbuffer1.bin
10.1.0.88 done
10.1.0.88 eranger start eranger-agent at /opt/eranger/bin/eranger-server-replication.pl line 1717.
10.1.0.88 eranger-agent already running (not starting)
10.1.0.88 done
10.1.0.88 eranger start eranger-ic-alerter at /opt/eranger/bin/eranger-server-replication.pl line 1717.
10.1.0.88 start eranger-ic-alerter (service eranger-ic-alerter )..
10.1.0.88 done
10.1.0.88 copied file to /var/lib/pgsql/data/
NOTICE: pg_stop_backup complete, all required WAL segments have been archived
10.1.0.88 eranger start httpd at /opt/eranger/bin/eranger-server-replication.pl line 1717.
10.1.0.88 httpd already running (not starting)
10.1.0.88 done
10.1.0.88 eranger start eranger-report at /opt/eranger/bin/eranger-server-replication.pl line 1717.
10.1.0.88 eranger-report already running (not starting)
10.1.0.88 done
10.1.0.88 eranger start eranger-server at /opt/eranger/bin/eranger-server-replication.pl line 1717.
10.1.0.88 eranger-server already running (not starting)
10.1.0.88 done
10.1.0.88 eranger start eranger-ethd at /opt/eranger/bin/eranger-server-replication.pl line 1717.
10.1.0.88 start eranger-ethd (service eranger-ethd )..
10.1.0.88 done
10.1.0.88 eranger start eranger-eth-alerter at /opt/eranger/bin/eranger-server-replication.pl line 1717.
10.1.0.88 start eranger-eth-alerter (service eranger-eth-alerter )..
10.1.0.88 done
10.1.0.88 eranger start eranger-collector at /opt/eranger/bin/eranger-server-replication.pl line 1717.
10.1.0.88 eranger-collector already running (not starting)
10.1.0.88 done
10.1.0.88 eranger start eranger-agent at /opt/eranger/bin/eranger-server-replication.pl line 1717.
10.1.0.88 eranger-agent already running (not starting)
10.1.0.88 done
10.1.0.88 eranger start eranger-ic-alerter at /opt/eranger/bin/eranger-server-replication.pl line 1717.
10.1.0.88 start eranger-ic-alerter (service eranger-ic-alerter )..
10.1.0.88 done
10.1.0.88 calling script /opt/eranger/sbin/vip-switch.sh 10.1.0.88 post s2m
10.1.0.88 OK> 10.1.0.90 added to eth0.
10.1.0.88 OK> 10.1.0.90 configured in /etc/sysconfig/network-scripts/ifcfg-eth0:0.
10.1.0.89 calling script /opt/eranger/sbin/vip-switch.sh 10.1.0.88 post m2s
10.1.0.88
10.1.0.88 checking ssh for user reranger
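When the switch runs unattended, nobody is watching the console, so it helps to keep the output in a file. The following is a minimal sketch of such a wrapper. The function name run_unattended_switch, the REPL_CMD override variable and the log file argument are hypothetical, and the sketch assumes that eranger-server-replication.pl exits with a non-zero status on failure, which this document does not state.

```shell
#!/usr/bin/env bash
# Sketch of an unattended switch wrapper (not shipped with SKOOR).
# The default path is taken from this document; the REPL_CMD environment
# override is a hypothetical convenience for testing.
REPL_CMD="${REPL_CMD:-/opt/eranger/bin/eranger-server-replication.pl}"

# Usage: run_unattended_switch <log-file>
run_unattended_switch() {
    local log_file="$1"
    # -f skips the "press ENTER to continue" prompt described above.
    if "$REPL_CMD" -f switch >"$log_file" 2>&1; then
        echo "switch finished, see $log_file"
    else
        # Assumption: a non-zero exit status means the switch failed.
        local rc=$?
        echo "switch failed with exit code $rc, see $log_file" >&2
        return "$rc"
    fi
}
```

A possible call would be run_unattended_switch /var/tmp/switch.log, where the log path is only an example. Reading the log afterwards remains advisable, because the meaning of the exit status is an assumption here.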