Switch

In SKOOR 6.2 the semantics of the switch command changed significantly. In previous versions, the switch command completely reversed the roles of primary and standby. Since SKOOR 6.2, the switch command only performs a failover from the primary to the standby. To get a working replication again, a createslave must be run manually on the new primary. This covers the common use cases better because failovers complete much faster.

Failover

A failover is performed by running the following command on the current standby or primary. If the primary is no longer accessible, the command must be run on the standby:

# /opt/eranger/bin/eranger-server-replication.pl switch

10.1.0.89 
10.1.0.89 checking ssh for user reranger


localhost (10.1.0.89) is slave, master 10.1.0.88 is up

will convert localhost to master
and 10.1.0.88 to slave

press ENTER to continue, Ctrl-C to abort >

Press Enter to continue. The output should look like this:

10.1.0.89 calling script /opt/eranger/sbin/vip-switch.sh 10.1.0.89 pre s2m
10.1.0.88 calling eranger-server-syncfs.sh 10.1.0.89
10.1.0.88 calling eranger-server-sync-collector-bin.pl
10.1.0.88 calling script /opt/eranger/sbin/vip-switch.sh 10.1.0.89 pre m2s
10.1.0.88 OK> 10.1.0.90 removed from eth0
10.1.0.89 from slave to master
10.1.0.89 current master: 10.1.0.88
10.1.0.89 eranger start httpd at /opt/eranger/bin/eranger-server-replication.pl line 1717.
10.1.0.89 httpd already running (not starting)
10.1.0.89 done
10.1.0.89 eranger start eranger-report at /opt/eranger/bin/eranger-server-replication.pl line 1717.
10.1.0.89 start eranger-report (service eranger-report )..
10.1.0.89 done
10.1.0.89 eranger start eranger-server at /opt/eranger/bin/eranger-server-replication.pl line 1717.
10.1.0.89 start eranger-server (service eranger-server )..
10.1.0.89 done
10.1.0.89 eranger start eranger-ethd at /opt/eranger/bin/eranger-server-replication.pl line 1717.
10.1.0.89 start eranger-ethd (service eranger-ethd )..
10.1.0.89 done
10.1.0.89 eranger start eranger-eth-alerter at /opt/eranger/bin/eranger-server-replication.pl line 1717.
10.1.0.89 start eranger-eth-alerter (service eranger-eth-alerter )..
10.1.0.89 done
10.1.0.89 eranger start eranger-collector at /opt/eranger/bin/eranger-server-replication.pl line 1717.
10.1.0.89 start eranger-collector (service eranger-collector )..
10.1.0.89 delete /opt/eranger/collector/ringbuffer1.bin
10.1.0.89 done
10.1.0.89 eranger start eranger-agent at /opt/eranger/bin/eranger-server-replication.pl line 1717.
10.1.0.89 eranger-agent already running (not starting)
10.1.0.89 done
10.1.0.89 eranger start eranger-ic-alerter at /opt/eranger/bin/eranger-server-replication.pl line 1717.
10.1.0.89 start eranger-ic-alerter (service eranger-ic-alerter )..
10.1.0.89 done
10.1.0.89 eranger start httpd at /opt/eranger/bin/eranger-server-replication.pl line 1717.
10.1.0.89 httpd already running (not starting)
10.1.0.89 done
10.1.0.89 eranger start eranger-report at /opt/eranger/bin/eranger-server-replication.pl line 1717.
10.1.0.89 eranger-report already running (not starting)
10.1.0.89 done
10.1.0.89 eranger start eranger-server at /opt/eranger/bin/eranger-server-replication.pl line 1717.
10.1.0.89 eranger-server already running (not starting)
10.1.0.89 done
10.1.0.89 eranger start eranger-ethd at /opt/eranger/bin/eranger-server-replication.pl line 1717.
10.1.0.89 start eranger-ethd (service eranger-ethd )..
10.1.0.89 done
10.1.0.89 eranger start eranger-eth-alerter at /opt/eranger/bin/eranger-server-replication.pl line 1717.
10.1.0.89 start eranger-eth-alerter (service eranger-eth-alerter )..
10.1.0.89 done
10.1.0.89 eranger start eranger-collector at /opt/eranger/bin/eranger-server-replication.pl line 1717.
10.1.0.89 eranger-collector already running (not starting)
10.1.0.89 done
10.1.0.89 eranger start eranger-agent at /opt/eranger/bin/eranger-server-replication.pl line 1717.
10.1.0.89 eranger-agent already running (not starting)
10.1.0.89 done
10.1.0.89 eranger start eranger-ic-alerter at /opt/eranger/bin/eranger-server-replication.pl line 1717.
10.1.0.89 start eranger-ic-alerter (service eranger-ic-alerter )..
10.1.0.89 done
10.1.0.89 calling script /opt/eranger/sbin/vip-switch.sh 10.1.0.89 post s2m
10.1.0.89 OK> 10.1.0.90 added to eth0.
10.1.0.89 OK> 10.1.0.90 configured in /etc/sysconfig/network-scripts/ifcfg-eth0:0.
10.1.0.88 calling script /opt/eranger/sbin/vip-switch.sh 10.1.0.89 post m2s
10.1.0.89 
10.1.0.89 checking ssh for user reranger

The script will:

  • make the current standby the new primary

  • run any pre-scripts if defined in eranger-replication.cfg 

  • deactivate filesystem synchronization (syncfs service) on the old primary

  • stop the eranger-server, eranger-collector and eranger-report services on the old primary 

  • start the eranger-server, eranger-collector and eranger-report services on the new primary 

  • start httpd and eranger-agent if not running already on primary and standby

  • log in to all external collectors and change eranger-collector.cfg so that they deliver their data to the new primary server (does not work if the server<n>_address parameter is set to an HTTP address)

  • run any post-scripts if defined in eranger-replication.cfg 

Activate replication between new and old primary

This step should only be performed when the old primary is still available and it is planned to switch back to the old system again.

To get a running replication from the new primary to the new standby, one has to perform a createslave on the new primary:

# /opt/eranger/bin/eranger-server-replication.pl createslave

Virtual IP

The above example was carried out while the old primary was still running and available. After a successful switch, end users must be notified to use the IP address or DNS name of the new primary instead.

Assigning a virtual IP and moving it along to the active primary with IP-aliasing is supported (see Setup virtual IP address using Pre- and Post scripts) and the corresponding pre- and post-script vip-switch.sh is used in the above example.
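To confirm which node currently holds the virtual IP, the interface addresses can be inspected after the switch. The following is a minimal sketch, assuming the ip tool from iproute2 is installed and that the virtual IP is 10.1.0.90 on eth0 as in the example output; has_vip is a hypothetical helper name and not part of SKOOR.

```shell
#!/bin/sh
# has_vip ADDRESS
# Reads `ip -o -4 addr show` output on stdin and succeeds if ADDRESS
# is configured as an IPv4 address on one of the listed interfaces.
has_vip() {
    vip=$1
    # Escape the dots so they match literally in the regular expression.
    pattern=$(printf '%s' "$vip" | sed 's/\./\\./g')
    # In the one-line format the address appears as "inet ADDRESS/PREFIXLEN".
    grep -Eq "[[:space:]]inet[[:space:]]+${pattern}/"
}

# Example (run on the node to check):
#   ip -o -4 addr show dev eth0 | has_vip 10.1.0.90 && echo "virtual IP is active here"
```

After a successful switch the check is expected to succeed on the new primary and fail on the old one.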

Switching back

Before switching back to the old primary, one has to perform a createslave on the new primary.

The current primary does not need to be running for the standby to take over: the switch can also be issued on the standby when the current primary is broken (unavailable). Take care when bringing the old primary back online, because it is still configured as primary. Stop the eranger-server, eranger-collector and eranger-scheduler processes on the old primary, then issue the createslave command on the currently active primary. To return to the default roles, issue the switch command again.

It is assumed that the switch is only temporary and that, once the original primary is back up, another switch back to it is performed. Therefore the IP addresses of primary and standby in eranger-replication.cfg are not swapped by a switch. Switching back still works with this setup; the IP addresses must be swapped in eranger-replication.cfg only when a new createslave operation is run from the new primary.
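The order of the switch-back steps can be sketched as a small dry-run helper. This is an illustration only: the function names and the DRY_RUN variable are hypothetical, and stopping the processes with "service <name> stop" is an assumption based on the service calls visible in the switch output. By default the helper only prints the commands, and each step has to be run on the node named in its comment.

```shell
#!/bin/sh
REPL=/opt/eranger/bin/eranger-server-replication.pl

# Print the command when DRY_RUN=1 (the default), execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Step 1, on the OLD primary once it is back online:
# it is still configured as primary, so stop its processes first.
stop_old_primary_services() {
    for svc in eranger-server eranger-collector eranger-scheduler; do
        # Assumption: the processes are managed through the `service` command.
        run service "$svc" stop
    done
}

# Step 2, on the CURRENT primary: rebuild the standby.
# The IP addresses in eranger-replication.cfg must be swapped by hand beforehand.
rebuild_standby() {
    run "$REPL" createslave
}

# Step 3: switch again to return to the default roles.
restore_default_roles() {
    run "$REPL" switch
}
```

Calling the three functions with the default DRY_RUN=1 prints the planned commands in order, which can be reviewed before running them for real with DRY_RUN=0.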

SKOOR Engine Status after switch

After the switch, the eRanger.sh status command shows the following output on the new primary:

# /opt/eranger/bin/eRanger.sh status
Running /opt/eranger/bin/eRanger.sh with root privileges...
eRanger Server installation...

Current eRanger Status:

Status postgresql:              started
Status postgresql replication:  started
Status postfix:                 started
Status rsyslog:                 started
Status snmptrapd:               stopped
Status http server:             started
Status eRanger Server:          started
Status eRanger Collector:       started
Status eRanger Report:          started
Status eRanger Agent:           started
Status eRanger Webservice:      started

and the following output on the new standby:

# /opt/eranger/bin/eRanger.sh status
Running /opt/eranger/bin/eRanger.sh with root privileges...
eRanger Server installation...

Current eRanger Status:

Status postgresql:              started
Status postgresql replication:  started
Status postfix:                 started
Status rsyslog:                 started
Status snmptrapd:               stopped
Status http server:             started
Status smsd:                    stopped
Status eRanger Server:          stopped (postgresql slave)
Status eRanger Collector:       stopped (postgresql slave)
Status eRanger Report:          stopped (postgresql slave)
Status eRanger Agent:           stopped
Status eRanger Webservice:      started
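The two status listings differ in the "Status eRanger Server" line, which can be used to tell the roles apart in a script. The following is a minimal sketch, assuming the status output has exactly the format shown above; skoor_role_from_status is a hypothetical helper name and not part of SKOOR.

```shell
#!/bin/sh
# Reads `eRanger.sh status` output on stdin and prints "primary",
# "standby" or "unknown". Returns non-zero for "unknown".
# Assumption: a standby marks the server line with "(postgresql slave)",
# while a primary reports the server as "started".
skoor_role_from_status() {
    awk '
        /^Status eRanger Server:/ {
            if ($0 ~ /\(postgresql slave\)/) role = "standby"
            else if ($0 ~ /started/)          role = "primary"
        }
        END {
            if (role == "") { print "unknown"; exit 1 }
            print role
        }
    '
}

# Example (run as root on the node to check):
#   /opt/eranger/bin/eRanger.sh status | skoor_role_from_status
```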

Non-Interactive mode

The switch can be run in non-interactive mode by adding the option -f. Use this if the script needs to run unattended, and also if one or more of the collectors run a SKOOR version that is not identical to the one on the primary and standby. The following shows the output of running the switch with -f on the current standby, i.e. the original primary. This second switch command restores the original roles of primary and standby.

# /opt/eranger/bin/eranger-server-replication.pl -f switch
10.1.0.88 
10.1.0.88 checking ssh for user reranger


localhost (10.1.0.88) is slave, master 10.1.0.89 is up

will convert localhost to master
and 10.1.0.89 to slave
10.1.0.88 calling script /opt/eranger/sbin/vip-switch.sh 10.1.0.88 pre s2m
10.1.0.89 calling eranger-server-syncfs.sh 10.1.0.88
10.1.0.89 calling eranger-server-sync-collector-bin.pl
10.1.0.89 calling script /opt/eranger/sbin/vip-switch.sh 10.1.0.88 pre m2s
10.1.0.89 OK> 10.1.0.90 removed from eth0
10.1.0.88 from slave to master
10.1.0.88 current master: 10.1.0.89
10.1.0.88 eranger start httpd at /opt/eranger/bin/eranger-server-replication.pl line 1717.
10.1.0.88 httpd already running (not starting)
10.1.0.88 done
10.1.0.88 eranger start eranger-report at /opt/eranger/bin/eranger-server-replication.pl line 1717.
10.1.0.88 start eranger-report (service eranger-report )..
10.1.0.88 done
10.1.0.88 eranger start eranger-server at /opt/eranger/bin/eranger-server-replication.pl line 1717.
10.1.0.88 start eranger-server (service eranger-server )..
10.1.0.88 done
10.1.0.88 eranger start eranger-ethd at /opt/eranger/bin/eranger-server-replication.pl line 1717.
10.1.0.88 start eranger-ethd (service eranger-ethd )..
10.1.0.88 done
10.1.0.88 eranger start eranger-eth-alerter at /opt/eranger/bin/eranger-server-replication.pl line 1717.
10.1.0.88 start eranger-eth-alerter (service eranger-eth-alerter )..
10.1.0.88 done
10.1.0.88 eranger start eranger-collector at /opt/eranger/bin/eranger-server-replication.pl line 1717.
10.1.0.88 start eranger-collector (service eranger-collector )..
10.1.0.88 delete /opt/eranger/collector/ringbuffer1.bin
10.1.0.88 done
10.1.0.88 eranger start eranger-agent at /opt/eranger/bin/eranger-server-replication.pl line 1717.
10.1.0.88 eranger-agent already running (not starting)
10.1.0.88 done
10.1.0.88 eranger start eranger-ic-alerter at /opt/eranger/bin/eranger-server-replication.pl line 1717.
10.1.0.88 start eranger-ic-alerter (service eranger-ic-alerter )..
10.1.0.88 done
10.1.0.88 copied file to /var/lib/pgsql/data/
NOTICE:  pg_stop_backup complete, all required WAL segments have been archived
10.1.0.88 eranger start httpd at /opt/eranger/bin/eranger-server-replication.pl line 1717.
10.1.0.88 httpd already running (not starting)
10.1.0.88 done
10.1.0.88 eranger start eranger-report at /opt/eranger/bin/eranger-server-replication.pl line 1717.
10.1.0.88 eranger-report already running (not starting)
10.1.0.88 done
10.1.0.88 eranger start eranger-server at /opt/eranger/bin/eranger-server-replication.pl line 1717.
10.1.0.88 eranger-server already running (not starting)
10.1.0.88 done
10.1.0.88 eranger start eranger-ethd at /opt/eranger/bin/eranger-server-replication.pl line 1717.
10.1.0.88 start eranger-ethd (service eranger-ethd )..
10.1.0.88 done
10.1.0.88 eranger start eranger-eth-alerter at /opt/eranger/bin/eranger-server-replication.pl line 1717.
10.1.0.88 start eranger-eth-alerter (service eranger-eth-alerter )..
10.1.0.88 done
10.1.0.88 eranger start eranger-collector at /opt/eranger/bin/eranger-server-replication.pl line 1717.
10.1.0.88 eranger-collector already running (not starting)
10.1.0.88 done
10.1.0.88 eranger start eranger-agent at /opt/eranger/bin/eranger-server-replication.pl line 1717.
10.1.0.88 eranger-agent already running (not starting)
10.1.0.88 done
10.1.0.88 eranger start eranger-ic-alerter at /opt/eranger/bin/eranger-server-replication.pl line 1717.
10.1.0.88 start eranger-ic-alerter (service eranger-ic-alerter )..
10.1.0.88 done
10.1.0.88 calling script /opt/eranger/sbin/vip-switch.sh 10.1.0.88 post s2m
10.1.0.88 OK> 10.1.0.90 added to eth0.
10.1.0.88 OK> 10.1.0.90 configured in /etc/sysconfig/network-scripts/ifcfg-eth0:0.
10.1.0.89 calling script /opt/eranger/sbin/vip-switch.sh 10.1.0.88 post m2s
10.1.0.88 
10.1.0.88 checking ssh for user reranger
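For unattended runs it can help to capture the output and check it afterwards. The following is a minimal sketch, assuming a successful promotion always logs a line of the form "<ip> from slave to master" as in the output above; promoted_host and the log path are hypothetical, and whether the exit status of the script itself is meaningful is not covered here.

```shell
#!/bin/sh
# Reads captured switch output on stdin and prints the address of the
# node that was promoted. Returns non-zero if no promotion line is found.
promoted_host() {
    awk '
        $2 == "from" && $3 == "slave" && $4 == "to" && $5 == "master" {
            print $1
            found = 1
            exit
        }
        END { exit (found ? 0 : 1) }
    '
}

# Example (the log path is hypothetical):
#   /opt/eranger/bin/eranger-server-replication.pl -f switch > /tmp/switch.log 2>&1
#   promoted_host < /tmp/switch.log || echo "no promotion line found" >&2
```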