I have set up WAN failover on an NV3120. The primary WAN is Comcast and the backup WAN is FIOS.
Failover works: when Comcast goes down, traffic fails over to FIOS as expected. The problem is that when the Comcast link comes back online, the router does not fail back to Comcast and keeps running on FIOS.
The probe status keeps showing as failed.
Here is the router config:
!
! ADTRAN, Inc. OS version R13.5.1
! Boot ROM version R11.5.0
! Platform: NetVanta 3140, part number
! Serial number
!
!
hostname
enable password
!
!
clock timezone -5-Eastern-Time
!
ip subnet-zero
ip classless
ip routing
ipv6 unicast-routing
!
!
name-server 4.2.2.2 192.168.1.2
!
!
no auto-config
!
no event-history
no logging forwarding
no logging email
!
!
banner motd '
'
!
!
ip firewall
ip firewall fast-nat-failover
no ip firewall alg msn
no ip firewall alg mszone
no ip firewall alg h323
!
!
!
!
no dot11ap access-point-control
!
!
!
probe Failover icmp-echo
destination 4.2.2.2
source-address 10.44.47.55
period 10
tolerance consecutive fail 7 pass 7
no shutdown
!
track Failover
test if probe Failover
no shutdown
!
!
!
!
interface gigabit-eth 0/1
ip address dhcp
ip access-policy PublicFIOS
no awcp
no shutdown
no lldp send-and-receive
!
!
interface gigabit-eth 0/2
ip address 192.168.1.1 255.255.255.0
ip access-policy Private
no awcp
no shutdown
!
!
interface gigabit-eth 0/3
ip address 10.44.47.55 255.255.255.0
ip access-policy PublicComcast
no awcp
no shutdown
no lldp send-and-receive
!
!
!
!
route-map FAILOVER permit 10
route-map Failover permit 1
description "Failover"
!
!
!
!
ip access-list standard MATCHALL
remark NAT list MATCHALL
!
!
ip access-list extended ADMIN
permit tcp any host 192.168.10.1 eq ssh
!
ip access-list extended REMOTE
permit tcp any host 192.168.10.1 eq 3390
!
ip access-list extended Failover
permit icmp any host 4.2.2.2
!
ip access-list extended SERVER
permit tcp any host 10.44.47.55 eq https
!
ip access-list extended web-acl-6
remark Many:1 FIOS
permit ip any any
!
!
!
!
ip policy-class Private
allow list MATCHALL self
nat source list MATCHALL interface gigabit-ethernet 0/3 overload policy PublicComcast
nat source list web-acl-6 interface gigabit-ethernet 0/1 overload
!
ip policy-class PublicFIOS
allow list ADMIN
nat destination list SERVER address 192.168.1.2
!
ip policy-class PublicComcast
allow list ADMIN
nat destination list SERVER address 192.168.1.2
!
!
!
ip route 0.0.0.0 0.0.0.0 192.168.10.158 5
ip route 0.0.0.0 0.0.0.0 10.44.47.11 track Failover
!
no tftp server
no tftp server overwrite
http server
http secure-server
no snmp agent
no ip ftp server
no ip scp server
no ip sntp server
!
!
!
!
!
The issue is that the default route changes to FIOS when you go into failover. Because the probe is sourced from the Comcast interface IP (10.44.47.55) and both of your WAN links are behind carrier NAT, the ping will fail once the default route is moved: the echo requests now leave via FIOS, so the probe never passes and the track never restores the Comcast route.
One solution is to create a static host route for the probe target with the Comcast gateway as its next hop. If you are using 4.2.2.2 (which I don't recommend, see below), you would do the following:
ip route 0.0.0.0 0.0.0.0 192.168.10.158 5
ip route 0.0.0.0 0.0.0.0 10.44.47.11 track Failover
ip route 4.2.2.2 255.255.255.255 10.44.47.11
Now your probe will always go out Comcast regardless of the state of the track.
However, the host route pins all traffic for 4.2.2.2 to the Comcast link, so if your client hosts are using 4.2.2.2 as their primary resolver, their DNS will fail during a Comcast outage. You could instead choose the IP of one of Comcast's DNS resolvers as your probe target, or some other host that reliably returns pings but isn't critical to your clients while you are in a failover state.
Also be advised that 4.2.2.2 is an anycast address, and I've seen some cases where it doesn't reliably return pings.
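As a rough sketch of that suggestion, the probe and its host route could be pointed at a different target. The address 192.0.2.1 below is only a documentation placeholder (my assumption, not a real host); substitute a pingable IP that your clients don't depend on. The remaining values are copied from the posted config.

```
! 192.0.2.1 is a placeholder (assumption); replace it with your chosen probe target
probe Failover icmp-echo
destination 192.0.2.1
source-address 10.44.47.55
period 10
tolerance consecutive fail 7 pass 7
no shutdown
!
! Pin the probe target to the Comcast gateway so it never follows the default route
ip route 192.0.2.1 255.255.255.255 10.44.47.11
```

With a target like this, client traffic to 4.2.2.2 is no longer tied to the Comcast link and simply follows whichever default route is active.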
You can also accomplish the same thing with a local route-map, but the host route to the probe target is easier.
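For comparison, the local route-map variant might look roughly like the following. This is an untested sketch: it assumes your AOS release accepts match ip address and set ip next-hop inside a route-map, and ip local policy route-map for router-originated traffic, so check the command reference for your firmware before relying on it. It reuses the Failover access list and route-map names that already appear in the posted config.

```
! ACL from the posted config: matches the probe's pings to 4.2.2.2
ip access-list extended Failover
permit icmp any host 4.2.2.2
!
! Send matching traffic to the Comcast gateway (assumed route-map syntax)
route-map Failover permit 1
match ip address Failover
set ip next-hop 10.44.47.11
!
! Apply the route-map to traffic generated by the router itself (assumed syntax)
ip local policy route-map Failover
```

If it works as intended, this only redirects the router's own ICMP to 4.2.2.2, so client DNS queries to 4.2.2.2 should still follow the active default route during a failover.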
FYI, what I've typically done is to make the probe detect a failure reasonably quickly but detect recovery much more slowly. In an intermittent scenario where something is flapping or has substantial packet loss, this helps to keep the churn down. Something like:
probe Failover icmp-echo
destination 4.2.2.2
source-address 10.44.47.55
period 3
tolerance consecutive fail 3 pass 20
no shutdown
Now three missed pings in a row, over a period of about nine seconds, trigger a failover, but it takes 20 consecutive successes, a full minute of solid uptime, to cut back over. Tune to fit your scenario.
By the way, your MATCHALL access list appears to be empty apart from the remark. I assume that's a cut/paste error or a typo; presumably it really contains permit ip any any.
Thanks for your input and suggestions. I will try it this week and get back to you.
I will also check the MATCHALL ACL and correct it.
Thanks again.