WAN fails over from the primary connection to the backup connection, but once the failover happens, the probe stays in a constant failing state even after the primary connection is back online. Any help is appreciated.
Below is the configuration, with our multiple VPNs removed:
!
!
! ADTRAN, Inc. OS version R12.3.1.E
! Boot ROM version R11.5.0
! Platform: NetVanta 3140, part number 4700341F2#25
! Serial number CFG1531961
!
!
hostname "NV3140"
enable password password
!
!
clock timezone -5-Eastern-Time
!
ip subnet-zero
ip classless
ip routing
ipv6 unicast-routing
!
!
name-server 8.8.8.8 75.75.75.75
!
ip local policy route-map FailOver
!
no auto-config
!
event-history on
no logging forwarding
no logging email
!
!
!
!
ip firewall
ip firewall fast-nat-failover
no ip firewall alg msn
no ip firewall alg mszone
no ip firewall alg h323
!
!
!
!
aaa on
ftp authentication LoginUseLocalUsers
!
!
aaa authentication login LoginUseRadius group radius
aaa authentication login LoginUseLocalUsers local
aaa authentication login LoginUseLinePass line
!
aaa authentication enable default enable
!
!
!
no dot11ap access-point-control
!
!
!
probe VPN-KeepAlive icmp-echo
destination 192.168.15.1
source-address 192.168.1.1
period 10
no shutdown
!
probe FailOver icmp-echo
destination google.com
source-address <Primary WAN>
tolerance consecutive fail 3 pass 3
no shutdown
!
track FailOver
test if probe FailOver
no shutdown
!
!
!
!
ip dhcp excluded-address 192.168.1.1 192.168.1.216
ip dhcp excluded-address 192.168.1.240 192.168.1.255
!
ip dhcp pool "Private"
network 192.168.1.0 255.255.255.0
dns-server 192.168.1.120 8.8.8.8
default-router 192.168.1.1
lease 0 1 0
option 176 ascii MCIPADD=192.168.1.242,MCPORT=1719,TFTPSRVR=192.168.1.242
option 242 ascii MCIPADD=192.168.1.242,MCPORT=1719,HTTPSRVR=192.168.1.242
!
!
!
!
!
!
!
!
!
ip flow top-talkers
match list self
!
interface gigabit-eth 0/1
description Local Network
ip address 192.168.1.1 255.255.255.0
ip access-policy Private
no awcp
no shutdown
media-gateway ip primary
!
!
interface gigabit-eth 0/2
ip address <Backup WAN> 255.255.255.252
ip mtu 1500
ip access-policy Public
ip crypto map VPN
no awcp
no shutdown
media-gateway ip primary
!
!
interface gigabit-eth 0/3
ip address <Primary WAN> 255.255.255.248
ip mtu 1500
ip access-policy "Public Fiber"
ip crypto map VPN
no awcp
no shutdown
media-gateway ip primary
!
!
!
!
route-map FailOver permit 1
description "FailOver"
match ip address FailOver
set ip next-hop <Primary WAN Gateway>
!
!
!
!
ip access-list standard wizard-ics
remark NAT list wizard-ics
permit any
!
!
ip access-list extended FailOver
permit icmp any hostname google.com echo-reply
!
!
ip access-list extended self
remark Traffic to NetVanta
permit ip any any log
!
!
!
ip policy-class Private
allow list self self
nat source list wizard-ics interface gigabit-ethernet 0/2 overload policy Public
nat source list web-acl-16 interface gigabit-ethernet 0/3 overload policy "Public Fiber"
!
ip policy-class Public
!
ip policy-class "Public Fiber"
!
!
!
ip route 0.0.0.0 0.0.0.0 <Primary WAN Gateway> track FailOver
ip route 0.0.0.0 0.0.0.0 <Backup WAN Gateway>
!
no tftp server
no tftp server overwrite
http authentication LoginUseLocalUsers
http server
http secure-server
no snmp agent
no ip ftp server
no ip scp server
no ip sntp server
!
!
!
!
!
!
!
!
sip udp 5060
sip tcp 5060
!
!
!
voice feature-mode network
voice forward-mode network
!
!
!
!
!
ip rtp quality-monitoring
ip rtp quality-monitoring udp
ip rtp quality-monitoring sip
!
line con 0
!
line telnet 0 4
login authentication LoginUseLinePass
password password
no shutdown
line ssh 0 4
no shutdown
!
sntp server time-a.nist.gov
!
!
!
!
end
The primary issue is that once the default route flips to the backup, attempting to ping Google via the primary will fail because the backup ISP will not route traffic sourced from the IP of your primary WAN link.
There are a couple of ways to fix this. One would be to probe reachability to the other side of the primary provider's WAN link. This is a directly-connected route so nothing special needs to be done to your routing. A second would be to probe an IP out on the Internet and build an untracked static route to that host via the primary WAN link. There are advantages and disadvantages to both.
Pinging the other side of the primary WAN link removes the risk that a failure of the foreign host you're pinging falsely triggers a failover. On the other hand, if the primary provider has a problem somewhere in its core while the link from its edge router to you stays up, this failure would not be detected.
Pinging a host out on the Internet (such as Google, as you are doing) verifies that the primary link has connectivity beyond the provider's own network, but it will cause a false switchover should the foreign host go down. By the way, I would ping an IP address rather than a hostname for your probe, as a DNS failure can cause strange problems.
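A minimal sketch of the first option, shown as it might appear in the running config: the probe targets the provider's side of the primary WAN link. It reuses the probe name and placeholders from the posted config (`<Primary WAN Gateway>` is the same next hop used in the tracked default route), and the tolerance values are simply carried over from the original probe, not tuned.

```text
! Probe the provider's edge router, which sits on the directly
! connected gig 0/3 subnet, so it stays reachable after the
! tracked default route is withdrawn.
probe FailOver icmp-echo
  destination <Primary WAN Gateway>
  source-address <Primary WAN>
  tolerance consecutive fail 3 pass 3
  no shutdown
```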
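A sketch of the second option, assuming a stable public IP address is used as the target. `<Probe Target IP>` is a hypothetical placeholder, not something from the original config. Ideally, pick an address nothing else on the network depends on: the untracked host route pins all traffic to that address onto the primary link, even while the primary is down. That makes 8.8.8.8 a poor choice here, since this config already uses it as a name server and hands it out through DHCP.

```text
! Probe a fixed IP instead of a hostname so a DNS failure cannot
! affect the probe. <Probe Target IP> is a hypothetical placeholder.
probe FailOver icmp-echo
  destination <Probe Target IP>
  source-address <Primary WAN>
  tolerance consecutive fail 3 pass 3
  no shutdown
!
! Untracked /32 host route: probe packets always leave via the
! primary gateway, even after the tracked default route is withdrawn.
ip route <Probe Target IP> 255.255.255.255 <Primary WAN Gateway>
```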
So step one would be to ensure that the probe can reach its target even after your primary default route is withdrawn due to the track going down.
There's another issue: your backup default route isn't floating. You should add an administrative distance greater than 1 after the gateway so that this route applies only if the primary route fails. As it stands, the two routes have equal cost, so traffic will be load-balanced across them.
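Applied to the default routes in the posted config, a floating backup might look roughly like this. The distance of 10 is only an example (any value greater than 1 keeps the backup behind the primary's default static distance of 1), and the `no` line removes the existing equal-cost backup route so only the floating version remains.

```text
! Remove the existing equal-cost backup default route.
no ip route 0.0.0.0 0.0.0.0 <Backup WAN Gateway>
!
! Primary default route, withdrawn when track FailOver goes down (unchanged).
ip route 0.0.0.0 0.0.0.0 <Primary WAN Gateway> track FailOver
!
! Floating backup default route: distance 10 keeps it out of the
! routing table while the primary route is installed.
ip route 0.0.0.0 0.0.0.0 <Backup WAN Gateway> 10
```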
I don't see a cost metric (such as 10) marking the failover route as non-preferred in your default routes. Also, the track should be applied to the primary route, not the failover route, as below:
ip route 0.0.0.0 0.0.0.0 <PE_SERIAL_IP> track PRIMARY-UP
ip route 0.0.0.0 0.0.0.0 <FAILOVER_IP> 10
The cost metric makes the primary route preferred and the failover route a backup, and tracking the primary route removes it when the primary link fails so the failover route can take over.
Also, make sure that you're pinging the PE IP address, not the CPE IP address, or use Google or some other IP that is always available. The PE IP address seems to be the best target, since it's the next hop from your CPE device.
FYI,
I was responding to VXCM’s posting. It wasn’t my WAN failover issue.
I work on failover designs regularly and have deployed successful configurations.
Thank you,
WAN Failover works in conjunction with multiple ISPs to assure that you maintain Internet connectivity if a loss of connectivity occurs on one of your WAN connections. If one of your ISP links goes down, WAN Failover will automatically route all traffic over the other WAN(s) until service is restored.
You may also consider using WAN Balancer in your network: it automatically distributes traffic over multiple WAN links rather than just failing over when one goes down.
Tests are configured for each WAN and run continuously to determine the current status of each interface. If enough tests fail on a given WAN to exceed the failure threshold, the WAN is considered down and Internet-bound traffic will not go out that WAN. The active WAN with the lowest ID is used as the current default WAN interface for Internet-bound traffic.