We have a customer with two sites joined by a T1. The voice subnets are advertised via OSPF across the T1 link, so traffic destined for the voice subnets uses the T1, while the data subnets follow the default route to the Internet and use a VPN to reach the other site.
Because the VoIP vendor's communicator software occasionally needs client PCs on the data VLAN to talk to the voice VLAN at the other site, we implemented PBR so that the voice subnets reply to data users over the T1 link. This works fantastically: the PBR is processed and we enjoy the low latency of the T1 link. If the link fails and the next hop is invalid, the traffic falls back to the default route and traverses the VPN, providing a backup path.
Unfortunately, when the T1 recovers, the traffic continues to take the VPN path. The only way I can get PBR to process the packets again and send them over the T1 is to issue the 'clear ip ffe' command on both routers. To be fair, I have not tried "waiting it out" to see whether the entry is refreshed after 30 minutes and the PBR path is used again. I'm also testing with ICMP, so the flow has never been idle for 15 seconds either.
If I issue 'no ip ffe' on the LAN facing interface where the PBR is performed, the recovery is immediate; this is to be expected as the Adtran is processing each packet and there is no cached path information.
Should I change timeout values on ffe globally or perhaps should I disable ffe on the LAN facing interfaces? The customer is routing only - no NAT, Firewall or VPN functions are utilized in this configuration. They are looking to push about 20Mb of data through the router. Tests show that without FFE enabled on the LAN interface, they are still reaching the subscribed rate and CPU usage is not high - but I do not want to impact performance.
Does anyone have experience with this problem and could offer insight?
There are a lot of options available in this case, but the simplest approach is often best. If I were in your position, I might consider just leaving FFE disabled. If the 3400 were handling NAT, firewall, VPN and so forth, I might feel differently.
That said, the 15 second idle time before an FFE session is taken down should cover most phone-type hosts. Most people make a call, talk for a while, then hang up, so most phones will shuffle back over to the T1 fairly quickly. If you have a call that's already in progress when the T1 comes back up, or some application which constantly streams media, then the 30 minute lifetime will cover you. The ongoing session will expire and a new FFE session will be set up (and this should make the best current routing decision). That covers your pings too.
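To make the timer behavior concrete, here is a rough Python sketch of the two expiry rules described above. It is only a simplified model, not how AOS is actually implemented: the function name is made up for illustration, and the only values taken from this thread are the 15 second idle timeout and the 30 minute lifetime. It assumes a session was created while the T1 was down and asks how long a steady stream of packets waits before a fresh routing decision is made.

```python
def seconds_until_new_decision(packet_interval, idle_timeout=15, max_age=1800):
    """Return the time (in seconds) of the first packet that forces a new
    routing decision, for a flow whose cached session was created at t=0.

    Hypothetical model: a session is dropped when it has been idle for
    idle_timeout seconds, or when it is max_age seconds old, whichever
    comes first. Packets arrive every packet_interval seconds.
    """
    if packet_interval <= 0:
        raise ValueError("packet_interval must be positive")

    created = 0      # session was created by the packet at t=0
    last_seen = 0    # time of the most recent packet that hit the session
    t = packet_interval

    while True:
        idle_expired = (t - last_seen) >= idle_timeout
        aged_out = (t - created) >= max_age
        if idle_expired or aged_out:
            return t          # this packet triggers a fresh lookup
        last_seen = t         # packet matched the cached session
        t += packet_interval


# A ping every second never goes idle, so it waits out the full lifetime.
print(seconds_until_new_decision(1))
# A flow that pauses for 20 seconds is re-evaluated on its next packet.
print(seconds_until_new_decision(20))
```

Under this model a continuous one-second ping stays on the stale path for the full 1800 seconds, which matches what you are seeing, while anything that pauses longer than the idle timeout recovers on its next packet.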
I'm sure you're already aware of the excellent application note, but I'll link it here for reference in case future readers encounter a similar situation. That article shows how to modify the timeout and age values so that you can control how often FFE sessions expire and new decisions are made.
RapidRoute is meant to avoid the AOS device needing to constantly make routing decisions, which is how routing performance is improved. Your scenario is one where the router needs to react quickly (make decisions) packet by packet. Disabling FFE should allow the unit to react quickly all the time. Keeping default FFE configuration will maximize performance. Changing timeout/age values may be a happy medium.
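The trade-off can be sketched with a toy forwarder in Python. This is an illustration under loud assumptions, not AOS code: the class, its method names, and the "T1"/"VPN" labels are all hypothetical, timers are deliberately left out, and clear() merely stands in for the 'clear ip ffe' command mentioned earlier in the thread.

```python
class Forwarder:
    """Toy model of cached versus per-packet forwarding (hypothetical)."""

    def __init__(self, ffe_enabled):
        self.ffe_enabled = ffe_enabled
        self.cache = {}       # flow id -> path chosen when the flow started
        self.decisions = 0    # number of full routing/PBR lookups performed

    def _decide(self, t1_up):
        # Stand-in for the PBR and route lookup: prefer the T1 when its
        # next hop is valid, otherwise fall back to the default route (VPN).
        self.decisions += 1
        return "T1" if t1_up else "VPN"

    def forward(self, flow, t1_up):
        if not self.ffe_enabled:
            return self._decide(t1_up)       # lookup on every packet
        if flow not in self.cache:
            self.cache[flow] = self._decide(t1_up)
        return self.cache[flow]              # reuse the earlier decision

    def clear(self):
        # Rough analogue of flushing the session table by hand.
        self.cache.clear()


cached = Forwarder(ffe_enabled=True)
uncached = Forwarder(ffe_enabled=False)
for t1_up in (False, True, True):            # T1 down, then recovered
    print(cached.forward("ping", t1_up), uncached.forward("ping", t1_up))
print(cached.decisions, uncached.decisions)
```

The cached forwarder makes one decision and keeps using it after the T1 returns, while the uncached one pays for a lookup on every packet but follows the link state immediately.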
Let us know what you think and how it goes if you can perform some tests!
Thanks for your thoughts. I'm thinking that I will let this run for a while and see how it works in practice, keeping the option to disable FFE in reserve for the future.
Are you simply relying on the physical interface state of the T1 to inject and withdraw the route or are you probing and tracking the remote end?
There have been some issues with PBR and tracked routes on recent software releases. R10.9.x doesn't properly handle PBR at all with the firewall enabled; that was fixed in 10.9.4, but even that fix doesn't work with tracked routes. This is supposedly fixed in R11.3.0, which came out last week, but we haven't tried it yet.
You might want to try different firmware before disabling FFE.