I have been having a very strange issue with a tunnel on a 1335: the tunnel randomly drops every day. Sometimes it's every hour, sometimes it's every 2 hours, and sometimes it's up for 5 hours before it drops. When it does drop, it comes back up within 2 minutes every time. Sometimes it goes down and back up so fast that we don't even get SNMP alerts.
The topology is 1335 -> port 0/23 -> DSL modem.
Port 0/23 is in the tunnel source VLAN, and the tunnel source VLAN is part of the same subnet as the modem. I can provide more configuration if required.
The far end of the tunnel goes to a Cisco 3845, where we have 80 tunnels working just fine; only this particular tunnel has the problem.
On the 3845 I see the log message below:
May 7 16:51:20.708 EDT: %CRYPTO-4-RECVD_PKT_INV_SPI: decaps: rec'd IPSEC packet has invalid spi for destaddr=X.X.X.X, prot=50, spi=0x5B858FB8(1535479736), srcaddr=X.X.X.X, input interface=FastEthernet1/0
My understanding is that this message just means that one of the peers died, which it is doing.
I've spoken to the carrier for the modem and they see no issues: the modem's gateway is still pingable when the tunnel goes down, and switchport 0/23 doesn't go down, nor has it had any interface errors.
I would like to run debugs on the 1335 to see if I can see anything. What would the CPU utilization be for a debug, and could anyone recommend a debug to run? Or any other suggestions? I personally think it's a carrier issue but just want to have my ducks in a row before I escalate further.
Thank you for asking this question in the support community. It may be helpful if you attach a copy of the running configuration of the ADTRAN unit (please make sure to remove any information that may be sensitive to the organization).
First, is the message you are receiving on the Cisco from the remote ADTRAN device's IP address? The "Invalid Security Parameter Index (SPI)" message is typically related to an IPSec VPN with IKE, and is not used in standard GRE tunnels. Is this a GRE/IPSec tunnel?
One thing I would suggest is to perform a debug on the GRE tunnel to monitor the GRE keepalive transmissions/reception. You can view this with the debug interface tunnel x command (where "x" represents the GRE tunnel interface number).
Please reply with any additional information or questions. I will be happy to help in any way I can.
Hi Levi, thanks for responding. Yes, you are correct, it's a GRE/IPSec tunnel.
The SPI message has the 1335 end as the source address and the Cisco end as the destination address.
I've attached the config of the 1335.
While I was on the switch the tunnel happened to go down. I then turned on debugging, and this is the output, minus the public IPs:
2012.05.08 10:13:48 INTERFACE_STATUS.tunnel 1 changed state to down
CHC-RDGOH-IDF-L3SWT-RIDGEWOOD-1#debug interface tunnel 1
2012.05.08 10:14:19 TUNNEL.1 Keepalive retries exceeded without Rx a keepalive.
2012.05.08 10:14:21 TUNNEL.1 GRE/IP encapsulated X.X.X.X->X.X.X.X (len=48).
2012.05.08 10:14:21 TUNNEL.1 Keepalive Tx.
2012.05.08 10:14:31 TUNNEL.1 GRE/IP encapsulated X.X.X.X->X.X.X.X (len=48).
2012.05.08 10:14:31 TUNNEL.1 Keepalive Tx.
2012.05.08 10:14:31 TUNNEL.1 GRE to decaps X.X.X.X->X.X.X.X (len=24 ttl=253).
2012.05.08 10:14:31 TUNNEL.1 Keepalive Rx.
2012.05.08 10:14:32 TUNNEL.1 GRE/IP to decaps X.X.X.X->X.X.X.X (len=48 ttl=254).
2012.05.08 10:14:32 TUNNEL.1 GRE decapsulated IP X.X.X.X->X.X.X.X (len=24 ttl=255).
2012.05.08 10:14:33 INTERFACE_STATUS.tunnel 1 changed state to up
2012.05.08 10:14:33 TUNNEL.1 GRE/LLDP encapsulated X.X.X.X->X.X.X.X (len=239).
2012.05.08 10:14:40 TUNNEL.1 GRE/IP encapsulated X.X.X.X->X.X.X.X (len=48).
2012.05.08 10:14:40 TUNNEL.1 Keepalive Tx.
2012.05.08 10:14:40 TUNNEL.1 GRE to decaps X.X.X.X->X.X.X.X (len=24 ttl=253).
2012.05.08 10:14:40 TUNNEL.1 Keepalive Rx.
From the debug output it appears that the 1335 is transmitting keepalives but didn't receive keepalives back within a certain period.
With this issue being intermittent, is it best practice to have this debug running for long periods of time? It's a very busy debug.
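As a lighter-weight complement to watching the full debug, the state-change lines alone are enough to measure how long each outage lasts. The following is a minimal Python sketch, assuming the log lines have been saved to text in exactly the format pasted above (for example by a syslog collector, which is an assumption here). The function name `tunnel_outages` is hypothetical and not part of AOS.

```python
import re
from datetime import datetime

# Matches lines such as:
# 2012.05.08 10:13:48 INTERFACE_STATUS.tunnel 1 changed state to down
STATE_RE = re.compile(
    r"^(\d{4}\.\d{2}\.\d{2} \d{2}:\d{2}:\d{2}) "
    r"INTERFACE_STATUS\.tunnel (\d+) changed state to (up|down)\s*$"
)

def tunnel_outages(lines, tunnel="1"):
    """Return a list of (down_time, up_time, seconds) for each down/up pair."""
    outages = []
    down_at = None
    for line in lines:
        m = STATE_RE.match(line.strip())
        if not m or m.group(2) != tunnel:
            continue  # ignore keepalive and encapsulation messages
        when = datetime.strptime(m.group(1), "%Y.%m.%d %H:%M:%S")
        if m.group(3) == "down":
            down_at = when
        elif down_at is not None:
            outages.append((down_at, when, (when - down_at).total_seconds()))
            down_at = None
    return outages

# Lines taken from the debug output in this thread.
sample = [
    "2012.05.08 10:13:48 INTERFACE_STATUS.tunnel 1 changed state to down",
    "2012.05.08 10:14:19 TUNNEL.1 Keepalive retries exceeded without Rx a keepalive.",
    "2012.05.08 10:14:33 INTERFACE_STATUS.tunnel 1 changed state to up",
]

for down, up, secs in tunnel_outages(sample):
    print(f"down {down} -> up {up}: {secs:.0f} s")
```

For the capture above this reports a single outage of 45 seconds, which is consistent with the tunnel recovering too quickly to trigger an SNMP alert.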
Thank you for replying with the configuration and the debug output. As you said above, the debug indicates that this unit is not receiving the keepalive messages, and thus tears the GRE tunnel down. By default the keepalives are sent every 10 seconds, and it takes three missed retries before the unit declares the peer unreachable. For something as intermittent as this, you will most likely have to monitor the link long term. It would also be beneficial to verify whether the other end of the tunnel is receiving the keepalives the ADTRAN is transmitting while the tunnel is in the failed state.
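To put rough numbers on the keepalive defaults mentioned above, here is a small Python sketch. It assumes the simplest reading of those defaults, namely that the peer is declared unreachable after `retries` consecutive keepalive intervals with no reply. The function and parameter names are hypothetical, and the exact timer behavior of the unit may differ slightly.

```python
def keepalive_detect_seconds(interval_s=10, retries=3):
    """Approximate time without keepalive replies before the tunnel is torn down.

    Assumes the peer is declared unreachable after `retries` consecutive
    intervals of `interval_s` seconds with no keepalive received.
    """
    return interval_s * retries

print(keepalive_detect_seconds())  # defaults from the post above: 10 s x 3
```

With the defaults this gives about 30 seconds, so an interruption much shorter than that would not be expected to bring the GRE interface down at all.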
Also, I would recommend disabling LLDP on the tunnel interface, because the Cisco does not have it enabled by default. The command is no lldp send-and-receive.
Levi, thanks for that resource regarding NQM. After more troubleshooting we found out that the tunnel is going down every hour, but it comes back up quickly enough that no alert is sent. We noticed this from the BGP notifications (hold time expired) on the 1335 side of the tunnel.
I've attached the output from debug crypto ike client and debug crypto ipsec. Could you assist in interpreting it?
We think the issue is that when the keys time out and the peers try to renegotiate, the tunnel drops during the process.
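One way to test the rekey theory is to check whether the drops really are spaced at a fixed interval that matches the configured key lifetime. The sketch below is in Python and uses hypothetical drop timestamps, not data from this thread. The 3600-second period and the 120-second tolerance are assumptions and should be replaced with the lifetime actually configured on both peers.

```python
from datetime import datetime

def drop_intervals(timestamps):
    """Seconds between consecutive drops, given 'YYYY.MM.DD HH:MM:SS' strings."""
    times = sorted(datetime.strptime(t, "%Y.%m.%d %H:%M:%S") for t in timestamps)
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]

def looks_periodic(timestamps, period_s=3600, tolerance_s=120):
    """True if every gap between drops is within tolerance_s of period_s."""
    gaps = drop_intervals(timestamps)
    if not gaps:
        return False  # fewer than two drops: nothing to compare
    return all(abs(gap - period_s) <= tolerance_s for gap in gaps)

# Hypothetical drop times, for illustration only.
hourly = ["2012.05.08 10:13:48", "2012.05.08 11:13:50", "2012.05.08 12:13:47"]
irregular = ["2012.05.08 10:13:48", "2012.05.08 10:50:00", "2012.05.08 12:13:47"]

print(drop_intervals(hourly))     # gaps in seconds between drops
print(looks_periodic(hourly))     # gaps close to one hour
print(looks_periodic(irregular))  # gaps far from one hour
```

If the real drop times pass a check like this against the configured lifetime, that supports the renegotiation theory. If the gaps are irregular, a carrier or keepalive problem becomes more likely.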
The IPSec debug you attached is the output from a successful IPSec tunnel negotiation. Unfortunately, the portion where the tunnel went down was not captured. I would need to see the debug messages from when the tunnel was terminated to help you diagnose the problem. Also, with GRE/IPSec tunnels, the GRE is encapsulated within the IPSec tunnel, so if the IPSec goes down the GRE will also go down. Therefore, it is important to determine whether IPSec is dropping or only GRE.
Do you also have the Cisco side of the config? I am trying to do something similar but am having trouble just getting the tunnel up.
No, I do not have an example configuration for GRE/IPSec tunnels on a Cisco, but here is a document on Configuring a GRE over IPSec Tunnel in AOS.
Also, here is another post on when to use GRE tunnels: https://supportforums.adtran.com/message/2133#2133
I went ahead and flagged this post as “Assumed Answered.” If any of the responses on this thread assisted you, please mark them as either Correct or Helpful answers with the applicable buttons. This will make them visible and help other members of the community find solutions more easily. If you still need assistance, I would be more than happy to continue working with you on this - just let me know in a reply.