Switchcard failure at FL-IX / IX is stable at this time

Greetings FL-IX Members,
We have experienced what appears to be a switchcard failure on one of our edge switches. We’ve stabilized the switch and will be following up with Arista, our hardware vendor. The IX should be back to normal operations at this time.
Once we have some more details, we will follow up with our members.
Regards,
Randy Epstein
Executive Director
Email: [email protected]
Mobile: +1 561-756-4475
Office: +1 888-925-4678
Fax: +1 561-431-0437
Please consider the environment before printing this e-mail

Thanks, Randy, for your mail.
We observed a few BGP peerings flap, but they are now stable.
Please keep us posted on any further updates.
[email protected]> show log messages | match 206.41.108.76 | last 40
Jan 13 02:02:19 bbr02.tm01.mia01-re0 rpd[26930]: BGP_IO_ERROR_CLOSE_SESSION: BGP peer 206.41.108.76 (External AS 54113): Error event Operation timed out(60) for I/O session - closing it (instance master)
Jan 13 02:02:19 bbr02.tm01.mia01-re0 rpd[26930]: RPD_BGP_NEIGHBOR_STATE_CHANGED: BGP peer 206.41.108.76 (External AS 54113) changed state from Established to Idle (event HoldTime) (instance master)
Jan 13 02:02:19 bbr02.tm01.mia01-re0 rpd[26930]: bgp_io_mgmt_cb:2380: NOTIFICATION sent to 206.41.108.76 (External AS 54113): code 4 (Hold Timer Expired Error), Reason: holdtime expired for 206.41.108.76 (External AS 54113), socket buffer sndacc: 38 rcvacc: 0 , socket buffer sndccc: 38 rcvccc: 0 TCP state: 4, snd_una: 1694725490 snd_nxt: 1694725509 snd_wnd: 64128 rcv_nxt: 1331733848 rcv_adv: 1331750232, hold timer 60s, hold timer remain 0s, last sent 15s, TCP port (local 179, remote 55907), JSR handle (primary 3026418952311668737, secondary 11096869482294673409)
Jan 13 02:26:16 bbr02.tm01.mia01-re0 rpd[26930]: RPD_BGP_NEIGHBOR_STATE_CHANGED: BGP peer 206.41.108.76 (External AS 54113) changed state from EstabSync to Established (event RsyncAck) (instance master)
Jan 13 02:26:16 bbr02.tm01.mia01-re0 kernel: jsr_action_replicate: Pri handle 0x1001a4d0000001e Sec handle 0x10019be0000001c fd 237 for connection (206.41.108.15:51290-206.41.108.76:179)
Jan 13 02:40:53 bbr02.tm01.mia01-re0 mgd[51643]: UI_CMDLINE_READ_LINE: User 'sjani', command 'show log messages | match 206.41.108.76 | last 40 '
---
{master}
[email protected]> show log messages | match 2001:504:40:108::1:61 | last 40
Jan 13 02:03:05 bbr02.tm01.mia01-re0 rpd[26930]: BGP_IO_ERROR_CLOSE_SESSION: BGP peer 2001:504:40:108::1:61 (External AS 11096): Error event Operation timed out(60) for I/O session - closing it (instance master)
Jan 13 02:03:05 bbr02.tm01.mia01-re0 rpd[26930]: RPD_BGP_NEIGHBOR_STATE_CHANGED: BGP peer 2001:504:40:108::1:61 (External AS 11096) changed state from Established to Idle (event HoldTime) (instance master)
Jan 13 02:03:05 bbr02.tm01.mia01-re0 rpd[26930]: bgp_io_mgmt_cb:2380: NOTIFICATION sent to 2001:504:40:108::1:61 (External AS 11096): code 4 (Hold Timer Expired Error), Reason: holdtime expired for 2001:504:40:108::1:61 (External AS 11096), socket buffer sndacc: 57 rcvacc: 0 , socket buffer sndccc: 57 rcvccc: 0 TCP state: 4, snd_una: 3219721324 snd_nxt: 3219721381 snd_wnd: 31400 rcv_nxt: 238646128 rcv_adv: 238662512, hold timer 90s, hold timer remain 0s, last sent 26s, TCP port (local 55881, remote 179), JSR handle (primary 2161727823655206913, secondary 10232178353638211585)
Jan 13 02:27:13 bbr02.tm01.mia01-re0 rpd[26930]: RPD_BGP_NEIGHBOR_STATE_CHANGED: BGP peer 2001:504:40:108::1:61 (External AS 11096) changed state from EstabSync to Established (event RsyncAck) (instance master)
Jan 13 02:27:13 bbr02.tm01.mia01-re0 kernel: jsr_action_replicate: Pri handle 0x1001a5c00000081 Sec handle 0x10019cd0000002b fd 227 for connection (2001:504:40:108::1:15:51668-2001:504:40:108::1:61:179)
Jan 13 02:41:29 bbr02.tm01.mia01-re0 mgd[51643]: UI_CMDLINE_READ_LINE: User 'sjani', command 'show log messages | match 2001:504:40:108::1:61 | last 40 '
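For anyone who wants to measure the gap on their own side, here is a minimal Python sketch that pairs the RPD_BGP_NEIGHBOR_STATE_CHANGED lines quoted above and reports how long each peer was out of Established. The function name `outage_durations` and the `year` parameter are assumptions made for illustration (syslog timestamps carry no year), and the pattern is written only against the log lines shown here, not against every Junos message format.

```python
import re
from datetime import datetime

# Pattern written against the RPD_BGP_NEIGHBOR_STATE_CHANGED lines quoted above.
STATE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ rpd\[\d+\]: "
    r"RPD_BGP_NEIGHBOR_STATE_CHANGED: BGP peer (?P<peer>\S+) "
    r"\(External AS (?P<asn>\d+)\) changed state from (?P<old>\w+) to (?P<new>\w+)"
)


def outage_durations(lines, year=2000):
    """Return [(peer, asn, timedelta)] for each Established -> ... -> Established gap.

    `year` is a placeholder assumption: the log timestamps do not include one.
    """
    down_since = {}
    outages = []
    for line in lines:
        m = STATE_RE.match(line)
        if not m:
            continue  # ignore I/O errors, NOTIFICATION lines, CLI audit lines, etc.
        ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        peer = m["peer"]
        if m["old"] == "Established" and m["new"] != "Established":
            down_since[peer] = ts  # session left Established
        elif m["new"] == "Established" and peer in down_since:
            outages.append((peer, int(m["asn"]), ts - down_since.pop(peer)))
    return outages
```

Fed the IPv4 lines above, this gives roughly 23 minutes 57 seconds between the HoldTime event at 02:02:19 and the return to Established at 02:26:16. It only measures the interval between logged transitions, so treat it as an approximation of the actual forwarding impact.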
---
206.41.108.16 16509 965 140 0 1 10:09 Establ
206.41.108.17 20940 53 140 0 1 10:03 Establ
206.41.108.20 21928 14810 15363 0 15 5d 2:07:39 Establ
206.41.108.23 6939 18690 141 0 1 10:08 Establ
206.41.108.27 32787 1202 154 0 1 10:27 Establ
206.41.108.61 11096 98 142 0 1 10:27 Establ
206.41.108.64 45474 114 140 0 1 10:17 Establ
206.41.108.76 54113 39 154 0 1 10:27 Establ
206.41.108.110 52468 420 141 0 1 10:28 Establ
206.41.108.146 7195 4782 140 0 1 10:11 Establ
206.41.108.163 396998 1102 139 0 2 9:56 Establ
2001:504:40:108::1:16 16509 227 88 0 1 10:17 Establ
2001:504:40:108::1:17 20940 29 87 0 1 10:02 Establ
2001:504:40:108::1:23 6939 55058 53 0 1 10:16 Establ
2001:504:40:108::1:27 32787 84 100 0 1 10:17 Establ
2001:504:40:108::1:61 11096 27 85 0 1 9:31 Establ
2001:504:40:108::1:76 54113 40 100 0 1 10:25 Establ
2001:504:40:108::1:110 52468 234 88 0 1 10:27 Establ
2001:504:40:108::1:153 263237 54 88 0 4 10:26 Establ
2001:504:40:108::1:163 396998 54 87 0 1 10:03 Establ
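To pick out which sessions reset, the rows above can be filtered on their uptime. The sketch below assumes the usual Junos `show bgp summary` column order (Peer, AS, InPkt, OutPkt, OutQ, Flaps, Last Up/Dwn, State); the header line is not part of the paste above, so that ordering is an assumption, and `parse_uptime` and `recently_reset` are hypothetical helper names.

```python
import re

UNIT_SECONDS = {"w": 7 * 86400, "d": 86400}


def parse_uptime(text):
    """Convert strings such as '10:09', '2:07:39' or '5d 2:07:39' to seconds."""
    total = 0
    for token in text.split():
        if ":" in token:
            clock = 0
            for part in token.split(":"):
                clock = clock * 60 + int(part)  # works for mm:ss and h:mm:ss
            total += clock
        else:
            for value, unit in re.findall(r"(\d+)([wd])", token):
                total += int(value) * UNIT_SECONDS[unit]
    return total


def recently_reset(rows, max_uptime=3600):
    """Return [(peer, asn, flaps, uptime_seconds)] for sessions up less than max_uptime."""
    hits = []
    for row in rows:
        fields = row.split()
        if len(fields) < 8:
            continue  # not a summary row in the assumed layout
        peer, asn, flaps = fields[0], int(fields[1]), int(fields[5])
        # Everything between the Flaps column and the State column is the uptime.
        uptime = parse_uptime(" ".join(fields[6:-1]))
        if uptime < max_uptime:
            hits.append((peer, asn, flaps, uptime))
    return hits
```

With the one-hour default threshold, every row above except 206.41.108.20 (up for more than five days) would be reported as recently reset.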
Regards,
Sunny Jani
Network Reliability Engineering
IBM Cloud Infrastructure
IBM Cloud
[email protected]
From: Randy Epstein

Greetings FL-IX Members,
I am following up on this event that occurred on January 12th on one of our edge switches at Equinix-MI1.
Arista has spent a great deal of time investigating the cause of this and has now provided a patch for us to install.
We’ve evaluated their analysis and the patch they’ve provided and believe that installing the patch would be the best way forward at this time. They also advised that the patch has been installed on another customer’s switches that experienced the same problem, and they have been running the patch since October without issue.
I will be sending a maintenance notice shortly with a date and time.
Thank you for your understanding and patience. If you have any questions, please feel free to reach out to me directly or [email protected].
Regards,
Randy Epstein
Executive Director
Email: [email protected]
Mobile: +1 561-756-4475
Office: +1 888-925-4678
Fax: +1 561-431-0437