Linux optical 10GbE networking: how to diagnose performance problems?
I have a small cluster consisting of 3 servers. Each has two 10GbE SFP+ optical network cards, and there are two separate 10GbE switches. On every server, one NIC is connected to switch 1 and the second NIC is connected to switch 2 to provide fault tolerance.
The physical interfaces are bonded at the server level using LACP.
All servers can ping each other, but one of them shows a small (~4%) packet loss over the bonded interface, which looks suspicious to me.
When I measure transfer rates between the two good servers with iperf3, they show about 9.8 Gbit/s in both directions.
Those two good servers can also download from the problematic one at about 9.8 Gbit/s.
iperf3 shows something strange when run as a client on the problematic server: it starts at a few hundred megabits per second, then the speed drops to 0 bit/s (while a concurrent ICMP ping still has a ~96% success rate). This happens in one direction only.
When the other servers download from this server, they get full speed.
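For reference, the iperf3 tests were along these lines (the address below is a placeholder for the target server's bond IP):

    # listening side
    iperf3 -s

    # sending side: the client transmits to the server by default
    iperf3 -c 10.0.0.2 -t 30

    # same pair, reverse direction: the server transmits, the client receives
    iperf3 -c 10.0.0.2 -t 30 -R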
It is all running on the same hardware, and even the firmware versions are the same (Dell R620 servers, Mellanox ConnectX-3 EN NICs, Opton SFP+ modules, Mikrotik CRS309-1G-8S switches). The OS is also identical: the latest stable Debian with all updates and exactly the same installed packages.
There is no firewall; all iptables rules are cleared on all servers.
On the problematic server I checked the interfaces; both NICs show UP and are running at 10 Gbit/s full duplex.
cat /proc/net/bonding/bond0 also shows both interfaces UP and active, with no physical link errors.
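For concreteness, these checks (plus per-port error counters) look roughly like the following; enp4s0 and enp4s0d1 are placeholders for the two Mellanox ports:

    # negotiated speed, duplex and link state per physical port
    ethtool enp4s0 | grep -E 'Speed|Duplex|Link detected'
    ethtool enp4s0d1 | grep -E 'Speed|Duplex|Link detected'

    # bond mode, active slaves and link failure counters
    cat /proc/net/bonding/bond0

    # packet, error and drop counters per interface
    ip -s link show enp4s0
    ip -s link show enp4s0d1

    # NIC-level statistics, e.g. CRC/symbol errors on the optical link
    ethtool -S enp4s0 | grep -iE 'err|drop|discard'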
I checked/replaced the SFP+ modules, used different fiber patch cords and tried different switch ports, and nothing changes: this one problematic server still gets poor download speed from the others and shows a small packet loss (over the bonded interface!).
I also tried different patch cord combinations (both connected, only the first connected, only the second connected), again with no change.
Any ideas how I can diagnose this further?
debian linux-networking bonding lacp sfp
asked Feb 28 at 12:08
Mateusz Bartczak
How are your switches connected and co-configured? The right way to do LACP across two switches is to have a shared MAC/IP table between them; without that you're likely to get 'MAC flapping' between the switches, which can lead to slower-than-expected performance and lost packets.
– Chopper3
Feb 28 at 12:43
Ah - I just looked at the specs for your switches; they can't be configured in a way that allows dual-switch LACP, sorry. Go back to an active/standby config on your NICs and you'll be fine.
– Chopper3
Feb 28 at 12:49
@Chopper3 thanks for the advice, I will check whether another bonding mode helps and post updates here.
– Mateusz Bartczak
Feb 28 at 14:22
2 Answers
Unless the switches support stacking and LACP across chassis, LACP cannot work that way. In fact, static LAG trunking won't work either.
Generally, link aggregation only works with a single opposite switch (or a stack acting like it).
With simple L2 redundancy, you can only run the NICs as an active/passive pair with failover. Using multiple L3 links with appropriate load balancing and IP migration on failover, or monitoring by an external load balancer, would also work in your scenario.
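A minimal sketch of an active/passive (active-backup) bond on Debian with ifupdown follows; the interface names and addressing are placeholders, and setups using netplan or systemd-networkd will look different:

    # /etc/network/interfaces (requires the ifenslave package)
    auto bond0
    iface bond0 inet static
        address 10.0.0.1
        netmask 255.255.255.0
        bond-slaves enp4s0 enp4s0d1
        bond-mode active-backup   # no LAG/LACP configuration needed on the switches
        bond-miimon 100           # check link state every 100 ms
        bond-primary enp4s0       # preferred active port

Note that active-backup uses only one port at a time, so a single stream still tops out at one 10 Gbit/s link, which is the same per-flow limit an LACP hash would give you anyway.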
answered 12 hours ago
Zac67
Please see my answer here (an upvote is appreciated if it turns out to be useful in your situation):
Why am I only achieving 2.5Gbps over a 10Gbe direct connection between 2 machines?
It is most probably related to LRO/GRO (large/generic receive offload), which can easily be disabled. There is also a nice explanation of why this happens here: https://lwn.net/Articles/358910/
Tuning 10G network interfaces is a huge topic.
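As a quick test, the receive-offload settings can be inspected and toggled at runtime with ethtool (the interface name is a placeholder; the change does not survive a reboot):

    # show the current offload settings
    ethtool -k enp4s0 | grep -E 'generic-receive-offload|large-receive-offload'

    # turn generic and large receive offload off for testing, then re-run iperf3
    ethtool -K enp4s0 gro off
    ethtool -K enp4s0 lro off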
edited Feb 28 at 14:46
answered Feb 28 at 14:40
Dmitriy Kupch