
Linux optical 10GbE networking: how to diagnose performance problems?


I have a small cluster of 3 servers. Each has two 10GbE SFP+ optical NICs, and there are two separate 10GbE switches. On every server, one NIC is connected to switch 1 and the other to switch 2 to provide fault tolerance.



The physical interfaces are bonded at the server level using LACP.



All servers can ping each other, but one of them shows a small (~4%) packet loss, measured over the bonded interface, which looks suspicious to me.
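This is roughly how I measure the loss (a sketch; the address is only an example, I run it between each pair of hosts over the bond0 addresses):

    # ~4% of these go missing only when the problematic server is one endpoint
    ping -c 500 10.0.0.3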



When I check transfer rates with iperf3 between the two good servers, they show about 9.8 Gbit/s in both directions.



Those two good servers can also download from the problematic one at about 9.8 Gbit/s.



iperf3 shows something strange when run as a client on the problematic server: it starts at a few hundred Mbit/s in the first intervals, then the speed drops to 0 bit/s (while an ICMP ping keeps running with a ~96% success rate). This happens in one direction only; when the other servers download from this one, they get full speed.
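For reference, the tests are run roughly like this (a sketch; the IP address is just an example):

    # on the receiving side
    iperf3 -s

    # on the sending side: normal direction, then reverse (-R)
    iperf3 -c 10.0.0.2
    iperf3 -c 10.0.0.2 -R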



It's all running on the same hardware, and even the firmware versions are the same (Dell R620 servers, Mellanox ConnectX-3 EN NICs, Opton SFP+ modules, MikroTik CRS309-1G-8S switches). The OS is also identical: the latest stable Debian with all updates and exactly the same installed packages.



There is no firewall; all iptables rules are cleared on all servers.



On the problematic server I checked the interfaces: both NICs show UP and are running at 10 Gbit full duplex.



cat /proc/net/bonding/bond0 also shows both interfaces UP and active, with no physical link errors.
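These are roughly the checks I ran on the problematic server (the interface names are examples, mine differ):

    # bonding driver view: slave state, LACP info, link failure counts
    cat /proc/net/bonding/bond0

    # per-slave link speed/duplex and low-level error counters
    ethtool enp4s0
    ethtool -S enp4s0 | grep -iE 'err|drop|crc|discard'

    # kernel per-interface RX/TX statistics
    ip -s link show bond0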



I checked/replaced the SFP+ modules, used different fiber patch cords and tried different switch ports; nothing changes. This one problematic server still gets poor download speed from the others and a small packet loss (over the bonded interface!).



I also tried different patch cord combinations (both connected, first connected/second disconnected, first disconnected/second connected). No change either.



Any ideas how I can diagnose this better?







Tags: debian linux-networking bonding lacp sfp






asked Feb 28 at 12:08 – Mateusz Bartczak (61)




  • How are your switches connected and co-configured? The right way to do LACP is to have a shared MAC/IP table between the switches; if you don't have this, you're likely to get 'MAC flapping' between them, which can lead to slower-than-expected performance and lost packets.

    – Chopper3
    Feb 28 at 12:43













  • Ah - I just looked at the specs for your switches; they can't be configured in a way that allows dual-switch LACP, sorry. Go back to an active/standby config on your NICs and you'll be fine.

    – Chopper3
    Feb 28 at 12:49













  • @Chopper3 thanks for the advice, I will check whether another bonding mode helps and post updates here.

    – Mateusz Bartczak
    Feb 28 at 14:22





























2 Answers






Unless the switches support stacking with LACP across the chassis, LACP cannot work that way. In fact, static LAG trunking won't work either.

Generally, link aggregation only works with a single opposite switch (or a stack acting like one).

With simple L2 redundancy, you can only run the NICs in active/passive pairs with failover. Alternatively, multiple L3 links with appropriate load balancing, combined with either IP migration on failover or monitoring by an external load balancer, would also work in your scenario.
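For reference, a minimal sketch of what an active/passive (active-backup) bond could look like with ifupdown on Debian; the interface names, address and option values are only examples and need to be adapted:

    # /etc/network/interfaces (needs the ifenslave package installed)
    auto bond0
    iface bond0 inet static
        address 10.0.0.2
        netmask 255.255.255.0
        bond-slaves enp4s0 enp4s0d1
        bond-mode active-backup
        bond-miimon 100
        bond-primary enp4s0

With active-backup, no special configuration is needed on the switch side, so keeping the two slaves on two independent switches is fine.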







answered 12 hours ago – Zac67 (3,973)

























Please see my answer here (don't forget to hit the thumbs up if it turns out to be useful in your situation):

Why am I only achieving 2.5Gbps over a 10Gbe direct connection between 2 machines?

It is most probably related to LRO/GRO, which stand for large/generic receive offload and can easily be disabled. There is also a nice explanation of why this happens here: https://lwn.net/Articles/358910/

Tuning 10G network interfaces is a huge topic.
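A sketch of how those offloads could be checked and temporarily disabled with ethtool (the interface name is an example; the change does not survive a reboot, and on a bond you may need to apply it to the slave interfaces as well):

    # show the current receive-offload settings
    ethtool -k enp4s0 | grep receive-offload

    # turn off GRO and LRO for a quick test
    ethtool -K enp4s0 gro off
    ethtool -K enp4s0 lro off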







answered Feb 28 at 14:40 (edited Feb 28 at 14:46) – Dmitriy Kupch (336)





















