My son is a computer geek like me. He was a slow starter: he did not get into it until he was a teenager, but he has an annoying habit of doing things before I do, reminding me all the time that I am slowing down…
He was the first in our family:
- to build a PC from off-the-shelf (OTS) parts,
- to use a 3 GHz CPU,
- to use more than four hard drives in a box,
- to use a multi-core CPU, and
- to use a dual-socket motherboard.
Of course, I am proud of him, but it still bugs the geek in me that I cannot stay ahead of him.
However, there is occasional joy in my family relations, like when, a few times a year, he writes, “Dad, how do you…?”. Hah! I am still useful!
Last week, after I returned from the annual teaching battle, my son said, “Dad, I am getting this weird message and Google does not help…”.
too many iterations (6) in nv_nic_irq
Lo, it was a problem with the forcedeth driver that I had encountered and overcome two years ago. I gave him hints that promptly ended an annoying instability in a server running GNU/Linux. Apparently, the information I had used to solve the problem was no longer on the web, so we will document it here.
The problem is that forcedeth.c, a reverse-engineered driver for an Nvidia NIC, has trouble dismissing an interrupt, that is, clearing the interrupt flag. When the flag stays set, the CPU is interrupted again as soon as the driver returns control. To break this tight loop, the author put in a limit on the number of loop iterations (max_interrupt_work), which wreaks havoc in some production systems. There are module parameters that can be configured to control the interrupt behaviour.
Here are the relevant excerpts from forcedeth.c.
* Known bugs:
* We suspect that on some hardware no TX done interrupts are generated.
* This means recovery from netif_stop_queue only happens if the hw timer
* interrupt fires (100 times/second, configurable with NVREG_POLL_DEFAULT)
* and the timer is active in the IRQMask, or if a rx packet arrives by chance.
* If your hardware reliably generates tx done interrupts, then you can remove
* DEV_NEED_TIMERIRQ from the driver_data flags.
* DEV_NEED_TIMERIRQ will not harm you on sane hardware, only generating a few
* superfluous timer interrupts from the nic.
…
/*
* Maximum number of loops until we assume that a bit in the irq mask
* is stuck. Overridable with module param.
*/
static int max_interrupt_work = 5;

/*
* Optimization can be either throuput mode or cpu mode
*
* Throughput Mode: Every tx and rx packet will generate an interrupt.
* CPU Mode: Interrupts are controlled by a timer.
*/
#define NV_OPTIMIZATION_MODE_THROUGHPUT 0
#define NV_OPTIMIZATION_MODE_CPU 1
static int optimization_mode = NV_OPTIMIZATION_MODE_THROUGHPUT;

/*
* Poll interval for timer irq
*
* This interval determines how frequent an interrupt is generated.
* The is value is determined by [(time_in_micro_secs * 100) / (2^10)]
* Min = 0, and Max = 65535
*/
static int poll_interval = -1;
So, the usual recommendation is to bump up the parameter max_interrupt_work. I found that this did not work well because these machines are so fast. What value would be large enough? Trial and error improved things, but it was still flaky. Since this was a machine with 8 processors, we could easily afford to poll the device to service the NIC…
We set
options forcedeth optimization_mode=1 poll_interval=100
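in /etc/modprobe.d/options, then reloaded the driver with rmmod forcedeth; modprobe forcedeth, and everything was sweetness and light. No more freezes or crashes serving files at gigabit speeds. (Newer versions of modprobe only read files in /etc/modprobe.d/ whose names end in .conf, so use a name like that on a current system.)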
How cool is it that old Dad could steer a young man towards the answer to a geeky problem in seconds?

Nice fix! This saved the day for me
These 2 parameters for the forcedeth module solved the network problems I had with the ASUS M2N-CM DVI motherboard. Thanks a lot!
Awesome solution! Many thanks!!!
Thanks! This solution works on Linux 2.6.26-2-amd64, Debian 5.0.
Thanks for this!
Is there a way to make this setting effective without rebooting the system?
Yes, this can be done without a reboot.
Whether you ssh in or work from the console, change the configuration file and then reload the driver. You may lose the connection (you certainly will if your ssh session runs through that NIC), so put both commands on one line:
rmmod forcedeth ; modprobe forcedeth
When the driver reloads, it will pick up the new settings. There will be a brief outage on the NIC. I did not measure it, but it was not noticeable to me in human terms. It is vital to type the configuration and the commands correctly, so be very careful if this is a production system.
It’s too bad this problem still exists, but I keep getting hits and comments several years later. The hardware manufacturers and the driver writers should cooperate more.