Son of a Geek! or “too many iterations in nv_nic_irq”

My son is a computer geek like me. He was a slow starter. He did not get into it until he was a teenager but he has an annoying habit of doing things before I do, reminding me all the time that I am slowing down…

He was the first in our family:

  • to build a PC from OTS parts,
  • to use a 3 GHz CPU,
  • to use more than four hard drives in a box,
  • to use a multi-core CPU, and
  • to use a dual-socket motherboard.

Of course, I am proud of him but it still bugs the geek in me that I cannot stay ahead of him.

However, there is occasional joy in my family relations, like when, a few times a year, he writes, “Dad, how do you…?”. Hah! I am still useful!

Last week, after I returned from the annual teaching battle, my son said, “Dad, I am getting this weird message and Google does not help…”.

too many iterations (6) in nv_nic_irq

Lo, it was a problem using the forcedeth driver that I had encountered and overcome two years ago. I gave him hints that worked promptly to end an annoying instability in a server running GNU/Linux. Apparently, the information that I found to solve the problem was no longer on the web so we will document it here.

The problem is that forcedeth.c, a reverse-engineered driver for an Nvidia NIC, has trouble dismissing an interrupt, that is, clearing the interrupt flag, so the CPU is promptly interrupted again when the driver returns control. To slow down this tight loop, the author put in a timeout, which wreaks havoc in some production systems. There are parameters that can be configured to control the interrupt behaviour.

Here is the code.

/*
* Known bugs:
* We suspect that on some hardware no TX done interrupts are generated.
* This means recovery from netif_stop_queue only happens if the hw timer
* interrupt fires (100 times/second, configurable with NVREG_POLL_DEFAULT)
* and the timer is active in the IRQMask, or if a rx packet arrives by chance.
* If your hardware reliably generates tx done interrupts, then you can remove
* DEV_NEED_TIMERIRQ from the driver_data flags.
* DEV_NEED_TIMERIRQ will not harm you on sane hardware, only generating a few
* superfluous timer interrupts from the nic.
*/

/*
* Maximum number of loops until we assume that a bit in the irq mask
* is stuck. Overridable with module param.
*/
static int max_interrupt_work = 5;

/*
* Optimization can be either throughput mode or cpu mode
*
* Throughput Mode: Every tx and rx packet will generate an interrupt.
* CPU Mode: Interrupts are controlled by a timer.
*/
#define NV_OPTIMIZATION_MODE_THROUGHPUT 0
#define NV_OPTIMIZATION_MODE_CPU        1
static int optimization_mode = NV_OPTIMIZATION_MODE_THROUGHPUT;

/*
* Poll interval for timer irq
*
* This interval determines how frequent an interrupt is generated.
* The value is determined by [(time_in_micro_secs * 100) / (2^10)]
* Min = 0, and Max = 65535
*/
static int poll_interval = -1;
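
For readers who want to see where the message comes from, here is a small user-space sketch of the bounded loop that the max_interrupt_work comment describes. It is not the driver's code: fake_nic, read_irq_status and fake_irq_handler are hypothetical names made up for illustration, and the "stuck" interrupt bit is simulated with a simple counter. The only things taken from the text above are the default limit of 5 and the wording of the message.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the NIC: it reports a pending event for the
 * next `stuck_for` reads, imitating an interrupt bit that will not clear. */
struct fake_nic {
    int stuck_for;   /* how many more reads will report a pending event */
};

/* Hypothetical status read: 1 if an event is pending, 0 otherwise. */
static int read_irq_status(struct fake_nic *nic)
{
    if (nic->stuck_for > 0) {
        nic->stuck_for--;
        return 1;
    }
    return 0;
}

/* Sketch of an interrupt handler that gives up after too many passes.
 * Returns the pass number on which the loop ended. */
static int fake_irq_handler(struct fake_nic *nic, int max_interrupt_work)
{
    int i;

    assert(nic != NULL && max_interrupt_work >= 0);

    for (i = 0; ; i++) {
        if (!read_irq_status(nic))
            break;   /* interrupt dismissed: the normal way out */

        /* ... a real driver would service rx/tx work here ... */

        if (i > max_interrupt_work) {
            /* Assume the bit is stuck and stop spinning. */
            printf("too many iterations (%d) in nv_nic_irq\n", i);
            break;
        }
    }
    return i;
}
```

With the limit at 5, a status bit that never clears ends the loop on pass 6, which matches the (6) in the message above. A device that settles after three reads leaves the loop quietly on pass 3. Raising the limit only moves the point at which a stuck bit is declared; it does nothing to unstick it.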

So, the usual recommendation is to bump up the parameter, max_interrupt_work. I found that did not work well because these machines are so fast. What value would be large enough? Trial and error improved things but it was still flakey. This was on a machine with 8 processors. We could easily afford to poll the device to service the NIC…

We set

options forcedeth optimization_mode=1 poll_interval=100

in /etc/modprobe.d/options followed by rmmod forcedeth; modprobe forcedeth and everything was sweetness and light. No more freezes or crashes serving files at gigabit/s speeds.
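
To get a feel for what poll_interval=100 asks of the hardware, the formula in the driver comment, (time_in_micro_secs * 100) / (2^10), can be turned into a pair of helper functions. This is only a sketch that assumes the comment's formula is accurate; the function names are hypothetical and not part of forcedeth.

```c
#include <assert.h>

/* Hypothetical helper: poll_interval value for a desired interval in
 * microseconds, per the driver comment: (usecs * 100) / 2^10,
 * clamped to the documented maximum of 65535. */
static int poll_interval_from_usecs(long long usecs)
{
    long long value;

    assert(usecs >= 0);
    value = (usecs * 100) / 1024;
    return value > 65535 ? 65535 : (int)value;
}

/* Hypothetical helper: approximate inverse, giving the interval in
 * microseconds that a poll_interval value stands for. */
static long long usecs_from_poll_interval(int poll_interval)
{
    assert(poll_interval >= 0 && poll_interval <= 65535);
    return ((long long)poll_interval * 1024) / 100;
}
```

Taking the formula at face value, poll_interval=100 works out to about 1024 microseconds between timer interrupts, or roughly a thousand polls per second.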

How cool is it that old Dad could steer a young man towards the answer to a geeky problem in seconds?

- Robert Pogson

7 Responses to “Son of a Geek! or “too many iterations in nv_nic_irq””


  1. Nick Aug 13th, 2009 at 2:12 pm

    Nice fix! This saved the day for me ;-)

  2. Walter Aug 28th, 2009 at 8:03 am

    These 2 parameters for the forcedeth module solved the network problems I had with the ASUS M2N-CM DVI motherboard. Thanks a lot! :)

  3. Sigma Jan 19th, 2010 at 4:44 am

    Awesome solution! Many thanks!!!

  4. joachim Feb 23rd, 2010 at 12:28 pm

    Thanks! This solution works on Linux 2.6.26-2-amd64, Debian 5.0.

  5. Roy Sep 2nd, 2010 at 11:52 pm

    Thanks for this!

  6. leong Oct 14th, 2010 at 6:49 am

    Is there a way to make this setting effective without rebooting the system?

  7. Robert Pogson Oct 14th, 2010 at 7:09 am

    Yes, this can be done without a reboot.

    Whether you ssh in or work from the console, change the configuration file and then reload the driver. There is a possibility of losing the connection (a certainty, if you are sshing in through the NIC), so put it all on one line:

    rmmod forcedeth ; modprobe forcedeth

    When the driver reloads, it will take the new settings. There will be a brief outage on the NIC. I did not measure it but it was not noticeable to me in human terms. It is vital that the typing be correct in the configuration and commands, so be very careful if this is a production system.

    It’s too bad this problem still exists but I keep getting hits and comments several years later. The hardware manufacturers and the driver writers should cooperate more.


