Battle of the Bots

Today this site had an hour-long denial of service. The logs were normal until 6 bots came to visit simultaneously… on a busy day. We’ve doubled the RAM again. Hope this works.

About Robert Pogson

I am a retired teacher in Canada. I taught in the subject areas where I have worked for almost forty years: maths, physics, chemistry and computers. I love hunting, fishing, picking berries and mushrooms, too.
This entry was posted in technology. Bookmark the permalink.

37 Responses to Battle of the Bots

  1. oiaohm says:

    Phenom and Dr Loser: please note I am not saying that going into swap is always a good move.

    There are conditions where it is good: when it is just a little extra cached data for something that is somewhat expensive to generate.

    Linux has the means to freeze cgroups and send their memory to swap. When a high-load event comes, that swap space is quite useful: the low-importance background tasks shove their data out of the way. Basically, cut off CPU access for a cgroup and its memory becomes a target to be reclaimed for everything else.

    Linux is nice this way. The swappiness value applies per cgroup, so processes not important to providing the service can be targeted to make way for the more critical ones.

    The effect is that swap does not start thrashing as soon. It can write out aggressively, because the web server and database are given first access to CPU and memory; everything else is forced to give way.

    With no swap at all, your options are limited. Where I can send background processes' data into swap to be restored later, so they carry on where they left off, you would be forced to kill them.

    Systemd is something I am looking forward to on servers, because services are automatically placed in cgroups, and systemd supports assigning cgroup settings in order of preference of access to resources, even down to what gets pushed out first.

    Without control, a swap partition can turn into a nightmare. With control, having background services that are off the direct web-serving path shoved into swap is helpful, since they pick back up when the load drops and complete what they were doing.
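    The arrangement described above can be sketched with cgroup-v1 knobs. This is only an illustration, assuming the memory and cpu controllers are mounted at /sys/fs/cgroup; the "background" group name and the PID are made up.

    ```shell
    # Create a cgroup for low-priority background jobs (run as root).
    mkdir -p /sys/fs/cgroup/memory/background /sys/fs/cgroup/cpu/background

    # Make this group's pages the preferred swap victims under pressure.
    echo 100 > /sys/fs/cgroup/memory/background/memory.swappiness

    # Give it a small CPU share so front-line services win when load hits.
    echo 64 > /sys/fs/cgroup/cpu/background/cpu.shares

    # Move a long-running audit job (hypothetical PID 1234) into the group.
    echo 1234 > /sys/fs/cgroup/memory/background/tasks
    echo 1234 > /sys/fs/cgroup/cpu/background/tasks
    ```

    When load drops, the job's pages fault back in from swap and it carries on where it left off.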

    Using runtimes like .NET or Java can turn into a nightmare, because they place executable code in swappable memory on the main processing paths.

    The Linux kernel, with native code, can see what is executable code and what is data, and does not get confused. The Windows swap system has a habit of getting confused when it has segments that are tagged executable yet are swappable data.

    The basic design of the swap systems we have today was not meant to cope with JIT output. The odd program with one or two blocks of executable code that could go to swap, yes; not something with thousands to millions of swappable blocks that are also executable.

    JIT makers never took the limitations of OS kernels into account, and so cause us all massive pain, since the swap system should have been redesigned to cope with JIT requirements.

    So yes, we have legacy-designed swap management meeting modern JIT designs, and we have bombs. On Linux we can go back to the legacy model that swap management was designed for.

    It is like that 500 GB+ memory render: the settings on its cgroup said give up CPU time, and if given a chance to swap, do so. When heavy load comes along, the process basically stops; it was only using up the spare CPU cycles.

    There are many uses for this: checking for duplicate files, security audits in the background, static analysis of the PHP and C++ code for defects, and the like. None of this is a problem as long as those jobs stop when the resources are needed on the service side. More importantly, considering that a static search for bugs can take days, you cannot afford to have it restart from the beginning over and over, because it would never complete.

    Swap is for those background processes that have to stop to free up resources: processes that soak up what would otherwise be wasted CPU cycles yet are still doing something useful.

    Used correctly, swap is a good tool.

  2. oiaohm says:

    Dr Loser, you are correct that Bing is too polite. You give it a 503 and you never see it again, or at least not for a very long time.

    OK, maybe you can come up with a 503 response for Bing so that it knows it has been told to leave due to load. The Bing bot can be identified quite reliably, so giving it a tailored message would not be impossible if we knew what to tell it.

    So we cannot just send a 503 if we want to be indexed. The politeness is hurting because it is too polite; we need a way to ask it to be ruder.

    Swapping on a web server can be bad, but it does not always have to be. I have seen data from complex database searches sent to swap. That does not have much of a bad effect if users will later be making the same request that is cached in RAM.

    Data in swap, in particular cases, is valid even on a web server. Claiming it is not is wrong.

    With a fast swap device, and the data in swap being basically cached request data, the pain is quite minimal. Shoving extra data into swap temporarily can be a good thing if regenerating that data would take more CPU time than sending it to swap and getting it back.

    It is when swap starts thrashing that you are screwed.

    .NET, Java and PHP all have events that cause them to trigger thrashing. HipHop normally does not end up thrashing; Java compiled to native code likewise avoids thrash events.

    Basically, Dr Loser, you are taking too simple a point of view.

  3. Dr Loser says:

    @oiaohm:

    “All the search-engine bots obey the site-interfacing instruction of 503, come back later, bar Bing’s. You could do us a favour and get Bing’s bot to respect a 503 message and come back later, like 4 to 8 hours later, when the stress is off the site.”

    And on a specific point of information, Oiaohm, the Bing bots do this too. (Actually, I’m not even sure they come back at all: we have a “politeness” policy which borders on the paranoid and consequently misses out on a small but significant proportion of sites “just in case.”)

    I’ve seen the damn code. Personally I thought it was too polite for its own good. One part of it will even refuse to index a site unless it has a robots.txt file … just to be on the safe side.

    Absolutely every single search engine I have ever seen does this.

    You’re spinning Robert’s favourite link out of thin air again, aren’t you?

  4. Dr Loser says:

    @Robert:

    Backtrack a lot, do you? Never recommending swapping on a server, indeed. What on earth was the point of your interjection otherwise?

    I will add a discriminatory note here: we are talking about forward-facing, I/O bound, typically web servers here. Swapping on a database server is a very, very good thing indeed. Otherwise you tend not to be able to deal with datasets over and above, say, 32 GB, which is fairly limiting. (This observation is not limited to database servers alone, obviously. It’s a general comment on work loads.)

    Incidentally, this is why a proper IT set-up has separate farms for web servers and for db servers. There’s no rocket science involved.

    Swapping on a web-server is BAD. Period.

    And there’s no discernible difference between swapping to a Linux partition and swapping to an NTFS partition. Good Lord, did you think that NT swap space suffers from fragmentation? If so, may I disabuse you? Because it doesn’t, and it never has, right back to NT 3.51. Even Dave Cutler wasn’t that daft.

    I would suggest you leave technical recommendations to those of us who are qualified to make them and paid to do so and sacked when they go wrong.

    Here’s what you should have said:

    “I have never recommended swapping on a [Web] server but if you need to do it, I recommend buying more RAM.

    “Oh, and if you’re running a Web server (or somebody else is running it for you), and you are not monitoring for this stuff, then you are doomed.

    “Did I mention that, if you suddenly find yourself forced to swap to disk on a Web server, it is almost certainly because your Web traffic has doubled or trebled?

    “Guess what! You can afford to spend $200 on more RAM!”

    And in a similar vein:

    @oiaohm:

    “That does not appear either. The system acts as if it has been hibernated and restored from hibernation, but in the process the heaviest memory and resource processes have been terminated. In this case it was because OpenVPN was leaking resources badly, not just memory.”

    No, the system is not acting like it has been hibernated and restored. In Windows terms, hibernation involves dumping state to a rather large disk file (wholesale paging, if you will). Coming out of hibernation, the disk file is loaded back into RAM and the system resumes as it was.

    I don’t know what your problem was, but it clearly wasn’t caused by a reboot (otherwise you would have seen the dialog box) and it clearly wasn’t caused by involuntary hibernation (otherwise the system would have come right up again).

    I would suggest that, either you are working with old and busted hardware, or you really shouldn’t be doinking around with a worthless broken piece of crap like openvpn. Incidentally, why were you doing that? And why would you blame M$ when said worthless broken piece of crap leaks memory like there is no tomorrow?

    What is wrong with you?

    And did you, or did you not, report the problem to Microsoft? They actually want you to do this, you know. They have people who are paid to fix this stuff, even if the original problem does not point at Microsoft themselves.

  5. I have never recommended swapping on a server but if you need to do it, GNU/Linux swapping to a partition is the way to go.

  6. Phenom says:

    Pogs, I already told you not to listen to Ohio, that makes you look really bad. 🙂

    Hear me out, Pogs, I will say it once again. Swapping to disk on a server is bad. Bad. The disk is slower than RAM, period. Once you encounter such a case, it means your system is simply overloaded, and you should either do something to conserve memory in your software or add more RAM. Optimizing for swap is a meaningless effort, a pure waste of time, and a demonstration of a lack of skill.

    On top of that what you say is possible on Windows, too, and it doesn’t change the notion that swapping on a server is bad. Simply no architect with any good sense left would do such a thing.

    Btw, one of the projects I am now working on is based on Java and hosted on Linux (CentOS) systems, so please don’t you tell me about boxes. 🙂

  7. Phenom wrote, “your notion that a swap partition is faster is pure crap. The seek time alone is enough to kill you, and other partitions tend to stay somewhere else on disk, not exactly near your data.”

    One thing that kills swapping on that other OS is fragmentation. That’s a lot less likely on a swap partition. While seeking could be a problem, in practice it is not, because in most cases swapping is the highest-priority task and other I/O has to wait for it. If one has the luxury of a swap partition on a separate drive, there is no extra seeking required. In a lot of my installations the unit has multiple drives, so that is a good possibility. Even when the swap drive is shared with other RAID 1 components, for read operations the swap drive does not interfere much with the other drives.

    So, Phenom is thinking inside the box of that other OS, and his thinking is constipated. GNU/Linux is limited only by imagination.
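    On the multiple-drive point: Linux can also stripe swap across drives. Giving two swap areas the same priority makes the kernel interleave pages between them, which spreads the seek load. A sketch with hypothetical device names (run as root):

    ```shell
    # Equal priorities => the kernel round-robins pages across both drives.
    swapon -p 10 /dev/sdb1
    swapon -p 10 /dev/sdc1

    # The same thing persisted in /etc/fstab:
    #   /dev/sdb1  none  swap  sw,pri=10  0  0
    #   /dev/sdc1  none  swap  sw,pri=10  0  0
    ```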

  8. Phenom says:

    Wow, Ohio, you are getting funnier and funnier. Not only do you praise swapping to disk on a server, which is a definitely bad situation for any server.

    Now, you want to use a memory cache for the swap. In other words, what you say is: sacrifice RAM to improve the situation when we run out of RAM.

    Really wise, Ohio. 🙂

    Btw, your notion that a swap partition is faster is pure crap. The seek time alone is enough to kill you, and other partitions tend to stay somewhere else on disk, not exactly near your data.

  9. oiaohm says:

    Dr Loser
    “How about that little dialog box that pops up after every reboot? What happened to that?”
    That does not appear either. The system acts as if it has been hibernated and restored from hibernation, but in the process the heaviest memory and resource processes have been terminated. In this case it was because OpenVPN was leaking resources badly, not just memory.

    It completely pretends nothing has happened; it is a novel trick of Windows. “My server has not been rebooted for X days” might not be true if this has happened. Yes, even the uptime counter keeps on ticking. You can set the BIOS to refuse to boot straight away, start the machine a day later, and it still keeps up the same pretence.

    Dr Loser
    “Having a sane operating system that deigns to write things to disk every now and again is key.”
    If asked to, kswapd will write things out. By default it is not set up to.
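    The kernel's willingness to swap and to flush dirty pages is tunable through a few standard sysctl knobs. Reading them is harmless; the values mentioned in the comments are common distro defaults, not recommendations:

    ```shell
    # How aggressively the kernel swaps (0-100); distro default is often 60.
    cat /proc/sys/vm/swappiness

    # Percent of RAM that may be dirty before writers are forced to flush.
    cat /proc/sys/vm/dirty_ratio

    # Percent of dirty RAM at which background writeback kicks in.
    cat /proc/sys/vm/dirty_background_ratio
    ```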

    Dr Loser
    “If so, are you suggesting that Linux memory management is somehow incapable of doing the very same thing with, say, the Java VM.”
    There is a native-code compiler for Java on Linux for a reason.

    JIT systems run into the same problem over and over again: they build code into RAM, and RAM is not a good place to build binary code, even for Java JITs. The issue comes when memory gets short. What do you do with the JIT-produced binary in RAM? The kernel cannot re-read it from disk, so it has to send it to swap to make space. Most JITs do not yet integrate with kernels to tell the kernel that it can dispose of the JIT-produced native code.

    Now suppose we had a JIT that used the new Transcendent Memory (http://lwn.net/Articles/340080/) so that executable code not in use could be disposed of instead of sent to swap. That could still carry a nasty CPU hit when the code has to be regenerated, since you are back to cold optimisation. So that does not exist either, because it is not going to work.

    HipHop takes a different approach: it builds a normal binary and, if enabled, runs profile-guided optimisation between runs. This picks up the hot-path optimisation a JIT does, but with the advantage that it is not starting cold if the system is rebooted. Even better, the Linux kernel does not have to send any executable code sections of the binary to swap space; it can drop that code from memory and re-read it from disk.

    Linux memory management is very good at getting rid of executable pages of normal binaries that have not been used in a fair while, and at keeping the hot code paths in memory.

    Linux memory management does run-time optimisation of used pages and merges duplicate pages. It is not that expensive to re-read a binary from disk.

    “On a server, you do not want to swap RAM out to disk.
    Never.
    Not at all.”
    That is the objective. The problem is that in the real world, running into swap can sometimes happen and there may be no other option. It is nice if a small excursion into swap does not amount to complete disaster.

    To be precise: as long as it is data going into swap, the situation can stay under control. Note: data, not executable code. As soon as anything like a JIT leads to executable code being forced into swap, you are in major trouble. You can bet the next attacker hit will want the code that is now sitting on the swap drive and has to be brought back, sending your system into thrash hell. Some of the problem here could have been PHP bytecode, generated in memory, being sent to swap.

    Again: executable code of either form, generated bytecode for an interpreter or generated code for the CPU, sitting in swap means you are doomed.

    Dr Loser
    “What a very, very, stupid idea. Only a Linux nut-job would come up with it.”
    Microsoft came up with the idea of using only a swap file, not a swap partition, to avoid these issues. So yes, an MS nut-job came up with that solution first: by putting swap inside the file system, the file system's load balancing kicks in. Please blame the right nut-job; since you are at Bing, there is one of your own in there somewhere.

    Linux at first did not have swap files at all. Swap partitions are faster because there is no file system underneath them that you have to go through.
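    A swap file, as discussed here, trades a little speed for management by the file system. A minimal sketch of adding one (path illustrative; run as root):

    ```shell
    # Allocate a 1 GiB file; dd rather than a sparse file so the blocks
    # really exist on disk before they are used as swap.
    dd if=/dev/zero of=/swapfile bs=1M count=1024
    chmod 600 /swapfile            # swap areas must not be world-readable
    mkswap /swapfile               # write the swap signature
    swapon /swapfile               # enable it; "swapon -s" to verify
    ```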

    “Otherwise, O All-Knowing One, would you please explain to me why servers these days are routinely provisioned with 32 GB of RAM, rather than just relying on a handy SSD drive?”

    Really, more RAM is better, correct. Even with 32 GB you can run out. Interestingly, running something like HipHop you normally do not use the SSD for swap alone.

    Instead you use the SSD with bcache to cache the hard drive exclusively. Yes, we have a disk cache in memory, yet we still have an SSD acting as a second cache dedicated to the disk. That speeds up all frequent disk access, including swap. It also means that if the OS does run out of RAM for any reason, it has a better hope of digging itself out.

    Dedicated caches have their places. They can, surprisingly, prevent problems: they increase the odds that if you do go into swap for any reason you will make it out the other side, and they speed up disk access enormously.
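    The SSD-over-disk pairing described can be sketched with bcache-tools, assuming a bcache-enabled kernel; the device names are hypothetical and this needs root:

    ```shell
    # Register the spinning disk as the backing device...
    make-bcache -B /dev/sdb
    # ...and the SSD as the cache device.
    make-bcache -C /dev/sdc
    # Attach by echoing the cache set's UUID into the backing device node:
    #   echo <cset-uuid> > /sys/block/bcache0/bcache/attach
    # Then put the filesystem (and any swap file) on /dev/bcache0.
    ```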

    The advantage of a dedicated disk cache is that, in most cases, it is not destroyed by a high-memory-load event.

    If you are hitting limits too often, say running very heavy 3D renders, you might end up running a dedicated SSD for swap. 3D renders can run you out of everything.

    I have seen a 64-bit 3D rendering process genuinely using 500+ GB of memory for a really complex scene. (Note that is what was actively written; it was reading something like a few TB of allocated space.) Considering that something like that may happen only once in a blue moon, using an SSD in combination with a hard drive as swap still has a valid place. IO rules allow that to happen in the background without disrupting the web servers.

    A bcache SSD on a web server, for general use, has a place.

    Windows does not have a bcache option, I just remembered. This is part of the reason you are so screwed when you enter swap: you have neither speed nor size on swap. You are also running .NET, so executable code ends up in swap, making sure you are totally and completely screwed if you have a swap event.

    Basically, Linux running HipHop, or Java built to native code, can dig its way out of a swap event because executable code is not landing in swap, so Linux is not handed a death sentence. It digs its way out more easily still if you have something like bcache deployed.

    Yes, the truly bad swap thrash is executable code going into swap, being pulled back out, then going back in, in a never-ending cycle, because the data it requires now has to be pulled out of swap in turn. The result is data and executable code never in memory at the same time, with the process/thread tables thrashing badly as well. This is the case that normally all but bricks the system.

    This is why .NET is a lemon, and Java on the JVM is equally a lemon. It is like rolling a die on whether my system gets bricked when load arrives.

    Pure data can thrash fairly badly, but it is nothing like executable code. Pure data thrash is only somewhat painful; OS kernels cope with it far better.

    Prebuilding the PHP bytecode can prevent that bytecode being pushed to swap, reducing the disaster if the worst happens.

    I work from the point of view of being ready for the worst. Out of RAM is an item to be ready for.

    Dr Loser
    “The original point was the lack of logs. Your response suggested that bots will be nice if you 503 them.”
    Most search-engine crawling bots will leave if shown a 503 message; they are designed that way. Attacker bots can be black-holed with Varnish. Again, this removes load from the back-end server or servers, making the server harder to DoS.

    Varnish gives you options: nice bots, tell them nicely to stop; nasty bots, black-hole them; evil bots, you are forced to let in so you get indexed.

    Black-hole a search-engine bot and you do not get listed in that search engine.

    All the search-engine bots obey the site-interfacing instruction of 503, come back later, bar Bing’s. You could do us a favour and get Bing’s bot to respect a 503 message and come back later, like 4 to 8 hours later, when the stress is off the site.

  10. Dr Loser says:

    @oiaohm:

    One last thing.

    “Native code works straight into the strong points of Linux memory management. HipHop supports WordPress; ASP.NET kinda does not, so unless that side is recoded it is not an option.”

    That is a truly demented proposition.

    Are you suggesting that Linux memory management does some sort of static analysis on native code before it runs it?

    If so, are you suggesting that Linux memory management is somehow incapable of doing the very same thing with, say, the Java VM? Because I can assure you that, if it did so, the JIT would take care of about 95% of the rest.

    And then you give in to your standard scattergun mania and quote hiphop and wordpress and asp.net, for some inexplicable reason.

    This shit ain’t impressing anybody here but Robert, and as long as you’re polite about GNU/Linux, Robert is easily impressed.

  11. Dr Loser says:

    @oiaohm:

    I’m sorry to break that down into so many little chunks, oiaohm, but you leave me no choice.

    You are fundamentally incapable of writing a nice coherent (I don’t care about the spelling or even the grammar) paragraph in response to a point.

    So there you are. This is how I would suggest you proceed.

    And please try to stick vaguely to the point. This Michael Caine “not a lot of people know this” bullshit doesn’t work amongst industry professionals; however ignorant we might be.

  12. Dr Loser says:

    @oiaohm:

    “Funny enough is there lazy way to prevent the issue. Yep running a slower swapfile.”

    If “lazy” === “moronic” (I’m using PHP syntax here), then you are correct, for once.

    What a very, very, stupid idea. Only a Linux nut-job would come up with it.

    I repeat:

    On a server, you do not want to swap RAM out to disk.

    Never.

    Not at all.

    Otherwise, O All-Knowing One, would you please explain to me why servers these days are routinely provisioned with 32 GB of RAM, rather than just relying on a handy SSD drive?

    (Actually, to be fair, putting the swap on a particularly high-quality SSD drive might be a very good idea. But that’s not my point.)

    Not only do you know nothing about Windows Servers, but you apparently know nothing about Servers in general.

  13. Dr Loser says:

    @oiaohm:

    “Dr Loser IO configuration is key.”

    No it is not.

    Having a sane operating system that deigns to write things to disk every now and again is key.

    Like I said, I’m utterly shocked that Linux can’t do this. I mean, we’re not talking a North Korean attack on the DoD here; we’re talking a smallish denial of service on Robert’s blog.

    That’s a rather pitiful Failure To Perform, don’t you think?

  14. Dr Loser says:

    @oiaohm:

    “I guess you are not running BIOS-level logging of reboots, so you cannot see that the OS has rebooted while the event log records no such reboot, or anything else at that time.”

    I’m not doing anything at all, you idiot. I’m not a SysAdmin. But, believe me, if any of this stuff affected Bing SLAs, I’d have heard about it by now.

    Are you genuinely so far gone that you believe a professional organisation won’t recognise a reboot unless it’s in the event log?

    How about that little dialog box that pops up after every reboot? What happened to that?

    Seriously, young man: you clearly don’t have a clue what you’re talking about, at least when it comes to the world of Windows Servers.

  15. Dr Loser says:

    @oiaohm:

    “It is a bug you don’t want to trigger, because it is stealthy; 2003 and 2008 both have it.”

    Well, that would certainly be handy in differentiating it from all those other bugs that I normally trip over myself in my enthusiasm to trigger them.

    Gibberish again. Irrelevant. Who cares?

  16. Dr Loser says:

    @oiaohm:

    “I found out how MS handles an over-stressed event after tracing down why OpenVPN on a Windows server was losing connections. It was dead after a reboot, yet the logs recorded no such reboot.”

    Hardly surprising, really, if it was dead after a reboot. Or are you in the habit of communing with the dead?

    This is meaningless, oiaohm. Did you report the event to Microsoft? Or am I allowed to assume that you are just full of shit as usual?

  17. Dr Loser says:

    @oiaohm:

    “Dr Loser, this is one time Windows’ connection limits are good. It is harder to trigger the exceed event under Windows because of them. When you do something really special you can find it.”

    You keep banging on about this, but you provide absolutely no proof that it exists.

    So: who cares? Irrelevant.

  18. Dr Loser says:

    @oiaohm:

    “A 503 response to some is better than massive failure and no response to anyone.”

    Who cares? Irrelevant.

    The original point was the lack of logs. Your response suggested that bots will be nice if you 503 them.

    Ludicrous. Irrelevant.

  19. Dr Loser says:

    @oiaohm:

    “Dr Loser, Varnish is a highly compact little item. It takes very little RAM to handle connections.”

    Who cares? Irrelevant.

  20. oiaohm says:

    Dr Loser, Varnish is a highly compact little item. It takes very little RAM to handle connections.

    Yes, it is useful in a denial-of-service attempt, even a DDoS, since when set up it selectively discards. You can cap the number of connections that make it to the web server with Varnish, even based on server load. So you never get into the mess of the server's capacity to process being exceeded and ending up in OOM-killer loops. A 503 response to some is better than massive failure and no response to anyone.

    Dr Loser, this is one time Windows' connection limits are good. It is harder to trigger the exceed event under Windows because of them. When you do something really special you can find it.

    I found out how MS handles an over-stressed event after tracing down why OpenVPN on a Windows server was losing connections. It was dead after a reboot, yet the logs recorded no such reboot. It is nice having a Tyan board with a coreboot BIOS into which you can insert boot logging. When I say pretend it did not happen, I really do mean pretend it did not happen. It is a bug you do not want to trigger, because it is stealthy; 2003 and 2008 both have it. I guess you are not running BIOS-level logging of reboots, so you cannot see that the OS has rebooted while the event log records no such reboot, or anything else at that time. To the network it appears the machine lagged a little. It is also faster than a normal system boot, like some sneaky restore from suspend with all the heavy-usage processes stripped out.

    Dr Loser, IO configuration is key. On a default Linux server your logging goes to local disk; this can be altered to go over the network. Nor do you care about the logging in all cases. A logging failure under Solaris goes down the same way: swap thrash of death in the wrong place, unmanaged.

    “That’s probably the single most common cause of server failure. It isn’t the amount of RAM per se; it’s the probability that, once you start swapping, you will never recover.”

    This is not true. You only never recover if your IO management is not fully set up and you have nothing to limit connections, i.e. network IO management. That limiter can be something like Varnish or an LVS. IO limits have to be configured to the hardware.

    With Varnish, the server knows its limits have been exceeded, so it reduces the number of connections it will take for a while, and sorts itself out from the swap thrash of death.

    Linux presumes by default that your swap partitions and your main partitions are not on the same drive, so swap can go for it. Yes, this is a logic error some of the time, but a perfectly correct response at other times.

    Historic rules of installing a server, like swap on an independent drive, are forgotten today, and that leads to the no-log event. Swap and data on the same drive is not something a default-configured Linux kernel particularly likes.

    Disk IO configuration is how you inform Linux that it is not on huge hardware with dedicated swap drives whose full bandwidth it can consume without causing issues. Once informed that it must leave some disk IO for other things, logging on Linux becomes quite dependable.

    Funnily enough, there is a lazy way to prevent the issue: running a (slower) swap file. Linux automatically avoids starving the file system on the drive when doing that. Of course, people forget that Linux has both swap files and swap partitions, and have forgotten why each exists. Swap partitions are designed with the presumption: if I need to swap, I can take the full bandwidth to that device unless told otherwise.

    Yes, it is a bad presumption that all partitions near a swap partition are non-critical unless it is informed otherwise.

    NT swap partitions used to do the same thing. MS fixed it by removing the swap-partition option, since that was simpler than adding management.

    Dr Loser, I guess you have forgotten the evil of working with NT swap partitions and the strange issues they could nicely kick up.

    All OSes with swap partitions, when swap and data share a drive, can sometimes have unconfigured swap IO take out logging. AIX, Solaris, BSD and Linux are all the same in this regard.

    Failure to configure your IO rules, and to have management for all forms of IO, makes a mess, leading to a system thrashing itself to death trying to support more than it can handle. Yes, Windows is not above this.

  21. Dr Loser says:

    @Ray:

    I believe that the Raspberry Pi has a spare end, featuring a USB connector, that would work perfectly for this.

    Should we offer our services to Robert as consultants?

    @Robert:

    So, you’ve quadrupled the RAM?

    Goodbye, small thingies. And it wasn’t a particularly ferocious Beast in the first place, was it?

  22. Ray says:

    There’s always the option of having a remote server for caching.

  23. Dr Loser says:

    @oiaohm:

    “Robert Pogson, varnish-cache can selectively display ‘503 Service Unavailable’ to bots. This tells bots to go away and bother my site later.”

    Not a whole lot of use in a denial of service attack, is it? Which I believe Robert was complaining about, and which wouldn’t surprise me in the least.

    If this was a bot-related incident, it’s seriously hard to believe that the bots in question belonged to search engines.

  24. Dr Loser says:

    @oiaohm:

    “Dr Loser, there is a reason to use cgroups. A little unknown fact is that kswapd, if you do not have IO controls enabled, can end up taking all IO resources, leading to the nasty case of no logs.”

    So now there are “little known facts” and “little unknown facts.” Soon, Mr Rumsham, you will be delighting us with “huge possibly slightly misunderstood facts.”

    Not only is it a little-known fact, it’s a total irrelevance. No proper server OS requires this sort of band-aid in order to allocate resources between different IO domains. I was actually sort of shocked when I read that this is even possible on a Linux server, and I’m supposedly a shill who is paid not to be shocked by this stuff.

    “A Windows server in the same event normally blue/red-screens of death when it cannot write logs, then nicely automatically reboots and tries to pretend it never restarted.”

    Never seen one myself, and remember we at Bing run a humongous number of Windows 2008 servers (also a fair few legacy 2003 items).

    As usual, links requested.

  25. Dr Loser says:

    @Robert:

    Before diving in with my usual snarks, have you considered the fact that your PHP app (or even Apache: I’ve never seen it, but it wouldn’t surprise me) is leaking memory like mad?

    That’s probably the single most common cause of server failure. It isn’t the amount of RAM per se; it’s the probability that, once you start swapping, you will never recover.

    Just a thought. I’d recommend a PHP tool to analyse the stuff, but it’s not an area I spend much time in.
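    One cheap way to check Dr Loser’s leak hypothesis, short of a real PHP profiler, is to sample a worker process’s resident set size over time and look for monotonic growth. A minimal sketch in Python, assuming only the standard Linux /proc/&lt;pid&gt;/status format; both helper functions are hypothetical, not part of any tool mentioned here:

```python
# Sketch: spot a suspected memory leak by watching a process's RSS.
# On Linux, /proc/<pid>/status contains a line like "VmRSS:   20480 kB".

def parse_vmrss_kb(status_text):
    """Extract resident set size in kB from a /proc/<pid>/status dump."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return None  # kernel threads have no VmRSS line

def looks_like_leak(samples_kb, min_growth_kb=1024):
    """Crude heuristic: RSS never shrinks and grows by >= min_growth_kb."""
    rising = all(b >= a for a, b in zip(samples_kb, samples_kb[1:]))
    return rising and samples_kb[-1] - samples_kb[0] >= min_growth_kb
```

    On a live box one would read each worker’s status file every few seconds and feed the samples to the heuristic; a worker whose RSS only ever climbs is the one to restart or profile.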

  26. oiaohm says:

    Phenom
    “3. Native code cannot solve the problem with scalability only by itself. An OS with good vertical scalability features is needed, too, and Linux is not one of these. ”
    Native code is exactly where Linux scales vertically.

    Native code plays straight into the strong points of Linux memory management. HipHop supports WordPress; ASP.NET more or less does not, so unless that side is recoded it is not an option.

    http://huichen.org/en/2010/06/benchmark-wordpress-with-hiphop-for-php/

    Yep, 4x faster, so the same hardware can take up to four times the load. Of course, if you run profiling on the native code for the hot paths it will go even higher than that. That is 4x without even fully optimizing the binaries with profile-guided optimizations.

    It also plays to the strengths of Linux memory management: KSM works better with native code, extending the upper limit even more, and hugepages also work better with native code.

    Yes, HipHop for PHP can beat Nginx, Apache or IIS serving static pages hands down.

    VB in ASP.NET is technically not interpreted; it is JIT-compiled to native code.

    HipHop also has a JIT mode where it does profile-guided optimization. So sorry, ASP.NET is not fast; HipHop copes better under high load than ASP.NET does. The reason: the OS does not try to send native binaries to the swap file, while ASP.NET’s JIT-compiled code can end up sent to swap, making CPU usage worse.

    Robert Pogson varnish-cache can selectively display “503 Service Unavailable” to bots. This tells bots to go away and bother my site later.

    I was not really talking about Varnish as a cache but as a pest filter, something Varnish is very good at. Varnish can be set up to let in only as many bots as you can handle at a time; if more arrive, it tells the rest 503 so they come back later. That gives you a universal way to tell these bots to come back later, removing the “oh crud, the search-engine bot killed me” problem.

    Yes, 503 is the message you should display to search-engine bots when your server is overloaded. They obey it and will normally come back for another attempt later.
    http://www.google.com/support/forum/p/Webmasters/thread?tid=07309f6d7a300cdc&hl=en

    You can check with the other search engines and you will find the same thing. The problem is avoided by telling them to go away when too many arrive at once.
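    The scheme oiaohm describes is normally configured inside Varnish itself, but the logic is simple enough to sketch. A minimal illustration in Python; the bot signatures, the slot count, and the admit/release helpers are all invented for this sketch, not taken from Varnish:

```python
# Sketch of the "tell excess bots to come back later" idea: admit at most
# MAX_BOT_SLOTS crawler requests at once; everyone over the limit gets a
# 503 with a Retry-After header, which well-behaved search-engine bots
# honour by retrying later. Signatures and slot count are illustrative.

BOT_SIGNATURES = ("Googlebot", "bingbot", "Baiduspider")  # illustrative list
MAX_BOT_SLOTS = 2          # how many concurrent crawlers the box can afford
active_bots = 0            # crawlers currently being served

def is_bot(user_agent):
    return any(sig in user_agent for sig in BOT_SIGNATURES)

def admit(user_agent):
    """Return (status, headers) for an incoming request."""
    global active_bots
    if is_bot(user_agent):
        if active_bots >= MAX_BOT_SLOTS:
            return 503, {"Retry-After": "3600"}   # come back in an hour
        active_bots += 1
    return 200, {}

def release(user_agent):
    """Call when a crawler request finishes, freeing its slot."""
    global active_bots
    if is_bot(user_agent):
        active_bots -= 1
```

    A real deployment would do this check in Varnish or the web server, where it costs almost nothing; the point is only that a 503 with Retry-After turns a crawler stampede into a queue.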

  27. Phenom says:

    Moving the goal post, uh, Pogs?

    64 cores… What about Sun Enterprise UltraSPARC T5440 with its 256 hardware threads?

    Speaking of scaling, I would place my bets on AIX / Solaris anytime against Linux. Or against Windows.

    Back to the topic: we’re discussing that PHP as a language sucks, and that you can save money on RAM if you use some basic cache.

  28. Phenom told the joke of the day: “An OS with good vertical scalability features is needed, too, and Linux is not one of these.”

    Bloor, 2001: At present, Linux scales well to 6-way SMP on Intel Hardware… expect the next major revision (Linux 2.6) to provide 16-way scaling in about one year.

    Oracle, last week: “Oracle Extends Performance Leadership with x86 World Record on TPC-C Benchmark”

    wait for it … on GNU/Linux. They used 8 Xeon processors with 64 cores.

    This is clearly a tiny niche. They spent $5 million on 5U. It’s barely relevant to the scalability of Linux but, by this measure, Linux scales better than all the rest who compete in the TPC tournament.

  29. Phenom says:

    Pogs, wrong again on all points:

    1. PHP is a horrendous language by design, from a pure software-engineering point of view. The mere fact that it had to introduce an operator like === is proof enough for any seasoned developer that the language sucks. Mind you, I don’t speak of the libraries; I speak only of PHP as a programming language.

    2. It is interpreted, indeed, but still slower than other interpreted languages. Even VB in ASP.NET was faster.

    3. Native code cannot solve the problem with scalability only by itself. An OS with good vertical scalability features is needed, too, and Linux is not one of these.

    4. High loads on read operations are mitigated exactly with caches. Fact, despite what you’d like to think.

  30. PHP is an excellent language for development but it’s interpreted and so is slower than compiled native code. What the world needs is a proper PHP compiler, not just caches. Caches don’t work when a bunch of bots try to download the whole site.

  31. oiaohm says:

    Phenom, note, in case you are thinking of me: I was praising PHP’s syntax for the job at hand. In production I normally compile PHP to C++. As a language for HTML design PHP is nice; as a runtime environment it can be a little heavy.

    Robert Pogson: https://www.varnish-cache.org/ is a nice thing to have set up to universally absorb the bots, avoiding PHP’s usual trip off the deep end. It is also handy when the heavy traffic is intentional.

    Robert Pogson: also make sure you have KSM memory management enabled. That thing is magic with PHP at times, and when a bot problem stacks the memory up it reduces the swap problem. On some sites I have seen KSM reduce required memory by more than half under heavy load.

    Dr Loser there is a reason to use cgroups. Little unknown fact is that kswapd if you don’t have io controls enabled can end up taking all io resources. So leading to the nasty case of no logs.

    Windows server the same event normally blue/red screens of death when it cannot write logs. Then nicely automatically reboot and try to pretend it never restarted.
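    For reference, the knobs discussed above (KSM, global swappiness, and per-cgroup swappiness) are plain files under sysfs and procfs on Linux. A minimal sketch; the paths are the standard kernel interfaces, the “background” cgroup name is hypothetical, and writing for real requires root, so the helper defaults to a dry run:

```python
# Sketch of the tunables mentioned above, written as a dry-run helper.
# Paths are the standard Linux sysfs/procfs locations; writing to them
# requires root, so dry_run=True just reports what would be written.

TUNABLES = [
    # Enable kernel samepage merging (KSM) so identical pages from the
    # PHP workers get deduplicated.
    ("/sys/kernel/mm/ksm/run", "1"),
    # Keep the web server and database in RAM...
    ("/proc/sys/vm/swappiness", "10"),
    # ...but let low-priority work be pushed to swap early; "background"
    # is a hypothetical cgroup name for this sketch (cgroup v1 layout).
    ("/sys/fs/cgroup/memory/background/memory.swappiness", "90"),
]

def apply_tunables(tunables, dry_run=True):
    """Write each value to its path; in dry-run mode, just report."""
    actions = []
    for path, value in tunables:
        actions.append(f"echo {value} > {path}")
        if not dry_run:
            with open(path, "w") as f:
                f.write(value)
    return actions

for line in apply_tunables(TUNABLES):
    print(line)
```

    The split swappiness values capture oiaohm’s point: the important services stay resident while the background cgroup is the first thing pushed out under load.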

  32. Phenom says:

    Hm, I vaguely remember someone here praising PHP for its outstanding qualities.

  33. We are the culprits, trying to run on too little RAM. The CPU was idling and the thing was swapping like mad. The wait-queue got up to 30-something. Nothing was finishing before processes were killed off by the Out of Memory killer. Normal operation was OK but the bots are merciless…

    It’s a reflection of the slowness of PHP combined with the increasing popularity of the site. There are only so many eggs we can fry at once.

  34. Dr Loser says:

    That’s commiserations rather than condolences, btw (apologies for loose thinking), but the commiserations still stand.

  35. Dr Loser says:

    @Robert:

    I wasn’t aware that you had “normal” users, Pog. I feel loved, all of a sudden…

    It’s a bit of a surprise to find out that a Linux server spends so much of its time servicing a Denial of Service attack, to the extent that it can’t even be bothered to write things out to the logs. The one thing that Linux is reliably good at, in my experience, is filling /var/log up with pointless jabber.

    On a more serious note, my condolences, and I hope somebody identifies and punishes the stupid bastards responsible.

  36. I scanned the recent logs for unique IPs and found that most hits were from our normal users. The logs did not even get written after several bots arrived simultaneously. I suppose I could ban everyone but Google… 😉
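    The scan described above amounts to tallying the first field of each access-log line. A minimal sketch in Python; the sample log lines are fabricated (documentation-range IPs), not from this site’s logs:

```python
# Sketch: count hits per client IP in an Apache-style access log,
# where the client IP is the first whitespace-separated field.
from collections import Counter

SAMPLE_LOG = """\
203.0.113.7 - - [05/Oct/2011:06:00:01 -0500] "GET / HTTP/1.1" 200 5120
203.0.113.7 - - [05/Oct/2011:06:00:04 -0500] "GET /feed HTTP/1.1" 200 2048
198.51.100.9 - - [05/Oct/2011:06:00:05 -0500] "GET /?p=42 HTTP/1.1" 200 7168
203.0.113.7 - - [05/Oct/2011:06:00:09 -0500] "GET /?p=43 HTTP/1.1" 200 7040
"""

def hits_per_ip(log_text):
    """Tally requests by the first field (client IP) of each log line."""
    return Counter(line.split()[0]
                   for line in log_text.splitlines() if line.strip())

for ip, hits in hits_per_ip(SAMPLE_LOG).most_common():
    print(ip, hits)
```

    The same tally is a classic shell one-liner over the real log file; the Python form just makes the grouping explicit.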

  37. Hanson says:

    Most likely paid Microsoft shills. They’ve got you in their sights, Pog.
