Office 364 – M$ Charged with False Advertising

The Advertising Standards Authority (ASA) is investigating M$’s claims of 99.9% availability in its advertising.

Typically, one can get uptime like that with GNU/Linux on a single machine. With a cluster, 99.9% should be trivial, but not when you are using that other OS with wait-please-wait and re-re-reboots and malware and … Sometimes even clusters get confused.

see ASA probes Microsoft cloud reliability claims

“It’s only Office 365 if you round up.” I think it’s actually closer to 363… M$ may be “all-in” for the cloud, but at this rate, they will be providing it as a free service if they have to keep giving discounts.

About Robert Pogson

I am a retired teacher in Canada. I taught in the subject areas where I have worked for almost forty years: maths, physics, chemistry and computers. I love hunting, fishing, picking berries and mushrooms, too.
This entry was posted in technology.

34 Responses to Office 364 – M$ Charged with False Advertising

  1. oiaohm says:

    Robert Pogson, basically someone stuffed up the DNS configuration badly. It should have been on the checklist so it does not happen.

    I do expect someone at Microsoft has been shown the door over the whole Office 365 stuff-up.

    Yes, operating more than one DNS master is part of the standards; it comes out of risk assessment. ISO standards do cover the case where you are forced to operate only one: there are a stack of things you are meant to do because of the higher risk you are running. Two or more DNS servers at independent locations are recommended, and extra requirements become mandatory if you have only one, or if both are in the same complex. It all comes out of doing the risk assessments.

    ISO standards for hosting prevent stupid errors. The ISO/IEC 27000 series is a very good thing to have sitting on your shelf.

    ISO/IEC 27035, on how to manage a security incident, is the most critical, and ISO/IEC 27037 covers how to preserve evidence so whoever breached your system might end up in jail.

    I don’t think I have mentioned those to you before, Robert Pogson. They are not short documents.

    The Blogger case is not what you would call a simple case of something just not answering. It was a case of the data storage going nuts because new storage nodes were not replicating properly from the old ones. Murphy is a prick this way at times. The worst is when it turns out the new nodes have a dud batch of hard drives in them. Had that one.

  2. oiaohm wrote, “ns2.msft.net,ns1.msft.net,ns5.msft.net,ns3.msft.net,ns4.msft.net 5 dns servers instead of 1 that they had before the disaster with no monitoring if it was working so no N+1 in case of server failure.”

    Is that true??? It has been standard practice to have at least two DNS servers for a domain for ages. Perhaps they had a load-sharer in front of the DNS servers and thought that was good enough…

    Amazing, if true.
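The standard practice mentioned here, more than one name server per domain, works because a client can fall back to the next server when one does not answer. The following is a minimal Python sketch of that fallback, assuming a hypothetical `query_nameserver(nameserver, hostname)` function that returns an address or raises `OSError`. It is not the API of any real resolver library, and the server names and address are made up (192.0.2.0/24 is a documentation range).

```python
from typing import Callable, Optional, Sequence

# Hypothetical query function: (nameserver, hostname) -> IP address string.
# It raises OSError when that name server does not answer.
QueryFn = Callable[[str, str], str]


def resolve_with_fallback(hostname: str, nameservers: Sequence[str],
                          query_nameserver: QueryFn) -> Optional[str]:
    """Try each name server in order; return the first answer, or None."""
    for ns in nameservers:
        try:
            return query_nameserver(ns, hostname)
        except OSError:
            # This server is down or unreachable: move on to the next one.
            continue
    # Only reached when every listed name server failed.
    return None


if __name__ == "__main__":
    down = {"ns1.example.net"}  # hypothetical failed server

    def fake_query(ns: str, host: str) -> str:
        if ns in down:
            raise OSError(f"{ns} not responding")
        return "192.0.2.10"

    print(resolve_with_fallback("office.example.com",
                                ["ns1.example.net", "ns2.example.net"], fake_query))
    print(resolve_with_fallback("office.example.com",
                                ["ns1.example.net"], fake_query))
```

With two servers listed, the failure of `ns1` is invisible to the caller; with only one listed, the same failure means no answer at all. A real resolver would also apply timeouts and retries, which this sketch leaves out.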

  3. Thanks for that. Bad things do happen. With the huge amount of data involved, it does take time to fix things. I was not ignoring downtime on Blogger. I have never seen any, and my search found good uptime.

    So they had one day of downtime in how many years? Ten years. Wikipedia used to have a “problems” section for Blogger but dropped it in the interests of neutrality… Apparently the problems are not worth mentioning.

  4. The choice of platform for this blog was not mine. My son provided a server. I like WordPress.

  5. Provide a link. I have never heard of anyone doing that. With ECC on RAM and redundancy codes on storage it should not be necessary.

    “As is common in server-class deployments, the disks were powered on, spinning, and generally in service for essentially all of their recorded life. They were deployed in rack-mounted servers and housed in professionally managed datacenter facilities.”

    Google never restarts their hard drives. Why should they restart their servers? I get no hits for “Google reboots servers”.

  6. RealIT wrote, “Please explain how Linux would move my session state to another webserver after it failed and I can magically resume everything I was doing on another web server.”

    The IP address of the old server is taken over by the new server, which has been syncing with the old server’s data on every write. If the session in question is in an interpreted language like PHP, the session information will be synced as well. Otherwise, one might need to log in again. A good way to be gentle to end-users running web applications etc. is to run the session on a reliable terminal server and do failover on the cluster of servers for the file system. Then the user’s session may not be interrupted at all. If the server on which the session resides fails, you will have to log in again, but at least your data will not be lost. With VNC, one can even arrange that if a thin client fails, one can resume the session without interruption on another thin client.

    GNU/Linux is so reliable I have never been tempted to use high-availability terminal servers. I just use multiple servers, and when one fails, a user can log in and connect to another server. This can be done easily just by supplying multiple DHCP servers, but it is better to use DHCP failover. see http://republico.estv.ipv.pt/~nmct/ltsp/ha/ltsp-ha-howto-en-gpl.html

    So, the session can be saved in some cases but not in all cases.

    SANs often use RAID 1, and they can use RAID 1 over TCP to be more reliable.
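The takeover described in this comment, where a standby mirrors every write and then assumes the old server’s IP address, can be modelled in a few lines. This is a toy, in-memory Python sketch under stated assumptions. `Node`, `FailoverPair` and the addresses are hypothetical names. A real deployment would use something like DRBD for the mirroring and a cluster manager such as Heartbeat for the IP takeover, and neither tool is modelled here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """One server in a hypothetical two-node failover pair."""
    name: str
    alive: bool = True
    data: Dict[str, str] = field(default_factory=dict)


class FailoverPair:
    """Toy model: each write is applied to every live node before it is
    acknowledged (synchronous mirroring), and if the node holding the
    service IP dies, a surviving node takes the IP over."""

    def __init__(self, primary: Node, standby: Node, service_ip: str) -> None:
        self.nodes: List[Node] = [primary, standby]
        self.service_ip = service_ip
        self.owner = primary  # node currently answering on service_ip

    def _ensure_owner_alive(self) -> None:
        # Stand-in for a heartbeat check: move the IP if its owner is dead.
        if self.owner.alive:
            return
        for node in self.nodes:
            if node.alive:
                self.owner = node
                return
        raise RuntimeError(f"no surviving node can take over {self.service_ip}")

    def write(self, key: str, value: str) -> None:
        self._ensure_owner_alive()
        for node in self.nodes:
            if node.alive:
                node.data[key] = value

    def read(self, key: str) -> Optional[str]:
        self._ensure_owner_alive()
        return self.owner.data.get(key)


if __name__ == "__main__":
    a, b = Node("server-a"), Node("server-b")
    pair = FailoverPair(a, b, "192.0.2.50")
    pair.write("session:42", "3 items in cart")
    a.alive = False  # the primary fails
    print(pair.read("session:42"), "| served by", pair.owner.name)
```

Because the standby already holds every acknowledged write, the read after the failure is answered by `server-b` with the same data. What the sketch cannot show is the gap while a real cluster manager notices the failure and moves the address.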

  7. oiaohm says:

    RealIT, I meant quality of service features as defined by the ISO standards for hosting, not the network-traffic meaning. QoS in the network-traffic sense is completely the wrong meaning here. Yes, there are two meanings of quality of service in IT. The ISO definition also covers quality of service as per the contractual requirements of the hosting supply.

    The Office365 service was unavailable because DNS resolution from the internet side was gone. So there was no outage at the server farm, other than the DNS server there being dead, but everyone trying to use the service had an outage because the web address would no longer resolve: Microsoft’s DNS server for Office365 had failed. Yes, they had assigned only one.

    Of course, they have now got it sorted.

    ns2.msft.net, ns1.msft.net, ns5.msft.net, ns3.msft.net, ns4.msft.net: five DNS servers, instead of the one they had before the disaster with no monitoring of whether it was working, and so no N+1 in case of server failure.

    This was incompetence. You should always try to avoid depending on just one location for global DNS provision. And if you have just one location, it should have an operational N+1.

    If bringing the reserve global DNS backup server online means it has “SOA and corrupts even more DNS entries”, you are incompetent, because your backup was not up to date as it is meant to be under the ISO standards for hosting. Each location’s global SOA servers should always be serving exactly the same data, since the records are basically static content unless you are running on dynamic IPs, which Microsoft is not.

    RealIT, some cluster file systems go across data centers, with “Geographic (asynchronous) Replication between” cluster groups. Yes, a cluster can extend across many data centers in different geographical locations. A lot of multi-data-center clouds depend on cluster tech for replication between data centers.

    Of the so-called “hardware load balancers” like Cisco ACE modules, internally some are Linux Virtual Server, yes, running on a much faster chip; some are VxWorks with custom Cisco software for routing; and some are some other group’s custom software for the job. None are truly hardware as such: you can convert them and update them. Basically, I see Windows-trained people call them hardware load balancers because they don’t know how to make them reveal what they really are, so they are just black boxes to them. A lot of devices people call hardware load balancers are just an OS running on good network-performing hardware, like a Tile or other massively multi-core chip, for traffic routing, virus scanning or other creative uses.

    The definition of “Linux blade server” is brain-melting: those three words in that order have many definitions. Depending on who you are talking to, it means a single blade, a blade rack full of blades, a building full of blade racks operating as a Linux cluster of some form or, worse, a globally spread Linux cluster using blade parts at all locations.

    Yes, a single blade, a single blade rack, a single building and a globally distributed blade-rack-based cluster are all singular objects. The problem is that the motherboard count is way different: 1 vs 14 vs some insane number vs some insane number in multiple locations. Yet they all manage to be called a Linux blade server. It’s one of the super-vague points of IT English. Yes, they are all Linux servers built on a blade design, so each is a form of Linux blade server. That is why you see “Linux server blade” written, or “a Linux server running on a blade rack with x blades”, but never “Linux blade server”: you would confuse people.

    Word sequences with multiple meanings are evil, particularly when they differ only in one word’s position from a single-meaning sequence. “Linux blade server” has many meanings; “Linux server blade” has one. In fact, you can drop the word Linux: “blade server” is vague, “server blade” is not. Adding Linux just widens how far the vagueness reaches.

    Always make sure you know the meaning the person you are talking to is using.

    “Please explain how Linux would move my session state to another webserver after it failed and I can magically resume everything I was doing on another web server.”
    http://tomcat.apache.org/tomcat-6.0-doc/cluster-howto.html covers cluster operation of web servers. It is very similar to the Samba method, but it is normally tied to the scripting solution, like Tomcat and so on.

    The issue here is that this is not Linux-specific, so why in hell are you having to ask me if you are experienced in IT?

    It works mostly because your session state is not stored on one web server, so the failure of one web server has minimal impact. Nothing says your session state has to be stored on one machine.

    Normally there are cluster groups under the load balancer, so a single machine failure does not disrupt things. That makes repair work simpler.

    So yes, there are many layers of clustering, each at a different level in the stack, syncing different data. That makes a very strong cloud. Of course, sending state from data center to data center is not possible, so you normally design it so that it acts like the link failed but the user can still log in.

    A cloud is normally a stack of cluster technologies glued together in the right way to get the job done.
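The answer to RealIT’s question, that nothing says session state has to live on one machine, can be illustrated with a toy model. Web servers behind a round-robin balancer keep no session state themselves, and sessions live in a store replicated across nodes. Everything here (`SessionStore`, `WebCluster`, the server names) is hypothetical and in-memory. Real setups would use something like Tomcat session replication or a shared database, which this does not model, and it also ignores resynchronising a replica that comes back.

```python
import copy
import itertools
from typing import Dict, List, Set, Tuple


class SessionStore:
    """Hypothetical session store copied to every live replica, so losing one
    replica (or one web server) does not lose the session."""

    def __init__(self, replicas: int) -> None:
        self.replicas: List[Dict[str, dict]] = [{} for _ in range(replicas)]
        self.alive: List[bool] = [True] * replicas

    def save(self, sid: str, state: dict) -> None:
        for replica, up in zip(self.replicas, self.alive):
            if up:
                replica[sid] = copy.deepcopy(state)  # independent copy per replica

    def load(self, sid: str) -> dict:
        for replica, up in zip(self.replicas, self.alive):
            if up and sid in replica:
                return copy.deepcopy(replica[sid])
        return {}  # session lost: the user would have to log in again


class WebCluster:
    """Round-robin over live web servers; the servers keep no session state."""

    def __init__(self, servers: List[str], store: SessionStore) -> None:
        self.servers = servers
        self.down: Set[str] = set()
        self.store = store
        self._rr = itertools.cycle(servers)

    def handle(self, sid: str, item: str) -> Tuple[str, List[str]]:
        """Add item to the session's cart on whichever live server is next."""
        for _ in range(len(self.servers)):
            server = next(self._rr)
            if server not in self.down:
                state = self.store.load(sid)
                state.setdefault("cart", []).append(item)
                self.store.save(sid, state)
                return server, state["cart"]
        raise RuntimeError("all web servers are down")


if __name__ == "__main__":
    cluster = WebCluster(["web1", "web2", "web3"], SessionStore(replicas=2))
    print(cluster.handle("sid-1", "book"))  # ('web1', ['book'])
    cluster.down.add("web2")                # web2 fails...
    cluster.store.alive[0] = False          # ...and so does one store replica
    print(cluster.handle("sid-1", "pen"))   # ('web3', ['book', 'pen'])
```

The second request lands on a different web server after two failures, yet the cart is intact, because the state was never tied to `web1` in the first place.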

  8. RealIT says:

    “Microsoft case is incompetence because quality of service features were not in place to detect a dns server and get a replacement server online.”
    Huh? How does QoS “detect” a DNS server and initiate a failover? You do know what QoS does, right?

    “Yes the complete time MS Office online office was online just no one could resolved to it because the primary dns source for it was sending out the wrong address. Kind a simple test run nslookup by automated script like 1 an hour to check if the dns server is resolving the correct address. Fail to resolve correctly send SOS and try to bring reserve server on line”

    So which is it? Was the Office365 service not available, or was DNS an issue? Cool, so Office365 in fact did not suffer an outage. So your script will now bring online an out-of-date DNS server and serve even more out-of-date records. Because, you know, DNS never dynamically updates. We just manually sync DNS around the world every minute. How would nslookup even know an ALIAS is out of date? Are you going to maintain a list of ever-changing IP addresses?

    “In fact the dns server was not responding at all. Yep so no one was being directed to the virtual ip.”
    So you imply that there is only one DNS server in the world serving that ALIAS? Just one. And your PC just goes straight to that DNS server. Using magic? Because that’s how DNS works? DNS does not work like a HOSTS file.

    “Truly RealIT would you not be kicking yourself for such a stupid mistake. This is pure poor quality work.”

    No, I’d kick you for bringing online an out-of-date DNS server that now has SOA and corrupts even more DNS entries.

    “Lot of hardware load balancer are in fact this http://www.linuxvirtualserver.org/” That’s not a hardware load balancer. That’s a software load balancer. And no, there are not a lot of them, that I can assure you. In fact, I’d hedge a bet that most load balancers are Cisco ACE modules or F5. But I don’t need to tell you that, right?

    Samba cluster magic states? File shares are not exactly good examples of session states. Either your file opens or it does not. Once it is open, the PC doesn’t care whether the share is available until you hit save. Nice try, though. Here’s one you can try. Please explain how Linux would move my session state to another webserver after it failed and I can magically resume everything I was doing on another web server.

    “Cloud computing is still normally cluster backed to give tolerance to failures.”
    No, cloud computing is hosting a service across multiple data centers around the world and leveraging DNS to direct you to an available data center. Not a 3-way cluster sitting in a rack and calling it a cloud.

    “Problem is also that a Linux blade server gets called one machine. So you need to know what there define of one machine is.”
    This sentence melted my brain. Normally we refer to anything on its own as “a” something. Don’t know why you think Linux would require a different definition of a singular object.

    Oh, and Pogson… twit? Don’t be so aggressive. I know you couldn’t answer any of my questions, but that’s OK. You can RAID 1 all you like, but most people use SAN storage, thereby negating the point of failure that is server-based disk sharing. Just saying.

  9. Phenom says:

    Pogson, let me tell you a secret. At Google, servers, yes, Linux servers, are restarted automatically every couple of hours on a rotary schedule. Every two hours. You know why? To improve stability, to avoid memory and resource leaks. And Google are not the only ones doing that.

  10. Ivan says:

    Yeah yeah, we know you like Google and have weird opinions about it, Bob. But if it is so great, why aren’t you using it for this blog?

  11. D-G says:

    Yes, I apologize for calling you stupid. I will from now on call you a devious weasel. It’s so funny that you always accuse others of spreading FUD. But you yourself also like to ignore FACTS that aren’t compatible with your view of the world. Like Blogger having been down for more than 24 hours in May 2011.

    Get it in your head, Pog. The two links you posted date from December 2010 and January 2011. The Blogger outage happened in May 2011. Every IT news site reported it, and non-IT news sites, too. Here’s just one example:

    http://www.zdnet.com/blog/bott/googles-blogger-outage-makes-the-case-against-a-cloud-only-strategy/3300

    Please get your facts straight. Comparing Windows XP to up-to-date Linux distributions, citing uptime measurements from 2010 while ignoring 2011 outages, and so on. You really like comparing apples and oranges.

  12. D-G wrote, “The topic was blogger, and the link above has NOTHING to do with Blogger.”
    Title of said link: “Google Apps: No more downtime like Blogger.com?”
    Google owns Blogger and Google Apps. Google knows how to do uptime. Twit. If you weren’t so busy calling people stupid, you might have a more thoughtful reply or, better, none at all.

    I cannot read your mind, twit. If you want me to comment on a particular article or incident, give a link. Otherwise shut up.

  13. D-G says:

    Once again, you are too dumb for words, Pog.

    “almost 100% uptime…” (http://popherald.com/google-apps-uptime-blogger-gmail/4406)

    No, Pog. The topic was blogger, and the link above has NOTHING to do with Blogger. It’s about Google Apps. But as this is your forte, you’ve taken a meager half-sentence (“almost 100% uptime”) out of context and used it as proof. You also conveniently forgot that the link above dates from January 2011. But Blogger’s more-than-24-hours outage was in May 2011. God, you’re stupid.

    “Gmail: 99.984% uptime in 2010.”

    No, Pog. We weren’t talking about Gmail either.

    “‘The winner was without a doubt Google’s Blogger. The Blogger blogs didn’t have any downtime whatsoever during the two months we monitored them, followed by WordPress.com which had very little downtime.’

    see Royal Pingdom

    What are you blathering about?”

    He’s “blathering about” the fact that Blogger was down for more than 24 hours in May 2011. Didn’t you write something about Office 365 being closer to Office 364 or Office 363? Well, Blogger can join the club.

    Honestly, have you been living under a rock? How someone like you who apparently digests every bit of obscure IT news could’ve missed the Blogger catastrophe is quite beyond me.

    “I use Gmail and cannot remember the last time it was down.”

    You can’t? I can. On Tuesday I couldn’t reach it for nearly one hour.

    That aside, Pog, you’re using Gmail? Hasn’t somebody told you that Gmail isn’t free?

  14. oiaohm says:

    RealIT, you have not read how MS screwed it up, have you?

    Microsoft’s case is incompetence because quality of service features were not in place to detect a failed DNS server and get a replacement server online. Yes, the whole time, MS’s online Office was online; it is just that no one could resolve to it, because the primary DNS source for it was sending out the wrong address. It is a simple test: run nslookup from an automated script, say once an hour, to check whether the DNS server is resolving the correct address. If it fails to resolve correctly, send an SOS and try to bring the reserve server online. The prevention required is not rocket science. In fact, the DNS server was not responding at all. Yep, so no one was being directed to the virtual IP. You can set up as many virtual IPs as you like, but in most cases people still need DNS to find their way to them.

    Truly, RealIT, would you not be kicking yourself for such a stupid mistake? This is pure poor-quality work.

    RealIT, what happens when one node fails depends on the cluster design. A lot of cluster filesystems operate very well with nodes missing, so web clusters are designed with an operational fallback. Yes, each node is able to maintain isolated operation, just read-only, until the cluster file system is reconnected: master-slave clusters.

    Because the internet is not dependable, user states get mashed up anyhow as part of day-to-day web server operations. So as long as everything is working when the user logs back in, there is no problem.

    A lot of hardware load balancers are in fact this: http://www.linuxvirtualserver.org/ Really, RealIT, if Microsoft’s DNS server had been behind something like this as a cluster, everything most likely would have been OK.

    An example of magic state resuming on a different machine in a cluster is Samba in clustered mode, using CTDB alongside a cluster filesystem.

    Over the internet it is not required. Most users will just start a new session and blame the connection, not you.

    Cloud computing is still normally cluster backed to give tolerance to failures.

    The problem is also that a Linux blade server gets called one machine, so you need to know what their definition of one machine is.
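The automated check proposed in this comment (resolve the service name on a schedule and raise an alarm if the answer is missing or wrong) might look like the Python sketch below. It uses the standard `socket.getaddrinfo` call. `dns_is_healthy`, `system_resolve` and the expected-address set are assumptions for illustration, and the scheduling (cron, say, once an hour) and the alert itself are left out.

```python
import socket
from typing import Callable, Iterable, Set


def system_resolve(hostname: str) -> Set[str]:
    """Resolve hostname with the OS resolver and return its IP addresses."""
    infos = socket.getaddrinfo(hostname, None)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return {info[4][0] for info in infos}


def dns_is_healthy(hostname: str, expected: Iterable[str],
                   resolve: Callable[[str], Set[str]] = system_resolve) -> bool:
    """True if the name resolves and at least one answer is an expected address."""
    try:
        answers = resolve(hostname)
    except socket.gaierror:
        return False  # the name did not resolve at all
    return bool(answers & set(expected))


if __name__ == "__main__":
    # "localhost" is used only so the example runs anywhere; a real check would
    # use the public service name and the addresses it is supposed to have.
    ok = dns_is_healthy("localhost", {"127.0.0.1", "::1"})
    print("DNS OK" if ok else "DNS FAILED: alert the on-call admin")
```

As RealIT objects in his reply, this only works if the list of expected addresses is kept current. `getaddrinfo` also goes through the local resolver and any caches it uses, and it takes no timeout argument, so this is a check of what clients are likely to see rather than a probe of one particular authoritative server.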

  15. Twit. We can do RAID 1 over the network, so two or more servers have to fail to take down storage. Failover takes seconds to accomplish.

    see DRBD and Heartbeat

    Services that can be made highly available this way include storage, HTTP, MySQL, and DNS. see Linux-HA

    “If you deploy DRBD in active/passive (failover) mode, expect Heartbeat, RHCS, or the other cluster manager of your choice to take around 15-20 seconds for failover. Any subsequent recovery procedures by your application may add to that. It’s highly unlikely that you’ll go beyond 1 minute failover time when your setup is properly done”

    see When not to use DRBD

    Some setups fail over in as little as 4 s. A dozen such failures in a year still gives better than 99.999% uptime.
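The 99.999% claim is easy to check. A year has 31,536,000 seconds, so five nines allows roughly 315 seconds of downtime. Here is a small Python sketch that assumes failover gaps are the only downtime. It uses the 4 s figure above and the 20 s and 1 minute figures from the DRBD quote, and the function name is made up for illustration.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 s (non-leap year)


def availability(failovers_per_year: int, seconds_per_failover: float) -> float:
    """Fraction of the year the service is up, counting only failover gaps."""
    downtime = failovers_per_year * seconds_per_failover
    return 1 - downtime / SECONDS_PER_YEAR


if __name__ == "__main__":
    for secs in (4, 20, 60):
        a = availability(12, secs)
        print(f"12 failovers x {secs:>2} s -> {a * 100:.5f}% uptime")
```

A dozen 4 s failovers come to about 99.99985%, and a dozen 20 s failovers to about 99.99924%, so both stay above five nines. At the 1 minute ceiling from the quote, the result is about 99.99772%, just under 99.999%.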

  16. RealIT says:

    Bah, and I can’t end a sentence very well either. Must be because Office 2010 is installed and only has 40 language packs…

  17. Hmmm…
    http://popherald.com/google-apps-uptime-blogger-gmail/4406

    “almost 100% uptime…”

    Gmail: 99.984% uptime in 2010.

    and

    “The winner was without a doubt Google’s Blogger. The Blogger blogs didn’t have any downtime whatsoever during the two months we monitored them, followed by WordPress.com which had very little downtime.”

    see Royal Pingdom

    What are you blathering about?

    I use Gmail and cannot remember the last time it was down.

  18. RealIT says:

    What has clustering got to do with cloud computing? Two different architectures, Poggy. Anyway, your cluster theory is flawed. Are you running an Active/Active, Active/Passive, or Majority Node Set deployment? Regardless, you will still incur downtime when a cluster fails over. You think user connections, session states and active transactions just auto-magically resume on another node? Does Linux now ship with quantum states? Even if you deployed a super spectacular Linux cluster with more redundancy than Linux itself, monitoring would still pick up that a cluster failed over, and you would incur downtime for the duration that that server is offline.

    Why would you switch DNS? Surely in your design you would assign a CNAME associated with a VIP and let a hardware load balancer handle node availability? Just more downtime and more manual labour.

    And then RAID 1? Sure, mirroring is fine, but the server is off. Are you going to run to the server room, yank out the disks and place them in the other cluster node? No SAN? No FC or even iSCSI?

    Your description looks like a mish-mash of terms you looked up on the internet and hoped would sound clever in a real-world IT scenario. You don’t design very well-thought-out solutions, do you?
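RealIT’s alternative is a CNAME pointing at a virtual IP, with a load balancer deciding which node answers. That keeps the DNS record fixed while nodes come and go. The toy Python sketch below shows only the round-robin-with-health-checks idea. It is not how Cisco ACE, F5 or Linux Virtual Server are implemented, and the class name, addresses and health-check function are hypothetical.

```python
from typing import Callable, Dict, List


class VirtualIP:
    """Toy load balancer behind one virtual IP (the address a CNAME would lead
    to). Clients only ever see the VIP; the balancer picks a healthy backend."""

    def __init__(self, vip: str, backends: List[str],
                 health_check: Callable[[str], bool]) -> None:
        self.vip = vip
        self.backends = backends
        self.health_check = health_check
        self._next = 0

    def pick_backend(self) -> str:
        # Round-robin, skipping any backend whose health check fails.
        for _ in range(len(self.backends)):
            backend = self.backends[self._next]
            self._next = (self._next + 1) % len(self.backends)
            if self.health_check(backend):
                return backend
        raise RuntimeError(f"no healthy backend behind {self.vip}")


if __name__ == "__main__":
    healthy: Dict[str, bool] = {"10.0.0.11": True, "10.0.0.12": True}
    lb = VirtualIP("192.0.2.80", list(healthy), lambda b: healthy[b])
    print([lb.pick_backend() for _ in range(4)])  # alternates between both nodes
    healthy["10.0.0.12"] = False                  # one node fails its health check
    print([lb.pick_backend() for _ in range(4)])  # only 10.0.0.11 from now on
```

The design point is that a node failure changes only the balancer’s choice, not any DNS record. The VIP and the balancer are then the pieces that need their own redundancy.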

  19. NT JERKFACE says:

    Blogger went down for over 24 hours this year, and Blogger runs on Linux.

    Why does Google continue to use Linux?

    It must be the OS, there couldn’t be another explanation.

  20. Some people use their computers 24×7 like M$’s cloud. Changing the registry settings without a restart accomplishes little.

  21. I am not going to give M$ a cent if I can help it. “8” is available to the world as a beta test now. If M$ is as good as people say, I should rush out and download their latest and greatest, not that old “7” stuff.

  22. Binkly says:

    Mr. Pogson, don’t you think that you should just try Windows 7 first so that you can at least say you’re trashing it as a user?

    It’s kind of hard to take what you’ve said seriously when you haven’t even used Windows 7.

    Just a thought.

  23. Contrarian says:

    “These changes require that you restart your computer.

    The security update makes changes to registry entries that are read only when you start your computer.”

    Read it for what it says, #pogson. The Registry is just a storage place for data that an application (or OS module) uses to control its configuration. So if a process is running and an update changes a registry setting that will not be re-read until a restart, then you have to restart. Else, you do not. A module is not required to read registry information only once, and most do not work that way. If you have a change to one that does, then it is obvious that you must restart.

    Merely changing registry settings does not require a re-start.

    What is the big deal anyway? A few minutes of maintenance every so often is not such a hardship. I have my machines set to do any autoupdate in the middle of the night when I am not using the computers.
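The distinction Contrarian draws, between a setting read once at startup and one re-read on each use, can be shown with a toy Python sketch. It does not touch the Windows registry. The dict `SETTINGS` is a hypothetical stand-in for any configuration store, and the class names are made up.

```python
from typing import Dict

# Hypothetical stand-in for a settings store (registry, config file, ...):
# a plain dict that an "update" can modify at any time.
SETTINGS: Dict[str, str] = {"log_level": "info"}


class ReadOnceService:
    """Copies its setting at startup; later changes need a restart."""

    def __init__(self) -> None:
        self.log_level = SETTINGS["log_level"]

    def current_log_level(self) -> str:
        return self.log_level


class ReReadingService:
    """Looks the setting up on every use; changes apply immediately."""

    def current_log_level(self) -> str:
        return SETTINGS["log_level"]


if __name__ == "__main__":
    once, live = ReadOnceService(), ReReadingService()
    SETTINGS["log_level"] = "debug"   # an update changes the setting
    print(once.current_log_level())   # still "info" until restarted
    print(live.current_log_level())   # "debug" right away
    once = ReadOnceService()          # the "restart"
    print(once.current_log_level())   # now "debug"
```

Whether a restart is needed therefore depends on how the component reads its settings, not on where those settings are stored.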

  24. The second link does not mention XP and is more recent in M$’s database. I have no cross-reference of M$’s stuff and don’t want to get my hands dirty visiting their site. The page I viewed even said stuff not relevant to my OS (Debian GNU/Linux) was suppressed… I am not going to run “7” just to visit their site.

  25. oldman says:

    “Windows XP will automatically restart the PC, to apply all the updates.”

    But we are now running Windows 7, Pog, and I can tell you for a fact that not all updates require restarts.

    If you keep talking about Windows XP, you just, IMHO, keep making what you say less and less relevant.

  26. M$:
    “After installing the downloaded updates, Windows XP will automatically restart the PC, to apply all the updates.
    CAUSE
    This is not an issue and is by design. However, this has been addressed in Service Pack 2, where Windows XP SP2 installed PCs will install the downloaded updates at the time of restarting the PC. In pre SP2 systems, this feature is not available.”

    M$:
    “Why you may be prompted to restart your computer
    After you install a security update, you may be prompted to restart your computer if one of the following conditions is true:
    The security update updates a DLL that is loaded in one or more processes that are required by Windows. The security update cannot be completed while the DLL is loaded. Therefore, the security update must stop the process that causes the DLL to be loaded. Stopping the process will unload the DLL that is required to complete the update. However, the process in which the DLL is loaded cannot be stopped while Windows is running. For example, the security update that is described in security bulletin MS04-011 updates many DLLs that are loaded in core operating system processes that cannot be stopped without shutting down Windows.
    The security update updates an .exe file that is currently running as a process that is required by Windows. The update cannot be completed while this process is running. However, you cannot force this process to stop unless you shut down Windows. For example, Csrss.exe is a required process in Windows.
    The security update updates a device driver that is currently being used and that is required by Windows. The update cannot be completed while this device driver is being used. However, you cannot unload this device driver unless you shut down Windows. For example, Disk.sys is a device driver that is required by Windows.
    The security update makes changes to the registry. These changes require that you restart your computer.
    The security update makes changes to registry entries that are read only when you start your computer.”

  27. Contrarian says:

    “Because GNU/Linux does not use that damned registry there’s no need to re-re-reboot for most updates.”

    Where did you learn that, #pogson? That is not the reason for rebooting if/when necessary.

  28. How come M$ is getting less than 99.9% uptime? Quit dodging the real question. You should be able to get 99.9% even with that other OS by clustering. For example, 1% downtime per machine with two in a cluster could mean up to 99.99% uptime for the cluster, provided the machines fail independently. If M$ is “all-in”, why can’t they get clustering right? Perhaps they are not clustering…
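The clustering arithmetic in this comment rests on one assumption worth stating: the machines fail independently. Take $n$ machines, each unavailable a fraction $u$ of the time, with the service up as long as at least one machine is up. Under that assumption, a worked version is:

```latex
\begin{aligned}
U_{\text{cluster}} &= u^{n}, \qquad A_{\text{cluster}} = 1 - u^{n},\\
u = 0.01,\ n = 2:\quad U_{\text{cluster}} &= 0.01^{2} = 10^{-4},
  \qquad A_{\text{cluster}} = 1 - 10^{-4} = 0.9999 = 99.99\%.
\end{aligned}
```

Correlated failures break the independence assumption: shared power, a shared network or DNS, or a bad update rolled out to both nodes. Failover time is not counted either. That is why the result is an upper bound ("could mean up to") rather than a guarantee.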

  29. kolter.online says:

    *their

  30. kolter.online says:

    Only “excellent”?

    I would expect WP7 to have the absolute best – nay PERFECT integration with office, considering they’re both made by the same company, and integrating/lock-in is there forté.

  31. NT JERKFACE says:

    “Because GNU/Linux does not use that damned registry there’s no need to re-re-reboot for most updates.”

    Writing to the registry doesn’t require a reboot, and doing it the Unix way of writing to a dozen poorly named config files wouldn’t improve uptime.

    Oh, BTW, I bought a WP7 phone the other day; you should try one. The Office integration is excellent.

  32. D-G says:

    “Because GNU/Linux does not use that damned registry there’s no need to re-re-reboot for most updates.”

    Oh, crap! I just installed six updates into Windows 7 and I didn’t have to reboot the machine. Is this a conspiracy?

    “So, what is M$ doing wrong? Using that other OS.”

    Sure thing, Pog. That’s why commercial web hosting providers do exist that offer Windows hosting with 99.9% SLAs. Because it’s not doable. Sometimes you should think before releasing your stream of unconsciousness to the web.

  33. The ASA can block the ads amongst its members.

    Of course, with GNU/Linux, such a service would likely be provided on a cluster, in which case you get closer to 99.9999% uptime. What is M$ doing to get such low uptime? They are using that other OS…

    Typically, when I install a new system with GNU/Linux, there could be some infant-mortality issues that are discovered in testing. Then I expect the system to run trouble-free for years. Suppose I reboot occasionally for a new kernel, say 5 times a year. That takes a minute or so each time. Because GNU/Linux does not use that damned registry there’s no need to re-re-reboot for most updates. A 1% per annum failure rate for hard drives is likely the limiting factor, and I can likely overcome that with RAID 1. So, 99.9% uptime is easily achieved with GNU/Linux. With a cluster of two, the probability of both units being out at the same time is 0.1% of 0.1% = 0.0001%, assuming they fail independently, so I get 99.9999% uptime. If I put three in a cluster, it is better than wearing a belt and suspenders. The network is more likely to fail. I can have redundant networks too. I can even switch DNS in a few minutes. So, what is M$ doing wrong? Using that other OS.
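The arithmetic in this comment can be reproduced in a few lines. This Python sketch assumes, as the comment does, that planned kernel reboots are the only downtime for a single machine and that the two cluster nodes fail independently. It ignores failover time and disk failures, and the function names are made up for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def single_node_availability(reboots_per_year: int, minutes_per_reboot: float) -> float:
    """Availability if planned kernel reboots are the only downtime."""
    return 1 - reboots_per_year * minutes_per_reboot / MINUTES_PER_YEAR


def cluster_availability(node_availability: float, nodes: int) -> float:
    """Availability of n nodes when any one suffices, assuming independence."""
    return 1 - (1 - node_availability) ** nodes


if __name__ == "__main__":
    a1 = single_node_availability(reboots_per_year=5, minutes_per_reboot=1)
    print(f"one node, 5 one-minute reboots: {a1:.6%}")
    print(f"two nodes at 99.9% each:        {cluster_availability(0.999, 2):.6%}")
```

With five one-minute reboots, one node comes out at about 99.999%, roughly five minutes of downtime in 525,600 minutes. Two independent nodes at the comment’s 99.9% each give 99.9999%. In a real cluster, planned reboots could also be staggered so that they cause no service downtime at all.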

  34. Contrarian says:

    “The Advertising Standards Authority (ASA) is the self-regulatory organisation (SRO) of the advertising industry in the United Kingdom. The ASA is a non-statutory organisation and so cannot interpret or enforce legislation.”

    It cannot “charge” anyone with anything, #pogson.

    “Typically, one can get uptime like that with GNU/Linux on a single machine…”

    Only if you trust in dumb luck, #pogson, and what happens if luck runs out? The Microsoft offer comes with a guarantee that they will compensate customers for any outages beyond the guarantee. Linux comes with nothing at all.
