GNU/Linux Gives Real Savings In Education

I chuckle when I recall that M$ claimed GNU/Linux cost more, and that mud-slingers commented here that GNU/Linux would be too hard for teachers/students to use or that GNU/Linux lacked necessary applications. “Switching to Ubuntu has let the school tick many items off its list. It allows it to stay within its IT budget. They are no longer forced to buy licences for proprietary office suites or operating systems, and no longer have to study price lists for other proprietary solutions. The Linux PCs are perfectly compatible with the two common proprietary computer systems. The school PCs are very easy to maintain, all applications are up to date and all PCs run the same versions of software solutions. Moreover the flexibility of the free software licences allows the school to install PCs whenever they want, for example when they receive a hardware donation from the local administration.” Just ask schools that use GNU/Linux joyfully, freed from the EULA, malware, re-re-reboots, top-down IT people and the whittling down of budgets for IT and other things just to afford more IT…

I worked in schools for many years and was able to gain much more IT for fewer dollars and a lot less effort than using that other OS and some people’s favourite applications. GNU/Linux works, LibreOffice works, GIMP works, Audacity works, etc. FLOSS works for education.

Consider a school librarian wanting a cluster of PCs for customers. Typically, the concept would be raised in a staff meeting or annual plan and have to percolate upwards through the chain of command where no budget for the request exists. That means either fundraising or “next year” in the budget. What is lost by teachers and students traipsing around the community raising a few $thousand for PCs? What is lost by shifting limited funds from salaries or other supplies to PCs? It’s all a disruption from the desired goal of preparing students for the future.

Send a memo home asking for donated PCs, or acquire castoff PCs from businesses on the Wintel treadmill and skip the EULA by re-imaging with Debian GNU/Linux, and the problem is solved in a few days from concept to execution. A few $dollars scraped from petty cash for cabling/power/switches is the entire budget. I’ve often been in schools where the necessary bits were just lying around unused because some cluster was shut down or equipment died. I travelled for years in the North carrying a thousand feet of CAT-5 around because the government ordered a lab not up to specs be shut down. The cables were just dumped out back. I went out with a knife to cut the Gordian Knot. I bought my own crimping tool and RJ-45 plugs. A side-effect is that I was able to teach students how to do this at no extra cost to my employers. Using FLOSS does have synergies with hardware. Any school can get better performance out of 8-year-old PCs used as thin clients with a few good machines as terminal servers. No CALs. No server licences. No special hardware. GNU/Linux is that flexible.

See Swiss school invests open source savings in education.

About Robert Pogson

I am a retired teacher in Canada. I taught in the subject areas where I have worked for almost forty years: maths, physics, chemistry and computers. I love hunting, fishing, picking berries and mushrooms, too.
This entry was posted in technology. Bookmark the permalink.

99 Responses to GNU/Linux Gives Real Savings In Education

  1. DrLoser says:

    ISO/IEC 10646 was prepared by Joint Technical Committee ISO/IEC JTC 1, Information technology, Subcommittee SC 2, Coded character sets.
    This third edition cancels and replaces the second edition (ISO/IEC 10646:2011), which has been technically revised.

    That takes care of your stupid argument that an earlier version might, or might not, encourage the use of five- or six-byte UTF-8, oiaohm.

    As TEG says, even if that was the case in 2000, nobody — and I mean NOBODY — ever did it.

    All you’re doing is to parrot somebody on a standards committee leaving room open for possible future enhancements. And since that self-same standards committee decided, in 2012, that those “future enhancements” are not actually required, you’re left with a standard that directly contradicts your feeble-minded beliefs.

    Do you actually have a problem with international standards, oiaohm? Because it looks to me like you do.

  2. oiaohm says:

    http://www.firstobject.com/wchar_t-string-on-linux-osx-windows.htm
    http://www.unicode.org/roadmaps/tip/ Plane 3. That Exploit Guy, the existence of this is not written up in Wikipedia. For the exact list of current placements for Planes 3/4 you have to get China’s own map. Please note they already have enough to fill Plane 3 and will be asking for another slot.

    Oracle bone script is not used in newspapers but it is going to be in Unicode.

    You also see clerical script expression effects in Japanese manga and anime and other Asian comics and cartoons. How a lot of Asian languages apply fear, anger and so on to words comes from clerical script. Yes, even when a clerical script word is chosen on likeness to a current-day char, it is also chosen on the emotion applied, so there is extra information above just the word. The problem comes in how to apply clerical-script-style emotion inside Unicode. Think of each emotion for the same word as a different glyph. There are items from the old Clerical script still in use that are a big problem. As yet Unicode cannot represent the full richness of the Asian languages. This is effectively a minimum multiplier of 12 on all Asian languages.

    That Exploit Guy, there is a nasty little problem. Chapter 3 of 2000 Unicode is a buggy beast. Why? Take close note of Chapter 3.1 C1 in particular: "process shall interpret Unicode code values as 16-bit quantities." Posix wchar_t is UTF32. So Posix systems interpret Unicode values as 32-bit or 8-bit, never as 16, unless doing conversions. Note Apple OS’s never used 16 bit when processing Unicode either. Chapter 3 is truly written by a Windows-centric author. There is a lot of Chapter 3 of year 2000 Unicode that a Posix system does not conform with.

    A different section of year 2000 Unicode says the square box filler char of Unicode still goes up to 31 bits, even though Chapter 3 kinda suggests otherwise in one area.

    The following is from the Chapter 3 you failed to completely read, That Exploit Guy.
    The definition of UTF-8 in Amendment 2
    to ISO/IEC 10646 also allows for the use of five- and six-byte sequences to encode characters that are outside the range of the Unicode character set;

    This is also in chapter 3 that you did not read. The next bit is interesting
    five- and six-byte
    sequences are illegal for the use of UTF-8 as a transformation of Unicode characters

    So as long as those chars 5 and 6 bytes long are not Unicode-listed chars, by the 2000 document we are allowed to use them. Like the Ebcdic chars that don’t map to Unicode, it’s fine. Sorry, your “no usage” claim is invalid. It’s clearly stated in the 2000 document how we were allowed to use them. PUA-equal, only better in cases of Ebcdic-to-Unicode conversion, since you don’t have a PUA conflict problem.

    Only in 2003 Unicode does using 5- and 6-byte sequences become forbidden without a loophole.

  3. That Exploit Guy says:

    That Exploit Guy, sorry, you are wrong: until 2003 it was believed that UTF-16 would be extended some way to support 31 bits. UTF8 and UTF32 are 31-bit until 2003.

    I suppose you are assuming that UTF-8 was not restricted to 4 bytes per sequence at most until 2003. At least, that would be the impression you would get if Wikipedia was your reference. I mean, this is quite fun, isn’t it, considering that I can simply withhold pieces of information, let you spin tale after tale based on false assumptions and then destroy you with the reality?
    Although RFC3629 was released in 2003, UTF-8 had already been restricted to 4 bytes at most as early as 2000 in Unicode 3.0:
    http://www.unicode.org/versions/Unicode3.0.0/ch03.pdf
    Again, I’d suggest you and whoever else reading this to focus on the big picture, namely that at no point was there an assigned code point above U+10FFFF. In other words, there had been no legitimate use for 5-byte and 6-byte UTF-8 sequences prior to their prohibition.

    rfc3629 ratification is 2003. Posix is sticking to the 2000.

    You are still clinging onto the false impression that the Portable Character Set has something to do with UTF-8. It doesn’t. Again, your system can be perfectly compliant with that part of POSIX without in any way supporting UTF-8 or even ASCII. After all, POSIX is usually this deliberate in avoiding favouring any particular implementations (while making itself perfectly useless).

    Of course you are not counting the 3 plane because that has been allocated to the Chinese historic script problem.

    Perhaps you have not consulted your trusty source’s article on Unicode planes, or perhaps you have read it but have utterly failed in understanding what it was trying to tell you. Regardless, based on what it tells you, the currently assigned planes are supposed to be Planes 0 (BMP), 1 (SMP), 2 (SIP), 14 (SSP), 15 (PUA A) and 16 (PUA B), and none of them are dedicated specifically to the Chinese language.
    Of course, I shall spoon-feed you with non-Wikipedia sources only when I feel it’s time to put you out of your own misery.

    Clerical script is in fact larger than Unified CJK ham. Clerical Script is used in china on signboards so chars have many forms that assign emotion to the words.

    Ah, Clerical script, which, according to your trusty source, is supposed to be an archaic form of Chinese writing popular in the Han dynasty, or, in layman’s terms, a pretty long time ago.
    However, the article also gives people the impression that somehow the clerical script is still in use in signboards, newspaper headlines and the like. Perhaps the Chinese also have newspapers in oracle bone script as well, it seems, though that would be kind of like having your newspaper written in Old English (ha!)
    Of course, this false impression is gone once you have read the article on “Chinese script styles”, which explains that the clerical script in modern usage is in character likeness only. In other words, everything in those signboards, newspapers, etc. is still essentially modern Chinese.
    But, hey, don’t let such hiccups in understanding facts stop you from banging on about the clerical script.

  4. oiaohm says:

    http://en.wikipedia.org/wiki/Clerical_script
    Clerical script is used in current-day Chinese newspapers and advertising and signs and so on, DrLoser. It’s used as images these days to express emotion with words.

    Clerical Script was used in wood-tablet printing.
    Traditional Chinese characters come in when they get movable-type printing presses.
    Simplified Chinese characters come out of making the movable-type printer’s job simpler.

    DrLoser, the problem here is that at each script change from Clerical Script onwards, the Chinese never completely stopped using the old script form. So today’s newspaper from China can in fact be a mix of all 3, yet we cannot Unicode all 3.

    The big problem here is that Clerical Script is not a dead script; it is still getting new chars. Clerical is not in Unicode yet, and really the limit on Unicode should only have been set once all actively used languages had been imported. Of course, people like me remember when the Unicode main body said we would never exceed the first plane for language chars, so symbols could be placed at plane 2. Then they imported Traditional Chinese and Simplified Chinese and, oops, Plane 1 filled, and both of these are small compared to Clerical Script.

    China has 3 active language scripts; so far Unicode only partly covers 2.

    If I had said seal script, your statement would be valid, DrLoser, because Seal Script is not something you see in modern items from China. Basically someone from China could send you a newspaper with 3 different scripts on it. Clerical Script in newspapers is used in small amounts because it is an image per char, yet they are willing to go to that effort to use it. Due to the fact that Clerical Script is not chars but images, it also undermines the means to search documents. So, like it or not, there will be motivation for China at some point to implement Clerical script; depending on how they do it, that will define whether we run out of Unicode char space or not.

  5. DrLoser says:

    Clerical script?

    Dear God, oiaohm, you’re reaching here, aren’t you?

    Kindly explain how one person, we will call them the source of Han Dynasty documents, needs to talk to another person, we will call them the recipient of Han Dynasty documents, via UTF-8?

    It’s a bit of a stretch, isn’t it?

  6. oiaohm says:

    That Exploit Guy, sorry, you are wrong: until 2003 it was believed that UTF-16 would be extended some way to support 31 bits. UTF8 and UTF32 are 31-bit until 2003.

    rfc3629 ratification is 2003. Posix is sticking to the 2000. 10 planes are not impossible to fill. In fact you could fill them very quickly, That Exploit Guy.

    Of course you are not counting the 3 plane because that has been allocated to the Chinese historic script problem. Which may not be enough. The problem is this is stupid. Clerical script is in fact larger than Unified CJK ham. Clerical Script is used in china on signboards so chars have many forms that assign emotion to the words. Think emotions multiplied by the size of the complete Chinese language.

    Planes 3 and 4 will most likely go to China just for historic scripts, without anything new and without Clerical Script. This is the problem: 1 script from China that is in active usage could break limited Unicode and might in fact break 31-bit old-school Unicode.

    African scripts: most of those are not in Unicode yet. Old native languages of the Americas also have scripts that are not in Unicode.

    Luvr, your “obsolete” claim does not hold; note it is Posix 2008 that refers to Unicode 2000, even though a newer version has been released. In other words the Posix main body rejected the alteration.

    It is That Exploit Guy’s link that undoes the obsolete status of Unicode 2000. If a standard is obsolete, no newer standard should use it.

    FSS-UTF, the first thing that became UTF-8, was first ratified by the Open Group before Unicode touched it.

    That Exploit Guy, all countries go around inventing new symbols; it’s like new emotions, and a percentage of those symbols will make it into Unicode. Just the rate is linked to population size. India also has a lot of historic languages not in Unicode either.

    Clerical Script is why the Open Group has refused to support newer Unicode that limits to 0x10FFFF. It could be over 10FFFF in size by itself. Emotions multiplied by Chinese chars equals a huge number of chars.

    I was remembering there was something still in current usage from China but not in Unicode that was downright huge. It is Clerical Script.

  7. DrLoser says:

    You’re going up against TEG now, Dog-Brain?

    Good luck with that! He’s just the tiniest bit more knowledgeable than you are. Not that it’s a high bar to leap over.

  8. dougman says:

    TEG and Loser are nothing but useless eaters that contribute nothing by hijacking this blog and know everything on how nothing can be accomplished.

    They both seem angry at the world and I feel sorry for their lot.

    Honestly, they should be able to retire and enjoy themselves, but with the UK banking sector screwing everyone out of their pensions they must continue their drudgery living in cubicles.

  9. That Exploit Guy says:

    And btw, what was your point again?

    This is not to mention that all this “6-byte UTF-8” nonsense was nothing more than Ham’s pitiful justification of “64 bit” UTF-8 encoding spinning completely out of control:
    http://mrpogson.com/2014/03/23/linux-and-world-domination/#comment-138777
    I like the made-up “9 byte” sequence, in particular. The claim of two self-invented characters per Chinese person was also worth a laugh:
    http://mrpogson.com/2014/03/23/linux-and-world-domination/#comment-138597
    But hey, even if each Chinese doesn’t go around inventing characters that other Chinese don’t understand, someone will still have to fill those 10 empty planes (out of 17) just so Ham will have a reason to future-proof Unicode with an even more pointlessly large space of 31 bits!

  10. That Exploit Guy says:

    UTF-8, a transformation format of ISO 10646

    Congratulations on being another worthless twit that confuses “code points” with “encoding”.
    Back when RFC2279 still mattered, Unicode (or “UCS” in ISO-speak) had no character definitions above U+FFFF. In other words, there was no SMP, SIP, SSP or anything that would warrant a UTF-8 sequence length of more than three bytes. This also meant that regardless of your encoding format, the total number of characters available for you to encode was less than 65536.
    As code points beyond U+FFFF began to receive character definitions, it was established that there would be no Unicode characters beyond U+10FFFF. This meant from the inception of UTF-8 to the ratification of RFC3629, there had been no legitimate use for 6-byte UTF-8 sequences. This was why Ohio Ham’s claim of such sequences’ presence was nothing more than a farce, but, hey, who cares about having the ends justifying the means anyway?
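    For anyone who wants that boundary spelled out, here is a minimal C sketch (illustrative only, not taken from any particular library) of how many bytes a scalar value needs under the current rules, i.e. RFC 3629: four bytes at most, nothing above U+10FFFF, and no surrogates.

    #include <stdint.h>

    /* Bytes needed to encode a scalar value as UTF-8 under the current
       (RFC 3629) rules.  Returns 0 for values that cannot be encoded. */
    static int utf8_encoded_length(uint32_t cp) {
        if (cp <= 0x7F)
            return 1;
        if (cp <= 0x7FF)
            return 2;
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;               /* surrogates are not characters */
        if (cp <= 0xFFFF)
            return 3;
        if (cp <= 0x10FFFF)
            return 4;
        return 0;                   /* beyond U+10FFFF: no legitimate encoding */
    }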

  11. DrLoser says:

    Congratulations on reinforcing TEG’s and my points, luvr.

    I notice you were conspicuously absent whilst oiaohm was piling drivel on more drivel. Wonder why that would be, considering you obviously know your stuff?

    And btw, what was your point again?

  12. luvr says:

    Yawn… Don’t you think that it would be a nice idea to put this nonsense to rest? Do you really have nothing better to do than to keep bragging about who can come up with the silliest argument? May I suggest you get a life, for a change?
    TEG shouted:

    Really? Say that again?
    31 bits?
    I am pretty sure in some alternative universe UCS did start off with 31 bits. Here, however, it’s 16 bits, and everyone and his dog knows that for a fact.
    UTF-16 surrogate pairs and the whole U+10000-to-U+10FFFF business came about only when Unicode 2.0 was published in 1996.

    UTF-8, a transformation format of ISO 10646, dd. January 1998, §2, “UTF-8 definition,” says: “In UTF-8, characters are encoded using sequences of 1 to 6 octets.” It even includes a table showing how any 31-bit value is to be encoded in UTF-8. See? Upto 6 bytes in UTF-8, to encode 31-bit values. That they have never been seen in the wild, is irrelevant; they were clearly documented, and they were supposed to be supported. All 31-bit values that you could come up with. Period.

    That document was obsoleted by UTF-8, a transformation format of ISO 10646, dd. November 2003, which limits UTF-8 sequences to four bytes, encoding values up to 0x10FFFF.
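    To make the difference concrete, here is a small C sketch (purely illustrative, not code from any shipping iconv) of the old RFC 2279 table, which mapped any 31-bit value onto 1 to 6 octets. The 5- and 6-octet branches are exactly the forms that the 2003 document made illegal.

    #include <stdint.h>

    /* Encode a 31-bit value per the obsolete RFC 2279 table (1-6 octets).
       Returns the number of octets written, or 0 if the value exceeds 31 bits. */
    static int utf8_encode_rfc2279(uint32_t v, unsigned char out[6]) {
        static const unsigned char lead[7] = { 0, 0x00, 0xC0, 0xE0, 0xF0, 0xF8, 0xFC };
        int len;
        if      (v <= 0x7F)       len = 1;
        else if (v <= 0x7FF)      len = 2;
        else if (v <= 0xFFFF)     len = 3;
        else if (v <= 0x1FFFFF)   len = 4;
        else if (v <= 0x3FFFFFF)  len = 5;   /* now illegal */
        else if (v <= 0x7FFFFFFF) len = 6;   /* now illegal */
        else return 0;
        for (int i = len - 1; i > 0; i--) {  /* continuation bytes carry 6 bits each */
            out[i] = 0x80 | (v & 0x3F);
            v >>= 6;
        }
        out[0] = lead[len] | (unsigned char)v;
        return len;
    }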

    End of story. All the rest is pointless, boring bunk spouted by empty-headed wannabes. Redirect to /dev/null.

  13. That Exploit Guy says:

    That Exploit Guy, you have just dug up the evidence that hangs you. To implement as per Posix standards you must implement ISO/IEC 10646-1:2000, which demands 31-bit UTF-8 and UTF32.

    Here’s how you hang someone: I lied.
    The only thing the POSIX documentation defines is the Portable Character Set, or, if you will, part of ASCII without the actual encoding part. So what’s all that deference to ISO/IEC all about? Well, to my best interpretation, being the worthless, wishy-washy pile of crap that POSIX always is, perhaps whoever penned that part wanted readers to look for more in those documents. However, if you take the sentence at face value, the only thing you get is that the handful of characters in the table happen to also exist in the two ISO/IEC standards. There is absolutely no reason to believe that POSIX actually mandates UTF-8 or any encoding in that sentence, or that the Portable Character Set is itself a complete encoding specification (saving the few wishy-washy statements immediately below the table).
    So, here’s the bottom line: you are perfectly free to believe that Unicode is part of POSIX, but what the standard actually gives you is a handful of characters, and you may encode them in such a way that they do not conform to ASCII, UTF-8 or any encoding format in existence but are still perfectly conformant to what that part of POSIX specifies. Spending just one minute reading the page I have linked to should lead you to this conclusion. However, since you never care about reading things given to you or things you yourself have linked to, you have simply gone by whatever I tell you, and this is where you have screwed up, big time.

    Note that document you pulled from is 2008. 2008 posix main body has stopped at Unicode 2000. They will move when Unicode main body reallows 6 byte UTF8.

    We have discussed 5-byte and 6-byte UTF-8 sequences. The fact of the matter is that they have no legitimate reason to exist in an UTF-8 encoded stream.
    We have discussed overlong UTF-8 sequences. The fact of the matter is that they are not adherent to UTF-8 in any of its incarnations and their acceptance is considered a security vulnerability.
    We have also discussed POSIX. The fact of the matter is that POSIX has only defined a handful of characters that must be supported on a compliant system, and their implementations are not required to be UTF-8 conformant.
    Now, what is this nonsense about “Unicode 2000” again?

  14. That Exploit Guy says:

    That Exploit Guy, nope, you just hung yourself and are just not aware you have done it.

    Absolutely adorable, that.
    Moving on…

    ISO/IEC 10646-1:2000 in 6.1 Portable Character Set is in fact 31 bit. Yes, you are right on the declared planes. But it defines the square box filler char up to 31 bit.

    Really? Say that again?
    31 bits?
    I am pretty sure in some alternative universe UCS did start off with 31 bits. Here, however, it’s 16 bits, and everyone and his dog knows that for a fact.
    UTF-16 surrogate pairs and the whole U+10000-to-U+10FFFF business came about only when Unicode 2.0 was published in 1996. No code point above U+FFFF, however, would be seen in use until Unicode 3.1 was published in 2001.
    The corresponding adjustment from ISO/IEC was this: an additional document was released in supplement to the existing ISO/IEC 10646-1:2000. This was known as “ISO/IEC 10646-2:2001: “Universal Multiple-Octet Coded Character Set (UCS) – Part 2: Supplementary Planes”:
    http://www.iso.org/iso/iso_catalogue/catalogue_ics/catalogue_detail_ics.htm?csnumber=33208

    That Exploit Guy the limitation to 0x10FFFF does not exist in ISO/IEC 10646-1:2000. UTF-8 is still 0x00000000-0x7FFFFFFF in that document. The limitation of UTF-8 size happens in year 2003; before 2003, UTF-8 is officially 6 byte by the Unicode main body. The limitation of size of UTF-8 is not recognized in any posix document.
    Ignoring that the POSIX documentation in question has fundamentally nothing to do with encoding, having an encoding standard giving you the ability to encode N code points does not mean you now suddenly have N characters to encode. Hell, we hardly have the amount of characters to fill UTF-16’s capacity of 0x110000 code points, let alone a capacity of some astronomical 0x80000000 code points.
    The bottom line is: the standard then was 0x10000 code points, and the standard now is 0x110000 code points. This means at no point we have actually seen 5-byte or 6-byte UTF-8 sequences legitimately in use, and you are simply misleading people with phantom code points and characters that no one on earth should concern his- or herself with.

  15. oiaohm says:

    That Exploit Guy, nope, you just hung yourself and are just not aware you have done it.

    ISO/IEC 10646-1:2000 in 6.1 Portable Character Set is in fact 31 bit. Yes, you are right on the declared planes. But it defines the square box filler char up to 31 bit.

    That Exploit Guy the limitation to 0x10FFFF does not exist in ISO/IEC 10646-1:2000. UTF-8 is still 0x00000000-0x7FFFFFFF in that document. The limitation of UTF-8 size happens in year 2003; before 2003, UTF-8 is officially 6 byte by the Unicode main body. The limitation of size of UTF-8 is not recognized in any posix document.

    That Exploit Guy, you have just dug up the evidence that hangs you. To implement as per Posix standards you must implement ISO/IEC 10646-1:2000, which demands 31-bit UTF-8 and UTF32.

    Note that document you pulled from is 2008. 2008 posix main body has stopped at Unicode 2000. They will move when Unicode main body reallows 6 byte UTF8.

  16. That Exploit Guy says:

    Patience, Dr. Loser. I prefer letting a liar give himself enough rope before hanging him.
    @ That gibberish-spewing berk called “O-what-is-this-I-don’t-even…”
    That Exploit Guy Posix standards still say 6 byte UTF8. Unicode bodies are free to redefine what UTF8 is in future.
    The only characters that I am aware of that the Open Group defines on its own is the “Portable Character Set”:
    http://pubs.opengroup.org/onlinepubs/9699919799/
    The standard simply defers all other character definitions to ISO/IEC 6429:1992 (or what is generally known as the “C1” control character set) and ISO/IEC 10646:2000. If I am not mistaken, the latter defines only what we now know as the “Basic Multilingual Plane” (BMP), which is 65536 code points in size, and does not include the “Supplementary Multilingual Plane” (SMP), the “Supplementary Ideographic Plane” (SIP) and the “Supplementary Special-purpose Plane” (SSP) that the current standards define. In other words, according to strict POSIX, there should not even be UTF-8 sequences more than 3 bytes in length, let alone one with 6 bytes in a row.
    This is not to mention that your bogus claim on overlong UTF-8 sequences is still, well, completely bogus:
    http://mrpogson.com/2014/03/23/linux-and-world-domination/#comment-138895

  17. oiaohm says:

    That Exploit Guy Posix standards still say 6 byte UTF8. Unicode bodies are free to redefine what UTF8 is in future. Also, funnily enough, everything up to 31 bits has a defined default char: a square box with the value inside, or in other words the standard unused-char solution for Unicode.

    The argument that almost nothing in the world uses it is a short-term argument. Can you promise me, for the rest of time, that nothing will ever use the remaining half? Kernel developers have to answer questions about the rest of time; so do filesystem developers. This is why the Unicode standards body is of limited importance when it comes to what size of UTF-8 you may strike sometime in the future.

    Whether you like it or not, the Solaris 8 documentation on Unicode is still the official documentation for Solaris 11. The problem is the Solaris 11 version is only in PDF form; Solaris 8 had the page as HTML, so it is a nicer link.

    That Exploit Guy, by your stupid logic you could say any char in an unused code page in Unicode should be rejected. This is the same argument as rejecting everything above 0x10FFFF. There was a reason why Unicode invented the idea of the unified square box for an undefined char.

    The 0x10FFFF limitation on UTF8 on bsd/Linux/unix systems, if it is implemented, is only implemented in the userspace code, not the kernel space.

    The point of having 6-byte UTF8 is simple future-proofing. And at times, knowing that it’s in the system is useful.

    Really, your argument, That Exploit Guy, matches up so well with the same human error that caused Y2K. Remember, 2 chars were good enough to record the year. Your argument is that 0x10FFFF is enough chars to record all current and future chars just because that is where the current standard says you should stop. Reality check here, That Exploit Guy: your statement is nuts.

  18. DrLoser says:

    Oh, and oioahm?
    EBCDIC and UTF-8 and Unicode.
    You didn’t really get around to explaining that brain-fart, did you?
    As I recall, it involved something outside the current planes, which naturally would justify a borked version of UTF-8 that requires more than four octets.
    Isn’t it time you elucidated?
    And as a bonus, that IBM SAN thing? You’re just fucking ignorant, aren’t you?

  19. DrLoser says:

    Come now, TEG. Of course oiaohm wants to encode gibberish.
    Gibberish is bread and butter to oiaohm. Low-level technical details … not so much.

  20. That Exploit Guy says:

    The Official Solaris documentation on the topic.

    I don’t know what’s stupider: citing documentation from the year 2000, or citing documentation for Solaris 8.
    Actually, you have picked both, so that’s doubly stupid.

    Of course its highly vague on how you can use the 6 byte form.

    Again, putting aside all that nonsensical playing around with nomenclatures, there is no point to having a 6-byte UTF-8 sequence. Well, not unless you want to encode gibberish, though I don’t suppose standardised communication with you is a biggie in the industry.

    Part of the reason why the Unix world is such a mess is that items like Solaris randomly decide when they will follow Posix standards

    No, you have simply randomly decided that POSIX is what you say it is. Again, virtually nothing in the world is encoded in a 6-byte UTF-8 sequence, none of current Unicode standards permit code points above U+10FFFF, and I don’t see you have provided any proof that says otherwise.

  21. oiaohm says:

    http://docs.oracle.com/cd/E19455-01/806-5584/6jej8rb0n/index.html The Official Solaris documentation on the topic. UTF-8 is 6 byte by this. Of course its highly vague on how you can use the 6 byte form. It also mentions the existence of FSS-UTF. Sorry DrLoser, it might be obscure to you, but it’s something that raises its head when you come to making drivers for Solaris.

    The XPG internationalization framework is still FSS-UTF.

    GNU iconv conforms to the XPG standard for what UTF-8 is. Solaris has decided that UTF-8 will follow the Unicode body, not Posix, but to remain Posix-compatible they are using the UTF2 naming.

    DrLoser, I am not making up names for protocols. The hard part is that some protocols are defined by many different bodies under many different names. Unfortunately there are cases of overlapping names. UTF8 is one of those horrible things.

    Part of the reason why the Unix world is such a mess is that items like Solaris randomly decide when they will follow Posix standards or randomly follow some third-party standard, and then, in the case of randomly following some third-party standard, implement something strange to allow compatibility. In Solaris’s case over UTF8 they don’t implement posix, they implement Unicode, so they have UTF2 as well. This is why autoconf setups for building source code across many Unix systems got so complex. It was a stack of “what custom tweak does this horrible system have?”

    The reality here is people were very dirty when they made a limited form of UTF8; it should have been given a new name, then there would be none of these issues.

    UTF-2 is always 6-byte-wide UTF-8, but gnu iconv does not implement it. But you do have to implement a UTF-8 that matches the OS kernel, and Posix file system names are by standard FSS-UTF, or 6-byte UTF-8.

    Here is the nasty trap I have held back. That Exploit Guy, solaris has 2 UTF-8 defines: one that is iconv userspace and one that is written about in all solaris driver documentation. Guess what, a solaris driver is expected to handle 6-byte UTF-8. This has led to highly confused solaris documentation. Of course iconv in the userspace of solaris has to be able to be kernel-compatible, so it has a UTF2 option.

    That Exploit Guy, the purpose of allowing 6-byte UTF8 with nothing to map to is future-proofing and custom usage. You can replace iconv on a system without rebooting. Changing kernel space requires a reboot or major runtime patching that can render the system unstable. So of course the kernel of solaris and everything else Unix/BSD/Linux implements the largest form of UTF-8 possible. All Unix-related OS’s bar Windows do. It’s the equivalent of y2k-proofing.

    The reality is both 4- and 6-byte implementations of UTF-8 exist; it is very easy, being incompetent at checking iconv, to miss that the 6-byte form is implemented on a system under a strange name.

    I also did say bloated UTF-8 was only accepted by some implementations. Only testing against 1, That Exploit Guy, does not mean it does not happen.

  22. DrLoser says:

    Posix file systems are FSS-UTF; in other words 6-byte UTF8 might be named FSS-UTF, or might be named UTF8, or might be named UTF-8, or might be named UTF2 in the iconv.

    For once, TEG, I am regrettably going to have to disagree with you. Why ignore this patent nonsense?

    1) There is no such thing as “FSS-UTF.” Well, actually, there is, but it is very obscure and has nothing to do with UTF-8, however oiaohm chooses to spell it. I wonder where oiaohm dredged this particular googling idiocy up from?
    2) “in the iconv?” Dear God. If it’s named in the iconv, little tin-foil hatted one, it would be visible in the iconv source code, wouldn’t it? Assuming there is such a thing as the iconv. Which, regrettably for those everywhere who don tin hats in lieu of thought, there ain’t.
    3) Might be named “UTF-Dolding,” oiaohm. Might be named that. Regrettably, you are the only person here making names up for well-established protocols.

    And, even if you weren’t, naming the things isn’t remotely important.

    Making sure that the transmitter and the receiver understand that protocol is.

    And good luck with that, considering that you, oiaohm, still do not understand that this is a method of translating an octet sequence into a Unicode code point.

    What a blithering idiot.

  23. DrLoser says:

    I’m sorry, oiaohm, but “It’s completely different because I say so” doesn’t really cut it.

    Do we need to call the fire brigade to rescue you out of that very deep hole you have pointlessly dug yourself into?

  24. DrLoser says:

    Guess what. UTF-8 is the new standard. UTF8 is the old standard.

    Did I really hear oiaohm say that? Oh dear.

    And presumably, oiaohm, “utf8” (as quoted in your excellently pointless earlier cite) is … er, I’m getting confused here.

    Could you, perhaps, point us lesser mortals to the three completely different RFC documents that no doubt support your unique perspective on the way UTF-8 (no matter how it is spelled, and knowing you, we’ll be seeing the 8 in the middle of the UTF some time soon) works?

  25. That Exploit Guy says:

    Posix file systems are FSS-UTF; in other words 6-byte UTF8 might be named FSS-UTF, or might be named UTF8, or might be named UTF-8, or might be named UTF2 in the iconv.

    Let’s ignore for a moment that all of the above is patent nonsense: have you ever considered the fact that character encodings are meant for encoding characters?
    Again, in case you are too stupid to scroll downwards: there are no characters assigned to scalar values above 0x10FFFF.
    What on earth, then, is the purpose of a 6-byte sequence when you have fundamentally nothing to map to?
    I think, therefore, it’s safe to assume that, like every single one of your comments, a UTF-8 stream that contains a 5- or 6-byte sequence is nothing more than worthless gibberish.

  26. That Exploit Guy says:

    @ That babbling twerp called “Ham Bone” or some such

    That Exploit Guy you have only been testing against 1 system.

    Three systems. The least you could do would be to scroll down the damn page and remind yourself what has been discussed before, you moron.

  27. That Exploit Guy says:

    @That lying little scumbag called “Ohio Ham” or whatever

    FSS-UTF this on Linux becomes UTF8. On solaris just to be strange is UTF2.

    Again, I specifically stated Solaris and GNU iconv as examples. You can’t possibly be this dumb, can you?
    Well, whom am I kidding? Of course you are this dumb.
    GNU iconv is used on almost all Linux distributions you can get your hands on, including RHEL and its clones. GNU iconv simply does not exhibit the behaviour you describe, i.e. it does not accept overlong encoding. Again, on an RHEL clone trying to convert the letter “a” from “UTF8” to “UTF8”:
    0xC1A1 -> "illegal input sequence"
    0xE081A1 -> "illegal input sequence"
    0xF08081A1 -> "illegal input sequence"
    0xF8808081A1 -> "illegal input sequence"
    0xFC80808081A1 -> "illegal input sequence"
    Again, none of the overlong representations of “a” passes iconv.
    Non-overlong 5-byte or 6-byte sequences, however, are considered valid by GNU iconv due to it failing to adhere to standards that have been established since more than a decade ago (as it has been pointed out before):
    http://savannah.gnu.org/bugs/?37857
    UTF-2 is just an old, obscure way to refer to UTF-8. There is nothing strange about how Solaris iconv implements UTF-8. In fact, the Solaris implementation of UTF-8 is correct and current. The GNU implementation, on the other hand, isn’t.
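    For the curious, the reason every overlong form of “a” above gets rejected is mechanical: each sequence length has a minimum value, and anything below it could have been encoded more shortly. A rough C sketch of that check (illustrative only, not the actual GNU or Solaris code):

    #include <stdint.h>
    #include <stddef.h>

    /* Minimum scalar value for each UTF-8 sequence length (index = length). */
    static const uint32_t utf8_min_for_len[7] = {
        0, 0, 0x80, 0x800, 0x10000, 0x200000, 0x4000000
    };

    /* Returns 1 if the 'len'-byte sequence in 's' is overlong, i.e. its value
       would have fitted in a shorter sequence.  Assumes the continuation
       bytes are otherwise well formed. */
    static int utf8_is_overlong(const unsigned char *s, size_t len) {
        static const unsigned char lead_mask[7] = { 0, 0, 0x1F, 0x0F, 0x07, 0x03, 0x01 };
        if (len < 2 || len > 6)
            return 0;                        /* one-byte sequences cannot be overlong */
        uint32_t v = s[0] & lead_mask[len];
        for (size_t i = 1; i < len; i++)
            v = (v << 6) | (s[i] & 0x3F);    /* accumulate 6 bits per continuation byte */
        return v < utf8_min_for_len[len];
    }

    Feeding 0xC1 0xA1 through this yields the value 0x61 (“a”), which is below the two-byte minimum of 0x80; that is exactly why iconv flags it as an illegal input sequence.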

  28. oiaohm says:

    I hate word swap. I typed 31 byte UTF8 instead of 31 bit in UTF8.

    I have not developed for Solaris for a while. I knew it had 31-bit chars in UTF8; I had just forgotten what Solaris called it: UTF2.

  29. oiaohm says:

    That Exploit Guy you have only been testing against 1 system. You never allowed that the system you were choosing had a bug. I had mentioned FSS-UTF, which all posix systems had to support. You incorrectly said that is just UTF-8 today. As a solaris user you should have known that on oddball solaris that is UTF2.

    On Solaris you have UTF-2 hitting UTF-8, causing trouble. On Linux you have UTF-8 and UTF8 being the same and being UTF-2. AIX gets interesting: UTF-8 and UTF8 are different and there is no UTF-2.

    Posix file systems are FSS-UTF; in other words 6-byte UTF8 might be named FSS-UTF, or might be named UTF8, or might be named UTF-8, or might be named UTF2 in the iconv. But it will be in the system iconv as something, and FSS-UTF will be what the OS kernel accepts and rejects, not the application level.

    Reality: why should I walk away? You are the one who has been wrong this complete argument. I have only been partly wrong. On Posix systems you get very big surprises; UTF-8 is not always what you think it is.

  30. oiaohm says:

    That Exploit Guy in fact I mentioned the issue earlier in this thing.

    FSS-UTF this on Linux becomes UTF8. On solaris just to be strange is UTF2. So all filesystems under Linux and most others are UTF8, and all filesystems under Solaris are UTF2. So yes, the rude shock of 31-bit UTF2 hitting a UTF8 program can happen under solaris.

    The large 31 byte UTF8 string is always what a posix system calls its FSS-UTF. On most systems this is UTF8 or UTF-8. On Solaris, yes, it’s UTF2. So Solaris is one of the rare posix systems that can restrict UTF-8. Yes, a rude shock to those attempting to do portable programs. You also see on Solaris systems from time to time a locale with UTF2 in it instead of UTF8.

    Remember, you threw out FSS-UTF and said that it was just UTF8; you made me forget the oddity in Solaris.

  31. That Exploit Guy says:

    @ Lying Froot Loop

    Guess what. UTF-8 is the new standard. UTF8 is the old standard.

    No. They are simply aliases.
    Solaris, for example, has three aliases corresponding to UTF-8: “UTF-8”, “UTF8” and “UTF_8”. None of them behave as per what you claim is the “old standard”. Instead, overlong UTF-8 encodings are simply rejected as invalid octet sequences.
    The same applies also to GNU iconv.
    Fabricating such “old/new standards” nonsense does not seem to be a good idea when you are facing someone with convenient access to the systems in question, now does it?

  32. oiaohm says:

    I was very clear in every one of my examples here never to use the UTF-8 form. Basically you presumed I had typoed. In fact it is you guys who have majorly typoed. A UTF8 and a UTF-8 request to iconv is in fact requesting different things. Filesystems on posix are UTF8, no hyphen.

  33. oiaohm says:

    I did not see that you idiots had kept on going on this one.
    Do iconv -l |grep UTF-8 and iconv -l |grep UTF8 and find out iconv includes both.

    Guess what. UTF-8 is the new standard. UTF8 is the old standard. So since The Exploit guy cannot follow basic instructions he did invalid testing. Really I am sick of your level of incompetence.

    Really a simple command shows what filters your iconv takes. iconv -l.

  34. DrLoser says:

    A brief comment on TEG’s post, and I know he won’t mind.

    I rather like this Solaris implementation: it’s a subtle variation on Postel’s Law. That is, it’s conservative in the encoding it accepts (well, basically, it follows the standard, no matter what oiaohm claims) but liberal in the code points it accepts (and therefore encodes into what I presume is RFC2279).

    This is quite nifty. It means that if an idiot like oiaohm comes along with an absurdly large scalar value for a code point — what with being blessed with 64 bit integers, and all, I mean how did Unicode ever manage without them? — then Solaris will happily output the corresponding RFC2279 UTF-8.

    It’s a bit excessive, I admit, but I can’t see it as a bad thing. Either the other end accepts it and decodes it (because the other end is also controlled by an idiot like oiaohm), or it rejects it. Kind of a win-win for Oracle here, I feel.

  35. DrLoser says:

    This is the big problem: you are talking about how Windows does stuff.

    I believe TEG may have mentioned Solaris and Red Hat at several points. Perhaps you are under the misapprehension that the “implementation” of iconv on these platforms is merely a network client that talks to a Windows Server back end?

    Big Fat U.

    And Windows doesn’t even have an iconv “implementation.” Which is why I had to build my own in Cygwin. Naturally, I had a little trouble with the networking to the Windows Server back-end. Because it isn’t there, numbskull.

    Marked UF for “Usual Failure.”

    Of course you are both googling like mad because reality here neither of you have any real world experience .

    Hilarious. If you can find TEG’s results (or my results) for specific input octet sequences through Google, be our guest.

    But since you mention real-world experience:

    You may strike cases like iconv -f UTF8 -t UTF32 …

    Indeed you can, because the actual CLI quote would be

    iiconv -f UTF-8 -t UTF-32

    Grade: U2

    Cutting and pasting a command (use the middle button on Posix systems, oiaohm) is not a difficult thing to do. You really have no excuse at all for failing here. But what’s this I see?

    The bug I gave link to in gnu iconv was directly complaining that iconv -f utf8 -t utf8 was in fact accepting 31 bit.

    (I don’t quite see your point here. There’s a bug. The developer has pointed out that there is a bug. And you have been insisting for a very long time that this is not a bug.) But let’s try to guess which platform you cut’n’pasted this command from, shall we?

    iiconv -f utf8 -t utf32

    does not exist on any platform of which I am aware. Perhaps you would grace us with an explanation? Because, lacking such an explanation, I can only assume that you have “Googled it like mad” and not even bothered to test the results on whatever putative machine you are allowed occasional access to.

    No testing at all, then: you’re just taking what you read on Google at face value. I suspect that the lack of a test suite (or more sensibly a reference implementation, since you can’t test a naked standard, but how would you know that?) is the very least of your concerns, oiaohm.

    Let’s vary the grade. I think you deserve one that doesn’t come with the Grading Standards:

    Grade: P, for Pitiful.

  36. That Exploit Guy says:

    A Sim from The Sims wrote:

    It appears after the UTF-8 decode is complete. The solaris message comes when the encoder fails. In fact if you were not being such an idiot, 31-bit UTF-8 decoded in front of your eyes and then just failed to encode.

    An interesting theory, and one that is not supposed to be testable at all from just the command line. However, I know just the way to test this.
    For some bizarre reason, Solaris’s own implementation of iconv accepts conversion to, but not from, UTF-8 byte sequences that are longer than 4 bytes. This means if we feed a UCS-4 byte sequence to iconv, it will spew out the corresponding UTF-8 byte sequence whether the code point the UCS-4 sequence represents is above U+10FFFF or not. Conveniently, we can use this quirk to determine where the error occurs.
    Let’s try “iconv -f UCS-4 -t UTF-8 <input file>” first. This time for code points below U+10FFFF:
    41 00 00 00 -> 41
    01 04 00 00 -> D0 81
    01 80 00 00 -> E8 80 81
    01 00 10 00 -> F4 80 80 81
    Now for code points above U+10FFFF:
    C0 FF 1F 00 -> F7 BF BF 80
    00 F0 FF 03 -> FB BF BF 80 80
    00 00 CF FF -> FD BF BF 80 80 80
    As expected, iconv dutifully generates the corresponding UTF-8 byte sequences even though they are invalid by standard. Now, based on these results, let’s try iconv again but this time with the command “iconv -f UTF-8 -t UTF-8 <input file>”.
    41 -> 41
    D0 81 -> D0 81
    E8 80 81 -> E8 80 81
    F4 80 80 81 -> F4 80 80 81
    F7 BF BF 80 -> (“conversion error”)
    FB BF BF 80 80 -> (“conversion error”)
    FD BF BF 80 80 80 -> (“conversion error”)
    Obviously, iconv is artificially prevented from reading byte sequences that are invalid by standard. Still not convinced? Let’s try the few UTF-8 byte sequences around the U+10FFFF boundary, shall we?
    F4 8F BF BC (U+10FFFC, PUA-B) -> F4 8F BF BC
    F4 8F BF BD (U+10FFFD, last character, PUA-B) -> F4 8F BF BD
    F4 8F BF BE (U+10FFFE, non-character) -> (“conversion error”)
    F4 8F BF BF (U+10FFFF, non-character) -> (“conversion error”)
    F4 90 80 80 (0x110000, invalid) -> (“conversion error”)
    F4 90 80 81 (0x110001, invalid) -> (“conversion error”)
    F4 90 80 82 (0x110002, invalid) -> (“conversion error”)
    From here we can clearly see that the behaviour of Solaris iconv in treating invalid UTF-8 byte sequences is as prescribed in the current standards, so much so that it rejects sequences that are considered non-characters, and it has absolutely nothing to do with the purported “encoder error”. This means, once again, you have been proven to be full of it.
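    The same check can be done programmatically rather than from the shell. A minimal C sketch using the standard iconv(3) API (a UTF-8 to UTF-8 “conversion” used purely as a validity test; illustrative only, and as discussed above the verdict on 5- and 6-byte sequences differs between implementations):

    #include <iconv.h>
    #include <stddef.h>

    /* Returns 1 if 'len' bytes at 'buf' pass the system iconv as UTF-8,
       0 if iconv reports an illegal or incomplete sequence, -1 if the
       UTF-8 to UTF-8 conversion pair is unavailable.  Assumes the input
       is smaller than the scratch buffer. */
    static int utf8_valid_by_iconv(const char *buf, size_t len) {
        iconv_t cd = iconv_open("UTF-8", "UTF-8");
        if (cd == (iconv_t)-1)
            return -1;
        char out[1024];
        char *in = (char *)buf, *outp = out;
        size_t inleft = len, outleft = sizeof out;
        size_t r = iconv(cd, &in, &inleft, &outp, &outleft);
        iconv_close(cd);
        return r != (size_t)-1;
    }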

    Of course you are both googling like mad because reality here neither of you have any real world experience

    Funny. Every time it’s either me or Dr Loser providing this discussion with practical examples. You have been virtually spoon-fed with them throughout. Let’s not forget also that:
    1) You claim overlong UTF-8 byte sequences (e.g. “FC 80 80 80 81 81”) are considered valid. Evidence conclusively shows you are full of it.
    2) You claim UTF-8 byte sequences above U+10FFFF are considered valid in Linux etc. Evidence conclusively shows you are full of it.
    3) You claim iconv behaves differently in treating “6 byte UTF-8” on Windows and on *nix. Evidence also conclusively shows you are full of it.
    Given where you are standing, I am afraid admitting your own mistakes is a better course of action than trash-talking. Don’t you agree?

  37. oiaohm says:

    That Exploit Guy and friend, I was not referring to the iconv function but to how posix demands it be implemented internally.

    Of course you both are thinking I am pulling the conversion stuff from wikipedia.

    Something to be aware of is that a 64-bit system has 64-bit integers. The reason it was written as UTF-32 was to define a 32-bit size. Also, iconv implementations are free to choose to convert to a string internally, not 1 char at a time. UTF-32 covers all iconv implementations where a 32-bit int only covers some. Yes, since iconv takes a string input, implementations are free to choose char-at-a-time or string-at-a-time in the back-end. The general description of operation covers both. There is more than 1 implementation of iconv.

    Solaris:
    iconv: conversion error.
    Conversion error detected while processing 7F000000.txt
    iconv: conversion error.
    Conversion error detected while processing 7FFC0000.txt

    I have to explain this because TEG deserves worse than a U grade in reading error messages. For someone claiming to know better: where does this message appear in the solaris iconv source code? It is open to the public, by the way. I could be quoting the exact line of code that error prints from if I felt like it.

    It appears after the UTF-8 decode is complete. The solaris message comes when the encoder fails. In fact if you were not being such an idiot, 31-bit UTF-8 decoded in front of your eyes and then just failed to encode. The same message comes when you have a 4-byte UTF-8 char and attempt to turn it into ascii.

    Of course you are both googling like mad because reality here neither of you have any real world experience and cannot read even basic error messages to know what the error is telling you. Basically TEG by his own post proved Solaris supports 31-bit UTF-8; he is just unable to understand it.

    The posix implementations are all the same: UTF-8 is 31-bit. The other Unicode stuff from iconv can be restricted to 21 bits. The important thing to remember is “can”. You may strike cases like iconv -f UTF8 -t UTF32 that do process on different Unixes, letting 31-bit UTF-8 pass out as UTF32.

    Are there any dirty tricks to using items above the RFC or Unicode limit? It’s a nice way to prevent files being transferred by http or smb.

    Wikipedia talking about the 11 bits of UTF-32 being used to store font data is what Windows and Apple OS 9 (and before) font rendering does. Anything using freetype does not do this. So anything after year 2000 that is not Microsoft Windows does not use those bits that way. Freetype is designed on the presumption that it will receive 31-bit chars when on posix systems. What is the dominant font rendering today? Freetype of course, since it’s on Android, OS X, iOS and Linux Desktops.

    In fact the Unicode standard does not say you can use the upper 11 bits of UTF-32 for anything else. The wording of Unicode is one-to-one with chars for the complete value space. Of course there might be non-Unicode chars. So using those 10 bits for anything other than chars is undefined behaviour to Unicode. Yes, freetype treats that space as undefined chars. Notice I said 10, not 11. In fact it gets worse: UTF-32 defines that 1 bit is for signed/error. So you have only 10 bits to play with. In fact the Windows font render uses the UTF-32-defined error bit to store data, so invalidating it from being UTF-32. The Windows font render is storing a list of ints.

    This is the big problem: you are talking about how Windows does stuff. Not how the majority is doing stuff. The deeper you dig into posix systems running test cases, the more and more you find designed directly to handle 31-bit chars.

    Due to MS Windows being so dominant on the Desktop for so long, people have written a huge stack of documents on the presumption that everything is MS Windows. It’s got so bad that standards bodies have even started writing stuff that way. A standard without a test-suite is not really a good standard. If the Unicode body did ship test-suites they would know the real world does not match their documentation.

    TEG was never proving what he was thinking he was. The bug I gave link to in gnu iconv was directly complaining that iconv -f utf8 -t utf8 was in fact accepting 31 bit. The developer even gives it as an example. Sorry, a failure to read and understand. How can either of you go around telling people they are wrong when you cannot get the basics correct?

    That’s right, you accuse me of using Google as a solution. That is exactly what both of you do most of the time. Not the first time you guys have made twits out of yourselves saying something I have said is false when it’s true.

  38. That Exploit Guy says:

    *seems to loosely paraphrase

  39. That Exploit Guy says:

    Dr Loser wrote:

    I have no idea where oiaohm came across the idea that the intermediate storage value was encoded in UTF-32.

    As I said, Wikipedia. Or, more specifically, a paragraph in “UTF-32” that contains no reference to anything.
    Wikipedia says:

    The main use of UTF-32 is in internal APIs where the data is single code points or glyphs, rather than strings of characters. For instance in modern text rendering it is common that the last step is to build a list of structures each containing x,y position, attributes, and a single UTF-32 character identifying the glyph to draw. Often non-Unicode information is stored in the “unused” 11 bits of each word.

    This seems loosely paraphrases a paragraph from Section 2.5 of Unicode Standard 6.x.
    Unicode Standard says:

    UTF-32 is the simplest Unicode encoding form. Each Unicode code point is represented directly by a single 32-bit code unit. Because of this, UTF-32 has a one-to-one relationship between encoded character and code unit; it is a fixed-width character encoding form. This makes UTF-32 an ideal form for APIs that pass single character values

    Chinese Whispers has never been this fun!

  40. DrLoser says:

    And, well, I’ve searched through src/iconv.c, lib/iconv.c and lib/converters.h, because I can. Let’s not forget the second of the four freedoms, amusingly given the cardinal 1 because they start at zero:

    The freedom to study how the program works, and change it so it does your computing as you wish (freedom 1). Access to the source code is a precondition for this.

    At no point is there any evidence whatsoever to support oiaohm’s assertion:

    UTF-8 goes in its processed converted to a UTF-32 value internal storage then converted back to UTF-8.

    Now, this is interesting, because I was assuming a character point in 32-bit storage (64-bit, if you’re a nutter like oiaohm who wants to expand the standard to Intergalactic Communications).

    Without examining the code, therefore, my assumption was that the intermediary values were stored simply as a 32-bit integer. Not, incidentally, as yet another encoding (UTF-32), which would only make sense to somebody like oiaohm, because it would just be an encoding and not a raw code point.

    Doesn’t seem to be the case. Here’s the API for the basic conversion function in the library:

    size_t iconv (iconv_t icd,
                  ICONV_CONST char **inbuf, size_t *inbytesleft,
                  char **outbuf, size_t *outbytesleft);

    Internally, it appears (and I would happily be proven wrong) that a decoded input stream is temporarily stored as eight bit bytes and, following that, converted from eight bit bytes to the output decoded format.
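    For completeness, the usual calling pattern for that API looks roughly like this (a sketch, assuming a glibc or GNU libiconv style iconv; note the interface only ever sees byte buffers, so whatever intermediate form the library uses internally is its own business):

    #include <iconv.h>
    #include <stdio.h>
    #include <string.h>

    int main(void) {
        /* Convert a short UTF-8 string to UTF-32LE and dump the output bytes. */
        iconv_t cd = iconv_open("UTF-32LE", "UTF-8");
        if (cd == (iconv_t)-1) { perror("iconv_open"); return 1; }

        char in[] = "A\xD0\x81";             /* "A" followed by U+0401 */
        char out[64];
        char *inp = in, *outp = out;
        size_t inleft = strlen(in), outleft = sizeof out;

        if (iconv(cd, &inp, &inleft, &outp, &outleft) == (size_t)-1)
            perror("iconv");                 /* EILSEQ on an invalid input sequence */

        for (size_t i = 0; i < sizeof out - outleft; i++)
            printf("%02X ", (unsigned char)out[i]);
        putchar('\n');
        iconv_close(cd);
        return 0;
    }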

    I have no idea where oiaohm came across the idea that the intermediate storage value was encoded in UTF-32.

    Perhaps oiaohm is unused to exercising the Four Freedoms?

    He certainly hasn’t exercised the command-line interface recently.

  41. DrLoser says:

    Just out of a quaint desire to tie up loose ends, oiaohm … those EBCDIC control characters you were once so obsessed about?

    Would you care to enumerate them, or even specify a single use case?

  42. DrLoser says:

    Now, let’s all be perfectly fair to oiaohm. It requires an absurd amount of time even to attempt to understand what he says (and I really wish he would use the spell checker on Libre), but that isn’t really the point, is it? Here’s a brutal and entirely unwarranted knock-down from TEG:

    An aphasic genius wrote:
    iconv -f UTF-8 -t UTF-8 is not in fact a noop.
    If anyone here is wondering what this “noop” might just be, you are not alone.

    Vile, vile, vile. It’s perfectly clear that oiaohm was asserting that, by “noop,” he actually meant “a pure function involving no state changes, returning the same value as the input, and with no side-effects.” Almost like a NOP in assembler … but not quite, and let’s ignore the clock cycles for now … although there’s no good reason to ignore them.

    Of course, it’s still gibberish. The function in question (iconv) is indeed pure, and it has no internal state changes.

    The only problem is that (in this implausible case) it hasn’t changed anything at all, which is actually worse: according to oiaohm, but not according to the Savannah bug he helpfully mentions, it is basically lying to the next program in the pipe.

    Which, given that it has not actually decoded anything, and given that the next program in the pipe is going to have to decode the UTF-8 into code points anyway, is not something I would boast about, oiaohm.

    In other words, what iconv has done in your scenario (which has no counterpart in the real world that I know of, although programmers are quite often mentally deranged, so I wouldn’t count it out) is to waste clock cycles turning something back into itself, and then lie about the validity of the encoding.

    And then the program at the other end of the pipe has to do it all over again.

    Try as I might, I can’t see much value in this.

  43. DrLoser says:

    Almost.
    ∀ should obviously be ∅.
    I don’t know about you guys, but this sort of random typographical epiphany makes me glad that we have people in charge of code point standards and encoding standards who actually know what they’re doing!
    Robert, you’ll be delighted to know that your blogging software passes the Mathematical Smell Test, in that it actually processes these characters correctly.
    Don’t ask oiaohm to pass the rest of the blog through iconv, please …

    😉

  44. DrLoser says:

    Ohio, you are an embarrassment, aren’t you?

    Dr. Loser, please add memory managers to the list of things he is totally incompetent in. Though I am sure you’d better create a list Ohio does understand. That would be a very short one. Zero-length.

    True, DeafSpy, and apart from one or two little tidying up details I think I’m done on this round of Six Byte Blues nonsense. Woke up one mornin’ … found that RFC2279 was dead as a brick …

    But before the tidying up, I’d like to offer a little something to those who are interested in character conversions, as opposed to oiaohm, who wouldn’t know an application if Nurse Ratched slathered it over his bits (48 of them, apparently).

    I’m going to try a little demo here. I can’t swear that this will work (no preview, and the characters are tricky for a lot of apps and a lot of browsers), but it’s worth a go. Also, if it works, it saves me counting up the categories of which oiaohm knows nothing.

    We shall therefore take ∀ as the set of things that oiaohm understands. (Charitably, technical things, I suppose.)

    We shall take A as any countable set of things that it is possible to understand. One might argue that restricting A to be a countable set is an unwarranted assumption, but I’m the one doing the counting, and I’m a Whig Optimist. I firmly believe that the “infinite” in “the infinite perfectability of Man” is Aleph Null.

    ∀ A : ∅ ⊂ A
    … which is good for oiaohm, but:
    ∀ A : A ∩ ∅ = ∅

    Not so good. I believe I will attend to the little matter of tidying up a bit later, after I see how that works.

    (Clue, oiaohm: what is the stated purpose of a program called “iconv,” and what might be regarded simply as a side-effect?)

  45. dougman wrote, “Let’s have some common sense and start saving money in education”.

    It’s politics to discuss how much should be spent on education. If we just doubled the funding, I expect we would just create chaos. There needs to be a plan/organization for changes in education. The best school I ever worked in had a plan that cost $0 and was very effective. The principal told the teachers not to pass any student who could not meet the objectives of the curriculum. He had the backing of the school board and about half the parents. It was amazing. Kids from K to Grade 8 could suddenly read, while students in Grades 9-12 were mostly illiterate. That guy stayed in that school for 12 years and made it work despite huge efforts to get rid of him.

    OTOH, a school would be much better off buying a thin client per student and using GNU/Linux and web applications rather than buying and discarding truckloads of paper each year. IT is the right way to do education and GNU/Linux helps schools get twice the IT for the money, and better IT because it works for them and not M$. Chromebooks are another kind of thin client which works very well for the cloud. I still think schools should have a local server running LAMP. It’s so cool to see students contribute to a growing body of knowledge on the server. It’s not lost in the shredder and it stimulates the students who follow to strive for more and better education. In a paper school 95% of the paper just goes into the garbage and students start over each year instead of building something. There’s no reason a school can’t do “big” data with FLOSS. It costs almost nothing because schools can get castoff servers and hard drives for the cost of freight.

  46. dougman says:

    Common core?? Let’s have some common sense and start saving money in education. One billion dollars spent on iPads will not make students smarter.

  47. dougman says:

    Chromebooks are becoming the go-to device for education, and they are less costly than other products.

  48. dougman says:

    Has anyone noticed that using Linux in schools saves them money?

  49. Deaf Spy says:

    Ohio, you are an embarrassment, aren’t you?

    Dr. Loser, please add memory managers to the list of things he is totally incompetent in. Though I am sure you’d better create a list Ohio does understand. That would be a very short one. Zero-length.

  50. That Exploit Guy says:

    That Exploit Guy so close yet so far.

    Perhaps you have mistaken me for DrLoser?
    Never mind, let’s skip to the part that is actually relevant to my comments.

    UTF-8 goes in its processed converted to a UTF-32 value internal storage then converted back to UTF-8. Even on solaris. Solaris enforces UTF-32 when export size limit when the export is done.

    At this point, I am not entirely sure if our gibbering one has my comments in mind as this is clearly poorly disguised plagiarism of the Wikipedia “UTF-32” article and has nothing to do with anything I said whatsoever.
    This is like marking the homework of a bunch of undergraduate students – you know at least a few of them are always going to just copy and paste something from the Internet and pretend they have actually done something.
    Oh, and did I mention no marks were to be awarded for plagiarism?
    Marked: U

    Note that solaris is displaying the right UTF-8 value this should have been a warning.

    So instead it should display an incorrect value?
    Peculiar.
    Marked: ? (Seriously, if anyone has any idea as to what he’s babbling about please let me know.)

    To display the right value solaris decoded the UTF-8 did not reject it as a malformed char. Rejected as a malformed char you don’t get the hex value.

    “To display the right value”… ? “Solaris decoded the UTF-8 did not reject it”… ?
    To be brutally honest, I am dumbfounded as to what this miserable bowl of word salad might just mean.
    It’s just a bunch of disparate bits of jargon shiftlessly tossed together in a pitiful attempt to generate an impression that the author knows something that he in fact doesn’t.
    Marked: U

    Just like ascii to test 6 byte UTF-8 you have asked solaris for too small of an output format so have been rejected.

    Even as his plagiarised source Wikipedia tells him, UCS-4 is a superset of UTF-32 and covers the full 31-bit space. There is, thus, no “too small of an output format”, as UCS-4 spans the same space as pre-RFC3629 UTF-8.
    It seems paying attention is just not this student’s forte.
    Also, no, making Solaris iconv perform a UTF8-to-UTF8 conversion instead of a UTF8-to-UCS4 conversion will still give you the same error message.
    Marked: U

    I did not say that posix systems would allow 6 byte utf8 to be converted to utf16 or utf32 directly.

    What you say doesn’t matter. What matters is what the standard says, and the standard says you are full of it.
    This is not even wrong: it’s just unwarranted self-importance.
    Marked: U

    This is the problem without adding a custom exporter or using the same exporter as importer you cannot see what the particular format will directly accept/reject in inconv.

    Student is not attuned to the use of online documentation such as the man pages.
    Marked: U

  51. That Exploit Guy says:

    An aphasic genius wrote:

    iconv -f UTF-8 -t UTF-8 is not in fact a noop.

    If anyone here is wondering what this “noop” might just be, you are not alone.
    The first thing that came to my mind was “no-op”, but obviously our gibbering one was not talking about assembly code.
    Urban Dictionary defines “noop” as “an afternoon poop”, although I am quite certain that there has been no mention of bodily functions anywhere in this comment section, either.
    This “formal English” thing – a “newspeak”, if you will – is truly remarkable, isn’t it? It purports to be more formal, even though at the same time it contains more ambiguity than its regular counterpart.
    George Orwell would be delighted to examine first-hand this peculiar linguistic phenomenon, if he were still alive.

  52. oiaohm says:

    That Exploit Guy so close yet so far.
    iconv -f UTF-8 -t UTF-8 is not in fact a noop.

    UTF-8 goes in its processed converted to a UTF-32 value internal storage then converted back to UTF-8. Even on solaris. Solaris enforces UTF-32 when export size limit when the export is done. Note that solaris is displaying the right UTF-8 value this should have been a warning. To display the right value solaris decoded the UTF-8 did not reject it as a malformed char. Rejected as a malformed char you don’t get the hex value.

    Just like ascii to test 6 byte UTF-8 you have asked solaris for too small of an output format so have been rejected.

    This is the problem without adding a custom exporter or using the same exporter as importer you cannot see what the particular format will directly accept/reject in inconv.

    I did not say that posix systems would allow 6 byte utf8 to be converted to utf16 or utf32 directly.

    That Exploit Guy you have not proven at all that solaris rejects it as UTF-8. You have not proven at all really that proper posix systems reject 6 byte UTF-8.

    Its one of those suprising traps people argue and argue not knowing how to use iconv to test it even that a person shows you exactly. Try feeding a 7 byte form utf8 into iconv printf '\xFE\x90\x80\x80\x80\x80\x80' |iconv -f utf8 -t utf8
    Yes rejection on every system. Its out of range of what they will convert and process.

    That Exploit Guy how are you meant to validate a stream without altering it using iconv. That is right iconv -f encoding -t encoding with encoding being identical.

  53. That Exploit Guy says:

    Errata:
    In the previous examples, I used the command “iconv -f UTF-8 -t ASCII <input file>” to attempt to convert the byte sequences “FD BF 80 80 80 80” and “FD BF BF 80 80 80”. This did not work, for an obvious reason.
    Using instead the command “iconv -f UTF-8 -t UCS-4 <input file>”, the results were markedly different:
    Cygwin/RHEL clone:
    FD BF 80 80 80 80 -> 7F 00 00 00
    FD BF BF 80 80 80 -> 7F FC 00 00
    Solaris:
    iconv: conversion error.
    Conversion error detected while processing 7F000000.txt
    iconv: conversion error.
    Conversion error detected while processing 7FFC0000.txt
    This is in harmony with the GNU libiconv bug as detailed here. Notably, contradicting our gibbering one’s assertion, iconv produces the same result on both GNU-based platforms (Cygwin and the RHEL clone). Also, Solaris-native iconv correctly rejects both the byte sequences as invalid.
    This proves two things:
    1) The acceptance of scalar values above 0x10FFFF is purely a bug in GNU libiconv and has nothing to do with backward compatibility whatsoever.
    2) “Proper” 6-byte UTF-8 byte sequences are considered invalid on a POSIX-certified system.
    Well, that pretty much wraps up your “6 byte UTF-8” nonsense, doesn’t it, Hamster?
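
    (If anyone would rather check the arithmetic than trust two implementations: the obsolete RFC 2279 six-byte layout is 1111110w followed by five 10xxxxxx bytes, one payload bit from the lead byte and six from each continuation. A throwaway sketch:)

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        /* The two test sequences from above, decoded per the obsolete
         * RFC 2279 six-byte layout: 1111110w 10xxxxxx x5.              */
        const unsigned char seq[2][6] = {
            { 0xFD, 0xBF, 0x80, 0x80, 0x80, 0x80 },
            { 0xFD, 0xBF, 0xBF, 0x80, 0x80, 0x80 },
        };
        for (int s = 0; s < 2; s++) {
            uint32_t v = seq[s][0] & 0x01;          /* 1 bit from the lead byte */
            for (int i = 1; i < 6; i++)
                v = (v << 6) | (seq[s][i] & 0x3F);  /* 6 bits per continuation  */
            printf("%08X\n", v);                    /* prints 7F000000, 7FFC0000 */
        }
        return 0;
    }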

  54. DrLoser says:

    iconv -f UTF-8 -t ASCII

    I am very much looking forward to oiaohm rehashing his

    iconv -f UTF-8 -t UTF-8

    … for several reasons, not the least of which is that the very idea of re-encoding an illegal sequence back to itself is plain stupid. (Note to oiaohm: this is why it doesn’t work any too well with any other decoding specified. Maybe it would, if iconv included an -rfc2279 switch, but for some unaccountable reason it does not and therefore lacks your notion of “backward compatibility.” The humanity!) Oh, and he copied his example from the Web, which meant that he got the iconv switches completely wrong.

    Not much of an expert with the CLI after all, I suspect.

  55. DrLoser says:

    Once more round the mulberry bush.

    That Exploit Guy the simplest reason is backwards and forwards compatibility is why the two sizes UTF-8 exist.

    No, only the one standard for UTF-8 is presently operational: RFC3629. What part of the following do you not understand?

    Obsoletes: 2279

    I’ve even included a link so you can look up the definition of “obsolete.” Interestingly, the third definition (“vestigial; rudimentary”) applies to your understanding of Unicode, ISO, and encoding/decoding formats. Amongst other things.

    Congrats, oiaohm: you’re obsolete!

    I was delighted to see you take me on with UTF-EBCDIC, though. This, from TR16:

    UTF-EBCDIC is intended to be used inside EBCDIC systems or in closed networks where there is a dependency on EBCDIC hard-coding assumptions. It is not meant to be used for open interchange among heterogeneous platforms using different data encodings. Due to specific requirements for ASCII encoding for line endings in some Internet protocols, UTF-EBCDIC is unsuitable for use over the Internet using such protocols. UTF-8 or UTF-16 forms should be used in open interchange.

    Meaning that you have just wasted two paragraphs or so attempting to criticize a use case that never existed in the first place.

    Now, I brought up UTF-EBCDIC in an attempt to help you with those pesky untranslatable EBCDIC characters? Remember them? You got all frothy about them until I claimed that I can refute any argument (on EBCDIC characters) you care to make. Then you went quiet. Want to revisit the subject?

    I do like the idea of a file system that “detects” the encoding of a file stream, however. Wherever did you come up with that little fantasy? Nothing in glibc or in M$ APIs lends the remotest support to such a preposterous notion, although I guess you could use metadata, a la Mime-Types. At which point it’s not remotely under the control of the file system, is it?

  56. That Exploit Guy says:

    The agreement to reduce UTF-8 for UTF-16 in 2003 was too late.

    Again, nothing was too late, as no blocks were present beyond U+10FFFF even back then. The concept of surrogate pairs had already been introduced in 1996 with Unicode 2.0, and those responsible for the Unicode Standard were careful enough to keep character assignment below scalar value 0x110000.
    Marked: U

    You see people complaining from time to time when there ported program to a Unix/Linux/BSD explodes because someone has a 5 or 6 byte UTF-8 char on file-system or that iconv does not filter them out.

    The only thing that would make a computer literally explode is a stick of dynamite.
    An ill-formed byte sequence, on the other hand, would not cause a computer to explode: it would simply be rejected as per standard.
    Marked: U

    You learn to ask iconv for RFC3629 on Unix systems if you are sending it to windows.

    Perhaps you would like me to try out iconv on Cygwin, then?
    Let’s pick the letter “A”. In both ASCII and proper UTF-8, that’s 41 in hex code. In malformed 6-byte UTF-8 form, on the other hand, it’s FC 80 80 80 81 81.
    Simple enough? You bet it is.
    Now let’s see what Cygwin comes up with when converting the input byte sequence “41” (stored in wellformed.txt) from UTF-8 to ASCII using the command:
    iconv -f UTF-8 -t ASCII
    Result:
    “A”
    Well, that works. Now, let’s try the command again, but this time with the input byte sequence “FC 80 80 80 81 81” (stored in malformed.txt).
    Result:
    “iconv: malformed.txt:1:0: cannot convert”
    So, no love for our malformed “A” on Cygwin. How about an RHEL clone (Scientific Linux 6.5), then?
    Result for “41”:
    “A”
    Result for “FC 80 80 80 81 81”:
    “illegal input sequence at position 0”
    Still no love for our malformed byte sequence. Pah, screw Linux! Let’s try Solaris (Solaris 10 1/13).
    Result for “41”:
    “A”
    Result for “FC 80 80 80 81 81”:
    “iconv: conversion error.
    Conversion error detected while processing malformed.txt”
    That didn’t go well at all. But how about a more proper 6-byte byte sequence, say, that for 0x7F000000 (“FD BF 80 80 80 80”), on all three platforms?
    Cygwin:
    “iconv: 7F000000.txt:1:0: cannot convert”
    RHEL clone:
    “illegal input sequence at position 0”
    Solaris:
    “iconv: conversion error.
    Conversion error detected while processing 7F000000.txt”
    How about 0x7FFC0000 (“FD BF BF 80 80 80”), then?
    Cygwin:
    ditto
    RHEL clone:
    ditto
    Solaris:
    ditto
    So, there you have it: iconv refuses to treat the malformed byte sequence as the letter “A” on all three platforms. What’s more, it rejects 6-byte sequences even when they are “proper”. If you so wish, I can even write a script just to run through the entire [0x4000000, 0x7FFFFFFF] range. However, if it turns out (in all likelihood) that none of the scalar values are considered valid by iconv, I am going to charge you by the second for the entire effort.
    And I am not kidding.
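
    (For completeness, this is roughly how the bytes in malformed.txt were arrived at: stuff the scalar value of “A” into the obsolete six-byte layout, which is exactly the sort of overlong encoding RFC 3629 exists to outlaw. A quick sketch, nothing more:)

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        uint32_t v = 0x41;                      /* 'A' */
        unsigned char b[6];

        /* Obsolete RFC 2279 six-byte layout: 1111110w 10xxxxxx x5.
         * Deliberately overlong: a conforming encoder must use 1 byte. */
        b[0] = 0xFC | ((v >> 30) & 0x01);
        for (int i = 1; i < 6; i++)
            b[i] = 0x80 | ((v >> (6 * (5 - i))) & 0x3F);

        for (int i = 0; i < 6; i++)
            printf("%02X ", b[i]);              /* prints FC 80 80 80 81 81 */
        printf("\n");
        return 0;
    }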

  57. That Exploit Guy says:

    Beaker the Muppet wrote:

    The problem here all starts in 1995. With “File System Safe UCS Transformation Format” by X/Open.
    Plagiarism, particularly words lifted wholesale from the “UTF-8” article in Wikipedia, deserves zero marks.
    Marked: U

    At this point all new Posix OS file systems used this UTF-8 this is 6 byte UTF-8.

    Again, characters are ultimately what we are trying to encode here. Also, again, you have failed to take into account that at no point has any character existed in the “6 byte” range (or the scalar value range [0x4000000, 0x7FFFFFFF]).
    This isn’t a creative writing exercise, and claims of any Santa Claus visit must be accompanied by evidence of Santa Claus’ existence. Likewise, claims of the use of any fantasy characters represented by a known-to-be-unused scalar value range in well-known operating systems must be supported by evidence of the existence of such fantasy characters.
    Marked: U

    Yes RFC3629 is marked as obsoleting RFC2279 but its does not obsolete “File System Safe UCS Transformation Format”

    Poorly disguised plagiarism aside, even as your plagiarised source points out, the only version of FSS-UTF that saw actual usage was the one as modified by Thompson et al. In other words, it is just UTF-8 as we know it today.
    This brings us all the way back to what we already know first and foremost: there are no known corresponding characters above the scalar value 10FFFF. In other words, RFC2279 or not, you are still not giving any compelling reason as to why anyone should care about a UTF-8 byte length that should not exist anywhere.
    Marked: U

    ISO/IEC 10646:2012 does not state you cannot use the extra value space of other things other things just you will no longer be well-formed.

    Putting aside the absence of known characters above U+10FFFF (privately defined or otherwise), and as you have been told before, ill-formed byte sequences are considered non-conformant in both ISO 10646 (ISO) and the Unicode Standard (Unicode Consortium). This means that by (inexplicably) insisting on non-conformant, 6-byte sequences, you also miss the very reason to follow an established standard: interoperability.
    This is not to mention that ill-formed byte sequences contradict the RFC 2279 requirement that “there is only one valid way to encode a given UCS-4 character”.
    Marked: U

    So NFS4 must remain 6 byte even that RFC2279 is Depecated

    Even as the very person maintaining the Linux kernel NFS module tells you, the Linux NFS client is implemented such that file names given to it are simply passed to the server as-is, because *nix file systems, particularly those of Linux, tend not to enforce any particular encoding. This makes any encoding standard practically unenforceable in NFS.
    Sometimes I wonder if you pay attention to anything said to you at all.
    Marked: U

    There was a high end PUA space.

    Code points beyond BMP were first introduced to the Unicode Standard in 2001 with the “highest” block located in the scalar value range [0x100000, 0x10FFFD]. The block was formally renamed “Supplementary Private Use Area-B” in 2002 and extended to scalar value 0x10FFFF.
    No block beyond 0x10FFFF has ever existed in the recorded history of the Unicode Standard.
    Marked: U

    This is also future safe.

    The future, as it stands, is a space 1,114,112 values large and mostly consists of unassigned code points.
    The present, on the other hand, is a vast collection of characters that comprehensively includes even such oddities as the Egyptian Hieroglyphs ([0x13000, 0x1342F]) and the Sumerian Cuneiforms ([0x12000, 0x1237F]).
    Really, is “future-proof” the best you can come up with as your argument for “6 byte UTF-8”?
    Marked: U

    So after 1995 you are making a new Posix file-system its UTF-8 6 byte encoding for file-names nothing else as per what was agreed.

    Again, no corresponding characters mean no “UTF-8 6 byte encoding”, “Posix” or otherwise.
    Marked: U
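
    (And since “well-formed” keeps being waved about as though it were a matter of taste, the whole RFC 3629 test fits in a dozen lines. A sketch only; no claim that any shipping iconv implements it exactly this way:)

    #include <stdbool.h>
    #include <stdint.h>

    /* RFC 3629 in one predicate: v is the decoded scalar value,
     * len is the number of bytes it arrived in.                         */
    bool rfc3629_ok(uint32_t v, int len)
    {
        if (v > 0x10FFFF)                 return false;  /* beyond Unicode    */
        if (v >= 0xD800 && v <= 0xDFFF)   return false;  /* UTF-16 surrogates */

        /* Shortest-form rule: the value dictates the only legal length. */
        int need = v < 0x80 ? 1 : v < 0x800 ? 2 : v < 0x10000 ? 3 : 4;
        return len == need;               /* 5- and 6-byte forms never pass  */
    }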

  58. oiaohm says:

    The problem here all starts in 1995. With “File System Safe UCS Transformation Format” by X/Open. This is in fact defined in c code how it should be implemented and its highly tolerant decoder. At this point all new Posix OS file systems used this UTF-8 this is 6 byte UTF-8.
    “File System Safe UCS Transformation Format” and RFC2279 are close enough to the same that you can use the names almost interchangeably with each other.

    Yes RFC3629 is marked as obsoleting RFC2279 but its does not obsolete “File System Safe UCS Transformation Format”. NFSv4 is define in Posix terms and RFC terms. Welcome to trap. So even that RFC NFS define uses RFC2279 there is another written using posix terms it is FSSUTF using. Define of protocol cannot go out of alignment. So NFS4 must remain 6 byte even that RFC2279 is Depecated or it will go out of alignment with the Posix documents.

    That Exploit Guy the simplest reason is backwards and forwards compatibility is why the two sizes UTF-8 exist. You cannot make stuff smaller without harming backwards compatibility when reading older filesystems that someone might have used those chars. So posix filesystem UTF-8 is 6 bytes even if there are no currently define Unicode chars or PUA space. There was a high end PUA space. This is also future safe. When the 21 bits do get used up Posix conforming file-system tools will not have to change. Just to be fun Posix is a ISO standard both are ISO. I really don’t know what UTF-16 implementation are going todo when the 21 bits get filled.

    ISO/IEC 10646:2012 does not state you cannot use the extra value space of other things other things just you will no longer be well-formed. RFC3629 states you cannot use the extra space.

    The proper filename encoding in *nix is usually left for individual filesystems to decide.
    That Exploit Guy you made a mistake to think that file name encoding has been left up to the filesystems makers to decide. That ended for Posix based OS’s in 1995. So after 1995 you are making a new Posix file-system its UTF-8 6 byte encoding for file-names nothing else as per what was agreed. Legacy support is left up-to the file-system driver same with alien non posix file-systems. NTFS and FAT and exfat are alien to Posix world these don’t support all features.

    Reason why unique per file-system encoding ended is a practical one. Every extra encoding type you have to support is more code you have to put in kernel space more bugs. Everyone has in fact come into alignment except for Microsoft. With the number of Posix file-systems you would hate if they all had different encodings.

    I really do mean everyone bar Microsoft is using UTF-8 6 byte in their new file systems.

    This is why there is a problem. The idea of making stuff smaller down the track does not work. The agreement to reduce UTF-8 for UTF-16 in 2003 was too late. Too much was already set in stone. The keystone agreement is 1995 and changing that agreement is to the point of impossible. Now if 2003 happened to go the other way 21 to 31 bits we would not be in this mess now with 2 different UTF-8 defines as it would not have been a backwards compatibility breaking move.

    Being aware of the keystone is important. You see people complaining from time to time when there ported program to a Unix/Linux/BSD explodes because someone has a 5 or 6 byte UTF-8 char on file-system or that iconv does not filter them out. You learn to ask iconv for RFC3629 on Unix systems if you are sending it to windows. UTF-8 Unix or Linux or BSD is most likely 6 byte. UTF-8 sent to windows better be RFC3629 or have users complain.

    UTF-EBCDIC is for-bin just to be hell on posix file-systems as file-names. Yes you do get some crappy closed source EDCDIC applications that use control chars in filenames as stupid forms of copy protection. UTF-EBCDIC is great when you want to put data inside files as long as you don’t forget as it detects as UTF-8 then attempts to decode as non EDCDIC yes after a while UTF-EDCDIC opened as normal UTF-8 and messed up you get use to spotting.

    If the problem was a simple as data storage UTF-EDCDIC would cut it. They complain about me not being in the real world at times. The reality both of you need to back off and do your homework. Sometimes it just not possible to change things. UTF-8 to be only 4 bytes for everything is just not possible. Attempting that just creates conflicts.

  59. DrLoser says:

    RFC3639 is only for web transport not for file systems.

    Good to know that you appreciate that, oiaohm. Considerations on the use of a Service Identifier in Packet Headers is, indeed, devoted to the Transport Layer, although I wouldn’t be as precipitate as you and argue for its exclusive use in a “Web” context. We can talk about these things later and at our leisure.

    Just as an aside, though, when you Googled this RFC, didn’t it actually occur to you that it was completely irrelevant to the matter at hand?

    TEG is correct. “U”.

  60. DrLoser says:

    And for the benefit of the many people on this site who appreciate a FAQ-style exposition:

    ISO10646:2012 does not forbid the usage of RFC2279 encoders/decoders.

    Q. Does ISO10646:2012 forbid the usage of RFC2279 encoders/decoders?
    A. Hardly. Why would it? It doesn’t forbid the usage of Magic I-Spy Dick Tracy Encoder Rings, either.

    The only minor issue here is that such a thing would be completely pointless. If the other end has no standard (or locally agreed as acceptable) way to decode this nonsense, then why bother?

    There isn’t any plausible reason for ISO10646:2012 to forbid the use of an RFC2279 encoder/decoder, oiaohm.

    ISO10646:2012 defines a set of code points and acceptable transforms: that is all. Umpty-byte “extended” UTF-8 nonsense is not really a concern, unless you’re a fantasist.

    But you’re the expert here. Show us a single code point that is in any way different when using RFC2279 and RFC3629.

    Just one. Go ahead — make our day.
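
    To save you the googling, here is an encoder in sketch form. It follows the RFC 3629 rules, but for every scalar value it accepts, the bytes it produces are byte-for-byte what the RFC 2279 rules would produce too (surrogates excepted, which a strict encoder rejects anyway); the newer RFC only narrows the accepted range. Run any code point you like through it.

    #include <stdint.h>

    /* Returns the number of bytes written to out, or 0 if the value is
     * outside what RFC 3629 allows. (A strict encoder would also refuse
     * the surrogate range D800..DFFF.)                                   */
    int utf8_encode(uint32_t v, unsigned char *out)
    {
        if (v < 0x80)      { out[0] = (unsigned char)v; return 1; }
        if (v < 0x800)     { out[0] = 0xC0 | (v >> 6);
                             out[1] = 0x80 | (v & 0x3F); return 2; }
        if (v < 0x10000)   { out[0] = 0xE0 | (v >> 12);
                             out[1] = 0x80 | ((v >> 6) & 0x3F);
                             out[2] = 0x80 | (v & 0x3F); return 3; }
        if (v <= 0x10FFFF) { out[0] = 0xF0 | (v >> 18);
                             out[1] = 0x80 | ((v >> 12) & 0x3F);
                             out[2] = 0x80 | ((v >> 6) & 0x3F);
                             out[3] = 0x80 | (v & 0x3F); return 4; }
        return 0;          /* RFC 3629: nothing legitimate to encode up here */
    }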

  61. DrLoser says:

    I’m game for this, oiaohm. I think that TEG is being completely unfair by assigning a grade of “U” to you. I’m sure you can improve on your current record of failure (“F”) in every single instance.

    There are so many, though. RFC2279?
    RFC 3629 has this to say:

    Obsoletes: 2279

    You can’t even read four miserable little lines into an RFC, can you?

    iconv? Well, I’ve tried to defend it, but you have wielded the Sword of Truth and pointed out that, in fact, it allows non-standard UTF-8 encodings whilst still returning success (0) in $?.

    Not especially useful, oiaohm, considering that it won’t translate to any other decoding whatsoever. Apart from UTF-8 (whoopee!), which, as it transpires, coughs up the input as an output.

    Don’t really see the value in that, myself.

    And there are always those obscure EBCDIC control characters that for some reason need to be transported across a network and translated into Unicode code points.

    Hilarious. Remind me again which ones have ever been necessary to anybody at all?

    I’m going to be very cruel here and point you at UTF-EBCDIC.

    Have fun!

  62. That Exploit Guy says:

    The Swedish Chef wrote:

    go read NFS 4 look for 2279 and 3639 and you will find only RFC2279 is part of the standard in other words 6 byte long utf-8. You will find RFC3639 is not linked to any Posix native file-system or cluster filesystem or network filesystem. All are linked to RFC2279 or old ISO or why current ISO still includes instructions for 6 byte UTF-8.

    NFS you say? Sadly:
    1) You still haven’t been able to explain why scalar values 0x110000 and above should be considered valid code points in the context of Unicode character encoding, and this is given the obvious facts that:
    a) Most code points in the range [0x0, 0x10FFFF] outside of the PUAs are currently unassigned. In other words, the current ISO 10646 standard has hardly exhausted even a measly space of 1114112 code points, and whatever reason you give for needing 2^31 code points, given this reality, can best be described as completely bogus.
    b) Neither RFC 2279 nor RFC 3629 defines any code-point-to-character mapping (as this is strictly the territory of ISO 10646 and of the Unicode Standard). In other words, citing RFC 2279 simply won’t help with your case even one bit.
    2) The proper filename encoding in *nix is usually left for individual filesystems to decide. This is why the NFSv4 implementation for Linux does not filter or manipulate filename encoding at all.
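
    (Point 2 is trivial to verify for yourself: to the kernel, a filename is just a string of bytes with only '/' and NUL off-limits, so nothing stops you creating a name containing the overlong six-byte “A” from earlier. A sketch, assuming an ordinary Linux filesystem that does not enforce UTF-8:)

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        /* A filename containing the overlong six-byte "A" from earlier.
         * The kernel only forbids '/' and NUL; encoding is not its business. */
        const char *name = "junk-\xFC\x80\x80\x80\x81\x81-junk";

        int fd = open(name, O_CREAT | O_WRONLY, 0644);
        if (fd < 0) { perror("open"); return 1; }
        puts("created; ls will now show whatever your locale makes of it");
        close(fd);
        unlink(name);                       /* tidy up */
        return 0;
    }
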
    Your score this time?
    Marked: U
    By TEG

  63. oiaohm says:

    http://savannah.gnu.org/bugs/?37857 this bug report is 2012.
    http://www.rfc-base.org/txt/rfc-3530.txt go read NFS 4 look for 2279 and 3639 and you will find only RFC2279 is part of the standard in other words 6 byte long utf-8. You will find RFC3639 is not linked to any Posix native file-system or cluster filesystem or network filesystem. All are linked to RFC2279 or old ISO or why current ISO still includes instructions for 6 byte UTF-8.

    Since you build it from scratch go read the utf8.h header file in iconv it clearly states RFC2279 is in use with 5-6 bytes but its modified to forbid decoding of particular classes of chars that RFC2279 decoder should decode.

    RFC2279 default or slightly modifed is also used by OS X, AIX, HPUX… as what UTF8 is in iconv.

    ISO10646:2012 does not forbid the usage of RFC2279 encoders/decoders.

    RFC3639 is only for web transport not for file systems. That GNU iconv bug I can find OS X users, AIX and BSD complaining about the same fault. Yes the surprise of a 5 or 6 byte utf8 char from their file-systems..

    NFS in fact uses RFC2279. Linux cluster file-systems RFC2279. The bug in GNU iconv will not be fixed because default being RFC2279 is not a bug.

    GNU Iconv has not had major alterations from 1999 still operates the exactly the same way. Large sections of glibc have not change in 15+ years. And the fact the documentation has not been updated in 15 years shows that the openssl problem is not kinda localised.

    DrLoser the reality is RFC3639 is very poorly adopted. RFC2279 is in fact very broadly used. The dominate are UTF-16 microsoft style and RFC2279

    I have found RFC2279 owns to in posix
    “File System Safe UCS Transformation Format (FSS_UTF)” This is what RFC2279 is called in posix standard kinda. And what OS that claim posix conforming after 1991 have to support(yes reason why windows is does not claim to support posix newer than 1991). If your file system is posix conforming using UTF-8 it uses FSS_UTF or utf-8 6 byte in length. This is not a standard effected by any alteration done by the RFC body or ISO body or Unicode body.

    There is no RFC3639 in Posix standards. The memo was never added to anything from the open group.

    Sorry reality is using RFC3639 is not Posix conforming. UTF-8 Posix is FSS_UTF and the reason why you don’t use FSS_UTF but an RFC number is that FSS_UTF in posix standard can be RFC 2044 or RFC 2279 yes they are slightly different.

    Something new built today might claim to be Posix:1993 so that case it will be producing RFC2044 not RFC2279.

    IEEE Std 1003.1-2008 Posix os are all using RFC2279 as UTF-8 one of the reasons why Linux system is mostly is the fact GNU UTF-8 is not pure RFC2279.

    http://www.sunchangming.com/docs/unicode/xopen_utf-8.pdf

    xopen utf8 or in other-words the UTF8 you are meant to use when implementing anything with a posix in its name.

    Get the problem yet. TEG had it wrong. Badly wrong. Posix is also a ISO standard. There are disagreements still on going what UTF-8 is.

    Yes the last time posix utf-8 was update was 1995. The 1995 implementation is what all the posix world agreed to use. There has been no posix world agreement to use RFC3639.

    If it not broken why change it. FSS UTF8 or xopen utf8 decodes RFC3639 without issue. Since all posix system use the longer UTF-8 form its not a issue for those to use it between themselves.

    The UTF-8 difference is why hooking up Windows to Posix systems is so hard. Posix systems have to down grade self.

    Also rfc2279 is allowed in sections of android and forbid in others. Some of android is rfc2279 and some is rfc3639 depending on who was developing it.

    Apache portable runtime at the core of android supports 6 byte utf-8 to UCS-4 of course forbids to UCS-2.

    Welcome to the world of posix os nothing is consistent.

    The number of companies not fully accepting RFC3639 is long.

  64. DrLoser says:

    You’ve spent the best part of six years being a miserable and incompetent failure, oiaohm.
    You don’t understand encodings. You don’t understand B-trees. You don’t understand the difference between a compile-level static instruction set (MSIL) and something that requires (generally, I have to admit) a safe re-cast (Java byte codes). You don’t understand hard real-time. I suspect you don’t have a clue about soft real-time, although I warn you I might get around to a happy little conversation about, say, GPU parallelisation and/or event-driven systems. Beware!

    But, let’s face it: you’ve wasted about a hundred posts on the specific subject of UTF-8, and I’m not going to cost that out precisely, but to be honest it wouldn’t pay you to do that again unless you were unemployed.

    Because the end result is slightly less than zero.

    As always, oiaohm, you have profusely embarrassed yourself.

  65. DrLoser says:

    I hate when I am suffering from word swap. 31 bit on and I had typed 31 byte.

    Not really one of your several distinctly disadvantageous weaknesses, oiaohm. However, let us continue. I’m not done with your marks yet.

    There is big reason why I said you should be using RFC or ISO numbers.

    Funny, that. You don’t use them. RFC3629 and ISO10646:2012 are the current standards. Everybody else but you accepts them.

    Marked: F.

    Its a simple reality you don’t understand iconv.

    Hilarious. I’ve built it from scratch (on FOSS principles). What have you ever done, besides googling it?

    Marked: F.

    Non destructive encoding means you have the data to restore back exactly what you received. Non destructive encoding does not mean that anyone else has to be able to decode it other than you. This is the same as lossless compression.

    No it is not.

    Lossless compression is essentially E(x) -> y followed by D(y) -> x. I cannot put it any simpler than that. Encode, decode, that’s it. If you cannot apply the decoding transform D(y) -> x, this is not non-destructive encoding.

    I don’t care which side of the putative wire you’re on.

    Marked: F.

    Once you have chars that don’t encode into standard and have to use PUA you are basically in stuffed area anyhow.

    Sadly, you have no clue why anybody would use the PUA. (It’s nothing to do with EBCDIC, incidentally.) Let me spell this out for you:
    Private Use Area. Use it as you will. Just don’t expect encode-decode to work for you, outside your particular domain of “private use.”

    Your link to Gnu iconv implementation? Splendid, but I must warn you that “The evaluation is based on the current state of the development (as of January 1999)” doesn’t inspire confidence in your beliefs. (Gnu has moved on in the last fifteen years.)

    Your link to the basic iconv bug? Sadly, it makes it clear that I, the idealist who believed that iconv is the canonical representation of a Posix UTF-8 standard, have been confounded by the idiocies of people who wrote the software and can’t even be bothered to get it right.

    Your words, oiaohm. Your words. Not mine.

    Look, I tried. I think that Gnu/Linux and Posix and whatnot make a decent guess at standard UTF-8 … and now you’re telling me that they don’t.

    Robert will not be pleased with you, and for good reason.

    Not really my problem, is it? You have exercised all four freedoms.

    Apart from the “I will send the obvious fix to the guys at Gnu.”

    I will do that. You will not. Why not?

    Because, oiaohm, you are an ignorant free-loading fraud.

    Marked: F.

    As a postscript, check out RFC3629 one last time. Or, indeed, for the first time:

    This memo obsoletes and replaces RFC 2279.

    You just don’t have a clue what you’re talking about, do you, oiaohm? Oops, almost forgot:

    Utf8 on Linux is not always what you expect. Yes its acceptable to use 6 byte utf-8 in file-names on linux.

    No it is not.

    A Mr Peter Dolding made this insane suggestion to “improve” the Posix file-naming convention about four or five years ago. He was treated with courtesy by the Posix guys, up to and including an explanation from the world of encoding/decoding (basically Shannon’s Law) as to the precise details of how this is a remarkably silly and thoughtless suggestion.

    Have you ever met a Mr Peter Dolding, oiaohm?

    Marked: F.

  66. oiaohm says:

    I hate when I am suffering from word swap. 31 bit on and I had typed 31 byte.

  67. oiaohm says:

    “EO” Eight Ones assigned to hexadecimal code 9F in cross mapping to ascii from unicode. EDCDIC-unicode removes the first 128 chars (ascii) and inserts the 160 EDCDIC chars. Yes 160 is what is left of the 255 EDCDIC space when you remove all the unused. IBM method is also repeated with all ascii starting encodings.

    GNU UTF8 is 31 bit even in 2012. Posix file systems utf-8 expect it. The utf-8 to be used on filesystems was defined before 2003 and has not been changed.

    Before 2003 a PUA section was still at the top end of the Unicode encoding. 31 bit set was the PUA space. Yes no officially defined chars past but there was Private usage chars.

    100000–​10FFFF PUA is where the old 31 bit PUA moved to when it was reduced to 21 bit. This is another trap. Old files with old PUA have chars large than the 2003 limit as well. Then end of the encoding of unicode is where the PUA traditionally has been even that the numbers are vastly different.

    iso10646 still defines UTF-8 and UTF-32 as 31 bit. Not 21 bit locked. Just everything past 21 bit currently left unassigned and for other usage.

    non destructive encoding means you have the data to restore back exactly what you received. Non destructive encoding does not mean that anyone else has to be able to decode it other than you. This is the same as lossless compression.

    Reason for not using standards is standards leave you screwed. PUA is not an option when the PUA space is already been giving to the application coder to use how they so choose.

    Besides by your statements unicode is a destructive standard because it include PUA sections for people to uniquely encode stuff how ever they so please and hopefully not conflict with each other. So please tell me what should I use instead of Unicode since by your statement sending PUA Unicode is destructive.

    Once you have chars that don’t encode into standard and have to use PUA you are basically in stuffed area anyhow. All you can do is encode them so you can reverse it. Non destructive encoding is last option. Knowing the limits of valid encodings is also important so you can protect your extra chars from being destroyed as much as possible.

  68. oiaohm says:

    DrLoser I have already pointed out the problem. GNU Iconv define of UTF-8 is in fact not independent or set in stone or conforming.

    http://www.gnu.org/software/libc/manual/html_node/glibc-iconv-Implementation.html

    There is big reason why I said you should be using RFC or ISO numbers. UTF-8 in default Linux iconv is defined by a module. A changeable module. Reality here if you are presuming UTF-8 from iconv on Linux/Unix/BSD is something and you have not done test cases for it you are asking for it. Really old UTF-8 in GNU iconv and old Linux and old Mysql will reject any UTF-8 longer than 3 bytes utf8mb3 . Yes there is Ubuntu release where iconv happens to be utf8mb3 due to someone pulling in a old patch and no one noticed.

    Canonical Linux definition not exactly correct its Canonical Linux current default setting. To be correct is default hazard GNU UTF-8.

    http://savannah.gnu.org/bugs/?37857
    Yes iconv UTF-8 Default GNU is halfway between standards some cases allowing stuff some case forbidding stuff. RFC3629 and RFC2279 mixed into one is GNU iconv UTF-8. Not to standard anything.

    DrLoser unlike you I know how iconv on Linux works. How its changeable. iconv from GNU cannot be trusted to certify anything unless you have used your own modules exactly to standard or request modules exactly to a particular standard. UTF-8 request is pot luck.

    Yes utf8.h inside iconv GNU is kinda 6 byte RFC 2279. Note TEG claimed 4 byte was UTF-8. utf-8 to utf-8 iconv accepts without question 6 byte. Of course GNU UTF-8 rejects packed out null and chars that should be acceptable by RFC2279 but is to be rejected by RFC3629. Now if you were using a RFC2279 module on your iconv instead of GNU pot luck UTF-8 the modified UTF-8 would process.

    Its a simple reality you don’t understand iconv. TEG would have the hell of what the heck is going on when 5 and 6 byte utf-8 turns up that GNU iconv thinks is fine as utf-8.
    printf '\xfC\x90\x80\x80\x80\x80' |iconv -f utf8 -t utf8 and printf '\xfD\x80\x80\x80\x80\x80' |iconv -f utf8 -t utf8 The last one is byte 31 on.
    Yes this is acceptable to gnu iconv using the default utf8 to utf8 conversion. Interesting enough it will reject you attempt to turn that into utf32.

    Utf8 on Linux is not always what you expect. Yes its acceptable to use 6 byte utf-8 in file-names on linux.

    Reality working with utf-8 request a standard or expect strangeness. GNU UTF-8 and GNU iconv is pure strangeness and will get you into trouble.

  69. DrLoser says:

    Correction: N-1.

    oiaohm’s test case for NUL in Modified UTF-8 actually fails when piped through iconv.

    I was wrong, and oiaohm did indeed provide a single test case. He has yet to explain why it fails when matched against the canonical Linux definition of UTF-8, though.

    And I rather doubt that he ever will.

  70. DrLoser says:

    And when I say that “for these purposes, a Linux standard is at least as good as any other,” oiaohm, may I point out that, unlike you, I have gone to the trouble of building the standard Linux tool to encode and decode UTF-8 (iconv, for newcomers), and I have specified N test cases, where N is greater than the number of test cases that you have provided by, well, N, and I am therefore in a position to state that Linux conforms to what TEG defines as standard UTF-8.

    And that you are not.

    Big Huge Whopping Fail again.

  71. DrLoser says:

    Ah well. First, a small joke, dedicated to Robert (who like me has had to work with EBCDIC): do you know what the EBCDIC control character “EO” stands for?

    It stands for “every bit is one.” I’m not kidding you. It’s a real, but obscure, EBCDIC character.

    I believe your homework is due for marking, oiaohm. (Again, Robert as a teacher will appreciate this.)

    From IBM point of view Unicode is not Ascii.

    Still confusing code points with encoding, I see. A straight Fail on this one.
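
    For the benefit of the peanut gallery, the distinction in one tiny sketch (hypothetical values, nothing to do with EBCDIC): a code point is the abstract number, U+00E9 here, and the encodings are merely different byte layouts that carry it.

    #include <stdio.h>

    int main(void)
    {
        /* One code point, U+00E9 ("é"), in three different encoding forms. */
        const unsigned char utf8[]  = { 0xC3, 0xA9 };             /* UTF-8    */
        const unsigned char utf16[] = { 0xE9, 0x00 };             /* UTF-16LE */
        const unsigned char utf32[] = { 0xE9, 0x00, 0x00, 0x00 }; /* UTF-32LE */

        printf("code point U+00E9, three encodings: %zu, %zu and %zu bytes\n",
               sizeof utf8, sizeof utf16, sizeof utf32);
        return 0;
    }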

    Not everything encoded in UTF-8 is Unicode.

    See previous mark. You’re determined to fail hard, aren’t you?

    Old encoding to Modern in a lot of cases is destructive as well losing control char data. Visable chars have places in Unicode.

    Meaningless gibberish. As Wolfgang Pauli would have put it, this isn’t even meaningful enough to constitute a Fail.

    When you need non destructive you are forced to leave standards.

    No: when you need “non destructive,” you use a standard. (For these purposes a Linux standard is at least as good as any other.) If you send “non standard” stuff to somebody else, that is per definitionem destructive.

    Big, Huge Fail. But there’s still a chance to redeem yourself.

    Before 2003 unicode was 31 bit in size and unable to be encoded into UTF-16.

    Oh dear. You’re not paying attention, are you? You’re still thinking about ISO 10646, which for the sake of the present discussion I will equate with UTF-32 (or UCS-4 if you will).

    I’ll sort of give you a D- here, however, because the 2012 standards (even the PUA) only go up to U+10FFFD code points. You still can’t tell a code point from a random encoding, but I’m happy to admit that neither Unicode nor ISO10646 go up to 2 billion code points.

    There’s no need, as TEG has repeatedly pointed out. And you haven’t listened once, so I’m going to downgrade you to an E-.

    Is there anything stopping Unicode in future returning to its prior UTF-16 incompatible define. In fact nothing if Microsoft losses is market dominance.

    So, you’re saying that Microsoft is the only defence we have left?

    I can hear Robert sputtering in the background (quite rightly so, in this case), so I’m going with a mark of U for Unclassified, because this is Beyond Failure.

    And there is no official space in Unicode to place edcdic unique char into.

    I encourage you to examine this proposition. (I must warn you that I have done extensive research and am well armed with EBCDIC-related defences.)

    Both Robert and I have programmed in EBCDIC. We know what we’re talking about. You? Not so much.

    Overall grade, oiaohm: Total Failure.

  72. oiaohm says:

    https://gigaom.com/2014/02/23/microsoft-qualcomm-offer-a-windows-phone-design-in-a-box-to-handset-makers/
    This is Microsoft new attempt at creating a new wintel.

    Mind you the unicode mess and what is going on here is all interlinked.

    Before 2003 unicode was 31 bit in size and unable to be encoded into UTF-16. Why was the selection to go to 21 bit done. Reality is that Microsoft was dominate. Why was UTF-16 allowed over Internet. Because Microsoft servers were already doing it. Is there anything stopping Unicode in future returning to its prior UTF-16 incompatible define. In fact nothing if Microsoft losses is market dominance.

    PC hardware the Linux world took into the Server room displacing custom mainframe hardware. Why because it was produced in huge volume so cheap. History is repeating Linux is taking arm into the server room this time because its being produced in the large volume.

    wintel is on the rocks not sunk yet but even Microsoft CEO admits its on the rocks with huge number of users out their who will never own a PC or will never have any reason to own a PC. Microsoft is trying to form a new like wintel with Qualcomm. Even Microsoft is not going to be producing any x86 phones.

    Intel is going to fight with x86 until the bitter end. If intel goes arm it means having to pay another company licensing.

    At the moment we are fairly much watching the complete thing come apart at seams.

    qualcomm thing is mostly because the other hardware vendors are not coming on board. qualcomm also makes those same chips for Android/Linux. So Android and Linux has access to everything qualcomm makes + everything everyone else makes as well.

  73. oiaohm says:

    Talking of homework, oiaohm: how’s your homework on Unicode going?
    I had. Ascii and EBCDIC are not compatible with each other. From IBM point of view Unicode is not Ascii. So welcome to hell.

    Not everything encoded in UTF-8 is unicode.

    I had done my homework. There is no lossless way to convert between edcdic and ascii. And there is no official space in Unicode to place edcdic unique char into. Result is there are a stack of solutions on the edge of standards. Interesting enough Linux can handle these. Of course you have to know how to set stuff up.

    Sorry the problem was that I had done my homework then had got forgetfull in the unicode case.

    Does not change the fact what the exploit guy was calling UTF-8 was in fact not UTF-8. Sub form of UTF-8 yes.

    EBCDIC control chars is not the only thing that the unicode standards body has not bothered dealing with. Old encoding to Modern in a lot of cases is destructive as well losing control char data. Visable chars have places in Unicode.

    I was also remembering what I had seen in the field. I now have full memory why evil messes exist. When you need non destructive you are forced to leave standards.

  74. BTW, I was watching Al Jazeera this afternoon and noticed they had a news story about the total compromise of that other OS through IE… Gee. Global coverage by an Arab news source. I think M$ is top of mind and not in a good way.

    A consequence may be that XP may have zero security from now on because IE is built-in and M$ will no longer update XP, as far as I know. That’s 100+ million hosts for the taking…

  75. DrLoser says:

    Talking of homework, oiaohm: how’s your homework on Unicode going?

  76. oiaohm says:

    wolfgang of course loves walking into traps because wolfgang loves doing no homework.

    Second generation clones of the raspberry pi like the banana pi appear in completed products. Zao Pl mini-PC and Zao Intelligent Desk. Yes Zao Pl mini-PC is just place in standard raspberry pi case. Yes by all shock horror the raspberry pi has become a form factor. Zao Intelligent Desk is more interesting.

    About time you shut-up on this topic as well wolfgang until you learn todo some homework. The reality what is happening in with the descendants Raspberry Pi is forcing a problem on the x86 market.

    The HummingBoard is very interesting as it also the first of the boards out there to include mini PCI-Express and replaceable CPU. And its expected to be a 50 dollar board. This is the problem in the arm market the price of motherboard includes cpu and ram.

    Current entry level x86 has PCI-express crippled to x4 speed even if they have bigger PCI-express ports. Current arm hardware to public has it has PCI-express x1 ok a little behind the current x86 hardware in market. Next generation who knows.

    This is the problem the difference between arm and x86 has reduced massively in 12 months for hardware support.

    There is a enough size in a pi clone board to place a PCI Express External Cabling x8 port on back side. The motherboard with daughter board could be back as well.

    The price of parts all comes down to volume wolfgang, This reality means unless PC machines can do something about the declining volume their price will rise.

    several hundred million down equals less room to produce as many variation.

    wolfgang if you mean Microsoft tablet as in Surface tablet. Microsoft is selling the Surface tablet at a loss. In fact less than what the hardware is worth. This is the reason none of the other OEM’s are making Windows Tablets. Existence of arm tablets powered by Android are just too low in price for an x86 to price compete. Or where you referring to the Nokia X that Microsoft is able to sell at a profit??

    wolfgang if you want a real killer problem you are aware that Microsoft will keep the Nokia X that is an Android based phone and tablet line. Yes joke about Microsoft Linux is coming true. Yes even for Microsoft the non Microsoft based product is the profitable one. The fact surface cannot be sold at profit by a company that does not have to pay license fees to its self is a major problem.

    The blood in the water. There is a huge risk that if Microsoft remains it will be a Linux company. The end of the NT OS age could be happening.

  77. wolfgang wrote, “oiaohm is sagenhaft geek to harp on hobby tool as end of microsoft. laugh at idea of billion computer users walking around with board dangling from bunch of wires.”

    Raspberry Pi is not necessarily a killer product but it is a symptom of Wintel’s problem: anyone can design a good IT product from scratch these days and undercut Wintel’s taxes. Using an ARMed CPU and FLOSS cuts more than $100 from the retail price. For consumers with idling hair-driers, this is very attractive, especially for the poor, young, next billions who can’t afford to pay the taxes. Intel has diversified reasonably well by Moore’s Law and the Atom to the point where they are competitive in power-consumption but they still can’t compete well on price while maintaining monopolist pricing. They are having to cut price. M$ is fleeing the PC market. They have nothing competitive to offer. Their old products are satisfactory for users. They can’t sell the new products. On StatCounter, 8 is declining and 8.1 is sluggishly climbing. 8.1 is increasing about 0.6% share per month when that other OS used to climb 1-2% per month. That means half the shipments aren’t selling. That’s not sustainable unless M$ pays OEMs to ship. Indeed, M$’s total share of page-views is declining 0.2% per month, not at all like the good old days.

  78. wolfgang wrote, “pc production down to several hundred million pc per year now”.

    IDC reports shipments are way down from their peak of ~360 million per annum. In Q1 2014, they report 73.4 million shipments globally. Meanwhile tablets shipped 70 million units per quarter in Western Europe alone… Gartner estimated just over 300 million legacy PCs shipped in 2013. I estimate that a lot of those are still in warehouses. My Walmart hasn’t had a single desktop on display in the last year.

  79. wolfgang says:

    …raspberry pi…

    appropriate name for funny little board. get raspberry from wolfgang for sure.
    think oiaohm is sagenhaft geek to harp on hobby tool as end of microsoft. laugh at idea of billion computer users walking around with board dangling from bunch of wires. need wall adapter even, no battery.

  80. wolfgang says:

    …legacy pc is tiny niche, says pogson…

    wolfgang agree, subject to difference of opinion on what tiny mean. pc production down to several hundred million pc per year now. was once more than that. pc production still very big business as business goes even so. still very much windows business, too. one day something else be bigger, but not today yet.
    think pogson hoping for end of microsoft for years and years and every next year bigger year than before. even now.
    phones get better and better, too. even microsoft nokia phone. tablets get better and better, too, even microsoft tablet.

  81. oiaohm wrote, “You can almost bet they will be 8 core armv8 64 bit processors running at some decent clock speed still just powered by a USB port.”

    I don’t know. It may well be that RAM, storage and peripheral interfaces will ramp up so they need a bit more power. They are not exactly mobile devices. I would not be surprised to see a few watts of power supplied somehow. The SoCs are becoming more capable so there’s plenty of room on board for more stuff. It’s at a pretty good price now. If they can put more on the board for a similar price, they have a winner. I would not be surprised to see some white box guys shipping desktops with a Raspberry Pi core. For that they would want a bit more power and more USB connectors. They would also want an order of magnitude more production. Desktops will continue to sell but they need increasing performance/price. This basic board is close to ideal for tiny projects. It wouldn’t take much to make it more generally useful.

  82. oiaohm says:

    http://linuxgizmos.com/sbc-mimics-raspberry-pi-has-faster-cpu-adds-sata/
    http://linuxgizmos.com/raspberry-pi-like-boardset-boasts-quad-core-imx6/

    wolfgang, hardware makers will supply stuff; that is true. Whether Microsoft will be along for the ride is a completely different matter.

    We are now seeing the generation of hardware after the Raspberry Pi. These are not PCs and they are not Windows-compatible. They are a new form factor.

    The reason these Raspberry Pi clones are priced so low is that they are based on chips that go into phones, tablets and other devices.

    wolfgang, yes, in the past, to get decent prices when building FOSS-based items we had to stick to hardware that was used in volume by Windows. The problem is that the volume of production has left the PC, or at least the PC everyone knows.

    The one interesting thing about all the second-generation clones of the Raspberry Pi is that they all include SATA.

    Start paying some attention, wolfgang. If you have been watching over the last 5 years, the number of desktop-targeted x86 motherboards in production has decreased. Yes, this is a direct result of the x86 market tightening. x86 motherboards are now mostly built for servers. Server is not Windows territory; servers mostly run Linux.

    The change away from conventional PCs is happening whether you like it or not.

    What will the third-generation clones of the Raspberry Pi look like? You can almost bet they will be 8 core armv8 64 bit processors running at some decent clock speed still just powered by a USB port.

    To be clear, the Raspberry Pi is triggering an event exactly like the first IBM XT and its clones. Clusters of Raspberry Pis are also bringing on the ARM age in the server room. This is so much history repeating, it’s not funny.

  83. wolfgang wrote, “if customers insist on products other than conventional pc, then microsoft and intel and major suppliers will supply them.”

    M$ and Intel are not supplying the smartphones that people demand. They’ve lost mind-share in a big way. These are consumers, mainly, and they want what their friends have, only the latest model. Therefore, M$ and Intel will have an uphill battle to gain even a tiny share. Further, folks are demanding PCs that are more like their smartphones: small, cheap, tidy, portable, useful… Wintel has no way to do that because they’ve burdened the legacy PC with a lot of useless crap that doesn’t fit the bill: fans, way too much power consumption, malware, re-re-reboots, EULAs, etc. I’m a dinosaur in having legacy PCs (GNU/Linux ones) in almost every room. The new generation wants to walk around with their PC. The legacy PC just doesn’t work for people any longer; for consumers it is a tiny niche. Businesses will catch on sooner or later. That’s why M$ is trying to diversify like mad, hoping to ride a new gravy train. No matter how hard M$ and Intel try, they will never again have a near monopoly as they did for decades.

  84. wolfgang says:

    …oiaohm not convinced…

    but what else is new? if customers insist on products other than conventional pc, then microsoft and intel and major suppliers will supply them. people who have money to invest will invest in new products that are expected to sell. that is plenty of choice.

    beggars who want foss products at volume prices created by windows volume have no real say in the timing of new products or even if they will ever be made.

    not hard to make product change, but expensive to pay for and risk is that profits never appear.

  85. oiaohm says:

    wolfgang, you always want to bring it back to plain money. Linux will make the computer-store model harder. Opening a shop in a declining market is not the wisest thing; the PC market seems to have peaked and is now shrinking. At the moment you would be better off opening a phone and tablet store with PCs on the side, so at least you are in a growth market.

    The issue with the Windows PC is worse than most can dream. ARM is expecting to gain over 10 percent of the server market in under 12 months. Intel, the biggest x86 maker, is feeling the pressure, for instance by delaying bringing Fab 42 online.

    The vendor lock-in Windows has created by being x86-limited may be its worst weakness.

    The problem with following others like sheep is that sometimes where the sheep are headed is the slaughterhouse. Profitable for now, but long-term survival is not assured.

    Of course we are seeing governments and other big parties invest in FOSS. Governments have the most power to set the standard of what OS their population uses.

    None of the writing on the wall is saying Microsoft has a great future.

  86. wolfgang says:

    …people not have choice…

    not true. people who in computer business and risk own money have complete choice of what they want to do. they can follow others and make and sell windows computers or they can take pogson advice and go into linux computer business and hope he is right about pent-up demand. companies have done both for many years now and companies that jumped in with both feet are mostly not in good business anymore.
    some companies still sticking toe in water here and there. some even put whole foot in in poor countries where 50 euro big deal.
    but choice go like old saying about golden rule. he who has the gold sets the rule.
    pogson should mortgage house and barn and new lawnmower and open linux computer shop in rustic Canada town. either make mint or not, then we see for sure.

  87. oiaohm wrote, “Linux supports a broader set of hardware so you have more hardware vendor competition so driving the physical hardware price down as well.”

    A bigger effect is that GNU/Linux allows hardware to be kept longer. XP was an exception, but folks who promote Wintel like units to be replaced every 3-4 years. That’s just plain silly. With GNU/Linux one should be able to keep a unit running until it fails and cannot be repaired economically. In schools, I found the performance of GNU/Linux thin clients superior to thick clients because of file-caching on the server and because the users’ processes run on the server’s newer hardware. Thus a thin client can last ~10 years unless screen resolution needs to be increased sooner. 1024×768 was popular from 1992 until just recently and many thin clients could exceed that resolution. With Wintel, M$ and Intel encouraged frequent hardware upgrades, almost always involving buying new sets of licences, a total waste.
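
    As a rough illustration of why a few good machines can serve a whole lab of old boxes, here is a minimal sizing sketch; every per-session number in it is an assumption picked for illustration, not a measurement from any particular school.

    ```python
    # Back-of-envelope sizing for one terminal server hosting thin-client sessions.
    # The per-session figures below are illustrative assumptions, not measurements.

    server_ram_gb = 16            # RAM in one "good machine" used as a terminal server
    os_and_cache_gb = 4           # reserve for the OS plus the file cache shared by all sessions
    ram_per_session_gb = 0.5      # rough footprint of a desktop session with a browser and LibreOffice

    sessions = int((server_ram_gb - os_and_cache_gb) / ram_per_session_gb)
    print(f"Roughly {sessions} concurrent thin-client sessions")   # ~24 with these assumptions
    ```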

  88. oiaohm says:

    Robert Pogson, it’s not only the cost of software; it’s the cost of hardware. Linux supports a broader set of hardware so you have more hardware vendor competition so driving the physical hardware price down as well.

    In reality, the general PC is overpriced in every segment that makes it up.

  89. wolfgang wrote, “if people really had problem like you say, then not sell hundreds of millions of new ones each year.”

    If people really had a choice of OS when they bought PCs at retail, OEMs might sell twice as many because the cost of the software would be way down.

  90. oiaohm says:

    wolfgang, really, you need to read my posts. I have never said I want Windows gone exactly. I am the one who pushes for the 90/10 splits.

    Really, it’s not a major problem to me long term if Windows remains. Why? It will be running inside servers, so I will not have the issue of infections spreading across hundreds of machines.

    Stop attempting to put words in other people’s mouths, wolfgang; you get it wrong. I have not seen a Windows machine that did not get infected with something at some point. Some cases are creative, like using a Windows registry cleaner on Windows 7/8 that is not compatible, with the result that network settings can no longer be changed and there are no instructions from Microsoft to fix it short of a reinstall. The reason: the cleaner deleted the registry permissions, and Windows contains no way to reapply all the registry permissions back to factory defaults.

    Repairing and auditing Linux after user or malware damage is simpler. Heck, even OS X is simpler: on OS X you can reset all core permissions to factory if you know what you are doing.

  91. wolfgang says:

    …oiaohm say not his problem…

    oiaohm want to see windows gone, though, and reason why it not gone is his problem and will be his problem next week, month, year, decade, and beyond. oiaohm substitute cute hair splitting for sales work. he too flaky to know he is.

  92. wolfgang says:

    …pogson never seen pc working with windows…

    maybe should look. when whole rest of world crazy and you last ok person and only you know that, think how need to recalculate normal. if people really had problem like you say, then not sell hundreds of millions of new ones each year.

  93. oiaohm says:

    wolfgang, yes, the LibreOffice download page checks your browser’s user-agent string and then points you to (sometimes) the correct download for what you are using; a sketch of that kind of check follows below. LibreOffice does have 64-bit Linux and OS X packages; there are no 64-bit Windows packages. What you download from LibreOffice for Windows is 32-bit MSI files, in other words the package format you feed into Group Policy deployment. So if your system is 64-bit Windows, it is technically the wrong architecture.

    wolfgang, just because you don’t know what a package is, that is not our problem.

    Linux has the largest population of 64-bit applications able to take full advantage of a 64-bit CPU; OS X comes second; Windows comes dead last on this particular metric.
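
    A minimal sketch of the kind of user-agent check described above; the patterns and file names are illustrative assumptions, not LibreOffice’s actual detection code or current package names.

    ```python
    # Sketch of picking a download based on what the browser's user-agent reports.
    # The patterns and file names are illustrative placeholders only.

    def pick_download(user_agent: str) -> str:
        ua = user_agent.lower()
        if "windows" in ua:
            # At the time of this thread there was no 64-bit Windows build;
            # the 32-bit MSI is also what Group Policy deployment consumes.
            return "LibreOffice_x.y.z_Win_x86.msi"
        if "mac os x" in ua or "macintosh" in ua:
            return "LibreOffice_x.y.z_MacOS_x86-64.dmg"
        if "linux" in ua:
            # The page offers DEB/RPM archives, though most GNU/Linux users
            # install from their distro's own repository instead.
            return "LibreOffice_x.y.z_Linux_x86-64_deb.tar.gz"
        return "download index page"

    print(pick_download("Mozilla/5.0 (Windows NT 6.3; WOW64)"))
    ```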

  94. wolfgang wrote, “free is not motivation since work fine with windows”

    I have never seen a PC work fine with that other OS: malware, slowing down, re-re-reboots… They may look pretty new-in-box, but I have seen too many totally broken (software, not hardware). The only GNU/Linux boxes I have ever seen broken like that were broken deliberately by geeks trying something new. The typical Debian release is solid on Day One and thereafter. There’s a reason ~100 million PCs have been switched to GNU/Linux: it works, and that other OS doesn’t, one way or another. So the only reason to stay with that other OS is that you are a masochist who likes shaky IT and wasting time and money.

  95. wolfgang says:

    …pogson say obstacle gone…

    not agree. obstacle is changing os from windows that came with computer to another, even windows. lots of work if don’t know how. nobody but geek want to do that unless big payday. open offices, original or spicy, are free is not motivation since work fine with windows. if user want free ride, no need to change anything.

    pogson talk of packages, too. wolfgang check libre site and see big green download button. click once and job done. never even ask about linux, must somehow check and see that wolfgang have windows 8 computer and pick right job. install like any other program.

  96. dougman wrote, “Only takes me a few mins to update or install LibreOffice.”

    It’s still a bit cumbersome: more than a dozen packages. Thank goodness for globbing… Some of the packages are language packs but the main application still comes as a big bunch of packages. It would be painful to install each one by hand, but dpkg -i * works wonders.
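
    For anyone who prefers a script to the one-liner, here is a small sketch of the same glob-and-install step in Python; treat the paths as illustrative for the layout the LibreOffice .deb tarball usually unpacks to.

    ```python
    # Sketch of the "glob and install" step above, done from Python instead of
    # the shell one-liner.  The Debian tarball from libreoffice.org typically
    # unpacks to a directory containing a DEBS/ folder full of .deb packages;
    # adjust the path if your layout differs.
    import glob
    import subprocess

    debs = sorted(glob.glob("LibreOffice_*/DEBS/*.deb"))   # the dozen-plus packages
    if debs:
        # Equivalent to running "dpkg -i *" inside the DEBS directory (needs root).
        subprocess.run(["sudo", "dpkg", "-i", *debs], check=True)
    else:
        print("No .deb packages found -- unpack the tarball first.")
    ```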

  97. dougman says:

    Only takes me a few mins to update or install LibreOffice. M$ Office is a bloated pig and takes 1/2 hour!!

  98. wolfgang tries to con us with “open office work perfect with windows and 99% of open office users using it with windows already”

    OK, the first part is correct but the 99% is way off. LibreOffice counts about 21 million unique IP addresses downloading the product, mostly on that other OS because distros have their own repositories. Almost every distro of GNU/Linux ships LibreOffice, putting it on several times more PCs than those downloads for that other OS represent. If the office suite is the reason folks stay with that other OS, that obstacle is gone when folks use LibreOffice. The same reasoning applies to OpenOffice.org.

  99. wolfgang says:

    …pogson chuckle…

    not true at all. pogson get red in face and crabby. plain as day.

    good idea for schools to use cheap stuff I think too. save taxpayer money if not wasted on public school.

    Swiss school master say forced to buy proprietary office, but someone pulling his leg. open office work perfect with windows and 99% of open office users using it with windows already. so cannot rely on wisdom there.
