Securing Software Is Not Enough

“Not only does our attack show that practical, deterministic Rowhammer attacks are a real threat for billions of mobile users, but it is also the first effort to show that Rowhammer is even possible at all (and reliably exploitable) on any platform other than x86 and with a much more limited software feature set than existing solutions. Moreover, we demonstrated that several devices from different vendors are vulnerable to Rowhammer. To conclude, our research shows that practical large-scale Rowhammer attacks are a serious threat and while the response to the Rowhammer bug has been relatively slow from vendors, we hope our work will accelerate mitigation efforts both in industry and academia.”
 
See “Using Rowhammer bitflips to root Android phones is now a thing”.
A lot of energy has been invested in securing software, but it’s useless unless the underlying hardware is secure. We’ve seen attacks on networks, and even on chips and their firmware, but no defence works if memory is the lever by which malicious software can take over the world. We’re there, folks.

RowHammer results from the convenient and efficient layout of memory chips, with address lines selecting groups of bits to read/write. It allows reaching bits which should not be available by repeatedly accessing bits which are, until the disturbance flips their neighbours. “Cross-talk” has long been a problem with phones, radios and now, memories of computers. There’s not much that can be done as we shrink the size of dies and pack more bits into the rows/columns of chips. The whole layout of chips must be redesigned with individual bits of words not being adjacent or in the same row/column. That’s a huge waste of the benefits of Moore’s Law but I don’t see any other way of doing it. It means we have to trade off performance and price against security.

I can think of two solutions. There may be more.

  1. Use bit number as part of the address and apply scrambling generators to access each bit from a different part of the chip. This will increase cost/complexity and probably slow throughput by an order of magnitude.
  2. Increase the size of memories by a couple of orders of magnitude so that each risky process is physically isolated from others. This would of course require operating systems to be completely redesigned, going beyond Address Space Layout Randomization to deliberate segregation. In the extreme, each computer would have to be a super-computer with various processes running on nodes in a network, with one node not being allowed to access the physical memory of another node. Imagine an order of magnitude increase in price with a step backwards in performance.
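
To make the first idea concrete, here is a minimal Python sketch of bit-level address scrambling. Everything in it is an assumption for illustration: the chip size (`NUM_ROWS`), the word width and the odd multiplier standing in for a “scrambling generator” are invented, and a real DRAM would do this in hardware, not software.

```python
NUM_ROWS = 1 << 10       # assumed chip geometry: 1024 rows (must be a power of two)
WORD_BITS = 8            # assumed word width
ODD_MULTIPLIER = 0x9E5   # hypothetical scrambling constant; any odd number works


def scrambled_row(word_row: int, bit_index: int) -> int:
    """Map (logical row of the word, bit number) to the row that stores that bit.

    Multiplying by an odd constant modulo a power of two is a bijection, so
    distinct bit numbers get distinct XOR masks. For a fixed bit number the
    XOR itself is a bijection on rows, so no two words collide.
    """
    mask = (bit_index * ODD_MULTIPLIER) & (NUM_ROWS - 1)
    return word_row ^ mask


def rows_for_word(word_row: int) -> list:
    """Rows holding each of the bits of one logical word."""
    return [scrambled_row(word_row, bit) for bit in range(WORD_BITS)]
```

With these toy parameters the eight bits of any one word land in eight different rows, so a disturbance confined to a single row could touch at most one bit of that word. The price is eight row accesses per word, which is the throughput cost mentioned above.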

Sigh. No good deed goes unpunished. Inventing yet more efficient computers has invented yet more efficient vulnerabilities. Effectively securing computers throws away a decade or two of advances.

About Robert Pogson

I am a retired teacher in Canada. I taught in the subject areas where I have worked for almost forty years: maths, physics, chemistry and computers. I love hunting, fishing, picking berries and mushrooms, too.

16 Responses to Securing Software Is Not Enough

  1. oiaohm says:

    http://www.gossamer-threads.com/lists/linux/kernel/2554222
    There might be a kernel-level workaround for rowhammer: intentionally slow the system down slightly. Rowhammer depends on accessing the same ram repeatedly and quickly enough, and the operating system kernel can prevent that from being possible.
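
The throttling idea can be sketched in a few lines of Python. This is only an illustration of rate limiting, not the code from the linked kernel thread: the class name, the thresholds and the notion of counting “misses” per time window are assumptions chosen for the example.

```python
class MissThrottle:
    """Toy rate limiter: penalise a task whose memory-miss rate looks like hammering."""

    def __init__(self, max_misses_per_window: int, window_ns: int, penalty_ns: int):
        self.max_misses = max_misses_per_window  # assumed safe budget per window
        self.window_ns = window_ns               # assumed window length
        self.penalty_ns = penalty_ns             # delay imposed once over budget
        self.window_start = 0
        self.count = 0

    def on_miss(self, now_ns: int) -> int:
        """Record one miss and return the delay (in ns) to impose on the task."""
        if now_ns - self.window_start >= self.window_ns:
            # A new window begins: forget the old count.
            self.window_start = now_ns
            self.count = 0
        self.count += 1
        return self.penalty_ns if self.count > self.max_misses else 0
```

A well-behaved program stays under the budget and is never delayed, while a tight hammering loop exceeds it within one window and gets slowed down on every further miss until the window rolls over.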

    On the hardware side, we still don’t fully understand why row-hammer happens, so mitigations cannot yet be designed against all possible cases, and the fact that the diagnostic takes hours to run is still a major problem.

    Do note the date: Oct 27, 2016. Developers are still working out how Rowhammer works and how it can be mitigated, and are attempting mitigations on the OS side. Hardware developers are working on their side to make it totally impossible.

    Of course this means hardware affected by rowhammer may only need a software update to be safe. This brings us back to the classic problem of how to deploy updates safely.

    A lot of hardware defects can be worked around by the OS kernel. There are some that cannot be fixed kernel side, but it appears rowhammer is not one of those.

  2. oiaohm says:

    Badly packages applications exist.
    Yes dougman, that includes the crap Microsoft ships with Windows that cannot be removed when it’s defective.

    Thanks for reminding me.

  3. dougman says:

    Badly packages applications exist. Depending how built Yes it would be better at long last has agreed to sit down and work

  4. oiaohm says:

    It’s an undesirable consequence of Moore’s Law. As memory cells become smaller, the charges needed to alter their logic-level becomes smaller.
    If it were only Moore’s Law, it would track the nm of the ram produced. Rowhammer happens from 40nm on. But you have 14nm ram that does not suffer from Rowhammer at all even though it contains rows. So smaller memory cells are not the only factor; there are some other factors. The issue is finding out what the other factors are, to be able to design ram that is rowhammer free.

    http://www.cvedetails.com/product/23546/Microsoft-Windows-Server-2012.html?vendor_id=26
    DrLoser, the reality is that no common OS is doing that great in CVE counts.

    Also you have to take into account that the CVE numbers you were looking at are for everything containing a Linux kernel that reported an issue in 2016. So you have new CVEs for horribly out-of-date things.
    http://www.cvedetails.com/cve/CVE-2016-3139/ This one was fixed in a kernel released in 2014, so it would not affect you if you had applied updates. Also something to be aware of here: this Wacom bug would not be reported as a Microsoft Windows CVE, because there it would be in a vendor driver.

    Looking at CVE numbers and making a case against Linux out of them is insanely hard, DrLoser. The Linux kernel does show high because of the number of drivers bundled.

    36 escalation issues is not exactly that bad when you realize Windows Server 2012 had 52 escalation issues in 2016, counting only the parts Microsoft provides, and that is a single product, unlike the Linux kernel CVE report.

    http://www.cvedetails.com/version-list/33/47/1/Linux-Linux-Kernel.html
    Also the Linux kernel starts looking way different when you start looking at the CVE count affecting individual versions of the Linux kernel.

    So don’t throw stones when standing in a glass house. OS X is not much better either. So in 2016, compared to the competition, the Linux kernel did reasonably, but it could do better.

    This is one of the problems: idiots against Linux always quote the generic Linux kernel CVE count as something to say Linux security is low. Something to consider: a bug in Linux kernel 2.0 from 1996 that was fixed in 1997, but newly found in 2016 on some device out there still running kernel 2.0, is reported as a 2016 Linux kernel CVE.

    So grain of salt time: if you took every version of Windows from Windows 95 to the current 2016 release and totalled up every bug found in 2016, do you really think this would be a truthful picture of how bad a current, updated Windows machine really is?
    That is exactly what the thing DrLoser pointed to is for Linux.

  5. DrLoser wrote, “you could always invest some of your self-administered pension fund in patenting your ridiculous solution”.

    Nonsense, you cannot patent the design of memories from the ’70s in 2016. There’s lots of prior art. I have a stock of 1X16K dynamic RAMs that I never got around to installing… I can assure you they don’t suffer “Row Hammer” because they don’t have rows…

  6. DrLoser wrote, “when do you expect the Cello to be remotely useful”.

    Well, it was only marginally useful back when it was announced. Now they are putting much more powerful processors in smartphones so it’s only a matter of time before a better product comes to market. I expect something will appear this year as the big guys are buying direct. We individuals/consumers are not the top priority.

  7. DrLoser says:

    Meanwhile, the Linux kernel isn’t doing too well on CVE vulnerabilities in 2016. Thirty six privilege escalation issues, for example.

    Never mind. Focus on the hardware. Ignore the crappy OS. And while we are focussing on the hardware, Robert, exactly when do you expect the Cello to be remotely useful?

    Because most people don’t care about theoretical attacks, particularly in the home. Generally speaking, they pay for performance with an acceptable level of security.

    Does the Cello perform?

  8. DrLoser says:

    A very simple solution is to use off-chip addressing and RAM organized as 1 bit X 4G rows.

    Very simple, yet stupid … to quote Rowan and Martin, which is about your level of understanding when it comes to hardware.
    I’m not going to waste my time on this specious and stupid “solution.”
    I am merely going to point out that Intel (horrors!) and various Arm fabs (hooray!) employ a very large number of hugely qualified electronic engineers to solve this sort of problem.
    I hardly think they need ignorant stupid advice from an unqualified eejit in Manitoba.
    Then again, you could always invest some of your self-administered pension fund in patenting your ridiculous solution …

  9. oiaohm wrote, “Long term fix is work out exactly what is causing rowhammer and develop a proper QA process to prevent rowhammer getting to customers”.

    It’s an undesirable consequence of Moore’s Law. As memory cells become smaller, the charges needed to alter their logic-level becomes smaller. We are at the point where sustained activity on the chip can disturb the capacitively stored charge representing the bit rather than just sensing it for readout. Such effects have been around for ages. I remember in the 1960s teenagers with time on their hands would dial some phone number that never answered. A “chat” line existed between the “busy” signals. That was just a matter of crosstalk on adjacent conductors reaching that number on the switching racks. Increase the frequencies a gazillion times and decrease the spacing and capacitance and anything is possible.

    The typical design for memories that has evolved since the days of magnetic cores has been to place bits or words in rows and columns and layers. Address bits are used to select individual bits or words but the various signals that are clocked at ~1 GHz pass through lines that are very close together.

    A very simple solution is to use off-chip addressing and RAM organized as 1 bit X 4G rows. Then rows are so short there can be no crosstalk. Of course that throws away a couple of steps of Moore’s Law as extra chips are needed. Another solution would be to have no rows at all, but that would make addressing tedious and time-consuming as a bunch of gates would have to settle before readout. Rows are a convenience, but for reliability we may have to do away with them. Increasing the gate-count for layers of decoding all over the array would really increase power-consumption too. Maybe we will have to have something like RAID-1 memory with different addressing/accessing methods for each mirror. Presumably one should not be able to hammer several different types of cells with a single technique in real time.
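
The “RAID-1 memory” idea can be sketched as a toy Python model. It assumes two mirrors of the same size, one addressed directly and one through a multiplicative shuffle; the class and method names, and the choice of shuffle, are invented for the example and do not describe any existing hardware.

```python
class MirrorMismatch(Exception):
    """Raised when the two mirrors disagree about a stored value."""


class MirroredMemory:
    """Toy model of mirrored memory with a different address mapping per mirror."""

    def __init__(self, size: int, odd_multiplier: int = 5):
        if size <= 0 or size & (size - 1) or odd_multiplier % 2 == 0:
            raise ValueError("size must be a power of two and the multiplier odd")
        self.size = size
        self.odd_multiplier = odd_multiplier
        self.mirror_a = [0] * size
        self.mirror_b = [0] * size

    def cell_a(self, address: int) -> int:
        """Mirror A uses the plain address."""
        return address % self.size

    def cell_b(self, address: int) -> int:
        """Mirror B shuffles addresses; an odd multiplier keeps the mapping one-to-one."""
        return (address * self.odd_multiplier) % self.size

    def write(self, address: int, value: int) -> None:
        self.mirror_a[self.cell_a(address)] = value
        self.mirror_b[self.cell_b(address)] = value

    def read(self, address: int) -> int:
        a = self.mirror_a[self.cell_a(address)]
        b = self.mirror_b[self.cell_b(address)]
        if a != b:
            raise MirrorMismatch(f"address {address}: {a} != {b}")
        return a
```

Because neighbouring addresses in mirror A end up several cells apart in mirror B, a disturbance that flips a neighbour in one mirror shows up as a mismatch on the next read instead of as silently corrupted data. The sketch only detects the flip; deciding which mirror is right would need a third copy or a checksum.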

  10. ram says:

    Error Correcting coding is built into almost all modern memories, motherboards, and processor chips. The error correcting coding also almost always includes the address lines as well as the memory lines. The codes are based on Galois theory. These techniques and devices have been around for decades.
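
As a toy illustration of an error-correcting code built on arithmetic over GF(2), here is a Hamming(7,4) encoder and decoder in Python. It is a sketch only: real memory ECC typically uses wider codes, and the function names are invented for this example.

```python
def hamming74_encode(data):
    """Encode 4 data bits as 7 bits; parity bits sit at positions 1, 2 and 4."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4   # covers positions 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers positions 3, 6, 7
    p4 = d2 ^ d3 ^ d4   # covers positions 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(word):
    """Return (data bits, corrected position); position 0 means no error was seen."""
    c = list(word)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s4   # the syndrome names the flipped position
    if position:
        c[position - 1] ^= 1          # flip it back
    return [c[2], c[4], c[5], c[6]], position
```

Any single flipped bit in the 7-bit word is located and repaired. Two flips in the same word are a different story: this small code then “corrects” a third, innocent bit and returns wrong data, which is why the number of errors per word matters so much.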

  11. oiaohm says:

    If ECC is required to ensure the integrity of RAM, then manufacturers should be obligated to integrate it in all new memory installed in all new devices, if not retrofit old ones.
    kurkosdr, it’s not that simple. ECC cannot be retrofitted. The hardware has to have the ECC circuits to start off with to support ECC. ECC, due to performing extra checks on every memory operation, eats more power.

    Rowhammer is not inside spec unless the ram is not properly powered. Also you can have the event where you have two identical models of ram sticks made in the same factory on the same day in the same batch, and one suffers from Rowhammer and the other is totally unaffected by Rowhammer no matter what you do. This is the reason why people researching this fault are starting to publish memory test tools to detect it, to collect more data to attempt to locate why these chips are out of spec.

    Do note what I said: current ECC is not designed to cope with Rowhammer. So ECC has to go back to the design table and be redesigned to cope with the event that it’s being run on a system with ram chips suffering from Rowhammer.

    Yes, ECC stops the exploit and causes a system crash instead. Exploit and system crash are both unwanted outcomes. Remember ECC only stops the exploit; it does not stop the performance harm and other nasty side effects of Row-hammer. When an application hits an ECC error, the OS can terminate it. ECC is not designed to recover 100 percent transparently, because its design presumes that ECC errors are rare. Rowhammer means you can get millions of errors, which is not something ECC is designed to cope with.

    Do remember that some of the devices out there will be perfectly fine even without ECC. Some of the tested and failed devices are perfectly Rowhammer-proof while they have a full battery but suffer from Rowhammer when the battery gets low, not because the ram itself is defective but because the device’s power management is defective, so the ram is refreshed while underpowered. This has also caused some ram without rowhammer to have bits randomly change state, causing stability problems in some of the devices as well.

    kurkosdr, let’s say I said to you that you had to write a program to detect ram with the Rowhammer defect for factory usage, to make sure ram leaving the factory will not suffer from the Rowhammer fault if used correctly. What is the first thing you need?

    The first thing you need is a list of conditions that trigger Rowhammer. The software side is quite simple to understand; the hardware side, not so much.
    1) Underpowered???? Does not apply to all ram tested.
    2) Overpowered??? Does not apply to all ram tested
    3) Incorrectly clocked??? Does not apply to all ram tested.
    4) Heat??? Does not apply to all ram tested.
    5) Lack of heat??? Does not apply to all ram tested.
    6) Age??? Totally unknown if this is a factor but is suspected
    7) some form of minor production fault.

    The issue here is that all 7 are likely causes. So rowhammer is most likely happening because of many different hardware design faults and manufacturing errors, and those 7 are most likely the tip of a very large iceberg.

    Victor van der Veen has published the tool for Android mostly so that more data on the problem can be collected to attempt to work out what the heck is going on, allowing a QA process to be developed to detect the Rowhammer issue before it lands in the hands of consumers. If a proper diagnostic process for Rowhammer detection could be made, the ECC hack-around to prevent the exploit would not be required either.

    You have two Android phones that look absolutely identical, same model, same circuit board, same day of production, in fact 1 serial number apart, and 1 has Rowhammer and the other one does not. For the one that does not, we don’t know if that is down to age, or if it just got lucky with a good ram chip or power management chip and so will never suffer from Rowhammer.

    What we need is a rowhammer diagnostic added to memtest86 and the like, and for Android phones to in fact come with a memory test.

    The reality here is that currently there are a lot of devices that have been made that don’t have the rowhammer fault. The problem is hardware makers are not absolutely sure how they made the devices without rowhammer or the devices with rowhammer, so they don’t have a solid QA process and they are not going to have one any time soon. You have to remember it was not long ago that you would buy new ram, run a memory diagnostic, and the odds that the new ram would not work due to failed bits were over 20 percent. So for years you would get new ram and run memtest86 or equivalent, and the stack of returns hardware makers got, exposed to different things, allowed them to work out exactly what was causing the bit failures.

    There is no simple fix to rowhammer. The short term fix is that we as end users need ram diagnostics for it so we:
    1 know what devices are defective
    2 can report this back to makers
    3 can use our countries’ consumer laws to get replacements for defective supply
    Long term fix is work out exactly what is causing rowhammer and develop a proper QA process to prevent rowhammer getting to customers, which requires more information than manufacturers currently have.

    It’s very easy to say to hardware makers “do better”. With some faults like rowhammer, when they turn up hardware makers cannot do better until they get more data, and collecting and processing that data on why some defect is happening takes time before it can be properly fixed.

    So the best thing that hardware makers can do at this stage is be truthful with customers and provide diagnostics to find the duds.

  12. kurkosdr says:

    @ohioham

    If ECC is required to ensure the integrity of RAM, then manufacturers should be obligated to integrate it in all new memory installed in all new devices, if not retrofit old ones.

    But you see, this is where disclaimers and lawyers get into play. By arbitrarily claiming that broken behaviour such as vulnerability to rowhammer-style attacks is “within spec”, they can get away with it. Cost cutting for the win! Smartphones with 3GB main memory and other impressive specs that get pwned by a simple app, hooray!

  13. oiaohm says:

    The failure of warranties and success of disclaimers and slimey lawyers. Shouldn’t hardware companies make sure the problem never happened by changing the refresh timings?
    kurkosdr, hardware companies cannot do this; they don’t have the data on exactly why the ram is failing. It has been found that increasing, decreasing or leaving at stock the refresh timings all result in different rowhammer outcomes. Let’s say we have a few makes of ram.

    Make 1) stock no row hammer. Increase or decrease clock rowhammer appears.
    Make 2) stock row hammer. Increase or decrease clock rowhammer disappears.
    And so on, until you have covered every single possibility. This is what the current ram rowhammer picture looks like.

    Add in that within the same make of ram, one chip will suffer from row hammer and the next chip will not. This leaves a big question: why? Is this a production quality issue or is this age? If rowhammer turns out to be linked to the age of the ram then there is a big problem. Rowhammer might not have any simple fix other than adding replaceable ram and replacing ram every so often.

    To make a problem never happen again you have to know why it is happening in the first place. The problem is that ram makers are working flat out attempting to understand row hammer, and until they do it’s basically impossible to know if we have or have not fixed it for good.

    Let’s just say the rowhammer bug is looking like a downright random bad luck generator at this stage, because we don’t understand enough to predict which chips are going to suffer from it.

  14. oiaohm says:

    Robert Pogson, there is a third way to block attacks like row hammer: fully checksum the blocks of ram and be able to handle mass failure of checksums.
    https://googleprojectzero.blogspot.com.au/2015/03/exploiting-dram-rowhammer-bug-to-gain.html
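
The checksum idea can be sketched in Python with CRC-32 from the standard `zlib` module. The block size and function names are assumptions for the example, and a `bytearray` stands in for physical ram; a real implementation would live in hardware or the kernel.

```python
import zlib

BLOCK_SIZE = 64  # assumed block size in bytes


def block_checksums(memory) -> list:
    """Compute one CRC-32 per BLOCK_SIZE-byte block of the given buffer."""
    return [
        zlib.crc32(memory[start:start + BLOCK_SIZE])
        for start in range(0, len(memory), BLOCK_SIZE)
    ]


def corrupted_blocks(memory, expected) -> list:
    """Return the indexes of blocks whose current checksum differs from the stored one."""
    current = block_checksums(memory)
    return [i for i, (now, was) in enumerate(zip(current, expected)) if now != was]
```

Store the checksums when a block is written, recompute them before the data is trusted, and any block listed by `corrupted_blocks` has changed behind the owner’s back. Handling “mass failure” then means coping with that list being long, for example by reloading the affected blocks from a clean copy.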

    Google Project Zero tested rowhammer in 2015; pretty much, just having ECC ram makes a rowhammer attack impossible to pull off successfully as a privilege exploit. Of course encrypted ram would make that harder.

    Google demoed gaining kernel privileges on x86 Linux in 2015 using rowhammer and also demoed a solution. Now this brings the big question: why are new phones and computers still not using ECC ram by default?
    http://www.memcon.com/pdfs/proceedings2015/SAT104_FuturePlus.pdf
    More detailed testing of the ECC solution found that yes, ECC stops rowhammer being used as a privilege exploit, but then it turns into a denial of service attack.

    The issue with ECC is that it is designed on the presumption that when ECC fails, the ram is failing permanently at that point in the stick. Now with row-hammer it’s not the ram module failing permanently at that location, but the complete module in most cases being completely vulnerable. So ECC responds by zoning out the failed memory, and as it proceeds it basically zones out all the ram; then the poor OS is stuffed.
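
A toy Python model shows how that turns into denial of service. The policy here (retire a page for good on its first reported error) is an assumption made for the illustration; real systems use their own thresholds and policies.

```python
def retire_on_error(total_pages: int, error_pages) -> list:
    """Return the number of usable pages left after each reported ECC error.

    Assumed policy: a page is retired permanently the first time it reports
    an error, whether the underlying fault was permanent or transient.
    """
    retired = set()
    usable_history = []
    for page in error_pages:
        retired.add(page)
        usable_history.append(total_pages - len(retired))
    return usable_history
```

For example, `retire_on_error(4, [0, 1, 1, 2, 3])` returns `[3, 2, 2, 1, 0]`: an attacker who can provoke transient errors anywhere on the module walks the pool of usable memory down to nothing, even though no cell has actually failed.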

    Yes, row-hammer explains why for a long time different cheap brands of ECC RAM could pass basic memory tests, yet in production usage you would have ECC errors coming out of nowhere, and when taken out and tested the ram would appear perfectly fine again.

    Row hammer is a true rock and a hard place, with no software fix and very limited information on the quality of ram chips required. It is known that different makes of ram don’t show the row hammer fault at this stage, even some of the newest highest density types. The big “but” at this stage is whether they will start showing rowhammer as the ram ages.

    So yes, a 4th way to prevent row hammer would be to work out exactly why some ram suffers from row hammer and why some ram is totally resistant. It’s not linked to the nm of the ram either. When it’s worked out why, mandate that all ram be made this way for a while, and that all new generations of ram be tested against the problem. So slower ram size growth from now on.

    Due to the nature of row hammer, it makes a very good case for replaceable ram, particularly as at this stage we are not sure if ram suffering from row hammer is down to quality/design of production or age of ram. Ram could in fact have a serviceable life limit.

  15. kurkosdr says:

    never happened = never happened again

  16. kurkosdr says:

    The failure of warranties and success of disclaimers and slimey lawyers. Shouldn’t hardware companies make sure the problem never happened by changing the refresh timings? But nooo… Everything for performance. Resource hogs like Android need that performance…
