Security and ABI discussion
SafeStack + CET
SafeStack
SafeStack is a software-only solution. Compiler and runtime magic only, all in libc - no kernel changes needed.
- Have to allocate two stacks, one shadow stack per stack. Having two stacks makes things hairier (especially setjmp()/longjmp())
Bunch of userspace work to make SafeStack work
- Anything that has threads not 1:1 to POSIX threads needs to be aware.
- Breaks some types of threading, and maybe garbage collection
- This would commit us to an ABI we'd have to support forevermore
- We would likely not want to be the only people to support this, due to the ongoing support need (e.g. in compilers)
- What would be our 10-year support plan if it was only us doing it?
Chrome uses SafeStack, but statically linked to avoid ABI maintenance etc
- theraven: "I quite like safestack, but it does need long-term support"
- LLVM has implemented it, GCC has started to support it.
Calling non-SafeStack code from SafeStack code is fine, but would lose the protection.
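A minimal sketch of the kind of code SafeStack affects, assuming Clang's `-fsanitize=safe-stack` option. The function names are hypothetical and chosen only for illustration. A local whose address escapes (the buffer) is the sort of object the compiler would typically move to the separate unsafe stack, while return addresses and locals that are only accessed safely stay on the regular stack.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy at most cap-1 bytes of src and NUL-terminate. */
static size_t bounded_copy(char *dst, size_t cap, const char *src)
{
    assert(dst != NULL && src != NULL);
    if (cap == 0)
        return 0;
    size_t n = strlen(src);
    if (n >= cap)
        n = cap - 1;
    memcpy(dst, src, n);
    dst[n] = '\0';
    return n;
}

/*
 * buf has its address passed to another function, so under SafeStack it
 * would typically be placed on the unsafe stack. len never has its
 * address taken, so it can stay on the safe (regular) stack next to the
 * return address.
 */
size_t greeting_length(const char *name)
{
    char buf[16];
    size_t len = bounded_copy(buf, sizeof(buf), name);
    return len;
}
```

The source is identical with and without the option, which is why the cost shows up in the runtime and ABI rather than in application code.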
Intel CET
- Intel CET approach is much simpler. Recent hardware feature. Needs support from kernel.
- CFI plus hardware safestack extensions
- Second stack still needed, but extra register available to use.
- CET allows you to define "safe" call/return pointers, reducing ROP risk
- Return pointers are pushed onto two stacks; trap if they are different on return. Essentially one stack is a canary
- Uses PTE flags to mark stacks for access/non-access
- Additional mask to limit CFI instructions used
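The two-stack comparison described above can be modelled in plain C. This is a software toy, not CET itself: the struct and function names are hypothetical, and real hardware performs the push and compare inside the call and return instructions, with the shadow stack protected by page-table flags.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_DEPTH 64

/* Hypothetical model: two parallel stacks of return addresses. */
struct cet_model {
    uintptr_t data_stack[MODEL_DEPTH];   /* ordinary stack, attacker-writable */
    uintptr_t shadow_stack[MODEL_DEPTH]; /* shadow copy, written only by call */
    size_t depth;
};

/* "call": push the return address onto both stacks. */
int model_call(struct cet_model *m, uintptr_t ret_addr)
{
    assert(m != NULL);
    if (m->depth >= MODEL_DEPTH)
        return -1;
    m->data_stack[m->depth] = ret_addr;
    m->shadow_stack[m->depth] = ret_addr;
    m->depth++;
    return 0;
}

/* "ret": pop both copies; a mismatch stands in for the hardware trap. */
int model_ret(struct cet_model *m, uintptr_t *out)
{
    assert(m != NULL && out != NULL);
    if (m->depth == 0)
        return -1;
    m->depth--;
    if (m->data_stack[m->depth] != m->shadow_stack[m->depth])
        return -1; /* return address was tampered with */
    *out = m->data_stack[m->depth];
    return 0;
}
```

Overwriting an entry in `data_stack` between `model_call()` and `model_ret()` makes the return fail, which is the canary-like behaviour the notes describe.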
General
- There are lots of patches, but do we want to merge?
- Could we commit it, but not enable it?
- Could we signpost it as "this might break in future"?
What about security updates delayed due to needing to not break SafeStack?
- Likely no point enabling it unless we start shipping ports with it enabled.
There is no migration path from SafeStack to CET should CET hardware become universal.
- Questions about how or if sigreturn() trampoline would need to change
- Everything in the kernel will need to know to both allocate and deallocate the second stack when the primary stack is created/destroyed. Lifecycle issues here - how do we know when a stack stops being used?
- Validation requirements:
Try with ports/packages -> SafeStack
In Intel simulator -> CET
See Also
Missing the Point(er) paper
CPI paper: http://dslab.epfl.ch/proj/cpi/ (This link is broken)
SGX
- Brief mention of Intel SGX, but nobody has looked into it
CheriABI
- CHERI: Bounds-checking in hardware
- CheriABI system call - ??? hard ??? using Microsoft SAL
- Every pointer in run-time, compiler generated is checked
- System call layer with pointer checking:
- like compat32
- Displace SCO i386 memory layout
- Problematic system calls:
- ioctl() - Who owns buffers? Which bounds should be checked? At least ioctl provides a length. Kernel doesn't necessarily know userspace rules. Could use tags rather than pointer/length.
- Generally, anything that embeds pointers in messages is bad
- Anything taking iovec is a bit of a pain, fcntl()
- Anything that takes void* and what those arguments contain varies depending on the command is a pain
- SAL open source, desirable
- Idea of compat64 layer which translates standard userspace pointers into capability-aware pointers. Pointer protection in kernel but not userspace.
- Upstream system-call vector fixes - wrong types, etc
- more kern_foo() for sys_foo()
- Experimental memory-safe kernel and userspace
- AFL fuzzer
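To illustrate the embedded-pointer problem raised for ioctl(), here is a rough sketch of the check a translation layer would need for each pointer buried inside a request. The struct and function names are hypothetical. With CHERI the bounds would travel with the capability; this plain-C stand-in takes them as an explicit region, which is exactly the knowledge a generic system-call layer lacks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ioctl argument that embeds a user pointer and a length. */
struct hypo_ioc_req {
    void *buf;
    size_t len;
};

/* Hypothetical description of the memory the caller may legitimately use. */
struct hypo_region {
    uintptr_t base;
    size_t size;
};

/* True if [addr, addr + len) lies inside the region, without overflowing. */
bool range_within(const struct hypo_region *r, uintptr_t addr, size_t len)
{
    assert(r != NULL);
    if (addr < r->base)
        return false;
    uintptr_t off = addr - r->base;
    return off <= r->size && len <= r->size - off;
}

/* The layer must know, per command, that buf/len form a pair to check. */
bool req_ok(const struct hypo_region *r, const struct hypo_ioc_req *q)
{
    assert(q != NULL);
    return range_within(r, (uintptr_t)q->buf, q->len);
}
```

The check itself is simple. The difficulty noted above is that only command-specific code knows which fields are pointers and which length belongs to which pointer.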
ASLR
- Patches are not in yet, HEAD soon, likely MFCd for 11.1.
- Some stack issues: Kernel shared page is not randomised yet, no stack gap, a few other issues still to look into
- vDSO still to do properly, rtld bits
- Believe we randomise every time something is mapped
- jemalloc needs to do something sensible. hints to mmap, but less useful for security. Chunk alignment questions
- Are we interested in KASLR? Ask kib@ how hard it would be. PIC vs non-PIC. V vs P ASLR.
- Would be nice to reduce TLB overhead like Apple do, mapping such that we can reduce page table entries
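One quick way to observe what is and is not randomised is to sample a few addresses and compare them across runs. This is a minimal sketch with hypothetical names. Note that the text address would only be expected to move if the program is built position-independent.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record of one address from each area of interest. */
struct layout_sample {
    uintptr_t stack;
    uintptr_t heap;
    uintptr_t text;
};

/* Fill *out with a stack, heap and text address; returns 0 on success. */
int sample_layout(struct layout_sample *out)
{
    int on_stack = 0;
    void *h;

    assert(out != NULL);
    h = malloc(16);
    if (h == NULL)
        return -1;
    out->stack = (uintptr_t)&on_stack;
    out->heap = (uintptr_t)h;
    /* Function-pointer to integer conversion is implementation-defined. */
    out->text = (uintptr_t)&sample_layout;
    free(h);
    return 0;
}
```

Printing the three fields from a small `main()` and running the binary several times shows which regions change between executions.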
/dev/random + system calls
- Mark, Colin, Robert have a new "scalable" Fortuna-esque design
- Preliminary plan for reducing contention for per-CPU entropy generation.
- Scale out entropy collection, and generation
- Synchronization internally to distribute entropy
- Each CPU has a number of pools, switch pools globally.
- Avoid asymmetric entropy distributor
- Avoid multicore contention
- New entropy collection in UMA - now off by default
Sanitisers
- Combination of compile-time instrumentation and run-time checking
- If you want to use sanitisers with libc++ you have to have libc++ compiled with that sanitiser
- Sanitiser support in some languages
- Support costs to consider
- e.g. memory allocation/use checking
- Some are cheap at run time (e.g. undefined behaviour), others are less so.
- ubsan often compiles down to a single predict-false branch
Some have SafeStack-like ABI ????
- Ship pre-compiled versions... on the side?
- Multi-arch? We use for 32-bit, soft/hard, ...
- Scope: key libraries? All of base? Ports/packages?
- ASAN is only 2x slower. MSAN is quick, no penalty. TSAN is hard, can be 5-10x slower.
- Possible to have "ready to instrument libraries"?
- Linux KASAN includes some built-in memory allocation (of zeroes) to support runtime decision
- Up to 50% increase in binary size
- Linux KASAN checks general memory allocation, heap allocation, perhaps BSS etc
- Question: Can we shift sanitisation to LLVM IR?
- ASAN gets speed up from being inserted into IR early, so it can be optimised
- Might break some applications
- Fancy scheme discussed
- But are they better than multiarch???
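The "single predict-false branch" remark can be pictured with a hand-written equivalent of a signed-overflow check. This is only an approximation of what a sanitiser such as `-fsanitize=signed-integer-overflow` emits. It assumes the GCC/Clang builtins `__builtin_add_overflow` and `__builtin_expect`, and uses a hypothetical counter in place of the real runtime report.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in for the sanitiser runtime's diagnostic. */
int overflow_reports;

/* Add two ints; the overflow path is marked as unlikely (cold). */
int checked_add(int a, int b)
{
    int sum;

    if (__builtin_expect(__builtin_add_overflow(a, b, &sum), 0)) {
        /* Cold path: a real sanitiser would call into its runtime here. */
        overflow_reports++;
    }
    return sum;
}
```

On the hot path the cost is one arithmetic instruction plus a branch the CPU predicts as not taken, which is why this class of check is cheap compared with memory sanitisers.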
Architectural security
- 8: ARMv7, ARMv8 have translated memory access instructions. Good way of preventing unintentional access to kernel memory
- 8.1: PAN (Privileged Access Never) - Kernel can't access user memory, SMAP-type functionality but better. More efficient, doesn't need switching off/on all the time.
- 8.2: UAO (User Access Override). Treat unprivileged load/store as privileged load/store. Useful for accessor code shared between userland/kernel targets.
- Andy thinks those sound good.
- Copyin/copyout/suword ... Change to use new instructions
- Assembly only routines
- PXN: Privileged Execute Never - kernel can't execute memory owned by userspace.
ARM TrustZone
- Nothing to care about here, kernel doesn't need to know about it.