Security and ABI discussion

SafeStack + CET

SafeStack is a software-only solution: compiler and runtime magic only, all in libc, no kernel changes needed.
Have to allocate two stacks, one shadow stack per stack. Having two stacks makes things hairier (especially setjmp()/longjmp())
Bunch of userspace work to make SafeStack work
Anything that has threads not 1:1 to POSIX threads needs to be aware.
Breaks some types of threading, and maybe garbage collection
This would commit us to an ABI we'd have to support forevermore
We would likely not want to be the only people supporting this, given the ongoing support need (e.g. in compilers).
What would be our 10-year support plan if it was only us doing it?
Chrome uses SafeStack, but statically linked to avoid ABI maintenance etc
theraven: "I quite like safestack, but it does need long-term support"
LLVM has implemented it, GCC has started to support it.
Calling non-SafeStack code from SafeStack code is fine, but would lose the protection.

Intel CET approach is much simpler. Recent hardware feature. Needs support from kernel.
CFI plus hardware safestack extensions
Second stack still needed, but extra register available to use.
CET allows you to define "safe" call/return pointers, reducing ROP risk
Return pointers onto two stacks, trap if they are different on return. Essentially one stack is canary
Uses PTE flags to mark stacks for access/non-access
Additional mask to limit CFI instructions used
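
The "two stacks, trap on mismatch" idea above can be sketched as a small software model. This is only an illustration: the names call_model, model_call and model_ret are hypothetical, and real CET does the comparison in hardware on a shadow stack that ordinary stores cannot write.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_DEPTH 64

/* Hypothetical model: return addresses are kept on the normal stack and
 * duplicated on a shadow stack. In this model both are plain arrays. */
struct call_model {
    uintptr_t normal[MODEL_DEPTH];
    uintptr_t shadow[MODEL_DEPTH];
    size_t depth;
};

/* Model of a call: push the return address onto both stacks.
 * Returns false if the model has run out of room. */
static bool model_call(struct call_model *m, uintptr_t ret)
{
    assert(m != NULL);
    if (m->depth >= MODEL_DEPTH)
        return false;
    m->normal[m->depth] = ret;
    m->shadow[m->depth] = ret;
    m->depth++;
    return true;
}

/* Model of a return: pop both stacks and compare. A mismatch means the
 * normal stack was tampered with; hardware would trap, the model just
 * returns false and leaves *ret untouched. */
static bool model_ret(struct call_model *m, uintptr_t *ret)
{
    assert(m != NULL && ret != NULL);
    if (m->depth == 0)
        return false;
    m->depth--;
    if (m->normal[m->depth] != m->shadow[m->depth])
        return false;
    *ret = m->normal[m->depth];
    return true;
}
```

Overwriting an entry in normal[] between model_call() and model_ret() stands in for a ROP-style overwrite of a saved return address: the matching model_ret() then fails instead of handing back the attacker's value.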

There are lots of patches, but do we want to merge?
Could we commit it, but not enable it?
Could we signpost it as "this might break in future"?
What about security updates delayed due to needing to not break SafeStack?
Likely no point enabling it unless we start shipping ports with it enabled.
There is no migration path from SafeStack to CET should CET hardware become universal.
Questions about how or if sigreturn() trampoline would need to change
Everything in the kernel will need to know to both allocate and deallocate the second stack when the primary stack is created/destroyed. Lifecycle issues here - how do we know when a stack stops being used?
Validation requirements:
- Try with ports/packages -> SafeStack
- In Intel simulator -> CET

CHERI: Bounds-checking in hardware
CheriABI system call - ??? hard ??? using Microsoft SAL
Every pointer in run-time, compiler generated is checked
System call layer with pointer checking:
- like compat32
- Displace SCO i386 memory layout
Problematic system calls:
- ioctl() - Who owns buffers? Which bounds should be checked? At least ioctl provides a length. Kernel doesn't necessarily know userspace rules. Could use tags rather than pointer/length.
- Generally, anything that embeds pointers in messages is bad
- Anything taking iovec is a bit of a pain, fcntl()
- Anything that takes void* and what those arguments contain varies depending on the command is a pain
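
To make the embedded-pointer problem in the list above concrete, here is a minimal sketch. Everything in it is an assumption for illustration: demo_ioc_req is a made-up ioctl argument, and demo_cap is a plain base/length pair standing in for the bounds a capability would carry. It is not the CheriABI implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ioctl argument. The size of this struct is known from the
 * ioctl command, but buf points at a second buffer whose ownership and
 * bounds the generic system call layer cannot see. */
struct demo_ioc_req {
    void  *buf;
    size_t len;
};

/* Stand-in for the bounds a capability carries: base address and length. */
struct demo_cap {
    uintptr_t base;
    size_t    length;
};

/* Return true only if [p, p + len) lies entirely inside the region that
 * cap describes. Written with subtraction so it cannot overflow. */
static bool demo_cap_covers(const struct demo_cap *cap, const void *p,
    size_t len)
{
    assert(cap != NULL);
    uintptr_t addr = (uintptr_t)p;
    if (addr < cap->base)
        return false;
    size_t off = (size_t)(addr - cap->base);
    if (off > cap->length)
        return false;
    return len <= cap->length - off;
}
```

With a bare pointer/length pair, the kernel has to trust the len field that userspace supplied. If the bounds travel with the pointer, a check like demo_cap_covers(&cap, req.buf, req.len) can reject a request whose claimed length runs past the object it was derived from.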
SAL open source, desirable
Idea of compat64 layer which translates standard userspace pointers into capability-aware pointers. Pointer protection in kernel but not userspace.
Upstream system-call vector fixes - wrong types, etc
more kern_foo() for sys_foo()
Experimental memory-safe kernel and userspace
AFL fuzzer

Patches are not in yet, HEAD soon, likely MFCd for 11.1.
Some stack issues: Kernel shared page is not randomised yet, no stack gap, a few other issues still to look into
vDSO still to do properly, rtld bits
Believe we randomise every time something is mapped
jemalloc needs to do something sensible. hints to mmap, but less useful for security. Chunk alignment questions
Are we interested in KASLR? Ask kib@ how hard it would be. PIC vs non-PIC. V vs P ASLR.
Would be nice to reduce TLB overhead like Apple do, mapping such that we can reduce page table entries

Combination of compile-time instrumentation and run-time checking
If you want to use sanitisers with libc++ you have to have libc++ compiled with that sanitiser
Sanitiser support in some languages
Support costs to consider
e.g. memory allocation/use checking
Some are cheap at run time (e.g. undefined behaviour), others are less so.
ubsan often compiles down to a single predict-false branch
Some have SafeStack-like ABI ????
Ship pre-compiled versions... on the side?
Multi-arch? We use for 32-bit, soft/hard, ...
Scope: key libraries? All of base? Ports/packages?
ASAN is only 2x slower. MSAN is quick, no penalty. TSAN is hard, can be 5-10x slower.
Possible to have "ready to instrument libraries"?
Linux KASAN includes some built-in memory allocation (of zeroes) to support runtime decision
Up to 50% increase in binary size
Linux KASAN checks general memory allocation, heap allocation, perhaps BSS etc
Question: Can we shift sanitisation to LLVM IR?
ASAN gets speed up from being inserted into IR early, so it can be optimised
Might break some applications
Fancy scheme discussed
But are they better than multiarch???
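
As an illustration of the "single predict-false branch" note above, here is a hand-written approximation of a signed-overflow check. It assumes the GCC/Clang builtins __builtin_add_overflow and __builtin_expect; checked_add is a made-up name, and this is not the code the sanitiser actually emits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Add two ints. The overflow test is marked as unlikely, so the common
 * path is one well-predicted branch followed by the store. */
static bool checked_add(int a, int b, int *out)
{
    int sum;

    assert(out != NULL);
    if (__builtin_expect(__builtin_add_overflow(a, b, &sum), 0)) {
        /* Cold path: a real sanitiser would call its report handler
         * here; this sketch only signals failure. */
        return false;
    }
    *out = sum;
    return true;
}
```

The checks that need shadow memory or allocator interposition (e.g. memory allocation/use checking) cannot be reduced to a branch like this, which is one reason their run-time cost is higher.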

8: ARMv7, ARMv8 have translated memory access instructions. Good way of preventing unintentional access to kernel memory
8.1: PAN (Privileged Access Never) - Kernel can't access user memory. SMAP-type functionality but better: more efficient, doesn't need switching off/on all the time.
8.2: UAO (User Access Override). Treat unprivileged load/store as privileged load/store. Useful for accessor code shared between userland/kernel targets.
Andy thinks those sound good.
Copyin/copyout/suword ... Change to use new instructions
Assembly only routines
PXN: Privileged Execute Never - kernel can't execute memory owned by userspace.