
The Immutable Filesystem — Vastly Reducing Attack Surface

I am working on a novel approach to security: generalizing read-only filesystems in Linux to prevent malware from modifying files or establishing a foothold in a system.  As far as I know this is a first in InfoSec.  When the whole operating system is set to read-only, no changes can be made where it matters, even if the malefactor manages to get root.  This greatly reduces the attack surface of any system without interfering with its ability to function.

In almost all OS installs, system-critical directories and files are set to read-write, at least by root.  Why?  There’s no need.  I say this is unnecessary and leaves open many avenues of attack which are exploited time after time.  An immutable filesystem will eliminate almost all attacks by most of the hacker community, and even makes things difficult for nation-state malefactors.  Oh sure, it’s still possible to inject malware into memory, but that exists anyway and is a different problem.  An old axiom is ‘Good Security is Layering’.  The Perfect should not be made the enemy of the Good, and those who make it so end up with nothing.

When it is time to update the OS, it takes two clicks on the host machine to enable read-write in the guest for updates, then two clicks again to make it read-only.  These steps can also be scripted with Ansible.  This is vastly more practical than the recommendation of the SANS Institute (4 October, 2018):  “(5) It is time to abandon the convenient but dangerously permissive default access control rule of read/write/execute in favor of restrictive read/execute-only”…  meaning setting individual OS files this way.

In sum, my approach means mounting /home on a writeable partition and moving /tmp and /var to tmpfs (in other words, RAM).  The rest of the virtual machine’s OS is set to read-only by the virtual machine’s KVM host using virt-manager, a method completely out of reach of the virtual machine itself.  Under the Least Privilege and Least Access principles, access to databases, APIs and other resources must of course be set to the minimum necessary.  Chances are that access to APIs can be read-only as well, and APIs must allow only strictly structured calls and no others.
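To confirm from the host that a guest disk really is locked, something like the sketch below may help.  I am assuming here that the read-only setting shows up as a <readonly/> element inside the disk's definition in the libvirt domain XML;  the helper name disk_is_readonly and the file guest.xml are made up for illustration.

```shell
#!/bin/sh
# Sketch: check from the KVM host whether a guest disk is marked read-only.
# Assumption: the read-only setting appears as <readonly/> inside the disk's
# <disk> element in the libvirt domain XML. disk_is_readonly is a hypothetical
# helper; it expects one XML element per line, as virsh dumpxml prints them.

disk_is_readonly() {
    # $1: file holding the domain XML, $2: target device such as vda.
    awk -v dev="$2" -v q="'" '
        /<disk[ >]/               { in_disk = 1; ro = 0; hit = 0 }
        in_disk && /<readonly\/>/ { ro = 1 }
        in_disk && /<target /     { if (index($0, "dev=" q dev q)) hit = 1 }
        /<\/disk>/                { if (hit && ro) found = 1; in_disk = 0 }
        END                       { exit (found ? 0 : 1) }
    ' "$1"
}

# Example on the host ("guest" is a placeholder domain name):
#   virsh dumpxml guest > guest.xml
#   disk_is_readonly guest.xml vda && echo "vda is read-only"
```

The check only reads the XML, so it is safe to run at any time;  it does not change the guest.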

This article assumes RHEL 7.5 or newer;  adjustments can be made for most other *nices.  

To start, let’s pivot /home to a partition separate from the OS.  With OS operations at a very quiet time (for example at night), prepare and mount the new /home target.  This assumes /dev/vdb is a new, empty disk;  mklabel wipes any existing partition table.

# parted -s -a optimal /dev/vdb mklabel gpt mkpart primary xfs 0% 100%
# mkfs.xfs /dev/vdb1
# mount /dev/vdb1 /mnt

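Copy over /home, preserving all permissions, hard links, ACLs and extended attributes (which is where SELinux labels live):

# rsync --archive --hard-links --acls --xattrs --progress /home/ /mnt/

Notice the trailing slash on /home/.  It tells rsync to copy the contents of /home, dotfiles included, into /mnt.  If you simply specify /home, rsync will copy the directory itself into /mnt, giving /mnt/home/ and everything under it.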
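Since the old /home gets deleted a few steps later, it is worth confirming the copy first.  Here is a minimal sketch that compares two trees by relative path, mode, owner and group.  It assumes GNU find (for -printf), and the function names tree_manifest and same_tree are my own inventions, not standard tools.

```shell
#!/bin/sh
# Sketch: compare two directory trees by relative path, mode, owner and group.
# Assumes GNU find (for -printf); tree_manifest and same_tree are hypothetical names.

tree_manifest() {
    # Print "path mode uid gid" for every entry below $1, sorted for comparison.
    (cd "$1" && find . -mindepth 1 -printf '%P %m %U %G\n' | sort)
}

same_tree() {
    # Succeeds only when both trees list identical entries and metadata.
    [ "$(tree_manifest "$1")" = "$(tree_manifest "$2")" ]
}

# Example (run as root before the old /home is deleted):
#   same_tree /home /mnt && echo "copy matches" || echo "copy differs"
```

This does not compare file contents or SELinux labels;  it only catches missing entries and permission or ownership drift.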

Make sure you have current backups, then reboot, and at the GRUB menu where you choose the kernel, press:

e    (for 'edit')

Scroll down to the actual kernel line (starts with ‘linux’) and to the end of that line add:

 single    (space before 'single')

Then press {Ctrl}x to continue booting.  You’re now in single-user mode, where you can do many things not possible in multi-user mode.  First, clear out the old /home on the root filesystem:

# rm -Rf /home/*    (be careful of the syntax)

Edit /etc/fstab and add:

/dev/vdb1  /home     xfs     rw,noexec,nodev,nosuid     0 0

# Push /tmp to tmpfs in RAM
tmpfs      /tmp      tmpfs    rw,size=25%,nr_inodes=5k,noexec,nodev,nosuid,mode=1700     0 0

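Note that /tmp size should be about 25% of the virtual machine’s total RAM.  With tmpfs, memory is not consumed until it’s needed.  The other mount options are there for security:  noexec blocks execution of binaries, nodev ignores device nodes, and nosuid ignores setuid and setgid bits.  mode=1700 sets the sticky bit and restricts /tmp to root;  use mode=1777 if unprivileged users need /tmp.  Save and: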
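If you want to see what size=25% works out to on a given machine, a small sketch like this will do.  The helper name quarter_of_ram_kb is hypothetical;  it simply reads the MemTotal line from /proc/meminfo (or from a file you hand it).

```shell
#!/bin/sh
# Sketch: report what size=25% works out to on this machine.
# quarter_of_ram_kb is a hypothetical helper; it reads a meminfo-style file.

quarter_of_ram_kb() {
    # $1: path to a meminfo file (defaults to /proc/meminfo).
    awk '/^MemTotal:/ { print int($2 / 4) }' "${1:-/proc/meminfo}"
}

echo "25% of RAM is about $(quarter_of_ram_kb) kB"
```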

# reboot

Time to check our handiwork so far.

# mount |grep /tmp
tmpfs on /tmp type tmpfs (rw,nosuid,nodev,noexec,relatime,seclabel,size=512000k,nr_inodes=5120,mode=1700)
# df -h
Filesystem                       Size  Used Avail Use% Mounted on
devtmpfs                         984M     0  984M   0% /dev
tmpfs                            997M     0  997M   0% /dev/shm
tmpfs                            997M  8.6M  989M   1% /run
tmpfs                            997M     0  997M   0% /sys/fs/cgroup
/dev/mapper/centos_pegasus-root  8.0G  2.9G  5.2G  36% /
/dev/vda1                       1014M  133M  882M  14% /boot
/dev/vdb1                        2.0G  437M  1.6G  22% /home
tmpfs                            500M     0  500M   0% /run/var
tmpfs                            250M     0  250M   0% /tmp

Yep, /tmp looks good.  By default a tmpfs mount is capped at 50% of RAM, although no tmpfs mount uses any memory until it’s needed, and it releases it when done.  Well-mannered.  25% should be enough for /tmp in most circumstances.  What about /home?

# mount |grep /home
/dev/vdb1 on /home type xfs (rw,nosuid,nodev,noexec,relatime,seclabel,attr2,inode64,noquota)

Now we’re cooking with gas.{/ancient reference}  Our /home is on a separate partition that we can leave read/write, but nothing can be executed or SUIDed there.
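Eyeballing mount output gets old fast, so here is a sketch of a check that could run from a script or a monitoring job.  The helper has_mount_opts is a made-up name;  it parses lines in /proc/mounts format (device, mount point, type, options) and succeeds only if every option you ask for is present.

```shell
#!/bin/sh
# Sketch: confirm a mount point carries the expected options.
# has_mount_opts is a hypothetical helper that parses /proc/mounts-style lines
# (device mountpoint fstype options dump pass).

has_mount_opts() {
    # $1: mounts file, $2: mount point, remaining args: required options.
    file=$1; mp=$2; shift 2
    # Keep the options of the last (topmost) mount on that mount point.
    opts=$(awk -v mp="$mp" '$2 == mp { o = $4 } END { print o }' "$file")
    [ -n "$opts" ] || return 1
    for want in "$@"; do
        case ",$opts," in
            *",$want,"*) ;;
            *) return 1 ;;
        esac
    done
    return 0
}

# Example:
#   has_mount_opts /proc/mounts /home noexec nodev nosuid && echo "/home is locked down"
```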

Before we can set the root filesystem to read-only there are two issues to deal with:  /etc/mtab and /var.  /etc/mtab is populated dynamically with the current mounts at system initialization by various systemd functions.  If we want to set /etc to read-only we need to simulate that somehow.  My solution is to symlink /etc/mtab to /proc/mounts.

# rm /etc/mtab
# ln -s /proc/mounts /etc/mtab

The two files contain mostly the same information with mostly the same syntax;  from the point of view of applications that read them, they’re compatible.  Since /proc/mounts reflects the current kernel information, it is always up-to-date, and the mount and umount commands no longer need to write to /etc/mtab.

The downside of /proc/mounts compared with /etc/mtab is that it shows information (especially mount options) as printed back by the kernel, rather than the exact parameters passed to the mount command.  So a little information is lost.  That information is rarely useful though.
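If you are rolling this out to more than one machine, the symlink step can be made repeatable.  A minimal sketch, with link_mtab being a hypothetical helper name and the second argument there only so it can be tried out on a scratch path:

```shell
#!/bin/sh
# Sketch: make the mtab symlink step repeatable.
# link_mtab is a hypothetical helper; the second argument defaults to /etc/mtab.

link_mtab() {
    # $1: link target (normally /proc/mounts), $2: link path.
    target=$1; link=${2:-/etc/mtab}
    # Nothing to do when the link already points at the target.
    if [ -L "$link" ] && [ "$(readlink "$link")" = "$target" ]; then
        return 0
    fi
    rm -f "$link" && ln -s "$target" "$link"
}

# Example (as root): link_mtab /proc/mounts
```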

At last we come to /var.  I was warned by the denizens of IRC to not put /var in tmpfs, but they couldn’t explain why.  ‘Something bad will happen.’  My question was What, pray, What, will happen, but I was met with the heavy weight of silence.  I ultimately concluded that… (with a debt to Steve Jobs),
They Were Doing It Wrong©.

Overlay mounts were only recently added to the mainline kernel, with version 3.18.  They’re simple enough in concept, but mind-bending to actually practice (like Linux email…).  An overlay mount of /var means that the ‘lowerdir’ is /var on disk, which will be read-only, and the ‘upperdir’ is a writeable layer above it.  /var has to be writeable or all kinds of apps will misbehave, so our writeable layer will be in tmpfs.  System applications will read from the lowerdir and write to the upperdir.  This of course means the upperdir goes away with each reboot, so if you want to preserve logs, ship them to a log server (which is good practice anyway, for forensic reasons).  It also means that we can always reboot to a lean and pristine /var, and that writes to it land in RAM, lightning-fast.
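Here is a rough sketch of how the pieces could fit together, not a finished recipe.  It assumes the tmpfs for the writeable layer is mounted at /run/var (as in the df output earlier) and uses hypothetical "upper" and "work" subdirectories.  The workdir is something overlayfs requires in addition to the upperdir, and it must sit on the same filesystem as the upperdir.

```shell
#!/bin/sh
# Sketch: build the mount command for a tmpfs-backed overlay on /var.
# Assumptions: the tmpfs lives at /run/var, and the "upper" and "work"
# subdirectory names are hypothetical. overlayfs needs a workdir on the
# same filesystem as upperdir.

overlay_opts() {
    # $1: read-only lower directory, $2: tmpfs directory holding upper and work.
    printf 'lowerdir=%s,upperdir=%s/upper,workdir=%s/work\n' "$1" "$2" "$2"
}

mount_var_overlay() {
    # Must run as root, early in boot, before services open files under /var.
    mkdir -p /run/var &&
    mount -t tmpfs -o rw,size=500m,nodev,nosuid tmpfs /run/var &&
    mkdir -p /run/var/upper /run/var/work &&
    mount -t overlay overlay -o "$(overlay_opts /var /run/var)" /var
}

# Print the options only; call mount_var_overlay yourself once they look right.
overlay_opts /var /run/var
```

Run as-is, the script only prints the option string;  nothing is mounted until mount_var_overlay is called explicitly.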

In progress…
