
Thursday, September 23, 2010

Rant: GPT versus LVM

It hit medium-sized data centres approximately 4 years ago and now it's barreling down on home PCs with the force of a locomotive. Another virus? Worm? Trojan? Not at all - I'm talking about the 2TB partition size limit of MS-DOS MBR-style partitions! Not too many have been affected yet, but Linux users purchasing high-end machines are starting to hear the rumblings as their favorite distro's installer craps out on them. With 4TB consumer-grade drives out on the market, it's only a matter of time until the rest of us are confronted with this. My first reaction was
"Damn! I won't be able to use cfdisk and will have to settle for GNU parted! I hate GNU parted!"
...but this is much more far-reaching than just affecting your choice of partition editor.
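For the curious, here's where that 2TB number actually comes from: the MBR partition entry stores the starting sector and the sector count as 32-bit values, so with the classic 512-byte sector you top out at 2TiB. A quick back-of-the-envelope check in Python (just a sketch of the arithmetic, nothing more):

    SECTOR_SIZE = 512        # bytes - the old DOS-style sector size
    MAX_SECTORS = 2 ** 32    # the MBR entry's LBA/sector-count fields are only 32 bits wide

    limit = MAX_SECTORS * SECTOR_SIZE
    print(limit, "bytes =", limit // 2 ** 40, "TiB")   # 2199023255552 bytes = 2 TiB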

It is, of course, not the first time the industry has faced such barriers and overcome them. Consider the 1MB RAM barrier faced by Intel's 16-bit processors (worked around with 24-bit addressing on the 80286 and finally dealt with by the fully 32-bit 80386), the 8GB disk size limitation imposed by CHS addressing (worked around with BIOS geometry translation and finally dealt with by LBA addressing), the Y2K date limitation (worked around with a cutoff of 1930 instead of 1900, making it the Y2K30 limitation, and resolved by a 4-digit year, making it the Y9K999 limitation), the 4GB RAM barrier (worked around with PAE and resolved with 64-bit processors), and the "limit" of only 4,000,000,000 (2^32 minus reserved) 32-bit IPv4 addresses (worked around with NAT and resolved with 128-bit IPv6, which nobody's using), to name a few. As usual, the pattern will be to work around it and then finally resolve it with a new standard. Whether that standard is an elegant, forward-looking "fresh start" or an inelegant back(ass)wards-compatible kludge that only delays the problem remains to be seen.

So what can we do right now? Well...we can simply create more partitions and delay the inevitable. Using MBR you can splinter your data across 4 x 2TB monsters, leaving you with a "new" theoretical limit of 8TB or, better yet, create an extended partition and have a practically unlimited number of 2TB slices at your disposal. In my opinion, that's hardly ideal. The "industry" response to this is GPT, which layers a sloppy kludge on top of the old MBR structure and comes with its own "new" limitations. Essentially it keeps the MBR, "reserves" the whole disk with a protective partition of type 0xEE, and then lays down its own odd (each partition has a "type" so that vendors can fight over the essentially meaningless UUID for each "type" - a concept carried over from the old MBR-style partitions), arbitrary (the first usable sector is LBA 34? Why 34?), limited (128 partitions to a maximum size of 9.4ZB, or 9,400,000,000TB), fixed-form structure (there is, and will only ever be, a v1.0 of GPT).
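To make that layering concrete, here's a rough Python sketch that peeks at the protective MBR and the GPT header. This is just an illustration, not a robust parser: the offsets come from the published GPT layout, and "disk.img" is a made-up path (point it at a disk image rather than a live system disk):

    import struct

    DEVICE = "disk.img"   # example path only

    with open(DEVICE, "rb") as dev:
        mbr = dev.read(512)    # LBA 0: the (protective) MBR
        gpt = dev.read(512)    # LBA 1: the GPT header

    # The first MBR partition entry starts at offset 446; its type byte is at offset 4.
    # On a GPT disk this prints 0xEE - the "reserved, hands off" marker.
    print("MBR partition 1 type: 0x%02X" % mbr[446 + 4])

    if gpt[0:8] == b"EFI PART":
        revision, = struct.unpack_from("<I", gpt, 8)        # 0x00010000 == version 1.0
        first_usable, = struct.unpack_from("<Q", gpt, 40)   # usually LBA 34 = 1 (MBR) + 1 (header) + 128*128/512
        entries_lba, = struct.unpack_from("<Q", gpt, 72)    # where the partition entry array starts (usually LBA 2)
        num_entries, = struct.unpack_from("<I", gpt, 80)    # usually 128
        entry_size, = struct.unpack_from("<I", gpt, 84)     # usually 128 bytes
        print("GPT revision 0x%08X: %d entries of %d bytes at LBA %d, first usable sector %d"
              % (revision, num_entries, entry_size, entries_lba, first_usable))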

Well...we get 9ZB partitions...it's supported in the Linux kernel...problem solved, right? OK...yes, in a kludgy, barely-backwards-compatible way, but solved, right? Well, let's think here. What would be considered forward-looking and potentially a superior solution to GPT? It's best if we look at some challenges and caveats of the MBR-style partition system:
  • Larger block sizes improve performance and increase storage capacity. Does it support block sizes other than the old DOS-style 512 bytes?
  • Sometimes partitions run out of space and we need more - it would be nice to just buy another disk and keep growing. Does it allow partitions to be combined seamlessly into logically contiguous blocks? On a live system in real time? If they aren't side-by-side? On different disks? On different machines?
  • We like to get the most from our hardware and losing data sucks. Does it allow for partition striping to gain performance or mirroring to gain reliability?
  • Working with live filesystems prevents certain important activities such as backups. Does it allow for instant copies of partitions on a live system to ease backups, imaging? Or for safe trials of such activities as filesystem performance tweaking or high-risk repair tools?
For GPT, the answer to every one of these questions is a resounding "NO". And that's what really got me thinking: doesn't LVM as implemented in Linux solve all of these problems?
As a matter of fact, Linux LVM is the ultimate evolution of the whole "partitioning" solution. While LVM is currently limited to "only" 8 Exabytes (0.008 Zettabytes), that's a small price to pay for all the other functionality. Additionally, and unlike GPT, the LVM format is developed in a revisable manner, so raising that limit in a future revision would be straightforward. Why was GPT even designed when a superior solution already existed? Why was the wheel reinvented...as a triangle?
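To show the sort of flexibility I'm talking about, here's a rough sketch of growing a live volume onto a freshly added disk and snapshotting a live volume for backups. The commands are the standard LVM2 tools; I'm only driving them from Python for illustration, and the device and volume names (/dev/sdb, vg0, data) are made up:

    import subprocess

    def run(*cmd):
        """Print and run an LVM command - sketch only, no error handling."""
        print("+", " ".join(cmd))
        subprocess.check_call(cmd)

    # Ran out of space? Buy another disk and grow the live volume onto it:
    run("pvcreate", "/dev/sdb")                          # turn the new disk into a physical volume
    run("vgextend", "vg0", "/dev/sdb")                   # add it to the existing volume group
    run("lvextend", "-l", "+100%FREE", "/dev/vg0/data")  # grow the logical volume into the new space
    run("resize2fs", "/dev/vg0/data")                    # grow the ext3/ext4 filesystem online

    # Instant copy-on-write snapshot of a live volume for backups or risky experiments:
    run("lvcreate", "--snapshot", "--size", "10G", "--name", "data-snap", "/dev/vg0/data")

Striping and mirroring are the same kind of one-liner (lvcreate -i for stripes, -m for mirrors), and none of it needs a new on-disk partition table format.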

In the name of simplicity I'm hoping LVM will one day overtake GPT as a straightforward, unified method to divide disk storage and give us all the flexibility we deserve. The main hurdle to this happening is a certain antagonist commonly named in the industry: Microsoft. I haven't been tallying, but this must be roughly the 15,000th time the industry has had to settle for mediocrity and inferiority in order to stay compatible with their mediocre, inferior products and general incompetence. Since LVM is open-source, it would technically be easier for Microsoft (and, to a lesser degree, other vendors) to support it than to "create" a solution to this problem: the code is there for them to freely use and implement without the threat of getting sued over patents and/or copyrights. (As an aside, that is Microsoft's dirty little secret, since the situation is the exact opposite for Linux, standards bodies, and other tech companies trying to support Microsoft's technologies and standards. After 15 years in this industry, one sees that Microsoft avoids external standards and technologies and creates its own not so much because of classic NIH, but because it seems to enjoy kicking small puppies with whatever mighty newfound "Intellectual Property" powers it comes to possess.)

Personally, I don't need the added complexity of two impotent partitioning schemes (and all the related problems that pop up from time to time) and would like the option of simply having LVM manage my storage. Before that can happen, though, it has one major hurdle to overcome: LVM is not supported by a "standard" BIOS, since it's not part of the EFI "standard" to which we can expect almost all current and future BIOSes to adhere. This means it is neither accessible nor bootable from the BIOS, and either MBR or GPT is required at least until a real OS of some kind is loaded.

Will this be the final result? Is GPT here to stay because the "industry" has decided? In my opinion, yes, and GPT and LVM will coexist. However, there are many smart data centre administrators and savvy techies out there who will add some weight by choosing a superior configuration for their machines and making it work (custom BIOSes and booting from other media come immediately to mind). Furthermore, booting from LVM becomes a real possibility when you consider that computer BIOSes are no longer fixed, burnt-in and permanent. They can be flashed, and superior open alternatives are available to give Phoenix, AMI and Byosoft the kick in the ass they've sorely needed for decades! Consider this a challenge to the coreboot community: help support superior solutions and make BIOSes LVM-aware!

The way I see it, MBR and the 2TB partition size limit isn't a new limit at all but an expansion: I can now have up to 2TB of bootstrap code in which to get a running OS to use my LVM volumes on the rest of the drive ;-)

I'm curious whether it's possible to create an MBR-style partition for /boot, a 0xEE partition to tell old-school tools to "piss off", and an LVM volume embedded directly into the hard drive's block device right after the /boot partition ends.
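I haven't actually tried this, but assuming a reasonably recent sfdisk, something along these lines ought to approximate that layout. It's a sketch only: /dev/sdX is a placeholder for a scratch disk you don't care about, and putting the physical volume on the device node of the 0xEE entry is effectively "LVM starting right where /boot ends":

    import subprocess

    DISK = "/dev/sdX"   # placeholder - do NOT point this at a disk you care about

    # Partition 1: a small type-0x83 /boot at the front of the disk.
    # Partition 2: a type-0xEE entry covering the rest, mostly to tell
    # old-school MBR tools to keep their hands off that space.
    layout = "label: dos\n,512MiB,83\n,,ee\n"
    p = subprocess.Popen(["sfdisk", DISK], stdin=subprocess.PIPE)
    p.communicate(layout.encode())

    # An LVM physical volume on the space right after /boot ends (in practice,
    # the device node of that second 0xEE entry) and a volume group on top of it:
    subprocess.check_call(["pvcreate", DISK + "2"])
    subprocess.check_call(["vgcreate", "vg0", DISK + "2"])

Whether your boot loader and your distro's installer play along is, of course, another question.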

In the meantime, is anybody else using LVM instead of GPT? I'd like to hear about different approaches people have taken and how they've panned out!
