The RAW format is a lossless compressed format.
The file size goes up at higher ISOs because there is less to compress (the extra noise makes the data less predictable).
No, it's because the CR2 format doesn't use adaptive Huffman coding, but predefined tables that are optimized for the common-case value distribution.
If adaptive Huffman or arithmetic coding (like H.264 CABAC) were used, high ISO raw images would be significantly smaller. Unfortunately, the processing cost in terms of power consumption and silicon area would likely be higher, especially with more complex coding schemes.
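To make the fixed-table argument concrete, here is a rough sketch in Python. It is not the CR2 codec: the "sensor data" is just rounded Gaussian noise, the sigma values are made up, and a table fitted to the whole image stands in for a truly adaptive coder. It only shows the general effect of coding noisy data with a table tuned for quiet data.

```python
import heapq
import random
from collections import Counter

LIMIT = 64  # assumed residual range, purely illustrative


def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for a Huffman code built from freqs."""
    # Heap items are (weight, unique tiebreak, {symbol: depth so far}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: depth + 1 for s, depth in d1.items()}
        merged.update({s: depth + 1 for s, depth in d2.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def simulated_residuals(sigma, n, rng):
    """Rounded Gaussian noise clamped to +-LIMIT (a stand-in for real sensor data)."""
    return [max(-LIMIT, min(LIMIT, round(rng.gauss(0, sigma)))) for _ in range(n)]


def table_for(samples):
    """Huffman table fitted to samples; +1 smoothing so every symbol gets a code."""
    counts = Counter(samples)
    freqs = {s: counts.get(s, 0) + 1 for s in range(-LIMIT, LIMIT + 1)}
    return huffman_code_lengths(freqs)


def coded_bits(samples, lengths):
    """Total size in bits when samples are coded with the given table."""
    return sum(lengths[s] for s in samples)


rng = random.Random(0)
low_iso = simulated_residuals(2.0, 20000, rng)    # little noise
high_iso = simulated_residuals(12.0, 20000, rng)  # a lot of noise

fixed = table_for(low_iso)   # "predefined" table tuned for the common case
tuned = table_for(high_iso)  # table fitted to the noisy image itself

low_bits = coded_bits(low_iso, fixed)
fixed_bits = coded_bits(high_iso, fixed)
tuned_bits = coded_bits(high_iso, tuned)

print("low ISO, fixed table :", low_bits / len(low_iso), "bits/sample")
print("high ISO, fixed table:", fixed_bits / len(high_iso), "bits/sample")
print("high ISO, tuned table:", tuned_bits / len(high_iso), "bits/sample")
```

In this toy setup the noisy data still costs more than the quiet data even with a fitted table, so both effects are real: noise raises the floor, and a mismatched table adds a penalty on top of it.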
Maybe future RAW formats will have PNG or lossless JPEG style spatial predictor functions. Combine that with adaptive arithmetic coding, and the file size savings would likely be significant, perhaps even cutting files in half.
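For anyone unfamiliar with spatial predictors, a minimal Python sketch of the idea follows. The predictors are the left, above, plane and average predictors from lossless JPEG plus PNG's Paeth filter; the test image (a gradient with a little seeded noise) and all function names are invented for illustration. Each pixel is predicted from its left (a), upper (b) and upper-left (c) neighbours, and only the prediction error would be entropy coded.

```python
import math
import random
from collections import Counter


def paeth(a, b, c):
    """PNG Paeth predictor: pick the neighbour closest to a + b - c."""
    p = a + b - c
    pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
    if pa <= pb and pa <= pc:
        return a
    if pb <= pc:
        return b
    return c


# a = left, b = above, c = upper-left neighbour of the current pixel.
PREDICTORS = {
    "left": lambda a, b, c: a,
    "above": lambda a, b, c: b,
    "plane": lambda a, b, c: a + b - c,
    "average": lambda a, b, c: (a + b) // 2,
    "paeth": paeth,
}


def prediction_residuals(img, predict):
    """Prediction errors for all pixels that have a, b and c neighbours."""
    out = []
    for y in range(1, len(img)):
        for x in range(1, len(img[0])):
            a, b, c = img[y][x - 1], img[y - 1][x], img[y - 1][x - 1]
            out.append(img[y][x] - predict(a, b, c))
    return out


def entropy_bits(values):
    """Empirical zeroth-order entropy in bits per value."""
    counts = Counter(values)
    n = len(values)
    return -sum(k / n * math.log2(k / n) for k in counts.values())


# Hypothetical test image: smooth gradient plus small seeded noise.
rng = random.Random(0)
SIZE = 64
img = [[3 * x + 2 * y + rng.randint(-2, 2) for x in range(SIZE)]
       for y in range(SIZE)]

raw_entropy = entropy_bits(
    [img[y][x] for y in range(1, SIZE) for x in range(1, SIZE)]
)
print("raw pixels:", raw_entropy, "bits/pixel")
for name, predict in PREDICTORS.items():
    print(name, entropy_bits(prediction_residuals(img, predict)), "bits/pixel")
```

The entropy figure is only a lower bound for a simple symbol-by-symbol coder, but it shows why prediction helps: the residuals cluster around a few small values while the raw pixels are spread over the whole range.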
But maybe that doesn't really make sense: file size is not really a big issue anymore. The current scheme is very reliable. Flip one bit in the current CR2 format, and you can recover the rest of the image with just a one-pixel error, given software that can resync to the Huffman stream. With more complex coding, larger blocks of the image could become corrupted, and without sophisticated error recovery and correction a single bit flip could render the whole image unusable.
Besides, if file size was really an issue, Canon would probably stop embedding a thumbnail AND a full size JPEG image in every RAW file! I prefer reliability over file size any day or night.
Correction: Having taken a look at a reverse-engineered CR2 implementation [1], I have to say I was wrong. CR2 RAW compression is actually based on a modified Lossless JPEG [2]. The main differences are in data ordering, and of course in that CR2 contains Bayer-filter values, not actual color components in any color space.
Sorry if I misled anyone! I think I confused CR2 with some other camera manufacturer's old RAW file format that used simple Huffman coding.
[1]: David J. Coffin, dcraw - http://cybercom.net/%7Edcoffin/dcraw/
[2]: ITU CCITT recommendation T.81 - http://www.itu.int/rec/T-REC-T.81/e