I think stating the same thing in terms of ISO would only serve to confuse less-informed people who are trying to understand the fundamentals of photography and the differences between formats.
Quite the contrary.
Image quality and noise/grain are very closely tied to the degree of absolute enlargement, which is itself very closely tied to the format.
ISO is determined in no small part by the noise/grain threshold, but in the practical, real world that very threshold depends on image quality, and thus on enlargement, and thus on format.
In a very real sense, and one that you'll see hold up as a rule when comparing cameras from the same technology generation, each larger format has roughly a one-stop ISO advantage. If ISO 200 is the highest you can shoot at on a 135-format camera and still keep noise at a level you find acceptable, then you'll need to stay at or below ISO 100 on APS-C, but you can get away with ISO 400 on medium format.
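One rough way to put numbers on the "one stop per format" rule is to assume that, with equal technology, acceptable noise depends only on the total light the sensor collects, so the tolerable ISO scales with sensor area. The Python sketch below makes exactly that assumption; the sensor dimensions are nominal values picked for illustration, and "medium format" here means a 44 x 33 mm sensor, which is only one of several medium format sizes.

```python
import math

# Nominal sensor dimensions in mm (assumed values; real sensors vary by maker).
FORMATS = {
    "APS-C": (23.6, 15.6),
    "135": (36.0, 24.0),
    "MF 44x33": (44.0, 33.0),
}

def area(fmt):
    """Sensor area in square millimetres."""
    w, h = FORMATS[fmt]
    return w * h

def stops_between(smaller, larger):
    """Difference in total light gathered, in stops: log2 of the area ratio."""
    return math.log2(area(larger) / area(smaller))

def equivalent_iso(iso, from_fmt, to_fmt):
    """ISO giving similar noise on to_fmt, assuming noise depends only on total light."""
    return iso * area(to_fmt) / area(from_fmt)

for fmt in FORMATS:
    print(f"{fmt}: ISO {equivalent_iso(200, '135', fmt):.0f}")
print(f"APS-C -> 135: {stops_between('APS-C', '135'):.2f} stops")
print(f"135 -> MF 44x33: {stops_between('135', 'MF 44x33'):.2f} stops")
```

With these numbers, ISO 200 on 135 maps to about ISO 85 on APS-C and about ISO 336 on the 44 x 33 mm sensor, i.e. steps of roughly 1.2 and 0.75 stops: "roughly one stop" per format, under this simple area-only model.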
(And, yes, it doesn't scale linearly: despite their size advantage, many medium format cameras have lousy extreme-high-ISO performance. This is partly because there isn't much demand for low-light capability in the medium format world, so manufacturers put their effort into improving low-ISO performance instead. The fact that it's a much smaller market also plays a role.)
Understanding why this is so is a very basic part of photography. It is essential to understanding why the different formats exist in the first place, and it is one of the reasons you would choose one format over another.
While I do agree with your statement, I find your way of putting it a bit weird (at least to me). One of the primary things that determines the noise level is pixel size. It's the reason some high-speed cameras (say, a Vision Research v1210) can show no apparent noise at their native ISO 40000 (yes, four zeros). They sacrifice resolution to get there, though: that camera has only about 1 MP on a full-frame-sized sensor.
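To see how big those pixels are, here is a quick Python calculation of pixel pitch. It treats the high-speed camera as exactly 1 MP on a 36 x 24 mm sensor (an approximation of the description above, not the camera's exact spec), compares it with a 21.1 MP full-frame still camera, and assumes square pixels tiling the whole sensor.

```python
import math

def pixel_pitch_um(width_mm, height_mm, megapixels):
    """Approximate pixel pitch in micrometres, assuming square pixels over the whole sensor."""
    area_um2 = (width_mm * 1000) * (height_mm * 1000)
    return math.sqrt(area_um2 / (megapixels * 1e6))

FF = (36.0, 24.0)  # full-frame dimensions in mm

low_res = pixel_pitch_um(*FF, 1.0)    # ~1 MP, as in the high-speed camera example
high_res = pixel_pitch_um(*FF, 21.1)  # a 21.1 MP full-frame still camera

# Light collected per pixel scales with pixel area (pitch squared), all else equal.
area_ratio = (low_res / high_res) ** 2
print(f"{low_res:.1f} um vs {high_res:.1f} um pitch")
print(f"{area_ratio:.1f}x the light per pixel, about {math.log2(area_ratio):.1f} stops")
```

Under these assumptions each 1 MP pixel collects about 21 times as much light as a 21.1 MP pixel, a bit over four stops.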
So if you compare an APS-C and a full-frame sensor with the same number of pixels and the same sensor technology, the full frame has larger pixels. Each pixel therefore receives more light, so the full frame needs less gain than the APS-C sensor to reach a comparable ISO. Bottom line: all else being equal, the full frame will always have the better signal/noise ratio. In practice, APS-C sensors usually don't have many fewer pixels than full-frame ones (e.g. the 7D's 18 MP vs. the 5D Mark II's 21.1 MP), so this is probably the better way to look at it: the APS-C sensor ends up with smaller pixels.
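The "more light per pixel, better signal/noise" step can be made concrete with the simplest noise model: photon shot noise only. Shot noise follows Poisson statistics, so a pixel that collects N photons has noise sqrt(N) and an SNR of sqrt(N). This Python sketch ignores read noise, fill factor, microlenses and quantum efficiency; the photon count per square micrometre is a made-up number, so only the ratio between the two cameras means anything. The sensor sizes and pixel dimensions are the nominal published figures for the 7D and 5D Mark II, used as approximations.

```python
import math

# Nominal sensor width/height (mm) and pixel dimensions; treat as approximate.
CAMERAS = {
    "7D": (22.3, 14.9, 5184, 3456),
    "5D Mark II": (36.0, 24.0, 5616, 3744),
}

def pixel_area_um2(width_mm, height_mm, px_w, px_h):
    """Area of one photosite in square micrometres (assumes pixels tile the sensor)."""
    return (width_mm * 1000 / px_w) * (height_mm * 1000 / px_h)

def shot_noise_snr(photons):
    """Poisson shot noise: noise = sqrt(N), so SNR = N / sqrt(N) = sqrt(N)."""
    return math.sqrt(photons)

# Hypothetical exposure: same photon flux per square micrometre on both sensors.
PHOTONS_PER_UM2 = 100.0

snr = {}
for name, dims in CAMERAS.items():
    photons = PHOTONS_PER_UM2 * pixel_area_um2(*dims)
    snr[name] = shot_noise_snr(photons)
    print(f"{name}: {pixel_area_um2(*dims):.1f} um^2 per pixel, SNR {snr[name]:.0f}")

ratio = snr["5D Mark II"] / snr["7D"]
stops = math.log2(ratio ** 2)  # SNR ratio squared = light ratio
print(f"Full-frame per-pixel SNR advantage: {ratio:.2f}x ({stops:.2f} stops)")
```

Under this model the 5D Mark II's larger pixels give it roughly 1.5x the per-pixel SNR at the same exposure, a little over one stop.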
The other way to look at it is to assume the sensors have equal pixel sizes. That's roughly like putting the same ISO 200 film in a medium format camera and a full-frame camera: the full-frame sensor has fewer megapixels (less resolution, in film terms), whereas the medium format camera has more resolution at the same per-pixel signal/noise ratio. When you blow both up to a large print, you enlarge the noise just as much as the detail, and the smaller full-frame image has to be enlarged more. Bottom line: at a comparable print size you get a worse signal/noise ratio from the full frame, and its noise will be more apparent.
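The equal-pixel-size case can be sketched the same way. This Python example assumes a hypothetical 6 µm pitch and identical per-pixel SNR on a full-frame and a 44 x 33 mm medium format sensor (the "same film in both cameras" case). It then looks at how much each image must be enlarged to reach the same print width, and at the approximate SNR once both are resized to the full-frame pixel count. That second step uses the rule that averaging k independent noisy samples improves SNR by sqrt(k), applied loosely to a non-integer k.

```python
import math

# Hypothetical values for illustration only.
PITCH_UM = 6.0          # same pixel pitch on both sensors
PER_PIXEL_SNR = 20.0    # same per-pixel SNR, like loading the same film in both
PRINT_WIDTH_MM = 600.0  # both images printed at this width

FORMATS = {"FF": (36.0, 24.0), "MF 44x33": (44.0, 33.0)}
FF_AREA = 36.0 * 24.0

results = {}
for name, (w, h) in FORMATS.items():
    mp = (w * 1000 / PITCH_UM) * (h * 1000 / PITCH_UM) / 1e6
    # The smaller format needs more magnification to reach the same print width,
    # so its noise is enlarged more too.
    magnification = PRINT_WIDTH_MM / w
    # Sensor pixels per output pixel when both are resized to the FF pixel count.
    k = (w * h) / FF_AREA
    # Averaging k independent noisy samples raises SNR by sqrt(k) (approximate for non-integer k).
    snr_at_ff_size = PER_PIXEL_SNR * math.sqrt(k)
    results[name] = (mp, magnification, snr_at_ff_size)
    print(f"{name}: {mp:.1f} MP, {magnification:.1f}x enlargement, "
          f"SNR at FF output size ~{snr_at_ff_size:.1f}")
```

Here the full-frame image needs about 16.7x enlargement against about 13.6x for the 44 x 33 mm one, and at a common output size the medium format image ends up with roughly 1.3x the SNR, given these assumptions.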
When comparing camera formats, I think one should separate field of view, image quality (including noise) and depth of field, as each of them differs in its own way... though I do agree with you that it's important to consider all of them. Another thing to consider is that lenses usually aren't as sharp in the corners as in the center, so a lens might seem sharper on an APS-C sensor because you're only looking at its best portion.
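The corner-sharpness point can be quantified by comparing sensor diagonals: an APS-C sensor behind a full-frame lens only reaches part of the way out from the center of the image. This Python sketch assumes nominal Canon APS-C dimensions (22.3 x 14.9 mm, matching the 7D mentioned above; other makers' APS-C sensors are slightly larger) and a 36 x 24 mm full frame.

```python
import math

FF = (36.0, 24.0)      # full-frame sensor, mm
APS_C = (22.3, 14.9)   # nominal Canon APS-C sensor, mm (assumed)

ff_diag = math.hypot(*FF)       # ~43.3 mm
apsc_diag = math.hypot(*APS_C)  # ~26.8 mm

# Fraction of the full-frame center-to-corner distance that the APS-C frame covers.
fraction = apsc_diag / ff_diag
crop_factor = ff_diag / apsc_diag

print(f"APS-C corners sit {fraction:.0%} of the way to the FF corners")
print(f"Crop factor ~{crop_factor:.2f}")
```

So on APS-C the outer ~38% of the full-frame radius, where lenses are typically softest, falls outside the sensor. The same crop factor is what changes the field of view, which is one reason it helps to treat field of view, image quality and depth of field as separate comparisons.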