AFAIK, 1 pixel on a conventional sensor consists of 1 each of {R, G, B} sub-pixels.
Unless I am really missing something?
You are missing something. One pixel on a conventional CMOS or CCD sensor is one photosite, with no subpixels. Each photosite sits under a single color filter of the Bayer mask (a repeating R G G B pattern). The demosaicing that occurs in RAW processing then uses adjacent pixels to interpolate the two missing color values for each pixel. For example, take one photosite covered by a blue filter: it records only blue light, and the software/firmware uses the data from the surrounding red- and green-filtered photosites to estimate the actual color for that pixel. The interpolation means that some color information is lost, but the full spatial resolution is available.
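To make the interpolation concrete, here is a minimal sketch in plain Python. It assumes an RGGB layout and a simple average over the 3x3 neighborhood; real RAW converters use much more sophisticated algorithms, and the function names (bayer_color, demosaic) are made up for illustration. The input needs to be at least 2 x 2 photosites.

```python
def bayer_color(row, col):
    """Filter color over the photosite at (row, col), assuming an RGGB layout."""
    if row % 2 == 0:
        return "R" if col % 2 == 0 else "G"
    return "G" if col % 2 == 0 else "B"


def demosaic(raw):
    """raw: 2-D list with ONE value per photosite.
    Returns a 2-D list of (R, G, B) tuples with the same dimensions."""
    height, width = len(raw), len(raw[0])
    image = []
    for r in range(height):
        image_row = []
        for c in range(width):
            pixel = {}
            for channel in "RGB":
                if bayer_color(r, c) == channel:
                    # The one channel this photosite actually measured.
                    pixel[channel] = raw[r][c]
                else:
                    # Estimate the missing channel from neighbors (3x3 window,
                    # clipped at the edges) that sit under that filter color.
                    samples = [
                        raw[rr][cc]
                        for rr in range(max(r - 1, 0), min(r + 2, height))
                        for cc in range(max(c - 1, 0), min(c + 2, width))
                        if bayer_color(rr, cc) == channel
                    ]
                    pixel[channel] = sum(samples) / len(samples)
            image_row.append((pixel["R"], pixel["G"], pixel["B"]))
        image.append(image_row)
    return image


# A flat-colored scene: R sites read 200, G sites 100, B sites 50.
raw = [
    [200, 100, 200, 100],
    [100, 50, 100, 50],
    [200, 100, 200, 100],
    [100, 50, 100, 50],
]
image = demosaic(raw)
print(len(image), len(image[0]))  # same 4 x 4 grid as the input
print(image[1][1])                # the blue photosite, now with estimated R and G
```

Note that the output has exactly as many pixels as the sensor has photosites; only two of the three color values per pixel are estimates.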
Note that a similar issue comes up for the rear LCDs, where resolution is measured in 'dots'. The specs for the 5DII and 7D list LCDs with "Pixels: Approx. 920,000 dots (VGA)", although on their 5DII specs page Canon leaves out the word 'dots', which is even more misleading than leaving it in. Nowhere do they tell you that dots ≠ pixels, although it's implied by 'VGA'. In fact, they count each red, green, and blue subpixel as a 'dot', so they are calculating display resolution as VGA x 3, i.e. 640 x 480 x 3 = 921,600 dots.
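The dots arithmetic is easy to check. A quick sketch, assuming three subpixels per pixel as described above (the helper names are just for illustration):

```python
def lcd_dots(width_px, height_px, subpixels_per_pixel=3):
    """Marketing 'dots' for an LCD with the given pixel dimensions."""
    return width_px * height_px * subpixels_per_pixel


def pixels_from_dots(dots, subpixels_per_pixel=3):
    """Real pixel count behind a 'dots' figure."""
    return dots // subpixels_per_pixel


print(lcd_dots(640, 480))         # 921600 dots for a VGA panel
print(pixels_from_dots(921600))   # 307200 actual pixels
```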
Where on Earth do people keep getting this idea from? That an 18MPix Foveon-like sensor is equivalent to an (18*3) MPix "conventional" sensor?
So in terms of quantity of data recorded, both are identical.
For a Foveon-type sensor, 18 MP is still 18 MP in terms of spatial resolution, but unlike the conventional sensor, no color information is lost because each discrete spatial element 'sees' the full visible spectrum, with no interpolation required.
The confusion comes from the manufacturers - if they produce a 10 MP (spatial resolution) Foveon-type sensor, there are actually 30 million photosites, stacked in 10 million little columns of three. So even though it's really a 10 MP sensor, the marketing folks will obviously want to call it a 30 MP sensor, because we all know that more MP is better.
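To put numbers on the difference, here is a small sketch that counts measured versus interpolated color values for a given spatial resolution. It assumes an RGB output with three values per pixel, one measured value per pixel for a Bayer sensor, and three for a Foveon-type sensor; the function name is hypothetical.

```python
def color_samples(spatial_mp, sensor_type):
    """Return (measured, interpolated) color values in millions
    for an RGB image of the given spatial resolution."""
    layers = {"bayer": 1, "foveon": 3}[sensor_type]
    measured = spatial_mp * layers
    needed = spatial_mp * 3  # one R, one G, one B value per output pixel
    return measured, needed - measured


print(color_samples(18, "bayer"))   # (18, 36): two thirds of the values are estimates
print(color_samples(18, "foveon"))  # (54, 0): everything is measured
print(color_samples(10, "foveon"))  # (30, 0): the '30 MP' of the marketing department
```

Either way the image is still 18 MP (or 10 MP) wide and tall; the extra photosites buy color accuracy, not more pixels.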

As a side note, a Foveon-type sensor is just one way to achieve the effect, albeit a very practical way for a camera. In photomicroscopy, Zeiss has for many years produced a camera called the AxioCam, which uses a 1 MP CCD sensor plus 'tricks'. It can take 'standard' images at 1 MP with the Bayer mask and interpolate the colors. But one trick is to physically move the Bayer mask to make three separate exposures, so each pixel is exposed successively to R G B. Obviously, not something that would work in a dSLR, but fixed specimens are amenable to sequential imaging like that (in fact, some current color microscope cameras are actually b/w cameras with a color filter wheel in front that rotates through R G B). Another trick ups the resolution - unlike current dSLRs, the CCD in the AxioCam has no microlenses, so each photosite only sees a small portion of the incoming light. So, Zeiss also moves the sensor around in sub-pixel increments to expose the photosensitive part of the pixel to different regions of the incoming light - a 2x2 array gives a 5 MP image, and a 3x3 array gives a 12 MP image. So, with those tricks a simple 1 MP sensor can generate a 12 MP image without any interpolated color! Of course, it takes 27 separate exposures to make that one image...
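For anyone wondering where the 27 comes from, it is just sensor positions times color passes. A rough sketch, assuming evenly spaced sub-pixel offsets and one exposure per color at each position; the real camera's scan order and step sizes are not something I know, so treat the offsets as hypothetical:

```python
from itertools import product


def exposure_plan(grid, colors=("R", "G", "B")):
    """List every (dx, dy, color) exposure for a grid x grid scan.
    Offsets are fractions of one pixel pitch (assumed evenly spaced)."""
    steps = [i / grid for i in range(grid)]
    return [(dx, dy, color) for dx, dy in product(steps, steps) for color in colors]


print(len(exposure_plan(1)))  # 3: color passes only, no sensor shift
print(len(exposure_plan(2)))  # 12: 2x2 positions x 3 colors
print(len(exposure_plan(3)))  # 27: 3x3 positions x 3 colors
```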