Difficult it is... (I had to look it up myself to understand it better)
Let's say you shoot with an MFT (crop factor 2) with a 50mm f1.4. To get the same angle of view (the same frame) with a FF you need a 100mm lens. The aperture size (f-number) is calculated by the focal-length divided by the diameter of the aperture; same diameter means same amount of light. The 50mm has a diameter of 35.7mm at f1.4 (50/1.4). For the 100mm we get an aperture of 100mm/35.7mm = 2.8. So 50mm f1.4 on a MFT equals 100mm f2.8 on a FF, in terms of amount of light. But the FF sensor has 4 times the area of an MFT sensor. This means the amount of light per area is 4 times smaller on a FF sensor. On the other hand, with the same pixel count, the light per pixel would be the same.
I believe this is not altogether correct.
Firstly, a 50mm f/1.4 on MFT is equivalent to a 100mm f/2.8 on full frame in the sense of producing images which are "equivalent" (or at least very similar) in framing and depth of field (assuming you take photos of the same subject from the same spot, ie same perspective). However, from an "exposure" point of view, f/2.8 is still two stops "slower" than f/1.4, so if you use the same ISO and shutter speed, the full frame shot will be two stops dimmer than the MFT shot. (Let's leave aside any potential difference in quantum efficiency between two sensors!)
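To put numbers on that, here is a rough Python sketch. The function names are my own, made up for illustration, and it assumes the usual rule of thumb that you multiply both focal length and f-number by the crop factor to get the full frame equivalent, and that each stop is a factor of 2 in light density.

```python
import math

def equivalent_lens(focal_mm, f_number, crop_factor):
    """Full frame lens giving similar framing and depth of field.

    Both focal length and f-number are scaled by the crop factor
    (hypothetical helper, not from any library).
    """
    return focal_mm * crop_factor, f_number * crop_factor

def stops_between(f_number_a, f_number_b):
    """How many stops slower f_number_b is than f_number_a.

    Light density goes as 1 / N^2, so one stop is a factor of
    sqrt(2) in f-number, hence 2 * log2 of the f-number ratio.
    """
    return 2 * math.log2(f_number_b / f_number_a)

eq_focal, eq_f = equivalent_lens(50, 1.4, 2)
print(eq_focal, eq_f)            # full frame equivalent of 50mm f/1.4 on MFT
print(stops_between(1.4, eq_f))  # exposure difference in stops
```

So the "equivalent" lens matches framing and depth of field, but is two stops slower in terms of image brightness.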
So, what does "exposure" mean? In what I've said above, I've used exposure in the way photographers often do, which is to refer to image brightness, ie light density on the sensor (light per unit area). However, exposure can also refer to the total light gathered by the sensor.
Going back to your example, from an "exposure" point of view, 50mm f/1.4 on MFT and 100mm f/1.4 on full frame would give the same image brightness (at same ISO), ie the same light density on each part of the sensor. However, the full frame sensor has an area which is four times the area of the MFT sensor, so the full frame sensor gathers four times as much total light (total light exposure, if you like) as the MFT sensor. If you then view the images at the same output size (on a screen or in a print), you can think of the full frame image having four times as much total light packed into the same area you are viewing, which is important in giving full frame sensors a noise advantage.
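If it helps, the two meanings of "exposure" can be sketched like this in Python. Again the function names are hypothetical, and I am assuming sensor area scales as 1 / crop factor squared (so "four times the area" for a crop factor of 2, which is an approximation for real MFT sensors) and the same shutter speed and scene in both cases.

```python
def light_density_ratio(f_number_a, f_number_b):
    """Light per unit sensor area for setup A relative to setup B.

    Depends only on the f-numbers (density goes as 1 / N^2),
    not on sensor size.
    """
    return (f_number_b / f_number_a) ** 2

def total_light_ratio(f_number_a, crop_a, f_number_b, crop_b):
    """Total light on the whole sensor for setup A relative to setup B.

    Assumes sensor area scales as 1 / crop_factor^2.
    """
    area_ratio = (crop_b / crop_a) ** 2
    return light_density_ratio(f_number_a, f_number_b) * area_ratio

# Full frame (crop 1) at f/1.4 versus MFT (crop 2) at f/1.4
print(light_density_ratio(1.4, 1.4))      # same image brightness
print(total_light_ratio(1.4, 1, 1.4, 2))  # but more total light on full frame

# Full frame at f/2.8 versus MFT at f/1.4 (the "equivalent" pair)
print(total_light_ratio(2.8, 1, 1.4, 2))
```

With the same f-number, density is identical and full frame collects about four times the total light. With the "equivalent" f/2.8, full frame is two stops dimmer per unit area but ends up with about the same total light as MFT at f/1.4.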
However, if you take photos of the same subject from the same spot with a 50mm f/1.4 on MFT and a 100mm f/1.4 on full frame, you will have the same perspective and angle of view, but the full frame shot will have shallower depth of field. That is because the distance to subject is the same in both cases, but the full frame shot is taken with a larger "aperture" (OK, entrance pupil as someone else has pointed out, ie apparent aperture, but let's not get too bogged down on that here). The point to remember is that although photographers often say they set their aperture to 1.4 or some other f stop number, f stop actually means relative aperture (ie aperture relative to focal length), not physical aperture (really entrance pupil). So, when we use f/1.4 on a 50mm lens and f/1.4 on a 100mm lens, we use the same relative aperture, but not the same physical aperture. In that case, the physical aperture when you use a 50mm lens at f/1.4 is 50mm / 1.4 = 35.7 mm, but when you use a 100mm lens at f/1.4 it is 100mm / 1.4 = 71.4 mm.
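The physical aperture arithmetic is just focal length divided by f-number. A tiny Python sketch (the function name is mine, purely for illustration):

```python
def entrance_pupil_mm(focal_mm, f_number):
    """Physical aperture (entrance pupil diameter) in mm.

    The f-number is focal length / pupil diameter, so the
    diameter is focal length / f-number.
    """
    return focal_mm / f_number

for focal_mm, f_number in [(50, 1.4), (100, 1.4), (100, 2.8)]:
    diameter = entrance_pupil_mm(focal_mm, f_number)
    print(f"{focal_mm}mm f/{f_number}: {diameter:.1f} mm")
```

Note that 100mm at f/2.8 comes out with the same physical aperture as 50mm at f/1.4 (about 35.7 mm), which is why that pairing gives similar depth of field from the same spot, while 100mm at f/1.4 has double the diameter.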
Referring to relative aperture (f stop) allows consistent reference to image brightness (exposure in the sense of light density on the sensor) as we change focal lengths (and since it is referring to light density on the sensor, it does not change as sensor size changes). However, you need to factor in the physical aperture (entrance pupil) if you want to compare depth of field. You also need to factor in sensor size if you want to compare total light gathered (total exposure, if you like).
You might find this an interesting read:
http://www.josephjamesphotography.com/equivalence/
Hope that helps!