The camera shake issue is twofold.
First, imagine holding a beam of light, like a laser, on two squares, one square over twice the size of the other. Imagine your hand shaking so the light moves up and down by the same amount on each square. The movement of the light will cover a larger percentage of the smaller square's area than of the larger square's. Your hand shake is equal, but the sensor on a crop body is smaller, so the same movement is magnified. Most people don't get this: distance and FOV do not matter, because they are not moving; your hand is.
Second, your pixels are smaller, and if your vibration is more than a pixel width, your resolution advantage drops quickly.
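To put rough numbers on both halves of that, here is a minimal sketch. It assumes the shake is a small angular wobble, so the blur on the sensor is about focal length times the tangent of the angle. The sensor heights and pixel counts are the commonly published 7D and 5D III specs as I remember them, and the 400 mm lens and 5 arcsecond wobble are made-up example values, so treat all of it as illustrative rather than measured.

```python
import math

# Assumed specs (sensor height in mm, image height in pixels); check your own bodies.
CROP = {"name": "7D (APS-C)", "height_mm": 14.9, "pixels_high": 3456}
FULL_FRAME = {"name": "5D III (FF)", "height_mm": 24.0, "pixels_high": 3840}


def shake_blur(focal_length_mm, shake_arcsec, body):
    """Return (blur as a fraction of frame height, blur in pixel widths)."""
    shake_rad = math.radians(shake_arcsec / 3600.0)
    # Displacement of the image on the sensor; same lens, same shake, same mm.
    blur_mm = focal_length_mm * math.tan(shake_rad)
    frame_fraction = blur_mm / body["height_mm"]
    pixel_pitch_mm = body["height_mm"] / body["pixels_high"]
    blur_pixels = blur_mm / pixel_pitch_mm
    return frame_fraction, blur_pixels


# Hypothetical example: 400 mm lens, 5 arcseconds of wobble during the exposure.
crop_fraction, crop_pixels = shake_blur(400, 5, CROP)
ff_fraction, ff_pixels = shake_blur(400, 5, FULL_FRAME)

for body, fraction, pixels in (
    (CROP, crop_fraction, crop_pixels),
    (FULL_FRAME, ff_fraction, ff_pixels),
):
    print(f"{body['name']}: {fraction:.6%} of frame height, {pixels:.2f} px of blur")
```

The blur in millimetres is identical on both bodies. Only the frame it is measured against and the pixel pitch change, which is why the same wobble costs the crop body more.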
Neat little example. A single point of light pointing at the center of a square. Now, compound the number of squares a few million fold, and instead of one beam of light, you have trillions. All shaking concurrently and synchronously all over this array of a few million squares. Camera shake is camera shake. It's going to soften the image regardless. Light that should fall onto one square is going to fall on more than one square. Acutance is going to drop off precipitously at the first tiny bit of camera shake, and after that it's a diminishing effect.
I have to hold my 5D III as steady as I have to hold my 7D to get the most crisp, sharp shot. In the field, there isn't any difference. I don't think "I can handle X amount of shake with the 5D III" or "I can shake N times more than with my 7D". I simply hold the lens steady, as steady as humanly possible, period, and burst my shots to get a good number of frames so I can pick the sharpest one. There isn't any difference in tactic here: you use FF and APS-C the same way, for birds, wildlife, or otherwise.
Do you want to maximize the potential of the system, or not? That's either yes or no. If yes, then you do everything you can to extract the absolute best out of the system. There is no difference in effort to do that regardless of format. We can't compensate for the microscopic differences in pixels when we're out in the field concentrating on a bird. You AFMA with both FF and APS-C. You use IS with both FF and APS-C.
When the Nikon D800 came out, DPR had to beef up their tripods and take extreme care to get the sharpness that the extra pixels could give. They spent a lot of extra time and effort in their testing before they learned how to get the expected resolution. It's virtually impossible for handheld images at normal shutter speeds to make use of that available 36 MP resolution. So, yes, if you want to get the full resolution that a camera is capable of, sometimes you have to adopt new tactics that were not necessary before. Those tiny photosites could fill a 51.7 MP FF sensor, and with a long lens, almost any vibration is going to reduce resolution. That doesn't mean that images will be blurred, just that they will not be as good as they could be. I learned that quickly with my 7D, and when handholding my camera, I doubled shutter speeds or even tripled them where possible. Then my images really improved. I had to force the camera to use high shutter speeds; using Av turned out to be a bad idea. I believe the 7D MK II allows you the option of faster shutter speeds for a given focal length. That's a worthwhile feature for those who want to use Av or full automatic.
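For anyone who wants the arithmetic behind those two points, here is a rough sketch. It assumes the 51.7 MP figure comes from scaling a 20.2 MP APS-C sensor by a 1.6x crop factor (the post doesn't say which body was meant), and it uses the old 1/(focal length x crop factor) handholding rule of thumb as the "normal" shutter speed before doubling or tripling. The function names are made up for illustration.

```python
def ff_equivalent_megapixels(crop_megapixels, crop_factor):
    """Megapixels a full-frame sensor would have at the crop sensor's pixel density."""
    # Sensor area scales with the square of the crop factor.
    return crop_megapixels * crop_factor ** 2


def min_shutter_denominator(focal_length_mm, crop_factor, safety_factor=1.0):
    """Return N for a minimum handheld shutter speed of 1/N second.

    Rule of thumb only: 1 / (focal length x crop factor), then made
    faster by safety_factor (2 = doubled, 3 = tripled).
    """
    return focal_length_mm * crop_factor * safety_factor


# Assumed inputs: 20.2 MP APS-C sensor, 1.6x crop, 400 mm lens.
equivalent_mp = ff_equivalent_megapixels(20.2, 1.6)
baseline = min_shutter_denominator(400, 1.6)
doubled = min_shutter_denominator(400, 1.6, safety_factor=2)
tripled = min_shutter_denominator(400, 1.6, safety_factor=3)

print(f"FF sensor at the same pixel density: {equivalent_mp:.1f} MP")
print(f"Shutter speed floor: 1/{baseline:.0f} s, doubled 1/{doubled:.0f} s, tripled 1/{tripled:.0f} s")
```

The rule of thumb is only a starting point. IS, technique, and how closely you inspect the files all move the number.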
You are right, I do take the same care with my 5D MK III as I did with my 7D. I use faster shutter speeds than with the old 12 MP sensors because it makes a difference. With my D800, I used it the same way as my 5D MK III, and except for a few bright-sunlight, high-shutter-speed images, there was no noticeable sharpness advantage. I did appreciate the extra DR for those bright-sun, low-ISO images, but for me they were the exception, not the rule, because I was shooting in extreme low light much of the time and struggled to get sharp images with the D800.
I started to mention the D800 example. I remember when it was released and the discussions on camera stabilization.
The point made to the OP and the discussion in this thread is that for the full benefit of the crop camera and its pixel density, you have to remove vibration.
At the extreme, that would be a mirror-up, time-delayed shot with your hand off the camera, on very sturdy tripod legs and a head. Shoot in dead calm on a solid surface, too. But the crop benefit is usually debated as a focal-length-limited option. For wild animals, time delay and mirror-up are rarely going to happen. Big lenses are heavy, so you have to exhaust every method to stay stable. Shutter speed is one of these methods, and as the light goes it drops too low sooner on a crop than on a FF.
For me, the resolution advantage of the crop body is a sliding scale, starting high with the tripod as described above and disappearing handheld as the light goes away. Whether someone sees a benefit from it will be determined by what, when, and how they are shooting.