The only time the difference between larger pixels/larger frame and smaller actually matters from a shake standpoint is when you are NOT reach limited, and you can get about twice as close with the same focal length using the larger sensor. In that situation, you're packing far more pixels onto the subject...the larger sensor, pretty much regardless of the pixel size, is going to be easier to manage.
The statement is a bit skewed. I like these kinds of statements because they are half based in reality and far enough outside it that only pieces of them can be disputed.
You need to check how this idea works out in the real world. The distance you need to close is nowhere near 2x.
More like 20% closer, maybe a bit more. I have already shot a few test shots on this one with the 7D II.
This is one real-world test I have been thinking about doing a bit more. Shoot a test shot with FF at, say, 30' and then 6 shots at 3' intervals till I am at the same framing 1.6x out. I have already done comparisons at about 1.4x to 1.6x, and the FF had much better resolution. It might be a good way to see how much benefit the crop factor really is.
In the concept you offer, camera shake is a smaller part of the resolution equation.
I said 2x because that would generally normalize composition within the area of the frame as well (not exactly, but enough). No, you probably don't need to move that full distance forward to start seeing an improvement, but I try to stick to equivalency...otherwise the size of the subject in the frame, and the number of pixels on the target, is entirely arbitrary. I think about OOC composition I guess...is the bird framed, in camera, how I want it to be framed? I used to crop...heavily. The only time I crop these days is to straighten or tweak composition...I'm not dropping down to 10-20% of the frame like I did the first six months I had my 7D and 100-400mm. To that end, FF is actually more than 2x the area of APS-C (about 2.6x), so I wasn't actually stating that you should halve your distance to subject anyway.
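For anyone who wants to check the arithmetic, here is a rough sketch in Python. It assumes a nominal 1.6x crop factor for Canon APS-C (real sensors vary slightly), the function names are made up for illustration, and it only covers geometry, not lens sharpness or pixel pitch.

```python
# Nominal Canon APS-C crop factor (an assumption; actual sensors vary slightly).
CROP_FACTOR = 1.6

def area_ratio(crop_factor):
    """How many times larger the FF sensor area is than the crop sensor area."""
    return crop_factor ** 2

def matching_distance(crop_distance, crop_factor):
    """Distance at which FF frames the subject the way the crop body does
    from crop_distance, using the same focal length."""
    return crop_distance / crop_factor

print(round(area_ratio(CROP_FACTOR), 2))               # FF area relative to APS-C
print(round(matching_distance(30.0, CROP_FACTOR), 2))  # feet, starting from 30'
```

Under that assumption the area ratio comes out near 2.56, and a 30' shot on the crop body is matched on FF from just under 19', so the distance shrinks by the linear crop factor (1.6x) rather than by the area ratio.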
This is all beside the point anyway, as all it takes is ONE step, or even standing up or starting to stand up, and your target could flee. Birds of the heron family in particular are extremely skittish. If you manage to get close enough to get a decent shot at all, then smaller pixels are going to be a bigger friend to you than getting closer. I can't count how many times just seeing my head barely rise over the top of a ridge was enough to make every heron and egret in the area fly off. Hawks are similar...they can be perfectly content with you sitting there watching them if you're not moving. The moment you stand up, they'll leap off their perch and fly right over your head!
(I've had this happen a few times.) Deer are content to get right up in your face so long as you're sitting on the ground...stand up, and they'll dance around and huff a few times, then wander off. Outside of wearing a ghillie suit, even in camo, deer will spot me. If I stand up, they at the very least stand rigid and take notice. Start moving towards them, and they will often bolt.
It's not necessarily always as easy as taking a few steps closer to your target.
If you are willing to spend the extra time it takes to get close enough to really make a difference with FF, you can indeed get some phenomenal shots...but not everyone has that kind of skill or time. That's why the reach argument exists in the first place. A 7D II with a 400mm or 150-600mm lens is going to get a lot more people excellent shots in fairly difficult situations with birds and wildlife than a 5D III with the same lenses. To take it to the next level, a 500mm or 600mm f/4 and some TCs so you can get 1000mm to 1200mm on FF (which would also normalize composition with APS-C at the same distance) is well beyond most people's budgets.
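The focal length side of that can be sketched the same way. Again this assumes a nominal 1.6x crop factor, the helper names are hypothetical, and it only compares field of view, not aperture, light loss from the TC, or image quality.

```python
def ff_equivalent_focal_length(focal_mm, crop_factor=1.6):
    """FF focal length that gives the same framing as focal_mm on the crop body."""
    return focal_mm * crop_factor

def with_teleconverter(focal_mm, tc_factor):
    """Effective focal length of a lens with a teleconverter attached."""
    return focal_mm * tc_factor

# What FF needs in order to frame like the crop body from the same spot.
for lens in (400, 600):
    print(lens, "mm on APS-C frames like", round(ff_equivalent_focal_length(lens)), "mm on FF")

# What the big primes reach on FF with a 2x TC.
for lens in (500, 600):
    print(lens, "mm + 2x TC =", round(with_teleconverter(lens, 2.0)), "mm")
```

So matching a 600mm lens on the crop body takes roughly 960mm on FF, which is where the 500mm or 600mm f/4 plus a 2x TC comes in.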
Now, I'm not saying you get more resolution with smaller pixels for free. It takes a lot of effort to hold and KEEP a lens steady while you're shooting it, especially with longer lenses, which magnify ever smaller movements. It is possible to maximize the potential of your system, though, small pixels or large. That's my point. We can throw around numbers like 20% or 1.2x or 1.4x or whatever it is all day long. In the end...does your tactic change? Do you actually think in the field, "I have 20% bigger pixels, so I can relax my hand-holding technique by 20%"? No one does that. You hold yourself and your gear steady, as steady as humanly possible, period. You cannot account for the differences in the field...if you try, the chances of getting blurry shots with FF are going to be higher, as you're not putting your full attention on what matters: keeping yourself and your gear stable, as stable as you possibly can, with whatever tools are at your disposal to do so (IS, tripod, monopod, beanbags, whatever).