Who is 'they'?
Everyone who has ever tried. I've posted unlabeled crop pairs to photo.net and dpreview, and shown 24" print samples to people in a couple of photo clubs. Another dpreview member did exactly the same thing with online crops, only with Nikon equipment. The results were random in every instance. Nobody could reliably tell FF from crop when push came to shove, even people who insist that there is a huge, just huge IQ difference. There is not, at least at ISOs where noise does not come into play.
I always state that this is after processing. Crop takes more sharpening and more LCE. When you get close to the ISO where noise differences are clearly apparent, crop takes more NR as well. Make of that what you will. But I process all of my shots anyway. If two sensors are close enough that differences are gone after processing, then they are equivalent for my purposes.
Now at higher ISOs it's not a contest. Crop is very good, but FF clearly has less noise with more detail. I should also note that if higher pixel densities were applied to FF then there would be a human-observable IQ difference at any ISO, at least for large prints. Put another way, the D800 does produce large prints that are better than those from any crop body. But they're also better, by about the same amount, than those from any Canon FF.
Here's what I can tell you. I took a series of paired, identically framed shots with the 7D and 1D X, using either the 24-105L or the 70-200/2.8L IS II, using the zoom to compensate for the effect of sensor size on FoV (meaning same distance, so same framing and same perspective for each pair). I shot about a dozen paired images like that, some landscapes, some architecture, and a couple of close-up flower/plant shots with the 24-105. I processed them equivalently, then showed the paired images to my wife, scaled down to 3.7 MP (full screen on an Apple Thunderbolt Display), and asked her which she liked better. For 11 of the 12 shots, she picked the 1D X image.
Your three mistakes are as follows:
* Zooming the lens. This is likely inconsequential with these lenses, but it is a mistake nonetheless.
* Equivalent processing. This is a huge mistake which invalidates your test and your results outright. You do not use identical processing with different sensors, even different sensors of the same format.
* You do not mention if the shots were unlabeled. If your wife knew which came from which before picking, the results are less than worthless, they are misleading. There is no shortage of examples of conscious and subconscious human bias, of people picking what they think they should pick. It's just what we do. Even if they were unlabeled, a strict scientist would discount your results because you knew, and there's no shortage of ways you could have consciously or subconsciously telegraphed the "correct" choice to her.
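On the labeling point, here is a minimal sketch of one way to blind a set of pairs before showing them to anyone. Everything in it is an assumption for illustration: the function name `blind_pairs`, the `"crop"`/`"ff"` labels, and the `pairNN_left.jpg` naming are hypothetical, and it only plans the renames rather than touching any files. The idea is that a script, not the person running the test, decides which image goes on which side, and the answer key stays hidden until the picks are recorded.

```python
import random

def blind_pairs(pairs, seed=None):
    """Plan neutral file names for paired shots.

    pairs: list of dicts mapping a camera label to a source file,
           e.g. {"crop": "7d_0001.jpg", "ff": "1dx_0001.jpg"} (hypothetical names).
    Returns (copies, key):
      copies: list of (source_file, neutral_name) tuples to copy by hand or script
      key:    dict neutral_name -> camera label, to be kept sealed until afterwards
    """
    rng = random.Random(seed)
    order = list(range(len(pairs)))
    rng.shuffle(order)  # randomize the order in which pairs are presented

    copies = []
    key = {}
    for slot, pair_index in enumerate(order, start=1):
        labelled = list(pairs[pair_index].items())  # [(camera, filename), ...]
        rng.shuffle(labelled)  # randomize which camera lands on the left
        for side, (camera, filename) in zip(("left", "right"), labelled):
            neutral = f"pair{slot:02d}_{side}.jpg"
            copies.append((filename, neutral))
            key[neutral] = camera
    return copies, key
```

Ideally a third person runs this and holds the key, so that neither the viewer nor whoever is presenting the images knows which is which. EXIF data would also need to be stripped from the copies, which this sketch does not do.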
That said, if you want to post a zip archive somewhere with the RAWs I would love to process them optimally and run a taste test at a site/time we agree upon. I'll record all processing steps for your review, and we can play with those if you feel they are less than optimal. I doubt you made any shooting mistakes that invalidate the comparison.
Side rant: One of my pet peeves with photography is that people make mountains out of molehills. FF vs. crop, Canon vs. Nikon, lens A vs. lens B, tripod A vs. tripod B, etc. They used to do it with film A vs. film B, developer A vs. developer B, etc. During the film/digital transition it was film vs. digital all the time. I've seen people argue that there are huge, just huge differences between PS scaling in one step or multiple steps.
Most of what we debate is meaningless. It is below the threshold of human observability even in a large print (or on a modern LCD), especially in the age of digital processing. But people absolutely cling to these debates. If you think there's a real IQ difference between two pieces of equipment or two techniques, produce some big prints with both. Go to a mall. Ask people who pass by if the photos are identical or different, and if different, which is better. You will quickly learn if the difference is meaningful or not.
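For anyone who wants to put a number on "random" versus "reliable" in a taste test like that, here is a rough sketch of a one-sided binomial check. It assumes every pick is an independent choice between two images and that with no visible difference each is equally likely to be picked; the function name `chance_probability` is just something I made up for the example. It only tells you how likely a result is by luck. It says nothing about bias from labels or from the person running the test.

```python
from math import comb

def chance_probability(picks_for_favourite, trials):
    """Probability of at least this many picks for one side out of `trials`
    if every pick were a fair coin flip."""
    favourable = sum(comb(trials, k) for k in range(picks_for_favourite, trials + 1))
    return favourable / 2 ** trials

# 11 of 12 picks for one camera: 13 / 4096, about 0.003
print(chance_probability(11, 12))

# 7 of 12 picks for one camera: 1586 / 4096, about 0.39, i.e. easily luck
print(chance_probability(7, 12))
```

A small number means the picks are unlikely to be pure guessing. Whether the cause is the sensor or something else about the test is a separate question.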