You are making this far more work than it needs to be.
First, you cannot use ETTL; it just is not what ETTL is designed for. Second, the small highlighted section illuminated by the hand-held 600 has masses of exposure latitude, because of the way you can adjust the exposure and then blend it. That is why nobody I know brackets the in-hand flash; you just learn the rough exposure you get of anything at a certain distance. I know that f/8 at 10 feet at half power gives me a second-story burst, and at 20 feet it is a ground-floor burst.
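As a rough guide to why distance alone is such a strong exposure control, here is a minimal sketch of the inverse-square arithmetic. It assumes the flash behaves like a point source at a fixed power and zoom, which is only approximately true in practice; the function name `stops_change` is just an illustrative choice.

```python
import math


def stops_change(ref_distance, new_distance):
    """Approximate change in flash exposure, in stops, when the
    flash-to-subject distance goes from ref_distance to new_distance.
    Both distances must be in the same unit (feet here)."""
    # Inverse-square law (assumed point source): illumination scales
    # with 1 / distance^2, so each doubling of distance costs 2 stops.
    return 2 * math.log2(ref_distance / new_distance)


# Moving from 10 feet to 20 feet at the same power setting.
print(stops_change(10, 20))  # -2.0
```

So going from 10 to 20 feet at the same power is roughly two stops less light on the subject, which is well within what you can pull back when blending.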
I, and everybody I know who does this, just use manual flash mode and pop away. If you don't have the remote viewing capability, just keep varying your flash-to-subject distance, zoom setting, and angles.
But because this question has been bugging me, and you clearly don't want to go the ND filter gel route, I have come up with a workaround using just the 600s.
First, put the on-camera unit (ST-E3-RT or 600EX-RT) in Group mode. Second, set each group to M, each at a different power setting covering the range you want to bracket. Third, set the in-hand flash to Group A and take the shot with the REL option. Then change the in-hand flash to Group B and take a shot, then to Group C and take a shot, and so on through Group E. This will give you five different exposures from the in-hand flash.
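If it helps to plan the bracket before you go out, the sequence above can be sketched as a simple shot list. The power settings here are purely hypothetical one-stop steps picked for illustration; set whatever spread you actually want on the on-camera unit. The names `GROUP_POWER` and `bracket_plan` are made up for this sketch.

```python
import math
from fractions import Fraction

# Hypothetical manual power per group, as set on the on-camera unit.
# These one-stop steps are an assumption, not a recommendation.
GROUP_POWER = {
    "A": Fraction(1, 1),
    "B": Fraction(1, 2),
    "C": Fraction(1, 4),
    "D": Fraction(1, 8),
    "E": Fraction(1, 16),
}


def bracket_plan(group_power):
    """Return (group, power, stops relative to full power) for each
    shot, in the order the in-hand flash is switched between groups."""
    plan = []
    for group, power in group_power.items():
        stops = math.log2(float(power))  # 1/2 power is -1 stop, etc.
        plan.append((group, power, stops))
    return plan


for group, power, stops in bracket_plan(GROUP_POWER):
    print(f"In-hand flash on Group {group}: power {power}, "
          f"{stops:+.0f} stops vs full power, fire with REL")
```

Each line of the printed list is one pop of the in-hand flash, so you end up with five frames to choose from when you blend.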