Just to get anal about this... I'm not aware of any convincing evidence that any human can hear into the ultrasonic region, which of course is why they call it ultrasonic. Not many of us, save perhaps young girls, can hear above 18K, and the world above 20K is just not audible to anyone of normal origins. Males aged 20-ish and above rarely hear above 18K.
Frequencies in the near-ultrasonic region are not usually identified as having a pitch either -- energy in this range contributes to tonal character and timbre via harmonics but typically not pitch. Thus the report of an audible chirp that changes pitch is clear evidence to me that this is not at all ultrasonic energy detected by extraordinarily gifted hearing -- it is sonic energy in the ordinary audible range.
Barring some spurious mechanical resonance within the camera itself, there are only two possibilities:
1. Intermodulation distortion -- the combination of two or more frequencies above 20K, producing sum and difference frequencies, can result in sonic artifacts that can be heard (otherwise, audiophiles would not care about filtering energy above 20K). For example, if the mechanical nuances of the sensor and the ultrasonic mechanism that drives it were to produce two tones at 40K and 25K (both inaudible), then the combination would produce a difference artifact at 15K, which would be audible. I suspect the spectral content of the actual sensor cleaning energy is quite broad (not just one frequency), not to mention dynamic, so this is entirely possible.
2. The Canon system produces energy directly in the audible region. I find this unlikely.
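
To put numbers on the intermodulation case, here is a rough Python sketch of the sum and difference arithmetic for two tones. The function names and the 20 Hz to 18K audible window are my own assumptions for illustration (nothing here is measured from the camera), and it only covers the simple second-order products, not the full mess a real mechanical system would generate.

```python
def second_order_products(f1_hz, f2_hz):
    """Return the (difference, sum) frequencies produced when two tones mix."""
    return abs(f1_hz - f2_hz), f1_hz + f2_hz


def is_audible(freq_hz, low_hz=20.0, high_hz=18_000.0):
    """Rough audibility check; the 20 Hz to 18 kHz window is an assumption."""
    return low_hz <= freq_hz <= high_hz


# The two hypothetical ultrasonic tones from the example: 40K and 25K.
diff_hz, sum_hz = second_order_products(40_000, 25_000)

print(diff_hz, is_audible(diff_hz))  # 15000 True
print(sum_hz, is_audible(sum_hz))    # 65000 False
```

Both source tones fail the audibility check on their own, but the difference product lands squarely in the audible range, which is the whole point.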