Cache thrashing is the cause, usually combined with oversubscribing the processor (more software threads than available hardware threads, which floods the processor cache). If the data one thread is loading fills the cache and another thread then requests a line, the first thread's lines get evicted to make room for the second's. When the first thread next asks for its data, the cache is turned over again and that data has to be reloaded from RAM, which is much slower than reading it from L2/L3 cache. The extra bus traffic and context switching add to the slowdown. LR has been guilty of this many times; I've had exports that just dragged on when I had other apps running and fighting for processor resources.
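To put rough numbers on the oversubscription point, here is a back-of-envelope sketch. The function name and all the sizes are made up for illustration, and real caches are more complicated than a single shared byte budget, so treat it as a way to think about the problem rather than a measurement.

```python
def workers_that_fit(cache_bytes, working_set_bytes, hardware_threads):
    """Largest worker count whose combined working sets fit in the cache,
    capped at the hardware thread count and never below one."""
    by_cache = cache_bytes // working_set_bytes
    return max(1, min(hardware_threads, by_cache))


MiB = 1024 * 1024

# Hypothetical machine: 8 MiB shared cache, 8 hardware threads,
# each thread touching about 3 MiB of image data at a time.
print(workers_that_fit(8 * MiB, 3 * MiB, 8))  # prints 2
```

With these assumed numbers, running all 8 threads would ask for about 24 MiB of cache when only 8 MiB exists, which is the situation where threads keep evicting each other's lines.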
I wrote an app that has its own image processing, and it thrashed quite a bit in multithreaded operation on the thread pool until I split the parallel operations up so that one thread loading its data no longer floods the cache. It sped up dramatically.
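One way that kind of split could look is sketched below. This is not the code from the app described above; the function names, the 4 MiB cache budget, and the invert operation are all assumptions chosen for illustration. Python's GIL also means this will not show the speedup itself; it only shows the layout: at most one worker per hardware thread, each working on a contiguous band of rows small enough that the bands in flight stay within the assumed budget.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def split_into_bands(height, row_bytes, band_budget_bytes):
    """Return (start_row, end_row) pairs, each band within the byte budget."""
    rows_per_band = max(1, band_budget_bytes // row_bytes)
    return [(start, min(start + rows_per_band, height))
            for start in range(0, height, rows_per_band)]


def invert_band(pixels, width, band):
    """Hypothetical per-band work: invert 8-bit grayscale pixels in place."""
    start_row, end_row = band
    for i in range(start_row * width, end_row * width):
        pixels[i] = 255 - pixels[i]


def process_image(pixels, width, height, cache_budget_bytes=4 * 1024 * 1024):
    # At most one worker per hardware thread, so the pool never oversubscribes.
    workers = os.cpu_count() or 1
    # Share the assumed cache budget among the workers (one byte per pixel).
    bands = split_into_bands(height, width, cache_budget_bytes // workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for every band and re-raises any worker exception.
        list(pool.map(lambda band: invert_band(pixels, width, band), bands))
    return pixels
```

Calling `process_image(bytearray(width * height), width, height)` inverts the buffer in place. The design choice is that each worker streams through its own contiguous rows instead of all workers pulling from scattered parts of the image at once.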
You can watch for it in Performance Monitor, using the 'Cache' counter set along with some of the 'Processor' counters.