[I switched to computer engineering about 3/4 of the way through my CS degree, so while I have an adequate grasp of algorithmic complexity, my professional experience is all in hardware design, not software.]
Lightroom's scaling suggests to me that Adobe did the 'hard' work of making a multi-threaded RAW processor, but not the comparatively 'easy' work of spawning multiple instances of it to achieve near-linear speed-up on embarrassingly parallel tasks. Granted, there would be some overhead for inter-process communication, and they might have to serialize database access, but the heavy lifting is all in the image processing. Nor do they appear to have done any work to predict users' future actions and prepare for them in advance.
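To illustrate the 'easy' work I have in mind, here's a minimal sketch (Python; render_and_encode is a hypothetical stand-in for Adobe's multithreaded RAW engine, not anything Lightroom actually exposes) of fanning an embarrassingly parallel export out over one process per core:

```python
# Sketch: near-linear speed-up on an embarrassingly parallel export.
# render_and_encode() is a hypothetical stand-in for the RAW-processing
# engine; each worker process runs its own independent instance.
from concurrent.futures import ProcessPoolExecutor
import os

def render_and_encode(raw_path: str) -> str:
    """Placeholder: demosaic, apply edits, encode, write to disk."""
    out_path = raw_path.rsplit(".", 1)[0] + ".jpg"
    # ... the real image processing would happen here ...
    return out_path

def export_catalog(raw_paths, workers=os.cpu_count()):
    # Each image is independent, so workers never need to coordinate;
    # only the finished output paths flow back to the parent process.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(render_and_encode, raw_paths))
```

Database access would still need to be serialized in the parent, but as noted above, that's cheap next to the rendering itself.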
Export to Disk is the only test Puget ran that exhibits reasonably linear scaling, and even then only out to 8-10 cores. I would have liked to see Puget re-run the test with two simultaneous exports of 40 images each, rather than one export of 80. Does that improve the scaling, perhaps by forcing LR to spawn a second worker thread? I haven't been able to get good data running such tests myself, but I only have 4 physical (+4 logical) cores to play with, and since LR export scales reasonably linearly out to that point, any advantage from the hypothetical second worker thread would be obscured.
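The experiment itself is easy to script if someone with more cores wants to try it. A rough sketch (Python; lr_export is a hypothetical command-line stand-in, since real Lightroom would have to be driven through its UI or SDK):

```python
# Sketch of the experiment: one export of 80 images vs. two concurrent
# exports of 40 each. 'lr_export' is a hypothetical CLI stand-in for
# driving Lightroom; the timing logic is the point, not the tool.
import subprocess, time

def timed(batches):
    start = time.perf_counter()
    procs = [subprocess.Popen(["lr_export"] + batch)  # launch concurrently
             for batch in batches]
    for p in procs:
        p.wait()
    return time.perf_counter() - start

images = [f"img_{i:03}.cr2" for i in range(80)]
t_one = timed([images])                    # one export of 80
t_two = timed([images[:40], images[40:]])  # two simultaneous exports of 40
print(f"1x80: {t_one:.1f}s   2x40: {t_two:.1f}s")
```

If the 2x40 run finishes meaningfully faster on a high-core-count machine, that would support the single-worker hypothesis.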
Convert to DNG should be just as embarrassingly parallel as Export, but its scaling behavior is quite different. Speed-up from the second core is roughly linear, the third and fourth cores add very little, and beyond that there's negligible additional scaling. Conceptually, Convert is the same render-to-bitmap operation as Export, followed by encode-to-DNG rather than encode-to-JPEG. However, Convert is ~3x as fast as Export on 1-2 cores, so the algorithm appears to skip much of the image-data manipulation that Export does. Disk bandwidth is quite low in both cases (tens of MB/sec at most), so either LR is phenomenally inefficient at file access (which seems very unlikely, since there's no obvious reason not to load each file into RAM in one go and flush it back to disk in one go when done), or the bottleneck is elsewhere. And on a system with 20MB of cache, nearly enough to hold an entire RAW image from Puget's test, memory bandwidth seems an unlikely limiting factor.
Generate 1:1 Previews and Generate Smart Previews both have scaling limits similar to Convert's: performance peaks at about 4 cores, at only 2-3x single-core performance.
There aren't any obvious resource limits standing in the way of better performance: the tasks are not CPU bound, not disk I/O bound, probably not memory bandwidth or latency bound, and don't have clear inter-process communication or coordination limits. All of which leads me to believe that LR was architected to minimize latency (process a single image as fast as possible, via a multithreaded rendering process) rather than to maximize throughput (operations per hour). Though it's pure speculation on my part, I would guess that rendering is pipelined (e.g. demosaic -> user edits -> noise reduction -> sharpen; and yes, I realize that's not the order LR does those steps in, it's just an illustration of pipelining) and that scaling is limited by the slowest pipeline stage and the total number of pipeline stages.
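As a toy model of that hypothesis, here's a sketch (Python; the stage names and timings are illustrative, not Lightroom's actual pipeline) in which each stage runs in its own thread and images flow through queues:

```python
# Toy pipelined renderer: one thread per stage, images flow through
# queues. Stage names and per-stage costs are made up for illustration.
import threading, queue, time

STAGES = [("demosaic", 0.05), ("user_edits", 0.02),
          ("noise_reduction", 0.10), ("sharpen", 0.03)]

def stage_worker(cost, inbox, outbox):
    while True:
        item = inbox.get()
        if item is None:        # sentinel: pass it along and exit
            outbox.put(None)
            return
        time.sleep(cost)        # stand-in for the real per-stage work
        outbox.put(item)

def run_pipeline(n_images):
    qs = [queue.Queue() for _ in range(len(STAGES) + 1)]
    for i, (_, cost) in enumerate(STAGES):
        threading.Thread(target=stage_worker,
                         args=(cost, qs[i], qs[i + 1])).start()
    start = time.perf_counter()
    for i in range(n_images):
        qs[0].put(i)
    qs[0].put(None)
    while qs[-1].get() is not None:  # drain the output queue
        pass
    # Steady-state throughput converges on 1/0.10 = 10 images/sec:
    # noise_reduction is the wall, and cores beyond one per stage idle.
    return n_images / (time.perf_counter() - start)

print(f"{run_pipeline(50):.1f} images/sec")
```

The best-case speed-up of such a pipeline is (total stage time) / (slowest stage time), which with these made-up numbers is 0.20 / 0.10 = 2x over single-threaded execution, suspiciously similar to the 2-3x plateaus seen above.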
(It will be interesting, once AMD's Threadripper CPUs get into users' hands, to see how LR's scaling changes in response to the different hardware behavior the new platform brings: markedly lower single-threaded memory bandwidth but much higher aggregate multithreaded bandwidth, and a very different cache organization with different performance characteristics.)
But despite this hypothetical focus on latency minimization, there seems to have been little to no effort at latency hiding. For example, when I'm viewing a photo in the Library module, LR doesn't seem to pre-render the next and previous photos so I can 'instantly' move forward or back in the film strip, nor does it pre-load everything it would need if I were to flip to the Develop module to make some quick edits. All three of those operations are high-probability guesses as to what my next action will be, and all are good targets for trading power efficiency for higher productivity.
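The kind of speculation I'm describing is cheap to express. A sketch (Python; render and the trivial future-based cache are hypothetical, standing in for LR's raw-to-screen pipeline) of pre-rendering filmstrip neighbors during idle time:

```python
# Sketch of latency hiding: whenever the user lands on photo i, kick off
# background renders of i-1 and i+1 so the next navigation is 'instant'.
# render() is a hypothetical stand-in for the raw-to-screen pipeline.
from concurrent.futures import ThreadPoolExecutor

class FilmstripPrefetcher:
    def __init__(self, photos, render):
        self.photos = photos
        self.render = render
        self.cache = {}                                # index -> Future
        self.pool = ThreadPoolExecutor(max_workers=2)  # idle cores only

    def _ensure(self, i):
        # Start a background render for photo i if one isn't cached yet.
        if 0 <= i < len(self.photos) and i not in self.cache:
            self.cache[i] = self.pool.submit(self.render, self.photos[i])

    def show(self, i):
        self._ensure(i)                 # render the current photo if needed
        bitmap = self.cache[i].result() # blocks only on a cache miss
        self._ensure(i - 1)             # speculate: previous photo
        self._ensure(i + 1)             # speculate: next photo
        return bitmap
```

A real implementation would also need cache eviction and to invalidate speculative renders when edits change, but the point stands: the prediction logic is trivial next to the rendering engine that already exists.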
I would like to see Lightroom make use of idle compute power so that repetitive, predictable operations are fast because LR has already anticipated and completed the thing I'm about to ask it to do.