Highest Rated Comments


wellonchompy 38 karma

Australia seems to be doing pretty well at cybernetics. Cochlear implants, and now retinal, and that girl who can play the violin with her robot arms. Woo, Cyborg Aussies!

wellonchompy 13 karma

Thanks for the AMA, I spend all of my work hours working out how to wring the highest performance out of your work.

I'm a Linux engineer involved in very low-latency systems, where fast single-threaded performance and massive core counts are critical to what I do. We've just moved our platform from AMD to Intel Sandy Bridge-based Xeons after the disaster of Bulldozer, and have been very pleasantly surprised with their performance. The E5-2690 is one amazing chip, with 8 cores at 2.9 GHz that happily burst to 3.8 GHz for the fastest single-threaded performance I've ever measured in a general-purpose CPU (although we've had FPGAs go faster).

Using AMD systems, we used to be able to comfortably run 48 discrete cores in a single system (4x 12-core chips), which was fantastic for the tasks we run, where latency of IPC between NUMA cores is still orders of magnitude lower than for network IPC. However, Intel still don't have anything on the market that approaches this core density at the cost or speed of the 2-year-old AMD chips, so I have a few questions:

  1. What's the reason that Xeon chips have a low core count compared to AMD? 8 cores per socket feels a bit restrictive when the ARM SoC in my phone already has 4.
  2. I know that SMP is tricky, and NUMA must be hard to do well (no thanks to operating system schedulers being obtuse about it), but is there a technological reason that we don't see the fastest cores available in 4-socket (or more) setups? Like I said earlier, I love the E5-2690, but the 4-socket versions only go up to E5-4650 at 2.7 GHz, with only 3.3 GHz turbo.
  3. I guess this is probably more to do with marketing and SKUs, but why do the 4-socket versions of chips cost twice as much as the 2-socket versions? Related to the previous question, are they physically different, or are they artificially locked to 2-socket setups for marketing reasons? With AMD, we'd get exactly the same Opteron chip whether it was for a 1, 2 or 4-socket setup.