Use a 64-core PCIe board (made by Tilera: http://www.tilera.com/sites/default/files/productbriefs/TILEncore-Gx36_PB028-02.pdf ) with two onboard 10 GbE NICs in in-line mode. Each core can run its own copy of a hard real-time (RTOS) BSD microkernel with shared RDMA buffers, and/or a group of cores can be run in SMP or parallel mode for heavy calculations.
Pin the market data to a single core or to multiple cores (e.g. one for tick data, one for completed orders). Offload all TCP processing (TOE, plus a few dozen settings) to the NIC chips and share a revolving (ring) buffer using RDMA; this provides a lock-free mechanism with no mutexes, no context switching, and no buffer copying.
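To make the lock-free revolving buffer idea concrete, here is a minimal sketch of a single-producer/single-consumer ring in portable C11 atomics. It is not Tilera or RDMA code: it assumes one writer core and one reader core sharing ordinary memory, and the names `tick_t`, `ring_t`, `ring_push`, `ring_pop` and the capacity of 8 are made up for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capacity, kept small for illustration; must be a power of two. */
#define RING_CAPACITY 8u
static_assert((RING_CAPACITY & (RING_CAPACITY - 1u)) == 0u,
              "RING_CAPACITY must be a power of two");

/* Hypothetical tick record; a real feed would carry more fields. */
typedef struct {
    uint64_t seq;
    double   price;
    uint32_t size;
} tick_t;

typedef struct {
    tick_t slots[RING_CAPACITY];
    _Atomic size_t head; /* next slot to write; stored only by the producer */
    _Atomic size_t tail; /* next slot to read; stored only by the consumer  */
} ring_t;

static void ring_init(ring_t *r) {
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
}

/* Producer side: returns false when the ring is full (nothing is written). */
static bool ring_push(ring_t *r, const tick_t *t) {
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == RING_CAPACITY)
        return false;
    r->slots[head & (RING_CAPACITY - 1u)] = *t;
    /* Release: the slot contents become visible before the new head does. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

/* Consumer side: returns false when the ring is empty (out is untouched). */
static bool ring_pop(ring_t *r, tick_t *out) {
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (head == tail)
        return false;
    *out = r->slots[tail & (RING_CAPACITY - 1u)];
    /* Release: the slot is handed back to the producer only after the copy. */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}
```

Neither side ever blocks or takes a mutex: the producer only writes `head`, the consumer only writes `tail`, and a full or empty ring is reported to the caller, who decides whether to spin or drop. This version still copies each record in and out of a slot; avoiding that copy, as described above, would mean handing out pointers into the slots instead.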
Cascade the market data into additional cores in SMP or parallel mode (depending on the calculation attributes) for quantitative algorithmic calculations (e.g. in-stream Kalman filters, SMC/UPF, ANNs, etc.). Outbound orders go out on the second NIC, fed by a microkernel pinned to a single core.
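As an illustration of what an in-stream calculation on one of those cores might look like, here is a minimal scalar Kalman filter updated once per tick. It assumes a simple random-walk price model with fixed process noise `q` and measurement noise `r`; the model, the struct, and the function names are all hypothetical, not taken from any particular library.

```c
#include <assert.h>

/* Scalar Kalman filter state for an assumed random-walk model:
 *   x_k = x_{k-1} + w,  w ~ N(0, q)   (hidden "fair" price)
 *   z_k = x_k + v,      v ~ N(0, r)   (observed tick price)   */
typedef struct {
    double x; /* current state estimate     */
    double p; /* variance of the estimate   */
    double q; /* process noise variance     */
    double r; /* measurement noise variance */
} kalman1d_t;

static void kalman1d_init(kalman1d_t *k, double x0, double p0,
                          double q, double r) {
    assert(p0 >= 0.0 && q >= 0.0 && r > 0.0);
    k->x = x0;
    k->p = p0;
    k->q = q;
    k->r = r;
}

/* Fold one observation z into the estimate and return the new estimate.
 * Constant time and allocation-free, so it can run inline on the tick path. */
static double kalman1d_update(kalman1d_t *k, double z) {
    /* Predict: the state carries over, uncertainty grows by q. */
    double p_pred = k->p + k->q;
    /* Update: blend prediction and observation by the Kalman gain. */
    double gain = p_pred / (p_pred + k->r);
    k->x += gain * (z - k->x);
    k->p = (1.0 - gain) * p_pred;
    return k->x;
}
```

Each update is a handful of multiply-adds on four doubles, which is why this kind of filter suits a core that consumes ticks straight from the shared buffer. Multivariate filters or particle methods such as SMC/UPF are heavier and are the sort of work the SMP group would take on.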
This configuration can also be used for ultra-fast order crossing.