Algorithmic adaptations to use next-generation computers closer to their potential are underway in Oil & Gas and many other fields. Instead of squeezing out flops – the traditional goal of algorithmic optimality, which once served as a reasonable proxy for all associated costs – algorithms must now squeeze out synchronizations, memory, and data transfers, while extra flops on locally cached data represent only small costs in time and energy. After decades of programming-model stability with bulk synchronous processing, new programming models and new algorithmic capabilities (to make forays into, e.g., inverse problems, data assimilation, and uncertainty quantification) must be co-designed with the hardware. We briefly recap the architectural constraints, then concentrate on two kernels that each occupy a large portion of all scientific computing cycles: large dense symmetric/Hermitian systems (covariances, Hamiltonians, Hessians, Schur complements) and large sparse Poisson/Helmholtz systems (solids, fluids, electromagnetism, radiation diffusion, gravitation). We examine progress in porting solvers for these kernels (e.g., fast multipole, hierarchically low-rank matrices, multigrid) to the hybrid distributed-shared programming environment, including the GPU and MIC architectures.