This continues a CPU architecture discussion from the RISC-V thread:
I suspect fixed-width pipelines are the actual bottleneck. Flexible-width pipelines with assignable lanes are probably next gen. Or mixed-width pipelines: get rid of the concept of a core altogether. The dispatch unit picks the pipeline allocation based on the instruction stream, which then lets dispatch use all of the lanes like threads. I think CPUs already do something like this with cache and int units, but iirc the pipelines are fixed width per core. It would require significant OS, compiler, and language changes afaict.
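To make the idea concrete, here is a rough toy model, not a real microarchitecture simulation. It compares lanes pinned to cores against one shared pool that a dispatcher hands out each cycle. Everything in it is an assumption for illustration: the names `run_fixed` and `run_pooled` are made up, every op is treated as independent, single-lane, and single-cycle, and dependencies, memory, and dispatch cost are ignored entirely.

```python
def run_fixed(streams, lanes_per_core):
    """Each stream is pinned to its own core with a fixed lane count.

    `streams` is a list of op counts, one per instruction stream.
    Returns the number of cycles until the slowest core drains.
    """
    if lanes_per_core <= 0:
        raise ValueError("lanes_per_core must be positive")
    # Ceiling division: a core retires at most lanes_per_core ops per cycle.
    return max((-(-n // lanes_per_core) for n in streams), default=0)


def run_pooled(streams, total_lanes):
    """One shared lane pool; a (hypothetical) dispatcher assigns free lanes
    to whichever streams still have work, greedily in stream order.

    Returns the number of cycles until every stream drains.
    """
    if total_lanes <= 0:
        raise ValueError("total_lanes must be positive")
    remaining = list(streams)
    cycles = 0
    while any(remaining):
        free = total_lanes
        for i, n in enumerate(remaining):
            take = min(n, free)  # give this stream as many lanes as are left
            remaining[i] -= take
            free -= take
            if free == 0:
                break
        cycles += 1
    return cycles


if __name__ == "__main__":
    work = [12, 2, 1, 1]          # one busy stream, three nearly idle ones
    print(run_fixed(work, 2))     # 4 cores x 2 lanes each
    print(run_pooled(work, 8))    # same 8 lanes, shared
```

With that unbalanced workload the fixed layout takes 6 cycles and the pooled one takes 2, because the idle cores' lanes can't help the busy stream. Real code has dependencies, so the actual gap would be much smaller than this toy suggests.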
There’s an out-of-order RISC-V CPU core written in Chisel called BOOM (the Berkeley Out-of-Order Machine) on GitHub. Maybe that would be a useful starting point.