I’ll start the discussion with a question:
“What is the state of the art for Unified Architecture programming?”
We’ve been seeing hardware releases from established players (AMD, Intel, Nvidia; not startups) that all combine a CPU and GPU on a single chip.
Programming-wise, the best-performing option so far has been Bend[0], but there may be proprietary software out there that isn’t publicly visible.
[0] GitHub - HigherOrderCO/Bend: A massively parallel, high-level programming language