ARM T604 is an upcoming mobile GPU from ARM. I remember reading slides from an ARM presentation, though I cannot find the link now, perhaps they were taken down. Anyway, here is what we know:
2. Upto 68 GFlops of compute performance. I assume this is for fp32. Exynos 5 Dual whitepaper claims 72 GFlops.
3. Barrel threaded (i.e. multiple simultaneous threads) like AMD or Nvidia
4. No SIMT! Rather, SIMD architecture. I take this to mean, the vector lanes are not predicated. So be prepared to write explicitly SIMD code.
5. Now 68 GFlops/4 core = 17 GFlops/core. Assuming 500MHz clock speed, that gives us 34 flops/cycle.
We do know that it has 2 ALUs/core so each ALU does 17 flops/cycle. Each ALU has one scalar and one (or more?) vector units. So perhaps 1 scalar, and 1 vec8 unit with MAD? or Perhaps 1 scalar and 2 vec4 units with MAD.
(If we go by the Exynos 5 Dual whitepaper, perhaps they have modified the scalar unit to also do MAD instead of just one flop/cycle.)
6. Full IEEE precision for fp32 and fp64. Very nice ARM! The full fp64 support makes me excited for this architecture for my uses. ARM has not published the fp64 speeds, but I think it will be either 1/4th or 1/8th.
7. OpenCL 1.1 Full Profile support. I hope that EVERY device that ships with this GPU comes with working OpenCL drivers and an open SDK is provided to everyone.