Architecture – Page 3 – Searching for divine code

Some ideas are as follows:

1. Touch responsiveness is an objectively measurable quantity. I think high speed cameras can play a very important role in this field.
Some good initial work has been reported by Tech Report.
It is unfortunately increasingly common to hear “Benchmarks don’t matter” and then some semi-coherent rant about user experience and “smooth” UIs. I think all it means is that the writer had no idea how to measure the touch responsiveness 😛

2. Application launch times: Application launch times can again be measured objectively. For slower apps, you can use a stopwatch and for small fast-launching apps, you can again use a high speed camera.

3. Web browsing benchmarks need to go beyond sunspider and browsermark. It is important to show the web page load times of REAL webpages. For reproducible results, webpages from say top 10 common websites (at a given date) should be copied to a local server and then those pages can be tested. Anandtech used to include such benchmarks but for some reason even they have fallen back to just using the meaningless synthetics. Even in synthetics, the test coverage needs to be increased. For Javascript, perhaps tests like the Mozilla Kraken or the new Octane suite should be looked at.

4. Proper application benchmarks need to be more common instead of synthetics. For example, Photoshop Touch can perhaps be used much as Photoshop benchmarks are now common on the desktop.

5. Synthetic benchmarks are poorly written and poorly understood. For example, Linpack Android version seems to be a poorly coded benchmark. The megaflops reported from the benchmark are far off the capabilities of the chips tested. I am looking at making a better one when I get time. For floating point tests, really what you want are accurate and separately reported measures of fp32 performance, fp64 performance and fp32-with-NEON performance.

6. Further, synthetics should properly report both single threaded and multithreaded numbers. (Linpack does do this but many other benchmarks don’t). I think single-thread performance is underestimated on mobile with most websites reporting benchmarks from multithreaded tests. However, few apps use 4 cores in any modern Android phone. And no, you don’t need 4 cores to multitask.

ARM T604 is an upcoming mobile GPU from ARM. I remember reading slides from an ARM presentation, though I cannot find the link now, perhaps they were taken down. Anyway, here is what we know:

1. Quad-core

2. Upto 68 GFlops of compute performance. I assume this is for fp32. Exynos 5 Dual whitepaper claims 72 GFlops.

3. Barrel threaded (i.e. multiple simultaneous threads) like AMD or Nvidia

4. No SIMT! Rather, SIMD architecture. I take this to mean, the vector lanes are not predicated. So be prepared to write explicitly SIMD code.

5. Now 68 GFlops/4 core = 17 GFlops/core. Assuming 500MHz clock speed, that gives us 34 flops/cycle.

We do know that it has 2 ALUs/core so each ALU does 17 flops/cycle. Each ALU has one scalar and one (or more?) vector units. So perhaps 1 scalar, and 1 vec8 unit with MAD? or Perhaps 1 scalar and 2 vec4 units with MAD.

(If we go by the Exynos 5 Dual whitepaper, perhaps they have modified the scalar unit to also do MAD instead of just one flop/cycle.)

6. Full IEEE precision for fp32 and fp64. Very nice ARM! The full fp64 support makes me excited for this architecture for my uses. ARM has not published the fp64 speeds, but I think it will be either 1/4th or 1/8th.

7. OpenCL 1.1 Full Profile support. I hope that EVERY device that ships with this GPU comes with working OpenCL drivers and an open SDK is provided to everyone.

Category: Architecture

Some thoughts on Android benchmarking

Some informed speculation about ARM T604