Author: bronco

Operating the H3 in a reliable way (sane clockspeeds)

 Author| Published in 2015-11-19 19:10:28 | Show all floors
barquerito replied at 2015-11-19 15:00
You learn something new every day!

:-)

While you can specify clockspeeds for all cores by adjusting cpu0, this won't help since the real issue is dvfs (dynamic voltage and frequency scaling).

If you clock components higher they need more voltage to operate reliably. The more voltage you feed, the hotter the stuff gets. And this increase in temperature is not linear: increase the voltage a little and the whole thing gets a lot hotter. Since I'm an electric NOOB better read this as an explanation: https://olimex.wordpress.com/201 ... -way/#comment-20385

That's what dvfs is for: You define operating points with combinations of core voltage and clockspeed that are known to work reliably. Less voltage: less heat, longer lifespan.

The fex files common on H3 based Orange Pis seem to define just 2 operating points: 1.53GHz @ 1.5V and 1.2GHz @ 1.3V. Since I don't own an H3 device I cannot tell whether the H3 still operates at 1.3V when it is clocked down to e.g. 480 MHz (at 480 MHz it could also work reliably at a much lower voltage, 1.04V for example). This would be interesting to test since a lower voltage might avoid thermal issues and increase lifespan.
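To put rough numbers on this, here is a small sketch using the usual first-order approximation for dynamic CMOS power, P ~ C * V^2 * f. That approximation is my assumption (it ignores leakage), the function name `relative_dynamic_power` is made up for illustration, and the voltage/clockspeed pairs are the ones mentioned above. Treat the output as an estimate, not a measurement.

```python
def relative_dynamic_power(volts, mhz, ref_volts, ref_mhz):
    """Dynamic power relative to a reference operating point.

    Assumes P ~ C * V^2 * f; the switched capacitance C cancels out
    and leakage is ignored, so this is only a rough estimate.
    """
    return (volts ** 2 * mhz) / (ref_volts ** 2 * ref_mhz)


# Reference point: 1.2 GHz @ 1.3V (one of the two points in the Orange Pi fex files)
for volts, mhz in [(1.5, 1536), (1.3, 1200), (1.3, 480), (1.04, 480)]:
    ratio = relative_dynamic_power(volts, mhz, ref_volts=1.3, ref_mhz=1200)
    print(f"{mhz:5d} MHz @ {volts:.2f}V: {ratio:.2f}x")
```

Under this approximation 480 MHz at 1.04V needs roughly a third less dynamic power than 480 MHz at 1.3V, which is why the missing lower operating points matter.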

But this cpufreq approach has a drawback. Compare with the 'race to idle' concept outlined here: http://linux-sunxi.org/Cpufreq#The_.22performance.22_governor

When my Orange Pi PC arrives I will dig a bit deeper. I'm currently preparing an RPi-Monitor template for the H3 to be able to visualize these relationships.
 Author| Published in 2015-11-19 19:24:46 | Show all floors
Edited by bronco at 2015-11-19 19:35

Thx to loboris and whitebox! They provided a few benchmark numbers for different CPU/DRAM clockspeeds.

Prerequisites: [sudo] apt-get install sysbench p7zip-full mbw

sysbench --test=cpu --cpu-max-prime=20000 run --num-threads=4
7za b
mbw -t0 256
mbw -t1 256
mbw -t2 256

4 tests, one per column: the 1st and 2nd with 1.53 GHz cpufreq and DRAM clocked at 672 MHz (1st column with heatsink and fan and without thermal throttling, 2nd column without a heatsink and with throttling occurring). In the 3rd and 4th columns no throttling occurred: the 3rd shows 1.2 GHz cpufreq with 600 MHz DRAM, the 4th 1008 MHz cpufreq with 480 MHz DRAM:

                            1.53/672    1.53/672     1.2/600     1.0/480
sysbench (less is better):    123.3       183.7       154.5       184.6
7-zip (more is better):        2774        1924        2125        1795
mbw -t0 (more is better):       396         407         344         264
mbw -t1 (more is better):       756         769         500         415
mbw -t2 (more is better):       718         633         627         459


Since I don't want to use a fan with any 'energy efficient ARM device' it's already obvious how to adjust dvfs values and DRAM clockspeeds: maximum 1008/480 MHz with lower voltage. The sysbench result of the overclocked system without heatsink (2nd column: 1.53 GHz CPU and 672 MHz DRAM, throttled) and the one with sane values are practically the same (183.7 vs. 184.6). And since the sort of applications I use do not depend that much on memory bandwidth, the decrease there is fine and worth the effort (no heat issues any more).
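A quick way to see how close the throttled overclock and the sane settings really are is to express the differences in percent. The numbers below are copied from the table above; the dictionary layout and the helper name `percent_change` are just for illustration.

```python
# Benchmark numbers copied from the table above (columns 2 and 4).
results = {
    "1.53/672 throttled": {"sysbench": 183.7, "7zip": 1924, "mbw_t1": 769},
    "1.0/480":            {"sysbench": 184.6, "7zip": 1795, "mbw_t1": 415},
}


def percent_change(new, old):
    """Relative change from old to new in percent."""
    return (new - old) / old * 100.0


throttled = results["1.53/672 throttled"]
sane = results["1.0/480"]
for test in ("sysbench", "7zip", "mbw_t1"):
    change = percent_change(sane[test], throttled[test])
    print(f"{test:8s} {change:+.1f}%")
```

The sysbench run time changes by about half a percent and the 7-zip score by about 7%, while memory bandwidth (mbw -t1) drops by almost half. Whether that trade-off is acceptable depends on how memory-bound your workload is.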

Published in 2015-11-19 23:55:03 | Show all floors
Since I don't want to use a fan with any 'energy efficient ARM device' i


A 12V CPU fan draws about .1 amp or 1.2 watts. If you plug it into the 5V pin on the OP PC you will draw .5 watts. For that .5 watts I have lowered my cpu temp 40 to 50 degrees F. You use an additional 1/2 watt but you gain stability and part longevity.
Just saying.
P
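For readers who want to check the figures in this reply: the arithmetic is just P = V * I, and a temperature difference in Fahrenheit converts to Celsius with the factor 5/9 (no 32 offset, since it is a difference). Note that the 0.5 W figure assumes the fan still draws 0.1 A at 5V; that is the post's assumption, not a measured value.

```python
def watts(volts, amps):
    """Electrical power P = V * I."""
    return volts * amps


def fahrenheit_delta_to_celsius(delta_f):
    """Convert a temperature *difference* from Fahrenheit to Celsius."""
    return delta_f * 5.0 / 9.0


print(f"fan at 12V, 0.1 A: {watts(12, 0.1):.1f} W")
print(f"fan at 5V, 0.1 A (assumed): {watts(5, 0.1):.1f} W")
low = fahrenheit_delta_to_celsius(40)
high = fahrenheit_delta_to_celsius(50)
print(f"40 to 50 degrees F difference = {low:.1f} to {high:.1f} degrees C")
```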
 Author| Published in 2015-11-20 00:11:20 | Show all floors
Edited by bronco at 2015-11-20 00:14
phreon replied at 2015-11-19 23:55
You use an additional 1/2 watt but you gain stability and part longevity

Unfortunately you draw the wrong conclusions. Better heat dissipation does NOT lead to "part longevity" and if it's necessary to have a fan to increase stability then there's something really wrong.
The H3 doesn't overheat 'by design'. It's due to the default settings on all Orange Pi OS images being 'broken by design'. Your devices run at insanely high clockspeeds and voltages. This is what

  • negatively impacts the SoC's lifespan
  • corrupts data from time to time
  • leads to stability issues without active cooling
  • increases power consumption
  • increases temperatures


By adding a fan you treat symptoms but you don't solve the problem. Solving it would mean: Accept that the H3 is a '1 GHz SoC', clock the DRAM with 480 MHz and the CPU cores with 1200 MHz max. And by defining additional operating points in the so-called fex file the SoC will lower the core voltage to 0.96V when idle instead of the 1.3V it gets now. You will lower both temperatures and consumption dramatically while losing a bit of 'peak performance'. But who needs the latter? The H3 is an ultra-slow Cortex-A7 design. If you want something that's both fast and energy efficient you would have to choose Cortex-A72 or A35 these days. But driving an A7 design the way it is driven now on the H3 based Orange Pis is neither energy efficient nor performant.

 Author| Published in 2015-11-20 15:49:20 | Show all floors
Edited by bronco at 2015-11-21 18:50

To get a clue why the H3 is so slow I compared with one popular older Allwinner SoC: The A20 used on Orange Pi and Orange Pi Mini.

The A20 has only 2 Cortex-A7 cores instead of 4. The clockspeeds used were conservative (960 MHz cpufreq, 432 MHz DRAM). In the left column loboris' setup with heatsink and fan (1.53GHz/672MHz), in the middle column a more sane approach (1008MHz/480MHz) contributed by whitebox and on the right the Banana Pi with A20:

                       H3 overclocked     H3         A20
sysbench (less is better):    123.3       184.6        371
7-zip (more is better):        2774        1795        924
mbw -t0 (more is better):       396         264        305
mbw -t1 (more is better):       756         415        590
mbw -t2 (more is better):       718         459        586


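If you use the sysbench scores to get a rough 'per core' comparison (run time multiplied by clockspeed in GHz, with the dual-core A20's time halved to compare it against 4 cores):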

123.3 * 1.536   = 189.389
184.6 * 1.008   = 186.077
371 / 2 * 0.960 = 178.080


Please remember: less is better, so at the same clockspeed the A7 cores in the H3 perform slower than those in the A20. And if we have a look at the mbw scores maybe we find the culprit there. The A20's DRAM was clocked at 432 MHz while the H3's was clocked at 672 and 480 MHz. Since we're talking about DDR RAM (double data rate) I suspect that on the H3 the doubled values are reported, so in reality it's not 672/480 MHz but 336/240 MHz instead (or the H3's memory controller really sucks).
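The normalization used for the 'per core' comparison can be written down as a small function. This is only a sketch of the rough comparison above: it assumes the sysbench run time scales inversely with both clockspeed and core count, and the function name `normalized_score` is made up for illustration.

```python
def normalized_score(seconds, ghz, cores, ref_cores=4):
    """Scale a sysbench run time to a hypothetical 1 GHz system with ref_cores cores.

    Assumes run time is inversely proportional to clockspeed and core count.
    Lower is better.
    """
    return seconds * ghz * cores / ref_cores


# (name, sysbench seconds, cpufreq in GHz, number of cores), from the table above
systems = [
    ("H3 overclocked", 123.3, 1.536, 4),
    ("H3 sane",        184.6, 1.008, 4),
    ("A20",            371.0, 0.960, 2),
]
for name, seconds, ghz, cores in systems:
    print(f"{name:15s} {normalized_score(seconds, ghz, cores):.3f}")
```

The A20 ends up with the lowest normalized value, which is what the conclusion about the slower H3 cores is based on.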

But there might be another explanation for the low memory throughput: it might make a difference which display settings were used while testing, and whether the board ran headless or with a GUI: http://linux-sunxi.org/Optimizin ... ution_graphics_mode (to be confirmed -- I'm looking forward to giving this a try when my Orange Pi PC arrives)
 Author| Published in 2015-11-21 18:31:17 | Show all floors
Edited by bronco at 2015-11-21 18:34

I had a look into Allwinner's H3 SDK today:

In the tools/pack/chips/sun8iw7p1/configs/ folder there are a couple of settings available but they all are identical more or less. As an example:

h3-dolphin-p1.fex: http://pastebin.com/WgpwUyyA
h3-dolphin-perf.fex: http://pastebin.com/dtjpmDKs

They share the following settings and the only noticeable difference is pmuic_type 2 vs. 0 (pmuic_type:0:none, 1:gpio, 2:i2c -- the Orange Pi's SY8106A is connected via I2C therefore 2 applies)

; extremity_freq(Hz): cpu extremity frequency when run benckmark or demo apk
;                     1536MHz@1500mV with radiator, 1296MHz@1340mV without radiator
; max_freq: cpu maximum frequency, based on Hz, can not be more than 1200MHz
; min_freq: cpu minimum frequency, based on Hz, can not be less than 60MHz
;
; LV_count: count of LV_freq/LV_volt, must be < 16
;
; LV1: core vdd is 1.50v if cpu frequency is (1296Mhz,  1536Mhz]
; LV2: core vdd is 1.34v if cpu frequency is (1200Mhz,  1296Mhz]
; LV3: core vdd is 1.32v if cpu frequency is (1008Mhz,  1200Mhz]
; LV4: core vdd is 1.20v if cpu frequency is (816Mhz,   1008Mhz]
; LV5: core vdd is 1.10v if cpu frequency is (648Mhz,    816Mhz]
; LV6: core vdd is 1.04v if cpu frequency is (0Mhz,      648Mhz]
; LV7: core vdd is 1.04v if cpu frequency is (0Mhz,      648Mhz]
; LV8: core vdd is 1.04v if cpu frequency is (0Mhz,      648Mhz]

boot_clock      = 1008
dram_clk        = 576
;extremity_freq = 1344000000
max_freq        = 1200000000
min_freq        = 480000000
boot_freq       = 1008000000
LV_count        = 8
LV1_freq        = 1200000000
LV1_volt        = 1300
LV2_freq        = 1008000000
LV2_volt        = 1200
[no more operating points defined]

The fex files used with the H3 based Orange Pis differ as follows (taken from orange_pi_pc.fex):

boot_clock      = 1536
dram_clk        = 672
extremity_freq  = 1536000000
max_freq        = 1536000000
min_freq        = 480000000
LV_count        = 8
LV1_freq        = 1536000000
LV1_volt        = 1500
LV2_freq        = 1200000000
LV2_volt        = 1300
[no more operating points defined]

Based on this it's obvious:

  • H3's recommended maximum frequency is 1200 MHz
  • Anything exceeding that is for OTT box manufacturers to fool customers (to provide higher Antutu scores)
  • The 672 MHz DRAM frequency as well as the 1.53 GHz used with Orange Pis can be called overclocking by default
  • These settings lead to unnecessarily high consumption and temperatures

The H3 is just another incarnation of the same old boring ultra-slow Cortex-A7 design Allwinner has been using for years, combined with a Mali400MP2 GPU. Neither the A20 (dual-core A7 with Mali) nor the A31s (quad-core A7 with PowerVR) suffered from the heat problems the forums here are full of. They're both manufactured in the older 40nm process whereas the H3 is already 28nm. Both are able to run without any heatsink when sane dvfs settings are used, unless overclocked or constantly operated under full load (BTDT).

That's what dvfs is for: defining different operating points so that the SoC is fed with less voltage when it's idle. There's no need for a heatsink and fan when you define dvfs settings correctly, unless you really want to use an overclocked chip that consumes too much energy and overheats even when idle (that's the result of the dvfs table containing only 2 operating points, both of them at the upper limit or, better said, already exceeding the recommended limit).
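To illustrate what such a table does, here is a minimal lookup sketch. It mirrors the LV comments from the SDK fex files quoted in this post, not the actual kernel code; the names `SDK_DVFS_TABLE`, `ORANGE_PI_TABLE` and `core_millivolts` are made up for illustration, and the behaviour of the two-entry Orange Pi table below its lowest point is my assumption (still to be tested on real hardware).

```python
# (upper frequency bound in MHz, core voltage in mV), ascending.
# Values from the LV1..LV6 comments in the SDK fex files.
SDK_DVFS_TABLE = [
    (648, 1040), (816, 1100), (1008, 1200),
    (1200, 1320), (1296, 1340), (1536, 1500),
]

# The two operating points defined in orange_pi_pc.fex.
ORANGE_PI_TABLE = [(1200, 1300), (1536, 1500)]


def core_millivolts(freq_mhz, table=SDK_DVFS_TABLE):
    """Return the voltage of the first operating point whose bound is >= freq_mhz.

    Intervals are half-open as in the SDK comments: (lower, upper].
    """
    if freq_mhz <= 0:
        raise ValueError("frequency must be positive")
    for upper_mhz, millivolts in table:
        if freq_mhz <= upper_mhz:
            return millivolts
    raise ValueError(f"{freq_mhz} MHz exceeds the highest operating point")


for mhz in (480, 1008, 1200, 1536):
    print(f"{mhz:5d} MHz: SDK comments {core_millivolts(mhz)} mV, "
          f"Orange Pi fex {core_millivolts(mhz, ORANGE_PI_TABLE)} mV")
```

With the two-entry table every frequency up to 1200 MHz maps to 1300 mV in this model, which is the idle overvoltage described above.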

For now I'm both done and convinced that the H3 is a wonderful SoC that doesn't suffer from heat problems 'by design'. Fortunately these problems are 'handmade'.

When my Orange Pi PCs arrive (I ordered two to be able to fry one -- I'll try to feed the SY8106A with 12V since according to the datasheet it's possible to provide up to 18V) I'll get back to you with an H3 RPi-Monitor template, conservative settings (based on these settings as a first try) and measurements.

Published in 2015-11-21 20:36:11 | Show all floors
Isn't dram in the OPI rated for 1600MT (so 800Mhz) ?

So it's really the memory controller that is overclocked here, not the DRAM itself.

Published in 2015-11-21 20:43:30 | Show all floors
Edited by hojnikb at 2015-11-21 20:44

Btw, could it be possible to define a "turbo boost" type of deal that would clock only one or two cores above 1.2GHz, and only for a short while? That way you would still have some extra performance, but it would probably not heat up as much.
 Author| Published in 2015-11-21 21:43:40 | Show all floors
Edited by bronco at 2015-11-21 21:52
hojnikb replied at 2015-11-21 20:36
Isn't dram in the OPI rated for 1600MT (so 800Mhz) ?

Correct, this type of DRAM is used: http://linux-sunxi.org/DDR3#K4B4G1646Q-HYK0

Regarding memory controller and 'turbo boost' and so on... you don't buy Allwinner chips because you want performance. You buy them because they're dirt-cheap. Especially SoCs with aging Cortex-A7 cores (rather sooner than later the consumer demand for this type of SoC will drop since consumers will then all need '64 bit', the same way they now demand 'octa core' and 'quad core' even when the SoC is slower than a dual-core machine -- it's 'marketing for morons' that completely drives this market segment)

Especially the H3 is dirt-cheap and enables OTT box manufacturers to produce dirt-cheap TV boxes (why not tablets also? Because there's no PMU with the H3, therefore no charging capabilities. That's why all attempts to use H3 based Orange Pis for mobile applications are just weird; better use the A20 based ones instead, since the A20 is superior to the H3 in terms of features and maybe also single-core performance). The H3 contains the well known outdated Mali400MP2 GPU (therefore we now see things happen here that happened over at LeMaker's forums last year with the A20) and also an integrated 100 MBit Ethernet PHY, further decreasing the cost of building a system around the H3.

There's no such thing as a turbo mode. You can adjust some clock speeds (and have to take care since some components share the same clock sources, so by increasing some value here the clockspeed somewhere else might actually decrease since another divider will be automagically chosen and so on) and you should always keep an eye on consumption (unfortunately not that easy with the H3: unlike other Allwinner SoCs that feature a PMU that can be queried through sysfs, there's nothing comparable here -- you have to use internal temperatures without a fan as an indicator of how much energy you waste).

And the problem the H3 users are currently suffering from doesn't only happen when the H3 is under full load. It seems idle consumption is the real problem. I'm already preparing an RPi-Monitor template for the H3 to play with. In case anyone's interested I can share it here together with short instructions. It will take you 5 minutes maximum and gives you the ability to visualize what's going on. E.g. decreasing DRAM frequency from 672 to 480 (echo 480000 >/sys/devices/platform/sunxi-ddrfreq/devfreq/sunxi-ddrfreq/cur_freq) and letting RPi-Monitor plot a nice graph of the SoC's temperature... Just an assumption that it works that way -- still waiting for my Oranges to arrive.
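For scripting such experiments, the echo command can be wrapped in a few lines of Python. This is an untested sketch: the sysfs path is the one from the command above, whether the node accepts writes that way is the same assumption as before, and the helper name `set_dram_mhz` is made up. Writing to sysfs requires root.

```python
from pathlib import Path

# Path taken from the echo command above (unverified on real hardware).
DDRFREQ_NODE = Path(
    "/sys/devices/platform/sunxi-ddrfreq/devfreq/sunxi-ddrfreq/cur_freq"
)


def set_dram_mhz(mhz, node=DDRFREQ_NODE):
    """Write the DRAM clock in kHz to node, like `echo 480000 > node`."""
    khz = int(mhz) * 1000
    Path(node).write_text(f"{khz}\n")
    return khz


if __name__ == "__main__":
    if DDRFREQ_NODE.exists():
        print("requested", set_dram_mhz(480), "kHz")
    else:
        print("sunxi-ddrfreq node not found, nothing changed")
```

The node is a parameter so the function can be tried against an ordinary file first before touching the real sysfs entry.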

BTW: It also depends on the use case you're after. Since on ultra-cheap SoCs like Allwinner's everything seems to interact with everything else, you might think about running an SBC based 'server' headlessly, not because fewer running services are better but because GPU and CPU also have to share memory bandwidth: http://linux-sunxi.org/Optimizin ... ution_graphics_mode (no idea whether that applies to the H3 as well, but since one of the keys to producing dirt-cheap chips is to re-use everything instead of re-inventing the wheel I would suspect the H3 suffers from the same issue -- to be confirmed)

Published in 2015-11-21 23:29:33 | Show all floors
If I adjust the .fex file and reboot, will the changes apply?  or is the .fex file just a residual from which uImage was built?

Also, does adjusting /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq have any effect on voltages?  From above, it looks like using the .fex that ships with the device, that scaling to 1Ghz would have the same effect as scaling to 1.2Ghz.  eg.  1.3v either way.

Assuming all this is true, wouldn't it be best to create a complete correct dvfs table that includes overclocked values such as 1536, 1296 frequencies, and then adjust scaling_max_freq at boot to whatever you feel comfortable with based on needs and cooling?
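As a sketch of what such a complete table could look like in fex syntax, the snippet below renders the operating points from the SDK comments as LVn_freq/LVn_volt lines. Everything here is hypothetical: the helper name `fex_dvfs_lines` is made up, LV_count is simply set to the number of pairs (the shipped files use 8), and whether the Orange Pi kernel honours more than two entries is untested.

```python
def fex_dvfs_lines(points):
    """Render (freq_mhz, millivolts) pairs as fex dvfs table lines.

    Points are emitted from highest to lowest frequency, matching the
    order of the LV entries quoted earlier in this thread.
    """
    ordered = sorted(points, reverse=True)
    lines = [f"LV_count        = {len(ordered)}"]
    for n, (mhz, millivolts) in enumerate(ordered, start=1):
        lines.append(f"LV{n}_freq        = {mhz * 1_000_000}")
        lines.append(f"LV{n}_volt        = {millivolts}")
    return lines


# Operating points from the LV1..LV6 comments in the SDK fex files.
points = [
    (1536, 1500), (1296, 1340), (1200, 1320),
    (1008, 1200), (816, 1100), (648, 1040),
]
print("\n".join(fex_dvfs_lines(points)))
```

With a table like this in place, scaling_max_freq could then be used at boot to cap the clockspeed according to the available cooling, as proposed above.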