[ale] fascinating data on temperature, including ATI / AMD Radeon gpu



Hi all,

I have some additional temperature data I wanted to share with you based 
on my further interaction with my systems and research.  This is long, 
but I think the info is very handy.

Suppose I asked you "what temperature can I run my cpu (or gpu) at and 
be safe?".  And you gave me a number.  You'd almost certainly be wrong.  
Suppose you asked someone else the same question and they gave you a 
number.  They'd almost certainly be wrong.  Why?  The answer is because 
EVERY cpu has a different thermal design spec and maximum operating 
temperature.  Thus, any number you get is wrong for the vast majority of 
parts.  (That doesn't mean that two parts won't coincidentally share the 
same number.)  The only way to KNOW what your cpu can take is to look up 
ITS specs from a credible source.  The best source is the manufacturer's 
website.

Here are the maximum operating temperatures, in deg C, for the 4 cpu 
parts I have on hand.

AMD Athlon II x2 250 - 74 deg C
AMD Athlon II x3 460 - 75 deg C
AMD Phenom II x4 965 - 62 deg C (That's an insanely low number.)
AMD Phenom II x6 1045T - 71 deg C

As you can see, the numbers are all over the map.  If I assume I can go 
to 75 deg, or even 90 as some will tell you, I will be exceeding the 
rated limit on 3 of my 4 chips.  Some devices can take 90 deg, but not 
these.  There is no single number that works well for all parts, except 
maybe 50 deg C.  The good thing is that they will probably shut down or 
throttle themselves before they are damaged, but you still don't want to 
go there.  An abrupt thermal shutdown could corrupt the file system and 
wreck your OS on your HDD.
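
To make the comparison concrete, here is a minimal Python sketch that 
checks one assumed "safe" temperature against the four limits listed 
above.  The dictionary and function names are my own, made up for 
illustration.  The temperatures are the ones from the list.

```python
# Maximum operating temperatures (deg C) for the four parts listed above.
TMAX_C = {
    "Athlon II x2 250": 74,
    "Athlon II x3 460": 75,
    "Phenom II x4 965": 62,
    "Phenom II x6 1045T": 71,
}


def chips_over_limit(assumed_safe_c, limits=TMAX_C):
    """Return the chips whose rated maximum is below the assumed temperature."""
    return sorted(name for name, tmax in limits.items() if tmax < assumed_safe_c)


if __name__ == "__main__":
    for guess in (50, 75, 90):
        over = chips_over_limit(guess)
        print(f"{guess} deg C is over the limit for {len(over)} of {len(TMAX_C)} chips: {over}")
```

A chip sitting exactly at its rated maximum is not counted as over the 
limit here, which is a generous way to count.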

Here's where you can look up data for AMD desktop parts.

http://products.amd.com/pages/desktopcpuresult.aspx?AspxAutoDetectCookieSupport=1

I don't have much experience with Intel chips, although my laptops have 
them.  However, these resources should help you determine the maximum 
temperature for them.

http://www.intel.com/support/processors/sb/CS-033342.htm
http://www.intel.com/support/processors/sb/CS-032341.htm
http://ark.intel.com/

So, once you know your max temperature, how do you make sure it's not 
exceeded and that your fans stay at least relatively quiet?

I tried to quickly find authoritative data on lifetime versus 
temperature.  I couldn't find much in a short time.  This article has 
some good data, but doesn't address lifetime too much.

http://www.overclock.net/t/476469/the-truth-about-temperatures-and-voltages

Having been unable to find authoritative data in the time I allocated to 
write this, I'll give you my opinion.  It is strictly that, my opinion.  
Others are free to disagree or prove me wrong.

My opinion is that any solid state component in my system should be fine 
if I stay at least 15 degrees below the maximum limits listed.  
Mechanical devices (hdd's, optical drives, floppy drives) are a whole 
other matter.

In my opinion, with proper ventilation, the PC should be able to run 
almost indefinitely at full load at Tmax - 15.  I don't believe I'm 
shortening the life substantially.  Again, I could be wrong.

Having said that, I don't max my systems out unless I have a reason, 
like mining, or video rendering, that I want to accomplish that requires 
all that horsepower.

As you may know, the cpu coolers that come with cpu's are not the 
greatest, but they can (usually) get the job done.  They typically have 
a little 3" fan on top of a heat sink.  The main problem, for me, is 
that once the fan spins up to about 5000 rpm, it makes an annoying 
whining noise.  At this point, I don't want to buy an aftermarket cooler.

So, I wanted to make sure my system didn't overheat, but also wanted it 
to be as quiet as possible.

There are certainly various utilities out there for fan and temperature 
control, but I want to mention what you might have built into your bios.

The bios for my main boards on my desktops is AMI.  It has several 
features for temperature control.  (For my laptops, I just let them do 
what they want, but I monitor the temperature.)  I have my power 
settings set for active cooling, which increases fan speed before 
throttling the processor.  I also have the processor set to throttle 
down to as little as 20% frequency when not active.

In my bios, there is a feature which I have to turn on called cpu smart 
temp, or something like that.  Once it's turned on, I can set a 
temperature target for the cpu.  The system rounds to 5 degree 
increments.  The number I put in is Tmax - 15.  So, for the Phenom II 
x4, this is 62 - 15 = 47.  This rounds to 45.  For the Phenom II x6, 
this is 71 - 15 = 56.  This rounds to 55.  Note that it's customized on 
each PC for that chip.
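
The arithmetic above can be sketched in a few lines of Python.  The 
function name is hypothetical, and rounding DOWN to the 5 degree step is 
my assumption.  The bios may round to the nearest step instead, although 
both of my examples come out the same either way.

```python
def bios_target_c(tmax_c, margin_c=15, step_c=5):
    """Tmax minus the safety margin, rounded down to the bios step size."""
    return ((tmax_c - margin_c) // step_c) * step_c


if __name__ == "__main__":
    for name, tmax in (("Phenom II x4", 62), ("Phenom II x6", 71)):
        print(f"{name}: Tmax {tmax} -> bios target {bios_target_c(tmax)} deg C")
```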

There is also a minimum cpu fan setting which I set to 50%.

Thus, if the cpu is idling, as it is at the moment, and its temperature 
is 40 deg, the cpu fan will be moseying along at 50% of maximum speed or 
about 2800 rpm.  At this speed, it's relatively quiet.  If I start 
taxing the system and the temperature approaches the limits I set, the 
fan will wind up, ultimately running about 5600 rpm.  This will keep the 
system within the limits I've set, or close to them.
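
I don't know the actual curve the bios uses, so the following Python is 
only a sketch of the general idea.  It assumes a straight line ramp from 
the minimum fan setting up to 100% over the last 10 degrees below the 
target, and it assumes rpm is proportional to the fan setting.  The 
function names and the 10 degree ramp width are made up for 
illustration.  The 50% minimum and the 5600 rpm maximum are from my 
system.

```python
def fan_duty_percent(temp_c, target_c, min_duty=50.0, ramp_c=10.0):
    """Hypothetical fan curve: hold min_duty, then ramp linearly to 100% at target_c."""
    start_c = target_c - ramp_c  # temperature where the ramp begins (assumed)
    if temp_c <= start_c:
        return min_duty
    if temp_c >= target_c:
        return 100.0
    fraction = (temp_c - start_c) / ramp_c
    return min_duty + fraction * (100.0 - min_duty)


def approx_rpm(duty_percent, max_rpm=5600):
    """Rough rpm estimate, assuming speed scales linearly with the fan setting."""
    return round(max_rpm * duty_percent / 100.0)


if __name__ == "__main__":
    target = 55  # bios target for the Phenom II x6
    for temp in (40, 48, 52, 55, 58):
        duty = fan_duty_percent(temp, target)
        print(f"{temp} deg C -> {duty:.0f}% -> about {approx_rpm(duty)} rpm")
```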

This is quite noisy at full cpu speed.  However, I'd rather have noisy 
and cool versus quiet and hot.

I have tested my system by stressing it with Prime95.  This is a program 
which loads the cpu heavily by searching for Mersenne prime numbers.  If 
you wish, you can contribute to a worldwide scientific effort to find 
these primes, but you don't have to.  You can just use it to test your 
system.  It's available for most OS's, including linux, where it is 
called mprime.  If you'd like information on how to use this, you can 
contact me.  One cool thing is that you can turn the load on individual 
cpu cores on and off.  So, you can partially load the system if you 
want.

http://www.mersenne.org/

For my Phenom II x6 system, which is using the stock air cooler, the cpu 
temp reaches a max of 58 degrees under full load with a maximum 
specified temperature of 71.  This is a 13 deg delta.  I have no qualms 
about running this thing full blast continuously if I have a reason to.  
I do not, however, run Prime95 all the time, so it's usually idling.

My Phenom II x4 system is another matter.  It simultaneously has a 62 
deg max temperature and a 125 W power dissipation.  Bad combination.  I 
was never able to guarantee a 15 deg delta below the max with the stock 
air cooler.  I have a Corsair H70 liquid cooling unit.  It has a heat 
sink and liquid pump that fits on the cpu.  This leads to a radiator 
with two 120 mm fans.  This cooler WILL keep the monster cool.  Under full 
load, this cpu gets to 46 deg with a maximum of 62 deg.  This is a 16 
deg delta.  Again, I have no qualms about running it full blast.
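
Here is the same delta arithmetic as a small Python sketch, using the 
full load readings from my two systems.  The names are made up for 
illustration, and the 15 degree goal is just my own rule of thumb from 
earlier in this message.

```python
def headroom_c(tmax_c, load_temp_c):
    """Degrees of margin between the rated maximum and the full load temperature."""
    return tmax_c - load_temp_c


# (rated maximum, measured full load temperature) in deg C, from this message.
SYSTEMS = {
    "Phenom II x6, stock air cooler": (71, 58),
    "Phenom II x4, Corsair H70": (62, 46),
}

if __name__ == "__main__":
    goal_c = 15
    for name, (tmax, load) in SYSTEMS.items():
        delta = headroom_c(tmax, load)
        status = "meets" if delta >= goal_c else "is just short of"
        print(f"{name}: {delta} deg delta, {status} the {goal_c} deg goal")
```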

There are bios settings for the case fans as well.  It is my preference 
to have them running full speed all the time, so I set that to 100%.  I 
don't want any chance of a thermal runaway of the active components.

With a liquid cooling unit, there is a decision you have to make about 
which fan port to connect the liquid pump to and which one to connect 
the radiator fan(s) to.  I chose to connect the liquid pump to a case 
fan port, which is running at 100% all the time.  I don't want the 
liquid pump spooling down.  I don't even know if it can be spooled down.

I then connected the radiator fans to what was originally the cpu fan 
port.  This is the port associated with the smart cpu temp setting in 
the bios.  So, the radiator fans WILL spool down to 50% when the system 
is relatively dormant and cool.  When the cpu is taxed, they will spool 
up to their max just as a normal cpu fan would.

This bios also has a 'cool and quiet' function in the cpu section and a 
number of 'green power' functions which adjust the different power 
phases to be more efficient.  I turned all this on.  I don't want the PC 
shutting down or going into standby mode, but I'm fine with it doing 
things behind the scenes transparently.

I know the following has been mentioned before, but a great little 
device to monitor power consumption is the

Kill-A-Watt EZ
http://www.homedepot.com/p/P3-International-Kill-A-Watt-EZ-Meter-P4460/202196388

You want the EZ model, which is more advanced than their original 
design.  With this one, not only can you monitor instant power usage, 
but you can program in your electric rates and it will tabulate 
cumulative cost of usage over time.
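
The cost calculation the meter does is simple enough to sketch in 
Python.  The 300 W load and the $0.10 per kWh rate below are made up 
numbers for illustration, not readings from my systems or my actual 
electric rate.

```python
def energy_kwh(watts, hours):
    """Energy used by a constant load, in kilowatt hours."""
    return watts * hours / 1000.0


def running_cost(watts, hours, rate_per_kwh):
    """Cost of running a constant load for the given time at a flat rate."""
    return energy_kwh(watts, hours) * rate_per_kwh


if __name__ == "__main__":
    watts = 300.0   # hypothetical steady draw at the wall
    rate = 0.10     # hypothetical electric rate in dollars per kWh
    for label, hours in (("day", 24), ("30 day month", 24 * 30)):
        print(f"Per {label}: {energy_kwh(watts, hours):.1f} kWh, "
              f"${running_cost(watts, hours, rate):.2f}")
```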

I hope this info is useful and that it will help you keep your cool.

Sincerely,

Ron


On 4/21/2013 2:01 PM, Jim Kinney wrote:
> One of the fun parts of temp monitoring is when the sensors must be 
> calibrated. Most chips "know" the scale factors but some are off a 
> bit. So the driver makes the change. With a Linux system, you can feed a 
> bunch of scale-factor params to the startup of lm_sensors. Tyan used to 
> provide the lm_sensor data they had tested for best accuracy on their 
> boards. Not sure if other makers do or not.
>
>
> On Sun, Apr 21, 2013 at 12:38 AM, Ron Frazier (ALE) 
> <atllinuxenthinfo at techstarship.com 
> <mailto:atllinuxenthinfo at techstarship.com>> wrote:
>
>     Hi all,
>
>     The topic of monitoring temperatures in a PC comes up here
>     periodically.  As I mentioned in other threads, I've been working
>     with graphics cards on a Mint installation for cryptocurrency
>     computations.  As you may know from my previous posts, I've always
>     wanted to keep an eye on the status of my systems.  In the process
>     of working with this project, I've discovered a number of
>     interesting pieces of information that I thought I'd share.
>
>     Take a look at this image:
>
>     https://dl.dropboxusercontent.com/u/9879631/sensors-sample1.png
>
>     This shows a part of my screen on my Mint system.  Note my Gnome
>     panel at the top with a temperature monitor on it.  This is the
>     hardware monitor widget that is available in Gnome.  However, when
>     I installed the ATI / AMD graphics drivers, the sensor system was
>     no longer able to monitor the cpu.  After a bit of googling, I was
>     directed to lm-sensors.  Many of you are already aware of that.  I
>     tried this command.
>
>     --> sudo apt-get install lm-sensors
>
>     I found that it was already installed.
>
>     I then found and issued these two commands to reinitialize the system.
>
>     --> sudo sensors-detect
>
>     I accepted the defaults here then told it to save the changes.
>
>     --> sudo service module-init-tools start
>
>     I think that allowed the changes to take effect without a reboot.
>
>     This allowed the sensor system to work again, and my panel widgets
>     to read both the cpu temperature and the hard drive temperatures
>     as shown in the image.
>
>     You can use this command to read the sensors once in a terminal
>     window.
>
>     --> sensors
>
>     This command will read the sensors every few seconds and display
>     the results continuously.
>
>     --> watch sensors
>
>     I searched for a while to find a utility to read the gpu
>     temperatures.  I found nothing for a while.  Then I discovered
>     that it's built into the ATI / AMD driver.  I don't know how to do
>     this with nvidia cards.
>
>     The following command will read the clock speed and load on the
>     first gpu.
>
>     --> aticonfig --adapter=0 --od-getclocks
>
>     The following command will read and display the results continuously.
>
>     --> watch aticonfig --adapter=0 --od-getclocks
>
>     The following command will read the temperature of the first gpu.
>
>     --> aticonfig --adapter=0 --odgt
>
>     The following command will read and display the results continuously.
>
>     --> watch aticonfig --adapter=0 --odgt
>
>     Once I found this out, I modified my mining program to add a
>     temperature status window for each gpu so I could keep an eye on
>     the temperature.  This script file shows how I did it.
>
>     https://dl.dropboxusercontent.com/u/9879631/start-miners
>
>     If you look at these images, I also discovered something very
>     interesting.  The first one is the same as the one mentioned
>     above, including the temperature readings of the GPU's on my Mint
>     machine.  The second is an image of the temperature readings of
>     the GPU's on my Windows machine.
>
>     https://dl.dropboxusercontent.com/u/9879631/sensors-sample1.png
>     https://dl.dropboxusercontent.com/u/9879631/sensors-sample2.png
>
>     All the gpu's are being run at close to 100% load, and the cases
>     of both computers are well ventilated with multiple fans.
>
>     Look at the Miner 1 temperature window in image 1.  This is an MSI
>     7850 gpu running in the Mint machine.  It's running at 73 deg C.
>
>     Now, look at the right hand window in image 2.  This is an
>     IDENTICAL MSI 7850 gpu running in the Windows machine.  It's
>     running at 62 deg C.
>
>     Like I said, they're identical cards running in almost identical
>     conditions.  So why is one running 11 degrees hotter than the other?
>
>     This was puzzling me for a while but I think I've figured it out.
>
>     In the Linux machine, the MSI card is the TOP one in the
>     chassis.  That means its intake fan is right next to the 2nd gpu,
>     with only about 1/8" of space between.  So, its airflow is very
>     restricted.  That's the card that's running hotter.
>
>     In the Windows machine, the MSI card is the SECOND one in the
>     chassis.  It has several inches of air gap to the next object.
>     It's the one that is running cooler.
>
>     Now look at each image and compare the readings for each card
>     within the same computer.
>
>     In image 1, the Mint machine, Miner 1, the top card, is at 73 deg
>     C.  Miner 2, the bottom card, is at 57 deg C.
>
>     In image 2, the Windows machine, the left window is an Asus 7850
>     card, and is the top card.  It's at 75 deg C.  The right window,
>     the MSI card, is in the bottom slot.  It's running at 62 deg C.
>
>     So, in one case, the top card is running 16 degrees hotter.  In
>     the other case, the top card is running 13 degrees hotter.
>
>     Based on this, I am convinced that any gpu or other card with its
>     own fan on the side will run substantially hotter than its
>     baseline temperature if it's next to another card.
>
>     I'm not quite sure what to do about it.  I think 75 deg C is OK,
>     but not great.  For what it's worth, I think my AMD cpu's are
>     rated at about 67 deg C.  Apparently, the gpu's have more
>     tolerance.  You can see in image 2 that the fans on the gpu's in
>     the Windows system are only running at about 40% of their max,
>     assuming that GPU-Z is reading them right.  So, maybe the card is
>     not too unhappy.  But, it may mean the card would be pushed over
>     its thermal limits much faster if a case fan fails, or if the room
>     ambient temperature rises too much.
>
>     Anyway, I found this fascinating.  I guess I'll just have to keep
>     a close eye on any PCI-E cards with fans which are jammed up
>     against other cards.
>
>     PS I think I was monitoring the wrong temperature for CPU on my
>     desktop machine for years.  The MSI motherboards have a 2 digit
>     led display on the board which monitors post codes and then
>     temperature once the machine is running.  I was monitoring the
>     sensor that matched that reading.  When I ran the AMD Overdrive
>     utility, it came up with a different, lower, number for CPU
>     temperature, so I started monitoring that instead.  I don't know
>     now exactly which temperature that the motherboard display is
>     monitoring.
>
>     PPS I took some of the text in this email from the Linux machine
>     to the Windows machine to write the email.  When I tried to open
>     it up in notepad, I just got one long line of text with no breaks,
>     since Windows has different line breaks.  However, I found out
>     that I could open it in Wordpad and it worked OK.  Then, I could
>     copy it into this email.
>
>     Let me know what your experiences have been monitoring and
>     controlling temperature.
>
>     Hope this is helpful.
>
>     Sincerely,
>
>     Ron
>
>
>
>
> -- 
> James P. Kinney III
> ////
> ////Every time you stop a school, you will have to build a jail. What 
> you gain at one end you lose at the other. It's like feeding a dog on 
> his own tail. It won't fatten the dog.
> - Speech 11/23/1900 Mark Twain
> ////
> http://electjimkinney.org
> http://heretothereideas.blogspot.com/
> ////
>


-- 

(PS - If you email me and don't get a quick response, you might want to
call on the phone.  I get about 300 emails per day from alternate energy
mailing lists and such.  I don't always see new email messages very quickly.)

Ron Frazier
770-205-9422 (O)   Leave a message.
linuxdude AT techstarship.com
Litecoin: LZzAJu9rZEWzALxDhAHnWLRvybVAVgwTh3
Bitcoin: 15s3aLVsxm8EuQvT8gUDw3RWqvuY9hPGUU
