Corsair Community

Memory errors due to underpowered PSU?


freixas


I've got an RMA for my Corsair power supply, but before I ship it off, I thought I'd better run some information by this group.

 

My home-built system has given me trouble off-and-on since I built it. But a month or two ago, it started crashing in a serious way.

 

The first time I had problems, I replaced the SSD drive SATA cables with ones designed for higher speed and my problems went away for a while.

 

Other times when I had problems, I opened up the case and found that one of the modular power cables for the Corsair AX 750 supply wasn't tightly hooked in. I would re-insert it until I verified it was locked in place and the problems seemed to go away.

 

Then my SSD drive got corrupted and I had to re-format the drive and re-install the OS. I thought perhaps the drive was bad, but the S.M.A.R.T. data didn't agree and I learned that there was no test available for SSDs. I have no idea if it works or not, but lately am inclined to think it's not the source of the problem.

 

I ran Memtest86+ overnight and got no errors, so I figured the RAM was OK.

 

Recently, I found the Prime95 stress test. I ran it and got an error within a few minutes. I decided to try to figure out the cause by starting with the power supply. I swapped out my Corsair AX 750 and replaced it with the 500W supply that came with my Case. I ran the Prime95 test overnight and had no failures. Problem solved, right? I have an RMA for the power supply sitting on my desk.

 

Today, the system crashed again (it's been about a week since I swapped the power supply). I decided to run Memtest86+ again and this time it found a number of errors. It's suspicious that the memory is failing now when it didn't fail before. I'm wondering if the 500W supply is a little wimpy for my system and might be affecting the memory. How can I determine if this is the case?

 

I'm guessing that the biggest power draws are from the CPU and the GPU. I believe that both items modulate their power usage based on their needs. During Memtest86+, the GPU should not have been drawing a lot of power.

 

Anyway, I'm looking for advice. Should I still send the Corsair PSU back? I ran Prime95 again and still have no errors (with the case power supply). Should I try to identify the memory stick having problems and send it back as well? Should I check RAM voltage (somehow) and adjust it (somehow)? Is there anything else I should try?

 

I haven't overclocked my system.


First off, you have a LOT of variables. I would put your AX 750 back in since you suspect it as being a potential failure. That way, any testing you do has a baseline to work from. That being said, a random error in Prime can be just a matter of incorrect voltage, but it would depend on the test and when the error occurred. If you could rerun that and record when and what the error was, that would be helpful. Same goes for memtest. There are a few things that can trigger a false positive. You can also see what the BIOS is reporting for the various voltages from your PSU. Probably not a bad idea to check those BEFORE you remove the 500W supply, so when you put the AX750 back in you can get a comparison between the two.

 

The best thing to do to test your memory is make sure you load setup defaults, punch in your timings and voltages manually according to the label on the sticks, and test each stick individually. Also make sure you disable any power saving features your board has so your cpu, ram and board are getting full voltage, and also disable USB legacy support in your BIOS if it has that option. Most ASUS boards do, and that can trigger memtest errors. Let us know how they test out.

 

Then my SSD drive got corrupted and I had to re-format the drive and re-install the OS. I thought perhaps the drive was bad, but the S.M.A.R.T. data didn't agree and I learned that there was no test available for SSDs. I have no idea if it works or not, but lately am inclined to think it's not the source of the problem.

Any idea what caused the corruption? I ask because I have seen instances where the S.M.A.R.T. information was wrong, making a bad drive seem healthy.


First off, you have a LOT of variables. I would put your AX 750 back in since you suspect it as being a potential failure. That way, any testing you do has a baseline to work from. That being said, a random error in Prime can be just a matter of incorrect voltage, but it would depend on the test and when the error occurred. If you could rerun that and record when and what the error was, that would be helpful.

 

Switching power supplies is a bear of a job. The way my case is designed, I have to remove the motherboard in order to switch PSUs. I suppose I could connect the AX750 without actually putting it into the case?

 

Let's make sure, though, that I'm not just capturing information I already have. First, I have never tinkered with any voltage settings, although there might be some BIOS setting that indirectly affects voltages. I'm far from a voltage expert.

 

Second, the Prime95 error was something like "fatal error: received 0.5, expected 0.4". I ran Prime95 a number of times with the AX750 and without. With the AX750, the error occurred in various threads at various times. The pattern was inconsistent, but usually five minutes would be long enough to cause several threads to receive this error. Without the AX750, I have not received any Prime95 errors. Is this information useful? What would you be looking for if I re-inserted the AX750?

 

That being said, a random error in prime can be just a matter of incorrect voltage.

 

I'm interpreting this to mean an incorrect voltage setting as opposed to voltage problems in the PSU. In the latter case, of course, the PSU is broken and we're done. If it's a voltage setting problem, I haven't changed the voltage settings since swapping out the AX750, so if you could tell me what you're looking for, I could give you the settings.

 

Same goes for memtest. There are a few things that can trigger a false positive.

 

OK, this might be a good reason to switch PSUs. It would be interesting to see if the memory problems go away. With the case PSU, the memtest errors were an error in bit 0000 0001 0000 0000 (either 0 when it should have been 1 or vice-versa). After getting four of these at various addresses (0001 84D6 DD1C, 0001 44C9 DD5C, 0001 84D7 CDDC, 0001 84D7 DD5C), I then received an error in a different bit, which I didn't record.
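A quick way to look for a pattern in the four failing addresses above is to compare them bit by bit. The following is a minimal Python sketch, not part of Memtest86+; the helper names are made up for illustration, and it assumes the addresses are the hexadecimal values exactly as written above.

```python
# Failing addresses as reported above (hexadecimal, written in groups of four).
REPORTED = ["0001 84D6 DD1C", "0001 44C9 DD5C", "0001 84D7 CDDC", "0001 84D7 DD5C"]


def parse_address(text):
    """Turn a string such as '0001 84D6 DD1C' into an integer."""
    return int(text.replace(" ", ""), 16)


def differing_bits(values):
    """Return a mask of the bit positions that are not identical in all values."""
    all_and = values[0]
    all_or = values[0]
    for value in values[1:]:
        all_and &= value
        all_or |= value
    return all_and ^ all_or


ADDRESSES = [parse_address(text) for text in REPORTED]

if __name__ == "__main__":
    print("bits that differ between the addresses: 0x%X" % differing_bits(ADDRESSES))
    for address in ADDRESSES:
        # The low six address bits give the offset inside a 64-byte block.
        print("0x%012X  offset in 64-byte block: %d" % (address, address & 0x3F))
```

For these four addresses the low six bits are identical (offset 28), so every failure sits at the same position within a 64-byte block. Whether that points at one particular stick or chip depends on how the board maps addresses to modules, which this sketch does not know, so testing each stick individually is still the way to find out.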

 

You can also see what the BIOS is reporting for the various voltages from your PSU. Probably not a bad idea to check those BEFORE you remove the 500W supply, so when you put the AX750 back in you can get a comparison between the two.

 

Sounds reasonable.

 

The best thing to do to test your memory is make sure you load setup defaults, punch in your timings and voltages manually according to the label on the sticks, and test each stick individually. Also make sure you disable any power saving features your board has so your cpu, ram and board are getting full voltage, and also disable USB legacy support in your BIOS if it has that option. Most ASUS boards do, and that can trigger memtest errors. Let us know how they test out.

 

OK, can do. So the information is on the stick, huh? I was looking for that!

 

Any idea what caused the corruption? I ask because I have seen instances where the S.M.A.R.T. information was wrong, making a bad drive seem healthy.

 

No idea and I understand your concern. It's an SSD drive and I have no way of testing it. According to the manufacturers, tests that work on a hard disk (such as a surface check) shouldn't be used on an SSD. And because of the limited write cycles, running a memory test on an SSD is not a good idea.

 

Thanks for your suggestions.


Switching power supplies is a bear of a job. The way my case is designed, I have to remove the motherboard in order to switch PSUs. I suppose I could connect the AX750 without actually putting it into the case?

 

Let's make sure, though, that I'm not just capturing information I already have. First, I have never tinkered with any voltage settings, although there might be some BIOS setting that indirectly affects voltages. I'm far from a voltage expert.

In your BIOS, under Tools, there should be a tab marked Hardware Monitor. I believe that's where the BIOS reports your system voltages. There should be 4 of them, I believe.

1)CPU voltage........actual

2)3.3v...................actual

3)5v......................actual

4)12v....................actual

You are allowed 5% variance between listed and reported voltages, so your 12v should read between 11.4 and 12.6v. The same goes for the 3.3v and 5v rails; the CPU voltage is set by the motherboard rather than the PSU, so just note what it reads. I suggested making a note of them before you switch PSUs back, so that you can see whether both PSUs are reporting close to the same thing. If the AX750's voltages are all within that 5% variance, then it's pretty safe to say it's not the power supply.
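The 5% rule is easy to check with a few lines of Python. This is only a sketch: the function names are made up, the sample readings are hypothetical, and the CPU voltage is left out because no fixed nominal value is given for it here.

```python
# Nominal rail voltages; 5% is the tolerance quoted above.
NOMINAL_RAILS = {"+3.3V": 3.3, "+5V": 5.0, "+12V": 12.0}
TOLERANCE = 0.05


def allowed_range(nominal, tolerance=TOLERANCE):
    """Return the (low, high) limits for a rail with the given nominal voltage."""
    return nominal * (1 - tolerance), nominal * (1 + tolerance)


def within_tolerance(nominal, measured, tolerance=TOLERANCE):
    """True if the measured voltage lies inside the allowed range."""
    low, high = allowed_range(nominal, tolerance)
    return low <= measured <= high


if __name__ == "__main__":
    # Hypothetical readings, only to show what the output looks like.
    readings = {"+3.3V": 3.38, "+5V": 5.02, "+12V": 11.2}
    for rail, nominal in NOMINAL_RAILS.items():
        low, high = allowed_range(nominal)
        status = "OK" if within_tolerance(nominal, readings[rail]) else "OUT OF RANGE"
        print("%-6s %6.2fV  allowed %.2f-%.2f  %s" % (rail, readings[rail], low, high, status))
```

Replace the sample readings with whatever the hardware monitor shows for each PSU and compare the two runs.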

 

Quote:

Originally Posted by peanutz94

The best thing to do to test your memory is make sure you load setup defaults, punch in your timings and voltages manually according to the label on the sticks, and test each stick individually. Also make sure you disable any power saving features your board has so your cpu, ram and board are getting full voltage, and also disable USB legacy support in your BIOS if it has that option. Most ASUS boards do, and that can trigger memtest errors. Let us know how they test out.

OK, can do. So the information is on the stick, huh? I was looking for that!

The timings and voltages for the memory should be on a sticker on the module itself.

Timings are easy. Go into your BIOS and, under the AI Tweaker menu, first set "AI Tweaker" from AUTO to MANUAL. Then look for DRAM TIMING CONTROL and hit Enter to open it up. All you need to be concerned with are the first 4 lines under the heading "First Information". If your timings are, for example, 7-7-7-20, then you would highlight the first line and choose 7, second line down 7, third line down 7, and fourth line down 20. That's it! Your sticks may be different; this was just an example.
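To make the mapping from the label to the four BIOS lines explicit, here is a small Python sketch. It assumes the first four numbers on the label follow the usual CL, tRCD, tRP, tRAS order; the function name is made up for illustration.

```python
# Assumed order of the first four values on the label: CL, tRCD, tRP, tRAS.
PRIMARY_NAMES = ("CL", "tRCD", "tRP", "tRAS")


def primary_timings(timing_text):
    """Split a timing string such as '7-7-7-20' into the four primary timings."""
    values = [int(part) for part in timing_text.strip().split("-")]
    if len(values) < len(PRIMARY_NAMES):
        raise ValueError("expected at least four timing values")
    # Any values after the fourth are secondary timings and are ignored here.
    return dict(zip(PRIMARY_NAMES, values[:4]))


if __name__ == "__main__":
    for name, value in primary_timings("7-7-7-20").items():
        print("%-5s %d" % (name, value))
```

Each name corresponds to one of the first four lines in DRAM TIMING CONTROL, top to bottom.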

Voltage is pretty much the same deal. Under AI Tweaker, look for the tab marked DRAM bus voltage and enter the voltage printed on the label. Hit Enter to save your changes and exit.

 

As far as Prime95 goes, what I was looking for would be incorrect memory or CPU voltages. Sometimes BIOSes misread or fail to hit the correct voltages, and sometimes just a little bump in voltage, or simply setting the correct voltages, can make the Prime errors go away. I'll have to do some more research into the error codes you received. Those can be really helpful in determining faulty hardware. Other times you're left with more questions than answers. :P


I went ahead and swapped my Corsair AX750 power supply back in. The computer is on its side with the case open and the Corsair PSU sitting on top. I didn't want to have to pull out the motherboard to get the PSU inside the case. I'm sure this affects the thermal profile of the system. It seems to be staying pretty cool.

 

I ran Prime95 for over an hour. No errors.

I ran Memtest86+ for 1.5 hours. No errors.

 

I will run both tests for longer periods of time, but what the heck is going on? Some wild guesses:

  • Because the PSU is outside of the system, it was much easier to make sure the modular cables were solidly hooked up. Maybe I had a badly connected cable before?
  • There's some weird sporadic problem that once it starts, it sticks around until power is cycled. This would mean that swapping out any component would make the problem go away.

Here are the voltage and temperature readings with the Corsair PSU and the open case:

 

Voltage 0 0.89 Volts [0x6F] (CPU VCORE)

Voltage 1 1.75 Volts [0xDB] (VIN1)

Voltage 2 3.41 Volts [0xD5] (+3.3V)

Voltage 3 5.11 Volts [0xD5] (+5V)

Voltage 4 11.71 Volts [0xD2] (+12V)

Voltage 6 0.84 Volts [0x69] (VIN6)

Temperature 0 37°C (98°F) [0x25] (SYSTIN)

Temperature 1 38°C (99°F) [0x4B] (CPUTIN)

Temperature 2 35°C (95°F) [0x46] (AUXTIN)

Fan 1 1096 RPM [0x4D] (CPUFANIN0)

 

Here are the voltage and temperature readings with the case PSU just prior to putting the Corsair PSU back in (these are the readings I had when I had the memory errors):

 

Voltage 0 0.88 Volts [0x6E] (CPU VCORE)

Voltage 1 1.72 Volts [0xD7] (VIN1)

Voltage 2 3.38 Volts [0xD3] (+3.3V)

Voltage 3 5.06 Volts [0xD3] (+5V)

Voltage 4 11.60 Volts [0xD0] (+12V)

Voltage 6 0.86 Volts [0x6B] (VIN6)

Temperature 0 37°C (98°F) [0x25] (SYSTIN)

Temperature 1 37°C (98°F) [0x4A] (CPUTIN)

Temperature 2 34°C (93°F) [0x44] (AUXTIN)

Fan 1 1042 RPM [0x51] (CPUFANIN0)
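For anyone who wants to line the two sets of voltage readings up side by side, here is a minimal Python sketch. The parsing pattern is an assumption based on the line format shown above, and the function names are made up.

```python
import re

# Voltage lines copied from the two sets of readings above.
CORSAIR_AX750 = """\
Voltage 0 0.89 Volts [0x6F] (CPU VCORE)
Voltage 1 1.75 Volts [0xDB] (VIN1)
Voltage 2 3.41 Volts [0xD5] (+3.3V)
Voltage 3 5.11 Volts [0xD5] (+5V)
Voltage 4 11.71 Volts [0xD2] (+12V)
Voltage 6 0.84 Volts [0x69] (VIN6)
"""

CASE_PSU = """\
Voltage 0 0.88 Volts [0x6E] (CPU VCORE)
Voltage 1 1.72 Volts [0xD7] (VIN1)
Voltage 2 3.38 Volts [0xD3] (+3.3V)
Voltage 3 5.06 Volts [0xD3] (+5V)
Voltage 4 11.60 Volts [0xD0] (+12V)
Voltage 6 0.86 Volts [0x6B] (VIN6)
"""

# Captures the voltage value and the sensor name in parentheses.
LINE = re.compile(r"Voltage \d+ ([\d.]+) Volts \[0x[0-9A-Fa-f]+\] \(([^)]+)\)")


def parse_voltages(dump):
    """Return {sensor name: volts} for every voltage line in the dump."""
    return {name: float(volts) for volts, name in LINE.findall(dump)}


def compare(first, second):
    """Return {sensor name: first minus second}, rounded to 0.01V."""
    a, b = parse_voltages(first), parse_voltages(second)
    return {name: round(a[name] - b[name], 2) for name in a if name in b}


if __name__ == "__main__":
    for name, delta in compare(CORSAIR_AX750, CASE_PSU).items():
        print("%-10s %+.2fV" % (name, delta))
```

The largest gap between the two supplies is 0.11V, on the +12V input. These are software readings from the board's sensors, so they are only as accurate as the sensor chip.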

 

I'll post BIOS info in a moment.


Here are what I thought were relevant portions of the BIOS settings. Everything appears to have default values.

 

==========

Ai Tweaker

==========

 

CPU Level Up [Auto]

----------------------------------------------------------------------

AI Overclock Tuner...............[Auto]

CPU Ratio Setting................[Auto]

Intel® SpeedStep Tech......[Enabled]

Intel® TurboMode Tech..........[Enabled]

Xtreme Phase Full Power Mode.....[Auto]

DRAM Frequency...................[Auto]

QPI Frequency....................[Auto]

ASUS/3rd Party UI Priority.......[ASUS Utility]

 

OC Tuner.........................[Turbo Profile]

Start auto tuning

 

> DRAM Timing Control

..1st Information: 9-9-9-24-5-74-10-7-20-0

..2nd Information: 2N-52-53

..3rd Information: 5-5-16-10-10-10-7-6-4-7-7-4

 

CPU Differential Amplitude.......[Auto]

CPU Clock Skew...................[Auto]

 

CPU Voltage Mode.................[Offset]

..Offset Voltage.................[Auto]

..Current CPU Core Voltage.......[1.208V]

IMC Voltage......................[Auto]

..Current IMC Voltage............[1.140V]

DRAM Voltage.....................[Auto]

..Current DRAM Voltage...........[1.550V]

DRAM DATA REF Voltage on CHA.....[Auto]

DRAM DATA REF Voltage on CHB.....[Auto]

 

Load Line Calibration............[Auto]

CPU Spread Spectrum..............[Auto]

PCIE Spread Spectrum.............[Auto]

 

========

Advanced

========

 

> CPU Configuration

 

..CPU Ratio Setting..............[Auto]

..C1E Support....................[Enabled]

..Hardware Prefetcher............[Enabled]

..Adjacent Cache Line Prefetch...[Enabled]

..Max CPUID Value Limit..........[Disabled]

..Intel® Virtualization Tech...[Enabled]

..CPU TM function................[Enabled]

..Execute-Disable Bit Capability.[Enabled]

..Intel® HT Technology.........[Enabled]

..Active Processor Cores.........[All]

..A20M...........................[Disabled]

..Intel® SpeedStep Tech....[Enabled]

..Intel® TurboMode Tech........[Enabled]

..Intel® C-STATE Tech..........[Enabled]

..C State package limit setting..[Auto]

 

> Uncore Configuration

 

..Memory Remap Feature...........[Enabled]

 

> USB Configuration

 

..USB Functions..................[Enabled]

..Legacy USB Support.............[Auto]

..BIOS EHCI Hand-Off.............[Enabled]

 

=====

Power

=====

 

CPU Voltage......................[ 1.208V]

3.3V Voltage.....................[ 3.392V]

5V Voltage.......................[ 5.016V]

12V Voltage......................[12.264V]


The voltages for both the memory and what's coming out of the power supply look great!

 

There are a few things I would change, though. DRAM Frequency: take it off AUTO and choose 1333MHz. Just highlight it with the arrow keys, hit Enter to open the list of speeds, and choose the speed. Hit Enter to set it. You may also have to change AI Overclock Tuner to Manual instead of Auto before it will let you change parameters in the BIOS.

 

Now, IF you still experience problems, you could try raising the IMC voltage a little bit and then retest your modules. Go slowly when raising voltages: don't raise them too much too fast. Use .02v increments and retest after each one. To do that, change IMC Voltage from Auto to Manual, then on the next line down key in 1.16 for your next step up in voltage.

 

Oh, and don't forget to disable "USB LEGACY SUPPORT" in the BIOS before you test your modules.

 

It is very possible that you had something just not quite connected right, giving you erratic results. Let us know how it goes!


Thank you. I'll make the changes and report back if I have problems.

 

I also need to put the Corsair PSU back in the case. This time, I'll leave all cables attached before I insert it, rather than trying to attach them while the PSU is in the case. It's really hard to reach in there.


I re-assembled my computer, ran Prime95 for a few hours and Memtest86+ overnight. No problems.

 

The computer worked without problems for a while. I took a four-day vacation. When I turned it on and tried to use it, I had several crashes, all in quick succession. One crash even occurred in Safe Mode!

 

So I reset and started to boot once more and noticed that the POST messages showed a delay in detecting the SATA drives. Normally, it detects all drives almost instantly, but after some crashes, I've seen it take a long time to detect the second through nth drives.

 

The solution has always been to power down instead of using the reset button. So I did. The boot was fast and I ran Prime95. No problems. I ran Memtest86+ overnight. No problems. The next day (today), I've used my computer all day without problems.

 

I'm writing this in the hope that it provides a clue as to what is causing my crashes.

 

I use a nice program called WhoCrashed to analyze the crash mini-dumps. Here's the report. I wouldn't give much credence to the suggestions about driver problems. Drivers don't get fixed by cycling the power.

 

On Sun 7/17/2011 11:55:59 PM GMT your computer crashed

crash dump file: C:\Windows\Minidump\071711-12620-01.dmp

This was probably caused by the following module: ntoskrnl.exe (nt+0x7FD00)

Bugcheck code: 0x1A (0x41790, 0xFFFFFA8001A28FC0, 0xFFFF, 0x0)

Error: MEMORY_MANAGEMENT

file path: C:\Windows\system32\ntoskrnl.exe

product: Microsoft® Windows® Operating System

company: Microsoft Corporation

description: NT Kernel & System

Bug check description: This indicates that a severe memory management error occurred.

This might be a case of memory corruption. More often memory corruption happens because of software errors in buggy drivers, not because of faulty RAM modules.

The crash took place in the Windows kernel. Possibly this problem is caused by another driver which cannot be identified at this time.

 

 

On Sun 7/17/2011 11:55:59 PM GMT your computer crashed

crash dump file: C:\Windows\memory.dmp

This was probably caused by the following module: ntkrnlmp.exe (nt!KeBugCheckEx+0x0)

Bugcheck code: 0x1A (0x41790, 0xFFFFFA8001A28FC0, 0xFFFF, 0x0)

Error: MEMORY_MANAGEMENT

Bug check description: This indicates that a severe memory management error occurred.

This might be a case of memory corruption. More often memory corruption happens because of software errors in buggy drivers, not because of faulty RAM modules.

The crash took place in the Windows kernel. Possibly this problem is caused by another driver which cannot be identified at this time.

 

 

On Sun 7/17/2011 10:11:12 PM GMT your computer crashed

crash dump file: C:\Windows\Minidump\071711-12963-01.dmp

This was probably caused by the following module: ntfs.sys (Ntfs+0x5A88)

Bugcheck code: 0x24 (0x1904FB, 0xFFFFF8800BBFD258, 0xFFFFF8800BBFCAB0, 0xFFFFF80002C6D51F)

Error: NTFS_FILE_SYSTEM

file path: C:\Windows\system32\drivers\ntfs.sys

product: Microsoft® Windows® Operating System

company: Microsoft Corporation

description: NT File System Driver

Bug check description: This indicates a problem occurred in the NTFS file system.

The crash took place in a standard Microsoft module. Your system configuration may be incorrect. Possibly this problem is caused by another driver on your system which cannot be identified at this time.

 

 

On Sun 7/17/2011 10:05:04 PM GMT your computer crashed

crash dump file: C:\Windows\Minidump\071711-15584-01.dmp

This was probably caused by the following module: ntoskrnl.exe (nt+0x7FD00)

Bugcheck code: 0x1A (0x41790, 0xFFFFFA80026ECEC0, 0xFFFF, 0x0)

Error: MEMORY_MANAGEMENT

file path: C:\Windows\system32\ntoskrnl.exe

product: Microsoft® Windows® Operating System

company: Microsoft Corporation

description: NT Kernel & System

Bug check description: This indicates that a severe memory management error occurred.

This might be a case of memory corruption. More often memory corruption happens because of software errors in buggy drivers, not because of faulty RAM modules.

The crash took place in the Windows kernel. Possibly this problem is caused by another driver which cannot be identified at this time.

 

 

On Sun 7/17/2011 9:56:40 PM GMT your computer crashed

crash dump file: C:\Windows\Minidump\071711-14352-01.dmp

This was probably caused by the following module: ntoskrnl.exe (nt+0x7FD00)

Bugcheck code: 0x19 (0x3, 0xFFFFF900C297F650, 0xFFFFF900C297F650, 0xFFFFF8800421B5B0)

Error: BAD_POOL_HEADER

file path: C:\Windows\system32\ntoskrnl.exe

product: Microsoft® Windows® Operating System

company: Microsoft Corporation

description: NT Kernel & System

Bug check description: This indicates that a pool header is corrupt.

This appears to be a typical software driver bug and is not likely to be caused by a hardware problem. This might be a case of memory corruption. More often memory corruption happens because of software errors in buggy drivers, not because of faulty RAM modules.

The crash took place in the Windows kernel. Possibly this problem is caused by another driver which cannot be identified at this time.
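To summarize a report like this one, the bugcheck lines can be tallied with a short Python sketch. This is not a WhoCrashed feature; the pattern is an assumption based on the text above. Entries with identical bugcheck codes and parameters are counted once, on the assumption that they are the minidump and the full memory.dmp of the same crash.

```python
import re
from collections import Counter

# Bugcheck and error lines copied from the report above.
REPORT = """\
Bugcheck code: 0x1A (0x41790, 0xFFFFFA8001A28FC0, 0xFFFF, 0x0)
Error: MEMORY_MANAGEMENT
Bugcheck code: 0x1A (0x41790, 0xFFFFFA8001A28FC0, 0xFFFF, 0x0)
Error: MEMORY_MANAGEMENT
Bugcheck code: 0x24 (0x1904FB, 0xFFFFF8800BBFD258, 0xFFFFF8800BBFCAB0, 0xFFFFF80002C6D51F)
Error: NTFS_FILE_SYSTEM
Bugcheck code: 0x1A (0x41790, 0xFFFFFA80026ECEC0, 0xFFFF, 0x0)
Error: MEMORY_MANAGEMENT
Bugcheck code: 0x19 (0x3, 0xFFFFF900C297F650, 0xFFFFF900C297F650, 0xFFFFF8800421B5B0)
Error: BAD_POOL_HEADER
"""

# Captures the bugcheck code, its parameter list and the error name that follows.
ENTRY = re.compile(r"Bugcheck code: (0x[0-9A-Fa-f]+) \(([^)]*)\)\s+Error: (\w+)")


def tally_crashes(report):
    """Count distinct crashes per error name, skipping repeated code/parameter pairs."""
    seen = set()
    counts = Counter()
    for code, params, name in ENTRY.findall(report):
        if (code, params) in seen:
            continue
        seen.add((code, params))
        counts[name] += 1
    return counts


if __name__ == "__main__":
    for name, count in tally_crashes(REPORT).most_common():
        print("%-20s %d" % (name, count))
```

Run against the five entries above, this counts four distinct crashes across three different error types.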


Archived

This topic is now archived and is closed to further replies.
