Corsair Community

Seemingly random BSOD's. Inconsistent memtest


Cybbe


Hello, let me first introduce myself, my background and my current system.

 

I'm a software bachelor student, with a fair amount of experience in overclocking and assembly of computers. I honestly don't remember the last time I bought a pre-assembled one. I think I started taking apart and reassembling my own computers when I was 12. I've gone through a lot of stuff, and generally know my way around computers.

My current system is a less than a month old i5-2400, running on an ASRock P67 Pro3 SE, powered by a fairly old 550w ######## NeoHE PSU.

I have a Radeon HD5850 video card. I have two harddrives currently (500gb and 2tb Seagate drives, Sata 2 and 3 respectively), and no optical drive.

 

The RAM in question is 2x2 GB of XMS3 1600 MHz, specced at timings of 9-9-9-24 (1T, I think), at 1.65 V.

That is, CMX4GX3M2A1600C9.

 

I have done no overclocking whatsoever on this system (Like I'd even be able to with that processor).

 

I'm using Win7 64bit, running on the Sata 2 drive, which is also plugged into a Sata 2 controller.

 

The system started off as stable, but a few weeks in, I'd get random notifications of applications that have stopped responding. After about 3-4 of those from random programs, I'd get a bluescreen, with no real definitive driver as cause. I've gotten about 50 different ones by now, so I don't remember them honestly. But I suspect my event log can help me on that matter if it comes down to it. Entertain my story for now though.

I first feared it was a harddrive issue, so I ran diagnostics on it using Seagate's tools, as well as windiag, with no issues reported at all. At that time I was still able to boot and run for a while before BSOD'ing again, so I proceeded to update all drivers, including mobo drivers, and purged all suspicious applications I had installed over the last few days. I even suspected ReadyBoost, and disabled it.

I then ran a Memtest overnight, and it came back clean, so I figured it wasn't the ram.

The BSOD's then started getting more frequent, and sometimes I wouldn't be able to boot at all. I tried booting into Ubuntu from a live USB, and at first it worked, but then it crashed, and any subsequent attempts failed at initialization. It also didn't seem to recognize my drives. I then updated to a newer version of Ubuntu, which ran fine for a while, but then the screen started going black and recovering again. It'd do this every half minute to a minute or so. I tried to access my drives, but soon after mounting the second of the drives, it crashed and would no longer boot, again stalling at initialization.

I also tried a windows bootable usb to do recovery, but it BSOD's too.

My drives still appear to be healthy.

 

I then ran a Memtest again, and suddenly got some 500+ errors. I wondered why they didn't show up at first, but was honestly just happy I had something to work with. I took out one stick first, and ran a single pass, no issue. Let's call this the "good" stick. It was positioned in slot 0, I'll call this the "good" slot. Quotation marks are there because I can't tell.

I then took the good stick out of the good slot, and put the other, the "bad" stick, into slot 2 (Which is the dual channel pair of slot 0), which we'll now refer to as the "bad" slot. Immediately, I got 3000+ errors. Like, in the first few tests.

Okay, I figure, it's either the stick or the slot. So I take the "bad" stick and stuff it into the "good" slot, just to see. It makes a few clean passes, no issues. I'm scratching my head, but okay, maybe the slot was failing. I take the "good" stick, place it in the "bad" slot. It passes fine, no issues.

I put both sticks back in their respective slots and they both pass fine.
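
The swaps so far can be laid out as a small tally to see whether the errors follow the stick or the slot. This is only a minimal Python sketch covering the four single-stick runs described above; the `runs` structure, the labels and the helper name are made up for illustration, and the error counts are reduced to pass/fail.

```python
from collections import defaultdict

# Single-stick Memtest runs described above, as (stick, slot, passed).
# Labels are illustrative; "good"/"bad" are the provisional names from the post.
runs = [
    ("good", "slot 0", True),
    ("bad", "slot 2", False),   # 3000+ errors
    ("bad", "slot 0", True),
    ("good", "slot 2", True),
]

def failure_tally(runs, key_index):
    """Count (failures, total runs) grouped by stick (0) or slot (1)."""
    tally = defaultdict(lambda: [0, 0])
    for run in runs:
        key, passed = run[key_index], run[2]
        tally[key][1] += 1
        if not passed:
            tally[key][0] += 1
    return {k: tuple(v) for k, v in tally.items()}

by_stick = failure_tally(runs, 0)
by_slot = failure_tally(runs, 1)

for name, (fails, total) in {**by_stick, **by_slot}.items():
    print(f"{name}: {fails} failed of {total}")
```

With only these four runs, the single failure counts equally against the "bad" stick and the "bad" slot, so the tally alone can't separate the two. More runs of each combination would be needed before it points either way.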

 

It's been a long road, so I might be forgetting some details by now, but either way.. While the "good" stick was in, memtest displayed the correct (at this time Auto) configuration of 9-9-9-24. However, I noticed the "bad" stick had a config of 6-6-6-20.

So I went and manually set the timings to 9-9-9-24. The "bad" stick continues to read out wrong, but it still appears stable. I try to boot into windows, and everything seems to work again, so I start backing up files, in case I want to do a fresh install. It functions fine for a few hours, and then the BSOD's start rolling again.

 

I then decide to take out the "bad" stick, and I get a pretty stable system using only the "good". I only tried for half a day, but it didn't show any issues.

However, I also only tried it in the "good" slot, so I still don't know if it's the memory or the mobo slot.

After that, I decided to run the bit fade test on both ram blocks, and it passed several times.

At some point after that I got a seemingly random error (I wasn't changing any settings here) while running the standard tests, where it'd report 8000+ errors. I wasn't able to reproduce it when swapping modules around.

 

I then read somewhere that the default specs of 1T command rates are way too fast, and set it to 2T. This made sense to me, since my old XMS2's also ran 2T. After that, the system ran fine for a full day, using Visual Studio, passing several stress tests (Memtest, Orthos, Intel Burntest, PCMark, 3DMark, running several games etc). Until it finally BSOD'd while idle, so I'm back at square one.

I'm considering upping the voltage to 1.7v, but if the stick is to be RMA'd, I'd rather stick with specified settings to be safe, in case it's a bad stick.

What mostly concerns me is I can't reproduce the memtest errors. I've only ever gotten three batches of errors, and I've run it god knows how many times by now. I can't reliably tell if it's a stick or a mobo error. I can't really change something and assume the problem is fixed either, because I can't tell until the next random BSOD, which might come any time between now and a month from now.

 

I first suspected the timings, which would somehow explain the erratic behaviour, but even at 9-9-9-24 set manually (and being reported correctly when both sticks are in), and at a 2T command rate, it still errors. My old XMS2's were at 5-5-5-12, so I'm already a bit wtf at how lax these timings are, but I'm sure there's a sensible DDR3 reason for that. One reason I'm considering upping the voltage though, is that 1.65v seems like very little. My old XMS2's ran 2.1v.
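
On why 9-9-9-24 looks so lax next to 5-5-5-12: timings are counted in clock cycles, so they only compare across generations once converted to time. A rough sketch of that conversion, assuming the old XMS2 kit ran at DDR2-800 (the post doesn't say) and using a hypothetical helper name:

```python
def cas_latency_ns(cas_cycles, data_rate_mts):
    """Convert CAS latency from clock cycles to nanoseconds.

    DDR memory transfers twice per clock, so the clock in MHz is half
    the data rate (e.g. DDR3-1600 runs an 800 MHz clock).
    """
    clock_mhz = data_rate_mts / 2
    return cas_cycles * 1000 / clock_mhz

# Assumption: the old XMS2 kit was DDR2-800; the post only gives its timings.
ddr2 = cas_latency_ns(5, 800)
ddr3 = cas_latency_ns(9, 1600)

print(f"DDR2-800 CL5:  {ddr2:.2f} ns")
print(f"DDR3-1600 CL9: {ddr3:.2f} ns")
```

Under that assumption the DDR3 kit's CAS latency comes out slightly shorter in absolute terms, despite the larger cycle count. The voltage comparison works in a similar way: standard DDR3 voltage is 1.5 V against 1.8 V for DDR2, so 1.65 V is already above the DDR3 baseline.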

 

But it seems like wasted effort when I can't even memtest reliably. So I figured I'd ask here if anyone has any ideas or comments.

 

I feel I can exclude the errors being software or harddrive when it occurs while booting ubuntu and win7 installer from usb. I also recall it happening in safe mode, even while viewing the event log. I forgot to mention that, but I also forget where in the timeline that is. It's been a long road, I've been dealing with this for over a week now, so forgive me if I've left out something, and feel free to ask specific questions. It might jog my memory.

 

I'm pretty much down to thinking it's PSU, Memory, CPU or the Mainboard. Memory being the most obvious and cheapest culprit to replace. But I don't want to RMA just to RMA and have an unusable computer for a full week (I'm writing my dissertation right now, due July 1st). And even then, if it comes back and it turns out it's not the issue, I'm back at square one.

My finances don't allow me to easily get a replacement PSU, memory, CPU or mainboard at the moment either.

 

I hope some of this information is useful. I'm really at a loss here. Sorry for the wall of text, but I hope someone is able to help :)

 

Thanks in advance.


  • Corsair Employees
Please make sure that you have the latest MB BIOS, then install just one module, go to BIOS setup and load setup defaults, enable XMP Profile one, and disable Legacy USB. Then test the modules one at a time and let me know the results.

There is only one bios revision available, so can't do much about that.

I reset to UEFI defaults, and disabled Legacy USB support. I also loaded XMP profile one (which sets it back to 1.65 V, 9-9-9-24 1T, 1333 MHz).

 

However, now everything that resembles USB refuses to work at all. Keyboard isn't responding, the mouse doesn't work in the UEFI.. So I plugged in my old ps2 keyboard, and entered the boot menu, and the USB stick I use for memtest isn't recognized.

I have no floppy or optical drive, so this is quite a setback. (I haven't needed a dvd drive for over a year.)

I tried the USB stick in all slots, it just won't recognize it anymore.

 

-EDIT-

Okay, strike my first statement. I was loading the driver site again while posting this, and apparently there are new BIOS revisions available. I'm gonna flash it right away. I have no idea how I missed this the first time around.


Derp. Guess I'm getting too eager. I loaded up the wrong driver site (just the regular Pro3, not the SE).

The only current revision out for this specific board is still 1.10.

It's listed as the first release. Nothing new to get there.

 

So I'm still left with no USB without legacy. Would you like me to re-run the tests at defaults with legacy enabled?


Oookay, weird stuff.

 

So I've been running two memtests with default settings now, trying for 2 passes on each (it's getting late, so this is what I'll manage before going to bed).

 

Now, the "good" stick passed twice. The timings are set correctly to 9-9-9-24 1T, but I notice the frequency is now set to 798 MHz (DDR3-1596). At least that's what Memtest tells me. It seems like an odd number, but I can't tell, since I'm only really used to dealing with the friendly numbers in these cases (1066, 1333, 1600, etc).
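
The 798 MHz reading is the memory clock; DDR transfers data on both clock edges, so the effective rate is double that. A small sketch of the arithmetic, with the list of common grades written out by hand and the helper names hypothetical:

```python
# Common DDR3 speed grades, in MT/s (nominal marketing figures).
DDR3_GRADES = [800, 1066, 1333, 1600, 1866]

def effective_rate(clock_mhz):
    """DDR transfers twice per clock cycle."""
    return clock_mhz * 2

def nearest_grade(rate):
    """Return the standard grade closest to a measured rate."""
    return min(DDR3_GRADES, key=lambda g: abs(g - rate))

measured = effective_rate(798)
grade = nearest_grade(measured)
deviation_pct = (measured - grade) / grade * 100

print(f"{measured} MT/s, nearest grade DDR3-{grade}, off by {deviation_pct:.2f}%")
```

A reading a fraction of a percent under the nominal figure is what you'd expect if the base clock is measured slightly below its nominal value, so 798 MHz is most likely just DDR3-1600 as Memtest measures it.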

I was also under the impression my CPU wouldn't let me get near 1600, but apparently it does?

 

The "bad" stick, I decided to plug into the same slot. It now finally runs the correct CAS according to memtest (exact same as the other stick). It also fails terribly.

Within the first 1-3 tests, it managed to rack up 60928 errors. (Highest yet).

Increased to 61184 after test 4.

Tests 5 through 8 didn't find anything else.

 

I didn't go for a second pass on this stick.

I'll run the first stick again overnight, just to make sure.

 

This result pleases me a bit, as it now at least points more to that one stick of RAM than to the motherboard (since I've now successfully managed to get errors in two slots).

Mind you, I still have Legacy USB enabled at this point, since I can't do anything without it.

 

I've stupidly misplaced the original packaging, though. I might be getting ahead of myself, and let me know if you want me to run more tests, but will not having the original packaging be an issue in terms of RMA?


