Corsair Community

ECC errors on CM72SD1024RLP-bad sticks


jengl_usbr


Hi-

 

I've recently been getting some ECC memory errors reported in my syslog. Unfortunately, this led to a system crash this past weekend. My testing points to two bad sticks.

 

System history: purchased 2 Aug 2005; has been running great ever since I got it. Just recently started acting strange. System has had no modification since purchase, other than adding/deleting a couple of SATA drives and OS upgrades (now on Fedora Core 5). Hooked to APC SmartUPS 1000; so I don't think power is an issue. Regressed to a couple of older kernels to make sure kernel upgrades weren't causing erroneous hardware errors in logs ;-) Wasn't the problem. Currently running 2.6.18-1.2257.fc5.

 

Board: (MonarchComputer) SuperMicro H8DCE

Bios: factory defaults; checked and reset

Processors: AMD Opteron 248 (x2)

PSU: Enermax 465P-VE-24P; in Enermax CS-10182-BA tower case

Mem: CM72SD1024RLP-3200/S (x4)

 

So, good news is that I've had over 1-1/2 years trouble-free with 4 GB of this memory. Bad news is that some of it is now failing.

 

First indication of problem was in my syslogs last week. System then crashed over the weekend, and I started to test mem first. Here are some snips from the boot log, indicating problem shows up on boot:

EDAC k8 MC0: general bus error: participating processor(local node response), time-out(no timeout) memory transaction type(generic read), mem or i/o(mem access), cache level(generic)

EDAC MC0: CE page 0x36b1, offset 0xdc0, grain 8, syndrome 0x26, row 1, channel 0, label "": k8_edac

EDAC k8 MC0: extended error code: ECC error

 

Here are some snips from /var/log/messages (random errors depending on application):

Feb 12 12:09:54 arkansas kernel: Machine check events logged

Feb 12 12:36:14 arkansas kernel: EDAC k8 MC0: general bus error: participating processor(local node origin), time-out(no timeout) memory transaction type(generic read), mem or i/o(mem access), cache level(generic)

Feb 12 12:36:14 arkansas kernel: EDAC MC0: CE page 0x4a8d1, offset 0xdc0, grain 8, syndrome 0x26, row 1, channel 0, label "": k8_edac
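The EDAC corrected-error (CE) lines above follow a regular format, so they can be tallied by memory controller, row, and channel to see whether the errors cluster on one location. A minimal Python sketch, assuming the line format stays exactly as quoted in this post and that the kernel uses 4 KiB pages (so physical address = page << 12 | offset). The names `parse_ce` and `tally_ce` are made up for illustration and are not part of any EDAC tool:

```python
import re
from collections import Counter

# Matches the "EDAC MCn: CE page ..." lines as quoted in this post.
CE_RE = re.compile(
    r"EDAC MC(?P<mc>\d+): CE page 0x(?P<page>[0-9a-fA-F]+), "
    r"offset 0x(?P<offset>[0-9a-fA-F]+), grain (?P<grain>\d+), "
    r"syndrome 0x(?P<syndrome>[0-9a-fA-F]+), row (?P<row>\d+), "
    r"channel (?P<channel>\d+)"
)

PAGE_SHIFT = 12  # assumption: 4 KiB pages


def parse_ce(line):
    """Return a dict of fields for one CE line, or None if it does not match."""
    m = CE_RE.search(line)
    if m is None:
        return None
    page = int(m.group("page"), 16)
    offset = int(m.group("offset"), 16)
    return {
        "mc": int(m.group("mc")),
        "row": int(m.group("row")),
        "channel": int(m.group("channel")),
        "syndrome": int(m.group("syndrome"), 16),
        "page": page,
        "offset": offset,
        # Physical address reconstructed from page frame number and offset.
        "address": (page << PAGE_SHIFT) | offset,
    }


def tally_ce(lines):
    """Count CE events per (mc, row, channel), skipping non-CE lines."""
    counts = Counter()
    for line in lines:
        event = parse_ce(line)
        if event is not None:
            counts[(event["mc"], event["row"], event["channel"])] += 1
    return counts


if __name__ == "__main__":
    import sys
    # Example: python tally_ce.py < /var/log/messages
    for key, n in tally_ce(sys.stdin).most_common():
        print("MC%d row %d channel %d: %d corrected errors" % (key + (n,)))
```

Run against the two CE lines quoted in this post, both events land on MC0, row 1, channel 0 with syndrome 0x26 and the same in-page offset 0xdc0, which is consistent with a fault in one location rather than random noise.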

 

Here are some snips from /var/log/mce (where hardware errors get piped):

MCE 0

HARDWARE ERROR. This is *NOT* a software problem!

Please contact your hardware vendor

CPU 0 0 data cache TSC a400a88127

ADDR 2fd1dc0

Data cache ECC error (syndrome 26)

bit46 = corrected ecc error

bus error 'local node origin, request didn't time out

data read mem transaction

memory access, level generic'

STATUS 9413400000000833 MCGSTATUS 0
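The STATUS value in the mcelog output can be decoded by hand to cross-check the text mcelog printed. A minimal Python sketch, assuming the machine-check status register layout documented for AMD K8 processors (valid in bit 63, uncorrected in bit 61, ECC syndrome in bits 54:47, corrected ECC in bit 46, uncorrected ECC in bit 45, error code in the low 16 bits, with bus errors encoded as 0000 1PPT RRRR IILL). The function name `decode_k8_mc_status` is hypothetical:

```python
def decode_k8_mc_status(status):
    """Split a 64-bit K8 machine-check STATUS value into named fields.

    Bit positions are an assumption based on AMD's K8 documentation;
    verify against the BKDG for your CPU before relying on them.
    """
    def bit(n):
        return bool((status >> n) & 1)

    code = status & 0xFFFF  # low 16 bits: error code
    return {
        "valid": bit(63),
        "overflow": bit(62),
        "uncorrected": bit(61),
        "enabled": bit(60),
        "addr_valid": bit(58),
        "syndrome": (status >> 47) & 0xFF,   # bits 54:47
        "corrected_ecc": bit(46),
        "uncorrected_ecc": bit(45),
        "error_code": code,
        # Bus-error sub-fields (meaningful only when bit 11 of the code is set).
        "is_bus_error": (code >> 11) == 1,
        "pp": (code >> 9) & 0x3,    # participating processor
        "t": (code >> 8) & 0x1,     # timeout
        "rrrr": (code >> 4) & 0xF,  # memory transaction type
        "ii": (code >> 2) & 0x3,    # memory or I/O
        "ll": code & 0x3,           # cache level
    }


if __name__ == "__main__":
    fields = decode_k8_mc_status(0x9413400000000833)
    for name, value in fields.items():
        print(name, hex(value) if isinstance(value, int) and not isinstance(value, bool) else value)
```

For the value quoted above, this yields syndrome 0x26 with the corrected-ECC bit set and the uncorrected bits clear, which agrees with the "corrected ecc error" and "syndrome 26" lines in the mcelog output.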

 

 

I used memtest86+ (v. 1.65) to test.

1. First tested all four modules in place (stock system). Failed right away with lots of ECC errors.

2. Took out two modules. Reran test on one pair; failed during first cycle with ECC errors.

3. Removed pair that failed. Then tested remaining two modules as a pair. These passed one test cycle.

4. Then ran the working pair in a different pair of slots (to double-check slots and CPUs) - passed 12 cycles of memtest with no errors overnight.

5. Tested failed sticks individually. Both indicate problems.

 

Questions for you:

1. can I get an RMA number to return? I'll submit this request via your online system.

2. I plan to send only the failing pair back - what do you think? The others test okay, so I can at least keep my system running temporarily on 2 GB.

 

Thanks for your assistance-

-John


Thanks- I'll try and send all 4 back. I just need to get some temp replacement so I can keep my system running. One problem, FYI:

 

The online RMA request form keeps timing out in Firefox. I'll try the old-fashioned phone method to get an RMA number, unless you have a better suggestion.

 

BTW, your "new" tech web site FAILS with Firefox v. 1.5 (x86_64). I tried to get an RMA number that way and I go to click on the very first radio button on "Memory: XMS2, VS, CMSS" and it CRASHES my web browser!! Please let your web folks know that it appears to be incompatible w/Firefox, and to fix it. I tried it twice, and it crashed my browser both times. I'll also repost this under "Customer Care" thread so they can see it.

 

TIA-

-John


  • Corsair Employees

I have been using Firefox on a new system at home and have not had any problems at all. But I just downloaded and installed a new build this morning.

 

Version info:

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1) Gecko/20061010 Firefox/2.0


Archived

This topic is now archived and is closed to further replies.
