Corsair Community

Possibly bad ram or cpu?


dac


I have a Tyan Tiger S2875 with an Opteron 148 (rev C0) and 4 x 512M Crucial Registered DDR 3200 (CM72SD512RLP-3200/N). The OS is Gentoo 64-bit Linux. At first I was seeing correctable ECC memory errors about 4 or 5 times a day under load. I suspected bad RAM, but I could never reproduce the errors unless all 4 DIMM slots were occupied; every attempt with a single DIMM or a pair of DIMMs came up clean. Memtest86+ also never indicated a problem, even after running for several days. After digging around on the forums here, I realized that my Crucial memory uses x8 rather than x4 chips and therefore does not support Chipkill on earlier Opteron revs (for some reason, a recent Tyan BIOS update enabled Chipkill by default). I disabled Chipkill and reran my tests. That seemed to get rid of the ECC memory errors, but they were replaced by L2 ECC errors, also occurring about 4-5 times a day:

 

MCE 1
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 2 bus unit TSC 4a2ef2489ad3
L2 cache ECC error
Bus or cache array error
bit46 = corrected ecc error
bus error 'local node origin, request didn't time out
prefetch mem transaction
memory access, level generic'
STATUS 9000400000000863 MCGSTATUS 0

MCE 0
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 0 data cache TSC 4b60e86ca22a
ADDR 6d75c640
Data cache ECC error (syndrome d6)
bit46 = corrected ecc error
bus error 'local node origin, request didn't time out
data read mem transaction
memory access, level generic'
STATUS 946b400000000833 MCGSTATUS 0
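
For anyone who wants to check the decoded text against the raw STATUS values, here is a minimal Python sketch. The bit positions and the bus-error code format (0000 1PPT RRRR IILL) are assumptions based on the usual K8-style machine-check status layout, not something taken from Tyan or AMD documentation in this thread, and `decode_status` is a hypothetical helper name. Treat it as a cross-check of the log above, not an authoritative decoder.

```python
# Sketch only: decode the two STATUS values quoted in the log above.
# ASSUMED layout (K8-style machine-check status register):
#   bit 63 = entry valid, bit 58 = ADDR field valid,
#   bit 46 = corrected ECC error, bits 54:47 = ECC syndrome,
#   bits 15:0 = error code.
# ASSUMED bus-error code format: 0000 1PPT RRRR IILL.

PARTICIPATION = ["local node origin", "local node response",
                 "local node observed as third party", "generic"]
TRANSACTION = ["generic error", "generic read", "generic write", "data read",
               "data write", "instruction fetch", "prefetch", "evict", "snoop"]
MEM_IO = ["mem", "reserved", "i/o", "generic"]
LEVEL = ["level 0", "level 1", "level 2", "level generic"]


def decode_status(status: int) -> dict:
    """Split a 64-bit STATUS value into the fields assumed above."""
    code = status & 0xFFFF
    info = {
        "valid": bool((status >> 63) & 1),
        "addr_valid": bool((status >> 58) & 1),
        "corrected_ecc": bool((status >> 46) & 1),
        "syndrome": (status >> 47) & 0xFF,
        "error_code": code,
    }
    # Top five bits 0000 1 mark the bus-error pattern.
    if (code & 0xF800) == 0x0800:
        rrrr = (code >> 4) & 0xF
        info["participation"] = PARTICIPATION[(code >> 9) & 0x3]
        info["timed_out"] = bool((code >> 8) & 1)
        info["transaction"] = (
            TRANSACTION[rrrr] if rrrr < len(TRANSACTION) else "unknown"
        )
        info["mem_io"] = MEM_IO[(code >> 2) & 0x3]
        info["level"] = LEVEL[code & 0x3]
    return info


if __name__ == "__main__":
    for raw in ("9000400000000863", "946b400000000833"):
        print(raw, decode_status(int(raw, 16)))
```

Under these assumptions both entries decode as corrected (bit 46 set), and only the second carries a valid address and a nonzero syndrome, which agrees with the ADDR and syndrome lines appearing only in the MCE 0 record.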

 

Again, these L2 ECC errors are reproducible only with all 4 DIMM slots occupied. I have never been able to reproduce them with a single DIMM or with any pair of DIMMs.

 

So I now suspect a bad CPU, but I want to completely rule out bad RAM and/or a bad motherboard before replacing it. If the CPU is bad, why does the problem occur only when all four DIMM slots are occupied, and never with just one or two? Could it still be a memory or motherboard issue? Any help would be appreciated, as I have run out of other things to try.

 

Thanks,

David


  • Corsair Employee

David,

What you have posted suggests a software issue or possibly a BIOS issue. I would talk to the software vendor before changing any hardware.

If possible, I would test the CPU and memory in another system to be sure. And if these results are coming from Gentoo Linux, I would try another OS as well.

I would also talk to Tyan and see what they think before you change anything.


Archived

This topic is now archived and is closed to further replies.
