dac Posted October 24, 2006

I have a Tyan Tiger S2875 with an Opteron 148 (rev C0) and 4 x 512 MB Crucial Registered DDR 3200 (CM72SD512RLP-3200/N). The OS is Gentoo 64-bit Linux.

Originally, I was seeing correctable ECC memory errors about 4 or 5 times a day under load. I first suspected bad RAM, but I could never reproduce the errors unless all 4 DIMM slots were occupied. Every attempt to reproduce them with a single DIMM or a pair of DIMMs failed. Memtest86+ also never indicated a problem, even after running for several days.

After digging around on the forums here, I realized that my Crucial memory uses x8 rather than x4 DRAM chips, and therefore does not support Chipkill on earlier Opteron revs (for some reason, a recent BIOS update from Tyan enabled Chipkill by default). I disabled Chipkill and reran my tests. That seemed to get rid of the ECC memory errors, but they were replaced by L2 and data cache ECC errors, also occurring about 4-5 times a day:

MCE 1
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 2 bus unit
TSC 4a2ef2489ad3
  L2 cache ECC error
  Bus or cache array error
       bit46 = corrected ecc error
  bus error 'local node origin, request didn't time out
      prefetch mem transaction
      memory access, level generic'
STATUS 9000400000000863 MCGSTATUS 0

MCE 0
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 0 data cache
TSC 4b60e86ca22a
ADDR 6d75c640
  Data cache ECC error (syndrome d6)
       bit46 = corrected ecc error
  bus error 'local node origin, request didn't time out
      data read mem transaction
      memory access, level generic'
STATUS 946b400000000833 MCGSTATUS 0

Again, these cache ECC errors are only reproducible with all 4 DIMM slots occupied. I've never been able to reproduce them with a single DIMM or with any pair of DIMMs.
So I suspect a bad CPU now, but wanted to completely rule out the possibility of bad RAM and/or a bad motherboard before replacing the CPU. If the CPU is bad, then why does the problem only occur when all four DIMM slots are occupied, and never with just one or two? Could it still be a memory or motherboard issue?

Any help would be appreciated, as I have run out of other things to try.

Thanks,
David
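
For anyone comparing their own logs against the ones above, here is a rough Python sketch of how the two STATUS values break down into the fields that mcelog prints. The bit positions (bit 46 for corrected ECC, bits 54:47 for the syndrome, and the low 16 bits in the bus-error pattern 0000 1PPT RRRR IILL) are my assumption of the K8 machine-check layout, checked only against these two log entries. The function name `decode_status` and the label strings are made up for illustration and are not part of mcelog.

```python
# Sketch only: decode a few fields of an AMD K8 MCi_STATUS value.
# Bit positions are assumed from the K8 machine-check layout and were
# cross-checked only against the two log entries quoted in this post.

PARTICIPATION = ["local node origin", "local node response",
                 "local node observed", "generic"]
REQUEST = ["generic", "generic read", "generic write", "data read",
           "data write", "instruction fetch", "prefetch", "evict", "snoop"]
TRANSACTION = ["mem", "reserved", "io", "generic"]
LEVEL = ["level 0", "level 1", "level 2", "generic"]


def decode_status(status: int) -> dict:
    """Return selected MCi_STATUS fields as a dictionary."""
    fields = {
        "valid": bool((status >> 63) & 1),
        "overflow": bool((status >> 62) & 1),
        "uncorrected": bool((status >> 61) & 1),
        "addr_valid": bool((status >> 58) & 1),
        "corrected_ecc": bool((status >> 46) & 1),
        "syndrome": (status >> 47) & 0xFF,
        "error_code": status & 0xFFFF,
    }
    code = fields["error_code"]
    # Bus error codes have the form 0000 1PPT RRRR IILL.
    if code & 0xF800 == 0x0800:
        rrrr = (code >> 4) & 0xF
        fields["participation"] = PARTICIPATION[(code >> 9) & 0x3]
        fields["timeout"] = bool((code >> 8) & 1)
        fields["request"] = REQUEST[rrrr] if rrrr < len(REQUEST) else "unknown"
        fields["transaction"] = TRANSACTION[(code >> 2) & 0x3]
        fields["level"] = LEVEL[code & 0x3]
    return fields


if __name__ == "__main__":
    # The two STATUS values from the log entries in the post.
    for raw in (0x9000400000000863, 0x946B400000000833):
        print(hex(raw))
        for name, value in decode_status(raw).items():
            print(f"  {name}: {value}")
```

Run on the second value, the syndrome field comes out as 0xd6 and the request type as a data read, which agrees with the "Data cache ECC error (syndrome d6)" entry. Both values have the corrected-ECC bit set and the uncorrected bit clear, so the hardware fixed both errors on the fly. None of this says which component is at fault; it only confirms what the log text already reports.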