gpel posted November 9, 2005

Hi, I have a problem with some of the newer (manufactured mid-2005) CM72SD1024RLP-3200 modules used in server systems. The problems are intermittent and only reproducible after a variable run time in memtest (sometimes straight after booting, sometimes after 20 hours).

I was able to reproduce errors in one particular module, always at around 512.4 MB or 512.9 MB. Some errors can be corrected by ECC, others cannot. I usually test with ECC off in the BIOS, as this gives more stable results.

Now the weird part: whether the error shows up at all is very erratic. On some runs I collect a large number of errors; at other times memtest won't find a single error in 100 passes (with the test range set to 500-520 MB). Whenever I got an error, it was in the 512 MB range, or somewhere around 1500 MB when the module was plugged in as the second module; the maths worked out. When it failed, it failed in test 4, and also in tests 6 and 7, but never in any of the other tests.

It's not a thermal problem: other modules complete 40 full passes overnight without a problem. It doesn't appear to be a problem with the memory slot on the motherboard either, as the error was reproducible in two different slots with the same module.

It sounds like a clear case for an RMA, except that I can't reproduce the error anymore, even though I was able to do so yesterday and this morning. What could be going on? In one of the threads, the memory guy writes that errors can be triggered in the random tests. It would be interesting if someone could be more specific about that.

RAM Guy: could you explain the difference between SMI (System Management Interrupt), NMI (Non-Maskable Interrupt) and SCI (System Control Interrupt) when ECC mode is enabled?
My experience with Tyan boards (S2735 and S5350) is as follows:
* ECC enabled and set to SMI: the machine powers off immediately when memtest / memtest+ finds an error.
* ECC enabled and set to NMI: the machine shows "unknown interrupt" and halts. memtest stops running; the control menu can still be called up, but no further tests can be started.
* ECC enabled and set to SCI: the machine behaves as expected. memtest+ shows and counts both ECC and non-ECC errors, and shows whether or not each error could be corrected by ECC.
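The post says the errors at around 1500 MB with the module in the second position were consistent with the 512 MB errors seen when it was first. A minimal sketch of that arithmetic, assuming 1024 MB modules mapped one after another with no interleaving or remapping; the function names are hypothetical and not part of memtest:

```python
# Hypothetical sketch of the slot arithmetic, not a memtest feature.
# Assumptions: every module is 1024 MB (as "1024" in CM72SD1024RLP-3200
# suggests) and the chipset maps modules one after another, with no
# interleaving or memory-hole remapping.
MODULE_SIZE_MB = 1024


def absolute_address_mb(offset_in_module: float, modules_below: int) -> float:
    """Address memtest would report for a fault at a fixed offset inside a module.

    modules_below is the number of modules mapped below this one
    (0 when it is the first module, 1 when it is the second).
    """
    return modules_below * MODULE_SIZE_MB + offset_in_module


def offset_in_module_mb(absolute_mb: float) -> tuple[int, float]:
    """Split a reported address into (module index, offset inside that module)."""
    index, offset = divmod(absolute_mb, MODULE_SIZE_MB)
    return int(index), offset


if __name__ == "__main__":
    for offset in (512.4, 512.9):
        first = absolute_address_mb(offset, 0)
        second = absolute_address_mb(offset, 1)
        print(f"fault at {offset} MB: first module -> {first:.1f} MB, "
              f"second module -> {second:.1f} MB")
```

Under these assumptions both faults land at roughly 1536 MB when the module sits second, which fits the "somewhere around 1500 MB" reported in the post. Real server chipsets can interleave or remap memory, so this is only a rough sanity check, not a way to pinpoint a physical chip.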