The Curse of the Intermittent Fault

At the end of last year, I made a temporary change to the control system. A temperature transmitter, which is used to control the temperature of gas leaving a cooler, kept seeing sudden drops in temperature. The control system would respond quickly to these drops by reducing the cooling. Unfortunately, the drops were not a real process change, and the sudden reduction in cooling caused problems with the downstream equipment.

As a temporary measure, I switched the control to use another temperature indicator on the line. For some reason this instrument was not seeing the sudden drops.

A couple of weeks ago, I checked the two readings and both seemed to be behaving in a similar manner. Yes, there were drops in temperature, but they weren't that big and both instruments were showing the same thing.

I checked again last week and saw the same thing. So we agreed to remove the change and put everything back to how it was originally. There were a few other issues that delayed things, so I only got around to removing the change today.

Half an hour after putting everything back to normal, the plant was kicked.

Looking back at the data over the last two weeks, it turns out that the large temperature dips hadn't completely vanished. They were intermittent. They just happened not to be active on the days I had checked.

I quickly reimplemented the change. I don't yet know what the best option going forward is, but the temporary solution is good enough for now and may be made permanent.

And I will try and be more aware of intermittent faults. Checking one day's data is not enough. Check all the data you can!
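
For what it's worth, here is a rough sketch of what "check all the data" could look like in practice. It assumes the historian data can be exported as (timestamp, value) pairs. The function names, the 5-degree threshold and the sample readings are all made up for illustration, not taken from the real system. The idea is simply to count sudden drops on every day in the history instead of eyeballing one or two days.

```python
from collections import Counter
from datetime import datetime, timedelta


def find_sudden_drops(samples, max_drop):
    """Return timestamps where the reading fell by more than max_drop
    between two consecutive samples. (Hypothetical helper.)"""
    drops = []
    for (_, prev_value), (ts, value) in zip(samples, samples[1:]):
        if prev_value - value > max_drop:
            drops.append(ts)
    return drops


def drops_per_day(samples, max_drop):
    """Count sudden drops on each calendar day in the history."""
    return Counter(ts.date() for ts in find_sudden_drops(samples, max_drop))


# Made-up data: hourly readings over three days, with dips on day 2 only.
start = datetime(2024, 1, 1)
values = [40.0] * 72
values[30] = 25.0  # hypothetical dip on day 2
values[40] = 27.0  # second hypothetical dip on day 2
samples = [(start + timedelta(hours=i), v) for i, v in enumerate(values)]

counts = drops_per_day(samples, max_drop=5.0)
for day, n in sorted(counts.items()):
    print(day, n)
```

With data like this, spot checks on day 1 and day 3 would both look clean, which is exactly the trap I fell into. The per-day count over the whole history shows the fault is still there.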

