Maintaining high availability of Hong Kong Observatory Operational Computer Systems
The summer sun is blazing, but a storm is creeping in from afar, unseen beyond the horizon. The sky over Hong Kong, so blue and homogeneous that it almost resembles a waveless sea, offers nothing but a hot sun and a few idle clouds, and shows no sign of the coming storm. Now let us be brave enough to imagine that the weather forecasters in the Central Forecasting Office (CFO) of the Hong Kong Observatory (HKO) are suddenly deprived of weather data and forecast products. These include data from surface and upper-air weather observations, automatic weather stations, weather satellites, weather radars and numerical weather prediction (NWP) models. Without these telescopic eyes and sensors, it is not hard to imagine how difficult it would be to predict whether a developing storm far beyond human sight would affect Hong Kong in the next few hours. Indeed, weather data and forecast products are crucial to weather forecasting operations. One of the major tasks of HKO staff is to ensure that the computer systems used to process the data and generate the products remain operational round the clock. Nevertheless, diligence does not guarantee flawless performance. If any one of the computer systems breaks down unexpectedly, the forecasters are in effect blinded and handicapped.
CFO is not the only office in HKO that operates round the clock. Colleagues in the Information Technology (IT) Management Division (ITMD) also work 24 hours a day to monitor the operation of all critical computer systems and IT facilities in HKO. If any system shows signs of a problem, the Information Technology Officer (ITO) follows established procedures to handle it as soon as possible. If an urgent situation arises outside office hours, the relevant officers, even when off duty, have to assist the ITO to ensure that the computer systems at CFO continue to operate smoothly. Even so, recovering from a problem may still take some time. In that case, will the forecasters be blind to the sky in the meantime?
Figure 1: Colleagues of Information Technology Management Division closely monitoring the performance of the Hong Kong Observatory operational computer systems
There is an important engineering concept called “redundancy”. Although the word may sound negative, redundancy is extremely important in maintaining the availability of critical IT systems. It means that there is at least one duplicate of each critical component in a system. In other words, each critical component has a backup that can immediately replace the original if the original malfunctions. Paired organs in the human body, such as the eyes, lungs and kidneys, embody a similar concept: if one of the pair fails, the other can still support the functioning of the body. Although paired organs always work at the same time, this is not necessarily the case for redundant computer systems. In general, only one of the duplicate systems is in operational mode at any time. The backup system switches to operational mode, either automatically or manually, when the system in operation shows signs of a problem.
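For readers who like to see an idea in code, the active/standby arrangement described above can be sketched in a few lines of Python. This is only an illustrative model, not a description of how HKO's systems are actually built: the class names `System` and `RedundantPair`, the `healthy` flag and the system names "system-A" and "system-B" are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class System:
    """One copy of a critical system (hypothetical model)."""
    name: str
    healthy: bool = True


class RedundantPair:
    """Active/standby pair: only one copy is operational at a time."""

    def __init__(self, primary: System, backup: System) -> None:
        self.active = primary
        self.standby = backup

    def check_and_failover(self) -> bool:
        """Switch to the standby if the active copy is unhealthy.

        Returns True if a switchover happened, False otherwise.
        """
        if self.active.healthy:
            return False
        if not self.standby.healthy:
            raise RuntimeError("both copies are down; no failover possible")
        # Swap roles: the backup takes over, the faulty copy awaits repair.
        self.active, self.standby = self.standby, self.active
        return True


pair = RedundantPair(System("system-A"), System("system-B"))
pair.active.healthy = False           # simulate a fault in the operating copy
switched = pair.check_and_failover()  # the backup takes over
print(switched, pair.active.name)
```

In this sketch the health check is just a flag; in practice the decision to switch may be taken automatically by monitoring software or manually by an operator, as the text notes.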
HKO strives to provide timely weather forecasts and warnings to the public, and the operations of CFO must never be interrupted, whatever the weather. HKO therefore designs all critical systems with redundancy in mind. Each system has a backup which can immediately take over operation if its counterpart fails unexpectedly. Once a system problem is identified, ITMD decides whether it is necessary to switch the backup system to operational mode. If so, ITMD repairs the malfunctioning system as soon as possible so that it can return to normal operation. There is thus no need for forecasters to worry about being blind to the sky.