היום הותר לפרסום.
Сегодня разрешено к публикации.
Cleared for publication today.
In 2005, I was serving in a technological unit of the IDF (Israel Defense Forces). There are bugs in Java history that stay with you for a lifetime, and one such incident happened exactly on my watch. It was an issue with handling the Spring Forward daylight saving time transition in Sun JDK 1.4.2. When attempting to parse a non-existent time, the system fell into a 100% CPU infinite loop. Tomcat hung, thread pools were exhausted, and the hardware literally overheated and shut down via thermal trip.
Below is the chronology of how I investigated this incident, and the mechanics of the bug itself.
There is more below.Ниже есть продолжение.
1. Schrödinger's Time Zone: How Time Was Erased in Israel
In the early 2000s, the Asia/Jerusalem time zone was an absolute nightmare for developers. Unlike Europe or the US, Israel did not have a rigidly fixed rule for the clock switch. The dates for transitioning to summer and winter time were approved annually by the Knesset and depended on the Jewish lunar calendar (specifically, the timing of Yom Kippur). Because of this, official patches containing ZoneInfo tables from Sun were constantly delayed or contained errors.
On the night of the transition, the time interval of 02:00–02:59 was physically erased from reality (becoming an absolute vacuum, ∅). If a server received a string with a time like 02:30:00 and attempted to parse it, the Java code collided with a void.
SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
sdf.setTimeZone(TimeZone.getTimeZone("Asia/Jerusalem"));
Date date = sdf.parse("2005-04-01 02:30:00"); // ← This is where everything broke
2. The Lenient Mode Trap and Transaction Salvage
By default, GregorianCalendar and SimpleDateFormat operate in a lenient mode. Instead of honestly rejecting the phantom date, the virtual machine attempted to «fix» the incorrect time.
In Sun JDK 1.4.2, this heuristic was fatally broken. When processing time inside the DST gap, the computeTime() method tried to calculate milliseconds from the Epoch, but the algorithm panicked upon hitting a contradiction between the local fields (HOUR_OF_DAY = 2) and the actual offset.
3. Anatomy of the Infinite Loop: How Code Boiled Processors
Here is what the loop inside the JVM looked like (simplified):
- Take the local time 02:30 and the winter offset (UTC+2 for Jerusalem).
- Calculate the UNIX timestamp.
- Check against the time zone rules — summer time (UTC+3) should already apply for this absolute time.
- Rollback. Apply the UTC+3 offset to the same 02:30.
- Calculate the new timestamp.
- Check again — for this timestamp, the barrier has not yet been crossed, so the UTC+2 offset must apply.
- Return to step 1.
The result was a classic infinite loop at the level of base classes. And this was not an abstract software freeze. The thread monopolized 100% of the core's capacity. Since there were no interrupts in the loop, the processor began to endlessly grind in vain.
When dozens of servers simultaneously hit 100% load, the air conditioners in the server room simply failed to handle the sudden spike in heat dissipation (ΔS → ∞). The temperature in the cold aisle began climbing rapidly. Ultimately, hardware sensors initiated a thermal trip — servers violently cut their power to save the processor crystals from thermal degradation and physical burnout.
4. Thread Pool Exhaustion: Investigation
When the nodes started dropping one after another, it became clear this was not an ordinary memory leak. The Garbage Collector (GC) and Heap Size tuning were useless. The real drama was unfolding inside the Tomcat servers. Every request containing an invalid Israeli timestamp permanently captured one thread. Within a few minutes, the pool was drained (Thread Pool Exhaustion), the Tomcat stopped responding to any pings, and then went into a thermal knockout.
sdf.setLenient(false); // ← A magic pill? Not quite.
With this setting, the infinite loop did not occur — the code threw java.text.ParseException: Unparseable date. The server did not overheat, but the system lost data packets.
5. Hot-fix via Reflection
Official patches (like tzupdater) merely updated the binary files on the disk, but the JVM kept the time zone tables in its RAM since startup and completely ignored disk changes.
We had to apply a radical but uniquely functional technique: a controlled cache failure via the Reflection API. We wrote code that breached the sun.util.calendar.ZoneInfo class and forcibly nullified its internal array:
import java.lang.reflect.Field;
import sun.util.calendar.ZoneInfo;
Field cacheField = ZoneInfo.class.getDeclaredField("cache");
cacheField.setAccessible(true);
cacheField.set(null, null); // Nullify the cache
It worked. On the next parse attempt, the JVM accessed memory, received a [CACHE_MISS], and was forced to reread the updated files from the disk.
Conclusion
In later versions of the JDK (and especially with the arrival of java.time in Java 8), this architectural hole was closed forever. Strict checks, iteration limits, and predictable behavior when hitting gap/overlap zones were added.
But for me, the Sun JDK 1.4.2 bug will forever remain the reference example of how a banal clock adjustment can boil a server.

No comments:
Post a Comment