IntelliJ IDEA 2016.3 Help

Tutorial: Java Debugging Deep Dive

Introduction

Debugging is one of the most powerful tools in any developer's arsenal. It gives us a unique insight into how a program runs and allows us to gain a much deeper understanding of the piece of code we debug. It allows us to trace running code and inspect the state and the flow of the execution. As part of that, it gives us the illusion of a sequential flow. This is very intuitive and powerful but also may be misleading as most modern applications are multithreaded.

"Debugging" suggests we deal with bugs but this is actually a misnomer. The information we get from debugging is useful even when there is no problem with the code. Finding bugs just happens to be a very common use case for the knowledge we can get from a debug session.

The IntelliJ IDEA debugger offers a rich experience that helps us to easily debug anything from the simplest code to complex multithreaded applications.

In this tutorial we will cover some of its features from a developer's point of view, taking you through a virtual debugging session of some extremely buggy code. After some general comments and an outline of basic debugging concepts, this tutorial dives into practical debugging by looking at different 'bug types' or assumptions we might make.

There is no single way to debug. For each type of issue, we cover several options the IntelliJ IDEA debugger offers to help us prove or refute our assumptions.

Before We Start

A word of caution: debugging is a very powerful tool but it does come with a cost. The debug process is part of the runtime and therefore affects it. Every evaluation of an expression happens using the same memory of the debugged application, and can modify and potentially corrupt the state.

During this tutorial, bear in mind that debugging is an intrusive approach that may affect the outcome of the debugged application. We will explore a few ways to minimize its impact and sometimes even exploit it.

The timing of execution is also very different when you debug code compared to running it. As we are about to see, this becomes a critical issue in multithreaded environments, when reproducing a bug sometimes depends on a very specific sequence of events, which a change in timing can make unreachable.

In a debugging session, the minimal debug tracking overhead in itself may already be enough to change the timing of events and therefore the application behaviour. Every breakpoint or log is a possible synchronization point, and stepping obviously changes the timings significantly.

An important point to remember is that debugging is not a substitute for understanding the code. In fact, the only way to learn from a debug session is to constantly compare the information the debugger shows us with our expectations from the code and how we think it "should" behave.

Before starting a debugging session we must have some knowledge of what we're trying to achieve by it. If we're looking for a bug, we need to roughly know what is incorrect, i.e. what is different from the expected behaviour or state. In most cases we will also have some initial assumption as to why things are wrong. These two pieces of information will dictate how our debugging session should be conducted.

The Basics: Finding Out What the Application Does

The simplest case for debugging is seeing how the program behaves given a certain input, either because of a bug, or just to understand the code. In this case, using a simple breakpoint (called a line breakpoint) that will suspend the JVM (or thread, see Breakpoint properties) and then stepping through the code is our bread and butter.

Stepping

Basic Stepping

The four basic options we will constantly use when stepping are:

  • Step Over (F8): step over a line of code.
  • Step Into (F7): step into a method called from this line of code.
  • Step Out (Shift+F8): step out of a method back to the calling code.
  • Resume Program (F9): resume debugging and stop at the next breakpoint.

IntelliJ IDEA will step into most code with two exceptions, described below.

Stepping Skipping and Force Step Into

Code can be configured to be skipped while stepping. Some of it, such as any com.sun.* classes, is skipped by default. This is defined in Preferences | Build, Execution, Deployment | Debugger | Stepping.

The motivation behind this is to reduce noise and skip unrelated code. Class loading, for example, is almost always unrelated to our application code, and so is a lot of third-party code, especially frameworks.

We can change the configuration to skip more or less, and we also have an overriding mechanism at runtime to step into "skipped code". That is what the Force Step Into option is for.

Debugging Without Source Code

If we don't have the source for specific code, IntelliJ IDEA will still decompile the class and show our steps in the decompiled source. This is very helpful, but note that the generated decompiled class may look different from the original, and if the lines do not match, debugging in decompiled code may be confusing. Always try to obtain the source code of the classes you want to step into.

Classes Compiled Without the Debug Flag

Code that was compiled without the debug flag cannot be debugged. There is no way to step into this code. When the debugger encounters such code during a debugging session, it will step over that part of the code.

Line breakpoints cannot be defined or hit in such code either. However, this is where the Method Breakpoint might save us, as we can still define in IntelliJ IDEA a breakpoint to stop before entry or exit from a specific method, even if the method itself was compiled without the debug flag.

When viewing the state, the call frame itself will appear in grey when not selected, and since the actual variables from within the method cannot be inspected, we will see a warning message instead.

Call frame appears in grey

Inspect State

Once running is suspended, we can inspect the state of the application. Based on the thread and the position we are in that thread, IntelliJ IDEA will show us the various variables and fields in scope and their values. This is good enough for almost all cases, but sometimes we want to enquire about something else.

Watch

If we want to know the value of a particular expression in every frame or thread, and across many suspended breakpoints, we can set a watch on an expression.

Type Renderer

Usually, the default rendering of a value into a displayed string is good enough. When it's not, in most cases it is because toString() was not overridden. This is simply a hint to us that we should implement toString() for this class.

Sometimes though, implementing toString() is not possible, or we will be better off with a specific Type Renderer for our purposes. For example, assume we're dealing with lists as queues: in all those lists, all we want to see is whether the list is empty, and if not, we want to see the last element of that list. Different lists will have different implementations and renderers. The renderer of an ArrayList for example, will show you only the first 100 elements. That will both contain lots of unneeded details and also may not contain what is really important for us. To help us, IntelliJ IDEA allows us to easily override the Type Renderer.

We choose to render any java.util.List object to show the "EMPTY" string when empty or the last element when it has one.

Updating the Type Renderer
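
To make the rule concrete, here is a minimal sketch of the same logic as a plain Java method. The class and method names are hypothetical; in the renderer dialog itself the rule would be entered as a single expression evaluated against the rendered object, presumably something close to isEmpty() ? "EMPTY" : get(size() - 1).

```java
import java.util.List;

final class ListAsQueueRenderer {
    private ListAsQueueRenderer() {
    }

    // Mirrors the renderer described above: "EMPTY" for an empty list,
    // otherwise the last element of the list.
    static Object render(List<?> list) {
        return list.isEmpty() ? "EMPTY" : list.get(list.size() - 1);
    }
}
```

Trying the expression as ordinary code first is a cheap way to check that it handles the empty case before relying on it in the Variables view.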

Evaluate Expression

Evaluate Expression is possibly the most useful feature of the IntelliJ IDEA debugger. It allows us to inspect values and evaluate specific expressions. For example, we can evaluate parts of a complex condition to figure out why it gave an unexpected result.

The expression can also call methods, test out scenarios and use parameter values that did not exist in the actual debugging session. These kinds of "what-if" scenarios are easy to explore with Evaluate Expression, which allows us to closely inspect the behaviour of certain methods or expressions.
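
As an illustration, consider the hypothetical method below (the names and the retry limit are assumptions, not part of the tutorial's demo code). When suspended on the return line, we could evaluate queue.isEmpty(), msg != null and retries < MAX_RETRIES one at a time to see which part makes the whole condition false, or evaluate shouldProcess(queue, "STOP", 0) as a "what-if" call with values that never occurred in the session.

```java
import java.util.List;

final class EvaluateExpressionDemo {
    static final int MAX_RETRIES = 3; // hypothetical limit, for illustration only

    // A compound condition whose parts are worth evaluating separately
    // when the overall result is not what we expected.
    static boolean shouldProcess(List<String> queue, String msg, int retries) {
        return !queue.isEmpty() && msg != null && !msg.isEmpty() && retries < MAX_RETRIES;
    }
}
```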

Unexpected Parameter or Call to Method

This section covers what to do if we know where things have already gone wrong, but don't know why.

Exploring Call Frames

A Line Breakpoint should be enough for most cases of detecting the cause behind an unexpected call or call with unexpected parameter values to a method. If we're not sure where it's being called from, we can put the breakpoint inside the method. When the VM is suspended, click on the previous call frames to view the call stack and inspect the state in each scope to see how we got here. If the frame information is unavailable it will appear in grey.

Call frames

Drop Frames: Replaying the Execution

If we want to step back through that method call to get more information by stepping, we can use the Drop Frame feature. This will allow us to go back up the stack and re-execute the code. It's a useful feature, but also potentially dangerous: we must be aware that re-executing the code will execute the same instructions twice, and if those instructions modify state we might end up in a corrupted state, and certainly in a scenario that would not happen in a normal run under the same conditions. To make the impact of Drop Frame obvious, consider this simple demo.

    public class DropFrameDemo {
        private static int state = 0;

        public static void main(final String[] args) {
            modifyStateBasedOnParameter(state);
            modifyStateBasedOnStaticField();
        }

        // Dropping the frame within this method after state++ has run,
        // and executing again, will print state = 3
        private static void modifyStateBasedOnStaticField() {
            state++;
            System.out.println("state = " + state);
        }

        // Dropping the frame from within this method,
        // and executing again, will print state = 1
        private static void modifyStateBasedOnParameter(final int parameter) {
            state = parameter + 1;
            System.out.println("state = " + state);
        }
    }

Dropping the frame inside modifyStateBasedOnParameter() will not impact the state, because IntelliJ IDEA remembers the parameter values passed in to that frame and will not recalculate them: re-executing the method sets state to 1 again. However, dropping the frame inside modifyStateBasedOnStaticField() after state++ has executed, and then running the method again, increments the field a second time and leaves state equal to 3. That value is impossible under a normal run of main(), which ends with state equal to 2.

Method Breakpoint

An alternative to having a line breakpoint defined within the problematic method is to define a Method Breakpoint. This type of breakpoint is not attached to a source code line number, but to the entry and exit of a call to a method. It is especially useful in cases where we don't have the source code, only a decompiled version, and we still want to inspect the ins and outs of the method call, without any confusing differences in line numbers between the compiled class and the decompiled source code.

Unexpected Value of Field

An unexpected field value can be viewed as something that went wrong before the field was accessed, which is why all of the approaches above can help detect the problem. Sometimes though, especially if the field is accessible by more than just its getter and setter, it's not easy to find which flow started the problem. For that scenario, IntelliJ IDEA provides us with another useful option.

Field Watchpoint

A Field Watchpoint is really another type of breakpoint. It is a breakpoint at any point we either read from or write to a specific field. This is especially useful in cases where the field is being accessed all over the code, or we're not sure about all the places it is being accessed from.

This is another case where a difficulty in debugging is a hint to the developer. If possible, we should try controlling the access to that field by an access modifier and using getters and setters to make it easier to manage.

Unexpected Exception Thrown

Analyse Stacktrace

Although not strictly a debugging feature, when we want to investigate why an exception was thrown, we can analyse the exception stack trace and quickly get to the line of code that generated that exception. From there, the usual combination of Line Breakpoint and Stepping is enough to figure out what is wrong.

Analyze Stacktrace

Sometimes, however, the exception is wrapped in another exception or caught, and we only see its side effects but not its stack trace. For that we have a special type of breakpoint.

Exception Breakpoint

We can define an Exception Breakpoint for a specific type of Exception, or any type, and the breakpoint will be triggered whenever the matching exception is thrown. This is extremely useful in cases where we don't know where the exception is generated from.
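
A minimal sketch of the kind of code where this helps (the class and method are hypothetical): the exception is caught and replaced by a fallback value, so no stack trace ever reaches the console and we only see the side effect. An Exception Breakpoint on java.lang.NumberFormatException would still suspend at the point where the exception is thrown.

```java
final class SwallowedExceptionDemo {
    private SwallowedExceptionDemo() {
    }

    // Returns the parsed value, or the fallback when parsing fails.
    // The exception is swallowed here, so the caller only observes the fallback.
    static int parseOrDefault(String raw, int fallback) {
        try {
            return Integer.parseInt(raw.trim());
        } catch (RuntimeException e) {
            // Covers NumberFormatException for bad input and
            // NullPointerException for a null argument.
            return fallback;
        }
    }
}
```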

Debugging Multithreaded Applications

Multithreaded applications are the biggest challenge to debug. These applications are a lot less deterministic and much harder to control. The illusion of sequential flow we get from Stepping in a debug session does not help either.

In a multithreaded environment, and in particular when investigating issues that can be concurrency bugs, we need to try to Step less and fine tune our Breakpoints more. This is because a lot of the concurrency bugs depend on a specific interaction between the different threads, and an intrusive debugging session will interfere with that. We'll show how using various Breakpoint properties allows us to limit the interference to a minimum.

The other important topic is controlling and switching between the different threads in the application. We'll go through some examples of debugging different concurrency bugs to demonstrate how IntelliJ IDEA's features help with this.

Breakpoint Properties

Breakpoint properties in the IntelliJ IDEA debugger allow us to control the actions taken when a breakpoint is triggered. Some of them define the action, and others add further conditions on whether to take the action at all. This fine level of control over the breakpoints is critical for concurrency bugs, because most will only be reproduced when the threads interact in a very specific way. Any interference from the breakpoints may prevent us from reproducing the bug.

Breakpoint Actions

Deciding on the breakpoint action depends on what we want to achieve in the debugging session.
Suspend VM
If we want to view all the threads, with their state and call frames, when one thread reaches a specific point in the code, or when a certain condition occurs, then suspending the entire VM is our best approach.
Suspend Thread
In many cases, suspending only the thread and not the whole VM is preferable. This is especially true when the application is part of a larger system and suspending the VM will cause either an overflow of messages waiting to be served, or request timeouts that end up breaking the entire system. When we have many worker threads, it is better to keep almost all working and focus on the one thread which manifests the problem.
Log
When we deal with a concurrency bug, any suspension of execution may prevent us from reproducing the bug. We can opt to make the breakpoint not suspend anything, just log either a message or the value of a particular expression to the console, then inspect the log. This works well when we have a strong theory about what exactly we are looking for.

Breakpoint Conditions

Apart from being convenient, breakpoint conditions let us minimize the intrusive nature of the debugging session. They allow us to limit the breakpoint actions to only what we see as absolutely essential.

Conditional Expression
The most widely used condition. Allows us to trigger the breakpoint only when our application reaches a specific state. Ideal if we can define an expression that captures when things start to go wrong.
Pass Count
Pass Count is useful in code that runs many times, such as an event handler or a loop, where the interesting scenario we're after only manifests itself after several passes.
Remove Once Hit
Remove Once Hit is especially useful when the breakpoint action is to log rather than suspend, since when suspended we could remove or disable the breakpoint ourselves. We use this when the code is being hit many times but only the first hit is interesting.
Filters
Allows us to filter the triggering by class or specific instance.
Dependency on Other Breakpoint
A very useful feature. Its obvious use is as a filter: the breakpoint is triggered only after another breakpoint was hit, for scenarios where we're interested in a visit to a method, or in a specific state in the code, only after another state was reached. But as well as that, we can use it to reproduce a particular concurrency issue, as it can help us suspend threads and control which thread reaches what particular line in the code and in what order.
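
To show where such conditions would be attached, here is a small hypothetical event loop (not part of the tutorial's demo code). The comments indicate what might be typed into the breakpoint's properties; the exact values are assumptions chosen for illustration.

```java
final class EventLoopDemo {
    private EventLoopDemo() {
    }

    // Counts the negative events; stands in for a handler that runs many times.
    static int countNegative(int[] events) {
        int negatives = 0;
        for (int i = 0; i < events.length; i++) {
            int event = events[i];
            // A line breakpoint on the next line could use either:
            //   Condition:  event < 0   (suspend only for the suspicious events)
            //   Pass count: 1000        (skip the early, uninteresting hits)
            if (event < 0) {
                negatives++;
            }
        }
        return negatives;
    }
}
```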

If the Debugging Session is Too Slow

Method Breakpoint and Field Watchpoint slow down the code execution considerably. When executing the same code a huge number of times, even conditional breakpoints slow down the processing enough for it to be noticeable. This is a real issue because the scenario of an event handler processing many events is fairly common, and evaluating a breakpoint condition inside that event handling code can slow down the system to an unusable state. To overcome that, assuming we can modify the running code, we can improve the speed by employing a little trick we shall call "breakpoint in code".

This trick is very useful when we debug processing of millions of events where only one causes a problem and we have no idea in advance which is the problematic one, and can save us a lot of waiting for a conditional breakpoint to be triggered.

The fastest code is the executed code that was compiled and optimized by the JVM. We want to use that fact and so, instead of writing a condition on a breakpoint, we introduce it to our executed code in a way that we can manipulate later. We then debug without any breakpoint, thus running in the fastest way a debugging session can run, and introduce a breakpoint only when we actually know we need to hit it.

Breakpoint in Code

  1. We introduce a loop to the code with our condition. This means we enter the loop only if the interesting state occurs. We then print something to the console so we will know when the code has entered the loop. Because the loop does not change any state, once we enter the loop we will stay inside it.
    while (bugCondition(msg)) {
        System.out.println("gotcha!");
        try {
            Thread.sleep(1_000);
        } catch (InterruptedException e) {
            // ignore
        }
    }
  2. At this point we initiate the debug session, sit and wait for the "gotcha!" to appear. The console will show us we "hit" the "breakpoint".
    Console shows breakpoint hit
  3. When it does appear, we introduce a real line breakpoint inside the loop. The breakpoint will be hit and suspend the VM or thread. Now we can inspect the event and its state. If that is not enough, the last thing to do is to make our code exit the loop. There are two options to do that.

    • We can take advantage of Evaluate Expression to evaluate a code fragment that will actually modify the loop condition to false. This is easily done if we use a field or variable as the condition of the loop, since we can then modify its value.

      boolean enterLoop = bugCondition(msg);
      while (enterLoop) {
          System.out.println("gotcha!");
          try {
              Thread.sleep(1_000);
          } catch (InterruptedException e) {
              // ignore
          }
      }
      Use Evaluate Expression to stop the loop
    • We can exit the loop by using another feature of a debugging session, HotSwap. This allows us to modify the running code during debugging, compile it and then IntelliJ IDEA will hot swap the debugged classes with the new versions. We change the loop condition to false, exit the loop and continue to debug. By default, IntelliJ IDEA will detect that a class has a new version and will ask us whether to reload the class with the new version.

      Reload classes?

Race Condition: Corrupted State

A race condition is a common issue in multithreaded applications. Multiple threads access and modify the same state, potentially corrupting it or causing undesired flow. A race condition is a very subtle bug and usually hard to reproduce. That is because it is only reproduced when the threads work in a very specific order; other runs will look fine.

Sometimes race conditions only occur once every tens or hundreds of runs of the system. If we suspect there is a race condition in our multithreaded code, we must always make sure that the intrusive nature of the debugging session does not make the issue impossible to reproduce. For example, here we've created a system of publishers and subscribers; however, all our subscribers share a primitive (and non-atomic) counter to count the total consumed messages.

    private class Subscriber implements Runnable {
        @Override
        public void run() {
            while (true) {
                String msg = messageQueue.poll();
                if (msg != null) {
                    if (msg.equals(STOP)) {
                        break;
                    } else {
                        // race condition right here!
                        counter++;
                    }
                }
            }
        }
    }

When looking for a race condition among threads, the debug run must be as minimally intrusive as possible.

Once we're satisfied the issue can be reproduced in debug mode, we try setting a breakpoint with logging instead of suspending the program execution. Here again, just the fact we are logging from all threads to the same console may "synchronize" the threads in such a way that will "solve" the bug. We need to be sure we can still reproduce it even if now it might take more attempts. Logging the suspected state can narrow down our options and allow us to see that the problem is not with the number of calls to the method but with the counter field.
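
Before involving the debugger at all, it can help to confirm in a plain run that increments really are being lost. The following is a minimal, self-contained sketch of that check; the class and method names are hypothetical, and AtomicInteger appears only as a reference point for the expected total, not as a statement about how the tutorial's demo is fixed.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class CounterRaceDemo {
    private static int plainCounter; // shared and non-atomic, like the tutorial's counter
    private static final AtomicInteger atomicCounter = new AtomicInteger();

    private CounterRaceDemo() {
    }

    // Runs the increment `repeats` times on each of `threads` threads and waits for all of them.
    private static void runConcurrently(int threads, int repeats, Runnable increment) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < repeats; i++) {
                    increment.run();
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
    }

    // May return less than threads * repeats, because counter++ is a read followed by a write.
    static int countWithPlainInt(int threads, int repeats) {
        plainCounter = 0;
        runConcurrently(threads, repeats, () -> plainCounter++);
        return plainCounter;
    }

    // Always returns threads * repeats.
    static int countWithAtomic(int threads, int repeats) {
        atomicCounter.set(0);
        runConcurrently(threads, repeats, atomicCounter::incrementAndGet);
        return atomicCounter.get();
    }
}
```

Comparing the two results over several runs gives a rough idea of how often the bug shows up without a debugger attached, which is the baseline to compare against once breakpoints are introduced.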

Race Condition: Subtle

A race condition such as the one in our previous example will, on most machines, turn out to be a "subtle" race condition. By "subtle" we mean that any modification or change to the runtime environment can "fix" it.

    // race condition right here!
    counter++;

To reproduce the bug we need two threads, both reading the same value: the "second" thread must read the value before the "first" one updates it and flushes its CPU cache. Easy enough to create on multi-core machines during a normal run, but almost impossible to reproduce in a debugging session.

Logging, via a breakpoint, at that same point synchronizes the threads, as they all need to write to the same log. Most likely, this also flushes the CPU caches of all threads, as writing to the log is atomic. In short, it prevents us from reproducing the bug. Suspending either the VM or the thread cannot help us here either, as we can't separate the two instructions (reading the counter value and incrementing it) to break between them. At this point, we need to make some assumptions then prove or refute them. Since we cannot use any breakpoints, our only hope is that we can change the actual executed code and introduce new code that will be compiled and therefore will interfere less.

This is very much a last resort option. A good pattern to help us here is a trace buffer.

Trace Buffer

We can introduce an internal buffer and store the interesting values into this buffer. We must make sure that:
  • We have a buffer per thread, and those buffers are isolated so this does not introduce new concurrency issues.
  • Because the buffer is per thread, it does not need to be thread safe, and therefore does not introduce unwanted synchronization points.
  • The values we insert are not references to real state that can change, but copies or log messages.
  • The introduced code is as minimal as possible, to minimize its effect on the running code.
  • We print or log the contents of the buffers only after the execution has ended, to avoid making the logging action a synchronization mechanism between threads. Another option is to only store the values in the trace buffer, then inspect its contents by putting a breakpoint after the critical part of the code has finished executing.
    private class Subscriber implements Runnable {
        private int index = 0;
        private final int[] traceBuffer = new int[NUMBER_OF_SUBSCRIBERS_AND_PUBLISHERS * 100];

        @Override
        public void run() {
            while (true) {
                String msg = messageQueue.poll();
                if (msg != null) {
                    if (msg.equals(STOP)) {
                        break;
                    } else {
                        traceBuffer[index++] = counter;
                        // race condition right here!
                        counter++;
                    }
                }
            }
        }
    }

For example, here we introduced a primitive int array large enough for all messages, and in order to prove our suspicion of a bug in the counter, we store just the counter values before advancing it. Yes, it may not be the exact value advanced by the counter, but it will reproduce the bug in terms of having the same counter value reported by several threads. After all the events are done, we can inspect the trace buffers and find the duplicates.
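
The final inspection can be done by eye in the debugger, or with a small helper run after all subscriber threads have finished. The sketch below is one possible way to do it; the class and method names are hypothetical, and it assumes each buffer is passed together with the number of slots actually written (the index field of its subscriber).

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

final class TraceBufferAnalysis {
    private TraceBufferAnalysis() {
    }

    // Returns, in ascending order, every counter value that was recorded more than once.
    // buffers[b] is the trace buffer of subscriber b; usedLengths[b] is how many
    // entries of that buffer were actually written.
    static Set<Integer> findDuplicates(int[][] buffers, int[] usedLengths) {
        Set<Integer> seen = new HashSet<>();
        Set<Integer> duplicates = new TreeSet<>();
        for (int b = 0; b < buffers.length; b++) {
            for (int i = 0; i < usedLengths[b]; i++) {
                int value = buffers[b][i];
                if (!seen.add(value)) {
                    duplicates.add(value); // already recorded once: a lost update
                }
            }
        }
        return duplicates;
    }
}
```

Passing the used length separately matters because unused slots of an int array hold 0, which would otherwise be reported as a duplicate counter value.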

Race Condition: Unexpected Flow Control

    private class Subscriber implements Runnable {
        @Override
        public void run() {
            String msg;
            while (true) {
                msg = messageQueue.poll();
                if (msg != null) {
                    if (msg.equals(STOP)) {
                        break;
                    }
                    // else do something
                }
            }
            // Will NOT work with multiple subscribers, as main thread will
            // wake up when the first subscriber is done.
            // Using a CountDownLatch here is a much better approach.
            synchronized (messageQueue) {
                messageQueue.notify();
            }
        }
    }

In this example we have another race condition, but the contended shared state is not directly visible and is only deduced by what seems like a wrong flow control: the first subscriber will wake up the main thread, which will exit even if the second thread is still processing a message. We can't inspect or print the waiting thread in this case, but we can inspect the position of the various threads when we suspend the entire application.

  1. First we can suspend the main thread after it wakes up. We can see that sometimes, one of the subscriber threads is still marked running.
  2. We can suspend only one of the subscriber threads. That will cause the other thread to notify the main thread. This will prove to us the problem can happen in any subscriber and is in its logic.
  3. Then to be sure, we can suspend the application earlier, just before it notifies the main thread. We can then inspect the status of the two subscriber threads and prove that one of them is still polling, while the other has already finished and notifies the main thread it's done.

    Suspending the application before main thread notified

    We can see from inspecting the threads that both are marked 'RUNNING', which means that while the first is about to notify the main thread it is done, the other is still processing messages.

  4. To prove our assumption beyond any doubt we can also put a breakpoint inside the polling loop of the subscribers. We make that breakpoint dependent on the previous breakpoint, the one just before we notify the main thread we're done. Hitting the dependent breakpoint (as shown below) proves our theory.

    Breakpoint inside the subscriber
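
The comment in the Subscriber code suggests a CountDownLatch as the better approach. A minimal sketch of that idea follows; the class and method names are hypothetical, the polling loop is omitted, and the waiting side stands in for the main thread's wait, which the tutorial does not show.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

final class LatchShutdownSketch {
    private LatchShutdownSketch() {
    }

    // Starts the given number of subscriber threads and returns how many of them
    // had finished by the time the calling thread was released.
    static int runAndAwait(int subscribers) {
        CountDownLatch allDone = new CountDownLatch(subscribers);
        AtomicInteger finished = new AtomicInteger();
        for (int i = 0; i < subscribers; i++) {
            new Thread(() -> {
                try {
                    // ... poll the queue until STOP arrives (omitted) ...
                    finished.incrementAndGet();
                } finally {
                    allDone.countDown(); // takes the place of messageQueue.notify()
                }
            }, "SubscriberThread-" + i).start();
        }
        try {
            allDone.await(); // released only after every subscriber has counted down
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for subscribers", e);
        }
        return finished.get();
    }
}
```

Because the latch is released only when its count reaches zero, the waiting thread can no longer wake up after the first subscriber alone has finished.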

Deadlock

A deadlock occurs when two threads conflict in such a way that each prevents the other from working at all. Once they occur, deadlocks are easy to spot by looking at the frames of all threads. We can do this by using Dump Threads. This feature is also available when running without the debugger. If we know we're chasing a deadlock, running mode is even preferable to debugging, because this way we do not interfere with the execution at all, and the snapshot will be the output of a Java thread dump of the application. A thread dump can detect deadlocks and warn about them. For example, in the dump below we can see the process found 1 deadlock between the PublisherThread (which is stuck in line 44) and SubscriberThread (in line 78).

Thread dump
  1. In this example, we can see that both threads are stuck waiting for a lock, which means that another thread is not freeing those locks. We can also see that both are waiting for different locks, as the synchronizer id is different. Even more informative are the lines at the top: they tell us that each of the two deadlocked threads is holding the lock the other thread is trying to obtain.
  2. This should already give us plenty of information on how the deadlock occurs. If it is still unclear how our code reached a deadlock, we can then try debugging with breakpoints just before we hit the lines provided by the thread dump. When we have a theory of what is wrong, we can try to reproduce the scenario using dependency between breakpoints.
  3. We can now create a Suspend Thread breakpoint on one of those threads and verify, using another snapshot, that the other thread has reached its deadlock position.

    Breakpoint in Publisher thread
  4. Now we can inspect the state just before one of the threads gets deadlocked.
  5. Another option is to put suspend thread breakpoints on both threads and switch between them. Inspecting the states of the Publisher and Subscriber in this example will show us the confusion that caused the deadlock.

    Breakpoint in the publisher
    Breakpoint in the subscriber
  6. When we inspect our lock instances we can see that the concurrent code was actually correct, but we confused the read lock and write lock between the two objects.
  7. And indeed, when we then inspect the constructors (where we injected those locks), we can see the bug:
    ThreadGroup threadGroup = new ThreadGroup("Demo");
    new Thread(threadGroup, new Subscriber(messageQueue, readLock, writeLock), "SubscriberThread").start();
    // passing locks in the wrong order will cause deadlock between publisher and subscriber
    new Thread(threadGroup, new Publisher(messageQueue, writeLock, readLock), "PublisherThread").start();
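
The deadlock warning in a thread dump can also be approximated from inside the application through the standard java.lang.management API. The sketch below is only an illustration of that check, with a hypothetical class name; it is not how IntelliJ IDEA produces its dump.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

final class DeadlockProbe {
    private DeadlockProbe() {
    }

    // Returns the names of all threads the JVM currently reports as deadlocked,
    // or an empty list when no deadlock is detected.
    static List<String> deadlockedThreadNames() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] ids = threads.findDeadlockedThreads(); // null when there is no deadlock
        List<String> names = new ArrayList<>();
        if (ids == null) {
            return names;
        }
        for (ThreadInfo info : threads.getThreadInfo(ids)) {
            if (info != null) { // a thread may have terminated in the meantime
                names.add(info.getThreadName());
            }
        }
        return names;
    }
}
```

In the scenario above, a call made from a third thread after the application hangs would be expected to list PublisherThread and SubscriberThread.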

Livelock

A livelock is a scenario where threads are not blocked, but are still unable to make progress. From the outside, a livelock behaves just like a deadlock, but because the threads are not blocked, the snapshot (thread dump) will not alert us to any deadlock.

  1. One strategy to try before starting debugging, is to repeat the Thread Dump several times then compare the stack traces for the various threads. This should give us a clear view of the problematic areas in the code, in most cases a loop that the code cannot escape from.

    Livelock snapshot
  2. If we're still unsure, we can use a Pass Count with a large number that we assume we will only reach in a livelock situation.

    Using pass count
  3. Then we can Step and verify exactly what area of code is being executed but not progressing. At this point we can use a Conditional Expression to capture the point in the execution when we enter the livelock situation.

    Using a conditional to capture livelock situation

    In this example, we assume that the STOP message failed to break us out of the loop, so either it was never sent or it was not handled.

  4. Our breakpoint is hit, meaning the STOP message was not handled.

    Breakpoint hit
  5. We step in, inspect state with Evaluate Expression and find the bug. The 'valid' method does not support STOP as a valid message and puts us in this livelock scenario.

    Using Evaluate Expression to test the theory
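
The tutorial does not show the code behind this livelock, so the following is a hypothetical reconstruction of the pattern: a validation method that rejects the STOP control message, so the polling loop never reaches the exit check. All names are assumptions, and the loop is bounded by maxPolls only so that the sketch terminates.

```java
import java.util.Queue;
import java.util.function.Predicate;

final class LivelockSketch {
    static final String STOP = "STOP";

    private LivelockSketch() {
    }

    // Hypothetical buggy validation: only data messages count as valid.
    static boolean validBuggy(String msg) {
        return msg.startsWith("msg-");
    }

    // Hypothetical fix: the control message is valid too.
    static boolean validFixed(String msg) {
        return STOP.equals(msg) || validBuggy(msg);
    }

    // Returns true if the loop exited via STOP, false if it was still polling
    // after maxPolls iterations (the livelock, made finite for this sketch).
    static boolean consume(Queue<String> queue, Predicate<String> valid, int maxPolls) {
        for (int polls = 0; polls < maxPolls; polls++) {
            String msg = queue.poll();
            if (msg == null || !valid.test(msg)) {
                continue; // nothing usable: poll again
            }
            if (STOP.equals(msg)) {
                return true;
            }
            // else: process the message
        }
        return false;
    }
}
```

Evaluating the validation call with STOP as the argument while suspended inside the loop is the kind of check that exposes this sort of bug.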

Summary

Debugging is an extremely powerful tool that allows us to view the state and flow of the source code we run. Via the use of examples of buggy code and some ideas on how to find the bugs using debugging, we have explored many of its features and benefits. We've also looked at its limitations and how to minimize them.

Ultimately, debugging is a way to gain a lot of information. When we look for a bug, we must always compare that information with our expectations, and pay close attention when the code deviates from them, as that is the point where debugging is most effective. That is the point where we learn.

Last modified: 29 November 2016