Control Flow and Data Flow Analysis

Static code analysis is important for finding and fixing bugs and security weaknesses before your code is executed. Solving these issues earlier in the development process saves you time, keeps technical debt from accumulating to an unmanageable level, and lets you fix problems before they have serious consequences.

One of the most important functions of static code analysis is highlighting potentially deep-set bugs that could settle into your codebase, so you can remove them before they impact your product’s performance.

Two elements central to this process are control and data flow analysis. These two methods of analyzing code work differently, but complement each other to minimize potential security vulnerabilities in your codebase.

Let’s take a closer look at what control and data flow analysis are, explore the differences between them, and discover their role in effective code analysis.

What is data flow analysis (DFA)?

Data flow analysis (DFA) is a technique that analyzes and tracks the flow of data and the way it moves through a program. It determines data paths and highlights how the program defines, uses, and shares data.

This process makes it possible to identify how variables change over time or across branches. Often in combination with other analyses, it supports detecting potential vulnerabilities and errors before you run your code, including conditions that are always true or always false, endless loops, and null pointer dereferences.

DFA’s role is crucial in finding and fixing security and quality problems in your code before execution. It also improves efficiency by helping you remove dead code, unused variables, and redundant computations.

Static data flow analysis is performed without running code. But DFA also exists as a dynamic process, which tracks how data moves during runtime. This process supports static DFA by tracking variable states and highlighting errors and anomalies during execution.

Taint analysis also uses DFA to spot security risks. This data flow technique focuses on security by tracking data from untrusted sources (such as user input) as it flows through the program. When tainted data reaches a sensitive sink without being sanitized, the analysis flags a possible vulnerability, such as SQL injection.
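The source-to-sink idea can be illustrated with a minimal sketch. It assumes a toy straight-line program encoded as tuples; the operation names (read_input, concat, escape, run_query) are hypothetical and do not correspond to any real API or to how Qodana implements taint analysis.

```python
# Each statement is (target, operation, argument_names). All names are hypothetical.
PROGRAM = [
    ("name", "read_input", []),          # source: untrusted user input
    ("greeting", "concat", ["name"]),    # taint propagates through concat
    ("safe", "escape", ["name"]),        # sanitizer: result is treated as clean
    (None, "run_query", ["greeting"]),   # sink receives tainted data
    (None, "run_query", ["safe"]),       # sink receives sanitized data
]

SOURCES = {"read_input"}
SANITIZERS = {"escape"}
SINKS = {"run_query"}


def find_tainted_sinks(program):
    """Return the indexes of sink statements that receive tainted arguments."""
    tainted = set()
    findings = []
    for index, (target, op, args) in enumerate(program):
        uses_taint = any(arg in tainted for arg in args)
        if op in SINKS and uses_taint:
            findings.append(index)
        if target is None:
            continue
        if op in SOURCES or (uses_taint and op not in SANITIZERS):
            tainted.add(target)
        else:
            tainted.discard(target)  # reassignment with clean data clears taint
    return findings


print(find_tainted_sinks(PROGRAM))  # only the query built from "greeting" is flagged
```

A real analyzer would also follow branches, loops, and calls between functions; this sketch only walks statements in order.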

Types of data flow analysis

There are several common types of data flow analysis, all of which look at different aspects of how data moves through a program. The information produced by each DFA type is used to identify specific issues and highlight potential actions for your development team.

Data flow analysis can operate in either a forward or a backward direction. The direction you need depends on the problem you’re looking to solve.

Forward data flow analysis follows the natural execution order of the program. It determines what information reaches certain points in your code by propagating information from earlier statements to later ones. It’s commonly used to understand how variable definitions, values, or expressions flow through a program.
Backward data flow analysis propagates information in the opposite direction, from the end of the program to the beginning. It determines what information is required later in execution and how future uses of variables influence earlier statements. This analysis moves from successors to predecessors in the control-flow graph.

Reaching definitions

This forward analysis tracks variable definitions and identifies, for each point in the program, which definitions can reach it, that is, which assignments could supply the value seen by a particular use. The same information can be used to trace data flow from sources to sinks as part of taint analysis.

Purpose: Identifies variable uses you can optimize (for example, by replacing a variable with its known value) and dead code you can eliminate.
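As a rough illustration, reaching definitions can be computed with a worklist over a small control flow graph. The graph encoding below (block names, definition labels such as d1) is an assumption made for this sketch, not a format used by any particular tool.

```python
# Hypothetical CFG: block name -> (definitions made in the block, successor blocks).
# A definition is (label, variable), e.g. ("d1", "x") stands for "d1: x = ...".
CFG = {
    "entry": ([("d1", "x"), ("d2", "y")], ["then", "else"]),
    "then":  ([("d3", "x")], ["join"]),   # redefines x on one branch only
    "else":  ([], ["join"]),
    "join":  ([], []),
}


def reaching_definitions(cfg):
    """Forward may-analysis: definitions that can reach the start of each block."""
    preds = {name: [] for name in cfg}
    for name, (_, succs) in cfg.items():
        for succ in succs:
            preds[succ].append(name)

    reach_in = {name: set() for name in cfg}
    reach_out = {name: set() for name in cfg}
    worklist = list(cfg)
    while worklist:
        name = worklist.pop(0)
        defs, succs = cfg[name]
        # Merge step: union over all predecessors.
        reach_in[name] = set().union(*(reach_out[p] for p in preds[name]))
        # Transfer step: each new definition kills earlier definitions
        # of the same variable, then becomes a reaching definition itself.
        out = set(reach_in[name])
        for label, var in defs:
            out = {d for d in out if d[1] != var}
            out.add((label, var))
        if out != reach_out[name]:
            reach_out[name] = out
            worklist.extend(succs)  # successors must be re-examined
    return reach_in


result = reaching_definitions(CFG)
print(sorted(result["join"]))  # both d1 and d3 may define x at the join point
```

Because two different definitions of x reach the join block, x cannot be replaced with a single known value there, while y has exactly one reaching definition.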

Live variable analysis

This backward analysis determines, at each point in the program, whether a variable is live, meaning its current value may still be needed later in execution.

Purpose: Helps you spot inefficient code patterns, such as variables whose storage can be reused once they are no longer live, and assignments you can safely remove because the value is never used.
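A minimal sketch of the backward direction follows, assuming each block is summarized by the variables it uses before defining them and the variables it defines. The block names and encoding are hypothetical.

```python
# Hypothetical CFG: block -> (variables used before being defined in the block,
#                             variables defined in the block, successor blocks)
CFG = {
    "b1": (set(), {"a", "b"}, ["b2"]),   # a = 1; b = 2
    "b2": ({"a"}, {"c"},      ["b3"]),   # c = a + 1
    "b3": ({"c"}, set(),      []),       # return c
}


def live_variables(cfg):
    """Backward may-analysis: variables whose values may still be read later."""
    live_in = {name: set() for name in cfg}
    live_out = {name: set() for name in cfg}
    changed = True
    while changed:
        changed = False
        for name in reversed(list(cfg)):
            use, defs, succs = cfg[name]
            # Information flows from successors back to predecessors.
            out = set().union(*(live_in[s] for s in succs))
            new_in = use | (out - defs)
            if out != live_out[name] or new_in != live_in[name]:
                live_out[name], live_in[name] = out, new_in
                changed = True
    return live_in, live_out


live_in, live_out = live_variables(CFG)

# Block-level approximation: a definition that is not live on exit from its
# block is a candidate dead store. This holds here because no block in this
# example reads its own definitions after writing them.
dead_stores = {
    name: defs - live_out[name]
    for name, (_, defs, _) in CFG.items()
    if defs - live_out[name]
}
print(dead_stores)  # b is assigned in b1 but never read afterwards
```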

Available expressions

This forward analysis tracks which expressions have already been computed on every path to a given point, with none of their operands changed since. Any such expression can be reused rather than recalculated.

Purpose: Identifies optimization opportunities for reusing results of computed expressions and common subexpression elimination.
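The sketch below shows one way this could look on a small graph. Unlike reaching definitions, the merge step is an intersection, because an expression must be available on every incoming path. The tuple encoding of statements and expressions is an assumption made for illustration.

```python
# Hypothetical CFG: block -> (statements, successor blocks).
# A statement is (target, (operator, left_operand, right_operand)).
CFG = {
    "entry": ([("t1", ("+", "a", "b")), ("t0", ("*", "c", "d"))], ["left", "right"]),
    "left":  ([("t2", ("*", "a", "c"))], ["join"]),
    "right": ([("a", ("-", "c", "d"))], ["join"]),   # redefines a
    "join":  ([("t3", ("+", "a", "b"))], []),
}


def available_expressions(cfg, entry="entry"):
    """Forward must-analysis: expressions available at the start of each block."""
    universe = {expr for stmts, _ in cfg.values() for _, expr in stmts}
    preds = {name: [] for name in cfg}
    for name, (_, succs) in cfg.items():
        for succ in succs:
            preds[succ].append(name)

    avail_in = {name: set() for name in cfg}
    avail_out = {name: set(universe) for name in cfg}  # optimistic start
    changed = True
    while changed:
        changed = False
        for name, (stmts, _) in cfg.items():
            if name == entry:
                current = set()  # nothing has been computed before the entry
            else:
                current = set(universe)
                for pred in preds[name]:
                    current &= avail_out[pred]  # must hold on every path
            avail_in[name] = set(current)
            for target, expr in stmts:
                current.add(expr)
                # Assigning to target invalidates expressions that read it.
                current = {e for e in current if target not in e[1:]}
            if current != avail_out[name]:
                avail_out[name] = current
                changed = True
    return avail_in


avail = available_expressions(CFG)
print(avail["join"])  # c * d survives both branches; a + b does not
```

In this example, a + b has to be recomputed in the join block because one branch redefines a, whereas a later use of c * d could reuse the earlier result.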

Constant propagation

Constant propagation tracks variables whose values are known constants and the points in the program that use them. It helps you get rid of unnecessary computations and simplifies expressions. Like taint analysis, constant propagation builds on data flow analysis, although it targets optimization rather than security.

Purpose: Identifies constant folding opportunities (expressions whose values never change) and other optimization possibilities.
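Here is a minimal sketch of constant propagation with folding, restricted to straight-line code to keep it short. The program encoding and variable names are hypothetical; handling branches would also require merging constant information at join points.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

# Straight-line program; each statement is (target, expression).
# An expression is an int literal, a variable name, or (operator, left, right).
PROGRAM = [
    ("width", 4),
    ("height", 5),
    ("area", ("*", "width", "height")),
    ("total", ("+", "area", "padding")),  # padding is unknown at analysis time
]


def propagate_constants(program):
    """Replace variables with known constant values and fold constant expressions."""
    constants = {}
    rewritten = []

    def resolve(operand):
        if isinstance(operand, str):
            return constants.get(operand, operand)
        return operand

    for target, expr in program:
        if isinstance(expr, tuple):
            op, left, right = expr
            left, right = resolve(left), resolve(right)
            if isinstance(left, int) and isinstance(right, int):
                expr = OPS[op](left, right)  # constant folding
            else:
                expr = (op, left, right)     # partially simplified
        else:
            expr = resolve(expr)
        if isinstance(expr, int):
            constants[target] = expr
        else:
            constants.pop(target, None)      # target is no longer a known constant
        rewritten.append((target, expr))
    return rewritten


for statement in propagate_constants(PROGRAM):
    print(statement)
```

The multiplication for area is folded to a literal, and the use of area in total is replaced by that literal, while padding stays symbolic because its value is not known.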

What is control flow analysis (CFA)?

Control flow analysis (CFA) is a form of static code analysis that assesses every execution path in your program. It tracks the flow of control to understand processes and data manipulations. CFA creates a control flow graph (CFG) that represents all possible execution paths and routes in your program.

This analysis allows you to see how control passes between different parts of your codebase so you can understand its behavior. It makes it easier to spot structural issues before execution, improving the quality and security of your code.

CFA detects problems like unreachable code, infinite loops, faulty conditions that mean certain code can never execute, and exception handling paths that are never taken. The technique helps you find and fix improper branching that causes errors, and improve code readability.
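Unreachable code detection, for instance, can be sketched as a graph traversal: any block that cannot be reached from the entry block can never execute. The graph below is a hypothetical example in which a block was placed after a return statement.

```python
# Hypothetical CFG: block name -> successor blocks.
CFG = {
    "entry": ["check"],
    "check": ["return_early", "work"],
    "return_early": ["exit"],
    "work": ["exit"],
    "after_return": ["exit"],   # code placed after a return: no edge leads here
    "exit": [],
}


def unreachable_blocks(cfg, entry="entry"):
    """Return blocks that no path from the entry block can reach."""
    seen = set()
    stack = [entry]
    while stack:
        block = stack.pop()
        if block in seen:
            continue
        seen.add(block)
        stack.extend(cfg[block])
    return sorted(set(cfg) - seen)


print(unreachable_blocks(CFG))
```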

Control flow analysis is key to data flow analysis, as DFA operates along control flow paths. It helps you understand how data flows through your program and how the various parts interact. CFA and DFA work together to analyze structure, highlight issues, and help you fix them before you run your code.

Control flow path

Control flow paths show you the sequence of instructions your program can follow during execution. Control flow graphs (CFGs) visualize these paths: each path is a potential route your code can take from one stage to the next.

Control flow graphs include entry and exit blocks that show the control flow. They also contain decision points, which highlight where control can branch off, like in an if statement.

Tracing these paths shows you the logic your code follows through each stage. It can also make debugging easier by revealing how decisions and loops direct execution.

Examples of control flow paths include:

Linear sequences (straight-line code): The simplest execution order. Each segment of code has a single entry and exit, and instructions follow each other sequentially with no branches or loops.
If/else branching: A two-path system. When the if condition is met, the if block executes; otherwise, the else clause runs instead, so exactly one of the two blocks executes.
Loops: Repeatedly execute statements until an exit condition is met. A loop creates a cycle in the control flow graph.
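The three path shapes above can be seen in a small graph. The following sketch, using a hypothetical CFG, lists the loop-free paths from entry to exit and finds the back edge that a loop introduces.

```python
# Hypothetical CFG: block name -> successor blocks.
CFG = {
    "entry": ["cond"],                    # linear sequence
    "cond": ["then", "else"],             # if/else decision point
    "then": ["loop_head"],
    "else": ["loop_head"],
    "loop_head": ["loop_body", "exit"],   # loop condition
    "loop_body": ["loop_head"],           # back edge: creates a cycle
    "exit": [],
}


def simple_paths(cfg, node="entry", goal="exit", path=()):
    """List every path from node to goal that visits no block twice."""
    path = path + (node,)
    if node == goal:
        return [list(path)]
    found = []
    for succ in cfg[node]:
        if succ not in path:
            found.extend(simple_paths(cfg, succ, goal, path))
    return found


def back_edges(cfg, entry="entry"):
    """Find edges that point back to a block on the current DFS path (loops)."""
    edges = []
    on_stack = []
    visited = set()

    def visit(node):
        visited.add(node)
        on_stack.append(node)
        for succ in cfg[node]:
            if succ in on_stack:
                edges.append((node, succ))
            elif succ not in visited:
                visit(succ)
        on_stack.pop()

    visit(entry)
    return edges


for route in simple_paths(CFG):
    print(" -> ".join(route))
print(back_edges(CFG))
```

The if/else produces two distinct routes through the graph, and the single back edge from loop_body to loop_head is what marks the loop as a cycle.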

A number of inspections related to control flow paths appear in the list of JetBrains and Qodana inspections.

Importance of control and data flow analysis in static code analysis

DFA and CFA both play important roles in static code analysis, working together to help improve your code quality and security before runtime.

Data flow analysis identifies bugs such as uninitialized variables, optimizes performance by removing dead code, and improves security by detecting vulnerabilities earlier, allowing your developers to fix them before execution.

Catching errors in advance means being able to improve your code faster, as well as reducing issues during execution. It’s important for security analysis too, as taint analysis relies on DFA to monitor potentially untrusted data and prevent vulnerabilities before runtime, such as SQL injection and cross-site scripting (XSS).

CFA is key for static code analysis as it enables DFA. Understanding program execution paths with CFA highlights problems like dead code, infinite loops, unexpected states, and security check bypasses early on, allowing you to address them before execution.

DFA focuses on data and variables, keeping an eye on where data goes and how it changes, while CFA looks at the paths and routes data can take. When they work together, they make it easier to identify issues, fix bugs, and protect against security vulnerabilities without having to run your code.

Read our taint analysis guide to find out more about how DFA and CFA help identify vulnerabilities and improve security.
