CodeSurfer Technology Overview

Dependence Graphs and Program Slicing

GrammaTech, Inc.

Introduction

CodeSurfer, GrammaTech's program analysis, understanding, and inspection system for ANSI C, is based on system dependence graphs, a fundamental intermediate structure for representing programs. Slicing is a particular application of dependence graphs. Together they have come to be widely recognized as a centrally important technology in software engineering, with applications in program understanding, maintenance, debugging, testing, differencing, specialization, reuse, optimization, parallelization, and anomaly detection. Because they operate on the deep structure in programs rather than surface structures, they enable much more sophisticated and useful analysis capabilities than conventional tools. GrammaTech's implementation of components for program dependence graphs and slicing is probably the most advanced in existence.

This overview is structured as follows:

Background and Terminology

Slicing and Chopping. A backward slice consists of all program points that affect a given point in the program. For example, the backward slice shown in Figure 1 indicates exactly which statements influence the output of variable i.

void main() {
    int
i = 1;     int sum = 0;
    while (i<11) {
        sum = add(sum, i);
        i = add(i, 1);
    }
    printf("sum = %d\n", sum);
    printf("i = %d\n", i);
}

static int add(int a, int b)
{
    return(a+b);
}

Figure 1.  Backward slice from printf("i = %d\n", i);

A forward slice consists of all program points that are affected by a given point in the program. For example, the forward slice shown in Figure 2 indicates exactly which statements are influenced by the initialization of variable sum.

void main() {
    int
i = 1;     int sum = 0;
    while (i<11) {
        sum = add(sum
, i);
        i = add(i, 1);
    }
    printf("sum = %d\n",
sum);
    printf("i = %d\n", i);
}

static int add(int a, int b)
{
    return(a+b);
}
Figure 2.  Forward slice from sum = 0;

A chop [5, 6] consists of all program points affected between one point (the chop source) and another (the chop target). For example, the chop between the initialization of variable sum and the output of variable i is empty, reflecting the fact that there is no information flow between the source and target.

Dependence Graphs. The system dependence graph (SDG) representation of a program is illustrated in Figure 3 [2]. An SDG is a collection of procedure dependence graphs (PDGs) [3, 4], one for each procedure.


Figure 3.
The sample program and its system dependence graph.

The vertices of a PDG represent the individual statements and predicates of the procedure. A call statement is represented by a call vertex and a collection of actual-in and actual-out vertices: there is an actual-in vertex for each actual parameter, and there is an actual-out vertex for each actual parameter that might be modified during the call. Similarly, procedure entry is represented by an entry vertex and a collection of formal-in and formal-out vertices. If a procedure references a global variable, it is treated as a ``hidden'' input parameter, and thus gives rise to additional actual-in and formal-in vertices. Similarly, if a procedure assigns to a global variable, it is treated as a ``hidden'' output parameter, and thus gives rise to additional actual-out and formal-out vertices. Each PDG also contains vertices for its local-variable and parameter declarations (not illustrated in Figure 3).

The edges of a PDG represent the control and data dependences among the procedure's statements and predicates. A vertex is control dependent on a predicate (shown in blue) if the predicate controls whether or not the vertex will be executed. An entry vertex is treated as a pseudo-predicate. A vertex is data dependent on an assignment (shown in green) if the value assigned can be referenced in the vertex. There is a dependence edge from each declaration vertex to each of its definitions and uses (not illustrated in Figure 3).

The PDGs are connected together to form the SDG by call edges (which represent procedure calls, and run from a call vertex to an entry vertex) and by parameter-in and parameter-out edges (which represent parameter passing and return values, and which run from an actual-in vertex to the corresponding formal-in vertex, and from a formal-out vertex to all corresponding actual-out vertices, respectively).

Implementing Slicing and Chopping. Slices and chops are efficiently computable by reachability analysis in the program's SDG. For example, the backwards slice from the printf("i=%d\n",i); statement in Figure 1, which contains all computations that affect the value printed, can be computed by traversing edges backwards in the SDG from the print i node.

Some slicing algorithms are precise in the sense that they track dependences transmitted through the program only along paths that reflect the fact that when a procedure call finishes, control returns to the site of the most recently invoked call. Other algorithms are imprecise in that they safely, but pessimistically, track dependences along paths that enter a procedure at one call site, but return to a different call site. Precise algorithms are preferable because they return smaller slices. For example, the precise backward slice from the printf("i=%d\n",i); statement in Figure 1 does not include the statement sum=add(sum,1); even though it is reachable by traversing edges backwards from the print i node. Precise interprocedural slicing and chopping is one of the distinctive innovations of GrammaTech's technology.

Precision. Some slicing algorithms are precise in the sense that they track dependences transmitted through the program only along paths that reflect the fact that when a procedure call finishes, control returns to the site of the most recently invoked call. Other algorithms are imprecise in that they safely, but pessimistically, track dependences along paths that enter a procedure at one call site, but return to a different call site. Precise algorithms are preferable because they return smaller slices. For example, the precise backward slice from the printf("i=%d\n",i); statement in Figure 1 does not include the statement sum=add(sum,1); even though it is reachable by traversing edges backwards from the print i node. Precise interprocedural slicing and chopping is one of the distinctive innovations of GrammaTech's technology.

Program Specialization. Slicing allows one to isolate separate ``program threads'' -- semantically meaningful decompositions of programs -- where the ``threads'' consist of elements that are not necessarily contiguous. For example, Figure 4 shows a scaled down version of the UNIX word-count utility wc. It scans a file, counts the number of lines and characters in the file, and prints the results. The first slice is with respect to the statement that prints the number of characters in the file. The second slice is with respect to the statement that prints the number of lines in the file. The results are, respectively, a program that counts characters and ignores the number of lines, and a program that counts lines and otherwise ignores the number of characters.

 

Original program

Count characters only

Count lines only

void line_char_count (FILE *f)
{    int lines = 0;    int chars;    BOOL eof_flag = FALSE;    int n;    extern void scan_line(FILE *f,                                      BOOL *bp,                                      int *ip);   scan_line(f, &eof_flag, &n);   chars = n;   while (eof_flag == FALSE) {     lines = lines + 1;     scan_line(f, &eof_flag, &n);     chars = chars + n;   }   printf("lines = %d\n", lines);   printf("chars = %d\n", chars); }

void char_count (FILE *f)
{    int lines = 0;    int chars;    BOOL eof_flag = FALSE;    int n;    extern void scan_line(FILE *f,                                      BOOL *bp,                                      int *ip);   scan_line(f, &eof_flag, &n);   chars = n;   while (eof_flag == FALSE) {     lines = lines + 1;     scan_line(f, &eof_flag, &n);     chars = chars + n;   }   printf("lines = %d\n", lines);   printf("chars = %d\n", chars); }

void line_ count (FILE *f)
{    int lines = 0;    int chars;    BOOL eof_flag = FALSE;    int n;    extern void scan_line2(FILE *f,                                       BOOL *bp,                                       int *ip);   scan_line2(f, &eof_flag, &n);   chars = n;   while (eof_flag == FALSE) {     lines = lines + 1;     scan_line2(f, &eof_flag, &n);     chars = chars + n;   }   printf("lines = %d\n", lines);   printf("chars = %d\n", chars); }


Figure 4. A scaled-down version of the UNIX word-count utility and two of its slices. In each slice, the starting point is shown in italics, and the strikethrough regions indicate which program elements are to be removed from the original program. Items in boldface indicate where new names have been introduced.

Component Architecture

GrammaTech's dependence graph and slicing technology is implemented as a collection of ANSI C components organized into the following modules:

  • Front Ends
  • SDG Builder
  • Analysis Engine/Slicer
  • Transformation Engine
  • Scripting Languages
  • GUI

The data flow between these modules in a typical end-user tool is shown in Figure 5.


Figure 5.
The architecture of the component toolkit.

  • Front ends for multiple languages (FEi) emit a common preliminary intermediate representation (Pre-IR) designed to require minimal analysis to produce. The Pre-IR includes abstract syntax trees (ASTs), control flow graphs (CFGs), and symbol table information.
  • The Builder performs extensive static analyses to produce the common intermediate representation (IR), which extends the Pre-IR with dependence graphs, i.e., the SDG.
  • The Query Engine provides an API to the IR and a collection of built-in primitive analyses (e.g., slicing) that are efficiently implemented by computation over the IR. The API is lifted as an abstract data type to scripting-language interpreters (SLi).
  • The Transformation Engine supports generation of new code from old.
  • Clients write scripts to perform complex analyses on the IR in terms of the API and primitive analyses supported by the Query Engine and the Transformation Engine.
  • One particular script written in STk (Scheme with Tk widgets) implements an optional GUI.

CodeSurfer, a program understanding tool for ANSI C built with GrammaTech's dependence graph and slicing components, which is illustrated in the next section, has the architecture illustrated in Figure 5.

Program Understanding Tool Walkthrough

This section demos CodeSurfer, GrammaTech's program understanding tool [7]. A project is built by invoking make on the program's standard Makefile, with CC redefined to invoke CodeSurfer rather than the C compiler. This generates the CFGs for each source file, and the SDG for the project.

Figure 6 shows CodeSurfer's project viewer, which lists the source files, globals and statics, and include files of the project. This particular project, which is one of the SPEC95 benchmarks, contains two files. The area just below the menu bar displays information about the current status of the project. In the figure, the status area indicates that the SDG is up to date for this project, and that no operation has been invoked or is being displayed.


Figure 6.
CodeSurfer's project viewer.

Clicking on one of the [+] icons reveals the list of functions contained in the file, and double-clicking on one of the function names opens a file viewer on the file.

Figure 7 shows a file viewer after double-clicking on function decompress. Text in italic indicates preprocessed areas: regions of text that do not correspond to nodes in the SDG (comments, #-directives, and excluded conditionally compiled code) and areas whose correspondence to the SDG is not direct (macro expansions).


Figure 7. CodeSurfer's file viewer.

The user can select any text in the file, and invoke commands to do forward and backward slicing, chopping, or display immediate successors and predecessors. The slice points must correspond to nodes in the SDG. If the user selects text that does not intersect with any PDG nodes, then a warning is issued. In general, text is displayed in a distinctive style for each property that holds at that point.


Figure 8. Color coding of text properties.

For example, say the user selects function main and does a forward slice. This will color every reachable statement in the system red. Figure 8 shows how some properties are displayed. Underlines indicate the query point (i.e., the point at which the slice was initiated) and red indicates the query results (i.e., the points in the slice).

Figure 9 shows the project window after doing this slice. The status area has been updated to show the kind of operation being displayed, and the number of vertices in the PDG selected by that operation. Evidently, six functions (shown in black) are unreachable. The red bars at the right are explained below in connection with Figure 10.


Figure 9. Slice results displayed in the project viewer.

Double clicking function ran opens a file viewer on file harness.c and scrolls to the definition of ran.


Figure 10. Slice results displayed in the file viewer.

Program elements that are in the slice result are shown in red. The tick marks in the summary bar on the right indicate where slice results are located in the file. The length of the summary bar represents the length of the file. A tick mark half way down the summary bar indicates a slice result half way through the file. Large empty spaces in the summary bar indicate either dead code, comments, code conditionally compiled out, or commented out code. For example, the first part of function ran is not red because it does not correspond to any vertices in the SDG. The next part of function ran is not red because it is not reachable from main.

Figures 12 through 16 illustrate the utility of backward slicing. The user has selected the loop test code>=0 in function decompress and invoked the backward-slice operation. The loop test is data dependent on the two statements that can assign the value of code being tested: code=255 and code--. The loop test is also control dependent on the entry to procedure decompress: unless decompress gets called, the test will not be executed.



Figure 12.
Surfing the backward slice from within function decompress

Clicking on the tick marks earlier in the file scrolls to other program points in the slice. Alternatively, a dependence link can be selected from a pop up or property sheet and you can surf there. Figure 13 shows the part of the file that includes the call to function decompress, contained in the else-branch of a conditional statement that tests variable do_decomp. It also shows that do_decomp gets its value from action.


Figure 13
. Surfing the backward slice from within function decompress, continued.

Selecting a dependence link from a pop up at do_decomp=action; navigates to the place where action gets its value. This is the header of function spec_select_action, where formal parameter action is declared, as illustrated in Figure 14.

 


Figure 14
. Surfing the backward slice from within function decompress, continued.

And yet again selecting one of the dependence links from the pop up at the definition of function spec_select_action navigates to its caller(s), as illustrated in Figure 15. Actual parameter oper, corresponding to formal parameter action, is in in the slice, as are the definitions of oper.

 


Figure 15.
Surfing the backward slice from within function decompress, continued.

Figure 15 shows one of the approximations made by slicing: both calls to spec_select_action are in the slice even though only the second can result in a call to decompress. However, discovering this would require constant propagation along paths, something slicing does not do.

Finally, Figure 16 shows the same slice in the project viewer.


Figure 16. Backward slice shown in project viewer.

Applications

Slicing is a broadly applicable static program-analysis technique. Applications of slicing include program understanding, debugging, testing, parallelization, re-engineering, maintenance, etc. Here are some examples of how slicing helps with these tasks:


Figure 17.
A chop between a high-security input (the return value of procedure
read_high) and a low-security output (the parameter to procedure write_low)
reveals the possible information flows.

The dependence-graph representation of programs can be used in many other software-engineering and software-reengineering applications:

Component Details

This section describes the dependence graph and slicing components in slightly greater detail.

The C front end consists of

The SDG builder produces the PDGs for the procedures, and the SDG for the program in the following phases:

The Analysis Engine provides the following services:

The User Interface provides:

References

1. K.J. Ottenstein and L.M. Ottenstein. The program dependence graph in a software development environment. Proceedings of the ACM SIGSOFT/SIGPLAN Software Engineering Symposium on Practical Software Development Environments, 1984, pp 177-184.

2. T. Reps, S. Horwitz, and D. Binkley. Interprocedurral slicing of computer programs using dependence graphs. Patent No. 5161216. USA. 1992, Wisconsin Alumni Research Foundation.

3. J. Ferrante, K. Ottenstein, and J. Warren. The program dependence graph and its use in optimization. ACM Transactions on Programming Languages and Systems, Vol. 9, No. 3(July) 1987, pp. 319-349.

4. J. Beck and D. Eichmann. Program and interface slicing for reverse engineering. In Proceedings of the Working Conference on Reverse Engineering, Baltimore, MD. IEEE Computer Society Press, 1993, pp. 134-143.

5. D. Jackson and E.J. Rollins. Chopping: A generalisation of slicing. Technical Report CMU-CS-94-169, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, 1994.

6. T. Reps and G. Rosay. Precise interprocedural chopping. in Proceedings of the Third ACM SIGSOFT Symposium on the Foundations of Software Engineering, 1995.

7. CodeSurfer User Guide and Reference: http://www.grammatech.com/csurf-doc/manual.html

Acknowledgements

This work was partially supported by ONR contract N00014-97-C-0072 and DARPA/ITO contract DAAH01-99-C-R192.

Home | Products | Services | Support | R & D | Papers | News | Corporate | Partners | Download | Contact Us

Copyright © 2000 GrammaTech, Inc. All rights reserved.