Graphviz calculates flexible graphs

The more elements a graph contains, the more complicated the placement of nodes. Instead of working on your project into the night, why not pass the task on to the computer – equipped with Graphviz [1] – to do the perfect job? Graphviz is based on an open format, called DOT, which uses simple text files.

Most distributions already have Graphviz in their repositories. With Ubuntu, you install the software by using the apt-get install graphviz command.

Basics

Listing 1 shows the basis for a simple diagram. The first line defines the diagram by name (G). All components enclosed in curly brackets belong to this diagram – in this case, the connections for a tree diagram. End each statement with a semicolon; DOT treats the semicolon as optional, but it keeps the statements easy to read.

Listing 1

Simple Diagram

digraph G {
   DOT -> "Organizational Diagram";
   DOT -> "Arrow Diagram";
   DOT -> "Mind Maps";
   DOT -> "Network Plans";
}

Graphviz includes several renderers that create the diagrams. Each one calculates where the individual elements should be placed. The standard renderers are dot, neato, fdp, circo, and twopi. The result depends on which renderer you choose.

The software supports a number of output formats. The most common are EPS, JPG, PNG, SVG, and PDF. You can get the complete list with dot -T?. If you select svg, you can edit the result again in a drawing program.

The actual conversion is achieved with a simple command. Figure 1 shows the result of the following command:

$ dot -T png -o fig01-diagram.png fig01-diagram.dot

Figure 1: A simple command converts the entries in Listing 1 to a diagram.

The description language essentially knows three objects with which you can create a diagram (Table 1). Properties that you set as defaults for one of these object types apply to all objects of that type, unless you give the property a new value for an individual object.

Table 1

Objects

Object  Description
graph   A directed graph is called a digraph; an undirected graph is designated a graph. Every graph can contain further graphs (subgraphs).
node    A point in the graph that links connect. You don't need to define a node in advance; it is defined on first use.
edge    Link between two nodes. A directed link is written with -> in a digraph; an undirected link is written with -- in a graph.
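
The three objects and the way defaults can be overridden might look like the following minimal sketch. The graph and node names (Objects, inner, a to e) are made up for illustration and do not come from the listings in this article.

```dot
// Undirected graph: edges are written with "--" instead of "->".
graph Objects {
    // Defaults for every node and edge declared after these lines
    node [shape=box];
    edge [style=dashed];

    a -- b;                 // both nodes are created on first use
    b -- c [style=solid];   // overrides the edge default for this edge only
    c [shape=ellipse];      // overrides the node default for this node only

    // A further graph (subgraph) nested inside the main graph
    subgraph inner {
        d -- e;
    }
}
```

Everything without its own value keeps the default, so a, b, d, and e remain boxes and the edges a -- b and d -- e stay dashed.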

Labeling

By default, Graphviz uses the name of the node as its label. Unquoted node names can't contain just any characters: A plain name consists of letters, digits, and underscores and does not begin with a digit. If you want to use spaces, special characters, or line breaks, either enclose the name in quotes (as in Listing 1) or keep the name short and use label to define the text, enclosing the content in quotes. Specify a break with the \n escape sequence. The code in Listing 2 shows an overview of the sequence of schools in the United States (Figure 2).

Figure 2: Chart of educational flow in the US (more or less).

Listing 2

Educational Flow in the United States

digraph G {
    PK  [label="Pre-school"];
    P   [label="Primary"];
    S   [label="Secondary"];
    VT  [label="Vocational / Technical"];
    CC  [label="Community College"];
    UP  [label="Undergraduate Programs"];
    MD  [label="Master's Degree"];
    PS  [label="Professional Schools"];
    PHD [label="Doctoral Studies"];
    PDS [label="Postdoctoral Studies / Research"];
    PK->P;
    P->S;
    S->VT;S->CC;S->UP;
    CC->UP;
    UP->MD;UP->PS;
    MD->PHD;
    PS->PHD;PS->PDS;
    PHD->PDS;
}

The result isn't necessarily optimal; it would have been better to flow from left to right rather than top to bottom. The rankdir attribute of the graph object sets the chart's orientation. The values for it are TB (top to bottom), LR (left to right), RL (right to left), and BT (bottom to top).
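
As a rough sketch of how this could look, the following shortened version of Listing 2 sets rankdir and also uses the \n escape sequence for a line break. The shortening and the position of the break are choices made for this example only.

```dot
digraph G {
    rankdir=LR;   // lay out the levels from left to right instead of top to bottom
    PK [label="Pre-school"];
    P  [label="Primary"];
    S  [label="Secondary"];
    VT [label="Vocational /\nTechnical"];   // \n starts a new line in the label
    PK -> P -> S -> VT;
}
```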

Changing Appearance

You can change the appearance of a node with various attributes, including shape=box. Keep in mind that, with some shapes, the text might bleed beyond the boundaries of the box – one of those unsolved problems of automation. You may have to tweak things by hand by resizing the box as needed.

You set the color of the node's outline with color, the text color with fontcolor, and the background color with fillcolor. The program allows color names, color values in HTML style (e.g., #F101FF), and color numbers. The fillcolor only works, however, if you set the style to style=filled. You can find an overview of the color settings online [3].
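
A small sketch of these color settings follows. The node names and the chosen colors are arbitrary; color affects the outline, fontcolor the label text, and fillcolor the background.

```dot
digraph Colors {
    // fillcolor only shows up when the style includes "filled"
    a [label="named colors", style=filled, fillcolor=lightblue, color=navy];
    b [label="HTML-style values", style=filled, fillcolor="#99FF99", fontcolor="#333333"];
    c [label="no fill", fillcolor=yellow];   // style is not filled, so no yellow appears
    a -> b -> c;
}
```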

The software normally determines the height and width of a node automatically, but you can set minimum values with the height and width attributes.

If you set the fixedsize=true attribute for a node, the program treats both values as fixed and doesn't exceed them, even if the label needs more room. The values are in inches. (An inch consists of 72 points – a measurement used in typography.)
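
The difference between minimum and fixed sizes could be tried out with a sketch like this one; the node names and dimensions are invented for the example.

```dot
digraph Sizes {
    node [shape=box];
    // width and height are minimums: the box grows if the label needs more room
    a [label="at least 2 x 1 inches", width=2, height=1];
    // with fixedsize=true the box stays at exactly 1 x 0.5 inches
    b [label="fixed", width=1, height=0.5, fixedsize=true];
    a -> b;
}
```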

The style attribute allows you to set the node's appearance (the type of frame). Graphviz knows filled, rounded, dotted, dashed, bold, and other values (Figure 3). With the peripheries=2 attribute, the software draws a double line; with style=invis, you can hide the node.

Figure 3: With the "style" attribute, you can change a node's appearance.
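
The following sketch collects the style values mentioned above in one file so that you can compare them. It is not the source of Figure 3; the node names and labels are placeholders.

```dot
digraph Styles {
    node [shape=box];
    a [label="filled", style=filled, fillcolor=lightgray];
    b [label="rounded", style=rounded];
    c [label="dashed", style=dashed];
    d [label="dotted", style=dotted];
    e [label="bold", style=bold];
    // several style values can be combined in one quoted, comma-separated list
    f [label="rounded and filled", style="rounded,filled", fillcolor=lightgray];
    g [label="double outline", peripheries=2];
    h [label="hidden", style=invis];   // takes up space but is not drawn
}
```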

You set the text font for a label with the fontname attribute and the text size with fontsize. The fontpath attribute defines the path to the font directory. Alternatively, you can use the environment variables DOTFONTPATH or GDFONTPATH.
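
A minimal sketch of the font attributes might look as follows. It assumes that a font named Helvetica (or a substitute the system maps to that name) is available; the node names are made up.

```dot
digraph Fonts {
    node [fontname="Helvetica", fontsize=12];   // defaults for all nodes
    title  [label="Larger label", fontsize=20];  // overrides the size for one node
    remark [label="Default size"];
    title -> remark;
}
```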

You label a node with its name or with the label attribute. The labelloc attribute controls the vertical position of the text: The value t sets the text at the top, and b sets it at the bottom. For graph and cluster labels, a labeljust value of r sets the text right-justified, and l makes it left-justified. By default, Graphviz centers the text.
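
The following sketch shows one way to experiment with label placement. The node heights are enlarged on purpose so that the vertical position becomes visible. Note that labeljust is applied to the graph label here, and the \l and \r escape sequences, which are not covered in the text above, are used to justify individual lines inside a node.

```dot
digraph LabelPos {
    // title of the whole graph: at the top, left-justified
    graph [label="Label placement", labelloc=t, labeljust=l];
    node [shape=box, height=1.5];
    top    [label="top", labelloc=t];
    middle [label="centered"];
    bottom [label="bottom", labelloc=b];
    // \l ends a left-justified line, \r ends a right-justified line
    lines  [label="left\lright\r", width=2];
}
```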

Changing Edges

What works for text and boxes also works for lines: The style attribute also determines their appearance. As with node lines, you can make them solid, bold, dashed, dotted, or invisible (Figure 4). You can use the same values for line colors as for nodes. Line endings can have various styles. You can use the arrowhead attribute to set the tip, arrowtail to set the tail of the arrow, and arrowsize to change the size of the tip.

Figure 4: As with boxes, you can modify lines as desired.

The dir attribute sets the arrow direction, with both for both ends, forward for one direction, and back for the opposite direction. You would use none for a line without arrows. An arrowtail only appears if dir is set to both or back. You'll find a comprehensive overview of arrow styles online [4]. With the attributes headclip=false and tailclip=false, you can set the line not to run to the edge of the node but to the middle. By default, the software creates a connection point at the edge of the node. To combine connecting lines, use the sametail or samehead attribute, depending on which end (see the sametail=groupB example in Figure 5).

Figure 5: You can combine connecting lines for a better appearance.
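
The edge attributes from this section could be combined as in the following sketch. It is not the source of Figure 5; the node names are arbitrary, and only the group name groupB is borrowed from the text.

```dot
digraph Arrows {
    a -> b [dir=both, arrowhead=normal, arrowtail=dot];
    a -> c [dir=back];                  // the arrow is drawn at the tail node a
    a -> d [dir=none, style=dashed];    // plain line without arrowheads
    b -> e [arrowhead=onormal, arrowsize=1.5];
    c -> e [headclip=false];            // the line runs to the center of e
    // edges with the same tail node and sametail value leave f at the same
    // point (this is honored by the dot renderer)
    f -> g [sametail=groupB];
    f -> h [sametail=groupB];
}
```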

Graphviz provides three ways to label a connecting line. The label attribute puts the text in the middle of the line, headlabel puts it at the head of the line (where the arrow points), and taillabel puts it at the tail. Placing a label at the head or tail sometimes causes the text to come too close to the line or node. Here the labelangle and labeldistance attributes will help.

You can use all three types of line labels simultaneously. The decorate=true attribute underlines the label text and joins it to its line, and the labelfloat=true attribute allows overlapping labels, which can make the graph more compact.

Additionally, you can format the head and tail labels with the labelfontcolor, labelfontname, and labelfontsize attributes. The labelangle and labeldistance attributes position these labels in polar coordinates relative to the node: labelangle sets the angle in degrees, and labeldistance scales the distance from the node.
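
As a sketch, a single edge carrying a label in the middle, one at the head, and one at the tail might be written like this. The angle, distance, and color values are arbitrary starting points for experiments.

```dot
digraph EdgeLabels {
    rankdir=LR;
    // label sits in the middle; headlabel and taillabel sit at the two ends
    a -> b [label="middle", headlabel="head", taillabel="tail"];
    // the labelfont*, labelangle, and labeldistance settings act on the
    // head and tail labels only, not on the middle label
    c -> d [label="middle", headlabel="head", taillabel="tail",
            labelfontcolor=red, labelfontsize=10,
            labelangle=45, labeldistance=2.0];
}
```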

Overall View

The size attribute determines the maximum size of the entire image in inches, as in graph [size="0.5,0.5"];. Should the graph exceed this size, the software scales it down so that it conforms to the given values. The program reduces the size while keeping the aspect ratio the same.

If you add an exclamation point to the size value, Graphviz always scales the image to the specified size. If the graph is too small, the program increases the size, and vice versa. In this operation, the tool respects the aspect ratio. The attribute interacts with the ratio attribute that determines the height/width aspect ratio.
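
The two variants of size can be compared with a sketch like the following; the graph content and the dimensions are placeholders.

```dot
digraph Scaled {
    // upper limit of 4 x 3 inches: a larger drawing is scaled down,
    // a smaller drawing is left as it is
    graph [size="4,3"];
    // with "!" a smaller drawing is also scaled up toward the limit:
    // graph [size="4,3!"];
    a -> b -> c;
    a -> c;
}
```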

The page attribute sets the size of the page. If the graph is bigger than the page, rectangular areas of the image can land on multiple pages. Thus, you can create something like a poster (this works only for output formats that support multiple pages, such as PostScript). The margin attribute sets the page margins. If you set page with valid values, pagedir determines the order in which the pages are output.

The nodesep attribute sets the minimum distance between two adjacent nodes on the same level, and ranksep sets the minimum distance between two adjacent levels (the vertical distance in a top-to-bottom layout). Both values are in inches.

If you add equally after the ranksep number (e.g., ranksep="1.0 equally"), the levels are always equidistant. Especially in hierarchical diagrams, it's important to have related nodes on the same level. With the rank=same attribute, you can form such groups (the line beginning with { rank = same; in Listing 3). Many other attributes are available, some of which you might use only sparingly for special applications. A complete overview is in the Graphviz online handbook [5]. You can also enhance Listing 2 quite a bit (Listing 3) so that its appearance improves significantly (Figure 6).

Listing 3

Improved Chart of Educational Flow

digraph G {
    graph [rankdir=LR,nodesep=.5,ranksep=.5];
    node [shape=box,style=rounded,width=3];
    PK  [label="Pre-school",fillcolor="#99FF99",style=filled];
    P   [label="Primary",fillcolor="#99FF99",style=filled];
    S   [label="Secondary",fillcolor="#99FF99",style=filled];
    { rank = same; VT; CC; UP; }
    VT  [label="Vocational / Technical"];
    CC  [label="Community College"];
    UP  [label="Undergraduate Programs"];
    MD  [label="Master's Degree"];
    PS  [label="Professional Schools"];
    PHD [label="Doctoral Studies"];
    PDS [label="Postdoctoral Studies / Research"];
    PK->P;
    P->S;
    S->VT[arrowhead=onormal];
    S->CC[arrowhead=onormal];
    S->UP[arrowhead=onormal];
    CC->UP;
    UP->MD;UP->PS;
    MD->PHD;
    PS->PHD;PS->PDS;
    PHD->PDS;
}
Figure 6: With a few additional attributes, you can go beyond the default output to give a graph a more attractive appearance.

Renderers

With the DOT description language, you can specify what nodes there are and how they relate. The renderer defines the position in the diagram.
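
Normally you pick the renderer by calling the program of the same name. As an alternative not covered above, Graphviz also documents a layout graph attribute that names the renderer inside the file. The following sketch assumes a Graphviz version that honors this attribute; the graph itself is a made-up ring of four nodes.

```dot
graph Engines {
    layout=circo;   // try dot, neato, fdp, twopi, sfdp, or osage here
    a -- b -- c -- d -- a;
}
```

Swapping the value is a quick way to compare how the renderers described below treat the same graph.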

The dot renderer draws the diagram strictly hierarchically, and the diagrams always have a fixed orientation (see Figure 7).

Figure 7: The "dot" renderer.

The neato renderer uses a spring model: It treats the connections like springs and looks for an arrangement in which connected nodes sit close together. The result often spreads from the center outwards (Figure 8) and is good for mind maps with a symmetrical layout. Straight-line connectors are the most appropriate for showing this kind of symmetry.

Figure 8: The "neato" renderer.

The fdp renderer, which is also force directed, produces results similar to neato (Figure 9) but spreads the nodes farther apart and distributes them evenly on the canvas.

Figure 9: The "fdp" renderer.

With circo, Graphviz places the nodes on circles (Figure 10), which suits graphs with cyclic structures.

Figure 10: The "circo" renderer.

The twopi renderer creates a radial layout: It puts one node in the center and arranges the remaining nodes on concentric circles according to their distance from it (Figure 11).

Figure 11: The "twopi" renderer.

The sfdp renderer, like fdp, tries to resolve the structure, but it uses a multiscale approach to render large graphs in a short time (Figure 12).

Figure 12: The "sfdp" renderer.

For grouped graphs, osage is a good choice (Figure 13). You can find many more examples online [6].

Figure 13: The "osage" renderer.
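
A grouped graph of the kind osage handles could be sketched as follows. This assumes that "grouped" refers to cluster subgraphs, that is, subgraphs whose names begin with cluster; the group and node names are invented and loosely follow Listing 2.

```dot
graph Grouped {
    // a subgraph whose name starts with "cluster" is drawn as a framed group
    subgraph cluster_school {
        label="School";
        Primary -- Secondary;
    }
    subgraph cluster_university {
        label="University";
        Undergraduate -- Master;
    }
    Secondary -- Undergraduate;
}
```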

Conclusion

Graphviz offers a lot of potential to fulfill tasks with just its automatic functions. The program has a lot to offer in terms of formatting boxes and arrows. Note, however, that Graphviz does not provide ways to define your own styles; therefore, some standard graph types cannot be properly rendered.

Additionally, labeling is primarily text based, and other elements, such as math formulas, are not supported directly. The additional Dot2tex [7] tool can provide some help in positioning objects and converting the output to the corresponding LaTeX format.

The Author

Michael Niedermair teaches in the Municipal Vocational College for Information Technology in Munich (https://www.bsinfo.eu), where he is coordinator for the Programming and Application Development department. He writes a lot, especially lesson scripts where he renders visual diagrams, entity-relationship/Unified Modeling Language (ER/UML) diagrams, and the like with Graphviz.