Minimizing load time for web pages

3dfoto, 123RF

Optimized

Web page loading time relies on a complex interplay among the web server, the web page, and the web browser. Learning a few tricks can help speed up load times for the pages you create.

Web page optimization can be approached from two sides: the user and the editor. Users can improve loading times by optimizing the configuration of a web browser and removing unwanted content with the help of various browser extensions.

In this article, however, we will focus on the editor side. We will be looking at issues that can assist you in preparing and displaying the contents of your web pages.

As an editor, you are responsible for the content of web pages. In addition to creating the individual elements of the web presence like text, tables, and images, editing tasks frequently include web page design.

This involves format templates in the form of cascading style sheets (CSS) [1] and also active content produced with JavaScript, such as the popular components Ajax, jQuery, and JSON. You can tame these technologies with the help of various tools for validating content.

Readability

Formulating content that is as easy to read as possible may sound like a totally obvious concept. Even so, authors frequently devote only passing attention to readability. Ideas and the text that originates from them need to be expressed clearly so that, for example, automated translations are successful, thereby allowing you to reach readers from all over the world.

The back translation of an automatically generated translation is a good indicator of the international attractiveness of a web page (Figure 1). You can use the translation services from Google [2] or Yandex (in Russian) [3] to that end. If you cannot easily recreate the meaning of the text after the second translation, then you should think about using less complicated grammar and simpler sentence structures. This check takes little time, and simpler text translates more reliably, which helps you reach more readers.

Figure 1: "If" by Rudyard Kipling is still easy to understand after an automatic translation into Spanish and then back into English.

Generators

Whether you create web pages automatically or on the basis of a template, you should take care to follow the suggestions in the "Web Page Generators" box. Following these suggestions might look like a lot of work, but the end effect will be that things will run faster on a variety of levels.

Web Page Generators

  • Is the character encoding correct? Whenever possible, you should use Unicode.
  • Does every page of the web presence have a suitable and informative title?
  • Does the head section of each page contain key words that match the page content?
  • Does the web server deliver an HTML document that is completely valid?

Web browsers process valid data significantly faster because there is no post-processing on the part of the rendering engine.

Additionally, search engine crawlers will select the content and the page titles together with the corresponding key words, thereby figuring out how to make sense of your web page.

Web page identification and relevance of a web page to search requests depend in part on how the questions listed above are resolved. If done properly, the search engine will be able to categorize your web page more precisely according to its own criteria, and the page can later be found again via the index of the search engine.

This increases the hit rate and the number of visitors, which in turn has an effect on the impact and relevance of the web presence. Content and advertisements don't count for much unless the page also has visitors.
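Some of the questions in the "Web Page Generators" box can be checked automatically. The following is a minimal sketch in Python that uses only the standard library. The class name HeadCheck and the function check_head are illustrative names, not part of any tool mentioned in this article, and the sketch assumes that Unicode is declared as UTF-8 through a meta charset element.

```python
from html.parser import HTMLParser


class HeadCheck(HTMLParser):
    """Collects the charset, title, and keywords found in an HTML page."""

    def __init__(self):
        super().__init__()
        self.charset = None
        self.title = ""
        self.keywords = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            if "charset" in attrs:
                self.charset = attrs["charset"]
            elif (attrs.get("name") or "").lower() == "keywords":
                self.keywords = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def check_head(html_text):
    """Return a list of findings; an empty list means nothing was flagged."""
    parser = HeadCheck()
    parser.feed(html_text)
    problems = []
    if (parser.charset or "").lower() != "utf-8":
        problems.append("no UTF-8 charset declaration")
    if not parser.title.strip():
        problems.append("missing or empty title")
    if not (parser.keywords or "").strip():
        problems.append("missing or empty keywords")
    return problems
```

This only covers the first three questions. Whether the document is completely valid HTML is a job for a real validator.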

Automatic

Documents composed in a markup language such as XML, LaTeX, WML, Markdown, or AsciiDoc provide the starting point for a web page. The HTML code created from these languages should be checked for accuracy after export. This step is usually done as part of optimization when trimming and simplifying the HTML output [4] and the CSS files [5].

A variety of tools and formats are available for exporting HTML. Pandoc [6] and AsciiDoc [7], which use a wiki-like syntax, can produce HTML with DocBook [8] as an intermediate step, and DocBook sources can also be converted directly with Docbook2html.

If your documents are based on LaTeX, then you are probably already familiar with the classic LaTeX2HTML [9]. Because this converter has not been developed since 2001, it makes sense to take a look at its successors: TeX to HTML translator (TtH) [10], HyperLaTeX [11], PlasTeX [12], and tex4ht [13].

If you are using XML, then Saxon [14] and Htc-py [15] are helpful. An XHTML document is by definition also an XML document. If correctly exported, there are no problems for either an XML parser or most browsers. However, Internet Explorer has trouble handling XHTML documents, so it would be better to use HTML5. If you need XHTML5 because of SVG or MathML, for example, then it's best to develop polyglot documents [16].
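Because an XHTML document is also an XML document, any XML parser can check whether an export is well formed. The following sketch uses the ElementTree parser from the Python standard library; the function name is_well_formed is only illustrative. Note that this checks well-formedness only, not validity against a schema, and that named entities such as &nbsp; will be rejected unless the parser knows the corresponding DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xhtml_text):
    """Return (True, None) if the text parses as XML, else (False, message)."""
    try:
        ET.fromstring(xhtml_text)
    except ET.ParseError as err:
        return False, str(err)
    return True, None
```

A page that closes every element, including empty ones such as br, passes this check. An HTML-style page with an unclosed br element does not.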

Validating

Even though documents are generated automatically, this does not ensure that documents created during each export will comply with all of the conventions of the HTML standard.

You should always check the output, including HTML, CSS, and JavaScript. As a result, you reduce display errors for which HTML and CSS are responsible, as well as execution errors when JavaScript, Ajax, jQuery, and JSON come into play.

Furthermore, a user's web browser will have an easier time correctly interpreting and displaying data it receives. As a side benefit, the network load gets reduced because fewer requests and data packets need to be sent back and forth between the web server and the browser.

The W3C Markup Validation Service [17] is the reference for validating HTML code. The service provides a reliable report for entire websites or just individual HTML files. Files get uploaded via a form, and it's easy to figure out from the report where cleanup and improvement are needed (Figure 2).

Figure 2: These are results for the validation of an existing web page from the W3C Markup Validation Service.
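Validation can also be scripted instead of using the upload form. The sketch below rests on an assumption that this article does not confirm: that the W3C's newer HTML checker accepts a document as the body of a POST request at the address shown and answers with a JSON report containing a "messages" list. The client name in the User-Agent header is hypothetical. Check the service's own documentation before relying on it.

```python
import json
import urllib.request

# Assumption: the JSON interface of the W3C Nu HTML Checker is reachable here.
CHECKER_URL = "https://validator.w3.org/nu/?out=json"


def build_request(html_text):
    """Prepare a POST request that submits the document body for checking."""
    return urllib.request.Request(
        CHECKER_URL,
        data=html_text.encode("utf-8"),
        headers={
            "Content-Type": "text/html; charset=utf-8",
            "User-Agent": "page-check-sketch/0.1",  # hypothetical client name
        },
        method="POST",
    )


def validate(html_text, timeout=30):
    """Send the document and return the checker's list of messages."""
    with urllib.request.urlopen(build_request(html_text), timeout=timeout) as resp:
        report = json.load(resp)
    return report.get("messages", [])
```

Calling validate() needs network access, and a public service should not be flooded with automated requests. For regular checks of many pages, a locally installed checker is the more considerate choice.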

The XML Schema validator [18], which is included in the service, is fastidious, but it specializes in the XHTML dialect. The Firefox plugins Validator [19] and HTML Validator [20] can also provide helpful assistance. They display the results of the test in a separate window. These results are based on a method established by the W3C in combination with the tools Tidy [21], Tidy for HTML5 [22], and OpenSP [23].

Checking JavaScript code for correctness is more difficult. In practice, JSLint [24] and JSHint [25] have proven helpful. Both tools can be used via a text field provided on each project's web page.

After entering the JavaScript code into the field, you will immediately get an evaluation of the complexity of the program code and also a list of the errors that have been discovered. Offline tests include Acorn.js [26] and ESLint [27] in addition to JSHint.

You can use the npm package manager from Node.js to install these command-line tools. In this way, you keep the installation of these components separate from the package management of your distribution.

Validating CSS

It is easy to forget to check formatting guides in the form of CSS. However, detecting errors in these files is quick and easy with CSSTidy [28].

CSSTidy not only validates CSS code, it also analyzes and optimizes the values used in the code. For example, the program translates color names to the corresponding RGB color code in its shortest hexadecimal form, so the word "white" becomes #FFF.

Moreover, CSSTidy removes superfluous spaces, semicolons, and redundant assignments. The output shown in Listing 1 illustrates this with a sample invocation. Altogether, CSSTidy reduces the size of the example by more than 25 percent.

Listing 1

Using CSSTidy

$ csstidy style.css
Selectors: 24 | Properties: 100
Input size: 2.922KiB  Output size: 2.134KiB  \
Compression ratio: 26.97%
-----------------------------------
body {
background:#FFF;
color:#000;
font-size:medium;
}
img {
border:none;
}
[...]
3: Optimised color: Changed "white" to "#FFF"
4: Optimised color: Changed "black" to "#000"
20: Optimised color: Changed "#DD0000" to "#D00"
38: Optimised color: Changed "white" to "#FFF"
46: Optimised color: Changed "white" to "#FFF"
47: Optimised font-weight: Changed "normal" to "400"
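The color optimizations reported in Listing 1 follow a simple rule. The sketch below imitates that rule in Python to show the idea; it is not CSSTidy's own code, the function names are illustrative, and the table of color names is deliberately limited to the two names that appear in the listing.

```python
import re

# Only the two names seen in Listing 1; a real tool knows the full CSS list.
NAMED_COLORS = {"white": "#FFFFFF", "black": "#000000"}


def shorten_hex(color):
    """Collapse #RRGGBB to #RGB when each channel repeats its digit."""
    match = re.fullmatch(r"#([0-9a-fA-F]{6})", color)
    if not match:
        return color
    digits = match.group(1).upper()
    if digits[0] == digits[1] and digits[2] == digits[3] and digits[4] == digits[5]:
        return "#" + digits[0] + digits[2] + digits[4]
    return "#" + digits


def optimize_color(value):
    """Translate a known color name to hex, then shorten it if possible."""
    return shorten_hex(NAMED_COLORS.get(value.lower(), value))
```

With these definitions, optimize_color("white") gives "#FFF" and optimize_color("#DD0000") gives "#D00", which matches the messages in the listing. Values that cannot be shortened, such as "#123456", are returned unchanged.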

Combination

Many websites distribute the format templates into different files, putting them back together later. As far as possible, you should collect these different files into one single file so that the browser need not open a new connection for each additional CSS file.
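Merging the files can be part of the publishing step. The following Python sketch concatenates several stylesheets into one; the function name combine_css and the file names in the usage example are hypothetical. It assumes that the files are listed in the same order in which the page referenced them, because later rules override earlier ones, and that none of them relies on @import or @charset rules, which must stay at the top of a stylesheet.

```python
from pathlib import Path


def combine_css(sources, target):
    """Concatenate stylesheets in the given order into one file."""
    parts = []
    for name in sources:
        text = Path(name).read_text(encoding="utf-8")
        # Keep a marker so each rule can be traced back to its source file.
        parts.append(f"/* {Path(name).name} */\n{text.strip()}\n")
    Path(target).write_text("\n".join(parts), encoding="utf-8")
    return Path(target).stat().st_size
```

A call such as combine_css(["base.css", "layout.css"], "all.css") writes the merged file and returns its size in bytes. The page then needs only one link element that points to all.css.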

Be careful to reference the format templates in the head section of the HTML file, because modern browsers try to load referenced files in parallel. The web browser's cache buffers these external files, so the browser fetches them again from the original source only after an explicit reload.

You should also check whether the content of the web page remains accessible to the reader even without a format template. Some readers may use a text browser, others may have deactivated CSS in their web browser, or the format template may have gotten lost during transmission. Likewise, keep in mind that the web crawlers of the search engines are interested only in the content of the web page and pay little attention to CSS.

Optimized

As stated previously, a complex process exists behind the representation of a website on your monitor. Part of this complexity is because of the display of illustrations in the text flow.

If the rendering engine of the browser already knows the image size, it can reserve an appropriate space and add the image data, which loads more slowly, into the correct spot in the layout after the transfer is complete.

The images need alternate text (ALT attribute) and correct size specifications in the IMG tag so that they load with the least amount of computing cost and time. During the data transfer, the web browser will show the alternate text in the placeholder. Visually disabled persons, as well as search engines, can profit from good descriptions of the images. Image scaling turns out to be a disadvantage in this step.

It does not make sense to move a large image together with the accompanying data volume over the connection only to have the rendering engine turn it into a smaller size that fits.
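The size specifications do not have to be typed in by hand. As a sketch, the following Python code reads the width and height from the header of a PNG file and builds a matching IMG tag with alternate text. It assumes PNG input only, and the function names are illustrative; other image formats store their dimensions differently.

```python
import struct
from html import escape

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(data):
    """Read width and height from the IHDR chunk at the start of a PNG file."""
    if data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def img_tag(src, alt, data):
    """Build an IMG tag with explicit dimensions and alternate text."""
    width, height = png_size(data)
    return (f'<img src="{escape(src)}" alt="{escape(alt)}" '
            f'width="{width}" height="{height}">')
```

Only the first 24 bytes of the file are needed, so reading them with open(path, "rb").read(24) is enough. If the reported size is larger than the space the image occupies in the layout, scale the file itself before uploading it.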

The image size also influences the processing in the browser cache. Sizes that are powers of 2 have an advantage (e.g., 8, 16, 32, 64, 128, 256, 512, and 1024 pixels). The cache's internal processing and the page alignment are most efficient with these sizes.

Loaded Later

HTML5 includes functions that already load the content before a visitor to the web page explicitly calls it. For example, this technique is used for teaser text that points to additional content, such as a complete article. News portals favor this function on the title page. Many content management systems come with this type of function already integrated.

From a technical point of view, this content is referred to as "loadable later." HTML5 provides the link types prefetch and prerender [29] for this purpose; they are set in the rel attribute of a link element. The first of these only loads the referenced resources. The second additionally prepares the entire page in the background. Listing 2 shows two links specified accordingly as an example.

Listing 2

Prefetch and Prerender

<link rel="prefetch" href="http://www.meineurl.de">
<link rel="prerender" href="http://www.andereseite.de">

The web browser loads the referenced page in the background. As soon as the user follows the corresponding link, the browser displays the page with no further load and computing time. This approach reduces load times and better utilizes the network bandwidth. However, it also causes additional network load and generates entries in the browser cache even for pages you have not actually visited.

The entire process only functions if the corresponding option has been activated in the web browser. Firefox comes with prefetch in its standard configuration.

If needed, you can check the corresponding setting in the network.prefetch-next key under about:config (Figure 3). Firefox does not offer an option for controlling the function via the configuration dialog.

Figure 3: Prefetching, which is the automatic loading of content before it actually gets called, can be turned off in Firefox via the internal settings.

Clean Code

When you use dynamic content created with JavaScript, PHP, Perl, or Python, you should use the most effective programming language that is available for the web page. And, if possible, you should always use the most up-to-date version.

Remember to take the usual principles of good programming into account, including readability, documentation, and modularity of the components. Using templates reduces the number of errors and makes it possible to have a unified site that is easier to maintain.

The complex browser differences of the past were cause for lots of extra work and lots of gray hair. Today, however, the developer has to keep all of the possible output devices in mind.

The reader with a smartphone has different requirements for the web page than a PC user. You can identify the device being used with a little PHP code snippet like the one shown in Listing 3 and then you can send back a specific format template. If you are using several different format templates, then they should all be referenced in HTML5 similarly to Listing 4.

Listing 3

Identifying User's Device

// The header may be missing, so fall back to an empty string
$userAgent = $_SERVER['HTTP_USER_AGENT'] ?? '';
// strpos() returns 0 for a match at position 0, so compare against false
if (strpos($userAgent, "iPhone") !== false)
{
   // Instructions for a visitor with an iPhone
}

Listing 4

Referencing Format Templates

<link href="iphone.css" rel="stylesheet"
  type="text/css"
  media="only screen and (min-width: 0px) and (max-width: 320px)">
<link href="ipad.css" rel="stylesheet"
  type="text/css"
  media="only screen and (min-width: 321px) and (max-width: 768px)">
<link href="style.css" rel="stylesheet"
  type="text/css"
  media="only screen and (min-width: 769px)">
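The device check and the choice of format template can be combined in one small function. The following Python sketch mirrors the PHP check and returns one of the file names used in the listings above. The function name stylesheet_for and the mapping are illustrative, and matching on the User-Agent string is only a rough heuristic because clients can send any value they like.

```python
def stylesheet_for(user_agent):
    """Pick a stylesheet name from the User-Agent header (illustrative mapping)."""
    agent = (user_agent or "").lower()
    if "iphone" in agent:
        return "iphone.css"
    if "ipad" in agent:
        return "ipad.css"
    return "style.css"
```

A missing header is treated like an unknown device and gets the default stylesheet. The media queries in the link elements have the advantage that they react to the actual screen width instead of a device name.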

A few stumbling blocks can trip you up when information about size is used in format templates. To make pages scale across the various output devices, you should use the em [30] unit wherever possible. This unit has a long history in typography, where it measures horizontal width. In CSS, one em corresponds to the font size of the element, so widths and heights given in em are not fixed pixel counts but scale in proportion to the text. This fits well, because only the proportions of the web page elements are of interest.
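In CSS, a font size given in em is relative to the font size of the parent element, so nested values multiply. The following Python sketch works through the arithmetic. It assumes a browser default of 16 pixels, which is common but can be changed by the user, and the function name em_to_px is illustrative.

```python
def em_to_px(em_values, base_px=16.0):
    """Resolve a chain of nested font sizes in em to pixels."""
    size = base_px
    for em in em_values:
        size *= em
    return size
```

For example, em_to_px([1.5]) yields 24.0 for a heading set to 1.5em, and em_to_px([1.5, 0.5]) yields 12.0 for an element at 0.5em nested inside that heading. If the user raises the default to 20 pixels, the same style rules produce 30.0 and 15.0, and the proportions stay the same.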

If you have not specified the size of the contents on the web page by means of the BODY tag, then the settings of the user will apply. This also applies to the specifications of font type and size. You can always enter a generic font family like serif or sans-serif as a fallback solution. If the requested font is not found on the visitor's system, then the browser will at least load a usable alternative.

Compact Code

Although the shape of the HTML source text may be quite important to you as a web developer or an editor, the web browser ultimately pays no attention. It will ignore spaces, indentations, and line breaks. Thus, it makes sense to set up a compact and cleaned up version of the web page on the web server. This significantly decreases the data volume to be transferred and the preparation time of the web page.

Many tools can be used for the cleanup process, primarily as part of the HTML Tidy project described above. Examples include the Java-based JTidy [31], the Perl version PTidy [32] and the Python interface for TidyLib [33].

Via the libhtml-clean-perl package, users of Debian-based distributions have access to the htmlclean program, which handles this task and reports its results in a compact form (Listing 5).

Listing 5

htmlclean Output

$ htmlclean -v *.html
  2317   1999 13% impressum.html
  3669   3276 10% index.html
 15361  13823 10% neuigkeiten.html

In order of appearance, the columns in the output include the original size of the file, the size of the compressed version, the degree of reduction, and the file name. Additionally, htmlclean creates a backup file with the extension .bak so that the original file remains intact.
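The principle behind such a cleanup and its report can be sketched in a few lines of Python. This is not how htmlclean itself works; it is a deliberately simple stand-in that only collapses runs of whitespace, and it assumes the page contains no pre, textarea, or script elements, where whitespace matters. The function names are illustrative.

```python
import re


def compact_html(text):
    """Collapse every run of whitespace to a single space."""
    return re.sub(r"\s+", " ", text).strip()


def report(name, original, compacted):
    """Format one line: old size, new size, saving in percent, file name."""
    old = len(original.encode("utf-8"))
    new = len(compacted.encode("utf-8"))
    # Integer division rounds the percentage down.
    saving = 100 * (old - new) // old if old else 0
    return f"{old:6d} {new:6d} {saving:2d}% {name}"
```

The percentage is the saved bytes divided by the original size. For the second line of Listing 5, that is (3669 - 3276) / 3669, or about 10.7 percent, which the listing shows as 10%.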

To avoid having to constantly compress files manually, you can use the mod_tidy [34] module for the Apache web server.

Conclusion and Outlook

This article discussed some of the tools available to help you optimize your web pages. It also provided information on how to use these tools to formally check the correctness of your web page content and optimize it for rendering. For further information on this topic, you can refer to a Firefox lecture given by Frank Richter at Chemnitzer Linux Day 2010 [35], which provides details about various extensions along with concrete examples. Additionally, the caching tutorial by Brian D. Davison [36] explains how to optimize the organization of data at the meta level.

Acknowledgements

The authors thank Werner Heuser, Wolfram Eifler, Wolfram Schneider, and Thomas Osterried for their input and enthusiasm during the preparation of this article.

Infos

  1. CSS tutorials: http://www.w3schools.com/css/
  2. Google Translate: https://translate.google.com
  3. Yandex Translate (in Russian): https://translate.yandex.ru
  4. King, Andrew B. Website Optimization. O'Reilly Media, 2008.
  5. McFarland, David S. CSS: The Missing Manual. O'Reilly Media, 2009.
  6. Pandoc: http://johnmacfarlane.net/pandoc
  7. AsciiDoc: http://www.methods.co.nz/asciidoc
  8. DocBook: http://www.docbook.org
  9. LaTeX2HTML: http://www.latex2html.org
  10. TtH: http://hutchinson.belmont.ma.us/tth
  11. HyperLaTeX: http://hyperlatex.sourceforge.net
  12. PlasTeX: http://plastex.sourceforge.net
  13. TeX4ht: http://www.tug.org/applications/tex4ht/mn.html
  14. Saxon: http://sourceforge.net/projects/saxon
  15. Htc-py: http://sourceforge.net/projects/htc-py
  16. XHTML5 in a nutshell: https://blog.whatwg.org/xhtml5-in-a-nutshell
  17. W3C Markup Validation Service: http://validator.w3.org
  18. XML Schema Validator: http://schneegans.de/sv
  19. Validator: https://addons.mozilla.org/en-US/firefox/addon/validator
  20. HTML Validator: http://users.skynet.be/mgueury/mozilla/index.html
  21. HTML Tidy Library Project: http://tidy.sourceforge.net
  22. Tidy for HTML5: http://www.htacg.org/tidy-html5
  23. OpenSP: http://openjade.sourceforge.net
  24. JSLint: http://www.jslint.com
  25. JSHint: http://jshint.com
  26. Acorn.js: http://marijnhaverbeke.nl/acorn
  27. ESLint: http://eslint.org
  28. CSSTidy: http://csstidy.sourceforge.net
  29. Link prefetching FAQ: https://developer.mozilla.org/en-US/docs/Web/HTTP/Link_prefetching_FAQ
  30. W3C, The amazing em unit and other best practices: http://www.w3.org/WAI/GL/css2em.htm
  31. JTidy: http://sourceforge.net/projects/jtidy
  32. PTidy: http://sourceforge.net/projects/ptidy
  33. Python wrapper for TidyLib: https://pypi.python.org/pypi/pytidylib
  34. Mod_tidy: http://mod-tidy.sourceforge.net
  35. Firefox as a tool for web development: http://www-user.tu-chemnitz.de/~fri/ffwebdev-clt2010.html
  36. Caching Tutorial for Web Authors and Webmasters: http://www.web-caching.com/mnot_tutorial