What Is HTML? The Anatomy of an HTML5 Document

Posted on 10th October 2011 with 7 Comments

This is the second article in our series on the absolute fundamentals of web development. Our first article explained in detail what HTML is on a conceptual level. We looked at what a markup language is, what tags are and how HTML compares to other important pieces of the web development puzzle such as CSS.

Join us today as we move on and take a look at each basic piece of an HTML document. I’ll explain all that stuff at the top of an HTML file that confuses you and outline the basic structure that you’ll follow for creating your own HTML files.

DOCTYPE

The very first thing that you typically see in an HTML file is the DOCTYPE declaration. Before HTML5, this could be a very confusing bit of code that looked something like this:
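
For reference, here is the HTML 4.01 Transitional DOCTYPE, one common example of the older style. The strict and frameset variants differ only in the identifier and the DTD file name:

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
    "http://www.w3.org/TR/html4/loose.dtd">
```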


There’s a lot going on here, and every bit of it speaks to either the web browser, the reader or both. The “PUBLIC” keyword indicates that the document type is a publicly available standard. DTD stands for Document Type Definition; the quoted identifier names the version of HTML being used, and the final section is a URL pointing to where the DTD can be found.

The words “loose” (or transitional), “strict” and “frameset” refer to different variants of HTML 4, each of which allowed slightly different markup. The transitional and frameset variants existed mainly to help developers transition from older versions of HTML.

The HTML5 DOCTYPE

Before HTML5 there were several DOCTYPEs to choose from, which could be monumentally confusing for new developers. Fortunately, HTML5 replaces them all with a refreshingly simple DOCTYPE:
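
The HTML5 declaration in full (it is case-insensitive, so the lowercase form some developers prefer works too):

```html
<!DOCTYPE html>
```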


See how easy that is by comparison? It really is a beautiful thing.

What Does the DOCTYPE Do?

Now we’ve seen what a DOCTYPE looks like but we haven’t really discussed what it does. The answer is that the DOCTYPE tells the browser which type of HTML to expect, which in turn affects how the browser renders the page.

As you explore web development more, you’ll learn that there’s a huge emphasis on “standards-based development.” The general idea is that if we all follow certain rules and standards, web development will be a more cohesive and consistent practice. This is better for developers, better for browsers and most importantly, better for users.

Browsers use the DOCTYPE to trigger “standards mode,” in which the page is rendered according to current web standards. Older pages without a DOCTYPE trigger “quirks mode,” which emulates the behavior of older browsers so that legacy practices that wouldn’t function properly in standards mode still work.

The new, very simple HTML5 DOCTYPE is supported in all major browsers, and it triggers standards mode in all of them. The DOCTYPE also helps you validate your code, which ensures that current standards are being adhered to. Every page that you create should use a DOCTYPE and hopefully be fully standards compliant.
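
If you’re curious which mode a page ends up in, the browser exposes it through the document.compatMode property, which reads “CSS1Compat” in standards mode and “BackCompat” in quirks mode. Here’s a minimal sketch; the page title and the “mode” id are made up for illustration:

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Rendering mode check</title>
</head>
<body>
  <p id="mode"></p>
  <script>
    // document.compatMode is "CSS1Compat" in standards mode
    // and "BackCompat" in quirks mode.
    document.getElementById("mode").textContent = document.compatMode;
  </script>
</body>
</html>
```

Deleting the first line and reloading the page should flip the result to quirks mode.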

Root Element

After the DOCTYPE, the HTML really begins. This is indicated by the HTML Root Element. If your entire HTML is a tree, this is the root from which everything else sprouts.

The Root Element is defined by a “tag,” which we learned about in our last article. In this case, it’s the “HTML” tag.
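
A minimal sketch of the root element, assuming an English-language page:

```html
<html lang="en">

</html>
```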



  

Notice that the root element includes a language attribute, in our case English. Always be sure to indicate the appropriate language for every page you create.

Everything else that we will add to this page is situated inside of the Root Element. It is the container for every scrap of information and piece of content, the only exclusion being the DOCTYPE.

Head Element

The next thing you’ll encounter in an HTML document is the “head” section. The head tag is exactly what you’d expect it to be:
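
In its emptiest form, the head is just an opening and a closing tag:

```html
<head>

</head>
```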


  

The stuff that goes into the head section is primarily informational: it tells both you and the browser certain things about the page, such as the title, the charset, etc. This is also where you traditionally load in important external resources.

There are a few important things that go into a head tag. Let’s look at them one by one.

Meta Tags

As you can probably guess, meta tags hold metadata about the page. Metadata takes many forms and can include keywords, authors, descriptions, etc. Here are a few notable inclusions:

Charset

This is pretty boring stuff: the charset is typically set to UTF-8 and tells the browser which character encoding to use.

Your pages should definitely include an indication of which charset to use. Don’t overthink it; it’s just one of those things you need to stick in your template. Below is a typical charset declaration in HTML5.
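
The HTML5 form, assuming UTF-8:

```html
<meta charset="utf-8">
```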


This is yet another thing that has gotten easier with HTML5. Check out the version of this snippet required for HTML 4.01:
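
The equivalent declaration as it is typically written for HTML 4.01, again assuming UTF-8:

```html
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
```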


Some other typical meta tags include description and author. Here’s a quick, self-explanatory example of each of these:

Description
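
A minimal sketch of a description tag; the content text is a placeholder:

```html
<meta name="description" content="A short summary of what this page is about.">
```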


Author
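
A minimal sketch of an author tag; the name is a placeholder:

```html
<meta name="author" content="Your Name">
```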


Title

Another thing that goes inside the head portion of your document is the title tag. This very simple piece of code states whatever you’d like the title of the page to be. Here’s an example:
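
A title tag with placeholder text:

```html
<title>My First HTML5 Page</title>
```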


Browsers typically show this title in the window’s title bar or on the page’s tab. It is also what appears in bookmarks and search results.

Links and Scripts

The last thing we’ll discuss regarding the head tag is the inclusion of external resources. In a very simple web page, you’ll typically see these take the form of a stylesheet or script:
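
A sketch of what those two lines might look like. The file paths are hypothetical and assume local css and js folders sitting next to the HTML file:

```html
<link rel="stylesheet" href="css/style.css">
<script src="js/jquery.js"></script>
```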



Here I’m loading my CSS file as well as jQuery (a JavaScript library) into the page. If these files sit in your project folder but are never referenced from the page, they will have no effect. Note that the links for these could either point to something in the local folder hierarchy (as above) or something hosted on another web server.

The link used above for the CSS file uses a link relation (rel="stylesheet"). For more on link relations, check out this article.

Also, as an alternative to linking to external files, you can embed code right into the head element. Here’s an example with CSS, but the same can be done via the “script” tag and JavaScript (embedded scripts are often placed at the end of the body element instead).
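
A minimal sketch of embedded CSS; the rule itself is only a placeholder:

```html
<style>
  /* Placeholder rule: any CSS can go here */
  body {
    font-family: sans-serif;
  }
</style>
```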


Body Element

The final portion of an HTML page is the most important. Everything inside of the body element defines the content and structure of your page. As for development time, you’ll likely use a set template for everything above and spend a few minutes customizing it for specific projects. The rest of your HTML time will be spent inside the body element.
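
Like the head, the empty body is just a pair of tags waiting for content:

```html
<body>

</body>
```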


  

Putting it All Together

Now that we’ve walked through each individual piece of an HTML file, let’s put it all together into one extremely basic HTML5 template.
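
Here is one way the pieces discussed above fit together. Treat it as a sketch rather than the definitive template: the title, meta content and file paths are placeholders.

```html
<!DOCTYPE html>
<html lang="en">
<head>
	<meta charset="utf-8">
	<meta name="description" content="A short summary of what this page is about.">
	<meta name="author" content="Your Name">
	<title>My First HTML5 Page</title>
	<link rel="stylesheet" href="css/style.css">
	<script src="js/jquery.js"></script>
</head>
<body>

</body>
</html>
```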





	
	
	
	
	
	



  


An Overview

A thousand apologies for the mundane nature of this topic. Beginners are often turned off by boredom at this point, but hang in there: the real fun of HTML is everything between the body tags, which we haven’t even discussed!

All of these pieces were necessary to accurately paint the picture of what an HTML document actually is. We now see that an HTML document has a DOCTYPE that tells the browser how to render the page and helps ensure the proper standards are being used.

We also know that there’s a set hierarchy to how HTML pages are structured. Just about everything but the DOCTYPE is thrown into the root element, meaning it is the “parent element” of the head and body elements, which in turn have their own children.

[Figure: the DOM of an HTML document depicted as a tree]

Browsers represent this structure as the DOM, or Document Object Model. It is almost always metaphorically referred to as a tree and depicted like the image above. Our own Jack Rocheleau wrote an in-depth look into the DOM titled Deeper Study Into the WWW’s Document Object Model. For the next step in understanding the basic structure of an HTML page, check out that article.
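
To see the top of that tree for yourself, a page can ask the browser for its own root element and list its children. A small sketch, with a made-up title and a hypothetical “tree” id:

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>DOM tree check</title>
</head>
<body>
  <pre id="tree"></pre>
  <script>
    // document.documentElement is the root <html> element of the page.
    var root = document.documentElement;
    var lines = [root.tagName];
    // List the root's direct children, indented one level.
    for (var i = 0; i < root.children.length; i++) {
      lines.push("  " + root.children[i].tagName);
    }
    document.getElementById("tree").textContent = lines.join("\n");
  </script>
</body>
</html>
```

Opened in a browser, this should print HTML on the first line with HEAD and BODY indented beneath it, matching the parent and child relationship described above.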

Conclusion

This article represents a very brief overview of how an HTML skeleton is structured and the types of things that are typically included. It is by no means exhaustive, but should serve as a good basic introduction to these topics.

When you’re just starting out in code, most people will simply give you a template for all of the code above without really explaining what it all does. This can leave a sizable hole in your education so it’s important to read through this information and attempt to understand what you can.

Stay with us in this series and check back soon as we answer another important question: What is CSS?

Comments & Discussion

7 Comments

  • http://www.ostheimer.at Andreas Ostheimer

    Hi, the link to the first part is dead – please fix so I can read it again. Thanks, Andreas

  • http://www.bransonwerner.com Branson Werner

    and elements are also a new feature for HTML5 and should be mentioned as well.

  • http://websized.com Marlou

    Why do you use xml:lang=”en” in HTML5?
    Isn’t the whole point of HTML5, as opposed to XHTML, that it does not need to validate as xml anymore?
    Since this is all about the basics, I’d leave that bit out of the html-tag.

  • http://beben-koben.blogspot.com/ Beben Koben

    i found this on resource developer web’s (a lot)
    <!doctype html>
    <html>
    <head></head>
    <body>
    <header>
    </header>
    <nav>
    </nav>
    <section>
    <article>
    </article>
    <aside>
    </aside>
    </section>
    <footer>
    </footer>
    </body>
    </html>
    Sorry, i’m not an expert, just share ;)

  • http://www.stoons.ca Sue-on-the-farm

    Thank You!

    I absolutely have to update our farm site this winter [if you click on the link you'll see why! LOL] which means re-learning basic HTML and CSS.

    I have been procrastinating for YEARS, because I could not figure out the DOCTYPE.

    Now I’m kinda glad I waited. Yippee!

    So, am bookmarking the first article and this page and will be snoopin’ around, and following what you write, so don’t screw it up, ‘k?

    I have some cognitive dysfunction which slows down my processing ability and some memory problems, so If I can re-do our site based on the info I find here — anyone will be able to do it. You have a guinea pig.

  • Lynn Hayes

    Fantastic – I have been looking for simple tutorials like this to help me to see under the Dreamweaver hood. Can’t wait for more.

  • http://allenresha.com Allen Resha

    Another excellent explanation of HTML more in depth. I never knew exactly what everything meant (minus the language, meta data, and the part that called for CSS) until now. I now have a deeper understanding of HTML and what exactly each piece of it is there for.
