Creating Documents for Use Outside of Microsoft Office Word 2007

  • 1/3/2007

Creating an XML Document

As we mentioned earlier, basic Web pages are coded in HTML so that they can be displayed in a Web browser. HTML is a small, fixed set of tags defined as an application of Standard Generalized Markup Language (SGML), a comprehensive system for coding the structure of text documents and other forms of data so that they can be used in a variety of environments. Extensible Markup Language (XML) is a simplified subset of SGML. However, instead of being fixed like HTML, XML can be customized (extended) to store data so that it can be used in many ways in many environments: for example, as text, in a database or spreadsheet, or as a Web page.
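To make the idea of custom tags concrete, here is a minimal Python sketch. The element names and the sample values are assumptions chosen for illustration (Room 1 in particular is a placeholder), and nothing here is produced by Word. It reads one piece of XML text and reuses it as spreadsheet-style rows.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical XML that uses custom element names; the values are placeholders.
SAMPLE_XML = """<classlist>
  <class><title>Designing with Color</title><classroom>Room 1</classroom></class>
  <class><title>Feng Shui Made Easy</title><classroom>Room 2</classroom></class>
</classlist>"""


def xml_to_rows(xml_text):
    """Return one (title, classroom) tuple per <class> element."""
    root = ET.fromstring(xml_text)
    return [
        (c.findtext("title", default=""), c.findtext("classroom", default=""))
        for c in root.findall("class")
    ]


def rows_to_csv(rows):
    """Render the rows as CSV text, a form that a spreadsheet can import."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["title", "classroom"])
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(rows_to_csv(xml_to_rows(SAMPLE_XML)))
```

Because the tags describe what each piece of data is rather than how it looks, the same text could just as easily feed a database load or a Web page template.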

Creating sophisticated, multi-purpose XML files can involve highly technical processes that are designed by experienced systems analysts and application developers. However, with Word 2007, anyone can participate in these processes by creating a Word document and then saving it as an XML file. During conversion, Word tags the file based on its styles and other formatting and saves it with an .xml extension.

You can open and edit an XML file in Word, just as you can an HTML file. You can also open it in an XML editor such as XMetaL, or as a plain text file in a text editor such as Notepad.
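Because an XML file is plain text, any program with an XML parser can read it, not only Word or an XML editor. The following Python sketch counts the element names in a small fragment. The fragment is a simplified stand-in written in the style of WordprocessingML; a file saved by Word is far larger, and its exact contents are not reproduced here.

```python
from collections import Counter
import xml.etree.ElementTree as ET

# Simplified, hand-written fragment (an assumption, not real Word output).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
SAMPLE = f"""<w:document xmlns:w="{W_NS}">
  <w:body>
    <w:p><w:r><w:t>Designing with Color</w:t></w:r></w:p>
    <w:p><w:r><w:t>Feng Shui Made Easy</w:t></w:r></w:p>
  </w:body>
</w:document>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def count_tags(xml_text):
    """Count how many times each element name occurs in the XML text."""
    root = ET.fromstring(xml_text)
    return Counter(local_name(el.tag) for el in root.iter())


if __name__ == "__main__":
    for name, total in count_tags(SAMPLE).most_common():
        print(name, total)
```

To run the same count on a file on disk, ET.parse(path).getroot() could replace ET.fromstring(xml_text).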

If you want more control over the tagging of a document, you can attach an XML schema to it. The schema is an additional file that describes the structure allowed in the document, including the names of structural elements and which elements can contain which other elements. For example, a book might be divided into parts that can each contain chapters, which in turn can contain topics, which in turn can contain a heading, paragraphs, numbered and bulleted lists, tables, and other elements. The schema might also define formatting attributes that you can apply to text within specified elements. Word uses the schema to validate the document content and prompts you when content has been incorrectly tagged. Generally, companies employ a specialist with in-depth knowledge of XML to create custom schemas, but anyone can use an existing schema to tag a Word document and save it as an XML file.
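The containment rules that a schema expresses can be sketched in a few lines of Python. This is only an illustration of the idea: the element names are assumptions modeled on the book example above, and a real schema would be written in XML Schema (XSD) rather than as a Python dictionary.

```python
import xml.etree.ElementTree as ET

# Hypothetical containment rules modeled on the book example in the text.
ALLOWED_CHILDREN = {
    "book": {"part"},
    "part": {"chapter"},
    "chapter": {"topic"},
    "topic": {"heading", "paragraph", "list", "table"},
}


def find_violations(element, path=""):
    """Return a message for every child the rules do not allow in its parent."""
    here = f"{path}/{element.tag}"
    allowed = ALLOWED_CHILDREN.get(element.tag, set())
    problems = []
    for child in element:
        if child.tag not in allowed:
            problems.append(f"{child.tag} is not allowed inside {here}")
        problems.extend(find_violations(child, here))
    return problems


VALID = "<book><part><chapter><topic><heading/><paragraph/></topic></chapter></part></book>"
INVALID = "<book><chapter><topic><heading/></topic></chapter></book>"

if __name__ == "__main__":
    print(find_violations(ET.fromstring(VALID)))
    print(find_violations(ET.fromstring(INVALID)))
```

In the second sample, a chapter sits directly inside the book with no part around it, so the check reports it. This is the kind of structural error that a validating editor flags.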

In this exercise, you will first save a document in XML format. Then you will attach a schema to a document, tag document elements to create valid structure, and save that file as an XML file.

  1. Click the Microsoft Office Button, and then click Save As.

  2. In the Save As dialog box, type My XML in the File name box, click Word XML Document in the Save as type list, and then click Save.

    Nothing appears to change, except that the title bar now displays My XML.

  3. Close the document.

  4. Click the Start button, click Documents, and then in the Documents window, navigate to the Microsoft Press\Word2007SBS\WebDocs folder.

  5. Right-click the My XML file, point to Open With, and then click Notepad.

    The Notepad plain text editor opens, displaying the contents of the XML file.

    This “simple” method of creating XML files turns out to be not so simple after all! Hundreds of tags enclosed in less-than (<) and greater-than (>) signs make it possible for this plain text document to be displayed exactly as it appears in Word.

  6. Close the Notepad window, and then in the WebDocs window, double-click the XML document to reopen it in Word.

  7. Click the Microsoft Office Button, and click Word Options. Then on the Popular page of the Word Options window, under Top options for working with Word, select the Show Developer tab in the Ribbon check box, and click OK.

    The Developer tab appears on the Ribbon.

  8. On the Developer tab, in the XML group, click the Schema button.

    The Templates And Add-Ins dialog box opens.

  9. On the XML Schema tab of the dialog box, click Add Schema.

  10. In the Add Schema dialog box, navigate to the Documents\Microsoft Press\Word2007SBS\WebDocs folder, and then double-click XMLSchema.

    The Schema Settings dialog box opens.

  11. In the Alias box, type XMLSchema, and then click OK.

    Word adds the schema to the list of available schemas and attaches it to the document.

  12. In the Templates And Add-Ins dialog box, click XML Options.

    The XML Options dialog box opens.

  13. Under Schema validation options, verify that the Validate document against attached schemas check box is selected and the Hide schema violations in this document check box is cleared.

  14. Under XML view options, verify that the Hide namespace alias in XML Structure task pane check box is cleared, and then select the Show advanced XML error messages check box.

  15. Click OK to close the XML Options dialog box, and then close the Templates And Add-Ins dialog box.

    The XML Structure task pane opens.

  16. In the XML Structure task pane, verify that the Show XML tags in the document check box is selected.

  17. Click anywhere in the document window. Then at the bottom of the XML Structure task pane, in the Choose an element to apply to your current selection list, click classlist {XMLSchema}.

  18. In the message box asking how you want to apply the selected element, click Apply to Entire Document.

    Word selects all the text in the document, adds an opening XML tag and a closing XML tag at either end of the document to indicate that the entire document is now a classlist element, and lists the element in the Elements In The Document box in the XML Structure task pane.

  19. Select all the text from Designing with Color down through Check with Jo about color swatches and kits for students. Then in the Choose an element to apply to your current selection list, click class.

    Word tags the selection as a class element. All the information between the two class tags belongs to one particular class.

  20. Select the Designing with Color heading, and tag it as title. Then select each of the next six paragraphs one at a time, and tag them in turn as instructor, date, time, description, cost, and classroom.

    As you tag each element, it appears in the Elements In The Document box. An X next to the classlist and class elements indicates that the structure is not valid according to the schema rules, and three dots under the classroom element and at the end of the class element tell you that an element is missing.

  21. Point to the X beside class.

    A ScreenTip tells you that untagged text is not allowed in the class element; all text must be enclosed in valid start and end element tags.

  22. Select the sentence that begins Check with Jo (the only remaining untagged text in the class element). Then in the Choose an element to apply to your current selection list, click notes.

    Word tags the element, and the X next to class disappears.

  23. Select all the text from Feng Shui Made Easy down through Andy will need the screen set up for his PowerPoint slides. Then in the Choose an element to apply to your current selection list, click class.

    Word tags the element and the X next to classlist disappears.

  24. Select each of the paragraphs in this class in turn, and tag them as title, instructor, date, time, description, cost, and notes.

    In the Elements In The Document box, a question mark appears next to the second class element, and a wavy purple line appears in the left margin of the document to show you the section with invalid structure.

  25. Point to the question mark.

    Word tells you that according to the rules laid out by the schema, the class element is incomplete.

  26. In the Feng Shui Made Easy class in the document, click to the right of the cost end tag, press the Enter key, type Room 2, select the text, and tag it as classroom.

    The document’s structure is now fully valid, and you’re ready to save the document as an XML file.

  27. Click the Microsoft Office Button, click Save As, name the file My XML With Schema, change the Save as type setting to Word XML Document, and then click Save.

  28. Close the XML Structure task pane, and then close the My XML With Schema document.

  29. Click the Microsoft Office Button, and then in the Recent Documents pane, click My XML With Schema.

    The XML file opens in Word, where you can edit it like a normal document.
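The validation feedback seen in steps 20 through 26 can be approximated with a short Python sketch. It is not how Word validates documents. The child order listed below is an assumption inferred from the tagging steps, because the XMLSchema file used in the exercise is not shown, and the sample strings are simplified stand-ins for the tagged class descriptions.

```python
import xml.etree.ElementTree as ET

# Assumed order of elements inside a class, inferred from the tagging steps.
REQUIRED_ORDER = [
    "title", "instructor", "date", "time",
    "description", "cost", "classroom", "notes",
]


def check_class(class_el):
    """Return a list of problems found in one <class> element."""
    problems = []
    # Text directly inside <class>, outside any child tag, counts as untagged.
    loose = [class_el.text] + [child.tail for child in class_el]
    if any(t and t.strip() for t in loose):
        problems.append("untagged text is not allowed in class")
    present = [child.tag for child in class_el]
    missing = [name for name in REQUIRED_ORDER if name not in present]
    if missing:
        problems.append("missing: " + ", ".join(missing))
    elif present != REQUIRED_ORDER:
        problems.append("unexpected element sequence")
    return problems


# Mirrors the state after step 20: one sentence is still untagged.
UNTAGGED = (
    "<class><title>Designing with Color</title><instructor/><date/><time/>"
    "<description/><cost/><classroom/>Check with Jo</class>"
)
# Mirrors the state after step 24: the classroom element is absent.
INCOMPLETE = (
    "<class><title>Feng Shui Made Easy</title><instructor/><date/><time/>"
    "<description/><cost/><notes/></class>"
)
# Mirrors the state after step 26: classroom added after cost.
COMPLETE = INCOMPLETE.replace("<notes/>", "<classroom>Room 2</classroom><notes/>")

if __name__ == "__main__":
    for sample in (UNTAGGED, INCOMPLETE, COMPLETE):
        print(check_class(ET.fromstring(sample)))
```

An empty list corresponds to the fully valid structure reached in step 26. The other two samples report the same kinds of problems that the X and the question mark indicate in the XML Structure task pane.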