1. What is DTD?
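- Document Type Definition (DTD): specifies the rules (structure) of an XML document, such as which tags may be used, which attributes each tag may contain, and which tags may occur inside other tags.
- Well-formedness in XML: an XML document is well formed if it follows the basic XML syntax rules, for example: it has exactly one root element, every start tag has a matching end tag (or is written as a self-closing tag such as <br/>), tags are closed in the proper nested order, and attribute values are quoted. The XML declaration (<?xml version="1.0"?>) is recommended but optional in XML 1.0.
- Validation in DTD: checking a well-formed XML document against its DTD is known as validation; a document that follows all the rules of its DTD is called valid.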
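The difference between the two checks can be sketched in Python. The standard library's xml.etree.ElementTree parser rejects documents that are not well formed, but it does not validate against a DTD; validation needs a validating parser (lxml's etree.DTD class is one option). The sample tags are illustrative only.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as XML; this checks syntax only, not a DTD."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Well formed: one root, properly nested, and no XML declaration needed.
print(is_well_formed("<emp><name>Asha</name></emp>"))  # True
# Not well formed: </emp> is closed before </name>.
print(is_well_formed("<emp><name>Asha</emp></name>"))  # False
# Not well formed: two root elements.
print(is_well_formed("<emp/><emp/>"))  # False
```

A document can pass this check and still be invalid, for example if it uses a tag its DTD never declares.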
2. Purpose of DTD
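- Ensure that tags appear in the required order (the sequence in which we want to receive data).
- Control how often a tag may occur, e.g. to avoid receiving duplicate tags.
- Enforce fixed tag names (e.g. emp for employees).
- Restrict the kind of content: a DTD can require that an element contains only text (#PCDATA), or limit an attribute to a list of allowed values. It cannot check data types such as "a phone number contains only digits"; that kind of rule needs XML Schema (XSD).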
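A content model such as (name, phone?, email*) is essentially a regular expression over child tag names, which is how a DTD enforces tag order and rules out duplicates. The Python sketch below assumes a hypothetical emp element with that model, translates it into a regular expression by hand, and checks documents against it; a validating parser derives this automatically from the DTD.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical rule, hand-translated from  <!ELEMENT emp (name, phone?, email*)>
# Each child tag name is followed by ",", so the content model becomes a regex:
# sequence -> concatenation, "?" -> at most once, "*" -> any number of times.
EMP_CONTENT = re.compile(r"name,(phone,)?(email,)*")


def emp_children_ok(xml_text: str) -> bool:
    """Check the order and number of <emp>'s direct children against EMP_CONTENT."""
    emp = ET.fromstring(xml_text)
    tags = "".join(child.tag + "," for child in emp)
    return EMP_CONTENT.fullmatch(tags) is not None


print(emp_children_ok("<emp><name>Asha</name><phone>123</phone></emp>"))  # True
print(emp_children_ok("<emp><phone>123</phone><name>Asha</name></emp>"))  # False: wrong order
print(emp_children_ok("<emp><name>Asha</name><phone>1</phone><phone>2</phone></emp>"))  # False: duplicate phone
```

The sketch checks structure only; the text inside phone is never inspected, which mirrors what a DTD can and cannot do.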
3. Types of DTD
- Internal DTD: is written as part of the XML document itself, inside the DOCTYPE declaration:
<?xml version="1.0"?> <!DOCTYPE root-element [element-declarations]>
The XML is validated without referring to any external file. The disadvantage is that an internal DTD cannot be reused by other documents.
- External DTD: is written in a separate file. When the same set of rules must be shared by many XML documents, the DTD is written once in its own file, and each XML document refers to it by its location:
<!DOCTYPE root-name SYSTEM "dtd-name.dtd">
- Here, root-name is the root element of the XML document, and dtd-name.dtd is the file containing the DTD; DTD files conventionally use the .dtd extension.
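To see how a parser tells the two forms apart, the Python sketch below uses the standard library's expat binding (xml.parsers.expat) and its StartDoctypeDeclHandler callback, which reports the root name, the SYSTEM identifier and whether the DOCTYPE has an internal subset. The file name emp.dtd is hypothetical; expat does not fetch external files by itself, so the file need not exist.

```python
from xml.parsers import expat


def doctype_info(xml_text: str) -> dict:
    """Return the DOCTYPE's root name, SYSTEM identifier and whether it has an internal subset."""
    info = {}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        info["root"] = name
        info["system_id"] = system_id
        info["internal"] = bool(has_internal_subset)

    parser = expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_text, True)  # True marks this as the final chunk of input
    return info


internal = '<?xml version="1.0"?><!DOCTYPE emp [<!ELEMENT emp (#PCDATA)>]><emp>Asha</emp>'
external = '<?xml version="1.0"?><!DOCTYPE emp SYSTEM "emp.dtd"><emp>Asha</emp>'
print(doctype_info(internal))  # {'root': 'emp', 'system_id': None, 'internal': True}
print(doctype_info(external))  # {'root': 'emp', 'system_id': 'emp.dtd', 'internal': False}
```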
4. DTD Declarations
i) Element Declarations:
- An element declaration specifies which tags are allowed and what each element may contain.
- Empty Elements: have no content.
Example: <!ELEMENT br EMPTY> An XML document conforming to that DTD writes the tag as <br/> (or <br></br>).
- Elements with Text Content: if an element must contain data between its start and end tags, its content is declared as #PCDATA. The declaration that restricts an element to text data is <!ELEMENT element-name (#PCDATA)>
- PCDATA is Parsed Character Data: text that the parser reads for markup and entity references.
For example, an XML document conforming to a DTD that declares <!ELEMENT technology (#PCDATA)> can contain <technology>XML is a tool for data transportation and storage in a platform- and language-neutral way</technology>
- Elements with Mixed Content: an element can contain any mix of text and child elements. For mixed content, a DTD can only list which child elements are allowed; it cannot constrain their order or how many times each one occurs.
Place #PCDATA first in the list of choices to mix regular, untagged text with the specified child tags, and end the group with * whenever child elements are listed.
In the example below, the "speech" element can contain any number of occurrences, in any order, of parsed character data, "loud" elements and "soft" elements.
<!ELEMENT speech (#PCDATA|loud|soft)*>
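Python's expat binding reports each element declaration through the ElementDeclHandler callback, passing the content model as nested tuples of (type, quantifier, name, children). The sketch below uses a hypothetical doc root that combines the declarations above and turns those tuples back into DTD syntax, which shows how a parser classifies EMPTY, text-only and mixed content. expat is non-validating, so it records the declarations without enforcing them.

```python
from xml.parsers import expat
from xml.parsers.expat import model

# Hypothetical document whose internal DTD uses the declaration forms described above.
DOC = """<?xml version="1.0"?>
<!DOCTYPE doc [
  <!ELEMENT doc (br, technology, speech)>
  <!ELEMENT br EMPTY>
  <!ELEMENT technology (#PCDATA)>
  <!ELEMENT speech (#PCDATA|loud|soft)*>
  <!ELEMENT loud (#PCDATA)>
  <!ELEMENT soft (#PCDATA)>
]>
<doc><br/><technology>XML</technology><speech>Hi <loud>THERE</loud> friend</speech></doc>"""

QUANT = {model.XML_CQUANT_NONE: "", model.XML_CQUANT_OPT: "?",
         model.XML_CQUANT_REP: "*", model.XML_CQUANT_PLUS: "+"}


def collect_models(xml_text: str) -> dict:
    """Map each declared element name to expat's content-model tuple."""
    models = {}

    def on_element_decl(name, content_model):
        models[name] = content_model

    parser = expat.ParserCreate()
    parser.ElementDeclHandler = on_element_decl
    parser.Parse(xml_text, True)
    return models


def to_dtd(content) -> str:
    """Rebuild DTD content-model syntax from a (type, quantifier, name, children) tuple."""
    ctype, quant, name, children = content
    suffix = QUANT[quant]
    if ctype == model.XML_CTYPE_EMPTY:
        return "EMPTY"
    if ctype == model.XML_CTYPE_ANY:
        return "ANY"
    if ctype == model.XML_CTYPE_NAME:
        return name + suffix
    if ctype == model.XML_CTYPE_MIXED:  # #PCDATA, optionally followed by child names
        parts = ["#PCDATA"] + [to_dtd(child) for child in children]
        return "(" + "|".join(parts) + ")" + suffix
    separator = "|" if ctype == model.XML_CTYPE_CHOICE else ","  # CHOICE or SEQ
    return "(" + separator.join(to_dtd(child) for child in children) + ")" + suffix


for element, content in collect_models(DOC).items():
    print(element, to_dtd(content))
```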
ii) Attribute Declarations:
- It specifies the attributes that can be associated with an element type. If an element has attributes, the attribute names and their possible values must be declared in the DTD. Attributes are declared with <!ATTLIST>, which gives the element name, the attribute name, the attribute type and the attribute value (a default or a keyword such as #REQUIRED).
<!ATTLIST element-name attribute-name attribute-type attribute-value>
- Attribute Types: the most common values of attribute-type are:
CDATA: the attribute can have any character string as its value.
ID: the value is a unique identifier for the element; no two elements in the document may share an ID value.
IDREF: the value refers to the ID of another element.
IDREFS: the value is a list of one or more ID references separated by spaces.
ENTITY: the value is the name of an external unparsed entity declared in the DTD.
ENTITIES: like ENTITY, but the value is a space-separated list of one or more entity names.
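- Attribute Values: attribute-value takes one of three keywords, or a default value:
#REQUIRED: the attribute is mandatory.
#IMPLIED: the attribute is optional; if it is omitted, the DTD supplies no default value.
#FIXED "value": the attribute always has the given value; if it is omitted, the parser supplies that value, and if it is present, it must match it.
"value" (no keyword): the attribute is optional, and the given default value is supplied when it is omitted.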
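The sketch below, using Python's expat binding, collects attribute declarations through the AttlistDeclHandler callback (expat reports #FIXED as a required attribute that also has a default value). Because expat does not validate, the hypothetical ids_consistent helper checks two ID/IDREF rules by hand. The staff, emp, manager and version names are hypothetical as well.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat

# Hypothetical document: emp has an ID, an optional IDREF to its manager, and a fixed version.
DOC = """<?xml version="1.0"?>
<!DOCTYPE staff [
  <!ELEMENT staff (emp*)>
  <!ELEMENT emp (#PCDATA)>
  <!ATTLIST emp id      ID    #REQUIRED
                manager IDREF #IMPLIED
                version CDATA #FIXED "1.0">
]>
<staff><emp id="e1">Asha</emp><emp id="e2" manager="e1">Ravi</emp></staff>"""


def attribute_rules(xml_text: str) -> dict:
    """Map (element, attribute) to (type, keyword, default) as reported by expat."""
    rules = {}

    def on_attlist(element, attribute, att_type, default, required):
        if required:  # #REQUIRED, or #FIXED when a default value is present
            keyword = "#FIXED" if default is not None else "#REQUIRED"
        else:
            keyword = "#IMPLIED" if default is None else "default value"
        rules[(element, attribute)] = (att_type, keyword, default)

    parser = expat.ParserCreate()
    parser.AttlistDeclHandler = on_attlist
    parser.Parse(xml_text, True)
    return rules


def ids_consistent(xml_text: str, rules: dict) -> bool:
    """Hand check of two rules a validating parser enforces: IDs are unique, IDREFs resolve."""
    ids, refs = set(), []
    for element in ET.fromstring(xml_text).iter():
        for attribute, value in element.attrib.items():
            att_type = rules.get((element.tag, attribute), ("CDATA",))[0]
            if att_type == "ID":
                if value in ids:
                    return False  # duplicate ID value
                ids.add(value)
            elif att_type == "IDREF":
                refs.append(value)
    return all(ref in ids for ref in refs)


rules = attribute_rules(DOC)
print(rules[("emp", "version")])  # ('CDATA', '#FIXED', '1.0')
print(ids_consistent(DOC, rules))  # True
print(ids_consistent(DOC.replace('manager="e1"', 'manager="e9"'), rules))  # False: e9 is not an ID
```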
iii) Entity Declarations:
- An entity declaration defines a named piece of fixed data that can be included elsewhere in the document with a reference of the form &entity-name;. Entities act as abbreviations for text, or point to data stored at an external location. An entity name must start with a letter, '_' or ':'.
Syntax: <!ENTITY entity-name "entity-value">
- The XML standard also defines five predefined entities: &lt; (<), &gt; (>), &amp; (&), &apos; (') and &quot; (").
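Python's ElementTree parser expands internal entities declared in a document's internal DTD, as well as the predefined entities. In this minimal sketch the company entity is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document declaring one internal entity, "company".
DOC = """<?xml version="1.0"?>
<!DOCTYPE note [
  <!ENTITY company "ACME Corporation">
]>
<note>&company; &amp; Partners: 3 &lt; 5</note>"""

note = ET.fromstring(DOC)
# &company; is replaced by its declared value; &amp; and &lt; are predefined entities.
print(note.text)  # ACME Corporation & Partners: 3 < 5

try:
    ET.fromstring("<note>&missing;</note>")
except ET.ParseError as err:
    # Referencing an undeclared entity makes the document not well formed.
    print("error:", err)
```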