The Tags Panel displays the primary structure of the document. While the Order Panel provides tools to restructure what content is grouped together, how it is labeled, and when it is read, the Tags Panel gives more granular control over the latter two aspects. The order of the tags is the reading order of the document, more so than the order displayed by the Order Panel, and each tag serves a specific semantic purpose (except for grouping tags like <Sect>). Some structures built from these tags have strict requirements; more information about them can be found on the Complex Tag Structures page.
Some important functions of the Tags Panel are:
- Dragging and dropping content and tags to change the structure
- Right Click > New Tag
- Editing a tag's properties by Right Click > Properties
  - Type allows you to change what tag is used
  - The Actual Text field can be used to correct poorly scanned text
  - Alternate Text can be provided for figures and formulae
- Right Click > Copy Contents to Clipboard can be useful for identifying and correcting poorly scanned text
While it can get tedious for longer documents, the best way to check the tags is simply to select the first one and use the arrow keys to step through every tag, confirming that each is in the correct order and properly represents its content. You may find that the document is structured with many section tags. These don't have any semantic purpose, so their presence is not problematic, but if they make it difficult to traverse the tags, you can pull the content out and delete them.
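The check and clean-up described above can be pictured with a small model. This is a minimal sketch, not Acrobat's API: the `Tag` class and the functions `reading_order` and `unwrap_sections` are hypothetical names invented for illustration. It assumes the tag tree is a set of nested nodes whose top-to-bottom (depth-first) order is the reading order, and it shows that removing `<Sect>` wrappers leaves that order unchanged.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Tag:
    """Hypothetical stand-in for one node in the tag tree."""
    kind: str                      # e.g. "Sect", "H1", "P", "Figure"
    text: str = ""                 # content held directly by this tag
    children: List["Tag"] = field(default_factory=list)


def reading_order(tag: Tag) -> Iterator[Tag]:
    """Yield tags depth-first, top to bottom, i.e. in reading order."""
    yield tag
    for child in tag.children:
        yield from reading_order(child)


def unwrap_sections(tag: Tag) -> List[Tag]:
    """Return the tag with every <Sect> wrapper replaced by its own children."""
    new_children: List[Tag] = []
    for child in tag.children:
        new_children.extend(unwrap_sections(child))
    if tag.kind == "Sect":
        return new_children        # drop the wrapper, keep its content in place
    return [Tag(tag.kind, tag.text, new_children)]


# Invented sample document with nested section tags.
doc = Tag("Document", children=[
    Tag("Sect", children=[
        Tag("H1", "Introduction"),
        Tag("Sect", children=[Tag("P", "First paragraph.")]),
    ]),
    Tag("P", "Second paragraph."),
])

flat = unwrap_sections(doc)[0]
before = [t.text for t in reading_order(doc) if t.text]
after = [t.text for t in reading_order(flat) if t.text]
print(after)
print(before == after)   # the content is read in the same order either way
```

In this model the section tags only add depth to the tree, which is why deleting them is safe as long as their content is pulled out in place first.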
Paragraphs often get split across columns or pages, which causes reading to halt in the middle of the paragraph. These should therefore be combined so that each visual paragraph is contained within a single paragraph tag. Sometimes, paragraphs contain other content such as inline math or footnote references that require their own tags. In these cases, break the text using the Order Panel so that it can sit around the inline content, and include all the pieces in order in one <P> tag (see the example below).
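As a rough illustration of the target structure, the sketch below merges two fragments of one visual paragraph into a single `<P>` tag whose children are text runs and inline tags in reading order. The representation (strings for text, `(tag name, content)` tuples for inline tags), the function names `merge_paragraph` and `render`, and the sample text are all assumptions made for this example; none of it is Acrobat functionality.

```python
from typing import List, Tuple, Union

# A child is either a plain text run (str) or an inline tag as (tag name, content).
Child = Union[str, Tuple[str, str]]


def merge_paragraph(fragments: List[List[Child]]) -> Tuple[str, List[Child]]:
    """Combine the pieces of one visual paragraph into a single <P> tag, keeping order."""
    merged: List[Child] = []
    for fragment in fragments:
        merged.extend(fragment)
    return ("P", merged)


def render(tag: Tuple[str, List[Child]]) -> str:
    """Format the tag as an indented outline, similar to an expanded tag tree."""
    name, children = tag
    lines = [f"<{name}>"]
    for child in children:
        if isinstance(child, str):
            lines.append(f"  {child!r}")
        else:
            lines.append(f"  <{child[0]}> {child[1]}")
    return "\n".join(lines)


# Two fragments of the same paragraph, split at a page break (invented sample text).
page_1: List[Child] = ["The solution satisfies ", ("Formula", "u(t) >= 0"), " for all t"]
page_2: List[Child] = [", as shown earlier", ("Reference", "1"), "."]

paragraph = merge_paragraph([page_1, page_2])
print(render(paragraph))
```

The point of the sketch is the ordering: the text before the inline element, the inline element itself, and the text after it all sit as siblings inside the same `<P>` tag.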
In extreme situations, preserving visual paragraphs in tag form may require repositioning other elements in the reading order. In one PDF we remediated, a paragraph spanned three pages because each page also contained a large graph. The graphs were moved elsewhere in the Tags Panel, after the paragraphs in which they were referenced, and the paragraph was combined into a single tag.
Footnotes are a curious case, as their usual reading flow for a sighted user does not quite match the experience of a screen reader user or anyone else navigating the document by keyboard. There is no definitive standard for where they should be placed in the reading order: at the end of the page, or immediately after the paragraph in which they are referenced. The DART uses the latter positioning when remediating PDFs, as it lends itself to a more natural reading order.
The superscript that refers to the footnote should be tagged as <Reference>, and the footnote itself should be tagged as
Correcting Inaccurate Text
For some documents, particularly scans, the text recognized by OCR may not match the text in the document. In these instances, you can use the Actual Text field in the tag's properties to provide accurate text. This can also be used to correct other elements, such as hyphens that are inserted as a consequence of line wrapping. These hyphens should be tagged as a <Span> element, and the Actual Text should contain a soft hyphen character (U+00AD) to mark it as a connective element rather than a textual element.
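The effect of the soft hyphen can be sketched in a few lines. The example below is an illustration only: it models a word broken across lines as a list of `(raw text, Actual Text)` pairs, which is an invented representation, and it assumes a consumer of the text that ignores soft hyphens. How a particular screen reader handles U+00AD may differ.

```python
import unicodedata

SOFT_HYPHEN = "\u00ad"

# Confirm which character U+00AD is.
print(unicodedata.name(SOFT_HYPHEN))

# A word wrapped across two lines: each run is (raw text, Actual Text override or None).
# The hyphen run carries a soft hyphen as its Actual Text (hypothetical model).
runs = [("numer", None), ("-", SOFT_HYPHEN), ("ical", None)]

# Text as it would be read with no override: the line-break hyphen stays in the word.
raw = "".join(text for text, _ in runs)

# Text with Actual Text applied, then with soft hyphens ignored by the consumer.
with_actual_text = "".join(actual if actual is not None else text for text, actual in runs)
read_as = with_actual_text.replace(SOFT_HYPHEN, "")

print(raw)
print(read_as)
```

Under these assumptions the uncorrected run reads as a hyphenated fragment, while the corrected one reads as a single word.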
The following articles were used as examples. Modifications to the PDF structure were made for illustrative purposes.
Souleiman, Y. (2021). Convergences and numerical analysis of a contact problem with normal compliance and unilateral constraint. African Journal of Mathematics and Computer Science Research, 14(1), 13-23. https://doi.org/10.5897/AJMCSR2020.0865
Copyright © 2022 Author(s) retain the copyright of this article.
This article is published under the terms of the Creative Commons Attribution License 4.0.