Accessibility Document Conversion
PDF Accessibility Overview
The Portable Document Format (PDF) is a file format for representing documents in a manner independent of the application software, hardware, and operating system used to create them, as well as of the output device on which they are to be displayed or printed. The PDF specification was introduced by Adobe Systems in 1993 as a publicly available standard. In January 2008, PDF 1.7 became an ISO standard (ISO 32000-1). PDF Universal Accessibility (PDF/UA) became an ISO standard in July 2012 (ISO 14289-1:2012). PDF/UA is a set of guidelines for creating more accessible PDF files. The specification describes the required and prohibited components, and the conditions governing their inclusion in or exclusion from a PDF file, so that the file is available to the widest possible audience, including people with disabilities.
Many applications can generate PDF files directly. This direct approach is preferable, since it gives the application access to the full capabilities of PDF, including the imaging model and the interactive and document interchange features. Alternatively, some applications produce PDF output indirectly, formatting the output first and then importing it into the PDF container. Although these indirect strategies are often the easiest way to obtain PDF output from an existing application, the resulting PDF files may not make the best use of the high-level PDF imaging model that is relied upon to expose the semantics of the document.
PDF accessibility support rests on the ability to determine the logical order of content in a PDF document, independently of the content's appearance or layout, through logical structure and Tagged PDF elements. Applications can extract the content of a document for presentation to users with disabilities by traversing the structure hierarchy and presenting the contents of each node. PDF logical structure shares basic features with standard document markup languages such as HTML, SGML, and XML. A document's logical structure is expressed as a hierarchy of structure elements, each represented by a dictionary object. This structure can record, for example, the organization of the document into chapters, sections, headings, and paragraphs, or the identification of special elements such as figures, tables, and footnotes. The logical structure is stored separately from the document's visible content, with pointers from each to the other. This separation allows the ordering and nesting of logical elements to be entirely independent of the order and location of graphics objects on the document's pages. Tagged PDF, which builds on the logical structure framework, defines a set of standard structure types and attributes that allow page content (text, graphics, and images) to be extracted and reused for other purposes, such as conversion to other common file formats (for example HTML, XML, and RTF) with document structure and basic styling information preserved. The same mechanism makes content accessible to people who rely on assistive technology.
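The traversal described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `StructElem` class and its fields are hypothetical stand-ins for structure elements, not the API of any real PDF library, and the sample tree is invented.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StructElem:
    """Hypothetical stand-in for a PDF structure element (not a real PDF API)."""
    tag: str                     # standard structure type, e.g. "H1", "P", "Figure"
    text: Optional[str] = None   # text content the element points to, if any
    alt: Optional[str] = None    # alternate description, e.g. for a figure
    kids: List["StructElem"] = field(default_factory=list)


def logical_order(node: StructElem) -> List[str]:
    """Walk the tree depth-first and return content in logical order."""
    lines = []
    # Prefer real text; fall back to the alternate description.
    content = node.text if node.text is not None else node.alt
    if content is not None:
        lines.append(f"{node.tag}: {content}")
    for kid in node.kids:
        lines.extend(logical_order(kid))
    return lines


# Invented sample document; the order here is logical, not visual.
document = StructElem("Document", kids=[
    StructElem("H1", text="Annual Report"),
    StructElem("P", text="Revenue grew this year."),
    StructElem("Figure", alt="Bar chart of revenue by quarter"),
])

for line in logical_order(document):
    print(line)
```

Because the walk follows the structure tree rather than page coordinates, the output order stays the same however the content is laid out on the page, which is the property assistive technology relies on.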
Adobe Acrobat PDF Creator
Although you can create accessible PDF files in several programs, Adobe Acrobat Professional is required to evaluate, repair, and enhance the accessibility of existing PDF files. The Accessibility Checker (available in Acrobat X and XI) is a good tool for confirming that nothing was overlooked in your document. You can convert a file to PDF in Acrobat (select File, Create PDF, and then From File). If the file format is supported (such as Microsoft Office), the file should be tagged as it is converted. If no tags are present, select Edit, Preferences, Convert to PDF, choose the correct format, select Conversions Settings, and ensure that Enable accessibility and reflow is selected. Then run the full check by selecting Tools, Advanced, Accessibility, and then Full Check.
The Tags pane allows you to view, reorder, rename, modify, delete, and create tags (select View, Show/Hide, Navigation Panes, and then Tags).
The Order pane allows you to change the reading order of the content and tags on the page so that it matches the visual reading order (select View, Show/Hide, Navigation Panes, and then Order).
PDFMaker MS-Word Add-In
Creating PDFs with PDFMaker on Microsoft Windows is the best choice for producing high-quality tagged PDF files. With the Adobe add-in installed, you can export to PDF by selecting Create PDF from the Acrobat ribbon.
Microsoft Office PDF Converter
Microsoft Word 2010 and 2013 allow you to create tagged PDF files without installing Acrobat. Convert your document by selecting File, Save As, and then choosing PDF as the file type. The tagging may not be quite as good as with the Adobe add-in, but most content, such as heading levels, lists, and alternative text for images, is exported. If you want to verify the accessibility of the PDF or edit the tags that are created, you will still need Acrobat Professional or another vendor's solution. If you do not have a professional Microsoft Office to PDF converter, it is recommended that you establish best practices for creating accessible documents.
Accessibility Document Creation Best Practices
- Create document templates with accessibility features and user guidelines that can be used as a starting point for document creation.
- If possible, install a third-party accessibility converter plugin.
- Create Microsoft Office accessibility macros that guide document creators through the accessibility requirements.
PDF Accessibility Checker Free Tool
The PDF Accessibility Checker (PAC) is a free tool developed and distributed by the Access For All Foundation to evaluate the accessibility of PDF documents and PDF forms. PAC offers the added possibility of displaying a preview of the structured PDF document in a web browser. The PAC preview shows which tags are included in the PDF document and presents the accessible elements in the same way as they would be interpreted by assistive technologies (such as screen readers). PAC also provides an accessibility report which lists the detected accessibility errors. Clicking the links in the report displays the most probable source of the error within the document.
PDF Document Reader
Adobe Reader is a freely distributed PDF viewer from Adobe Systems that is compatible with Microsoft Active Accessibility (MSAA) on the Windows platform. It has a number of built-in accessibility features, including text to speech (Read Out Loud), high-contrast display, reflow for large-print display, automatic scrolling, an accessibility quick check, and an accessibility setup assistant. It is the only PDF file viewer that can open and interact with all types of PDF content, including forms and multimedia.
Accessing PDF Documents With Assistive Technologies
Accessing PDF Documents with Assistive Technology, by the American Foundation for the Blind (AFB): This user guide provides guidance on accessing Portable Document Format (PDF) documents for blind and visually impaired users of screen reading technology. The goal is to enable a better understanding of the issues that affect the accessibility of PDF documents by discussing specific examples, highlighting important principles, illustrating common problems, and presenting suggested solutions.
Document Conversion
Microsoft Word provides the greatest level of screen reader accessibility and document conversion accuracy. PowerPoint is a good format for face-to-face presentations, but it is usually not the best format for content on the web. PDF or HTML is often the best format for displaying PowerPoint presentations on the web. Heading structure and other accessibility information will remain intact if you export the file correctly, and most users already have a PDF reader and a browser. If you are comfortable with HTML and CSS, and your content is intended to be displayed on the web, you could consider creating your own slides in HTML. You would have to create your own "next" and "previous" buttons and then add images and visual styles. The accuracy of Microsoft Office document conversion depends on the complexity of the content elements and formats. Quite often, the converted PDF or HTML file requires manual inspection and accessibility updates.
PDF Accessibility Conversion Tools
Checkpoint Reference
- Text Tags: A text equivalent for every non-text element must be provided (via alt, longdesc, or in element content).
- MultiMedia Presentations: An equivalent alternative for any multimedia presentation must be synchronized with the presentation.
- Colour: PDF documents and web pages must be designed so that all information conveyed with colour is also available without colour.
- Readability: Web pages must be organized so they are readable without requiring an associated style sheet, and PDF content must appear in a logical reading order when presented.
- Image Maps: Text links must be provided for each active region of a server-side or client-side image map.
- Data Tables: Row and column headers must be identified for data tables. Markup must be used to associate data cells and header cells for data tables that have two or more logical levels of row or column headers.
- Frames: Web page frames must be titled with text that facilitates frame identification and navigation.
- Flicker Rate: Pages must be designed to avoid causing the screen to flicker with a frequency greater than 2 Hz and lower than 55 Hz.
- Text-only alternative: A text-only page, with equivalent information or functionality, must be provided to make a web site compliant with the WCAG standards when compliance cannot be accomplished in any other way. Note: the content of the text-only page must be updated whenever the primary page changes.
- Scripts: When pages utilize scripting languages to display content, or to create interface elements, the information provided by the script must be identified with functional text that can be read by assistive technology.
- Applets and Plug-ins: When a web page requires that an applet, plug-in or other application be present on the client system to interpret page content, it must comply with the WCAG criteria, or the page must provide a link to alternative accessible functions.
- Electronic Forms: When electronic forms are designed to be completed on-line, the form must allow people using assistive technology to access the information, field elements, and functionality required for completion and submission of the form, including all directions and cues.
- Navigation Links: A method on web pages must be provided that permits users to skip repetitive navigation links.
- Time Delays: When a timed response is required, the user must be alerted and given sufficient time to indicate that more time is required.
- Identify language: Identify the primary natural language of the document.
PDF Accessibility Checklist
General Requirements for all Documents
- Is the document file name free of spaces and special characters?
- Is the document file name concise, generally limited to 20-30 characters, and does it make the contents of the file clear?
- Have the Document Properties for Title, Author, Subject (AKA Description), Keywords, Language, and Copyright Status been applied?
- Does the document use recommended fonts (e.g., Times New Roman, Verdana, Arial, Tahoma, Helvetica, or Calibri)?
- Have track changes been accepted or rejected and turned off?
- Have comments been removed and formatting marks been turned off?
- Does the document refrain from using flashing/flickering text and/or animated text?
- Is the document free of background images or watermarks?
- Do all images, grouped images, and nontext elements that convey information have meaningful alternative-text descriptions?
- Do complex images (e.g., charts and graphs) have descriptive text near the image (perhaps as a caption)?
- Do all hyperlinks use descriptive text (that is, do they avoid generic phrases like Click Here and instead use phrases that tell users about the content of the linked page before they select it)?
- Are all URLs linked to correct Web destinations?
- Are e-mail links accessible?
- Has a separate accessible version of the document been provided when there is no other way to make the content accessible?
- If there are tables, are blank cells avoided?
- Is all of the text easy to read against the background of the document (i.e., does it have a colour-contrast ratio of at least 4.5:1)?
- Has the document been reviewed in Print Preview for a final visual check?
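The 4.5:1 colour-contrast item in the checklist above can be checked numerically. The following is a minimal sketch that assumes the threshold refers to the WCAG 2.0 contrast ratio, computed from the relative luminance of 8-bit sRGB colours. The function names are illustrative and not taken from any tool mentioned here.

```python
def relative_luminance(rgb):
    """Relative luminance of an (R, G, B) colour with 0-255 channels (WCAG 2.0)."""
    def channel(value):
        c = value / 255
        # Linearize the sRGB channel as defined by WCAG 2.0.
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (channel(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground, background):
    """Contrast ratio (lighter + 0.05) / (darker + 0.05), from 1.0 up to 21.0."""
    lum_a = relative_luminance(foreground)
    lum_b = relative_luminance(background)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


def meets_minimum(foreground, background, threshold=4.5):
    """True when the colour pair reaches the checklist's 4.5:1 threshold."""
    return contrast_ratio(foreground, background) >= threshold


# Black text on a white background gives the maximum possible ratio.
print(round(contrast_ratio((0, 0, 0), (255, 255, 255)), 2))
# A mid-grey that is close to the limit on white.
print(meets_minimum((118, 118, 118), (255, 255, 255)))
```

The ratio is symmetric, so it does not matter which colour is passed as the foreground. Text laid over images or gradients still needs a visual check, since this computes the ratio for a single pair of flat colours only.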
Formatting Requirements
- If there is an automated accessibility checker in the program used to create the PDF, has that been run and does it pass?
- Have bookmarks been included in PDF files that are more than 9 pages long, and if so, are they logical?
- Are decorative images marked as background/artifact?
- Is the document free of scanned images of text?
- Have all scanned signatures been removed from the PDF?
- Do images/graphics appear crisp and legible?
- Is the document free of layout tables?
- Are blank table cells avoided?
- Do all tables have a logical reading order from left to right, top to bottom?
- Do all data tables in the document have Row and/or Column headers?
- Do header rows repeat across pages if the table is multiple pages?
- Are data cells set so they do not split across pages?
- Are all tables described and labeled (where appropriate)?
- Have lists been tagged completely, making use of all four of the following tags: L, LI, Lbl, and LBody?
- If the document has a tabular appearance, was that tabular structure made using the table option, as opposed to manual Tabs and Spaces?
- If a table of contents (TOC) is present, are the page numbers correct, and if linked, does the TOC function correctly?
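The list-tagging item above (L, LI, Lbl, and LBody) lends itself to a mechanical check. The sketch below assumes the tag tree has already been extracted into nested `(tag, children)` tuples. That representation, and the function name, are hypothetical simplifications rather than the output format of any particular tool. It applies the checklist's rule that every list item carries both a label and a body.

```python
from typing import List, Tuple

# (tag name, child nodes): a hypothetical simplification of a PDF tag tree.
Node = Tuple[str, list]


def list_tag_problems(node: Node, path: str = "") -> List[str]:
    """Report L elements without LI children and LI elements missing Lbl or LBody."""
    tag, kids = node
    here = f"{path}/{tag}"
    kid_tags = [kid[0] for kid in kids]
    problems = []

    if tag == "L" and "LI" not in kid_tags:
        problems.append(f"{here}: list has no LI children")
    elif tag == "LI":
        for required in ("Lbl", "LBody"):
            if required not in kid_tags:
                problems.append(f"{here}: missing {required}")

    # Recurse so that nested lists inside LBody are checked as well.
    for kid in kids:
        problems.extend(list_tag_problems(kid, here))
    return problems


complete = ("L", [("LI", [("Lbl", []), ("LBody", [])])])
incomplete = ("L", [("LI", [("LBody", [])])])

print(list_tag_problems(complete))
print(list_tag_problems(incomplete))
```

An empty result means only that the four tags are present. Whether the label and body contain the right content still has to be confirmed in the Tags pane or with a screen reader.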
Accessibility Tagging and Reading Order
- Have PDF tags been added to the document?
- Does the order of the PDF tags match the order in which the content should be read?
- Has the document been formatted using styles, with the document title as Heading 1, first-order headings as Heading 2, and so on?
- Are heading styles organized in a hierarchical and logical fashion, with consecutive headings?
- If nonstandard/custom tags are used, have they been mapped correctly in the Document Roles dialogue box and verified as working with assistive technology?
- Have documents with multicolumn text, tables, or call-out boxes been checked for correct reading order?
- Are any footnotes or references tagged with standard Note and Reference tags and placed in the proper logical reading order?
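The requirement above for hierarchical, consecutive headings can be tested once the tags are listed in reading order. This is a minimal sketch under two assumptions: headings use the standard H1 to H6 tags, and the tag sequence has already been exported as a plain list of strings. The function name is illustrative.

```python
def skipped_headings(tags):
    """Return (index, tag) pairs where a heading jumps more than one level deeper."""
    problems = []
    previous_level = 0  # no heading seen yet, so the first one must be H1
    for index, tag in enumerate(tags):
        is_heading = len(tag) == 2 and tag[0] == "H" and tag[1] in "123456"
        if not is_heading:
            continue
        level = int(tag[1])
        # Going deeper by more than one level skips a heading level.
        if level > previous_level + 1:
            problems.append((index, tag))
        previous_level = level
    return problems


reading_order = ["H1", "P", "H2", "P", "H4", "P", "H2", "H3"]
print(skipped_headings(reading_order))
```

Moving back up the hierarchy (for example from H4 to H2) is not reported, because closing several levels at once is a normal way to start a new section.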
Document Image Requirements
- Is the document free of background images or watermarks?
- Are multiple associated images on the same page, like boxes in an organizational chart, grouped as one object?
- Have all multilayered objects been flattened into one image, and does that image have a single alternative-text description?
- Do all images, grouped images, and nontext elements that convey information have meaningful alternative-text descriptions?
- Do complex images, like charts and graphs, have descriptive text near the image?
Form Fields
- Do all form fields have correct labels and markups?
- Are all form fields keyboard accessible?
- Are all multiple-choice answers keyboard accessible and grouped together as form-field sets? (1) The value attribute needs to match the text next to the answer. (2) The name attribute needs to be the same for every answer in the group.
HTML Accessibility Checklist
Checkpoint Criteria
- Do images that convey contextual content have equivalent alternative text specified in the alt attribute of the img element?
- Do images that are purely decorative, and not contextual, have empty, or null, alternative text specified?
- Does the alternate text convey contextual relevance to the page it is on?
- Do images that convey complex content have longdesc attributes or equivalent text content available elsewhere on the page?
- Does text content contained in images remain available when images are not available?
- Do image map area elements have the link destination correctly titled (If the title attribute is used, it ought not to duplicate the alt text)?
- Do form non-text controls, like input type image, provide a text alternative that identifies the purpose of the non-text control?
- Do noframes elements have appropriate equivalent or alternative content for user agents that do not support frames?
- Is a full text transcript provided for all prerecorded audio?
- Is a full text transcript provided for all prerecorded video?
- Are open or closed captions provided for all synchronized video?
- Is a fully synchronized text alternative or sound track provided for all video interaction that is not otherwise described?
- Is information conveyed by color also conveyed by context, markup, graphic coding, or other means?
- Does a contrast ratio of at least 4.5:1 exist between text, and images of text, and background behind the text?
- Is a correct contrast ratio maintained when images are not available?
- Is a correct contrast ratio maintained when CSS is disabled?
- Are links distinguished from surrounding text with sufficient color contrast, and is additional differentiation provided when the link receives focus (for example, it becomes underlined)?
- With CSS disabled, is color and font information rendered in the browser's default CSS?
- With CSS disabled, are headings, paragraphs, and lists obvious and sensible?
- With CSS disabled, does the order of the page content make sense as read?
- With CSS disabled, is most text, other than logos and banners, rendered in text rather than images?
- With CSS disabled, does any content that was invisible before stay invisible?
- With CSS disabled, is any content or functionality provided by the CSS through mouse action also provided through keyboard-triggered event handlers?
- When tables are used for layout, does the content linearize properly when layout tables are turned off?
- Are links in server-side image maps repeated elsewhere on the page in non-graphical form (for example, as a normal list of links)?
- Are client-side image maps used instead of server-side image maps?
- Do client-side image maps have appropriate alternative text for the image, as well as each hot spot region?
- For tables containing data, do th elements appropriately define every row and/or column header?
- For tables containing data, do th elements contain the scope attribute for row and/or column headers that are not logically placed (that is, not in the first row or first column, as applicable)?
- For tables containing data, is the summary attribute used to explain the meaning of the table if it is not otherwise evident from context?
- For tables that are used for layout, are th elements or summary, headers, scope, abbr, or axis attributes NOT used at all?
- For complex tables, do th elements appropriately define row and/or column headers?
- For complex tables, does each th element contain an id attribute unique to the page, and/or does each th element and any td element (that acts as a header for other elements) contain a scope attribute of row, col, rowgroup, or colgroup?
- For complex tables, does any td element that is associated with more than one th element contain a headers attribute that lists the id attribute for all headers associated with that cell?
- Are the summary attribute and thead and tbody elements used to clarify the table meaning and structure if needed?
- Does each frame and iframe element have a meaningful title attribute?
- Does the page have equivalent content in a noframes element for user agents that do not support frames?
- Do all page elements avoid flickering at an unhealthy rate (that is, do they flash fewer than three times per second)?
- Is every page free of marquee and blink elements?
- Does a document have a text-only version (If so, does it meet all WCAG criteria)?
- Does the text-only version contain the same exact information as the original document?
- Does the text-only version provide the functionality equivalent to that of the original document?
- Is an alternative provided for components (plug-ins and scripts) which are not directly accessible?
- Is any content or functionality provided by JavaScript through mouse action, also provided through keyboard-triggered event handlers?
- Are link-type behaviors created with JavaScript on ONLY focusable elements?
- If content or functionality provided by JavaScript cannot be provided to assistive technology, is equivalent content or functionality provided without JavaScript?
- Are links provided to any special readers or plug-ins that are required to interpret page content?
- Does each appropriate input element or form control have an associated visible label element or title attribute?
- Are all cues for filling out the form available to users of assistive technology (mandatory fields, help boxes, error messages)?
- Is the tab order to reach the form and the tab order between form elements logical and consistent with the normal and visual order of entering form data?
- Are logically-related groups of form elements identified with appropriate fieldset and legend elements?
- Is placeholder text, if used, NOT redundant or distracting to users of assistive technology?
- Do form error messages identify the errors to the user and describe them to the user in text?
- If repetitive navigation links are at the beginning of the source of the HTML page, can a user navigate via a link (skip link), at the top of each page directly to the main content area?
- If a skip link is provided, does the anchor element contain text content that is visible with CSS disabled?
- If a skip link is provided and it is hidden with CSS, is it available to users of assistive technology (not using the display:none method)?
- Can a user navigate over groups of links, between multiple groups of links, and between sections of the page content by means of section headings or visible and audible local links?
- Are heading elements used to convey logical hierarchy and denote the beginning of each section of content?
- Is enough time provided to allow users to read and interact with content?
- Is the functionality of the content predictable (that is, do changes of context never occur without the user's knowledge)?
- Does the user have control over the timing of content changes?
- If a page or application has a time limit, is the user given options to turn off, adjust, or extend that time limit?
- Can automatically moving, blinking, or scrolling content that lasts longer than 3 seconds be paused, stopped, or hidden by the user?
- Can automatically updating content (for example, an automatically redirecting or refreshing page, a news ticker, an AJAX-updated field, or a notification alert) be paused, stopped, or hidden by the user, or can the user manually control the timing of the updates?
- Can interruptions be postponed or suppressed by the user (alerts, page updates, etcetera)?
- If an authentication session expires, can the user re-authenticate and continue the activity without losing any data from the current page?
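Several of the checkpoints above can be partly automated. As one example, the first two items (alt text for informative images, and empty alt text for decorative ones) can be screened with Python's standard `html.parser` module. This is a minimal sketch: the class name and the sample markup are invented for illustration. It can only report what is present in the markup, not whether the alt text is meaningful.

```python
from html.parser import HTMLParser


class ImgAltAudit(HTMLParser):
    """Collect img elements that lack an alt attribute or have an empty one."""

    def __init__(self):
        super().__init__()
        self.missing_alt = []  # no alt attribute at all: always a failure
        self.empty_alt = []    # alt="": acceptable only for decorative images

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src") or "(no src)"
        if "alt" not in attributes:
            self.missing_alt.append(src)
        elif not (attributes["alt"] or "").strip():
            self.empty_alt.append(src)


# Invented sample markup for illustration.
sample = (
    '<p><img src="logo.png" alt="Company logo">'
    '<img src="rule.png" alt="">'
    '<img src="chart.png"></p>'
)

audit = ImgAltAudit()
audit.feed(sample)
print("Missing alt:", audit.missing_alt)
print("Empty alt (confirm these are decorative):", audit.empty_alt)
```

Images reported under the empty-alt heading still need a human decision, since the parser cannot tell a decorative rule from an informative chart whose description was left blank.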
Resources