The Portable Document Format (PDF) is a file format developed in the 1990s to present documents, including text formatting and images, in a manner independent of application software, hardware, and operating systems. Based on the PostScript language, each PDF file encapsulates a complete description of a fixed-layout flat document, including the text, fonts, vector graphics, raster images and other information needed to display it. PDF was standardized as an open format, ISO 32000, in 2008, and does not require any royalties for its implementation.
Today, PDF files may contain a variety of content besides flat text and graphics, including logical structuring elements, interactive elements such as annotations and form fields, layers, rich media (including video content) and three-dimensional objects using U3D or PRC, and various other data formats. The PDF specification also provides for encryption and digital signatures, file attachments and metadata to enable workflows requiring these features.
History and standardization
Adobe Systems made the PDF specification available free of charge in 1993. In the early years PDF was popular mainly in desktop publishing workflows, and competed with a variety of formats such as DjVu, Envoy, Common Ground Digital Paper, Farallon Replica and even Adobe's own PostScript format.
PDF was a proprietary format controlled by Adobe until it was released as an open standard on July 1, 2008, and published by the International Organization for Standardization as ISO 32000-1:2008, at which time control of the specification passed to an ISO Committee of volunteer industry experts. In 2008, Adobe published a Public Patent License to ISO 32000-1 granting royalty-free rights for all patents owned by Adobe that are necessary to make, use, sell, and distribute PDF-compliant implementations.
PDF 1.7, the sixth edition of the PDF specification that became ISO 32000-1, includes some proprietary technologies defined only by Adobe, such as the Adobe XML Forms Architecture (XFA) and the JavaScript extension for Acrobat, which are referenced by ISO 32000-1 as normative and indispensable for a full implementation of the ISO 32000-1 specification. These proprietary technologies are not standardized and their specifications are published only on Adobe's website. Many of them are also not supported by popular third-party implementations of PDF.
On July 28, 2017, ISO 32000-2 (PDF 2.0) was published by ISO. ISO 32000-2 does not include any proprietary technologies as normative references.
Technical foundations
The PDF combines three technologies:
- A subset of the PostScript page description programming language, for generating the layout and graphics.
- A font-embedding/replacement system to allow fonts to travel with the documents.
- A structured storage system to bundle these elements and any associated content into a single file, with data compression where appropriate.
PostScript
PostScript is a page description language run in an interpreter to generate an image, a process requiring many resources. It can handle graphics as well as standard features of programming languages such as if and loop commands. PDF is largely based on PostScript but simplified to remove flow control features like these, while graphics commands such as lineto remain.
Often, the PostScript-like PDF code is generated from a source PostScript file. The graphics commands that are output by the PostScript code are collected and tokenized. Any files, graphics, or fonts to which the document refers are also collected. Then, everything is compressed into a single file. Therefore, the entire PostScript world (fonts, layout, measurements) remains intact.
As a document format, PDF has several advantages over PostScript:
- PDF contains tokenized and interpreted results of the PostScript source code, for direct correspondence between changes to items in the PDF page description and changes to the resulting page appearance.
- PDF (from version 1.4) supports graphic transparency; PostScript does not.
- PostScript is an interpreted programming language with an implicit global state, so instructions accompanying the description of one page can affect the appearance of any following page. Therefore, all preceding pages in a PostScript document must be processed to determine the correct appearance of a given page, whereas each page in a PDF document is unaffected by the others. As a result, PDF viewers allow the user to quickly jump to the final pages of a long document, while a PostScript viewer needs to process all pages sequentially before being able to display the destination page (unless the optional PostScript Document Structuring Conventions have been carefully complied with).
Technical overview
File structure
A PDF file is a 7-bit ASCII file, except for certain elements that may have binary content. A PDF file starts with a header containing the magic number and the version of the format, such as %PDF-1.7. The format is a subset of a COS ("Carousel" Object Structure) format. A COS tree file consists primarily of objects, of which there are eight types:
- Boolean values, representing true or false
- Numbers
- Strings, enclosed within parentheses ((...)), which may contain 8-bit characters
- Names, starting with a forward slash (/)
- Arrays, ordered collections of objects enclosed within square brackets ([...])
- Dictionaries, collections of objects indexed by names, enclosed within double angle brackets (<<...>>)
- Streams, usually containing large amounts of data, which can be compressed and binary
- The null object
In addition, there may be comments, introduced with the percent sign (%). Comments may contain 8-bit characters.
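As a small illustration of the header described at the start of this section, the following Python sketch reads the format version from the first bytes of a file. The function name pdf_version and the sample bytes are made up for this example, and the check is deliberately strict (the magic number must be the very first bytes), which real-world readers may relax.

```python
import re

# The magic number "%PDF-" followed by a major.minor version number.
HEADER_RE = re.compile(rb"%PDF-(\d+)\.(\d+)")


def pdf_version(data: bytes):
    """Return (major, minor) from the header, or None if there is no header.

    Hypothetical helper: it only accepts a header at offset 0.
    """
    match = HEADER_RE.match(data)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# A header line followed by a comment line, as an in-memory sample.
sample = b"%PDF-1.7\n% this line is a comment\n"
print(pdf_version(sample))
print(pdf_version(b"not a PDF"))
```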
Objects may be either direct (embedded in another object) or indirect. Indirect objects are numbered with an object number and a generation number and are defined between the obj and endobj keywords. An index table, also called the cross-reference table and marked with the xref keyword, follows the main body and gives the byte offset of each indirect object from the start of the file. This design allows for efficient random access to the objects in the file, and also allows for small changes to be made without rewriting the entire file (incremental update). Beginning with PDF version 1.5, indirect objects may also be located in special streams known as object streams. This technique reduces the size of files that have large numbers of small indirect objects and is especially useful for Tagged PDF.
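To make the role of the cross-reference table concrete, here is a minimal Python sketch that serializes a few indirect objects, records the byte offset of each one, and writes those offsets into an xref table. The helper build_pdf and the three object bodies are assumptions chosen for illustration; the result is structurally minimal and is not guaranteed to open in every viewer.

```python
def build_pdf(objects):
    """Serialize object bodies as indirect objects plus a classic xref table.

    `objects` is a list of bytes; item i becomes object number i + 1 with
    generation number 0. Returns (file_bytes, offsets).
    """
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of "N 0 obj" from the file start
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry 0 is the head of the free list
    for off in offsets:
        out += b"%010d 00000 n \n" % off  # fixed-width, 20 bytes per entry

    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out), offsets


objects = [
    b"<< /Type /Catalog /Pages 2 0 R >>",
    b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
    b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] >>",
]
pdf, offsets = build_pdf(objects)

# Random access: jump straight to each object using only its recorded offset.
for number, off in enumerate(offsets, start=1):
    print(number, off, pdf[off:off + 12])
```

Because every entry has a fixed width, a reader can locate the entry for any object number by arithmetic alone, which is what makes the random access described above cheap.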
At the end of a PDF file is a trailer introduced with the trailer keyword. It contains
- A dictionary
- An offset to the start of the cross-reference table (the table starting with the xref keyword)
- And the %%EOF end-of-file marker.
The dictionary contains
- A reference to the root object of the tree structure, also known as the catalog
- The count of indirect objects in the cross-reference table
- And other optional information.
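The trailer layout listed above can be read back with a short Python sketch. It assumes a classic trailer (the trailer keyword, a flat dictionary, then the startxref keyword that precedes the offset) sitting in the last 2048 bytes of the file; files that use cross-reference streams or nested trailer dictionaries would need a real parser. The function name read_trailer and the sample file are hypothetical.

```python
import re


def read_trailer(data: bytes) -> dict:
    """Extract the xref offset, /Size and /Root from a classic PDF trailer."""
    tail = data[-2048:]  # assumption: the trailer fits in the last 2 KiB
    eof = tail.rfind(b"%%EOF")
    if eof == -1:
        raise ValueError("no %%EOF marker near the end of the file")
    start = tail.rfind(b"startxref", 0, eof)
    if start == -1:
        raise ValueError("no startxref keyword before %%EOF")
    xref_offset = int(tail[start + len(b"startxref"):eof].split()[0].decode("ascii"))

    keyword = tail.rfind(b"trailer", 0, start)
    if keyword == -1:
        raise ValueError("no classic trailer found")
    dictionary = tail[keyword:start]
    size = re.search(rb"/Size\s+(\d+)", dictionary)
    root = re.search(rb"/Root\s+(\d+)\s+(\d+)\s+R", dictionary)
    return {
        "xref_offset": xref_offset,
        "size": int(size.group(1)) if size else None,
        "root": (int(root.group(1)), int(root.group(2))) if root else None,
    }


# A tiny in-memory file whose xref offset is computed rather than hard-coded.
body = b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
sample = (
    body
    + b"xref\n0 2\n0000000000 65535 f \n0000000009 00000 n \n"
    + b"trailer\n<< /Size 2 /Root 1 0 R >>\n"
    + b"startxref\n%d\n%%%%EOF\n" % len(body)
)
info = read_trailer(sample)
print(info)
```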
There are two layouts for PDF files: non-linear (not "optimized") and linear ("optimized"). Non-linear PDF files consume less disk space than their linear counterparts, though they are slower to access because portions of the data required to assemble pages of the document are scattered throughout the file. Linear PDF files (also called "optimized" or "web optimized" PDF files) are constructed in a manner that enables them to be read in a web browser plugin without waiting for the entire file to download, since they are written to disk in a linear (as in page order) fashion. PDF files may be optimized using Adobe Acrobat software or QPDF.
Imaging model
The basic design of how graphics are represented in PDF is very similar to that of PostScript, except for the use of transparency, which was added in PDF 1.4.
PDF graphics use a device-independent Cartesian coordinate system to describe the surface of a page. A PDF page description can use a matrix to scale, rotate, or skew graphical elements. A key concept in PDF is that of the graphics state, which is a collection of graphical parameters that may be changed, saved, and restored by a page description. PDF has (as of version 1.6) 24 graphics state properties, of which some of the most important are:
- The current transformation matrix (CTM), which determines the coordinate system
- The clipping path
- The color space
- The alpha constant, which is a key component of transparency
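The effect of a transformation matrix on coordinates can be sketched in a few lines of Python. The sketch assumes the six-number form [a b c d e f] and the row-vector convention, under which a point (x, y) maps to (a*x + c*y + e, b*x + d*y + f). The helper names apply, concat and rotation are invented for this example.

```python
import math


def apply(m, x, y):
    """Map the point (x, y) through the matrix m = (a, b, c, d, e, f)."""
    a, b, c, d, e, f = m
    return (a * x + c * y + e, b * x + d * y + f)


def concat(m1, m2):
    """Return the single matrix that applies m1 first and then m2."""
    a1, b1, c1, d1, e1, f1 = m1
    a2, b2, c2, d2, e2, f2 = m2
    return (
        a1 * a2 + b1 * c2,
        a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2,
        c1 * b2 + d1 * d2,
        e1 * a2 + f1 * c2 + e2,
        e1 * b2 + f1 * d2 + f2,
    )


def rotation(degrees):
    """Counterclockwise rotation about the origin."""
    t = math.radians(degrees)
    return (math.cos(t), math.sin(t), -math.sin(t), math.cos(t), 0.0, 0.0)


scale = (2, 0, 0, 2, 0, 0)          # scale by 2 in both directions
translate = (1, 0, 0, 1, 10, 20)    # move by (10, 20)

combined = concat(scale, translate)  # scale first, then translate
print(apply(combined, 1, 1))
print(apply(rotation(90), 1, 0))
```

Saving and restoring the graphics state then amounts to pushing and popping the current matrix (along with the other parameters) on a stack.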
Vector graphics
As in PostScript, vector graphics in PDF are constructed with paths. Paths are usually composed of lines and cubic Bézier curves, but can also be constructed from the outlines of text. Unlike PostScript, PDF does not allow a single path to mix text outlines with lines and curves. Paths can be stroked, filled, or used for clipping. Strokes and fills can use any color set in the graphics state, including patterns.
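For readers unfamiliar with the cubic curves used in paths, the following Python sketch evaluates one curve segment from its four control points and flattens it into short straight lines, which is roughly what a renderer does before stroking or filling. The function names and the segment count are assumptions for illustration, not part of any PDF library.

```python
def cubic_bezier(p0, p1, p2, p3, t):
    """Evaluate a cubic Bezier curve at parameter t in [0, 1]."""
    u = 1.0 - t
    w0, w1, w2, w3 = u * u * u, 3 * u * u * t, 3 * u * t * t, t * t * t
    x = w0 * p0[0] + w1 * p1[0] + w2 * p2[0] + w3 * p3[0]
    y = w0 * p0[1] + w1 * p1[1] + w2 * p2[1] + w3 * p3[1]
    return (x, y)


def flatten(p0, p1, p2, p3, segments=8):
    """Approximate the curve by segments + 1 points joined with lines."""
    return [cubic_bezier(p0, p1, p2, p3, i / segments) for i in range(segments + 1)]


# The start point, two control points, and the end point of one segment.
points = flatten((0, 0), (0, 1), (1, 1), (1, 0))
print(points[0], points[len(points) // 2], points[-1])
```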
PDF supports several types of patterns. The simplest is the tiling pattern, in which a piece of artwork is specified to be drawn repeatedly. This may be a colored tiling pattern, with the colors specified in the pattern object, or an uncolored tiling pattern, which defers color specification to the time the pattern is drawn. Beginning with PDF 1.3 there is also a shading pattern, which draws continuously varying colors. There are seven types of shading pattern, of which the simplest are the axial shade (Type 2) and the radial shade (Type 3).
Raster images
Raster images in PDF (called Image XObjects) are represented by dictionaries with an associated stream. The dictionary describes the properties of the image, and the stream contains the image data. (Less commonly, a raster image may be embedded directly in a page description as an inline image.) Images are typically filtered for compression purposes. Image filters supported in PDF include the general-purpose filters
- ASCII85Decode, a filter used to put the stream into 7-bit ASCII
- ASCIIHexDecode, similar to ASCII85Decode but less compact
- FlateDecode, a commonly used filter based on the deflate algorithm defined in RFC 1951 (deflate is also used in the gzip, PNG, and zip file formats, among others); introduced in PDF 1.2; it can use one of two groups of predictor functions for more compact zlib/deflate compression: Predictor 2 from the TIFF 6.0 specification and the predictors (filters) from the PNG specification (RFC 2083)
- LZWDecode, a filter based on LZW compression; it can use one of two groups of predictor functions for more compact LZW compression: Predictor 2 from the TIFF 6.0 specification and the predictors (filters) from the PNG specification
- RunLengthDecode, a simple compression method for streams with repetitive data using the run-length encoding algorithm
and the image-specific filters
- DCTDecode, a lossy filter based on the JPEG standard
- CCITTFaxDecode, a lossless bi-level (black/white) filter based on the Group 3 or Group 4 CCITT (ITU-T) fax compression standards defined in ITU-T T.4 and T.6
- JBIG2Decode, a lossy or lossless bi-level (black/white) filter based on the JBIG2 standard, introduced in PDF 1.4
- JPXDecode, a lossy or lossless filter based on the JPEG 2000 standard, introduced in PDF 1.5
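Several of the general-purpose filters in the list above can be approximated with the Python standard library. The sketch below is an assumption-laden illustration rather than a complete decoder: it ignores predictors and decode parameters, omits LZWDecode and all image-specific filters, and the names decode_stream and DECODERS are invented. It treats a filter list as a chain applied in order when decoding.

```python
import base64
import zlib


def ascii_hex_decode(data: bytes) -> bytes:
    """Decode hex digits up to the '>' end marker, ignoring whitespace."""
    digits = b"".join(data.split(b">")[0].split())
    if len(digits) % 2:
        digits += b"0"  # an odd final digit is treated as if followed by 0
    return bytes.fromhex(digits.decode("ascii"))


def ascii85_decode(data: bytes) -> bytes:
    """Decode ASCII85 data, dropping the trailing '~>' end marker if present."""
    body = data.strip()
    if body.endswith(b"~>"):
        body = body[:-2]
    return base64.a85decode(body)


def run_length_decode(data: bytes) -> bytes:
    """Decode run-length data: 0-127 copy literally, 129-255 repeat, 128 ends."""
    out = bytearray()
    i = 0
    while i < len(data):
        length = data[i]
        if length == 128:
            break
        if length < 128:
            out += data[i + 1:i + 2 + length]   # copy the next length + 1 bytes
            i += 2 + length
        else:
            out += data[i + 1:i + 2] * (257 - length)  # repeat the next byte
            i += 2
    return bytes(out)


DECODERS = {
    "ASCIIHexDecode": ascii_hex_decode,
    "ASCII85Decode": ascii85_decode,
    "FlateDecode": zlib.decompress,
    "RunLengthDecode": run_length_decode,
}


def decode_stream(data: bytes, filters) -> bytes:
    """Apply each named filter in turn to recover the original stream data."""
    for name in filters:
        data = DECODERS[name](data)
    return data


# Round trip: compress with deflate, wrap in ASCII85, then decode the chain.
payload = b"BT /F1 12 Tf (Hello) Tj ET"
encoded = base64.a85encode(zlib.compress(payload)) + b"~>"
print(decode_stream(encoded, ["ASCII85Decode", "FlateDecode"]) == payload)
```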
Normally all image content in a PDF is embedded in the file. But PDF allows image data to be stored in external files through the use of external streams or Alternate Images. Standardized subsets of PDF, including PDF/A and PDF/X, prohibit these features.
Text
Text in PDF is represented by text elements in page content streams. A text element specifies that characters should be drawn at certain positions. The characters are specified using the encoding of a selected font resource.
Fonts
A font object in PDF is a description of a digital typeface. It may either describe the characteristics of a typeface, or it may include an embedded font file. The latter case is called an embedded font while the former is called an unembedded font. The font files that may be embedded are based on widely used standard digital font formats: Type 1 (and its compressed variant CFF), TrueType, and (beginning with PDF 1.6) OpenType. Additionally, PDF supports the Type 3 variant, in which the components of the font are described by PDF graphic operators.
Standard Type 1 Fonts (Standard 14 Fonts)
Fourteen typefaces, known as the standard 14 fonts, have a special significance in PDF documents:
- Times (v3) (in regular, italic, bold, and bold italic)
- Courier (in regular, oblique, bold and bold oblique)
- Helvetica (v3) (in regular, oblique, bold and bold oblique)
- Symbol
- Zapf Dingbats
These fonts are sometimes called the base fourteen fonts. These fonts, or suitable substitute fonts with the same metrics, should be available in most PDF readers, but they are not guaranteed to be available in the reader, and may only display correctly if the system has them installed. Fonts may be substituted if they are not embedded in a PDF.
Encodings
Within text strings, characters are shown using character codes (integers) that map to glyphs in the current font using an encoding. There are a number of predefined encodings, including WinAnsi, MacRoman, and a large number of encodings for East Asian languages, and a font can have its own built-in encoding. (Although the WinAnsi and MacRoman encodings are derived from the historical properties of the Windows and Macintosh operating systems, fonts using these encodings work equally well on any platform.) PDF can specify a predefined encoding to use, the font's built-in encoding, or provide a lookup table of differences to a predefined or built-in encoding (not recommended with TrueType fonts). The encoding mechanisms in PDF were designed for Type 1 fonts, and the rules for applying them to TrueType fonts are complex.
For large fonts or fonts with non-standard glyphs, the special encodings Identity-H (for horizontal writing) and Identity-V (for vertical writing) are used. With such fonts it is necessary to provide a ToUnicode table if semantic information about the characters is to be preserved.
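The idea that the same character code can select different glyphs under different encodings can be shown with Python's built-in codecs. This is only an approximation: the cp1252 and mac_roman codecs are used here as stand-ins for the WinAnsi and MacRoman encodings, with which they largely but not exactly agree, and the differences mapping goes straight from codes to Unicode text, whereas a real differences table maps codes to glyph names. The function decode_codes is hypothetical.

```python
def decode_codes(codes: bytes, base_codec: str, differences=None) -> str:
    """Turn character codes into text using a base encoding plus overrides.

    `differences` maps individual codes to replacement text and takes
    priority over the base encoding, mimicking a table of differences.
    """
    differences = differences or {}
    parts = []
    for code in codes:
        if code in differences:
            parts.append(differences[code])
        else:
            parts.append(bytes([code]).decode(base_codec, errors="replace"))
    return "".join(parts)


# The same word needs different codes for its last letter in each encoding.
print(decode_codes(b"caf\xe9", "cp1252"))
print(decode_codes(b"caf\x8e", "mac_roman"))

# Reading the WinAnsi-style bytes with the wrong encoding picks another glyph.
print(decode_codes(b"caf\xe9", "mac_roman"))

# A one-entry override that assigns a ligature to an otherwise unused code.
print(decode_codes(b"A\x01", "cp1252", {0x01: "\ufb01"}))
```

A ToUnicode table plays a similar role in the other direction for fonts whose codes carry no inherent meaning: it records which text each code stands for so that copying and searching still work.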
Transparency
The original imaging model of PDF was, like PostScript's, opaque: each object drawn on the page completely replaced anything previously marked in the same location. In PDF 1.4 the imaging model was extended to allow transparency. When transparency is used, new objects interact with previously marked objects to produce blending effects. The addition of transparency to PDF was done by means of new extensions that were designed to be ignored in products written to the PDF 1.3 and earlier specifications. As a result, files that use a small amount of transparency might display acceptably in older viewers, but files making extensive use of transparency could be displayed incorrectly in an older viewer without warning.
The transparency extensions are based on the key concepts of transparency groups, blending modes, shape, and alpha. The model is closely aligned with the features of Adobe Illustrator version 9. The blend modes were based on those used by Adobe Photoshop at the time. When the PDF 1.4 specification was published, the formulas for calculating blend modes were kept secret by Adobe. They have since been published.
The concept of a transparency group in the PDF specification is independent of the notions of "group" or "layer" in applications such as Adobe Illustrator. Those groupings reflect logical relationships among objects that are meaningful when editing those objects, but they are not part of the imaging model.
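The interaction between a new object and what is already on the page can be sketched for the simplest case. The Python below assumes a fully opaque backdrop and a single constant alpha for the source, in which case the result is (1 - alpha) * backdrop + alpha * B(backdrop, source), where B is the blend function. Only three blend functions are shown; groups, shape, soft masks and non-opaque backdrops are left out, and the names composite and BLEND_MODES are invented for the example.

```python
# Blend functions take one backdrop component and one source component,
# both in the range 0.0 to 1.0, and return the blended component.
BLEND_MODES = {
    "Normal": lambda cb, cs: cs,
    "Multiply": lambda cb, cs: cb * cs,
    "Screen": lambda cb, cs: cb + cs - cb * cs,
}


def composite(backdrop, source, alpha, mode="Normal"):
    """Composite a source color over an opaque backdrop, per component."""
    blend = BLEND_MODES[mode]
    return tuple(
        (1 - alpha) * cb + alpha * blend(cb, cs)
        for cb, cs in zip(backdrop, source)
    )


white = (1.0, 1.0, 1.0)
red = (1.0, 0.0, 0.0)

# With alpha 1.0 and the Normal mode this is the old opaque model:
# the source simply replaces the backdrop.
print(composite(white, red, 1.0))

# Half-transparent red over white.
print(composite(white, red, 0.5))

# Multiply darkens: mid gray times a color.
print(composite((0.5, 0.5, 0.5), (0.5, 1.0, 0.0), 1.0, "Multiply"))
```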
Interactive elements
PDF files may contain interactive elements such as annotations, form fields, videos, 3D, and multimedia.
Rich Media PDF is a PDF file including interactive content that can be embedded or linked within the file.
Interactive Forms is a mechanism for adding forms to the PDF file format.
PDF currently supports two different methods for integrating data and PDF forms. Both formats today coexist in the PDF specification:
- AcroForms (also known as Acrobat forms), introduced in the PDF 1.2 format specification and included in all later PDF specifications.
- Adobe XML Forms Architecture (XFA) forms, introduced in the PDF 1.5 format specification. Adobe XFA forms are not compatible with AcroForms. XFA was deprecated from PDF with PDF 2.0.
AcroForms
AcroForms were introduced in the PDF 1.2 format. AcroForms permit the use of objects (e.g. text boxes, radio buttons, etc.) and some code (e.g. JavaScript).
Alongside the standard PDF action types, interactive forms (AcroForms) support submitting, resetting, and importing data. The "submit" action transmits the names and values of selected interactive form fields to a specified uniform resource locator (URL). Interactive form field names and values may be submitted in any of the following formats (depending on the settings of the action's ExportFormat, SubmitPDF, and XFDF flags):
- HTML Form format (HTML 4.01 Specification since PDF 1.5; HTML 2.0 since 1.2)
- Forms Data Format (FDF)
- XML Forms Data Format (XFDF) (external XML Forms Data Format Specification, Version 2.0; supported since PDF 1.5; it replaced the "XML" form submission format defined in PDF 1.4)
- PDF (the entire document can be submitted rather than individual fields and values; defined in PDF 1.4)
AcroForms can keep form field values in external stand-alone files containing key:value pairs. The external files may use the Forms Data Format (FDF) and XML Forms Data Format (XFDF) file formats. The usage rights (UR) signatures define rights for importing form data files in FDF, XFDF and text (CSV/TSV) formats, and exporting form data files in FDF and XFDF formats.
Forms Data Format (FDF)
The Forms Data Format (FDF) is based on PDF; it uses the same syntax and has essentially the same file structure, but is much simpler than PDF, since the body of an FDF document consists of only one required object. The Forms Data Format is defined in the PDF specification (since PDF 1.2). It can be used when submitting form data to a server, receiving the response, and incorporating it into the interactive form. It can also be used to export form data to stand-alone files that can be imported back into the corresponding PDF interactive form.
Beginning with PDF 1.3, FDF can be used to define a container for annotations that are separate from the PDF document they apply to. FDF typically encapsulates information such as X.509 certificates, requests for certificates, directory settings, timestamp server settings, and embedded PDF files for network transmission. FDF uses the MIME content type application/vnd.fdf and the filename extension .fdf, and on Mac OS it uses the file type 'FDF'.
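To give a feel for how small an FDF file holding form values can be, here is a Python sketch that writes field names and values into a single object. It assumes ASCII-only names and values (other text would need a different string encoding), flat field names, and the commonly seen /T (field name) and /V (value) keys; the helpers pdf_string and build_fdf are hypothetical.

```python
def pdf_string(text: str) -> bytes:
    """Write text as a literal string, escaping backslashes and parentheses."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return b"(" + escaped.encode("ascii") + b")"


def build_fdf(fields: dict) -> bytes:
    """Build a minimal FDF document with one object holding all field values."""
    entries = b" ".join(
        b"<< /T " + pdf_string(name) + b" /V " + pdf_string(value) + b" >>"
        for name, value in fields.items()
    )
    return (
        b"%FDF-1.2\n"
        b"1 0 obj\n"
        b"<< /FDF << /Fields [ " + entries + b" ] >> >>\n"
        b"endobj\n"
        b"trailer\n"
        b"<< /Root 1 0 R >>\n"
        b"%%EOF\n"
    )


fdf = build_fdf({"name": "Ada (test)", "city": "London"})
print(fdf.decode("ascii"))
```

The output reuses the object, dictionary, array and string syntax of PDF itself, which is why an FDF file can be read by the same low-level parser.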
In PDF 1.5, Adobe Systems introduced a proprietary format for forms: the Adobe XML Forms Architecture (XFA). Adobe XFA forms are not compatible with ISO 32000's AcroForms feature, and most PDF processors do not handle XFA content. The XFA specification is referenced from ISO 32000-1/PDF 1.7 as an external proprietary specification, and was entirely deprecated from PDF with ISO 32000-2 (PDF 2.0).
Logical structure and accessibility
A "tagged" PDF (see clause 14.8 in ISO 32000) includes document structure and semantics information to enable reliable text extraction and accessibility. Technically speaking, tagged PDF is a stylized use of the format that builds on the logical structure framework introduced in PDF 1.3. Tagged PDF defines a set of standard structure types and attributes that allow page content (text, graphics, and images) to be extracted and reused for other purposes.
Tagged PDF is not required in situations where a PDF file is intended only for print. Since the feature is optional, and since the rules for tagged PDF are relatively vague in ISO 32000-1, support for tagged PDF among consuming devices, including assistive technology (AT), is uneven at present. ISO 32000-2, however, includes an improved discussion of tagged PDF, which is expected to facilitate further adoption.
A standardized subset of PDF specifically targeted at accessibility, PDF/UA, was first published in 2012.
Optional Content Groups (layers)
With the introduction of PDF version 1.5 (2003) came the concept of layers. Layers, or as they are more formally known, Optional Content Groups (OCGs), refer to sections of content in a PDF document that can be selectively viewed or hidden by document authors or consumers. This capability is useful in CAD drawings, layered artwork, maps, multi-language documents, etc. Basically, it consists of an Optional Content Properties Dictionary added to the document root. This dictionary contains an array of Optional Content Groups (OCGs), each describing a set of information and each of which may be individually displayed or suppressed, plus a set of Optional Content Configuration Dictionaries, which give the status (displayed or suppressed) of the given OCGs.
Security and signature
PDF files may be encrypted for security, or digitally signed for authentication. However, since a SHA-1 collision was discovered making use of the PDF format, digital signatures using SHA-1 have been shown to be insecure.
The standard security provided by Acrobat PDF consists of two different methods and two different passwords: a user password, which encrypts the file and prevents opening, and an owner password, which specifies operations that should be restricted even when the document is decrypted, which can include modifying, printing, or copying text and graphics out of the document, or adding or modifying text notes and AcroForm fields. The user password encrypts the file, while the owner password does not, instead relying on client software to respect these restrictions. An owner password can easily be removed by software, including some free online services. Thus, the use restrictions that a document author places on a PDF document are not secure, and cannot be assured once the file is distributed; this warning is displayed when applying such restrictions using Adobe Acrobat software to create or edit PDF files.
Even without removing the password, most freeware or open source PDF readers ignore the permission "protections" and allow the user to print or make copies of excerpts of the text as if the document were not limited by password protection.
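Why such restrictions depend on the good will of the reader can be seen from how they are commonly stored: as individual bits in a single integer that the viewing software is expected to consult. The Python sketch below decodes four of those bits. The bit positions (3 for printing, 4 for modifying, 5 for copying, 6 for annotating, counting from 1) are stated here from memory of the standard security handler and should be checked against the specification before being relied on; the names PERMISSION_BITS and decode_permissions are invented.

```python
# Assumed bit positions, counted from 1 at the least significant bit.
PERMISSION_BITS = {
    "print": 3,
    "modify": 4,
    "copy": 5,
    "annotate": 6,
}


def decode_permissions(p: int) -> dict:
    """Decode a signed 32-bit permissions value into named flags."""
    p &= 0xFFFFFFFF  # view the signed value as an unsigned 32-bit pattern
    return {name: bool(p & (1 << (bit - 1))) for name, bit in PERMISSION_BITS.items()}


# -1 has every bit set, so nothing is restricted.
print(decode_permissions(-1))

# -44 clears the "modify" and "annotate" bits but leaves the others set.
print(decode_permissions(-44))
```

Nothing in the file format stops a program from reading these flags and carrying on regardless, which is the behavior described in the paragraph above.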
There are a number of commercial solutions that offer more robust means of information rights management. Not only can they restrict document access, but they can also enforce permissions in ways that the standard security handler does not.
Usage rights
Beginning with PDF 1.5, Usage Rights (UR) signatures are used to enable additional interactive features that are not available by default in a particular PDF viewer application. The signature is used to validate that the permissions have been granted by a bona fide granting authority. For example, it can be used to allow a user:
- To save the PDF document along with modified form and/or annotation data
- To import form data files in FDF, XFDF, and text (CSV/TSV) formats
- To export form data files in FDF and XFDF formats
- To submit form data
- To instantiate new pages from named page templates
- To apply a digital signature to an existing digital signature form field
- To create, delete, modify, copy, import, and export annotations
For example, Adobe Systems grants permissions to enable additional features in Adobe Reader, using public-key cryptography. Adobe Reader verifies that the signature uses a certificate from an Adobe-authorized certificate authority. Any PDF application can use this same mechanism for its own purposes.
File attachment
PDF files can have file attachments that can be accessed and opened by the processor or saved to the local file system.
Metadata
PDF files can contain two types of metadata. The first is the Document Information Dictionary, a set of key/value fields such as author, title, subject, and creation and update dates. This is stored in the optional Info entry of the file's trailer. A small set of fields is defined, and can be extended with additional text values if required. This method is deprecated in PDF 2.0.
In PDF 1.4, support was added for Metadata Streams, using the Extensible Metadata Platform (XMP) to add XML-based extensible metadata as used in other file formats. This allows metadata to be attached to any stream in the document, such as information about embedded illustrations, as well as to the whole document (attached to the document catalog), using an extensible schema.
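Because an XMP metadata stream is ordinary XML, it can be read with a general XML parser once it has been extracted from the file. The Python sketch below pulls the title and the creators out of a hand-written sample packet using the Dublin Core and RDF namespaces. The sample content and the function name read_xmp are made up, and a real packet is usually wrapped in additional processing instructions that this sample leaves out.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# A hand-written sample packet; the values are placeholders.
XMP_SAMPLE = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>
        <rdf:Alt><rdf:li xml:lang="x-default">Sample title</rdf:li></rdf:Alt>
      </dc:title>
      <dc:creator>
        <rdf:Seq><rdf:li>A. Author</rdf:li><rdf:li>B. Writer</rdf:li></rdf:Seq>
      </dc:creator>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>"""


def read_xmp(packet: str) -> dict:
    """Return the title and list of creators found in an XMP packet."""
    root = ET.fromstring(packet)
    title = root.find(".//dc:title/rdf:Alt/rdf:li", NS)
    creators = root.findall(".//dc:creator/rdf:Seq/rdf:li", NS)
    return {
        "title": title.text if title is not None else None,
        "creators": [item.text for item in creators],
    }


metadata = read_xmp(XMP_SAMPLE)
print(metadata)
```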
Usage restrictions and monitoring
PDFs may be encrypted so that a password is needed to view or edit the contents. PDF 2.0 defines 256-bit AES encryption as standard for PDF 2.0 files. The PDF Reference also defines ways that third parties can define their own encryption systems for PDF.
PDF files may be digitally signed; complete details on implementing digital signatures in PDF are provided in ISO 32000-2.
PDF files may also contain embedded DRM restrictions that provide further controls limiting copying, editing or printing. These restrictions depend on the reader software to obey them, so the security they provide is limited.
Default display settings
PDF documents can contain display settings, including the page display layout and zoom level. Adobe Reader uses these settings to override the user's default settings when opening the document. The free Adobe Reader cannot remove these settings.
Intellectual property
Anyone may create applications that can read and write PDF files without having to pay royalties to Adobe Systems; Adobe holds patents to PDF, but licenses them for royalty-free use in developing software that complies with its PDF specification.
Technical issues
Accessibility
PDF files can be created specifically to be accessible to people with disabilities. PDF file formats in use as of 2014 can include tags, text equivalents, captions, audio descriptions, and more. Some software can automatically produce tagged PDFs, but this feature is not always enabled by default. Leading screen readers, including JAWS, Window-Eyes, Hal, and Kurzweil 1000 and 3000, can read tagged PDFs. Moreover, tagged PDFs can be re-flowed and magnified for readers with visual impairments. Adding tags to older PDFs and to those generated from scanned documents can present some challenges.
One of the significant challenges with PDF accessibility is that PDF documents have three distinct views, which, depending on the document's creation, can be inconsistent with each other. The three views are (i) the physical view, (ii) the tags view, and (iii) the content view. The physical view is what is displayed and printed (what most people consider a PDF document). The tags view is what screen readers and other assistive technologies use to deliver a high-quality navigation and reading experience to users with disabilities. The content view is based on the physical order of objects within the PDF's content stream and may be displayed by software that does not fully support the tags view, such as the Reflow feature in Adobe Reader.
PDF/UA, the International Standard for accessible PDF based on ISO 32000-1, was first published as ISO 14289-1 in 2012, and establishes normative language for accessible PDF technology.
Viruses and exploits
PDF attachments carrying viruses were first discovered in 2001. The virus, named OUTLOOK.PDFWorm or Peachy, uses Microsoft Outlook to send itself as an attachment to an Adobe PDF file. It was activated with Adobe Acrobat, but not with Acrobat Reader.
From time to time, new vulnerabilities are discovered in various versions of Adobe Reader, prompting the company to issue security fixes. Other PDF readers are also susceptible. One aggravating factor is that a PDF reader can be configured to start automatically if a web page has an embedded PDF file, providing a vector for attack. If a malicious web page contains an infected PDF file that takes advantage of a vulnerability in the PDF reader, the system may be compromised even if the browser is secure. Some of these vulnerabilities are a result of the PDF standard allowing PDF documents to be scripted with JavaScript. Disabling JavaScript execution in the PDF reader can help mitigate such future exploits, although it does not protect against exploits in other parts of the PDF viewing software. Security experts say that JavaScript is not essential for a PDF reader, and that the security benefit that comes from disabling JavaScript outweighs any compatibility issues caused. One way of avoiding PDF file exploits is to have a local or web service convert files to another format before viewing.
On March 30, 2010, security researcher Didier Stevens reported an Adobe Reader and Foxit Reader exploit that runs a malicious executable if the user allows it to launch when asked.
Content
A PDF file is often a combination of vector graphics, text, and bitmap graphics. The basic types of content in a PDF are:
- Text stored as content streams (i.e., not encoded in plain text)
- Vector graphics for illustrations and designs that consist of shapes and lines
- Raster graphics for photographs and other types of image
- Multimedia objects in the document
In later PDF revisions, a PDF document can also support links (inside the document or to a web page), forms, JavaScript (initially available as a plugin for Acrobat 3.0), or any other types of embedded content that can be handled using plug-ins.
PDF 1.6 supports interactive 3D documents embedded in the PDF; 3D drawings can be embedded using U3D or PRC and various other data formats.
Two PDF files that look similar on a computer screen may be of very different sizes. For example, a high-resolution raster image takes more space than a low-resolution one. Typically, higher resolution is needed for printing documents than for displaying them on screen. Other things that may increase the size of a file are embedding full fonts, especially for Asian scripts, and storing text as graphics.
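The effect of resolution on size is easy to estimate before any compression is applied. The Python sketch below computes the raw size of a full-page image at two resolutions; the page size (8.5 by 11 inches), the three 8-bit color channels, and the function name raw_image_bytes are assumptions picked for the example.

```python
def raw_image_bytes(width_in, height_in, dpi, channels=3, bits_per_channel=8):
    """Uncompressed size in bytes of an image covering the given area."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * channels * bits_per_channel // 8


print_size = raw_image_bytes(8.5, 11, 300)   # a typical print resolution
screen_size = raw_image_bytes(8.5, 11, 72)   # a typical screen resolution

print(print_size)    # 25245000
print(screen_size)   # 1454112
print(round(print_size / screen_size, 1))
```

Size grows with the square of the resolution, so the print-quality image is roughly 17 times larger than the screen-quality one, although the filters applied to the image data usually shrink both considerably.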
Software
PDF viewers are generally provided for free, and many versions are available from multiple sources.
There are many software options for creating PDFs, including the PDF printing capabilities built into macOS, iOS, and most Linux distributions, LibreOffice, Microsoft Office 2007 (if updated to SP2) and later, WordPerfect 9, Scribus, numerous PDF print drivers for Microsoft Windows, the pdfTeX typesetting system, the DocBook PDF tools, applications developed around Ghostscript, and Adobe Acrobat itself as well as Adobe InDesign, Adobe FrameMaker, Adobe Illustrator, and Adobe Photoshop. Google's online office suite Google Docs also allows uploading and saving to PDF.
Raster image processors (RIPs) are used to convert PDF files into a raster format suitable for imaging onto paper and other media in printers, digital production presses and prepress, in a process known as rasterization. RIPs capable of processing PDF directly include the Adobe PDF Print Engine from Adobe Systems and Jaws and the Harlequin RIP from Global Graphics.
Editing
Adobe Illustrator reads and writes PDF as a semi-native format. With multi-page documents, the open dialog lets the user select a page to edit. Editing paragraphs of text usually disrupts line justification and paragraph wrapping, since multi-line text is converted to individual lines. In multi-page documents, only the edited page can be saved.
Inkscape version 0.46 and later allows PDF editing of one page through an intermediate translation step involving Poppler; documents can then be re-exported as PDF.
Scribus allows opening and editing multi-page PDFs; documents can then be re-exported as PDF.
LibreOffice Draw and Apache OpenOffice Draw (using the PDFimport plugin) can open and edit multi-page PDFs; documents can then be re-exported as PDF.
Serif PagePlus can open, edit and save existing PDF documents, as well as publish documents created in the package.
Enfocus PitStop Pro, a plugin for Acrobat, allows manual and automated editing of PDF files, while the free Enfocus Browser makes it possible to edit the low-level structure of a PDF.
DocHub is a free online PDF editing tool that can be used without purchasing anything.
Annotations
Adobe Acrobat is one example of proprietary software that allows the user to annotate, highlight, and add notes to already created PDF files. One UNIX application available as free software (under the GNU General Public License) is PDFedit. Another GPL-licensed application native to the Unix environment is Xournal. Xournal allows annotating in different fonts and colors, and provides a ruler for quickly underlining and highlighting lines of text or paragraphs. Xournal also has a shape recognition tool for squares, rectangles and circles. In Xournal, annotations may be moved, copied and pasted. The freeware Foxit Reader, available for Microsoft Windows, macOS and Linux, allows annotating documents. Tracker Software's PDF-XChange Viewer allows annotations and markups without restrictions in its freeware alternative. Preview, the PDF viewer integrated into Apple's macOS, also enables annotations, as does the open source software Skim, with the latter supporting interaction with LaTeX, SyncTeX, and PDFSync and integration with the BibDesk reference management software. The freeware Qiqqa can create an annotation report that summarizes all the annotations and notes made across a library of PDFs.
For mobile annotation, iAnnotate PDF (from Branchfire) and GoodReader (from Aji) allow annotation of PDFs as well as exporting summaries of the annotations.
There are also web annotation systems that support annotation in PDF and other document formats, for example A.nnotate, crocodoc, and WebNotes.
In cases where PDFs are expected to have all of the functionality of paper documents, ink annotation is required. Some programs that accept ink input from the mouse may not be responsive enough for handwriting input on a tablet. Existing solutions on the PC include PDF Annotator and Qiqqa.
Other
Examples of PDF software offered as online services include Scribd for viewing and storing, Pdfvue for online editing, and Thinkfree and Zamzar for conversion.
In 1993, the Jaws raster image processor (RIP) from Global Graphics became the first shipping prepress RIP to interpret PDF natively, without conversion to another format. The company released an upgrade to its Harlequin RIP with the same capability in 1997.
Agfa-Gevaert introduced and shipped Apogee, the first prepress workflow system based on PDF, in 1997.
Many commercial offset printers accept press-ready PDF files as a print source, specifically the PDF/X-1 subset and variations of it. Submitting press-ready PDF files replaces the problematic need to receive collected native working files.
PDF was selected as the "native" metafile format for Mac OS X, replacing the PICT format of the earlier classic Mac OS. The imaging model of the Quartz graphics layer is based on the model common to Display PostScript and PDF, leading to the nickname Display PDF. The Preview application can display PDF files, as can version 2.0 and later of the Safari web browser. System-level support for PDF allows Mac OS X applications to create PDF documents automatically, as long as they support the standard OS printing architecture. The files are then exported in PDF 1.3 format, according to the file header. Under Mac OS X versions 10.0 through 10.3, screenshots were also captured as PDFs; later versions save screenshots as PNG files, although this behavior can be set back to PDF if desired.
In 2006, PDF was widely accepted as the standard print job format at the Open Source Development Labs Printing Summit. It is supported as a print job format by the Common Unix Printing System, and desktop application projects such as GNOME, KDE, Firefox, Thunderbird, LibreOffice and OpenOffice have switched to emitting print jobs in PDF.
Some desktop printers also support direct PDF printing, meaning they can interpret PDF data without external help. Currently, all PDF-capable printers also support PostScript, but most PostScript printers do not support direct PDF printing.
The Free Software Foundation once considered one of its high-priority projects to be "developing a free, high-quality and fully functional set of libraries and programs that implement the PDF file format and associated technologies to the ISO 32000 standard." In 2011, however, the GNU PDF project was removed from the list of "high priority projects" because of the maturation of the Poppler library, which has enjoyed wider use in applications such as Evince in the GNOME desktop environment. Poppler is based on the Xpdf code base. There are also commercial development libraries available, as listed in the list of PDF software.
The Apache PDFBox project of the Apache Software Foundation is an open source Java library for working with PDF documents. PDFBox is licensed under the Apache License.
Further reading
- Hardy, M. R. B.; Brailsford, D. F. (2002). "Mapping and displaying structural transformations between XML and PDF". Proceedings of the 2002 ACM Symposium on Document Engineering - DocEng '02. pp. 95-102. doi:10.1145/585058.585077. ISBN 1-58113-594-7.
- Standard
- PDF 1.7 [1]
- PDF 1.6 (ISBN 0-321-30474-8)
- PDF 1.4 (ISBN 0-201-75839-3)
- PDF 1.3 (ISBN 0-201-61588-6)
External links
- How is the PDF format created? Quora
- PDF Association - The PDF Association is the industry association for software developers producing or processing PDF files.
- Adobe PDF 101: PDF Summary
- Adobe: PostScript vs. PDF - Official introductory comparison of PS and EPS vs. PDF.
- PDF Standards... transitioning the PDF specification from a de facto standard to a de jure standard, at the Wayback Machine (archived April 24, 2011) - Information about the PDF/E and PDF/UA specifications for accessible document file formats
- ISO 19005-1:2005, the PDF/A-1 ISO standard published by the International Organization for Standardization (chargeable)
- PDF Reference and Adobe Extensions to PDF Specifications
- Portable Document Format: An Introduction for Programmers - Introduction to PDF vs. PostScript and PDF internals (up to v1.3)
- The Camelot Paper - the paper in which John Warnock outlined the project that created PDF
- Everything you wanted to know about PDF but were afraid to ask - recording of a talk by Leonard Rosenthol (Adobe Systems) at TUG 2007
- How to generate PDF with XSL-FO
- PDF To Excel Converter
- Some examples of interactive 3-D PDFs
Source of the article : Wikipedia
PostScript supports standard programming language features such as if and loop commands. PDF is largely based on PostScript but simplified to remove flow control features like these, while graphics commands such as lineto remain.
A PDF file begins with a header such as %PDF-1.7. The format is a subset of a COS ("Carousel" Object Structure) format. A COS tree file consists primarily of objects, which are of eight types. Among them, strings are enclosed in parentheses ((...)) and may contain 8-bit characters; names start with a forward slash (/); arrays are enclosed in square brackets ([...]); and dictionaries are enclosed in double angle brackets (<<...>>). Comments are introduced with the percent sign (%) and may contain 8-bit characters. Indirect objects are delimited by the obj and endobj keywords.
An index table, also called the cross-reference table and marked with the xref keyword, follows the main body and gives the byte offset of each indirect object from the beginning of the file. This design allows efficient random access to the objects in the file, and also allows small changes to be made without rewriting the entire file (incremental update). Starting with PDF version 1.5, indirect objects can also be located in special streams known as object streams. This technique reduces the size of files that have a large number of small indirect objects and is especially useful for Tagged PDF.
At the end of a PDF file is a trailer introduced with the trailer keyword. It contains a dictionary, the offset to the start of the cross-reference table (the table starting with the xref keyword) and the %%EOF end-of-file marker.
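The pieces named here (the %PDF header, indirect objects between obj and endobj, the xref table, the trailer keyword and the %%EOF marker) can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the function names build_minimal_pdf and inspect_pdf are hypothetical, the file uses a classic cross-reference table with a single subsection and LF line endings, and the startxref keyword from the PDF specification is used to locate the table. Real files can also use cross-reference streams and incremental updates, which this sketch does not handle.

```python
import re


def build_minimal_pdf() -> bytes:
    """Assemble a tiny file: header, one indirect object, xref table, trailer."""
    header = b"%PDF-1.7\n"
    obj1 = b"1 0 obj\n<< /Type /Catalog >>\nendobj\n"
    xref_offset = len(header) + len(obj1)  # byte offset of the xref keyword
    # Each xref entry is 20 bytes: 10-digit offset, 5-digit generation, n or f.
    xref = (
        b"xref\n"
        b"0 2\n"
        b"0000000000 65535 f \n"
        + f"{len(header):010d} 00000 n \n".encode("ascii")
    )
    trailer = (
        b"trailer\n<< /Size 2 /Root 1 0 R >>\n"
        b"startxref\n" + str(xref_offset).encode("ascii") + b"\n%%EOF\n"
    )
    return header + obj1 + xref + trailer


def inspect_pdf(data: bytes) -> dict:
    """Read the header version and the byte offsets listed in the xref table.

    Assumes one xref subsection and LF line endings (a simplification).
    """
    version = re.match(rb"%PDF-(\d\.\d)", data)
    if version is None:
        raise ValueError("missing %PDF header")
    # The pointer to the xref table sits just before the %%EOF marker.
    tail = re.search(rb"startxref\s+(\d+)\s+%%EOF\s*$", data[-1024:])
    if tail is None:
        raise ValueError("missing startxref / %%EOF")
    xref_offset = int(tail.group(1))
    if not data.startswith(b"xref", xref_offset):
        raise ValueError("startxref does not point at an xref table")
    lines = data[xref_offset:].split(b"\n")
    first, count = (int(part) for part in lines[1].split())
    offsets = {}
    for i in range(count):
        offset, _generation, kind = lines[2 + i].split()
        if kind == b"n":  # "n" marks an in-use object, "f" a free entry
            offsets[first + i] = int(offset)
    return {
        "version": version.group(1).decode("ascii"),
        "xref_offset": xref_offset,
        "offsets": offsets,
    }


if __name__ == "__main__":
    pdf = build_minimal_pdf()
    info = inspect_pdf(pdf)
    print(info["version"], info["offsets"])
```

Because the table stores a byte offset per object, a reader can jump straight to object 1 with pdf[info["offsets"][1]:] instead of scanning the file, which is the random access property described in the text.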