Why use open formats?
What is a format?
When you are writing an article, retouching an image, building a webpage, listening to a song or watching your favorite film on your computer, you are dealing with files. For these files to be opened, read or modified with your favorite applications, they need to have a format. A format is what enables an application to interpret the raw data contained in a file; in other words, it is the mode of representation of these data. File formats are very often indicated by the extension of the file name: the suffix, usually three or four letters long, at the end of the name. For instance, mypage.htm is a document written in HTML. There are specific formats for images (such as JPEG, PNG, GIF, TIFF and BMP), for plain text (ASCII, often marked with the .txt extension), for formatted text (HTML, RTF, DOC) and for printer-ready documents (PDF, PS).
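An extension is only a naming convention: renaming a file does not change its data. Many of the formats listed above also begin with a fixed byte signature (a "magic number") that an application can check before interpreting the rest of the file. The Python sketch below, which uses only the standard library, guesses a format from the first bytes of a file. The signature table is a small illustrative subset, and the helper names `sniff_format` and `sniff_file` are hypothetical.

```python
# Illustrative subset of well-known leading byte signatures ("magic numbers").
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
    (b"BM", "BMP"),
    (b"%PDF-", "PDF"),
    (b"%!PS", "PostScript"),
    (b"{\\rtf", "RTF"),
    (b"PK\x03\x04", "ZIP container (e.g. OpenDocument)"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file (legacy DOC/XLS/PPT)"),
]


def sniff_format(data: bytes) -> str:
    """Guess a format from its leading bytes, ignoring the file name entirely."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return "unknown"


def sniff_file(path: str) -> str:
    """Read only the first 16 bytes of a file and classify them."""
    with open(path, "rb") as f:
        return sniff_format(f.read(16))
```

For example, `sniff_format(b"%PDF-1.4\n")` returns `"PDF"` whatever the file happens to be called. Short signatures such as `BM` can produce false positives on arbitrary data, so real tools combine several checks.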
Open vs. proprietary formats
In order to work with a file, you will usually need an application that can read, edit and save the data it contains.
OPEN FORMAT - We will say that a file format is open if the mode of representation of its data is transparent and/or its specification is publicly available. Open formats are usually standards set by public authorities or international institutions whose aim is to establish norms for software interoperability. There are, however, also open formats promoted by software companies that choose to make the specifications of the formats used by their products publicly available.
It should be noted that an open format can be encoded either in a transparent way (readable in any text editor, as is the case with markup languages) or in binary form (unreadable in a text editor, but fully decodable once the format specification is known).
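A rough way to see the difference between transparent and binary encodings is to check whether a file's bytes decode as text. The Python sketch below applies a common heuristic, assuming that readable files are UTF-8 (which includes plain ASCII) and contain no NUL bytes; the function name `looks_transparent` is hypothetical, and the test is an approximation rather than a formal definition.

```python
import codecs


def looks_transparent(data: bytes, sample_size: int = 4096) -> bool:
    """Heuristically guess whether `data` is human-readable text.

    Assumptions (common but not universal): readable files are UTF-8,
    which includes plain ASCII, and they never contain NUL bytes.
    """
    sample = data[:sample_size]
    if b"\x00" in sample:
        return False  # NUL bytes are typical of binary encodings
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        # final=False tolerates a multi-byte character cut off by the sample limit
        decoder.decode(sample, final=False)
    except UnicodeDecodeError:
        return False
    return True
```

Markup such as `b"<p>Hello</p>"` passes the check, while the header of a PNG image fails it. A binary format that happens to contain only valid UTF-8 bytes would be misclassified, which is why this is only a heuristic.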
PROPRIETARY FORMAT - We will say that a file format is proprietary if the mode of representation of its data is opaque and its specification is not publicly available. Proprietary formats are developed by software companies to encode the data produced by their applications: only software produced by the company that owns the specification of a file format will be able to read the data contained in the file correctly and completely. Proprietary formats can be further protected by patents, and the patent owner can demand royalties for the use or implementation of the format in third-party software.
Terminological note: what we call proprietary here, others call closed. By this choice of terms we do not mean to suggest that everything that is not proprietary must be a public standard. As stressed in the paragraph above, many formats with a public specification (that is, "open" formats) have been developed by software companies. The difference between open and proprietary (or closed) lies solely in whether or not a public specification of the format is available.
Using and exchanging files in proprietary formats
Proprietary formats are widespread today because they are used by a large number of publishing, image-processing and text-processing applications. A proprietary format encodes data in such a way that a file will only be readable with the original software used to create it. By contrast, an open format guarantees that a file can be read by any software written for that purpose.
The difference between open and proprietary formats may go unnoticed when files are used locally. By local use, we mean any personal use of files, but also the sharing of these files with other users who have agreed on which software should be used to read them.
As soon as the use is no longer local and files are exchanged (by uploading them to the Internet, publishing them or sending them as e-mail attachments, i.e. any form of communication in which the sender and the receiver have no prior agreement on which software should be used to read these files), the open vs. proprietary distinction becomes crucial.
Four reasons not to use proprietary formats
Proprietary formats can be used locally without risk by any user for personal use, or by a group of users who have previously decided to use a specific format and specific software to cooperate on a given project. On the other hand, sharing files is a social act, whose effects concern not only the user but also all the potential users of these files.
Sharing files requires converting the original format into one suited to file exchange. Open formats are by definition exchange formats: they ensure the accessibility, interoperability and longevity of the data.
When you exchange files in proprietary formats, you help spread and entrench undesirable practices.
1. Taking the risk that the recipient may not be able to access the data
- A proprietary format makes a specific piece of software mandatory for accessing the file content. By exchanging files in proprietary formats, you tacitly assume that all the recipients of your file have the software needed to open it: any user who cannot run that specific software, for technical reasons (e.g. users working on a different platform) or financial ones (users who cannot afford to buy the required software), will never be able to use the file.
- Now, let us assume that the user has the application needed to open the file. Does this guarantee complete access to the file content? Unfortunately not: a strategy widely adopted by software producers consists in regularly upgrading the data formats implemented in their applications. This strategy is meant to lock users in to a specific proprietary application. As a result, the only way for users to ensure future access to their own data, or to guarantee the longevity of old files, is to keep buying upgrades of that specific software.
- [Semi-proprietary formats and predatory practices]. A similar strategy for locking users in to a specific data format ("Embrace and Extend") consists in initially adopting an open format to store an application's data, and then progressively modifying this format with proprietary extensions that make the resulting format incompatible with, or unreadable by, other software based on the original format. This strategy is often used to turn a public standard into a semi-proprietary format.
The adoption of proprietary or semi-proprietary formats results from corporate strategies that run counter to the needs of users, who should be able to rely on the accessibility, interoperability and longevity of exchanged data.
2. Taking the risk of transmitting confidential information
A proprietary format encodes information that is not publicly visible. Only the producer of the format, or the owner of the software that reads it, holds the key to fully decoding the format and can therefore access this information.
Often, when a file is saved, the software adds information to it that is not accessible to the lay user, such as the user's name, the software's serial number, the type of operating system, the computer on which the user works, the folder in which the file is stored, etc. Some of this information may also be stored in readable form without the author's knowledge, and is then accessible to everyone: the anecdote of the anonymous political manifesto sent out in MS Word format, with the author's name clearly legible in the document properties, is probably the best-known case of the unexpected consequences of using proprietary formats as exchange formats.
The consequences can be more serious than a failure to protect personal data, for instance when military information or trade secrets are transmitted. It is somewhat curious that people accept without raising an eyebrow that a country's Ministry of Defense produces and shares documents containing information accessible only to a private company in a foreign country.
Transmitting documents in a proprietary format means transmitting information that nobody really knows about, except the owner of the software that can read this format.
3. Contributing to virus propagation and exposing oneself to the risk of contamination
Most viruses are carried by infected files exchanged between users. Such viruses exploit vulnerabilities in specific applications or security holes in specific operating systems to execute malicious code.
Virus proliferation in these cases relies on the fact that most users run the very same software and share data in the native (i.e. proprietary) format of that software. Most viruses are therefore not only platform-specific but also application-specific: in many cases, simply switching to a different application makes a system immune to a whole class of viruses (consider, for instance, the large number of MS Word macro viruses). Using open formats, which are software-independent, interoperable and accessible on different platforms, weakens the overall impact of viruses and hinders their propagation: it is far easier to create a virus that exploits known vulnerabilities of a single, widely used application and the lack of awareness of its users than to embed malicious code in a format that can be read by many different applications on different kinds of platforms.
4. Propping up existing monopolies in the domain of electronic communication
This problem may not seem significant to the individual user, but it dramatically affects the community of users. By exchanging and publishing files in proprietary formats, you implicitly force your recipients to use the same software that you used to produce and store your data. The message implicitly conveyed when you exchange a file in a proprietary format is:
"Use software X or you won't be able to read this file".
This practice, which also occurs when you exchange a file in a given format taking it for granted that all other users have the required software, has a twofold consequence:
- On the one hand, this practice reinforces the use of a proprietary format owned by a company as a de facto standard: the interoperability, accessibility and longevity of your data are thereby held hostage to the contingent policies of a software company. If the software producer decides, or is forced, to stop developing the software needed to interpret a specific format, all existing files encoded in this format will suddenly become unusable: since the format specifications are not publicly available, it will be impossible to retrieve the full content encoded in a file.
- On the other hand, by propping up a de facto monopoly, this practice hinders fair competition between software producers, which is widely recognized as an essential condition for technological development, and undermines initiatives promoting open format specifications and public standards, which are commonly regarded as minimum guarantees of free and fair competition.
Four reasons to adopt open formats
Using open formats in data exchange and publication means:
- Ensuring the accessibility and longevity of your data: both you and the users of your data will always be able to read and access them.
- Ensuring complete transparency of the content of your files.
- Limiting the propagation of viruses: adopting open formats helps to drastically reduce the risk of contamination.
- Promoting diversity and interoperability in the domain of electronic communication.
Which proprietary formats should be avoided
Proprietary formats are not exchange formats. Most data stored in proprietary formats and intended for distribution or electronic publication can easily be converted to the corresponding open formats.
Main proprietary formats to be avoided include the following:
Proprietary formats with public specifications
Some file formats remain proprietary even though they have public specifications. The fact that the specifications are available now does not mean they will remain available in the future: the owner may restrict the publication of the specification, the implementation of the format, or both. Some examples:
MS Word documents (DOC)
The MS Word document format is a semi-transparent proprietary format developed by Microsoft. Part of the data it encodes is accessible, while most of it is opaque.
The formatting and word-processing capabilities of MS Word documents are also supported by the "OOo" open format, an XML-based standard developed by the free software suite OpenOffice.org, which satisfies the accessibility criteria established by the W3 Consortium. Because of its portability and compatibility, the OpenOffice.org format aims to become the reference standard for formatted text documents.
If the text is not meant to be edited by the recipient, the best solution is the open HTML format, which can be read in any web browser and edited in any text editor. When precise page layout is needed (for instance for documents that will be printed), the open PS and PDF formats are the best solution. For scientific texts, suitable open formats are TeX and DVI.
For co-authoring documents, an alternative (not optimal, but still better than the MS Word document format) is the semi-proprietary RTF format, which in its native form has a public specification and can be read by almost every word processor.
MS Excel tables and databases (XLS, XLW)
The MS Excel document format is a proprietary format developed by Microsoft. The best open-format alternative for saving and publishing large tables of strings and values is plain text with separators ("Comma-Separated Values", CSV). The CSV format can be read, modified and saved by any spreadsheet or database software, and it requires very little disk space.
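To illustrate how little is needed to produce and read CSV, the Python sketch below writes a small table with the standard library's `csv` module and reads it back. The table contents are invented for the example.

```python
import csv
import io

# Hypothetical sample table: a header row followed by two data rows.
rows = [
    ["id", "label", "value"],
    ["1", "alpha", "3.5"],
    ["2", "beta, gamma", "7"],  # contains a comma, so the writer quotes it
]

# Write the table as CSV text; an in-memory buffer stands in for a file.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)

# Read the text back; any CSV-aware program can do the same.
parsed = list(csv.reader(io.StringIO(csv_text)))
```

The field containing a comma is written as `"beta, gamma"`, so it survives the round trip intact. The same module also handles tab-separated plain-text tables if you pass `delimiter="\t"` to `csv.reader` and `csv.writer`.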
MS PowerPoint slide shows (PPS, PPT)
More and more slide shows available online are created in the proprietary MS PowerPoint format. The best open-format solution for publishing slide shows is PDF. This format offers only limited support for animations and transition effects, but it has interesting rendering capabilities:
- both fonts and graphics are handled as vector objects, so slides can be stretched or enlarged without loss of quality.
- portability: pagination and formatting are identical on whichever platform you view them.
High-quality bitmap images (BMP, TIF)
For images that require high color fidelity, proprietary formats such as BMP or TIFF can be replaced by the open JPEG standard.
Vector images (WMF)
The vector image format WMF can be replaced by its open equivalent, SVG.
Plain text (ASCII)
Whenever possible, simply avoid formatted text: plain text (ASCII, typically in .txt files) guarantees complete access for everyone, regardless of their software, their operating system or the computer they are using. In your e-mails, if what matters to you is the content rather than the formatting, put the text directly in the body of the message instead of sending it as an attachment.
Plain text cannot carry viruses, is extremely lightweight, and can easily be used to create tables (with tabs or commas as separators) that any software can read.
Hyper Text Markup Language (HTML)
HTML is the standard language of the web, defined by an international standards organization, the W3 Consortium (W3C). HTML is a flexible, universal, rich and compact format. Plain HTML (without JavaScript) cannot carry viruses and can be read on any platform.
Note: the HTML code produced by Word is semi-proprietary and tends to include information that cannot be displayed on all platforms.
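The claim about plain HTML rests on the absence of scripts. As a rough check, the Python sketch below uses the standard library's `html.parser` to flag `<script>` elements, inline event-handler attributes (such as `onclick`) and `javascript:` URLs. The class `ScriptFinder` and the function `find_scripts` are hypothetical helpers; this is not a security scanner and it ignores other kinds of active or embedded content.

```python
from html.parser import HTMLParser


class ScriptFinder(HTMLParser):
    """Collect signs of scripting in an HTML document (illustrative only)."""

    def __init__(self) -> None:
        super().__init__()
        self.findings: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lowercase.
        if tag == "script":
            self.findings.append("<script> element")
        for name, value in attrs:
            if name.startswith("on"):
                self.findings.append(f"event handler {name} on <{tag}>")
            elif name in ("href", "src") and value and value.strip().lower().startswith("javascript:"):
                self.findings.append(f"javascript: URL in {name} on <{tag}>")


def find_scripts(html_text: str) -> list[str]:
    """Return a list of scripting constructs found in `html_text`."""
    finder = ScriptFinder()
    finder.feed(html_text)
    finder.close()
    return finder.findings
```

An empty result, as for `find_scripts("<p>Hello <b>world</b></p>")`, suggests the page is plain markup; any finding points to JavaScript that a reader's browser would execute.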
W3: HyperText Markup Language (HTML)
TeX, LaTeX and Device Independent Format (DVI)
TeX is both a language to typeset documents and a programming language. Originally written to typeset mathematical documents in a professional manner, it is now used in many other areas.
LaTeX is also a typesetting and programming language. It is actually a set of macros built on top of TeX that provides higher-level commands, much as HTML was defined as an application of SGML.
DVI. A TeX or LaTeX source file must be compiled. The result of this compilation is in DVI format, readable on any platform. Most of the time, the result of the compilation will, in turn, be converted to PDF or PS.
TeX User Group (TUG)
LaTeX Project
TeX Showcase
Open Document Format for Office Applications
OpenDocument is:
- An open, XML-based file format.
- An open standard, supported by the OASIS (IBM, Sun, the OpenOffice.org team) and ISO standards bodies.
- The default file format for OpenOffice.org 2.0 and KOffice 1.4.
- A top prospect for an official format for the European Commission.
- Our best chance to fight vendor lock-in associated with proprietary formats.
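Because an OpenDocument file is a ZIP package of XML parts, its contents, including the kind of author metadata discussed above under confidentiality, can be inspected with generic tools. The Python sketch below builds a deliberately minimal, hypothetical package in memory (a real document also contains parts such as `content.xml` and `META-INF/manifest.xml`, so an office suite would not open this one) and then reads back its mimetype and the creator recorded in `meta.xml`. The part names and namespaces follow the OpenDocument specification as we understand it; the author "Jane Example" and the generator string are invented.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "meta": "urn:oasis:names:tc:opendocument:xmlns:meta:1.0",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical metadata part, as an office suite might write it.
META_XML = f"""<?xml version="1.0" encoding="UTF-8"?>
<office:document-meta xmlns:office="{NS['office']}" xmlns:meta="{NS['meta']}"
    xmlns:dc="{NS['dc']}" office:version="1.2">
  <office:meta>
    <meta:initial-creator>Jane Example</meta:initial-creator>
    <dc:creator>Jane Example</dc:creator>
    <meta:generator>HypotheticalWriter/1.0</meta:generator>
  </office:meta>
</office:document-meta>"""


def build_minimal_odt() -> bytes:
    """Create a tiny, incomplete ODF text package in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # 'mimetype' comes first and is stored uncompressed.
        zf.writestr("mimetype", "application/vnd.oasis.opendocument.text",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("meta.xml", META_XML, compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()


def inspect_odf(data: bytes) -> dict:
    """Return the part names, mimetype and author metadata of a package."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        names = zf.namelist()
        mimetype = zf.read("mimetype").decode("ascii")
        root = ET.fromstring(zf.read("meta.xml"))
    return {
        "parts": names,
        "mimetype": mimetype,
        "initial_creator": root.findtext("office:meta/meta:initial-creator", namespaces=NS),
        "creator": root.findtext("office:meta/dc:creator", namespaces=NS),
        "generator": root.findtext("office:meta/meta:generator", namespaces=NS),
    }


info = inspect_odf(build_minimal_odt())
print(info)
```

The point of the design is that nothing here depends on the software that wrote the file: a standard ZIP reader and an XML parser are enough to see exactly which personal data a document carries.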
External links
OASIS: OpenDocument specifications
Rich Text Format (RTF)
The RTF format was introduced by Microsoft as a standard format for formatted text. It offers the same range of formatting as DOC, while being (at least in its native version) a format with a public specification. Most word processors can read and write this format, but because some programs use proprietary extensions of it, its compatibility remains uncertain.
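RTF is itself a text-based format made of control words and braces, which is one reason why so many programs can read it. The Python sketch below writes a minimal RTF document. It assumes only a few basic control words (`\rtf1`, `\ansi`, `\deff0`, `\fonttbl`, `\fs`, `\b`, `\par`, `\u`), escapes the characters RTF treats as syntax, and encodes non-ASCII characters as signed 16-bit `\uN` values. The helpers `rtf_escape` and `make_rtf` are hypothetical and cover only a tiny subset of the specification.

```python
def rtf_escape(text: str) -> str:
    """Escape text for inclusion in an RTF body."""
    out = []
    for ch in text:
        if ch in "\\{}":
            out.append("\\" + ch)  # backslash and braces are RTF syntax
        elif ch == "\n":
            out.append("\\par\n")  # paragraph break
        elif ord(ch) > 127:
            # \uN takes signed 16-bit UTF-16 code units; characters outside
            # the Basic Multilingual Plane become a surrogate pair.
            units = ch.encode("utf-16-be")
            for i in range(0, len(units), 2):
                code = int.from_bytes(units[i:i + 2], "big", signed=True)
                out.append(f"\\u{code}?")  # '?' is the fallback for old readers
        else:
            out.append(ch)
    return "".join(out)


def make_rtf(title: str, body: str) -> str:
    """Build a minimal RTF document with a bold title and one paragraph."""
    return (
        "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Times New Roman;}}\n"
        "\\f0\\fs24\n"  # font 0, size in half-points (24 = 12 pt)
        f"{{\\b {rtf_escape(title)}}}\\par\n"
        f"{rtf_escape(body)}\\par\n"
        "}"
    )


print(make_rtf("Open formats", "Caf\u00e9 {draft}"))
```

Saved with a `.rtf` extension, the output should open in most word processors; the whole document is readable in a text editor, which makes the format easy to audit.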
PostScript (PS)
The PostScript format is a page description language developed by Adobe in 1985, created for printing and widely used in typography. One of its advantages is that it is universal (it is independent of the format of the original file), and it cannot carry viruses. Unlike PDF, PostScript does not let you copy text viewed on screen and paste it into another application. It can be generated through PostScript-compatible printer drivers (using the 'print to file' option) and with the Ghostscript program.
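To show what "page description language" means in practice, the Python sketch below generates a one-page PostScript program that places a line of text on the page. It assumes ASCII text and an interpreter providing the standard Times-Roman font; the helper names `ps_escape` and `make_ps_page` are hypothetical. The output can be viewed with Ghostscript or sent to a PostScript printer.

```python
def ps_escape(text: str) -> str:
    """Escape backslashes and parentheses inside a PostScript string literal."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def make_ps_page(text: str, x: int = 72, y: int = 720, size: int = 12) -> str:
    """Return a minimal one-page PostScript program (ASCII text assumed).

    Coordinates are in points (1/72 inch) from the bottom-left corner;
    (72, 720) is one inch from the left edge and one inch below the top
    of a US Letter page, which is 792 points tall.
    """
    return "\n".join([
        "%!PS",
        f"/Times-Roman findfont {size} scalefont setfont",  # select the font
        f"{x} {y} moveto",                                   # set the current point
        f"({ps_escape(text)}) show",                         # paint the string
        "showpage",                                          # emit the page
        "",
    ])


print(make_ps_page("Hello from an open format"))
```

Each line is an instruction to the interpreter rather than stored pixels, which is why the same file prints identically on any PostScript device.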
PostScript Language Specifications
Portable Document Format (PDF)
The PDF format (Portable Document Format), developed by Adobe, is a document presentation format whose specifications are available on the web. It is a universal format (regardless of which platform and software are used to generate it), compatible with any printer, flexible (you can substitute fonts and add links, bookmarks and notes) and readable on screen with the appropriate viewer or plugin. It can be generated with Adobe Acrobat, with the open source software Ghostscript, or on the fly in a Unix environment.
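Because the PDF specification is public, a valid file can even be written by hand, which is essentially what "on the fly" generation does. The Python sketch below assembles a one-page PDF showing a line of text in the standard Times-Roman font. It assumes ASCII text, and `pdf_escape` and `make_minimal_pdf` are hypothetical helpers; real generators add compression, embedded fonts and much more. The cross-reference table must record the exact byte offset of every object, which is why the file is built as bytes.

```python
def pdf_escape(text: str) -> str:
    """Escape backslashes and parentheses in a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def make_minimal_pdf(text: str) -> bytes:
    """Return a one-page PDF (US Letter) showing `text` (ASCII assumed)."""
    content = f"BT /F1 12 Tf 72 720 Td ({pdf_escape(text)}) Tj ET".encode("ascii")
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << /Font << /F1 5 0 R >> >> /Contents 4 0 R >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Times-Roman >>",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte position where object `number` starts
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"
    xref_start = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry 0: head of the free list
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset  # each entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_start
    return bytes(out)
```

For example, `open("hello.pdf", "wb").write(make_minimal_pdf("Hello, open formats"))` writes a file that standard PDF viewers should display.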
Adobe PDF Specifications
Joint Photographic Experts Group (JPEG)
JPEG is one of the most efficient image compression formats currently available. This open format produces very small files and lets you choose the compression rate, bearing in mind that the higher the compression rate, the lower the image quality. JPEG compression is lossy and its losses accumulate: the image is visibly degraded if you open it and save it again with a new compression rate.
A variant of this format, progressive JPEG, lets you optimise the time it takes to display a picture on the Internet. The new JPEG 2000 standard, currently being defined, will allow a better quality/compression ratio as well as the indexing of pictures with keywords.
Joint Photographic Experts Group
W3: JPEG Overview and Specifications
JPEG 2000 Overview and Specifications
Portable Network Graphics (PNG)
PNG-8 and PNG-24 are two variants of an open, license-free format. PNG is the main alternative to GIF and was specifically designed to optimise the display of images on the Internet. It compresses data without loss of information and is supported by most browsers.
For photographic images, a PNG file remains significantly larger than its JPEG equivalent. However, PNG is an advantageous replacement for GIF for images with a color depth of 8 bits or less.
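PNG's lossless compression applies the zlib (DEFLATE) algorithm to the image's scanlines. The Python sketch below, using only the standard library, writes a tiny 8-bit grayscale PNG with filter type 0 ("None") on every row and then decompresses the image data to show that every pixel value comes back unchanged. The function names are hypothetical, and the decoder only understands files produced by this encoder; it is an illustration, not a replacement for an image library.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def chunk(kind: bytes, data: bytes) -> bytes:
    """Build a PNG chunk: length, type, data, then CRC-32 over type + data."""
    return (struct.pack(">I", len(data)) + kind + data
            + struct.pack(">I", zlib.crc32(kind + data) & 0xFFFFFFFF))


def encode_gray_png(pixels: list[list[int]]) -> bytes:
    """Encode rows of 0-255 gray values as an 8-bit grayscale PNG."""
    height, width = len(pixels), len(pixels[0])
    # IHDR: width, height, bit depth 8, color type 0 (grayscale),
    # compression method 0, filter method 0, no interlace.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    # Each scanline is prefixed with filter-type byte 0 ("None").
    raw = b"".join(b"\x00" + bytes(row) for row in pixels)
    return (PNG_SIGNATURE + chunk(b"IHDR", ihdr)
            + chunk(b"IDAT", zlib.compress(raw, 9)) + chunk(b"IEND", b""))


def decode_gray_png(data: bytes) -> list[list[int]]:
    """Decode PNGs written by encode_gray_png (filter type 0 only)."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos, width, idat = len(PNG_SIGNATURE), 0, b""
    while pos < len(data):
        length, kind = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if kind == b"IHDR":
            width = struct.unpack(">I", body[:4])[0]
        elif kind == b"IDAT":
            idat += body  # image data may be split across several chunks
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC
    raw = zlib.decompress(idat)
    stride = width + 1
    rows = []
    for i in range(0, len(raw), stride):
        if raw[i] != 0:
            raise ValueError("only filter type 0 is supported in this sketch")
        rows.append(list(raw[i + 1:i + stride]))
    return rows


image = [[0, 64, 128], [255, 128, 0]]  # hypothetical 3x2 gray image
png_bytes = encode_gray_png(image)
print(decode_gray_png(png_bytes) == image)  # prints True: lossless round trip
```

Unlike JPEG, re-encoding this data any number of times yields exactly the same pixels, which is what makes PNG suitable for line art and images that are edited repeatedly.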
W3 - Portable Network Graphics: Overview and Specifications
Scalable Vector Graphics (SVG)
For vector graphics, there is now an open format thanks to the work of a working group created in 1998 by the W3 Consortium: Scalable Vector Graphics (SVG). SVG is based on other public standards (XML, CSS, HTML) and allows the creation of scalable vector images, which are ideal for saving bandwidth, optimising layout and zooming without loss of image quality. SVG graphics can be dynamic or interactive; objects can be grouped, transformed, nested within other objects and given style attributes.
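Since SVG is XML, a vector image can be produced with any XML tool. The Python sketch below uses the standard library's `xml.etree.ElementTree` to build a small drawing containing a group (`g`) with a transform, a styled circle and a text label; the shapes, colors and the function name `make_svg` are arbitrary choices for illustration. The `viewBox` attribute is what lets a viewer rescale the drawing to any size without loss of quality.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def make_svg(label: str) -> str:
    """Build a small scalable SVG drawing and return it as a string."""
    svg = ET.Element("svg", {
        "xmlns": SVG_NS, "width": "200", "height": "100", "viewBox": "0 0 200 100",
    })
    # A group lets the circle and the label be transformed together.
    group = ET.SubElement(svg, "g", {"transform": "translate(10,10)"})
    ET.SubElement(group, "circle", {
        "cx": "40", "cy": "40", "r": "35",
        "style": "fill:#4a90d9;stroke:#1d3557;stroke-width:2",
    })
    text = ET.SubElement(group, "text", {
        "x": "90", "y": "45", "style": "font-family:sans-serif;font-size:16px",
    })
    text.text = label  # ElementTree escapes characters such as & and <
    return ET.tostring(svg, encoding="unicode")


print(make_svg("Open & scalable"))
```

Saved as a `.svg` file, the result should display in an SVG-capable browser or editor at any zoom level, and it remains editable as plain text.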
W3: Scalable Vector Graphics (SVG) - Overview and Specifications
About this document
Copyright © 2004
openformats.org
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.2
or any later version published by the Free Software Foundation;
with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
A copy of the license is included in the section entitled
"GNU Free Documentation License".
The goal of openformats.org is to create freely available and comprehensive documentation on open formats. The texts are edited and discussed by a community of volunteer contributors. The license we use grants free access to our content in the same sense that free software is licensed freely. This principle is known as copyleft. That is to say, openformats.org content can be copied, modified and redistributed as long as the new version grants the same freedoms to others and acknowledges the authors of the openformats.org article used (a direct link back to the article satisfies our author credit requirement). openformats.org articles will therefore remain free forever and can be used by anybody subject to certain restrictions, most of which serve to ensure that freedom.
To fulfill the above goals, the text contained in openformats.org is licensed to the public under the GNU Free Documentation License (GFDL). The full text of this license can be found here: GNU Free Documentation License.
The text of the GFDL is the only legally binding document; what follows is our interpretation of the GFDL: the rights and obligations of users and contributors.
IMPORTANT: If you want to use content from openformats.org, first read the Users' rights and obligations section. You should then read the GNU Free Documentation License.
Users' rights and obligations
If you want to use openformats.org materials in your own books, articles, web sites or other publications, you can do so, but you have to follow the GFDL. If you are simply duplicating an openformats.org article, you must follow section 2 of the GFDL on verbatim copying.
If you create a derivative version by changing or adding content, this entails the following:
- your materials in turn have to be licensed under GFDL,
- you must acknowledge the authorship of the article (section 4B), and
- you must provide access to the "transparent copy" of the material (section 4J). (The "transparent copy" of an openformats.org article is its wiki text.)
You may be able to partially fulfill the latter two obligations by providing a conspicuous direct link back to the openformats.org article hosted on this website. You also need to provide access to a transparent copy of the new text. However, please note that the administrators of the openformats.org website make no guarantee to retain authorship information and a transparent copy of articles. You are therefore encouraged to provide this authorship information and a transparent copy with your derived works.
Example notice
An example notice, for an article that uses content from openformats.org, might read as follows:
This article is licensed under the <a href="http://www.gnu.org/copyleft/fdl.html">GNU Free Documentation License</a>. It uses material from the <a href="http://www.openformats.org/foo">openformats.org article "Foo"</a>.
("Foo" and the openformats.org URL must of course be substituted accordingly.)
Alternatively, you can distribute your copy of Foo along with a copy of the GFDL (as explained in the text) and list at least five (or all if fewer than five) principal authors on the title page (or top of the document).
Contributors' rights and obligations
If you contribute material to openformats.org, you thereby license it to the public under the GFDL (with no invariant sections, front-cover texts, or back-cover texts). In order to contribute, you must therefore be in a position to grant this license, which means that either:
- you own the copyright to the material, for instance because you produced it yourself, or
- you acquired the material from a source that allows the licensing under GFDL, for instance because the material is in the public domain or is itself published under GFDL.
In the first case, you retain copyright to your materials. You can later republish and relicense them in any way you like. However, you can never retract the GFDL license for the versions you placed here: that material will remain under the GFDL forever. In the second case, if you incorporate external GFDL materials, you need, as the GFDL requires, to acknowledge the authorship and provide a link back to the network location of the original copy. If the original copy required invariant sections, you have to incorporate those into the openformats.org article; it is, however, very desirable to replace GFDL texts with invariant sections by original content without invariant sections whenever possible.
Using copyrighted work from others
If you use part of a copyrighted work under "fair use", or if you obtain special permission to use a copyrighted work from the copyright holder under the terms of our license, you must make a note of that fact (along with names and dates). It is our goal to be able to freely redistribute as much of openformats.org material as possible, so original images and sound files licensed under the GFDL or in the public domain are greatly preferred to copyrighted media files used under fair use.
Never use materials that infringe the copyrights of others. This could create legal liabilities and seriously hurt the project. If in doubt, write it yourself.
Note that copyright law governs the creative expression of ideas, not the ideas or information themselves. Therefore, it is perfectly legal to read an encyclopedia article or other work, reformulate it in your own words, and submit it to openformats.org.
List of contributors
People who contributed to this document are (in chronological order):
openformats.org is constantly looking for translators and contributors. Want to join? Go to the registration page.
How to contact us
Feedback is very much welcome.
You can freely add comments to any page or post your own suggestions and contributions in the intranet section (registration required).
Or, if you prefer, you can send us your feedback directly.