eBook File Formats

There are a lot of eBook file formats: TXT, HTML, AZW, DOC/DOCX, OPF, TR2/3, ARG, DTB, FB2, XML, CHM, PDF, PS, DJVU, LIT, PDB, DNL… ok, I think I'll stop now. It's obvious that while one file format might be a nice ideal, it is anything but reality. Not everyone uses the same software, and there's no such thing as a universal e-book file format that all market players have adopted (EPUB stands out, but not every eBook retailer uses it).

In their simplest form, e-books are just text files. But text files are too simple. They lack the formatting features an e-book needs to rival a printed book in appearance. Also, TXT files do not support DRM.

E-book readers—both software and hardware—are a topic unto itself. For this post, I therefore want to focus just on the file formats that these (software or hardware-based) readers support. Also, I'll only focus on those formats I feel are the most relevant. It's not very realistic, IMO, for someone to read an e-book of any length in TXT format or even HTML. Other formats, such as PKG (which was a file format for reading e-books on an Apple Newton), are outdated enough to not garner further attention.

So, here are the formats and a bit of information about each.



AZW is the file format used by the Amazon Kindle e-reader. It is proprietary to Amazon and is DRM protected. The best way to both convert a file to this format and publish on Amazon's Kindle store is to use their Digital Text Platform site.

Their recommendation for having a successful conversion:

The preferred format for uploading content is as a single HTML file. To include images, provide a ZIP file that includes the images as well as the HTML file that refers to them (check the formatting guides to find out how to link to images from HTML). The HTML and image files all have to be in the same folder inside the zip file.
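For anyone who would rather script that packaging step than build the ZIP by hand, here is a minimal sketch in Python. The function name and the file names in the usage note are my own placeholders, not anything Amazon specifies; the only rule it tries to honor is the one quoted above, that the HTML and image files all sit in the same folder inside the zip.

```python
import zipfile
from pathlib import Path


def build_upload_zip(html_file, image_files, zip_path):
    """Bundle one HTML file and its images into a flat ZIP archive.

    Every file is stored under its bare file name, so the HTML and the
    images all end up in the same folder inside the archive.
    """
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for source in [html_file, *image_files]:
            source = Path(source)
            # arcname drops any directory part, keeping the archive flat.
            archive.write(source, arcname=source.name)
    return zip_path
```

Calling `build_upload_zip("book.html", ["images/cover.jpg"], "upload.zip")` would store both files at the top level of `upload.zip`. Because the archive is flattened, two images with the same file name in different source folders would collide, so the image names need to be unique.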

I've gone through this process to publish my novel, The Hall of the Wood, on the Kindle store; it is a pretty painless process.

Note: A lot of people/reviewers think Kindle only supports AZW. This isn't true. Kindle also natively supports TXT, PDF, Audible audio (including Audible Enhanced; AA, AAX), MP3, and unprotected MOBI, and it supports HTML and DOC through conversion.



PDB is a DRM-protected format advocated by Palm Digital Media. It stands for Palm Database and was originally intended to be read on Palm handheld devices. From looking around, it seems that many retailers support this format and that a Palm handheld isn't required to read it, since reader software for PCs and Macs is available. The format is also supported on many other handheld devices.



PDF stands for Portable Document Format. It was established by Adobe in hopes of creating a universal file format to promote the ready exchange of data, specifically document files. DRM-free PDFs can be read by the free Adobe Reader. PDFs protected by DRM can be read by Adobe Digital Editions, which can allow or deny access to a downloaded PDF depending on the conditions under which the file was obtained.

If an e-book was outright purchased, you should be good to go, though you will have to read the PDF using Digital Editions and will be further restricted from saving or printing the e-book. On the other hand, if you checked an e-book out from an online library and that e-book contains DRM, chances are the e-book will "expire" after the loan period is up, at which time you will no longer be able to view the e-book.

PDF documents can be created by any number of freely available software converters. My preferred method of conversion is to use the Microsoft Save as PDF or XPS add-in for Microsoft Word 2007. Of course, there's always Adobe Acrobat Professional, too.



OpenDocument Format is an XML-based file format used to represent spreadsheets, presentations, word processing documents, and more. While ODF has emerged as an industry standard, the specification having been ratified by over 600 technology companies (including Microsoft and Adobe), it is of some note that while applications such as Microsoft Office support ODF, that suite also still defaults to its own proprietary file formats. ODF is, however, the default file format for OpenOffice, a popular open source alternative to Microsoft Office.

ODT, or OpenDocument Text, is the word processing specific version of the ODF file format standard. Similarly, there are presentation (ODP), spreadsheet (ODS), and other formats.



The Rich Text Format was developed by Microsoft in the 1980s. Not surprisingly, it is an 8-bit based format, and while it can address larger character sets, it does so through means that relegate the format to a mostly legacy role. Still, the format is quite prolific; converting to RTF is supported by most word processing and other applications.



DOC is the default file format supported by Microsoft Word. With Word 2007, Microsoft introduced the DOCX format, which is billed as an open, XML format that, unfortunately, has not been as widely adopted as Microsoft might have hoped. One of the nice things about the DOCX format is that it results in much leaner files. However, it is not backward compatible with previous versions of Word.



EPUB is an e-book specific format engineered by the International Digital Publishing Forum (IDPF) and intended to replace the Open eBook (OEB) standard. EPUB includes optional support for DRM. The standard is supported by the Barnes & Noble nook, Sony Reader, and Apple's iPhone as well as other devices.

As far as converting a document to the EPUB format, it looks like there are several options: BookGlutton hosts an HTML-to-EPUB file converter, Google Code contains a software library called epub-tools which looks suitable for batch style conversion of files, and LexCycle has something called Stanza which looks to be a desktop application. I'll have to give each of them a whirl to see which is the best option.



The PRC/MOBI file format is based on the Open eBook (OEB) standard (which I discovered was superseded by the EPUB standard; see above), and is considered one of the most prolific e-book file formats for mobile devices. The biggest proponent of this format is Mobipocket.

Mobipocket offers both reader and publisher software, both free. Mobipocket Reader will run on PCs as well as a number of handheld devices. There are two ways to use Mobipocket Creator to author e-books: use the application to create the e-book and then add content and design from there or, the more practical approach, import Word, text, or PDF documents.

The PRC/MOBI format does, of course, support DRM.




BBeB, or Broadband eBook, is Sony's proprietary file format for e-books, as if we needed yet another one. It comes in two varieties: LRX for encrypted (DRM) e-books and LRF for unencrypted e-books.

Sony has their own e-book store where one can download e-books in these formats. The newest version of the Sony Reader is a device widely expected to give Amazon's Kindle a run for the money. In order to read books in the BBeB format, you will need a Sony Reader, much like the AZW format is married to the Kindle.

However, Sony opened the Reader up so that it also supports the EPUB format. This is a good thing, and leaves the Kindle as virtually the only device that locks its users into a proprietary format.

I haven't yet found a viable method by which to publish e-books in this format.

Two options have come to light for converting from a more standard format to BBeB:

1.) As ZenEngineer points out in the comments below, there is a freeware program called Calibre that will perform the conversion.

2.) Also, there is the bbebinder open-source project hosted on Google Code which converts HTML and TXT files to the BBeB format.
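As a rough sketch of the first option, Calibre ships with a command-line tool called ebook-convert that takes an input file and an output file and picks the output format from the output file's extension. The Python wrapper below is only an illustration under that assumption: the function names and the book.html/book.lrf file names are placeholders of mine, and I haven't checked how well the resulting LRF file reads on an actual Sony Reader.

```python
import shutil
import subprocess


def lrf_convert_command(source, target):
    """Build the argument list for Calibre's ebook-convert tool.

    The output format is inferred from the target's extension,
    so a target ending in .lrf asks for an unencrypted BBeB file.
    """
    return ["ebook-convert", str(source), str(target)]


def convert_to_lrf(source, target):
    """Run the conversion, provided ebook-convert is on the PATH."""
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("ebook-convert not found; is Calibre installed?")
    # check=True raises CalledProcessError if the conversion fails.
    subprocess.run(lrf_convert_command(source, target), check=True)
```

With Calibre installed, `convert_to_lrf("book.html", "book.lrf")` would attempt the conversion; without it, the function raises an error rather than failing silently.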



LIT is a Microsoft-specific file format whose time I can't help but wonder may be at an end. LIT files are readable only on Microsoft Reader, and while there are versions of the software for PCs and handhelds, the major players in those areas (Amazon, Sony, Apple) have their own proprietary formats.

Creation of LIT files seems a bit problematic as well. There is a Read in Microsoft Reader add-in for Microsoft Word 2000 and higher, but "higher" here does not include Word 2007. That kind of tells me the format is being abandoned by Microsoft.


Libraries Going Digital

Image licensed under Creative Commons Attribution 2.0, obtained from http://commons.wikimedia.org/wiki/File:SanDiegoCityCollegeLearingRecourceCity-bookshelf.jpg

Accessing the full text of books online is nothing new. One can peruse the digital shelves of such sites as the World Digital Library, the International Children's Digital Library, the University of Pennsylvania's Online Books Pages, or even The New York Public Library, which is working with Google Books to offer a searchable subset of their collection online.

Therein, though, lies the kicker: these so-called digital libraries are incomplete. Though a somewhat quick perusal of the NYPL's online catalog was impressive, the fact that they maintain an "e" catalog separate from the main one tells me their digital collection differs from what you might find by walking into the actual library.

That might be true of the NYPL, but not so for the library of Cushing Academy. You see, Cushing is going digital, all-the-way:

This year, after having amassed a collection of more than 20,000 books, officials at the pristine campus about 90 minutes west of Boston have decided the 144-year-old school no longer needs a traditional library. The academy’s administrators have decided to discard all their books and have given away half of what stocked their sprawling stacks - the classics, novels, poetry, biographies, tomes on every subject from the humanities to the sciences. The future, they believe, is digital.

The school's headmaster, Dr. James Tracy, is leading this effort. Dr. Tracy believes that printed books are "outdated technology, like scrolls before books".

In place of their traditional library, they are spending $500,000 to create an all-new "learning center". Upon entering this new facility, students will no longer see rows and rows of books, but instead will find:

  • three large flat-screen TVs that will project data from the Internet (at a cost of $42,000 these are obviously Dallas Cowboys-sized screens)
  • special laptop-friendly study carrels (at a cost of $20,000)
  • a coffee shop ($50,000) that will include a $12,000 cappuccino machine
  • 18 electronic readers ($10,000) made by Amazon.com and Sony

(No, I did not make those dollar figures up)

I'll put aside the discussion of why they're turning hallowed reading ground into a social mecca with televisions and coffee shops to instead focus on the switchover of their printed material to an all digital library. There's no doubt the world is going digital more and more, so it's not surprising to see this happen. Convenience, accessibility, sheer breadth of titles are all good reasons for this. Also, as technology for digital readers continues to evolve, and as the readers themselves become more affordable, we'll continue to see growth in the e-book space. But will we ever reach the point where physical libraries either do not exist or exist exclusively as meeting places for people who share a common desire to learn (while sipping coffee made by a $12,000 cappuccino machine, of course)?

I have to say that personally I really have nothing against an all-digital world. While I continue to read printed books daily, I haven't set foot in a library since I graduated college. In fact, as I was performing some research for this post, I became impressed enough with the NYPL's digital catalog that I might sign up for a library card (or is it library e-card?) as soon as I'm done here. One of the most amazing aspects of this is that I don't live anywhere near New York, nor will I ever have to set foot into their library to take advantage of their catalog.

At some point, I plan to buy a Kindle or similar e-book reader. At that time, I may very well never buy another printed book again. You can bet that as successive generations become more familiar with e-Ink than the kind that comes from a printing press, the perception of what a book is will change.

So, too, then will libraries.

Is this the last word on Kindlegate?

Amazon's Kindle digital e-book reader has had its share of controversy. The latest, and perhaps biggest, misstep came when they remotely deleted e-books that consumers had legally purchased but that an unscrupulous vendor, ignoring certain copyright laws, had illegally made available for sale.

Ever since Amazon performed that act of deletion, removing such works as 1984 and Animal Farm right from under readers' noses, they have been playing make-up with consumers who, in some cases, have resorted to lawsuits to "ease their pain." Jeff Bezos, CEO of Amazon, even issued an apology.

Now, we've come full circle. Amazon has offered to replace copies of 1984 and Animal Farm at no charge to Kindle consumers. The message was sent in an email, and reads (source here):

"As you were one of the customers impacted by the removal of "Nineteen Eighty-Four" from your Kindle device in July of this year, we would like to offer you the option to have us re-deliver this book to your Kindle along with any annotations you made. You will not be charged for the book."

"This is an apology for the way we previously handled illegally sold copies of 1984 and other novels on Kindle. Our "solution" to the problem was stupid, thoughtless and painfully out of line with our principles. It is wholly self-inflicted, and we deserve the criticism we've received. We will use the scar tissue from this painful mistake to help make better decisions going forward, ones that match our mission."


Amazon said in an e-mail message to those customers that if they chose to have their digital copies restored, they would be able to see any digital annotations they had made. Those who do not want the books are eligible for an Amazon gift certificate or a check for $30, the company said.

It would seem they're pulling out all the stops, giving consumers so many options that it's hard to see how anyone could wind up unsatisfied.

If only it were that simple…

Amazon violated a fundamental right of people who live in a free society when they deleted those e-books. Yobie Benjamin says it best:

In most cases, it would require a government subpoena, grand jury summons or court order to require you to reveal the contents of your device, turn over the contents of your device and/or to delete the contents in the device.

Yet Amazon did so without any of those things. Clearly, they overstepped their bounds. Their attempts to make amends are proof enough of that. But did they go so far as to make the act unredeemable? Have they single-handedly crushed any potential for mass adoption of their Kindle and other similar devices that make use of DRM?

I think if anything good is to come of this it will be the shortening of the lifespan of digital rights management technology. We've already seen this in the music industry, where Amazon (and even Apple now) sells DRM-free MP3s. Amazon has laid bare the true evil of DRM for all to see. 'All', in this case, is the wider audience they are still trying to sell their device to. Sure, the Kindle is doing well, but there are a lot more people without the device than with it. If the device, and e-books as a whole, are to succeed, they need this mass adoption.

Amazon, I'm afraid, may have cooked this goose a bit too long. It's past done.

As an aside, this post marks my 200th on this blog. It's not my 200th post overall, because I was blogging on another platform before I made the switch to scottmarlowe.com, but I did move over the "best of the best" of those posts onto here, so maybe we can call it my 200th 'good' post. Anyway, thought it was worth mentioning.

How much do you make selling through Amazon's Kindle store?

If you're interested in this topic you might also want to read Half of self-published authors earn less than $500.

I recently uploaded my first fantasy novel to Amazon's Kindle Store. The idea behind making it available on Amazon is (1) to hopefully gain more exposure and (2) maybe make a buck or two in the process. I'd like to take a moment to look at the latter of those reasons by asking the following question: How much, really, can one make selling an e-book in the Kindle store?

First, there's what Amazon calls the "Suggested Retail Price", or SRP. This is set by the author at the time the e-book is uploaded:


The price you charge can range from a minimum of $0.99 to a max only Bill Gates, Warren Buffett, or Bernie Madoff (before he admitted to his Ponzi scheme and was locked up for 150 years) could hope to afford. Amazon, however, discourages price points above $9.99; you'll find many bestsellers featured prominently on the Kindle store-front selling at this price due to discounts Amazon has applied.

That brings us to our next point of discussion: Amazon's discount. We've all seen it, where Amazon takes a product that normally retails for $129.99 and discounts it to $69.99. The same principle applies here, though the discount in no way impacts an author's royalty. From my extensive research (which consisted of reading through a handful of posts on the Amazon DTP Forums until I found this one), I discovered this statement from Customer Service:

"...please know that as per our terms and conditions, our decision to discount products is based on a number of considerations which can vary over time. You will continue to receive the set percentage of the list price you set for every sale, even if Amazon changes the retail price for your content."

What this basically means is that while there may not be a method to their madness concerning what gets discounted and by how much, if and when they do discount your e-book, it will not negatively affect your royalties.

So now we come to the royalty itself, or how much we actually make per sale. The simple answer is 35% of the SRP. For a longer answer one can look to Amazon's DIGITAL PUBLICATION DISTRIBUTION AGREEMENT:

5. Royalties. Provided you are not in breach of your obligations under this Agreement, we will pay you, for each Digital Book we sell, a royalty equal to thirty-five percent (35%) of the applicable Suggested Retail Price for such Digital Book, net of refunds, bad debt, and any taxes charged to a customer (including without limitation sales taxes) (a “Royalty”).

That means for every e-copy of The Hall of the Wood sold, currently priced at $0.99, I'll make $0.35. Amazon gets the remaining $0.64. As above, should Amazon choose to discount my e-book, I'll still make the 35% royalty on the original SRP, so still $0.35. I can adjust my price point up a bit and make a little more per unit sold, but of course can't drop it below the minimum $0.99 threshold.
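For anyone who wants to run the numbers at other price points, here is a small sketch of that arithmetic in Python. It assumes the royalty is rounded to the nearest cent (my assumption; the agreement quoted above doesn't say how fractions of a cent are handled), and it ignores the refunds, bad debt, and taxes the agreement nets out. The function and constant names are mine.

```python
from decimal import Decimal, ROUND_HALF_UP

ROYALTY_RATE = Decimal("0.35")  # 35% of the SRP, per the agreement quoted above


def royalty_split(srp):
    """Return (author_royalty, amazon_share) in dollars for one sale."""
    price = Decimal(str(srp))
    cent = Decimal("0.01")
    # Assumption: the royalty is rounded to the nearest cent.
    author = (price * ROYALTY_RATE).quantize(cent, rounding=ROUND_HALF_UP)
    return author, price - author


author, amazon = royalty_split("0.99")
print(author, amazon)  # 0.35 0.64
```

The same function gives $3.50 to the author and $6.49 to Amazon at the $9.99 price point. Decimal is used instead of floats so the cents come out exact.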

So, that might be more information than you cared to know, but there it is.

Sell Your E-books in the Amazon Kindle Store

I've been interested in Amazon's Kindle digital book reader since its inception (though, admittedly, I didn't start blogging about it until the second version came out). I haven't bought one yet because I'm waiting for the inevitable price reduction, but that doesn't mean I haven't been exploring its features and some of the content for the device.

The biggest source of content for the Kindle is, of course, Amazon's Kindle Store. The store features a lot of e-books. A quick run down of some of the categories:

Fantasy 5,267 e-books
Science Fiction 7,299 e-books
Mystery & Thrillers 13,570 e-books

In total, there are over 300,000 titles available for download to your Kindle. That's a lot of books.

I recently discovered one of the best things about the Kindle store: anyone can post products there. JA Konrath clued me into the possibility, and he does a nice job of breaking down some of his own sales numbers. You can see that he's had no small success at it thus far. Granted, Konrath is a published author, so his name is out there via other, more traditional channels, but he also puts forth a lot of effort online as well. Nonetheless, is the possibility of an unpublished writer posting his or her work to the Kindle store gold waiting to be mined? I plan to find out.

As of a couple of days ago, my novel, The Hall of the Wood, is available for purchase via the Kindle store:


I wanted to make the price $0.25, but $0.99 is the minimum allowed price. The one catch is that, of course, you must have a Kindle to which to download the e-book. So, if you've already spent $300 for the device, what's another $0.99? ;-)

The concept of an unpublished author finding success in this channel is a challenge. As noted above, there are over 5,000 fantasy titles available for purchase in the Kindle store. How to make my novel stand out amongst those? For one, I created a book cover. Nothing fancy, but it gives the potential buyer something to look at other than "No image available". Second, I gave it a product description, which is the standard blurb taken from my web site:

Jed's wife and unborn child are dead, killed by a legacy he dare share with no one. Seeking a reprieve from his guilt, he sets out for his former home, the Ranger Hall of the Wood. Along the way, he discovers all is not well. Aliah Starbough, a friend from Jed's past, sends him a chilling warning: the rangers are dead, the Simarron Forest, thrown into peril. Nearby Homewood has issued a plea for help, a summons which Kayra Weslin, knight errant, and her chronicler, Holly, answer. Along with Murik Alon Rin'kres, an Eslar sorcerer who harbors a secret purpose all his own, the four attempt to unravel the mystery of the missing rangers. They soon find tales of their disappearance frighteningly untrue.

The third way to gain attention is through customer reviews. This one is huge, and the one that in my mind will allow us as writers to break free of the traditional agent/publisher dependency. It's a stamp of approval, a guarantee of quality, a statement saying that your book is not crap. Customer reviews, to a point, validate a book's worth. In general, low reviews indicate a lack of quality. High reviews, the opposite. This is not to say that every review should be taken as gospel. But given enough reviews, a trend should emerge.

I often read of the struggle authors undergo in finding an agent or publisher. There's really no rhyme or reason to it: the decision-making is subjective, and how often have you come across a published novel that, to be frank, sucks? I've begun to doubt the vindication that supposedly comes with having your work blessed by a "real" publisher, and let's face it: business models change. We might be witnessing the beginning of the end for traditional publishers here. If not that, certainly a sea change in the way we purchase and read books.

The Hall of the Wood has been available as a free PDF download for a long time now. As Konrath points out, Amazon's web site gets a lot more traffic than his own. That volume has a lot of potential to increase sales. Selling on the Kindle store seems like a real no-brainer to me.