Inside the Ebook file
In the previous section we looked at the principles and some useful tools for examining Ebooks. Today we will take a detailed look at what is inside the Ebook file.
Apart from the book text, the files stored inside the Ebook Container will also include some key administrative files that help the E-reader navigate through the files and also give it the information about the book:
- a Metadata file
This holds the information about the book title, the author, the publisher, the language of the book, ISBN, description of the book, keywords etc.
- a Contents file
This holds a list of all the files contained in the package/container.
It also contains the reading order for the text files and tells the E-reader where to find the images;
- and an NCX file
This is the Navigation Control file for XML, which holds the navigational Table of Contents for the book. This navigational Table of Contents is different from the internal Table of Contents that is displayed inside your book; we will go into that later when we talk about layout.
Top level folders inside a .epub file
If you crack open an EPUB file using one of the tools we covered earlier, you will find two main top-level folders or directories, META-INF and OEBPS, plus a small mimetype file:
Top level Files inside an Epub
The META-INF directory holds the Meta Information about the Ebook package. This is produced automatically when you create the book file and you are not generally going to change anything inside it.
OEBPS stands for Open Ebook Publication Structure, and this is where all your book text is stored. If you are going to make any changes to the book at the HTML level this is where you will go.
The mimetype file tells the Reader what kind of file it is dealing with, in this case an EPUB package. You will not need to do anything with this file.
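Because an EPUB is an ordinary ZIP archive, you can list its top-level entries with a few lines of Python. The sketch below is only an illustration: it builds a tiny stand-in archive in memory (the file names inside it are hypothetical examples, not taken from a real book) and then prints what sits at the top level.

```python
import io
import zipfile


def build_sample_epub():
    """Build a tiny, hypothetical EPUB-like archive in memory for the demo."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # The mimetype entry comes first and is stored uncompressed.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", "<container/>")
        zf.writestr("OEBPS/content.opf", "<package/>")
        zf.writestr("OEBPS/chapter1.xhtml", "<html/>")
    return buf.getvalue()


def top_level_entries(epub_bytes):
    """Return the sorted top-level names inside an EPUB (a ZIP archive)."""
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as zf:
        return sorted({name.split("/", 1)[0] for name in zf.namelist()})


print(top_level_entries(build_sample_epub()))
# ['META-INF', 'OEBPS', 'mimetype']
```

To try this on a real book, you could read the .epub file as bytes and pass those to top_level_entries instead of the sample.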
Second Level files inside a .epub file
The content of these folders may vary a bit depending on your conversion software but essentially they will have a list that is like the one shown in the snapshot below:
The key file in the META-INF folder is container.xml. This just holds a pointer to tell the reader where to find the content.opf file, which in this example is inside the OEBPS folder. The META-INF folder can also hold other files, including the encryption.xml file for DRM and a file for embedded fonts, depending on the way the book has been prepared.
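To make the "pointer" idea concrete, here is a minimal sketch of how a program might read container.xml and find the content.opf path. The sample XML is a typical, hand-written illustration rather than a copy from any particular book, and find_opf_path is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Namespace used by the container.xml format.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}

# Hand-written sample; a real file may differ in details.
SAMPLE_CONTAINER = """<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf"
              media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def find_opf_path(container_xml):
    """Return the path of the package (.opf) file named in container.xml."""
    root = ET.fromstring(container_xml)
    rootfile = root.find("c:rootfiles/c:rootfile", CONTAINER_NS)
    if rootfile is None:
        raise ValueError("no <rootfile> element found")
    return rootfile.attrib["full-path"]


print(find_opf_path(SAMPLE_CONTAINER))
# OEBPS/content.opf
```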
Inside the OEBPS folder
All your book content will be in the OEBPS folder.
- CSS: All your text Styles are stored here.
If you needed to make a style change or to correct style problems this is where you go.
- content.opf (Open Packaging Format): Describes the contents of the Epub
Metadata – metadata for the Epub
Manifest – list of files used
Spine – Order of appearance for the parts of the book
Guide – Role that each XHTML file plays
- .xhtml: There will be a long list of files with the .XHTML extension. These are your book’s text and you would normally have one file for every chapter ‑ or wherever you have inserted a page break.
If you need to change something in the text of the book after you have converted it this is where you would look.
- toc.ncx (Navigation Control file for XML): this is where your Table of Contents is exported to. It is the built-in navigational control for the Epub.
- Although not shown in this example structure, if you have images in your book they will all be stored inside another folder titled image.
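The relationship between the Metadata, Manifest and Spine sections of content.opf is easier to see in a small example. The sketch below parses a hand-written sample package file (the title, author, identifier and file names are all made up for illustration) and works out the reading order by looking up each Spine entry in the Manifest. read_package is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

NS = {
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hand-written sample; every value in it is a placeholder.
SAMPLE_OPF = """<?xml version="1.0"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0"
         unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>A Sample Book</dc:title>
    <dc:creator>A. N. Author</dc:creator>
    <dc:language>en</dc:language>
    <dc:identifier id="bookid">hypothetical-id-0001</dc:identifier>
  </metadata>
  <manifest>
    <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
    <item id="ch1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
    <item id="ch2" href="chapter2.xhtml" media-type="application/xhtml+xml"/>
    <item id="css" href="style.css" media-type="text/css"/>
  </manifest>
  <spine toc="ncx">
    <itemref idref="ch1"/>
    <itemref idref="ch2"/>
  </spine>
</package>
"""


def read_package(opf_xml):
    """Return (title, manifest dict, reading order) from a package file."""
    root = ET.fromstring(opf_xml)
    title = root.findtext("opf:metadata/dc:title", namespaces=NS)
    # Manifest: every file in the book, keyed by its id.
    manifest = {item.get("id"): item.get("href")
                for item in root.findall("opf:manifest/opf:item", NS)}
    # Spine: ids in reading order, resolved to file names via the manifest.
    reading_order = [manifest[ref.get("idref")]
                     for ref in root.findall("opf:spine/opf:itemref", NS)]
    return title, manifest, reading_order


title, manifest, reading_order = read_package(SAMPLE_OPF)
print(title)
print(reading_order)
```

Notice that the Spine never names a file directly. It only refers to ids, so a file that is missing from the Manifest cannot appear in the reading order.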
Why do I need to know about the structure inside the Ebook file?
Even if you never plan to look inside your Ebook file, you now know that it is structured like a series of web pages. When you format the file in your word processor you need to keep in mind that as web pages there is a severe limit on the formatting that can be imposed on the text.
You need to create a “style” for each type of formatting, or use the inbuilt defined styles inside your word processor, or the formatting may not appear correctly when your list of styles is converted to a CSS (Cascading Style Sheet).
You can’t use fancy fonts: If the font is not in the E-reader that is used to display the book then whatever text used the fancy font will be displayed using a default font from the Reader and will not appear correctly. In reality you can “Embed” a “fancy” font so that it will display correctly but this is a separate issue with other problems. We will cover this later.
You need to define where your page breaks will appear. If you don’t do this then the file will appear as one long document with no breaks because the Reader will not know where to start displaying a new page.
You need to define a basic Table of Contents so that the NCX (Navigation Control file for XML) can be generated during conversion. If this file is not present then your customer will have a problem navigating through the book. Once you have defined the style for your chapter headers this is generally not too difficult to do with most word processors. We will go into the details later.
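If you are curious what the generated NCX file contains, the sketch below reads a hand-written sample and prints each navigation entry with the file it points to. The chapter labels and file names are invented for the example, and read_toc is a hypothetical helper name; a converter's real output will usually carry more detail.

```python
import xml.etree.ElementTree as ET

NCX_NS = {"ncx": "http://www.daisy.org/z3986/2005/ncx/"}

# Hand-written sample; labels and targets are placeholders.
SAMPLE_NCX = """<?xml version="1.0"?>
<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
  <navMap>
    <navPoint id="np1" playOrder="1">
      <navLabel><text>Chapter 1</text></navLabel>
      <content src="chapter1.xhtml"/>
    </navPoint>
    <navPoint id="np2" playOrder="2">
      <navLabel><text>Chapter 2</text></navLabel>
      <content src="chapter2.xhtml"/>
    </navPoint>
  </navMap>
</ncx>
"""


def read_toc(ncx_xml):
    """Return a list of (label, target file) pairs from an NCX document."""
    root = ET.fromstring(ncx_xml)
    entries = []
    for point in root.findall(".//ncx:navPoint", NCX_NS):
        label = point.findtext("ncx:navLabel/ncx:text", namespaces=NCX_NS)
        target = point.find("ncx:content", NCX_NS).get("src")
        entries.append((label, target))
    return entries


for label, target in read_toc(SAMPLE_NCX):
    print(label, "->", target)
```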
Converting from word-processor to Ebook
Most of us will want to compose our text on a standard word processor and then make a simple conversion from that format to the Ebook format that we want to give our customers.
All converters work on the same basic principle: they take your text and translate it into HTML and create a CSS file to describe how the text should look, based on how you have “tagged” the text in your word processor. This kind of automatic HTML code generation can produce uncertain results and most professional Ebook developers end up going inside the container file to clean up problems with the HTML.
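As a rough illustration of that principle, here is a toy converter. It is not how any real conversion tool is written; the style names, CSS classes and CSS rules are all assumptions chosen for the example. It shows why consistent "tagging" matters: every paragraph is translated purely on the basis of the style name attached to it.

```python
from html import escape

# Hypothetical mapping from word-processor style names to (HTML tag, CSS class).
STYLE_MAP = {
    "Heading 1": ("h1", "chapter-title"),
    "Normal": ("p", "body-text"),
}

# Hypothetical CSS rule for each class.
CSS_RULES = {
    "chapter-title": "font-size: 1.5em; text-align: center;",
    "body-text": "text-indent: 1em; margin: 0;",
}


def convert(paragraphs):
    """Turn (style name, text) pairs into an HTML string and a CSS string."""
    html_lines = []
    used_classes = []
    for style, text in paragraphs:
        # Unknown styles fall back to the plain body style.
        tag, css_class = STYLE_MAP.get(style, STYLE_MAP["Normal"])
        html_lines.append(f'<{tag} class="{css_class}">{escape(text)}</{tag}>')
        if css_class not in used_classes:
            used_classes.append(css_class)
    css = "\n".join(f".{name} {{ {CSS_RULES[name]} }}" for name in used_classes)
    return "\n".join(html_lines), css


html_text, css_text = convert([
    ("Heading 1", "Chapter 1"),
    ("Normal", "It was a dark & stormy night."),
])
print(html_text)
print(css_text)
```

A paragraph that was formatted by hand instead of with a named style has nothing for the converter to map, which is one reason the automatic output can be unpredictable.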
I think I am correct in believing that most people do NOT want to get into the HTML editing business and would prefer to make a couple of clicks with a mouse and produce the converted file. The Good News is that this CAN be done ‑ but the quality of the results can range from Good to Awful.
The Bad(ish) News is that if you want to get a ‘Good’ result you have to spend time learning more about your word-processor software so that you can get the formatting right. If you spend the time on it you can learn how to do it properly and save yourself a lot of agony trying to fix files that are rejected by your Ebook retailer.
Generally speaking, if you make some relatively easy changes in the way that you format your files you can make your Ebook production painless.
In the next Part we will talk about the format for your Ebook file, what to do with tables and things that will affect your pricing (and your profit). Then we will have finished all the background and be ready to get to the detail of the common problems in Ebooks and how to avoid and fix them.
Glossary, for the technically minded
HTML means ‘HyperText Markup Language’. Web pages must conform to the rules of HTML in order to be displayed correctly in a Web browser. The HTML syntax is based on a list of tags that describe the page’s format and what is displayed on the Web page.
XML stands for ‘Extensible Markup Language’ and is used to define documents with a standard format.
XHTML is short for ‘Extensible Hypertext Markup Language’. It is a hybrid between HTML and XML specifically designed for Net device displays. Because XHTML is ‘extensible’, Web developers can create their own objects and tags for each Web page they build. However, XHTML has a stricter syntax than standard HTML pages and is less tolerant of things like missing quotes or incorrect capitalization. The advantage is that, although it is more meticulous to write, it ensures that the Web pages will appear more uniform across different browser platforms – i.e. E-readers.
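The stricter syntax is easy to demonstrate. In the sketch below a standard XML parser stands in for the E-reader (an assumption made for illustration; real reading software varies in how forgiving it is) and checks three small fragments: one correct, one with a missing quote, and one with badly nested tags. is_well_formed is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if the markup parses as well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


good = '<p class="body">Hello<br/>world</p>'
missing_quotes = '<p class=body>Hello</p>'   # attribute value not quoted
bad_nesting = '<p><b>Hello</p></b>'          # tags closed in the wrong order

for fragment in (good, missing_quotes, bad_nesting):
    print(is_well_formed(fragment), fragment)
```

Only the first fragment passes. A web browser would usually display the other two without complaint, and that difference is what "less tolerant" means in practice.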