It occurs to me that I’ve never mentioned this here, but I make public domain ebooks. I take public domain texts, mark them up using modern, semantic technologies, package them as EPUB ebooks, then release them back into the public domain.
Making ebooks is something I do as a hobby. I am a vocal supporter of open culture and very pleased that there are so many high-quality projects like Project Gutenberg, Project Gutenberg Canada, and the Internet Archive. These are excellent resources, packed with millions of books. Unfortunately, those books are usually either images of scans, or optical character recognition (OCR) transcriptions of scans.
Why is this a problem? Well, scans of the original books are wonderful… for ably sighted readers. For readers with vision challenges or reading disabilities, not so much. Even for sighted readers, scanned images can’t be searched, and copying blocks of text for quotation or research is impossible. Plus, if the images are at a decent resolution they take up a whackload of space.
That’s why old books are usually scanned then OCR transcribed into text. Unfortunately, OCR is not a perfect technology, and although most ebook archives use human editors to try to clean up the mistakes, they don’t catch them all. I can’t even begin to estimate how many free e-texts I’ve read that have silly OCR errors in them like mistaking “the” for “tire”. And of course, scanned-and-transcribed texts almost never include things like figures, tables, or special formatting relevant to the text.
But that’s not the real problem with most free ebooks. The real problem is that they are almost never marked up semantically. For example, in (X)HTML, <i> means italics, but <em> means emphasis, which is usually rendered as italics. The intention is that you use <i> to mark up italics that are merely presentational (for example, ship names, words in other languages, etc.), and <em> to mark up things that are to be emphasized (whether that emphasis is displayed via italicization or not). To an ably sighted reader, the following four sentences will look identical:
- The <i>mayor</i> is <i>here</i>!
- The <em>mayor</em> is <em>here</em>!
- The <em>mayor</em> is <i>here</i>!
- The <i>mayor</i> is <em>here</em>!
All four will likely be rendered visually as, “The mayor is here!” However, they have very different meanings, and they will be rendered very differently by other mechanisms. For example, a screen reader will provide verbal stress on emphasized words, but no stress on words that are merely italicized. It will read the four sentences as:
- The mayor is here!
- The MAYOR is HERE!
- The MAYOR is here!
- The mayor is HERE!
Any one of those readings might be the correct one, depending on the context. If the text is just marked up with <i> for all italics, whether for emphasis or presentation, that information is simply not available to readers who don’t rely on the visual aspects of the text for meaning. Unfortunately, that is how the vast majority of free etexts are marked up.
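To make the difference concrete, here is a minimal sketch in Python of a toy "screen reader" that upper-cases anything inside <em> and leaves <i> text alone. The names `StressRenderer` and `spoken` are made up for this illustration, and upper-casing is only a stand-in for verbal stress; a real screen reader is far more sophisticated than this.

```python
from html.parser import HTMLParser


class StressRenderer(HTMLParser):
    """Toy stand-in for a screen reader (hypothetical, for illustration only).

    Text inside <em> is upper-cased to represent verbal stress;
    <i> is treated as purely presentational and ignored.
    """

    def __init__(self):
        super().__init__()
        self.em_depth = 0  # number of <em> elements currently open
        self.parts = []    # text fragments collected so far

    def handle_starttag(self, tag, attrs):
        if tag == "em":
            self.em_depth += 1

    def handle_endtag(self, tag):
        if tag == "em" and self.em_depth > 0:
            self.em_depth -= 1

    def handle_data(self, data):
        # Stress the text only when we are inside at least one <em>.
        self.parts.append(data.upper() if self.em_depth else data)


def spoken(markup):
    """Return the markup as this toy reader would 'say' it."""
    renderer = StressRenderer()
    renderer.feed(markup)
    renderer.close()  # flush any buffered trailing text
    return "".join(renderer.parts)


for sentence in (
    "The <i>mayor</i> is <i>here</i>!",
    "The <em>mayor</em> is <em>here</em>!",
    "The <em>mayor</em> is <i>here</i>!",
    "The <i>mayor</i> is <em>here</em>!",
):
    print(spoken(sentence))
```

If every italic word had been marked up with <i>, all four lines would come out the same, which is exactly the information loss described above.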
And that’s just the tip of the iceberg. I have never seen a non-trivial free etext marked up with the correct semantics. Most of the sins I’ve seen stem from considering only the visual appearance of the text – for example, tables are often presented using spaces and newlines to lay out the data in a vaguely columnar format (which breaks when the font changes, of course). But more generally, the text is simply not marked up with due consideration for semantics, which makes it problematic for readers with accessibility needs.
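As a rough illustration of the table problem, the Python sketch below turns a space-aligned "table" into real table markup with header and data cells. The function name `columns_to_table` and the sample rows are invented for this example, and it assumes columns are separated by two or more spaces; real etexts are messier, and fixing them usually takes manual work.

```python
import re
from html import escape


def columns_to_table(lines, header=True):
    """Convert space-aligned text rows into <table> markup.

    Assumes (for this sketch only) that columns are separated by
    two or more whitespace characters.
    """
    rows = [re.split(r"\s{2,}", line.strip()) for line in lines if line.strip()]
    out = ["<table>"]
    for index, cells in enumerate(rows):
        # Treat the first row as column headings when header=True.
        tag = "th" if header and index == 0 else "td"
        row = "".join(f"<{tag}>{escape(cell)}</{tag}>" for cell in cells)
        out.append(f"  <tr>{row}</tr>")
    out.append("</table>")
    return "\n".join(out)


print(columns_to_table([
    "Planet   Moons",
    "Mars     2",
]))
```

With the cells marked up this way, a screen reader can announce which heading a value belongs to, and the layout no longer depends on the font.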
So what I do is seek out public domain texts, and mark them up semantically, using modern XHTML5 with EPUB semantic extensions. I pay particular attention to EPUB 3 accessibility guidelines and I test with various tools (like Fangs) and different EPUB readers to make sure everything displays properly – whether “displays” means visually or not.
Then I style the text, trying as much as possible to match the appearance of the original book (if I have access to an original book or scans of one). I reproduce any figures and images I practically can (preferably using accessible SVG), then package it all up in an EPUB. I add a colophon and make a cover (which is plain and ugly, at least for now), then I release it all back into the public domain.
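For the curious, the "package it all up" step boils down to building a ZIP archive with a particular layout. The Python sketch below is not my actual build process, just an illustration of the container skeleton: the `mimetype` file goes first and is stored uncompressed, and `META-INF/container.xml` points to the package document. The function name `build_epub` and the path `EPUB/package.opf` are assumptions made for this example, and the result is not a complete, valid book unless you supply the package document, navigation document, and content files yourself.

```python
import io
import zipfile

# Points reading systems at the package document.
# The path EPUB/package.opf is a hypothetical choice for this sketch.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="EPUB/package.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def build_epub(files):
    """Return the bytes of an EPUB-style ZIP container.

    `files` maps archive paths (e.g. "EPUB/package.opf") to their content.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # The mimetype entry must come first and must not be compressed.
        archive.writestr("mimetype", "application/epub+zip",
                         compress_type=zipfile.ZIP_STORED)
        archive.writestr("META-INF/container.xml", CONTAINER_XML,
                         compress_type=zipfile.ZIP_DEFLATED)
        for name, content in files.items():
            archive.writestr(name, content, compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()


epub_bytes = build_epub({"EPUB/package.opf": "<package/>"})  # placeholder content
```

Writing `epub_bytes` to a file with an `.epub` extension gives you the container; running a validator over the result is still the only way to know the book itself is well-formed.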
At the moment, I only have four books partially released. They’re all philosophy texts for now, but I expect to add some science fiction and mystery texts, and maybe some other classic texts that I find interesting (it is just a hobby, after all).
Perhaps in the future I might run a Kickstarter or Indiegogo drive to raise cash to buy as many e-reader systems as possible, so that I can properly test the books I make on a wide range of systems. Or maybe even Patreon – after all, I am making books with some original content (though the core text is public domain). Certainly I’ll be adding many, many more books. You can even make suggestions for future books.
Because the books I release are released back into the public domain, feel free to take them and use them as you please.