Mark here, and I introduced myself in another place on the Dark Lantern Tales web site. Readers have asked me how these stories came to land in a Dark Lantern Tales book. Here, in a couple of parts, is the story in brief.
Detective mysteries in urban settings gradually became more popular in the late 1800s, and some of those are quite readable to an historical fiction reader now. I have been reading and studying these old stories since my teens, and after reading quite a few hundred of these adventures, I have been selecting works for Dark Lantern Tales that in my opinion still hold up as good stories. These amount to a very tiny percentage of the many thousands of stories published in the 1870s to 1890s.
I have to say, too, that part of my motivation is that as a reader, I was really disappointed in some of the eBook versions of old stories I bought for myself. “Seven Stories for $1.00” should have been a tip-off to low quality, but I also kept running into badly transferred text in full-price eBooks that otherwise appeared to be well produced. I like to read these on my phone when I’m stuck waiting somewhere, and I kept buying books that had not been decently cleaned up from OCR transfers. Some were altogether unreadable. So, I am motivated to publish well-crafted and enjoyable editions that please a discerning reader of historical fiction (readers like you and me).
My sources include originals I have collected over the years. I also refer to scans I can locate online. The copy of a particular story that I have may well be missing chunks of text, copies I find online may have different text missing, and if it is a later printing, the plates may well have been worn to the point where whole lines are unreadable. So, referring to multiple copies is useful. Around 1980, a scholastic company made photocopies of quite a few dime library issues and distributed them to universities on microfilm. Those are some pretty rough copies, but until recently they were the only alternative to working directly from the originals. Now some universities have high-quality scans. I try to gather two or three sources of the text, including my own copies, before setting out to create a Dark Lantern Tales edition.
Pulp paper mostly replaced better paper types for popular literature in the decades after the Civil War. Oddly, some earlier publications may exist in much better condition because the paper stock was better. Pulp is basically made by mixing wood chips and acid, and when new it can offer an inexpensive, somewhat white, paper stock. But the acid is still in the paper and continues to digest it forever unless chemically stopped. That chemical process to counteract the acid is difficult and uncertain, and none of my copies have been treated that way. Copies of original printings that were stored away from the air, such as in a trunk or bound into book covers, may be in quite good shape. Copies that sat for decades in an attic can be a completely different story. With such a small quantity of these original publications still around, the best copy that can be found is often brittle and dark brown from acid damage.
When I created a pilot project a few years ago to learn about publishing these editions, I found a later copy of the first story I used that was in very poor condition. It was a “Thick Book,” or what would be called a pocket book now. I deliberately chose that copy because I planned to take it apart and use a scanner. That approach just isn’t sustainable, so after the pilot I completely stopped using a scanner for text images.
The first OCR (Optical Character Recognition) software I used was Prizmo. OCR is necessary to convert the image of text into a document that can be edited. Prizmo had features I liked, such as showing the original image on the left and the text Prizmo had “read” on the right. That made for easier comparisons, but as you may be able to see in this picture, the number of errors could be huge. On the right, the lower part of the page has not been cleaned up and the errors are visible. In defense of Prizmo, the originals were deeply browned and printed from worn plates, but for me, this OCR process was not much better than just transcribing the text by re-typing it.
I learned from creating the pilot, and my methods have improved considerably! Here is an image of a dime library page that I recently photographed to begin the process toward a new edition of Steam-Age Crime Stories:
This original is in pretty fragile shape, and to photograph the pages, I used a jig that I made from acid-free mat board that allows a copy to be opened to about 95°. The page I am not photographing is held gently in place with a couple of clips and some cardboard from a file folder. Great care must be taken because the spine is only a fold in the paper and it can easily split in this process. I use a good copy stand along with the jig, which helps me get an image with better focus and less distortion. But inevitably I will still have to deal with issues in the image.
The pilot was Joe Phenix, The Police Spy, and a new Dark Lantern Tales edition has replaced my first effort. After what I learned while making the pilot, the ABBYY Finereader software was recommended to me for OCR work. And for me, it does convert images of text to files with fewer errors. The Finereader software also has provisions for editing and cleaning up the image in lots of ways to reduce the number of errors.
Here is Finereader with a page loaded and work begun to correct the image perspective:
Once the image has been straightened up with various processes in Finereader, I can chop off image areas that are unneeded, like the sides, top, and bottom, and also slice the columns into separate files:
Once I have those elements separated, I can further crop the images, as shown in the next picture. All this is to reduce the chance of Finereader attempting to read unwanted visual elements as text. Usually I also go through the images and remove spots from foxing or paper blemishes for the same reason. It saves time later when I am cleaning up text errors.
Finereader can export directly to MS Word, but I have found that the export typically includes lots of formatting that I don’t want. All can be corrected in Word, but the fewer corrections the better, so now I export from Finereader as a text file with almost no formatting. It looks like this:
I import all the text files into a Word docx project and call that the first revision once I have them in place. That yields an image like this with the basic formatting from the text file (a few errors were also corrected):
Centering, bold or italic type, line spacing, and more have to be done next. I fix obvious spelling and other errors I see along the way, but I want to get the whole document to a better state before I seriously dig in to fix typos and errors from the OCR read. After going through the document, revision two looks like this:
The original printings were made with type set by hand, so spacing can translate oddly through OCR software. Further, lots of lower case “h” characters can be made into lower case “b” characters, etc. Even with the best of transfers, there is still quite a bit of cleanup to be done. I do most by hand, usually going through the book several times just to find most of the errors. I also do sweeps with spelling tools in Word to locate all kinds of things, including extra spaces.
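For readers who like to automate part of a sweep like this, here is a minimal Python sketch of the idea. It is not the process described above, which is done by hand and with the spelling tools in Word. It only illustrates two of the checks mentioned: runs of extra spaces, and unknown words that turn into known words when a “b” is swapped back to an “h.” The function names and the tiny word list are hypothetical, made up for this example; a real sweep would need a full dictionary.

```python
import re
from itertools import product


def find_extra_spaces(text):
    """Return (line_number, column) for each run of two or more spaces."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for match in re.finditer(r" {2,}", line):
            hits.append((line_no, match.start() + 1))  # 1-based column
    return hits


def suggest_h_for_b(word, known_words):
    """For an unknown word, try turning each 'b' into 'h'.

    Returns the sorted list of resulting words found in known_words.
    Words that are already known are never flagged, so a misread that
    happens to spell a real word (such as 'bad' for 'had') slips through.
    """
    lower = word.lower()
    if lower in known_words or "b" not in lower:
        return []
    positions = [i for i, ch in enumerate(lower) if ch == "b"]
    candidates = set()
    # Try every combination of keeping or replacing each 'b'.
    for choice in product("bh", repeat=len(positions)):
        chars = list(lower)
        for pos, ch in zip(positions, choice):
            chars[pos] = ch
        candidate = "".join(chars)
        if candidate != lower and candidate in known_words:
            candidates.add(candidate)
    return sorted(candidates)


def flag_suspects(text, known_words):
    """Return (line_number, word, suggestions) for likely h/b misreads."""
    suspects = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for word in re.findall(r"[A-Za-z]+", line):
            suggestions = suggest_h_for_b(word, known_words)
            if suggestions:
                suspects.append((line_no, word, suggestions))
    return suspects


# Hypothetical sample text and word list, for illustration only.
SAMPLE = "Tbe detective  watched tbe bouse.\nHe bad a habit."
KNOWN = {"the", "detective", "watched", "house", "he", "had", "a", "habit", "bad"}

if __name__ == "__main__":
    print(find_extra_spaces(SAMPLE))
    for suspect in flag_suspects(SAMPLE, KNOWN):
        print(suspect)
```

A script like this only produces a list of places to look. Because “bad” is a real word, the sketch cannot tell that it should have been “had,” which is one reason reading through the text several times is still necessary.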
Even with my best efforts, typos still make it through. Please tell me about any that you see with as much detail (even a screenshot) as you can. I appreciate it!
While working in Word, I usually go through five or six revisions, concentrating increasingly on editing the text more than hunting OCR errors. I keep images of the original pages handy for reference.
Each of the Dark Lantern Tales books represents up to a couple of hundred hours’ work. In the second part of this article, I’ll talk a little more about editing, images, and formatting for eBooks and Print-On-Demand.
Thanks for reading!