What is GutenMark?

GutenMark is a command-line tool for automatically creating high-quality HTML or LaTeX markup from Project Gutenberg etexts. As of April 2008, there is also a graphical front-end called GUItenMark that greatly simplifies usage for casual users. Both Windows and Linux x86 are supported. Mac OS X is also supported, though in some respects it lags the others. Limited iPhone support is also possible.

In combination with other freely available conversion tools, GutenMark aims to convert Project Gutenberg etexts into publication-quality PostScript or PDF, for print-on-demand applications. The goal is for this conversion to be completely automatic, without manual markup or editing, but for the foreseeable future some manual intervention will almost always be needed, at least if your standards are at least as high as mine.

I took the Project Gutenberg plain text file of The Adventures of Sherlock Holmes and ran it through GutenMark.

Amazingly, this:

To Sherlock Holmes she is always THE woman.

was transformed to this, with the all-caps word lowercased and set in italics:

To Sherlock Holmes she is always _the_ woman.

As it should be!
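
For the curious, here is a rough sketch of how that kind of caps-to-italics conversion could work. This is only my guess at the general idea, not GutenMark's actual algorithm: the function name `caps_to_italics` and the "two or more capital letters" rule are assumptions made up for illustration.

```python
import html
import re

# Hypothetical heuristic (NOT GutenMark's real logic): treat any standalone
# run of two or more capital letters as emphasis.
ALL_CAPS_WORD = re.compile(r"\b[A-Z]{2,}\b")


def caps_to_italics(line: str) -> str:
    """Return the line as HTML with all-caps words lowercased inside <i> tags."""
    # Escape &, < and > first so the plain text is safe to embed in HTML.
    escaped = html.escape(line, quote=False)
    # Lowercase each all-caps word and wrap it in italic tags.
    return ALL_CAPS_WORD.sub(lambda m: f"<i>{m.group(0).lower()}</i>", escaped)


print(caps_to_italics("To Sherlock Holmes she is always THE woman."))
```

A rule this naive would also italicize acronyms and all-caps headings such as chapter titles, so a real converter needs more context than a single regular expression can provide.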

I was impressed with the available options and did some light testing. It could be a very useful tool for Project Gutenberg etexts that have only a plain text version available.

On the other hand, I also downloaded Project Gutenberg's own HTML version of the same Holmes book, and it was superior to the GutenMark output.

But this tool remains a very painless way of changing those text files into a format that can then go on to further processing to create an eBook.

10 Comments on “Reference: GutenMark”

  1. bowerbird Says:


    gutenmark has a philosophy that’s pretty close to the one
    that i expressed over here:

    i’ve updated the examples from that entry:


    there’s a .pdf too:

    i’ve also created another .pdf, which shows
    my digital text on the left side of the page,
    and the scan on the right side of the page:

    it’s a hefty download, since it contains the
    scans for the entire book — 61 megs — but
    it shows you how well i cloned the p-book…


  2. bowerbird Says:

    good luck with that. sincerely… :+)

    my system is simple, so i can move faster than you:

    that’s using dirty o.c.r. text, so my next task is to
    splice in “clean” p.g. e-text. (but you should know
    that if you use the p.g. e-text, it’s got errors in it.)

    how will you handle the authentication question,
    if someone accuses you of modifying the text?


  3. mikecane Says:

    Speed doesn’t equal quality.

  4. bowerbird Says:

    so, mike, did you have any feedback on my “jungle”?

    i’d love to hear constructive criticism about its “quality”.


  5. mikecane Says:

    No. I’m not looking at it at all. Have my own version I’m doing. In industry-standard ePub.

  6. bowerbird Says:

    i see. so that jab about “quality” was completely blind. ok.
    i figured as much…

    and your comment that you’d _pay_for_ a nicely-formatted
    copy of the book was the typical all-talk-no-action blather,
    since you’re not even interested in a copy that i give freely.


    for someone who rails so loudly against the publishing biz,
    it kind of surprises me that you’re such a… “cheerleader” for
    the “industry-standard” format. it’s as if you really believe
    the i.d.p.f. is acting in accordance with needs of the reader.

    do you also think the r.i.a.a. is on the side of the music fan?


  7. mikecane Says:

    I looked at your original. That was enough.

    Time for you to go bye-bye now.

  8. bowerbird Says:

    > I looked at your original.

    and gave no feedback on that either.
    i gave you a chance, mike cane, but
    you’re pretty much just a lame joke.


  9. mikecane Says:

    I don’t give a fuck what a crazy son of a bitch like you thinks, bowerbird. I see you’ve been banned from other places. Add this one to the growing list now too.

