Thursday, July 09, 2009

Lost in translation

One of the great joys of the mammoth project has been going over the primary sources. Just a few years ago, I wouldn't have been able to research this topic without massive financial support. The seventeenth- and eighteenth-century sources are hard to find, and the research would have involved traveling around to visit various rare book collections. Only a few have been reprinted in more recent times. Now, thanks to Project Gutenberg, Google Books, the Library of Congress, and others, I can look at digitized versions of most of the sources I need by way of the internet. The digitized versions preserve more than just the text. By presenting the appearance of the original--the fonts, the layout, and the illustrations--they give me a much better sense of how these ideas were communicated and experienced at the time. The only thing missing is the smell of old paper. The price I pay is in temptation. What started out as a popular history is slowly being transformed into the dissertation I never wrote. Why take anyone's word on anything when I can go hunting for the original source?

With the Teutobocus, the original sources are all in French, a language I do not read. Computers and the internet help here, but it's hard work. My process is something like this. When I find a French source, what I usually find are PDF or JPEG images of the pages. If I find them on Google Books or Internet Archive, I can use the Optical Character Recognition (OCR) text that they provide as a starting point. If not, I have to download the PDF or JPEG images and run them through an OCR program on my computer to get a text file. Once I have a text file, I need to clean it up. OCR scans are always filled with mistakes, and pre-nineteenth-century printing is always messy and out of alignment. On top of that, OCR programs are completely stymied by older typographic conventions like ligatures and the long S. Of course, there are some images that the OCR programs can't read at all. In those cases I have to transcribe the text by hand. Typing something in a language you don't understand amounts to pounding it out one letter at a time.
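Part of the cleanup described above can be automated. What follows is only a minimal sketch, assuming the transcription or OCR output actually contains the long S and ligature characters themselves. Many OCR programs misread the long S as an "f" instead, and this does nothing for that case. The name normalize_typography and the mapping table are illustrative, not taken from any particular tool.

```python
# Hypothetical cleanup helper. The table is an illustrative assumption,
# not a complete inventory of early-modern typographic forms.
ARCHAIC_GLYPHS = {
    "\u017f": "s",    # long s
    "\ufb00": "ff",   # ff ligature
    "\ufb01": "fi",   # fi ligature
    "\ufb02": "fl",   # fl ligature
    "\ufb03": "ffi",  # ffi ligature
    "\ufb04": "ffl",  # ffl ligature
    "\ufb05": "st",   # long s + t ligature
    "\ufb06": "st",   # st ligature
}

# str.maketrans accepts a dict of single characters mapped to strings.
_TABLE = str.maketrans(ARCHAIC_GLYPHS)


def normalize_typography(text: str) -> str:
    """Replace long s and common ligatures with plain modern letters."""
    return text.translate(_TABLE)


if __name__ == "__main__":
    sample = "la \u017fcience e\ufb03cace"
    print(normalize_typography(sample))
```

Running the cleaned text through this before any translation step means the translator sees ordinary letters rather than glyphs it may not recognize.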

After creating an accurate text file, the translation begins. I need at least five browser tabs open to translate. I usually have two machine translation tools, a dictionary, a verb reference, and a search window in front of me. I copy a paragraph out of my text file and paste it into the first translation tool (Google Translate). The first translation is rarely usable. A little history is in order here.

The first Académie française dictionary was published in 1694, so when I work on any documents from before that date, I'm dealing with a melange of regionalisms, outdated traditional spellings, and the personal preferences of the printers. My job might be easier if I had a translation tool that worked in Occitan (the language of southern France), for comparison purposes, but I haven't been able to find a free one. Two of the editions published in the eighteenth century made major spelling reforms. The dictionary of 1835 made a vowel change that affected the imperfect conjugation of every single verb in the language. So, my first machine translation only serves to identify the words I need to work on. I'm getting fairly adept at identifying the patterns of change and can correct a paragraph for a second pass through the translator in less than a minute.
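The pattern-spotting described above lends itself to a small script. This is a rough sketch only, assuming the vowel change in question is the shift from the old -ois, -oit, -oient endings to the modern -ais, -ait, -aient. The function name modernize_imperfect and the exception list are hypothetical, and the list is far from complete, so the output still needs a human eye.

```python
import re

# Common words that legitimately end in -ois/-oit/-oient and must not change.
# Illustrative and incomplete; extend it as false positives turn up.
KEEP_AS_IS = {
    "trois", "fois", "bois", "mois", "autrefois", "toutefois", "parfois",
    "dois", "doit", "sois", "soit", "soient", "vois", "voit", "voient",
    "crois", "croit", "croient", "boit", "toit", "droit", "endroit",
}

# A word ending in "o" followed by one of the old imperfect endings.
_OI_ENDING = re.compile(r"\b(\w+?)o(is|it|ient)\b")


def modernize_imperfect(text: str) -> str:
    """Rewrite old -ois/-oit/-oient endings as -ais/-ait/-aient."""
    def fix(match: re.Match) -> str:
        word = match.group(0)
        if word.lower() in KEEP_AS_IS:
            return word  # leave real -oi- words alone
        return match.group(1) + "a" + match.group(2)

    return _OI_ENDING.sub(fix, text)


if __name__ == "__main__":
    print(modernize_imperfect("Il avoit trois chevaux et ils disoient"))
```

A blanket find-and-replace would turn "trois" into "trais", which is why the exception list matters more than the pattern itself.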

If I can make out the author's meaning at this point, I paste the text into a new file and go on to the next paragraph. If I can't, I start using the other browser tabs to do some detective work. If the first translator produces English words but nonsensical sentences, I try using the other translation program. I also go to the second translator for words that stump the first program. Sometimes breaking a sentence into phrases gives me a better result than attempting to translate full sentences. Splitting apart the contractions that appear in every sentence also helps.
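The two tricks just mentioned, breaking a sentence into phrases and splitting contractions, are easy to sketch in code. This is a minimal illustration, assuming "contractions" means elided forms such as l', d', and qu'. The helper names split_elisions and split_phrases are hypothetical, and the pattern only catches the short stand-alone forms, not longer ones like lorsqu'.

```python
import re

# Short elided forms at the start of a word, with a straight or curly apostrophe.
_ELISION = re.compile(r"\b([cdjlmnst]|qu)['\u2019](?=\w)", re.IGNORECASE)


def split_elisions(text: str) -> str:
    """Put a space after each elided form so the pieces translate separately."""
    return _ELISION.sub(lambda m: m.group(1) + "' ", text)


def split_phrases(sentence: str) -> list[str]:
    """Break a sentence into phrases at commas, semicolons, and colons."""
    parts = re.split(r"[,;:]", sentence)
    return [part.strip() for part in parts if part.strip()]


if __name__ == "__main__":
    print(split_elisions("l'homme qu'il vit"))
    for phrase in split_phrases("Le geant, dit-on; fut trouve."):
        print(phrase)
```

Each phrase can then be pasted into the translator on its own, which is the same thing I do by hand, only faster.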

If the second translator doesn't help, I move on to the dictionaries and verb references. I can usually recognize when a word is a verb and look for less common conjugations. I can also hunt for secondary meanings for words. If I can't find a meaning for a word, my final resort is to Google it and see what turns up. When looking at some letters of Nicolas-Claude Fabri de Peiresc, a savant who wrote in Aix-en-Provence in the first third of the seventeenth century, I found that several problematic words were explained by Catalan and Italian.

If none of that works, I call up someone who can read French and whine until they help me.
