How to translate PDF files

The PDF format is truly ingenious: a PDF document will look exactly as its author created and intended it on every platform, be it any version of Windows, macOS, Linux, Android, you name it.

Their only disadvantage is that PDFs are not editable, or difficult to edit at best. This was probably intentional on the part of Adobe, the company behind the format, so that nobody could easily change the contents of a document. In practice, a PDF largely discards the document's logical structure: instead of paragraphs, tables and styles, it mostly stores instructions for drawing text and graphics at exact positions on the page. So you cannot simply load a PDF into MS Word or SDL Trados and expect a clean, editable document. Sure, the computer must still know what to display and how to format it, but a PDF is optimized for viewing and printing, not for editing, and it takes some pretty clever algorithms to convert it back into an editable format. Even Adobe doesn't seem to be able to fully reconstruct documents into an editable form: parts of the text might still remain embedded as images.

As you can imagine, as soon as you actually try to translate such a document, the PDF format can be a major pain in the ass.

So let’s take a look at how this can actually be done.

How to translate a PDF document – step by step

First things first: you certainly don’t want to translate the document by opening the PDF, reading it and retyping all the text into a Word document, formatting it manually and copying all the images… That would be exhausting and would take an eternity, and you would end up working for a ridiculous hourly rate. So you need to automate the process a little.

Convert into an editable format

  • The first step is to actually convert it into an editable MS Word format (doc/docx). There are numerous programs and websites available, but I would recommend this one: https://pdf2doc.com/

It’s free and gives pretty good results. If you need to deal with complicated document formatting, you can also try Adobe’s converter: https://www.adobe.com/uk/acrobat/online/pdf-to-word.html.

Adobe’s service lets you convert a few documents free of charge, which is fine if you translate just a few PDFs here and there. Unfortunately, PDF-to-Word conversion is in high demand, so the paid services are not exactly cheap.

  • Open https://pdf2doc.com/ and upload your PDF document.
  • Once your document has been converted, click the download button.

Translate in a CAT tool

6. Depending on how complex the PDF’s formatting is, you might need to check whether all the text in your PDF has actually been translated. Open the original PDF on one side of the screen and the translated MS Word document on the other, and carefully compare them to make sure that everything was captured by the converter and then translated by you.

7. And that’s it! Congratulations, you have just translated your very first PDF document!

Please watch this video to see the whole process in action:

CAT Tools Crash Course – Part 1: What is a CAT tool and how it can be useful to you

Before I explain the basic idea behind CAT (computer-assisted translation) tools, let me describe the naive approach to translating documents.

The naive translator

You could, of course, open your source document in your source language on one side of your screen, then open a blank document on the other side. And you could start translating right away, reading the source on the left, writing your translations on the right.

But soon, you discover a few problems with this…

1) It is exhausting. You keep switching between the source and target documents, constantly searching for the spot in the source where you left off before you moved over to the target to write your translation. It is also very prone to leaving parts of the text untranslated, because you simply lose your place.

2) You need to graphically format the target to make it look more or less similar to the source. While this might not be too much of an issue with simply formatted documents, it can soon turn into a nightmare when you translate a document full of images, tables, and difficult text formatting.

3) You find yourself translating identical or similar texts over and over again. So when you translate the “Keep this manual for future reference.” phrase for the gazillionth time, you start asking yourself whether this could be optimized in some way.

4) It’s exhausting to keep your terminology consistent. There might be so many terms in your document that you soon lose track of how you actually translated that “safety interlock” on page 45 or the “brake pads” on page 67.

The professional translator

And this is where CAT tools come into play. They can take care of all the problems mentioned above and even provide some more benefits on top of that.

The fundamental ideas behind CAT tools

A CAT tool is basically a simple database combined with a simple text editor. Consider the image below.

So what you actually do with a CAT tool is take your source document, load it into the tool, let it convert the document into a translatable format… and simply start translating.

The advantages of a CAT tool

Convenient arrangement of text

You will immediately notice this very useful feature: your text is neatly split into individual segments. They look like table cells and usually contain one whole sentence each, but a segment can also be just a meaningful block of text, such as a heading, rather than a sentence per se.
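To make the idea of segments more concrete, here is a minimal Python sketch of sentence segmentation. It is only an illustration under a deliberately naive assumption (a segment ends at ., ! or ? followed by whitespace and a capital letter); real CAT tools use configurable segmentation rules, often stored in the SRX format, with exceptions for abbreviations and the like.

```python
import re

# Naive boundary rule (an assumption for this sketch): split after ., ! or ?
# when it is followed by whitespace and an uppercase letter.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def segment(paragraph: str) -> list[str]:
    """Split a paragraph into sentence-like segments."""
    return [s.strip() for s in _BOUNDARY.split(paragraph.strip()) if s.strip()]


text = "Keep this manual for future reference. Do not open the cover! Is the unit off?"
for number, seg in enumerate(segment(text), start=1):
    print(number, seg)
```

Note that "e.g. this" is not split, because the word after the period is lowercase; a capitalized word after an abbreviation would still fool this rule, which is exactly why real tools keep exception lists.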

You can see that your source is on the left and you will write your translations on the right. This is the vertical arrangement. It’s perhaps the most common one, but some tools (e.g. Transit NXT) can also be arranged horizontally.

Notice that your active segment, i.e. the segment you are currently translating, is highlighted in color.

This is a great advantage, as you don’t have to search for the place where you left off in the source text: it’s immediately apparent.

Translation Memories – they remember everything you have ever translated!

And now comes an even greater advantage… the database feature!

Remember that “Keep this manual for future reference.” sentence from the beginning of this text, the one you have already translated a thousand times?

Now get this: a CAT tool will store this sentence, along with your translation, in its database! In fact, it can store ALL the sentences you translate over your whole professional life! And you know what’s even better? Yes, you’ve guessed it! It will remember them and offer you the stored translations every time an identical or similar sentence comes up in the text you are currently working on.

This means you translate the “Keep this manual for future reference.” sentence just ONCE AND FOR ALL. You will never have to translate it again: the software will insert the translation for you every time the sentence appears in your text. Pretty cool, huh?
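Under the hood, the core idea can be pictured as a lookup table from source sentences to their translations, plus a similarity measure for "fuzzy" matches of similar sentences. The Python sketch below is a toy model, not how any particular CAT tool is implemented: it uses the standard library's difflib.SequenceMatcher as a stand-in for the proprietary fuzzy-matching algorithms real tools use, and the 75% threshold and the Czech sample translation are illustrative assumptions.

```python
from difflib import SequenceMatcher
from typing import Optional


class TranslationMemory:
    """Toy translation memory: source sentence -> stored translation."""

    def __init__(self) -> None:
        self.units: dict[str, str] = {}

    def add(self, source: str, target: str) -> None:
        self.units[source] = target

    def lookup(self, source: str, threshold: float = 0.75) -> Optional[tuple[int, str, str]]:
        """Return (match %, stored source, stored target) for the best match, or None."""
        if source in self.units:  # exact (100%) match
            return 100, source, self.units[source]
        best: Optional[tuple[float, str, str]] = None
        for stored, target in self.units.items():
            score = SequenceMatcher(None, source.lower(), stored.lower()).ratio()
            if score >= threshold and (best is None or score > best[0]):
                best = (score, stored, target)
        if best is None:
            return None
        # Anything that is not an exact match is reported below 100%.
        return min(99, round(best[0] * 100)), best[1], best[2]


tm = TranslationMemory()
tm.add("Keep this manual for future reference.",
       "Tento návod si uschovejte pro budoucí použití.")
print(tm.lookup("Keep this manual for future reference."))    # exact match
print(tm.lookup("Keep these manuals for future reference."))  # fuzzy match
```

A fuzzy match still needs your attention: the tool offers the stored translation, and you adjust it to the new wording.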

Such a database of translations is called a “translation memory” (TM). You might also come across the term TMX (Translation Memory eXchange): an XML-based standard file format for exchanging translation memories, so you can export a TM from one tool and conveniently import it into your own translation memory in another. This means you can take advantage of other people’s translation memories too.
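Because TMX is plain XML, it is easy to inspect. The sketch below uses Python's built-in xml.etree.ElementTree to pull the language/segment pairs out of a tiny hand-written TMX 1.4 sample. The sample content is made up for illustration, and a real importer would also have to deal with inline markup, metadata such as creation dates, and much larger files.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under its full namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0" segtype="sentence"
          o-tmf="none" adminlang="en-US" srclang="en-US" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en-US"><seg>Keep this manual for future reference.</seg></tuv>
      <tuv xml:lang="cs-CZ"><seg>Tento návod si uschovejte pro budoucí použití.</seg></tuv>
    </tu>
  </body>
</tmx>"""


def read_tmx(xml_text: str) -> list[dict[str, str]]:
    """Return one {language code: segment text} dict per translation unit (<tu>)."""
    root = ET.fromstring(xml_text)
    units = []
    for tu in root.iter("tu"):
        unit = {}
        for tuv in tu.findall("tuv"):
            # TMX 1.4 uses xml:lang; older files may use a plain "lang" attribute.
            lang = tuv.get(XML_LANG) or tuv.get("lang")
            seg = tuv.find("seg")
            if lang and seg is not None:
                unit[lang] = "".join(seg.itertext())  # flattens any inline tags
        units.append(unit)
    return units


for unit in read_tmx(SAMPLE_TMX):
    print(unit)
```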

Text formatting, tables and images

This is yet another great feature of CAT tools: you don’t need to worry about text formatting, tables or even images! The CAT tool takes care of all that for you. You just concentrate on translating the text, and the CAT tool will keep the original fonts and formatting, rebuild the tables and place all the images in their appropriate locations. So when you finalize your document, i.e. export it back into its original format, it will look nearly identical to the source, except that it will be in a different language.

Terminology

What term did I use to translate “heat pipe”? And what did I use for “safety interlock”? Damn! I can’t remember anymore… I’ll have to go back and look it up in what I have already translated. I am losing so much time!

Well, not anymore. Not with CAT tools. You can simply define a list of terms you would like to use during the translation, and your CAT tool (a good one, at least) will offer the corresponding translation every time such a term appears in your source text. Most tools will even highlight the term in the text! And you know what the best part is? You can even add terms dynamically as you progress through your text. Is your text crawling with “safety joints” and “safety latches”? Just add them to your term list so that you always translate them the same way and keep your translations consistent.
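Conceptually, term recognition boils down to looking up each glossary entry in the current source segment. Here is a minimal Python sketch; the glossary entries and their Czech equivalents are illustrative sample data, and real CAT tools typically add smarter (e.g. fuzzy) term matching on top of this simple whole-word, case-insensitive search.

```python
import re

# Illustrative English-Czech glossary (sample data, not an authoritative term base).
glossary = {
    "safety interlock": "bezpečnostní blokování",
    "brake pads": "brzdové destičky",
    "heat pipe": "tepelná trubice",
}


def find_terms(segment: str, terms: dict[str, str]) -> list[tuple[str, str]]:
    """Return (term, preferred translation) for every glossary term in the segment."""
    hits = []
    # Longer terms are checked (and therefore listed) first.
    for term in sorted(terms, key=len, reverse=True):
        pattern = r"\b" + re.escape(term) + r"\b"  # whole words only
        if re.search(pattern, segment, flags=re.IGNORECASE):
            hits.append((term, terms[term]))
    return hits


print(find_terms("Check the Safety Interlock before replacing the brake pads.", glossary))

# Adding a term "on the fly" is just another dictionary entry.
glossary["safety latch"] = "bezpečnostní západka"
print(find_terms("Release the safety latch.", glossary))
```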

The lifecycle of a translated document

To wrap up this first part of the mini-series, let me explain how a source document (e.g. in MS Word) is loaded into a CAT tool and then returned to its original format, except in another language.

1) You start by creating a project in your CAT tool and loading into it all the documents you would like to translate. You can also attach a translation memory and a terminology list.

2) You translate your document.

3) Once you have finished translating, you export the document back to its original format using the Export/Finalize command (its exact name differs from tool to tool). And that’s it: that’s the lifecycle of a translated document.

The final file exported from SDL Trados Studio – it looks almost identical to the source, except it’s in a different language (Czech)

Please watch this video summarizing this text:

How to use machine translation in Across (Google Translate, DeepL…)

This is a short video on how you can configure the Across CAT tool (made by Across GmbH, Germany) to machine translate your files.

There are two options:

  1. use the MT in Across itself (leaves the machine-translation marks), or
  2. use Bohemicus to machine translate and leave NO marks at all.

The Ctrl+Alt+Space shortcut and why you should definitely use it

The raison d’être

The Ctrl+Alt+Space shortcut was originally conceived to deal with formatting tags in SDL Studio, memoQ, Wordfast, XTM etc. That’s because Bohemicus cannot, at least for now, work with formatting tags, and it removes them when you use its machine translation feature.

The basic idea

The basic idea of this shortcut: select the portion of text between the formatting tags and press Ctrl+Alt+Space to translate just that text. This way, the formatting tags are preserved.
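The same principle, translating only the text between the tags and leaving the tags where they are, can be sketched in a few lines of Python. This is not how Bohemicus is implemented (there, you select the text manually); it only illustrates the idea. The tag syntax (<1>, </1>, <b>…) and the fake_mt function are hypothetical stand-ins for a real CAT tool's tags and a real machine translation service.

```python
import re
from typing import Callable

# Hypothetical tag syntax: anything in angle brackets, e.g. <1>, </1>, <b>.
TAG = re.compile(r"(<[^>]+>)")


def translate_preserving_tags(segment: str, translate: Callable[[str], str]) -> str:
    """Machine translate the text runs between tags and keep the tags untouched."""
    out = []
    for part in TAG.split(segment):  # the capturing group keeps the tags in the result
        if TAG.fullmatch(part) or not part.strip():
            out.append(part)  # a tag, or pure whitespace: copy as-is
            continue
        lead = part[: len(part) - len(part.lstrip())]
        trail = part[len(part.rstrip()):]
        out.append(lead + translate(part.strip()) + trail)
    return "".join(out)


def fake_mt(text: str) -> str:
    """Hypothetical stand-in for a machine translation call (tiny EN->CS lookup)."""
    return {"Press": "Stiskněte", "to start.": "pro spuštění."}.get(text, text)


print(translate_preserving_tags("Press <1>START</1> to start.", fake_mt))
# Stiskněte <1>START</1> pro spuštění.
```

Translating each run separately keeps the tags intact, at the cost of giving the MT engine less context, which is the same trade-off you make when selecting the text between tags by hand.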

Online CAT tools

You can also take advantage of this keyboard shortcut when working with online CAT tools (SmartCat, Wordfast Anywhere, etc.) that are not 100% compatible with Bohemicus.

This is how to use it:

Copy the text from the source segment into the target segment (usually with Ctrl+Insert or Alt+Insert, but the shortcut can differ, so please check your CAT tool’s settings), select all the text in the target segment (usually Ctrl+A) and press Ctrl+Alt+Space. The selected text (i.e. all of it, because you have selected everything) will be translated.

You might also want to watch this video:

How to remove the machine-translation marks in every CAT tool

What are they?

As soon as you try to machine translate just about anything in a major CAT tool such as Trados Studio, Across or Wordfast, you will notice that it always leaves a trace, a mark: the “AT” mark in Trados, “MT” in Wordfast, a special icon along with a note in Across, etc. The only CAT tool that apparently does not leave any marks seems to be memoQ; at least I cannot see any.


Sometimes, such behavior might be undesirable.

How to get rid of machine translation marks

You can get rid of those marks in SDL Trados Studio, for example, by pre-translating the project again. What does that mean? After you finish your translation, right-click your file, choose Batch Tasks, then Pre-translate Files, and select the “Always overwrite existing translation” option. This pre-translates your document again, replacing every segment with its corresponding match from the translation memory, so all the “AT” marks are overwritten with “CM” (Context Match) marks.


Not an ideal solution

This might be OK, but it carries some risks:

1) You are stressed, under time pressure, and you simply forget to do it.

2) You might make things much more difficult for your proofreader, because he/she won’t be able to tell which segments to check. There will be no new segments: everything will be marked “CM” or “100%”.

I believe the same tactic can probably be used in Across, Wordfast and other tools as well, but I cannot be sure, as I have not tried it. I have also tried the machine translation feature in Across, in its free translator edition, and it worked. However, I believe your project manager can disable machine translation for you in the Across project settings.

The ultimate solution

And this is one of the advantages of my own tool, Bohemicus: it does not leave any trace at all. Everything appears as manually translated.

What this means: you continue working in your CAT tool as usual (Trados, Across, Wordfast…), but instead of using your CAT tool’s own machine translation feature, you connect it to Bohemicus and press Ctrl+(Alt)+Space. Your segment is then machine translated without leaving any trace or mark.


Recommendation

Of course, this touches on professional ethics. If your agency or customer expressly forbids you to use machine translation, for example because you are working on a confidential document, you should by all means respect their wishes and your professional ethics. I do not recommend using online machine translation services in such cases.

A clipboard manager. Why is it useful?

Definition of a clipboard manager

For those who don’t know what a clipboard manager is: imagine the Windows clipboard on steroids. That’s exactly what a clipboard manager is.

An example: you are working on a document that contains 5 strings, perhaps company names or spare part designations, that repeat all over the document. Sure, you could copy each of them to your clipboard (Ctrl+C) and then paste it wherever you need it (Ctrl+V). But the problem is that the clipboard can only hold one string at a time: whenever you copy something new, the previous string is overwritten.

This is not very practical. Wouldn’t it be better if you just had, say, 10 memory banks, where you could store your strings, and leave the Windows clipboard free for your usual business? And then just press a dedicated hotkey or hotkeys to insert those strings stored in your memory banks into your text?

And that’s exactly the idea of a clipboard manager.

The Story

Of course, during my translation work, I soon realized how useful a clipboard manager could be, so I started looking for a software solution. But at that time, there was not much I could use: the available clipboard managers were either very clumsy to use or occasionally failed to work because they conflicted with other programs running on my computer. The situation might be better now, but I am a C# enthusiast and I enjoy programming… and, most importantly, I very much enjoy using my own software, written to match my needs exactly. It brings me a great deal of inner satisfaction. Also, I don’t want to have a gazillion small utility programs running on my system; I just want one piece of software that addresses most of my translation needs.

So I said to myself: Hey, how difficult could it actually be to create a clipboard manager myself? These were my requirements: 1) I want to see what’s stored in my memory banks at all times, and 2) I want to be able to store and re-insert my strings very easily by pressing a hotkey.

The Solution

I already knew how to work with the Clipboard class and hotkeys in C#, so this part was a piece of cake. My biggest problem? How to create those semi-transparent boxes on the screen that show what’s stored where, and how to make them appear and disappear every time I press the Ctrl key. I did not want them on the screen all the time, because they would obstruct the view of the document I was translating. So the hints should only appear while I hold down the Ctrl key, and disappear again when I release it.

It took me a couple of hours to tune this up so that these hints would not collide with other windows on the screen and would not steal the keyboard focus from my main application (usually a CAT tool).

You can see those hints on this picture:


How it works 

And this is how it works: I select a portion of text in my CAT tool that I would like to store, and press Ctrl+Alt+Shift+1…9, 0 (i.e. any of the number keys from 1 to 0 in the top row of the keyboard) to store it in the corresponding one of my 10 memory banks. To re-insert a stored string into my text in the CAT tool, I simply press Ctrl+1…9, 0.
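The actual tool is written in C# and talks to the Windows clipboard and global hotkeys, none of which is shown here. The following Python sketch only models the logic described above: ten banks bound to the keys 1…9, 0, and a store/insert cycle that saves and restores the regular clipboard. The Clipboard class and the copy/paste callbacks are hypothetical stand-ins for the real operating-system calls and simulated key presses.

```python
from typing import Callable


class Clipboard:
    """In-memory stand-in for the system clipboard (hypothetical)."""

    def __init__(self, text: str = "") -> None:
        self.text = text


class MemoryBanks:
    """Ten memory banks addressed by the number keys 1..9 and 0."""

    KEYS = "1234567890"

    def __init__(self, clipboard: Clipboard) -> None:
        self.clipboard = clipboard
        self.banks = dict.fromkeys(self.KEYS, "")

    def store(self, key: str, copy_selection: Callable[[], None]) -> None:
        """Ctrl+Alt+Shift+<key>: copy the selection into a bank, then restore the clipboard."""
        if key not in self.banks:
            raise KeyError(f"no memory bank {key!r}")
        saved = self.clipboard.text
        copy_selection()  # stands in for a simulated Ctrl+C in the CAT tool
        self.banks[key] = self.clipboard.text
        self.clipboard.text = saved

    def insert(self, key: str, paste: Callable[[], None]) -> None:
        """Ctrl+<key>: paste a bank's content, then restore the clipboard."""
        saved = self.clipboard.text
        self.clipboard.text = self.banks[key]
        paste()  # stands in for a simulated Ctrl+V in the CAT tool
        self.clipboard.text = saved


# Usage: simulate a CAT tool with a selected string and a target document.
clipboard = Clipboard("my usual clipboard content")
banks = MemoryBanks(clipboard)
document: list[str] = []


def copy_selection() -> None:
    clipboard.text = "XJ-200 hydraulic pump"  # the text currently selected


banks.store("1", copy_selection)
banks.insert("1", lambda: document.append(clipboard.text))
print(document)        # ['XJ-200 hydraulic pump']
print(clipboard.text)  # my usual clipboard content
```

Saving and restoring the clipboard around every store and insert is what keeps the regular Ctrl+C / Ctrl+V free for everyday work.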

I have tuned it up nicely: there are no conflicts with other software at all, and the hints are animated to fade in and out every time I press and release the Ctrl key. And the best thing is that it preserves the contents of my Windows clipboard, so the clipboard stays free for my usual work.

You can see it in action here:

A clipboard manager in action

Working with local offline dictionaries

While working with Bohemicus, you can plug in 2 local offline dictionaries. This can be very useful, because you can look up your terms just by pressing a hotkey combination (Ctrl+Alt+K) without ever leaving your application window. You stay in your current application (e.g. a CAT tool), while Bohemicus displays all results in a separate window. Much more convenient than copying your term to the clipboard, switching windows and pasting it into your dictionary, right?

How to look up a term in an offline dictionary – example

Example: Let’s say you want to look up the term “cord” in your offline dictionaries. Select it in your CAT tool, or in just about any other application, and press Ctrl+Alt+K. The term will be instantly looked up in your dictionaries, in Bohemicus’ translation memories and in your predefined online dictionary:

Look up your term by selecting it in your CAT tool and pressing Ctrl+Alt+K
You can use this feature in ANY application, e.g. in a web browser

The offline dictionaries can be any applications you prefer: a bilingual dictionary or, in fact, any database application in which you would like to look up your terms.

The only requirement is that your dictionary software must accept pasted text, i.e. the Ctrl+V keyboard shortcut. Ideally, the application should display all results immediately after the search term is pasted, i.e. without you having to press Enter.

How to plug in offline dictionaries

Plugging in an offline application is easy. Just open your dictionary application, click its title bar to bring it to the foreground, and press Alt+Shift+F1 or Alt+Shift+F2, depending on whether you want to connect it as dictionary #1 or dictionary #2. You can see your connected dictionary applications on the Language&Settings tab.

Your offline dictionary is now successfully connected to Bohemicus

And that’s it! You can now just press the Ctrl+Alt+K hotkey and your term will be instantly searched for in your dictionaries. You don’t need to leave your application window at all.

Ideally, you would use this feature on a large screen or, even better, on 2 displays, so that the Bohemicus window and your offline dictionaries always stay unobstructed and clearly visible. In that case, you don’t need to leave your application window to search for your terms: just move your eyes and review the results displayed in Bohemicus and in your offline dictionaries.

Yet another small thing to make you more productive!

You might also want to watch this video:

Working with offline local dictionaries in Bohemicus

Taking notes in Bohemicus

Why do translators and writers take notes

Note taking is an important task for translators and writers. You might stumble upon a term that you don’t know how to translate right now. You might decide to change certain terms in your document later in your workflow. Or you might have any other valid reason to make a note of something. And that’s when Bohemicus’ note-taking feature comes in.

Available software solutions

Of course, there are a great many ways to take notes in Windows, ranging from the ultra-simple Notepad through Sticky Notes to sophisticated applications such as Evernote. But to tell the truth, I have always found these solutions kind of cumbersome. I wanted something directly connected to the CAT tool I am currently using, where I could make a note in my own note-taking app just by pressing a hotkey. And that’s exactly what I built.

Sticky notes in Windows

Taking notes in Bohemicus

To make a note in Bohemicus, just select the portion of text you would like to save as a note in your current application (e.g. a CAT tool) and press Ctrl+Alt+Shift+N. The text will be transferred to Bohemicus’ note-taking tab. Yes, it is a somewhat clumsy keyboard shortcut, but it was chosen so that it does not collide with other applications, such as Evernote. And if you don’t like it, you can always change it on the Hotkeys tab.

Taking notes from Across to Bohemicus

Making adjustments and saving notes

On your note tab, you can make adjustments as needed: add some text, make it bold or underlined, or change the background color of your notes. Once you are done, just click the Save button or press Ctrl+S to save your notes. Or simply leave saving to Bohemicus: your notes are automatically saved every minute.

Available note tabs

You have 5 tabs available for your notes. By double-clicking a tab, you can also rename it, which might be useful if you want to tell your tabs apart more easily.

Rename a note tab…

You might also want to watch this video:

Taking notes in Bohemicus

Working with translation memories in Bohemicus

How to work with translation memories in Bohemicus

Use your own translation memories… anywhere

While working with Bohemicus, you can use your own TMs to search for terms in concordance mode. This is especially useful in CAT tools where you cannot officially use your own translation memories, which applies, for instance, to online tools such as XTM or Coach. Even some offline tools do not allow you to use your own TM, e.g. the free edition of Across, which can be pretty annoying.

Look up terms in both languages

In this example, we are working in the Across CAT tool. As you might already know, you can search Bohemicus’ TMs by selecting a term in your CAT tool and pressing the Ctrl+Alt+K keyboard shortcut. So let’s select a term and search for it in our translation memories. As you can see, Bohemicus displays all hits from our 2 translation memories:

Searching in translation memories….

Notice that your term is looked up in both languages. That means that with an English-Czech memory open, you can look up terms in English as well as in Czech. So let’s try looking up something in Czech…

Searching in translation memories….
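In essence, a bilingual concordance search checks both sides of every translation unit for the search term. Here is a minimal Python sketch with made-up sample units; the max_hits parameter stands in for the "number of displayed hits" setting, and Bohemicus' actual matching logic may well differ.

```python
# Sample English-Czech translation units (illustrative data).
units = [
    ("Keep this manual for future reference.",
     "Tento návod si uschovejte pro budoucí použití."),
    ("Replace the brake pads.", "Vyměňte brzdové destičky."),
]


def concordance(tm: list[tuple[str, str]], term: str,
                max_hits: int = 5) -> list[tuple[str, str]]:
    """Return up to max_hits units whose source OR target contains the term."""
    needle = term.casefold()  # case-insensitive, also for accented characters
    hits = [(src, tgt) for src, tgt in tm
            if needle in src.casefold() or needle in tgt.casefold()]
    return hits[:max_hits]


print(concordance(units, "manual"))   # found on the English side
print(concordance(units, "brzdové"))  # found on the Czech side
```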

Activate/deactivate a translation memory and set up the number of displayed hits

You can also activate or deactivate a translation memory or specify the number of displayed hits by clicking the corresponding numeric boxes.

Activate/deactivate translation memories and set the number of hits in Bohemicus

Connect a translation memory to Bohemicus

If you want to connect your TM to Bohemicus and use it while working with your CAT tool, just go to the Bohemicus Concordance tab where you can set up your own TM.

The Bohemicus Concordance tab

You have 3 options: you can directly import an SDL Studio memory, import a TMX file into the currently open memory, or create a brand new memory and import a TMX file into it. You can also use 2 TMs at the same time.

Set of buttons to work with translation memories in Bohemicus

Now let’s open an SDL Studio memory: just click the corresponding button and browse to your memory file (*.sdltm). Once you select it, it will be opened in Bohemicus.

Important: when you open an SDL memory in Bohemicus, a local copy is created. This is for technical reasons, and it means that any changes you make to your SDL memory outside of Bohemicus will not be reflected in the copy opened in Bohemicus.

Create a new translation memory

You can create a brand new memory by clicking the corresponding button.

Create a new translation memory in Bohemicus

Import a TMX file

You can also import a TMX file into your existing memory: just click a button, select your TMX file and import it. Of course, export is also possible: click another button to export your memory to a TMX file, so that you can import it into another CAT tool.

Import a TMX file…
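For the curious, this is roughly what a TMX export involves: each source/target pair becomes a <tu> element with one <tuv> per language. The minimal Python sketch below uses xml.etree.ElementTree; the header values (creation tool name, version and so on) are placeholders, and a production exporter would typically also write an XML declaration, dates and other metadata.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this namespaced name as xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def write_tmx(pairs: list[tuple[str, str]], src_lang: str, tgt_lang: str) -> str:
    """Serialize (source, target) pairs as a minimal TMX 1.4 document."""
    root = ET.Element("tmx", version="1.4")
    ET.SubElement(root, "header", {
        "creationtool": "tmx-sketch", "creationtoolversion": "0.1",  # placeholders
        "segtype": "sentence", "o-tmf": "none", "adminlang": "en-US",
        "srclang": src_lang, "datatype": "plaintext",
    })
    body = ET.SubElement(root, "body")
    for source, target in pairs:
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, source), (tgt_lang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(root, encoding="unicode")


print(write_tmx([("Replace the brake pads.", "Vyměňte brzdové destičky.")], "en-US", "cs-CZ"))
```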

One advantage Bohemicus offers is that by importing multiple TMX files, you can mix translation memories in any languages. For instance, you can mix British and American English, because, frankly, when translating into Czech, you might not really care whether the source is in British or American English. SDL Studio’s restriction that deliberately prevents you from mixing such memories can perhaps be perceived as an unnecessary nuisance. But of course, it is your responsibility not to mix two completely different languages, e.g. English with Hungarian, or rather, to only mix them if you know exactly what you are doing.
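A sketch of the idea, assuming translation units are dicts of {language code: text} (the shape a simple TMX reader might return): language codes are reduced to their primary subtag, so en-GB and en-US both count as "en", and any unit that lacks the chosen source or target language (e.g. a Hungarian-Czech unit when you asked for English-Czech) is skipped. This illustrates the concept only; it is not Bohemicus' import code.

```python
def primary_language(tag: str) -> str:
    """Reduce a language code such as 'en-GB' or 'en_US' to its primary subtag 'en'."""
    return tag.replace("_", "-").split("-")[0].lower()


def merge_units(*tm_files: list[dict[str, str]], source: str = "en",
                target: str = "cs") -> dict[str, str]:
    """Merge several TMs into one source->target mapping, ignoring regional variants."""
    merged: dict[str, str] = {}
    for units in tm_files:
        for unit in units:
            by_lang = {primary_language(lang): text for lang, text in unit.items()}
            if source in by_lang and target in by_lang:
                merged[by_lang[source]] = by_lang[target]  # later files win on duplicates
    return merged


british = [{"en-GB": "Check the colour.", "cs-CZ": "Zkontrolujte barvu."}]
american = [{"en-US": "Check the color.", "cs-CZ": "Zkontrolujte barvu."}]
hungarian = [{"hu-HU": "Ellenőrizze a színt.", "cs-CZ": "Zkontrolujte barvu."}]

print(merge_units(british, american, hungarian))
```

Both English variants end up in the merged memory, while the Hungarian unit is left out because it has no English side.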

Watch a YouTube video

Also, you might want to watch this video on YouTube:

Working with translation memories in Bohemicus