Super-easy way to convert Microsoft Term Collections for indexing
Thread poster: Claudio Porcellana (X)
Claudio Porcellana (X)  Identity Verified
Italy
Local time: 10:49
English to Italian
Apr 28, 2022

At this address for Italy (for the other languages I think there is an automatic redirect): Microsoft Term Collections

You can find the latest Microsoft terms with explanations there, but they are in TBX format, and each file runs to a gazillion pages!

I first tried to index them with Archivarius (the easiest way, which I almost always use), but the files were hard or impossible to read
Then I tried to import them into memoQ (both as a TM and as a glossary), but it failed
I tried Heartsome TMX and other tools I have, but all of them failed

I didn't want to install any more software, so I did a search and found that a TBX file can be opened even in a TXT viewer. So I opened it with MS Word (2016 ed.), converted the TBX file to TXT, and indexed it with Archivarius

P.S.: Note that the files look monolingual, but they are actually bilingual, with English always present

So, for example, the DE file is actually German and English, and so on. If you translate, say, German to Italian, the English term is the Rosetta Stone that lets you find the translation in your language, with the further advantage of avoiding a cumbersome alignment task
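
The English-as-pivot idea above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `pivot_via_english` and the sample term pairs are made up here, and the sketch assumes you have already extracted (English, other language) pairs from two of the files. English terms with several meanings will produce several candidate matches, so the output still needs a human eye.

```python
def pivot_via_english(source_pairs, target_pairs):
    """Join two (english, term) lists on the English term.

    Returns (source_term, english, target_term) triples. Matching is
    case-insensitive; an English term that occurs several times yields
    several candidate triples.
    """
    by_english = {}
    for english, target in target_pairs:
        by_english.setdefault(english.casefold(), []).append(target)

    triples = []
    for english, source in source_pairs:
        for target in by_english.get(english.casefold(), []):
            triples.append((source, english, target))
    return triples


# Hypothetical sample data, standing in for pairs taken from the DE and IT files.
DE = [("folder", "Ordner"), ("clipboard", "Zwischenablage")]
IT = [("folder", "cartella"), ("taskbar", "barra delle applicazioni")]

for source, english, target in pivot_via_english(DE, IT):
    print(f"{source}\t{english}\t{target}")
```

Only terms present in both lists are paired, so "clipboard" and "taskbar" drop out of this sample.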

Take care


 
Hans Lenting
Netherlands
Member (2006)
German to Dutch
English interface Apr 29, 2022

The English interface is here.

I read about your experiences with memoQ and Heartsome, so I tried to import an MS TBX file with CafeTran Espresso.


 
Samuel Murray  Identity Verified
Netherlands
Local time: 10:49
Member (2006)
English to Afrikaans
+ ...
@Claudio Apr 29, 2022

Claudio Porcellana wrote:
I didn't want to install another software...

But if you did, the Goldpan viewer works well:
https://logrusglobal.com/goldpan.html

WFP3 took 25 minutes to import the Italian file (about 35 000 terms), and 5 seconds to export to TXT.

OmegaT imported the Italian file within seconds, and I was able to perform a regex search for "." in the glossary, select all terms from the output window, and paste it into a TXT file (although that gets you only the source and target text, without the definitions).

I see a Trados screenshot on your web site. If you have Trados, you can get the free glossary converter:
https://appstore.rws.com/language/app/glossary-converter/195/
...which converts the Italian TBX file to XLSX in about 2 minutes, including all columns.

I did a search and found that a TBX can be opened even by a TXT viewer, so I opened it with MS Word (2016 ed), I converted the TBX file in TXT...

You may have misunderstood what you read. TBX can be opened in a TXT viewer, but MS Word is not a TXT viewer. What MS Word does with a TBX file is interpret it as XML, so it hides all the tags, leaving you with just the untagged text (which happens to be useful to you as well). To see what TBX looks like in a TXT viewer, open Notepad and drag and drop the TBX file into it (this won't be useful to you, except for interest's sake).
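
For anyone curious what "removing the tags" amounts to, here is a minimal Python sketch using only the standard library. It is not what MS Word does internally, just a rough equivalent: parse the file as XML and keep the text between the tags. The function name `tbx_to_plain_text` and the tiny sample document are made up for illustration; the element names in the sample follow common TBX usage and may not match the Microsoft files exactly.

```python
import xml.etree.ElementTree as ET


def tbx_to_plain_text(xml_string):
    """Return the text content of an XML document, one non-empty chunk per line."""
    root = ET.fromstring(xml_string)
    chunks = (text.strip() for text in root.itertext())
    return "\n".join(chunk for chunk in chunks if chunk)


# Tiny made-up sample in a TBX-like shape.
SAMPLE = """<martif><text><body>
<termEntry id="1">
  <langSet xml:lang="en-US"><ntig><termGrp><term>folder</term></termGrp></ntig></langSet>
  <langSet xml:lang="it-IT"><ntig><termGrp><term>cartella</term></termGrp></ntig></langSet>
</termEntry>
</body></text></martif>"""

print(tbx_to_plain_text(SAMPLE))
```

For a real file you would parse from disk with `ET.parse(path).getroot()` instead of `ET.fromstring`, and write the result to a .txt file for the indexer.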



[Edited at 2022-04-29 08:57 GMT]

[Edited at 2022-04-29 09:07 GMT]


 
Claudio Porcellana (X)  Identity Verified
Italy
Local time: 10:49
English to Italian
TOPIC STARTER
Super-easy way to convert Microsoft Term Collections for indexing Apr 29, 2022

But if you did, the Goldpan viewer works well
CafeTran Espresso

I know, I know, but I didn't want any more software: why add one more mess to my PC if software I already have does the job?

and, above all, I don't want any software from "that" area on my system

As for the rest, I don't have Trados (and don't want it anymore)

And I mentioned indexing in my title on purpose: a clean file is what I need to query through Archivarius

cheers


 
Claudio Porcellana (X)  Identity Verified
Italy
Local time: 10:49
English to Italian
TOPIC STARTER
Super-easy way to convert Microsoft Term Collections for indexing Apr 29, 2022

Samuel Murray wrote:
WFP3 took 25 minutes to import the Italian file (about 35 000 terms), and 5 seconds to export to TXT.


Well, MS Word took 5 minutes to import the English & Italian file and 5 seconds to export to TXT...
a bit faster IMHO


 
Recep Kurt  Identity Verified
Türkiye
Local time: 11:49
Member (2011)
English to Turkish
+ ...
Xbench works beautifully Apr 30, 2022

You can load the Microsoft glossaries in Xbench, then export them as tab-delimited text, TMX, XLSX or DOCX. It takes a few seconds.

 
Samuel Murray  Identity Verified
Netherlands
Local time: 10:49
Member (2006)
English to Afrikaans
+ ...
Re: Archivarius May 1, 2022

Claudio Porcellana wrote:
I mentioned indexing by purpose, where a clean file is what I need to query through Archivarius.

I just downloaded and tested Archivarius. Search results in Archivarius remove line breaks and show text from multiple lines, so the way MS Word converts the TBX files would indeed be useful in Archivarius. I use a different indexer (Wilbur) which has the option to show only the line where the match occurs, so for me it would be necessary to create a file in which the entire term entry is on a single line.
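
A file with one term entry per line could be produced along these lines. This is a sketch, not a tested converter for the Microsoft files: it assumes the usual TBX element names (`termEntry`, `langSet`, `term`, `descrip`), and the helper names `_flat` and `entries_as_lines` and the sample entry are invented for illustration.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under this expanded name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def _flat(element):
    """Collapse an element's text onto one line with single spaces."""
    return " ".join("".join(element.itertext()).split())


def entries_as_lines(xml_string):
    """Return one tab-separated line per termEntry (assumed TBX element names)."""
    root = ET.fromstring(xml_string)
    lines = []
    for entry in root.iter("termEntry"):
        fields = []
        for lang_set in entry.iter("langSet"):
            lang = lang_set.get(XML_LANG, "?")
            terms = "; ".join(_flat(t) for t in lang_set.iter("term"))
            notes = " ".join(_flat(d) for d in lang_set.iter("descrip"))
            fields.append(f"{lang}: {terms}" + (f" [{notes}]" if notes else ""))
        lines.append("\t".join(fields))
    return lines


# Made-up sample entry; the definition is deliberately split over two lines.
SAMPLE = """<martif><text><body>
<termEntry id="1">
  <langSet xml:lang="en-US">
    <descripGrp><descrip type="definition">A container
      for files.</descrip></descripGrp>
    <ntig><termGrp><term>folder</term></termGrp></ntig>
  </langSet>
  <langSet xml:lang="it-IT">
    <ntig><termGrp><term>cartella</term></termGrp></ntig>
  </langSet>
</termEntry>
</body></text></martif>"""

for line in entries_as_lines(SAMPLE):
    print(line)
```

Because line breaks inside a definition are collapsed, an indexer that shows only the matching line still shows the whole entry.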

One thing that both Wilbur and Archivarius do, which annoys me, is that they show a single file as a single result only, so if the file is very large, the preview shows only one or two matches from the entire file. For this reason, I split my glossary files into smaller files containing only about one screen's worth of text each, so that I can more easily view multiple search results from a single glossary simultaneously.
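
The splitting step can be automated. The sketch below is one possible way to do it, not the method used above: the function names, the default of 40 lines per file (a guess at "one screen") and the output file naming are all assumptions.

```python
from pathlib import Path


def split_into_chunks(lines, lines_per_file=40):
    """Group a list of lines into consecutive chunks of at most lines_per_file."""
    return [lines[i:i + lines_per_file] for i in range(0, len(lines), lines_per_file)]


def write_chunks(lines, out_dir, stem="glossary", lines_per_file=40):
    """Write each chunk to its own numbered UTF-8 text file and return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for number, chunk in enumerate(split_into_chunks(lines, lines_per_file), start=1):
        path = out / f"{stem}_{number:05d}.txt"
        path.write_text("\n".join(chunk) + "\n", encoding="utf-8")
        paths.append(path)
    return paths
```

Typical use would be to read the converted glossary with `Path("glossary.txt").read_text(encoding="utf-8").splitlines()` and pass the result to `write_chunks`, then point the indexer at the output folder. This works best when each term entry already sits on a single line, so that no entry is cut in half.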


 
Claudio Porcellana (X)  Identity Verified
Italy
Local time: 10:49
English to Italian
TOPIC STARTER
Super-easy way to convert Microsoft Term Collections for indexing May 1, 2022

Samuel Murray wrote:
I use a different indexer (Wilbur)


Interesting!
Never knew it existed, I'll check it


I tried almost every indexer I could find (with a demo), but none beat Archivarius for speed and reliability, apart from Logiterm maybe, and all of them lack some important feature

For example, there is no automatic folder naming, and no fast option to set more than one "Multiple selection of indexes for searching"

Moreover, Archivarius is (apparently) a dead project, so there is a chance it will stop working one day or another


 
Samuel Murray  Identity Verified
Netherlands
Local time: 10:49
Member (2006)
English to Afrikaans
+ ...
Wilbur May 1, 2022

Claudio Porcellana wrote:
Interesting!
Never knew it existed, I'll check it...

https://redtree.com/wilbur/index.html
Wilbur does have a file size limit of around 1 GB (i.e. it can't have more than about 1 GB of files in a single index). I typically create separate indexes for separate types of content. An index is stored as a .WIL file, and you can create a shortcut to the .WIL file, so my desktop has about ten such shortcuts on it for the various searches. I tried Wilma but was not happy with it for some or other reason.

[Edited at 2022-05-01 14:32 GMT]


 
Tiberiu Leon
Local time: 11:49
English to Romanian
+ ...
Good old Excel Jun 8, 2023

TBX files are plain XML files.
You can open .tbx files as XML files in Excel and then save them in whatever format you need, which in turn can be indexed even with the built-in Windows search tools.


 

