Finding and translating all subdirectories in an HTML file Thread poster: Paul Lambert
Paul Lambert Sweden Local time: 00:45 Member (2006) Swedish to English + ...
I suspect this will be an obvious question to you younger, tech-savvy types out there. Lately I have been getting plenty of jobs translating websites. Often, the client does not provide me with a nice set of folders containing all the HTML text, but rather just a link to the web page he wants translated. Now, if it is just the one page that needs translating, it is simple enough to go to "view page source" and get the HTML. However, I have now been asked to translate all of a rather elaborate site containing many pages, including pages with links that lead to pages with other links, which lead to pages with still other links, and so on. I could use brute force and map out each page and gather the "view page source" for each page individually, but that would be painstaking and prone to me missing something. There must be an easier way to go about it. So, for instance, if I went to a site called http://paulspage.com, I would need to get all the page source for that page and all the subpages, and the subpages of the subpages, etc. Any ideas?

Paul Lambert Sweden Local time: 00:45 Member (2006) Swedish to English + ... TOPIC STARTER
Thanks, Thomas. I will check it out right now.
It's old software, but it still works, even on Windows 10. Be sure to confirm with the client exactly which files, containing how many words, you intend to translate. If the site uses advanced techniques such as SQL, Expression Web may not find them all.
Paul Lambert Sweden Local time: 00:45 Member (2006) Swedish to English + ... TOPIC STARTER Worked like a charm | Sep 25, 2020 |
Thanks again. What great advice. This software is excellent. And yes, I will confirm on Monday that everything is included. This is an enormous task. No point missing anything. Have a great weekend.

Glad it worked | Sep 25, 2020 |
Thanks, you too.

Sheila Wilson Spain Local time: 23:45 Member (2007) English + ... My experience has been 100% negative | Sep 25, 2020 |
The first couple of times, I tried to gather all the text to work on myself; the client complained that I'd missed some, and I had to do a rush job -- unpaid -- to complete it to their satisfaction. So then I insisted that the client (a communications agency) select the text. They grumbled but came up with it. A while after delivery, they came back with a hyper-urgent request for more text to be worked on. This time they'd missed it, and this time they had to pay my rush rate! I've since always insisted on receiving the text in Word or Excel files.

Samuel Murray Netherlands Local time: 00:45 Member (2006) English to Afrikaans + ... A web site ripper, I imagine | Sep 25, 2020 |
Paul Lambert wrote: "I must believe there is an easy way to go about it. So for instance if I went to a site called http://paulspage.com, I need to get all the page source for that page and all the subpages and the subpages of the subpages etc etc."

Yes, there are such utilities (web site rippers, strippers, or sometimes "offline browsers"), and 10-20 years ago, when the web was younger, they were fairly reliable tools. However, web sites are no longer simple and web servers are no longer all the same, so many of these ripper programs no longer work as expected or promised. A well-known free one is HTTrack, but I've never had good results with it. I've had reasonable results with VWget for ripping large archives (say, 10 000 HTML files in nested subfolders), but it's not easy to use (I've had most success with the command-line version). See also my post here where I recommend Web Downloader 2.2, which you can still find on some download sites if you look really hard. I just tried it again, and it still works for simple sites. I've uploaded it here for 7 days.
[Edited at 2020-09-25 17:29 GMT]
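[Editor's note] For readers curious what these ripper utilities actually do, the core is just a breadth-first crawl: fetch a page, harvest its links, and follow the ones that stay on the same host. Below is a minimal Python sketch of that idea using only the standard library. The class and function names are the editor's own, http://paulspage.com is simply the example URL from the question, and the max_pages cap is an arbitrary safety limit. Like any ripper, it only sees the static HTML returned by a plain GET request; content generated by JavaScript or server-side databases will be missed, as the posts in this thread point out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

class LinkParser(HTMLParser):
    """Collects href targets from <a> tags, resolved against a base URL."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links like "/about.html" against the page URL
                    self.links.append(urljoin(self.base_url, value))

def crawl(start_url, max_pages=200):
    """Breadth-first crawl of pages on the same host as start_url.
    Returns a dict mapping each visited URL to its raw HTML source."""
    host = urlparse(start_url).netloc
    queue, seen, pages = [start_url], {start_url}, {}
    while queue and len(pages) < max_pages:
        url = queue.pop(0)
        try:
            html = urlopen(url).read().decode("utf-8", errors="replace")
        except OSError:
            continue  # skip unreachable or non-HTML pages
        pages[url] = html
        parser = LinkParser(url)
        parser.feed(html)
        for link in parser.links:
            link = link.split("#")[0]  # drop fragment anchors
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

Calling `crawl("http://paulspage.com")` would return each reachable same-site page's URL together with its raw HTML, which could then be written out to files for translation.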
Endre Both Germany Local time: 00:45 English to German Have the client send you the source files | Sep 25, 2020 |
Approaching it from the public (Internet) side of things, as web rippers do, is absolutely the wrong way to go. Your client has access to all the source files (unless they want to translate a third party's site without their knowledge), even if they may not be aware of this. So you need to get them to send you all the source files. For static websites, this is a matter of copying all files from an FTP server. For dynamic websites, they have to export the strings from the database that is used to dynamically generate the site. None of this is your business – you need to insist on being provided with all relevant files rather than ripping them from a website. As Sheila says, this also puts the onus on them to catch all the content. Once you have all the files, you need to check what types they are and how best to translate them.

Paul Lambert Sweden Local time: 00:45 Member (2006) Swedish to English + ... TOPIC STARTER Thanks. Forget the answer I just erased. | Sep 25, 2020 |
I just seemed like a jerk. I meant to say, thank you. So, yes, thank you. Indeed, I will try to get the HTML files in question from the client, and if that does not work, then as a second resort I will use what I got from the software discussed above. Take care!
[Edited at 2020-09-26 18:31 GMT]