Not sure if this is the right forum for this but...
I sometimes use TCC to download HTML pages from web sites, convert them to text and then parse out some wanted information for further processing/presentation.
TCC has built-in support for the first part (COPY with HTTP support) and the last (various text processing functions), but there is no direct support for converting HTML to readable text. I could of course try to parse the HTML file directly, but it's much easier to first convert it to a text format that matches what you see in the browser.
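To make clear what I mean by the conversion step, here is a minimal sketch of it. It uses only Python's standard library `html.parser` module, purely as an illustration and not as the native tool I'm asking about. The names `TextExtractor` and `html_to_text` are made up for this example, and the list of block-level tags is an assumption; a real converter such as links handles tables, links and layout far better.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}
    # Assumed set of tags that should start a new line in the text output.
    BLOCK = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode &amp; and friends
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def html_to_text(html):
    """Return the text of an HTML string, one non-empty line per block."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    raw = "".join(parser.parts)
    # Collapse runs of whitespace inside each line and drop empty lines.
    lines = (" ".join(line.split()) for line in raw.splitlines())
    return "\n".join(line for line in lines if line)
```

Calling `html_to_text()` on the contents of a downloaded page gives plain lines that are much easier to pick apart with TCC's text functions than the raw markup.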
So far I have used the Windows port of the links console browser with the "-dump" option to do the conversion, but it's an old, unmaintained port that does not seem to work on Windows Vista.
Does anyone know of any good HTML-to-text console programs for Windows? Open source and native (not requiring e.g. Python) would be a plus.
Would it perhaps be possible to add something like this in future TCC versions? Seems like it would be a good complement to what's already supported.