12.8.04
21:15 Convert anything to text/RTF/HTML
Today I was looking for a Java RTF-to-text converter. The RTFEditorKit in the JDK didn't fit our needs: it is only a quick hack and, for example, does not support different encodings (as all other classes in JDK/Swing do).
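For reference, the JDK-only route looks roughly like the sketch below. It is only an illustration of the approach that was ruled out, not the code we actually tried: the class name RtfToText, the helper rtfToText and the sample RTF string are made up for this example. Note that RTFEditorKit.read(InputStream, ...) offers no way to pass an encoding, which is the limitation mentioned above.

```java
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.rtf.RTFEditorKit;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the JDK or of dtSearch.
public class RtfToText {

    // Parses RTF bytes with the JDK's RTFEditorKit and returns the plain text.
    static String rtfToText(byte[] rtf) throws IOException, BadLocationException {
        RTFEditorKit kit = new RTFEditorKit();
        Document doc = kit.createDefaultDocument();
        try (InputStream in = new ByteArrayInputStream(rtf)) {
            // No encoding parameter here: the kit decides how to decode the bytes.
            kit.read(in, doc, 0);
        }
        return doc.getText(0, doc.getLength());
    }

    public static void main(String[] args) throws Exception {
        // Small made-up sample document.
        String rtf = "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Times New Roman;}}\\f0 Hello, RTF!\\par}";
        System.out.println(rtfToText(rtf.getBytes(StandardCharsets.US_ASCII)).trim());
    }
}
```

This is fine for simple ASCII documents, but it gives you no control over how the input bytes are decoded.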
I remember evaluating the dtSearch index engine two years ago. It is a very good, fast and cheap full-text indexing engine (cheap compared to other commercial engines such as Google's, which are of course in another dimension), and, what's particularly interesting, it runs on Linux and has a Java API.
Besides the index engine itself, it also contains a converter from a large number of document types to text, RTF and HTML.
The converter is also accessible through the Java API, so any document can be converted to text with a few lines of code. Both file conversion and in-memory conversion are supported. All you have to do is add dtsearchengine.jar and two DLLs to your classpath.
Here is some sample code (from memory; I don't have my development platform here):
FileConverter fc = new FileConverter();
fc.setInputFile(inFile);
// for memory, use fc.setDocBytes(byte []) instead
fc.setOutputFormat(FileConverter.it_Utf8);
fc.setOutputFile(outFile);
// for memory, use fc.setOutputToString(true) and read fc.getOutputString() after execute
fc.execute();
posted by Jean-Marc Autexier