Javability (Java, Zaurus, Linux, Live) by Jean-Marc Autexier, Saarland/Germany
cat /dev/www | egrep 'Java|Linux|Zaurus|ITnews|Live' > blog

12.8.04 21:15 Convert anything to text/RTF/HTML

Today I was looking for a Java RTF-to-text converter. The RTFEditorKit in the JDK didn't fit our needs: it is only a quick hack and, for example, does not support different encodings (as all other classes in JDK/Swing do).
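For reference, the JDK-only route looks roughly like the sketch below. It uses the standard javax.swing.text.rtf.RTFEditorKit; the class name RtfToText and the method convert are made up for illustration, and the sample RTF string is just a toy input. Note that read(InputStream, ...) gives you no way to pass an encoding, which is exactly the limitation mentioned above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.rtf.RTFEditorKit;

// Hypothetical helper (not part of the JDK or dtSearch): RTF bytes to plain text.
final class RtfToText {

    static String convert(byte[] rtfBytes) {
        RTFEditorKit kit = new RTFEditorKit();
        // The kit's default document is a styled document, which read() requires.
        Document doc = kit.createDefaultDocument();
        try (InputStream in = new ByteArrayInputStream(rtfBytes)) {
            kit.read(in, doc, 0);                   // parse the RTF into the document
            return doc.getText(0, doc.getLength()); // drop all formatting, keep the text
        } catch (IOException | BadLocationException e) {
            throw new IllegalStateException("RTF conversion failed", e);
        }
    }

    public static void main(String[] args) {
        // Toy input, only to show the call.
        byte[] sample = "{\\rtf1\\ansi Hello RTF}".getBytes(StandardCharsets.US_ASCII);
        System.out.println(convert(sample));
    }
}
```

This is fine for simple ASCII documents, but I would not rely on it for anything with non-Latin text.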

I remembered evaluating the dtSearch index engine two years ago. It is a very good, fast and cheap full-text indexing engine (compared with other commercial engines such as Google's, which of course play in another dimension). What is particularly interesting is that it runs on Linux and has a Java API.

Besides the index engine itself, it also contains a converter from a large number of document types to text, RTF and HTML.
The converter is also accessible through the Java API, so any document can be converted to text with a few lines of code. Both file conversion and in-memory conversion are supported. All you have to do is add dtsearchengine.jar and two DLLs to your classpath.

Here is some sample code (from memory, as I don't have my development platform here):

FileConverter fc = new FileConverter();
fc.setInputFile(inFile);
// for memory, use fc.setDocBytes(byte []) instead
fc.setOutputFormat(FileConverter.it_Utf8);
fc.setOutputFile(outFile);
// for memory, use fc.setOutputToString(true) and read fc.getOutputString() after execute
fc.execute();

posted by Jean-Marc Autexier | 1 comment
Comments:
^^
// for memory, use fc.setOutputToString(true) and, after execute, read:
//   String outputString = fc.getOutputString();
//   outputString.substring(0, outputString.length() - 11)
// for some reason the result looks better if you remove the FileName:\n at the end
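The hard-coded 11 in the comment above breaks as soon as the trailing text is a different length. A slightly more defensive sketch, assuming the unwanted tail really is a "FileName:" marker followed only by whitespace (I have not checked what dtSearch actually appends); OutputTrimmer and stripTrailingMarker are hypothetical names, not part of the dtSearch API:

```java
// Hypothetical helper: removes a marker from the end of the converted text.
final class OutputTrimmer {

    // Returns text without the last occurrence of marker, but only when
    // nothing except whitespace follows it; otherwise returns text unchanged.
    static String stripTrailingMarker(String text, String marker) {
        int idx = text.lastIndexOf(marker);
        if (idx >= 0 && text.substring(idx + marker.length()).trim().isEmpty()) {
            return text.substring(0, idx);
        }
        return text;
    }

    public static void main(String[] args) {
        String converted = "Some converted text\nFileName:\n"; // made-up sample output
        System.out.print(stripTrailingMarker(converted, "FileName:"));
    }
}
```

With the converter you would call it as stripTrailingMarker(fc.getOutputString(), "FileName:").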
 

This is a personal web page. Things said here do not represent the position of my employer.