17 May 2010

Ubuntu --- convert file formats


Convert .chm to .html | .pdf

Suppose you want to edit a CHM file and republish it in another format. To do so, you first need to extract the original HTML files from the CHM archive.
First, install the libchm-bin package, which contains the CHM library and its helper application extract_chmLib.

$ sudo apt-get install libchm-bin

To convert .chm files into .html files, use the following command:

$ extract_chmLib book.chm output_dir

where book.chm is the path to your .chm file and output_dir is a new directory that will be created to hold the HTML extracted from the .chm file.
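If you have several .chm files, the same command can be wrapped in a loop. The following is only a sketch: the function names chm_outdir and extract_all_chm and the "_html" suffix for the output directory are my own choices for illustration, not part of libchm-bin.

```shell
#!/bin/sh
# Derive an output directory name from a .chm path,
# e.g. docs/book.chm -> book_html (the "_html" suffix is an arbitrary choice).
chm_outdir() {
    base=$(basename "$1" .chm)
    printf '%s_html\n' "$base"
}

# Extract every .chm file found in the current directory.
extract_all_chm() {
    for f in ./*.chm; do
        [ -e "$f" ] || continue    # skip the literal pattern when nothing matches
        extract_chmLib "$f" "$(chm_outdir "$f")"
    done
}

# Only run the extraction if extract_chmLib is actually installed.
if command -v extract_chmLib >/dev/null 2>&1; then
    extract_all_chm
fi
```

Each archive ends up in its own directory next to the original file, so extracted pages from different books do not overwrite each other.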

To convert .chm files into .pdf files, we must first install htmldoc (a program for writing documentation in HTML and producing indexed HTML, PostScript, or PDF output, with tables of contents):

$ sudo apt-get install htmldoc

To use htmldoc, type the following command in a terminal:

$ htmldoc

The htmldoc graphical interface should open.

Here you must:
  1. go to the directory that contains the HTML files
  2. select all the files to include in the .pdf
  3. go to the Output tab and type the name of the PDF file
  4. go to the PDF tab to select the PDF version and the desired compression level for text and images
  5. click Generate to convert them to PDF
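
The same steps can also be done without the graphical interface, since htmldoc accepts its settings on the command line. Below is a rough sketch, assuming the HTML files all sit in one directory; the function name html_to_pdf and the DRY_RUN variable are hypothetical helpers of mine, and PDF 1.4 with compression level 9 are just example settings.

```shell
#!/bin/sh
# Convert all .html files in a directory into a single PDF.
#   $1 = directory with the extracted HTML files
#   $2 = name of the PDF file to write
# Set DRY_RUN=1 to print the htmldoc command instead of running it.
html_to_pdf() {
    src_dir=$1
    out_pdf=$2
    # --webpage: treat the input as plain web pages (no book structure)
    # -t pdf14: write PDF version 1.4
    # --compression=9: strongest compression level
    # -f: output file name
    set -- htmldoc --webpage -t pdf14 --compression=9 -f "$out_pdf" "$src_dir"/*.html
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}
```

For example, html_to_pdf output_dir book.pdf. Note that the shell expands *.html in alphabetical order, which may not match the chapter order of the original book, so you may need to list the files by hand.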

Convert .pdf to .html | .xml

The poppler-utils package (possibly installed by default in Ubuntu Lucid Lynx) contains a program called pdftohtml that converts a .pdf file to an .html file. If it is not already installed, install it:

$ sudo apt-get install poppler-utils

The command below gives you a simple HTML file without any PNG files, so you won’t be able to see any embedded graphics. It’s a great utility if you just want to extract the text from a PDF file.

$ pdftohtml file.pdf file.html

If you want to see graphics, you’ll need to use the -c (as in “complex”) option:

$ pdftohtml -c file.pdf file.html

This option produces individual HTML files, one for each page of the PDF file, with the PNG references mixed in.
The graphics in the original PDF file show up in a browser, and the text can be cut and pasted. The total size of the HTML and PNG files generated with the -c option tends to be roughly equivalent to that of the original PDF.
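
For XML output, pdftohtml has an -xml option. Here is a small sketch of how it might be used; the function name pdf_to_xml and the DRY_RUN variable are hypothetical helpers, and I am assuming pdftohtml appends the .xml extension to the output name itself, so check the result on your system.

```shell
#!/bin/sh
# Convert a PDF to XML with pdftohtml.
#   $1 = input PDF
#   $2 = output base name (optional; defaults to the PDF name without .pdf)
# Set DRY_RUN=1 to print the command instead of running it.
pdf_to_xml() {
    in_pdf=$1
    out_base=${2:-${in_pdf%.pdf}}
    set -- pdftohtml -xml "$in_pdf" "$out_base"
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}
```

Calling pdf_to_xml file.pdf should then leave a file.xml next to the original.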
More information can be found in the manual page:

$ man pdftohtml
