How to read Doc file data in Python?



Post by vampo650 » Wed Mar 20, 2013 6:36 am

Hi,

How can I read the contents of one Doc file and write them to another Doc file in Python? Not only the text, but also tables, pictures, etc.

Thanks in advance
vampo650
 

Re: How to read Doc file data in Python?

Post by setrofim » Wed Mar 20, 2013 7:27 am

You need to have the pywin32 extensions installed. If you're using ActiveState Python, they are bundled by default; otherwise you will have to install them separately. These extensions allow you to (among other things) communicate with Office applications through COM. This means that you will also need Microsoft Office (or, in your case, at least Word) installed on your system.

Once you have the requirements, you can use win32com.client to start an instance of Word and load your doc files, which you can then manipulate through the COM APIs. This tutorial will show you the basics of working with Word documents in Python, and you can refer to the MSDN documentation to see how Word documents are structured and what APIs are available.
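As a rough sketch of the COM approach described above (only usable on Windows with Word and pywin32 installed): the function names `absolute` and `copy_doc_with_word` are made up for illustration, and the Word object model calls (`Documents.Open`, `Content.InsertFile`, `SaveAs`) should be checked against the MSDN documentation for your Word version.

```python
import os


def absolute(path):
    """Word's COM API expects absolute paths, so normalise them first."""
    return os.path.abspath(path)


def copy_doc_with_word(src_path, dst_path):
    """Summarise src_path, then insert its content into a new document at dst_path."""
    # Imported here because pywin32 is only available on Windows.
    import win32com.client

    WD_FORMAT_DOCUMENT = 0  # wdFormatDocument, i.e. the binary .doc format

    word = win32com.client.Dispatch("Word.Application")
    word.Visible = False
    try:
        # Read a few things from the source document.
        source = word.Documents.Open(absolute(src_path))
        summary = {
            "text": source.Content.Text,
            "tables": source.Tables.Count,
            "inline_shapes": source.InlineShapes.Count,
        }
        source.Close(False)  # close without saving changes

        # Let Word insert the whole file (text, tables, pictures) into a new document.
        target = word.Documents.Add()
        target.Content.InsertFile(absolute(src_path))
        target.SaveAs(absolute(dst_path), WD_FORMAT_DOCUMENT)
        target.Close(False)
        return summary
    finally:
        word.Quit()
```

Letting Word do the insertion keeps tables and pictures intact without having to walk the document structure by hand; for selective copying you would work with ranges instead.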
setrofim
 

Re: How to read Doc file data in Python?

Post by vampo650 » Thu Mar 21, 2013 2:19 am

Thank you, setrofim. But I am using the Ubuntu operating system. Does Ubuntu support the Windows extensions?

How can I read the data from an ODT (OpenDocument Text) file and write it to another ODT file, including tables, pictures, etc.? Thanks in advance
vampo650
 

Re: How to read Doc file data in Python?

Post by setrofim » Thu Mar 21, 2013 4:40 am

vampo650 wrote: I am using the Ubuntu operating system. Does Ubuntu support the Windows extensions?

No, and those extensions just use COM to interact with Office programs, so you'd need Office installed as well.

vampo650 wrote: How can I read the data from an ODT (OpenDocument Text) file and write it to another ODT file, including tables, pictures, etc.? Thanks in advance

OpenOffice has its own scripting interface, which is exposed to Python through the PyUNO module. I haven't used it myself, so I can't comment on how good it is, but judging from the docs, it will let you do what you want. Since ODF files are just zip archives of XML documents, you could also extract and parse them yourself. I would recommend giving PyUNO a try first, though.
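To illustrate the "zip archive of XML documents" route, here is a minimal sketch using only the standard library. It assumes the usual ODT layout (body in `content.xml`, embedded images under `Pictures/`); the function names `summarize_odt` and `copy_odt` are made up for this example.

```python
import zipfile
import xml.etree.ElementTree as ET

# Standard OpenDocument XML namespaces.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"


def summarize_odt(path):
    """Return paragraph texts, table count, and embedded picture names of an ODT file."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("content.xml"))
        pictures = [n for n in archive.namelist() if n.startswith("Pictures/")]
    # text:p covers ordinary paragraphs, including those inside table cells
    # (headings are text:h and are not collected here).
    paragraphs = ["".join(p.itertext()) for p in root.iter("{%s}p" % TEXT_NS)]
    table_count = sum(1 for _ in root.iter("{%s}table" % TABLE_NS))
    return {"paragraphs": paragraphs, "table_count": table_count, "pictures": pictures}


def copy_odt(src_path, dst_path):
    """Copy every member of the ODT archive unchanged, preserving order and compression."""
    with zipfile.ZipFile(src_path) as zin, zipfile.ZipFile(dst_path, "w") as zout:
        for item in zin.infolist():
            zout.writestr(item, zin.read(item.filename))
```

Copying member by member gives you a place to modify `content.xml` on the way through; anything beyond simple edits (styles, layout) is where PyUNO is likely the easier option.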
setrofim
 

