Is docx to pdf conversion impeded by OOXML inconsistency?

Did you know you can Web Print a .docx file, and that SavaPage uses LibreOffice Convertor to render it to PDF?

But beware, incompatibilities and inconsistencies in the Office Open XML (OOXML) document format may lead to surprising results.
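To reproduce a rendering surprise outside SavaPage, it can help to run the same kind of conversion by hand. Below is a minimal sketch that drives LibreOffice in headless mode from Python. It assumes the `soffice` binary is on the PATH. The function names are made up for illustration, and this is not SavaPage's actual implementation.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path


def build_convert_command(source: Path, outdir: Path, soffice: str = "soffice") -> list[str]:
    """Build a headless LibreOffice command line that exports `source` to PDF."""
    return [
        soffice,
        "--headless",            # no GUI
        "--convert-to", "pdf",   # target format
        "--outdir", str(outdir),
        str(source),
    ]


def convert_to_pdf(source: Path, outdir: Path) -> Path:
    """Run the conversion and return the path where the PDF is expected."""
    if shutil.which("soffice") is None:
        raise RuntimeError("soffice not found on PATH (assumption: LibreOffice is installed)")
    outdir.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_convert_command(source, outdir), check=True, timeout=120)
    # LibreOffice names the output after the input file, with a .pdf extension.
    return outdir / (source.stem + ".pdf")
```

Comparing the PDF produced this way with what MS Office shows is a quick way to tell whether a difference comes from LibreOffice's rendering or from something later in the print chain.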

Did you encounter any surprises? Do you have a best practice for creating compatible .docx documents?

Personally, I have found that fonts have a large impact on the final rendering. Having the Windows fonts available in LibreOffice helps it render the document correctly, including when exporting to PDF. For floating elements the font can have an even larger effect, because font size influences the placement of objects.
Disclaimer: this is my experience in the UI, not with the SavaPage render implementation.

@nico, good point. Matching the fonts used in the .docx to the ones available to LibreOffice is crucial for a faithful rendering. As far as SavaPage is concerned, make sure that at least the Microsoft TrueType Core Fonts are installed on the GNU/Linux host. For more information on font mapping see this entry in the User Manual.
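A quick way to check the host is to ask fontconfig which families it knows. The sketch below assumes the `fc-list` tool is available on the GNU/Linux host. The `CORE_FONTS` list is only a sample of the core font families, not the complete set, and the helper names are hypothetical.

```python
from __future__ import annotations

import subprocess

# Sample of Microsoft TrueType core font families (assumption: extend as needed).
CORE_FONTS = ["Arial", "Courier New", "Times New Roman", "Verdana", "Georgia"]


def parse_families(fc_list_output: str) -> set[str]:
    """Parse `fc-list : family` output into a set of lower-cased family names."""
    families: set[str] = set()
    for line in fc_list_output.splitlines():
        # One line may hold several comma-separated names for the same font.
        for name in line.split(","):
            name = name.strip().lower()
            if name:
                families.add(name)
    return families


def missing_fonts(wanted: list[str], installed: set[str]) -> list[str]:
    """Return the wanted families that are not installed, in the given order."""
    return [font for font in wanted if font.lower() not in installed]


def installed_families() -> set[str]:
    """Query fontconfig for all installed font families."""
    result = subprocess.run(
        ["fc-list", ":", "family"], capture_output=True, text=True, check=True
    )
    return parse_families(result.stdout)
```

Calling `missing_fonts(CORE_FONTS, installed_families())` on the server lists the sample families that LibreOffice would have to substitute.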

When I upload an .xlsx file that contains a table, it shows fine (one page) when opened with Microsoft Office on Windows 7. After I upload it to SavaPage on the user page, four pages appear. Please help, how can I solve this problem?

Hi @xiaobai. Welcome to the community.
Could you upload the Excel sheet here (or post a link to where we can download it) so we can investigate your problem?

I uploaded it under the name test.xlsx. The link is https:|||zhgitit|printtest (posts can’t contain links, so I replaced / with |). I tried many ways but can’t solve the problem. Please help.

Hi @xiaobai, thanks for your question.
I just looked at the file to see what could be wrong with it.
To do that, I opened it in LibreOffice Calc. There it shows as a single sheet too.
But as soon as I export it as PDF, it no longer fits on an A4 page and 4 separate pages are generated.
IMO it is a matter of sizing.
Remember, SavaPage converts ALL print jobs to PDF first before printing!
Here is the converted PDF:

Hi @xiaobai, I confirm @robb's finding. As an alternative to Web Print you can try a Driver Print. That way you can print directly from MS Office to SavaPage, and tell the PostScript driver to scale the spreadsheet down, for example to a single A4 page.
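To illustrate the sizing point: a sheet that is slightly too wide and slightly too tall for the printable area spills into a 2 x 2 grid, which gives four pages. The rough sketch below uses hypothetical dimensions (not measured from test.xlsx) and an assumed 10 mm margin. Real spreadsheets break pages on row and column boundaries, so treat this as an approximation.

```python
import math

A4_WIDTH_MM = 210.0
A4_HEIGHT_MM = 297.0


def printable_area(margin_mm: float = 10.0) -> tuple:
    """Return (width, height) of the A4 area left after equal margins."""
    return A4_WIDTH_MM - 2 * margin_mm, A4_HEIGHT_MM - 2 * margin_mm


def pages_needed(content_w_mm: float, content_h_mm: float, margin_mm: float = 10.0) -> int:
    """Estimate the page count when content is tiled unscaled across A4 pages."""
    width, height = printable_area(margin_mm)
    across = math.ceil(content_w_mm / width)
    down = math.ceil(content_h_mm / height)
    return across * down


def fit_scale(content_w_mm: float, content_h_mm: float, margin_mm: float = 10.0) -> float:
    """Scale factor (at most 1.0) that shrinks the content onto one A4 page."""
    width, height = printable_area(margin_mm)
    return min(1.0, width / content_w_mm, height / content_h_mm)


# Hypothetical sheet of 250 mm x 300 mm on a 190 mm x 277 mm printable area.
print(pages_needed(250, 300))
print(round(fit_scale(250, 300), 2))
```

The scale factor is the kind of value a "fit to page" option in a printer driver applies for you.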

I have tested this in the LibreOffice GUI. When I open test.xlsx and click File -> Print, the preview on the right shows 2 pages, which is not the same as in MS Office (only one page). I think the layout has already changed before the conversion to PDF.

Hello, if I use Driver Print, SavaPage works IP based. How can I know the user’s name if they never log in to the web page?

The User ID (as owner) of the OS session is passed to the SavaPage receiving queue, along with the PostScript stream. Whether you actually trust this user as an authenticated user is a matter of how user trust is configured in your network (AD authentication?). Read the Trusted SavaPage Queue part in the User Manual. And of course, for driver printing to work, the user ID must be present in SavaPage, either by synchronization with a user source (AD) or as an internal user.

Thanks for your response. By the way, can this project be deployed on Windows? If so, what should I pay attention to?

Although the server part is Linux only, SavaPage works well in a Windows network because of its Active Directory integration. Windows clients can print to SavaPage with a generic PostScript driver, or use driverless printing like Web Print.

Thanks for your response. Can I deploy CUPS on another server?

CUPS and SavaPage server must reside on the same machine. If there are valid reasons to separate the two, we will investigate how we can implement this.

Thank you. I have another question: why do all files need to be converted to PDF after uploading? Doesn’t CUPS support printing files such as MS Office documents directly?

I tested with the CUPS command line, and only txt, PDF and image types are supported. Even though I have installed LibreOffice, MS Office types are still not supported.

That’s correct. CUPS doesn’t support direct printing of MS Office documents. It’s not CUPS’ job to be a convertor for all kinds of file formats. It supports some common formats, of which PDF and PostScript are the main ones. Reader/editor programs with a printing function are responsible for converting their documents to PDF (or PostScript) and use a printer driver (installed on the OS) to send them to the printer via CUPS.
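The division of labour described here can be sketched as a small routing check: files CUPS accepts directly go straight to `lp`, anything else must first be converted by an application such as LibreOffice. The `DIRECT_TYPES` set is an assumption based on the formats mentioned in this thread, and the queue name in the usage note is hypothetical.

```python
from __future__ import annotations

from pathlib import Path

# Extensions assumed to be printable by CUPS without prior conversion.
DIRECT_TYPES = {".pdf", ".ps", ".txt", ".png", ".jpg", ".jpeg"}


def needs_conversion(path: Path) -> bool:
    """True when the file must be converted (e.g. to PDF) before printing."""
    return path.suffix.lower() not in DIRECT_TYPES


def build_lp_command(printer: str, path: Path) -> list[str]:
    """Build an `lp` command line for a file CUPS can handle directly."""
    if needs_conversion(path):
        raise ValueError(f"{path.name} must be converted to PDF or PostScript first")
    return ["lp", "-d", printer, str(path)]
```

For example, `build_lp_command("office", Path("report.pdf"))` yields a command for a queue named `office`, while a `.docx` path raises an error instead of being handed to CUPS.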

Hello, can you give me an email address? I have a few questions I would like to ask you. Can you help me?

I can’t get an Office file to upload at all. I get "undefined" as the error. server.log shows a long dump starting with
2021-06-10 09:32:22,869 ERROR DocContentPrintProcessor:1845 - [jetty-threadpool-261]

It doesn’t matter if it’s docx, xlsx, doc, etc. Other file types are fine.
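When a dump like this is long, it can help to pull out just the ERROR header lines before posting them. The sketch below does that; the pattern is inferred from the single log line quoted above, so the real server.log layout may differ, and the function name is hypothetical.

```python
import re

# Pattern inferred from one sample line; an assumption, not a documented format.
LOG_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+)\s+"
    r"(?P<logger>[\w.$]+):(?P<line>\d+) - "
    r"\[(?P<thread>[^\]]+)\]\s*(?P<message>.*)$"
)


def error_entries(lines):
    """Yield a dict for every log header line whose level is ERROR."""
    for raw in lines:
        match = LOG_LINE.match(raw.rstrip("\n"))
        if match and match.group("level") == "ERROR":
            yield match.groupdict()
```

Typical use would be `with open("server.log") as fh: errors = list(error_entries(fh))`. Stack trace lines that follow a header do not match the pattern and are skipped.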