基于openoffice实现html、doc互换

格式：docx
大小：50.88 KB
文档页数：5

下载文档原格式

Java实现在线预览的示例代码（openOffice实现）

Java实现在线预览的⽰例代码（openOffice实现）简介之前有写了poi实现在线预览的⽂章，⾥⾯也说到了使⽤openOffice也可以做到，这⾥就详细介绍⼀下。

我的实现逻辑有两种：⼀、利⽤jodconverter(基于OpenOffice服务)将⽂件(.doc、.docx、.xls、.ppt)转化为html格式。

⼆、利⽤jodconverter(基于OpenOffice服务)将⽂件(.doc、.docx、.xls、.ppt)转化为pdf格式。

转换成html格式⼤家都能理解，这样就可以直接在浏览器上查看了，也就实现了在线预览的功能；转换成pdf格式这点，需要⽤户安装了Adobe Reader XI，这样你会发现把pdf直接拖到浏览器页⾯可以直接打开预览，这样也就实现了在线预览的功能。

将⽂件转化为html格式或者pdf格式话不多说，直接上代码。

package com.pdfPreview.util;import java.io.File;import java.io.FileInputStream;import java.io.FileOutputStream;import java.io.IOException;import java.io.InputStream;import java.io.OutputStream;import .ConnectException;import java.text.SimpleDateFormat;import java.util.Date;import com.artofsolving.jodconverter.DocumentConverter;import com.artofsolving.jodconverter.openoffice.connection.OpenOfficeConnection;import com.artofsolving.jodconverter.openoffice.connection.SocketOpenOfficeConnection;import com.artofsolving.jodconverter.openoffice.converter.OpenOfficeDocumentConverter;/*** 利⽤jodconverter(基于OpenOffice服务)将⽂件(*.doc、*.docx、*.xls、*.ppt)转化为html格式或者pdf格式，* 使⽤前请检查OpenOffice服务是否已经开启, OpenOffice进程名称：soffice.exe | soffice.bin** @author yjclsx*/public class Doc2HtmlUtil {private static Doc2HtmlUtil doc2HtmlUtil;/*** 获取Doc2HtmlUtil实例*/public static synchronized Doc2HtmlUtil getDoc2HtmlUtilInstance() {if (doc2HtmlUtil == null) {doc2HtmlUtil = new Doc2HtmlUtil();}return doc2HtmlUtil;}/*** 转换⽂件成html** @param fromFileInputStream:* @throws IOException*/public String file2Html(InputStream fromFileInputStream, String toFilePath,String type) throws IOException {Date date = new Date();SimpleDateFormat sdf = new SimpleDateFormat("yyyyMMddHHmmss");String timesuffix = sdf.format(date);String docFileName = null;String htmFileName = null;if("doc".equals(type)){docFileName = "doc_" + timesuffix + ".doc";htmFileName = "doc_" + timesuffix + ".html";}else if("docx".equals(type)){docFileName = "docx_" + timesuffix + ".docx";htmFileName = "docx_" + timesuffix + ".html";}else if("xls".equals(type)){docFileName = "xls_" + timesuffix + ".xls";htmFileName = "xls_" + timesuffix + ".html";}else if("ppt".equals(type)){docFileName = "ppt_" + timesuffix + ".ppt";htmFileName = "ppt_" + timesuffix + ".html";}else{return null;}File htmlOutputFile = new File(toFilePath + File.separatorChar + htmFileName);File docInputFile = new File(toFilePath + File.separatorChar + docFileName);if (htmlOutputFile.exists())htmlOutputFile.delete();htmlOutputFile.createNewFile();if (docInputFile.exists())docInputFile.delete();docInputFile.createNewFile();/*** 由fromFileInputStream构建输⼊⽂件*/try {OutputStream os = new FileOutputStream(docInputFile);int bytesRead = 0;byte[] buffer = new byte[1024 * 8];while ((bytesRead = fromFileInputStream.read(buffer)) != -1) {os.write(buffer, 0, bytesRead);}os.close();fromFileInputStream.close();} catch (IOException e) {}OpenOfficeConnection connection = new SocketOpenOfficeConnection(8100);try {connection.connect();} catch (ConnectException e) {System.err.println("⽂件转换出错，请检查OpenOffice服务是否启动。

PHP实现wordexcelppt转换为PDF

PHP实现wordexcelppt转换为PDF 前段时间负责公司内部⽂件平台的设计，其中有⼀个需求是要能够在线浏览⽤户上传的 office ⽂件。

我的思路是先将 office 转换成 PDF，再通过 pdf.js 插件解析 PDF ⽂件，使其能在任何浏览器下查看。

可以通过 PHP 的 COM 组件，调⽤其它能够处理 office ⽂件的应⽤程序，利⽤提供的接⼝来转换 PDF ⽂件。

OpenOfficeOpenOffice 是⼀套开源跨平台的办公软件，由许多⾃由软件⼈⼠共同来维持，让⼤家能在 Microsoft Office 之外，还能有免费的 Office 可以使⽤。

OpenOffice 与微软的办公软件套件兼容，能将 doc、xls、ppt 等⽂件转换为 PDF 格式，其功能绝对不⽐ Microsoft Office 差。

OpenOffice 需要 java ⽀持，请确认安装了 JDK，并配置了 JRE 环境变量。

1. 配置组件服务OpenOffice 安装完成之后，按 win+R 快捷键进⼊运⾏菜单，输⼊ Dcomcnfg 打开组件服务。

[组件服务] >> [计算机] >> [我的电脑] >> [DCOM配置] >> [OpenOffice Service Manager]右键打开属性⾯板，选择安全选项卡，分别在启动和激活权限和访问权限上勾选⾃定义，添加 Everyone 的权限。

↑启动和激活权限和访问权限都使⽤⾃定义配置↑添加 Everyone ⽤户组，记得确认前先检查名称↑两个⾃定义配置相同，允许 Everyone 拥有所有权限再选择标识选项卡，勾选交互式⽤户，保存设置后退出。

2. 后台运⾏软件安装完 OpenOffice 后，需要启动⼀次确认软件可以正常运⾏，然后再打开命令⾏运⾏以下命令：切换到安装⽬录：后台运⾏该软件：PS：该命令只需要执⾏⼀次，就可以使软件⼀直在后台运⾏，即使重启服务器也不受影响。

officetohtml实现方式

officetohtml实现方式1. 引言1.1 概述officetohtml是一种将Office文档（如Word、Excel或PowerPoint）转换为HTML格式的工具或技术。

它可以帮助用户将传统的Office文档转化为网页友好的格式，使其在各种平台和设备上都可以方便地访问和阅读。

1.2 文章结构本文将详细介绍officetohtml实现方式，并探讨各种可能的方法。

首先，我们会对officetohtml进行简要概述，了解其基本原理和用途。

接下来，我们将重点介绍两种常见的实现方式，并比较它们的优缺点。

最后，我们会总结文章内容并对officetohtml实现方式进行评价和展望。

1.3 目的本文旨在帮助读者更好地理解officetohtml实现方式，并为他们选择适合自己需求的方法提供指导和参考。

通过详细介绍和比较不同的实现方式，读者可以更全面地了解这项技术，并能够根据自身情况做出明智的决策。

以上是“1. 引言”部分内容，请根据需要进行修改和完善。

2. officetohtml实现方式2.1 什么是officetohtmlOfficetohtml是一种将Office文档（如Word、Excel和PowerPoint）转换为HTML格式的工具或技术。

它能够将这些文档转换为在Web浏览器中进行展示或共享的可交互的HTML页面。

2.2 实现方式一实现方式一可以使用Microsoft Office提供的内置功能来实现officetohtml转换。

以下是基本步骤：1. 打开相应的Office文档（例如Word文档）。

2. 选择“另存为”选项，并选择HTML格式作为要保存的文件类型。

3. 自定义HTML选项，以便满足特定需求，比如页面布局、样式等。

4. 单击“保存”按钮，将Office文档保存为HTML格式。

使用此方法实现officetohtml转换的优点是简单且易于操作。

同时，由于该功能是由Microsoft Office自身提供的，所以它能够确保高度兼容性和准确性。

Word文档转换为HTML帮助文档操作手册范本

Word文档转换为HTML帮助文档操作手册一、使用到的软件●DOC2CHM●Dreamweaver CS3●Help and manual 4二、操作步骤1. 先建立一个工作目录。

如hhwork。

2.将需要转换的文件复制到此工作目录下。

如果是中文文件名，最好将其改为英文文件名。

例：现在要将《小神探点检定修信息管理系统使用手册0.3.6.doc》转换为Html格式的帮助文档，首先将此文档复制到hhwork目录下并将其更名为manual36.doc。

如图1所示。

图13.打开软件DOC2CHM，然后找到manual36.doc，然后点击“Convert”按钮，如图2所示。

图24. 程序分析文档后，打开如图3所示的界面。

图35. 在图3所示的界面中选择默认的“Outline”，然后点击“Last>>”按钮，打开图4所示的界面。

图46. 在图4所示的界面中点击“Convert”按钮，程序开始将文档Manual36.doc转换为Html文档，并保存在Manual36子目录下。

7. 在子目录下的以Outline开头的文件夹下，将后缀名为jpg的文件名更改一下，目的是每个文件的名称不同。

8. 用Dreamweaver打开此目录中的所有htm文件，如图5。

图59. 在图5所示的界面中将出现在标题前的标签删除掉，然后将标题复制到标题框中。

然后将图片的更改正确。

10. 打开Help and Manual 4，如图6。

图611. 在图6所示的界面中点击“新建”按钮创建新的帮助方案。

如图7所示。

图712. 在图7所示的界面中选择“导入现有的文件从…”，然后选择“常规HTML和文本文件”，在下面的框中指定源文件夹的位置。

然后点击“下一步”。

程序打开图8所示的界面。

图813. 在上图中指定输出文件的位置，可以采用默认位置。

然后点击“下一步”打开图9所示的界面。

图914. 在图9所示的界面中将不需要的文件移除，然后点击“下一步”打开图10所示的界面。

java调用openoffice将office系列文档转换为PDF的示例方法

java调⽤openoffice将office系列⽂档转换为PDF的⽰例⽅法前导：发过程中经常会使⽤java将office系列⽂档转换为PDF，⼀般都使⽤微软提供的openoffice+jodconverter 实现转换⽂档。

openoffice既有windows版本也有linux版。

不⽤担⼼⽣产环境是linux系统。

1、openoffice依赖jar，以maven为例：<dependency><groupId>com.artofsolving</groupId><artifactId>jodconverter</artifactId><version>2.2.1</version></dependency><dependency><groupId>org.openoffice</groupId><artifactId>jurt</artifactId><version>3.0.1</version></dependency><dependency><groupId>org.openoffice</groupId><artifactId>ridl</artifactId><version>3.0.1</version></dependency><dependency><groupId>org.openoffice</groupId><artifactId>juh</artifactId><version>3.0.1</version></dependency><dependency><groupId>org.openoffice</groupId><artifactId>unoil</artifactId><version>3.0.1</version></dependency><dependency><groupId>org.slf4j</groupId><artifactId>slf4j-jdk14</artifactId><version>1.4.3</version></dependency>2、直接上转换代码，需要监听openoffice应⽤程序8100端⼝即可。

jodconverterlocalproperties 说明

jodconverterlocalproperties 说明
jodconverter是一个Java 库，用于将Office 文档（如Microsoft Word、Excel 和PowerPoint）转换为其他格式，如PDF、HTML、纯文本等。

它使用LibreOffice 或OpenOffice 的后台进程来执行转换。

jodconverterlocalproperties可能是指在使用jodconverter时需要配置的一些本地
属性。

这些属性可能包括：
1.Office Home Location: 指定LibreOffice 或OpenOffice 的安装路径。

这
是jodconverter需要知道以便能够启动后台进程并执行文档转换的位置。

2.Port Number: jodconverter启动LibreOffice 或OpenOffice 的后台进程时可能使
用的端口号。

3.Timeout Settings: 指定转换操作的超时时间，以避免转换操作无限制地挂起。

4.Logging Configuration: 配置日志记录，以便跟踪和调试转换过程中的任何问题。

5.Other System Properties: 可能还包括其他与LibreOffice 或OpenOffice 后台进程
交互所需的系统属性。

这些属性通常在配置文件或环境变量中设置，以便jodconverter在执行转换时使用。

具体的属性和配置方法可能因jodconverter的版本和使用的Office 套件的不同而有所变化。

【好文翻译】一步一步教你使用Spire.Doc转换Word文档格式

【好⽂翻译】⼀步⼀步教你使⽤Spire.Doc转换Word⽂档格式背景：本⽂试图证明和审查的格式转换能⼒。

很长的⼀段时间⾥，为了操作⽂档，开发⼈员不得不在服务器上安装Office软件。

⾸先，这是⼀个很糟糕的设计和实践。

第⼆，微软从没打算把Office作为⼀个服务器组件，它也⽤来在服务器端解释和操作⽂档的。

于是乎，产⽣了类似Spire.Doc这样的类库。

当我们讨论这个问题时，值得⼀提的是Office Open Xml. Office Open XML (也有⾮正式地称呼为 OOXML 或OpenXML) 是⼀种压缩的, 基于XML的⽂件格式，由微软开发，⽤于表现电⼦表格，展⽰图表，演⽰和⽂字处理等。

在2005年11⽉，微软宣布作为ECMA国际主要合作伙伴，将其开发的基于XML的⽂件格式标准化，称之为"Office Open XML"。

Open XML的引进使office⽂档结构更加标准化，并且开发⼈员使⽤ Open XML SDK可以直接进⾏很多简单的操作，但是仍然有很多差距，如将word⽂档转换成其他格式，⽐如PDF，图像，或者HTML等。

这就是Spire.Doc 来拯救开发⼈员的原因。

⽂档转换：我将在⽂章的其余部分来介绍Spire.Doc可以适⽤的多种场景。

⽂中展⽰的所有例⼦均可以在 Spire.Doc 的DEMO中找到，你可以很容易地下载并使⽤它们。

我的例⼦是⼀个简单的控制台程序，当然它也⽀持其他平台，如web项⽬或者Silverlight项⽬等。

⽤他们⾃⼰的话来说，Spire.Doc 宣称："Spire.Doc for .NET 可以将word⽂件转换成最常见和流⾏的格式。

"为了开始使⽤Spire.Doc，你⾸先需要添加Spire.Doc,Spire.License 和Spire.Pdf引⽤到你的项⽬中，这两个组件是打包在Spire.Doc中的.你需要⼀个有效的Spire.Doc授权⽂件才能使⽤这个类库，否则它将在⽂档中显⽰"评估版本"警告。

Java实现word文档在线预览，读取office（word,excel,ppt）文件

Java实现word⽂档在线预览，读取office（word,excel,ppt）⽂件想要实现word或者其他office⽂件的在线预览，⼤部分都是⽤的两种⽅式，⼀种是使⽤openoffice转换之后再通过其他插件预览，还有⼀种⽅式就是通过POI读取内容然后预览。

⼀、使⽤openoffice⽅式实现word预览主要思路是：1.通过第三⽅⼯具openoffice，将word、excel、ppt、txt等⽂件转换为pdf⽂件2.通过swfTools将pdf⽂件转换成swf格式的⽂件3.通过FlexPaper⽂档组件在页⾯上进⾏展⽰我使⽤的⼯具版本：openof：3.4.1swfTools：1007FlexPaper：这个关系不⼤，我随便下的⼀个。

推荐使⽤1.5.1JODConverter：需要jar包，如果是maven管理直接引⽤就可以操作步骤：1.office准备下载openoffice：从过往⽂件，其他语⾔中找到中⽂版3.4.1的版本下载后，解压缩，安装然后找到安装⽬录下的program ⽂件夹在⽬录下运⾏soffice -headless -accept="socket,host=127.0.0.1,port=8100;urp;" -nofirststartwizard如果运⾏失败，可能会有提⽰，那就加上 .\ 在运⾏试⼀下这样openoffice的服务就开启了。

2.将flexpaper⽂件中的js⽂件夹(包含了flexpaper_flash_debug.js，flexpaper_flash.js,jquery.js,这三个js⽂件主要是预览swf⽂件的插件)拷贝⾄⽹站根⽬录;将FlexPaperViewer.swf拷贝⾄⽹站根⽬录下(该⽂件主要是⽤在⽹页中播放swf⽂件的播放器)项⽬结构：页⾯代码：fileUpload.jsp<%@ page language="java" contentType="text/html; charset=UTF-8"pageEncoding="UTF-8"%><!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "/TR/html4/loose.dtd"><html><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><title>⽂档在线预览系统</title><style>body {margin-top:100px;background:#fff;font-family: Verdana, Tahoma;}a {color:#CE4614;}#msg-box {color: #CE4614; font-size:0.9em;text-align:center;}#msg-box .logo {border-bottom:5px solid #ECE5D9;margin-bottom:20px;padding-bottom:10px;}#msg-box .title {font-size:1.4em;font-weight:bold;margin:0 0 30px 0;}#msg-box .nav {margin-top:20px;}</style></head><body><div id="msg-box"><form name="form1" method="post" enctype="multipart/form-data" action="docUploadConvertAction.jsp"><div class="title">请上传要处理的⽂件，过程可能需要⼏分钟，请稍候⽚刻。

openoffice教程

Openoffice入门教程首先安装Apache_OpenOffice_4.0.1_Win_x86_install_zh-CN.exe或者Apache_OpenOffice_3.2.0_LinuxX86-64_install_wJRE_zh-CN.tar.gzLinux 说明1、复制Apache_OpenOffice_3.2.0_LinuxX86-64_install_wJRE_zh-CN.tar.gz到/opt目录安装，安装命令如下：tar -xzvf Apache_OpenOffice_3.2.0_LinuxX86-64_install_wJRE_zh-CN.tar.gzcd OOO320_m12_native_packed-1_zh-CN.9483/RPMSrpm -ivh *.rpm2、在/etc/rc.d/rc.local 中加入如下内容（假设openoffice安装在/opt/3）：/opt/3/program/soffice"-accept=socket,host=localhost,port=8100;urp;StarOffice.ServiceManager" -nologo -headless -nofirststartwizard &3、在shell命令窗口执行命令：nohup /opt/3/program/soffice -accept="socket,host=localhost,port=8100;urp;StarOffice.ServiceManager" -nologo -headless -nofirststartwizard &4、查看启动命令ps -ef | grep openoffice6、重启oaWindow说明echo "用以下命令启动OpenOffice服务"进入目录cd C:\Program Files (x86)\OpenOffice 4\program输入启动服务soffice -headless -accept="socket,host=127.0.0.1,port=8100;urp;" -nofirststartwizardJar包说明可以直接在当前word文档下载jar包例子1commons-io-1.4.jar jodconverter-2.2.0.jar jodconverter-cli-2.2.0.jar juh-2.2.0.jar jurt-2.2.0.jar ridl-2.2.0.jar slf4j-api-1.4.0.jar slf4j-jdk14-1.4.0.jar unoil-2.2.0.jar xstream-1.2.2.jar例子2需要多家两个jar包jooconverter-2.0rc2.jarridl-2.0.jar。

odconv应用实例

odconv应用实例
odconv是一个用于将OpenDocument格式转换为其他格式的命令行工具。

它可以将ODF文档（.odt，.ods，.odp等）转换为其他常见的文档格式，比如PDF，HTML，纯文本等。

下面我将从几个方面来介绍odconv的应用实例。

1. 将ODF文档转换为PDF格式：
通过使用odconv命令行工具，你可以将一个OpenDocument 格式的文档（比如.odt文件）转换为PDF格式。

这在需要与他人共享文档，或者需要打印文档时非常有用。

你只需要运行odconv命令并指定输入文件和输出文件的路径即可完成转换。

2. 将ODF文档转换为HTML格式：
另一个常见的应用实例是将ODF文档转换为HTML格式，这样可以在网页上轻松地展示文档内容，或者在网页中嵌入文档。

odconv可以帮助你将.odt或.ods文件转换为HTML格式，使得文档内容可以在网页上方便地展示。

3. 将ODF文档转换为纯文本格式：
有时候我们可能需要将文档的内容提取为纯文本格式，以便进行文本分析或者其他处理。

odconv可以帮助你将ODF文档转换为纯文本格式，方便进行后续的处理和分析。

4. 批量转换：
odconv还支持批量转换功能，你可以指定一个文件夹，让odconv批量处理其中的所有ODF文档，将它们转换为指定的格式，这在处理大量文档时非常方便。

总之，odconv是一个非常实用的工具，可以帮助你将OpenDocument格式的文档转换为其他常见格式，方便文档的分享、展示和处理。

希望以上介绍能够帮助你更好地了解odconv的应用实例。

Linux下openoffice转换word文档到pdf文档时中文乱码问题

Linux下openoffice转换word文档到pdf文档时中文乱码问题报错显示：INFO: connectedJun 1, 2009 11:21:52 AMcom.artofsolving.jodconverter.openoffice.connection.AbstractOpenOffic eConnection disposingINFO: disconnectedException in thread "main"com.artofsolving.jodconverter.openoffice.connection.OpenOfficeExcepti on: conversion failed: could not load input documentatcom.artofsolving.jodconverter.openoffice.converter.OpenOfficeDocument Converter.loadAndExport(OpenOfficeDocumentConverter.java:131)atcom.artofsolving.jodconverter.openoffice.converter.OpenOfficeDocument Converter.convertInternal(OpenOfficeDocumentConverter.java:120)atcom.artofsolving.jodconverter.openoffice.converter.AbstractOpenOffice DocumentConverter.convert(AbstractOpenOfficeDocumentConverter.java:10 4)atcom.artofsolving.jodconverter.openoffice.converter.AbstractOpenOffice DocumentConverter.convert(AbstractOpenOfficeDocumentConverter.java:74) atcom.artofsolving.jodconverter.openoffice.converter.AbstractOpenOffice DocumentConverter.convert(AbstractOpenOfficeDocumentConverter.java:70) atcom.artofsolving.jodconverter.cli.ConvertDocument.convertOne(ConvertD ocument.java:154)atcom.artofsolving.jodconverter.cli.ConvertDocument.main(ConvertDocument.java:139)问题解决：此时可能是linux下的jre没有相应的中文字体的问题下载 simhei.ttf 黑体simsun.ttc 宋体两种字体文件找到jre的字体路径：/usr/jdk1.6.0_22/jre/lib/fonts新建文件夹fallback：mkdir fallback将字体simhei.ttf 、simsun.ttc拷贝到/usr/jdk1.6.0_22/jre/lib/fonts/fallback目录下重启openofficeps ax|grep soffice显示如下：22739 pts/5 S 0:00 /bin/sh/opt/3/program/soffice -headless-accept=socket,host=127.0.0.1,port=8100;urp; -nofirststartwizard22747 pts/5 Sl 0:01/opt/3/program/soffice.bin -headless-accept=socket,host=127.0.0.1,port=8100;urp; -nofirststartwizard23789 pts/5 S+ 0:00 grep soffice关闭soffice进程：kill 22739以后台启动openoffice：/opt/3/program/soffice -headless-accept=socket,host=127.0.0.1,port=8100;urp; -nofirststartwizard & 问题解决了！！但是，这种情况下只能解决，宋体和黑体的乱码问题，其他字体的还需添加字体文件来解决。

C#实现HTML转WORD及WORD转PDF的方法

C#实现HTML转WORD及WORD转PDF的⽅法本⽂实例讲述了C#实现HTML转WORD及WORD转PDF的⽅法。

分享给⼤家供⼤家参考。

具体如下：功能：实现HTML转WORD，WORD转PDF具体代码如下：using System;using System.Collections.Generic;using ponentModel;using System.Data;using System.Drawing;using System.Text;using System.Windows.Forms;using Word = Microsoft.Office.Interop.Word;using oWord = Microsoft.Office.Interop.Word;using System.Reflection;using System.Configuration;using System.Web;using System.Web.Security;using System.Web.UI;using System.Web.UI.WebControls;using System.Web.UI.WebControls.WebParts;using System.Web.UI.HtmlControls;using Microsoft.Office.Core;using System.Text.RegularExpressions;namespace WindowsApplication2{public partial class Form1 : Form{public Form1(){InitializeComponent();}private void button1_Click(object sender, EventArgs e){object oMissing = System.Reflection.Missing.Value;object oEndOfDoc = "\\endofdoc"; /* \endofdoc is a predefined bookmark *///Start Word and create a new document.Word._Application oWord;Word._Document oDoc;oWord = new Word.Application();oWord.Visible = true;oDoc = oWord.Documents.Add(ref oMissing, ref oMissing,ref oMissing, ref oMissing);//Insert a paragraph at the beginning of the document.Word.Paragraph oPara1;oPara1 = oDoc.Content.Paragraphs.Add(ref oMissing);oPara1.Range.Text = "Heading 1";oPara1.Range.Font.Bold = 1;oPara1.Format.SpaceAfter = 24; //24 pt spacing after paragraph.oPara1.Range.InsertParagraphAfter();//Insert a paragraph at the end of the document.Word.Paragraph oPara2;object oRng = oDoc.Bookmarks.get_Item(ref oEndOfDoc).Range;oPara2 = oDoc.Content.Paragraphs.Add(ref oRng);oPara2.Range.Text = "Heading 2";oPara2.Format.SpaceAfter = 6;oPara2.Range.InsertParagraphAfter();//Insert another paragraph.Word.Paragraph oPara3;oRng = oDoc.Bookmarks.get_Item(ref oEndOfDoc).Range;oPara3 = oDoc.Content.Paragraphs.Add(ref oRng);oPara3.Range.Text = "This is a sentence of normal text. Now here is a table:";oPara3.Range.Font.Bold = 0;oPara3.Format.SpaceAfter = 24;oPara3.Range.InsertParagraphAfter();//Insert a 3 x 5 table, fill it with data, and make the first row//bold and italic.Word.Table oTable;Word.Range wrdRng = oDoc.Bookmarks.get_Item(ref oEndOfDoc).Range;oTable = oDoc.Tables.Add(wrdRng, 3, 5, ref oMissing, ref oMissing);oTable.Range.ParagraphFormat.SpaceAfter = 6;int r, c;string strText;for (r = 1; r <= 3; r++)for (c = 1; c <= 5; c++){strText = "r" + r + "c" + c;oTable.Cell(r, c).Range.Text = strText;}oTable.Rows[1].Range.Font.Bold = 1;oTable.Rows[1].Range.Font.Italic = 1;//Add some text after the table.Word.Paragraph oPara4;oRng = oDoc.Bookmarks.get_Item(ref oEndOfDoc).Range;oPara4 = oDoc.Content.Paragraphs.Add(ref oRng);oPara4.Range.InsertParagraphBefore();oPara4.Range.Text = "And here's another table:";oPara4.Format.SpaceAfter = 24;oPara4.Range.InsertParagraphAfter();//Insert a 5 x 2 table, fill it with data, and change the column widths.wrdRng = oDoc.Bookmarks.get_Item(ref oEndOfDoc).Range;oTable = oDoc.Tables.Add(wrdRng, 5, 2, ref oMissing, ref oMissing);oTable.Range.ParagraphFormat.SpaceAfter = 6;for (r = 1; r <= 5; r++)for (c = 1; c <= 2; c++){strText = "r" + r + "c" + c;oTable.Cell(r, c).Range.Text = strText;}oTable.Columns[1].Width = oWord.InchesToPoints(2); //Change width of columns 1 & 2oTable.Columns[2].Width = oWord.InchesToPoints(3);//Keep inserting text. When you get to 7 inches from top of the//document, insert a hard page break.object oPos;double dPos = oWord.InchesToPoints(7);oDoc.Bookmarks.get_Item(ref oEndOfDoc).Range.InsertParagraphAfter();do{wrdRng = oDoc.Bookmarks.get_Item(ref oEndOfDoc).Range;wrdRng.ParagraphFormat.SpaceAfter = 6;wrdRng.InsertAfter("A line of text");wrdRng.InsertParagraphAfter();oPos = wrdRng.get_Information(Word.WdInformation.wdVerticalPositionRelativeToPage);}while (dPos >= Convert.ToDouble(oPos));object oCollapseEnd = Word.WdCollapseDirection.wdCollapseEnd;object oPageBreak = Word.WdBreakType.wdPageBreak;wrdRng.Collapse(ref oCollapseEnd);wrdRng.InsertBreak(ref oPageBreak);wrdRng.Collapse(ref oCollapseEnd);wrdRng.InsertAfter("We're now on page 2. Here's my chart:");wrdRng.InsertParagraphAfter();//Insert a chart.Word.InlineShape oShape;object oClassType = "MSGraph.Chart.8";wrdRng = oDoc.Bookmarks.get_Item(ref oEndOfDoc).Range;oShape = wrdRng.InlineShapes.AddOLEObject(ref oClassType, ref oMissing,ref oMissing, ref oMissing, ref oMissing,ref oMissing, ref oMissing, ref oMissing);//Demonstrate use of late bound oChart and oChartApp objects to//manipulate the chart object with MSGraph.object oChart;object oChartApp;oChart = oShape.OLEFormat.Object;oChartApp = oChart.GetType().InvokeMember("Application",BindingFlags.GetProperty, null, oChart, null);//Change the chart type to Line.object[] Parameters = new Object[1];Parameters[0] = 4; //xlLine = 4oChart.GetType().InvokeMember("ChartType", BindingFlags.SetProperty,null, oChart, Parameters);//Update the chart image and quit MSGraph.oChartApp.GetType().InvokeMember("Update",BindingFlags.InvokeMethod, null, oChartApp, null);oChartApp.GetType().InvokeMember("Quit",BindingFlags.InvokeMethod, null, oChartApp, null);//... If desired, you can proceed from here using the Microsoft Graph//Object model on the oChart and oChartApp objects to make additional//changes to the chart.//Set the width of the chart.oShape.Width = oWord.InchesToPoints(6.25f);oShape.Height = oWord.InchesToPoints(3.57f);//Add text after the chart.wrdRng = oDoc.Bookmarks.get_Item(ref oEndOfDoc).Range;wrdRng.InsertParagraphAfter();wrdRng.InsertAfter("THE END.");//Close this form.this.Close();}private void button2_Click(object sender, EventArgs e){string s = "";if (openFileDialog1.ShowDialog() == DialogResult.OK){s = openFileDialog1.FileName;}else{return;}// 在此处放置⽤户代码以初始化页⾯Word.ApplicationClass word = new Word.ApplicationClass();Type wordType = word.GetType();Word.Documents docs = word.Documents;// 打开⽂件Type docsType = docs.GetType();object fileName = s;Word.Document doc = (Word.Document)docsType.InvokeMember("Open",System.Reflection.BindingFlags.InvokeMethod, null, docs, new Object[] { fileName, false, false }); // 转换格式，另存为Type docType = doc.GetType();object saveFileName = "d:\\Reports\\aaa.doc";//下⾯是Microsoft Word 9 Object Library的写法，如果是10，可能写成：/*docType.InvokeMember("SaveAs", System.Reflection.BindingFlags.InvokeMethod,null, doc, new object[]{saveFileName, Word.WdSaveFormat.wdFormatFilteredHTML});*////其它格式：///wdFormatHTML///wdFormatDocument///wdFormatDOSText///wdFormatDOSTextLineBreaks///wdFormatEncodedText///wdFormatRTF///wdFormatTemplate///wdFormatText///wdFormatTextLineBreaks///wdFormatUnicodeTextdocType.InvokeMember("SaveAs", System.Reflection.BindingFlags.InvokeMethod,null, doc, new object[] { saveFileName, Word.WdSaveFormat.wdFormatDocument });// 退出 WordwordType.InvokeMember("Quit", System.Reflection.BindingFlags.InvokeMethod,null, word, null);}private void WordConvert(string s){oWord.ApplicationClass word = new Microsoft.Office.Interop.Word.ApplicationClass();Type wordType = word.GetType();//打开WORD⽂档/*对应脚本中的var word = new ActiveXObject("Word.Application");var doc = word.Documents.Open(docfile);*/oWord.Documents docs = word.Documents;Type docsType = docs.GetType();object objDocName =s;oWord.Document doc = (oWord.Document)docsType.InvokeMember("Open", System.Reflection.BindingFlags.InvokeMethod, null, docs, new Object[] { objDocName, true, true });//打印输出到指定⽂件//你可以使⽤ doc.PrintOut();⽅法,次⽅法调⽤中的参数设置较繁琐,建议使⽤ Type.InvokeMember 来调⽤时可以不⽤将PrintOut的参数设置全,只设置4个主要参数Type docType = doc.GetType();object printFileName = @"c:\aaa.ps";docType.InvokeMember("PrintOut", System.Reflection.BindingFlags.InvokeMethod, null, doc, new object[] { false, false, oWord.WdPrintOutRange.wdPrintAllDocument, printFileName });//new object[]{false,false,oWord.WdPrintOutRange.wdPrintAllDocument,printFileName}//对应脚本中的word.PrintOut(false, false, 0, psfile);的参数//退出WORD//对应脚本中的word.Quit();wordType.InvokeMember("Quit", System.Reflection.BindingFlags.InvokeMethod, null, word, null);object o1 = "c:\\aaa.ps";object o2 = "c:\\aaa.pdf";object o3 = "";//引⽤将PS转换成PDF的对象//try catch之间对应的是脚本中的 PDF.FileToPDF(psfile,pdffile,""); //你可以使⽤ pdfConvert.FileToPDF("c:\\test.ps","c:\\test.pdf","");这样的转换⽅法,本⼈只是为了保持与WORD相同的调⽤⽅式 try{ACRODISTXLib.PdfDistillerClass pdf = new ACRODISTXLib.PdfDistillerClass();Type pdfType = pdf.GetType();pdfType.InvokeMember("FileToPDF", System.Reflection.BindingFlags.InvokeMethod, null, pdf, new object[] { o1, o2, o3 });pdf = null;}catch { } //读者⾃⼰补写错误处理//为防⽌本⽅法调⽤多次时发⽣错误,必须停⽌acrodist.exe进程foreach (System.Diagnostics .Process proc in System.Diagnostics.Process.GetProcesses()){int begpos;int endpos;string sProcName = proc.ToString();begpos = sProcName.IndexOf("(") + 1;endpos = sProcName.IndexOf(")");sProcName = sProcName.Substring(begpos, endpos - begpos);if (sProcName.ToLower().CompareTo("acrodist") == 0){try{proc.Kill(); //停⽌进程}catch { } //读者⾃⼰补写错误处理break;}}}private void button3_Click(object sender, EventArgs e){if (openFileDialog1.ShowDialog() == DialogResult.OK){string s = openFileDialog1.FileName;WordConvert(s);}}//getnextcodeprivate void button4_Click(object sender, EventArgs e){WorkCell myWorkCell = new WorkCell(textBox2.Text,textBox1.Text);textBox3.Text = myWorkCell.GetNextCode();}}public class WorkCell{private string workCellCode;private string parentCellCode;private string commonCode;private char[] code;private char[] pCode;private char[] standCode;private string s;public WorkCell( string mycode,string parentcode){workCellCode = mycode;parentCellCode = parentcode;standCode = new char[] { '1', '2', '3', '4', '5', '6', '7', '8', '9', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'W', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z' }; commonCode = Regex.Replace(parentCellCode,@"0+","");code = workCellCode.Substring(commonCode.Length).ToCharArray();}public string WorkCellCode{set{workCellCode = value;}get{return workCellCode;}}public string ParentCellCode{set{workCellCode = value;}get{return workCellCode;}}public string GetNextCode(){string s="";if (code.Length > 0){int i = 0;for (i = code.Length - 1; i >= 0; i--){if (code[i] != '0'){GetNextChar(i);break;}}for(i=0;i<code.Length;i++){s+=code[i].ToString();}return commonCode + s;}else{return "null";}}//设置code中的下⼀个代码，从右边起，找到第⼀个⾮0字符，将其按标准代码⾃加1，溢出则进位private char GetNextChar(int j){int i = -1;int flag = 0;for (i = 0; i < standCode.Length; i++){if (code[j] == standCode[i]){flag = 1;break;}}//MessageBox.Show(code[j].ToString()+" "+standCode[i].ToString()+" "+i.ToString());if (i >= standCode.Length-1 || flag==0){code[j] = standCode[0];if (j > 0)code[j - 1] = GetNextChar(j - 1);}else{code[j] = standCode[i + 1];}return code[j];}}}希望本⽂所述对⼤家的C#程序设计有所帮助。

批量转换HTML文件转换成WORD文档

批量转换HTML文件转换成WORD文档批量转换HTML文件转换成WORD文档如果一个一个转换将会是一项浩大的工程，其实，利用Word 2003的“转换向导”就可以实现文档格式的批量转换。

step1.启动“转换向导”。

运行Word 2003，单击“文件”→“新建”，打开“新建文档”任务窗格，单击“本机上的模板…”（右侧导航栏），弹出“模板”对话框，切换到“其他文档”标签卡，双击列表中的“转换向导”，启动“转换向导”。

step2.选择转换格式。

单击“下一步”，设置转换格式，这里有两个转换选项，一个是将其他格式的文档转换为Word文档格式，另一个是将Word文档格式转换为其他文件格式，大家可以根据操作过程中实际需要进行选择，如果需要将Word文档格式转换为纯文本文件格式，这里就应该选择“从Word文档格式转换为其他文件格式”，并在下拉列表中选中“纯文本”。

step3.选择转换文件。

接下来向导要求你选择源文件夹和目标文件夹，点击“源文件夹”右侧的“浏览”按钮，找到需要转换的文档所在的文件夹；单击“目标文件夹”右侧的“浏览”按钮，为转换后的文件指定一个文件夹。

单击“下一步”，双击“可用文件”列表中需要转换的DOC文件，将文件添加到“转换文件”列表中（单击“全选”按钮可以将所有文件一次性添加到列表中）。

step4.完成文件转换。

将需要转换的文件全部添加列表中后，单击“完成”按钮，系统开始转换文件，转换时会显示进度、已转换文件数量，并将转换后的文件以设定的格式自动存入指定的文件夹中。

Word 2003的“转换向导”不仅可以把DOC格式的文档转换成TXT格式的文本文件，只要在“选择转换格式”窗口中进行设置，还可以把HTML文档、Outlook 通讯簿、RTF格式文档、WPS文件、文本文档等转换为DOC文档，或者把Word 文档转换成HTML文档、MS DOS文本、RTF等格式的文档。

JAVA：借用OpenOffice将上传的Word文档转换成Html格式

本文由我司收集整编，推荐下载，如有疑问，请与我司联系JAVA：借用OpenOffice 将上传的Word 文档转换成Html 格式2013/09/24 0 为什么会想起来将上传的word 文档转换成html 式呢？设想，如果一个系统需要发布在页面的文章都是来自word 文档，一般会执行下面的流程：使用word 打开文档，Ctrl A，进入发布文章页面，Ctrl V。

看起来也不麻烦，但是，如果文档中包含大量图片呢？尴尬的事是图片都需要重新上传吧？如果可以将已经编写好的word 文档上传到服务器就可以在相应页面进行展示，将会是一件非常惬意的事情，最起码信息发布人员会很开心。

程序员可能就不会这么想了，囧。

将Word 转Html 的原理是这样的：1、客户上传Word 文档到服务器2、服务器调用OpenOffice 程序打开上传的Word 文档3、OpenOffice 将Word 文档另存为Html 式4、Over至此可见，这要求服务器端安装OpenOffice 软件，其实也可以是MS Office，不过OpenOffice 的优势是跨平台，你懂的。

恩，说明一下，本文的测试基于MS Win7 Ultimate X64 系统。

下面就是规规矩矩的实现。

1、下载OpenOffice，download.openoffice/index.htmlSo easy...2、下载Jodconverter artofsolving/opensource/jodconverter 这是一个开启OpenOffice进行式转化的第三方jar 包。

3、泡杯热茶，等待下载。

4、安装OpenOffice，安装结束后，调用cmd，启动OpenOffice 的一项服务：C:\Program Files (x86)\OpenOffice 3\program soffice -headless -accept=socket,port=8100;urp;5、打开eclipse6、喝杯热茶，等待eclipse 打开。

htmldocx.asblob原理

htmldocx.asblob原理
htmldocx.asblob是一种将HTML文档转换为Word文档的方法，可以将HTML内容以Word文档的形式进行导出和保存。

其
原理是将HTML代码转换为Office Open XML格式的文档，
其中使用到了相关的HTML和CSS解析器和处理器。

具体原理如下：
1. 传入HTML字符串：将HTML字符串作为参数传入htmldocx.asblob方法中。

2. 创建文档对象：使用相关的HTML解析器将HTML字符串
解析为一个DOM树结构，然后创建一个Word文档对象。

3. 样式处理：将HTML中的CSS样式转换为Word文档中的
相应样式，例如字体、字号、颜色、对齐方式等。

4. 内容处理：解析HTML中的标签，并根据每个标签的语义
和属性生成相应的Word文档内容，包括文字、图片、表格、
超链接等。

5. 导出为二进制格式：将生成的Word文档对象转换为二进制
格式，即Office Open XML格式，以便于保存为.docx文件或
进行其他处理。

总之，htmldocx.asblob方法通过解析HTML代码，将其转换
为Word文档对象，并最终导出为二进制格式的Word文档，实现了将HTML内容转换为Word文档的功能。

java平台,使用openoffice将word转换为html

java环境下将word转换为html目前没有很简单的方法。

使用openOffice实现应该算是“矬子里面拔大个”。

1，首先下载openOffice。

这是个第三方开源的项目，专门用于在java环境中进行类似word 文档编写（要是连个word编辑都做不出来，那java在外行心目中地位就蹭蹭地下去了）。

我下载的是 3.2版本。

2，下载后安装。

通过cmd进入“安装目录\ 3\program”文件夹下。

运行一下命令soffice -headless -accept="socket,host=127.0.0.1,port=8100;urp;" -nofirststartwizard意思是启动openoffice的一个服务，以备为其他程序使用（看看咱们的开源领袖多大方，不像微软那么小气，生怕自己的用）。

3，测试一下8100端口是否能使用。

cmd命令“telnet localhost 8100”，如果开启了，就会有黑的不能再黑的屏幕显现，如果没开启，就会出现连接不上的消息。

4，下载jodconverter项目，我下的是2.2.2版本。

（咱就不重复制造轮子了，直接就上车吧！）5，自己创建项目，把jodconverter文件夹lib中的所有jar包都引用一下。

然后写下以下代码public static void main(String args[]) {File inputFile = new File("D:\\test\\广告测试.doc");File outputFile = new File("D:\\test\\广告测试.html");OpenOfficeConnection connection = new SocketOpenOfficeConnection(8100);try{connection.connect();}catch(Exception e){e.printStackTrace();}DocumentConverter converter = new OpenOfficeDocumentConverter(connection);converter.convert(inputFile, outputFile);connection.disconnect();}然后就运行一下，应该没什么问题。

PHP调用OpenOffice实现word转PDF的方法

PHP调⽤OpenOffice实现word转PDF的⽅法最近⼀直在研究PHP word⽂档转PDF，也在⽹上搜索了很多类似的资料，⼤多数都是通过OpenOffice进⾏转换的。

核⼼的代码如下：function MakePropertyValue($name,$value,$osm){$oStruct = $osm->Bridge_GetStruct("com.sun.star.beans.PropertyValue");$oStruct->Name = $name;$oStruct->Value = $value;return $oStruct;}function word2pdf($doc_url, $output_url){$osm = new COM("com.sun.star.ServiceManager") or die ("Please be sure that is installed.n");$args = array(MakePropertyValue("Hidden",true,$osm));$oDesktop = $osm->createInstance("com.sun.star.frame.Desktop");$oWriterDoc = $oDesktop->loadComponentFromURL($doc_url,"_blank", 0, $args);$export_args = array(MakePropertyValue("FilterName","writer_pdf_Export",$osm));$oWriterDoc->storeToURL($output_url,$export_args);$oWriterDoc->close(true);}$doc_file=dirname(__FILE__)."/11.doc"; //源⽂件，DOC或者WPS都可以$output_file=dirname(__FILE__)."/11.pdf"; //欲转PDF的⽂件名$doc_file = "file:///" . $doc_file;$output_file = "file:///" . $output_file;$document->word2pdf($doc_file,$output_file);⽤上述发现代码⼀直在报错( ! ) Fatal error: Uncaught exception 'com_exception' with message '<b>Source:</b> [automation bridge] <br/><b>Description:</b> com.sun.star.task.ErrorCodeIOException: ' in I:\phpStudy\WWW\DocPreview\test2.php on line 27( ! ) com_exception: <b>Source:</b> [automation bridge] <br/><b>Description:</b> com.sun.star.task.ErrorCodeIOException: in I:\phpStudy\WWW\DocPreview\test2.php on line 27最后发现原来是转出路径的问题：通过调试得出上述代码的转出路径$output_file 是file:///I:\phpStudy\WWW\DocPreview\sdds.pdf。

1、下载文档前请自行甄别文档内容的完整性，平台不提供额外的编辑、内容补充、找答案等附加服务。
2、"仅部分预览"的文档,不可在线预览部分如存在完整性等问题,可反馈申请退款(可完整预览的文档不适用该条件!)。
3、如文档侵犯您的权益，请联系客服反馈,我们会尽快为您处理(人工客服工作时间：9:00-18:30)。

为什么会想起来将上传的word文档转换成html格式呢？设想，如果一个系统需要发布在页面的文章都是来自word文档，一般会执行下面的流程：使用word打开文档，Ctrl+A，进入发布文章页面，Ctrl+V。

看起来也不麻烦，但是，如果文档中包含大量图片呢？尴尬的事是图片都需要重新上传吧？如果可以将已经编写好的word文档上传到服务器就可以在相应页面进行展示，将会是一件非常惬意的事情，最起码信息发布人员会很开心。

程序员可能就不会这么想了，囧。

将Word转Html的原理是这样的：1、客户上传Word文档到服务器2、服务器调用OpenOffice程序打开上传的Word文档3、OpenOffice将Word文档另存为Html格式4、Over至此可见，这要求服务器端安装OpenOffice软件，其实也可以是MS Office，不过OpenOffice 的优势是跨平台，你懂的。

恩，说明一下，本文的测试基于MS Win7 Ultimate X64 系统。

下面就是规规矩矩的实现。

1、下载OpenOffice，/index.html So easy...2、下载Jodconverter /opensource/jodconverter 这是一个开启OpenOffice进行格式转化的第三方jar包。

3、泡杯热茶，等待下载。

4、安装OpenOffice，安装结束后，调用cmd，启动OpenOffice的一项服务：C:\Program Files (x86)\ 3\program>soffice -headless -accept="socket,port=8100;urp;"5、打开eclipse6、喝杯热茶，等待eclipse打开。

7、新建eclipse项目，导入Jodconverter/lib 下得jar包。

8、Coding...查看代码package com.mzule.doc2html.util;import java.io.BufferedReader;import java.io.File;import java.io.FileInputStream;import java.io.FileNotFoundException;import java.io.IOException;import java.io.InputStreamReader;import .ConnectException;import java.util.Date;import java.util.regex.Matcher;import java.util.regex.Pattern;import com.artofsolving.jodconverter.DocumentConverter;importcom.artofsolving.jodconverter.openoffice.connection.OpenOfficeConnection;importcom.artofsolving.jodconverter.openoffice.connection.SocketOpenOfficeConnection;importcom.artofsolving.jodconverter.openoffice.converter.OpenOfficeDocumentConverter;/*** 将Word文档转换成html字符串的工具类** @author MZULE**/publicclass Doc2Html {publicstaticvoid main(String[] args) {System.out.println(toHtmlString(new File("C:/test/test.doc"),"C:/test"));}/*** 将word文档转换成html文档** @param docFile* 需要转换的word文档* @param filepath* 转换之后html的存放路径* @return转换之后的html文件*/publicstatic File convert(File docFile, String filepath) {// 创建保存html的文件File htmlFile = new File(filepath + "/" + new Date().getTime()+ ".html");// 创建Openoffice连接OpenOfficeConnection con =new SocketOpenOfficeConnection(8100);try{//连接con.connect();}catch(ConnectException e) {System.out.println("获取OpenOffice连接失败...");e.printStackTrace();}//创建转换器DocumentConverter converter =new OpenOfficeDocumentConverter(con);//转换文档问htmlconverter.convert(docFile, htmlFile);//关闭openoffice连接con.disconnect();return htmlFile;}/*** 将word转换成html文件，并且获取html文件代码。

**@param docFile* 需要转换的文档*@param filepath* 文档中图片的保存位置*@return转换成功的html代码*/public static String toHtmlString(File docFile, String filepath) {//转换word文档File htmlFile = convert(docFile, filepath);//获取html文件流StringBufferhtmlSb =new StringBuffer();try{BufferedReaderbr =new BufferedReader(new InputStreamReader(new FileInputStream(htmlFile)));while(br.ready()) {htmlSb.append(br.readLine());}br.close();//删除临时文件htmlFile.delete();}catch(FileNotFoundException e) {e.printStackTrace();}catch(IOException e) {e.printStackTrace();}//HTML文件字符串String htmlStr = htmlSb.toString();//返回经过清洁的html文本return clearFormat(htmlStr, filepath);}/*** 清除一些不需要的html标记**@param htmlStr* 带有复杂html标记的html语句*@return去除了不需要html标记的语句*/protected static String clearFormat(String htmlStr, String docImgPath) {//获取body内容的正则String bodyReg = "<BODY .*</BODY>";Pattern bodyPattern = pile(bodyReg);Matcher bodyMatcher = bodyPattern.matcher(htmlStr);if(bodyMatcher.find()) {//获取BODY内容，并转化BODY标签为DIVhtmlStr = bodyMatcher.group().replaceFirst("<BODY", "<DIV").replaceAll("</BODY>", "</DIV>");}//调整图片地址htmlStr = htmlStr.replaceAll("<IMG SRC=\"", "<IMG SRC=\"" + docImgPath+ "/");//把<P></P>转换成</div></div>保留样式//content = content.replaceAll("(<P)([^>]*>.*?)(<\\/P>)",//"<div$2</div>");//把<P></P>转换成</div></div>并删除样式htmlStr = htmlStr.replaceAll("(<P)([^>]*)(>.*?)(<\\/P>)", "<p$3</p>");//删除不需要的标签htmlStr = htmlStr.replaceAll("<[/]?(font|FONT|span|SPAN|xml|XML|del|DEL|ins|INS|meta|META|[ovwxpOVWXP]:\\w+) [^>]*?>","");//删除不需要的属性htmlStr = htmlStr.replaceAll("<([^>]*)(?:lang|LANG|class|CLASS|style|STYLE|size|SIZE|face|FACE|[ovwxpOVWXP]:\\ w+)=(?:'[^']*'|\"\"[^\"\"]*\"\"|[^>]+)([^>]*)>","<$1$2>");return htmlStr;}}。

基于openoffice实现html、doc互换

合集下载

Java实现在线预览的示例代码（openOffice实现）

PHP实现wordexcelppt转换为PDF

officetohtml实现方式

Word文档转换为HTML帮助文档操作手册范本

java调用openoffice将office系列文档转换为PDF的示例方法

jodconverterlocalproperties 说明

【好文翻译】一步一步教你使用Spire.Doc转换Word文档格式

Java实现word文档在线预览，读取office（word,excel,ppt）文件

openoffice教程

odconv应用实例

Linux下openoffice转换word文档到pdf文档时中文乱码问题

C#实现HTML转WORD及WORD转PDF的方法

批量转换HTML文件转换成WORD文档

JAVA：借用OpenOffice将上传的Word文档转换成Html格式

htmldocx.asblob原理

java平台,使用openoffice将word转换为html

PHP调用OpenOffice实现word转PDF的方法

文档推荐

最新文档