POI Word to PDF: reading Word with POI always throws an exception, I'm at a loss, my QQ is 45071...

2022-11-24 11:52:49

Reading Word with POI always throws an exception, I'm at a loss, my QQ is 45071...

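/**
 * Reads the text of an Office file.
 * @param officePath path to the Office file
 * @return the extracted text, or null if extraction failed
 */
// Needs java.io.FileInputStream plus ExtractorFactory and POITextExtractor
// from Apache POI (their packages differ between POI versions).
public String readOffice(String officePath) {
    String text = null;
    // try-with-resources closes the stream; the old finally block threw a
    // NullPointerException when the file could not be opened and in was null
    try (FileInputStream in = new FileInputStream(officePath)) {
        POITextExtractor extractor = ExtractorFactory.createExtractor(in);
        text = extractor.getText();
    } catch (Exception e) {
        e.printStackTrace();
    }
    return text;
}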

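I'm not sure what you need to implement. This is the method I use to read Office files with POI; try it and see whether it still throws an error. It takes the path of an Office file and returns the text in that file. Images in the file cannot be read this way, of course.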
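When the method above still fails, one thing worth ruling out is a file whose content does not match its extension: legacy .doc files are OLE2 compound documents, while .docx files are ZIP packages. The sketch below uses only the JDK to check the first bytes of a file before it is handed to POI. The class and method names (`OfficeFormatSniffer`, `sniff`, `sniffFile`) are made up for illustration and are not part of POI.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

final class OfficeFormatSniffer {
    // OLE2 compound-document signature used by legacy .doc/.xls/.ppt files.
    private static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // ZIP local-file-header signature; .docx/.xlsx/.pptx are ZIP packages.
    private static final byte[] ZIP = { 'P', 'K', 0x03, 0x04 };

    /** Classifies the leading bytes of a file as "OLE2", "ZIP" or "UNKNOWN". */
    static String sniff(byte[] head) {
        if (startsWith(head, OLE2)) return "OLE2";
        if (startsWith(head, ZIP)) return "ZIP";
        return "UNKNOWN";
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        return Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    /** Reads up to eight bytes from the file and classifies them. */
    static String sniffFile(String path) throws IOException {
        try (InputStream in = Files.newInputStream(Paths.get(path))) {
            byte[] head = new byte[8];
            int n = 0;
            while (n < head.length) {
                int r = in.read(head, n, head.length - n);
                if (r < 0) break; // file shorter than eight bytes
                n += r;
            }
            return sniff(Arrays.copyOf(head, n));
        }
    }

    public static void main(String[] args) throws IOException {
        for (String p : args) {
            System.out.println(p + " -> " + sniffFile(p));
        }
    }
}
```

A file named .doc that reports ZIP (or the reverse) was probably renamed, and "UNKNOWN" suggests it is not an Office file at all (for example RTF or HTML saved with a .doc extension).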

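Is there a way to convert PDF to Word programmatically?

Converting PDF documents to Word documents, and more

1. Extracting pages from a multi-page PDF as JPG images

The most convenient tool is Adobe Acrobat: click the "Export" toolbar button and choose "JPG", and every page of the PDF is converted into its own JPG file. If you have Photoshop, open the PDF in Photoshop, which asks which page to open; open the chosen page and save it as a JPG image. The drawback is that only one page can be extracted at a time, so the steps have to be repeated and the process is slow.

2. Converting a PDF to a Word document

Open the PDF in Adobe Acrobat, choose "File" → "Save As", and set the file type to "Microsoft Word". Alternatively, click the "Export" toolbar button and choose "Word" as the export type; the result is the same. A program called e-PDF To Word Converter is dedicated to converting PDF to Word, and a localized Chinese edition can be downloaded online. The site http://www.pdftoword.com/ offers online conversion: upload the PDF and the site sends the converted Word document to your email address.

Note that if the PDF was produced by scanning paper, every page of the converted Word document is an image inserted into the document and cannot be edited. Even when the PDF was generated from Word or another editable format, the converted text ends up in separate text frames, and with a more complex layout the text may overlap, so re-editing still takes a lot of work.

3. Exporting text from a PDF

If the PDF was generated from Word or another editable electronic document, select the text with the "Select Tool" in Adobe Reader or Adobe Acrobat and press Ctrl+C to copy it to the clipboard, then paste it wherever you like. In a default Adobe Reader installation the "Select Tool" is hidden; choose "Tools" → "Customize Toolbars" and tick "Select Tool" to show it. To export the text of the whole PDF in Adobe Reader, choose "File" → "Save as Text" and the text is saved to a text file. In Adobe Acrobat, choose "File" → "Save As" with the file type "Plain Text", or click the "Export" toolbar button and choose "More Formats" → "Plain Text".

4. Exporting text with OCR

If the PDF was produced by scanning paper, the simple methods above do not work, but OCR can still export the text. In Adobe Acrobat choose "Document" → "OCR Text Recognition" → "Recognize Text Using OCR"; after recognition, the text can be selected with the "Select Tool" and copied. Adobe Reader has no built-in OCR, but you can choose "File" → "Print" and select the printer "Microsoft Office Document Image Writer", a virtual printer installed with Microsoft Office 2003. It prints the PDF to a file with the "mdi" extension and opens it automatically; in the opened "mdi" file choose "Tools" → "Recognize Text Using OCR" and then "Tools" → "Send Text to Word" to export the text to a Word document. OCR accuracy depends on the scan resolution used when the PDF was created; in documents with blurry text, little will be recognized correctly.

5. Copying figures out of a PDF

Open the PDF in Adobe Reader or Adobe Acrobat, select the figure with the "Select Tool", and press Ctrl+C to copy it to the clipboard. In Adobe Acrobat, choose "Advanced" → "Document Processing" → "Export All Images" to export every image in the document as a separate image file in one go.

6. Copying tables from a PDF

Open the PDF in Adobe Acrobat, select the table with the "Select Tool", right-click, and choose "Copy As Table" from the context menu; in Excel choose "Paste Special", select "CSV" in the dialog, and click "OK", and the table is copied into Excel. Alternatively, select the table, right-click, and choose "Open in Excel", and the table becomes an Excel sheet automatically. To paste the table into a Word document, use "Paste Special" with "Unformatted Text", then select the pasted data and choose "Table" → "Convert" → "Text to Table". If the PDF was produced by scanning paper, the table cannot be copied as a table, only as an image.

All of the methods above for converting a PDF to Word and for exporting text or images depend on the document not being encrypted with restrictions. If it is, they can only work after the document has been decrypted.

7. Removing the Adobe Acrobat menus from Word and other programs

After Adobe Acrobat is installed, several Adobe Acrobat menus appear in Word, Excel, and AutoCAD. They are Acrobat's integration with these applications: they make it easy to convert documents created in those applications to PDF and to control and adjust the conversion parameters. Because the menus take up space, many users want to remove them. This is simple: in "Control Panel" open "Add or Remove Programs", find Adobe Acrobat and click "Change", choose "Modify" in the "Program Maintenance" dialog, and in the next step, "Custom Setup", deselect Office and AutoCAD. If you choose "Custom Setup" when installing Adobe Acrobat, you can exclude Office, AutoCAD, and so on up front, and no Adobe Acrobat menus are added to Word, Excel, or AutoCAD.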

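How to work with the Word format using Java POI

1. Environment

1.1 Add POI: the packages can be downloaded from http://www.apache.org/dyn/closer.cgi/poi/release/

1.2 POI reads Excel files conveniently and also supports reading Word files in the doc format. However, its release did not ship the module with Word support, so an additional POI extension jar has to be downloaded separately. The download address is http://www.ibiblio.org/maven2/org/textmining/tm-extractors/0.4/ ; download the file extractors-0.4_zip.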

package com.ray.poi.util;

import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

import org.apache.poi.poifs.filesystem.DirectoryEntry;
import org.apache.poi.poifs.filesystem.DocumentEntry;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;
import org.textmining.text.extraction.WordExtractor;

/**
 * Reads and writes doc files.
 * @author wangzonghao
 */
public class PoiWordUtil {
    /**
     * Reads a doc file.
     * @param doc path to the doc file
     * @return the extracted text
     * @throws Exception if the file cannot be read or parsed
     */
    public static String readDoc(String doc) throws Exception {
        // Open an input stream on the doc file; try-with-resources closes it
        try (FileInputStream in = new FileInputStream(new File(doc))) {
            // Create the WordExtractor and extract the text of the doc file
            WordExtractor extractor = new WordExtractor();
            return extractor.extractText(in);
        }
    }

    /**
     * Writes a doc file.
     * @param path output path
     * @param content text to write
     * @return true if the file was written
     */
    public static boolean writeDoc(String path, String content) {
        boolean w = false;
        // byte b[] = content.getBytes("ISO-8859-1");
        byte[] b = content.getBytes();
        try (ByteArrayInputStream bais = new ByteArrayInputStream(b);
             FileOutputStream ostream = new FileOutputStream(path)) {
            POIFSFileSystem fs = new POIFSFileSystem();
            DirectoryEntry directory = fs.getRoot();
            // Store the raw bytes as the "WordDocument" stream of the container
            DocumentEntry de = directory.createDocument("WordDocument", bais);
            fs.writeFilesystem(ostream);
            w = true; // the original never set this, so it always returned false
        } catch (IOException e) {
            e.printStackTrace();
        }
        return w;
    }

}
Test

package com.ray.poi.util;

import junit.framework.TestCase;

public class PoiUtilTest extends TestCase {

    public void testReadDoc() {
        try {
            String text = PoiWordUtil.readDoc("e:/work_space/poi/com/ray/poi/util/demo.doc");
            System.out.println(text);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public void testWriteDoc() {
        String wr;
        try {
            wr = PoiWordUtil.readDoc("e:/work_space/poi/com/ray/poi/util/demo.doc");

            boolean b = PoiWordUtil.writeDoc("c:\\demo.doc", wr);
        } catch (Exception e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }

}

Urgently looking for an example of exporting data to Word with POI

import java.io.*;
import java.util.*;
import org.apache.poi.poifs.filesystem.*;
import org.apache.poi.util.LittleEndian;

public class WordTest {
    public WordTest() {
    }

    public static boolean writeWordFile(String path, String content) {
        boolean w = false;
        try {

            // byte b[] = content.getBytes("ISO-8859-1");
            byte[] b = content.getBytes();

            ByteArrayInputStream bais = new ByteArrayInputStream(b);

            POIFSFileSystem fs = new POIFSFileSystem();
            DirectoryEntry directory = fs.getRoot();

            // Store the raw bytes as the "WordDocument" stream of the container
            DocumentEntry de = directory.createDocument("WordDocument", bais);

            FileOutputStream ostream = new FileOutputStream(path);

            fs.writeFilesystem(ostream);

            bais.close();
            ostream.close();
            w = true; // the original never set this, so it always returned false

        } catch (IOException e) {
            e.printStackTrace();
        }
        return w;
    }

    public static void main(String[] args) {
        boolean b = writeWordFile("e://test.doc", "hello");
    }
}
/*
    public String extractText(InputStream in) throws IOException {
        ArrayList text = new ArrayList();
        POIFSFileSystem fsys = new POIFSFileSystem(in);

        DocumentEntry headerProps = (DocumentEntry) fsys.getRoot().getEntry("WordDocument");
        DocumentInputStream din = fsys.createDocumentInputStream("WordDocument");
        byte[] header = new byte[headerProps.getSize()];

        din.read(header);
        din.close();
        // get the information from the document header
        int info = LittleEndian.getShort(header, 0xa);

        boolean useTable1 = (info & 0x200) != 0;

        //boolean useTable1 = true;

        // get the information from the piece table
        int complexOffset = LittleEndian.getInt(header, 0x1a2);
        //int complexOffset = LittleEndian.getInt(header);

        String tableName = null;
        if (useTable1) {
            tableName = "1Table";
        } else {
            tableName = "0Table";
        }

        DocumentEntry table = (DocumentEntry) fsys.getRoot().getEntry(tableName);
        byte[] tableStream = new byte[table.getSize()];

        din = fsys.createDocumentInputStream(tableName);

        din.read(tableStream);
        din.close();

        din = null;
        fsys = null;
        table = null;
        headerProps = null;

        int multiple = findText(tableStream, complexOffset, text);

        StringBuffer sb = new StringBuffer();
        int size = text.size();
        tableStream = null;

        for (int x = 0; x < size; x++) {

            WordTextPiece nextPiece = (WordTextPiece) text.get(x);
            int start = nextPiece.getStart();
            int length = nextPiece.getLength();

            boolean unicode = nextPiece.usesUnicode();
            String toStr = null;
            if (unicode) {
                toStr = new String(header, start, length * multiple, "UTF-16LE");
            } else {
                toStr = new String(header, start, length, "ISO-8859-1");
            }
            sb.append(toStr).append(" ");

        }
        return sb.toString();
    }

    private static int findText(byte[] tableStream, int complexOffset, ArrayList text)
            throws IOException {
        // actual text
        int pos = complexOffset;
        int multiple = 2;
        // skips through the prms before we reach the piece table. these contain data
        // for actual fast saved files
        while (tableStream[pos] == 1) {
            pos++;
            int skip = LittleEndian.getShort(tableStream, pos);
            pos += 2 + skip;
        }
        if (tableStream[pos] != 2) {
            throw new IOException("Corrupted Word file");
        } else {
            // parse out the text pieces
            int pieceTableSize = LittleEndian.getInt(tableStream, ++pos);
            pos += 4;
            int pieces = (pieceTableSize - 4) / 12;
            for (int x = 0; x < pieces; x++) {
                int filePos =
                    LittleEndian.getInt(tableStream, pos + ((pieces + 1) * 4) + (x * 8) + 2);
                boolean unicode = false;
                if ((filePos & 0x40000000) == 0) {
                    unicode = true;
                } else {
                    unicode = false;
                    multiple = 1;
                    filePos &= ~(0x40000000); // gives me fc in doc stream
                    filePos /= 2;
                }
                int totLength =
                    LittleEndian.getInt(tableStream, pos + (x + 1) * 4)
                        - LittleEndian.getInt(tableStream, pos + (x * 4));

                WordTextPiece piece = new WordTextPiece(filePos, totLength, unicode);
                text.add(piece);

            }

        }
        return multiple;
    }

    public static void main(String[] args) {
        WordTest w = new WordTest();
        POIFSFileSystem ps = new POIFSFileSystem();
        try {

            File file = new File("c:\\test.doc");

            InputStream in = new FileInputStream(file);
            String s = w.extractText(in);
            System.out.println(s);

        } catch (Exception e) {
            e.printStackTrace();
        }

    }

    public boolean writeWordFile(String path, String content) {
        boolean w = false;
        try {

            // byte b[] = content.getBytes("ISO-8859-1");
            byte[] b = content.getBytes();

            ByteArrayInputStream bais = new ByteArrayInputStream(b);

            POIFSFileSystem fs = new POIFSFileSystem();
            DirectoryEntry directory = fs.getRoot();

            DocumentEntry de = directory.createDocument("WordDocument", bais);

            FileOutputStream ostream = new FileOutputStream(path);

            fs.writeFilesystem(ostream);

            bais.close();
            ostream.close();
            w = true;

        } catch (IOException e) {
            e.printStackTrace();
        }

        return w;
    }

}

class WordTextPiece {
    private int _fcStart;
    private boolean _usesUnicode;
    private int _length;

    public WordTextPiece(int start, int length, boolean unicode) {
        _usesUnicode = unicode;
        _length = length;
        _fcStart = start;
    }

    public boolean usesUnicode() {
        return _usesUnicode;
    }

    public int getStart() {
        return _fcStart;
    }

    public int getLength() {
        return _length;
    }

}
*/
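
The piece-table parsing in the commented-out code above decodes each piece's file position from a 32-bit little-endian value: when bit 30 (0x40000000) is clear the piece is 16-bit Unicode text, otherwise the flag is cleared and the value halved to get the byte offset of 8-bit text. Here is a minimal JDK-only sketch of just that step, with `ByteBuffer` standing in for POI's `LittleEndian` helper. The class and method names (`PieceOffsetDemo`, `readIntLE`, `isUnicode`, `byteOffset`) are hypothetical and chosen only for this illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class PieceOffsetDemo {
    // Bit 30 of the stored file position marks 8-bit (compressed) text.
    static final int COMPRESSED_FLAG = 0x40000000;

    /** Reads a 32-bit little-endian integer at the given offset. */
    static int readIntLE(byte[] data, int offset) {
        return ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN).getInt(offset);
    }

    /** A piece is 16-bit Unicode when the compressed flag is not set. */
    static boolean isUnicode(int rawFilePos) {
        return (rawFilePos & COMPRESSED_FLAG) == 0;
    }

    /** Byte offset of the piece: unchanged for Unicode, flag cleared and halved otherwise. */
    static int byteOffset(int rawFilePos) {
        if (isUnicode(rawFilePos)) {
            return rawFilePos;
        }
        return (rawFilePos & ~COMPRESSED_FLAG) / 2;
    }

    public static void main(String[] args) {
        // The four bytes of 0x40001000 as they would appear in a little-endian stream.
        byte[] raw = {0x00, 0x10, 0x00, 0x40};
        int filePos = readIntLE(raw, 0);
        System.out.println("unicode=" + isUnicode(filePos) + ", offset=" + byteOffset(filePos));
    }
}
```

Keeping the flag test and the offset arithmetic in separate small methods makes it easier to check each against a hex dump of a real file than when both are folded into the parsing loop.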

Using POI to pull data from a database and generate an Excel document...

http://java.ccidnet.com/art/3737/20030321/479599_1.html

Code:

// Create the workbook and a sheet
HSSFWorkbook wb = new HSSFWorkbook();
HSSFSheet sheet = wb.createSheet();
// Load the image and re-encode it as PNG bytes
ByteArrayOutputStream byteArrayOut = new ByteArrayOutputStream();
BufferedImage bufferImg = ImageIO.read(new File("d:\\fruit.png"));
ImageIO.write(bufferImg, "png", byteArrayOut);
// Anchor the picture to a cell range and add it to the sheet
HSSFClientAnchor anchor = new HSSFClientAnchor(5, 0, 500, 122, (short) 0, 5, (short) 10, 15);
HSSFPatriarch patri = sheet.createDrawingPatriarch();
patri.createPicture(anchor, wb.addPicture(byteArrayOut.toByteArray(), HSSFWorkbook.PICTURE_TYPE_PNG));
// Write the workbook to an in-memory stream
ByteArrayOutputStream outStream = new ByteArrayOutputStream();
wb.write(outStream);

The code above is only an outline, but every class it needs is listed. What remains is to write outStream out to an Excel file.
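
That remaining step, getting the in-memory stream onto disk, needs only the JDK. Here is a minimal sketch; the class name `StreamToFile`, the method `save`, and the temporary target file are assumptions for illustration, and a three-byte placeholder stands in for the workbook bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

final class StreamToFile {
    /** Copies everything buffered in the in-memory stream to the target file. */
    static void save(ByteArrayOutputStream data, Path target) throws IOException {
        // try-with-resources closes the file even if the write fails
        try (OutputStream out = Files.newOutputStream(target)) {
            data.writeTo(out);
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream outStream = new ByteArrayOutputStream();
        outStream.write(new byte[] {1, 2, 3}, 0, 3); // placeholder for wb.write(outStream)
        Path target = Files.createTempFile("demo", ".xls");
        save(outStream, target);
        System.out.println(Files.size(target) + " bytes written to " + target);
    }
}
```

If the bytes are not needed in memory for anything else, passing a file output stream to `wb.write` directly avoids the intermediate buffer.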

Documentation for the individual classes can be found at:
http://jakarta.apache.org/poi/apidocs/org/apache/poi/hssf/usermodel/hssfworkbook.html

Addendum......
http://deepfuture.javaeye.com/blog/615081

Java POI: reading and writing Word and Excel
package zl.file;

import java.io.ByteArrayInputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;

import org.apache.poi.hssf.usermodel.HSSFCell;
import org.apache.poi.hssf.usermodel.HSSFRichTextString;
import org.apache.poi.hssf.usermodel.HSSFRow;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.hwpf.extractor.WordExtractor;
import org.apache.poi.poifs.filesystem.DirectoryEntry;
import org.apache.poi.poifs.filesystem.DocumentEntry;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

// code run against the jakarta-poi-1.5.0-final-20020506.jar.
public class MyExcel {

    public static void main(String[] args) throws Exception {
        // ------------ write data to the xls
        FileOutputStream fos = new FileOutputStream("e:\\text.xls");
        HSSFWorkbook wb = new HSSFWorkbook();
        HSSFSheet s = wb.createSheet();
        wb.setSheetName(0, "first sheet");
        HSSFRow row = s.createRow(0);
        HSSFCell cell = row.createCell((short) 0, 0);
        HSSFRichTextString hts = new HSSFRichTextString("nihao");
        cell.setCellValue(hts);
        wb.write(fos);
        fos.flush();
        fos.close();
        // ------------ read data from the xls
        wb = new HSSFWorkbook(new FileInputStream("e:\\text.xls"));
        s = wb.getSheetAt(0);
        HSSFRow r = s.getRow(0);
        cell = r.getCell((short) 0);
        if (cell.getCellType() == HSSFCell.CELL_TYPE_STRING) {
            System.out.println(cell.getRichStringCellValue());
        }
        // ------------ read data from the doc
        FileInputStream in = new FileInputStream("e:\\text.doc");
        WordExtractor extractor = new WordExtractor(in);
        // extract the text of the doc file
        String text = extractor.getText();
        System.out.println(text);
        in.close();
        // ------------ write to the doc

        // sample Chinese text, encoded with the platform default charset
        byte[] a = "看见了！".getBytes();
        ByteArrayInputStream bs = new ByteArrayInputStream(a);
        POIFSFileSystem fs = new POIFSFileSystem();
        ///////////////////////////////////
        DirectoryEntry directory = fs.getRoot();
        DocumentEntry de = directory.createDocument("WordDocument", bs);
        // the two lines above cannot be omitted, otherwise the output is garbled
        fos = new FileOutputStream("e:\\text.doc");
        fs.writeFilesystem(fos);
        bs.close();
        fos.flush();
        fos.close();
    }
}
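
The doc-writing part above calls getBytes() with no argument, which encodes with the platform default charset, so the bytes stored for a Chinese string can differ from one machine to another. The JDK-only sketch below shows how the choice of charset changes the encoded length of the same four-character string. The class name `CharsetDemo` and the method `encodedLength` are hypothetical; the string is written with Unicode escapes so the source compiles regardless of file encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetDemo {
    /** Number of bytes the string occupies when encoded with the given charset. */
    static int encodedLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        // The same four characters as the sample string in the code above.
        String s = "\u770b\u89c1\u4e86\uff01";
        System.out.println("default (" + Charset.defaultCharset() + "): "
                + encodedLength(s, Charset.defaultCharset()));
        System.out.println("UTF-8:      " + encodedLength(s, StandardCharsets.UTF_8));
        System.out.println("UTF-16LE:   " + encodedLength(s, StandardCharsets.UTF_16LE));
        // ISO-8859-1 cannot represent these characters; each becomes a single '?'.
        System.out.println("ISO-8859-1: " + encodedLength(s, StandardCharsets.ISO_8859_1));
    }
}
```

Passing an explicit charset to getBytes makes the written bytes reproducible across machines, whichever encoding the reader of the file expects.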
