Java Html Table to Plain text


We save incoming emails in a database. We then also save a version with all the HTML tags removed. The problem is that if the mail includes a table like this:

Heading1 Heading2

column1 column2

it looks like this after removing the tags:





Is there a simple way to take an HTML table and turn it into plain text with the formatting still intact, or at least with line breaks in the right places?

So the table turns into something like: Heading1 Heading2 \r\n column1 column2 \r\n. Or something similar.

Any ideas?


A simple way? Not really. HTML tables are complex: they can have row spans and column spans, not to mention ordinary HTML attributes such as those controlling bidirectional text. CSS properties like display: table-cell; can also cause otherwise ordinary HTML to suddenly become a table.

However, if you don't care too much about formatting and just want to output multiple columns on the same line, you could parse the HTML using something like JTidy or Jericho, then output the contents of the <td> or <th> elements in each row with spaces between them, and when you reach the end of a <tr> element, output "\r\n".
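
As a rough sketch of the parse-and-emit idea above, the following uses the HTML parser that ships with the JDK (javax.swing.text.html.parser.ParserDelegator) instead of JTidy or Jericho, purely so that it runs without extra dependencies. The class name TableTextExtractor and the method toPlainText are made up for illustration. The sketch assumes simple, flat tables: text outside cells is dropped, and nested tables, rowspan and colspan are not handled.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: one output line per <tr>, cells separated by a space.
final class TableTextExtractor {

    static String toPlainText(String html) {
        final StringBuilder out = new StringBuilder();
        final List<String> row = new ArrayList<>();
        final StringBuilder cell = new StringBuilder();

        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            private boolean inCell = false;

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.TD || tag == HTML.Tag.TH) {
                    inCell = true;
                    cell.setLength(0); // start collecting a new cell
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                if (inCell) {
                    cell.append(data); // text outside cells is ignored in this sketch
                }
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                if (tag == HTML.Tag.TD || tag == HTML.Tag.TH) {
                    row.add(cell.toString().trim());
                    inCell = false;
                } else if (tag == HTML.Tag.TR) {
                    // End of a row: join the cells and terminate the line.
                    out.append(String.join(" ", row)).append("\r\n");
                    row.clear();
                }
            }
        };

        try {
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }
}
```

Calling TableTextExtractor.toPlainText on the two-row table from the question is intended to give "Heading1 Heading2\r\ncolumn1 column2\r\n". The same callback structure should carry over to a third-party parser, but the exact API calls would differ.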

If you really don't want to parse the HTML, you could just replace the <td> and <th> tags themselves with a single space or tab, and each <tr> tag with a line break. This may at least get you some reasonable results.
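
A minimal sketch of that tag-replacement shortcut, using only java.util.regex. The class name TableTagReplacer and the method stripTags are hypothetical. As one possible choice, it replaces the closing </td>, </th> and </tr> tags (rather than the opening ones) so that the separator lands after each cell and row, and it assumes those closing tags are actually present in the markup.

```java
import java.util.regex.Pattern;

// Hypothetical helper: regex-based, no real HTML parsing.
final class TableTagReplacer {
    // Closing cell tags (</td> or </th>), case-insensitive.
    private static final Pattern CELL_END = Pattern.compile("(?i)</t[dh]\\s*>");
    // Closing row tags.
    private static final Pattern ROW_END = Pattern.compile("(?i)</tr\\s*>");
    // Any remaining tag; naive, breaks on '>' inside attribute values.
    private static final Pattern ANY_TAG = Pattern.compile("<[^>]*>");

    static String stripTags(String html) {
        String text = CELL_END.matcher(html).replaceAll("\t");
        text = ROW_END.matcher(text).replaceAll("\r\n");
        text = ANY_TAG.matcher(text).replaceAll("");
        // Drop the trailing tab left by the last cell of each row.
        return text.replaceAll("\\t+(?=\\r\\n)", "");
    }
}
```

For the table in the question, stripTags returns "Heading1\tHeading2\r\ncolumn1\tcolumn2\r\n". Because HTML allows </td> and </tr> to be omitted, real-world mail may need the opening tags handled as well.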

Tags: html html-table
© 2014 TuiCode, Inc.