Java Forum

Parsing HTML

  Asked By: Arland    Date: Jul 31    Category: Java    Views: 735

I am writing a program to simulate download speeds.

First I need the full size of the website, i.e. the HTML plus its
images. I managed to get the HTML size, but getting the image sizes
is a lot harder. I have been trying to parse the HTML file for IMG
tags and then read the value of each SRC attribute, but for some
reason it won't work. I have included the relevant bit of code
below. Oddly, if I try to get the HREF value of an anchor the same
way, it works fine.

import java.io.StringReader;
import javax.swing.text.*;
import javax.swing.text.html.*;

EditorKit kit = new HTMLEditorKit();
Document doc = kit.createDefaultDocument();

// The Document class does not yet handle charsets properly.
doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
try {
    // Create a reader on the HTML content.
    StringReader r = new StringReader(complete);

    // Parse the HTML.
    kit.read(r, doc, 0);

    // Iterate through the elements of the HTML document.
    ElementIterator it = new ElementIterator(doc);
    Element elem;
    while ((elem = it.next()) != null) {
        SimpleAttributeSet s =(SimpleAttributeSet)
        if (s != null) {
            // display attribute value
        }
    }
} catch (Exception e) {
}
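
A possible explanation, offered as a sketch rather than a confirmed diagnosis: in the Swing HTML document model, an anchor is normally stored as an attribute on a run of text, so looking up HTML.Tag.A in an element's attribute set returns a nested attribute set containing HREF. An image, by contrast, is usually a leaf element of its own whose name is HTML.Tag.IMG, with SRC sitting directly in that element's attributes, so the same lookup with HTML.Tag.IMG tends to return null. The class name ImgSrcExtractor and the method imageSources below are hypothetical names chosen for illustration; the parsing setup mirrors the code in the question.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.AttributeSet;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.Element;
import javax.swing.text.ElementIterator;
import javax.swing.text.StyleConstants;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical helper: collects the SRC value of every IMG element in an HTML string.
class ImgSrcExtractor {

    static List<String> imageSources(String html) {
        HTMLEditorKit kit = new HTMLEditorKit();
        Document doc = kit.createDefaultDocument();

        // Same workaround as in the question: skip charset directives in the page.
        doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);

        try {
            // Parse the HTML into the document model.
            kit.read(new StringReader(html), doc, 0);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (BadLocationException e) {
            throw new IllegalStateException(e);
        }

        List<String> sources = new ArrayList<>();
        ElementIterator it = new ElementIterator(doc);
        Element elem;
        while ((elem = it.next()) != null) {
            AttributeSet attrs = elem.getAttributes();

            // An image is identified by the element's own name attribute,
            // not by a nested attribute set keyed on the tag.
            if (attrs.getAttribute(StyleConstants.NameAttribute) == HTML.Tag.IMG) {
                Object src = attrs.getAttribute(HTML.Attribute.SRC);
                if (src != null) {
                    sources.add(src.toString());
                }
            }
        }
        return sources;
    }

    public static void main(String[] args) {
        String page = "<html><body><p>Logo: <img src=\"images/logo.gif\"></p></body></html>";
        for (String src : imageSources(page)) {
            System.out.println(src);
        }
    }
}
```

The returned values are the raw SRC strings, which are often relative. To get image sizes for the download simulation they would still need to be resolved against the page's URL and fetched (or queried for their content length) separately.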


