Java Forum


Parsing HTML

  Asked By: Arland    Date: Jul 31    Category: Java    Views: 552
  

I am writing a program for simulating download speeds.

First I need to be able to get the full size of a website, i.e. the HTML plus its images. Getting the HTML size works OK, but getting the image sizes is a lot harder. I have been trying to parse the HTML for IMG tags and take the value of each SRC attribute, but for some reason it won't work. I have included the code below; oddly, if I try to get the HREF value of an anchor (A tag) instead, it works OK.
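For the size calculation itself, one possible approach (an assumption, not something stated in the post) is: resolve each SRC value against the page URL, ask the server for the image's Content-Length with an HTTP HEAD request, add those sizes to the HTML size, and divide the total number of bits by the simulated line speed. PageSize and its methods are hypothetical names. A server may omit Content-Length, in which case this sketch returns -1 and the image would have to be downloaded to be measured.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.List;

// Hypothetical helpers for the download-speed simulation (names are illustrative only).
final class PageSize {
    private PageSize() {}

    // Resolve an IMG SRC value (often relative, e.g. "img/logo.gif") against the page URL.
    static URI resolve(String pageUrl, String src) {
        return URI.create(pageUrl).resolve(src);
    }

    // Ask the server for the image size with a HEAD request (assumes an http/https URI).
    // Returns -1 if the response has no Content-Length header; callers should
    // handle that case before summing.
    static long contentLength(URI image) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) image.toURL().openConnection();
        try {
            conn.setRequestMethod("HEAD");
            return conn.getContentLengthLong();
        } finally {
            conn.disconnect();
        }
    }

    // HTML bytes plus the bytes of every image.
    static long totalBytes(long htmlBytes, List<Long> imageBytes) {
        long total = htmlBytes;
        for (long size : imageBytes) {
            total += size;
        }
        return total;
    }

    // Transfer time in seconds: bytes * 8 bits / line speed.
    // Ignores latency, connection setup and protocol overhead.
    static double seconds(long bytes, long bitsPerSecond) {
        if (bitsPerSecond <= 0) {
            throw new IllegalArgumentException("bitsPerSecond must be positive");
        }
        return bytes * 8.0 / bitsPerSecond;
    }

    public static void main(String[] args) {
        URI img = resolve("http://example.com/dir/page.html", "img/logo.gif");
        System.out.println(img); // http://example.com/dir/img/logo.gif
        long total = totalBytes(20_000, List.of(15_000L, 5_000L));
        // 40,000 bytes is 320,000 bits; at 56,000 bit/s that takes about 5.71 s.
        System.out.printf("%d bytes, %.2f s%n", total, seconds(total, 56_000));
    }
}
```

The network call is kept separate from the arithmetic so the simulation can be fed sizes from any source (HEAD requests, full downloads, or cached values).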
================================================================

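import java.io.StringReader;
import javax.swing.text.AttributeSet;
import javax.swing.text.Document;
import javax.swing.text.EditorKit;
import javax.swing.text.Element;
import javax.swing.text.ElementIterator;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;

// `complete` holds the page's HTML; warnUser(...) is my own display method.
EditorKit kit = new HTMLEditorKit();
Document doc = kit.createDefaultDocument();

// The Document class does not yet handle charsets properly,
// so tell the parser to ignore charset directives in the page.
doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
try {
    // Create a reader on the HTML content and parse it into the document.
    StringReader r = new StringReader(complete);
    kit.read(r, doc, 0);

    // Iterate through the elements of the HTML document.
    ElementIterator it = new ElementIterator(doc);
    Element elem;
    while ((elem = it.next()) != null) {
        // Text inside <a> carries an attribute keyed by HTML.Tag.A whose
        // value is the set of the link's own attributes (HREF etc.).
        AttributeSet s = (AttributeSet) elem.getAttributes().getAttribute(HTML.Tag.A);
        if (s != null) {
            // Display the attribute value.
            warnUser((String) s.getAttribute(HTML.Attribute.HREF));
        }
    }
} catch (Exception e) {
    e.printStackTrace();
}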
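A likely reason the IMG version fails, as far as I can tell from how Swing's HTMLDocument builds its element tree: an A tag is stored as a character attribute, so the text inside a link carries an attribute keyed by HTML.Tag.A, which is why the lookup above finds it. An IMG tag instead becomes its own leaf element whose name (StyleConstants.NameAttribute) is HTML.Tag.IMG, with SRC stored directly on that element, so getAttribute(HTML.Tag.IMG) returns null. This sketch checks the element name instead; ImageSources and find are hypothetical names, and only the javax.swing.text calls are real API.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.AttributeSet;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.Element;
import javax.swing.text.ElementIterator;
import javax.swing.text.StyleConstants;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical helper: collects the SRC value of every <img> in an HTML string.
final class ImageSources {
    private ImageSources() {}

    static List<String> find(String html) throws IOException, BadLocationException {
        HTMLEditorKit kit = new HTMLEditorKit();
        Document doc = kit.createDefaultDocument();
        doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
        kit.read(new StringReader(html), doc, 0);

        List<String> sources = new ArrayList<>();
        ElementIterator it = new ElementIterator(doc);
        for (Element elem = it.next(); elem != null; elem = it.next()) {
            AttributeSet attrs = elem.getAttributes();
            // <img> is its own leaf element named HTML.Tag.IMG rather than a
            // character attribute, so test the element's name...
            if (attrs.getAttribute(StyleConstants.NameAttribute) == HTML.Tag.IMG) {
                // ...and read SRC straight from that element.
                Object src = attrs.getAttribute(HTML.Attribute.SRC);
                if (src != null) {
                    sources.add(src.toString());
                }
            }
        }
        return sources;
    }

    public static void main(String[] args) throws Exception {
        String html = "<html><body><p>Hi <a href=\"x.html\">link</a> "
                + "<img src=\"a.png\"> <img src=\"b.gif\"></p></body></html>";
        System.out.println(find(html));
    }
}
```

Calling ImageSources.find(complete) on the page text should give the SRC values in document order. Relative values still need resolving against the page URL before their sizes can be fetched.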

