ObjectWeb Consortium
Mail Archive Home | xmlc List | May 2007 Index


RE: [xmlc] xmlc2.3 include-ignorable-whitespace feature


Quoting 石鑫 <shixin129@xxxxxxx>:

> I have some html pages like this :
> ...   <ul id="DemoList">
>      <li></li>
>      <li></li>
>    </ul>
> ...and I use "xmlcObject.getElementDemoList().getChildNodes()" to get all
> the <li> elements, but in xmlc2.3 "getChildNodes()" returns a NodeList that
> contains org.w3c.dom.Text objects.
> Now I use "xmlcObject.getElementDemoList().getElementsByTagName("li")"
> instead.
>

Your solution is more reliable than getChildNodes().  However, I want to
explore this a bit more.  Read on...
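
To illustrate the difference between the two calls, here is a minimal sketch
that uses only the JDK's standard DOM API rather than an XMLC-generated class.
The class name ChildElements and its helper methods are hypothetical; the
markup is the <ul> from the quoted message.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of XMLC.
final class ChildElements {

    // Parse a small fragment with the JDK's default (non-validating) parser.
    static Element parseRoot(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Return only the direct children that are elements, skipping Text nodes.
    static List<Element> childElements(Element parent) {
        List<Element> result = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                result.add((Element) child);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Element ul = parseRoot(
                "<ul id=\"DemoList\">\n  <li></li>\n  <li></li>\n</ul>");
        // Every child node, whitespace Text nodes included.
        System.out.println(ul.getChildNodes().getLength());
        // Direct element children only.
        System.out.println(childElements(ul).size());
        // Every descendant <li>, at any depth.
        System.out.println(ul.getElementsByTagName("li").getLength());
    }
}
```

Note that getElementsByTagName("li") also returns <li> elements of nested
lists, so filtering the direct children by node type is the closer match when
only the immediate items are wanted.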

Are you using the HTML DOM or the XHTML DOM?  The
"include-ignorable-whitespace" feature applies *only* to the latter.  Since
HTML isn't validated, there's no way for the parser to know what whitespace is
ignorable.  As such, the parser makes no attempt to remove whitespace, because
without the DTD telling it what to remove, any attempt may remove important
whitespace.
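
The role of the DTD can be seen with the JDK's own JAXP parser, whose
setIgnoringElementContentWhitespace(true) setting plays the same role as
include-ignorable-whitespace="false".  This is a sketch under assumptions: it
does not go through XMLC, the class name IgnorableWhitespaceDemo is
hypothetical, and the tiny internal DTD stands in for the real XHTML DTD.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of XMLC.
final class IgnorableWhitespaceDemo {

    static final String BODY = "<ul>\n  <li>a</li>\n  <li>b</li>\n</ul>";

    // Stand-in DTD: declares that <ul> may contain only <li> elements.
    static final String DTD =
            "<!DOCTYPE ul [<!ELEMENT ul (li*)><!ELEMENT li (#PCDATA)>]>\n";

    // Count the child nodes of the root element after parsing.
    static int countRootChildren(String xml, boolean validating) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(validating);
            // Ask the parser to drop whitespace it knows to be ignorable.
            factory.setIgnoringElementContentWhitespace(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Keep the default "print to stderr" error handler out of the way.
            builder.setErrorHandler(new DefaultHandler());
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getChildNodes().getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // With a DTD, the whitespace between the <li> elements is dropped.
        System.out.println(countRootChildren(DTD + BODY, true));
        // Without a DTD, the same request removes nothing.
        System.out.println(countRootChildren(BODY, false));
    }
}
```

In the second call the parser has no content model to consult, so the
whitespace Text nodes between the <li> elements stay in the tree.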

I think there might be some confusion here.  You originally expressed concern
that "include-ignorable-whitespace" was "false" and wanted to be able to
configure it, presumably, to "true".  However, based on your example above,
you are concerned that you are getting extra whitespace nodes in places where
it's arguable that they ought not to be.  Removing those nodes is exactly what
setting "include-ignorable-whitespace" to "false" is for: it removes ignorable
whitespace.  I think this is pretty much what one would want (and what you
seem to be trying to achieve), so I don't see any benefit in making the
feature configurable.

My guess is that you are using the HTML DOM, not the XHTML DOM.  If you
upgraded from XMLC-2.2.xx and, all of a sudden, began seeing extra Text nodes
where they didn't get created before, such as between children of <ul>, <ol>,
etc., this is because Xerces-1.4.4 (which is what XMLC-2.2.xx uses) strips
whitespace from HTML, whereas Xerces2 (or NekoHTML) does not.  IMO,
Xerces2/NekoHTML is doing the right thing and Xerces-1.4.4 is doing the wrong
thing.  Without a DTD to validate against, Xerces-1.4.4 has no business
removing whitespace.  For instance, it might remove whitespace inside <pre>
tags without a DTD to tell it not to.  The best way to avoid this problem is
to use the XHTML DOM, which uses the validating XML parser instead of the
non-validating HTML parser.

That said, it is possible to mimic include-ignorable-whitespace="false" in
the HTML parser if we are very careful about following the rules of the XHTML
1.0 Transitional DTD.  If you would like to take a crack at it, take a look at
XercesHTMLDOMParser.java [1].  I even have a limited attempt that I commented
out.  Look at the commented-out characters() method.  That method might
actually be correctly implemented as-is, but I wasn't 100% sure that it would
be correct, so I left it commented out.  It could be uncommented in a future
release, but we'd have to be sure it isn't removing whitespace where it
shouldn't.
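
As a rough illustration of the idea (this is not the commented-out
characters() method, and it works as a pass over an already-built DOM rather
than inside the parser), the sketch below removes whitespace-only Text nodes
only under elements whose content model allows nothing but child elements.
The class name HtmlWhitespaceStripper is hypothetical, and the ELEMENT_ONLY
list is a deliberately partial assumption that would need to be checked
against the XHTML 1.0 Transitional DTD before any real use.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of XMLC.
final class HtmlWhitespaceStripper {

    // Assumed, partial list of elements with element-only content.
    static final Set<String> ELEMENT_ONLY = new HashSet<>(Arrays.asList(
            "html", "head", "ul", "ol", "dl", "table",
            "thead", "tfoot", "tbody", "tr", "select", "optgroup"));

    // Recursively drop whitespace-only Text children of element-only parents.
    static void strip(Node node) {
        boolean elementOnly = node.getNodeType() == Node.ELEMENT_NODE
                && ELEMENT_ONLY.contains(
                        node.getNodeName().toLowerCase(Locale.ROOT));
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // capture before any removal
            if (child.getNodeType() == Node.TEXT_NODE) {
                // Text under mixed-content parents (<li>, <pre>, ...) is kept.
                if (elementOnly && child.getNodeValue().trim().isEmpty()) {
                    node.removeChild(child);
                }
            } else {
                strip(child);
            }
            child = next;
        }
    }

    // Parse a well-formed fragment with the JDK's non-validating parser.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element ul = parseRoot(
                "<ul>\n  <li><pre>  keep  </pre></li>\n  <li> x </li>\n</ul>");
        strip(ul);
        System.out.println(ul.getChildNodes().getLength());
        System.out.println("[" + ul.getElementsByTagName("pre")
                .item(0).getTextContent() + "]");
    }
}
```

Working from a whitelist of parents means an element missing from the list
keeps its whitespace, which errs on the side of never removing whitespace
where it shouldn't be removed.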


[1]
http://cvs.forge.objectweb.org/cgi-bin/viewcvs.cgi/xmlc/xmlc/xmlc/modules/xmlc/src/org/enhydra/xml/xmlc/parsers/xerces/XercesHTMLDOMParser.java

>
> XMLC is compatible with OSGi; an XMLCObject can easily be used in the OSGi
> HTTP Service or in Eclipse RCP, but JSP can't.
> I have used XMLC with OSGi for a long time, and it works very well.
>

I'm interested in this.  Do you have any external references that can show me
and others how to integrate XMLC with OSGi?  You're not obligated to, but if it
isn't too much trouble, it would be much appreciated.


> Sorry for my weak English.
>

Hey, no problem.  You don't see me being able to speak Chinese, do you?
You're one big step ahead of me!

Jake

>
> Curry
>
>
>
> > Date: Thu, 24 May 2007 01:37:13 -0500
> > To: xmlc@xxxxxxxxxxxxx
> > From: hoju@xxxxxxxx
> > Subject: Re: [xmlc] xmlc2.3 include-ignorable-whitespace feature
> >
> > Well, right now it isn't configurable, though it could be added as an
> > option in the metadata in the future. Can you explain why you need
> > ignorable whitespace to be included? The DTD defines whitespace as
> > ignorable or not. Why would it lie? Can I assume you are using XHTML?
> > Please describe what is getting broken so I can better understand the
> > problem. And are you using XMLC's DOMFormatter to output your markup or
> > some other mechanism?
> >
> > BTW, I'm curious, how are you using OSGI with XMLC?
> >
> > Jake
> >
> > At 07:53 PM 5/23/2007, you wrote:
> > >Hi,
> > >
> > >I upgrade my application (xmlc + osgi) to xmlc 2.3 , then I found that
> > >"include-ignorable-whitespace" feature default is false. I can't find
> > >how to configure.
> > >
> > >Who have a good idea?
> > >
> > >Curry







Copyright © 1999-2005, ObjectWeb Consortium | contact | webmaster.