python - Unwrap "a" tag from image, without losing content
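I want to remove the 'a' tag (link) from all the images found. For performance, I made a list of all the images in the HTML, and for each one I look for the wrapping tag and remove the link.
I am using BeautifulSoup and I am not sure what I am doing wrong: instead of removing the 'a' tag, it removes the content inside it.
This is what I did: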
    from bs4 import BeautifulSoup

    html = '''<div> <a href="http://somelink"><img src="http://imgsrc.jpg" /></a> <a href="http://somelink2"><img src="http://imgsrc2.jpg /></a>" '''
    soup = BeautifulSoup(html)
    for img in soup.find_all('img'):
        print 'this is the beginning /////////////// '
        #print img.find_parent('a').unwrap()
        print img.parent.unwrap()
This gives me the following output:

    >>> print img.parent
    <a href="http://somelink"><img src="http://imgsrc.jpg" /></a>
    <a href="http://somelink2"><img src="http://imgsrc2.jpg /></a>
    >>> print img.parent.unwrap()
    <a href="http://somelink"></a>
    <a href="http://somelink2"></a>
I have tried replaceWith and replaceWithChildren, but they do not work when I use object.parent or findParent.
I am not sure what I am doing wrong. It has only been a few weeks since I started with Python.
The unwrap() function returns the tag that has been removed. The tree itself has been modified. Quoting the unwrap() documentation:
"Like replace_with(), unwrap() returns the tag that was replaced."
In other words: it works correctly! Print the new parent of img instead of the return value of unwrap() to see that the <a> tags have indeed been removed:
    >>> from bs4 import BeautifulSoup
    >>> html = '''<div> <a href="http://somelink"><img src="http://imgsrc.jpg" /></a> <a href="http://somelink2"><img src="http://imgsrc2.jpg /></a>" '''
    >>> soup = BeautifulSoup(html)
    >>> for img in soup.find_all('img'):
    ...     img.parent.unwrap()
    ...     print img.parent
    ...
    <a href="http://somelink"></a>
    <div> <img src="http://imgsrc.jpg"/> <a href="http://somelink2"><img src="http://imgsrc2.jpg /></a>"/></a></div>
    <a href="http://somelink2"></a>
    <div> <img src="http://imgsrc.jpg"/> <img src="http://imgsrc2.jpg /></a>"/></div>
Here Python echoes the img.parent.unwrap() return value, followed by the output of the print statement, which shows that the parent of the <img> tag is now the <div> tag. The first print shows the other <img> tag still wrapped; the second print shows both of them as direct children of the <div> tag.
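For readers on Python 3, here is a minimal sketch of the same idea. It assumes bs4 is installed and uses the built-in html.parser. The helper name unwrap_image_links is hypothetical, and the markup is a repaired variant of the question's HTML: the second src attribute in the question is missing its closing quote, which is why the output above looks mangled.

```python
from bs4 import BeautifulSoup

# Assumption: a well-formed variant of the question's markup
# (the closing quote on the second src attribute is restored).
html = (
    '<div>'
    '<a href="http://somelink"><img src="http://imgsrc.jpg" /></a>'
    '<a href="http://somelink2"><img src="http://imgsrc2.jpg" /></a>'
    '</div>'
)


def unwrap_image_links(soup):
    """Unwrap every <a> that directly wraps an <img>.

    Hypothetical helper for illustration; returns the detached <a> tags.
    """
    removed = []
    for img in soup.find_all('img'):
        parent = img.parent
        # Only unwrap when the direct parent really is a link.
        if parent is not None and parent.name == 'a':
            # unwrap() modifies the tree and returns the removed tag.
            removed.append(parent.unwrap())
    return removed


soup = BeautifulSoup(html, 'html.parser')
removed = unwrap_image_links(soup)

print(removed)  # the detached, now-empty <a> tags
print(soup)     # the modified tree, which is what you actually want to inspect
```

Checking parent.name before calling unwrap() is a design choice that protects images without a link: calling img.parent.unwrap() unconditionally would unwrap whatever tag happens to contain the image, such as the <div> itself.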