python - Unwrap "a" tag from image, without losing content
I want to remove the 'a' tag (the link) from every image found. For performance, I made a list of the images in the HTML, look up the wrapping tag for each one, and remove the link.
I am using BeautifulSoup and am not sure what I am doing wrong: instead of removing the tag, it is removing the content inside it.
This is what I did:
    from bs4 import BeautifulSoup

    html = '''<div> <a href="http://somelink"><img src="http://imgsrc.jpg" /></a> <a href="http://somelink2"><img src="http://imgsrc2.jpg /></a>" '''
    soup = BeautifulSoup(html)
    for img in soup.find_all('img'):
        print 'this is the beginning /////////////// '
        #print img.find_parent('a').unwrap()
        print img.parent.unwrap()

This gives me the following output:
    >>> print img.parent()
    <a href="http://somelink"><img src="http://imgsrc.jpg" /></a>
    <a href="http://somelink2"><img src="http://imgsrc2.jpg /></a>
    >>> print img.parent.unwrap()
    <a href="http://somelink"></a>
    <a href="http://somelink2"></a>

I have tried replaceWith and replaceWithChildren, but they do not work when I use object.parent or findParent.
I am not sure what I am doing wrong. It has only been a few weeks since I started with Python.
The unwrap() function returns the tag that has been removed. The tree itself has been modified correctly. Quoting the unwrap() documentation:
"Like replace_with(), unwrap() returns the tag that was replaced."
In other words: it works correctly! Print the new parent of img instead of the return value of unwrap() to see that the <a> tags have indeed been removed:
    >>> from bs4 import BeautifulSoup
    >>> html = '''<div> <a href="http://somelink"><img src="http://imgsrc.jpg" /></a> <a href="http://somelink2"><img src="http://imgsrc2.jpg /></a>" '''
    >>> soup = BeautifulSoup(html)
    >>> for img in soup.find_all('img'):
    ...     img.parent.unwrap()
    ...     print img.parent
    ...
    <a href="http://somelink"></a>
    <div> <img src="http://imgsrc.jpg"/> <a href="http://somelink2"><img src="http://imgsrc2.jpg /></a>"/></a></div>
    <a href="http://somelink2"></a>
    <div> <img src="http://imgsrc.jpg"/> <img src="http://imgsrc2.jpg /></a>"/></div>

Here Python echoes the return value of img.parent.unwrap(), followed by the output of the print statement, which shows that the parent of the <img> tag is now the <div> tag. The first print shows the other <img> tag still wrapped; the second print shows both as direct children of the <div> tag.
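The same idea can be sketched as a small reusable helper. This is only an illustration under a few assumptions: it uses Python 3 syntax, passes "html.parser" explicitly (the original post does not name a parser), and runs on a made-up, well-formed sample in which the second src attribute is properly closed and a third image has no link around it. The name unwrap_image_links is hypothetical, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Hypothetical, well-formed sample markup: two linked images and one
# image that is not wrapped in a link at all.
HTML = (
    '<div>'
    '<a href="http://somelink"><img src="http://imgsrc.jpg"/></a>'
    '<a href="http://somelink2"><img src="http://imgsrc2.jpg"/></a>'
    '<img src="http://imgsrc3.jpg"/>'
    '</div>'
)


def unwrap_image_links(soup):
    """Remove <a> tags that directly wrap an <img>, keeping the images.

    Returns the detached <a> tags, i.e. the values unwrap() hands back.
    """
    removed = []
    for img in soup.find_all("img"):
        parent = img.parent
        # Only unwrap when the direct parent really is a link; otherwise
        # unwrap() would strip some unrelated container such as the <div>.
        if parent is not None and parent.name == "a":
            removed.append(parent.unwrap())
    return removed


soup = BeautifulSoup(HTML, "html.parser")
removed = unwrap_image_links(soup)

# The return values are the now-empty <a> tags ...
print([str(tag) for tag in removed])
# ... while the tree itself still contains every image.
print(soup)
```

Checking parent.name before calling unwrap() is a design choice for pages where some images are not linked; calling img.parent.unwrap() unconditionally, as in the question, would remove the <div> around such an image instead.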