Python: unwrap the "a" tag from an image without losing its content


I wanted to remove the 'a' tag (link) from any images found. So I made a list of the images in the HTML, then tried to find each wrapping tag and remove the link.

I am using BeautifulSoup, but I'm not sure what I'm doing wrong: instead of removing the tag, it removes the content inside it.

This is what I did:

```python
from bs4 import BeautifulSoup

html = '''<div> <a href="http://somelink"><img src="http://imgsrc.jpg" /></a> <a href="http://somelink2"><img src="http://imgsrc2.jpg /></a>"  '''
soup = BeautifulSoup(html)

for img in soup.find_all('img'):
    print('this is the beginning /////////////// ')
    #print(img.find_parent('a').unwrap())
    print(img.parent.unwrap())
```

This gives me the following output:

```
>>> print(img.parent)
<a href="http://somelink"><img src="http://imgsrc.jpg" /></a>
<a href="http://somelink2"><img src="http://imgsrc2.jpg /></a>
>>> print(img.parent.unwrap())
<a href="http://somelink"></a>
<a href="http://somelink2"></a>
```

I have also tried replaceWith and replaceWithChildren, but they don't work either when I use object.parent or findParent.

I'm not sure what I'm doing wrong; it has only been a few weeks since I started with Python.

The unwrap() function returns the tag that has been removed. The tree itself has been modified. Quoting the unwrap() documentation:

> Like replace_with(), unwrap() returns the tag that was replaced.
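A minimal sketch of that point, assuming a current bs4 release and the built-in "html.parser" backend. It uses a well-formed variant of the question's markup (the second src attribute is closed and the whitespace dropped, so the output stays compact). Both unwrap() and replace_with() hand back the <a> that was taken out of the tree, now empty, while the <img> stays in the document. As far as I know, replaceWith and replaceWithChildren are older camelCase aliases of replace_with() and unwrap() in bs4, so they report their result the same way.

```python
from bs4 import BeautifulSoup

# Well-formed variant of the question's markup (assumption: second src quote closed).
html = ('<div><a href="http://somelink"><img src="http://imgsrc.jpg"/></a>'
        '<a href="http://somelink2"><img src="http://imgsrc2.jpg"/></a></div>')
soup = BeautifulSoup(html, "html.parser")

first_img, second_img = soup.find_all("img")

# unwrap(): the <a> is detached and its children take its place in the <div>.
removed = first_img.parent.unwrap()
print(removed)                 # the detached <a>, now without its <img>
print(first_img.parent.name)   # the <img> now sits directly in the <div>

# replace_with(): swap the <a> for the <img> it contains; the <a> is returned.
link = second_img.parent
replaced = link.replace_with(second_img)
print(replaced)                # again the detached, empty <a>
print(soup)                    # both images are direct children of the <div>
```

Printing the return value therefore always shows an empty link; to see the effect, print the tree (or the image's new parent) instead.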

In other words, it works correctly! Print the new parent of img instead of the return value of unwrap() and you will see that the <a> tags have indeed been removed:

```
>>> from bs4 import BeautifulSoup
>>> html = '''<div> <a href="http://somelink"><img src="http://imgsrc.jpg" /></a> <a href="http://somelink2"><img src="http://imgsrc2.jpg /></a>"  '''
>>> soup = BeautifulSoup(html)
>>> for img in soup.find_all('img'):
...     img.parent.unwrap()
...     print(img.parent)
...
<a href="http://somelink"></a>
<div> <img src="http://imgsrc.jpg"/> <a href="http://somelink2"><img src="http://imgsrc2.jpg /&gt;&lt;/a&gt;"/></a></div>
<a href="http://somelink2"></a>
<div> <img src="http://imgsrc.jpg"/> <img src="http://imgsrc2.jpg /&gt;&lt;/a&gt;"/></div>
```

Here Python echoes the return value of img.parent.unwrap(), followed by the output of the print statement, which shows that the parent of the <img> tag is now the <div> tag. The first print shows the other <img> tag still wrapped; the second print shows them both as direct children of the <div> tag.
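One caveat with the loop above: it unwraps img.parent unconditionally. An image that is not inside a link, or the second of two images sharing one link (whose parent is already the <div> after the first unwrap), would lose its real container instead. A minimal sketch of a guarded version, assuming bs4 with "html.parser"; the helper name unwrap_image_links is hypothetical:

```python
from bs4 import BeautifulSoup


def unwrap_image_links(soup):
    """Hypothetical helper: remove <a> tags that directly wrap an <img>.

    Only parents named "a" are unwrapped, so images inside other
    containers (<p>, <div>, ...) keep them.
    """
    for img in soup.find_all("img"):
        parent = img.parent
        if parent is not None and parent.name == "a":
            parent.unwrap()  # children of the <a> move into its place
    return soup


html = ('<div><a href="http://somelink"><img src="http://imgsrc.jpg"/></a>'
        '<p><img src="http://plain.jpg"/></p></div>')
soup = unwrap_image_links(BeautifulSoup(html, "html.parser"))
print(soup)  # the link is gone, the <p> around the second image is kept

# Two images inside one link: the second iteration sees a <div> parent and skips it.
double = unwrap_image_links(BeautifulSoup(
    '<div><a href="x"><img src="1.jpg"/><img src="2.jpg"/></a></div>', "html.parser"))
print(double)
```

Because find_all() builds its result list up front, modifying the tree inside the loop does not skip any images.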
