Is it a scanned-in PDF?
This is a very good read if you are dealing with tags / accessibility:
I suspect, given your description of “hidden tags”, that this might be
what you are working with. These may be spots or smudges the OCR
picked up and tried to convert and embed as images. You might try the
method of dumping the text into a word processor as described above.
This will eliminate the detritus.