A library for parsing (possibly malformed) HTML/XML documents
http://hackage.haskell.org/package/tagsoup
ghc-tagsoup
TagSoup is a library for parsing and extracting information from (possibly malformed) HTML/XML documents. It supports the HTML5 specification and can parse either well-formed XML or unstructured, malformed HTML from the web. The library also provides functions for extracting information from an HTML document, which makes it well suited to screen scraping.
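
As a rough illustration of the screen-scraping use case, the following minimal sketch parses a deliberately malformed fragment and pulls out link targets and visible text. It assumes the `tagsoup` package is installed and uses `parseTags`, `fromAttrib`, and `innerText` from `Text.HTML.TagSoup`; the names `sampleHtml`, `extractLinks`, and `extractText`, along with the example.org URLs, are hypothetical and exist only for this example.

```haskell
module Main where

import Text.HTML.TagSoup (Tag (TagOpen), fromAttrib, innerText, parseTags)

-- Hypothetical sample input. The <p> and <li> tags are never closed,
-- and </html> is missing, to mimic untidy HTML found on the web.
sampleHtml :: String
sampleHtml =
  "<html><body><p>See <a href=\"https://example.org/a\">first</a> and "
    ++ "<a href=\"https://example.org/b\">second</a><li>item</body>"

-- Collect the href attribute of every opening <a> tag.
-- parseTags yields a flat list of tags rather than a tree, so
-- unbalanced markup does not prevent the traversal.
extractLinks :: String -> [String]
extractLinks html =
  [fromAttrib "href" tag | tag@(TagOpen "a" _) <- parseTags html]

-- Concatenate all text nodes, ignoring the markup around them.
extractText :: String -> String
extractText = innerText . parseTags

main :: IO ()
main = do
  mapM_ putStrLn (extractLinks sampleHtml)
  putStrLn (extractText sampleHtml)
```

Because the parser produces a flat tag list, a list comprehension with a pattern match is often enough for simple extraction; more involved scraping would likely combine this with the library's other tag-matching helpers.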