Under the hood, Cheerio Scraper is built using the CheerioCrawler class from Crawlee. If you'd like to learn more about the inner workings of the scraper, see the respective documentation.

Content types

By default, Cheerio Scraper only processes web pages with the text/html, application/json, application/xml, and application/xhtml+xml MIME content types (as reported by the Content-Type HTTP header), and skips pages with other content types. If you want the crawler to process other content types, use the Additional MIME types (additionalMimeTypes) input option.

Note that while the default Accept HTTP header allows any content type to be received, HTML and XML are preferred over JSON and other types.
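To make the content-type filtering more concrete, here is a minimal sketch of how such a check could work: the MIME type is taken from the Content-Type header (ignoring parameters such as charset) and compared against the default list plus any entries from additionalMimeTypes. The helper names shouldProcess and parseMimeType are hypothetical, and this is not Crawlee's actual implementation. Treating a missing header as "skip" is also an assumption made for illustration.

```typescript
// Hypothetical sketch of the documented filtering behaviour.
// NOT Crawlee's implementation; helper names are illustrative only.

// MIME types Cheerio Scraper processes by default, per the documentation.
const DEFAULT_MIME_TYPES: readonly string[] = [
  "text/html",
  "application/json",
  "application/xml",
  "application/xhtml+xml",
];

// Strip parameters from a Content-Type value,
// e.g. "text/html; charset=utf-8" -> "text/html".
function parseMimeType(contentType: string): string {
  return contentType.split(";")[0].trim().toLowerCase();
}

// Returns true if a page with this Content-Type would be processed.
// additionalMimeTypes plays the role of the scraper's input option.
function shouldProcess(
  contentType: string | undefined,
  additionalMimeTypes: string[] = [],
): boolean {
  // Assumption: a response without a Content-Type header is skipped.
  if (!contentType) return false;
  const mime = parseMimeType(contentType);
  const allowed = new Set(
    [...DEFAULT_MIME_TYPES, ...additionalMimeTypes].map((t) => t.toLowerCase()),
  );
  return allowed.has(mime);
}

console.log(shouldProcess("text/html; charset=utf-8")); // true
console.log(shouldProcess("application/pdf")); // false
console.log(shouldProcess("application/pdf", ["application/pdf"])); // true
```

Using a Set keeps the lookup simple, and merging the two lists mirrors the idea that additional MIME types extend the defaults rather than replace them.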