{"id":466,"date":"2023-02-16T09:37:53","date_gmt":"2023-02-16T09:37:53","guid":{"rendered":"https:\/\/tinyytopic.com\/?p=466"},"modified":"2023-02-16T09:35:16","modified_gmt":"2023-02-16T09:35:16","slug":"how-to-webscrap-by-python-function","status":"publish","type":"post","link":"https:\/\/tinyytopic.com\/index.php\/2023\/02\/16\/how-to-webscrap-by-python-function\/","title":{"rendered":"How to Webscrap by Python Function?"},"content":{"rendered":"\n<div class=\"wp-block-uagb-advanced-heading uagb-block-4e98021f\"><h5 class=\"uagb-heading-text\"><mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-vivid-cyan-blue-color\"><br>How to Webscrap by Python Function?<\/mark><\/h5><\/div>\n\n\n\n<p style=\"font-size:clamp(14px, 0.875rem + ((1vw - 3.2px) * 0.104), 15px);\">Install the following module if you haven&#8217;t installed it already:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>pip install requests<\/code><\/pre>\n\n\n\n<p style=\"font-size:clamp(14px, 0.875rem + ((1vw - 3.2px) * 0.104), 15px);\">A ready-to-use Python function to scrape the content of a web page:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"atomic\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">import requests\n\n\ndef webscrap(url, TimeoutSec=5, verify_ssl=True):\n    # Scrape a web page (where permitted) by sending a GET request to its URL\n    try:\n        page = requests.get(url, timeout=TimeoutSec, verify=verify_ssl, headers={\"User-Agent\": \"Mozilla\/5.0 (X11; CrOS x86_64 12871.102.0) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/81.0.4044.141 Safari\/537.36\"})\n    except requests.exceptions.RequestException:\n        # Covers connection errors, timeouts, invalid URLs, etc.\n        return 'Webpage is not reachable!'\n    \n    # Extract the HTML text\n    HtmlTxt = page.text\n    \n    # Get the HTTP status code\n    status = page.status_code\n    \n    return HtmlTxt, status<\/pre>\n\n\n\n<p style=\"font-size:clamp(14px, 0.875rem 
+ ((1vw - 3.2px) * 0.104), 15px);\">Call the function from your main code, for example:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import requests\n\nprint(webscrap(\"https:\/\/www.digikey.com\/en\/products\/detail\/vishay-dale\/CRCW1206100RFKEA\/1176530\"))<\/code><\/pre>\n\n\n\n<p style=\"font-size:clamp(14px, 0.875rem + ((1vw - 3.2px) * 0.104), 15px);\">The output (truncated here) looks like this:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"html\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">('&lt;!DOCTYPE html>&lt;html lang=\"en-us\" dir=\"ltr\">&lt;head>&lt;meta charSet=\"utf-8\"\/>&lt;meta name=\"theme-color\" content=\"#CC0000\"\/>&lt;meta name=\"generator\" content=\"Digi-Key Search Engine\"\/>&lt;link rel=\"icon\" type=\"image\/x-icon\" href=\"\/favicon.ico\"\/>&lt;link rel=\"stylesheet\" type=\"text\/css\" href=\"\/\/www.digikey.com\/-\/media\/Designer\/Global\/fonts\/fonts.css?la=en-US&amp;amp;ts=2943f40b-f61e-49aa-952b-963240340aa8\"\/>&lt;link rel=\"stylesheet\" type=\"text\/css\" href=\"\/\/www.digikey.com\/-\/digit\/global.css\"\/>&lt;link rel=\"stylesheet\" type=\"text\/css\" href=\"\/\/www.digikey.com\/-\/media\/Designer\/Global\/EnavHeaderMVC\/CSS\/empty.css?la=en-US&amp;amp;ts=2aecf11d-87f3-4f54-a433-8e8d1acdb795\"\/>&lt;link rel=\"stylesheet\" type=\"text\/css\" href=\"\/\/www.digikey.com\/-\/media\/Designer\/Header\/ENav2021\/CSS\/combined.css?la=en-US&amp;amp;ts=506e0306-ad96-41ba-8648-a3458e01664f\"\/>&lt;link rel=\"stylesheet\" type=\"text\/css\" href=\"\/\/www.digikey.com\/-\/media\/Designer\/Header\/ENav2021\/CSS\/banner.css?la=en-US&amp;amp;ts=498625d8-0dfd-41a2-920b-debb017ca9ff\"\/>&lt;link rel=\"stylesheet\" type=\"text\/css\" href=\"\/\/www.digikey.com\/-\/media\/Designer\/Header\/ENav2021\/CSS\/cookie-notice.css?la=en-US&amp;amp;ts=c04caf01-0bbe-4f42-a7a2-a55af095fcd0\"\/>&lt;link rel=\"stylesheet\" 
type=\"text\/css\" href=\"\/\/www.digikey.com\/-\/media\/Designer\/Header\/ENav2021\/CSS\/modal.css?la=en-US&amp;amp;ts=1db0b6d2-b2df-405a-8100-203d26a93771\"\/>&lt;link rel=\"stylesheet\" type=\"text\/css\" href=\"\/\/www.digikey.com\/-\/media\/Designer\/Misc\/SuggestionSearchBar\/CSS\/searchsuggest.css?la=en-US&amp;amp;ts=41f98d6f-b221-4c88-a872-a91f60fe5338\"\/>&lt;link rel=\"stylesheet\" type=\"text\/css\" href=\"\/\/www.digikey.com\/-\/media\/Designer\/Footer\/Footer%20Redesign\/MVC\/CSS\/cobrowse.css?la=en-US&amp;amp;ts=6087909a-ef4a-4385-9dfd-3c8415fcba01\"\/>&lt;link rel=\"stylesheet\" type=\"text\/css\" href=\"\/\/www.digikey.com\/-\/media\/Designer\/Footer\/Footer%20Redesign\/MVC\/CSS\/footer.css?la=en-US&amp;amp;ts=c5d5e528-2e9d-49ab-a2a0-72e2f91b2e86\"\/>&lt;link rel=\"stylesheet\" type=\"text\/css\" href=\"\/\/www.digikey.com\/-\/media\/Designer\/Footer\/Footer%20Redesign\/MVC\/CSS\/intl-country-select-popup.css?la=en-US&amp;amp;ts=0fb63111-2531-4d25-98ac-31bca9089fe2\"\/>&lt;link rel=\"stylesheet\" type=\"text\/css\" href=\"\/\/www.digikey.com\/-\/media\/Designer\/Footer\/Footer%20Redesign\/MVC\/CSS\/needHelp.css?la=en-US&amp;amp;ts=95faf60f-8e96-41c1-a4b3-5d908c0628d1\"\/>&lt;script type=\"text\/javascript\">window[\\'__DK_STORE__\\'] = window[\\'__DK_STORE__\\'] || {\\n    PRICING_REQUEST_TIMEOUT:6000,\\n    FEATURE_FLAG_MOSAIC_CART:undefined\\n  };&lt;\/script>&lt;script type=\"text\/javascript\">var sdkInstance=\"appInsightsSDK\";window[sdkInstance]=\"appInsights\";var <\/pre>\n\n\n\n<div class=\"wp-block-uagb-advanced-heading uagb-block-b3170781\"><h5 class=\"uagb-heading-text\"><mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-vivid-cyan-blue-color\">How does the Python function work?<\/mark><\/h5><\/div>\n\n\n\n<p style=\"font-size:clamp(14px, 0.875rem + ((1vw - 3.2px) * 0.104), 15px);\">This is a Python function that performs web scraping on a specified URL using the Requests library. 
Here&#8217;s how the function works:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>The function <code>webscrap<\/code> takes three arguments: <code>url<\/code> (the URL of the web page to scrape), <code>TimeoutSec<\/code> (the number of seconds to wait for the server to respond before giving up, which defaults to 5), and <code>verify_ssl<\/code> (a boolean that controls whether the server&#8217;s SSL certificate is verified, which defaults to <code>True<\/code>).<\/li>\n\n\n\n<li>The request is wrapped in a <code>try-except<\/code> block that catches <code>requests.exceptions.RequestException<\/code>, the base class for the errors Requests raises (connection failures, timeouts, invalid URLs, and so on). If the request fails in this way, the function returns the string &#8216;Webpage is not reachable!&#8217; instead of a tuple.<\/li>\n\n\n\n<li>If a response is received, the function reads the HTML content of the page from the <code>.text<\/code> attribute of the <code>Response<\/code> object returned by the <code>get<\/code> function of the <code>requests<\/code> library.<\/li>\n\n\n\n<li>The function also reads the HTTP status code of the response from the <code>.status_code<\/code> attribute of the <code>Response<\/code> object. The code is not checked, so an error response such as 404 is passed back to the caller like any other.<\/li>\n\n\n\n<li>Finally, the function returns a tuple containing the HTML content of the page and the HTTP status code.<\/li>\n<\/ol>\n\n\n\n<p style=\"font-size:clamp(14px, 0.875rem + ((1vw - 3.2px) * 0.104), 15px);\">Note that the function also sends a custom <code>User-Agent<\/code> header, which makes the request look like it comes from a web browser. Some servers block the default <code>User-Agent<\/code> header sent by the <code>requests<\/code> library, so this can help the request get through.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Install the following module(s) if you haven&#8217;t installed them already: Ready-to-use Python function to scrap content of a website: Write your main code as a sample below, The output of the code is, This is a Python function that performs web scraping on a specified URL using the Requests library. 
Here&#8217;s how the function works: [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"om_disable_all_campaigns":false,"_uag_custom_page_level_css":"","_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[12,17],"tags":[],"class_list":["post-466","post","type-post","status-publish","format-standard","hentry","category-python","category-useful-function"],"aioseo_notices":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v20.0 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>How to Webscrap by Python Function? - tinyytopic.com<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/tinyytopic.com\/index.php\/2023\/02\/16\/how-to-webscrap-by-python-function\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"How to Webscrap by Python Function? - tinyytopic.com\" \/>\n<meta property=\"og:description\" content=\"Install the following module(s) if you haven&#8217;t installed them already: Ready-to-use Python function to scrap content of a website: Write your main code as a sample below, The output of the code is, This is a Python function that performs web scraping on a specified URL using the Requests library. 
Here&#8217;s how the function works: [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/tinyytopic.com\/index.php\/2023\/02\/16\/how-to-webscrap-by-python-function\/\" \/>\n<meta property=\"og:site_name\" content=\"tinyytopic.com\" \/>\n<meta property=\"article:published_time\" content=\"2023-02-16T09:37:53+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2023-02-16T09:35:16+00:00\" \/>\n<meta name=\"author\" content=\"tinyytopic.com\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"tinyytopic.com\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"2 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/tinyytopic.com\/index.php\/2023\/02\/16\/how-to-webscrap-by-python-function\/\",\"url\":\"https:\/\/tinyytopic.com\/index.php\/2023\/02\/16\/how-to-webscrap-by-python-function\/\",\"name\":\"How to Webscrap by Python Function? 
- tinyytopic.com\",\"isPartOf\":{\"@id\":\"https:\/\/tinyytopic.com\/#website\"},\"datePublished\":\"2023-02-16T09:37:53+00:00\",\"dateModified\":\"2023-02-16T09:35:16+00:00\",\"author\":{\"@id\":\"https:\/\/tinyytopic.com\/#\/schema\/person\/56c840cea8539fb221a03c5fa2ef32eb\"},\"breadcrumb\":{\"@id\":\"https:\/\/tinyytopic.com\/index.php\/2023\/02\/16\/how-to-webscrap-by-python-function\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/tinyytopic.com\/index.php\/2023\/02\/16\/how-to-webscrap-by-python-function\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/tinyytopic.com\/index.php\/2023\/02\/16\/how-to-webscrap-by-python-function\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/tinyytopic.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"How to Webscrap by Python Function?\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/tinyytopic.com\/#website\",\"url\":\"https:\/\/tinyytopic.com\/\",\"name\":\"tinyytopic.com\",\"description\":\"\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/tinyytopic.com\/?s={search_term_string}\"},\"query-input\":\"required 
name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/tinyytopic.com\/#\/schema\/person\/56c840cea8539fb221a03c5fa2ef32eb\",\"name\":\"tinyytopic.com\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/tinyytopic.com\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/5f153681c8ca1e6d7287d858de51f968bb687221c89cf96d763ead4393881029?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/5f153681c8ca1e6d7287d858de51f968bb687221c89cf96d763ead4393881029?s=96&d=mm&r=g\",\"caption\":\"tinyytopic.com\"},\"sameAs\":[\"http:\/\/tinyytopic.com\"],\"url\":\"https:\/\/tinyytopic.com\/index.php\/author\/mmkmuthukumar21gmail-com\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"How to Webscrap by Python Function? - tinyytopic.com","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/tinyytopic.com\/index.php\/2023\/02\/16\/how-to-webscrap-by-python-function\/","og_locale":"en_US","og_type":"article","og_title":"How to Webscrap by Python Function? - tinyytopic.com","og_description":"Install the following module(s) if you haven&#8217;t installed them already: Ready-to-use Python function to scrap content of a website: Write your main code as a sample below, The output of the code is, This is a Python function that performs web scraping on a specified URL using the Requests library. Here&#8217;s how the function works: [&hellip;]","og_url":"https:\/\/tinyytopic.com\/index.php\/2023\/02\/16\/how-to-webscrap-by-python-function\/","og_site_name":"tinyytopic.com","article_published_time":"2023-02-16T09:37:53+00:00","article_modified_time":"2023-02-16T09:35:16+00:00","author":"tinyytopic.com","twitter_card":"summary_large_image","twitter_misc":{"Written by":"tinyytopic.com","Est. 
reading time":"2 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/tinyytopic.com\/index.php\/2023\/02\/16\/how-to-webscrap-by-python-function\/","url":"https:\/\/tinyytopic.com\/index.php\/2023\/02\/16\/how-to-webscrap-by-python-function\/","name":"How to Webscrap by Python Function? - tinyytopic.com","isPartOf":{"@id":"https:\/\/tinyytopic.com\/#website"},"datePublished":"2023-02-16T09:37:53+00:00","dateModified":"2023-02-16T09:35:16+00:00","author":{"@id":"https:\/\/tinyytopic.com\/#\/schema\/person\/56c840cea8539fb221a03c5fa2ef32eb"},"breadcrumb":{"@id":"https:\/\/tinyytopic.com\/index.php\/2023\/02\/16\/how-to-webscrap-by-python-function\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/tinyytopic.com\/index.php\/2023\/02\/16\/how-to-webscrap-by-python-function\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/tinyytopic.com\/index.php\/2023\/02\/16\/how-to-webscrap-by-python-function\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/tinyytopic.com\/"},{"@type":"ListItem","position":2,"name":"How to Webscrap by Python Function?"}]},{"@type":"WebSite","@id":"https:\/\/tinyytopic.com\/#website","url":"https:\/\/tinyytopic.com\/","name":"tinyytopic.com","description":"","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/tinyytopic.com\/?s={search_term_string}"},"query-input":"required 
name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/tinyytopic.com\/#\/schema\/person\/56c840cea8539fb221a03c5fa2ef32eb","name":"tinyytopic.com","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/tinyytopic.com\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/5f153681c8ca1e6d7287d858de51f968bb687221c89cf96d763ead4393881029?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/5f153681c8ca1e6d7287d858de51f968bb687221c89cf96d763ead4393881029?s=96&d=mm&r=g","caption":"tinyytopic.com"},"sameAs":["http:\/\/tinyytopic.com"],"url":"https:\/\/tinyytopic.com\/index.php\/author\/mmkmuthukumar21gmail-com\/"}]}},"uagb_featured_image_src":{"full":false,"thumbnail":false,"medium":false,"medium_large":false,"large":false,"1536x1536":false,"2048x2048":false},"uagb_author_info":{"display_name":"tinyytopic.com","author_link":"https:\/\/tinyytopic.com\/index.php\/author\/mmkmuthukumar21gmail-com\/"},"uagb_comment_info":83,"uagb_excerpt":"Install the following module(s) if you haven&#8217;t installed them already: Ready-to-use Python function to scrap content of a website: Write your main code as a sample below, The output of the code is, This is a Python function that performs web scraping on a specified URL using the Requests library. 
Here&#8217;s how the function works:&hellip;","_links":{"self":[{"href":"https:\/\/tinyytopic.com\/index.php\/wp-json\/wp\/v2\/posts\/466","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/tinyytopic.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/tinyytopic.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/tinyytopic.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/tinyytopic.com\/index.php\/wp-json\/wp\/v2\/comments?post=466"}],"version-history":[{"count":3,"href":"https:\/\/tinyytopic.com\/index.php\/wp-json\/wp\/v2\/posts\/466\/revisions"}],"predecessor-version":[{"id":469,"href":"https:\/\/tinyytopic.com\/index.php\/wp-json\/wp\/v2\/posts\/466\/revisions\/469"}],"wp:attachment":[{"href":"https:\/\/tinyytopic.com\/index.php\/wp-json\/wp\/v2\/media?parent=466"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/tinyytopic.com\/index.php\/wp-json\/wp\/v2\/categories?post=466"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/tinyytopic.com\/index.php\/wp-json\/wp\/v2\/tags?post=466"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}