{"id":183,"date":"2022-06-28T13:25:25","date_gmt":"2022-06-28T05:25:25","guid":{"rendered":"http:\/\/hiycz.cn\/?p=183"},"modified":"2023-06-29T19:24:43","modified_gmt":"2023-06-29T11:24:43","slug":"beautifulsoup","status":"publish","type":"post","link":"http:\/\/hiycz.cn\/index.php\/2022\/06\/28\/beautifulsoup\/","title":{"rendered":"beautifulsoup"},"content":{"rendered":"\n<p>The examples below use the sample document from the official documentation.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>html = \"\"\"\r\n&lt;html>&lt;head>&lt;title>The Dormouse's story&lt;\/title>&lt;\/head>\r\n&lt;body>\r\n&lt;p class=\"title\" name=\"dromouse\">&lt;b>The Dormouse's story&lt;\/b>&lt;\/p>\r\n&lt;p class=\"story\">Once upon a time there were three little sisters; and their names were\r\n&lt;a href=\"http:\/\/example.com\/elsie\" class=\"sister\" id=\"link1\">&lt;!-- Elsie -->&lt;\/a>,\r\n&lt;a href=\"http:\/\/example.com\/lacie\" class=\"sister\" id=\"link2\">Lacie&lt;\/a> and\r\n&lt;a href=\"http:\/\/example.com\/tillie\" class=\"sister\" id=\"link3\">Tillie&lt;\/a>;\r\nand they lived at the bottom of a well.&lt;\/p>\r\n&lt;p class=\"story\">...&lt;\/p>\r\n\"\"\"<\/code><\/pre>\n\n\n\n<div class=\"wp-block-qubely-heading qubely-block-ed706d\"><div class=\"qubely-block-heading  \"><div class=\"qubely-heading-container\"><h2 class=\"qubely-heading-selector\">1. Importing and parsing<\/h2><\/div><\/div><\/div>\n\n\n\n<pre class=\"wp-block-code\"><code>from bs4 import BeautifulSoup\n\n# The 'lxml' parser needs the lxml package installed; it does not need to be imported\nsoup = BeautifulSoup(html, 'lxml')  # create a BeautifulSoup object from the string\n# An object can also be created from a local HTML file:\n# soup1 = BeautifulSoup(open('index.html'), 'lxml')\nprint(soup.prettify())  # print the parsed document, formatted with indentation<\/code><\/pre>\n\n\n\n<div class=\"wp-block-qubely-heading qubely-block-4df13f\"><div class=\"qubely-block-heading  \"><div 
class=\"qubely-heading-container\"><h2 class=\"qubely-heading-selector\">2. The four kinds of Beautiful Soup objects<\/h2><\/div><\/div><\/div>\n\n\n\n<p>Beautiful Soup converts a complex HTML document into a tree structure in which every node is a Python object. All of these objects fall into four kinds:<\/p>\n\n\n\n<ul><li>Tag<\/li><li>NavigableString<\/li><li>BeautifulSoup<\/li><li>Comment<\/li><\/ul>\n\n\n\n<p><strong>(1) Tag<\/strong><\/p>\n\n\n\n<p>A Tag object corresponds to a tag in the HTML document, such as a, h1, or div.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>soup = BeautifulSoup('&lt;b class=\"boldest\">Extremely bold&lt;\/b>', 'lxml')\r\ntag = soup.b\r\ntype(tag)\r\n# &lt;class 'bs4.element.Tag'><\/code><\/pre>\n\n\n\n<p>A tag can be accessed directly with dot notation, as soup.b does above.<\/p>\n\n\n\n<p>A tag has two particularly important attributes, .name and .attrs.<\/p>\n\n\n\n<p><strong>.name<\/strong><\/p>\n\n\n\n<p>Returns the tag's name; it can also be reassigned.<\/p>\n\n\n\n<p><strong>.attrs<\/strong><\/p>\n\n\n\n<p>Returns the tag's attributes as a dictionary.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>soup = BeautifulSoup('&lt;b class=\"boldest\">Extremely bold&lt;\/b>', 'lxml')\nb_class = soup.b.attrs&#91;\"class\"]\nb_class = soup.b.get('class')  # same result when the attribute exists; get() returns None if it is missing\ndel soup.b&#91;'class']  # delete the class attribute<\/code><\/pre>\n\n\n\n<p><strong>(2) NavigableString<\/strong><\/p>\n\n\n\n<p>Use .string to get the text inside a tag, for example:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>soup.p.string<\/code><\/pre>\n\n\n\n<p>The string contained in a tag cannot be edited in place, but it can be replaced with another string using the <a 
href=\"https:\/\/beautifulsoup.cn\/#replace-with\">replace_with()<\/a> method:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>tag.string.replace_with(\"No longer bold\")\r\ntag\r\n# &lt;b class=\"boldest\">No longer bold&lt;\/b><\/code><\/pre>\n\n\n\n<p><strong>(3) BeautifulSoup<\/strong><\/p>\n\n\n\n<p>The <code>BeautifulSoup<\/code> object represents the whole document. Most of the time it can be treated as a <code>Tag<\/code> object; it supports most of the methods described in <a href=\"https:\/\/beautifulsoup.cn\/#id19\">Navigating the tree<\/a> and <a href=\"https:\/\/beautifulsoup.cn\/#id28\">Searching the tree<\/a>.<\/p>\n\n\n\n<p>Because the <code>BeautifulSoup<\/code> object is not a real HTML or XML tag, it has no name or attributes of its own. Since it is sometimes convenient to look at its <code>.name<\/code>, the <code>BeautifulSoup<\/code> object has a special <code>.name<\/code> attribute whose value is \"[document]\".<\/p>\n\n\n\n<p><strong>(4) Comment<\/strong><\/p>\n\n\n\n<p>A Comment object is a special type of NavigableString. Its printed content does not include the comment markers. Take a tag that contains a comment:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># soup here is the object parsed from the html string at the top of the post\r\nprint(soup.a)\r\nprint(soup.a.string)\r\nprint(type(soup.a.string))\r\n\r\n# Output:\r\n# &lt;a class=\"sister\" href=\"http:\/\/example.com\/elsie\" id=\"link1\">&lt;!-- Elsie -->&lt;\/a>\r\n#  Elsie \r\n# &lt;class 
'bs4.element.Comment'><\/code><\/pre>\n\n\n\n<div class=\"wp-block-qubely-heading qubely-block-caf39f\"><div class=\"qubely-block-heading  \"><div class=\"qubely-heading-container\"><h2 class=\"qubely-heading-selector\">3. Navigating the document tree<\/h2><\/div><\/div><\/div>\n\n\n\n<p><strong>(1) Direct children<\/strong><\/p>\n\n\n\n<p><strong>.contents<\/strong><\/p>\n\n\n\n<p>A tag's .contents attribute returns the tag's children as a list:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>print(soup.head.contents)\n# &#91;&lt;title>The Dormouse's story&lt;\/title>]\n# Individual children can be accessed by index\nprint(soup.head.contents&#91;0])<\/code><\/pre>\n\n\n\n<p><strong>.children<\/strong><\/p>\n\n\n\n<p>It does not return a list, but all the children can be obtained by iterating over it. Printing .children shows that it is an iterator object:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>print(soup.head.children)\n# e.g. &lt;list_iterator object at 0x7f71457f5710> (the exact repr depends on the Python and bs4 versions)\n\nfor item in soup.body.children:  # iterate over the children with a for loop\r\n    print(item)\n<\/code><\/pre>\n\n\n\n<p><strong>(2) All descendants<\/strong><\/p>\n\n\n\n<p><strong>.descendants<\/strong><\/p>\n\n\n\n<p>Like .children, it must be iterated over. It yields every node nested under the tag, at any depth.<\/p>\n\n\n\n<p><strong>(3) Node content<\/strong><\/p>\n\n\n\n<p><strong>.string<\/strong><\/p>\n\n\n\n<p>If a tag contains no other tags, .string 
returns the content inside the tag. If a tag contains exactly one child tag, .string also returns the innermost content.<\/p>\n\n\n\n<p>If a tag contains more than one child, it is ambiguous which child .string should refer to, so .string returns None.<\/p>\n\n\n\n<p><strong>(4) Multiple strings<\/strong><\/p>\n\n\n\n<p><strong>.strings<\/strong><\/p>\n\n\n\n<p>Yields all the strings under a tag; it must be iterated over.<\/p>\n\n\n\n<p><strong>.stripped_strings<\/strong><\/p>\n\n\n\n<p>The strings may contain a lot of extra spaces or blank lines. Iterating over .stripped_strings instead strips the surrounding whitespace and skips whitespace-only strings.<\/p>\n\n\n\n<p><strong>(5) Parent<\/strong><\/p>\n\n\n\n<p><strong>.parent<\/strong><\/p>\n\n\n\n<p><strong>(6) All ancestors<\/strong><\/p>\n\n\n\n<p><strong>.parents<\/strong><\/p>\n\n\n\n<p>It must be iterated over to get the ancestors.<\/p>\n\n\n\n<p><strong>(7) Siblings<\/strong><\/p>\n\n\n\n<p>Siblings are nodes at the same level as the current node, that is, children of the same parent.<\/p>\n\n\n\n<p><strong>.next_sibling<\/strong><\/p>\n\n\n\n<p>Returns the node's next sibling.<\/p>\n\n\n\n<p><strong>.previous_sibling<\/strong><\/p>\n\n\n\n<p>Returns the node's previous sibling. If no such sibling exists, it returns 
None.<\/p>\n\n\n\n<p>Note: in a real document, a tag's .next_sibling and .previous_sibling are often whitespace strings rather than tags, because the whitespace and line breaks between tags are also nodes.<\/p>\n\n\n\n<p><strong>(8) All siblings<\/strong><\/p>\n\n\n\n<p>The .next_siblings and .previous_siblings attributes iterate over the siblings that follow or precede the current node.<\/p>\n\n\n\n<p><strong>(9) Next and previous elements<\/strong><\/p>\n\n\n\n<p>.next_element, .previous_element<\/p>\n\n\n\n<p>Unlike .next_sibling and .previous_sibling, these are not restricted to siblings. They point to the node that was parsed immediately after or before the current one, regardless of its level in the tree.<\/p>\n\n\n\n<p><strong>(10) All next and previous elements<\/strong><\/p>\n\n\n\n<p>The .next_elements and .previous_elements iterators move forward or backward through the document in the order in which it was parsed.<\/p>\n\n\n\n<div class=\"wp-block-qubely-heading qubely-block-26e62f\"><div class=\"qubely-block-heading  \"><div class=\"qubely-heading-container\"><h2 class=\"qubely-heading-selector\">4. Searching the document tree<\/h2><\/div><\/div><\/div>\n\n\n\n<p><strong>(1) find_all( name , attrs , recursive , text , **kwargs )<\/strong><\/p>\n\n\n\n<p>find_all() searches the descendants of the current tag and returns every tag that matches the filter. The sample outputs in this section correspond to the html_doc variant shown in section 5, in which the first link contains the plain text Elsie rather than a comment.<\/p>\n\n\n\n<ul><li><strong>name 
parameter<\/strong><\/li><\/ul>\n\n\n\n<p><strong>A. Passing a string<\/strong><\/p>\n\n\n\n<p>The name parameter finds all tags whose name is name; string nodes are ignored automatically.<\/p>\n\n\n\n<p>The simplest filter is a string. When a string is passed to a search method, Beautiful Soup looks for tags whose name matches that string exactly. The following example finds all the &lt;b> tags in the document:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>soup.find_all('b')\r\n# &#91;&lt;b>The Dormouse's story&lt;\/b>]<\/code><\/pre>\n\n\n\n<p><strong>B. Passing a regular expression<\/strong><\/p>\n\n\n\n<p>If a regular expression is passed, Beautiful Soup filters tag names using the regular expression's search() method. The following example finds all tags whose names start with b, so both the &lt;body> and &lt;b> tags are found:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import re\r\nfor tag in soup.find_all(re.compile(\"^b\")):\r\n    print(tag.name)\r\n# body\r\n# b<\/code><\/pre>\n\n\n\n<p><strong>C. Passing a list<\/strong><\/p>\n\n\n\n<p>If a list is passed, Beautiful Soup returns anything that matches any item in the list. The following code finds all the &lt;a> and &lt;b> tags in the document:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>soup.find_all(&#91;\"a\", \"b\"])\r\n# &#91;&lt;b>The Dormouse's story&lt;\/b>,\r\n#  &lt;a class=\"sister\" href=\"http:\/\/example.com\/elsie\" id=\"link1\">Elsie&lt;\/a>,\r\n#  &lt;a 
class=\"sister\" href=\"http:\/\/example.com\/lacie\" id=\"link2\">Lacie&lt;\/a>,\r\n#  &lt;a class=\"sister\" href=\"http:\/\/example.com\/tillie\" id=\"link3\">Tillie&lt;\/a>]<\/code><\/pre>\n\n\n\n<p><strong>D. Passing True<\/strong><\/p>\n\n\n\n<p>True matches any value. The following code finds all the tags in the document but none of the string nodes:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>for tag in soup.find_all(True):\r\n    print(tag.name)\r\n# html\r\n# head\r\n# title\r\n# body\r\n# p\r\n# b\r\n# p\r\n# a\r\n# a\r\n# a\r\n# p<\/code><\/pre>\n\n\n\n<p><strong>E. Passing a function<\/strong><\/p>\n\n\n\n<p>If none of the other filters fits, you can define a function that takes an element as its only argument. The function should return True if the element matches and False otherwise. The following function returns True if a tag defines the class attribute but not the id attribute:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>def has_class_but_no_id(tag):\r\n    return tag.has_attr('class') and not tag.has_attr('id')<\/code><\/pre>\n\n\n\n<p>Passing this function to find_all() returns all the &lt;p> tags:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>soup.find_all(has_class_but_no_id)\r\n# &#91;&lt;p class=\"title\">&lt;b>The Dormouse's story&lt;\/b>&lt;\/p>,\r\n#  &lt;p class=\"story\">Once upon a time there were...&lt;\/p>,\r\n#  &lt;p class=\"story\">...&lt;\/p>]<\/code><\/pre>\n\n\n\n<p><strong>2) keyword 
parameters<\/strong><\/p>\n\n\n\n<ul><li>id: filters on the id attribute<\/li><li>href: if an href argument is passed, Beautiful Soup filters on each tag's \"href\" attribute<\/li><li>class_: filters on the CSS class (the trailing underscore is needed because class is a reserved word in Python)<\/li><li>attrs: for example, attrs={\"data-foo\": \"value\"} finds tags whose data-foo attribute equals value<\/li><\/ul>\n\n\n\n<p><strong>3) text parameter<\/strong><\/p>\n\n\n\n<p>The text parameter searches the strings in the document instead of the tags. Like name, it accepts a string, a regular expression, a list, or True. Beautiful Soup 4.4.0 and later call this parameter string; text is the older name.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>soup.find_all(text=\"Elsie\")\r\n# &#91;'Elsie']\r\n \r\nsoup.find_all(text=&#91;\"Tillie\", \"Elsie\", \"Lacie\"])\r\n# &#91;'Elsie', 'Lacie', 'Tillie']\r\n \r\nsoup.find_all(text=re.compile(\"Dormouse\"))\r\n# &#91;\"The Dormouse's story\", \"The Dormouse's story\"]<\/code><\/pre>\n\n\n\n<p><strong>4) limit parameter<\/strong><\/p>\n\n\n\n<p>Limits the number of results returned.<\/p>\n\n\n\n<p><strong>5) recursive parameter<\/strong><\/p>\n\n\n\n<p>When find_all() is called on a tag, Beautiful Soup examines all of the tag's descendants. To search only the tag's direct children, pass recursive=False.<\/p>\n\n\n\n<p>There are other search methods as well. Their names follow the same pattern as the navigation attributes in section 3, so they are described only briefly below; their parameters are largely the same as those of find_all().<\/p>\n\n\n\n<p><strong>(2) find( name , attrs , recursive , text , **kwargs )<\/strong><\/p>\n\n\n\n<p>Its only difference from find_all() 
is that find_all() returns a list of all the matches, whereas find() returns the first match directly, or None if nothing matches.<\/p>\n\n\n\n<p><strong>(3) find_parents() find_parent()<\/strong><\/p>\n\n\n\n<p>find_all() and find() search only the current node's children, grandchildren, and so on. find_parents() and find_parent() search the current node's ancestors instead, using the same filters as the ordinary tag search methods.<\/p>\n\n\n\n<p><strong>(4) find_next_siblings() find_next_sibling()<\/strong><\/p>\n\n\n\n<p>These two methods use the .next_siblings attribute to iterate over the siblings that come after the current tag. find_next_siblings() returns all the following siblings that match, and find_next_sibling() returns only the first one.<\/p>\n\n\n\n<p><strong>(5) find_previous_siblings() find_previous_sibling()<\/strong><\/p>\n\n\n\n<p>These two methods use the .previous_siblings attribute to iterate over the siblings that come before the current tag. find_previous_siblings() returns all the preceding siblings that match, and find_previous_sibling() returns only the first one.<\/p>\n\n\n\n<p><strong>(6) find_all_next() find_next()<\/strong><\/p>\n\n\n\n<p>These two methods use the .next_elements 
attribute to iterate over the tags and strings that come after the current tag. find_all_next() returns all the matching nodes, and find_next() returns the first one.<\/p>\n\n\n\n<p><strong>(7) find_all_previous() find_previous()<\/strong><\/p>\n\n\n\n<p>These two methods use the .previous_elements attribute to iterate over the tags and strings that come before the current node. find_all_previous() returns all the matching nodes, and find_previous() returns the first one.<\/p>\n\n\n\n<div class=\"wp-block-qubely-heading qubely-block-ca4167\"><div class=\"qubely-block-heading  \"><div class=\"qubely-heading-container\"><h2 class=\"qubely-heading-selector\">5. CSS selectors<\/h2><\/div><\/div><\/div>\n\n\n\n<p>In CSS, a tag name is written bare, a class name is prefixed with a dot, and an id is prefixed with #. The same syntax can be used here to select elements with <strong>soup.select()<\/strong>, which returns a <strong>list<\/strong> (a ResultSet, which subclasses list, in recent versions).<\/p>\n\n\n\n<ul id=\"block-26c20649-1c80-4698-987d-8a5f614875a7\"><li>Tag name, e.g. title: 
written as \"title\"<\/li><li>Class name, e.g. class=\"skudata\": written as \".skudata\"<\/li><li>id, e.g. id=\"link1\": written as \"#link1\"<\/li><li>Attribute lookup, e.g. an a tag with class=\"abcd\": written as 'a[class=\"abcd\"]'. The attribute belongs to the same tag, so there is no space between the tag name and the bracket.<\/li><\/ul>\n\n\n\n<p><strong>Getting text with get_text()<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>html_doc = \"\"\"\r\n&lt;html>&lt;head>&lt;title>The Dormouse's story&lt;\/title>&lt;\/head>\r\n    &lt;body>\r\n&lt;p class=\"title\">&lt;b>The Dormouse's story&lt;\/b>&lt;\/p>\r\n\r\n&lt;p class=\"story\">Once upon a time there were three little sisters; and their names were\r\n&lt;a href=\"http:\/\/example.com\/elsie\" class=\"sister\" id=\"link1\">Elsie&lt;\/a>,\r\n&lt;a href=\"http:\/\/example.com\/lacie\" class=\"sister\" id=\"link2\">Lacie&lt;\/a> and\r\n&lt;a href=\"http:\/\/example.com\/tillie\" class=\"sister\" id=\"link3\">Tillie&lt;\/a>;\r\nand they lived at the bottom of a well.&lt;\/p>\r\n\r\n&lt;p class=\"story\">...&lt;\/p>\r\n\"\"\"\r\n\r\nfrom bs4 import BeautifulSoup\r\nsoup = BeautifulSoup(html_doc, 'lxml')\n\nprint(type(soup.select('body p&#91;class=\"story\"]')))  # select() returns a list-like result\r\nprint(soup.select('title')&#91;0].get_text())  # text of the first match\r\n \r\nfor title in soup.select('title'):  # iterate over all matches\r\n    print(title.get_text())<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>The examples below use the sample document from the official documentation. Beautiful Soup converts 
&hellip;<\/p>\n","protected":false},"author":1,"featured_media":184,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"qubely_global_settings":"","qubely_interactions":""},"categories":[7],"tags":[],"qubely_featured_image_url":{"full":["http:\/\/hiycz.cn\/wp-content\/uploads\/2022\/06\/soup.jpg",1000,666,false],"landscape":["http:\/\/hiycz.cn\/wp-content\/uploads\/2022\/06\/soup.jpg",1000,666,false],"portraits":["http:\/\/hiycz.cn\/wp-content\/uploads\/2022\/06\/soup-540x320.jpg",540,320,true],"thumbnail":["http:\/\/hiycz.cn\/wp-content\/uploads\/2022\/06\/soup-150x150.jpg",150,150,true],"medium":["http:\/\/hiycz.cn\/wp-content\/uploads\/2022\/06\/soup-300x200.jpg",300,200,true],"medium_large":["http:\/\/hiycz.cn\/wp-content\/uploads\/2022\/06\/soup-768x511.jpg",768,511,true],"large":["http:\/\/hiycz.cn\/wp-content\/uploads\/2022\/06\/soup.jpg",1000,666,false],"1536x1536":["http:\/\/hiycz.cn\/wp-content\/uploads\/2022\/06\/soup.jpg",1000,666,false],"2048x2048":["http:\/\/hiycz.cn\/wp-content\/uploads\/2022\/06\/soup.jpg",1000,666,false],"qubely_landscape":["http:\/\/hiycz.cn\/wp-content\/uploads\/2022\/06\/soup.jpg",1000,666,false],"qubely_portrait":["http:\/\/hiycz.cn\/wp-content\/uploads\/2022\/06\/soup-540x320.jpg",540,320,true],"qubely_thumbnail":["http:\/\/hiycz.cn\/wp-content\/uploads\/2022\/06\/soup-140x100.jpg",140,100,true],"post-thumbnail":["http:\/\/hiycz.cn\/wp-content\/uploads\/2022\/06\/soup-450x300.jpg",450,300,true],"bravada-featured":["http:\/\/hiycz.cn\/wp-content\/uploads\/2022\/06\/soup-583x300.jpg",583,300,true],"bravada-featured-lp":["http:\/\/hiycz.cn\/wp-content\/uploads\/2022\/06\/soup-620x300.jpg",620,300,true],"bravada-featured-half":["http:\/\/hiycz.cn\/wp-content\/uploads\/2022\/06\/soup-800x300.jpg",800,300,true],"bravada-featured-third":["http:\/\/hiycz.cn\/wp-content\/uploads\/2022\/06\/soup-512x300.jpg",512,300,true],"bravada-lpbox-1":["http:\/\/hiycz.cn\/wp-content\/uploads\/2022\/
06\/soup-310x250.jpg",310,250,true],"bravada-lpbox-2":["http:\/\/hiycz.cn\/wp-content\/uploads\/2022\/06\/soup-413x300.jpg",413,300,true]},"qubely_author":{"display_name":"ycz","author_link":"http:\/\/hiycz.cn\/index.php\/author\/ycz\/"},"qubely_comment":0,"qubely_category":"<a href=\"http:\/\/hiycz.cn\/index.php\/category\/%e6%8a%80%e6%9c%af%e5%88%86%e4%ba%ab\/\" rel=\"category tag\">\u6280\u672f\u5206\u4eab<\/a>","qubely_excerpt":"The examples below use the sample document from the official documentation. Beautiful Soup converts &hellip;","_links":{"self":[{"href":"http:\/\/hiycz.cn\/index.php\/wp-json\/wp\/v2\/posts\/183"}],"collection":[{"href":"http:\/\/hiycz.cn\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/hiycz.cn\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/hiycz.cn\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/hiycz.cn\/index.php\/wp-json\/wp\/v2\/comments?post=183"}],"version-history":[{"count":1,"href":"http:\/\/hiycz.cn\/index.php\/wp-json\/wp\/v2\/posts\/183\/revisions"}],"predecessor-version":[{"id":185,"href":"http:\/\/hiycz.cn\/index.php\/wp-json\/wp\/v2\/posts\/183\/revisions\/185"}],"wp:featuredmedia":[{"embeddable":true,"href":"http:\/\/hiycz.cn\/index.php\/wp-json\/wp\/v2\/media\/184"}],"wp:attachment":[{"href":"http:\/\/hiycz.cn\/index.php\/wp-json\/wp\/v2\/media?parent=183"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/hiycz.cn\/index.php\/wp-json\/wp\/v2\/categories?post=183"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/hiycz.cn\/index.php\/wp-json\/wp\/v2\/tags?post=183"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}