@gotnix
Last active July 7, 2016 10:18
Use `xmllint --html --xpath` and lynx to parse HTML.

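I had wanted to do this for a while but never got around to it. Recently I've been playing with Alpine Linux and found that its package manager has no equivalent of `yum whatprovides`, so I posted a thread on the forum asking about it and was told it can be done through an API. The output in the reply in post #2 was unreadable, though, so I decided to try XPath on it.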

<table class="table table-striped table-bordered table-condensed" data-toggle="table">
......
</tbody></table>

The main task is to grab this table and print it. The command:

curl -s 'http://pkgs.alpinelinux.org/contents?file=xmllint&path=%2Fusr%2Fbin&repo=main&arch=x86_64' \
  | xmllint --html --xpath '//*[@id="main"]/div/div[3]/table[@class="table table-striped table-bordered table-condensed" and @data-toggle="table"]' - 2>/dev/null \
  | lynx --dump -stdin
# Output:
         File           Package      Branch Repository Architecture
   /usr/bin/xmllint [1]libxml2-utils v3.3   main       x86_64
   /usr/bin/xmllint [2]libxml2-utils v3.4   main       x86_64
   /usr/bin/xmllint [3]libxml2-utils edge   main       x86_64

References
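# These references are the hyperlinks in the Package column above; lynx prints them as a numbered reference list.
# Because the HTML was read from stdin, the relative links were resolved with a `file://` scheme.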
   1. file:///package/v3.3/main/x86_64/libxml2-utils
   2. file:///package/v3.4/main/x86_64/libxml2-utils
   3. file:///package/edge/main/x86_64/libxml2-utils
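The absolute path `//*[@id="main"]/div/div[3]/...` breaks as soon as the page layout shifts. Below is a minimal sketch of a looser query that assumes only that the results table keeps the `table-striped` class. The `sample_html` string is a hypothetical, trimmed-down stand-in built from the rows shown above, not the real page markup.

```shell
#!/bin/sh
# ASSUMPTION: sample_html is a hypothetical stand-in for the live page;
# the real pkgs.alpinelinux.org markup may differ.
sample_html='<html><body><div id="main"><div>
<table class="table table-striped table-bordered table-condensed" data-toggle="table">
<tbody>
<tr><td>/usr/bin/xmllint</td><td><a href="/package/edge/main/x86_64/libxml2-utils">libxml2-utils</a></td><td>edge</td></tr>
</tbody></table>
</div></div></body></html>'

# XPath 1.0 idiom for "the class attribute contains the token table-striped".
# Padding with spaces prevents partial matches such as "table-striped-x".
xpath='//table[contains(concat(" ", normalize-space(@class), " "), " table-striped ")]'

# Same pipeline shape as above: HTML on stdin ("-"), parser warnings discarded.
printf '%s\n' "$sample_html" \
  | xmllint --html --xpath "$xpath" - 2>/dev/null
```

The serialized table can then go through the same `lynx --dump -stdin` step. The trade-off is that a class-token match would also pick up any other striped table on the page.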

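To print the original links, replace the `file://` prefix with the site's base URL, touching only the lines from `References` onward: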

curl -s 'http://pkgs.alpinelinux.org/contents?file=xmllint&path=%2Fusr%2Fbin&repo=main&arch=x86_64' \
  | xmllint --html --xpath '//*[@id="main"]/div/div[3]/table[@class="table table-striped table-bordered table-condensed" and @data-toggle="table"]' - 2>/dev/null \
  | lynx --dump -stdin \
  | sed '/^References$/,$ s%file://%http://pkgs.alpinelinux.org%'
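The `sed` address `/^References$/,$` restricts the substitution to the span from the `References` line through the last line, so only the link list is rewritten. Here is a minimal offline check using a dump copied from the output above (trimmed to one row). `rewrite_refs` is a hypothetical name for the wrapper.

```shell
#!/bin/sh
# Sample lynx dump, trimmed from the output shown above.
dump='         File           Package      Branch Repository Architecture
   /usr/bin/xmllint [1]libxml2-utils v3.3   main       x86_64

References

   1. file:///package/v3.3/main/x86_64/libxml2-utils'

# Hypothetical helper wrapping the sed command above.
# /^References$/,$ : run the s command only from "References" to the last line.
# % is the s delimiter, so the slashes in the URLs need no escaping.
rewrite_refs() {
  sed '/^References$/,$ s%file://%http://pkgs.alpinelinux.org%'
}

printf '%s\n' "$dump" | rewrite_refs
# Last line printed:
#    1. http://pkgs.alpinelinux.org/package/v3.3/main/x86_64/libxml2-utils
```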
Note

On CentOS 6, xmllint (libxml2-2.7.6-21.el6.x86_64) does not have the `--xpath` option. Its version information:

xmllint --version
xmllint: using libxml version 20706
   compiled with: Threads Tree Output Push Reader Patterns Writer SAXv1 FTP HTTP DTDValid HTML Legacy C14N Catalog XPath XPointer XInclude Iconv ISO8859X Unicode Regexps Automata Expr Schemas Schematron Modules Debug Zlib

The version I tested and found working:

xmllint --version
xmllint: using libxml version 20904
   compiled with: Threads Tree Output Push Reader Patterns Writer SAXv1 FTP HTTP DTDValid HTML Legacy C14N Catalog XPath XPointer XInclude Iconv ISO8859X Unicode Regexps Automata Expr Schemas Schematron Modules Debug Zlib

Judging from this output, the only difference is the version number; the "compiled with" feature lists are identical.
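The numeric libxml version encodes major*10000 + minor*100 + patch, so 20706 is 2.7.6 (matching the CentOS 6 package libxml2-2.7.6) and 20904 is 2.9.4. The sketch below decodes the two numbers above. It also defines a hypothetical helper, `xmllint_has_xpath`, that probes for the option directly rather than relying on a version cutoff, which this note does not establish.

```shell
#!/bin/sh
# Decode libxml's numeric version (major*10000 + minor*100 + patch).
decode_libxml_version() {
  v=$1
  echo "$(( $v / 10000 )).$(( $v / 100 % 100 )).$(( $v % 100 ))"
}

# Hypothetical helper: exit status 0 if this xmllint accepts --xpath.
# An xmllint without the option rejects it as unknown and exits non-zero.
xmllint_has_xpath() {
  echo '<a/>' | xmllint --xpath '/a' - >/dev/null 2>&1
}

# The two version lines quoted above.
for line in 'xmllint: using libxml version 20706' \
            'xmllint: using libxml version 20904'; do
  num=${line##* }   # keep the last space-separated field
  echo "$num -> $(decode_libxml_version "$num")"
done
# prints:
# 20706 -> 2.7.6
# 20904 -> 2.9.4
```

In a script, `xmllint_has_xpath || echo 'xmllint lacks --xpath' >&2` can gate the pipeline before calling curl.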
