URL Monitor – Getting Notified When a Web Page is Updated
2008-04-05

I’ve been using RSS Generator for a while to generate RSS feeds for web pages that don’t provide one. However, the service is often unreliable, probably because of the enormous load from the many RSS readers polling it. Another drawback is that the URL of a generated feed is so long that some web-based RSS readers won’t accept it in their input forms.

So I chose instead to write a simple shell script that sends me an e-mail message whenever one of the web pages on my watch list changes. Its name is ‘URL Monitor’:

```sh
#!/bin/sh
# Path: /usr/local/bin/url-monitor

CONF=/etc/url-monitor.conf
CACHE_DIR=/var/cache/url-monitor
mkdir -p "$CACHE_DIR"

# Each entry in the configuration file is a six-line record.
while read -r NAME; do
  read -r URL || exit 1
  read -r INTERVAL || exit 2
  read -r STRIP_REGEX || exit 3
  read -r NEEDLE || exit 4
  read -r REPLACEMENT || exit 5
  # Perl reads the patterns from %ENV, so slashes and quotes in them
  # cannot break the s/// expressions below.
  export STRIP_REGEX NEEDLE REPLACEMENT

  CACHE="$CACHE_DIR/$NAME.html"

  # Skip the page if it was checked less than INTERVAL seconds ago.
  if [ -f "$CACHE" ]; then
    MTIME=$(stat --format=%Y "$CACHE")
    NOW=$(date +%s)
    AGE=$((NOW - MTIME))
    if [ "$AGE" -lt "$INTERVAL" ]; then
      continue
    fi
  fi

  # Fetch the page, join it into one line, strip the unwanted parts,
  # collapse whitespace, and rewrite links (NEEDLE -> REPLACEMENT).
  wget -q -T 60 -O - "$URL" \
    | perl -pe 's/[\r\n]/ /g' \
    | perl -pe 's/$ENV{STRIP_REGEX}//gi' \
    | perl -pe 's/\s+/ /g' \
    | perl -pe 's/$ENV{NEEDLE}/$ENV{REPLACEMENT}/gi' > "$CACHE.new"

  # Empty output means the fetch failed. Touch the cache so the page is
  # not retried before INTERVAL passes, and go on with the next entry.
  if [ ! -s "$CACHE.new" ]; then
    echo "Failed to fetch - $NAME" >&2
    rm -f "$CACHE.new"
    touch "$CACHE"
    continue
  fi

  # Unchanged since the last check: drop the new copy, reset the timer.
  if [ -f "$CACHE" ] && diff -q "$CACHE" "$CACHE.new" > /dev/null 2>&1; then
    rm -f "$CACHE.new"
    touch "$CACHE"
    continue
  fi
  mv -f "$CACHE.new" "$CACHE"

  # Send notification
  {
    echo 'From: URL Monitor <url-monitor@gleamynode.net>'
    echo 'To: Trustin Lee <trustin@gmail.com>'
    echo "Subject: $NAME - updated"
    echo 'Content-Type: text/html; charset=euc-kr'
    echo
    cat "$CACHE"
    echo
  } | sendmail trustin
done < "$CONF"
```

This quick-and-dirty script strips the unnecessary parts out of each fetched page, caches the result, and notifies me (local user ‘trustin’) by e-mail when the newly fetched content differs from the cached copy. The following is an example configuration file (`/etc/url-monitor.conf`):

```
JavaWorld: Featured Tutorials
http://www.javaworld.com/features/index.html
86400
(^.*<div id="toplist">|<p><a class="findmore".*$)
/javaworld/
http://www.javaworld.com/javaworld/
DDJ.com: High Performance Computing
http://www.ddj.com/hpc-high-performance-computing/archives.jhtml
86400
(^.*Feature Articles\s*-->|<br clear="left">.*$)
/hpc-high-performance-computing/
http://www.ddj.com/hpc-high-performance-computing/
Lono.pe.kr
http://lono.pe.kr/src/
86400
(^.*\[\[Start\]\]-->|<!--\[\[.*$)
/src/
http://www.lono.pe.kr/src/
```

Each entry spans six lines, which the script reads in this order:

1. Name: used for the cache file name and the e-mail subject.
2. URL of the page to watch.
3. Minimum interval between checks, in seconds (86400 is one day).
4. A Perl regular expression for the parts to strip out. The page is joined into a single line first, so an alternation like `(^.*START|END.*$)` removes everything up to the wanted section and everything after it.
5. A Perl regular expression (the needle) to look for in what remains, typically the prefix of relative links.
6. The replacement for the needle, typically the matching absolute URL, so that the links in the e-mail work.

Both patterns are matched case-insensitively.

Once configured, `url-monitor` should be run periodically. Because each entry carries its own interval, it is fine to run the script often: a page is fetched again only once its interval has passed. I installed the following file in `/etc/cron.d`:

```
# Path: /etc/cron.d/url-monitor.cron
SHELL=/bin/bash
PATH=/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=trustin
HOME=/root

# Run the URL monitor every three minutes
*/3 * * * * root /usr/local/bin/url-monitor
```

As you may have noticed, it’s very primitive, and some parameters (such as the cache directory and the notification addresses) can only be changed by editing the script itself. Still, I think that’s fine as long as the number of web pages I have to monitor (read: pages that don’t provide RSS) stays small.