Codebox Software

OSX Shell Script to Read a Web Page Aloud

Published: 22 Jul 2017

This one-line shell script for OSX will do its best to read a web page aloud to you. The results can be a little variable, depending on exactly how the page's HTML has been structured, but it usually works quite well if semantic markup has been used.

curl -L $URL | tr '\n' ' ' | egrep -o '<(title|h\d|p|li)( [^>]*>|>).*?</\1>' | sed -E 's/<[^>]*>//g' | say

Notes

Before running the script you will need to set the URL variable to the address of the page you want it to read, for example:

URL=http://pun.me/pages/dad-jokes.php
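If the address contains characters that the shell treats specially, such as & or ?, it is safer to wrap it in single quotes when setting the variable. A minimal sketch, using a made-up address on example.com purely for illustration:

```shell
# Hypothetical address, not a real page. The single quotes stop the shell
# from treating & as "run in the background" and ? as a filename wildcard.
URL='http://example.com/jokes?page=2&sort=new'

# Double quotes around the variable keep the value intact when it is used.
printf '%s\n' "$URL"
```

The same double-quoted form, "$URL", can be used wherever the variable is passed to a command.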

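The script chains 5 separate commands together with pipes (the | characters), so the output of each one becomes the input of the next. They are explained below: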

  • curl -L $URL downloads the HTML source code for the page, following any redirects (the -L option), and sends it on to the next command.
  • tr '\n' ' ' replaces every newline character in the HTML with a space, so that all the markup is on a single line.
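  • egrep -o '<(title|h\d|p|li)( [^>]*>|>).*?</\1>' uses a rather hairy regular expression to keep only the elements that match the pattern, printing each match on its own line (the -o option) and discarding everything else. The \d and .*? syntax is not part of standard extended regular expressions, so the pattern may behave differently with other versions of grep. The elements it keeps are those contained inside one of the following tags: <title> <h1> <h2> <h3> <h4> <h5> <h6> <p> <li>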
  • sed -E 's/<[^>]*>//g' removes any remaining HTML tags from the page, but leaves the text that was inside them.
  • say reads the resulting text out loud. The say command has a few nice options, such as -v to change the voice that it uses and -o to save the audio into a file rather than playing it aloud.
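To see what the text-processing stages do without downloading a page or producing any sound, they can be tried on a small piece of markup. The sketch below runs only the tr and sed stages on a made-up HTML fragment (the html and text variable names are assumptions for illustration, not part of the original script):

```shell
# Hypothetical sample markup, standing in for a downloaded page.
# Note the line break in the middle of the paragraph.
html='<p>Hello
<b>world</b></p>'

# tr replaces the newline with a space so the element sits on one line;
# sed then deletes every tag and keeps the text that was between them.
text=$(printf '%s' "$html" | tr '\n' ' ' | sed -E 's/<[^>]*>//g')

printf '%s\n' "$text"   # prints: Hello world
```

On a Mac, the result could then be piped to say -v Samantha to pick a voice (say -v '?' lists the installed ones), or to say -o page.aiff to write the audio to a file. Samantha and page.aiff are only example values.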