Codebox Software
OSX Shell Script to Read a Web Page Aloud
This one-line shell script for OSX will do its best to read a web page aloud to you. The results can be a little variable, depending on exactly how the page's HTML has been structured, but it usually works quite well if semantic markup has been used.
curl -L "$URL" | tr '\n' ' ' | egrep -o '<(title|h\d|p|li)( [^>]*>|>).*?</\1>' | sed -E 's/<[^>]*>//g' | say
Notes
Before running the script you will need to set the URL variable to the address of the page you want it to read, for example:
URL=http://pun.me/pages/dad-jokes.php
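If you would rather not set a variable each time, the one-liner can be wrapped in a small shell function. This is only a sketch: the name read_page is made up for illustration, and the pipeline inside is the same one shown above, with the address passed in as the first argument instead of $URL.

```shell
# read_page is a hypothetical helper name; the pipeline is the one-liner from this post.
read_page() {
  # Refuse to run without an address, so curl is never called with an empty URL.
  if [ -z "${1:-}" ]; then
    echo "usage: read_page URL" >&2
    return 1
  fi
  # Quoting "$1" keeps characters such as ? and & in the address from being
  # treated specially by the shell.
  curl -L "$1" | tr '\n' ' ' | egrep -o '<(title|h\d|p|li)( [^>]*>|>).*?</\1>' | sed -E 's/<[^>]*>//g' | say
}
```

Once the function is defined in your shell, running read_page http://pun.me/pages/dad-jokes.php should behave like the original script.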
The script contains 5 separate commands, explained below:
- curl -L "$URL"
  downloads the HTML source code for the page and sends it on to the next command.
- tr '\n' ' '
  replaces every newline character in the HTML with a space, so that all the markup is on a single line.
- egrep -o '<(title|h\d|p|li)( [^>]*>|>).*?</\1>'
  this rather hairy regular expression strips away everything that is not contained inside one of the following tags: <title>, <h1>, <h2>, <h3>, <h4>, <h5>, <h6>, <p>, <li>.
- sed -E 's/<[^>]*>//g'
  removes any remaining HTML tags from the page, but leaves the text that was inside them.
- say
  reads the resulting text out loud. The say command has a few nice options, such as the ability to change the voice that it uses, and it can also save the audio into a file rather than playing it aloud.
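The two say options mentioned above can be sketched as a pair of small helpers. The -v flag selects a voice and the -o flag writes the audio to a file instead of playing it. The function names, the voice name Samantha and the file name page.aiff are assumptions chosen for illustration; running say -v '?' lists the voices actually installed on your machine.

```shell
# Hypothetical helpers. Both read the text to speak from standard input,
# just as the final stage of the pipeline does.

# Speak standard input using a named voice, e.g. speak_with_voice Samantha
speak_with_voice() {
  say -v "$1"
}

# Save the speech to an audio file instead of playing it, e.g. save_speech page.aiff
save_speech() {
  say -o "$1"
}
```

Either helper can take the place of the final say in the pipeline, for example by ending it with | speak_with_voice Samantha or | save_speech page.aiff.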