Write a Python program to complete a big-data processing task: finding the most frequently used words on Wikipedia pages.
The execution of the program generates a list of distinct words used in the Wikipedia pages and the number of occurrences of each word on these web pages. The words are sorted by the number of occurrences in ascending order. The following is a sample of output generated for 4 Wikipedia pages.
126 that
128 by
133 as
149 or
160 for
164 is
189 on
191 from
345 to
375 advertising
443 a
473 and
480 in
677 of
1080 the
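The counting and sorting step could be sketched as follows. This is a minimal sketch, not a required design: the helper names `count_words` and `format_ascending` are hypothetical, and it assumes that a "word" is a run of ASCII letters compared case-insensitively, which the task statement does not specify.

```python
import re
from collections import Counter


def count_words(text):
    """Return a Counter mapping each lowercase word to its number of occurrences.

    Assumption: a word is a run of ASCII letters; digits and punctuation
    act as separators.
    """
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(words)


def format_ascending(counts):
    """Return 'count word' lines sorted by count in ascending order.

    Ties are broken alphabetically so the output is deterministic.
    """
    ordered = sorted(counts.items(), key=lambda item: (item[1], item[0]))
    return ["%d %s" % (count, word) for word, count in ordered]


if __name__ == "__main__":
    sample = "the cat and the dog and the bird"
    for line in format_ascending(count_words(sample)):
        print(line)
```

Because `Counter` objects can be added together, the counts for individual pages can be accumulated into one total before sorting.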
Since there is a huge number of pages on Wikipedia, it is not realistic to analyze all of them in a short time on one machine. In this project, you need to analyze all the pages for the Wikipedia entries with two capital letters. For example, the Wikipedia page for entry "AC" is https://en.wikipedia.org/wiki/AC . Use the urllib or urllib2 library to download a page.
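One way to enumerate and download the two-letter entries is sketched below. It assumes Python 3, where the download functions live in `urllib.request` (`urllib2` exists only in Python 2). The helper names `two_letter_urls` and `download`, the `User-Agent` string, and the 10-second timeout are illustrative choices, not part of the assignment.

```python
import string
import urllib.request

BASE_URL = "https://en.wikipedia.org/wiki/"


def two_letter_urls():
    """Return the URLs for all 26 * 26 = 676 entries from AA to ZZ."""
    letters = string.ascii_uppercase
    return [BASE_URL + first + second for first in letters for second in letters]


def download(url, timeout=10):
    """Return the page at `url` as text, or None if it cannot be fetched.

    Some two-letter entries may not exist, so failures are skipped
    rather than treated as fatal.
    """
    request = urllib.request.Request(
        url, headers={"User-Agent": "word-frequency-homework/0.1"}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except OSError:
        # urllib.error.URLError, HTTPError and socket timeouts are all
        # subclasses of OSError.
        return None


if __name__ == "__main__":
    html = download(BASE_URL + "AC")
    print("downloaded" if html is not None else "failed")
```

Returning `None` on failure lets the caller loop over all 676 URLs and simply skip the entries that could not be fetched.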
An HTML page contains HTML tags, which should be removed before the analysis. Use the BeautifulSoup library to convert the page from HTML format to plain text.
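The tag-removal step might look like the sketch below. It assumes the `bs4` package (Beautiful Soup 4) is installed and uses Python's built-in `html.parser` backend. The helper name `html_to_text` is hypothetical, and dropping `<script>` and `<style>` contents is an extra choice made here so that JavaScript and CSS are not counted as words.

```python
from bs4 import BeautifulSoup


def html_to_text(html):
    """Return the visible text of an HTML document with all tags removed."""
    soup = BeautifulSoup(html, "html.parser")
    # Remove script and style elements entirely; their contents are code,
    # not page text.
    for tag in soup(["script", "style"]):
        tag.decompose()
    # Join text fragments with spaces so words in adjacent tags do not fuse.
    return soup.get_text(separator=" ")


if __name__ == "__main__":
    page = "<html><body><p>Hello <b>world</b></p></body></html>"
    print(html_to_text(page).split())
```

The resulting string can then be passed to whatever word-counting routine the program uses.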