Status: Live

Python assignment

Date Posted: 18/12/2018

Category: Computer Science
Due Date: 19/12/2018
Willing to Pay: $100.00

Instruction

We've talked about how Hadoop works, you've seen Hadoop code, and you've written some code of your own using our store sales data. Now we'd like you to solve some problems on your own using a different dataset. You'll have to write your Mappers and Reducers from scratch; please use Python. You will have to do the data processing on your local pseudo-distributed cluster, but you can check whether your solution is correct by submitting your results to our system.

• The data set we're using is an anonymized Web server log file from a public relations company whose clients were DVD distributors. The log file is in the udacity_training/data directory, and it's currently compressed using GnuZip (gzip), so you'll need to decompress it and then put it in HDFS. If you take a look at the file, you'll see that each line represents a hit to the Web server. It includes the IP address which accessed the site, the date and time of the access, and the name of the page which was visited.

• The log file is in Common Log Format.

Write a MapReduce program which displays the number of hits for each different file on the web site.
- How many hits were made to the page /assets/js/the-associates.js?

Write a MapReduce program which determines the number of hits to the site made by each IP address.
- How many hits were made by IP address 10.00.99.186?

Find the most popular file on the website: that is, the file whose path occurs most often in access_log. Your reducer should output the file's path and the number of times it appears in the log.

IMPORTANT: Some pathnames in the log begin with "http://www.the-associates.co.uk". Be sure to remove the portion "http://www.the-associates.co.uk" from the pathnames in your mapper so that all occurrences of a file are counted together.
- What is the most popular file's path: ___________
- What is the number of occurrences: ___________
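
All three tasks follow the same count-by-key pattern, so one possible starting point is sketched below. It is only a sketch under stated assumptions, not the course's reference solution: it assumes Python 3, assumes every line follows the standard Common Log Format layout (host, ident, user, [timestamp], "request", status, size), and uses hypothetical names (CLF_PATTERN, parse_line, mapper, reducer, most_popular) that do not come from the assignment. Lines that do not match the pattern are skipped, which is also an assumption about how malformed entries should be handled.

```python
import re
from itertools import groupby

# Assumed Common Log Format layout:
# host ident authuser [timestamp] "request" status size
CLF_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)$'
)

# Prefix the assignment says must be removed from pathnames.
SITE_PREFIX = "http://www.the-associates.co.uk"


def parse_line(line):
    """Return (ip, path) for one log line, or None if it is malformed."""
    match = CLF_PATTERN.match(line.strip())
    if match is None:
        return None
    parts = match.group("request").split()
    if len(parts) < 2:
        return None  # request lacks a path, e.g. an empty request string
    path = parts[1]
    if path.startswith(SITE_PREFIX):
        # Assumption: a bare site URL with no path counts as "/".
        path = path[len(SITE_PREFIX):] or "/"
    return match.group("ip"), path


def mapper(lines, key="path"):
    """Emit 'key<TAB>1' per well-formed line; key is 'path' or 'ip'."""
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # skip malformed lines instead of failing the job
        ip, path = parsed
        yield f"{path if key == 'path' else ip}\t1"


def reducer(sorted_lines):
    """Sum the counts per key. Input must already be sorted by key,
    which is what the shuffle-and-sort phase provides to a reducer."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for k, group in groupby(pairs, key=lambda kv: kv[0]):
        yield k, sum(int(count) for _, count in group)


def most_popular(sorted_lines):
    """Return the (key, count) pair with the highest count."""
    return max(reducer(sorted_lines), key=lambda kv: kv[1])
```

For a quick local check without Hadoop, sorted(mapper(open("access_log"))) can be passed to reducer or most_popular, since sorting the mapper output stands in for the shuffle phase. On the cluster, mapper and reducer would each be wrapped in a separate script that reads sys.stdin and prints tab-separated lines.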

Bidders

Average bid

$286.92