Status: Live
Python assignment
Date Posted: 18/12/2018
Category: Computer Science
Due Date: 19/12/2018
Willing to Pay: $100.00
Instruction
We've talked about how Hadoop works, you've seen Hadoop code, and you've written some code of your own using our store sales data. Now we'd like you to solve some problems on your own using a different dataset. You'll have to write your Mappers and Reducers from scratch; please use Python. You will have to do the data processing on your local pseudo-distributed cluster, but you can check whether your solution is correct by submitting your results to our system.
• The data set we're using is an anonymized Web server log file from a public relations company whose clients were DVD distributors. The log file is in the udacity_training/data directory, and it's currently compressed with gzip (GNU zip), so you'll need to decompress it and then put it in HDFS. If you take a look at the file, you'll see that each line represents a hit to the Web server. It includes the IP address which accessed the site, the date and time of the access, and the name of the page which was visited.
• The logfile is in Common Log Format.
Write a MapReduce program which will display the number of hits for each different file on the web site.
- How many hits were made to the page /assests/js/the-associates.js?
Write a MapReduce program which determines the number of hits to the site made by each IP address.
- How many hits were made by IP address 10.00.99.186?
Find the most popular file on the website: that is, the file whose path occurs most often in access_log. Your reducer should output the file's path and the number of times it appears in the log.
IMPORTANT: Some pathnames in the log begin with "http://www.the-associates.co.uk". Be sure to remove the portion "http://www.the-associates.co.uk" from the pathnames in your mapper so that all occurrences of a file are counted together.
- What is the most popular file's path: ___________
- What is the number of occurrences: ___________
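The three tasks above share one pattern: the mapper emits a key (file path or IP address) with a count of 1, and the reducer sums the counts per key. Below is a minimal sketch of that pattern, not a reference solution. It assumes each line follows the usual Common Log Format layout (host, ident, user, bracketed timestamp, quoted request, status, size). The names `LOG_PATTERN`, `parse_line`, `map_lines`, `reduce_counts`, and `most_popular` are hypothetical helpers introduced here for illustration. The functions take plain iterables so they can be tried locally; in a Hadoop Streaming job the mapper and reducer would normally be separate scripts that read `sys.stdin` and print tab-separated records.

```python
import re
from itertools import groupby

# Assumed Common Log Format layout: host ident user [time] "request" status size
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)(?: (?P<protocol>[^"]*))?" '
    r'(?P<status>\d{3}) (?P<size>\S+)$'
)

# Prefix the assignment asks the mapper to strip from pathnames.
SITE_PREFIX = "http://www.the-associates.co.uk"


def parse_line(line):
    """Return (ip, path) for a well-formed log line, or None otherwise."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    path = match.group("path")
    if path.startswith(SITE_PREFIX):
        # Assumption: a request for the bare prefix counts as the root page "/".
        path = path[len(SITE_PREFIX):] or "/"
    return match.group("ip"), path


def map_lines(lines, field="path"):
    """Mapper: yield 'key<TAB>1' records keyed by file path or by IP address."""
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # skip malformed lines instead of failing the job
        ip, path = parsed
        yield "%s\t1" % (path if field == "path" else ip)


def reduce_counts(records):
    """Reducer: sum the counts of consecutive records that share a key.

    Relies on the input being sorted by key, as Hadoop's shuffle provides.
    """
    pairs = (record.rstrip("\n").split("\t", 1) for record in records)
    for key, group in groupby(pairs, key=lambda pair: pair[0]):
        yield key, sum(int(count) for _, count in group)


def most_popular(counts):
    """Return the (key, count) pair with the largest count."""
    return max(counts, key=lambda pair: pair[1])
```

To emulate the shuffle locally, sort the mapper output before reducing, for example `reduce_counts(sorted(map_lines(open("access_log"))))`. Note that `most_popular` only yields the global maximum if it sees every key, so on a cluster this approach assumes a single reducer or a second pass over the per-key totals.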
Attached: No file uploaded yet.
Tutor Uploads: No file uploaded yet.
Bidders: 13
Average bid: $286.92
Rated 9.66 earned 155205.25 around 5191 assignments.
$130.00
Rated 9.11 earned 171508.20 around 5329 assignments.
$100.00
Rated 9.37 earned 62524.46 around 2184 assignments.
$200.00
$300.00
Rated 9.44 earned 10231.65 around 281 assignments.
$250.00
Rated 9.04 earned 16287.43 around 484 assignments.
$500.00
Rated 8.96 earned 79423.13 around 2456 assignments.
$100.00
Rated 9.44 earned 21955.32 around 740 assignments.
$1000.00
Rated 8.85 earned 19614.15 around 529 assignments.
$300.00
Rated 9.53 earned 42306.37 around 1255 assignments.
$250.00
Rated 9.68 earned 34626.28 around 1163 assignments.
$250.00
Rated 9.22 earned 30568.25 around 759 assignments.
$200.00
Rated 9.57 earned 37699.55 around 1270 assignments.
$150.00