
Assessment Brief - Assessment 3 - Map-Reduce Programming Challenge
Unit Code/Description ICT313 Big Data for Software Development
Course/Subject Bachelor of Information Technology
Semester S1 – 2025
Unit Learning Outcomes
Addressed
ULO3: Critically assess and implement advanced data pre-processing and
analytics strategies in a software development context, focusing on tasks like data
cleansing, transformation, and feature selection.
ULO4: Design, develop, and evaluate big data solutions using programming
models like Map-Reduce and technologies like Hadoop, tailored specifically to
address software development needs such as DevOps integration and quality
assurance.
Assessment Objective The objective of this assessment is to assess students’ knowledge and practical
skills in working with large-scale datasets and leveraging Hadoop ecosystem tools
and technologies for data processing and analysis.
Assessment Title/Type Assessment 3: Map-Reduce Programming Challenge (Individual Assignment)
Due Date Week 10, Sunday, 18 May, 11.59 PM
Weighting 20%
Instructions to Students See the assignment description below
Format/Structure MS Word or PDF for the report, dataset and code files
Word/Page limit 500 words for the report, font Calibri 12
Referencing Style American Psychological Association (APA)
Submission Guidelines • All work must be submitted on Moodle by the due date.
• A PDF or MS Word file must be submitted which includes all required steps,
discussion and evidence of completion of tasks.
• Students must present a demo of the project to their lecturer in week 11;
otherwise, they will receive no mark for their submission.
Plagiarism and Academic
Integrity
At CIHE, we take academic integrity seriously and expect all students to maintain
the highest standards of honesty and ethical behaviour in their academic work. As
a student, it is your responsibility to ensure that all your academic endeavours are
conducted with integrity and in accordance with the principles of honesty,
fairness, and respect for intellectual property. Please refer to the “CIHE Student
Academic Integrity and Honesty Policy” on Moodle for details.
Late Submission Policy An assessment item submitted after the assessment due date, without an
approved extension or without approved mitigating circumstances, will be
penalised. The standard penalty is the reduction of the mark allocated to the
assessment item by 10% of the total mark applicable for the assessment item, for
each day or part day that the item is late. Assessment items submitted more than
ten days after the assessment due date are awarded zero marks.
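The penalty arithmetic can be sketched in a few lines of Python. This is an illustrative reading of the policy, not an official calculator: the function name late_penalised_mark is hypothetical, and the sketch assumes the penalty is subtracted from the awarded mark and that the result cannot fall below zero.

```python
import math

def late_penalised_mark(raw_mark, total_mark, days_late):
    """Deduct 10% of the total mark per day or part day late; zero after ten days."""
    if days_late <= 0:
        return raw_mark
    if days_late > 10:
        return 0.0
    # A part day counts as a full day, so round the lateness up.
    penalty = total_mark * math.ceil(days_late) / 10
    # Assumption: the penalised mark is floored at zero.
    return max(0.0, raw_mark - penalty)

# Example: a submission worth 16 out of 20, handed in a day and a half late.
penalised = late_penalised_mark(16, 20, 1.5)
print(penalised)
```

For this 20-mark assessment, each day or part day late costs 2 marks under that reading.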
Assignment Description (Total marks 20)
Supporting Materials
All supporting materials for this assessment can be found in the Hadoop Files folder on Moodle:
1- A virtual machine has been prepared for you on which Ubuntu and Hadoop have been
installed and configured (Hadoop Virtual Machine). All files related to the virtual machine
can be found in the zip file Hadoop_VM (WMWare) or Hadoop_VM (VirtualBox). You need
to download the zip file and extract it to your computer’s hard drive. Then, you need to
install VMware Player on your computer and open the virtual machine file.
2- Virtual Machine Tutorial (Part 1 and 2) is a tutorial video on how to use the virtual
machine. It shows, step by step, how to start Hadoop and run a WordCount
example.
3- Hadoop Tutorial.PDF also provides detailed instructions on how to start Hadoop
and run the WordCount example.
Instructions
The following file contains user ratings for Amazon products:
Amazon Product Review
(Note: If the link doesn’t work, you can download the file from Moodle. It is in the
Assessment section.)
Each user has rated at least one product. The data file is in CSV format and contains four
columns: User ID, Product ID, Rating, Timestamp. Rating is from 1 to 5. The timestamps are
Unix time, in seconds since 1/1/1970 UTC. For example, the following line of the file
A000681618A3WRMCK53V,B0002Y5WZM,2,1383609600
is interpreted as follows: User A000681618A3WRMCK53V has rated product B0002Y5WZM 2/5 at
time 1383609600 (Tuesday, Nov 05 2013 11:00:00, Australian Eastern Daylight Time).
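As a minimal sketch of this interpretation (not part of the supplied materials), the Python snippet below parses the sample record and converts its timestamp. It assumes the fields are comma-separated with no header row, and it models Australian Eastern Daylight Time as a fixed UTC+11 offset. The function name parse_rating_line is illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Assumption: AEDT is modelled as a fixed UTC+11 offset (no daylight-saving rules).
AEDT = timezone(timedelta(hours=11), "AEDT")

def parse_rating_line(line):
    """Split one CSV record into (user_id, product_id, rating, timestamp)."""
    user_id, product_id, rating, timestamp = line.strip().split(",")
    # float() accepts both "2" and "2.0", whichever the file uses.
    return user_id, product_id, float(rating), int(timestamp)

sample = "A000681618A3WRMCK53V,B0002Y5WZM,2,1383609600"
user_id, product_id, rating, timestamp = parse_rating_line(sample)
rated_at = datetime.fromtimestamp(timestamp, tz=AEDT)
print(user_id, product_id, rating, rated_at.strftime("%Y-%m-%d %H:%M:%S %Z"))
```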
Your task is to use MapReduce programming and find the average rating for each product.
Here is an example of the output:
Product ID    Average Rating
0321732944    3.47
0439886341    4.21
You can choose the output format. However, the required information must be included in
the output. You need to include the output file in your submission.
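To illustrate the idea only, the following Python sketch simulates the map, shuffle, and reduce phases in memory. It is not a Hadoop job and not a model answer: the function names (map_rating, reduce_ratings, run_job) are hypothetical, the input is assumed to be comma-separated with no header row, and every record except the first is made up for the example.

```python
from collections import defaultdict

def map_rating(line):
    """Map step: emit (product_id, rating) for one CSV record."""
    _user_id, product_id, rating, _timestamp = line.strip().split(",")
    return product_id, float(rating)

def reduce_ratings(product_id, ratings):
    """Reduce step: average all ratings collected for one product."""
    return product_id, round(sum(ratings) / len(ratings), 2)

def run_job(lines):
    """Simulate map, shuffle (group by key), and reduce in memory."""
    grouped = defaultdict(list)
    for line in lines:
        if line.strip():  # skip blank lines
            product_id, rating = map_rating(line)
            grouped[product_id].append(rating)
    return dict(reduce_ratings(pid, vals) for pid, vals in sorted(grouped.items()))

# Made-up records for illustration; only the first line comes from the brief.
records = [
    "A000681618A3WRMCK53V,B0002Y5WZM,2,1383609600",
    "USER_X,B0002Y5WZM,5,1383609601",
    "USER_Y,PRODUCT_Z,4,1383609602",
]
averages = run_job(records)
for product_id, average in averages.items():
    print(f"{product_id}\t{average}")
```

On a real Hadoop cluster the framework performs the grouping between the two phases, and the map and reduce logic would live in separate Mapper and Reducer classes or scripts, as in the WordCount example.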
Deliverable
You need to submit an MS Word or a PDF file which includes the following items:
- The source code for the map and reduce functions (copied/pasted into the MS Word or PDF
file; no separate file is needed).
- The output file.
- Enough screenshots of the steps taken to get the program running.
- Screenshots of the output generated by the program. The student’s name must also be
part of the printed information. Annotate all screenshots with brief descriptions (one line
or two is enough).
- In all screenshots, the date and time of the computer must be clearly shown in the
corner (look at the sample below). Make sure the date and time of your computer are
correct.
- A section discussing the potential benefits of your project for Amazon. You need to
explain how Amazon can make informed decisions based on the results of your project.
This section must be 450 to 550 words.
Note: In order to receive a mark for your submission, it is mandatory to present a demo of your
project to your lecturer during week 12. Failure to do so will result in a zero mark for your
submission. The lecturer has the discretion to adjust team contributions based on individual
contributions to the demo. It is important to demonstrate your active participation and
contribution to the project during the demo to ensure fair grading and assessment.
Marking Rubric
Criteria | Poor (0-25%) | Fair (25-50%) | Good (50-75%) | Excellent (75-100%)
Data Preparation (3 marks)
- Poor: No evidence to show Hadoop runs correctly.
- Fair: Hadoop is up and running. The data file is not downloaded correctly.
- Good: Hadoop is up and running. The data file is downloaded correctly.
- Excellent: Hadoop is up and running. The data file is downloaded and put in HDFS.
Map and Reduce Functions (5 marks)
- Poor: No Map or Reduce function is included.
- Fair: Either the Map or the Reduce function is implemented correctly.
- Good: Map and Reduce functions are implemented but there are minor issues.
- Excellent: Map and Reduce functions are implemented correctly and included in the report.
Screenshots of the whole process (3 marks)
- Poor: No screenshots, or screenshots unrelated to the process of data preparation,
running Hadoop and MapReduce programming.
- Fair: Only a few screenshots are included. Several steps are missing or have no evidence.
No or poor annotations.
- Good: Only one or two steps are missing or have no evidence. Annotations for screenshots
are not comprehensive.
- Excellent: Enough screenshots are included to show the whole process is correct: data
preparation, running Hadoop and MapReduce programming. Screenshots are well annotated.
Output (4 marks)
- Poor: The output is incorrect. No team member’s name is shown on the screenshot.
- Fair: Some parts of the output are correct. Team members’ names are shown on the
screenshot.
- Good: The output is correct but no team member’s name is shown on the screenshot.
- Excellent: The output is correct and all required information is included. Screenshots
are attached as evidence. Names of group members are shown on the screenshots.
Discussion (3 marks)
- Poor: Discussion is vague or very brief and lacks detail and reasoning.
- Fair: Discussion covers some potential benefits of the project for Amazon.
- Good: Discussion covers several potential benefits of the project for Amazon. Length of
this section is 500 words.
- Excellent: An insightful discussion has been provided that covers the potential benefits
of the project for Amazon. Length of this section is 500 words.
Language (2 marks)
- Poor: The report is badly structured and written, containing numerous grammatical and
spelling errors. The language is often confusing and inappropriate for the intended
audience.
- Fair: The report structure and writing need improvements. It contains some grammatical
and spelling errors. The language is sometimes imprecise or inappropriate for the
intended audience.
- Good: The report is adequately structured, clearly written, and mostly free of
grammatical and spelling errors. The language is appropriate for the intended audience.
- Excellent: The report is well-structured, clearly written, and free of grammatical and
spelling errors. The language is sophisticated, precise, and appropriate for the intended
audience.

