UEL-CN-7031
Summative assessment: Final Project 60%, Final Project Presentation 40%
Submission instructions
- Cover sheet to be attached to the front of the assignment when submitted
- Question paper to be attached to assignment when submitted
- All pages to be numbered sequentially
Module code | UEL-CN-7031 |
Module title | Big Data Analytics |
Assignment title | Big Data Analytics: Coursework |
Assignment number | 1 |
Weighting | 100% (Final Project 60% and Presentation 40%) |
Submission date | Week 12 |
Additional information |
UEL-CN-7031 – Big Data Analytics
This coursework (CRWK) must be attempted as individual work. It is divided into two sections: (1) Big Data analytics on a real case study and (2) a presentation.
The overall mark for the CRWK comes from two main activities:
1- Big Data Analytics report (around 5,000 words, with a tolerance of ± 10%) (60%)
2- Presentation (around 1,000 words, with a tolerance of + 10%) (40%)
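The weighting and word-count tolerances above reduce to simple arithmetic. The sketch below is only an illustration: the function names are hypothetical, and it assumes that "+ 10%" for the presentation means an upper tolerance only (no lower tolerance).

```python
def overall_mark(report_mark, presentation_mark):
    """Combine two marks (each out of 100) using the 60%/40% weighting."""
    return (60 * report_mark + 40 * presentation_mark) / 100


def word_count_range(target, lower_pct, upper_pct):
    """Return the (minimum, maximum) word counts allowed around a target.

    lower_pct and upper_pct are whole-number percentages, e.g. 10 for 10%.
    """
    minimum = target * (100 - lower_pct) // 100
    maximum = target * (100 + upper_pct) // 100
    return minimum, maximum


# Report: around 5,000 words with a tolerance of +/- 10%.
report_range = word_count_range(5000, 10, 10)

# Presentation: around 1,000 words; ASSUMPTION: "+ 10%" is an upper bound only.
presentation_range = word_count_range(1000, 0, 10)

print("Report word range:", report_range)
print("Presentation word range:", presentation_range)
print("Example overall mark:", overall_mark(70, 80))
```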
Marking Scheme Big Data Analytics report
Topic | Total mark | Sub-task mark | Remarks (breakdown of marks for each sub-task) |
Big Data Analytics using HIVE | 30 | (10) | Providing big data queries using HIVE. |
| | (10) | Using built-in (Date, Math, Conditional, and String) functions in HIVE. |
| | (10) | Visualizing the query results graphically and interpreting them. |
Big Data Analytics using Spark | 50 | (15) | Analyzing the dataset through statistical analysis methods. |
| | (35) | Designing single- and multi-class classifiers, then evaluating and visualizing their accuracy/performance. |
Individual assessment | 10 | (10) | Finding alternative solutions for high-level languages and analytics approaches (use references), and relating findings from big data analytics to the relevant theories. |
Documentation | 10 | (10) | Writing a scientific report. |
Total | 100 | | |
Big Data Analytics using Hadoop and Spark
Tasks:
(1) Understanding Dataset: UNSW-NB15
The raw network packets of the UNSW-NB15 dataset were created by the IXIA PerfectStorm tool in the Cyber Range Lab of the Australian Centre for Cyber Security (ACCS) to generate a hybrid of real modern normal activities and synthetic contemporary attack behaviours.
The tcpdump tool was used to capture 100 GB of raw traffic (e.g., Pcap files). The dataset has nine types of attacks, namely Fuzzers, Analysis, Backdoors, DoS, Exploits, Generic, Reconnaissance, Shellcode and Worms. The Argus and Bro-IDS tools were used, and twelve algorithms were developed, to generate a total of 49 features with the class label.
- The features are described here.
- The number of attacks and their sub-categories is described here.
- In this coursework, we use a total of 10 million records stored in a CSV file (download). The total size is about 600MB, which is big enough to employ big data methodologies for analytics. As a big data specialist, you should first read and understand its features, then apply modeling techniques. To see a few records of this dataset, you can import it into Hadoop HDFS, then write a Hive query that prints the first 5-10 records.
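Before loading the file into HDFS, a quick local look at the first few rows can also help. The following is a minimal plain-Python sketch, not the Hive query the task asks for; the function name `peek_csv` and the file name in the usage comment are hypothetical.

```python
import csv
from itertools import islice


def peek_csv(path, n=5):
    """Return the first n rows of a CSV file without reading the whole file."""
    with open(path, newline="", encoding="utf-8") as handle:
        # islice stops after n rows, so a 600MB file is never fully loaded.
        return list(islice(csv.reader(handle), n))


# Usage (the file name is an assumption; use the path of your downloaded CSV):
# for row in peek_csv("UNSW-NB15.csv", n=5):
#     print(row)
```

Each row comes back as a list of strings, so counting the fields in a row is a quick sanity check against the feature description.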
(2) Big Data Query & Analysis by Apache Hive [30 marks]
This task uses Apache Hive to convert big raw data into useful information for end users. To do so, first understand the dataset carefully. Then write at least 4 Hive queries (refer to the marking scheme). Apply appropriate visualization tools to present your findings numerically and graphically. Briefly interpret your findings.
Finally, include screenshots of your outcomes (e.g., tables and plots), together with the scripts/queries, in the report.
Tip: The mark for this section depends on the complexity of your HIVE queries; for instance, a simple SELECT query on its own will not earn full marks.
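To illustrate what a query beyond a plain SELECT might look like, the sketch below combines a conditional (CASE), a string function (UPPER), a math function (ROUND) and aggregation with GROUP BY. It runs against an in-memory SQLite table through Python's standard library purely as a local stand-in; it was not run on Hive, and Hive's own built-in functions (date functions in particular) differ from SQLite's. The table name `flows`, the column names and every row are made-up assumptions for illustration.

```python
import sqlite3

# Tiny in-memory stand-in table; all values below are invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flows (proto TEXT, sbytes INTEGER, attack_cat TEXT, label INTEGER)"
)
conn.executemany(
    "INSERT INTO flows VALUES (?, ?, ?, ?)",
    [
        ("tcp", 500, "", 0),
        ("tcp", 1500, "exploits", 1),
        ("udp", 200, "dos", 1),
        ("udp", 300, "", 0),
        ("tcp", 2500, "Exploits", 1),
    ],
)

# Conditional + string + math functions with aggregation and ordering.
query = """
SELECT
    CASE WHEN label = 1 THEN UPPER(attack_cat) ELSE 'NORMAL' END AS category,
    COUNT(*) AS n_flows,
    ROUND(AVG(sbytes), 1) AS avg_sbytes
FROM flows
GROUP BY CASE WHEN label = 1 THEN UPPER(attack_cat) ELSE 'NORMAL' END
ORDER BY n_flows DESC, category
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The aggregated result (one row per category) is also the natural input for a bar chart of flow counts or average bytes per category.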
(3) Advanced Analytics using PySpark [50 marks]
In this section, you will conduct advanced analytics using PySpark.
3.1. Analyze and Interpret Big Data (15 marks)
You need to explore and understand the data through at least 4 analytical methods (descriptive statistics, correlation, hypothesis testing, density estimation, etc.). Present your work numerically and graphically. Apply tooltip text, legends, titles, X-Y labels, etc. as appropriate to help end users gain insights.
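As a reminder of what two of the listed methods compute, here is a minimal plain-Python sketch of descriptive statistics and Pearson correlation. It is not PySpark code; on the full dataset the same quantities would be computed with Spark. The variable names and sample values are invented for illustration.

```python
import math
import statistics


def describe(values):
    """Basic descriptive statistics for one numeric column."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length columns."""
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


# Invented sample values standing in for two numeric features.
duration = [1.0, 2.0, 3.0, 4.0]
total_bytes = [10.0, 20.0, 30.0, 40.0]

summary = describe(duration)
r = pearson(duration, total_bytes)
print(summary)
print("Pearson r:", r)
```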
3.2. Design and Build a Classifier (35 marks)
- Design and build a binary classifier over the dataset. Explain your algorithm and its configuration. Present your findings in both numerical and graphical form. Evaluate the performance of the model and verify its accuracy and effectiveness. [15 marks]
- Apply a multi-class classifier to classify the data into ten classes (categories): one normal and nine attacks (Fuzzers, Analysis, Backdoors, DoS, Exploits, Generic, Reconnaissance, Shellcode and Worms). Briefly explain your model with supporting statements on its parameters, accuracy and effectiveness. [20 marks]
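Evaluating either classifier comes down to comparing predicted labels with true labels. The sketch below shows, in plain Python rather than PySpark, how accuracy, per-class precision/recall/F1 and macro-averaged F1 are defined. The function names and the toy label lists are invented for illustration; the metrics a report should use are the author's choice, not something this brief prescribes.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall and F1 treating `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class that appears."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = [precision_recall_f1(y_true, y_pred, lab)[2] for lab in labels]
    return sum(scores) / len(scores)


# Toy labels (invented): three of the ten classes, six records.
y_true = ["normal", "normal", "dos", "dos", "worms", "worms"]
y_pred = ["normal", "dos", "dos", "dos", "worms", "normal"]

print("accuracy:", accuracy(y_true, y_pred))
print("dos (P, R, F1):", precision_recall_f1(y_true, y_pred, "dos"))
print("macro F1:", macro_f1(y_true, y_pred))
```

For the binary task the same `precision_recall_f1` function applies with the attack label as `positive`. With heavily imbalanced classes, accuracy alone can look high even when rare attack classes are missed, which is why per-class and macro-averaged scores are shown alongside it.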
(4) Individual Assessment [10 marks]
Discuss (1) what alternative technologies are available for tasks 2 and 3 and how they differ (use academic references), and (2) what surprising new thinking this work evoked and/or what you neglected.
Tip: add the individual assessment of each member in the same report.
(5) Documentation [10 marks]
Document all your work. Your final report must follow the 5 sections detailed in the “format of final submission” section (refer to the next page). Your work must demonstrate an appropriate understanding of academic writing and integrity.
Marking Scheme for the Presentation
Topic | Total Marks | Remarks |
Content | 50 | Covers topic in-depth with details. |
Presentation design & layout features, Animations & transitions | 20 | Makes excellent use of fonts, colors, graphics, effects, transitions to enhance the presentation. |
Length | 10 | Appropriate number of slides and word count (1000 words). |
Organization | 20 | Students present information in a logical, interesting sequence that the audience can follow. |
Total | 100 |
The presentation is the second submission and has its own separate submission link. It must be based on the report above and carries a weight of 40% of your final grade.
FORMAT OF FINAL SUBMISSION
- You need to prepare one single file in PDF format as your coursework, containing the following sections:
- Use ONLY one Cover Page
- Table of Contents
- Report of the tasks (it needs sub-sections for few tasks, accordingly)
- References (if any)
- And one PDF file for the presentation
SUBMISSION
- Report: a single PDF submitted to Turnitin in Moodle by the end of Week 12
- Presentation: a single PDF submitted to Turnitin in Moodle at the second submission link by the end of Week 12
PLAGIARISM
The University defines an assessment offence as any action(s) or behaviour likely to confer an unfair advantage in assessment, whether by advantaging the alleged offender or disadvantaging (deliberately or unconsciously) another or others. A number of examples are set out in the Regulations and these include:
“D.5.7.1 (e) the submission of material (written, visual or oral), originally produced by another person or persons, without due acknowledgement, so that the work could be assumed the student’s own. For the purposes of these Regulations, this includes incorporation of significant extracts or elements taken from the work of (an) other(s), without acknowledgement or reference, and the submission of work produced in collaboration for an assignment based on the assessment of individual work. (Such offences are typically described as plagiarism and collusion.)”.