
Background
Early Incident Identification
Disc Consulting Enterprises (DCE) has identified some potentially suspicious attacks on their network and computer systems. The attacks are thought to be a new type of attack from a skilled threat actor. To date, the attacks have only been identified ‘after the fact’ by examining post-exploitation activities of the attacker on compromised systems.
Unfortunately, the attackers are skilled enough to evade detection and the exact mechanisms of their exploits have not been identified.
The incident response team, including IT services, security operations, security architecture, risk management, the CISO (Chief Information Security Officer), and the CTO (Chief Technology Officer) have been meeting regularly to determine next steps.
It has been suggested that the security architecture and operations teams could try to implement some real-time threat detection using machine learning models that build on earlier consultancy your firm has completed (i.e., building upon your Assessment 1 work).
Data description
The data have already been provided (in Assessment 1), and the ML team (you) have undertaken some initial cleaning and analysis.
Things to keep in mind:
- Each event record is a snapshot triggered by an individual network ‘packet’. The exact triggering conditions for the snapshot are unknown. But it is known that multiple packets are exchanged in a ‘TCP conversation’ between the source and the target before an event is triggered and a record created. It is also known that each event record is anomalous in some way (the SIEM logs many events that may be suspicious).
- The malicious events account for a very small amount of data. As such, your training needs to consider the “imbalanced” data and the effect these data may have on accuracy (both specificity and sensitivity).
- A very small proportion of the data are known to be corrupted by their source systems and some data are incomplete or incorrectly tagged. The incident response team indicated this is likely to be less than a few hundred records.
Assembled Payload Size (continuous) | The total size of the inbound suspicious payload. Note: This would contain the data sent by the attacker in the “TCP conversation” up until the event was triggered |
DYNRiskA Score (continuous) | An un-tested in-built risk score assigned by a new SIEM plug-in |
IPV6 Traffic (binary) | A flag indicating whether the triggering packet was using IPV6 or IPV4 protocols (True = IPV6) |
Response Size (continuous) | The total size of the reply data in the TCP conversation prior to the triggering packet |
Source Ping Time (ms) (continuous) | The ‘ping’ time to the IP address which triggered the event record. This is affected by network structure, number of ‘hops’ and even physical distances. E.g.: < 1 ms is typically local to the device; 1-5 ms is usually located in the local network; 5-50 ms is often geographically local to a country; ~100-250 ms is trans-continental to servers; 250+ ms may be trans-continental to a small network. Note, these are estimates only and many factors can influence ping times. |
Operating System (Categorical) | A limited ‘guess’ as to the operating system that generated the inbound suspicious connection. This is not accurate, but it should be somewhat consistent for each ‘connection’ |
Connection State (Categorical) | An indication of the TCP connection state at the time the packet was triggered. |
Connection Rate (continuous) | The number of connections per second by the inbound suspicious connection made prior to the event record creation |
Ingress Router (Binary) | DCE has two main network connections to the ‘world’. This field indicates which connection the events arrived through |
Server Response Packet Time (ms) (continuous) | An estimation of the time from when the payload was sent to when the reply packet was generated. This may indicate server processing time/load for the event |
Packet Size (continuous) | The size of the triggering packet |
Packet TTL (continuous) | The time-to-live of the previous inbound packet. TTL can be a measure of how many ‘hops’ (routers) a packet has traversed before arriving at our network. |
Source IP Concurrent Connection (Continuous) | How many concurrent connections were open from the source IP at the time the event was triggered |
Class (Binary) | Indicates if the event was confirmed malicious, i.e. 0 = Non-malicious, 1 = Malicious |
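As a rough illustration of the ping-time guidance in the table, the sketch below bands Source Ping Time values with base R's cut(). The function name ping_band, the band labels, and the treatment of the 50-100 ms range (which the guidance leaves unspecified) are assumptions made for illustration only, and the cut-offs are the estimates quoted in the table rather than firm thresholds.

```r
# Hypothetical helper: band ping times (ms) using the rough estimates from the table.
# The labels and the "unspecified" 50-100 ms band are assumptions, not part of the data.
ping_band <- function(ping_ms) {
  cut(
    ping_ms,
    breaks = c(0, 1, 5, 50, 100, 250, Inf),
    labels = c("local device", "local network", "in-country",
               "unspecified (50-100 ms)", "trans-continental server",
               "trans-continental small network"),
    right = FALSE   # intervals are [lower, upper)
  )
}

example_pings <- c(0.4, 3, 20, 75, 180, 400)
bands <- ping_band(example_pings)
print(data.frame(ping_ms = example_pings, band = bands))
```

Because many factors influence ping times, a banding like this is probably better treated as an exploratory summary than as a model feature in its own right.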
The needle in the haystack
The data were gathered over a period of time and processed by several systems in order to associate specific events with confirmed malicious activities. However, the number of confirmed malicious events was very low, with these events accounting for less than 1% of all logged network events.
Because the events associated with malicious traffic are quite rare, the rates of ‘false negatives’ and ‘false positives’ are both important.
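To see why overall accuracy alone can mislead when malicious events are this rare, consider a minimal sketch with simulated labels. The counts (1,000 events, 5 malicious) are invented for illustration and are not taken from the dataset; the "classifier" simply labels every event as non-malicious.

```r
# Simulated labels only: 1,000 events, 5 of them malicious (0.5%).
actual <- c(rep(1, 5), rep(0, 995))

# A useless "classifier" that labels every event as non-malicious.
predicted <- rep(0, length(actual))

tp <- sum(predicted == 1 & actual == 1)  # malicious, correctly flagged
tn <- sum(predicted == 0 & actual == 0)  # non-malicious, correctly passed
fp <- sum(predicted == 1 & actual == 0)  # false alarms
fn <- sum(predicted == 0 & actual == 1)  # missed attacks

accuracy    <- (tp + tn) / length(actual)
sensitivity <- tp / (tp + fn)   # share of malicious events caught
specificity <- tn / (tn + fp)   # share of non-malicious events passed

print(c(accuracy = accuracy, sensitivity = sensitivity, specificity = specificity))
```

The accuracy is 99.5% even though every malicious event is missed, which is why sensitivity and specificity need to be reported alongside it.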
Scenario
Following the meetings of the security incident response team, it has been decided to try to make an ‘early warning’ system that extends the functionality of their current SIEM. It has been proposed that DCE engage 3rd party developers to create a ‘smart detection plugin’ for the SIEM.
The goal is to have a plug-in that would extract data from real-time network events, send it to an external system (your R script) and receive a classification in return.
However, for the plugin to be effective it must consider the alert fatigue experienced by security operations teams, as excessive false positives can lead to the team ignoring real incidents. But, because the impact of a successful attack is very high, false negatives could result in attackers taking over the whole network.
Your job
Your job is to develop the detection algorithms that will provide the most accurate incident detection. You do not need to concern yourself about the specifics of the SIEM plugin or software integration, i.e., your task is to focus on accurate classification of malicious events using R.
You are to test and evaluate two machine learning algorithms to determine which supervised learning model is best for the task as described.
Task
You are to import and clean the same MLData2023.csv that was used in the previous assignment. Then run, tune and evaluate two supervised ML algorithms (each with two types of training data) to identify the most accurate way of classifying malicious events.
Part 1 – General data preparation and cleaning
- Import the MLData2023.csv into R Studio. This version is the same as Assignment 1.
- Write the appropriate code in R Studio to prepare and clean the MLData2023 dataset as follows:
- Clean the whole dataset based on what you have suggested / feedback received for Assignment 1.
- Filter the data to only include cases labelled with Class = 0 or 1.
- For the feature Operating.System, merge the three Windows categories together to form a new category, say Windows_All. Furthermore, merge iOS, Linux (Unknown), and Other to form the new category named Others. Hint: use the forcats::fct_collapse(.) function.
- Similarly, for the feature Connection.State, merge INVALID, NEW and RELATED to form the new category named Others.
- Select only the complete cases using the na.omit(.) function, and name the dataset MLData2023_cleaned.
Briefly outline the preparation and cleaning process in your report and why you believe the above steps were necessary.
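The filtering, merging and complete-case steps listed above can be sketched as follows. This is only an illustration on a toy data frame, not the real MLData2023.csv: the Windows level names, the Android and ESTABLISHED levels, the Packet.Size values and the invalid Class value 99 are all invented, and the Windows levels are matched by pattern because their exact names are not given here. The sketch assumes the forcats package is installed.

```r
library(forcats)  # provides fct_collapse()

# Toy stand-in for MLData2023.csv. Every value here is invented, including the
# Windows level names, the ESTABLISHED state and the invalid Class value 99.
raw <- data.frame(
  Operating.System = c("Windows 7", "Windows 10+", "Windows (Unknown)", "iOS",
                       "Linux (Unknown)", "Other", "Android", "Android"),
  Connection.State = c("ESTABLISHED", "INVALID", "NEW", "RELATED",
                       "ESTABLISHED", "ESTABLISHED", "NEW", "ESTABLISHED"),
  Packet.Size      = c(120, 64, NA, 512, 80, 96, 1500, 40),
  Class            = c(0, 0, 1, 0, 99, 0, 1, 0),
  stringsAsFactors = TRUE
)

# Keep only cases labelled Class = 0 or 1.
dat <- raw[raw$Class %in% c(0, 1), ]

# Merge every level containing "Windows" into Windows_All,
# and iOS, Linux (Unknown) and Other into Others.
windows_levels <- grep("Windows", levels(dat$Operating.System), value = TRUE)
dat$Operating.System <- fct_collapse(
  dat$Operating.System,
  Windows_All = windows_levels,
  Others      = c("iOS", "Linux (Unknown)", "Other")
)

# Merge INVALID, NEW and RELATED into Others.
dat$Connection.State <- fct_collapse(
  dat$Connection.State,
  Others = c("INVALID", "NEW", "RELATED")
)

# Keep complete cases only.
MLData2023_cleaned <- na.omit(dat)
print(MLData2023_cleaned)
```

On the real data, it is worth checking levels(.) before and after each merge to confirm that the category names match those actually present in the file.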
- Use the code below to generate two training datasets (one unbalanced mydata.ub.train and one balanced mydata.b.train) along with the testing set (mydata.test). Make sure you enter your student ID into the command set.seed(.).
For each of your two ML modelling approaches, you will need to:
statistics (i.e. CV results, tables and plots), where appropriate. If you are using repeated CVs, a minimum of 2 repeats are required.
For the precision, recall and F-score metrics, you will need to do a bit of research as to how they can be calculated. Make sure you define each of the above metrics in the context of the study.
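As a starting point for that research, the standard definitions are: precision is the share of events flagged as malicious that really are malicious, TP / (TP + FP); recall is the share of truly malicious events that were flagged, TP / (TP + FN); and the F-score (here the balanced F1 form) is the harmonic mean of the two. A minimal sketch in base R follows. The function name classification_metrics and the example vectors are hypothetical, and the malicious class is assumed to be coded 1 as in the Class field.

```r
# Hypothetical helper: precision, recall and F1 for the malicious class (coded 1).
classification_metrics <- function(actual, predicted, positive = 1) {
  tp <- sum(predicted == positive & actual == positive)  # malicious, flagged
  fp <- sum(predicted == positive & actual != positive)  # non-malicious, flagged (false alarm)
  fn <- sum(predicted != positive & actual == positive)  # malicious, missed

  precision <- tp / (tp + fp)
  recall    <- tp / (tp + fn)
  f_score   <- 2 * precision * recall / (precision + recall)
  c(precision = precision, recall = recall, f_score = f_score)
}

# Invented example: 3 malicious events, 2 caught, 1 missed, plus 1 false alarm.
actual    <- c(1, 1, 1, 0, 0, 0, 0, 0, 0, 0)
predicted <- c(1, 1, 0, 1, 0, 0, 0, 0, 0, 0)
metrics <- classification_metrics(actual, predicted)
print(round(metrics, 3))
```

In the context of the scenario, low precision corresponds to alert fatigue (many false alarms) and low recall corresponds to missed attacks. Note that the helper returns NaN for precision when a model flags nothing as malicious, so that case may need separate handling.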
What to submit
Gather your findings into a report (maximum of 5 pages), citing relevant sources where necessary. Present how and why the data were ‘cleaned and prepared’, how the ML models were tuned, and the relevant CV results. Lastly, present how the models performed relative to each other in both the unbalanced and balanced scenarios. You may use graphs, tables and images where appropriate to help your reader understand your findings. All tables and figures should be appropriately captioned and referenced in-text. Make a final recommendation on which ML modelling approach is best for this task. Your final report should look professional, include appropriate headings and subheadings, and cite facts and reference source materials in APA 7th format. Your submission must include the following:
Note that no marks will be given if the results you have provided cannot be confirmed by your code. No more than 20% of your code can be from online resources, including ChatGPT. Furthermore, all pages exceeding the 5-page limit will not be read or examined.
Marking Criteria
Academic Misconduct
Edith Cowan University regards academic misconduct of any form as unacceptable. Academic misconduct, which includes but is not limited to, plagiarism; unauthorised collaboration; cheating in examinations; theft of other student’s work; collusion; inadequate and incorrect referencing; will be dealt with in accordance with the ECU Rule 40 Academic Misconduct (including Plagiarism) Policy. Ensure that you are familiar with the Academic Misconduct Rules.
Assignment Extensions
Instructions to apply for extensions are available on the ECU Online Extension Request and Tracking System to formally lodge your assignment extension request. The link is also available on Canvas in the Assignment section. Normal work commitments, family commitments and extra-curricular activities are not accepted as grounds for granting you an extension of time because you are expected to plan ahead for your assessment due dates. Where the assignment is submitted not more than 7 days late, the penalty shall, for each day that it is late, be 5% of the maximum assessment available for the assignment. Where the assignment is more than 7 days late, a mark of zero shall be awarded.