Assignment 2 Machine Learning Modelling

Background

Early Incident Identification

Disc Consulting Enterprises (DCE) has identified some potentially suspicious attacks on their network and computer systems. The attacks are thought to be a new type of attack from a skilled threat actor. To date, the attacks have only been identified ‘after the fact’ by examining post-exploitation activities of the attacker on compromised systems.

Unfortunately, the attackers are skilled enough to evade detection and the exact mechanisms of their exploits have not been identified.

The incident response team, including IT services, security operations, security architecture, risk management, the CISO (Chief Information Security Officer), and the CTO (Chief Technology Officer), have been meeting regularly to determine next steps.

It has been suggested that the security architecture and operations teams could try to implement some real-time threat detection using machine learning models that build on earlier consultancy your firm has completed (i.e., building upon your Assessment 1 work).

Data description

The data have already been provided (in Assessment 1), and the ML team (you) have undertaken some initial cleaning and analysis.

Things to keep in mind:

  • Each event record is a snapshot triggered by an individual network ‘packet’. The exact triggering conditions for the snapshot are unknown. But it is known that multiple packets are exchanged in a ‘TCP conversation’ between the source and the target before an event is triggered and a record created. It is also known that each event record is anomalous in some way (the SIEM logs many events that may be suspicious).
  • The malicious events account for a very small proportion of the data. As such, your training needs to consider the “imbalanced” data and the effect this imbalance may have on accuracy (both specificity and sensitivity).
  • A very small proportion of the data are known to be corrupted by their source systems and some data are incomplete or incorrectly tagged. The incident response team indicated this is likely to be less than a few hundred records.

Assembled Payload Size (continuous): The total size of the inbound suspicious payload. Note: this would contain the data sent by the attacker in the “TCP conversation” up until the event was triggered.
DYNRiskA Score (continuous): An untested in-built risk score assigned by a new SIEM plug-in.
IPV6 Traffic (binary): A flag indicating whether the triggering packet was using IPV6 or IPV4 protocols (True = IPV6).
Response Size (continuous): The total size of the reply data in the TCP conversation prior to the triggering packet.
Source Ping Time (ms) (continuous): The ‘ping’ time to the IP address which triggered the event record. This is affected by network structure, number of ‘hops’ and even physical distances. E.g.:
  • < 1 ms is typically local to the device
  • 1-5 ms is usually located in the local network
  • 5-50 ms is often geographically local to a country
  • ~100-250 ms is trans-continental to servers
  • 250+ ms may be trans-continental to a small network
  Note: these are estimates only and many factors can influence ping times.
Operating System (categorical): A limited ‘guess’ as to the operating system that generated the inbound suspicious connection. This is not accurate, but it should be somewhat consistent for each ‘connection’.
Connection State (categorical): An indication of the TCP connection state at the time the packet was triggered.
Connection Rate (continuous): The number of connections per second by the inbound suspicious connection made prior to the event record creation.
Ingress Router (binary): DCE has two main network connections to the ‘world’. This field indicates which connection the events arrived through.
Server Response Packet Time (ms) (continuous): An estimation of the time from when the payload was sent to when the reply packet was generated. This may indicate server processing time/load for the event.
Packet Size (continuous): The size of the triggering packet.
Packet TTL (continuous): The time-to-live of the previous inbound packet. TTL can be a measure of how many ‘hops’ (routers) a packet has traversed before arriving at our network.
Source IP Concurrent Connection (continuous): How many concurrent connections were open from the source IP at the time the event was triggered.
Class (binary): Indicates if the event was confirmed malicious, i.e. 0 = Non-malicious, 1 = Malicious.

The needle in the haystack

The data were gathered over a period of time and processed by several systems in order to associate specific events with confirmed malicious activities. However, the number of confirmed malicious events was very low, with these events accounting for less than 1% of all logged network events.

Because the events associated with malicious traffic are quite rare, the rates of ‘false negatives’ and ‘false positives’ are important.
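To illustrate why overall accuracy alone can mislead with such rare events, here is a minimal sketch in R. The counts are hypothetical (100,000 events, 0.8% malicious) and are not taken from MLData2023. A trivial 'detector' that never raises an alert still scores very high accuracy while missing every attack.

```r
# Hypothetical event counts (illustrative only, not from MLData2023)
n.events    <- 100000
n.malicious <- 800            # 0.8% of events, i.e. "less than 1%"

actual    <- rep(c("Mal", "NonMal"), times = c(n.malicious, n.events - n.malicious))
predicted <- rep("NonMal", n.events)   # a trivial model that never alerts

accuracy    <- mean(predicted == actual)                  # share of correct calls
sensitivity <- mean(predicted[actual == "Mal"] == "Mal")  # share of attacks caught

cat("Accuracy:", accuracy, "\n")        # 0.992
cat("Sensitivity:", sensitivity, "\n")  # 0
```

This is why the false negative and false positive rates need to be reported alongside accuracy.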

Scenario

Following the meetings of the security incident response team, it has been decided to try to make an ‘early warning’ system that extends the functionality of their current SIEM. It has been proposed that DCE engage 3rd party developers to create a ‘smart detection plugin’ for the SIEM.

The goal is to have a plug-in that would extract data from real-time network events, send it to an external system (your R script) and receive a classification in return.

However, for the plugin to be effective it must account for the alert fatigue experienced by security operations teams, as excessive false positives can lead to the team ignoring real incidents. But, because the impact of a successful attack is very high, false negatives could result in attackers taking over the whole network.

Your job

Your job is to develop the detection algorithms that will provide the most accurate incident detection. You do not need to concern yourself with the specifics of the SIEM plugin or software integration, i.e., your task is to focus on accurate classification of malicious events using R.

You are to test and evaluate two machine learning algorithms to determine which supervised learning model is best for the task as described.

Task

You are to import and clean the same MLData2023.csv that was used in the previous assignment. Then run, tune and evaluate two supervised ML algorithms (each with two types of training data) to identify the most accurate way of classifying malicious events.

Part 1 – General data preparation and cleaning

  1. Import the MLData2023.csv into RStudio. This version is the same as Assignment 1.
  2. Write the appropriate code in RStudio to prepare and clean the MLData2023 dataset as follows:
    i. Clean the whole dataset based on what you have suggested / feedback received for Assignment 1.
    ii. Filter the data to only include cases labelled with Class = 0 or 1.
    iii. For the feature Operating.System, merge the three Windows categories together to form a new category, say Windows_All. Furthermore, merge iOS, Linux (Unknown), and Other to form the new category named Others. Hint: use the forcats::fct_collapse(.) function.
    iv. Similarly, for the feature Connection.State, merge INVALID, NEW and RELATED to form the new category named Others.
    v. Select only the complete cases using the na.omit(.) function, and name the dataset MLData2023_cleaned.

Briefly outline the preparation and cleaning process in your report and why you believe the above steps were necessary.
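The filtering, merging and complete-case steps above can be sketched as follows. This is a minimal illustration on a toy data frame, not the brief's own solution: the toy rows and the three Windows level names are hypothetical and should be replaced by the levels actually present in MLData2023. It assumes the dplyr and forcats packages are installed.

```r
library(dplyr)
library(forcats)

# Hypothetical toy data; level names such as "Windows 7" are assumptions
toy <- data.frame(
  Operating.System = c("Windows 7", "Windows 10+", "Windows (Unknown)", "iOS",
                       "Linux (Unknown)", "Other", "Android", NA, "Android"),
  Connection.State = c("ESTABLISHED", "INVALID", "NEW", "RELATED", "ESTABLISHED",
                       "ESTABLISHED", "ESTABLISHED", "ESTABLISHED", "NEW"),
  Class = c(0, 1, 0, 0, 1, 0, 0, 0, 99)
)

cleaned <- toy %>%
  # Keep only records with a valid label
  filter(Class %in% c(0, 1)) %>%
  mutate(
    # Merge the Windows variants, and the sparse categories, into single levels
    Operating.System = fct_collapse(
      Operating.System,
      Windows_All = c("Windows 7", "Windows 10+", "Windows (Unknown)"),
      Others      = c("iOS", "Linux (Unknown)", "Other")
    ),
    Connection.State = fct_collapse(
      Connection.State,
      Others = c("INVALID", "NEW", "RELATED")
    )
  ) %>%
  # Drop any row that still contains a missing value
  na.omit()

print(table(cleaned$Operating.System))
print(table(cleaned$Connection.State))
```

Categories that are not named in fct_collapse() (here the hypothetical "Android" and "ESTABLISHED") are left unchanged.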

  • Use the code below to generate two training datasets (one unbalanced, mydata.ub.train, and one balanced, mydata.b.train) along with the testing set (mydata.test). Make sure you enter your student ID into the command set.seed(.).
# Requires dplyr for %>%, filter() and mutate()
library(dplyr)

# Separate samples of non-malicious and malicious events
dat.class0 <- MLData2023_cleaned %>% filter(Class == 0) # non-malicious
dat.class1 <- MLData2023_cleaned %>% filter(Class == 1) # malicious

# Randomly select 19800 non-malicious and 200 malicious samples, then combine them to form the training samples
set.seed(Enter your student ID)
rows.train0 <- sample(1:nrow(dat.class0), size = 19800, replace = FALSE)
rows.train1 <- sample(1:nrow(dat.class1), size = 200, replace = FALSE)

# Your 20000 unbalanced training samples
train.class0 <- dat.class0[rows.train0,] # Non-malicious samples
train.class1 <- dat.class1[rows.train1,] # Malicious samples
mydata.ub.train <- rbind(train.class0, train.class1)
mydata.ub.train <- mydata.ub.train %>%
  mutate(Class = factor(Class, labels = c("NonMal","Mal")))



Note that in the master data set, the percentage of malicious events is less than 1%. This distribution is roughly represented by the unbalanced data. The balanced data is generated based on up-sampling of the minority class using bootstrapping. The idea here is to ensure the trained model is not biased towards the majority class, i.e. non-malicious event.
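As a rough illustration of up-sampling by bootstrapping, the sketch below resamples the minority class with replacement until it matches the majority class in size. It is an assumption-laden example rather than the brief's own code: the toy data frame, the Packet.Size values, the object names toy.ub.train and toy.b.train, and the choice of a 1:1 target ratio are all hypothetical.

```r
set.seed(1)  # placeholder seed; the brief asks for your student ID here

# Hypothetical stand-in for an unbalanced training set: 990 NonMal, 10 Mal
toy.ub.train <- data.frame(
  Packet.Size = c(rnorm(990, mean = 500, sd = 50), rnorm(10, mean = 700, sd = 50)),
  Class = factor(rep(c("NonMal", "Mal"), times = c(990, 10)),
                 levels = c("NonMal", "Mal"))
)

majority <- toy.ub.train[toy.ub.train$Class == "NonMal", ]
minority <- toy.ub.train[toy.ub.train$Class == "Mal", ]

# Bootstrap the minority class: draw its row indices WITH replacement
boot.rows <- sample(seq_len(nrow(minority)), size = nrow(majority), replace = TRUE)

# Combine the untouched majority rows with the resampled minority rows
toy.b.train <- rbind(majority, minority[boot.rows, ])

print(table(toy.ub.train$Class))  # before up-sampling
print(table(toy.b.train$Class))   # after up-sampling
```

Only training data should be resampled in this way. The testing set keeps its natural class distribution so that the reported error rates reflect real traffic.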



Part 2 – Compare the performances of different ML algorithms

  • Randomly select two supervised learning modelling algorithms to test against one another by running the following code. Make sure you enter your student ID into the command set.seed(.). Your 2 ML approaches are given by myModels.
set.seed(Enter your student ID)
models.list1 <- c("Logistic Ridge Regression", "Logistic LASSO Regression", "Logistic Elastic-Net Regression")
models.list2 <- c("Classification Tree", "Bagging Tree", "Random Forest")
myModels <- c(sample(models.list1, size = 1), sample(models.list2, size = 1))
myModels %>% data.frame

For each of your two ML modelling approaches, you will need to:

  • Run the ML algorithm in R on the two training sets with Class as the outcome variable.
  • Perform hyperparameter tuning to optimise the model:
    • Outline your hyperparameter tuning/searching strategy for each of the ML modelling approaches. Report on the search range(s) for hyperparameter tuning, which 𝑘-fold CV was used, the number of repeated CVs (if applicable), and the final optimal tuning parameter values and relevant CV statistics (i.e. CV results, tables and plots), where appropriate. If you are using repeated CVs, a minimum of 2 repeats are required.
    • If your selected tree model is Bagging, you must tune the nbagg, cp and minsplit hyperparameters, with at least 3 values for each.
    • If your selected tree model is Random Forest, you must tune the num.trees and mtry hyperparameters, with at least 3 values for each.
    • Be sure to set the randomisation seed using your student ID.
  • Evaluate the predictive performance of your two ML models, derived from the balanced and unbalanced training sets, on the testing set. Provide the confusion matrices and report and describe the following measures in the context of the project:
    • False positive rate
    • False negative rate
    • Overall Accuracy
    • Precision
    • Recall
    • F-score
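For the penalised logistic regression models, the tuning step in the list above could be approached along the following lines. This is a hedged sketch on simulated data, assuming the glmnet package is installed. The toy features, the alpha grid, the lambda range and the use of 10-fold CV with misclassification error are illustrative choices, not requirements of the brief. Repeated CV is not shown.

```r
library(glmnet)

set.seed(1)  # placeholder seed; the brief asks for your student ID here

# Hypothetical toy data: 500 events, 5 numeric features, a minority of "Mal"
n <- 500
x <- matrix(rnorm(n * 5), nrow = n, dimnames = list(NULL, paste0("f", 1:5)))
p <- plogis(-2.5 + 1.5 * x[, 1] - 1.0 * x[, 2])
y <- factor(ifelse(runif(n) < p, "Mal", "NonMal"), levels = c("NonMal", "Mal"))

alphas  <- c(0, 0.5, 1)                       # 0 = ridge, 1 = LASSO, between = elastic-net
lambdas <- 10^seq(-4, 0, length.out = 50)     # search range for lambda
foldid  <- sample(rep(1:10, length.out = n))  # reuse the same 10 folds for every alpha

# For each alpha, cross-validate over the lambda grid and keep the best result
cv.results <- do.call(rbind, lapply(alphas, function(a) {
  fit <- cv.glmnet(x, y, family = "binomial", alpha = a, lambda = lambdas,
                   foldid = foldid, type.measure = "class")
  data.frame(alpha = a, lambda.min = fit$lambda.min, cv.error = min(fit$cvm))
}))

best <- cv.results[which.min(cv.results$cv.error), ]
print(cv.results)
print(best)
```

Fixing foldid keeps the comparison between alpha values fair, because every candidate is assessed on identical folds. For ridge or LASSO alone, alpha is fixed at 0 or 1 and only lambda is searched.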

For the precision, recall and F-score metrics, you will need to do a bit of research as to how they can be calculated. Make sure you define each of the above metrics in the context of the study.
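As a starting point, the sketch below computes these measures from a confusion matrix using their standard definitions, with "Mal" treated as the positive class. The predictions are hypothetical counts chosen for illustration (40 true positives, 10 false negatives, 30 false positives, 920 true negatives) and do not come from any fitted model.

```r
# Hypothetical test-set outcomes (illustrative counts only)
actual <- factor(rep(c("Mal", "NonMal"), times = c(50, 950)),
                 levels = c("NonMal", "Mal"))
predicted <- factor(c(rep("Mal", 40), rep("NonMal", 10),    # the 50 malicious events
                      rep("Mal", 30), rep("NonMal", 920)),  # the 950 non-malicious events
                    levels = c("NonMal", "Mal"))

# Confusion matrix with predictions in rows and actual classes in columns
cm <- table(Predicted = predicted, Actual = actual)

TP <- cm["Mal", "Mal"]        # malicious events correctly flagged
FP <- cm["Mal", "NonMal"]     # benign events flagged as malicious
FN <- cm["NonMal", "Mal"]     # malicious events that were missed
TN <- cm["NonMal", "NonMal"]  # benign events correctly ignored

precision <- TP / (TP + FP)
recall    <- TP / (TP + FN)

metrics <- c(
  FPR       = FP / (FP + TN),
  FNR       = FN / (FN + TP),
  Accuracy  = (TP + TN) / sum(cm),
  Precision = precision,
  Recall    = recall,
  F1        = 2 * precision * recall / (precision + recall)
)

print(cm)
print(round(metrics, 3))
```

In this toy case accuracy is 0.96 even though one in five attacks is missed (FNR = 0.2) and fewer than six in ten alerts are genuine (precision ≈ 0.571).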

  • Provide a brief statement on your final recommended model and why you chose it. Parsimony and, to a lesser extent, interpretability may be taken into account if the decision is close. You may outline your penalised model if it helps with your argument.

What to submit

Gather your findings into a report (maximum of 5 pages), citing relevant sources where necessary.

Present how and why the data were ‘cleaned and prepared’, how the ML models were tuned, and the relevant CV results. Lastly, present how the models performed relative to each other in both the unbalanced and balanced scenarios. You may use graphs, tables and images where appropriate to help your reader understand your findings. All tables and figures should be appropriately captioned, and referenced in-text.

Make a final recommendation on which ML modelling approach is the best for this task.

Your final report should look professional, include appropriate headings and subheadings, should cite facts and reference source materials in APA-7th format.

Your submission must include the following:

  • Your report (5 pages or less, excluding cover/contents/reference/appendix page). The report must be submitted through TURNITIN and checked for originality.
  • A copy of your R code, and three csv files corresponding to your two training sets and a testing set. The R code and data sets are to be submitted separately via another submission link.

Note that no marks will be given if the results you have provided cannot be confirmed by your code. No more than 20% of your code can be from online resources, including ChatGPT. Furthermore, all pages exceeding the 5-page limit will not be read or examined.

Marking Criteria

Criterion (contribution to assignment mark)

  • Accurate implementation of data cleaning and of each supervised machine learning algorithm in R (20%). Strictly about code:
    • Does the code work from start to finish?
    • Are the results reproducible?
    • Are all the steps performed correctly?
    • Is it your own work?
  • Explanation of data cleaning and preparation (10%). Corresponds to Part 1 b):
    • Briefly outline the reasons for sub-parts (i) and (ii).
    • Provide justifications for merging of categories, i.e. sub-parts (iii) and (iv).
  • An outline of the selected modelling approaches, the hyperparameter tuning and search strategy, the corresponding performance evaluation in the training sets (i.e. CV results, tables and plots), and the optimal tuning hyperparameter values (20%):
    • Penalised logistic regression model – Outline the range of values for your lambda and alpha (if elastic-net). Plot/tabulate the CV results. Outline the optimal value(s) of your hyperparameter(s). Outline the coefficients if required for your arguments of model choice.
    • Tree models – Outline the range of the hyperparameters (bagging and RF). Tabulate, e.g. the top combinations and the optimal OOB misclassification error, or plot the CV results (e.g. classification tree).
  • Presentation, interpretation and comparison of the performance measures (i.e. confusion matrices, false positives, false negatives, etc.) among the selected ML algorithms. Justification of the recommended modelling approach (30%):
    • Provide the confusion matrices (frequencies, proportions) in the test set:

      Predicted/Actual   Yes                         No
      Yes                Freq1 (Sensitivity %)       Freq2 (False positives %)
      No                 Freq3 (False negatives %)   Freq4 (Specificity %)

    • Interpretation of the above metrics (including accuracy, precision, recall and F-score) in the context of the study.
  • Report structure and presentation (including tables and figures, and where appropriate, proper citations and referencing in APA-7th style). Report should be clear and logical, well structured, mostly free from communication, spelling and grammatical errors (20%):
    • Overall structure, presentation and narrative.
    • Referencing.
    • Tables and figures are clear, and properly labelled and referenced.
    • No screenshots of R output, except of plots.
    • Spelling and grammar.

Academic Misconduct

Edith Cowan University regards academic misconduct of any form as unacceptable. Academic misconduct, which includes but is not limited to plagiarism, unauthorised collaboration, cheating in examinations, theft of other students’ work, collusion, and inadequate or incorrect referencing, will be dealt with in accordance with the ECU Rule 40 Academic Misconduct (including Plagiarism) Policy. Ensure that you are familiar with the Academic Misconduct Rules.

Assignment Extensions

To formally lodge your assignment extension request, use the ECU Online Extension Request and Tracking System, where instructions on applying for extensions are available. The link is also available on Canvas in the Assignment section.

Normal work commitments, family commitments and extra-curricular activities are not accepted as grounds for granting you an extension of time because you are expected to plan ahead for your assessment due dates.

Where the assignment is submitted not more than 7 days late, the penalty shall, for each day that it is late, be 5% of the maximum assessment available for the assignment.

Where the assignment is more than 7 days late, a mark of zero shall be awarded.

Universal Assignment (July 16, 2024) Assignment 2 Machine Learning Modelling. Retrieved from https://universalassignment.com/assignment-2-machine-learning-modelling/.
