Practical Assessment 2 Data Wrangling

Assessment type: Written report (PDF document) using R Markdown
Due date: 26th May 2022, 5 pm Melbourne time
Weighting: 30%
Word limit: Maximum 25 pages
Feedback mode: Feedback will be provided using the Canvas marking tool and general text comments.

Group assessment

You will work on this assessment in a team of up to three students. You will select your own team members. Although you will work on the assessment together as a team, only one team member should submit the report (a PDF document produced using R Markdown); otherwise submissions may get mixed up. Please write the details of your team members at the beginning of the report. If you prefer to work individually, that is also fine.

Purpose

The purpose of this assessment is to put to work the tools and knowledge that you gain throughout this course. This provides you with multiple benefits.

  • It will provide you with more experience using data preprocessing tools on real life data sets.
  • It helps you to self-direct your learning and interests to find unique and creative ways to wrangle your data.
  • It starts to build your data analytics portfolio. Portfolios (or e-portfolios) are a great way to show potential employers what you are capable of.

Overview

This assessment requires you to find some open data and use the knowledge and skills gained during the course to preprocess the data. You will create a report using R Markdown to explain the steps you took to perform the data preprocessing tasks.

Assessment criteria and weighting

Please see the marking rubric for the assessment criteria and weighting.

Course Learning outcomes

This assessment is linked to the following course learning outcomes:

  • Accurately, logically and ethically combine data from multiple sources to make it suitable for statistical analysis and draw valid interpretations.
  • Articulate how data meets the best practice standards (e.g. tidy data principles).
  • Select, perform and justify data validation processes for raw datasets.
  • Use leading open source software (e.g. R) for reproducible, automated data processing.

Assignment data sources

Assessment 2 is open-ended; however, you are required to find suitable datasets that fulfil the minimum requirements given below. All the datasets that you use in this assessment must be open and ideally have a Creative Commons Licence. This will ensure you can share your work with anyone provided you make proper attribution. If you’re not sure whether data is open, contact the provider, read the documentation or post on the discussion board and I will investigate. Some open data sources are provided below, but I encourage you to find others:

Minimum requirements for the data sets

Considering this is a data preprocessing class, I expect your data sets to meet certain requirements so that you can demonstrate your knowledge of data preprocessing. The following are the minimum requirements for the data sets that I will look for:

  1. At least two data sets should be merged to create your assessment data (for example, you can take crime statistics for the cities/states in Australia and merge this data set with cities/states’ per capita income data).
  2. Your data set should include multiple data types (numeric, character, factor, etc).
  3. Your data set should include variables suitable for data type conversions so that you are able to apply the required data type conversions (e.g., character -> factor, character -> date, numeric -> factor, etc. conversions).
  4. Your data set should include at least one factor variable that needs to be labelled and/or ordered.
  5. At least one of the data sets that you use should be untidy. You need to explain why the data set or data sets you used is/are untidy. Then you need to apply the required steps to reshape your data into a tidy format.
  6. At least one variable needs to be created/mutated from the existing ones (e.g., the data may contain income and expense variables and you may create a savings variable out of the income and expense variables).
  7. You are expected to scan all variables for missing values, special values, and obvious errors (i.e., inconsistencies). If there are missing values, use any of the suitable techniques outlined in Module 5 to deal with them, and justify and document your approach properly. If there are no missing values in the data, then scan all variables for any special values and obvious errors, use any of the suitable techniques outlined in Module 5 to deal with them, and justify and document your approach properly.
  8. You are expected to scan all numeric variables for outliers. If there are outliers, use any of the suitable techniques outlined in Module 6 to deal with them, and justify and document your approach properly.
  9. You are expected to apply data transformations on at least one of the variables. The purpose of this transformation should be one of the following: i) to change the scale for better understanding of the variable, ii) to convert a non-linear relation into a linear one, or iii) to decrease the skewness and convert the distribution into a normal distribution.
  10. The packages/functions readr, xlsx, readxl, foreign, gdata, rvest, dplyr, tidyr, deductive, deducorrect, editrules, validate, Hmisc, forecast, stringr, lubridate, car, outliers, MVN, infotheo, MASS, caret, MLR, ggplot2, knitr and base R will be useful. You can also use your own functions. This will show your accumulated knowledge that you gained throughout the semester in this course.
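
As a rough illustration of the merging, type conversion, factor ordering, and variable creation requirements above, the following base R sketch may help. The data frames `crime` and `income`, their column names, and all values are made up for illustration only; your own open data sets will look different, and packages such as dplyr offer equivalent functions.

```r
# Hypothetical data sets (made-up values), keyed by state.
crime <- data.frame(
  state      = c("VIC", "NSW", "QLD"),
  year       = c("2020", "2020", "2020"),   # stored as character on purpose
  crimes     = c(5200, 6100, 4300),
  population = c(6.6e6, 8.1e6, 5.2e6),
  stringsAsFactors = FALSE
)
income <- data.frame(
  state       = c("NSW", "QLD", "VIC"),
  income_band = c("high", "low", "medium"),
  stringsAsFactors = FALSE
)

# Merge the two data sets on the common key.
merged <- merge(crime, income, by = "state")

# Data type conversions: character -> integer and character -> factor.
merged$year  <- as.integer(merged$year)
merged$state <- factor(merged$state)

# Label and order a factor variable.
merged$income_band <- factor(
  merged$income_band,
  levels  = c("low", "medium", "high"),
  labels  = c("Low", "Medium", "High"),
  ordered = TRUE
)

# Create a new variable from existing ones (crimes per 100,000 people).
merged$crime_rate <- merged$crimes / merged$population * 1e5

str(merged)
```

In a report, each of these steps would be accompanied by a short explanation of why it was needed.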

Optional things that you can do to preprocess data:

  • You can subset your data by selecting variables and/or filtering in (or out) cases. Please don’t forget to put an explanation in your report if you do so.
  • Your data set can include date or string information or both. If this is the case, I expect you to apply required date conversions for dates and string manipulations for strings as required.
  • Depending on your level of knowledge gained in other courses (i.e., Applied Analytics and/or Machine Learning, etc) you may apply data normalisation, feature selection and feature extraction. Note that this is an optional task, and you don’t have to apply any of these techniques if you don’t know the theory and the fundamentals.
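
The optional tasks above could look something like the following base R sketch. The data frame `raw`, its columns, the date format, and the cut-off date are all hypothetical; min-max scaling is shown as just one possible normalisation, and stringr or lubridate could be used instead of the base functions.

```r
# Hypothetical raw data with messy strings and dates stored as text.
raw <- data.frame(
  suburb   = c("  richmond ", "CARLTON", "Fitzroy  "),
  recorded = c("26/05/2022", "01/03/2021", "15/11/2020"),
  value    = c(10, 25, 40),
  stringsAsFactors = FALSE
)

# String manipulation: trim whitespace and standardise capitalisation.
raw$suburb <- trimws(raw$suburb)
raw$suburb <- paste0(
  toupper(substr(raw$suburb, 1, 1)),
  tolower(substring(raw$suburb, 2))
)

# Date conversion: character -> Date (assumes day/month/year text).
raw$recorded <- as.Date(raw$recorded, format = "%d/%m/%Y")

# Subsetting: keep selected variables and filter in recent cases only.
recent <- subset(
  raw,
  recorded >= as.Date("2021-01-01"),
  select = c(suburb, recorded, value)
)

# Normalisation: min-max scaling of a numeric variable to [0, 1].
raw$value_scaled <- (raw$value - min(raw$value)) /
  (max(raw$value) - min(raw$value))

print(raw)
print(recent)
```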

Important Note:

Note that sometimes the order of the tasks may be different than the order given here. For example, you may need to tidy the data sets first to be able to create the common key to merge. Therefore, for such cases you may have a different ordering of the sections. Any further or optional pre-processing tasks can be added to the template using an additional section in the R Markdown file. Make sure your code is visible (within the margin of the page).
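
To illustrate why tidying sometimes has to come before merging, here is a minimal base R sketch. It assumes a hypothetical wide table `wide` in which years are spread across column names, so the common key (state and year) does not exist until the data is reshaped. All names and values are invented; tidyr's `pivot_longer()` is a common alternative to the manual reshape shown here.

```r
# Hypothetical untidy data: one column per year (values are made up).
wide <- data.frame(
  state  = c("NSW", "VIC"),
  `2019` = c(100, 80),
  `2020` = c(110, 90),
  check.names = FALSE
)

# Tidy first: one row per state-year observation.
year_cols <- setdiff(names(wide), "state")
long <- data.frame(
  state  = rep(wide$state, times = length(year_cols)),
  year   = rep(as.integer(year_cols), each = nrow(wide)),
  crimes = unlist(wide[year_cols], use.names = FALSE)
)

# Second hypothetical data set, already tidy.
income <- data.frame(
  state  = c("NSW", "NSW", "VIC", "VIC"),
  year   = c(2019L, 2020L, 2019L, 2020L),
  income = c(55000, 56000, 51000, 52000)
)

# Only now does the common key (state, year) exist for the merge.
merged <- merge(long, income, by = c("state", "year"))
print(merged)
```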

Create the report using R Markdown

The assessment 2 report must be completed using the R Markdown template. Note that this is an R Markdown notebook template. Information for using the R Markdown package can be found Here. The R Markdown template must be updated with your name(s) and student number(s). You must use the headings and chunks provided in the template. You can add more chunks if required. Your report will be composed of the following sections. In the report, all R chunks and outputs need to be visible. Failure to do so will result in a loss of marks.

Sections of the report:

  1. Students’ details [YAML input]: Add students’ full names, numbers, and the percentage of contributions in table “Group information”. Add the leader’s information (the one that submits assessment report) in “author” entries in the YAML header (located at the top of the R Markdown Template). If you work individually, then add only your own details in the table and write 100 % in percentage of contribution.
  2. Required packages [R code]: Provide the packages required to reproduce the report.
  3. Executive Summary [Plain text]: In your own words, provide a summary of the preprocessing. Explain the steps that you have taken to preprocess your data. Write this section last after you have performed all data preprocessing. (Word count Max: 300 words).
  4. Data [Plain text & R code & Output]: A clear description of data sets, their sources, and variable descriptions should be provided. In this section, you must also provide the R codes with outputs (e.g., head of data sets) that you used to import/read/scrape the data set. You need to fulfil the minimum requirement #1 and merge at least two data sets to create the one you are going to work on. In addition to the R codes and outputs, you need to explain the steps that you have taken.
  5. Understand [Plain text & R code & Output]: Summarise the types of variables and data structures, check the attributes in the data and apply proper data type conversions. In addition to the R codes and outputs, briefly explain the steps that you have taken. In this section, show that you have fulfilled minimum requirements 2-4.
  6. Tidy & Manipulate Data I [Plain text & R code & Output]: Explain why your data (or one of the data sets) doesn’t conform to the tidy data principles (minimum requirement #5). Apply the required steps to reshape the data into a tidy format. In addition to the R codes and outputs, explain everything that you do in this step.
  7. Tidy & Manipulate Data II [Plain text & R code & Output]: Create/mutate at least one variable from the existing variables (minimum requirement #6). In addition to the R codes and outputs, explain everything that you do in this step.
  8. Scan I [Plain text & R code & Output]: Scan the data for missing values, special values, and obvious errors (i.e., inconsistencies). In this step, you should fulfil the minimum requirement #7. In addition to the R codes and outputs, explain your methodology (i.e., explain why you have chosen that methodology and the actions that you have taken to handle these values) and communicate your results clearly.
  9. Scan II [Plain text & R code & Output]: Scan the numeric data for outliers. In this step, you should fulfil the minimum requirement #8. In addition to the R codes and outputs, explain your methodology (i.e., explain why you have chosen that methodology and the actions that you have taken to handle these values) and communicate your results clearly.
  10. Transform [Plain text & R code & Output]: Apply an appropriate transformation for at least one of the variables. In addition to the R codes and outputs, explain everything that you do in this step. In this step, you should fulfil the minimum requirement #9.
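
The scanning and transformation sections could be approached along the lines of this base R sketch. The data frame `x` and its values are hypothetical. Median imputation, Tukey's 1.5 × IQR fences, capping, and a log transformation are shown only as examples of possible choices, not as the methods prescribed by the course modules; whichever techniques you use, the report needs to justify them.

```r
# Hypothetical numeric variable with a missing value, a special value,
# and one extreme observation (values are made up).
x <- data.frame(
  id     = 1:8,
  income = c(42, 45, 47, NA, 50, 52, 400, Inf)
)

# Scan I: count missing values and special values (Inf, -Inf, NaN).
missing_counts <- colSums(is.na(x))
special_counts <- sapply(x, function(col) sum(is.infinite(col) | is.nan(col)))

# One possible treatment: recode Inf as NA, then impute with the median.
x$income[is.infinite(x$income)] <- NA
x$income[is.na(x$income)] <- median(x$income, na.rm = TRUE)

# Scan II: flag outliers using Tukey's 1.5 * IQR fences.
q     <- quantile(x$income, probs = c(0.25, 0.75))
iqr   <- q[[2]] - q[[1]]
lower <- q[[1]] - 1.5 * iqr
upper <- q[[2]] + 1.5 * iqr
is_outlier <- x$income < lower | x$income > upper

# One possible treatment: cap values at the fences.
x$income_capped <- pmin(pmax(x$income, lower), upper)

# Transform: a log transformation to reduce right skew.
x$income_log <- log(x$income)

print(missing_counts)
print(special_counts)
print(x[is_outlier, ])
```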

Submission Format

  • Upload the report as one single file (PDF) via the assessment 2 page in CANVAS.
  • The easiest way to produce a PDF file from the R Markdown is to Run all R chunks, then Preview your notebook in HTML (by clicking Preview) → Open in Browser (Chrome) → Right-click on the report in Chrome → Click Print and Select the Destination Option to Save as PDF.
  • After creating your PDF file, check that your code and outputs are visible.

Referencing guidelines

You must acknowledge all the sources of information you have used in your assessments. Refer to the RMIT Easy Cite Referencing Tool to see examples and tips on how to reference in the appropriate style. You can also refer to the Library Referencing Page for more tools such as EndNote, referencing tutorials and referencing guides for printing. Use the RMIT Harvard referencing method for this assessment.

Collaboration

You are permitted to discuss and collaborate on the assessment with other groups. However, the write-up of the report must be your own group's work. Assignments will be submitted through Turnitin, so if you’ve copied from other groups, it will be detected. It is your responsibility to ensure you do not copy another group's work or allow another group to copy yours. If plagiarism is detected, both groups will be responsible. It is good practice to never share assessment files with others. You should ensure you understand your responsibilities by reading the RMIT University website on Academic Integrity. Ignorance is no excuse.

Academic integrity and plagiarism

Academic integrity is about the honest presentation of your academic work. It means acknowledging the work of others while developing your own insights, knowledge, and ideas. You should take extreme care that you have:

  • Acknowledged words, data, diagrams, models, frameworks, and/or ideas of others you have quoted (i.e., directly copied), summarised, paraphrased, discussed, or mentioned in your assessment through the appropriate referencing methods.
  • Provided a reference list of the publication details so your reader can locate the source if necessary. This includes material taken from internet sites.

If you do not acknowledge the sources of your material, you may be accused of plagiarism because you have passed off the work and ideas of another person, without appropriate referencing, as if they were your own.

RMIT University treats plagiarism as a very serious offense constituting misconduct. Plagiarism covers a variety of inappropriate behaviours, including:

  • Failure to properly document a source.
  • Copyright material from the internet or databases.
  • Collusion between students.

For further information on our policies and procedures, please refer to the University Website.

Assessment declaration

When you submit work electronically, you agree to the Assessment Declaration.

Extensions and special consideration

This course follows the RMIT University Assessment policy for extensions and special consideration. Information is available Here. Ensure you understand these guidelines before applying.

Extensions will only be granted in accordance with the RMIT University Extension and Special Consideration Policy. No exceptions. Assignments submitted late will be penalised (see below for further details).

Late submission of assessment

Late submissions, without an approved extension or special consideration, will incur a penalty of 10% of the total mark per day for up to 5 days late (so the maximum late penalty is 50%). Submissions more than 5 days late are not accepted.

Penalty for exceeding maximum number of 25 Pages

A penalty of 5% of the total mark will be applied for each extra page.
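
As a quick check of the penalty arithmetic described above, the two rules can be sketched in R. The function names `late_penalty()` and `page_penalty()` are hypothetical, and the sketch assumes that penalties are expressed as percentage points of the total mark and that lateness is counted in whole days; confirm the exact interpretation with the course coordinator.

```r
# Late penalty in percentage points: 10 per day, up to 5 days.
# Returns NA when the submission is more than 5 days late (not accepted).
late_penalty <- function(days_late) {
  if (days_late <= 0) return(0)
  if (days_late > 5) return(NA_real_)
  10 * days_late
}

# Page penalty in percentage points: 5 per page beyond the limit.
page_penalty <- function(pages, limit = 25) {
  5 * max(0, pages - limit)
}

late_penalty(3)    # three days late
page_penalty(27)   # two pages over the limit
```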

Assessment 2 marking rubric

Criteria: To meet all requirements and get full marks, you must complete the following criteria for each part. The marks for each part are given in brackets.

  • Executive Summary (5)
    • A complete summary of the data preprocessing tasks was provided.
  • Data (10)
    • Complete and clear description of data sets, their sources, and variable descriptions were provided.
    • Data met the minimum requirement #1.
    • Merging data was correct (you may want to merge your data sets after tidying data set(s)).
    • R codes with outputs (head of data) were provided.
    • Brief explanations of steps were given.
  • Understand (15)
    • Complete inspection of data structure and variable types was done.
    • Attributes were checked and proper data type conversions were applied.
    • Inspection met the minimum requirements #2-4.
    • R codes with outputs were provided.
    • Brief explanations of steps were given.
  • Tidy & Manipulate 1 (15)
    • Able to reflect on the tidy data principles.
    • Clear explanation was provided.
    • Complete set of tasks was provided to tidy and manipulate the data properly (you may want to tidy your data set(s) before merging them).
    • R codes with outputs were provided.
    • Brief explanations of steps were given.
  • Tidy & Manipulate 2 (5)
    • Able to create/mutate at least one variable from the existing variables, fulfilling the minimum requirement #6.
    • R codes with outputs were provided.
    • Brief explanations of steps were given.
  • Scan I (20)
    • Complete set of tasks was provided to scan the data for missing values, special values and obvious errors (minimum requirement #7).
    • Safe and suitable methodology was followed to scan and deal with missing values, special values and obvious errors.
    • Methodology taken was explained thoroughly.
    • R codes were provided.
    • Results and outputs were presented clearly.
  • Scan II (20)
    • Complete set of tasks was provided to scan the data for outliers.
    • Safe and suitable methodology was followed to scan and deal with outliers.
    • Methodology taken was explained thoroughly.
    • R codes were provided.
    • Results and outputs were presented clearly.
  • Transform (5)
    • Complete set of tasks was provided to apply the transformation properly, fulfilling requirement #9.
    • R codes with outputs were provided.
    • Brief explanations of steps were given.
  • Succinct (5)
    • The report was written succinctly and clearly.