MIS772 – Predictive Analytics
Trimester 2, 2022
Assessment 2 (Individual) – Data Analysis and Report
DUE DATE AND TIME: 16 September 2022, 8:00 PM AEST
PERCENTAGE OF FINAL GRADE: 30%
WORD LIMIT: Equivalent to 3000 words
The word count is only an indication of the workload; no word limit applies to this assignment. Instead, the page limits set in the provided assignment template apply.
Learning Outcome Details
|Unit Learning Outcome (ULO)|Graduate Learning Outcome (GLO)|
|ULO2: Understand and apply predictive analytics techniques in real-world situations. ULO3: Apply an integrated understanding of current techniques and trends in predictive analytics to the business environment.|GLO1: Discipline-specific knowledge and capabilities; GLO3: Digital literacy|
Students who submit their work by the due date will receive their marks and feedback on CloudDeakin 3 weeks after the due date.
No extensions will be considered unless a written request is submitted and negotiated with the Unit Chair before the due date and time. The extension request form must be completed via CloudDeakin – Assessment – Extension Request (https://www.deakin.edu.au/students/studying/assessment-and-results/assignments) and must be accompanied by appropriate documentary evidence. Submissions made after the due date and time without an approved extension will be considered late.
Extensions are only granted in extreme circumstances, such as ongoing health, personal hardship or work-related problems. Temporary illnesses, normal work pressures, multiple assignments due at the same time, failure to keep backups, technology failure, etc., are not reasons for an extension. Extension requests made after the assignment due date should be submitted via Student Connect, following Deakin procedures (https://www.deakin.edu.au/students/studying/assessment-and-results/special-consideration).
This assignment aims to help students learn how to:
- Articulate problems and solutions in business terms
- Gain insights from text data
- Prepare data for different models
- Develop estimation and clustering models
- Assess and report model performance.
Case Study Description
AirbnbAI has approached you to develop one or more RapidMiner processes capable of analysing and predicting customer feedback about stays in Hong Kong Airbnb rental properties. AirbnbAI has provided you with a sample dataset of approximately 6,000 rental listings and 104,000 associated customer reviews. This sample dataset can be downloaded from the unit website. (Original data source: sample data extracted from http://insideairbnb.com/get-the-data.html)
Image Source: https://www.rabobank.com/en/locate-us/asia-pacific/hong-kong.html
The provided dataset (MIS772 A2 Data.zip) has been partially cleaned up and includes a variety of numerical, nominal and text attributes, and descriptions of these attributes.
AirbnbAI would like you to use RapidMiner to address the following questions:
- A. Is there a significant correlation between the sentiment (positive vs negative) of customer reviews of a property and its review score ratings?
- B. Can the review score ratings of properties be predicted (estimated) based on relevant attributes?
- C. What are the most meaningful segments that exist among the rental properties?
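Question A asks whether review sentiment and review score ratings are significantly correlated. The assignment itself must be done in RapidMiner; the Python sketch below only illustrates the underlying calculation. It assumes sentiment has already been aggregated to one number per property (for example, the mean sentiment of that property's reviews) so that it can be paired with the property's rating. The function names `pearson_r` and `t_statistic` are illustrative, not part of any RapidMiner operator.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations (n times the covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (n times each variance).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def t_statistic(r, n):
    """t statistic for H0: rho = 0, compared against a t distribution with n - 2 df."""
    if abs(r) >= 1:
        raise ValueError("|r| must be < 1 for a finite t statistic")
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# Hypothetical toy values for illustration only (not taken from the dataset).
mean_sentiment = [3, -1, 5, 0, 2]
review_rating = [90, 60, 98, 75, 85]
r = pearson_r(mean_sentiment, review_rating)
print(round(r, 3), round(t_statistic(r, len(review_rating)), 3))
```

A large |t| relative to the critical value for n - 2 degrees of freedom suggests the correlation is unlikely to be zero; RapidMiner's correlation output reports the coefficient itself.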
AirbnbAI wants you to use RapidMiner to process and explore the provided data, conduct text mining and sentiment analysis, and develop, evaluate and optimise linear regression and cluster analysis models.
Task and Deliverables:
- Executive Summary: Define the problem and your solution in business terms; in doing so, answer Questions A, B and C, cross-referencing other report sections for support.
- Data Exploration, Pattern Discovery, and Preparation: Deal with any duplicates, bad and missing values, and anomalies. Transform selected attributes or create new ones as needed.
Apply text mining techniques and simple sentiment analysis (i.e. count positive and negative words) to the review comments, as illustrated in the Week 4 lectures/seminars; refer also to the partial example process provided (Question A).
Identify appropriate attributes to predict the review score ratings of properties (Question B).
Investigate groups of rental properties and select appropriate attributes to distinguish different clusters. Visualise the clusters (Question C).
- Modelling: Develop a linear regression model to predict the review score rating of properties by selecting appropriate predictive attributes. Test the linear model and investigate results (Question B).
Develop a cluster model to reveal the most meaningful segments (Question C).
- Evaluation and Optimisation: Evaluate and optimise the performance of all models. Report metrics for the best-performing models (Questions B, C).
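The "simple sentiment analysis" above scores each review as the number of positive words minus the number of negative words. In the assignment this is built in RapidMiner; the Python sketch below only shows the arithmetic. The two word lists are hypothetical placeholders (the Week 4 materials may use a different, much larger lexicon), and the `(listing_id, comment)` input layout is an assumption made for illustration.

```python
import re

# Hypothetical mini-lexicons for illustration only; not the unit's word lists.
POSITIVE_WORDS = {"great", "clean", "friendly", "comfortable", "recommend"}
NEGATIVE_WORDS = {"dirty", "noisy", "rude", "small", "broken"}


def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def sentiment_score(text, positive=POSITIVE_WORDS, negative=NEGATIVE_WORDS):
    """Return (#positive words - #negative words) for one review comment."""
    tokens = tokenize(text)
    pos = sum(1 for t in tokens if t in positive)
    neg = sum(1 for t in tokens if t in negative)
    return pos - neg


def mean_sentiment_by_listing(reviews):
    """Average review sentiment per listing.

    reviews: iterable of (listing_id, comment) pairs (hypothetical layout).
    """
    totals, counts = {}, {}
    for listing_id, comment in reviews:
        totals[listing_id] = totals.get(listing_id, 0) + sentiment_score(comment)
        counts[listing_id] = counts.get(listing_id, 0) + 1
    return {k: totals[k] / counts[k] for k in totals}


print(sentiment_score("Great location, clean room but a bit noisy"))
print(mean_sentiment_by_listing([(1, "great and clean"), (1, "dirty"), (2, "rude host")]))
```

Aggregating to one value per listing is one way to join review-level sentiment to the listing-level review score rating; other aggregates (e.g. the share of positive reviews) are equally reasonable design choices.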
See CloudDeakin for more information about this assignment, especially the assignment template and the assessment rubric. The assignment must be prepared using the provided assignment template (.docx file). Read the instructions and suggestions in the report template and make sure you understand the assessment rubric. The report must use Arial 10-point font.
Only content within the page limits of the report template will be assessed. Any part that is missing from the report or exceeds its page limit will not be assessed. We will not look in your RapidMiner scripts for anything missing from your report; however, we will check the RapidMiner scripts for consistency with your report and to ensure an authentic effort. Use comments in your RapidMiner process(es) to enable assessors to follow your logic.
A professional analytics report is evidence-based, not speculative! Anything reported that is not substantiated by RapidMiner scripts will not be awarded marks. Create new versions of your processes as you work on them (i.e., include a version number as part of the filename when you save each one). It is your responsibility to make regular backups of your RapidMiner processes on alternative storage media. Failure to do so will not be accepted as a reason for seeking an extension.
Your RapidMiner scripts must be developed so that they can be run independently by assessors. Therefore, do NOT create intermediate data stores or modify the provided dataset outside of RapidMiner.
Submission format: Submit two separate files:
- Your report, according to the template in PDF format.
- Final versions of all your RapidMiner scripts (*.rmp files), compressed into a single ZIP file.
This is an individual assignment. The Deakin policy on Academic Integrity applies.
Name your files firstname_lastname_MIS772A2 (e.g. John_Smith_MIS772A2.pdf and John_Smith_MIS772A2.zip).
You must submit your assignment to the individual Assignment Dropbox in the MIS772 CloudDeakin unit site by the due date. Only assignments received via the CloudDeakin submission box will be marked. Do NOT submit assignments via email. For approved extensions, the deadline is adjusted for the individual student.
- Any work you submit may be checked by electronic or other means for the purposes of detecting collusion and/or plagiarism.
- Feel free to discuss concepts and ideas with peers, but remember that your submission must be your own work. Be careful not to allow others to copy your work. Submissions whose RapidMiner processes are significantly similar (e.g., mostly identical except for a few names) are subject to investigation for potential copying. The authors of such submissions may also be asked to present their work to an academic panel if necessary.
- You must keep a backup copy of every assignment you submit, until the marked assignment has been returned to you. In the unlikely event that one of your assignments is misplaced, you will need to submit your backup copy.
- When you submit an assignment through your CloudDeakin unit site, you will receive an email at your Deakin email address confirming that it has been submitted. After uploading, check that you can see your assignment in the Submissions view of the Assignment dropbox folder, and check for and keep the email receipt of the submission. You are responsible for submitting the correct documents for the correct unit, in the required content and format. If you wish to correct your submission, you may resubmit, subject to any applicable penalties. You will not be able to submit, resubmit or correct your submission after the 5-day lateness period (or your extension deadline).
- Penalties for late submission: The following marking penalties will apply if you submit an assessment task after the due date without an approved extension: 5% will be deducted from available marks for each day up to five days, and work that is submitted more than five days after the due date will not be marked. You will receive 0% for the task. ‘Day’ means working day for paper submissions and calendar day for electronic submissions. The Unit Chair may refuse to accept a late submission where it is unreasonable or impracticable to assess the task after the due date.
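The late-penalty rule above is simple arithmetic: 5% of the available marks is deducted per day late for up to five days, and anything more than five days late receives 0%. The Python sketch below encodes that rule for a quick self-check. Two points are assumptions not stated in the brief: `days_late` is taken as a whole number of calendar days (electronic submission), and the resulting mark is floored at zero. The function name `penalised_mark` is hypothetical.

```python
def penalised_mark(raw_mark, available_marks, days_late):
    """Apply the late penalty described in the brief.

    Assumptions (not stated in the brief): days_late is a whole number of
    calendar days, and the mark cannot fall below zero.
    """
    if days_late < 0:
        raise ValueError("days_late must be non-negative")
    if days_late == 0:
        return raw_mark  # on time: no deduction
    if days_late > 5:
        return 0.0  # more than five days late: not marked, 0%
    # 5% of the *available* marks (not of the raw mark) per day late.
    deduction = 0.05 * available_marks * days_late
    return max(0.0, raw_mark - deduction)


# Example: 80 out of 100 available marks, submitted 2 days late.
print(penalised_mark(80, 100, 2))
```

Because the deduction is taken from the available marks rather than the awarded mark, each day late costs the same number of marks regardless of how well the work scores.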