INMT5526 Business Intelligence
Team Assignment: Predictive Modelling and Digital Assets
The Brief
Organisations are leveraging value from their own operational data, their customer data and secondary data (such as data obtained by web scraping). One application is tools that report household asset values, such as for housing or vehicles. Some of these tools also include a modelling component under the hood, such as regression or other machine learning algorithms. Your team assignment will make use of secondary data (used vehicle sales) from the UK that has been cleaned for modelling and assessment.
Students may self-select into a team within each tutorial group (i.e. only within your tutorial group). Each team will be allocated one of the following data sets:
Audi, BMW, Ford, Mercedes, Volkswagen, Hyundai
The data sets have been customised so that no two are the same, and each team will analyse at least two (2) vehicle models for their manufacturer. Your manufacturer will be the one stated in your team’s group name on LMS. It is suggested that you choose higher volume models with a long series, say more than six (6) years of data.
Each dataset takes the form of a list of observations (rows), each of which corresponds to the sale of an individual vehicle in the UK. The dataset consists of nine (9) variables (columns), whose names form the first row of the dataset and which are described as follows:
- model: the vehicle’s model; think of it as the name of the vehicle, such as Tucson, Fiesta or X3. There will be many vehicles of the same model, but each of these vehicles is built on the same ‘blueprint’ and contains the same (or very similar) features;
- year: this is the year of manufacture of the vehicle, not the year of sale;
- price: this is the price that the vehicle sold for (in UK pounds);
- transmission: this is the type of transmission the vehicle is equipped with, either “Automatic”, “Semi-Auto”(matic) or “Manual”;
- mileage: the odometer reading, this is the number of miles the vehicle has travelled at the time of sale;
- fuelType: this is the type of fuel that the vehicle uses to generate movement, either “Diesel”, “Electric”, “Petrol”, “Hybrid” or “Other”;
- tax: the tax paid on sale of the vehicle, known as ‘stamp duty’ in Australia. Consider that there may be discounts for certain types of vehicles;
- mpg: miles per gallon, this is the distance that can be travelled by the vehicle with one gallon of fuel (or equivalent);
- engineSize: the displacement of the internal combustion engine in litres, or “0” if not present.
You are welcome to remove erroneous data points from the data set – that is, you are allowed to clean the data – but consider first whether the data is erroneous based upon the above description of the data, as in some cases data that appears erroneous may not be so based on the value of other variables (such as for electric cars). If you do so, ensure you mention this in the report.
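As a minimal sketch of "check before you delete", the Python snippet below flags rows for review rather than removing them. It assumes the data has been exported to CSV with the nine column names described above; the same checks could equally be written in R or done with Excel filters. The sample rows, the `LATEST_YEAR` cut-off and the function name `flag_suspect_rows` are hypothetical and only illustrate the idea.

```python
import csv
import io

# Hypothetical sample rows using the nine column names described above;
# the values are invented purely for illustration.
SAMPLE_CSV = """model,year,price,transmission,mileage,fuelType,tax,mpg,engineSize
Fiesta,2017,9500,Manual,25000,Petrol,145,57.7,1.0
Fiesta,2060,9000,Manual,30000,Petrol,145,57.7,1.0
Focus,2018,12000,Automatic,18000,Diesel,145,67.3,0
Kuga,2019,21000,Automatic,9000,Electric,0,470.8,0
"""

LATEST_YEAR = 2020  # assumed cut-off; set it to the latest sale year in your data


def flag_suspect_rows(rows, latest_year=LATEST_YEAR):
    """Return (row_number, reason) pairs for rows that deserve a closer look."""
    flagged = []
    for number, row in enumerate(rows, start=1):
        if int(row["year"]) > latest_year:
            flagged.append((number, "year of manufacture is after the latest sale year"))
        # An engine size of 0 is valid for electric vehicles, so only flag it
        # when the fuel type says an internal combustion engine is present.
        if float(row["engineSize"]) == 0 and row["fuelType"] in ("Petrol", "Diesel"):
            flagged.append((number, "engineSize is 0 but fuelType is Petrol/Diesel"))
    return flagged


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for number, reason in flag_suspect_rows(rows):
    print(f"row {number}: {reason}")
```

In this sample the electric vehicle with an engine size of 0 is not flagged, because that value is consistent with the variable descriptions. Any rows you do decide to remove should still be listed and justified in the report.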
The Task
Your Team Assignment is to summarise and analyse an existing data set and prepare an analytical report aimed at assessing, selecting and applying a predictive model. The modelling context is limited to supervised learning, and specifically to models that estimate the price of a used vehicle in the UK market.
The new learning in this assessment is to apply your knowledge of, and experience with, machine learning techniques to the data: namely, decision trees (including variants such as random forests and bagging) and artificial neural networks (such as MLF networks).
For each vehicle model, you should apply one technique from each of the two types named above (decision trees and artificial neural networks). Whether this is done using R scripting or Excel workbooks is up to your group to decide (you can do one technique in R and one technique in Excel if you wish). You are welcome and encouraged to ‘tune’ your models by adjusting the parameters after model runs.
Marking Guidelines
The page and word counts below are guidelines only; they are there to give you an idea of the size of the report and are not ‘hard limits’. Consider that some sections may have screenshots, graphs and tables and others will not.
A full marking rubric setting out expectations will be made available on the submission page. The table below explains each component of the assessment.
Component | Marks | Size |
Executive Summary: You need to be able to summarise complex analysis “on a page”. We encourage teams to provide the summary as a Power BI dashboard or infographic which describes the data set and summarises your modelling and recommendations / conclusions. | 35% | Single page (can be infographic or Power BI dashboard) |
Analytical Report | 65% | 6 – 8 pages |
1. Introduction: what is the business/policy context (as you see it; see footnote 1) of the analysis? What is the purpose of your models? (Consider aims and problem statement.) | | ½ page (125 words) |
2. Summary of the data set: accompanying your executive summary, provide a short summary of the independent and dependent variables (also describe your dashboard if you made one). | | 2 pages (500 words) |
3. Model Selection: demonstrate the use of training, validating and testing your machine learning models. Provide your model selection criteria and recommend a machine learning model for prediction. | | 2 pages (500 words) |
4. Prediction: demonstrate how your final product works by showing some predictions. It is up to you to determine what test data is reasonable as well as the number of predictions. | | 1 page (250 words) |
5. Recommendations and conclusions: briefly outline the implications of your modelling and predictions (consider your problem statement and the questions posed). | | ½ – 1 page (250 – 500 words) |
6. References: either APA or Harvard style (one or the other). | | |
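Component 3 in the table above asks for training, validation and testing. The following is a minimal Python sketch of that workflow, not a required method: it splits the data three ways and scores a naive baseline with two common error measures. The 60/20/20 split, the seed, the invented prices and the function names `split_rows`, `rmse` and `mae` are all assumptions made for illustration; the same steps can be carried out in R or Excel.

```python
import math
import random


def split_rows(rows, train=0.6, validation=0.2, seed=42):
    """Shuffle rows and split them into training, validation and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * validation)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


def rmse(actual, predicted):
    """Root mean squared error, in the same units as price (pounds)."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error, in the same units as price (pounds)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Invented prices, for illustration only.
prices = [9500, 12000, 21000, 8000, 15500, 11000, 17250, 9900, 13400, 19800]
train_set, validation_set, test_set = split_rows(prices)

# Baseline "model": always predict the mean training price.
baseline = sum(train_set) / len(train_set)
predictions = [baseline] * len(validation_set)
print(f"validation RMSE: {rmse(validation_set, predictions):.0f}")
print(f"validation MAE:  {mae(validation_set, predictions):.0f}")
```

One reasonable design is to compare candidate models (and tuned parameter settings) on the validation set only, and to report the test set error once for the model you finally recommend. A baseline like the one above gives a reference point that any decision tree or neural network should beat.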
Submission Details
The assignment should be submitted by the date and time listed on the Unit Outline (details of which will also be provided on LMS) using the submission point on LMS. Please submit the executive summary (.pbix file from Power BI Desktop, .pdf for infographic) alongside a word-processed document for the analytical report. Your working files as either R script files (.r) or Excel workbooks (.xlsx) should also be provided as part of the submission for each of the machine learning models.
If the date and time of submission (or other requirements) are to change, an announcement will be made on LMS by one of the teaching staff. The standard provisions of the UWA Business School assessment guidelines apply; for more details, please see the Unit Outline on the web using the link provided on LMS. Please contact the teaching staff for any other clarifications not listed here.
1 The context is predicting the cost of second-hand vehicles in the UK market. However, there is scope for some originality when detailing the purpose of your analysis and how the model may support decision making or form the basis of a new enterprise. Part of this assignment is to think of something exciting (or important) that the results of your models will support.