Digital Analytics – Individual Project + Report
Carefully read this entire document at the beginning of the semester
Objective
The goal of the individual project is to combine and implement all the research and analytical skills you acquired throughout the five training modules (1-5). You are tasked with completing an entire project from hypothesis to results, reporting on the process and the outcomes in a 1,500 to 2,000 word research paper. Although you get quite some help with the data collection, the processing, analysis, and visualisation are up to you. You will have to assemble Python code to accomplish this.
Independence is key. Although we will extensively train all the required steps in the learning materials and tutorials, it is up to you to knit everything together, make your own logical decisions and explain them. That is the whole point of this project. When you work as a professional analyst, you will only get an overview of a problem and maybe some requirements. It is your responsibility to deliver, to make decisions and to come up with solutions. That said, if you read through this document, you will find numerous tips and tricks to get you started and guide you through the process.
You will inevitably get stuck a few times. Do not let it discourage you. That is just the way it goes with any project. Plan ahead, sensibly combine the skills that you acquired throughout the course, and critically weigh the benefits and limitations. There is not one single approach. There will be plenty of opportunities to interact with the teaching staff during the semester and discuss your progression. The last weeks of the semester, prior to the deadline, are entirely allocated to Q&A about this final project. However, try to start as early as possible.
The problem
It is often said that popular music increasingly homogenises. It all ‘sounds the same’ in favour of synthetic dance music with the same sound colour, rhythm, and tempo. Music used to be much more diverse. Ok, boomer.
In your project, you will contextualise this assumption in literature and empirically put it to the test by comparing the top hits of the past 50 years for their key audio features (1970-2020). You need to mould this research problem into a testable hypothesis. A hypothesis is a specific, concise, testable, and refutable statement.
Format
The final product of your research project is a Blackboard assignment submission (each file separately, not in a consolidated archive such as a .zip or .rar) that contains:
All Python files
These are the Python files you used in your project to collect, process, analyse, and visualise data. Some of these Python scripts are already provided by us on Blackboard. You can just use them (provided that you understand what is going on) and, as you will read later, potentially improve them.
The filenames should contain numbers that make clear in what sequence they should be run to replicate your project (e.g., 01_datacollection.py, 02_datacollection.py, 03_dataprocessing.py, 04_dataanalysis.py, 05_datavisualisation.py). There is no expectation of a set number of files, but they should cover your entire project and be chronologically labelled with numbers (i.e., 01_, 02_, 03_, etc.).
There is no need to submit any image files and/or data files. Running your Python code should allow us to (re)produce these.
A written report
- Introduction. This report starts with a concise introduction of the research problem (about 300 words), which leads to the formulation of a testable hypothesis. It is required to support this introduction with academic references (APA-style).
- Method section. In this section you explain the steps you took, and you explain the decisions that have been made while collecting and processing the data prior to analysis. You explain where you got what data (as if you collected all data yourself), why you need the data that you are collecting, how they are combined and cleaned, and what they look like (i.e., sample size, procedure to clean the data). Any kind of measure (i.e., variable) needs to be explained: what it is, what it represents. Do not assume that your reader knows anything about the project, nor that they will have the courage to sift through any of your Python code. It is up to you to explain the procedure as clearly and methodically as possible. There is an example of a method section in the Module 5a learning materials.
Important:
- The method section does not include any Python code. None at all. You write about what you did in an overviewing narrative. You should enable your reader to understand the steps you took, but you do not want to overwhelm them with minute detail (the code is in the appendix anyway).
- Do not use abstract variable names in your method section. Name or explain your variables in plain English so everyone can understand what they mean. Again, your text should be accessible, even for a slightly less motivated reader. If it becomes hard work to read your report, your reader is likely to tune out.
- Results section. Explain the steps taken in your data analysis and sketch out the results. What did you do with the data in terms of analysis (i.e., what (statistical) procedure?), and what are the results? You need to explain what technique you use to establish what insight. Sensibly combine textual description, tables, and/or data visualisations. Explain what the data tell you, but do not yet start discussing what the results exactly mean. There is an example of a results section in the Module 5b learning materials.
Important:
- Carefully format your tables and visualisations. Do not screenshot outputs from the console. That just looks horrible and reflects poorly on your work, not doing it any justice. Format your own ‘clean and lean’ tables whenever needed.
- Avoid (excessive) redundancy. There is no need to include the same results in textual AND tabular AND visual form. That is overkill and just confuses your reader. It makes you look indecisive. Make decisions on what presentation form communicates your results the clearest. A good approach is to visualise and tabulate results and describe the key findings in the results text.
- Discussion section. You interpret the results in light of the hypothesis and emphasize what this means in light of the research problem. What did you learn from the data? What does it potentially mean for the literature in the introduction? Make sure you also critically assess the strengths and weaknesses of the method you used. Nothing is perfect and choices have consequences. What are they in your case? Think about it. At the end of the Scraping and API modules, you’ll find some critical remarks that will set you on the right path.
- Appendix section with a chronological overview of your code files. You do not only have to submit your .py files, you also need to copy/paste your code into the appendix section of your paper in chronological order (first script first, last script last).
Please consider these formatting requirements:
- Use Times New Roman 11pt and 1.15 spacing.
- Title page contains the title of your project, your name, your student number. Do not include any visuals and/or logos.
- Write at least 1,500 and maximum 2,000 words (references, tables, figures are excluded from the word count). This might not seem a lot, but it is plenty for what you need to write, and it will keep you focused. Make every word count.
- Tables and figures are included in the text and should be captioned.
Most important is that your lay-out is tidy and looks like a professional research report. Avoid frivolities, keep everything nice and clean.
Assessment criteria
This project makes up 51% of your final grade. Since we have 7 graded exercises of equal weight (7%), that should roughly make up half of the grade, the remaining 1% needed to go somewhere. That explains the somewhat ‘strange’ number.
The marking rubric:
Criterion | Description | Weight | 0: Work is of poor quality and has multiple fundamental shortcomings | 1: Work is of sufficient quality, it has no fundamental shortcomings but there are multiple minor shortcomings | 2: Work is of more than sufficient quality, with only rare minor shortcomings | 3: Work is of outstanding quality, nearing perfection: there are no shortcomings whatsoever |
Improve provided code and data quality | You are able to significantly improve the provided programming code that collects the data in such a way that the quality of the data collection significantly improves (0 in case of no attempt beyond the provided code/data files) | 10% | 0 | 1 | 2 | 3 |
Introduction: contextualising research questions | A clear identifiable, sound research question or a testable hypothesis that is contextualised in literature: why is it relevant to research, how does it fit in with literature? | 10% | 0 | 1 | 2 | 3 |
Method: transparency in procedure | A clear narrative that aptly explains the choices that have been made and the steps that have been taken in collecting and processing data. It describes the process and the data (i.e., its dimensions and distributions) | 20% | 0 | 1 | 2 | 3 |
Processing and results: efficacy of data processing, analysis and transparency in reporting results (textually and visually) | Is the code to process and analyse the data sound? Is your procedure and its outcomes clearly communicated through text, tables and/or visuals? | 20% | 0 | 1 | 2 | 3 |
Discussion: quality of discussion | Are the results correctly interpreted and actively linked with the introduction? | 15% | 0 | 1 | 2 | 3 |
Discussion: quality of applied method | Is there a thorough discussion of strengths and weaknesses of the method? | 15% | 0 | 1 | 2 | 3 |
Formatting and language | Is the text carefully formatted (including figures and tables)? | 10% | 0 | 1 | 2 | 3 |
The first rubric category needs a bit of explanation. On Blackboard, you will find ready-made Python code that collects the data you need. These data are provided as well (see item on Final Project in the Learning Resources). This code (and the data it produces) is a decent attempt and allows you to test the hypothesis. You have two options:
Option 1: You incorporate the Python code and use the data as provided on Blackboard. This means you can immediately proceed to the data processing and analysis. However, the code is far from perfect and there are ways to improve the data collection from the Spotify API. If you choose not to attempt to improve the code, that is perfectly fine. It does come with the consequence that you will not get a mark for the rubric ‘Improve code and data quality’, which represents 10% of the paper score (so 5.1% of the total grade).
Option 2: You do improve the code and enhance the data quality. Thumbs up for you! Depending on how successful you are, you will get marks for the first rubric as a reward for taking on the challenge. Continue reading to get more information on what can be improved in the ‘tips and tricks’ section below.
Timing
The deadline of this assessment is set on 3 June 2022 at 4pm Brisbane time.
You are encouraged to start early on the project. You can take a head start by writing the introduction. Upon completion of each module, you can advance in your project as well (see tips and tricks for concrete pointers on what module you need for what step). When we finish Module 3, you can figure out the scraping code that is available on Blackboard. After Module 4, you can do the same for the subsequent API interactions (and hopefully improve that code). There will be ample time after we have concluded Module 5 for you to do the analysis and the write up of your process/results/discussion.
The last three weeks of the course are fully dedicated to the project. The regular tutorial slots in the last three weeks are fully reserved for Q&As on the project. Questions on any part of the project are welcomed during these tutorial slots.
Tips and tricks
To get you started, this document gives some valuable pointers for the different steps of the research process.
Data collection
- You need data on what music was the most popular in the past 50 years (1970-2019, so up to but not including 2020). What better data source than the official charts? We can use https://www.billboard.com/ to get data. The good news is that you do not have to do this yourself. We provide code that scrapes the top 20 of the first week of April, August, and December for every year (1970-2019) (e.g., https://www.billboard.com/charts/hot-100/1970-01-01/). The file is named 01-scraping.py. We also provide the resulting datafile, which is named billboard.csv, so there is no need to run the code, although you’re more than welcome to do so. You will be expected to carefully inspect the code and fully understand what it does to complete the methods section. We will spend some time going through it during the second tutorial on Web Scraping (Module 3). In the methods section you are expected to explain that code as part of your project, as if you have written it yourself. The code and data are available on Blackboard (see Final Project item in the Learning Resources).
- Having that dataset with the most popular hits of the past five decades, you need to get the audio features of those songs in the charts to assess whether those have changed over time, which is what we’re assuming. The Spotify API provides a brilliant resource with its Audio Features Endpoint: https://developer.spotify.com/console/get-audio-features-track/. You are not required to use all of them: make a sensible selection based on reason and literature. Since you are using an API, you will find the appropriate resources in Module 4. You will notice in the documentation that the Audio Features Endpoint requires a Track ID as an argument for each call. This implies that before we can get audio feature information, we need to get a Track ID. This ID can be searched for and obtained through the Search Endpoint (we will extensively train Spotify API interactions in the module and tutorials on APIs). Again, you do not need to worry: the code to search for Track IDs (02-gettrackid.py) and audio features (03-getaudiofeatures.py) is provided on Blackboard, as well as the data that is gathered by running it (spotify_trackid.csv and spotify_audiofeatures.csv). However, that code is far from perfect: you will notice that there are a lot of missing values. That is, artist and song title combinations that didn’t return a matching Track ID and hence can’t return any audio features either. If you successfully experiment with the format of the query, you can improve this, which will get you marks on the first criterion of the Final Project marking rubric.
- Note that Spotify is sensitive to excessive request rates, so I would advise pausing your script in between requests. After all, you are about to perform over 2,000 requests per run. Time-outs of 0.10 to 0.25 seconds (one tenth to a quarter of a second) usually do the trick. I tested this, varying the intervals, and Spotify seemed OK with it. Just check that you are getting [200] response codes, and not [429] responses (the HTTP status for too many requests). When you’re running the API requests in Replit, make sure the browser window stays foregrounded, so it doesn’t time out. This might take about 20-25 minutes.
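The two tips above (experimenting with the query format and pausing between requests) could be sketched roughly as follows. This is a minimal illustration, not the code provided on Blackboard: the function names, the cleaning rules in simplify_query, and the fetch argument (a stand-in for whatever function performs one API call and returns a status code plus payload) are all assumptions made for the example.

```python
import re
import time


def simplify_query(artist, title):
    """Build a looser search string (illustrative cleaning rules only)."""
    # Keep only the lead artist: drop "Featuring ..." / "feat. ..." suffixes.
    lead_artist = re.split(
        r"\s+(?:featuring|feat\.?|ft\.?)\s+", artist, flags=re.IGNORECASE
    )[0]
    # Drop parenthesised additions such as "(Remix)" from the title.
    plain_title = re.sub(r"\s*\([^)]*\)", "", title)
    return f"{plain_title.strip()} {lead_artist.strip()}"


def fetch_with_pauses(queries, fetch, pause=0.2, max_retries=3):
    """Call fetch(query) for each query, pausing between requests.

    `fetch` is a hypothetical stand-in for one API call and must return
    a (status_code, payload) tuple. Status 429 means "too many requests".
    """
    results = {}
    for query in queries:
        results[query] = None  # stays None if every attempt fails
        for attempt in range(max_retries):
            status, payload = fetch(query)
            if status == 200:
                results[query] = payload
                break
            if status == 429:
                # Rate limited: wait longer on each retry before trying again.
                time.sleep(pause * 2 ** (attempt + 1))
            else:
                break  # any other error: give up on this query
        time.sleep(pause)  # polite pause before the next request
    return results
```

Queries that still come back as None after the retries are the ones to inspect by hand: they show which artist and title formats the search does not handle yet.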
Data processing
- You need to thoroughly explore, inspect, and clean the data. Module 5a will teach you how to do that. Be as specific as you can (see the example of a method section at the end of the module).
- You will need to aggregate data per year. Data aggregation is explained and trained in Module 5b (and its tutorial). Eventually you need a dataset with one aggregated observation per relevant audio feature per year. You might want to describe central tendency for a measure such as bpm (mean and/or median). That would allow you to sketch how music has generally changed through the years. It does not tell us much (or anything) about the diversity, the homogeneity or heterogeneity. That is why, to test the hypothesis, you definitely also need the dispersion (e.g., standard deviation), as we are assuming that the variability decreases over time (i.e., an indication of homogenisation, which is what it is all about). In fact, that dispersion is the key focus. The previous sentence is set in red, so it must be REALLY important!
Do you need a refresher on dispersion? See https://www.statisticshowto.com/dispersion/
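The yearly aggregation described above could look roughly like the sketch below. It uses only the Python standard library to stay self-contained; the course modules may well use a different toolkit. The row layout (one dictionary per track with a "year" key and a "tempo" key) is an assumption for the example, not the actual structure of the provided CSV files.

```python
from collections import defaultdict
from statistics import mean, median, stdev


def aggregate_per_year(rows, feature):
    """Return {year: {"n", "mean", "median", "sd"}} for one audio feature."""
    by_year = defaultdict(list)
    for row in rows:
        value = row.get(feature)
        if value is None:  # skip tracks without audio features
            continue
        by_year[row["year"]].append(float(value))

    summary = {}
    for year, values in sorted(by_year.items()):
        summary[year] = {
            "n": len(values),
            "mean": mean(values),      # central tendency
            "median": median(values),  # central tendency, robust to outliers
            # Dispersion: the sample SD needs at least two observations.
            "sd": stdev(values) if len(values) > 1 else float("nan"),
        }
    return summary


# Made-up rows, purely to show the shape of the input and output.
example_rows = [
    {"year": 1970, "tempo": 100}, {"year": 1970, "tempo": 120},
    {"year": 1970, "tempo": 140}, {"year": 2019, "tempo": 118},
    {"year": 2019, "tempo": 120}, {"year": 2019, "tempo": 122},
    {"year": 2019, "tempo": None},
]
example_summary = aggregate_per_year(example_rows, "tempo")
```

In this made-up example both years have the same mean tempo, yet the standard deviation differs by a factor of ten, which is exactly why central tendency alone cannot test a homogenisation hypothesis.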
Data visualisation and analysis
- You are working with longitudinal data. A time series is likely the most insightful way to visually describe the trends over time in your data. Again, we trained this in Module 5b.
- Looks can deceive, so definitely include formal statistical tests to consider the relations between time and aggregated audio features per year. Module 5 contains the necessary resources.
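As one possible illustration of a formal test, the sketch below correlates year with a yearly aggregate (for instance the yearly standard deviation of a feature) and estimates a p-value with a permutation test. The permutation test is an assumption chosen here because it needs nothing beyond the standard library; Module 5 may teach a different procedure, which you should then prefer. The function names and the seed are likewise illustrative.

```python
import random
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def permutation_test(years, values, n_perm=2000, seed=0):
    """Two-sided permutation test for the correlation of values with time.

    Shuffling the values breaks any link with the years, so the share of
    shuffles with a correlation at least as strong as the observed one
    estimates how likely such a trend is under "no relation with time".
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = pearson_r(years, values)
    shuffled = list(values)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(years, shuffled)) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    p_value = (extreme + 1) / (n_perm + 1)
    return observed, p_value
```

A negative correlation between year and yearly dispersion with a small p-value would be consistent with homogenisation; a correlation near zero would not support it.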
Good luck!