CIS 550 Course Project (10%):
- Write a Python 3 program called apriori_xxxxxxx.py (where xxxxxxx is your CSU ID) that implements the provided Apriori algorithm. Make sure that it produces the correct results on the three provided datasets (1000-out1.csv, 5000-out1.csv, 20000-out1.csv).
- Please implement the Apriori algorithm following the structure of Figure 6.4 in the provided file apriori_algo.pdf. For example, your source code should contain the functions apriori_gen, has_infrequent_subset, and find_frequent_1_itemsets, among others. By the Apriori property (“All nonempty subsets of a frequent itemset must also be frequent.”), every nonempty subset of a reported frequent itemset is already known to be frequent, so your output should not list any itemset that is a subset of another frequent itemset.
- Please use “if __name__ == '__main__':” in your code and avoid global variables (read sections 2.5 and 3.17 of the Google Python Style Guide, https://google.github.io/styleguide/pyguide.html).
- The input data file name and the minimum support number should be given as command-line arguments, e.g., “-i 1000-out1.csv -m 20”, as illustrated in the sample result.
- Your program should print the input file name, the minimum support number, the running results, and the total number of items in the results, as illustrated in the sample result.
- Submit your Python source code on grail using the “turnin” command with the “-p finalproj” option before 2:30 PM on Friday, August 5. The turnin command is as follows:
turnin -c cis550x -p finalproj apriori_xxxxxxx.py
where xxxxxxx is your CSU ID.
- A sample result of the Python program running on your local machine looks as follows:
- You can check your results by visiting https://apriori-davidxiong.appspot.com/ .
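
The requirements above can be sketched roughly as follows. This is a minimal outline under stated assumptions, not the reference solution: it assumes each CSV row is a transaction ID followed by item IDs (check the provided datasets), it uses a simplified set-union join in apriori_gen rather than the sorted-prefix join of Figure 6.4, and the helpers load_transactions and drop_subsets as well as the print layout are hypothetical placeholders, since the sample result defines the actual output format.

```python
import argparse
import csv
from itertools import combinations


def load_transactions(path):
    """Read transactions. ASSUMPTION: each row is an ID followed by item IDs."""
    with open(path, newline="") as f:
        return [frozenset(cell.strip() for cell in row[1:] if cell.strip())
                for row in csv.reader(f) if row]


def find_frequent_1_itemsets(transactions, min_sup):
    """Count single items and keep those meeting the minimum support."""
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    return {c: n for c, n in counts.items() if n >= min_sup}


def has_infrequent_subset(candidate, prev_frequent):
    """Return True if any (k-1)-subset of the candidate is not frequent."""
    k = len(candidate)
    return any(frozenset(s) not in prev_frequent
               for s in combinations(candidate, k - 1))


def apriori_gen(prev_frequent):
    """Join frequent (k-1)-itemsets into k-candidates, then prune."""
    prev = list(prev_frequent)
    candidates = set()
    for i in range(len(prev)):
        for j in range(i + 1, len(prev)):
            union = prev[i] | prev[j]  # simplified join (not the sorted-prefix join)
            if (len(union) == len(prev[i]) + 1
                    and not has_infrequent_subset(union, prev_frequent)):
                candidates.add(union)
    return candidates


def apriori(transactions, min_sup):
    """Return a dict mapping every frequent itemset to its support count."""
    frequent = {}
    level = find_frequent_1_itemsets(transactions, min_sup)
    while level:
        frequent.update(level)
        counts = {c: 0 for c in apriori_gen(level)}
        for t in transactions:
            for c in counts:
                if c <= t:  # candidate is contained in the transaction
                    counts[c] += 1
        level = {c: n for c, n in counts.items() if n >= min_sup}
    return frequent


def drop_subsets(frequent):
    """Hypothetical helper: keep itemsets that are not a proper subset of another."""
    return {c: n for c, n in frequent.items()
            if not any(c < other for other in frequent)}


def main(argv=None):
    parser = argparse.ArgumentParser(description="Apriori frequent itemsets")
    parser.add_argument("-i", dest="input_file", required=True)
    parser.add_argument("-m", dest="min_sup", type=int, required=True)
    args = parser.parse_args(argv)

    results = drop_subsets(apriori(load_transactions(args.input_file),
                                   args.min_sup))
    # Placeholder layout: follow the sample result for the required format.
    print("input file:", args.input_file)
    print("min_sup:", args.min_sup)
    for itemset, count in results.items():
        print(sorted(itemset), count)
    print("total items:", len(results))
```

In the submitted script, main() would be called from under the “if __name__ == '__main__':” guard, so that all state stays inside functions and no global variables are needed.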