Paul Anderson, PhD
Online Office hours:
- Tuesday and Thursday 3 - 4 PM
- Reach out on Slack where we can then decide if text/audio/video is most appropriate.
Office Hours in 222 Building 14
- Friday 11 AM - 12 PM
Appointments available. Please email.
Overview of modern knowledge discovery from data (KDD) methods and technologies. Topics in data mining (association rules mining, classification, clustering), information retrieval, web mining. Emphasis on use of KDD techniques in modern software applications. 3 lectures, 1 laboratory.
Prerequisites: CSC 349 and one of the following: STAT 302, STAT 312, STAT 321, or STAT 350.
Course Learning Objectives
- recognize different types of KDD procedures and identify their uses
- implement algorithms/methods/techniques for KDD tasks to solve KDD problems
- interpret and analyze the results of KDD processes
- recognize and evaluate the societal impact of KDD technology, and make informed choices about the use of KDD technology
Textbook and Other Material
There is no textbook to buy for this course. We will be discussing material organized into chapters and supplemented by other online materials.
Grades may be prorated to account for illness. For example, if you are sick for a week, you will not have to make up that week's work; your grade will simply be computed out of fewer total points. You are responsible for making sure that this is correctly reflected in Canvas, which is our official record of grades.
Grading philosophy: Mastery learning
- I take a student-focused approach to grading. I don’t believe students should be penalized if it takes them two weeks to do an assignment that other students can finish in a week.
- I enjoy a challenge, and I want you to enjoy a challenge as well. I want students to say, “Wow. This is challenging, and that is what makes it fun.”
- There is a notion that great scientists, mathematicians, and human beings are born great. That is not true.
- A lot of my thinking is driven by my own experiences. The education system almost completely missed me. I did not focus on or engage with school until fortune intervened on my behalf. I moved from Maryland to Ohio in my junior year of high school, and everything changed because of a conversation during onboarding at that new school. Because I was a transfer student, they asked me to select my own classes for the first time in my life. I thought, what the hell, I’ll jump into all these honors and AP classes. And they were hard. But C’s in hard classes became B’s in hard classes, which became A’s in college. It wasn’t that my brain wasn’t as good as other students’. I just hadn’t exercised it as much as some of my peers. We can all grow and improve. I still try to grow in the same way: to push myself and not just choose things that are easy for me. Choose the path that is hard. Spend time on what interests you.
Labs and assignments: 40% (A new lab is posted approximately once a week)
- The primary form of labs and assignments will be programming exercises with open-ended questions throughout.
- Unless specified otherwise, labs and assignments will be submitted to GitHub Classroom. A link to each lab and assignment will be on Canvas.
- Coding questions will be primarily autograded, with code reviews on GitHub when appropriate. Short-answer and other questions throughout the quarter will be assessed for completion; their primary purpose is to foster discussion and higher-level analysis of the material.
- Mastery interpretation: You are attempting to master topic modules. Some may be harder than others based on your background. I am not taking lab points away from you. You are pushing your grade up from 0%. For each lab, you demonstrate mastery by passing the autograder and by providing a thoughtful analysis in the short answers.
Mastery checkpoints: 15%
- Throughout the quarter we will have mastery checkpoints to make sure you are truly mastering the material. These will be administered through Canvas.
- Each checkpoint is graded as either mastered (90% or above) or not mastered (0%), i.e., there will be no scores such as 75%.
Project
- All projects are student-centered and student-driven. I am not assigning or pushing structure upon you. Mastery is only achieved on the project by taking ownership of your learning through knowledge creation.
- This is a two-student project. We will share updates, struggles, and successes in lab and as a group.
- While you may pitch a different direction for a project, my guidelines are the following:
  - Apply three or more paired KDD methods to a dataset and provide an in-depth analysis of the results, with discussion of the methods.
  - A paired KDD method means that one version of the algorithm is implemented from scratch and the other comes from a package (e.g., sklearn). For example, you can run your own k-nearest neighbors implementation alongside the KNN classifier from sklearn. That counts as a single paired KDD method.
  - “Knowledge” must be one of your primary results.
- Each group will be assigned a group grade on the project (10%):
  - 2% - project pitch
  - 3% - end-of-quarter presentation
  - 5% - group report
- Each individual is responsible for weekly status reports, worth a total of 10%.
- Each individual is also responsible for a 10% mastery checkpoint at the end of the quarter.
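To make the idea of a single paired KDD method concrete, here is a minimal sketch assuming Python with NumPy and scikit-learn. It pairs a from-scratch k-nearest neighbors classifier with scikit-learn's `KNeighborsClassifier` and compares their predictions. The iris dataset, k = 5, the 70/30 split, and the helper name `knn_predict` are illustrative assumptions, not course requirements.

```python
from collections import Counter

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def knn_predict(X_train, y_train, X_test, k=5):
    """Hypothetical from-scratch k-NN: majority vote among the k training
    points closest to each test point (Euclidean distance)."""
    preds = []
    for x in X_test:
        dists = np.sqrt(((X_train - x) ** 2).sum(axis=1))  # distance to every training point
        nearest = np.argsort(dists)[:k]                     # indices of the k closest points
        votes = Counter(y_train[nearest])                   # count class labels among them
        preds.append(votes.most_common(1)[0][0])            # majority label wins
    return np.array(preds)


# Illustrative dataset and split (assumptions, not part of the assignment).
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# The two halves of the pair: from scratch vs. package.
ours = knn_predict(X_train, y_train, X_test, k=5)
ref = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train).predict(X_test)

print("from-scratch accuracy:", (ours == y_test).mean())
print("sklearn accuracy:     ", (ref == y_test).mean())
print("agreement:            ", (ours == ref).mean())
```

The two versions may occasionally disagree when there are ties in distance or in the vote, since each breaks ties its own way; explaining such differences is exactly the kind of analysis the project asks for.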
Participation
- We will be conducting this class using a variety of technologies (Slack, JupyterHub, etc.). It is important that you contribute to the class on these platforms.
- The biggest way to participate in this class is to show up, and to measure this I am requiring you to submit your notes, doodles, in-class work products, etc., from class. These will not be graded for content, as they are for your own learning.
- A typical chapter is worth 3 participation points. Upload your documentation to Canvas. Good participation earns 3/3.
Overall, we are trying to encourage a growth mindset. Take time to master things. Work on improving what you aren’t good at. Maybe that is communication. Maybe that is programming. Maybe it is creating new knowledge (i.e., the project).
- A 100% to 93%
- A- < 93% to 90%
- B+ < 90% to 87%
- B < 87% to 83%
- B- < 83% to 80%
- C+ < 80% to 77%
- C < 77% to 73%
- C- < 73% to 70%
- D+ < 70% to 67%
- D < 67% to 63%
- D- < 63% to 60%
- F < 60% to 0%
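The scale above, combined with the illness prorating described earlier, can be illustrated with a small sketch. It assumes a simplified points-only model (it ignores the category weights), and the function name `letter_grade` and all point values are hypothetical. Canvas remains the official record.

```python
# Letter-grade cutoffs from the scale above: a percentage at or above a
# cutoff (and below the next higher one) earns that letter.
CUTOFFS = [
    (93, "A"), (90, "A-"), (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (73, "C"), (70, "C-"), (67, "D+"), (63, "D"), (60, "D-"),
]


def letter_grade(points_earned, points_possible):
    """Hypothetical helper: map earned/possible points to a letter.

    Prorating for illness just lowers points_possible; the excused work
    contributes nothing to points_earned.
    """
    percent = 100 * points_earned / points_possible
    for cutoff, letter in CUTOFFS:
        if percent >= cutoff:
            return letter
    return "F"


# Hypothetical numbers: 430 of 500 points is 86%, a B.
print(letter_grade(430, 500))  # B
# Excusing a 30-point week missed due to illness: 430 of 470 is about 91.5%, an A-.
print(letter_grade(430, 470))  # A-
```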
(Modified statements courtesy of Dr. Wood)
Laptops have been shown to be distracting in class (see the Laptop FAQ and Link to Paper), and therefore we must carefully monitor how we use them. This is a computing class taught with interactive notebooks designed to increase interaction with the material. You will of course be tempted to use your laptop for other purposes (email, other assignments, etc.). Part of this will be regulated because I’m asking you to fill in your chapters in class and submit them to Canvas. But if I notice, or it is brought to my attention, that you are using your laptop in a way that is distracting to others, I reserve the right to rearrange seating so that your screen does not interfere with other students’ learning.
Although I encourage you to have lively discussions with one another, all work you hand in as your own must be your own work. I encourage you to help each other with debugging. You may look at another student’s code that has a bug, but you cannot look at someone else’s working code to help get your code to work. You may, however, look at examples of working code in tutorials, on Stack Overflow, or similar sources. You just cannot look at another student’s exact working solution to the same program. Ask me if you are unsure.
I consider this classroom to be a place where you will be treated with respect, and I welcome individuals of all ages, backgrounds, beliefs, ethnicities, genders, gender identities and expressions, national origins, religious affiliations, sexual orientations, ability – and other visible and nonvisible differences. All members of this class are expected to contribute to a respectful, welcoming and inclusive environment for every other member of the class. In lab and lecture, I expect us to strive to build a community in which:
- We are not code snobs. We do not assume knowledge or imply there are things that somebody should know.
- After our work is complete, we prioritize the education of others and actively offer to help, explain, debug, etc. in order to support one another’s learning. We do not share our working solution, but explain the logic/thinking behind our solution and help others recognize errors in their implementation when invited to do so.
- We consistently make the effort to actively recognize and validate multiple types of contributions to a positive classroom environment.
- We will be reading and discussing articles such as “Defensive climate in the computer science classroom.”
Lying, cheating, attempted cheating, and plagiarism are violations of our Honor Code that, when identified, are investigated. Each instance is examined to determine the degree of deception involved.
Incidents where the professor believes the student’s actions are clearly related more to ignorance, miscommunication, or uncertainty can be addressed by consultation with the student. We will craft a written resolution designed to help prevent the student from repeating the error in the future. The resolution, submitted by form and signed by both the professor and the student, is forwarded to the Dean of Students and remains on file.
Cases of suspected academic dishonesty will be reported directly to the Dean of Students. A student found responsible for academic dishonesty will receive an XF in the course, indicating failure of the course due to academic dishonesty. This grade will appear on the student’s transcript for two years, after which the student may petition for the X to be expunged. The student may also be placed on disciplinary probation, suspended (temporary removal), or expelled (permanent removal) from the College by the Honor Board.
It is important for students to remember that unauthorized collaboration (working together without permission) is a form of cheating. Unless a professor specifies that students can work together on an assignment and/or test, no collaboration is permitted. Other forms of cheating include possessing or using an unauthorized study aid (such as a PDA), copying from another’s exam, fabricating data, and giving unauthorized assistance.
Remember, research conducted and/or papers written for other classes cannot be used in whole or in part for any assignment in this class without obtaining prior permission from the professor.
Diversity Statement (Cal Poly official statement)
At Cal Poly we believe that academic freedom, a cornerstone value, is exercised best when there is understanding and respect for our diversity of experiences, identities, and world views. Consequently, we create learning environments that allow for meaningful development of self-awareness, knowledge, and skills alongside attention to others who may have experiences, worldviews, and values that are different from our own. In so doing, we encourage our students, faculty, and staff to seek out opportunities to engage with others who are both similar and different from them, thereby increasing their capacity for knowledge, empathy, and conscious participation in local and global communities.
In the spirit of educational equity, and in acknowledgement of the significant ways in which a university education can transform the lives of individuals and communities, we strive to increase the diversity at Cal Poly. As an institution that serves the state of California within a global context, we support the recruitment, retention, and success of talented students, faculty, and staff from across all societies, including people who are from historically and societally marginalized and underrepresented groups.
Cal Poly is an inclusive community that embraces differences in people and thoughts. By being open to new ideas and showing respect for diverse points of view, we support a climate that allows all students, faculty, and staff to feel valued, which in turn facilitates the recruitment and retention of a diverse campus population. We are a culturally invested university whose members take personal responsibility for fostering excellence in our own and others’ endeavors. To this end, we support an increased awareness and understanding of how one’s own identity facets (such as race, ethnicity, gender, sexual orientation, religion, age, disability, social class, and nation of origin) and the combinations of these identities and experiences that may accompany them can affect our different worldviews.
If you feel you may need an accommodation based on the impact of a disability, please contact me individually to discuss your specific needs. Also, please contact the Disability Resource Center: https://drc.calpoly.edu/content/drc-services.