1. Description of the data format and appropriate method of analysis
The provided data set, 'Voting+records.csv', is multivariate. It contains 435 instances (voting records), each described by 17 categorical attributes: the party label plus 16 votes on various social policy issues. All of the attributes are nominal. The data also appear to contain some missing values.
Each voting record is labeled as 'Republican' or 'Democrat' in the first attribute. Of the 435 instances, 168 are Republicans and 267 are Democrats, so this is a binary classification problem. There appears to be a connection between the responses ('Yes' or 'No') to the political issues and the political affiliation ('Republican' or 'Democrat' class).
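The class counts above imply an unbalanced label distribution, which is worth keeping in mind when judging any predictive rule. The following is a minimal sketch of that arithmetic; the variable names are illustrative and only the two counts come from the text.

```python
# Class counts as stated in the text above.
n_republican = 168
n_democrat = 267
total = n_republican + n_democrat

# Share of each class among all instances.
shares = {
    "republican": n_republican / total,
    "democrat": n_democrat / total,
}

# Accuracy obtained by always predicting the larger class.
# A useful rule for question 1 should do better than this.
majority_baseline = max(n_republican, n_democrat) / total

for label, share in shares.items():
    print(f"{label}: {share:.1%}")
print(f"majority-class baseline: {majority_baseline:.1%}")
```

Always guessing 'Democrat' is therefore already right for roughly 61% of the instances, so the confidence of a rule predicting the party should be read against that baseline.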
As a result, the main questions for the data analysis can be stated as follows.
1. Can a person's party (Republican or Democrat) be predicted from their voting record?
2. Can a person's vote on a policy be predicted from their political party?
Both objectives concern distinguishing between classes. Association rules make it possible to estimate the likelihood of each response.
Association rule learning is a popular and well-researched method for discovering interesting relations between variables in large databases. It is intended to identify strong rules in databases using different measures of interestingness [1]. In short, association rules describe the relationships between data items, which makes them an appropriate method of analysis in this case. Using association rule mining with the 'Apriori' associator, it is possible to find the best rules, that is, the rules that explain the data set most appropriately.
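To make the "measures of interestingness" concrete, the sketch below computes three common ones (support, confidence and lift) for a single rule in plain Python. It is not the Apriori algorithm itself, which additionally searches for frequent itemsets; it only scores a given rule. The five records are invented for illustration and are not rows from 'Voting+records.csv', and the helper names are hypothetical.

```python
# Toy records, invented for illustration only (NOT taken from the data set).
records = [
    {"party": "democrat", "physician-fee-freeze": "n", "el-salvador-aid": "n"},
    {"party": "democrat", "physician-fee-freeze": "n", "el-salvador-aid": "y"},
    {"party": "republican", "physician-fee-freeze": "y", "el-salvador-aid": "y"},
    {"party": "republican", "physician-fee-freeze": "y", "el-salvador-aid": "y"},
    {"party": "democrat", "physician-fee-freeze": "y", "el-salvador-aid": "n"},
]


def matches(record, items):
    """True if the record contains every attribute=value pair in items."""
    return all(record.get(attr) == value for attr, value in items.items())


def support(rows, items):
    """Fraction of rows that contain all the given items."""
    return sum(matches(r, items) for r in rows) / len(rows)


def confidence(rows, antecedent, consequent):
    """Estimated P(consequent | antecedent)."""
    ante = support(rows, antecedent)
    if ante == 0:
        return 0.0
    return support(rows, {**antecedent, **consequent}) / ante


def lift(rows, antecedent, consequent):
    """Confidence relative to how common the consequent is overall."""
    return confidence(rows, antecedent, consequent) / support(rows, consequent)


# Rule of the form asked in question 1: a vote predicts the party.
antecedent = {"physician-fee-freeze": "n"}
consequent = {"party": "democrat"}

rule_support = support(records, {**antecedent, **consequent})
rule_confidence = confidence(records, antecedent, consequent)
rule_lift = lift(records, antecedent, consequent)

print(f"support={rule_support:.2f} "
      f"confidence={rule_confidence:.2f} lift={rule_lift:.2f}")
```

Swapping the antecedent and the consequent gives a rule of the form asked in question 2 (the party predicts a vote). A lift above 1 indicates that the antecedent makes the consequent more likely than it is overall.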
Attribute Information
1. political party: 2 (democrat, republican)
2. handicapped-infants: 2 (y,n)
3. water-project-cost-sharing: 2 (y,n)
4. adoption-of-the-budget-resolution: 2 (y,n)
5. physician-fee-freeze: 2 (y,n)
6. el-salvador-aid: 2 (y,n)...