Help for a code

Postby leosava » Sun Oct 27, 2013 9:58 am

Hello Guys.

I am new to Python and I need to do an assignment for my degree. Unfortunately, I have no idea how to start it at all. I don't want anybody to do it for me; please just give me some idea of how to start. The specification for the assignment is below.

Thank you in advance. Any help is appreciated.

Income predictor

Using a dataset (the "Adult Data Set") from the UCI Machine-Learning Repository, we can predict, based on a number of factors, whether someone's income will be greater than $50,000.

The technique

The approach is to create a 'classifier' - a program that takes a new example record and, based on previous examples, determines which 'class' it belongs to. In this problem we consider attributes of records and separate these into two broad classes, <50K and >=50K.

We begin with a training data set - examples with known solutions. The classifier looks for patterns that indicate classification. These patterns can be applied against new data to predict outcomes. If we already know the outcomes of the test data, we can test the reliability of our model. If it proves reliable, we could then use it to classify data with unknown outcomes.

We must train the classifier to establish an internal model of the patterns that distinguish our two classes. Once trained we can apply this against the test data - which has known outcomes.

We take our data and split it into two groups - training and test - with most of the data in the training set.
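A minimal sketch of the split, assuming the records are already in a list. The function name `split_records`, the 80% training fraction and the fixed seed are illustrative choices, not part of the specification, which only asks for "most of the data" in the training set.

```python
import random

def split_records(records, train_fraction=0.8, seed=0):
    """Shuffle a copy of the records and split it into (training, test)."""
    shuffled = list(records)               # copy, so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

records = list(range(10))                  # stand-in for ten real records
training, test = split_records(records)
print(len(training), len(test))  # 8 2
```

Shuffling before cutting avoids a training set that only contains the first part of the file.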

We need to write a program to find the patterns in the training set.

Building the classifier

Look at the attributes and, for each of the two outcomes, compute an average value for each one. Then average these two results for each attribute to compute a midpoint or 'class separation value'.
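The midpoint calculation can be sketched as follows, assuming each record is a dict of numeric attributes and the training records have already been separated by outcome. The names `attribute_averages` and `class_midpoints` and the sample values are made up for illustration.

```python
def attribute_averages(records):
    """Average each numeric attribute over a list of record dicts."""
    names = records[0].keys()
    return {name: sum(r[name] for r in records) / len(records) for name in names}

def class_midpoints(positive, negative):
    """Midpoint between the two class averages, per attribute."""
    pos_avg = attribute_averages(positive)
    neg_avg = attribute_averages(negative)
    return {name: (pos_avg[name] + neg_avg[name]) / 2 for name in pos_avg}

# Hypothetical training records, already grouped by known outcome.
positive = [{"age": 50, "hours": 45}, {"age": 40, "hours": 55}]
negative = [{"age": 30, "hours": 35}, {"age": 20, "hours": 45}]

midpoints = class_midpoints(positive, negative)
print(midpoints)  # {'age': 35.0, 'hours': 45.0}
```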

For each record, test whether each attribute is above or below its midpoint value and flag it accordingly. For each record, the overall result is the greater count of the individual results (<50K, >=50K).
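One way to sketch this voting step is shown below. Two details are assumptions, because the specification does not settle them: an attribute votes for the higher-income class when it lies on the same side of the midpoint as that class's average, and a tie goes to the lower-income class. The function name `classify`, the attribute names and the numbers are illustrative, and the labels follow the outcome field of the data file.

```python
def classify(record, midpoints, positive_averages):
    """Let each attribute vote, then return the label with more votes."""
    votes_positive = 0
    votes_negative = 0
    for name, midpoint in midpoints.items():
        positive_is_above = positive_averages[name] > midpoint
        value_is_above = record[name] > midpoint
        if value_is_above == positive_is_above:
            votes_positive += 1
        else:
            votes_negative += 1
    # Assumption: a tie is classified as the lower-income class.
    return ">50K" if votes_positive > votes_negative else "<=50K"

# Hypothetical midpoints and positive-class averages from a training run.
midpoints = {"age": 35.0, "hours": 45.0, "education_number": 10.0}
positive_averages = {"age": 45.0, "hours": 50.0, "education_number": 12.0}

first = classify({"age": 52, "hours": 40, "education_number": 13},
                 midpoints, positive_averages)
second = classify({"age": 23, "hours": 40, "education_number": 9},
                  midpoints, positive_averages)
print(first, second)  # >50K <=50K
```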

You'll know your model works if you achieve the same results as the known results for the records. You should track the accuracy of your model, i.e. how many correct classifications you made as a percentage of the total number of records.
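The accuracy score described here could be computed roughly like this, assuming the predicted and known outcomes are kept in two lists of equal length. The name `accuracy` and the sample labels are illustrative.

```python
def accuracy(predicted, actual):
    """Percentage of predictions that match the known outcomes."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return 100.0 * correct / len(actual)

predicted = [">50K", "<=50K", "<=50K", ">50K"]
actual = [">50K", "<=50K", ">50K", ">50K"]
print(accuracy(predicted, actual))  # 75.0
```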

Process overview

Create training set from data
Create classifier using training dataset to determine separator values for each attribute
Create test dataset
Use classifier to classify data in test set while maintaining accuracy score

The data

The data is presented in the form of a comma-delimited text file (CSV) which has the following structure:

Listing of attributes:

1. Age: Number.
2. Workclass: Can be one of -- Private, Self-emp-not-inc, Self-emp-inc, Federal-gov, Local-gov, State-gov, Without-pay, Never-worked.
3. fnlwgt: number. This is NOT NEEDED for our study.
4. Education: Can be one of -- Bachelors, Some-college, 11th, HS-grad, Prof-school, Assoc-acdm, Assoc-voc, 9th, 7th-8th, 12th, Masters, 1st-4th, 10th, Doctorate, 5th-6th, Preschool. This is NOT NEEDED for our study.
5. Education-number: Number -- indicates level of education.
6. Marital-status: Can be one of -- Married-civ-spouse, Divorced, Never-married, Separated, Widowed, Married-spouse-absent, Married-AF-spouse.
7. Occupation: Can be one of -- Tech-support, Craft-repair, Other-service, Sales, Exec-managerial, Prof-specialty, Handlers-cleaners, Machine-op-inspct, Adm-clerical, Farming-fishing, Transport-moving, Priv-house-serv, Protective-serv, Armed-Forces.
8. Relationship: Can be one of -- Wife, Own-child, Husband, Not-in-family, Other-relative, Unmarried.
9. Race: Can be one of -- White, Asian-Pac-Islander, Amer-Indian-Eskimo, Other, Black.
10. Sex: Either Female or Male.
11. Capital-gain: Number.
12. Capital-loss: Number.
13. Hours-per-week: Number.
14. Native-country: United-States, Cambodia, England, Puerto-Rico, Canada, Germany, Outlying-US(Guam-USVI-etc), India, Japan, Greece, South, China, Cuba, Iran, Honduras, Philippines, Italy, Poland, Jamaica, Vietnam, Mexico, Portugal, Ireland, France, Dominican-Republic, Laos, Ecuador, Taiwan, Haiti, Columbia, Hungary, Guatemala, Nicaragua, Scotland, Thailand, Yugoslavia, El-Salvador, Trinadad&Tobago, Peru, Hong, Holand-Netherlands. This is NOT NEEDED for our study.
15. Outcome for this record: Can be >50K or <=50K.
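A rough sketch of turning lines in this layout into records, using the standard `csv` module. The field names are hypothetical spellings of the 15 attributes listed above, the sample line is only an illustration of the format, and skipping lines with the wrong number of fields is an assumption about how to handle blank or malformed input.

```python
import csv

# Hypothetical names for the 15 fields, in file order.
FIELD_NAMES = ["age", "workclass", "fnlwgt", "education", "education_number",
               "marital_status", "occupation", "relationship", "race", "sex",
               "capital_gain", "capital_loss", "hours_per_week",
               "native_country", "outcome"]
NOT_NEEDED = {"fnlwgt", "education", "native_country"}
NUMERIC = {"age", "education_number", "capital_gain", "capital_loss",
           "hours_per_week"}

def parse_records(lines):
    """Turn comma-delimited lines into dicts, dropping the unneeded fields."""
    records = []
    for row in csv.reader(lines, skipinitialspace=True):
        if len(row) != len(FIELD_NAMES):
            continue  # assumption: skip blank or malformed lines
        record = {}
        for name, value in zip(FIELD_NAMES, row):
            if name in NOT_NEEDED:
                continue
            value = value.strip()
            record[name] = int(value) if name in NUMERIC else value
        records.append(record)
    return records

# Illustrative line in the format described above, followed by a blank line.
sample_lines = [
    "39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, "
    "Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K",
    "",
]
records = parse_records(sample_lines)
print(records[0]["age"], records[0]["outcome"])  # 39 <=50K
```

`parse_records` accepts any iterable of lines, so the same function should work on an open file object.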

Data is available from ... You should be able to read this directly from the Internet.

Fields that have 'discrete' attributes such as 'Relationship' can be given a numeric weight by counting the number of occurrences as a fraction of the total number of positive records (outcome >= 50K) and negative records (outcome < 50K). So, if we have 10 positive records and they have values Wife: 2, Own-child: 3, Husband: 2, Not-in-family: 1, Other-relative: 1 and Unmarried: 1, then this would yield factors of 0.2, 0.3, 0.2, 0.1, 0.1 and 0.1 respectively.
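The worked example above can be reproduced with `collections.Counter`. This is only a sketch: the function name `value_weights` is made up, and it handles one field for one class of records at a time.

```python
from collections import Counter

def value_weights(values):
    """Fraction of records that carry each discrete value."""
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in counts.items()}

# The ten positive records from the example above.
positive_relationships = (["Wife"] * 2 + ["Own-child"] * 3 + ["Husband"] * 2 +
                          ["Not-in-family", "Other-relative", "Unmarried"])
weights = value_weights(positive_relationships)
print(weights["Wife"], weights["Own-child"])  # 0.2 0.3
```

The same function would be called a second time on the negative records to get their weights.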
Last edited by Mekire on Sun Oct 27, 2013 10:18 am, edited 1 time in total.
Reason: First post lock.
Posts: 4
Joined: Sun Oct 27, 2013 9:29 am

Re: Help for a code

Postby Kebap » Sun Oct 27, 2013 12:59 pm

leosava wrote: I have no idea how to start it at all.

Welcome to the forums - and to Python! I am sure you have already learned some things that you can use now?

Usually, if you find a big and complex problem, you try to split it into multiple smaller problems. Then start solving one of the small problems. If it is still too complex, split it again. When you have solved all the small problems, combine the solutions to solve the bigger problems.
Learn: How To Ask Questions The Smart Way
Join the #python-forum IRC channel on and chat with us directly!
Posts: 396
Joined: Thu Apr 04, 2013 1:17 pm
Location: Germany, Europe

Re: Help for a code

Postby leosava » Mon Oct 28, 2013 10:54 am

Thank you for the reply. I will do that.

Re: Help for a code

Postby Kebap » Mon Oct 28, 2013 12:45 pm

When you get stuck somewhere, be sure to check back, and show what you have. We can surely offer help, then.
