Determinants of Wage and Salary Income

Professor's instructions:
Write an original econometric research paper, using the IPUMS CPS data, to research an economic question of your choice. Your paper should pose a question, develop a hypothesis, and then apply the econometric skills developed in class to analyze the issue empirically. In addition to answering an economic question based on the data, the analysis should include three tests: multicollinearity, heteroskedasticity, and serial correlation (if necessary). Because the CPS samples have a large number of observations over a large number of years, it is likely a good idea to limit your research to a certain geographic area and time period, as well as any other data restrictions that are appropriate for your research question.
IPUMS-CPS is an integrated set of data spanning more than 50 years (1962-forward) of the Current Population Survey (CPS). The CPS is a monthly U.S. household survey conducted jointly by the U.S. Census Bureau and the Bureau of Labor Statistics. Initiated in the 1940s in the wake of the Great Depression, the survey was designed to measure unemployment. A battery of labor force and demographic questions, known as the “basic monthly survey,” is asked every month. Over time, supplemental inquiries on special topics have been added for particular months. Among these supplemental surveys, the Annual Social and Economic Supplement conducted in March (hereafter referred to as the ASEC) is the most widely used by social scientists and policymakers.
IPUMS is not a collection of compiled statistics; it is composed of microdata. Each record is a person, with all characteristics numerically coded. Because the data are individuals and not tables, researchers must use a statistical package to analyze the millions of records in the database. A data extraction system enables users to select only the samples and variables they require. You will need to register with the IPUMS-CPS website to extract data. IPUMS produces fixed-column ASCII data (a text file).
In addition to the ASCII data file, the system creates a statistical package syntax file to accompany each extract. The syntax file is designed to read in the ASCII data while applying appropriate variable and value labels. SPSS, SAS, and Stata are supported. You must download the syntax file with the extract or you will be unable to read the data. The syntax file requires minor editing to identify the location of the data file on your local computer. A codebook file is also created with each extract. It records the characteristics of your extract and should be downloaded for record-keeping. All data files are created in gzip compressed format. You must uncompress the file to analyze it. Most data compression utilities will handle the files.
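To make "fixed-column ASCII" concrete, the following is a minimal Python sketch of reading a gzip-compressed extract by character position. It is an illustration only: the column positions in LAYOUT and the sample record are hypothetical, and the real positions must be taken from the codebook or syntax file that accompanies the extract. The generated SPSS, SAS, or Stata syntax file remains the supported way to read the data.

```python
import gzip

# Hypothetical layout: (variable, start, end) as 0-based, end-exclusive
# character positions. Real positions come from the extract's codebook.
LAYOUT = [("YEAR", 0, 4), ("SEX", 4, 5), ("WKSWORK1", 5, 7), ("INCWAGE", 7, 15)]

def parse_line(line, layout=LAYOUT):
    """Slice one fixed-column record into a dict of integer codes."""
    return {name: int(line[start:end]) for name, start, end in layout}

def read_extract(path, layout=LAYOUT):
    """Read every record from a gzip-compressed fixed-column text file."""
    with gzip.open(path, "rt", encoding="ascii") as fh:
        return [parse_line(line.rstrip("\n"), layout) for line in fh if line.strip()]

# Made-up record laid out according to the hypothetical LAYOUT above.
print(parse_line("200515200045000"))
```

Every value arrives as a numeric code, so labels and special codes (for example, not-in-universe values) still have to be looked up in the codebook before analysis.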

I want to find out how INCWAGE (“Wage and salary income”, under person-income) is determined by several variables: SCHLCOLL (School or college attendance), under person-education, and WKSWORK1 (Weeks worked last year), under person-work. If possible, include sex and race as explanatory variables as well; both are under person-Core Demographic.
Build regression models of wage and salary income on these variables (linear, log-linear, cubic, or other functional forms that may apply). In each model, estimate the coefficient on each variable and use a hypothesis test to determine whether each coefficient is significantly different from 0. Use the F test to check whether each variable has a significant effect on wage and salary income. Calculate the goodness of fit (R^2) for each model and decide which model is best. As the professor requires, the analysis should also include tests for multicollinearity, heteroskedasticity, and serial correlation (if necessary).
The sample covers the 2000s (2000 to 2009), so the data cart will have 10 samples and 3 to 5 variables. SPSS, SAS, or Stata is required to read these data; Stata is preferred.
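The regression steps requested above can be sketched in plain Python to show what the coefficient estimates, t statistics, F statistic, and R^2 are computing. This is a sketch under stated assumptions, not the required Stata workflow: the data are synthetic stand-ins generated in the script, the dummies `female` and `in_school` are hypothetical recodes of SEX and SCHLCOLL, and the helper names `solve` and `ols` are invented for this example.

```python
import math
import random

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * p for x, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(y, X):
    """OLS of y on X; each row of X must start with 1.0 for the intercept."""
    n, k = len(y), len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    resid = [yi - sum(b * x for b, x in zip(beta, r)) for r, yi in zip(X, y)]
    ssr = sum(e * e for e in resid)
    ybar = sum(y) / n
    sst = sum((yi - ybar) ** 2 for yi in y)
    sigma2 = ssr / (n - k)  # residual variance estimate
    # Diagonal of (X'X)^-1, obtained with one unit-vector solve per coefficient.
    inv_diag = [solve(xtx, [float(i == j) for i in range(k)])[j] for j in range(k)]
    se = [math.sqrt(sigma2 * d) for d in inv_diag]
    r2 = 1.0 - ssr / sst
    f_stat = (r2 / (k - 1)) / ((1.0 - r2) / (n - k))  # overall F test
    return {"beta": beta, "se": se, "t": [b / s for b, s in zip(beta, se)],
            "r2": r2, "f": f_stat, "resid": resid}

# Synthetic stand-in for a CPS extract (NOT real data): wages are generated
# from a log-linear rule so both functional forms can be fitted.
rng = random.Random(0)
rows = []
for _ in range(300):
    weeks = rng.randint(1, 52)      # stand-in for WKSWORK1
    female = rng.randint(0, 1)      # hypothetical dummy recoded from SEX
    in_school = rng.randint(0, 1)   # hypothetical dummy recoded from SCHLCOLL
    log_wage = 8.0 + 0.03 * weeks - 0.2 * female - 0.3 * in_school + rng.gauss(0, 0.5)
    rows.append((math.exp(log_wage), weeks, female, in_school))

X = [[1.0, w, f, s] for _, w, f, s in rows]
linear = ols([wage for wage, *_ in rows], X)
log_linear = ols([math.log(wage) for wage, *_ in rows], X)  # needs wage > 0

for name, fit in (("linear", linear), ("log-linear", log_linear)):
    print(name, "R^2 = %.3f" % fit["r2"], "F = %.1f" % fit["f"])
    for label, b, t in zip(("const", "weeks", "female", "in_school"), fit["beta"], fit["t"]):
        print("  %-10s coef = %10.4f  t = %7.2f" % (label, b, t))
```

Two cautions apply when choosing the "best" model. The R^2 of the linear and log-linear models are not directly comparable, because the dependent variable differs (INCWAGE versus its logarithm), and the log form requires dropping observations with zero wages. The same `ols` helper could also serve the diagnostic tests: a variance inflation factor is 1 / (1 - R_j^2) from regressing regressor j on the other regressors, and a Breusch-Pagan-style statistic is n times the R^2 from regressing the squared residuals (returned under `resid`) on the regressors.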