These papers describe basic features of the data and how it was retrieved and processed. Because this is a very large data set, it inevitably contains some mistakes. I continue to improve the data; if you find mistakes, please let me know and I will fix them.
Another extremely useful site for bill data is the Congressional Bills Project by E. Scott Adler and John Wilkerson.
UPDATED DATA FILES
Updated on 09/22/2010
Andrew Scott Waugh and Yunkyu Sohn recently updated the cosponsorship data to include improvements from the Thomas database and data from the 110th Congress. A zipped copy of all new files is available here.
-
The house_bills.zip and senate_bills.zip archives contain four-column, comma-separated text files in which each row represents a bill. The columns contain the Congress number, bill type, bill number, and a dummy variable indicating whether the bill was a private bill.
-
The house_committees.zip and senate_committees.zip archives contain text files with the same number of rows as house_bills.txt and senate_bills.txt. Each row contains a comma-separated vector of committee referrals for the corresponding bill.
-
The house_matrices.zip and senate_matrices.zip archives contain comma-separated sponsorship/cosponsorship matrices for the 93rd-110th Congresses. Each row represents a Congressman and each column represents a bill. A value of 1 represents a sponsorship, 2 a cosponsorship, 3 a cosponsorship made after withdrawing a previous cosponsorship, and 5 a withdrawn cosponsorship.
-
The house_datematrices.zip and senate_datematrices.zip archives contain comma-separated matrices of sponsorship/cosponsorship dates for the 93rd-110th Congresses. If Congressman i sponsored or cosponsored bill j, then datematrix[i,j] contains the date of that sponsorship or cosponsorship as a string.
-
The house_members.zip and senate_members.zip archives contain three-column, comma-separated lists of the Congressmen who served in the 93rd-110th Congresses, in the same order as the rows of the cosponsorship matrices. The columns give each Congressman's name, Thomas ID#, and ICPSR ID#.
-
The house_status.zip and senate_status.zip archives contain four-column, comma-separated text files of bill status information, with each row corresponding to a bill in house_bills.txt or senate_bills.txt. The columns indicate whether the bill passed the House, passed the Senate, was agreed to in a conference, and was signed or vetoed by the President.
-
Some of the bill HTML files in the Thomas database contain incomplete information. A list of the incomplete HTML files is available here.
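As a rough illustration of how the matrices and member lists fit together, the Python sketch below reads a members file and a sponsorship/cosponsorship matrix, then lists who sponsored, cosponsored, or withdrew from a single bill. It rests on several assumptions:

- matrix cells are plain integers;
- values other than 1, 2, 3, and 5 (presumably 0) mean no relationship;
- the members file has no header row;
- row i of the matrix corresponds to line i of the members file.

The function names are hypothetical. The file names inside the archives are not given here, so the demo uses toy inline data.

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple

# Codes as described for house_matrices.zip / senate_matrices.zip.
# ASSUMPTION: any other value (e.g. 0) means "no relationship".
CODE_LABELS = {
    1: "sponsor",
    2: "cosponsor",
    3: "cosponsor after withdrawing an earlier cosponsorship",
    5: "withdrawn cosponsorship",
}


def read_members(f: TextIO) -> List[Dict[str, str]]:
    """Parse name, Thomas ID, ICPSR ID rows (ASSUMPTION: no header row).

    The two IDs are taken from the last two fields so that a name
    containing an unquoted comma (e.g. "SMITH, JOHN") stays intact.
    """
    members = []
    for row in csv.reader(f):
        if not row:
            continue  # skip blank lines
        members.append({
            "name": ",".join(row[:-2]).strip(),
            "thomas_id": row[-2].strip(),
            "icpsr_id": row[-1].strip(),
        })
    return members


def read_matrix(f: TextIO) -> List[List[int]]:
    """Parse a comma-separated member-by-bill matrix of integer codes."""
    return [[int(cell) for cell in row] for row in csv.reader(f) if row]


def bill_relations(matrix: List[List[int]], members: List[Dict[str, str]],
                   bill_col: int) -> List[Tuple[str, str]]:
    """Return (member name, relationship) pairs for one bill (one column).

    ASSUMPTION: row i of the matrix is member i of the members file.
    """
    pairs = []
    for member, row in zip(members, matrix):
        label = CODE_LABELS.get(row[bill_col])
        if label is not None:
            pairs.append((member["name"], label))
    return pairs


if __name__ == "__main__":
    # Toy data for illustration only, not taken from the archives.
    members = read_members(io.StringIO("SMITH, JOHN,101,10001\nJones,102,10002\n"))
    matrix = read_matrix(io.StringIO("1,0\n2,5\n"))
    print(bill_relations(matrix, members, 0))
    # [('SMITH, JOHN', 'sponsor'), ('Jones', 'cosponsor')]
```

To use real data, pass open file handles for the extracted files in place of the io.StringIO objects. Reading the IDs from the last two fields is a defensive choice in case some names contain commas.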
DATA FILES FOR REPLICATION
A zipped copy of all files used in the original two articles is available here. In these files, "NA" or a null value indicates that the data are missing or were not matched. This happens for one of two reasons:

- the data were not available (e.g., early cosponsorship dates were not available when these files were generated); or
- there was a typo in Thomas or some other problem with the matching procedure.
The following files are all 283,994-element vectors, with one measure per bill.
-
The bills.txt file gives the name of each bill as identified in the Thomas database. The name identifies the type, chamber, Congress, and number of each bill. Here's a key to the type codes:
HC: House Concurrent Resolutions
HE: House Resolutions
HJ: House Joint Resolutions
HR: House Bills
HZ: House Amendments
SC: Senate Concurrent Resolutions
SE: Senate Resolutions
SJ: Senate Joint Resolutions
SN: Senate Bills
SP: Senate Amendments
-
The senate.csv and house.csv files match ICPSR numbers ("id") to names ("name") and a few other variables for all Congresses. ICPSR numbers are derived from http://voteview.com/icpsr.htm. Because they change when a member switches party, it is important to match by Congress.
-
The sponsors.txt file identifies the ICPSR code of each bill's sponsor.
-
The cosponsors.txt file identifies the ICPSR codes of each bill's cosponsors (one bill per line, cosponsors space-delimited). This is a large file (13M).
-
The cospcount.txt file gives the total number of cosponsors on each bill.
-
The dates.txt file gives the date each bill was introduced.
-
The cosponsordates.txt file gives the space-delimited date(s) on which each bill was cosponsored. The order of dates on each line matches the order of cosponsors on the corresponding line of cosponsors.txt. This is a large file (22M).
-
The party.txt file gives the party of each bill's sponsor.
-
The passedam.txt file shows whether the amendment passed on the floor.
-
The passedbills.txt file shows whether the bill passed on the floor.
-
The publaws.txt file shows whether the bill became public law.
-
The pvtbills.txt file shows whether the bill is a "private" bill.
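The vector files above are line-aligned: line i of each file refers to the same bill, so they can be read in parallel. The Python sketch below is a set of hypothetical helpers, not code distributed with the data. It decodes the bill type from bills.txt and pairs each cosponsor in cosponsors.txt with the matching date in cosponsordates.txt. It assumes that the type code is the first two characters of each name and that missing values appear as the token NA. Dates are treated as opaque strings because their format is not specified here.

```python
import io
from typing import Iterator, List, Optional, TextIO, Tuple

# Type codes from the bills.txt key.
BILL_TYPES = {
    "HC": "House Concurrent Resolutions",
    "HE": "House Resolutions",
    "HJ": "House Joint Resolutions",
    "HR": "House Bills",
    "HZ": "House Amendments",
    "SC": "Senate Concurrent Resolutions",
    "SE": "Senate Resolutions",
    "SJ": "Senate Joint Resolutions",
    "SN": "Senate Bills",
    "SP": "Senate Amendments",
}

# ASSUMPTION: a missing entry appears as the token "NA".
# (An empty line simply yields no tokens.)
MISSING = {"NA"}


def bill_type(name: str) -> str:
    """Decode a bills.txt name, ASSUMING the two-letter code comes first."""
    return BILL_TYPES.get(name.strip()[:2].upper(), "unknown")


def tokens(line: str) -> List[Optional[str]]:
    """Split a space-delimited line, mapping missing markers to None."""
    return [None if tok in MISSING else tok for tok in line.split()]


def cosponsor_events(cosponsor_line: str,
                     date_line: str) -> List[Tuple[Optional[str], Optional[str]]]:
    """Pair each cosponsor's ICPSR code with its cosponsorship date.

    Both files list cosponsors and dates in the same order. If the dates
    are missing or too short (e.g. a lone "NA"), the gaps become None.
    """
    ids = tokens(cosponsor_line)
    dates = tokens(date_line)
    dates = (dates + [None] * len(ids))[: len(ids)]
    return list(zip(ids, dates))


def iter_bills(bills: TextIO, cosponsors: TextIO,
               dates: TextIO) -> Iterator[Tuple[str, str, list]]:
    """Walk the line-aligned files together: line i of each file is bill i."""
    for name, cline, dline in zip(bills, cosponsors, dates):
        yield name.strip(), bill_type(name), cosponsor_events(cline, dline)


if __name__ == "__main__":
    # Toy lines for illustration only; real names and dates will differ.
    for row in iter_bills(io.StringIO("HRtoy1\n"), io.StringIO("111 222\n"),
                          io.StringIO("NA\n")):
        print(row)
```

With the real files, opening bills.txt, cosponsors.txt, and cosponsordates.txt and looping over iter_bills streams all 283,994 bills without loading the large files into memory. Note that zip stops at the shortest input, so a length mismatch between files would silently truncate the output.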
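ICPSR numbers change when a member switches party, so lookups into senate.csv or house.csv should use the Congress as well as the ID. The Python sketch below keys the lookup on (Congress, ICPSR ID). The "id" and "name" columns come from the file description. The Congress column name ("congress" here) is an assumption, so check the header of the actual file. A bill's Congress number would come from its bills.txt name, whose exact layout is not spelled out here, so the sketch takes it as a parameter.

```python
import csv
import io
from typing import Dict, Optional, TextIO, Tuple


def load_member_names(f: TextIO,
                      congress_col: str = "congress") -> Dict[Tuple[str, str], str]:
    """Map (Congress, ICPSR id) -> name from house.csv or senate.csv.

    "id" and "name" are the column names given in the data description;
    congress_col is an ASSUMED column name, so check the file's header.
    """
    lookup: Dict[Tuple[str, str], str] = {}
    for row in csv.DictReader(f):
        key = (row[congress_col].strip(), row["id"].strip())
        lookup[key] = row["name"].strip()
    return lookup


def name_for(lookup: Dict[Tuple[str, str], str], congress,
             icpsr_id) -> Optional[str]:
    """Look up a member by Congress and ICPSR id; None if there is no match."""
    return lookup.get((str(congress).strip(), str(icpsr_id).strip()))


if __name__ == "__main__":
    # Toy rows for illustration only (made-up IDs, not real ICPSR codes).
    demo = io.StringIO("congress,id,name\n104,99901,Toy Member\n105,99902,Toy Member\n")
    names = load_member_names(demo)
    print(name_for(names, 104, 99901))  # Toy Member
    print(name_for(names, 105, 99901))  # None: same ID, wrong Congress
```

Keying on the pair rather than the ID alone means that an ID which does not belong to a given Congress returns no match instead of silently attaching the wrong record.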

This work by James H. Fowler, Andrew Scott Waugh, and Yunkyu Sohn is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License.