I am reading in a dataset from this link:
url = "https://www.bis.doc.gov/dpl/dpl.txt"
This is how I read it in. Reading the URL directly as a CSV gives me a Forbidden error, hence the use of requests:
import requests
test_URL = url
def get_data(link):
    hdr = {'user-agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Mobile Safari/537.36'}
    req = requests.get(link, headers=hdr)
    content = req.content
    return content
data = get_data(test_URL)
The data I have read in look like this:
print(data)
b'"Name"\t"Street_Address"\t"City"\t"State"\t"Country"\t"Postal_Code"\t"Effective_Date"\t"Expiration_Date"\t"Standard_Order"\t"Last_Update"\t"Action"\t"FR_Citation"\n" I. ASH"\t"UPON THE DATE OF THE ORDER INCARCERATED AT USM NO: 26265-177, FCI SEAGOVILLE, 2113 NORTH HIGHWAY 175"\t"SEAGOVILLE"\t"TX"\t"US"\t"75159"\t"06/19/2003"\t"06/29/2056"\t"Y"\t"2007-01-31"\t"FEDERAL REGISTER NOTICE UPDATED"\t"68 F.R. 38290 6/27/03 71 F.R. 38843 7/10/06 72 F.R. 4236 1/30/07"\n"AARON ABRAHAM VILLA"\t"3415 RIVERA AVENUE"\t"EL PASO"\t"TX"\t""\t"79905"\t"08/24/2022"\t"01/14/2026"\t"Y"\t"2022-08-29"\t"F.R. NOTICE ADDED"\t"87 F.R. 52741 8/29/2022"\n"ABDIEL PADRON MADRID"\t"INMATE NUMBER: 42167-480, FCI LA TUNA FEDERAL CORRECTIONAL INSTITUTION, P.O. BOX 3000"\t"ANTHONY"\t"NM"\t""\t"88201"\t"02/10/2022"\t"06/17/2030"\t"Y"\t"2022-02-22"\t"F.R. NOTICE ADDED"\t"87 F.R. 9030 2/17/2022"\n"ABDUL MAJID SAIDI"\t"2948 PEASE DRIVE, APT. 201"\t"ROCKY RIVER"\t"OH"\t""\t"44116"\t"10/30/2020"\t"03/13/2026"\t"Y"\t"2020-11-05"\t"F.R. NOTICE ADDED"\t"85 F.R. 70581 11/5/2020"\n"ABDULAH AL NASSER"\t"605 TRAIL LAKE DRIVE"\t"RICHARDSON"\t"TX"\t"US"\t"75081"\t"03/04/2002"\t"06/29/2056"\t"Y"\t"2006-07-11"\t"50 YEAR DENIAL"\t"67 F.R. 56530 9/4/02 67 F.R. 10890 3/11/02 71 F.R. 38843 7/10/06"\n"ABDULAH AL NASSER"\t"908 AUDELIA ROAD, SUIE 200, PMB #245"\t"RICHARDSON"\t"TX"\t"US"\t"75081"\t"03/04/2002"\t"06/29/2056"\t"Y"\t"2006-07-11"\t"50 YEAR DENIAL"\t"67 F.R. 56530 9/4/02 67 F.R. 10890 3/11/02"\n"ABDULAMIR MAHDI"\t"20 HUNTINGWOOD DRIVE"\t"SCARBOROUGH, ONTARIO"\t""\t"CA"\t"M1W1A2"\t"10/03/2003"\t"10/03/2023"\t"N"\t"2003-10-06"\t"NON STANDARD DENIAL"\t"68 F.R. 57406 10/3/03"\n"ABDULLAH AL NASSER"\t"UPON THE DATE OF THE ORDER INCARCERATED AT USM NO: 26265-177, FCI SEAGOVILLE, 2113 NORTH HIGHWAY 175"\t"SEAGOVILLE"\t"TX"\t"US"\t"75159"\t"06/19/2003"\t"06/29/2056"\t"Y"\t"2007-01-31"\t"FEDERAL REGISTER NOTICE UPDATED"\t"68 F.R. 38290 6/27/03 71 F.R. 38843 7/10/06 72 F.R. 
4236 1/30/07"\n"ABEL HERNANDEZ, JR."\t"120 SAINT JOHN DRIVE"\t"PHARR"\t"TX"\t""\t"78577"\t"04/30/2021"\t"08/29/2029"\t"Y"\t"2021-05-05"\t"F.R. NOTICE ADDED"\t"86 F.R. 23920 5/5/2021"\n"ABU AL-JUD"\t"INMATE NUMBER: 87450-083, FCI VICTORVILLE MEDIUM II FEDERAL CORRECTIONAL INSTITUTION, P.O. BOX 3850"\t"ADELANTO"\t"CA"\t""\t"92301"\t"03/31/2017"\t"06/13/2026"\t"Y"\t"2017-04-06"\t"FR NOTICE ADDED"\t"82 F.R. 16788, 16789 4/6/2017"\n"ADAM AL HERZ"\t"INMATE NUMBER: 13991-029, FMC ROCHESTER, P.O. BOX 4000"\t"ROCHESTER"\t"MN"\t""\t"55903"\t"08/13/2019"\t"10/13/2026"\t"Y"\t"2019-08-22"\t"FR NOTICE ADDED"\t"84 F.R. 43787 8/22/2019"\n"ADRIANA GABRIELA GUAJARDO-CAVAZOS"\t"CALLE MANUEL OTIZ #49, MATAMOROS, TAMAULIPAS"\t"MEXICO"\t""\t"MX"\t"87394"\t"05/08/2023"\t"11/12/2027"\t"Y"\t"2023-05-12"\t"ADDITION, F.R. NOTICE ADDED "\t"88 F.R. 30721 5/12/2023"\n"ADT ANALOG AND DIGITAL TECHNIK"\t"8019 NIEDERSEEON, HOUSE
Does anyone know how to convert the data above into a nice-looking, well-formatted pandas DataFrame?
You can use io.StringIO and pandas.read_csv:
Note that you can also pass parameters to request through read_csv's storage_options:
Output:
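A minimal sketch of both approaches, under a few assumptions that are not in the original post: the bytes are UTF-8 encoded, the helper names bytes_to_frame and read_with_headers are hypothetical, and the small sample below merely mimics the shape of the printed data so the example runs without network access. The header forwarding through storage_options is only available in reasonably recent pandas versions.

```python
import io

import pandas as pd


def bytes_to_frame(raw: bytes, encoding: str = "utf-8") -> pd.DataFrame:
    """Parse tab-separated bytes (as returned by get_data) into a DataFrame."""
    # Assumption: the payload is UTF-8; change `encoding` if decoding fails.
    text = raw.decode(encoding)
    # sep="\t" because the fields are tab-delimited; the default quotechar '"'
    # removes the surrounding double quotes. dtype=str keeps postal codes as
    # text, and keep_default_na=False keeps empty fields as "" and stops
    # values such as "NA" from being read as missing.
    return pd.read_csv(io.StringIO(text), sep="\t", dtype=str,
                       keep_default_na=False)


def read_with_headers(url: str, user_agent: str) -> pd.DataFrame:
    """Let pandas fetch the URL itself, sending a custom User-Agent header."""
    # For HTTP(S) URLs, storage_options entries are forwarded as request headers.
    return pd.read_csv(url, sep="\t", dtype=str, keep_default_na=False,
                       storage_options={"User-Agent": user_agent})


# Hypothetical sample in the same shape as the printed data (no network needed).
sample = (b'"Name"\t"City"\t"State"\t"Postal_Code"\n'
          b'"AARON ABRAHAM VILLA"\t"EL PASO"\t"TX"\t"79905"\n'
          b'"ABDULAMIR MAHDI"\t"SCARBOROUGH, ONTARIO"\t""\t"M1W1A2"\n')
df = bytes_to_frame(sample)
print(df)
```

With the code from the question, the first helper would be called as bytes_to_frame(data). The second helper would replace get_data entirely, for example read_with_headers(url, hdr['user-agent']) with hdr defined at module level rather than inside get_data.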