Working with csv/tabbed files in Python

This is probably the limit of my knowledge/experience. Without having all the files on my machine, I can’t really help unless I run the code to see everything. I have more experience working with pandas, so I’m more confident in my pandas answers than in my plain Python ones :sweat_smile:.

If you were going to code it out rather than use pandas, you’d probably want a loop that goes through the data, looks for rows matching Asia, and adds their values to a running sum variable.
^ but that method adds a lot of work for yourself. pandas’s groupby function can group and sum by a particular category. In this case I’d probably use df.groupby(['continent']).sum(numeric_only=True), where numeric_only=True keeps pandas from trying to sum the text columns.
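A minimal sketch of the loop approach in plain Python, assuming the data has continent and new_cases columns (the file later in this thread is the OWID COVID data). The sample rows are hypothetical stand-ins for the real file:

```python
import csv
import io

# Hypothetical sample rows standing in for the real CSV file
SAMPLE = """continent,location,new_cases
Asia,Japan,10
Europe,France,7
Asia,India,25
,World,42
"""

def sum_for_continent(text, continent):
    """Loop over every row and keep a running sum for one continent."""
    total = 0.0
    for row in csv.DictReader(io.StringIO(text)):
        # Only rows whose continent matches (and that have a value) add to the sum
        if row["continent"] == continent and row["new_cases"]:
            total += float(row["new_cases"])
    return total

print(sum_for_continent(SAMPLE, "Asia"))  # 35.0
```

Note that the aggregate "World" row has an empty continent, so it never matches and is not double-counted.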

If you want to do this without pandas, I’d suggest looking into the pandas library to see how they implemented groupby.
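pandas’s own groupby is far more general than this, so its source can be heavy going. This is not how pandas implements it; it is a hand-rolled equivalent of a single-column group-and-sum that accumulates totals in a dict keyed by the group. The column names and sample rows are assumptions:

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data; the column names are assumptions
SAMPLE = """continent,location,new_cases
Asia,Japan,10
Europe,France,7
Asia,India,25
Europe,Spain,3
"""

def group_sum(rows, key, value):
    """Sum `value` per distinct `key`, similar in spirit to df.groupby(key)[value].sum()."""
    totals = defaultdict(float)
    for row in rows:
        if row[value]:  # skip empty cells instead of failing on float('')
            totals[row[key]] += float(row[value])
    return dict(totals)

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
print(group_sum(rows, "continent", "new_cases"))  # {'Asia': 35.0, 'Europe': 10.0}
```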


You’re absolutely right; I didn’t mean to ask too much of you or anyone here. I think brainstorming is the best way to reach a solution in the end.

I think I figured out how to do it: right now I’m selecting all the rows that have the same continent and copying all the values to an array. Now I FINALLY have an array of values I can sum.
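The "select the rows, copy their values into a list, then sum" approach can be sketched with a list comprehension and the built-in sum(). The sample data and column names are hypothetical:

```python
import csv
import io

# Hypothetical sample data; the column names are assumptions
SAMPLE = """continent,location,new_cases
Asia,Japan,10
Europe,France,7
Asia,India,25
"""

reader = csv.DictReader(io.StringIO(SAMPLE))
# Copy the value of every row with the wanted continent into a list...
asia_values = [float(row["new_cases"]) for row in reader if row["continent"] == "Asia"]
# ...then the built-in sum() adds them up
print(asia_values)       # [10.0, 25.0]
print(sum(asia_values))  # 35.0
```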

If I manage to get this at least working, I only need to sort the values, and for that I can just use a bubble sort or something like that.
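A minimal bubble sort for the list of totals, shown next to the built-in sorted() for comparison. The totals are made-up values:

```python
def bubble_sort(values):
    """Return a new list sorted ascending using bubble sort (O(n^2))."""
    items = list(values)  # copy so the caller's list is left untouched
    n = len(items)
    for end in range(n - 1, 0, -1):
        swapped = False
        # Each pass bubbles the largest remaining value up to position `end`
        for i in range(end):
            if items[i] > items[i + 1]:
                items[i], items[i + 1] = items[i + 1], items[i]
                swapped = True
        if not swapped:  # no swaps means the list is already sorted
            break
    return items

totals = [35.0, 10.0, 42.0, 7.5]
print(bubble_sort(totals))           # [7.5, 10.0, 35.0, 42.0]
print(sorted(totals, reverse=True))  # [42.0, 35.0, 10.0, 7.5]
```

Bubble sort is fine for a handful of continents, but the built-in sorted() (Timsort, O(n log n)) is shorter and faster. It can also sort a dict of totals by value with sorted(d.items(), key=lambda kv: kv[1]).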

Thanks for helping me brainstorm this. What I’m basically doing is “bandaid coding”: I just try something, and if it spits out an error I try to fix it. If I can’t fix it, I try something different lol


I know this is almost a necro, but have you thought about converting the CSV to JSON? Then you can package all your data into nice little lists. Python likes JSON. I had a similar but easier assignment and ended up writing everything to a list, with commas etc. as the delimiter.

Maybe basic, but yeah :stuck_out_tongue:
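The CSV-to-JSON conversion suggested here can be sketched with the standard csv and json modules. csv.DictReader already yields one dict per line, so the JSON step only adds a serialize/parse round trip. The sample data is hypothetical:

```python
import csv
import io
import json

# Hypothetical sample data; the column names are assumptions
SAMPLE = """continent,location,new_cases
Asia,Japan,10
Europe,France,7
"""

# csv.DictReader already turns each CSV line into a dict keyed by the header row
rows = list(csv.DictReader(io.StringIO(SAMPLE)))
# json.dumps serializes that list of dicts as a JSON string
as_json = json.dumps(rows)
# json.loads parses it back into the same Python objects
assert json.loads(as_json) == rows
print(as_json)
```

All values stay strings either way (e.g. "10"), so you still have to convert numbers yourself.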

That would be inefficient: using JSON means you transform the strings to JSON format and load them into a dict, when the csv module can turn CSV strings into dicts out of the box (csv.DictReader). The most efficient way to read a CSV is to process it in chunks with the csv module, or to tune the buffering parameter of the open() call whose file object you pass to csv.reader or csv.DictReader.
pandas makes processing and playing with tabular data convenient, but it is intended as a replacement for spreadsheets and more, so it can be overkill in this case.
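A minimal sketch of chunked reading. csv.DictReader reads lazily, so itertools.islice can pull a fixed number of rows at a time without loading the whole file. The chunk size, the helper name iter_chunks, and the in-memory sample are assumptions. With a real file you would pass the object returned by open(path, newline='') instead of io.StringIO:

```python
import csv
import io
from itertools import islice

def iter_chunks(reader, size):
    """Hypothetical helper: yield lists of up to `size` rows from a lazy csv reader."""
    while True:
        chunk = list(islice(reader, size))
        if not chunk:  # reader exhausted
            return
        yield chunk

SAMPLE = "continent,new_cases\nAsia,10\nEurope,7\nAsia,25\nAfrica,4\nAsia,1\n"

reader = csv.DictReader(io.StringIO(SAMPLE))
total_asia = 0.0
for chunk in iter_chunks(reader, size=2):
    # Only one chunk of rows is held in memory at a time
    total_asia += sum(float(r["new_cases"]) for r in chunk if r["continent"] == "Asia")
print(total_asia)  # 36.0
```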

import csv
from datetime import datetime

cases_date = datetime(2020, 4, 30)  # the date to filter on
result = []

# newline='' is what the csv docs recommend for files read with csv.reader
with open('owid-covid-data.csv', newline='') as f:
    reader = csv.reader(f)
    headers = next(reader)  # reads the first line of the file, the column names
    # either load the rest of the entire csv as lists in a list, dumping it all into memory
    # -> rows = [row for row in reader]
    # or loop line by line, unpacking the first 4 columns into vars & the rest into `rest`
    for row in reader:
        iso_code, continent, location, current_dt, *rest = row
        if (
            not iso_code        # empty iso code
            and not continent   # empty continent
            and cases_date == datetime.fromisoformat(current_dt)  # dates look like 2020-04-30
        ):
            print(iso_code, continent)
            result.append({current_dt: row})

Thanks @UHI and @deadvoid for the answers.

I have to agree with deadvoid on the use of JSON. I looked into using JSON files, but in the end it was much easier to parse CSV files with Python, weirdly enough. And basically the result of my code is almost the same as what deadvoid posted.


If it works, it works! Did you pass :)?


I did! And not just by a stroke of luck, luckily. I also dodged the Java bullet for this class and managed to use Python. Thanks for asking!