[Help wanted] Need to download, edit, and upload CSV file

Hi!

We need to download the CSV file from a Google Form, remove some columns, and reupload it elsewhere.

If you’re a developer with some free time, shoot me a pm with what it would cost to implement something like this.

If you have any suggestions on how something like this could be tackled please post as well! :heart:

Do you have Excel, LibreOffice, etc. ?

  1. download the CSV file
  2. rightclick → open with → Excel/LibreOffice Calc/etc.
  3. Make your changes
  4. File → Export → CSV
3 Likes

Since you passed on the charges, can i charge OP? I want a nice NFT for @MazeFrame ’s effort. Thanks.

3 Likes

that’s why I say:

why much technology when few do trick

2 Likes

I just have to add…

  1. ?
  2. profit
3 Likes

Apologies for being so vague, and thanks for the laughs :joy: @MazeFrame @regulareel @lurk3r @vivante

We need this to be automated. I was thinking a machine running arch btw and a Cron job that runs a .py maybe?

if anyone has an idea on how many NFTs this would cost hit me up lol

1 Like

@adamscott38
If you want to download a .csv and do a transformation and load somewhere else ( This is called an ETL process ) Python is the most practical programming language for this. numpy or just the python standard libs can do the job.

If you are downloading a form and “munging” the data then uploading it, and this data constantly changes ( so basic Reporting ) there are many avenues for this. You can have a Jenkins automation server can handle it, or setting up systemd timer/ service files as well.

Is there something we can look at to get a scope of the project ?

1 Like

Some sample date would be nice, how many columns, is it constant column count, are there headers, do you want to remove same elements in every row, how do you access “elsewhere”?
Sound fairly trivial, if I get it correctly you basically want to:

get data from csv file
for each line in csv remove element x, y and z and add line to new file
upload new file

Should be easy to do in pretty much any scripting language. Python comes to mind, but I hate the damn thing, no semicolons you see.

1 Like

As usual, someone else already had this requirement and even wrote up a nice article about how to do it the ‘proper’ way…

1 Like

These are all amazing suggestions, definitely have a lot of reading to do! @Hammerhead_Corvette @vivante @MadMatt

Thank you all for having the patience to kindly ask about the info I should’ve posted in the first place lol

Here’s what the data looks like:

Timestamp Name Equipment Miles Fuel Oil
2021/12/18 1:39:32 AM PST test V-001 - F-150 1234 30% - 39% Okay
2021/12/18 1:40:10 AM PST test2 UTV-001 - Yamaha 4321 70% - 79% [5/7] Okay

I tried tackling the editing problem using Python and pandas, like this:

import pandas
df = pandas.read_csv('sample.csv')

for col in df.columns:
    if 'Timestamp' not in col:
        if 'Equipment' not in col:
            if 'Miles' not in col:
                del df[col]

df['Timestamp'] = df['Timestamp'].str[:10]

And got this:

Timestamp Equipment Miles
12/18/2021 V-001 - F-150 1234
12/18/2021 UTV-001 - Yamaha 4321

But this is wrong. The edit on Timestamp should happen after a space, so that we can run it on the Equipment column and end up with this:

Timestamp Equipment Miles
12/18/2021 V-001 1234
12/18/2021 UTV-001 4321

This would get uploaded to a CMMS web app, which uses this info to generate work orders for oil changes, etc. Selenium, maybe? No idea on how to do this tho… :upside_down_face:

Look, @vivante, no semicolons! :eyes:

for col in df.columns:
    if 'Timestamp' not in col:
        if 'Equipment' not in col:
            if 'Miles' not in col:
                del df[col]

Maybe you wanted:

df = df[['Timestamp', 'Equipment', 'Miles']]

And instead of truncating first column as strings and fiddling with spaces, it’s a better practice to import it into an actual date time to begin with:

pd.read_csv('sample.csv', dtype=[datetime....

You can then filter, sort, do datetime arithmetic, format these datetimes back into strings in a format of your choice and so on…

What’s a CMMS?


There’s this e-book that has a chapter on interfacing with Google sheets… which is how you can pick up Google forms data

https://automatetheboringstuff.com/2e/chapter14/

Other chapters have some useful stuff too, and the whole things is not as dry to read as purely technical references normally are which is good for folks new to using Python for glueing things together.

To get the CSV? If you can just get an url for the CSV you can probably just give that to pandas.

Otherwise there are many web scraping libraries that do not need a browser. They fall apart usually when loads of JavaScript is needed for the part of the page you want to exist. If it exists without the JavaScript having executed there is nothing to worry about. You can disable JavaScript in your browser to check. Or just yolo find the element and see what happens in python.

For python beautiful soup seems to be one of the more popular once. Never done this in python. I have always done this with nodejs because I’m assuming it also being JavaScript makes it easier to apply mocks for DOM APIs that are implemented by the browser normally. Not sure how beautiful soup does it. They might not or they use node under the hood or they went full ham and reimplemented JavaScript in python. Either way lots of people use python for this. However they do it chances are it works reasonably well.

If you need a browser for it selenium is not what you would use that is for testing. Something like puppeteer would be more fitting. But that is really a last resort when the other options dont work to get the CSV.