@adamscott38
If you want to download a .csv and do a transformation and load somewhere else ( This is called an ETL process ) Python is the most practical programming language for this. numpy or just the python standard libs can do the job.
If you are downloading a form and “munging” the data then uploading it, and this data constantly changes ( so basic Reporting ) there are many avenues for this. You can have a Jenkins automation server can handle it, or setting up systemd timer/ service files as well.
Is there something we can look at to get a scope of the project ?
Some sample date would be nice, how many columns, is it constant column count, are there headers, do you want to remove same elements in every row, how do you access “elsewhere”?
Sound fairly trivial, if I get it correctly you basically want to:
get data from csv file
for each line in csv remove element x, y and z and add line to new file
upload new file
Should be easy to do in pretty much any scripting language. Python comes to mind, but I hate the damn thing, no semicolons you see.
Thank you all for having the patience to kindly ask about the info I should’ve posted in the first place lol
Here’s what the data looks like:
Timestamp
Name
Equipment
Miles
Fuel
Oil
2021/12/18 1:39:32 AM PST
test
V-001 - F-150
1234
30% - 39%
Okay
2021/12/18 1:40:10 AM PST
test2
UTV-001 - Yamaha
4321
70% - 79% [5/7]
Okay
I tried tackling the editing problem using Python and pandas, like this:
import pandas
df = pandas.read_csv('sample.csv')
for col in df.columns:
if 'Timestamp' not in col:
if 'Equipment' not in col:
if 'Miles' not in col:
del df[col]
df['Timestamp'] = df['Timestamp'].str[:10]
And got this:
Timestamp
Equipment
Miles
12/18/2021
V-001 - F-150
1234
12/18/2021
UTV-001 - Yamaha
4321
But this is wrong. The edit on Timestamp should happen after a space, so that we can run it on the Equipment column and end up with this:
Timestamp
Equipment
Miles
12/18/2021
V-001
1234
12/18/2021
UTV-001
4321
This would get uploaded to a CMMS web app, which uses this info to generate work orders for oil changes, etc. Selenium, maybe? No idea on how to do this tho…
for col in df.columns:
if 'Timestamp' not in col:
if 'Equipment' not in col:
if 'Miles' not in col:
del df[col]
Maybe you wanted:
df = df[['Timestamp', 'Equipment', 'Miles']]
And instead of truncating first column as strings and fiddling with spaces, it’s a better practice to import it into an actual date time to begin with:
pd.read_csv('sample.csv', dtype=[datetime....
You can then filter, sort, do datetime arithmetic, format these datetimes back into strings in a format of your choice and so on…
What’s a CMMS?
There’s this e-book that has a chapter on interfacing with Google sheets… which is how you can pick up Google forms data
Other chapters have some useful stuff too, and the whole things is not as dry to read as purely technical references normally are which is good for folks new to using Python for glueing things together.
To get the CSV? If you can just get an url for the CSV you can probably just give that to pandas.
Otherwise there are many web scraping libraries that do not need a browser. They fall apart usually when loads of JavaScript is needed for the part of the page you want to exist. If it exists without the JavaScript having executed there is nothing to worry about. You can disable JavaScript in your browser to check. Or just yolo find the element and see what happens in python.
For python beautiful soup seems to be one of the more popular once. Never done this in python. I have always done this with nodejs because I’m assuming it also being JavaScript makes it easier to apply mocks for DOM APIs that are implemented by the browser normally. Not sure how beautiful soup does it. They might not or they use node under the hood or they went full ham and reimplemented JavaScript in python. Either way lots of people use python for this. However they do it chances are it works reasonably well.
If you need a browser for it selenium is not what you would use that is for testing. Something like puppeteer would be more fitting. But that is really a last resort when the other options dont work to get the CSV.