Automating image conversion and html generation with Bash

This thread is for development of a small script that will automate maintenance tasks for a gallery-centered website. A byproduct of this is getting more proficient in Bash.

A little background. I’ve built a fully responsive website without any CMS or static site generator. Currently it has nearly 100 images, but I expect it to grow over time. Unfortunately, the lack of any automation for repetitive tasks is slowing me down. The most time-consuming things are image preparation/conversion/resizing (1.) and pasting image details into an HTML file (2.).

Conditions:

  1. The website uses a srcset-based library (lazysizes) to serve the image closest in size to what is needed in the browser viewport. I was able to shave up to 80% off the page size (and improve loading speed) when viewing it on certain mobile devices. This, however, requires having images in many different sizes available on the server. I’ve run some tests over a couple of months with a sub-optimal setup of 5 different sizes, and in some scenarios I can still reduce page size by over 40%. After listing all major breakpoints I came up with around 14-16 sizes that are very close to the actual image size displayed on the screen.
    Previously I used actions and batches in Photoshop to automate this, but now I’m on Linux and I want to build something that will do this quicker and better. I want to use ImageMagick for this part.

  2. The second part is related to how the images are presented on the website. Every picture has a visible structured description (date of creation, title, etc.) and some non-visible attributes like alt or certain classes. The gallery code overall has a very clean and consistent DOM structure with very few exceptions across different gallery sections.

All of this got me thinking about how to combine both operations in one script, and I came up with the following plan of attack:

TODOs:

  1. DONE Prepare high-resolution images in one directory.
  2. DONE For every image, write a small txt file with all repetitive attributes, whether visible on the website or not, like artwork description, title, year, alt, id, classes, etc.
  3. DONE Write a script that converts, resizes and saves all images to a desired location.
  4. DONE Write a second script that takes all image filenames and corresponding descriptions and combines them into HTML output that can be copy-pasted into the website code.

Prerequisites:

  • DONE investigate ImageMagick export and file naming options
  • DONE extract and prepare html template, and list all exceptions (different classes, etc).
  • learn how to write basic logic in Bash and Python
  • learn how file output works in Bash and Python

Features:

  • jpg/png autodetection - all jpgs are to be converted to webp, pngs need just to be exported in different sizes
  • converted images should end up in a new /media/ directory
  • attributes defined in input image.txt should be added in specific locations in html
  • html output should be pre-formatted with desired indentation
  • html output file should contain code for all images present in a directory in alphabetical order

Example image.txt file content:

* image1
* background-color-blue
* Image1 alt attribute.
* cover
* series-1
* Image1
* 2020
* 20 x 30 in
* Lorem ipsum dolor sit amet.

Example partial output:

<body>

<div id="image1" class="slide background-color-blue">
	<div class="img-container">
		<img src="https://my-domain.com/media/series-1/image1-300.webp"
			sizes="100vw"
			srcset="
				https://my-domain.com/media/series-1/image1-300.webp 300w,
				https://my-domain.com/media/series-1/image1-400.webp 400w,
				https://my-domain.com/media/series-1/image1-500.webp 500w,
				https://my-domain.com/media/series-1/image1-600.webp 600w,
				...
				https://my-domain.com/media/series-1/image1-1600.webp 1600w,
				https://my-domain.com/media/series-1/image1-1920.webp 1920w,
				https://my-domain.com/media/series-1/image1-2560.webp 2560w"
			alt="Image1 alt attribute." class="cover" loading="lazy">
	</div>
	<div class="overlay-button noslide" onclick="toggleOverlay(event)"></div>
	<div class="overlay noslide overlay-hide overlay-work">
		<p class="title"><i>Image1</i></p>
		<p class="year">2020</p>
		<p class="dimensions">20 x 30 in</p>
		<p class="description">Lorem ipsum dolor sit amet</p>
	</div>
</div>

<div id="image2" ... >
...
</div>

</body>
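
A minimal sketch of how the srcset lines in the example output could be generated in Python. The function name build_srcset, its parameters and the shortened widths list are assumptions for illustration, not part of the original plan; the URL pattern is copied from the example above.

```python
def build_srcset(base_url, series, image_id, widths, ext="webp", indent="\t\t\t\t"):
    """Return the srcset attribute value with one candidate per line."""
    entries = [
        f"{indent}{base_url}/media/{series}/{image_id}-{w}.{ext} {w}w"
        for w in widths
    ]
    # Candidates are separated by commas; the last one gets no trailing comma.
    return ",\n".join(entries)


# Hypothetical subset of widths, only for demonstration.
widths = [300, 400, 500]
srcset = build_srcset("https://my-domain.com", "series-1", "image1", widths)
print(f'srcset="\n{srcset}"')
```

Keeping the widths in one list means the same list can later drive both the image conversion and the HTML output.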

If you have some interesting or useful tips, or if you just want to comment feel free to post here.

@ me if you hit a wall with imagemagick. It’s been a while but I have used it to do conversions in the past.

Sounds like a fun project!
My two cents: I think something like php would be better suited for this job.

  • Php makes it easier to render/generate the html you want than bash would.
  • The text file you mentioned seems to be acting like a database. Why not use an actual database?
  • ImageMagick is available as a php library.
  • The autodetection of new images can be done as a php script added to cron.

This might have already been obvious to you, but I just wanted to throw this out there just in case.

Sounds interesting… I was also looking into Bash-generated HTML for some personal stuff, but I found text parsing to be a PITA…

With sed, for example, you have to escape every other character that is standard in HTML but special for sed…

Did you use something for that that’s a little more manageable?

Fun and useful!

Thanks for the tips. Too bad I’m not a developer nor CS graduate. Not even an IT janitor haha. I’m just very, very, very, VERY lazy.

Lazy and cheap.

I don’t know PHP and haven’t worked with databases before, so this is complete overkill. Learning both would require a lot of my time and would add a lot of overhead. At that point it would be faster for me to do all of this by hand, which negates the reason for going PHP + db.

The website is a personal project and so far I don’t see it growing beyond 200-250 images in the next 3-5 years. Also, I don’t want to generate anything dynamically on a server or make database requests, etc. The less the server is doing, the better for my wallet.
All I need is to generate static code on my local machine that can be pasted ‘caveman style’ into an HTML file. I really don’t see any benefits of running a database and generating code at runtime for a static image gallery, unless we are talking thousands of pictures and a professional-tier project. I’m fine with my website only looking professional :^)

On another note - I haven’t yet found good, FOSS, local database software that can handle both text and images and is easy to set up. Granted - I didn’t look very hard.

Bottom line - PHP hard, db hard, Bash good, page fast, server cheap, Grug happy.

ooga-booga

Some time ago I made a couple of scripts for automating post-install steps in Linux and yeah. It wasn’t pretty. But I’ll find a way. I trust my laziness with this one.

Not so far. But will look around.

I’ve done some work.

First I checked how string operations look in a couple of languages. Python caught my eye with its very nice f-strings. They are very easy to write and read, @nbm

myfoo = 'Foo'
mybar = 'Bar'

message = f'Lorem {myfoo}, {mybar}. Ipsum!'
print(message)

I haven’t checked yet how to efficiently pass data extracted from a file as variables, but it shouldn’t be that hard, I guess.
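
A rough sketch of how the lines of an image.txt could be turned into named variables and fed into f-strings. The field names in FIELDS and the function parse_image_txt are assumptions; only the order of the lines is taken from the example image.txt in the first post.

```python
# Field order follows the example image.txt; the key names are assumed.
FIELDS = [
    "id", "background", "alt", "img_class", "series",
    "title", "year", "dimensions", "description",
]


def parse_image_txt(text):
    """Map each '* value' line of an image.txt to a named field."""
    values = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip empty lines
        if line.startswith("*"):
            line = line[1:].strip()  # drop the leading bullet
        values.append(line)
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} lines, got {len(values)}")
    return dict(zip(FIELDS, values))


sample = """\
* image1
* background-color-blue
* Image1 alt attribute.
* cover
* series-1
* Image1
* 2020
* 20 x 30 in
* Lorem ipsum dolor sit amet.
"""

meta = parse_image_txt(sample)
overlay = (
    f'<p class="title"><i>{meta["title"]}</i></p>\n'
    f'<p class="year">{meta["year"]}</p>'
)
print(overlay)
```

For a real file, the text could come from pathlib.Path("image1.txt").read_text() instead of the inline sample.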

If this is true I think I’ll go with Python. Though I still want to use Bash to integrate image processing and HTML generation. It will be easier to do the ImageMagick stuff in Bash, and I can call a Python script after the conversion is done.

First attempt at making architecture layout. List of all needed metadata / variables structured by file.

SERIES FILE (series_name as file name):
	- html_id
	- background_color_class
	- font_class
	- h3_description
	- h4_description

FOTO SERIES-OVERLAY FILE (series_overlay_description_foto as file name):
	- series_overlay_foto_alt

FOTO SERIES-SPLASH FILE (series_splash_foto as file name):
	- series_splash_foto_alt

FOTO WORK FILE (series_name_tag at the beginning of the file name followed by work_id):
	- slide_background
	- foto_module_type
	- work_title
	- work_technique
	- work_year
	- work_dimensions
	- work_alt

FOTO WORK-OVERLAY FILE (work_overlay_foto as file name):
	- work_overlay_alt

And the pseudocode with logic. I didn’t put any HTML in it because I wanted to better organize the structure. Never mind the syntax :joy:.

foto_module_type_one() {
	src-set;
	alt;
}

foto_module_type_slider() {
	src-set;
	alt;
	src-set;
	alt;
}

foto_module_type_quad() {
	src-set;
	alt;
	src-set;
	alt;
	src-set;
	alt;
	src-set;
	alt;
}

main() {
	for series_name[] {
		series_comment_header;
		series {
			html_id;
			background_color_class;
			font_class;
			series_name;
			series_overlay {
				font_class;
				h3_description;
				h4_description;
				foto_module_type (series_overlay_description_foto) {};
			}
			series_splash_foto {
				src-set;
				alt;
			}
		}
		for work_id[] with same series_name_tag {
			work_id;
			slide_background;
			foto_module_type(work_overlay_foto) {};
			work_title;
			work_technique;
			work_year;
			work_dimensions;
			work_foto {
				src-set;
				alt;
			}
			empty line;
		}
		empty line;
	}
}

Missing things:

  • logic for foto_module_type; this will be hard for me because I don’t know what tool to use here. The foto_module_type should be used when a valid foto_module_type name is specified in work.txt or series.txt. Then the module should find the corresponding image file(s) in the specified directory based on the work_id or series_name prefix in their names.
  • image file naming convention and directory structure. The latter should be identical locally and on the server.
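
One possible way to handle the foto_module_type logic, sketched in Python. The dictionary MODULE_IMAGE_COUNT, the function images_for_module and the demo file names are all hypothetical; the image counts (one = 1, slider = 2, quad = 4) are read off the pseudocode above.

```python
import tempfile
from pathlib import Path

# Assumed mapping from module type name to the number of images it needs.
MODULE_IMAGE_COUNT = {"one": 1, "slider": 2, "quad": 4}
IMAGE_SUFFIXES = {".jpg", ".png", ".webp"}


def images_for_module(directory, prefix, module_type):
    """Return the sorted image files for a module, or raise if they do not fit."""
    if module_type not in MODULE_IMAGE_COUNT:
        raise ValueError(f"unknown foto_module_type: {module_type!r}")
    found = sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.name.startswith(prefix) and p.suffix in IMAGE_SUFFIXES
    )
    expected = MODULE_IMAGE_COUNT[module_type]
    if len(found) != expected:
        raise ValueError(f"{module_type} needs {expected} image(s), found {len(found)}")
    return found


# Demonstration with throwaway files; the names are made up.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("series1-work1-a.jpg", "series1-work1-b.jpg", "series1-work2.jpg"):
        (Path(tmp) / name).touch()
    slider_files = [p.name for p in images_for_module(tmp, "series1-work1", "slider")]
    print(slider_files)
```

A plain prefix match like this would also pick up series1-work10 when asked for series1-work1, so the naming convention would need a separator after the work_id.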

Started the Bash groundwork. The images are stored in /images/ and are organized in a tree that looks like the one below. The script copies the whole /images/ tree, without files, into /media/. So it looks like this:

[image: dirs]

/images/ contains all high-res images and their work.txt files.
/media/ should have only resized and converted images, so it can be easily and safely sent to the server.

Some cleanup first:

# go to ~/my_path/images directory
cd ~/my_path/images/ || exit 1
# copy directory structure from /images to ~/my_path/media
# (no space before {}; GNU find expands {} inside an argument)
find . -type d -exec mkdir -p -- ~/my_path/media/{} \;
# rename all files to lowercase (Perl rename; -execdir so only the file name changes)
find . -type f -execdir rename 'y/A-Z/a-z/' {} \;
# change underscores to hyphens in all file names
find . -type f -execdir rename 'y/_/-/' {} \;

Next, the main part. So far I listed all exceptions with simple find checks. Some files will have wide- at the beginning of the name and I want to resize them differently than the others. JPGs are to be converted to webp, PNGs - not.

# JPGs without "wide-" wildcard
find . -type f -name "*.jpg" ! -name "wide-*" -exec bash -c mogrify ... {} \;
# JPGs with "wide" wildcard
find . -type f -name "*.jpg" -name "wide-*" -exec bash -c mogrify ... {} \;
# PNGs without "wide" wildcard
find . -type f -name "*.png" ! -name "wide-*" -exec bash -c mogrify ... {} \;
# PNGs with "wide" wildcard
find . -type f -name "*.png" -name "wide-*" -exec bash -c mogrify ... {} \;

@oO.o you predicted me hitting a wall with ImageMagick. And I did.

So far I can convert, resize and move image to new directory but only one at a time. This would work for all sizes but I would need to make 16 different routines for every exception listed above:

mogrify -path ~/my_path/media/ -resize 800 -format webp *.jpg
mogrify -path ~/my_path/media/ -resize 900 -format webp *.jpg
mogrify -path ~/my_path/media/ -resize 1000 -format webp *.jpg
...

So I started to look how to convert to multiple sizes at once. And this is what I found:

for i in 300 400 500 600 700 800 900 1000 1250 1400 1600 1920 2560 3000; 
	do mogrify -resize "$i" -write test"$i".webp *.jpg; done;

But the code above only partially works. I can’t make it work for multiple files, or output them to a different location.
I tried to nest this in another loop executed for every file, but I don’t know how and where to pass the filename so the output is not overwritten every time. I also tried to put -path with a destination directory in mogrify, but it’s ignored and the files are saved in the source folder, although it works with a single resize for multiple files.

The last problem is making all this work recursively for many directories inside /images/ and saving the output in the corresponding /media/ directory.
So far I was able to convert all images from /images/ recursively and put them in the /media/ dir, but not in their desired subfolders. I did it with this:

find . -type f -name "*.jpg" -not -name "wide-*" -exec mogrify -path ~/my_path/media/ -resize 300 -format webp {} \;

Making progress.
This script takes files from child directories of /images/, resizes them, converts them to webp, adds the size to their names and saves them in the respective child directory in /media/.

I still can’t wrap my head around recursion and how to apply this operation to directories upper in the tree.

destination=~/my_path/media ;
for source in ~/my_path/images/*/ ; do
	# makes variable from current dir name - to be passed in destination path
	(cd "$source" && local=$(basename "`pwd`") &&
		# makes variable with selected files
		files="$(find . -type f -name "*.jpg" -not -name "wide-*")"
		for filename in $files; do
			basename=$(basename "$filename" .jpg)
			# converts images, adds size to filenames, and saves in destination directory
			convert "$source"/$basename.jpg -resize 300 "$destination"/"$local"/$basename-300.webp
			convert "$source"/$basename.jpg -resize 400 "$destination"/"$local"/$basename-400.webp
		done
     )
done
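
For the recursion part, a hedged Python sketch of one way to walk the whole /images/ tree and derive the matching /media/ path for every output file. It only builds the convert command lines and does not run ImageMagick. The function plan_conversions, the shortened WIDTHS list and the demo file names are assumptions, and files with the wide- prefix are not treated differently here.

```python
import tempfile
from pathlib import Path

WIDTHS = [300, 400]  # shortened on purpose; the thread uses more sizes


def plan_conversions(images_root, media_root, widths=WIDTHS):
    """Return one ImageMagick `convert` argument list per output file."""
    images_root, media_root = Path(images_root), Path(media_root)
    commands = []
    for src in sorted(images_root.rglob("*")):  # rglob descends into all subdirectories
        if not src.is_file() or src.suffix not in {".jpg", ".png"}:
            continue
        # JPGs become webp, PNGs stay PNG.
        out_ext = ".webp" if src.suffix == ".jpg" else ".png"
        # Same sub-path below media_root as below images_root.
        out_dir = media_root / src.parent.relative_to(images_root)
        for w in widths:
            dst = out_dir / f"{src.stem}-{w}{out_ext}"
            commands.append(["convert", str(src), "-resize", str(w), str(dst)])
    return commands


# Demonstration on a throwaway tree with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    images, media = Path(tmp) / "images", Path(tmp) / "media"
    (images / "works" / "series1").mkdir(parents=True)
    (images / "works" / "series1" / "a.jpg").touch()
    (images / "b.png").touch()
    planned = plan_conversions(images, media)
    outputs = [Path(cmd[-1]).relative_to(media).as_posix() for cmd in planned]
    print(outputs)
```

To actually run a planned command, one could create the parent directory of the output file and pass the list to subprocess.run(cmd, check=True).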

you could clean this up a bit with a variable and an array.

ARRAY=(300 400 500 600 700 800 900 1000 1250 1400 1600 1920 2560 3000)

for i in "${ARRAY[@]}"
	do mogrify -resize "$i" -write test"$i".webp *.jpg; done;

This way if you need more elements in the array you can just add them. Slightly more elegant…

Ohh, thanks. That will be very useful indeed for iterating through sizes.

I probably won’t use mogrify because it cannot append anything to the filenames. That’s why I went with convert in the last code snippet. So I might go with:

ARRAY=(300 400 500 600 700 800 900 1000 1250 1400 1600 1920 2560 3000)

for i in "${ARRAY[@]}"
	do convert "$source"/$basename.jpg -resize "$i" "$destination"/"$local"/$basename-"$i".webp
done

Checked, and it’s working great.

More cleanup so you can do the translation in one command.

find . -type f -execdir rename 'y/A-Z_/a-z-/' {} \;

Iirc, mogrify will run async/in parallel to some degree, so depending on your performance needs, you might want to separate naming from conversion for that reason.

You can combine these into one command. Start by adding -o, turning these:

# JPGs without "wide-" wildcard
find . -type f -name "*.jpg" ! -name "wide-*" -exec bash -c mogrify ... {} \;
# PNGs without "wide" wildcard
find . -type f -name "*.png" ! -name "wide-*" -exec bash -c mogrify ... {} \;

  into this 

find . -type f \( -name "*.jpg" -o -name "*.png" \) ! -name "wide-*" -exec bash -c mogrify ... {} \; 

The escaped parentheses are needed because -o binds more loosely than the implicit AND between the other tests.

there’s a use for an if here, so something like

if -name "wide-*" then -exec bash -c mogrify ... {} \;  fi

so you might have to test that out if you like.
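
A small Python model of why grouping matters when -o is mixed with other tests in find. In find, the implicit AND between tests binds tighter than -o, and -exec is itself part of the expression, so without parentheses the action only belongs to the last branch. Python's and/or have the same relative precedence and the same short-circuit behaviour, which is what this sketch relies on. The function run_filter and the file names are made up for illustration.

```python
def run_filter(names, grouped):
    """Return the names that would reach the action under each grouping."""
    processed = []

    def exec_action(name):
        # Stands in for -exec: records the name and succeeds.
        processed.append(name)
        return True

    for n in names:
        is_jpg, is_png = n.endswith(".jpg"), n.endswith(".png")
        not_wide = not n.startswith("wide-")
        if grouped:
            # \( -name "*.jpg" -o -name "*.png" \) ! -name "wide-*" -exec ...
            _ = (is_jpg or is_png) and not_wide and exec_action(n)
        else:
            # -name "*.jpg" -o -name "*.png" ! -name "wide-*" -exec ...
            _ = is_jpg or is_png and not_wide and exec_action(n)
    return processed


names = ["a.jpg", "wide-b.jpg", "c.png", "wide-d.png"]
print(run_filter(names, grouped=False))
print(run_filter(names, grouped=True))
```

In the ungrouped form a JPG already satisfies the left side of the OR, so evaluation stops before the action is reached; only the non-wide PNG gets processed.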

Don’t really care, I will convert once a month tops, so even if there is a slight delay, I’m ok with it. And I’ve got a Threadripper.

The biggest problem for me right now is how to invoke the for loop recursively for /images/ and export to the matching dir in /media/.
I can make this work manually since there is only one folder, /works/, with nested content, but this is not very elegant and if any new dir appears the script will break.

I’m afraid this won’t work, as JPGs are to be converted to webp and PNGs stay PNG.

Oh ok, I couldn’t tell because that portion of your code is obfuscated. Left me guessing. A little refactoring, adding some logic and maybe making this a function instead would work.

Well, I’m not too experienced with bash scripting, but if this were python you could use a dictionary {key: value} :


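import os
import shutil

my_dictionary = {
    "/images/series-overlay/": "/media/series-overlay",
    "/images/series-splash/": "/media/series-splash",
    "/images/works": "/media/works",
    "/images/works/series1": "/media/works/series1",
}

# .items() yields (key, value) pairs; iterating the dict alone yields only keys
for frm, to in my_dictionary.items():
    if not os.path.exists(frm):
        continue  # skip sources that are missing or were already moved with a parent
    new_dir = os.path.dirname(to)  # parent directory of the destination
    os.makedirs(new_dir, exist_ok=True)  # create it if it does not exist
    shutil.move(frm, to)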
    

and you could recurse through the {keys: } to invoke the mv command on the { : value}

EDIT: I guess you can do this in bash by using declare -A and declaring a hash map, but the syntax is different.

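#!/usr/bin/env bash

# declare -A is required; without it the array is indexed, not associative
declare -A my_dictionary
my_dictionary=( ["/images/series-overlay/"]="/media/series-overlay" ["/images/series-splash/"]="/media/series-splash" )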

I’m just throwing out ideas for you; since I haven’t seen your code, I can’t offer much more.

If you want to stick with shell, can you just derive the output file from the source with sed?

echo "$current_path" | sed 's|/images/|/media/|'

Or something similar…

Thanks for help!

Sorry, will post it as soon as I solve the directory recursion.

This is useful. Your example is meh as it implies manual work to fill the key and value, but I’ll try to automate it.

Grug Thank.
I went a slightly different route. With the code below the script always returns absolute paths, so I don’t need to pay attention to where the entire project folder is located.

# get absolute paths of /images/*
find ~+ -type d
# get absolute paths of /media/*
find ~+ -type d | sed s/images/media/g

And since both outputs match each other, I’m thinking of making a mapfile from each one, and then combining them kinda like this, but I don’t know if this will run or if it is even valid syntax.

my_trees=( ["${#mapfile_images[@]}"]="${#mapfile_media[@]}" )
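
For comparison, a sketch of the same pairing idea in Python, where two parallel lists can be combined with zip. One thing worth knowing about the Bash line above: ${#array[@]} expands to the number of elements, not to the elements themselves. The root paths here are hypothetical, and my_trees is reused only as a name.

```python
from pathlib import PurePosixPath

images_root = PurePosixPath("/home/user/my_path/images")  # hypothetical location
media_root = PurePosixPath("/home/user/my_path/media")    # hypothetical location

# Stand-in for the output of `find ~+ -type d` run inside /images/.
images_dirs = [
    images_root,
    images_root / "works",
    images_root / "works" / "series1",
]

# Re-root every directory: keep the part below images_root, prepend media_root.
media_dirs = [media_root / d.relative_to(images_root) for d in images_dirs]

# Pair the two lists element by element.
my_trees = {str(src): str(dst) for src, dst in zip(images_dirs, media_dirs)}
for src, dst in my_trees.items():
    print(src, "->", dst)
```

Because the second list is derived from the first, the two always stay in the same order, which is what the pairing depends on.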

I haven’t tried the mapfile idea yet, and I have two problems with mapfile. I found the two examples shown below, and I don’t understand some parts. What are '' and $'\0' doing exactly in that context? Are they part of the -d delimiter?
The second question is about -print0. I can’t find anything about it.

I don’t want to run blindly code found on the internet without understanding what it’s doing.

mapfile -d $'\0' ARRAY < <(find . -name "${input}" -print0)
mapfile -d '' ARRAY < <(find . -name "$input" -print0)
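
As far as I know, -print0 makes find end every file name with a NUL byte instead of a newline, and an empty -d argument tells mapfile to split on NUL. Bash strings cannot hold a NUL byte, so $'\0' ends up as an empty string and both lines above should behave the same (the -d option needs a reasonably recent Bash, 4.4 or later as far as I remember). The point of all this is that file names may contain newlines but never NUL. A small Python simulation with made-up names, not calling find at all:

```python
# Two made-up file names; the second one contains a newline on purpose.
names = ["./works/a.jpg", "./works/odd\nname.jpg"]

# What `find -print` would emit: every name followed by a newline.
print_output = "".join(n + "\n" for n in names)
# What `find -print0` would emit: every name followed by a NUL byte.
print0_output = "".join(n + "\0" for n in names)

# Splitting on newlines (plain mapfile) tears the second name apart.
by_newline = print_output.split("\n")[:-1]
# Splitting on NUL (mapfile -d '') keeps every name intact.
by_nul = print0_output.split("\0")[:-1]

print(len(by_newline), by_newline)
print(len(by_nul), by_nul)
```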

Probably doesn’t matter but I would take out the /g in the sed statement. You don’t want the substitution to happen more than once per path.

Yeah. I don’t use images in series or works names, but better safe than sorry.

If for no other reason, if I’m reading your code and see the /g, that tells me explicitly that you are expecting to have multiple replacements per line which would confuse me. Also might run a nanosecond faster, haha.
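
To make the difference concrete, a tiny Python illustration of replacing once versus everywhere, which is what the g flag controls in sed. The path is made up so that "images" appears twice.

```python
path = "/home/user/images/works/images-of-the-sea"  # made-up path

# Replace only the first occurrence, like sed 's/images/media/'.
once = path.replace("images", "media", 1)
# Replace every occurrence, like sed 's/images/media/g'.
every = path.replace("images", "media")

print(once)
print(every)
```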
