I personally know that you're not meant to type symbols straight into your HTML, but rather use the symbol entity codes. I.e. I've learned from having typed it so many times that '-' = -, I automatically use the entity code for it now.
Anyway, I've gotta work with about 1,000 or so files of static content, and I wanna know if there's a pretty neat script, front end or back end, that can fix any invalid entities. From having checked some of the files, I've noticed that they output the likes of canâ€™t, as an example. I will not be going through each file, searching line by line for invalid characters.
It's bad enough that nearly every other page has a unique layout, don't ask, whoever made the site before is clearly not a very good developer... I mean it's impressive that he didn't stick to a theme of some sort, but a good thing, personally, I think not.
May as well give it a go, aye? And the only thing I've thought of trying is different char sets, to be perfectly honest.
HOWEVER, and please don't ask me why, it actually hurts me head, they have some characters that, as far as I'm aware, are VERY odd characters to use for web content, an example being a collection of the following symbols:
.... I'm sure you get the picture around about now? .... I mean I know each symbol has its own meaning and purpose, but half the time, with this current site I'm working on, I just wonder why they don't use a graphic instead, like with the hot beverage symbol....
And just so no one's gonna be silly or childish, yes, I included a swastika, and no, that one is not a Nazi swastika but a Buddhist one (so I believe, may be Hindu, I'm not an expert on the matter, either way that one is actually a religious symbol).
As for the encoder, I just ran the demo, and it seems to work just fine. The only concern I have is if it'll work before the browser tries to output the HTML, cause when you view the source code, it'll even display in there as canâ€™t.
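For what it's worth, canâ€™t is what you get when the UTF-8 bytes for can’t are read as Windows-1252, which would explain why it shows up in the source too. If that's really what happened to these files (an assumption, one example isn't enough to be sure), a minimal Python sketch like this could reverse it. The names `fix_mojibake`, `fix_text`, `fix_file` and the `site` folder are made up for illustration, and it should be run on a copy of the files first:

```python
from pathlib import Path


def fix_mojibake(text: str) -> str:
    """Reverse UTF-8 text that was wrongly decoded as Windows-1252."""
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not this kind of damage, so leave the text untouched.
        return text


def fix_text(text: str) -> str:
    """Repair line by line so one unfixable line doesn't block the rest."""
    return "".join(fix_mojibake(line) for line in text.splitlines(keepends=True))


def fix_file(path: Path) -> None:
    """Rewrite one file in place (assumes the file on disk is UTF-8)."""
    original = path.read_text(encoding="utf-8")
    repaired = fix_text(original)
    if repaired != original:
        path.write_text(repaired, encoding="utf-8")


# Hypothetical usage, with "site" standing in for the real folder:
# for page in Path("site").rglob("*.html"):
#     fix_file(page)
```

A line that mixes damaged and already-correct characters is left alone by this sketch, so some files may still need a second look.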
I know this was not your actual question, but it's important to know that this is not correct.
You should always prefer the actual character, except in cases where that would be unclear/confusing (e.g., nonprinting characters) or where the character actually needs to be escaped in an HTML context (i.e., <, >, &, and sometimes ', ").
"htmlentity all the things" is from the days when no one had their sh*t together with Unicode support. It's not been necessary/advisable for decades: declare your character encoding properly and that's all there is to it.
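To make that concrete, here's a small Python sketch (my choice of language, nothing in the thread depends on it) using the standard library's `html.escape`. The helper names `escape_text` and `escape_attr` are made up for the example. Only the markup-significant characters get escaped; a curly apostrophe or any other Unicode character passes through as-is, which is fine as long as the page is saved as UTF-8 and says so, e.g. with `<meta charset="utf-8">` in the head.

```python
import html


def escape_text(value: str) -> str:
    """Escape text that goes between tags: only &, < and >."""
    return html.escape(value, quote=False)


def escape_attr(value: str) -> str:
    """Escape text that goes inside a quoted attribute: also " and '."""
    return html.escape(value, quote=True)


# escape_text("Fish & chips < £5, can’t complain")
#   gives: Fish &amp; chips &lt; £5, can’t complain
# escape_attr('say "hi"')
#   gives: say &quot;hi&quot;
```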
Long story short, the actual source code of this site is F$@KING DISGUSTING.... I wouldn't even write code this bad if I hated my life, I swear that this website was made by people who've just begun to learn HTML & CSS, like they've read a handful of articles on W3Schools or something, it's actually foul. This is by far the worst source code I've ever seen. I spent 1 day tidying up 1 file, it was thousands of lines long, BUT..... EVERYTHING... And I mean literally EVERYTHING was on one line, they had not used the enter button when writing this code, like at all....
I swear to god, jobs like this will cause me to smoke 50 fags a day at this rate. I mean I don't mind a pretty crappy file, full of br tags, empty divs, silly crap like that, I can handle that just fine, but literally ~5,000 lines of code, all inline, that's not even funny, my eyes are genuinely burning from trying to see what's f$@ked up and what isn't, some tags weren't closed correctly, some had wrong names, some inline css, all pretty disgusting things...
I assume you checked the line-ending characters? Most editors should recognize them anyway. I ask, since putting everything on one line requires an actual "effort" when editing files, or at least an automated process before sending them (and that alone is mean).
I honestly don't know why everything is inline, I mean I just can't see why on earth you'd do that. I mean I could even somewhat understand if it was a 'compiled' version, but this is legit the developer's files. Just why?
Thank F$!K I've just finished rebuilding all of the web pages. It has without a doubt been a painful process, and I can certainly say I've learned how NOT to web dev. I mean I've gone through 1,000 files within a week and a bit, and I'm honestly SO relieved now that it's all done, I can put that f$@king nightmare behind me and hopefully go onto projects where the previous developers have at least tried to make an effort!
It has been so painful that I built myself a tool to help me validate every individual page.
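This isn't the tool mentioned above, just a rough idea of what a check for the missing and extra closing tags could look like, sketched in Python with the standard library's `html.parser`. The names `TagBalanceChecker` and `check_html` are made up, and it's stricter than real HTML, which lets you leave off some closing tags like `</p>` and `</li>`:

```python
from html.parser import HTMLParser

# Elements that never take a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class TagBalanceChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []      # (tag, line number) of elements still open
        self.problems = []   # human-readable findings

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append((tag, self.getpos()[0]))

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if tag not in [open_tag for open_tag, _ in self.stack]:
            self.problems.append(f"line {self.getpos()[0]}: stray </{tag}>")
            return
        # Anything opened after the matching tag was never closed.
        while self.stack:
            open_tag, line = self.stack.pop()
            if open_tag == tag:
                break
            self.problems.append(f"line {line}: <{open_tag}> never closed")


def check_html(source: str) -> list:
    """Return a list of tag-balance problems found in an HTML string."""
    checker = TagBalanceChecker()
    checker.feed(source)
    checker.close()
    for open_tag, line in checker.stack:
        checker.problems.append(f"line {line}: <{open_tag}> never closed")
    return checker.problems
```

Calling `check_html` on each file's contents and printing whatever comes back would at least point at the pages that need a closer look.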
Oh no, I mean that was literally the developer version, before having been compressed or whatever. The compressed version is actually nicer, it has fewer tags. ... They actually sorted out their br tags in that one...
And I would've just used an IDE where it does that, but not only was the code foul to look at, but it was screwed up, closing tags missing, sometimes too many tags, etc, you get the picture? ... I mean I'd be impressed if there is an IDE that could both make it look neater and actually rearrange some of the tags? .... I think that's asking too much, but I don't actually know if such a thing exists personally?
BUT in general, an HTML beautifier would've been useful either way, I genuinely didn't think of that, but then again, it would've been annoying 'cause I'd basically have to use it every 5 seconds... .... Either way, it's done now!