My Stupid List Of Benchmarks, and no not fucking firestrike

I hate synthetic benchmarks so I started inventing my own ridiculous and useful benchmarks.

So in the past I have posted about hardware that I have gotten. This is mostly just to show what projects I might do, or to pose questions on the thread later on if I am really not getting something. Some of those threads have very weird benchmarks posted with them. These all test different aspects of what the machine is supposed to do and are rated against different machines. High-caliber machines get one kind, low-caliber another, and mid-range kind of get both but also have their own. On top of that, some are designed to show what I will be doing with the machine and how fast I can expect to be able to do that from boot, after closing an app, after opening another app, or after sitting idle for a minimum of 30 minutes (while the machine gets hot). We’ll start from low to high and go through what they do.

The AAAAAAAAAAAA Test:

The A test is applicable to any OS with an office suite. This is generally set up with Linux as I use that the most. After startup and basic system checks, I open an office suite (or install LibreOffice) and just hold down A. Could be any key, but A is just there. I keep a stopwatch open on my phone and the CPU graph open in the task manager. If the CPU spikes to 100%, I count the seconds between the spikes and count how many times it lags the text going into the document. If the lag time is high and the CPU graph constantly spikes, I generally go back to an older OS or trim the DE down to the bare essentials. In some cases trimming helps, like a lot.

Example: IBM 600X with a Pentium 3 MMX @ 650 MHz, 386 MB RAM, 1997 IDE bus. The average time for startup on a 7200 RPM drive was around 56 seconds (Xubuntu). Opening the office suite (LibreOffice) took 25-27 seconds, and while it loaded I held the A key. As soon as the AAAAA started I got 4 in, CPU spiked, then 17, spike, 15, spike, and after another 3 spikes (around 35-45 seconds) it smoothed out and I could type as fast as I wanted.
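If you'd rather log the CPU graph than eyeball it with a stopwatch, the spike counting can be sketched roughly like this. It is only a sketch: the function name `spike_gaps`, the 100% threshold, and the sample readings are all made up for illustration and are not taken from the 600X run.

```python
def spike_gaps(samples, threshold=100.0):
    """Return (spike_times, gaps) from (seconds, cpu_percent) samples.

    A spike starts when a reading reaches the threshold after being
    below it; consecutive readings at the ceiling count as one spike.
    """
    spike_times = []
    in_spike = False
    for t, cpu in samples:
        if cpu >= threshold and not in_spike:
            spike_times.append(t)
        in_spike = cpu >= threshold
    # Seconds between the start of one spike and the start of the next.
    gaps = [b - a for a, b in zip(spike_times, spike_times[1:])]
    return spike_times, gaps


# Hypothetical readings, one per second, while holding down A.
readings = [(0, 40), (1, 100), (2, 100), (3, 55), (4, 60),
            (5, 100), (6, 50), (7, 45), (8, 48), (9, 100), (10, 30)]
times, gaps = spike_gaps(readings)
print(times, gaps)
```

Short gaps that never go away are the "constantly spikes" case; gaps that get longer and then stop are the "it smoothed out" case.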

This test is based on the CPU, the FSB, RAM timings, and overall system performance. A Pentium M laptop (like my HP NW8000 for example) can be as fast as possible but doesn’t like office suites. It likes rendering graphics. So we do…..

The Youtube Test:

Yes I use YouTube as a benchmark. I have a CPU graph open while I load the YT web page. If it slams the ceiling then I know web browsing at all will be a problem. When a video loads I watch how long it takes for the entire video to buffer and test the different resolutions. I find that 360p is probably the best to do on any laptop up to 2008; after that 720p becomes more possible, and 1080p is the gold spot, obviously. Then after seeing how long it takes to load, I flip between the resolutions as fast as possible to see if the system lags (most of the time it does), and in the event that it does, how long it takes to recover. If it’s a newer system (Core 2 Duo P8*** and on) this test can be skipped as it doesn’t tell you anything.

The Load Test:

This one is simple: bog the system down with as many apps as possible. If you have very little RAM, under 512 MB, this test helps show if you need to manage what apps you have open and what you do with the system. If you have a system with 1-2 GB RAM then it will be more a test to see what all happens when you lag the system out. Different systems do different things, believe it or not. Past 2 GB of RAM this test is kind of pointless.

Example: IBM 600X, same specs as above. Boot, start fucking with the DE, just as much as possible. Hit every single button you can, jump around in workspaces, start opening terminals, then web browser windows, then office suites. This system specifically lags out with web browsers and more than 3 LibreOffice windows. Terminals don’t matter. Opening games up on top of all this (such as Minesweeper) will totally crash the machine and force a reboot. Fixing this just takes a bigger swap file.

HP NW8000, 1.67 GHZ Pentium M, 2 GB ram. Again, boot, hit everything then open stuff. Can handle 5 firefox windows and 5-7 libreoffice windows. Lags more from heat generation than anything else, but still becomes slow after not opening that much.

Another good test on top of this is….

The Pony Test:

This sounds stupid, it really does, but the Linux package qt-ponies and the Windows app Desktop Ponies are an amazing CPU benchmark. Essentially each pony you spawn has its own log of what it’s done on your screen, a rather large list of what it can do, and a bit of AI behind it to interact with other ponies on your screen. This test mostly shows how intense your CPU can get in the number of database pulls it can perform.

Example: Lenovo Think Center TS140, 3.6 GHz Xeon 1225v3 4c4t, 8 GB RAM. Can only handle 45 ponies on the screen before the system slows down. Adding more will incrementally kill the system till it crashes.

This test is good to show what your best hardware can do maxed out as well as show the limits of your lowest hardware.

What I like to call “The 3d Pile”.

Both CPU and GPU intensive, the 3D pile is mostly for Linux. I don’t really use this anymore as it doesn’t apply to many, if any, of my machines. But the gist of it is to open as many instances of an OpenGL game as possible, GLTron in my case, and run the demo in a loop in each. The last time I tested this was on a Pentium 4 (Eve actually) and I got to 14 instances I think? I may have run it on my Phenom 2 machine but it didn’t go much higher than that (9600PRO vs 250X).

Kdenlive Run:

Just like the benchmarks on any of the adobe suites, this is meant to show what the machine can do video editing wise. Generally I take a video with a camera as high def as possible, or a desktop recording actually, and roll that for 30 minutes. When I have that file, I’ll start adding as many effects as the system can handle and render the file out multiple times. The goal time for me is about 15 minutes at 45 filters and effects. For my hardware, this is a lot, especially at 1080p or 720p.

The VM Test:

I haven’t done this one because I can’t figure out GPU passthrough, but the test is the same as the tests above, just done through a passthrough VM of Windows. This shows how well the system handles everything going on in the VM as well as how well the VM software runs. Again, as I have not run this I cannot give an example, but imagine a can that you can add infinite pressure to, sitting inside of a bucket, and if it touches the sides of the bucket then you really need to stop putting things in the can.

The VM Pile Test:

Basically VM-Ception, this test is to see how many VMs you can run inside of each other. My record is 6 or 7 on my current Xeon machine. The goal is to just nest as many VMs as possible. The higher the better.

The Generic Tests:

Basically running the most demanding game possible on the system. Most of the time I use Rocket League and CS:GO to see the FPS, but sometimes I don’t have that option (older systems) so I’ll run something like Wizard101 or Second Life where there is always shit everywhere and there could be 500 people on the screen.

I’ll list one more because I am getting bored.

The Minecraft Pile Test:

Download Tekkit and start wiring computers up to each other. Then, start stuffing the giant system you made with commands. If Java crashes, you know how many Java requests can be made based on the amount of computers you built. I would recommend using IndustrialCraft2 for this as ComputerCraft tends to be a bit difficult. With this, you can normally stack 8 computers on each other and it will register as one system. When you add more to each other it will just mirror your request. When you get to about 20 it starts to put some stress on the JVM. The more of those computers you can tack on, the more ridiculous stuff you can do with Java on your system.

I’ll have more later I’m just bored now so expect an edit.

I now expect a site benchmarking everything using these

I have like 40 more I just got really bored of writing.

I'm so confused, what did I just read?

I dunno, that's your decision. I'm just here to laugh at you. haHAA

Back with this.

Built Ins: Often an OS in the "OtherOS" category has benchmarks that make sense for that system. I'm going to use Icaros Desktop as my main example here. Because Icaros is a constant build, always having very critical changes added, the benchmarks for the system not only show what works on the system, but if the build itself works completely. Some tests are:

Proc to Mem, Bus to Mem, Draw to Mem, Mem to GPU, HDD Bus Flush and Rewrite, and colorscale Draw.

It's easy to tell what each of these does:

Processor to Memory: pretty sure it's just memtest86+, but either way it's a memory test.

Bus to Memory: reads and writes between the different buses (USB, FSB, etc.) and is somewhat of a hardware readout as it goes down the list.

Draw to Memory: runs a small pack of Blender files and renders that are basically just a 3D ball or a couple of shapes, and increases the polygon count.

Memory to GPU: if you have a laptop like a Dell Inspiron B120 or Inspiron 1000, an early MacBook, or an Acer KAV10, this tests the Intel GPU caching into RAM and the VRAM the Intel chip has.

HDD Bus Flush: as AROS is the base system, it operates out of RAM most of the time and only needs 256MB of space to do most anything. When it needs to load off of the hard drive it needs to do so efficiently. This test stops the system, parks the HDD, flushes the HDD bus whether it's ATA or IDE (or SCSI), and starts to write some data (I think it's just the Icaros manual) through the bus to check that it reads everything properly, and then again to test the speed.

Colorscale: tests the range of color that the system can handle (i.e. Adobe RGB and other standards).

That's not all the benches in Icaros, there are hundreds that it runs, but those are the ones I know off the top of my head. They are good to run on an older system, any system really, to know if the hardware all works as it should. If you really want the performance scores to be as high as possible, you miss the point of the benches and of Icaros in total (do as much as possible in as little space as possible, with the benches being used more as HW checks).

=================

Banished:

The game Banished is not hard to run. As long as you have a CPU with SSE and MMX you can play Banished. If you had a fast enough Pentium 3 you could even play it. So why use it as a benchmark?

Banished is what I use to test Wine and to see what is playable on laptops and desktops. If Banished runs at 45 FPS or better, then you move up to TF2, but that's later on in this doc. Banished needs a DX9 compatible chip (though it can be run through DX8) and a processor that runs at 1.5 GHz. A good example of a system that Banished is a great test on is my NW8000 laptop (Pentium M 2 GHz and a FireGL T2 [128MB VRAM]). With the old processor in the system (1.67 GHz) the game ran at 5 FPS. The problem with this laptop (and any Pentium M laptop for that matter) is that the processors clock down, so instead of running at 1.67 GHz it ran at 1.3 GHz or lower. So, to fix that you have to take the base processor out and put a faster one in. At 2 GHz it only downclocks to 1.5 GHz, BUT it runs Banished at 15 FPS. The reason I do this is to show where the CPU will be limited and how well it will run full bore inside of that limitation.

And if you are wondering, just because the FireGL T2 is a modified ATI 9600 clocked to the same speed as the 9700, that doesn't mean it can't handle the game. It runs the same at the highest quality as it does at the lowest.

===================

USB Pile:

Basically I put a USB stick in and dd an image to it. Depending on the USB bus and the controller, the stick may or may not even boot. I have had systems before that would scramble the image after a dd session, but the same image written on another computer worked fine. This test also shows how degraded the USB bus is. If I have 2 NW8000s, one can still be faster than the other USB-wise. The target for me is 7 MB/s. If it can hit that, I don't care. If it goes lower than that, then the unit either will not be used for Linux stuff or something is severely wrong with it.

Also if it proves to be good at it, I'll keep adding more USB sticks and dd'ing more ISOs all at once. Some systems do well with this, some lag. When I do this I also run a YouTube video to watch when the system lags; normally a CPU monitor doesn't show what a dd session is doing to a system, at least not in Windows, so the YouTube video is used instead (this is beneficial on older systems).
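For anyone who wants to check a run against the 7 MB per second target without doing the division in their head, here is a minimal sketch. It assumes the target means megabytes per second with 1 MB = 1,000,000 bytes, and that you feed it the byte count and elapsed seconds that GNU dd prints when it finishes. The function names and the example numbers are made up for illustration.

```python
def throughput_mb_s(bytes_written, seconds):
    """Average write speed in MB/s, with 1 MB = 1,000,000 bytes."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return bytes_written / seconds / 1_000_000


def meets_target(bytes_written, seconds, target_mb_s=7.0):
    """True if the run averaged at least the target speed."""
    return throughput_mb_s(bytes_written, seconds) >= target_mb_s


# Hypothetical run: a 700 MB image written in 90 seconds.
speed = throughput_mb_s(700_000_000, 90)
print(f"{speed:.2f} MB/s, pass: {meets_target(700_000_000, 90)}")
```

When writing to several sticks at once, run it once per stick, since each dd session reports its own byte count and time.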

====

Will have more in a while I need to attend to something.

Back now.

Fraps Test:

If you're like me 4 years ago and want to start a YT channel but only have some very low end laptop that you can use, then you might wanna do this test first. FRAPS is known to be one of the best recorders in the industry and if you intend to play games and record them you might buy a FRAPS license. When I started my channel I had a Lenovo B575 (E450 APU) and not much else that was any more powerful. So to test, I put some game on, probably TF2, and left fraps to fill an external hard drive.

The recording out is what you want to know. If you hold the FPS mark you want, for me it was 25 to 30, then that part doesn't matter. Because of the APU that I had, each game had its own problems being recorded. If a game was too intense then the recording would be glitchy and textures would stick.... It was a mess. If it looks like what you expect, then it's probably fine to run a YT channel off that turd you have.

===========

TF2:

Yup. This game runs on everything, sure, but it's been known to be CPU intensive at times. For me I use this as a CPU and laptop test (my desktop can run this fine). You're just looking for an FPS mark. Lowest hardware I have used with TF2 and not had problems with it was a Pentium 4 (socket 478 2.8 GHz) with 2 GB RAM and a 9600 Pro. Ran great, actually. But different hardware configs have different results. TF2 offline on my NW8000 runs great, online is shit.

===========

Uhhhh, running out now. That's all of my main marks already listed, the rest are minuscule ones I barely do or that don't matter, deprecated ones, or ones I don't often need to do because they are tooled the wrong way. Example:

GIMP Renders:

Back when all I had was a Pentium M and a 2 GHz P4, I used GIMP because I was into photography. I also made signs for school clubs I was in because most kids at my school had access to the computers there and nothing at home. So, any time I got a new Pentium 4 upgrade I did 5 tests to make sure I could do all that I needed to do when I needed to do it.

Scale, Transform, Blow Up, Shrink, Colorgrade.

I needed those tools no matter what, on top of a short render time. Often, because I was in XP, I didn't have the speed I normally would want. So it could take something like an hour to render a big enough image to be put on a banner, not to mention the little things up there that are listed. As long as those little blips could be done in under 7 minutes, it'd be great. Then, the render would need to be under an hour if possible (in HS I was constantly working on my grades so I didn't have much time to fool around most days, even Xmas break and other holidays I was working on an online class). Time is the point here, shorter = better. This test only applies to XP as a linux system would later do this stupid fast on the same hardware platform.

====

WGet Pulls:

At one point I didn't even have a Pentium M. I had an IBM 600X with a 650 MHz processor and 386 (now maxed to 544) MB RAM. It ran Xubuntu because it was the only system I could delete the audio stack from (the speakers would get a static shock and freak out or something. That has since stopped, but I still have no idea what caused it). If I had a school project I needed to download I had to use wget or I was fucked.

If you know this laptop you know it has no built-in wifi, so I had to scrape some PCMCIA cards together. I would do a wget pull on an Ubuntu ISO or something, something over 1GB, and test each card for its speed. Some cards were faster than others. The more download speed, the better.
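Comparing the cards comes down to dividing the same file size by each card's download time and sorting the results. A rough sketch, where the function name `rank_cards`, the card labels, and the timings are all hypothetical and only there for illustration:

```python
def rank_cards(iso_bytes, seconds_by_card):
    """Return (card, MB/s) pairs sorted fastest first.

    MB here means 1,000,000 bytes.
    """
    speeds = {card: iso_bytes / secs / 1_000_000
              for card, secs in seconds_by_card.items()}
    return sorted(speeds.items(), key=lambda item: item[1], reverse=True)


# Hypothetical timings for the same 1.2 GB download on three cards.
timings = {"card_a": 2400, "card_b": 1500, "card_c": 4000}
ranking = rank_cards(1_200_000_000, timings)
for card, speed in ranking:
    print(f"{card}: {speed:.2f} MB/s")
```

Using the same file from the same mirror for every card keeps the comparison about the card rather than the server.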

====

SystemD boot vs OpenRC:

This isn't used anymore but when I was playing with systemd initially I was used to OpenRC as my boot. This test was to figure out which would boot faster on the systems I had at the time, which was probably still that 600X. The goal was 1 minute, SystemD often took longer.

====

Cities Skylines:

I use this game as a bench on hackintoshes. As I only have a few of them, I don't do it often. Just an FPS test, but the game is intense enough to be valid.

====

Emulation:

Back in high school I didn't have many games that I owned for PC. Most were from the library. But I did have emulators and ROMs. This test was for ZSNES most of the time. I would just see how accurate the emulation was really, nothing amazing. If the game ran smooth and had good sound, it was a good game to emulate. I later tested ePSXe and DeSmuME on the hardware I had, running games like Spyro and Mario Kart DS (my DS died and I wanted to play Mario Kart lol).

====

I guess my last one I have that is of any use is Skype:

In HS and late MS, I used skype a lot. Talk to friends and teachers and family. If my system at the time could run skype, then it was probably a good system for most things. I checked for call quality and later video quality when I had a camera.

===========================

So that's about it. Most of these are tests for more low end than high end, but they are still valid I think. Most people will just say "Buy a new thing" but often people can't and they scrap. So, these are for those people.

If you use these for your machine, I wish you luck.

Ponies :D

I love it, all PCs should now have a pony mark.

Mine can do 45, so if you have a better processor, in theory, you should be able to have more ponies.

410 ponies on screen and no lag. 8350, you are a boss.

I had something come up recently that I thought was interesting and may include. I found Vector linux, a distro based off slackware that uses LiLo and openRC (all of my yes) that boots into a cli mode and tells you your top10 apps AND commands after login. So, with that, it might make for a neat low end bencher for the kernel boot and load. As soon as you are logged in on the CLI and it lists the top10's the timer can be stopped, and it starts after LiLo selection.

I dunno, popped into my head today though.

I'm adding to the list.

F@H:

Using this as a bench because it, like the pony test, will allow me to compare numbers directly. If chips are similar then the performance numbers should be around the same.

Will edit later. I have more written down, but it seems I have misplaced the sheet.

Adding a new one thanks to @chiefshane
