Possibly simple C problem in logrotate

The other day I ran into this issue with logrotate:

https://github.com/logrotate/logrotate/issues/105

I am a sysadmin, not a programmer and haven’t touched any C code in years. However solving this would provide a great improvement in the domain joined linux machines in my environment.

It seems like a simple enough solution for a C dev. I looked into it and found that it would require using something other than sscanf to parse the line.

I looks like it isn’t a high priority on the GitHub page, so I am sharing this here in case someone familiar with C is willing to take this on and contribute.

This is probably a bad idea:

int func(int **str, int **u, int **g)
{
	int rc = 0;
	int u_location = 0;
	int g_location = 0;

	bool ignore_space = false;
	int j = 0; // # of string
	for (int i = 0; ; i++) {
		switch ((*str)[i]) {
			case NULL: {
				if (j != 1)
					rc = 1;
				goto out;
			}
			case '\n': {
				if (j != 1)
					rc = 1;
				goto out;
			}

			case ' ':
				if (ignore_space == false) {
					if (j == 0) {
						(*u)[u_location] = '\0';
						j++;
						break;
					}
				}
				// else alow it to fall through
			
			default:
				if (!isprint((*str)[i])) {
					rc = 1;
					goto out;
				}
				if ((*str)[i] == '"') {
					ignore_space = !ignore_space;				
				}
				if (j == 0) {
					(*u)[u_location] = str[i];
					u_location++;
				}
				else if (j == 1) {
					(*g)[g_location] = str[i];
					g_location++;
				} else {
					rc = 1;
					goto out;
				}
		}
	}
out:
	if (j == 0)
		(*u)[u_location] = '\0';
	else if (j == 1)
		(*g)[g_location] = '\0';

	return rc;
}

So, fair bit of warning here, text extraction in C is pretty bad (even though scanf and it’s cousins needs to die in a fire). As for this specific usage of it, hmmm…

If I understand you correctly, the problem is that you have a space in your user/group and this is not recognized? This seems to be the failing logic:

/* Will fail on this config: create 0777 "my user" "my group"*/

static int readModeUidGid(const char *configFile, int lineNum, char *key,
			  const char *directive, mode_t *mode, uid_t *pUid,
			  gid_t *pGid)
{
	char u[200], g[200];
	unsigned int m;
	char tmp;
	int rc;

	if (!strcmp("su", directive))
	    /* do not read <mode> for the 'su' directive */
	    rc = 0;
	else
	    rc = sscanf(key, "%o %199s %199s%c", &m, u, g, &tmp);

	/* We support 'key <owner> <group> notation now */
	if (rc == 0) {
		rc = sscanf(key, "%199s %199s%c", u, g, &tmp);
		/* Simulate that we have read mode and keep the default value. */
		if (rc > 0) {
			m = *mode;
			rc += 1;
		}
	}

	if (rc == 4) {
		message(MESS_ERROR, "%s:%d extra arguments for "
			"%s\n", configFile, lineNum, directive);
		return -1;
	}

	if (rc > 0) {
		*mode = m;
	}

	if (rc > 1) {
		if (resolveUid(u, pUid) != 0) {
			message(MESS_ERROR, "%s:%d unknown user '%s'\n",
				configFile, lineNum, u);
			return -1;
		}
	}
	if (rc > 2) {
		if (resolveGid(g, pGid) != 0) {
			message(MESS_ERROR, "%s:%d unknown group '%s'\n",
				configFile, lineNum, g);
			return -1;
		}
	}

	return 0;
}

The problem here is that instead of getting owner=my owner and group=my group you get owner="my and group=owner", yes? This in turn results in one of three error messages:

myConfig:42 extra arguments for 'create'
myConfig:42 unknown user '"my'
myConfig:42 unknown group 'user"'

Simple workaround; never, ever use whitespace in usernames or groupnames. I assume this is not an option here though.

I would recode this entire function to use a custom char-by-char parser that breaks out each component into three separate strings in this case. Something like:

char conf_mode[10] = { 0 };
char conf_user[200] = { 0 };
char conf_group[200] = { 0 };
char param = 0;
char quoted = 0;
char offset = 0;

for (i = 0; key[i] != 0; i++) {
    if (key[i] == '"') {
        quoted = (quoted) ? 0 : 1;    // Toggle
        continue;                     // Do not copy
    }
    if (key[i] == ' '  && !quoted) {
        param += 1;
        offset = i;
        continue;
    }
    switch(param) {
        case 0: conf_mode[i-offset] = key[i]; break;
        case 1: conf_user[i-offset] = key[i]; break;
        case 2: conf_group[i-offset] = key[i]; break;
        default: key[i+1] = 0; break;    // Exit loop, ugly version
    }
}

[...]

// Put parameter sanity checks here

return 0;

So the issue is that LDAP/AD allows for a space in group names and I would like to implement a logrotate definition using one of these group names on our domain-joined Linux servers

I would estimate this to be a 100 hour job to implement properly for a medium skill C programmer (1-2 years as a pro). By “properly” I mean code of sufficient quality to upstream.

A rough breakdown is 10-20 hours to reliably trigger the bug and build test cases, 30-40 hours to reimplement the readModeUidGid function, 5 hours to make a proper patch and then 40 hours arguing with upstream that this will not break anything, get proven wrong and resubmit.

Use this info as you sant.

40 hours arguing with upstream that this will not break anything, get proven wrong and resubmit.

I hate how right you are about this part

A “more flexible parser” could be relatively simple

First replace a single call to sscanf with 1 for each of user, group, tmp.

Then check if last character is \ and of yes, call scanf again for good measure, to concatenate the strings.

However, then you’ll get into a fight with backslash using people complaining…
… There’s certainly a mathematical possibility to use repeated/double spaces as single spaces… or to url encode the user and group.

…so …

… a simple parser as suggested might just be an unescape function

e.g. introduce a new function maybe_scan_quoted, that calls a strncpy in case the string doesn’t start with " and returns the same number of characters read as scanf would (maintaining the scanf interface somewhat).

Then introduce another function scan_quoted(char* src, char* dst, int dstlen) that uses a while dstlen>0 loop with:
strcspn to find some delimiter (e.g. " or \)
if found then strncpy everything up to at most dstlen characters and then depending on the delimiter, if \, then copy character after, and if " copy \0 and you’re done, return the number of characters consumed…
… and if strcspn doesn’t find a delimiter, then you have an error return -1

This is very low level, lots of of line of code, looks like lots going on from a distance but not lots of CPU instructions and still not hard/overly smart. Parsers tend to be like that.

There’s lots of opportunity for off by one errors, it’s fidgety code / fun to write perfect for junior programmers or someone to get their workflow setup, and introduces no reliance on third party libraries.

(it’s horrible that we have config parsers everywhere; some stuff is old)