Perl / sed / grep regex trouble (didn't use Python)

Hi,

I'm trying to make a regex to return only a certain part of an entry. Please consider the following two entries

BIO-2140-01 (54321) Foundations of Biology
IDS-3150-F2-02 (12345) History of Everything

I'd like to return only the numbers in parentheses - so 54321 and 12345.

So far my approach has been to search for everything that isn't 5 consecutive digits and replace it with ''.

I've tried the regexes s/[^\d\d\d\d\d]//g and s/[^\s(\d{5})]//g
Add this one to the mix, which doesn't work either: s/[^\b(\d{5})\b]//g;

The first one returns all the numbers squished together, i.e. 21400154321; the second one returns 214001 (54321).

Any thoughts on just grabbing the digits in between the parens?
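
A note on why those attempts behave that way: square brackets define a character class, and a character class always matches exactly one character. Repeating \d inside it, or adding ( ) { } and 5, only adds more literal characters to the set; it never means "five digits in a row". The sketch below reproduces the two results in Python, chosen only because this part of the regex syntax is the same as in Perl and sed; the variable names are made up for illustration.

```python
import re

entry = "BIO-2140-01 (54321) Foundations of Biology"

# [^\d\d\d\d\d] is the same class as [^\d]: "any single non-digit".
# Deleting every non-digit leaves all the digits squished together.
only_digits = re.sub(r"[^\d\d\d\d\d]", "", entry)
print(only_digits)  # 21400154321

# Inside [...], the characters ( ) { } and 5 are plain literals, so this
# class means "anything except whitespace, digits, parentheses, braces".
second_try = re.sub(r"[^\s(\d{5})]", "", entry)
print(repr(second_try))  # the kept spaces show up, including trailing ones
```

So deleting what you don't want is the hard way round here; matching the part you do want is usually simpler.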

Haven't used Perl in a while, but the regex for capturing the contents of parentheses is

\(([^\)]+)\)

To fine-tune the regex, I recommend that you use regex101.com or regexr.com.

Also, you might find it handy to use the capture groups (e.g. $1 and so on). If you want, you could capture all three parts of the entry with a single regex, which makes things a lot easier since you will probably need all three.

But as nakamura pointed out, always use a regex tester before trying anything, as it will save you so much hassle.
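
To make the "one regex, three parts" idea concrete, here is a rough sketch in Python. It assumes each entry is laid out as a course code, then a number in parentheses, then a title, as in the two sample lines; the pattern and the name split_entry are illustrative, not something from this thread.

```python
import re

# Assumed layout: <code> (<number>) <title>
ENTRY = re.compile(r"^(\S+)\s+\((\d+)\)\s+(.*)$")

def split_entry(line):
    """Return (code, number, title), or None if the line doesn't fit."""
    m = ENTRY.match(line)
    return m.groups() if m else None

print(split_entry("BIO-2140-01 (54321) Foundations of Biology"))
print(split_entry("IDS-3150-F2-02 (12345) History of Everything"))
```

Each pair of unescaped parentheses is one group, numbered left to right, while \( and \) match the literal parentheses in the text.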

Thanks @nakamura - that matches it.

@Ion I'm trying to figure out how to use the capture group but no luck. I've tried

$var1 = "BIO-1200-01 (12345) Some Course Name =~

s/(\(([^\)]+)\))/$1/;

No luck... any advice? I'd like to set a variable equal to that matched result.

step 1) Abandon Perl.. run for your life to Python.

I'd be willing to consider it, but how would you solve this using Python? I'm used to sed (which is very close to Perl). Python's regex requires me to know both regex and Python; with Perl you pretty much just need to know regex.

There, I changed the title to say I'm willing to consider Python. I'm language agnostic, as long as it can get the job done.

Python is incredibly readable and quick to pick up compared to Perl.

I would use the regular expressions library. It's how you parse input using re.

https://docs.python.org/2/library/re.html

Thanks, I'll check that out. Unfortunately, I still don't conceptually understand how to assign to a variable only the matching part of a regex from a string... nor can I seem to come up with a solid Google search that helps me understand.
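
For what it's worth, the concept is: run the regex against the string, get back a match object, then ask that object for the capture group you want. A minimal sketch with Python's re module follows; the function name course_number is made up for illustration.

```python
import re

def course_number(course):
    """Return the text between the first pair of parentheses, or None."""
    match = re.search(r"\(([^)]+)\)", course)
    # group(0) is the whole match, e.g. "(54321)"; group(1) is only the
    # part matched inside the capturing parentheses.
    return match.group(1) if match else None

course_id = course_number("BIO-2140-01 (54321) Foundations of Biology")
print(course_id)  # 54321
```

re.search returns None when nothing matches, so the check before calling group(1) matters for lines that have no parentheses.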

Stuck with perl - here's the nasty regex

($courseID) = ($course =~ /[^(]+\(([^)]+)\)/)

omg my eyes!!!! IT HURTS!

me@my-PC ~
$ echo "BIO-2140-01 (54321)" | perl -ne 'm/\(([^)]+)\)/; print $1 . "\n";'
54321
me@my-PC ~
$ echo "IDS-3150-F2-02 (12345)" | perl -ne 'm/\(([^)]+)\)/; print $1 . "\n";'
12345

me@my-PC ~
$ cat test.txt
BIO-2140-01 (54321) Foundations of Biology
IDS-3150-F2-02 (12345) History of Everything
me@my-PC ~
$ cat test.pl
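#!/usr/bin/perl
use strict;
use warnings;

my $fn = 'test.txt';
open(my $fh, '<', $fn) or die "Cannot open $fn: $!";
while (my $line = <$fh>) {
        chomp($line);
        # Capture whatever sits between the first pair of parentheses
        if ($line =~ m/\(([^)]+)\)/) {
                print $1 . "\n";
        }
}
close($fh);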
me@my-PC ~
$ ./test.pl
54321
12345

@cotton, @reikoshea's example is what you want, but just to help you understand capture groups, see the code below.

#!/usr/bin/perl
# n2 - extract forename and surname

print "please enter your name ";
chomp($name = <STDIN>);  # chomp strips only a trailing newline; chop removes any last character

if ($name =~ /^\s*(\S+)\s+(\S+)\s*$/) {
 print "Hi $1. Your Surname is $2.";
} else {
 print "no match";
}
print "\n";

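Break it up into groups: 's/(\w{3})-(\d{4})-(\d{2}) \(\d{5}\)/$1-$2-$3-/g'. Each group is referred to by a $ and a number, so this basically removes the five digits and the parentheses. Hope this helps. It will work with Perl rename, @cotton.

rename -n 's/(\w{3})-(\d{4})-(\d{2}) \((\d{5})\).*/$4/g'
This will give you the numbers in parentheses only. Note the extra unescaped parentheses around \d{5}: without them there is no fourth group, so $4 is empty. The trailing .* swallows the rest of the name. As written, the pattern only matches codes shaped like BIO-2140-01, not IDS-3150-F2-02.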
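
The same substitution idea can be tried outside of rename. The sketch below uses Python's re.sub, where group references in the replacement are written \1, \2 and so on instead of $1, $2; the variable names are illustrative. It also shows that the number in parentheses only becomes available as a group once it has its own capturing parentheses.

```python
import re

name = "BIO-2140-01 (54321) Foundations of Biology"

# Three groups for the course code; the number is matched but not captured,
# so the replacement simply leaves it out.
drop_number = re.sub(r"(\w{3})-(\d{4})-(\d{2}) \(\d{5}\)", r"\1-\2-\3-", name)
print(drop_number)  # BIO-2140-01- Foundations of Biology

# A fourth group around \d{5} makes the number available as \4;
# the trailing .* consumes the rest so that only the number remains.
only_number = re.sub(r"(\w{3})-(\d{4})-(\d{2}) \((\d{5})\).*", r"\4", name)
print(only_number)  # 54321
```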

Ok honestly - what the heck, are you guys seeing this???? If I showed that to any "normal" person and told them it actually means something, they'd tell me I'm crazy. I mean for gosh sakes - there's not even a number or letter in that thing.

It speaks to us!

That's why we get the big bucks.