“Success is walking from failure to failure with no loss of enthusiasm.” — Winston Churchill

Welp… this is gonna be the sixth post where I’ve used the word welp.

Apparently I like the word welp. It works well for me.

I meant to post a post about Foreman today. There’s no fucking way that’s happening. My schedule has been packed today, I’ve had no opportunity to write, and constant context shifting has made things difficult.

Like living.

But, not only do I still need to continue to write it, I also have fewer notes to go through!

Sort of…

I do have lots of text to go through and pull snippets from, but I realized that I shared quite a bit of what I’m writing in a meeting I set up a while back to teach what I’m writing.

Fortunately, I’d the foresight to record it, so now I have a transcript I can use as a base to whittle words away from. Which is great! That means I have a lil less work to do.

I love doing less work. Less work work that is. Not less fun work. Fun work is fun. Less fun work is less fun.

One day, when I worked at the porn company, a guy named Fabian said to me, “Tim, if you make yourself replaceable then no one will want to replace you.”

I was 20 when I was told that.

Working in a more senior position within the midst of a prolonged existential crisis has me recognizing just what he meant at the time. Wise words, even if he was a fucking dick.

Unfortunately his emotional intelligence was very tiny. And his communication style can arguably be considered abusive. Which is something that I know now, because I have soooooo many e-mails from way back when. My mailbox goes aaaaaaall the way back to 1999. I only recently started going back to them. It’s wild.

Back to the transcript… let’s do some math! We’ll be using the tool wc to learn about this text.

$ wc transcript.txt
3382   18356  111050 transcript.txt

It’s 3,382 lines long, 18,356 words, and 111,050 characters (bytes, technically). But that includes both names and times. So let’s get rid of those using grep. But first let’s see how many presumed participants there are.

We’ll assume that a participant is a line that begins with a capital letter, followed by one or more word characters, then a space, then another capital letter, and one or more word characters all the way to the end of the line.

$ grep -cE '^[A-Z]\w+ [A-Z]\w+$' transcript.txt
1127

One-thousand, one-hundred, and twenty-seven lines that seem like they’d be names. So let’s see if things are accurate… and count the number of lines. And lop off last names. And use sort, uniq, and awk to massage some bits.

$ grep -E '^[A-Z]\w+ [A-Z]\w+$' transcript.txt | sort | uniq -c | awk '{print $1 " " $2}' | sort -n
4 Josef
12 Matthew
22 Chris
36 Alison
1053 Tim

Nice.

Now I know my RegEx works.
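Here’s a rough sketch of that same pipeline on a tiny made-up file, just to show what each stage does. The file name sample.txt and everything in it are stand-ins I invented for illustration, not the real transcript.

```shell
# Build a small, hypothetical transcript: name line, timecode line, spoken line.
cat > sample.txt <<'EOF'
Tim Example
0:0:0.0 --> 0:0:0.700
Killed the camera.
Alison Example
0:0:1.670 --> 0:0:2.390
Exactly.
Tim Example
0:0:2.400 --> 0:0:3.300
OK, bye.
EOF

# grep keeps only the "Firstname Lastname" lines,
# sort groups identical lines together,
# uniq -c prefixes each group with how many times it occurred,
# awk keeps the count ($1) and the first name ($2),
# sort -n orders the result by count.
tally=$(grep -E '^[A-Z]\w+ [A-Z]\w+$' sample.txt | sort | uniq -c | awk '{print $1 " " $2}' | sort -n)
echo "$tally"
```

Two things to keep in mind with this approach: a spoken line that happens to be exactly two capitalized words (say, “Thank You”) would get counted as a name, and two people who share a first name would show up as separate rows with the same label, since uniq counts whole lines before awk drops the last name.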

We still have a bunch of lines with timecode info that we need to remove. Let’s try another RegEx, one that matches one or more digits at the beginning of a line.

$ grep -E '^\d+' transcript.txt | head
0:0:0.0 --> 0:0:0.700
0:0:-3.-400 --> 0:0:1.210
0:0:1.670 --> 0:0:2.390
0:0:2.400 --> 0:0:3.300
0:0:3.570 --> 0:0:6.440
0:0:5.570 --> 0:0:7.140
0:0:9.20 --> 0:0:10.530
0:0:10.400 --> 0:0:11.950
0:0:11.960 --> 0:0:13.90
0:0:15.970 --> 0:0:16.700

Seems like timecode. Let’s find out how many lines there are.

$ grep -cE '^\d+' transcript.txt
1128

One-thousand, one-hundred, and twenty-eight lines of timecode.
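A side note on portability: \d is a Perl-style shorthand and isn’t part of POSIX extended regular expressions, so whether grep -E accepts it depends on the grep you have. A minimal sketch of a more portable version uses [0-9] instead. The file cues.txt and its contents are made up for the example, and the stricter pattern is my own suggestion, not something from the original run.

```shell
# Hypothetical sample with one spoken line that starts with a digit.
cat > cues.txt <<'EOF'
Tim Example
0:0:0.0 --> 0:0:0.700
Killed the camera.
0:0:1.670 --> 0:0:2.390
3 hosts are down.
EOF

# Loose match: any line starting with a digit (same idea as '^\d+').
loose_count=$(grep -cE '^[0-9]+' cues.txt)

# Stricter match: digits, colon, digits, colon, then the " --> " separator.
strict_count=$(grep -cE '^[0-9]+:[0-9]+:.+ --> ' cues.txt)

echo "loose: $loose_count, strict: $strict_count"
```

On this sample the loose pattern also catches the spoken line that begins with “3”, while the stricter one only counts the two timecode lines. If the transcript never has speech starting with a number, both give the same answer.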

We have a better idea of what we gotta get rid of. Let’s do it.

$ grep -vE '^[A-Z]\w+ [A-Z]\w+$' transcript.txt | grep -vE '^\d+' | head -16
Killed the camera.
You understand people that aren't here, and this is like it.
Exactly.
Kill the camera.
We were recording exactly.
I'm still gonna chew on the mic, though.
I mean, OK.
That's direct.
Instead of chewing your food.
Doing the mic.
OK.
Let's see what you can't hear me laughing at your jokes.
OK.
OK, bye.
Who is Mike please, Mike.
Alright, I am keeping my mouth shut, so the whole purpose behind here is to talk about Foreman.

Nice. Names and timecode have successfully been stripped out. A few questions were asked during the training, but they don’t account for that many words.
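One cheap cross-check, using only the counts already printed above: subtract the name lines and the timecode lines from the total and see how many lines should be left. This assumes no line matches both patterns, which holds here since one pattern needs a leading capital letter and the other a leading digit. The variable names are just mine.

```shell
# Counts taken from the wc and grep -c output earlier in the post.
total_lines=3382
name_lines=1127
timecode_lines=1128

# Lines that should survive both grep -v filters.
spoken_lines=$(( total_lines - name_lines - timecode_lines ))
echo "$spoken_lines"
```

That works out to 1127, so swapping head -16 for wc -l in the filter above should report the same number if nothing unexpected is slipping through.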

Now, for the word count.

$ grep -vE '^[A-Z]\w+ [A-Z]\w+$' transcript.txt | grep -vE '^\d+' | wc -w
12715

Twelve-thousand, seven-hundred, and fifteen words.

That’s a lot of words that I now have the pleasure of pruning, editing, and making my own, nice and generic, before making them business-like for others during the day.
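For what it’s worth, the two chained grep -v calls can probably be folded into one by giving grep both patterns with -e. A minimal sketch, assuming a made-up file called mini.txt and using [0-9] in place of \d for portability:

```shell
# Hypothetical sample: two speakers, two timecodes, two spoken lines.
cat > mini.txt <<'EOF'
Tim Example
0:0:0.0 --> 0:0:0.700
Killed the camera.
Alison Example
0:0:1.670 --> 0:0:2.390
Exactly.
EOF

# -v drops any line matching either -e pattern; wc -w counts what is left.
# tr strips the leading padding some wc implementations add.
word_count=$(grep -vE -e '^[A-Z]\w+ [A-Z]\w+$' -e '^[0-9]+' mini.txt | wc -w | tr -d ' ')
echo "$word_count"
```

Only “Killed the camera.” and “Exactly.” survive the filter, which is four words. Same result as chaining, just one less process in the pipe.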

Some day I’m going to make entering posts so much easier for myself.

Some day.

Markdown my words.