Galatia

Do you recall back in 2023 when I mentioned failing my C programming class in college? So long ago! Going into this holiday break (a two week vacation for me), I got bored and picked up the old source code and input file and I finished the assignment…on the 30th anniversary of my Incomplete.

Assignment:
Based on knowledge learned through the semester with file management, text processing, memory allocation, data structures, B-Trees, linked-lists, and so on, write a program that can take a text file representing a book of the bible and produce a concordance of the important words, listing each word in alphabetical order with a list of references for each word in book/chapter/verse order. Extra credit if you include a parser to stem the words (instead of “write”, “writes”, “wrote”, you get “write”).

My book was actually Galatians, not Ephesians like I recalled earlier, but whatever.

Input format (sample snippet, no newlines):

@$GAL@ 01:01 Paul, an apostle - sent not from men nor by man, but by Jesus Christ and God the Father, who raised him from the dead -
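For illustration, here's a minimal sketch (not the actual project code) of how one record in this format could be split into book, chapter, verse, and verse text. The struct and function names (verse_ref, parse_verse_line) are made up for this example, and it assumes every record starts with the "@$BOOK@ CC:VV" header shown in the sample.

```c
#include <stdio.h>

/* Hypothetical record for one parsed verse header. */
struct verse_ref {
    char book[8];   /* e.g. "GAL" */
    int  chapter;
    int  verse;
};

/*
 * Parse a record such as "@$GAL@ 01:01 Paul, an apostle ...".
 * On success, fills *ref and returns a pointer to the start of the
 * verse text inside `line`. Returns NULL if the header is missing.
 */
const char *parse_verse_line(const char *line, struct verse_ref *ref)
{
    int consumed = 0;
    const char *text;

    /* "@$", up to 7 non-'@' characters, "@", then chapter:verse.
     * %n records how many characters were consumed so far. */
    if (sscanf(line, "@$%7[^@]@ %d:%d%n",
               ref->book, &ref->chapter, &ref->verse, &consumed) != 3)
        return NULL;

    /* Skip the blanks between the reference and the verse text. */
    text = line + consumed;
    while (*text == ' ')
        text++;
    return text;
}
```

Fed the sample line above, this should yield book "GAL", chapter 1, verse 1, and a pointer to the text starting at "Paul, an apostle".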

My old code was written for Borland C on Windows 3 to be run on the command line. That’s how old this code is. It had decent bones, and I gave it an honest try in 1994, but I just couldn’t get the bones to stick together. Something-something about my obvious misunderstanding of fundamentals like pointers, recursion, source file flow, something-something.

Once I got the gcc build chain up on my Linux box and got VSCode going, I tried building what I had. There were so many dependency issues and syntax errors that I moved everything aside and rebuilt the code from the ground up, using old pieces to build new files and following the lessons I learned a decade ago when I built my JX3P Tape Dump Decoder tool (also written in C).

20 years of professional and hobby code development has taught me so much more than I ever could’ve grokked in 4 months at that thumb-headed age of 22.

The new code did it proper:

  • makefile with real and .PHONY targets (all, clean)
  • *.c source files under ./src/
  • *.h headers under ./include/
  • *.o object files under ./obj/
  • source management with git

Invocation, with explicit source and destination files (can also read from stdin and print to stdout for piping):

user@host:~/concordance$ ./concordance books/galatians.txt final-gal.txt

Sample output from final-gal.txt:

gained         GAL 2:21
Galatia        GAL 1:2
Galatians      GAL 3:1
gave           GAL 1:4, 2:9, 2:20, 3:18
Gentile        GAL 2:14, 2:15
Gentiles       GAL 1:16, 2:2, 2:7, 2:8, 2:9, 2:12, 2:12, 2:14, 3:8, 3:14
gentleness     GAL 5:23
gently         GAL 6:1
get            GAL 1:18, 4:30
give           GAL 2:5, 3:5, 6:9
given          GAL 2:9, 3:14, 3:21, 3:22, 3:22, 4:15
glad           GAL 4:27
glory          GAL 1:5
go             GAL 1:17, 2:9, 5:12
goal           GAL 3:3
God            GAL 1:1, 1:3, 1:4, 1:10, 1:13, 1:15, 1:20, 1:24, 2:6, 2:8, 2:19, 2:20, 2:21, 3:5, 3:6, 3:8, 3:11, 3:17, 3:18, 3:20, 3:21, 3:26, 4:4, 4:6, 4:7, 4:8, 4:9, 4:9, 4:14, 5:21, 6:7, 6:16
gods           GAL 4:8
good           GAL 4:17, 4:18, 5:7, 6:6, 6:9, 6:10, 6:12
goodness       GAL 5:22
gospel         GAL 1:6, 1:7, 1:7, 1:8, 1:9, 1:11, 2:2, 2:5, 2:7, 2:14, 3:8, 4:13
grace          GAL 1:3, 1:6, 1:15, 2:9, 2:21, 3:18, 5:4, 6:18
gratify        GAL 5:16
Greek          GAL 2:3, 3:28
group          GAL 2:12
guardians      GAL 4:2

As an aside, I grabbed the entire bible from Gutenberg.org and modified the formatting to fit the concordance parser, and — hoo-boy — it took 5 minutes for a single thread to chew through that 4MB text file to produce a 3MB concordance. Mighty. Just look at that sample output:

account        1CH 27:24; 2CH 26:11; JOB 33:13; PSA 144:3; ECC 7:27; MAT 12:36, 18:23; LUK 16:2; ACT 19:40; ROM 14:12; 1CO 4:1; PHI 1:18, 4:17; HEB 13:17; 1PE 4:5; 2PE 3:15; 2KA 12:4
accounted      DEU 2:11, 2:20; 1KI 10:21; 2CH 9:20; PSA 22:30; ISA 2:22; MAR 10:42; LUK 20:35, 21:36, 22:24; ROM 8:36; GAL 3:6
Accounting     HEB 11:19
accounts       DAN 6:2
accursed       DEU 21:23; JOS 6:17, 6:18, 6:18, 6:18, 7:1, 7:1, 7:11, 7:12, 7:12, 7:13, 7:13, 7:15, 22:20; 1CH 2:7; ISA 65:20; ROM 9:3; 1CO 12:3; GAL 1:8, 1:9
accusation     JUD 1:9; EZR 4:6; MAT 27:37; MAR 15:26; LUK 6:7, 19:8; JOH 18:29; ACT 25:18; 1TI 5:19; 2PE 2:11
accuse         PRO 30:10; MAT 12:10; MAR 3:2; LUK 3:14, 11:54, 23:2, 23:14; JOH 5:45, 8:6; ACT 24:2, 24:8, 24:13, 25:5, 25:11, 28:19; 1PE 3:16

I included code to preserve capitalization if words appear to be known names, and bring capitalized words to lowercase if they’re seen in lowercase elsewhere (useful for words first seen at the start of sentences). I didn’t include any stemming code, so no extra credit; the parser is naive. And I did cheat a little and use the string search/case/copy functions available in the C standard library, and I don’t feel guilty about it. But I did write the recursive B-Tree and linked list code from scratch, so there’s that.
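The tree-plus-list idea can be sketched roughly like this. It's an illustration under stated assumptions, not the project's code: a plain binary search tree stands in for the tree, every name (word_node, ref, tree_add, tree_print) is made up, words are compared case-insensitively and keep whichever spelling was seen first (simpler than the capitalization handling described above), and a single book code is passed in at print time.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* One chapter:verse reference in a singly linked list. */
struct ref {
    int chapter, verse;
    struct ref *next;
};

/* One word in the tree, with its references in insertion order. */
struct word_node {
    char *word;
    struct ref *head, *tail;
    struct word_node *left, *right;
};

/* calloc that exits on failure, to keep the sketch short. */
static void *xcalloc(size_t n, size_t size)
{
    void *p = calloc(n, size);
    if (p == NULL) { perror("calloc"); exit(EXIT_FAILURE); }
    return p;
}

/* Case-insensitive comparison, so "God" sorts next to "gods". */
static int cmp_nocase(const char *a, const char *b)
{
    while (*a && tolower((unsigned char)*a) == tolower((unsigned char)*b)) {
        a++;
        b++;
    }
    return tolower((unsigned char)*a) - tolower((unsigned char)*b);
}

/* Append a reference to the end of a word's list. */
static void append_ref(struct word_node *n, int chapter, int verse)
{
    struct ref *r = xcalloc(1, sizeof *r);
    r->chapter = chapter;
    r->verse = verse;
    if (n->tail) n->tail->next = r; else n->head = r;
    n->tail = r;
}

/* Recursively insert a word (or add a reference to an existing one).
 * Returns the (possibly new) root of this subtree. */
struct word_node *tree_add(struct word_node *node, const char *word,
                           int chapter, int verse)
{
    int c;

    if (node == NULL) {
        node = xcalloc(1, sizeof *node);
        node->word = xcalloc(strlen(word) + 1, 1);
        strcpy(node->word, word);
        append_ref(node, chapter, verse);
        return node;
    }
    c = cmp_nocase(word, node->word);
    if (c < 0)
        node->left = tree_add(node->left, word, chapter, verse);
    else if (c > 0)
        node->right = tree_add(node->right, word, chapter, verse);
    else
        append_ref(node, chapter, verse);
    return node;
}

/* In-order walk: prints words alphabetically, one line per word. */
void tree_print(const struct word_node *node, const char *book, FILE *out)
{
    const struct ref *r;

    if (node == NULL)
        return;
    tree_print(node->left, book, out);
    fprintf(out, "%-14s %s", node->word, book);
    for (r = node->head; r != NULL; r = r->next)
        fprintf(out, "%s %d:%d", r == node->head ? "" : ",", r->chapter, r->verse);
    fputc('\n', out);
    tree_print(node->right, book, out);
}
```

Because verses are read in order, appending to the tail keeps each word's references in chapter/verse order without sorting. A multi-book run would also need the book code stored per reference, and the sketch never frees anything.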

I won’t be posting the code. I’m proud of it and happy it works, and it’s clean and neat, but I’m not a fan of public git repo sites (especAIlly now). And tarballs seem excessive for how silly this project is.

I still chafe that Dr. H made us do this with a book of the bible but, honestly, it’s an interesting project with other applications. I tested myself and believe I would’ve passed had I enough experience and patience.

So take that, Doctor H! I hope you’re doing well, wherever you are.

Endlich

Took my Deutsch I final this morning. I don’t believe I did well. Probably 50%. It was oral with another student over Zoom. We each were given separate facts about a fictitious person that we had to ask questions about, and we had to answer in full sentences im Deutsch. My vocabulary is bad, because I didn’t do vocabulary drills.

Honestly, this return to school thing, as continuing education for fun, was meta-educational. I dropped out of school after 11 semesters in 1995 and so much about education has changed. Everything’s, like, computers now. Here are some lessons:

  • Your first battle is eating the layer cake that is your local community college’s registration process and activating all of the disparate IT systems that have been slapped together like a ball of mud. Each one was a bad solution to a misunderstood problem, and now somehow they’re interconnected and dependent on expensive public-private partnerships. The guy that wrote the one infrastructure tool that binds it all together retired or died 7 years ago. He used to teach the HTML class.
  • Your second battle is learning Blackboard, which is the LMS that ACC uses. Your college may be different. Bb is terrible. They’re all terrible. My buddy who is a PhD in EdTech says they’re all terrible. I believe him.
  • Your teacher uses a textbook not sold by the campus bookstore, and it will have workbooks online. You will have to create an account at the outside vendor who publishes the textbook just so you can read it and do homework. This will cost 50% more than the class itself, and your college catalog won’t mention this.
  • They will not offer a PDF copy of the textbook. It will be a poorly-rendered electronic copy, in a browser tab immune from scraping, of a book designed for paper. Matter of fact, you have to pay extra for a paper copy of the textbook.
  • Always buy the paper copy of the textbook, or you might not be able to read the textbook.
  • Unfortunately, your paper copy of the textbook is loose leaf laser-printed sheets with 3-ring punched holes. It is not a bound textbook. You provide the binder.
  • The printed margins of the book mean you have to use bulky 3-ring binders that take up 30% more coffee shop table space instead of slim paper folders because otherwise you wouldn’t see the print close to the spine.
  • Don’t study in coffee shops. Space is at a premium, and the WiFi is shit.
  • Your third battle is learning how the online workbook works. You will get the first 2 exercises wrong. Accept your fate.
  • The online workbook is terrible. Sometimes you will have 3 tries to get your exercise right. Sometimes you will have 1. And you can’t go back and forth between exercises once you “submit” them, so it’s like playing a video game where you’re on the rails. Nothing says, “Open world games suck and should never be allowed” like higher learning.
  • If you have 3 tries to get it right, your homework score will be amazing. And it will be a wrong assessment of your actual skill. You actually have to go out of your way (“yes, I want to submit this wrong lesson”, “no, yes, I’m sure.”) to submit wrong lessons just so you can track your actual progression.
  • Your inflated homework score, weighted, will give you an inflated sense of ego, and you will be wrong.
  • If you can’t play a video because your laptop’s browser couldn’t download and render it because of WiFi or browser glitch, tough shit. Don’t click “submit”. Retry it at home.
  • If you have the option to “ask the teacher” about a technical issue, do it just so they know you’re having technical issues, but expect them to bring it up in the next Zoom class session in front of God and everybody.
  • Don’t click “Refresh” on a page if it times out on a submission because of shitty WiFi, or you will burn one of your 3 attempts and fail the lesson.
  • Never use a Chromebook for class work. Never. Chrome on ChromeOS tricks the textbook LMS into thinking you’re on a mobile device, and it’ll render the page wrong, and your grades will suffer because you couldn’t click the dropdown options that don’t render because the Javascript thinks you don’t have enough screen resolution to draw it. And your touch screen will act stupid. And it’s underpowered for what you need. And it’ll drop connections to your Bluetooth earbuds randomly (and your wired earbuds will fall out of your ears).
  • Once you login to your Chromebook with your school’s GMail-For-Schools account, you lose all admin access on your Chromebook to install real apps on that account (“Your Chromebook is administered by your Organization”). It’s a paperweight after that point, because everything will have to be a Chrome extension webapp, and they all suck.
  • You can use an Android/iOS tablet to do your homework, but you can’t record your voice for the random surprise speaking lessons. It’ll be digital noise on playback, and your professor will hate you.
  • Forget using a physical keyboard setting for umlauts and special characters. I enabled that on my Chromebook, and got dinged because my supposed Eszett symbol (ß) was marked as incorrect because it was actually entering a Beta symbol (β). Unicode is hard. Use the LMS Javascript character picker, or get a “QWERTZ” German keyboard with an AltGr key for special characters.
  • Javascript character pickers are stupid.
  • Use a real laptop, because you’ll randomly be asked by the professor to record a Zoom session, because your semester final might be a recorded Zoom session with another student and you’ll have to practice by submitting files generated by the Zoom app.
  • Your university/college email will be spam central, and all the spam is from the Chancellor’s Office, the bookstore, or security. Even if your class is 100% online, you’ll get notifications that campus police have closed a site because of suspicious activity in the parking lot.
  • Pay the parking fee, even if you won’t be parking on campus. It’s cheaper than a ticket on your student bill.
  • Your campus bookstore only generates spam, because it’s in the business of making money on a captive market. File it accordingly.
  • Do your vocabulary drills. Learn the gender of the nouns. Don’t be ein dumbkopf. Do your vocabulary drills.

Honestly, I’m doubting that I’ll sign up for Deutsch II. I feel like I should, just to be a completist, but let me be real: my stupid fantasy of living/retiring in Germany is dead. Come January 20, every country in the western world will be closing their doors to American expats. My language skills are shit, I don’t have a college Bachelor’s degree, and Germany is already full of tech-savvy engineers. I have nothing special to offer.

I learned more about my shortcomings this semester than I did about the language. There’s some soul-searching happening, and I’m in the middle of it. Let me never reach the end.

Meine Treppe wartet bis pro Jahr heir. (and I probably got that wrong, too.)

Ejecta

Dreamed this morning again of having to move out of my last apartment (the one I was in for nine years before developers kicked us out). This dream, I wasn’t the last to move out, but I did see other people throwing a party in the pool even though the demolition crew was on site. Power was still on. My old carpet still had marks from the vacuum cleaner, though I knew it was all getting torn down (the balcony was in an active state of decay). F’n weird. I guess some concepts just stick in my mind. Wish I knew why it keeps flashing back.

FreshRSS and Fail2ban

Based on my prior installation success with FreshRSS, I felt it was necessary to put it behind a fail2ban filter to drop inbound traffic from abusive external IPs.

My FreshRSS installation is using the simple tarball on top of webroot method; I’m not using containers or anything. It’s all bare metal, so it’s piggybacking off of the same Apache server that’s serving other sites.

For my own notes, and for your edification, here are my fail2ban configs:

/etc/fail2ban/jail.d/freshrss.local :

[freshrss]
enabled = true
port = 80,443
protocol = tcp
filter = freshrss
maxretry = 3
bantime = 10800
logpath = /var/log/apache2/ssl_access.log

/etc/fail2ban/filter.d/freshrss.local :

[Definition]
failregex = ^<HOST> .+\" 401 \d+ .*$
ignoreregex =

This filter combs the log for anything that looks like an HTTP 401 error. Admittedly, this filter will catch 401’s for all the other sites on this server, but let’s be realistic: that’s a good thing.

# SAMPLE APACHE LOG
# 172.31.13.13 - - [01/Dec/2024:16:27:11 -0600] "POST /freshrss/api/greader.php/accounts/ClientLogin?Email=asdf&Passwd=uuuuu HTTP/1.1" 401 670 "-" "FeedMe/3.16 (com.seazon.feedme; build:206; Android SDK 34)"
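To see what the pattern keys on, here is a rough C sketch using POSIX regex.h. It is only an approximation: fail2ban evaluates failregex with Python's regex engine, and <HOST> matches more than this does. The stand-ins here ([^ ]+ for <HOST>, [0-9]+ for \d+) and the function name is_401_line are assumptions for the example.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Returns 1 if an Apache access-log line looks like a 401 response,
 * 0 if it does not, and -1 if the pattern fails to compile.
 */
int is_401_line(const char *line)
{
    /* POSIX ERE approximation of: ^<HOST> .+\" 401 \d+ .*$ */
    static const char pattern[] = "^[^ ]+ .+\" 401 [0-9]+ .*$";
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, line, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

The part doing the work is the closing quote of the request, then the status code, then the byte count. A line with any other status between those two fields won't match.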

Save these files, then run systemctl reload fail2ban.service so it’ll pick up the new jail. Try a few bad login attempts and voila.

root@server:/etc/fail2ban/filter.d# fail2ban-client status freshrss
Status for the jail: freshrss
|- Filter
|  |- Currently failed:	1
|  |- Total failed:	4
|  `- File list:	/var/log/apache2/ssl_access.log
`- Actions
   |- Currently banned:	1
   |- Total banned:	1
   `- Banned IP list:	172.31.13.13

Additionally, you can test your fail2ban filter regex during development with:

fail2ban-regex /var/log/apache2/ssl_access.log /etc/fail2ban/filter.d/freshrss.local

I somehow feel…safer. It’s the little things.