Plumbum for shell scripting in Python

I was never a fan of shell scripting, and unless it was a real edge case, I always ended up reshaping my needs around existing tools instead of writing a script. Why?

Simply put: as soon as I finish writing a Bash script, I quickly forget what the mess was about, and if I need to refactor it sometime later, I always end up with too many WTF/m.


Yes, I realize how cool it is, but in 2022 I prefer to spend my mental capacity on making something valuable rather than on deciphering my novice Bash skills 🙂

I learned about Babashka, the scripting tool for Clojure developers, a while back. I found it interesting to have the full power of a programming language within your shell scripts. I gave it a try yesterday and found a few reasons not to pick it up:

  1. Shelling out (using applications in your PATH) is not as smooth as in a regular Bash script.
  2. Things we like about shell scripts, like piping, are not as straightforward as they seem. There are threading macros, yet they are not easy to use in every scenario (there are better, more Clojure-friendly alternatives).
  3. Clojure itself is a bizarre tool for system administration. Not a bad thing, but I am not much of a Clojure developer “yet.”
  4. The available libraries for system management are not as vast as those I know in Python.

Wait a minute, Python? Why not Python? At least it is the second-best citizen on any Unix system nowadays (unlike Babashka, which needs to be installed separately). I switched to Xonsh some years ago (oh man, it has been six years), and I have used Python’s subprocess, popen, and sh from time to time, but I was never encouraged to switch to them altogether.

Prompted by these questions, I searched around and found Plumbum. I don’t want to showcase it by duplicating its excellent documentation. Still, unlike the other alternatives, I will say that it is trying to solve the right problem in a novel way. There are still some shortcomings, like being an external dependency, which means the scripts won’t be as plug-and-play as we would like them to be, but I thought those kinds of issues could be resolved with some package management.

So I mixed it with Poetry, and my script project was born 😀. I expect to write more about my experience in this area, as I find it interesting. The power of scripting is helpful, but I wonder if we can improve it to catch up with the good practices I like to follow in software development. At the very least, those who contribute will have fewer WTF/day.
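For the curious, the combination boils down to a small `pyproject.toml`; the project and script names below are hypothetical, but this is the general shape:

```toml
[tool.poetry]
name = "my-scripts"        # hypothetical project name
version = "0.1.0"
description = "Personal shell scripts, written with Plumbum"
authors = ["Me <me@example.com>"]

[tool.poetry.dependencies]
python = "^3.10"
plumbum = "^1.7"

[tool.poetry.scripts]
# expose a module function as a CLI entry point (hypothetical module)
backup = "my_scripts.backup:main"
```

With this, `poetry install` pins the dependency and `poetry run backup` invokes the script, which partially solves the plug-and-play problem mentioned above.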

Tool Gut: Kafka Message Journey


In this article, I will review the journey a message takes through the Kafka ecosystem. If you need a refresher on the core concepts of Kafka, you can refer to this blog post. Also, for more information on any of the concepts covered here, I would suggest referring to either of the following resources:

The Path

In a Kafka cluster, a couple of components need to be initialized before the journey begins. This is, of course, handled by Kafka itself, but they are necessary, so we are going to review them:

Controller

This component is responsible for coordinating partition assignment and data replication. The controller is selected in a process called Kafka Controller Election (more details on how the process works can be found in this blog post).

In short, on cluster start, all of the initialized brokers will try to register themselves as the controller. Since, by design, only one controller can exist in the cluster, the first one that succeeds becomes the controller, and the rest watch. When the controller goes down for whatever reason, Zookeeper informs the watchers, and the re-election process is triggered (just like the first one).

Partitions

Partitions are the primary concurrency mechanism in Kafka, enabling producers and consumers to scale horizontally (more info). The controller allocates these partitions across the available brokers based on the configuration defined for each topic via the --partitions argument.

Replicas

Replicas are the primary fault-tolerance mechanism in Kafka. They are created by the controller based on the configuration defined for each topic via the --replication-factor argument.
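Both settings are supplied when the topic is created. A sketch of the command, assuming a broker reachable at localhost:9092 and an illustrative topic name:

```shell
# Create a topic whose data is split into 6 partitions,
# each stored as 3 replicas across the brokers
kafka-topics.sh --create \
  --bootstrap-server localhost:9092 \
  --topic orders \
  --partitions 6 \
  --replication-factor 3
```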

Lead Replica

Shown as orange diamonds in the picture above (marked with “L”), these are the primary replicas, one per partition, responsible for handling read and write requests. They get selected in a process called Kafka Lead Replica Election (this blog post has an excellent in-depth overview of the process).

For each partition, Kafka tracks a set called the In-Sync Replicas (or ISR for short), the replicas that reflect the latest state. When the lead replica goes down, the next in-sync replica is elected, and if there is no in-sync replica to choose from, Kafka waits (accepting no writes) until one such replica comes back up. There is a configuration called unclean.leader.election.enable which, when enabled, allows Kafka to elect an out-of-sync replica in that situation so that consumption can continue.
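This trade-off (availability over durability) can be toggled per topic. A sketch, reusing the illustrative broker address and topic name from before:

```shell
# Broker-side default is unclean.leader.election.enable=false;
# this per-topic override favors availability over durability
kafka-configs.sh --bootstrap-server localhost:9092 \
  --alter --entity-type topics --entity-name orders \
  --add-config unclean.leader.election.enable=true
```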


Now let’s review the message journey (you can use the image at the start of the article to help visualize the process):

  1. The producer publishes the message to the cluster.
  2. The target partition is selected; with no message key, the producer picks partitions in a round-robin fashion (with a key, the partition is derived from the key’s hash).
  3. The message is appended to the end of the lead replica’s log for the selected partition, and a unique, monotonically increasing ID (the offset) is assigned to it.
  4. The replication mechanism also copies the data to the other replicas (as defined by the replication factor).
  5. During the message’s lifetime on the topic, any consumer active in a consumer group will consume the message if it has not already been consumed by another consumer in that group. Note: this only applies if the consumer uses a consumer group; otherwise, the consumer is responsible for tracking the offsets itself.
  6. When the topic’s retention period is exceeded, the message is automatically deleted from the topic. Obvious fact: if no consumer has consumed the message by then, it’s lost forever!
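Steps 2 and 3 can be sketched as a toy simulation (ignoring keys, batching, and replication; partition count and message names are made up):

```python
from collections import defaultdict
from itertools import count

NUM_PARTITIONS = 3

round_robin = count()            # producer-side counter for round-robin selection
logs = defaultdict(list)         # partition -> append-only log of (offset, message)
next_offset = defaultdict(int)   # partition -> next offset to assign

def publish(message: str) -> tuple[int, int]:
    """Pick a partition round-robin, append, and assign the next offset."""
    partition = next(round_robin) % NUM_PARTITIONS
    offset = next_offset[partition]
    logs[partition].append((offset, message))
    next_offset[partition] += 1
    return partition, offset

for msg in ["m0", "m1", "m2", "m3", "m4"]:
    print(msg, "->", publish(msg))  # m0 -> (0, 0), m1 -> (1, 0), m2 -> (2, 0), ...
```

Note how offsets are per-partition: m0 and m3 both land in partition 0, at offsets 0 and 1 respectively, which is exactly why ordering in Kafka is only guaranteed within a partition.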

Using data to improve software engineering

Whenever I start working on a new codebase, I find it quite cumbersome to build enough understanding of it to be productive, both in terms of safety and speed of delivery. This problem can get especially challenging due to factors like:

  1. Lack of documentation around the source code or architecture.
  2. Unknown external dependencies relying on the current project to function as a system.
  3. Different developers contributing to the project without a unified coding style or conventions.

Over time, I’ve developed some instincts for understanding smaller software projects: I make a list of questions about how they function and answer them one by one by interrogating the Git history.

My method, of course, is not only helpful with code written by others; it also helps me understand what I was thinking a couple of weeks ago when I developed a piece myself. However, this approach is considerably harder to apply when the project is larger.
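One crude interrogation I find myself reaching for is counting how often each file changes, to find the hotspots worth reading first. A self-contained sketch (it builds a throwaway repo with made-up files so the pipeline has something to chew on):

```shell
set -e
# Build a throwaway repo so the example is self-contained
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

echo a > hot.py
git add . && git commit -qm "first"
echo b >> hot.py && echo c > cold.py
git add . && git commit -qm "second"
echo d >> hot.py
git commit -qam "third"

# Hotspots: which files change most often?
git log --format=format: --name-only | grep -v '^$' | sort | uniq -c | sort -rn
```

Here hot.py was touched by three commits and cold.py by one, so hot.py surfaces at the top; run against a real repository, the files with the highest counts are usually where the complexity (and the risk) lives.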

Searching around this topic, I stumbled upon the following video by Adam Tornhill, which describes a method that addresses this problem, with the potential to expand into other areas of software development, like organizational-level concerns.

He also has a book on the same subject called Your Code as a Crime Scene. I enjoyed his presentation, and yes, I have found the next interesting book I’m going to read 😉