What CS courses won’t teach you

My current job is at a large company with many different businesses: it is a bank, a national postal service and a large insurance company; it even has airplanes and a huge logistics network.

By Mike Beauregard from Nunavut, Canada (Stripes) [CC BY 2.0 (http://creativecommons.org/licenses/by/2.0)], via Wikimedia Commons

Taking a single glimpse at the company's IT infrastructure is like looking at sedimentary rock: it is the result of decades of deposits of design decisions that were cutting edge when they were taken, and that nobody bothered to remove once they were no longer efficient or effective.

In this short post I summarize what I have learned in about five years of designing and implementing software within such an infrastructure. Most of these lessons are hard to find in university courses, even though they would be quite handy.

Real time computing means batch computing

I have observed a whole gradient of meanings for “real time”, and none of its degrees is actually taught at university:

  1. Batch: data are moved around, ingested and processed in nightly/daily/weekly batches that might last hours or even days;
  2. Near real time: data are moved around, ingested and processed in nightly/daily batches that might last hours; you have to try to avoid overlaps between two consecutive runs;
  3. Real time: data must be moved around, ingested and processed in micro-batches every hour or two.

None of these actually matches the concept of real-time computing I studied in my Distributed Algorithms course.
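Avoiding overlaps between consecutive runs (the “near real time” tier) usually comes down to a lock. A minimal sketch, assuming a Unix host where a scheduler such as cron starts the job: each run takes a non-blocking flock(2) on a lock file and simply skips itself if the previous run still holds it. The names `exclusive_run` and `nightly_batch` and the lock path are hypothetical.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def exclusive_run(lock_path):
    """Yield True if this run got the lock, False if a previous run still holds it.

    Uses a non-blocking flock(2) on a lock file (Unix only). The kernel releases
    the lock when the descriptor is closed or the process dies, so a crashed run
    does not leave a stale lock behind.
    """
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            # Another run is still in progress: tell the caller to skip.
            yield False
            return
        try:
            yield True
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)


def nightly_batch(lock_path, job):
    """Run `job` unless the previous run of the same batch is still going."""
    with exclusive_run(lock_path) as acquired:
        if not acquired:
            return "skipped"
        job()
        return "done"
```

Skipping rather than waiting is a deliberate choice here: if a batch regularly outlasts its schedule interval, queueing runs behind each other only makes the backlog grow.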

If your software is not working, there is a checklist of things to check that is orthogonal to your choice of technology, operating system or programming language

  1. Check the proxy: the enterprise proxy might randomly decide to block Maven, GitHub, Slashdot, Stack Overflow, etc. After you have checked it, find a way to circumvent it.
  2. Disable iptables: when you plan the development of a new piece of software, you focus first on algorithms, data model and UI. Nobody really cares where your packets are allowed to go to or come from.
  3. Passwords: use root/12345 consistently across your development environment and you will never need to check this point again.
  4. Date/time representation: your dates and times will always be represented in the wrong format. This is particularly true in the everyday task of moving data across different repositories and files.
  5. Character encoding: this issue routinely hits files that you have been using successfully in your tests and that suddenly rot because you added a few characters using, for example, Notepad++ under Windows instead of vi under Linux. Stick to one encoding and memorize the shortest sequence of tools that can convert text to it.
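For the last two items, the usual cure is to normalize at the boundary: convert every incoming date to one representation and every incoming file to one encoding before anything else touches them. A minimal Python sketch, assuming ISO 8601 in UTC and UTF-8 are the chosen targets; the format list, the cp1252 fallback and the function names are hypothetical choices for illustration.

```python
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical formats seen in upstream extracts. Order matters for ambiguous
# inputs (03/04/2024 parses both as dd/mm and mm/dd), so put the convention
# your sources actually use first.
KNOWN_FORMATS = ("%d/%m/%Y %H:%M:%S", "%Y%m%d%H%M%S", "%Y-%m-%dT%H:%M:%S")


def normalize_timestamp(raw, formats=KNOWN_FORMATS):
    """Return `raw` as an ISO 8601 string, assuming the source values are UTC."""
    for fmt in formats:
        try:
            parsed = datetime.strptime(raw.strip(), fmt)
        except ValueError:
            continue  # not this format, try the next one
        return parsed.replace(tzinfo=timezone.utc).isoformat()
    raise ValueError(f"unrecognized date/time format: {raw!r}")


def normalize_encoding(path, candidates=("utf-8-sig", "cp1252")):
    """Rewrite the file at `path` as UTF-8 without BOM; return the encoding it was read with.

    'utf-8-sig' also accepts plain UTF-8 and drops a leading BOM; 'cp1252' is an
    assumed fallback for files last saved by a Windows editor.
    """
    data = Path(path).read_bytes()
    for enc in candidates:
        try:
            text = data.decode(enc)
        except UnicodeDecodeError:
            continue  # bytes are not valid in this encoding, try the next one
        # Write bytes directly so line endings stay exactly as they were.
        Path(path).write_bytes(text.encode("utf-8"))
        return enc
    raise ValueError(f"{path}: none of {candidates} could decode the file")
```

A successful decode does not prove the guess was right: cp1252 accepts almost any byte sequence. Once you actually know the source encoding, the command-line equivalent is often just `iconv -f CP1252 -t UTF-8`.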

Counter intuition, counter intuition everywhere

  • Memory? The more the better. WRONG: the Java heap uses a clever mechanism called compressed oops (ordinary object pointers) to store memory pointers as 32-bit values. The trick only works for heaps up to about 32 GB; beyond that every reference doubles to 64 bits, so it may happen that a 32 GB heap is better than, let's say, a 35 GB one (see this article for a serious discussion).
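The 32 GB figure is simple arithmetic. A compressed oop is a 32-bit offset that HotSpot scales by the object alignment (8 bytes by default, set by -XX:ObjectAlignmentInBytes), so it can address at most 2^32 × 8 bytes = 32 GiB; above that the JVM falls back to 8-byte references. The model below is a back-of-the-envelope sketch, not a measurement: it assumes only reference fields change size, ignores object headers and padding, and its function names are hypothetical.

```python
GIB = 2 ** 30


def max_compressed_heap(offset_bits=32, alignment_bytes=8):
    """Largest heap (in bytes) addressable by a scaled 32-bit offset."""
    return (2 ** offset_bits) * alignment_bytes


def uncompressed_size(compressed_heap_gib, reference_fraction):
    """Rough size of the same object graph once references grow from 4 to 8 bytes.

    Simplification: only reference fields change size; headers and padding ignored.
    """
    refs = compressed_heap_gib * reference_fraction
    return compressed_heap_gib - refs + 2 * refs


def break_even_fraction(compressed_heap_gib, bigger_heap_gib):
    """Reference fraction above which the bigger, uncompressed heap holds less data."""
    return bigger_heap_gib / compressed_heap_gib - 1
```

Under these assumptions, break_even_fraction(32, 35) is 35/32 - 1, about 0.094: if more than roughly 9% of a compressed 32 GB heap is made of references, the 35 GB heap without compressed oops holds less live data. The exact cutoff depends on the JVM and sits at or slightly below 32 GB, so checking `java -Xmx31g -XX:+PrintFlagsFinal -version | grep UseCompressedOops` is safer than trusting the arithmetic.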

Written by

Data Masseur, Distributed Systems Sculptor, and Scalability Evangelist
