How do you tackle large, unknown Codebases?

I’d like to hear some advice from more seasoned or experienced devs. I’m not a professional myself and would say I only code recreationally. Until now, I did most of my stuff from scratch, but recently I found myself wanting to change existing projects.

For the particular thing I’ve selected to look into, I’m having a go at the new Windows Terminal GitHub project.
Since this is done mostly by professionals, it’s on a completely different scale than anything I’ve looked at until now. I’m able to clone the repo and can build/debug it perfectly fine, so I have working code.

How do I tackle something like this? Let’s say I want to start with something “easy” and just change something like the color of one element, or the size of something. How would I even start to find the relevant parts in the code? I’m a bit familiar with C++ from working on my own DWM fork, but that’s only a few lines of code in comparison.

I have Visual Studio and VS Code set up and can work through the code, but in most cases nothing seems to be done directly in place, and finding where functions, references or variables come from feels nearly impossible without having written this myself.

Any good tips on what to focus on and where to look at first to find a certain property, element or function?

What I do at least is: I get a piece of paper and draw a graph of the major “things” and relationships while reading through code at a high level. Not necessarily classes (if OO language). Go through and/or redraw as needed until I think I understand what’s going on. If I have a colleague, I take myself and the graph to him/her, and spend a little while explaining what I learned, and he/she tells me where and if I’m right or wrong about certain things. If I don’t have a colleague, and I don’t have another way to verify, that’s my best assumption of what is going on at the time, usually enough to start doing at least some tasks. I do this whenever I’m tasked to work on something I’m not yet familiar with. Infrastructure and code alike, I do this for both.

The goal I’m trying to reach is a visual picture, which is much easier to comprehend. Sometimes I don’t do that for the entire project (system) but a bit at a time, depending on how large it is. This is the quickest way for me to get to working. You won’t learn it all at once, though; give it time.
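
For a C++ codebase, a rough first draft of such a graph can be pulled out of the code itself. The following is only a minimal sketch, assuming that local `#include "..."` lines are a usable stand-in for "relationships"; the function names `include_graph` and `most_included` are made up for this example and are not part of any tool mentioned in this thread.

```python
import re
from collections import Counter
from pathlib import Path

# Matches local includes such as: #include "foo.h" (system includes in <> are ignored).
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*"([^"]+)"', re.MULTILINE)
SOURCE_SUFFIXES = {".h", ".hpp", ".c", ".cpp"}


def include_graph(root):
    """Map each source file (path relative to root) to the set of local headers it includes."""
    root = Path(root)
    graph = {}
    for path in root.rglob("*"):
        if path.is_file() and path.suffix in SOURCE_SUFFIXES:
            text = path.read_text(encoding="utf-8", errors="ignore")
            headers = set(INCLUDE_RE.findall(text))
            if headers:
                graph[path.relative_to(root).as_posix()] = headers
    return graph


def most_included(graph, top=5):
    """Return the headers included by the most files, as (header, count) pairs."""
    counts = Counter(header for headers in graph.values() for header in headers)
    return counts.most_common(top)
```

Headers that many files include are often the "major things" worth putting on paper first. This does not replace reading the code; it only suggests where to start drawing.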

Most definitely.
I’m also slowly learning to appreciate new features in VS Code. For C++ at least, it does a really good job of navigating code. After some time scanning I can look up references with Alt+F12, get definitions and get the corresponding functions in a popup. This helps a lot in understanding what’s going on.

I feel like I at least know which parts of the codebase are even relevant to my platform. But it’s certainly not easy to grasp.
I’ll take your advice, sit down on Saturday and try to create a rough outline of what is going where. Thanks for the input!

Yep was just going to add: That reading consists of observing the components, looking where they are referenced: where created, where interacted with, where destroyed. This gives a good understanding overall, which is the goal. Detailed code comes later.
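
Outside an IDE, the "where is it referenced" pass can be approximated with a plain text search. This is a minimal sketch, assuming a whole-word match is good enough; `find_references` and the default file suffixes are hypothetical choices for illustration. An IDE's "find references" understands the language and will be more accurate.

```python
import re
from pathlib import Path


def find_references(root, identifier, suffixes=(".h", ".hpp", ".cpp", ".idl", ".xaml")):
    """Return (relative path, line number, stripped line) for every whole-word match."""
    root = Path(root)
    # \b keeps "Widget" from matching inside "WidgetCount".
    pattern = re.compile(r"\b" + re.escape(identifier) + r"\b")
    hits = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="ignore")
            for number, line in enumerate(text.splitlines(), start=1):
                if pattern.search(line):
                    hits.append((path.relative_to(root).as_posix(), number, line.strip()))
    return hits
```

Sorting the hits into "created", "used" and "destroyed" is still manual work, but a list like this gives you the places to look.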

Reading unit tests is sometimes useful too; they define how the application works, after all. But they tend to be too detailed, as they deal with concrete component behaviour. If you have higher-level tests, those are a good place to look too.

Man that project is massive lol. Back in the day I would find “main” and then work from there, but as projects become more complex or larger, I’ve started finding a piece or function and working with it until I know how it works, then moving to another piece or function and working with it until I know how it works. After a few days, a week, or a month you start to get the feel for how it all works together.

Don’t be afraid to ask questions in the community either.

Also, the search in GitHub is halfway decent :grin:

That may or may not be what you’re looking for :thinking:

Sadly it’s not. That’s for defining the Console Color scheme (Text and such). That’s already surfaced through settings.

I’ll make sure to give a proper search a try though. I feel like, again, VS Code will have a nice feature for that, which I only need to find :wink:

Haha yeah. Control + Click will take you to the function definition if/when you find something that looks interesting. I know that works in JetBrains but I’m 99% sure that works in VS Code, too :thinking:

Nice, that’s the same that pops up with alt+F12 but much more convenient.

If you install an IntelliJ keybinding plugin (which is how I did it) or remap the keys yourself, it would then be:
ctrl+b to jump to the definition of what you are looking at
ctrl+alt+b to go to the implementation (if what you are looking at is an interface or abstract method, you’d need this to find its implementation)

JetBrains seems to have somewhat better shortcuts for actually using the thing. F12 is so far away… and you need that all the time.

Interfaces, abstract classes, inheritance and so forth are also things you should be very familiar with for larger projects.

Also, keep in mind, if you actually want to commit code to Windows Terminal: Microsoft has coding guidelines that you have to read and follow, or your code may be rejected even if it works. You might want to have a look at those.

I remember watching a webinar about this sort of thing, where some guy inherited a codebase of over 27K different files. I’ll search for it after work to see if I can find it for you.


Sounds good, would also watch it probably! :smile:

I watched some video about Windows Terminal at some point. Kinda mind-blowing how much time has gone into it and how many lines of code their terminal has, especially when you look at gnome-terminal in comparison. It’s a tiny fish of a codebase if you compare the two.

Just cloned the git repos to verify
(Obviously the bigger one being the windows terminal)
(Lines of code is not necessarily lines of code though, it’s just lines of everything in the repo)

EDIT: Actually, it looks like I goofed in that regard: I ended up copying the 1.417 into the lines of code of gnome-terminal. Wow, I should go to sleep I guess… :sweat_smile:

Total Files    Lines of Code
257            364.249
1.417          319.440

That is what you get comparing those two repos.

In case you wonder how I got those numbers: git ls-files | wc -l for the files and git ls-files | xargs cat | wc -l for the lines.
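
The same two counts can also be taken in Python. This is a rough sketch rather than an exact reimplementation: the helper names `tracked_files` and `count_lines` are made up for this example, and like `wc -l` it counts newline characters in everything tracked, not only code. One difference is that it reads NUL-separated names from `git ls-files -z`, so paths containing spaces are handled, whereas plain `xargs cat` splits on whitespace.

```python
import subprocess
from pathlib import Path


def tracked_files(repo):
    """List the files git tracks in repo, using NUL-separated output to survive odd names."""
    out = subprocess.run(
        ["git", "-C", str(repo), "ls-files", "-z"],
        capture_output=True,
        check=True,
    ).stdout
    return [Path(repo) / name.decode("utf-8", errors="replace")
            for name in out.split(b"\0") if name]


def count_lines(paths):
    """Count newline bytes across the given files, skipping anything that is not a regular file."""
    total = 0
    for path in paths:
        path = Path(path)
        if path.is_file():
            total += path.read_bytes().count(b"\n")
    return total


# Example use inside a cloned repository:
#   files = tracked_files(".")
#   print(len(files), count_lines(files))
```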

It better be good!! @Microsoft

I use the cloc tool that I found some time ago. It’s pretty comprehensive and skips files that are usually generated by IDEs etc. Here are the digits:

     255 text files.
     252 unique files.
      27 files ignored.
cloc v 1.72  T=1.98 s (115.8 files/s, 180118.1 lines/s)
Language                     files          blank        comment           code
PO File                        111          63715         108230         152817
C                               30           3359           1963          14747
Visualforce Page                32            476             85           2601
Glade                            2              0             15           2528
C/C++ Header                    33            538            769           1213
m4                               3            150             45            910
XML                              4             51             70            749
make                             6            100             75            595
Qt                               4              9             40            247
INI                              1              1              0             19
CSS                              1              3              1             18
Bourne Shell                     1              4              1             10
Lisp                             1              0              0             10
SUM:                           229          68406         111294         176464

microsoft terminal

    1290 text files.
    1229 unique files.
     326 files ignored.
cloc v 1.72  T=3.48 s (277.2 files/s, 57509.5 lines/s)
Language                             files          blank        comment           code
C++                                    345          19563          27405         100374
C/C++ Header                           353           6316           6370          19312
C#                                      79           1527           1116           7668
Markdown                                46           1149              0           3537
Pascal                                  17            189              0            779
Windows Resource File                   19            230            200            730
IDL                                     19             79              0            435
MSBuild script                           7              6              7            401
DOS Batch                               35            104             99            392
PowerShell                               6            101            120            372
XML                                     10             15             42            364
XAML                                     7             43             41            356
YAML                                     8             52             13            311
Python                                   4             31             52            160
INI                                      3              3              0             60
JSON                                     2              0              0             43
JavaScript                               1              4             13              8
Windows Module Definition                3              2              0              8
Windows Message File                     1              0              0              4
SUM:                                   965          29414          35478         135314

How did you end up with 95 times the lines of code I ended up with for gnome-terminal though? Did I use the wrong repo? I know they have their own GitLab, but I did not doubt the GitHub gnome-terminal, because it had a commit 5 days ago.

NOTE: Made some random mistake in my calculator before. Copy pasting is hard sometimes.

Yeah I found it a bit weird that it’s just 1.5k. You could do a terminal in that much, but not with as many features as gnome-terminal has :smiley:

I just did a blind clone on both, wonder why it’s so different, number of files looks right though :thinking:

What repo did you use? I know they do have a separate GitLab, but I did not doubt the GitHub repo, as it had a commit 5 days ago, was the first result when googling and is called gnome-terminal.

I used the link you posted! :smiley:

Looks like I messed something up; I edited it. Well, as I said, copy-pasting is hard sometimes. Wow, do I feel stupid now. :joy:

It does not actually end up looking that far off in comparison, then.

No worries, happens to me all the time too :smiley:. Well, Microsoft’s terminal is still at least 6 times bigger in scope (don’t mind the .po files, they’re just translations), so I’d say your point still stands :smiley:
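
The size ratio can be sanity-checked against the two cloc tables above. How the "6 times" figure was arrived at is not stated in the thread, so both comparisons below are assumptions about what might be compared; the numbers themselves are copied from the "code" column of the cloc output.

```python
# "code" column values from the two cloc runs above.
gnome_total = 176464
gnome_po = 152817            # PO File rows are translations
gnome_c = 14747 + 1213       # C + C/C++ Header

ms_total = 135314
ms_cpp = 100374 + 19312      # C++ + C/C++ Header

# Comparison 1 (assumed): everything except translations.
gnome_without_po = gnome_total - gnome_po
ratio_all = ms_total / gnome_without_po

# Comparison 2 (assumed): only C/C++ sources and headers.
ratio_c = ms_cpp / gnome_c

print(f"gnome-terminal without .po: {gnome_without_po} lines")
print(f"all code, .po excluded:     {ratio_all:.2f}x")
print(f"C/C++ only:                 {ratio_c:.2f}x")
```

Excluding translations gives a bit under 6x, and counting only C/C++ gives about 7.5x, so "roughly 6 times or more" holds either way.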

There’s nothing to do but get on the grind. You’ve got to understand what your peers are doing, and it takes whatever time it takes. (This is what most bosses seem not to understand.)
There’s no real method. Basically, at times you have to read through and understand a 500+ page paper just to change the color of some text.
It does get easier with practice, but understanding what you’re doing is the most important part, and if you’re new to it, just be patient.
I think one of my most experienced teachers said: coding is a craft, so just practice, practice, and more practice.
