Behind the curtain: the magic behind “ls -l *.c”
A step-by-step introduction to what goes on when you type “ls -l *.c” and press enter in a shell
First of all, what is a shell?
The shell is the most basic way to interact with the system. In fact, every time you use the terminal, you are actually using the shell. Not only does the shell act as a link between the operating system and the user but it also serves as an interface that converts human readable commands into information the kernel can understand and process. In the physical world a kernel is the innermost part of a nut contained within a shell. Likewise, the kernel is the innermost part of an operating system and maintains absolute control over the system. Consequently, the shell works as a command-line interpreter so that programs can be executed.
Whenever the user types a command, they’ll immediately see it on the screen as well as its output. However, behind the scenes, the system runs many processes which we’ll discuss in this article.
The shell prompt
A prompt is a short series of characters that marks where you type commands. Once it is displayed, the system is ready to accept commands. The prompt usually ends with “$” and a blinking cursor. Most commands read their input from standard input, usually the keyboard, and write their output to standard output, usually the screen. To run a command, the user types it and then hits enter.
The command is entered…now what?
At this point the shell must recognize the command you entered and pair it with a program to execute. The command entered by the user is a string, which is read into a buffer. This task is performed by the function getline(); if the read is not successful, the shell can report an error. getline() takes the three parameters below:
- &buf, the address of the pointer to the buffer where the input is stored
- &bufsize, the address of the variable containing the size of the buffer
- stdin, in this case we read the input from standard input
getline() returns the number of characters read, including the newline, or -1 if an error occurs or if it reaches EOF (end of file) without reading anything.
Divide and conquer: now let’s tokenize!
At this step we need to parse the string into smaller strings by using the blank space as a delimiter. Tokenization will split the program name and each one of the parameters passed. In our example the first token is “ls”, the second “-l” and the third “*.c”. To do this the shell uses the strtok() function. Its prototype is char *strtok(char *str, const char *delim) where,
- str, is the string we want to parse
- delim, is our delimiter which, in this case, will be the blank space
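strtok() returns a pointer to the next token, or a null pointer when there are no more tokens. The first call receives the string to parse; each following call passes NULL as str to continue with the same string.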
Now that we have our string split into smaller sub-strings the system can interpret the command and check whether it calls a program.
Separating the wheat from the chaff: is it an alias or a built-in function?
In real life an alias is an assumed identity. Likewise, in the shell an alias is a shortcut that stands for a command; it is usually used to avoid typing long commands or to avoid making typing mistakes. When the first token of a command matches an alias, the shell replaces that token with the string the alias stands for. Aliases are usually defined in the shell's startup file, such as .bashrc or .tcshrc.
If an alias is found, the shell replaces it with the actual command, then finds the executable program and finally calls it. Otherwise, it checks whether the command is a built-in function.
Built-in functions, on the other hand, are executed directly within the shell itself rather than from an external executable program, which is why they run faster. Unlike external programs, which are stored on disk, built-in functions are part of the shell and are therefore already in memory. To check whether a command is a built-in function you can run the command type (for example, “type cd”). If the shell matches the command entered by the user with a built-in function then it will be executed.
Show me the way!: searching within the PATH
If the command entered is neither an alias nor a built-in function, the shell searches for it in the directories listed in the environment variable PATH. This variable stores a list of directories where executables are kept, separated by “:”.
To do so, the value of PATH is tokenized with the delimiter “:”. The command name is then appended to each directory in turn, and the shell checks whether an executable file exists at the resulting path. The first match is the program that will be run; if no directory contains the command, the shell reports that the command was not found.
Separate processes: executing the program
Now that the user has entered input and the program has been located, the shell will execute it. But there is a caveat here. So far, the shell has done everything within a single process, but now it has to run the program in a separate process by duplicating itself. It has to do so because executing a program replaces the process that executes it: without a copy, the shell itself would be replaced and would no longer be there when the command finishes. Running the command in a separate process also keeps the shell alive should the command crash, and it lets the shell work in a loop that returns to the prompt instead of finishing after each command is executed.
The system call that allows the shell to create a new process is fork(). It creates a copy (the child process) of the calling process (the parent process). Each process has its own unique process id (pid).
The command the user enters is executed within the child process while the parent process waits, which is done with the wait() system call. In the child process the path of the program is passed to the system call execve(), which replaces the child with that program; the program then runs and writes its output to the screen. Once the child finishes, the parent resumes and displays the prompt again so that new commands can be entered by the user.
So what about “ls -l *.c”?
When the user enters “ls -l *.c” the above process will take place and the output will be a list of the files with extension .c displayed in long format.
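- “ls” lists the contents of the current directory, or the files passed to it as arguments
- “-l” is an option which causes the list to be in long format
- “*.c” is a wildcard pattern; the shell replaces it with the names of all files in the current directory that end in .c before ls runs, so only those files are displayed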
What about wildcards?
In card games a wild card is a card that can have any value. Likewise, in our scenario a wildcard is a character that can be used as a substitute for any character or list of characters. The star wildcard, *, can represent any string of characters, including a single character or no characters at all. Wildcards can also be combined with other characters to represent part of a string. In our example the star wildcard is used to retrieve all the files ending in .c. However, it will only match files that are not hidden: by default * does not match a leading “.”, and because the shell expands the pattern before ls runs, adding an ls option such as “-a” does not change this.
When “ls -l *.c” is executed we can see that the output contains ten fields for each file.
1- indicates the file type: “-” for a regular file, “d” for a directory or “l” for a link
2- shows the permissions for the file owner; r for read, w for write and x for execute
3- shows the permissions for the group owner
4- shows the permissions for everyone else
5- shows the number of links
6- shows the file owner
7- shows the group owner of the file
8- file size in bytes
9- date of the last modification
10- file name
In conclusion, whenever the user enters a command the shell will interpret it, execute it and then display its output on the screen; after all of this it will display the prompt again and await user input.
Hope you’ve enjoyed taking a peek under the shell’s hood!
This blog was written by Alina de los Santos, Agustin Bolioli and Marcelo Arbiza.