Tech Vault Interview Questions Bank
Note: A bank of carefully curated technical engineering questions, compiled by experienced engineers who have conducted many interviews. These are among the most popular and important questions used to test candidates' knowledge in interviews.
Any questions taken from external references are referenced below.
Table of Contents
- Vision
- What is different in this repo compared to others?
- Contributing
- DevOps
- System Design
- Software Engineering
- Data
- Machine Learning
- Cyber Security & InfoSecurity
- Interpersonal skills
- References
Vision
- Coming from a non-technical background and without experience can be difficult for many. So my team and I decided to create this project so that all tech professionals, technical and non-technical alike, have a place of reference for technical interview questions and interview prep. It is maintained by a lovely community. If you wish to contribute, feel free to do so. Read this for more info >> How to Contribute
What is different in this repo compared to others?
Contributing - To contribute, please read our Contributing Guidelines. For any fixes, updates, or new additions, please open a pull request (PR). Thank you!
DevOps
Linux
What is Linux and difference between UNIX and Linux?
Linux is an open-source operating system based on the UNIX architecture. It was created by Linus Torvalds in 1991. UNIX, on the other hand, is a family of operating systems that was developed in the late 1960s at Bell Labs. The main differences between UNIX and Linux are:
- License: UNIX is proprietary, while Linux is open-source and free to use.
- Development: UNIX is developed by a few organizations (like IBM, Sun Microsystems, and HP), while Linux has a widespread community-driven development process.
- Portability: Linux is more portable and can be used on a wide range of hardware, whereas UNIX is limited to specific hardware platforms.
- User base: Linux has a broader user base, including personal computers, servers, and embedded systems, while UNIX is primarily used in enterprise environments.
What is the Linux kernel?
The Linux kernel is the core component of the Linux operating system. It is responsible for managing the system's resources, providing an interface between hardware and software, and facilitating essential tasks such as memory management, process scheduling, and input/output (I/O) operations.
What are inodes in Linux?
Inodes in Linux are data structures that store important information about files on a file system. Each file or directory in a Linux file system has an associated inode that contains metadata such as the file's size, permissions, ownership, timestamps, and the location of the file's data blocks on the disk.
When you create a file or directory, the file system assigns a unique inode number to it. The inode number serves as an identifier for the file, allowing the file system to access the inode's information and manage the file accordingly.
Here's a summary of what inodes store:
- File type (regular file, directory, symbolic link, etc.)
- File permissions (read, write, execute)
- Ownership (user and group)
- Timestamps (access, modification, and status-change times; some file systems also record creation time)
- File size
- Number of hard links to the file
- Location of the file's data blocks on the disk
It's important to note that inodes don't store the file's name or the actual file data. The file name is stored in the directory that contains the file, which associates the name with the inode number. The actual file data is stored in separate data blocks on the disk, and the inode points to these blocks.
Inodes play a crucial role in managing files and directories within a Linux file system, providing an efficient way to access and manipulate file metadata.
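A quick way to see inode metadata in practice is the `stat` and `ls -i` commands. A minimal sketch, assuming GNU coreutils (standard on most Linux distributions) and using a throwaway directory:

```shell
# Inspect a file's inode metadata.
cd "$(mktemp -d)"
echo "hello" > data.txt

ls -i data.txt   # prints the inode number next to the file name

# -c picks fields straight from the inode: number, permissions, owner, link count, size
stat -c 'inode=%i perms=%a owner=%U links=%h size=%s' data.txt
```

Note that the file name itself appears nowhere in `stat`'s inode fields; it lives in the directory entry, as described above.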
Explain the Linux boot process
The Linux boot process consists of several stages that initialize the system and load the operating system. Here's a brief overview of the key steps:
BIOS/UEFI: When the computer is powered on, the BIOS (Basic Input/Output System) or UEFI (Unified Extensible Firmware Interface) performs initial hardware checks and locates the boot device.
Bootloader: The bootloader (e.g., GRUB) loads from the boot device and presents the available operating systems to the user. It then loads the Linux kernel and initial RAM disk (initrd) into memory.
Kernel initialization: The Linux kernel initializes hardware, sets up memory management, starts essential processes, and mounts the initial RAM disk, which contains essential drivers and tools needed during the boot process.
Root file system: The kernel switches the root file system from the initial RAM disk to the actual root partition on the disk, typically identified by its UUID or device name (e.g., /dev/sda1).
Init process: The first user-space process, called init (e.g., Systemd, SysVinit, or Upstart), starts and manages system services and processes during the boot process and the system's runtime.
Runlevel/target: Init process initializes the predefined runlevel (SysVinit) or target (Systemd), which determines the services and processes to run at startup.
Login prompt: Once all services and processes specified in the runlevel/target have started, the system displays a login prompt, indicating that the boot process is complete and the system is ready for use.
What is a zombie process?
A zombie process, also known as a defunct process, is a process that has completed its execution but still remains in the process table. This happens because the parent process has not yet read the child process's exit status, which is required to clean up the child process's resources and remove its entry from the process table.
Zombie processes don't consume any system resources, except for the process table entry, which includes the process ID (PID) and the exit status. The operating system keeps this information so that the parent process can eventually retrieve the exit status and perform the necessary clean-up.
More info on zombie processes below:
Typically, a well-behaved parent process will use the wait() or waitpid() system call to collect the exit status of its child processes. However, if the parent process doesn't do this, either due to a programming error or because the parent is still running and hasn't reached the point where it collects the exit status, the child process becomes a zombie.
Zombie processes are usually harmless, but if a system accumulates a large number of them, it could exhaust the available PIDs and prevent new processes from being created. To resolve this issue, the parent process should be fixed to correctly handle its child processes' exit status, or if the parent process is unresponsive or terminated, a system reboot might be necessary.
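Zombies show up in `ps` output with process state `Z`. One minimal way to look for them (assuming the standard procps `ps`):

```shell
# List any zombie (defunct) processes: the STAT column begins with "Z".
ps -eo pid,ppid,stat,comm | awk '$3 ~ /^Z/'
# Empty output means no zombies. For any hits, the PPID column identifies
# the parent that has not yet collected the exit status with wait()/waitpid().
```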
Difference between soft links and hardlinks?
Soft links and hard links are two types of file links in a Unix-like file system, such as Linux. They serve different purposes and have distinct characteristics:
Soft Link (Symbolic Link):
- A soft link is a separate file that points to the target file or directory by storing its path.
- If the target file is deleted, the soft link becomes a "dangling" link, pointing to a nonexistent file.
- Soft links can span across different file systems and partitions.
- Soft links can link to directories as well as files.
- When a soft link is created, the link count of the target file doesn't change.
- Soft links have different inode numbers than their target files.
Hard Link:
- A hard link is a direct reference to the data on the disk, sharing the same inode as the target file.
- If the target file is deleted, the hard link still points to the data, and the data remains accessible until all hard links to it are removed.
- Hard links can only be created within the same file system or partition.
- Hard links cannot link to directories, only to files.
- When a hard link is created, the link count of the target file increases by one.
- Hard links have the same inode numbers as their target files.
In summary, a soft link is a more flexible but less reliable type of link that can point to files or directories across file systems, while a hard link is a more robust link that directly references the file's data, but is limited to the same file system and cannot link to directories.
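The behavioural difference is easy to demonstrate. A small sketch, safe to run because it works in a throwaway directory:

```shell
cd "$(mktemp -d)"
echo "original data" > target.txt

ln target.txt hard.txt        # hard link: a second name for the same inode
ln -s target.txt soft.txt     # soft link: a new inode that stores the path

rm target.txt                 # remove the original name
cat hard.txt                  # data is still accessible via the hard link
cat soft.txt 2>/dev/null || echo "soft link is dangling"
```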
What are namespaces and c-groups?
Namespaces and cgroups (control groups) are two Linux kernel features that play a crucial role in implementing process isolation and resource management, especially in containerization technologies like Docker.
Namespaces:
Namespaces are a feature that provides process isolation by creating separate instances of certain system resources, which can only be accessed by processes within the same namespace. This isolation helps ensure that processes running in one namespace don't interfere with processes in another namespace.
There are several types of namespaces, including:
- PID namespace: Isolates process IDs, allowing each namespace to have its own set of PIDs.
- Mount namespace: Isolates the file system mount points, so that each namespace has its own mount tree.
- Network namespace: Isolates network resources, providing each namespace with its own network stack, including interfaces, routes, and firewall rules.
- IPC namespace: Isolates inter-process communication resources, preventing processes in one namespace from communicating with processes in another namespace.
- UTS namespace: Isolates system identifiers like hostname, allowing each namespace to have its own unique hostname.
- User namespace: Isolates user and group ID mappings, enabling each namespace to have its own set of user and group IDs.
Cgroups (Control Groups):
Cgroups are a kernel feature that enables the management and limitation of system resources, such as CPU, memory, and I/O, for a group of processes. Cgroups help ensure fair distribution of resources, prevent resource starvation, and enforce limits on resource usage. Some of the key features of cgroups include:
- Resource limiting: Allows setting limits on resource usage for a group of processes, such as maximum CPU usage, memory consumption, and disk I/O bandwidth.
- Prioritization: Enables setting priorities for resource allocation among different cgroups, helping to ensure that critical processes receive sufficient resources.
- Accounting: Collects resource usage statistics for processes in a cgroup, which can be useful for monitoring, profiling, and billing purposes.
- Control: Provides a mechanism to start, stop, or freeze processes in a cgroup, allowing for better management of process groups.
In combination, namespaces and cgroups provide the necessary isolation and resource management capabilities required to build and run containers, enabling multiple containers to coexist on the same host without interfering with each other or consuming excessive resources.
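Both features can be inspected from a running shell. A minimal, Linux-only sketch (the `lsns`, `unshare`, and `nsenter` tools mentioned in the comments come from util-linux):

```shell
# Each process's namespace memberships are exposed under /proc/<pid>/ns/.
# Two processes sharing a namespace show the same inode ID in the output.
readlink /proc/self/ns/pid /proc/self/ns/uts /proc/self/ns/net

# The process's cgroup membership is visible here:
cat /proc/self/cgroup

# lsns summarizes namespaces system-wide; `unshare` and `nsenter`
# create or enter namespaces (typically requiring elevated privileges).
```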
What are symbolic links?
Symbolic links, also known as soft links or symlinks, are a type of file link in Unix-like file systems, such as Linux. A symbolic link is a special file that points to another file or directory by storing its path. Symbolic links serve as a reference to the target file or directory, allowing users and applications to access the target through the link.
Symbolic links are useful in various scenarios, such as creating shortcuts, linking to files or directories in different locations, or maintaining multiple versions of a file or directory. Some key characteristics of symbolic links are:
- Symbolic links can point to files or directories, and they can span different file systems and partitions.
- If the target file or directory is moved or deleted, the symbolic link becomes a "dangling" link, pointing to a nonexistent location.
- Symbolic links have different inode numbers than their target files or directories.
- When listing files with the ls command, symbolic links are indicated by an "l" at the beginning of the file permissions and an arrow (->) pointing to the target file or directory.
To create a symbolic link in Linux, you can use the ln command with the -s option, followed by the target file or directory and the desired symlink name:
ln -s target_file symlink_name
Example:
ln -s /path/to/original/file.txt link_to_file.txt
This command creates a symbolic link named link_to_file.txt that points to the file located at /path/to/original/file.txt.
What are the different types of permissions in Linux?
In Linux, there are three main types of permissions for files and directories, which determine how users can interact with them. These permissions are classified into categories based on the user's relationship to the file or directory: owner, group, and others (sometimes referred to as "world" or "public"). The three types of permissions are:
Read (r): Read permission allows a user to view the contents of a file or list the contents of a directory. For files, this means the user can open and read the file. For directories, the user can view the names of files and subdirectories within that directory.
Write (w): Write permission allows a user to modify the contents of a file or make changes within a directory. For files, this means the user can edit, append, or delete the file's content. For directories, the user can create, rename, or delete files and subdirectories within that directory. Note that deleting or renaming files within a directory requires write permission on the directory itself, not the individual files.
Execute (x): Execute permission allows a user to run a file as a program or script, or enter and access a directory. For files, this means the user can execute the file if it is a binary executable or script with a proper interpreter (e.g., a shell script or Python script). For directories, the user can change their current working directory to that directory, and access files and subdirectories within it.
These permissions are usually represented using a combination of letters (r, w, x) and dashes (-) for each of the three categories: owner, group, and others. For example, the permission string -rwxr-xr-- indicates:
- The first character - indicates it's a regular file (a d would indicate a directory).
- The owner has read (r), write (w), and execute (x) permissions: rwx.
- The group has read (r) and execute (x) permissions, but not write permission: r-x.
- Others have only read (r) permission: r--.
Alternatively, permissions can also be represented using octal notation (base-8), where read, write, and execute permissions are assigned values of 4, 2, and 1, respectively. The permissions are then represented by a three-digit number, with each digit corresponding to the owner, group, and others. For example, the permission string -rwxr-xr-- can be represented as 754 in octal notation.
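The two notations describe the same underlying permission bits, which can be verified directly. A short sketch assuming GNU `stat`:

```shell
cd "$(mktemp -d)"
touch script.sh
chmod 754 script.sh            # 7 = rwx (4+2+1), 5 = r-x (4+1), 4 = r-- (4)
stat -c '%A %a' script.sh      # prints the symbolic and octal views: -rwxr-xr-- 754
```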
What is swap space?
Swap space is a dedicated area on a storage device (such as a hard drive or SSD) that functions as an extension of a computer's physical memory (RAM). It is used by the operating system to temporarily store data that does not fit into RAM or when the system experiences memory pressure due to high RAM utilization.
When the operating system needs more memory than is physically available, it can move the least recently used or less important data (called pages) from RAM to the swap space. This process is called "paging" or "swapping out." By doing so, it frees up space in RAM for more critical or frequently accessed data. If the swapped-out data is required again, the operating system will move it back into RAM, possibly swapping out other data in the process. This is called "swapping in."
Swap space can be implemented as a dedicated swap partition or a swap file. In Linux, you can manage swap space using commands such as swapon, swapoff, and mkswap. To check the current swap space usage on a Linux system, you can use the free or swapon -s commands.
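The inspection commands above can be run by any user; creating swap requires root. A sketch (the `/swapfile` path is an example, not a requirement):

```shell
# Inspect current swap usage (no special privileges required):
free -h
swapon --show

# Sketch: adding a 1 GiB swap file (root only; /swapfile is a placeholder path):
# fallocate -l 1G /swapfile
# chmod 600 /swapfile
# mkswap /swapfile
# swapon /swapfile
```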
What is chmod, chown and chgrp in Linux?
chmod
chmod (change mode) is a command used to change the permissions of a file or directory. You can set read, write, and execute permissions for the owner, group, and others. Permissions can be represented in octal (numeric) or symbolic (letter) notation.
For example, to give the owner read, write, and execute permissions, the group read and execute permissions, and others only read permission, you would use:
- Using octal notation:
chmod 754 file.txt
- Using symbolic notation:
chmod u=rwx,g=rx,o=r file.txt
chown
chown (change owner) is a command used to change the ownership of a file or directory. You can specify a new owner and an optional new group for the file or directory.
For example, to change the owner of file.txt to the user john and the group to developers, you would use:
chown john:developers file.txt
chgrp
chgrp (change group) is a command used to change the group assignment of a file or directory. You can specify a new group for the file or directory.
For example, to change the group of file.txt to the group developers, you would use:
chgrp developers file.txt
What are cronjobs?
Cronjobs, also known as cron jobs or simply cron, are scheduled tasks that run automatically at specified intervals on Unix-like operating systems, such as Linux. The term "cron" comes from the Greek word "chronos," which means "time." Cronjobs are commonly used for automating repetitive tasks, performing system maintenance, running periodic backups, and other similar activities.
Cronjobs are managed by a daemon called "cron," which runs in the background and executes the scheduled tasks. The configuration for cron jobs is stored in a series of files called "crontabs" (short for "cron tables"). Each user on the system can have their own crontab, and there is also a system-wide crontab.
For example, a cron job that runs every day at 3:30 AM would have the following entry in the crontab:
30 3 * * * /path/to/command arg1 arg2
To manage cron jobs, you can use the crontab command with various options:
- crontab -l: List the current user's cron jobs.
- crontab -e: Edit the current user's cron jobs using the default text editor.
- crontab -r: Remove the current user's cron jobs.
- crontab -u USER: Perform an operation (list, edit, or remove) on the specified user's cron jobs (requires root privileges).
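The five schedule fields in a crontab entry read, left to right: minute, hour, day of month, month, day of week. A few illustrative entries (the script paths are placeholders):

```
# minute hour day-of-month month day-of-week command
# every 15 minutes:
*/15 * * * * /usr/local/bin/healthcheck.sh
# 03:30 every day:
30 3 * * * /usr/local/bin/backup.sh
# 09:00 on weekdays (Mon-Fri):
0 9 * * 1-5 /usr/local/bin/report.sh
```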
Commands (basic & advanced):
What does chmod +x FILENAME do?
The command chmod +x FILENAME is used to add execute permissions to a file in a Linux or Unix-like system. By adding execute permissions, you allow the file to be run as an executable or script, provided it has the appropriate format and interpreter (e.g., a shell script, Python script, or compiled binary). The +x option specifically grants execute permission to the owner, group, and others (all users).
For example, if you have a script named myscript.sh and you want to make it executable, you would run:
chmod +x myscript.sh
Which command will show you free/used memory?
The free command is used to display information about free and used memory. The command provides details on total, used, free, shared, and available memory, as well as swap space usage.
To use the free command, simply type free in the terminal, followed by any desired options. Some common options include:
- -b: Display memory usage in bytes.
- -k: Display memory usage in kilobytes (default).
- -m: Display memory usage in megabytes.
- -g: Display memory usage in gigabytes.
- -h: Display memory usage in a human-readable format, automatically choosing the appropriate unit (e.g., B, K, M, or G).
- -t: Display a line containing the total amount of physical memory and swap space.
- -s N: Continuously display memory usage information, updating every N seconds.
For example, to display memory usage in a human-readable format, you would run:
free -h
Which command will show me the current directory I am in?
pwd
How can I terminate an ongoing process?
kill PID (sends SIGTERM; if the process does not respond, kill -9 PID forces termination with SIGKILL)
Write the command that will display all .yaml files, including the permissions of each file.
find . -type f -name "*.yaml" -exec ls -l {} \;
How can I find the status of a process?
Content HERE
What is the command to show all open ports?
Content HERE
How do you find the process ID of a running process in Linux?
Content HERE
How do you find the dependencies of a package in Linux?
Content HERE
Advanced:
Does free memory exist on Linux?
Content HERE
How can I check if a server is down?
Content HERE
What is inside /proc?
Content HERE
A process on the system can no longer log files, what can I do?
Content HERE
What is LILO?
Content HERE
What are syscalls in Linux and how do they work?
Content HERE
What is no route to host?
Content HERE
What is the difference between a hard link and a symbolic link in Linux?
Content HERE
Linux Advanced (Scenario based questions):
Explain the Linux boot process
Content HERE
A process on the system can no longer log files, how would you debug?
Content HERE
How can I check if a Linux system is healthy?
Content HERE
What happens when you type "ls" or "cd" into a terminal? (go deep and talk about what happens behind the scenes - kernel level)
Content HERE
How can I check if a server is down?
Content HERE
How are Linux processes killed on a lower level?
Content HERE
I have accidentally entered `cd /bin` and done `chmod 644 chmod` - how can I fix this?
Content HERE
How would you troubleshoot a network connectivity issue in Linux?
Content HERE
How do you troubleshoot a connectivity issue with a remote server in Linux?
Content HERE
How do you view and edit the system logs in Linux?
Content HERE
How do you troubleshoot a DNS issue in Linux?
Content HERE
Networking
What is HTTP? How is HTTPS different?
Content HERE
TCP vs UDP
Content HERE
What is DNS and how does it work?
Content HERE
What is TLS?
Content HERE
What are CIDR ranges?
Content HERE
What is ingress and egress traffic?
Content HERE
What is a switch vs a hub?
Content HERE
What is a switch vs a router?
Content HERE
What is HTTPS vs Websockets?
Content HERE
Explain how a 3 way handshake works?
Content HERE
Stateless vs Stateful firewalls?
Content HERE
What are VPCs?
Content HERE
What is subnetting?
Content HERE
What is DHCP?
Content HERE
Advanced + Scenario based questions:
When I type google.com into the browser, what actually happens? (go into as much detail as you can)
Content HERE
I can't reach a website, how can I troubleshoot? (use deep Linux + networking knowledge)
Content HERE
Can you break down the OSI model and what does it signify?
Content HERE
How does mTLS work and compare it to TLS?
Content HERE
Describe the TCP/IP connection process?
Content HERE
When and why would one use a TCP over UDP?
Content HERE
Data transfer between 2 hosts is extremely slow. How can you troubleshoot?
Content HERE
Git
What is Git?
Content HERE
Difference between Git and SVN?
Content HERE
What is the basic Git workflow?
Content HERE
Difference between git pull and git fetch?
Content HERE
What is git cherry-pick?
Content HERE
What is the HEAD in Git?
Content HERE
When do I use git stash?
Content HERE
What does git reset do?
Content HERE
What is a Git fork? What is the difference between git fork, clone, and branch?
Content HERE
What is the difference between git stash pop and git stash apply?
Content HERE
Advanced:
I need to update my local repos, what commands do I use?
Content HERE
I need to roll back to a previous commit and I don't need my recent changes, what do I use?
Content HERE
How can I amend an older commit?
Content HERE
What is the command to check the difference between two commits?
Content HERE
When do you use git rebase instead of git merge?
Content HERE
Do you know how to undo a git rebase?
Content HERE
How do you bring down updates from the main branch if your local branch becomes stale?
Content HERE
AWS
-----General-----
- What is AWS?
- What are two services of AWS where you could store secrets?
- What is the relation between the Availability Zone and Region?
- What is auto-scaling?
- What services can help minimise a DDoS attack?
- What is an AMI?
- What are different types of load balancers?
-----Networking-----
- What is a VPC?
- How do Subnets work?
- What network object do you need to allow a server ingress from the internet?
- What are the different type of load balancers in AWS?
- What are subnets, and what are CIDRs?
- How can your resources in the VPC get access to the internet?
-----Scenario-based-----
- I want to create a 3 Tier Architecture. Can you explain step by step of how I can go about this?
- In VPC with private and public subnets, database servers should ideally be launched into which subnet?
- What are some security best practices for EC2s?
- I created a web application with autoscaling. I observed that the traffic on my application is the highest on Wednesdays and Fridays between 9 AM and 7 PM. What would be the best solution for me to handle the scaling?
- You have an application running on your Amazon EC2 instance. You want to reduce the load on your instance as soon as the CPU utilization reaches 100 percent. How will you do that?
-----Others-----
- Name some managed runtimes for Lambda
Azure
- What is Azure?
- What are ARM templates in Azure?
- What is Azure CDN?
- How is Azure App Service different from Azure Functions?
- How to define an Environment Variable on Azure using Azure CLI?
- How would you choose between Azure Blob Storage vs. Azure File Service?
- What is difference between Keys and Secrets in Azure Key Vault?
- What's the difference between Azure SQL Database and Azure SQL Managed Instance?
- When should we use Azure Table Storage over Azure SQL?
- Explain what is the difference between Block Blobs, Append Blobs and Page Blobs in Azure?
Advanced:
- What to use: many small Azure Storage Blob containers vs one really large container with tons of blobs?
Terraform
- What is IaC? What is Terraform?
- What is Terraform state?
- What are the most common Terraform commands?
- What is Terraform backend?
- What are modules in Terraform?
- What is Terragrunt?
- Explain the workflow of the core Terraform?
- Difference between Terraform, AWS CloudFormation and Azure ARM?
- What is the difference between Terraform and Ansible?
- What are provisioners in Terraform?
- How can two people using Terraform Cloud create two different sets of infrastructure using the same working directory?
- What is a null resource in Terraform?
- Which command is used to perform syntax validation on Terraform configuration files?
- How can I format my current directory of Terraform config files in the canonical HCL style?
Advanced:
- I have 3 people in my team who want to work on the same AWS infrastructure with Terraform, sharing the same state. What can I do to ensure there are no clashes?
- In a pipeline, where would you store the Terraform state?
- Can I test my Terraform modules? If so, how can I test them? Is there a common framework used to test Terraform modules?
- How does state file locking work?
- What is Checkov?
- How can I use Terraform in a pipeline?
- How can one export data from one module to another?
- How can you import existing resources under Terraform management?
- Which command can be used to reconcile the Terraform state with the actual real-world infrastructure?
Docker & K8s
Container (Docker):
- What is a container and what are its fundamentals?
- What are c-groups and namespaces in Linux?
- What is Docker and how does it work?
- When do I use Docker Compose vs Docker Swarm and K8s?
- What is a Dockerfile used for?
- Can you explain the basic components of a Dockerfile?
- What is FROM in a Dockerfile used for?
- What is COPY in a Dockerfile used for?
- What is ADD in a Dockerfile used for?
- What are CMD & ENTRYPOINT in a Dockerfile used for?
- How is a container different from a virtual machine?
- How can I run a container?
- How can I stop and remove a container?
- How can I attach a shell to a terminal of a running container?
- What is a dangling image?
- What is Docker compose and when is it used?
Advanced:
- What is the difference between COPY and ADD commands in a Dockerfile?
- What is the difference between CMD and RUN commands in a Dockerfile?
- What are some best practices to follow when running containers in production?
Container Orchestration (Kubernetes = K8s):
- What is Kubernetes?
- What problems does Kubernetes solve?
- What is the difference between Docker and K8s?
- What is a Pod and what does it do?
- How can containers within a pod communicate with each other?
- What is a K8s cluster?
- What are deployments?
- What are stateful sets?
- How do you restrict pod-to-pod communication in a cluster?
- How can I rollback a deployment?
- What are namespaces?
- What is the role of the kube-apiserver?
Advanced:
- Can you mention some good practices to follow when creating containers?
- Can you explain a simple K8s cluster architecture and the components within it?
- What happens when a master node fails?
- What happens when a worker node fails?
- What is an Ingress controller?
- How can one build a highly available (HA) cluster in K8s?
- What is the role of ETCD in K8s?
Ansible
- What is Ansible?
- How does Ansible work?
- What is Ansible Galaxy?
- What are Ansible handlers?
- What is Ansible Vault?
- What are Ansible collections?
- How do you get a list of Ansible predefined variables?
- How is Ansible playbook different from ad-hoc commands?
- What is the recommended way of securing secret information used within Ansible playbooks?
- What templating language is directly supported within Ansible for creating dynamic playbooks?
- What protocol does Ansible use for communicating with client systems?
- What is an inventory file?
Advanced:
- Can you name some Ansible best practices?
- How do you handle errors in Ansible?
- How do you test your Ansible roles and tasks?
- What is Molecule and how does it work?
CI/CD
DevOps methodology, practices, & Agile
- What is meant by continuous integration?
- What are the advantages of DevOps?
- Can you describe some branching strategies you have used?
- What is the blue/green deployment pattern?
System Design
CDN & Caching
- What is a CDN and why would I use one?
- What are CDN edge servers?
- How does CDN caching work?
- What is cache invalidation?
- What are some cache invalidation methods?
- What is a cache Hit?
- What is a cache Miss?
- Can you explain a caching workflow?
- What is the CAP Theorem?
- Explain the difference between horizontal scaling and vertical scaling?
- Difference between forward proxy and reverse proxy?
- What is Load Balancing and how does it work? How does it relate to reverse proxies?
- Name me some common load balancers
- What is a microservice architecture and when would I consider using one?
Databases
- What is a database?
- What is DBMS (Database Management System)?
- What is the schema referred to in a Database?
- What are different types of databases?
- Advantages & Disadvantages of using relational databases?
- Advantages & Disadvantages of using NoSQL (non-relational) databases?
- What is a key-value database? What are some examples of this?
- What are graph databases? Name some examples?
- What is database replication?
- What is master-slave replication? And what is master-master replication?
- What is Synchronous vs Asynchronous replication?
- What are indexes in databases?
- How can one improve query performance by using index hunting?
- What do you understand by "Atomicity" and "Aggregation"?
- What is database partitioning?
Advanced:
- What are message queues? And what are message brokers?
- How does the publish-subscribe model work?
- Can you explain how an event-driven architecture works?
- What is an API Gateway? How is this different from load balancers?
- Explain why CDN (in)availability may be a problem for using WebSockets?
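The publish-subscribe model above can be illustrated with a minimal in-process sketch. The `Broker` class here is hypothetical; real brokers such as Kafka or RabbitMQ add persistence, acknowledgements, and delivery guarantees:

```python
from collections import defaultdict

class Broker:
    """Toy message broker illustrating the publish-subscribe model."""
    def __init__(self):
        self.subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        # The publisher does not know who (if anyone) receives the message;
        # that decoupling is the core of the pattern.
        for callback in self.subscribers[topic]:
            callback(message)

broker = Broker()
received = []
broker.subscribe("orders", received.append)
broker.subscribe("orders", lambda m: received.append(m.upper()))
broker.publish("orders", "order-42 created")
print(received)  # ['order-42 created', 'ORDER-42 CREATED']
```

Both subscribers receive the event independently, and the publisher's code would not change if a third subscriber were added.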
Software Engineering
🔹
General
- What are some ways that you can version an API?
- What does it mean for an API to be idempotent?
- How can you implement idempotency in APIs?
- What are some ways you could authenticate an API?
- What are the benefits of working with circuit breakers?
- What are cascading failures and how do you deal with them?
- How can you deploy an API without disrupting traffic?
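Idempotency, asked about above, is commonly implemented with client-supplied idempotency keys. A minimal sketch, assuming an in-memory store (production systems persist the keys with a TTL so retries survive restarts):

```python
# Hypothetical payment endpoint: the names are made up for illustration.
processed = {}  # idempotency_key -> stored response

def create_payment(idempotency_key, amount):
    if idempotency_key in processed:
        # Replay of a retried request: return the saved response,
        # do not repeat the side effect.
        return processed[idempotency_key]
    response = {"status": "created", "amount": amount}  # side effect happens once
    processed[idempotency_key] = response
    return response

first = create_payment("key-123", 50)
retry = create_payment("key-123", 50)   # network retry with the same key
print(first is retry)  # True: no duplicate charge
```

The client generates the key once per logical operation and reuses it on every retry, which is what makes an otherwise non-idempotent POST safe to repeat.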
🔹
Golang
- What is Go?
- Why was the Go language created?
- What is a pointer?
- What is the difference between the = and := operator?
- What are goroutines?
- Does Go have exceptions?
Advanced:
- Implement a function that reverses a slice of integers?
- What are generics and how do they work?
🔹
Python
🔹
Java
🔹
JavaScript (TS,nodeJS...)
- What is TypeScript?
- What are the differences between TypeScript and JavaScript?
- Why use TypeScript?
- What are the advantages of TypeScript?
- What are Types in TypeScript?
- What are Type Assertions in TypeScript?
- What are the Primitive data types?
- What are the special data types in TypeScript?
- What are interfaces in TypeScript?
- Interfaces Vs Types?
Data
Data Modelling and Schemas:
-
Define data modelling and the benefits of implementing a data model?
Click here to view answer
Answer: Data modeling is the process of creating a visual representation of either a whole information system or parts of it to communicate connections between data. Data modeling concepts create a blueprint for how data is organized and managed in your organization. Data models give developers and non-technical stakeholders a simplified way to have meaningful conversations about the needs of the business and how data insights can fuel better decision making.
-
What are some of the design schemas used when performing data modelling?
Click here to view answer
Answer: Star schema, snowflake schema, and galaxy schema.
-
What are the three types of data models?
Click here to view answer
Answer: Entity-relationship, relational, and dimensional models.
-
What is a table (entity) and column (attribute)?
Click here to view answer
Answer: A database table represents an entity, and each row is an instance of that entity. A column of the table represents an attribute of the entity.
-
What is normalisation/denormalisation and what is its purpose?
Click here to view answer
Answer: Normalization is the technique of dividing the data into multiple tables to reduce data redundancy and inconsistency and to achieve data integrity. On the other hand, Denormalization is the technique of combining the data into a single table to make data retrieval faster.
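The trade-off can be sketched with Python's built-in sqlite3 module; the table and column names here are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Denormalized: customer details repeated on every order row (fast reads,
# redundant data that can drift out of sync).
cur.execute("CREATE TABLE orders_denorm (order_id INT, customer_name TEXT, customer_city TEXT)")

# Normalized: customers stored once, referenced by key from orders.
cur.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
cur.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, "
            "customer_id INT REFERENCES customers(customer_id))")

cur.execute("INSERT INTO customers VALUES (1, 'Ada', 'London')")
cur.executemany("INSERT INTO orders VALUES (?, ?)", [(10, 1), (11, 1)])

# Retrieval now needs a join -- the read-time cost that denormalization avoids.
rows = cur.execute("""
    SELECT o.order_id, c.name FROM orders o
    JOIN customers c ON c.customer_id = o.customer_id
    ORDER BY o.order_id
""").fetchall()
print(rows)  # [(10, 'Ada'), (11, 'Ada')]
```

Updating Ada's city in the normalized schema touches one row; in the denormalized schema it touches every one of her orders, which is exactly the redundancy normalization removes.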
-
What are the main relationships which can be found in a data model? (name 3)
Click here to view answer
Answer: one to one, one to many and many to many
-
Explain the two different types of design schemas (snowflake and star)?
Click here to view answer
Answer: Snowflake and star are two different types of design schemas used in data warehousing. In a snowflake schema, the data is organized into a hierarchy of tables, with each table having multiple child tables. In a star schema, the data is organized into a central fact table and several dimension tables that connect to it.
-
What is a data mart?
Click here to view answer
Answer: A data mart is a subset of a larger data warehouse that is designed to serve a specific business function or department. It contains a smaller subset of the data found in the overall data warehouse.
-
How would you describe granularity?
Click here to view answer
Answer: Granularity refers to the level of detail or specificity of data. It describes how finely the data is divided and recorded.
-
How does data sparsity impact aggregation of data sets/sources?
Click here to view answer
Answer: Data sparsity can impact the aggregation of data sets/sources by making it more difficult to obtain accurate and meaningful results. Sparse data can result in incomplete or inaccurate analyses, as well as a loss of insights and trends.
-
In the context of data modelling, what is the importance of metadata? How would you describe the role of metadata in data modelling?
Click here to view answer
Answer: Metadata is important in data modelling because it provides additional information about the data being modelled. It helps to document the structure and relationships of the data, as well as its origin, quality, and meaning.
-
What is a DDL script?
Click here to view answer
Answer: A DDL (Data Definition Language) script is a set of commands used to create, modify, or delete database objects such as tables, indexes, or views.
-
What is a fact and dimension?
Click here to view answer
Answer: In a data model, a fact is a measurable event or transaction, while a dimension is a descriptive attribute of that event or transaction. For example, in a sales data model, a fact might be the quantity of a product sold, while dimensions could include attributes such as the date of the sale, the customer who made the purchase, or the location where the sale occurred.
-
What is an ERD? Entity relationship diagram?
Click here to view answer
Answer: An ERD (Entity Relationship Diagram) is a visual representation of the relationships between entities (tables) in a database. It shows how the tables are related to each other and the nature of those relationships.
-
What are the differences between foreign and surrogate keys?
Click here to view answer
Answer: Foreign keys are columns in a table that reference the primary key of another table. Surrogate keys are artificial primary keys that are created specifically for use in a database and have no inherent meaning outside of that context.
-
Describe cardinality
Click here to view answer
Answer: Cardinality refers to the number of relationships between entities in a data model. It describes how many instances of one entity can be related to another entity. Common cardinalities include one-to-one, one-to-many, and many-to-many.
Data Architect
-
Please state an example of designing, creating, deploying and managing an end to end data architecture project?
Click here to view answer
Answer: A company wants to implement a customer relationship management (CRM) system. The project would involve designing a data architecture to store customer data, creating the necessary databases and tables, and deploying the system. The data would need to be cleansed, transformed, and loaded into the new system. Ongoing management would involve monitoring the data for quality and making any necessary updates or modifications to the system.
-
What are the fundamental skills required for a data architect?
Click here to view answer
Answer: Strong knowledge of data modelling and database design; expertise in database management systems (DBMS) and database administration; experience with data warehousing and business intelligence; proficiency in data integration, ETL (extract, transform, load) processes, and data migration; familiarity with programming languages and software development methodologies; excellent analytical and problem-solving skills; strong communication and collaboration abilities.
-
What is a data block and a data file?
Click here to view answer
Answer: A data block is the smallest unit of data that can be accessed or transferred from a database to memory. A data file is a physical storage unit used to store data in a database.
-
What is data warehousing?
Click here to view answer
Answer: Data warehousing is the process of collecting, storing, and managing large amounts of data from various sources to support business intelligence and decision-making. It involves organizing the data into a centralized repository or data warehouse, where it can be queried and analyzed using business intelligence tools.
-
What are the main differences between 'a view' and 'a materialised view'?
Click here to view answer
Answer: A view is a virtual table that is based on a select statement and does not store data. A materialized view, on the other hand, is a physical copy of a view that stores data in a table, making it faster to retrieve.
-
What is a junk dimension?
Click here to view answer
Answer: A junk dimension is a single table that contains several low-cardinality attributes that are not related to any specific dimension. It is used to reduce the number of dimension tables in a data warehouse and simplify the schema design.
-
Please explain in detail data warehousing architecture
Click here to view answer
Answer: The source layer, where data is collected from various sources and stored in staging tables. The integration layer, where data is cleansed, transformed, and combined into a single format for loading into the data warehouse. The storage layer, where the data is stored in a central repository or data warehouse. The access layer, where business intelligence tools and applications can retrieve the data for analysis and reporting.
-
What are the different types of integrity constraints?
Click here to view answer
Answer: Entity integrity: ensures that each row in a table has a unique identifier. Referential integrity: ensures that relationships between tables are maintained and that foreign keys reference primary keys. Domain integrity: ensures that data values meet specific constraints, such as data type or format requirements. User-defined integrity: allows users to define their own constraints based on specific business rules or requirements.
-
Why do data architects enforce and monitor data compliance standards in data systems?
Click here to view answer
Answer: Data architects enforce and monitor data compliance standards in data systems to ensure that data is accurate, complete, and secure. This helps to maintain data quality and prevent data breaches or other security risks.
-
Differentiate between OLTP and OLAP
Click here to view answer
Answer: OLTP (Online Transaction Processing) systems are designed for real-time transactional processing and are typically used for day-to-day business operations. OLAP (Online Analytical Processing) systems are designed for data analysis and reporting and are typically used for business intelligence and decision-making.
-
How do you design and implement a data warehouse?
Click here to view answer
Answer: Analyze business requirements and identify data sources. Develop a conceptual data model and logical data model. Design the physical data model and schema. Develop ETL processes to extract, transform, and load data into the warehouse. Implement the database and BI tools. Test and validate the system. Deploy and maintain the system.
-
How do you handle data quality issues?
Click here to view answer
Answer: To handle data quality issues, data architects can implement data profiling and data cleansing processes. Data profiling involves analyzing the data to identify any issues or inconsistencies, such as missing or duplicate data, incorrect data types, or invalid values. Data cleansing involves correcting these issues and ensuring that the data is accurate and complete.
-
How do you optimise data models for performance?
Click here to view answer
Answer: Normalize the data to reduce redundancy and improve consistency. Denormalize the data to improve query performance and simplify the schema. Partition large tables to improve query performance. Use appropriate indexing to speed up queries. Optimize queries to reduce the amount of data that needs to be processed.
-
Describe your familiarity with big data technologies such as Hadoop and Spark
Click here to view answer
Answer: Hadoop is an open-source software framework that is used to store and process big data in a distributed computing environment. It uses a file system called Hadoop Distributed File System (HDFS) to store data across multiple machines and a processing engine called MapReduce to distribute processing tasks across the cluster. Hadoop is particularly useful for handling unstructured and semi-structured data, such as log files, social media data, and sensor data.
Spark is another open-source software framework that is designed to handle big data processing tasks in real-time. Spark can run on top of Hadoop and provides a more flexible and efficient processing engine than MapReduce. It supports various data processing tasks, including batch processing, stream processing, machine learning, and graph processing. Spark is particularly useful for handling large datasets that require real-time analysis, such as financial trading data, social media data, and sensor data.
-
How do you go about gathering requirements for a new data project?
Click here to view answer
Answer: Meet with stakeholders to understand their business needs and goals. Identify the data sources and types of data that will be needed. Develop a data model based on the business requirements. Define the data quality and security requirements. Determine the performance and scalability requirements. Establish a project timeline and budget.
-
How do you handle conflicting priorities when working on multiple projects?
Click here to view answer
Answer: To handle conflicting priorities when working on multiple projects, data architects can prioritize based on business impact, urgency, and feasibility. Communication with stakeholders and project managers can also help to ensure that expectations are clear and aligned.
-
Which software and design patterns are you familiar with?
Click here to view answer
Answer: Relational database management systems (RDBMS) such as Oracle and SQL Server. NoSQL databases such as MongoDB and Cassandra. Data integration tools such as Informatica and Talend. Data visualization and business intelligence tools such as Tableau and Power BI. Design patterns such as the star schema and snowflake schema for data warehousing.
Data Engineering
-
What made you choose data engineering and what does it mean to you?
Click here to view answer
Answer: (Explain your own reasons for choosing data engineering.) Data engineering means the process of designing, building, maintaining, and managing data architecture and infrastructure. It involves transforming data from different sources into a format that can be easily analyzed, processed, and stored.
-
How would you define data engineering?
Click here to view answer
Answer: Data engineering is the process of designing, building, maintaining, and managing data infrastructure that allows organizations to efficiently store, process, and analyze large volumes of data. It involves working with different data sources, transforming data into a format that is suitable for analysis, and ensuring that data is stored securely and accessed efficiently.
-
What is the difference between a data architect and a data engineer?
Click here to view answer
Answer: A data architect is responsible for designing the overall data architecture and creating data models, while a data engineer is responsible for implementing the data architecture and building data pipelines.
-
What are data engineers responsible for?
Click here to view answer
Answer: Data engineers are responsible for designing and maintaining the data infrastructure, developing and maintaining data pipelines, ensuring data quality, optimizing data storage and retrieval, and ensuring data security. They work closely with data scientists and analysts to ensure that the data is available, reliable, and ready for analysis.
-
What is the difference between structured, semi-structured, and unstructured data?
Click here to view answer
Answer: Structured data is organized and well-defined, such as data in a relational database. Semi-structured data has some structure, but it is not organized in a rigid schema, such as JSON or XML data. Unstructured data has no predefined structure, such as text documents or images.
-
Describe differences between a data warehouse and an operational database
Click here to view answer
Answer: An operational database is designed for transactional processing and supports real-time data updates, while a data warehouse is designed for analytical processing and supports complex queries and reporting. Data warehouses typically store historical data and provide a consolidated view of data from multiple sources, while operational databases store current data and support day-to-day operations.
-
How would you increase the revenue of a business using data analytics and big data?
Click here to view answer
Answer: By using data analytics and big data, businesses can identify patterns and trends in customer behavior, optimize pricing and promotions, identify new market opportunities, and improve operational efficiency. These insights can be used to develop new products and services, target marketing campaigns more effectively, and improve customer satisfaction, which can all lead to increased revenue.
-
What are the advantages of using skewed tables in hive?
Click here to view answer
Answer: Using skewed tables in Hive can improve query performance by reducing the amount of data that needs to be processed. Skewed tables can be used to optimize queries that have high cardinality columns, where a small number of values dominate the distribution of data.
-
Explain the hive data model and its components
Click here to view answer
Answer: Hive is a data warehousing tool that provides a SQL-like interface for querying large datasets stored in Hadoop. The Hive data model includes tables, partitions, buckets, and views. Tables are the basic unit of storage in Hive, and partitions are used to divide tables into smaller, more manageable segments. Buckets are a way of organizing data within partitions to improve query performance, and views are virtual tables that can be used to simplify complex queries.
-
Why is using a distributed system important in hadoop?
Click here to view answer
Answer: Using a distributed system in Hadoop is important because it allows for the processing of large amounts of data by dividing the workload across multiple nodes in a cluster. This enables faster processing times and the ability to handle large-scale data processing tasks that would be impossible to do on a single machine. A distributed system also provides fault tolerance and scalability, allowing for easy scaling up or down of resources as needed.
-
Name the core features of hadoop
Click here to view answer
Answer: The core features of Hadoop include: the Hadoop Distributed File System (HDFS) for distributed storage of data; MapReduce for distributed processing of data; YARN (Yet Another Resource Negotiator) for cluster resource management; and Hadoop Common, which provides the libraries and utilities needed for Hadoop to function.
-
Define hadoop streaming
Click here to view answer
Answer: Hadoop streaming is a utility in Hadoop that allows developers to use scripts or programs written in any language that can read and write to standard input and output as MapReduce jobs. It enables developers to use languages other than Java, the primary language used in Hadoop, for data processing tasks.
-
What is data locality?
Click here to view answer
Answer: Data locality is the principle that when processing data in a distributed system, the processing task should be performed on a node where the data is stored or is easily accessible. This reduces network traffic and improves processing performance by minimizing the amount of data that needs to be transferred across the network.
-
What does Context object do in Hadoop and why is it important?
Click here to view answer
Answer: The Context object in Hadoop is used to provide information to MapReduce tasks about the state of the job and the environment in which it is running. It allows MapReduce tasks to interact with the job configuration and provides access to counters for tracking the progress of the job. The Context object is important because it provides a mechanism for MapReduce tasks to communicate with each other and with the Hadoop framework.
-
Name the three reducer phases in hadoop
Click here to view answer
Answer: The three reducer phases in Hadoop are: shuffle, where the partitioned map output is copied across the network to the reducers; sort, where the data is merged and sorted by key before being passed to the reducer; and reduce, where the data is processed and the output is written to the Hadoop Distributed File System (HDFS).
-
What do `*args` and `**kwargs` do in Python?
Click here to view answer
Answer: `*args` and `**kwargs` are used in Python to pass a variable number of arguments to a function. `*args` collects extra positional arguments into a tuple, while `**kwargs` collects extra keyword arguments into a dictionary. -
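A short, self-contained illustration of both directions (collecting arguments in a definition, and unpacking them at a call site):

```python
def demo(*args, **kwargs):
    # *args arrives as a tuple of positional arguments,
    # **kwargs as a dict of keyword arguments.
    return args, kwargs

a, k = demo(1, 2, three=3)
print(a)  # (1, 2)
print(k)  # {'three': 3}

# The same * and ** syntax unpacks sequences and dicts when calling:
def add(x, y, z):
    return x + y + z

print(add(*(1, 2), **{"z": 3}))  # 6
```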
List the differences between tuples and lists
Click here to view answer
Answer: Tuples and lists are both used to store sequences of values in Python, but there are several key differences between them. Tuples are immutable, meaning that they cannot be changed after they are created, while lists are mutable. Tuples are also more memory-efficient than lists and can be used as keys in dictionaries, while lists cannot. Additionally, tuples are often used to represent fixed structures with a specific number of elements, while lists are more flexible and can have any number of elements. -
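The differences listed above can be demonstrated directly:

```python
point = (3, 4)   # tuple: immutable, fixed structure
path = [3, 4]    # list: mutable, variable length

path.append(5)                  # lists can grow in place
immutable = False
try:
    point[0] = 0                # tuples cannot be modified after creation
except TypeError:
    immutable = True

# Tuples are hashable, so they can serve as dictionary keys; lists cannot.
distances = {(0, 0): 0.0, (3, 4): 5.0}
print(immutable, path, distances[(3, 4)])  # True [3, 4, 5] 5.0
```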
What are the advantages of working with big data on the cloud?
Click here to view answer
Answer: Working with big data on the cloud offers several advantages, including: Elastic scalability, allowing for easy scaling up or down of resources as needed Reduced hardware and infrastructure costs Increased accessibility and collaboration, with data stored in a centralized location accessible from anywhere with an internet connection Improved security and data backup, with data stored on secure cloud servers and automatic backup and disaster recovery options
-
Can you describe what happens when a data block is corrupted?
Click here to view answer
Answer: When a data block is corrupted in Hadoop, the NameNode detects the corruption during a routine block scan and marks the block as corrupted in its metadata. The NameNode then replicates the block from another replica or source, and the corrupted replica is deleted. The DataNode that stored the corrupted block is also notified to delete the corrupted replica. The JobTracker is then notified of the corrupted block, and it reschedules any tasks that were using the corrupted block to use a healthy replica instead.
-
How would you explain file permissions in hadoop?
Click here to view answer
Answer: In Hadoop, file permissions are used to control access to files in the Hadoop Distributed File System (HDFS). There are three types of permissions: read, write, and execute, which can be set for the owner of the file, the group the owner belongs to, and all other users. The file owner can set the permissions using the chmod command, and the permissions can be viewed using the ls command with the -l option. Properly setting file permissions is important for ensuring that only authorized users can access or modify sensitive data.
-
Which process would you follow to add a node to a cluster?
Click here to view answer
Answer: To add a node to a Hadoop cluster, the following process would be followed:
Set up the new node with the same version of Hadoop as the existing nodes in the cluster.
Configure the Hadoop environment variables and networking settings on the new node to match the existing nodes.
Add the new node to the cluster's hosts file and update the DNS settings as needed.
Start the Hadoop services on the new node and ensure that it can communicate with the other nodes in the cluster.
Use the Hadoop command-line tools to add the new node to the cluster's configuration files, such as hdfs-site.xml and mapred-site.xml.
Restart the Hadoop services on the existing nodes in the cluster to ensure that they recognize the new node.
-
Can you list Python libraries which can facilitate efficient data processing?
Click here to view answer
Answer: Python has several libraries that can facilitate efficient data processing, including: NumPy: a library for numerical computing that provides fast and efficient operations on arrays and matrices. Pandas: a library for data manipulation and analysis that provides high-performance, easy-to-use data structures and tools for working with tabular data. Matplotlib: a library for data visualization that provides a wide range of plotting tools and styles. Scikit-learn: a library for machine learning that provides tools for data mining and data analysis. Dask: a library for parallel computing that provides an interface for working with large datasets and distributed computing frameworks like Hadoop and Spark.
-
What challenges came up during your recent project, and how did you overcome these challenges?
Click here to view answer
Answer: Some common challenges that can arise during a data engineering project include data quality issues, hardware and infrastructure limitations, and scaling issues. To overcome these challenges, data engineers may use techniques like data cleansing and transformation, cloud-based infrastructure, and distributed computing frameworks like Hadoop and Spark.
-
Have you ever transformed unstructured data into structured data? and how did you do it?
Click here to view answer
Answer: Transforming unstructured data into structured data is a common task for data engineers. The process typically involves using techniques like text mining, natural language processing, and machine learning to extract useful information from unstructured data sources like text files and log files. Once the data has been extracted and transformed, it can be loaded into a structured data store like a database or data warehouse for further analysis.
-
Can you tell me about the NameNode? What happens if the NameNode crashes or goes down?
Click here to view answer
Answer: The NameNode is a key component of the Hadoop Distributed File System (HDFS). It is responsible for managing the metadata of the files stored in the HDFS, including the location and attributes of each block of data. The NameNode maintains a directory tree of all files in the HDFS and keeps track of the location of each block across the cluster.
If the NameNode crashes or becomes unavailable, the HDFS becomes read-only until the NameNode is restored or replaced. During this time, no new data can be written to the HDFS, but existing data can still be read. To recover from a NameNode failure, the administrator can either restart the failed NameNode or replace it with a new one. In either case, the new NameNode must be initialized with the metadata checkpointed by the Secondary NameNode (which periodically merges the namespace image and edit log, but is not a hot standby). Once the new NameNode is operational, it takes over the responsibility of managing the metadata of the HDFS, and the cluster can resume normal operations.
-
How to achieve security in Hadoop?
Click here to view answer
Answer: Hadoop provides several mechanisms to achieve security in the cluster, including authentication, authorization, and data encryption. Some of the ways to achieve security in Hadoop are:
Kerberos authentication: Hadoop can integrate with Kerberos to provide secure authentication for users and services in the cluster. This ensures that only authorized users can access sensitive data and services.
Access control lists (ACLs): Hadoop supports ACLs, which allow administrators to control access to files and directories in the HDFS. ACLs can be used to grant or deny specific permissions to individual users or groups.
Encryption: Hadoop supports data encryption at rest and in transit. Data can be encrypted using industry-standard encryption algorithms like AES, and keys can be managed using key management services like Key Management Service (KMS).
Firewall and network security: Hadoop clusters should be protected by firewalls to prevent unauthorized access from outside the cluster. Network security measures like Virtual Private Network (VPN) can also be used to secure data transmission across the cluster.
Auditing and monitoring: Hadoop provides tools for auditing and monitoring cluster activities, which can help identify security breaches and anomalies in the system.
-
What is FIFO Scheduling?
Click here to view answer
Answer: FIFO scheduling is a scheduling algorithm used in Hadoop that stands for First In, First Out. In this scheduling algorithm, the tasks are executed in the order in which they are submitted to the JobTracker. The first task submitted to the JobTracker is the first one to be executed, followed by the second task, and so on. This scheduling algorithm is simple and easy to implement, but it may not be the most efficient algorithm for all workloads. In particular, it can lead to long wait times for high-priority jobs if lower-priority jobs are submitted first. Other scheduling algorithms like Fair Scheduler and Capacity Scheduler are also available in Hadoop to address these issues.
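The behaviour (and its main drawback) can be sketched with a toy queue; the job names here are made up:

```python
from collections import deque

# FIFO scheduling: jobs run strictly in submission order, regardless of
# priority or size.
submissions = ["job-A (large, low priority)", "job-B (small, high priority)", "job-C"]
queue = deque(submissions)

execution_order = []
while queue:
    execution_order.append(queue.popleft())  # first in, first out

# job-A runs first even though job-B has higher priority -- the wait-time
# problem that the Fair Scheduler and Capacity Scheduler address.
print(execution_order == submissions)  # True
```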
SQL
-
How can you deal with duplicate data points in an SQL query?
Click here to view answer
Answer: To deal with duplicate data points in an SQL query, you can use the DISTINCT keyword in the SELECT statement. The DISTINCT keyword filters out duplicate values from the result set, returning only unique values. Alternatively, you can use the GROUP BY clause in combination with an aggregate function such as COUNT() or SUM() to group the duplicate values together and perform calculations on them. -
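Both approaches can be tried with Python's built-in sqlite3 module; the table and data here are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE visits (user_id INT)")
cur.executemany("INSERT INTO visits VALUES (?)", [(1,), (1,), (2,), (3,), (3,)])

# DISTINCT filters duplicate values out of the result set.
unique = cur.execute("SELECT DISTINCT user_id FROM visits ORDER BY user_id").fetchall()
print(unique)  # [(1,), (2,), (3,)]

# GROUP BY + an aggregate reports which values are duplicated and how often.
counts = cur.execute(
    "SELECT user_id, COUNT(*) FROM visits "
    "GROUP BY user_id HAVING COUNT(*) > 1 ORDER BY user_id"
).fetchall()
print(counts)  # [(1, 2), (3, 2)]
```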
List objects that are created via the CREATE statement in SQL
Click here to view answer
Answer: Objects that can be created using the CREATE statement in SQL include tables, views, indexes, sequences, procedures, functions, and triggers.
-
How would you see the database structure in SQL?
Click here to view answer
Answer: In SQL, you can use the DESCRIBE statement to see the structure of a specific table, which shows the names, data types, and constraints of the columns in the table. Additionally, you can use the SHOW TABLES statement to see a list of all tables in the current database.
-
How would you search for a specific string in a column?
Click here to view answer
Answer: To search for a specific string in a column, you can use the LIKE operator in a SELECT statement. The LIKE operator is used in a WHERE clause to search for a specified pattern in a column. For example, the following SQL query selects all rows from the "customers" table where the "contact_name" column contains the string "John":
SELECT *
FROM customers
WHERE contact_name LIKE '%John%';
-
What are the differences between DDL, DML and DCL?
Click here to view answer
Answer: DDL (Data Definition Language) is used to define the database schema and includes commands such as CREATE, ALTER, and DROP. DML (Data Manipulation Language) is used to manipulate the data in the database and includes commands such as SELECT, INSERT, UPDATE, and DELETE. DCL (Data Control Language) is used to manage user access permissions and includes commands such as GRANT and REVOKE.
-
Difference between SQL and MySQL?
Click here to view answer
Answer: SQL (Structured Query Language) is a programming language used to manage relational databases. MySQL is a popular open-source relational database management system that uses SQL as its standard programming language.
-
How is a RDBMS different to a DBMS?
Click here to view answer
Answer: An RDBMS (Relational Database Management System) is a type of DBMS (Database Management System) that organizes data into one or more tables with a predefined structure, and uses relationships between tables to store and retrieve data. A DBMS, on the other hand, can be any system that manages data, including hierarchical, network, and object-oriented databases.
-
What is a self join? name other join commands
Click here to view answer
Answer: A self join is a join where a table is joined with itself. It is useful when a table contains a foreign key that references its own primary key. Other join commands include INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL OUTER JOIN.
-
- What is the SELECT statement?
Click here to view answer
The SELECT statement is used to retrieve data from one or more tables in a database. It is the most commonly used SQL statement and retrieves specific columns and/or rows that match a specified condition.
- What are the CRUD commands?
Click here to view answer
CRUD (Create, Read, Update, Delete) refers to the four basic operations for manipulating data in a database. In SQL these map to INSERT (create new records), SELECT (read records), UPDATE (modify existing records), and DELETE (remove records).
- What are UNION, MINUS and INTERSECT commands?
Click here to view answer
UNION, MINUS, and INTERSECT are set operators that allow combining, subtracting, and intersecting the results of two or more SELECT statements in SQL.
- UNION: returns the combined results of two or more SELECT statements, removing duplicates. The number of columns and their data types must be the same in all SELECT statements.
- MINUS (known as EXCEPT in some dialects of SQL): returns the rows from the first SELECT statement that are not present in the second SELECT statement.
- INTERSECT: returns only the rows that are common to both SELECT statements.
These set operators are useful when you need to combine, subtract, or compare data from multiple tables or queries.
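These operators can be demonstrated with Python's built-in sqlite3 module (the tables are invented for the example; note that SQLite spells MINUS as EXCEPT):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (x INTEGER);
CREATE TABLE b (x INTEGER);
INSERT INTO a VALUES (1), (2), (3);
INSERT INTO b VALUES (2), (3), (4);
""")

# UNION combines both result sets and removes duplicates.
union = conn.execute("SELECT x FROM a UNION SELECT x FROM b ORDER BY x").fetchall()
# EXCEPT (MINUS in Oracle) keeps the rows of a that are not in b.
minus = conn.execute("SELECT x FROM a EXCEPT SELECT x FROM b ORDER BY x").fetchall()
# INTERSECT keeps only the rows common to both.
common = conn.execute("SELECT x FROM a INTERSECT SELECT x FROM b ORDER BY x").fetchall()
print(union, minus, common)
# [(1,), (2,), (3,), (4,)] [(1,)] [(2,), (3,)]
```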
- How would you load data into tables using SQL?
Click here to view answer
Data can be loaded into tables using SQL with the INSERT statement, which adds one or more records to a table:

```sql
INSERT INTO table_name (column1, column2, column3, ...)
VALUES (value1, value2, value3, ...);
```

In this syntax, "table_name" is the table receiving the data, "column1, column2, column3, ..." are the target columns, and "value1, value2, value3, ..." are the values inserted into the respective columns.
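The same statement can be exercised end to end with Python's built-in sqlite3 module (the customers table is invented for the example; the ? placeholders are sqlite3's parameter style):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, contact_name TEXT, city TEXT)")

# The column list and the VALUES list must match in number and order.
conn.execute(
    "INSERT INTO customers (id, contact_name, city) VALUES (?, ?, ?)",
    (1, "John Smith", "London"),
)
# executemany inserts several rows with one statement.
conn.executemany(
    "INSERT INTO customers (id, contact_name, city) VALUES (?, ?, ?)",
    [(2, "Ana", "Madrid"), (3, "Li", "Beijing")],
)
count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
print(count)  # 3
```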
- What is PostgreSQL?
Click here to view answer
PostgreSQL is a powerful open-source relational database management system (RDBMS) that supports a wide range of features including transactions, sub-selects, triggers, views, and foreign keys. It is designed to be highly scalable and can handle large amounts of data and many concurrent users. PostgreSQL is known for its stability, reliability, and data integrity.
- Explain the character manipulation functions in SQL
Click here to view answer
Character manipulation functions in SQL are used to modify strings in a database. These functions include:
- CONCAT: concatenates two or more strings.
- UPPER and LOWER: convert a string to uppercase or lowercase respectively.
- SUBSTRING: extracts a substring from a given string.
- LENGTH: returns the length of a string.
- TRIM: removes leading and trailing spaces from a string.
- REPLACE: replaces a substring within a string with a new substring.
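Most of these functions can be tried directly in SQLite through Python's sqlite3 module (SQLite spells SUBSTRING as SUBSTR and uses the || operator instead of CONCAT):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
row = conn.execute("""
SELECT UPPER('sql'),
       LOWER('SQL'),
       SUBSTR('database', 1, 4),
       LENGTH('hello'),
       TRIM('  padded  '),
       REPLACE('a-b-c', '-', '.'),
       'foo' || 'bar'
""").fetchone()
print(row)  # ('SQL', 'sql', 'data', 5, 'padded', 'a.b.c', 'foobar')
```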
- What is the difference between RANK() and DENSE_RANK() functions?
Click here to view answer
The RANK() and DENSE_RANK() functions assign a rank to each row in a result set based on the values in one or more columns. The main difference between the two is how they handle ties.
The RANK() function assigns the same rank to tied rows and leaves gaps in the ranking sequence. For example, if two rows have the same value and are assigned rank 2, the next row will be assigned rank 4.
The DENSE_RANK() function also assigns the same rank to tied rows, but does not leave gaps in the ranking sequence. For example, if two rows have the same value and are assigned rank 2, the next row will be assigned rank 3.
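The gap behaviour is easy to see with window functions in SQLite (version 3.25+), driven from Python's sqlite3 module; the scores table is invented for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE scores (name TEXT, score INTEGER);
INSERT INTO scores VALUES ('a', 90), ('b', 90), ('c', 80);
""")

rows = conn.execute("""
SELECT name, score,
       RANK()       OVER (ORDER BY score DESC) AS rnk,
       DENSE_RANK() OVER (ORDER BY score DESC) AS dense
FROM scores
ORDER BY score DESC, name
""").fetchall()
print(rows)
# [('a', 90, 1, 1), ('b', 90, 1, 1), ('c', 80, 3, 2)]
# RANK jumps from 1 to 3 after the tie; DENSE_RANK continues with 2.
```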
- What are tables and fields?
Click here to view answer
Tables are the fundamental objects in a database, where data is stored in rows and columns. A table is made up of one or more columns, which define the type of data that can be stored in each row. Each column has a name and a data type.
Fields, also known as columns, are the individual pieces of data stored in a table. Each field has a name, a data type, and a value.
- What is a schema in a SQL server?
Click here to view answer
A schema in SQL Server is a logical container for database objects such as tables, views, indexes, and stored procedures. It is a way to organize database objects into groups and assign permissions to those groups. A schema can be used to separate database objects into logical categories, and to isolate different applications or users from each other.
- How would you create a table with 4 columns?
Click here to view answer
To create a table with 4 columns, use the CREATE TABLE statement with the column names and data types specified:

```sql
CREATE TABLE my_table (
    column_1 datatype,
    column_2 datatype,
    column_3 datatype,
    column_4 datatype
);
```
- What is a CASE statement?
Click here to view answer
A CASE statement is a control-flow construct in SQL that adds conditional logic to a query. It evaluates a set of conditions and returns a value based on the first condition that is true. The basic syntax is:

```sql
CASE
    WHEN condition_1 THEN result_1
    WHEN condition_2 THEN result_2
    ...
    ELSE result_n
END
```
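A runnable version of the same idea, using Python's sqlite3 module with an invented orders table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER, amount INTEGER);
INSERT INTO orders VALUES (1, 5), (2, 50), (3, 500);
""")

# The first WHEN whose condition is true wins; ELSE is the fallback.
rows = conn.execute("""
SELECT id,
       CASE
           WHEN amount >= 100 THEN 'large'
           WHEN amount >= 10  THEN 'medium'
           ELSE 'small'
       END AS size
FROM orders
ORDER BY id
""").fetchall()
print(rows)  # [(1, 'small'), (2, 'medium'), (3, 'large')]
```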
- Summarise differences between SQL and NoSQL
Click here to view answer
SQL (Structured Query Language) and NoSQL (Not Only SQL) databases differ in their data model, storage architecture, and query language.
SQL databases are relational databases that store data in tables with fixed schemas. They use SQL as the query language to perform operations like CRUD (Create, Read, Update, Delete) on the data. SQL databases are known for their ability to handle complex queries and transactions, and for their strong consistency and ACID (Atomicity, Consistency, Isolation, Durability) properties. Examples of SQL databases include MySQL, Oracle, and PostgreSQL.
NoSQL databases, on the other hand, are non-relational databases that store data in flexible structures such as key-value pairs, documents, or graphs. They typically use a query language other than SQL, such as MongoDB's document query language. NoSQL databases are known for their scalability, high availability, and ability to handle unstructured and semi-structured data; however, they may sacrifice consistency and durability for performance and scalability. Examples of NoSQL databases include MongoDB, Cassandra, and Couchbase.
- Difference between NOW() and CURRENT_DATE()
- What are BLOB and TEXT in MySQL?
- How do you remove duplicate rows in SQL?
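One common approach, sketched here with SQLite's rowid via Python's sqlite3 module (rowid is SQLite-specific; in other databases you would typically key on a primary key or use ROW_NUMBER()):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE emails (addr TEXT);
INSERT INTO emails VALUES ('a@x.com'), ('b@x.com'), ('a@x.com');
""")

# Keep the first physical copy of each value and delete the rest.
conn.execute("""
DELETE FROM emails
WHERE rowid NOT IN (SELECT MIN(rowid) FROM emails GROUP BY addr)
""")
remaining = conn.execute("SELECT addr FROM emails ORDER BY addr").fetchall()
print(remaining)  # [('a@x.com',), ('b@x.com',)]
```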
- How do you create a stored procedure in SQL?
- What is the difference between CHAR and VARCHAR datatypes in SQL?
- What are constraints in SQL?
- What are the differences between DELETE and TRUNCATE statements?
- What is data integrity?
- What do you understand by query optimisation?
- What are entities and relationships?
- Name some aggregate functions which are commonly used in SQL
- What are the syntax and use of the COALESCE function?
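COALESCE returns its first non-NULL argument, which makes it useful as a fallback value; here is a small sketch using Python's sqlite3 module and an invented people table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people (name TEXT, nickname TEXT);
INSERT INTO people VALUES ('Robert', 'Bob'), ('Susan', NULL);
""")

# Fall back to name when nickname is NULL.
rows = conn.execute(
    "SELECT name, COALESCE(nickname, name) FROM people ORDER BY name"
).fetchall()
print(rows)  # [('Robert', 'Bob'), ('Susan', 'Susan')]
```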
- What is the ACID property in a database?
- What is a "Trigger" in SQL?
- What is a subquery in SQL?
- What is a CLAUSE in SQL?
- What is the need for a MERGE statement?
- How can you fetch common records from two tables?
- What are aggregate and scalar functions?
- What are Views used for?
- What are Local and Global variables?
ETL & Data Pipelines & more
- How would you go about troubleshooting data pipelines?
Machine Learning
Cyber Security & Info Security
Interpersonal & behavioural questions
References:
- intellipaat.com
- Add more links to any references.
Charity
- Since this is a community-based project run by the community, we (the creators) receive no personal or financial gain beyond the satisfaction of helping the community. Any financial proceeds are better suited to charity, so we have taken it upon ourselves to help those in need. We, as a community, have chosen a certified charity to donate to. Here is the link to donate:
- To ADD charity donation links here.
License
This project is licensed under the Apache License. See the LICENSE file for more details.