Text processing is at the heart of Linux system administration. Whether you're analyzing log files, parsing configuration files, or extracting specific data from command outputs, you need powerful tools to manipulate text efficiently. In this comprehensive guide, we'll master three essential commands that work beautifully together: cut, sort, and uniq.
🎯 What You'll Learn:
- Extract specific columns and fields with cut
- Parse delimited data (CSV, TSV, colon-separated files)
- Sort text alphabetically and numerically with sort
- Remove duplicate lines efficiently with uniq
- Combine all three commands in powerful pipelines
- Analyze real-world files like /etc/passwd and system logs
- Build practical text processing workflows
- Master field extraction and data cleaning techniques
Series: LFCS Certification - Phase 1 (Post 32 of 52)
Prerequisite: Post 31 (grep command) recommended
Why These Commands Matter for LFCS
As a Linux system administrator, you'll constantly work with structured text:
- Extracting data: Get usernames from /etc/passwd, IP addresses from logs
- Analyzing logs: Find most common errors, count occurrences
- Processing CSV files: Extract specific columns from reports
- Cleaning data: Remove duplicates from lists
- System auditing: Sort users by UID, find duplicate processes
The commands cut, sort, and uniq form a powerful trio that you'll use daily. They're essential for the LFCS exam and real-world system administration.
Understanding Text Processing Pipelines
Before diving into individual commands, let's understand how they work together:
Text Processing Pipeline
How cut, sort, and uniq work together
Example pipeline:
cut -d: -f1 /etc/passwd | sort | uniq
This extracts usernames, sorts them alphabetically, and removes any duplicates.
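To see what each stage contributes, run the pipeline incrementally (exact output varies by system):
cut -d: -f1 /etc/passwd | head -3                 # extracted usernames
cut -d: -f1 /etc/passwd | sort | head -3          # sorted
cut -d: -f1 /etc/passwd | sort | uniq | head -3   # sorted and deduplicated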
The cut Command: Extracting Fields
The cut command extracts specific portions of each line from a file or input stream.
Basic cut Syntax
cut OPTIONS FILE
Common options:
- -f - Select fields (columns)
- -d - Specify field delimiter
- -c - Select characters
- -b - Select bytes
Extracting Fields with -f and -d
The most common use of cut is extracting specific fields from delimited data.
Understanding Delimiters
A delimiter is a character that separates fields:
- Colon (:) - Used in /etc/passwd, /etc/group
- Tab - Default delimiter for cut
- Comma (,) - Used in CSV files
- Space - Common in many files
- Pipe (|) - Sometimes used in data files
Example: Extracting from /etc/passwd
The /etc/passwd file uses colons (:) as delimiters:
head -3 /etc/passwd
Output:
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
Field structure:
1. Username
2. Password placeholder (x)
3. UID (User ID)
4. GID (Group ID)
5. GECOS (Full name/description)
6. Home directory
7. Shell
Extract First Field (Usernames)
cut -d: -f1 /etc/passwd | head -5
Breakdown:
- cut - The command
- -d: - Use colon as delimiter
- -f1 - Select field 1 (first column)
- /etc/passwd - Input file
- | head -5 - Show only first 5 lines
Output:
root
bin
daemon
adm
lp
What happened:
- cut read each line
- Split each line on the : delimiter
- Extracted field 1 (the username)
- Printed only that field
Extract Multiple Fields
You can extract multiple fields using commas:
cut -d: -f1,3,6 /etc/passwd | head -5
Output:
root:0:/root
bin:1:/bin
daemon:2:/sbin
adm:3:/var/adm
lp:4:/var/spool/lpd
This extracts:
- Field 1: Username
- Field 3: UID
- Field 6: Home directory
Extract Field Ranges
Use hyphens to specify ranges:
cut -d: -f1-3 /etc/passwd | head -3
Output:
root:x:0
bin:x:1
daemon:x:2
This extracts fields 1 through 3 (username, password placeholder, UID).
More range examples:
cut -d: -f1-3,6 /etc/passwd # Fields 1-3 and 6
cut -d: -f3- /etc/passwd # Field 3 to end of line
cut -d: -f-4 /etc/passwd # Fields 1 through 4
Extract Last Field
To get the last field (shell):
cut -d: -f7 /etc/passwd | head -5
Output:
/bin/bash
/sbin/nologin
/sbin/nologin
/sbin/nologin
/sbin/nologin
Character-Based Extraction with -c
Sometimes you need to extract specific character positions rather than fields.
Extract Specific Characters
echo "Hello World" | cut -c1-5
Output:
Hello
This extracts characters 1 through 5.
Character Position Examples
# First character
echo "Linux" | cut -c1
# Output: L
# Last 3 characters (positions 3-5)
echo "Linux" | cut -c3-5
# Output: nux
# Characters 1, 3, and 5
echo "Linux" | cut -c1,3,5
# Output: Lnx
# From character 3 to end
echo "Linux" | cut -c3-
# Output: nux
Real-World Example: Extract Date from ls Output
ls -l /etc/passwd
Output:
-rw-r--r--. 1 root root 2584 Nov 15 10:23 /etc/passwd
Extract just the date portion (characters 30-41 in this particular output; exact positions shift with column widths, so character-based extraction from ls output is fragile):
ls -l /etc/passwd | cut -c30-41
Output:
Nov 15 10:23
Byte-Based Extraction with -b
The -b option selects byte positions instead of character positions:
cut -b1-10 filename.txt
Difference from -c:
- -c counts characters (a single character may span multiple bytes in UTF-8)
- -b counts bytes (each position is exactly one byte)
For plain ASCII text, -c and -b are identical; they differ only when the input contains multi-byte characters. (Note: GNU cut currently implements -c the same way as -b, so it does not handle multi-byte characters specially.)
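To see the byte/character distinction directly, wc makes it visible (assuming a UTF-8 locale, where 'é' occupies two bytes):
echo -n "héllo" | wc -c   # counts bytes: 6
echo -n "héllo" | wc -m   # counts characters: 5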
The sort Command: Organizing Data
The sort command sorts lines of text alphabetically or numerically.
Basic sort Syntax
sort OPTIONS FILE
Common options:
- -n - Numeric sort (treats numbers correctly)
- -r - Reverse order
- -k - Sort by specific field
- -t - Specify field delimiter
- -u - Unique (remove duplicates while sorting)
- -h - Human-numeric sort (1K, 2M, 3G)
- -V - Version sort (handles version numbers correctly)
Alphabetical Sorting (Default)
By default, sort arranges lines alphabetically:
cat << EOF > fruits.txt
banana
apple
cherry
date
EOF
sort fruits.txt
Output:
apple
banana
cherry
date
Sorted alphabetically (A-Z).
Case-Sensitive Sorting
Uppercase letters come before lowercase in ASCII:
cat << EOF > mixed.txt
Zebra
apple
Banana
EOF
sort mixed.txt
Output:
Banana
Zebra
apple
Why? In ASCII, uppercase A-Z (65-90) comes before lowercase a-z (97-122).
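Note that this strict ASCII ordering assumes the C/POSIX locale; many UTF-8 locales use dictionary order, where case is largely ignored. To force byte-order sorting explicitly:
LC_ALL=C sort mixed.txt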
Case-insensitive sort:
sort -f mixed.txt # -f = fold case (ignore case)
Output:
apple
Banana
Zebra
Numeric Sorting with -n
Problem: Alphabetical sort treats numbers as text:
cat << EOF > numbers.txt
100
2
30
1
EOF
sort numbers.txt
Output (WRONG):
1
100
2
30
Why wrong? Alphabetically, "100" starts with "1", so it comes before "2".
Solution: Numeric Sort
sort -n numbers.txt
Output (CORRECT):
1
2
30
100
Now sorted numerically!
Reverse Sorting with -r
Reverse the sort order:
sort -r fruits.txt
Output:
date
cherry
banana
apple
Combine with numeric:
sort -nr numbers.txt
Output:
100
30
2
1
Sorted numerically in reverse (largest first).
Sorting by Specific Field with -k
You can sort based on a specific column/field.
Example: Sort /etc/passwd by UID
sort -t: -k3 -n /etc/passwd | head -5
Breakdown:
- sort - The command
- -t: - Use colon as field delimiter
- -k3 - Sort by field 3 (UID)
- -n - Numeric sort
- /etc/passwd - Input file
Output:
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
Users sorted by UID (0, 1, 2, 3, 4...).
Sort by Multiple Fields
# Sort by GID (field 4), then by UID (field 3) for ties
sort -t: -k4,4n -k3,3n /etc/passwd
This sorts by field 4 first, then by field 3 to break ties. Note the -k4,4 form: a bare -k4 makes the sort key run from field 4 to the end of the line, which would make the secondary key nearly irrelevant; -k4,4 restricts the key to field 4 alone.
Sort by Last Field
To sort by the last field (shell):
sort -t: -k7 /etc/passwd | head -5
Output (sorted by shell path):
centos9:x:1000:1000::/home/centos9:/bin/bash
root:x:0:0:root:/root:/bin/bash
sync:x:5:0:sync:/sbin:/bin/sync
halt:x:7:0:halt:/sbin:/sbin/halt
adm:x:3:4:adm:/var/adm:/sbin/nologin
Unique Sort with -u
Remove duplicates while sorting:
cat << EOF > duplicates.txt
apple
banana
apple
cherry
banana
EOF
sort -u duplicates.txt
Output:
apple
banana
cherry
Duplicates removed, output sorted.
This is equivalent to sort | uniq but more efficient.
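On a large input you can verify the efficiency claim yourself (a sketch; bigfile.txt is a placeholder for any large text file):
time sort bigfile.txt | uniq > /dev/null
time sort -u bigfile.txt > /dev/null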
Human-Numeric Sort with -h
When sorting file sizes or numbers with suffixes:
cat << EOF > sizes.txt
1K
2M
500K
1G
100M
EOF
sort -h sizes.txt
Output:
1K
500K
2M
100M
1G
Correctly sorted by size (K < M < G).
Without -h, it would sort alphabetically (wrong).
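For comparison, the default (alphabetical) sort on the same file:
sort sizes.txt
Output:
100M
1G
1K
2M
500K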
Version Sort with -V
For version numbers:
cat << EOF > versions.txt
version-1.10
version-1.2
version-1.1
version-2.0
EOF
sort -V versions.txt
Output:
version-1.1
version-1.2
version-1.10
version-2.0
Correctly handles version numbering!
Without -V:
version-1.1
version-1.10 # Wrong! 10 comes before 2 alphabetically
version-1.2
version-2.0
The uniq Command: Removing Duplicates
The uniq command removes consecutive duplicate lines.
⚠️ CRITICAL: uniq only removes consecutive duplicates. You MUST sort first!
Wrong:
uniq unsorted_file.txt # Won't work correctly!
Correct:
sort unsorted_file.txt | uniq # Sort first, then remove duplicates
Basic uniq Usage
cat << EOF > repeated.txt
apple
apple
banana
cherry
cherry
cherry
banana
EOF
uniq repeated.txt
Output:
apple
banana
cherry
banana
Notice: The second "banana" is still there because it's not consecutive with the first.
Proper Usage: Sort First
sort repeated.txt | uniq
Output:
apple
banana
cherry
Now all duplicates are removed!
Counting Occurrences with -c
Count how many times each line appears:
sort repeated.txt | uniq -c
Output:
2 apple
2 banana
3 cherry
Interpretation:
- apple appears 2 times
- banana appears 2 times
- cherry appears 3 times
The count is left-padded with spaces for alignment.
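If the padding gets in the way of later processing, one option is to strip it with sed:
sort repeated.txt | uniq -c | sed 's/^ *//'
Output:
2 apple
2 banana
3 cherry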
Sort by Count
Find most common items:
sort repeated.txt | uniq -c | sort -nr
Output:
3 cherry
2 banana
2 apple
Pipeline breakdown:
- sort - Sort the file
- uniq -c - Count occurrences
- sort -nr - Sort numerically, reverse (highest first)
This shows "cherry" is most common (3 occurrences).
Show Only Duplicates with -d
Show only lines that appear more than once:
sort repeated.txt | uniq -d
Output:
apple
banana
cherry
All three items appear multiple times.
Show Only Unique Lines with -u
Show only lines that appear exactly once:
cat << EOF > mixed.txt
apple
banana
apple
cherry
date
EOF
sort mixed.txt | uniq -u
Output:
banana
cherry
date
Only items that appear once.
Case-Insensitive Comparison with -i
Ignore case when comparing:
cat << EOF > case_test.txt
Apple
apple
APPLE
Banana
EOF
sort case_test.txt | uniq -i
Output:
Apple
Banana
All variations of "apple" treated as same (first occurrence kept).
Combining cut, sort, and uniq
Now let's see the real power: combining all three commands.
Example 1: List All Unique Shells
Goal: Get a unique list of all shells used on the system.
cut -d: -f7 /etc/passwd | sort | uniq
Output:
/bin/bash
/bin/sync
/sbin/halt
/sbin/nologin
/sbin/shutdown
Pipeline breakdown:
- cut -d: -f7 /etc/passwd - Extract shell (field 7)
- sort - Sort the shells
- uniq - Remove duplicates
Example 2: Count Users Per Shell
Goal: How many users use each shell?
cut -d: -f7 /etc/passwd | sort | uniq -c | sort -nr
Output:
18 /sbin/nologin
2 /bin/bash
1 /sbin/shutdown
1 /sbin/halt
1 /bin/sync
Interpretation:
- 18 users have /sbin/nologin (system accounts)
- 2 users have /bin/bash (real users)
- 1 user each for system special accounts
Example 3: Find Duplicate UIDs
Goal: Check if any UIDs are used by multiple users (security issue).
cut -d: -f3 /etc/passwd | sort -n | uniq -d
Output:
(empty if no duplicates)
If you see output: You have duplicate UIDs (security problem!).
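If a duplicate does appear — say UID 1001 (a hypothetical value) — you can list the accounts sharing it, for example with awk (previewed here; covered in the next post):
awk -F: '$3 == 1001 {print $1}' /etc/passwd   # 1001 is a hypothetical duplicate UID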
Example 4: Extract and Count Unique IP Addresses from Log
Assuming a log file with IP addresses:
cat << EOF > access.log
192.168.1.100 - GET /index.html
192.168.1.101 - GET /about.html
192.168.1.100 - GET /contact.html
192.168.1.102 - GET /index.html
192.168.1.100 - GET /services.html
EOF
cut -d' ' -f1 access.log | sort | uniq -c | sort -nr
Output:
3 192.168.1.100
1 192.168.1.102
1 192.168.1.101
192.168.1.100 accessed the site 3 times.
Example 5: List Home Directory Types
Goal: What types of home directories exist?
cut -d: -f6 /etc/passwd | cut -d/ -f2 | sort | uniq -c
Output:
1 bin
1 boot
18 home
1 root
7 sbin
3 var
Pipeline breakdown:
- cut -d: -f6 /etc/passwd - Extract home directory
- cut -d/ -f2 - Get first directory after /
- sort | uniq -c - Count unique directories
Real-World Text Processing Scenarios
Scenario 1: Find Most Common Error in Logs
# Extract ERROR lines, get error type, count occurrences
grep ERROR /var/log/messages | cut -d' ' -f5- | sort | uniq -c | sort -nr | head -10
This shows top 10 most common errors.
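On systemd-based distributions without /var/log/messages, a similar pipeline can be fed from journalctl (a sketch; field positions depend on the journal's output format):
journalctl -p err --no-pager | cut -d' ' -f5- | sort | uniq -c | sort -nr | head -10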
Scenario 2: List All Users with Bash Shell
grep '/bin/bash$' /etc/passwd | cut -d: -f1 | sort
Output:
centos9
root
Scenario 3: Parse CSV File
Given a CSV file:
Name,Age,City
Alice,30,NYC
Bob,25,LA
Charlie,30,NYC
Extract city column and count:
cut -d, -f3 data.csv | tail -n +2 | sort | uniq -c
Output:
1 LA
2 NYC
Pipeline:
- cut -d, -f3 - Extract city (field 3)
- tail -n +2 - Skip header line
- sort | uniq -c - Count cities
Scenario 4: Find Users with Same Home Directory
cut -d: -f6 /etc/passwd | sort | uniq -d
If output appears: Multiple users share the same home directory.
Scenario 5: Extract Domain from Email Addresses
cat << EOF > emails.txt
user1@example.com
user2@gmail.com
user3@example.com
user4@yahoo.com
EOF
cut -d@ -f2 emails.txt | sort | uniq -c | sort -nr
Output:
2 example.com
1 yahoo.com
1 gmail.com
Advanced Techniques
Multiple Field Extraction
Extract username and shell, create custom format:
cut -d: -f1,7 /etc/passwd | head -5
Output:
root:/bin/bash
bin:/sbin/nologin
daemon:/sbin/nologin
adm:/sbin/nologin
lp:/sbin/nologin
Using Different Output Delimiter
By default, cut uses the same delimiter for output. To change it, use tr:
cut -d: -f1,7 /etc/passwd | tr ':' ' ' | head -3
Output:
root /bin/bash
bin /sbin/nologin
daemon /sbin/nologin
Now space-separated instead of colon-separated.
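GNU cut can also rewrite the delimiter itself, without tr:
cut -d: -f1,7 --output-delimiter=' ' /etc/passwd | head -3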
Sorting by Multiple Criteria
Sort by shell, then by username:
sort -t: -k7,7 -k1,1 /etc/passwd | head -5
This sorts:
- Primary: Field 7 (shell)
- Secondary: Field 1 (username) for ties
Case-Insensitive Sort and Unique
cat << EOF > names.txt
Alice
bob
ALICE
Charlie
Bob
EOF
sort -f names.txt | uniq -i
Output:
Alice
bob
Charlie
All case variations of "Alice" and "bob" removed.
Common Patterns and Idioms
Pattern 1: Count Unique Items
command | sort | uniq | wc -l
Counts number of unique items.
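Equivalently, and slightly more efficiently:
command | sort -u | wc -l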
Pattern 2: Most Frequent Items
command | sort | uniq -c | sort -nr | head -10
Shows top 10 most frequent items.
Pattern 3: Find Duplicates Only
command | sort | uniq -d
Shows only items that appear more than once.
Pattern 4: Extract Field from Delimited File
cut -d'DELIMITER' -fN filename | sort | uniq
Replace DELIMITER and N with your values.
Pattern 5: Remove Blank Lines
sort file.txt | uniq | grep -v '^$'
Removes empty lines from sorted, unique output.
Quick Reference Tables
cut Options
| Option | Purpose | Example |
|---|---|---|
| -f N | Select field N | cut -d: -f1 |
| -d DELIM | Set delimiter | cut -d, -f2 |
| -c N-M | Select characters N to M | cut -c1-5 |
| -b N-M | Select bytes N to M | cut -b1-10 |
| -f1,3,5 | Select multiple fields | cut -d: -f1,3,5 |
| -f1-3 | Select field range | cut -d: -f1-3 |
sort Options
| Option | Purpose | Example |
|---|---|---|
| -n | Numeric sort | sort -n numbers.txt |
| -r | Reverse order | sort -r file.txt |
| -k N | Sort by field N | sort -t: -k3 -n |
| -t DELIM | Set field delimiter | sort -t: -k3 |
| -u | Unique (remove duplicates) | sort -u file.txt |
| -f | Ignore case | sort -f file.txt |
| -h | Human-numeric sort | sort -h sizes.txt |
| -V | Version sort | sort -V versions.txt |
uniq Options
| Option | Purpose | Example |
|---|---|---|
| -c | Count occurrences | uniq -c |
| -d | Show only duplicates | uniq -d |
| -u | Show only unique lines | uniq -u |
| -i | Ignore case | uniq -i |
| -f N | Skip first N fields | uniq -f 2 |
🧪 Practice Labs
Let's apply what you've learned with comprehensive hands-on practice.
Lab 1: Basic Field Extraction (Beginner)
Task: Extract all usernames from /etc/passwd and display them in alphabetical order.
Show Solution
# Extract usernames (field 1) and sort
cut -d: -f1 /etc/passwd | sort
Expected output: Alphabetically sorted list of all usernames.
Explanation:
- cut -d: -f1 extracts the first field (username)
- sort arranges alphabetically
Lab 2: Multiple Field Extraction (Beginner)
Task: Display username, UID, and home directory for all users. Format: username:uid:home
Show Solution
# Extract fields 1, 3, and 6
cut -d: -f1,3,6 /etc/passwd
Expected output:
root:0:/root
bin:1:/bin
daemon:2:/sbin
...
Explanation:
- -f1,3,6 extracts username (1), UID (3), and home (6)
- Fields stay separated by colons (original delimiter preserved)
Lab 3: Character Extraction (Beginner)
Task: Extract the first 3 characters from each username in /etc/passwd.
Show Solution
# Extract usernames, then first 3 characters
cut -d: -f1 /etc/passwd | cut -c1-3
Expected output:
roo
bin
dae
adm
...
Explanation:
- First cut extracts the username
- Second cut -c1-3 gets characters 1 through 3
Lab 4: Numeric Sort (Beginner)
Task: List all UIDs from /etc/passwd sorted numerically.
Show Solution
# Extract UID field and sort numerically
cut -d: -f3 /etc/passwd | sort -n
Expected output:
0
1
2
3
4
...
Explanation:
- cut -d: -f3 extracts the UID (field 3)
- sort -n sorts numerically (not alphabetically)
Lab 5: Reverse Sort (Beginner)
Task: Display all shells from /etc/passwd in reverse alphabetical order.
Show Solution
# Extract shells and sort in reverse
cut -d: -f7 /etc/passwd | sort -r
Expected output:
/sbin/shutdown
/sbin/nologin
/sbin/nologin
...
/bin/bash
/bin/bash
Explanation:
- cut -d: -f7 extracts the shell (field 7)
- sort -r sorts in reverse alphabetical order
Lab 6: Remove Duplicates (Beginner)
Task: Get a list of unique shells used on the system.
Show Solution
# Extract shells, sort, remove duplicates
cut -d: -f7 /etc/passwd | sort | uniq
Expected output:
/bin/bash
/bin/sync
/sbin/halt
/sbin/nologin
/sbin/shutdown
Explanation:
- sort is required before uniq (uniq only removes consecutive duplicates)
- The result is a unique list of shells
Lab 7: Count Occurrences (Intermediate)
Task: Count how many users use each shell.
Show Solution
# Extract shells, sort, count with uniq
cut -d: -f7 /etc/passwd | sort | uniq -c
Expected output:
2 /bin/bash
1 /bin/sync
1 /sbin/halt
18 /sbin/nologin
1 /sbin/shutdown
Explanation:
- uniq -c adds a count before each line
- Shows how many users have each shell
Lab 8: Most Common Item (Intermediate)
Task: Find which shell is used by the most users.
Show Solution
# Extract, sort, count, sort by count (descending)
cut -d: -f7 /etc/passwd | sort | uniq -c | sort -nr | head -1
Expected output:
18 /sbin/nologin
Explanation:
- uniq -c counts occurrences
- sort -nr sorts numerically, reverse (highest first)
- head -1 shows only the top result
- /sbin/nologin is most common (system accounts)
Lab 9: Sort by Field (Intermediate)
Task: Display all users sorted by their UID (lowest to highest).
Show Solution
# Sort /etc/passwd by field 3 (UID) numerically
sort -t: -k3 -n /etc/passwd
Expected output:
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
...
Explanation:
- -t: sets the delimiter to colon
- -k3 sorts by field 3 (UID)
- -n numeric sort (0 < 1 < 2, not alphabetical)
Lab 10: Field Range Extraction (Intermediate)
Task: Extract username, UID, GID, and home directory (fields 1, 3, 4, 6) from /etc/passwd.
Show Solution
# Extract multiple fields
cut -d: -f1,3,4,6 /etc/passwd | head -5
Expected output:
root:0:0:/root
bin:1:1:/bin
daemon:2:2:/sbin
adm:3:4:/var/adm
lp:4:7:/var/spool/lpd
Explanation:
- -f1,3,4,6 extracts the specified fields only
- Output uses the same delimiter (colon)
Lab 11: Find Duplicate UIDs (Intermediate)
Task: Check if any UID is used by multiple users (security issue).
Show Solution
# Extract UIDs, sort, show only duplicates
cut -d: -f3 /etc/passwd | sort -n | uniq -d
Expected output:
(empty if no duplicates - which is good!)
Explanation:
- cut -d: -f3 extracts UIDs
- sort -n sorts numerically
- uniq -d shows only duplicated values
- Empty output = no duplicate UIDs (secure system)
Lab 12: Users with Specific Shell (Intermediate)
Task: List usernames of all users who have /bin/bash as their shell.
Show Solution
# Method 1: Using grep and cut
grep '/bin/bash$' /etc/passwd | cut -d: -f1
# Method 2: Using awk (if you know it)
awk -F: '$7 == "/bin/bash" {print $1}' /etc/passwd
Expected output:
root
centos9
Explanation (Method 1):
- grep '/bin/bash$' finds lines ending with /bin/bash
- cut -d: -f1 extracts the username from matching lines
Lab 13: Count Users by Home Directory Prefix (Advanced)
Task: Count how many users have home directories under each top-level directory (/home, /root, /var, etc.).
Show Solution
# Extract home dir, get first directory, count
cut -d: -f6 /etc/passwd | cut -d/ -f2 | sort | uniq -c | sort -nr
Expected output:
18 home
7 sbin
3 var
1 root
1 boot
1 bin
Explanation:
- First cut -d: -f6 extracts the home directory path
- Second cut -d/ -f2 extracts the first directory after /
- sort | uniq -c counts occurrences
- sort -nr shows the most common first
Lab 14: Extract and Process CSV Data (Advanced)
Task: Create a CSV file with user data and extract specific columns.
Show Solution
# Create sample CSV file
cat << 'EOF' > users.csv
Name,Age,City,Department
Alice,30,NYC,Engineering
Bob,25,LA,Sales
Charlie,30,NYC,Engineering
David,35,Chicago,Marketing
Alice,30,Boston,Sales
EOF
# Extract names and departments (fields 1 and 4)
cut -d, -f1,4 users.csv
# Count unique departments
cut -d, -f4 users.csv | tail -n +2 | sort | uniq -c
Expected output (extraction):
Name,Department
Alice,Engineering
Bob,Sales
Charlie,Engineering
David,Marketing
Alice,Sales
Expected output (count):
2 Engineering
1 Marketing
2 Sales
Explanation:
- cut -d, -f1,4 extracts name and department (comma delimiter)
- tail -n +2 skips the header line
- sort | uniq -c counts unique departments
Lab 15: Find Users with Duplicate Home Directories (Advanced)
Task: Identify if multiple users share the same home directory.
Show Solution
# Extract home directories, find duplicates
cut -d: -f6 /etc/passwd | sort | uniq -d
Expected output:
(empty if no shared home directories)
If output shows directories: Multiple users share those directories (unusual, possible issue).
Explanation:
- cut -d: -f6 extracts the home directory
- sort | uniq -d shows only duplicated directories
Lab 16: Extract Specific Character Positions (Advanced)
Task: From the output of ls -l, extract only the permissions (characters 1-10).
Show Solution
# List files, extract permission string
ls -l /etc | tail -n +2 | cut -c1-10
Expected output:
drwxr-xr-x
drwxr-xr-x
-rw-r--r--
-rw-r--r--
...
Explanation:
- ls -l /etc produces the long listing
- tail -n +2 skips the "total" line
- cut -c1-10 extracts the first 10 characters (permissions)
Lab 17: Sort by Multiple Fields (Advanced)
Task: Sort /etc/passwd first by shell (field 7), then by username (field 1) within each shell group.
Show Solution
# Sort by shell (primary), then username (secondary)
sort -t: -k7,7 -k1,1 /etc/passwd | head -10
Expected output:
centos9:x:1000:1000::/home/centos9:/bin/bash
root:x:0:0:root:/root:/bin/bash
sync:x:5:0:sync:/sbin:/bin/sync
halt:x:7:0:halt:/sbin:/sbin/halt
...
Explanation:
- -k7,7 primary sort by field 7 (shell)
- -k1,1 secondary sort by field 1 (username)
- Users with the same shell are sorted alphabetically
Lab 18: Version Number Sorting (Advanced)
Task: Create a list of version numbers and sort them correctly.
Show Solution
# Create version list
cat << 'EOF' > versions.txt
app-1.10.0
app-1.2.0
app-1.9.5
app-2.0.0
app-1.1.0
EOF
# Sort with version sort
sort -V versions.txt
Expected output:
app-1.1.0
app-1.2.0
app-1.9.5
app-1.10.0
app-2.0.0
Explanation:
- sort -V handles version numbers correctly
- Without -V, "1.10.0" would come before "1.2.0" (alphabetically)
- -V understands semantic versioning
Lab 19: Complex Pipeline - Log Analysis (Advanced)
Task: Analyze a log file to find the 5 most frequent error messages.
Show Solution
# Create sample log file
cat << 'EOF' > server.log
2025-12-09 10:15:23 INFO Server started
2025-12-09 10:15:45 ERROR Connection timeout
2025-12-09 10:16:12 ERROR Connection timeout
2025-12-09 10:17:33 WARNING Low memory
2025-12-09 10:18:21 ERROR Database unavailable
2025-12-09 10:19:15 ERROR Connection timeout
2025-12-09 10:20:44 ERROR Database unavailable
2025-12-09 10:21:08 INFO Request completed
EOF
# Extract errors, count, show top 5
grep ERROR server.log | cut -d' ' -f4- | sort | uniq -c | sort -nr | head -5
Expected output:
3 Connection timeout
2 Database unavailable
Explanation:
- grep ERROR filters error lines only
- cut -d' ' -f4- extracts the message (from field 4 to end)
- sort | uniq -c counts occurrences
- sort -nr sorts by count (highest first)
- head -5 shows the top 5
Lab 20: Real-World System Audit (Advanced)
Task: Create a comprehensive report showing: number of users, shells used, UID ranges, and potential security issues.
Show Solution
# Complete system user audit script
echo "=== System User Audit Report ==="
echo ""
echo "Total Users:"
wc -l < /etc/passwd   # redirect so only the count prints
echo ""
echo "Shell Distribution:"
cut -d: -f7 /etc/passwd | sort | uniq -c | sort -nr
echo ""
echo "UID Range:"
echo "Lowest UID: $(cut -d: -f3 /etc/passwd | sort -n | head -1)"
echo "Highest UID: $(cut -d: -f3 /etc/passwd | sort -n | tail -1)"
echo ""
echo "Users with Bash Shell:"
grep '/bin/bash$' /etc/passwd | cut -d: -f1 | sort
echo ""
echo "Checking for Duplicate UIDs:"
DUPES=$(cut -d: -f3 /etc/passwd | sort -n | uniq -d)
if [ -z "$DUPES" ]; then
echo "No duplicate UIDs found (GOOD)"
else
echo "WARNING: Duplicate UIDs found: $DUPES"
fi
echo ""
echo "Top 5 Home Directory Locations:"
cut -d: -f6 /etc/passwd | cut -d/ -f2 | sort | uniq -c | sort -nr | head -5
Expected output: A comprehensive formatted report with all system user statistics.
Explanation: This combines all techniques learned:
- Field extraction with cut
- Sorting with sort (numeric and alphabetical)
- Duplicate detection with uniq
- Counting and analysis
- Conditional logic for security checks
📚 Best Practices
1. Always Sort Before uniq
Golden Rule: uniq only removes consecutive duplicates.
Wrong:
uniq unsorted.txt # Misses non-consecutive duplicates!
Correct:
sort unsorted.txt | uniq # Always sort first
2. Use Numeric Sort for Numbers
# Wrong (alphabetical)
sort numbers.txt # 1, 10, 100, 2, 20...
# Correct (numeric)
sort -n numbers.txt # 1, 2, 10, 20, 100...
3. Specify Delimiters Explicitly
# Assume colon-delimited file
cut -d: -f1 data.txt # Explicit, clear
cut -f1 data.txt # Uses tab (default), might fail
4. Combine Commands in Efficient Pipelines
# Less efficient (multiple reads)
cut -d: -f1 /etc/passwd > users.txt
sort users.txt > sorted_users.txt
uniq sorted_users.txt
# More efficient (single pipeline)
cut -d: -f1 /etc/passwd | sort | uniq
5. Use sort -u Instead of sort | uniq
# Good
sort file.txt | uniq
# Better (more efficient)
sort -u file.txt
Both produce same result, but sort -u is faster.
6. Test on Small Sample First
# Test on first 10 lines
head -10 largefile.txt | cut -d, -f3 | sort | uniq
# When satisfied, run on full file
cut -d, -f3 largefile.txt | sort | uniq
7. Save Intermediate Results for Complex Pipelines
# For complex multi-step analysis
cut -d: -f1,3 /etc/passwd > users_uids.txt
sort -t: -k2 -n users_uids.txt > sorted_by_uid.txt
# Now analyze sorted_by_uid.txt
8. Handle Missing Delimiters
If a line doesn't contain the delimiter, cut prints the entire line:
# To suppress lines without delimiter
cut -d: -f1 -s /etc/passwd # -s = only delimited lines
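A quick demonstration of the difference:
echo "no delimiter here" | cut -d: -f1       # prints the whole line
echo "no delimiter here" | cut -d: -f1 -s    # prints nothing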
🚨 Common Pitfalls to Avoid
Pitfall 1: Forgetting to Sort Before uniq
# WRONG - uniq won't catch all duplicates
cat file.txt | uniq
# CORRECT
cat file.txt | sort | uniq
Why: uniq only removes consecutive duplicates.
Pitfall 2: Alphabetical Sort on Numbers
# WRONG - treats numbers as text
sort numbers.txt
# Output: 1, 10, 100, 2, 20, 3...
# CORRECT - numeric sort
sort -n numbers.txt
# Output: 1, 2, 3, 10, 20, 100...
Pitfall 3: Wrong Field Delimiter
# File is colon-delimited, but using default (tab)
cut -f1 /etc/passwd # WRONG - no tabs in file
# Specify delimiter
cut -d: -f1 /etc/passwd # CORRECT
Pitfall 4: Extracting Non-Existent Fields
# /etc/passwd has 7 fields
cut -d: -f10 /etc/passwd # Field 10 doesn't exist, returns empty
Always know your data structure first!
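One quick way to learn the structure is to count the fields on a sample line (a sketch using awk, which the next post covers):
head -1 /etc/passwd | awk -F: '{print NF}'
# Output: 7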
Pitfall 5: Case Sensitivity Issues
# Data has mixed case
sort names.txt | uniq
# "Alice", "alice", "ALICE" all treated as different
# Case-insensitive
sort -f names.txt | uniq -i
Pitfall 6: Not Handling Header Lines
# CSV with header
cut -d, -f2 data.csv | sort | uniq
# Includes header in results!
# Skip header
cut -d, -f2 data.csv | tail -n +2 | sort | uniq
Pitfall 7: Inefficient Multiple Passes
# Inefficient - reads file 3 times
sort file.txt > temp1
uniq temp1 > temp2
wc -l temp2
# Efficient - single pipeline
sort file.txt | uniq | wc -l
📝 Command Cheat Sheet
cut Command Patterns
# Extract single field
cut -d: -f1 file.txt
# Extract multiple fields
cut -d: -f1,3,5 file.txt
# Extract field range
cut -d: -f1-3 file.txt
# Extract from field N to end
cut -d: -f5- file.txt
# Extract characters
cut -c1-10 file.txt
# Extract bytes
cut -b1-20 file.txt
# Suppress lines without delimiter
cut -d: -f1 -s file.txt
sort Command Patterns
# Basic sort
sort file.txt
# Numeric sort
sort -n numbers.txt
# Reverse sort
sort -r file.txt
# Unique sort
sort -u file.txt
# Sort by field
sort -t: -k3 -n file.txt
# Multiple field sort
sort -t: -k2,2 -k1,1 file.txt
# Case-insensitive
sort -f file.txt
# Human-readable numbers
sort -h sizes.txt
# Version numbers
sort -V versions.txt
uniq Command Patterns
# Remove duplicates (must sort first!)
sort file.txt | uniq
# Count occurrences
sort file.txt | uniq -c
# Show only duplicates
sort file.txt | uniq -d
# Show only unique (non-repeated)
sort file.txt | uniq -u
# Case-insensitive
sort file.txt | uniq -i
Common Pipelines
# Extract, sort, unique
cut -d: -f1 /etc/passwd | sort | uniq
# Count unique items
command | sort | uniq | wc -l
# Most frequent items
command | sort | uniq -c | sort -nr | head -10
# Extract field, sort numerically
cut -d: -f3 file.txt | sort -n
# Find duplicates only
cut -d: -f1 file.txt | sort | uniq -d
# Sort by specific field
sort -t: -k3 -n /etc/passwd
🎯 Key Takeaways
Essential Concepts:
1. cut extracts specific fields or characters from lines
   - Use -d to specify the delimiter, -f for fields, -c for characters
2. sort organizes lines in order
   - Use -n for numbers, -r for reverse, -k for specific fields
3. uniq removes consecutive duplicate lines
   - Must sort first! uniq only works on consecutive duplicates
   - Use -c to count, -d for duplicates only
4. Pipelines are powerful
   - Combine commands: cut | sort | uniq
   - Process data efficiently in a single pass
5. Common pattern: cut -d: -fN file | sort | uniq -c | sort -nr
   - Extract a field, count occurrences, show the most frequent
6. For the LFCS exam: master these combinations
   - Analyzing /etc/passwd and /etc/group
   - Processing log files
   - Extracting and counting data
🚀 What's Next?
Congratulations! You've mastered three powerful text processing commands: cut, sort, and uniq. These are fundamental tools for system administration and LFCS exam success.
In the next post, we'll explore even more advanced text processing with awk and sed - the Swiss Army knives of text manipulation. You'll learn pattern matching, field processing, and in-place editing.
Coming Up: Post 33 - Advanced Text Processing with awk and sed
Your Progress: 32 of 52 posts complete (61.5%)! You're past the halfway mark! 🎉
🎉 Excellent work! You now know how to:
- Extract specific data with cut
- Organize information with sort
- Remove duplicates and count occurrences with uniq
- Build powerful text processing pipelines
- Analyze system files like /etc/passwd
- Process structured data efficiently
These skills are essential for LFCS certification and daily system administration. Keep practicing with the labs, and you'll master text processing in no time!
Next: Continue with Post 33 for advanced text manipulation with awk and sed!

