LFCS Phase 1 Part 32: Text Processing with cut, sort, and uniq

Master essential text processing commands for extracting fields, sorting data, and removing duplicates. Learn cut for column extraction, sort for organizing data, and uniq for duplicate management in Linux.

32 min read

Text processing is at the heart of Linux system administration. Whether you're analyzing log files, parsing configuration files, or extracting specific data from command outputs, you need powerful tools to manipulate text efficiently. In this comprehensive guide, we'll master three essential commands that work beautifully together: cut, sort, and uniq.

🎯 What You'll Learn:

  • Extract specific columns and fields with cut
  • Parse delimited data (CSV, TSV, colon-separated files)
  • Sort text alphabetically and numerically with sort
  • Remove duplicate lines efficiently with uniq
  • Combine all three commands in powerful pipelines
  • Analyze real-world files like /etc/passwd and system logs
  • Build practical text processing workflows
  • Master field extraction and data cleaning techniques

Series: LFCS Certification - Phase 1 (Post 32 of 52)

Prerequisite: Post 31 (grep command) recommended


Why These Commands Matter for LFCS

As a Linux system administrator, you'll constantly work with structured text:

  • Extracting data: Get usernames from /etc/passwd, IP addresses from logs
  • Analyzing logs: Find most common errors, count occurrences
  • Processing CSV files: Extract specific columns from reports
  • Cleaning data: Remove duplicates from lists
  • System auditing: Sort users by UID, find duplicate processes

The commands cut, sort, and uniq form a powerful trio that you'll use daily. They're essential for the LFCS exam and real-world system administration.


Understanding Text Processing Pipelines

Before diving into individual commands, let's understand how they work together:

Text Processing Pipeline: how cut, sort, and uniq work together

Input File (raw, unstructured data)
  → cut (extract specific fields/columns)
  → sort (organize data in order)
  → uniq (remove duplicates)
  → Clean Output (processed, analyzed data)

Example pipeline:

cut -d: -f1 /etc/passwd | sort | uniq

This extracts the usernames, sorts them alphabetically, and removes any duplicates. (Usernames in /etc/passwd are unique by design, so uniq is only a safety net here, but the pattern generalizes to any field.)


The cut Command: Extracting Fields

The cut command extracts specific portions of each line from a file or input stream.

Basic cut Syntax

cut OPTIONS FILE

Common options:

  • -f - Select fields (columns)
  • -d - Specify field delimiter
  • -c - Select characters
  • -b - Select bytes

Extracting Fields with -f and -d

The most common use of cut is extracting specific fields from delimited data.

Understanding Delimiters

A delimiter is a character that separates fields:

  • Colon (:) - Used in /etc/passwd, /etc/group
  • Tab - Default delimiter for cut
  • Comma (,) - Used in CSV files
  • Space - Common in many files
  • Pipe (|) - Sometimes used in data files
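
A quick way to see each delimiter in action, using one-line sample inputs:

printf 'a\tb\tc\n' | cut -f2    # tab is the default delimiter
echo "a,b,c" | cut -d, -f2      # comma
echo "a|b|c" | cut -d'|' -f2    # pipe (quoted so the shell doesn't start a pipeline)

# Each command prints: b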

Example: Extracting from /etc/passwd

The /etc/passwd file uses colons (:) as delimiters:

head -3 /etc/passwd

Output:

root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin

Field structure:

  1. Username
  2. Password placeholder (x)
  3. UID (User ID)
  4. GID (Group ID)
  5. GECOS (Full name/description)
  6. Home directory
  7. Shell

Extract First Field (Usernames)

cut -d: -f1 /etc/passwd | head -5

Breakdown:

  • cut - The command
  • -d: - Use colon as delimiter
  • -f1 - Select field 1 (first column)
  • /etc/passwd - Input file
  • | head -5 - Show only first 5 lines

Output:

root
bin
daemon
adm
lp

What happened:

  • cut read each line
  • Split line on : delimiter
  • Extracted field 1 (username)
  • Printed only that field

Extract Multiple Fields

You can extract multiple fields using commas:

cut -d: -f1,3,6 /etc/passwd | head -5

Output:

root:0:/root
bin:1:/bin
daemon:2:/sbin
adm:3:/var/adm
lp:4:/var/spool/lpd

This extracts:

  • Field 1: Username
  • Field 3: UID
  • Field 6: Home directory

Extract Field Ranges

Use hyphens to specify ranges:

cut -d: -f1-3 /etc/passwd | head -3

Output:

root:x:0
bin:x:1
daemon:x:2

This extracts fields 1 through 3 (username, password placeholder, UID).

More range examples:

cut -d: -f1-3,6 /etc/passwd     # Fields 1-3 and 6
cut -d: -f3- /etc/passwd         # Field 3 to end of line
cut -d: -f-4 /etc/passwd         # Fields 1 through 4

Extract Last Field

To get the last field (shell):

cut -d: -f7 /etc/passwd | head -5

Output:

/bin/bash
/sbin/nologin
/sbin/nologin
/sbin/nologin
/sbin/nologin
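
Note that -f7 works here only because /etc/passwd always has exactly seven fields. cut has no "last field" selector; for data with a variable number of fields, one common trick is to reverse each line with rev, take the first field, and reverse back:

rev /etc/passwd | cut -d: -f1 | rev | head -3

# Output:
# /bin/bash
# /sbin/nologin
# /sbin/nologin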

Character-Based Extraction with -c

Sometimes you need to extract specific character positions rather than fields.

Extract Specific Characters

echo "Hello World" | cut -c1-5

Output:

Hello

This extracts characters 1 through 5.


Character Position Examples

# First character
echo "Linux" | cut -c1
# Output: L

# Last 3 characters (positions 3-5)
echo "Linux" | cut -c3-5
# Output: nux

# Characters 1, 3, and 5
echo "Linux" | cut -c1,3,5
# Output: Lnx

# From character 3 to end
echo "Linux" | cut -c3-
# Output: nux

Real-World Example: Extract Date from ls Output

ls -l /etc/passwd

Output:

-rw-r--r--. 1 root root 2584 Nov 15 10:23 /etc/passwd

Extract just the date portion. In this listing it occupies characters 30-41 (the exact positions shift with the widths of the link-count, owner, and size columns, so verify against your own output first):

ls -l /etc/passwd | cut -c30-41

Output:

Nov 15 10:23

Byte-Based Extraction with -b

cut can also select by byte position rather than character position:

cut -b1-10 filename.txt

Difference from -c:

  • -c selects characters (a single character may span several bytes in UTF-8)
  • -b selects raw bytes (and can therefore split a multi-byte character)

For plain ASCII text, -c and -b behave identically; they differ only when multi-byte characters are involved.
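
A small illustration with a two-byte UTF-8 character. One caveat: GNU coreutils cut currently implements -c the same way as -b, so on most Linux systems both commands below emit the truncated byte; a fully multibyte-aware cut would print the whole character for -c:

echo "héllo" | cut -b1-2    # "é" is two bytes in UTF-8; this cuts it in half (broken output)
echo "héllo" | cut -c1-2    # POSIX says characters, but GNU cut behaves like -b here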


The sort Command: Organizing Data

The sort command sorts lines of text alphabetically or numerically.

Basic sort Syntax

sort OPTIONS FILE

Common options:

  • -n - Numeric sort (treats numbers correctly)
  • -r - Reverse order
  • -k - Sort by specific field
  • -t - Specify field delimiter
  • -u - Unique (remove duplicates while sorting)
  • -h - Human-numeric sort (1K, 2M, 3G)
  • -V - Version sort (handles version numbers correctly)

Alphabetical Sorting (Default)

By default, sort arranges lines alphabetically:

cat << EOF > fruits.txt
banana
apple
cherry
date
EOF

sort fruits.txt

Output:

apple
banana
cherry
date

Sorted alphabetically (A-Z).


Case-Sensitive Sorting

Uppercase letters come before lowercase in ASCII:

cat << EOF > mixed.txt
Zebra
apple
Banana
EOF

sort mixed.txt

Output:

Banana
Zebra
apple

Why? In ASCII, uppercase A-Z (codes 65-90) comes before lowercase a-z (97-122). This byte-wise ordering is what you get in the C/POSIX locale; in a locale such as en_US.UTF-8, sort uses dictionary collation and may print apple, Banana, Zebra even without -f.
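
To reproduce the byte-wise ordering regardless of your locale, override it for a single command:

LC_ALL=C sort mixed.txt    # forces ASCII byte order: Banana, Zebra, apple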

Case-insensitive sort:

sort -f mixed.txt    # -f = fold case (ignore case)

Output:

apple
Banana
Zebra

Numeric Sorting with -n

Problem: Alphabetical sort treats numbers as text:

cat << EOF > numbers.txt
100
2
30
1
EOF

sort numbers.txt

Output (WRONG):

1
100
2
30

Why wrong? Alphabetically, "100" starts with "1", so it comes before "2".


Solution: Numeric Sort

sort -n numbers.txt

Output (CORRECT):

1
2
30
100

Now sorted numerically!


Reverse Sorting with -r

Reverse the sort order:

sort -r fruits.txt

Output:

date
cherry
banana
apple

Combine with numeric:

sort -nr numbers.txt

Output:

100
30
2
1

Sorted numerically in reverse (largest first).


Sorting by Specific Field with -k

You can sort based on a specific column/field.

Example: Sort /etc/passwd by UID

sort -t: -k3 -n /etc/passwd | head -5

Breakdown:

  • sort - The command
  • -t: - Use colon as field delimiter
  • -k3 - Sort by field 3 (UID)
  • -n - Numeric sort
  • /etc/passwd - Input file

Output:

root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin

Users sorted by UID (0, 1, 2, 3, 4...).


Sort by Multiple Fields

# Sort by GID (field 4), then by UID (field 3)
sort -t: -k4,4n -k3,3n /etc/passwd

This sorts numerically by field 4 first, then by field 3 to break ties. Note the end bounds: without them, -k4 would compare from field 4 to the end of the line, not field 4 alone.


Sort by Last Field

To sort by the last field (shell):

sort -t: -k7 /etc/passwd | head -5

Output (sorted by shell path):

centos9:x:1000:1000::/home/centos9:/bin/bash
root:x:0:0:root:/root:/bin/bash
sync:x:5:0:sync:/sbin:/bin/sync
halt:x:7:0:halt:/sbin:/sbin/halt
adm:x:3:4:adm:/var/adm:/sbin/nologin

Lines sharing a shell tie on the sort key, so sort falls back to comparing whole lines; that's why centos9 precedes root.

Unique Sort with -u

Remove duplicates while sorting:

cat << EOF > duplicates.txt
apple
banana
apple
cherry
banana
EOF

sort -u duplicates.txt

Output:

apple
banana
cherry

Duplicates removed, output sorted.

This is equivalent to sort | uniq but more efficient.


Human-Numeric Sort with -h

When sorting file sizes or numbers with suffixes:

cat << EOF > sizes.txt
1K
2M
500K
1G
100M
EOF

sort -h sizes.txt

Output:

1K
500K
2M
100M
1G

Correctly sorted by size (K < M < G).

Without -h, it would sort alphabetically (wrong).


Version Sort with -V

For version numbers:

cat << EOF > versions.txt
version-1.10
version-1.2
version-1.1
version-2.0
EOF

sort -V versions.txt

Output:

version-1.1
version-1.2
version-1.10
version-2.0

Correctly handles version numbering!

Without -V:

version-1.1
version-1.10    # Wrong! 10 comes before 2 alphabetically
version-1.2
version-2.0

The uniq Command: Removing Duplicates

The uniq command removes consecutive duplicate lines.

⚠️ CRITICAL: uniq only removes consecutive duplicates. You MUST sort first!

Wrong:

uniq unsorted_file.txt    # Won't work correctly!

Correct:

sort unsorted_file.txt | uniq    # Sort first, then remove duplicates

Basic uniq Usage

cat << EOF > repeated.txt
apple
apple
banana
cherry
cherry
cherry
banana
EOF

uniq repeated.txt

Output:

apple
banana
cherry
banana

Notice: The second "banana" is still there because it's not consecutive with the first.


Proper Usage: Sort First

sort repeated.txt | uniq

Output:

apple
banana
cherry

Now all duplicates are removed!


Counting Occurrences with -c

Count how many times each line appears:

sort repeated.txt | uniq -c

Output:

      2 apple
      2 banana
      3 cherry

Interpretation:

  • apple appears 2 times
  • banana appears 2 times
  • cherry appears 3 times

The count is left-padded with spaces for alignment.
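
If the padding gets in the way in scripts, a small awk step rebuilds each line with single spaces (a convenience trick, not required):

sort repeated.txt | uniq -c | awk '{$1=$1; print}'

# Output:
# 2 apple
# 2 banana
# 3 cherry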


Sort by Count

Find most common items:

sort repeated.txt | uniq -c | sort -nr

Output:

      3 cherry
      2 banana
      2 apple

Pipeline breakdown:

  1. sort - Sort the file
  2. uniq -c - Count occurrences
  3. sort -nr - Sort numerically, reverse (highest first)

This shows "cherry" is most common (3 occurrences).


Show Only Duplicates with -d

Show only lines that appear more than once:

sort repeated.txt | uniq -d

Output:

apple
banana
cherry

All three items appear multiple times.


Show Only Unique Lines with -u

Show only lines that appear exactly once:

cat << EOF > mixed.txt
apple
banana
apple
cherry
date
EOF

sort mixed.txt | uniq -u

Output:

banana
cherry
date

Only items that appear once.


Case-Insensitive Comparison with -i

Ignore case when comparing:

cat << EOF > case_test.txt
Apple
apple
APPLE
Banana
EOF

sort case_test.txt | uniq -i

Output:

Apple
Banana

All variations of "apple" treated as same (first occurrence kept).


Combining cut, sort, and uniq

Now let's see the real power: combining all three commands.

Example 1: List All Unique Shells

Goal: Get a unique list of all shells used on the system.

cut -d: -f7 /etc/passwd | sort | uniq

Output:

/bin/bash
/bin/sync
/sbin/halt
/sbin/nologin
/sbin/shutdown

Pipeline breakdown:

  1. cut -d: -f7 /etc/passwd - Extract shell (field 7)
  2. sort - Sort the shells
  3. uniq - Remove duplicates

Example 2: Count Users Per Shell

Goal: How many users use each shell?

cut -d: -f7 /etc/passwd | sort | uniq -c | sort -nr

Output:

     18 /sbin/nologin
      2 /bin/bash
      1 /sbin/shutdown
      1 /sbin/halt
      1 /bin/sync

Interpretation:

  • 18 users have /sbin/nologin (system accounts)
  • 2 users have /bin/bash (real users)
  • 1 user each for the special accounts (sync, halt, shutdown)

Example 3: Find Duplicate UIDs

Goal: Check if any UIDs are used by multiple users (security issue).

cut -d: -f3 /etc/passwd | sort -n | uniq -d

Output:

(empty if no duplicates)

If you see output: You have duplicate UIDs (security problem!).


Example 4: Extract and Count Unique IP Addresses from Log

Assuming a log file with IP addresses:

cat << EOF > access.log
192.168.1.100 - GET /index.html
192.168.1.101 - GET /about.html
192.168.1.100 - GET /contact.html
192.168.1.102 - GET /index.html
192.168.1.100 - GET /services.html
EOF

cut -d' ' -f1 access.log | sort | uniq -c | sort -nr

Output:

      3 192.168.1.100
      1 192.168.1.102
      1 192.168.1.101

192.168.1.100 accessed the site 3 times.


Example 5: List Home Directory Types

Goal: What types of home directories exist?

cut -d: -f6 /etc/passwd | cut -d/ -f2 | sort | uniq -c

Output:

      1 bin
      1 boot
     18 home
      1 root
      7 sbin
      3 var

Pipeline breakdown:

  1. cut -d: -f6 /etc/passwd - Extract home directory
  2. cut -d/ -f2 - Get first directory after /
  3. sort | uniq -c - Count unique directories

Real-World Text Processing Scenarios

Scenario 1: Find Most Common Error in Logs

# Extract ERROR lines, get error type, count occurrences
grep ERROR /var/log/messages | cut -d' ' -f5- | sort | uniq -c | sort -nr | head -10

This shows top 10 most common errors.
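
Two caveats: /var/log/messages exists mainly on RHEL-family systems, and cut counts every space as a separator (two consecutive spaces produce an empty field), so the -f offset may need adjusting for your log format. On systemd distributions without that file, journalctl can feed the same pipeline; a sketch, assuming journalctl's default output format:

journalctl -p err --no-pager | cut -d' ' -f6- | sort | uniq -c | sort -nr | head -10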


Scenario 2: List All Users with Bash Shell

grep '/bin/bash$' /etc/passwd | cut -d: -f1 | sort

Output:

centos9
root

Scenario 3: Parse CSV File

Given a CSV file:

Name,Age,City
Alice,30,NYC
Bob,25,LA
Charlie,30,NYC

Extract city column and count:

cut -d, -f3 data.csv | tail -n +2 | sort | uniq -c

Output:

      1 LA
      2 NYC

Pipeline:

  • cut -d, -f3 - Extract city (field 3)
  • tail -n +2 - Skip header line
  • sort | uniq -c - Count cities

Scenario 4: Find Users with Same Home Directory

cut -d: -f6 /etc/passwd | sort | uniq -d

If output appears: Multiple users share the same home directory.


Scenario 5: Extract Domain from Email Addresses

cat << EOF > emails.txt
user1@example.com
user2@gmail.com
user3@example.com
user4@yahoo.com
EOF

cut -d@ -f2 emails.txt | sort | uniq -c | sort -nr

Output:

      2 example.com
      1 yahoo.com
      1 gmail.com

Advanced Techniques

Multiple Field Extraction

Extract username and shell, create custom format:

cut -d: -f1,7 /etc/passwd | head -5

Output:

root:/bin/bash
bin:/sbin/nologin
daemon:/sbin/nologin
adm:/sbin/nologin
lp:/sbin/nologin

Using Different Output Delimiter

By default, cut uses the same delimiter for output. To change it, use tr:

cut -d: -f1,7 /etc/passwd | tr ':' ' ' | head -3

Output:

root /bin/bash
bin /sbin/nologin
daemon /sbin/nologin

Now space-separated instead of colon-separated.
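
With GNU coreutils (standard on most Linux distributions), cut also has a built-in --output-delimiter option that avoids the extra tr process:

cut -d: -f1,7 --output-delimiter=' ' /etc/passwd | head -3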


Sorting by Multiple Criteria

Sort by shell, then by username:

sort -t: -k7,7 -k1,1 /etc/passwd | head -5

This sorts:

  1. Primary: Field 7 (shell)
  2. Secondary: Field 1 (username) for ties

Case-Insensitive Sort and Unique

cat << EOF > names.txt
Alice
bob
ALICE
Charlie
Bob
EOF

sort -f names.txt | uniq -i

Output:

Alice
bob
Charlie

All case variations of "Alice" and "bob" collapse to one entry each (the first occurrence is kept).


Common Patterns and Idioms

Pattern 1: Count Unique Items

command | sort | uniq | wc -l

Counts number of unique items.


Pattern 2: Most Frequent Items

command | sort | uniq -c | sort -nr | head -10

Shows top 10 most frequent items.


Pattern 3: Find Duplicates Only

command | sort | uniq -d

Shows only items that appear more than once.


Pattern 4: Extract Field from Delimited File

cut -d'DELIMITER' -fN filename | sort | uniq

Replace DELIMITER and N with your values.


Pattern 5: Remove Blank Lines

sort file.txt | uniq | grep -v '^$'

Removes empty lines from sorted, unique output.


Quick Reference Tables

cut Options

Option      Purpose                     Example
-f N        Select field N              cut -d: -f1
-d DELIM    Set delimiter               cut -d, -f2
-c N-M      Select characters N to M    cut -c1-5
-b N-M      Select bytes N to M         cut -b1-10
-f1,3,5     Select multiple fields      cut -d: -f1,3,5
-f1-3       Select field range          cut -d: -f1-3

sort Options

Option      Purpose                       Example
-n          Numeric sort                  sort -n numbers.txt
-r          Reverse order                 sort -r file.txt
-k N        Sort by field N               sort -t: -k3 -n
-t DELIM    Set field delimiter           sort -t: -k3
-u          Unique (remove duplicates)    sort -u file.txt
-f          Ignore case                   sort -f file.txt
-h          Human-numeric sort            sort -h sizes.txt
-V          Version sort                  sort -V versions.txt

uniq Options

Option    Purpose                   Example
-c        Count occurrences         uniq -c
-d        Show only duplicates      uniq -d
-u        Show only unique lines    uniq -u
-i        Ignore case               uniq -i
-f N      Skip first N fields       uniq -f 2
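
The -f option is the one entry above not covered elsewhere: it skips the first N whitespace-separated fields when comparing lines, which helps with logs that differ only in a leading timestamp. A small sketch with made-up data:

cat << 'EOF' > stamped.log
10:01 ERROR disk full
10:02 ERROR disk full
10:03 INFO startup complete
EOF

uniq -f 1 stamped.log

# Output (timestamps ignored for comparison, first occurrence kept):
# 10:01 ERROR disk full
# 10:03 INFO startup complete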

🧪 Practice Labs

Let's apply what you've learned with comprehensive hands-on practice.

Lab 1: Basic Field Extraction (Beginner)

Task: Extract all usernames from /etc/passwd and display them in alphabetical order.

Solution:

# Extract usernames (field 1) and sort
cut -d: -f1 /etc/passwd | sort

Expected output: Alphabetically sorted list of all usernames.

Explanation:

  • cut -d: -f1 extracts first field (username)
  • sort arranges alphabetically

Lab 2: Multiple Field Extraction (Beginner)

Task: Display username, UID, and home directory for all users. Format: username:uid:home

Solution:

# Extract fields 1, 3, and 6
cut -d: -f1,3,6 /etc/passwd

Expected output:

root:0:/root
bin:1:/bin
daemon:2:/sbin
...

Explanation:

  • -f1,3,6 extracts username (1), UID (3), and home (6)
  • Fields separated by colon (original delimiter preserved)

Lab 3: Character Extraction (Beginner)

Task: Extract the first 3 characters from each username in /etc/passwd.

Solution:

# Extract usernames, then first 3 characters
cut -d: -f1 /etc/passwd | cut -c1-3

Expected output:

roo
bin
dae
adm
...

Explanation:

  • First cut extracts username
  • Second cut -c1-3 gets characters 1 through 3

Lab 4: Numeric Sort (Beginner)

Task: List all UIDs from /etc/passwd sorted numerically.

Solution:

# Extract UID field and sort numerically
cut -d: -f3 /etc/passwd | sort -n

Expected output:

0
1
2
3
4
...

Explanation:

  • cut -d: -f3 extracts UID (field 3)
  • sort -n sorts numerically (not alphabetically)

Lab 5: Reverse Sort (Beginner)

Task: Display all shells from /etc/passwd in reverse alphabetical order.

Solution:

# Extract shells and sort in reverse
cut -d: -f7 /etc/passwd | sort -r

Expected output:

/sbin/shutdown
/sbin/nologin
/sbin/nologin
...
/bin/bash
/bin/bash

Explanation:

  • cut -d: -f7 extracts shell (field 7)
  • sort -r sorts in reverse alphabetical order

Lab 6: Remove Duplicates (Beginner)

Task: Get a list of unique shells used on the system.

Solution:

# Extract shells, sort, remove duplicates
cut -d: -f7 /etc/passwd | sort | uniq

Expected output:

/bin/bash
/bin/sync
/sbin/halt
/sbin/nologin
/sbin/shutdown

Explanation:

  • sort is required before uniq (uniq only removes consecutive duplicates)
  • Result is unique list of shells

Lab 7: Count Occurrences (Intermediate)

Task: Count how many users use each shell.

Solution:

# Extract shells, sort, count with uniq
cut -d: -f7 /etc/passwd | sort | uniq -c

Expected output:

      2 /bin/bash
      1 /bin/sync
      1 /sbin/halt
     18 /sbin/nologin
      1 /sbin/shutdown

Explanation:

  • uniq -c adds count before each line
  • Shows how many users have each shell

Lab 8: Most Common Item (Intermediate)

Task: Find which shell is used by the most users.

Solution:

# Extract, sort, count, sort by count (descending)
cut -d: -f7 /etc/passwd | sort | uniq -c | sort -nr | head -1

Expected output:

     18 /sbin/nologin

Explanation:

  • uniq -c counts occurrences
  • sort -nr sorts numerically, reverse (highest first)
  • head -1 shows only top result
  • /sbin/nologin is most common (system accounts)

Lab 9: Sort by Field (Intermediate)

Task: Display all users sorted by their UID (lowest to highest).

Solution:

# Sort /etc/passwd by field 3 (UID) numerically
sort -t: -k3 -n /etc/passwd

Expected output:

root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
...

Explanation:

  • -t: sets delimiter to colon
  • -k3 sorts by field 3 (UID)
  • -n numeric sort (0 < 1 < 2, not alphabetical)

Lab 10: Field Range Extraction (Intermediate)

Task: Extract username, UID, GID, and home directory (fields 1, 3, 4, 6) from /etc/passwd.

Solution:

# Extract multiple fields
cut -d: -f1,3,4,6 /etc/passwd | head -5

Expected output:

root:0:0:/root
bin:1:1:/bin
daemon:2:2:/sbin
adm:3:4:/var/adm
lp:4:7:/var/spool/lpd

Explanation:

  • -f1,3,4,6 extracts specified fields only
  • Output uses same delimiter (colon)

Lab 11: Find Duplicate UIDs (Intermediate)

Task: Check if any UID is used by multiple users (security issue).

Solution:

# Extract UIDs, sort, show only duplicates
cut -d: -f3 /etc/passwd | sort -n | uniq -d

Expected output:

(empty if no duplicates - which is good!)

Explanation:

  • cut -d: -f3 extracts UIDs
  • sort -n sorts numerically
  • uniq -d shows only duplicated values
  • Empty output = no duplicate UIDs (secure system)

Lab 12: Users with Specific Shell (Intermediate)

Task: List usernames of all users who have /bin/bash as their shell.

Solution:

# Method 1: Using grep and cut
grep '/bin/bash$' /etc/passwd | cut -d: -f1

# Method 2: Using awk (if you know it)
awk -F: '$7 == "/bin/bash" {print $1}' /etc/passwd

Expected output:

root
centos9

Explanation (Method 1):

  • grep '/bin/bash$' finds lines ending with /bin/bash
  • cut -d: -f1 extracts username from matching lines

Lab 13: Count Users by Home Directory Prefix (Advanced)

Task: Count how many users have home directories under each top-level directory (/home, /root, /var, etc.).

Solution:

# Extract home dir, get first directory, count
cut -d: -f6 /etc/passwd | cut -d/ -f2 | sort | uniq -c | sort -nr

Expected output:

     18 home
      7 sbin
      3 var
      1 root
      1 boot
      1 bin

Explanation:

  • First cut -d: -f6 extracts home directory path
  • Second cut -d/ -f2 extracts first directory after /
  • sort | uniq -c counts occurrences
  • sort -nr shows most common first

Lab 14: Extract and Process CSV Data (Advanced)

Task: Create a CSV file with user data and extract specific columns.

Solution:

# Create sample CSV file
cat << 'EOF' > users.csv
Name,Age,City,Department
Alice,30,NYC,Engineering
Bob,25,LA,Sales
Charlie,30,NYC,Engineering
David,35,Chicago,Marketing
Alice,30,Boston,Sales
EOF

# Extract names and departments (fields 1 and 4)
cut -d, -f1,4 users.csv

# Count unique departments
cut -d, -f4 users.csv | tail -n +2 | sort | uniq -c

Expected output (extraction):

Name,Department
Alice,Engineering
Bob,Sales
Charlie,Engineering
David,Marketing
Alice,Sales

Expected output (count):

      2 Engineering
      1 Marketing
      2 Sales

Explanation:

  • cut -d, -f1,4 extracts name and department (comma delimiter)
  • tail -n +2 skips header line
  • sort | uniq -c counts unique departments

Lab 15: Find Users with Duplicate Home Directories (Advanced)

Task: Identify if multiple users share the same home directory.

Solution:

# Extract home directories, find duplicates
cut -d: -f6 /etc/passwd | sort | uniq -d

Expected output:

(empty if no shared home directories)

If output shows directories: Multiple users share those directories (unusual, possible issue).

Explanation:

  • cut -d: -f6 extracts home directory
  • sort | uniq -d shows only duplicated directories

Lab 16: Extract Specific Character Positions (Advanced)

Task: From the output of ls -l, extract only the permissions (characters 1-10).

Solution:

# List files, extract permission string
ls -l /etc | tail -n +2 | cut -c1-10

Expected output:

drwxr-xr-x
drwxr-xr-x
-rw-r--r--
-rw-r--r--
...

Explanation:

  • ls -l /etc long listing
  • tail -n +2 skip "total" line
  • cut -c1-10 extract first 10 characters (permissions)

Lab 17: Sort by Multiple Fields (Advanced)

Task: Sort /etc/passwd first by shell (field 7), then by username (field 1) within each shell group.

Solution:

# Sort by shell (primary), then username (secondary)
sort -t: -k7,7 -k1,1 /etc/passwd | head -10

Expected output:

centos9:x:1000:1000::/home/centos9:/bin/bash
root:x:0:0:root:/root:/bin/bash
sync:x:5:0:sync:/sbin:/bin/sync
halt:x:7:0:halt:/sbin:/sbin/halt
...

Explanation:

  • -k7,7 primary sort by field 7 (shell)
  • -k1,1 secondary sort by field 1 (username)
  • Users with same shell are alphabetically sorted

Lab 18: Version Number Sorting (Advanced)

Task: Create a list of version numbers and sort them correctly.

Solution:

# Create version list
cat << 'EOF' > versions.txt
app-1.10.0
app-1.2.0
app-1.9.5
app-2.0.0
app-1.1.0
EOF

# Sort with version sort
sort -V versions.txt

Expected output:

app-1.1.0
app-1.2.0
app-1.9.5
app-1.10.0
app-2.0.0

Explanation:

  • sort -V handles version numbers correctly
  • Without -V, "1.10.0" would come before "1.2.0" (alphabetically)
  • -V understands semantic versioning

Lab 19: Complex Pipeline - Log Analysis (Advanced)

Task: Analyze a log file to find the 5 most frequent error messages.

Solution:

# Create sample log file
cat << 'EOF' > server.log
2025-12-09 10:15:23 INFO Server started
2025-12-09 10:15:45 ERROR Connection timeout
2025-12-09 10:16:12 ERROR Connection timeout
2025-12-09 10:17:33 WARNING Low memory
2025-12-09 10:18:21 ERROR Database unavailable
2025-12-09 10:19:15 ERROR Connection timeout
2025-12-09 10:20:44 ERROR Database unavailable
2025-12-09 10:21:08 INFO Request completed
EOF

# Extract errors, count, show top 5
grep ERROR server.log | cut -d' ' -f4- | sort | uniq -c | sort -nr | head -5

Expected output:

      3 Connection timeout
      2 Database unavailable

Explanation:

  • grep ERROR filters error lines only
  • cut -d' ' -f4- extracts message (from field 4 to end)
  • sort | uniq -c counts occurrences
  • sort -nr sorts by count (highest first)
  • head -5 shows top 5

Lab 20: Real-World System Audit (Advanced)

Task: Create a comprehensive report showing: number of users, shells used, UID ranges, and potential security issues.

Solution:

# Complete system user audit script
echo "=== System User Audit Report ==="
echo ""

echo "Total Users:"
wc -l < /etc/passwd    # reading from stdin prints just the count, without the filename

echo ""
echo "Shell Distribution:"
cut -d: -f7 /etc/passwd | sort | uniq -c | sort -nr

echo ""
echo "UID Range:"
echo "Lowest UID: $(cut -d: -f3 /etc/passwd | sort -n | head -1)"
echo "Highest UID: $(cut -d: -f3 /etc/passwd | sort -n | tail -1)"

echo ""
echo "Users with Bash Shell:"
grep '/bin/bash$' /etc/passwd | cut -d: -f1 | sort

echo ""
echo "Checking for Duplicate UIDs:"
DUPES=$(cut -d: -f3 /etc/passwd | sort -n | uniq -d)
if [ -z "$DUPES" ]; then
    echo "No duplicate UIDs found (GOOD)"
else
    echo "WARNING: Duplicate UIDs found: $DUPES"
fi

echo ""
echo "Top 5 Home Directory Locations:"
cut -d: -f6 /etc/passwd | cut -d/ -f2 | sort | uniq -c | sort -nr | head -5

Expected output: A comprehensive formatted report with all system user statistics.

Explanation: This combines all techniques learned:

  • Field extraction with cut
  • Sorting with sort (numeric and alphabetical)
  • Duplicate detection with uniq
  • Counting and analysis
  • Conditional logic for security checks

📚 Best Practices

1. Always Sort Before uniq

Golden Rule: uniq only removes consecutive duplicates.

Wrong:

uniq unsorted.txt    # Misses non-consecutive duplicates!

Correct:

sort unsorted.txt | uniq    # Always sort first

2. Use Numeric Sort for Numbers

# Wrong (alphabetical)
sort numbers.txt    # 1, 10, 100, 2, 20...

# Correct (numeric)
sort -n numbers.txt    # 1, 2, 10, 20, 100...

3. Specify Delimiters Explicitly

# Assume colon-delimited file
cut -d: -f1 data.txt    # Explicit, clear
cut -f1 data.txt        # Uses tab (default), might fail

4. Combine Commands in Efficient Pipelines

# Less efficient (multiple reads)
cut -d: -f1 /etc/passwd > users.txt
sort users.txt > sorted_users.txt
uniq sorted_users.txt

# More efficient (single pipeline)
cut -d: -f1 /etc/passwd | sort | uniq

5. Use sort -u Instead of sort | uniq

# Good
sort file.txt | uniq

# Better (more efficient)
sort -u file.txt

Both produce same result, but sort -u is faster.


6. Test on Small Sample First

# Test on first 10 lines
head -10 largefile.txt | cut -d, -f3 | sort | uniq

# When satisfied, run on full file
cut -d, -f3 largefile.txt | sort | uniq

7. Save Intermediate Results for Complex Pipelines

# For complex multi-step analysis
cut -d: -f1,3 /etc/passwd > users_uids.txt
sort -t: -k2 -n users_uids.txt > sorted_by_uid.txt
# Now analyze sorted_by_uid.txt

8. Handle Missing Delimiters

If a line doesn't contain the delimiter, cut prints the entire line:

# To suppress lines without delimiter
cut -d: -f1 -s /etc/passwd    # -s = only delimited lines

🚨 Common Pitfalls to Avoid

Pitfall 1: Forgetting to Sort Before uniq

# WRONG - uniq won't catch all duplicates
cat file.txt | uniq

# CORRECT
cat file.txt | sort | uniq

Why: uniq only removes consecutive duplicates.


Pitfall 2: Alphabetical Sort on Numbers

# WRONG - treats numbers as text
sort numbers.txt
# Output: 1, 10, 100, 2, 20, 3...

# CORRECT - numeric sort
sort -n numbers.txt
# Output: 1, 2, 3, 10, 20, 100...

Pitfall 3: Wrong Field Delimiter

# File is colon-delimited, but using default (tab)
cut -f1 /etc/passwd    # WRONG - no tabs in file

# Specify delimiter
cut -d: -f1 /etc/passwd    # CORRECT

Pitfall 4: Extracting Non-Existent Fields

# /etc/passwd has 7 fields
cut -d: -f10 /etc/passwd    # Field 10 doesn't exist; prints an empty line per input line

Always know your data structure first!


Pitfall 5: Case Sensitivity Issues

# Data has mixed case
sort names.txt | uniq
# "Alice", "alice", "ALICE" all treated as different

# Case-insensitive
sort -f names.txt | uniq -i

Pitfall 6: Not Handling Header Lines

# CSV with header
cut -d, -f2 data.csv | sort | uniq
# Includes header in results!

# Skip header
cut -d, -f2 data.csv | tail -n +2 | sort | uniq

Pitfall 7: Inefficient Multiple Passes

# Inefficient - reads file 3 times
sort file.txt > temp1
uniq temp1 > temp2
wc -l temp2

# Efficient - single pipeline
sort file.txt | uniq | wc -l

📝 Command Cheat Sheet

cut Command Patterns

# Extract single field
cut -d: -f1 file.txt

# Extract multiple fields
cut -d: -f1,3,5 file.txt

# Extract field range
cut -d: -f1-3 file.txt

# Extract from field N to end
cut -d: -f5- file.txt

# Extract characters
cut -c1-10 file.txt

# Extract bytes
cut -b1-20 file.txt

# Suppress lines without delimiter
cut -d: -f1 -s file.txt

sort Command Patterns

# Basic sort
sort file.txt

# Numeric sort
sort -n numbers.txt

# Reverse sort
sort -r file.txt

# Unique sort
sort -u file.txt

# Sort by field
sort -t: -k3 -n file.txt

# Multiple field sort
sort -t: -k2,2 -k1,1 file.txt

# Case-insensitive
sort -f file.txt

# Human-readable numbers
sort -h sizes.txt

# Version numbers
sort -V versions.txt

uniq Command Patterns

# Remove duplicates (must sort first!)
sort file.txt | uniq

# Count occurrences
sort file.txt | uniq -c

# Show only duplicates
sort file.txt | uniq -d

# Show only unique (non-repeated)
sort file.txt | uniq -u

# Case-insensitive
sort file.txt | uniq -i

Common Pipelines

# Extract, sort, unique
cut -d: -f1 /etc/passwd | sort | uniq

# Count unique items
command | sort | uniq | wc -l

# Most frequent items
command | sort | uniq -c | sort -nr | head -10

# Extract field, sort numerically
cut -d: -f3 file.txt | sort -n

# Find duplicates only
cut -d: -f1 file.txt | sort | uniq -d

# Sort by specific field
sort -t: -k3 -n /etc/passwd

🎯 Key Takeaways

Essential Concepts:

  1. cut extracts specific fields or characters from lines

    • Use -d to specify delimiter, -f for fields, -c for characters
  2. sort organizes lines in order

    • Use -n for numbers, -r for reverse, -k for specific fields
  3. uniq removes consecutive duplicate lines

    • Must sort first! uniq only works on consecutive duplicates
    • Use -c to count, -d for duplicates only
  4. Pipelines are powerful

    • Combine commands: cut | sort | uniq
    • Process data efficiently without temporary files
  5. Common pattern

    cut -d: -fN file | sort | uniq -c | sort -nr
    
    • Extract field, count occurrences, show most frequent
  6. For LFCS exam: Master these combinations

    • Analyzing /etc/passwd, /etc/group
    • Processing log files
    • Extracting and counting data

🚀 What's Next?

Congratulations! You've mastered three powerful text processing commands: cut, sort, and uniq. These are fundamental tools for system administration and LFCS exam success.

In the next post, we'll explore even more advanced text processing with awk and sed - the Swiss Army knives of text manipulation. You'll learn pattern matching, field processing, and in-place editing.

Coming Up: Post 33 - Advanced Text Processing with awk and sed

Your Progress: 32 of 52 posts complete (61.5%)! You're past the halfway mark! 🎉


🎉 Excellent work! You now know how to:

  • Extract specific data with cut
  • Organize information with sort
  • Remove duplicates and count occurrences with uniq
  • Build powerful text processing pipelines
  • Analyze system files like /etc/passwd
  • Process structured data efficiently

These skills are essential for LFCS certification and daily system administration. Keep practicing with the labs, and you'll master text processing in no time!

Next: Continue with Post 33 for advanced text manipulation with awk and sed!

Thank you for reading!

Published on December 23, 2025

Written by Owais

I'm an AIOps Engineer with a passion for AI, Operating Systems, Cloud, and Security—sharing insights that matter in today's tech world.

