Mini CSV: Sum Column by Header
Difficulty: Intermediate
Tags: parsing, strings, data-processing
Series: CS 101
Problem
In the mystical archives, scrolls are written in CSV (Comma-Separated Values) format. You must parse these scrolls and extract numerical insights from specific columns.
Given a CSV string where:
- The first line contains column headers
- Subsequent lines contain data rows
- Fields are separated by commas
Return the sum of all values in the column specified by header.
Real-World Application
CSV parsing appears everywhere in data processing:
- Data analysis: reading datasets for processing (pandas alternative)
- ETL pipelines: extracting data from exports and reports
- Log analysis: parsing structured log files
- API integrations: processing CSV responses from web services
- Spreadsheet automation: reading Excel/Google Sheets exports
- Database imports: bulk data loading
Input
data = {
'csv': str, # CSV string with headers and data
'header': str # Column name to sum
}
Output
int # Sum of all values in the specified column
Constraints
- Lines separated by \n (newline)
- Fields separated by , (comma)
- Trim whitespace around field values
- Header is guaranteed to exist
- Non-integer values (empty, text, etc.) treated as 0
- Ignore empty lines
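The trimming, blank-line, and non-integer rules above can be sketched as two small helpers. This is only one possible reading of the constraints: the names to_int_or_zero and split_rows are hypothetical, "integer" is assumed to mean whatever Python's built-in int() accepts, and whitespace-only lines are assumed to count as empty.

```python
def to_int_or_zero(field):
    """Return the field as an int, or 0 if int() rejects it (assumed rule)."""
    try:
        return int(field.strip())
    except ValueError:
        return 0  # empty strings, text, and decimals such as '3.5' land here


def split_rows(csv_text):
    """Split CSV text into rows of trimmed fields, skipping empty lines."""
    rows = []
    for line in csv_text.split('\n'):
        if not line.strip():
            continue  # assumption: whitespace-only lines are treated as empty
        rows.append([field.strip() for field in line.split(',')])
    return rows
```

Applied to the CSV string from Example 3, split_rows('name,score\nA,\n\nC,5') yields the header row, a row whose score field is the empty string, and the row for C; to_int_or_zero then maps that empty string to 0.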
Examples
Example 1: Basic CSV
Input:
{
'csv': 'name,score\nA,10\nB,20',
'header': 'score'
}
Output: 30
Explanation:
- Headers: ['name', 'score']
- Row 1: A has score 10
- Row 2: B has score 20
- Sum of 'score' column: 10 + 20 = 30
Example 2: CSV with Spaces
Input:
{
'csv': 'name, score\nA, 7\nB, 8',
'header': 'score'
}
Output: 15
Explanation:
- Headers after trimming: ['name', 'score']
- Row 1: score = 7 (trimmed from ' 7')
- Row 2: score = 8 (trimmed from ' 8')
- Sum: 7 + 8 = 15
Example 3: Missing Values and Blank Lines
Input:
{
'csv': 'name,score\nA,\n\nC,5',
'header': 'score'
}
Output: 5
Explanation:
- Row 1: A has empty score → treated as 0
- Row 2: Empty line → ignored
- Row 3: C has score 5
- Sum: 0 + 5 = 5
Example 4: Multiple Columns
Input:
{
'csv': 'name,age,score\nAlice,25,100\nBob,30,85\nCarol,28,92',
'header': 'age'
}
Output: 83
Explanation:
- Sum of 'age' column: 25 + 30 + 28 = 83
Example 5: Non-Numeric Values
Input:
{
'csv': 'name,score\nAlice,100\nBob,invalid\nCarol,50',
'header': 'score'
}
Output: 150
Explanation:
- Alice: 100
- Bob: 'invalid' → treated as 0
- Carol: 50
- Sum: 100 + 0 + 50 = 150
What You'll Learn
- How to parse delimited text data
- String manipulation and trimming techniques
- Handling edge cases (empty values, blank lines)
- Mapping column names to indices
- Robust error handling for data parsing
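Mapping column names to indices might look like the sketch below. The helper names column_index and column_values are hypothetical. Two behaviors here are assumptions the problem statement does not cover: a repeated header name resolves to its first position, and a row with too few fields contributes an empty string (which would later count as 0).

```python
def column_index(header_line, name):
    """Map trimmed header names to positions and return the one for `name`."""
    index_by_name = {}
    for position, column in enumerate(header_line.split(',')):
        # assumption: if a header name repeats, keep its first position
        index_by_name.setdefault(column.strip(), position)
    return index_by_name[name]  # the header is guaranteed to exist


def column_values(rows, index):
    """Return the field at `index` from each row, or '' if the row is too short."""
    # assumption: short rows are treated like rows with an empty value
    return [row[index] if index < len(row) else '' for row in rows]
```

For the header line 'name,age,score', column_index(..., 'age') gives 1, and that index can then be used to pull the 'age' field out of every data row.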
Why This Matters
CSV parsing is a fundamental skill for data engineering. Interview questions test your ability to handle malformed data and edge cases, and to write robust parsing logic without relying on libraries.
Starter Code
def challenge_function(data):
    """
    Sum all values in a specified CSV column.

    Args:
        data: dict with keys:
            - 'csv' (str): CSV string with headers and data
            - 'header' (str): Column name to sum

    Returns:
        int: Sum of all numeric values in the column
    """
    # Your implementation here
    pass
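To check a solution against the five documented examples, a small harness along these lines may help. The names EXAMPLES and check_examples are hypothetical and not part of the challenge; the inputs and expected outputs are copied from the Examples section.

```python
# The five documented examples as (input, expected output) pairs.
EXAMPLES = [
    ({'csv': 'name,score\nA,10\nB,20', 'header': 'score'}, 30),
    ({'csv': 'name, score\nA, 7\nB, 8', 'header': 'score'}, 15),
    ({'csv': 'name,score\nA,\n\nC,5', 'header': 'score'}, 5),
    ({'csv': 'name,age,score\nAlice,25,100\nBob,30,85\nCarol,28,92',
      'header': 'age'}, 83),
    ({'csv': 'name,score\nAlice,100\nBob,invalid\nCarol,50',
      'header': 'score'}, 150),
]


def check_examples(solution):
    """Return (input, expected, actual) for each example the solution gets wrong."""
    failures = []
    for data, expected in EXAMPLES:
        actual = solution(data)
        if actual != expected:
            failures.append((data, expected, actual))
    return failures
```

Calling check_examples(challenge_function) returns an empty list once every example passes; with the unmodified starter code, which returns None, all five examples are reported as failures.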