
Pooja Vishnu Chavan
Data Analyst, Toyota Lift Northeast
Pooja has 5+ years of data analysis experience across enterprise operations, IT, and technology. She currently integrates asset, inventory, service, and customer data at Toyota Lift Northeast during a major enterprise merger, reducing reporting discrepancies by 20%. Previously, she built PostgreSQL analytics and automated data quality workflows at UMBC, improving decision turnaround by 25%. She holds an MS in Information Systems from UMBC and is an AWS Certified Cloud Practitioner.
What is Power Query?
Power Query is Microsoft's data transformation and connectivity engine built into Excel (Get & Transform) and Power BI. It lets you connect to hundreds of data sources, clean and reshape data through a visual interface, and load the results for analysis. Behind the scenes, Power Query uses a functional language called M, but you can do most tasks without writing code.
Is Power Query the same as Power BI?
No. Power Query is the data transformation engine — it connects to sources, cleans data, and shapes it for analysis. Power BI is the visualization and reporting platform. Power Query is built INTO Power BI (and Excel), handling all the data preparation before you build dashboards. Think of Power Query as the kitchen where ingredients are prepped, and Power BI as the restaurant where dishes are served.
Do I need to learn M language to use Power Query?
No — you can be highly productive using just the visual interface. Power Query generates M code automatically as you click through transformations. However, learning basic M unlocks advanced scenarios: custom functions, dynamic parameters, and transformations the UI doesn't support. I'd estimate 80% of my work uses the visual interface, 20% requires custom M.
When should I use Power Query vs SQL?
Use SQL when working with large datasets (millions of rows), when transformations can be pushed to the database, or when you need to share logic across multiple reports. Use Power Query when combining data from multiple sources, when you need transformations SQL doesn't support well (unpivoting, fuzzy matching), or when you want reproducible, documented data prep that non-technical users can understand.
- Power Query
Power Query is Microsoft's data connectivity and transformation engine, built into Excel (as "Get & Transform Data") and Power BI Desktop. It provides a visual interface to connect to 100+ data sources, apply transformations (filtering, pivoting, merging, custom calculations), and load clean data for analysis — all without writing code. Behind the scenes, Power Query uses a functional language called M, which you can edit directly for advanced scenarios.
When I joined Toyota Lift Northeast during a major enterprise merger, I inherited datasets from two different systems that needed to be integrated: legacy asset data from the acquired company and target system data from Toyota Lift. Field names didn't match. Date formats were inconsistent. Some records were duplicated across systems while others had missing values.
I had three options: write SQL scripts to clean everything in the database, build Python notebooks for the transformation, or use Power Query to create a repeatable, visual workflow that anyone could understand and maintain.
I chose Power Query. Here's why it was the right call:
Power Query exists in two main places:
- Excel: Data tab → Get & Transform Data
- Power BI Desktop: the Power Query Editor (Home → Transform Data)
The interface and M language are nearly identical between them. Skills transfer directly.
Power Query is more than a data cleaning tool — it's a full ETL engine that handles extraction, transformation, and loading with a visual, reproducible workflow. For data analysts who need to integrate multiple sources without database write access, it's often the fastest path to clean data.
I've worked with SQL, Python (Pandas), and Power Query across different organizations. Each has its place, but Power Query fills a unique gap that pure code solutions miss.
No-Code/Low-Code Transformation
Most Power Query tasks require zero coding. You click through a visual interface:
- Remove columns: Right-click → Remove
- Filter rows: Click the dropdown → filter values
- Change types: Right-click column → Change Type
- Merge tables: Home → Merge Queries
Every click generates M code automatically. You get the power of a programming language with the accessibility of a spreadsheet.
For analysts who aren't developers, this is transformative. You're not blocked waiting for IT or data engineering to clean data for you.
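As a sketch of what that generated code looks like (the file path, sheet, and column names here are hypothetical; the UI writes the real steps for you), a few clicks might produce M like this:

```
// Illustrative only: every name below is made up
let
    Source = Excel.Workbook(File.Contents("C:\Data\assets.xlsx"), null, true),
    Sheet = Source{[Item="Assets",Kind="Sheet"]}[Data],
    PromotedHeaders = Table.PromoteHeaders(Sheet, [PromoteAllScalars=true]),
    RemovedColumns = Table.RemoveColumns(PromotedHeaders, {"Notes"}),
    FilteredRows = Table.SelectRows(RemovedColumns, each [status] = "active"),
    ChangedType = Table.TransformColumnTypes(FilteredRows, {{"acq_date", type date}})
in
    ChangedType
```

Each `let` binding corresponds to one entry in the Applied Steps pane, which is why renaming a step in the UI renames the variable in the code.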
Reproducibility That Business Users Understand
When I wrote Python scripts at UMBC to clean incident data, the logic was technically documented — in code. But when I handed off the project, the next analyst struggled to understand what the transformations were doing.
Power Query's step-by-step interface changes this:
Applied Steps:
Source
Navigation
Promoted Headers
Changed Type
Filtered Rows (Removed nulls in AssetID)
Merged Queries (Joined with ServiceHistory)
Expanded ServiceHistory
Removed Duplicates
Added Custom (Calculated DaysSinceService)
Each step has a plain-English name. Click any step to see the data at that point. Non-technical stakeholders can follow the logic.
Query Folding: Let the Database Do the Work
This is where Power Query gets clever. For supported data sources (SQL Server, PostgreSQL, Oracle, many others), Power Query translates your visual transformations into native database queries.
For example, if you filter rows early in your query, Power Query sends a WHERE clause to the database and retrieves only what you need. Right-click any step and look for "View Native Query." If it's available, that step (and all steps before it) are folding to the database. If it's grayed out, Power Query is processing locally. Structure your transformations to keep folding active as long as possible.
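As a sketch (server, database, table, and column names here are hypothetical), an early filter like this typically folds:

```
let
    Source = Sql.Database("server", "FleetDB"),
    Assets = Source{[Schema="dbo",Item="Assets"]}[Data],
    // This step can fold: the database receives roughly
    //   SELECT * FROM dbo.Assets WHERE status = 'active'
    Active = Table.SelectRows(Assets, each [status] = "active")
in
    Active
```

If a later step breaks folding, everything before it still runs on the server; only the remaining steps run locally.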
Direct Integration with Power BI
For analysts building dashboards, Power Query eliminates the gap between data prep and visualization:
- Connect to your sources in Power Query
- Transform data with visual steps
- Load directly into Power BI data model
- Build visualizations immediately
- Schedule refresh to keep data current
No intermediate files. No manual exports. Changes in your transformation automatically flow to dashboards.
Strengths
- Visual interface accessible to non-programmers
- Reproducible, self-documenting transformations
- Query folding optimizes database queries automatically
- Direct integration with Excel and Power BI
- Handles diverse source types in single workflow
Limitations
- Performance degrades with very large local transformations
- Advanced scenarios require learning M language
- Version control is challenging (files are binary)
- Debugging complex queries can be difficult
- Limited compared to Python for statistical analysis
Power Query bridges the gap between manual Excel work and full programming solutions. It's visual enough for business users, powerful enough for complex ETL, and integrated enough that your transformations flow directly into reports.
Let me walk you through a real transformation workflow. We'll start with a messy data file and create clean, analysis-ready output.
Connecting to Data
In Power BI Desktop (or Excel):
- Click Home → Get Data
- Choose your source type (Excel, CSV, Database, Web, etc.)
- Navigate to your file or enter connection details
- Select the tables/sheets you want
Basic Transformations
Here's the sequence I follow for most data cleaning tasks:
Fix data types:
- Dates showing as text → Change to Date
- Numbers showing as text → Change to Decimal/Whole Number
- IDs that should stay as text → Keep as Text (avoids leading-zero problems)
Handle nulls:
- Right-click column → Replace Values → replace null with your default
- Or filter them out: click the column dropdown → uncheck (null)
Example: Cleaning Asset Data
Here's a real example from my Toyota Lift work. Raw asset data came like this:
| asset_id | asset_name | acq_date | status | loc |
|---|---|---|---|---|
| A001 | Forklift Model X | 01/15/2023 | active | WH1 |
| A002 | | 2023-02-20 | ACTIVE | WH2 |
| A001 | Forklift Model X | 01/15/2023 | active | WH1 |
| A003 | Pallet Jack Y | NULL | inactive | |
Problems: duplicates, inconsistent date formats, inconsistent case, nulls, missing values.
Applied steps:
- Promoted Headers: first row had the column names
- Changed Type: set acq_date to Date (handles both formats)
- Replaced Values: in the loc column, replaced null and blank with "UNKNOWN"
- Transformed: status column → Transform → Lowercase (standardizes case)
- Filled Down: for asset_name, used Fill → Down to carry values into blank rows
- Removed Duplicates: selected asset_id → Remove Duplicates
Result:
| asset_id | asset_name | acq_date | status | loc |
|---|---|---|---|---|
| A001 | Forklift Model X | 2023-01-15 | active | WH1 |
| A002 | Forklift Model X | 2023-02-20 | active | WH2 |
| A003 | Pallet Jack Y | null | inactive | UNKNOWN |
Clean, consistent, deduplicated — ready for analysis.
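The applied steps above correspond roughly to this M (a sketch: the file path is hypothetical and error handling for the "NULL" text value is omitted):

```
let
    Source = Csv.Document(File.Contents("C:\Data\assets.csv"), [Delimiter=","]),
    Promoted = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
    Typed = Table.TransformColumnTypes(Promoted, {{"acq_date", type date}}),
    // Default missing locations rather than leaving nulls to propagate
    LocFixed = Table.ReplaceValue(Typed, null, "UNKNOWN", Replacer.ReplaceValue, {"loc"}),
    Lowered = Table.TransformColumns(LocFixed, {{"status", Text.Lower, type text}}),
    Filled = Table.FillDown(Lowered, {"asset_name"}),
    Deduped = Table.Distinct(Filled, {"asset_id"})
in
    Deduped
```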
Power Query transformations follow a predictable pattern: set headers, fix types, handle nulls, remove duplicates, rename columns. Master these basics, and you can clean 90% of the messy data you'll encounter.
One of Power Query's biggest strengths is its ability to connect to diverse data sources in a single workflow. Here's what I use in enterprise environments.
Databases
- Server name and port
- Database name
- Credentials (Windows or database authentication)
Home → Get Data → Database → PostgreSQL
Enable query folding by keeping transformations simple early in your query. Complex M functions break folding, forcing Power Query to download full tables before filtering.
Files
Get Data → Folder → [Select folder path]
Power Query creates a function that applies the same transformation to each file, then appends results. I use this for monthly report files that arrive with the same structure.
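Under the hood, the folder connector does something like this (a simplified sketch: the folder path is hypothetical and per-file header promotion is omitted):

```
let
    Files = Folder.Files("C:\Reports\Monthly"),
    Csvs = Table.SelectRows(Files, each Text.EndsWith([Name], ".csv")),
    // Parse each file's binary content into a table, then stack the results
    Parsed = Table.AddColumn(Csvs, "Data", each Csv.Document([Content], [Delimiter=","])),
    Combined = Table.Combine(Parsed[Data])
in
    Combined
```

Because the logic lives in one query, a new month's file landing in the folder is picked up automatically on the next refresh.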
Cloud Sources
Get Data → Web → [Enter URL]
For APIs returning JSON, Power Query automatically parses the structure. You'll need to expand nested records and lists.
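A minimal sketch (the endpoint URL is made up; real APIs usually add authentication headers and paging):

```
let
    Raw = Json.Document(Web.Contents("https://api.example.com/v1/assets")),
    // Assumes the response is a JSON array of flat objects
    AsTable = Table.FromRecords(Raw)
in
    AsTable
```

If the objects contain nested records or lists, expand them with the column-header expand button (Table.ExpandRecordColumn in M).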
Combining Multiple Sources
The real power comes from combining sources. In my Toyota Lift work, I merged:
- PostgreSQL table: Current asset inventory
- Excel file: Legacy asset data from acquired company
- CSV export: Service history records
Steps to merge:
- Create separate queries for each source
- Clean each query independently
- Use Home → Merge Queries to join on common keys
- Select join type (Left, Inner, Full, etc.)
- Expand the merged columns you need
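The merge steps above generate M along these lines (query and column names are illustrative):

```
let
    Merged = Table.NestedJoin(
        Assets, {"asset_id"},
        ServiceHistory, {"asset_id"},
        "Service", JoinKind.LeftOuter),
    // Expand only the columns you actually need from the joined table
    Expanded = Table.ExpandTableColumn(Merged, "Service",
        {"service_date", "service_type"})
in
    Expanded
```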
| Source Type | Query Folding | Best For | Watch Out For |
|---|---|---|---|
| SQL Server / PostgreSQL | Excellent | Large transactional data | Network latency, credentials |
| Excel / CSV | None | Reference data, ad-hoc files | File location changes, format drift |
| Folder (multiple files) | None | Monthly/daily file drops | Inconsistent file structures |
| SharePoint | Limited | User-maintained lists | Permission issues, rate limits |
| REST APIs | None | Real-time external data | Authentication, pagination, rate limits |
Power Query shines when you need to combine data from multiple sources — databases, files, and cloud services — into a single, clean dataset. Plan your connections around query folding: do database filtering early, complex transformations later.
This is where Power Query goes from useful to essential. Data quality issues — duplicates, nulls, type mismatches, orphaned records — kill report credibility. I build validation into every Power Query workflow.
Identifying Missing Values
Quick null check for any column:
- Select column
- View → Column Quality (shows % valid, error, empty)
- View → Column Distribution (shows value counts including nulls)
For systematic validation, create a validation query:
// M code to count nulls per column
let
Source = YourMainQuery,
ColumnNames = Table.ColumnNames(Source),
NullCounts = List.Transform(ColumnNames, each
[Column = _, NullCount = List.Count(List.Select(Table.Column(Source, _), each _ = null))]
),
Result = Table.FromRecords(NullCounts)
in
Result
This produces a table showing null counts by column — your data quality scorecard.
Catching Duplicates
Duplicates are subtle. A record might be truly duplicated (error) or legitimately appear multiple times (valid). Context matters.
- Select your key column(s)
- Home → Keep Rows → Keep Duplicates
- Review the results
If you have duplicates with different values in non-key columns, decide your rule:
- Keep first? Sort first, then remove duplicates.
- Keep latest? Sort by date descending, then remove duplicates.
- Keep most complete? This requires custom M logic.
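The "keep latest" rule above can be sketched in M; Table.Buffer pins the sorted order so deduplication keeps the newest row (column names are illustrative):

```
let
    Sorted = Table.Sort(Source, {{"modified_date", Order.Descending}}),
    Buffered = Table.Buffer(Sorted),  // without this, Distinct may ignore the sort
    Deduped = Table.Distinct(Buffered, {"asset_id"})
in
    Deduped
```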
In my Toyota Lift work, the rule was "keep latest": duplicates were resolved by sorting on modified_date. Power Query handled this with a sort before deduplication.
Data Type Validation
Type mismatches are silent killers. A date stored as text won't sort correctly. A number with leading zeros stored as a number loses the zeros.
- Transform → Detect Data Type on the whole table
- Check for columns that became "Any" type (mixed types in data)
- For each problem column, use Add Column → Column From Examples to extract the correct format, or Add Column → Custom Column with explicit type conversion
// Safe type conversion with error handling
try Number.From([Amount]) otherwise null
This converts the value if possible, returns null if not — no query failures.
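In a custom column, the same guard looks like this (column names are hypothetical):

```
// Adds a typed numeric column; unparseable values become null instead of errors
= Table.AddColumn(Source, "AmountNum",
    each try Number.From([Amount]) otherwise null,
    type nullable number)
```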
Building Validation Queries
I create separate validation queries that run alongside main data queries:
// Rows with a missing AssetID
= Table.SelectRows(CleanedAssets, each [AssetID] = null or [AssetID] = "")
// Duplicate TransactionIDs
= Table.SelectRows(
    Table.Group(Transactions, {"TransactionID"}, {{"Count", each Table.RowCount(_)}}),
    each [Count] > 1
)
// Service records with no matching asset (orphans)
= Table.NestedJoin(ServiceRecords, "AssetID", Assets, "AssetID", "AssetMatch", JoinKind.LeftAnti)
Data validation isn't optional — it's how you build trust in your reports. Build validation queries that run with every refresh, surface issues in dashboards, and catch problems before stakeholders see wrong numbers.
Once you've mastered basics, these transformations unlock complex scenarios.
Pivot and Unpivot
| Product | Jan_Sales | Feb_Sales | Mar_Sales |
|---|---|---|---|
| Widget A | 100 | 150 | 120 |
| Product | Attribute | Value |
|---|---|---|
| Widget A | Jan_Sales | 100 |
| Widget A | Feb_Sales | 150 |
| Widget A | Mar_Sales | 120 |
Now you can filter by month, create time series, and build proper visualizations.
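In the UI, select the Product column, right-click → Unpivot Other Columns. The generated M is a single call:

```
= Table.UnpivotOtherColumns(Source, {"Product"}, "Attribute", "Value")
```

Because it unpivots everything except the selected columns, the query keeps working when a new month column (Apr_Sales) appears in the source.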
Merging and Appending
Merge join types:
- Left Outer: All rows from first table, matching rows from second
- Inner: Only rows that match in both tables
- Full Outer: All rows from both tables
- Left Anti: Rows from first table with NO match in second (great for finding orphans)
Merging large tables locally (without query folding) is slow. If both tables come from the same database, merge in the database using SQL instead of Power Query.
Custom Columns and Conditional Logic
Simple conditional:
if [Status] = "active" then "In Service" else "Out of Service"
Multiple conditions:
if [DaysSinceService] > 365 then "Overdue"
else if [DaysSinceService] > 180 then "Due Soon"
else "Current"
Date calculations:
Duration.Days(DateTime.LocalNow() - [LastServiceDate])
Text manipulation:
Text.Upper(Text.Trim([CustomerName]))
Parameters for Dynamic Queries
Parameters let you create flexible, reusable queries.
- Home → Manage Parameters → New Parameter
- Set name, type, and default value
- Reference it by name in queries, for example filtering a date column against a FilterDate parameter: = Table.SelectRows(Source, each [OrderDate] >= FilterDate)
// Using a parameter in a query
let
ServerName = ServerParameter, // References the parameter
Source = Sql.Database(ServerName, "SalesDB"),
...
Custom Functions
For transformations you repeat across multiple queries, create custom functions.
// CleanPhoneNumber function
(phoneText as text) as text =>
let
DigitsOnly = Text.Select(phoneText, {"0".."9"}),
Formatted = if Text.Length(DigitsOnly) = 10
then Text.Format("(#{0}) #{1}-#{2}",
{Text.Range(DigitsOnly,0,3),
Text.Range(DigitsOnly,3,3),
Text.Range(DigitsOnly,6,4)})
else DigitsOnly
in
Formatted
Invoke in any query:
= Table.AddColumn(Source, "CleanPhone", each CleanPhoneNumber([RawPhone]))
Advanced transformations — pivoting, merging, custom columns, parameters, functions — let Power Query handle complex scenarios that would otherwise require code. Learn these patterns, and you can solve most data integration challenges visually.
Let me walk through the actual data integration project I completed during the Toyota Lift / acquired company merger.
The Challenge
When Toyota Lift Northeast acquired a regional equipment dealer, we inherited:
- Legacy asset database: Different field names, different ID formats, incomplete data
- Target system (Toyota Lift): Standardized fields, comprehensive service history
- Business requirement: Single unified view of all assets for reporting
The two systems had grown independently for years. Field mappings weren't documented. Some assets existed in both systems with different IDs.
The Approach
I built a Power Query solution with three main queries:
Source: PostgreSQL connection to legacy database
Transformations:
- Renamed columns to match target schema
- Converted legacy date format (MM/DD/YYYY) to standard
- Mapped legacy status codes to target status values
- Created composite key from legacy fields for matching
- Flagged records with critical nulls
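The composite matching key was built with a custom column along these lines (the field names here are hypothetical; the real legacy fields differ):

```
// Normalize and concatenate fields into a stable join key
= Table.AddColumn(Legacy, "MatchKey",
    each Text.Lower(Text.Trim([serial_no])) & "|" & Text.From([model_code]),
    type text)
```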
Source: SQL Server connection to Toyota Lift system
Transformations:
- Filtered to active assets only
- Standardized location codes
- Created matching key for merge
Merge: Full outer join on matching key
- Matched records: Use target system as primary, legacy as supplement
- Legacy-only records: Flag for review and migration
- Target-only records: Carry through as-is
Additional transformations:
- Coalesce: Use target value if available, else legacy
- Calculate data completeness score per record
- Flag duplicates for manual review
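The coalesce step can be written as a conditional custom column (column names are illustrative; recent versions of M also offer the ?? coalesce operator):

```
// Prefer the target system's value, fall back to legacy
= Table.AddColumn(Merged, "final_location",
    each if [target_loc] <> null then [target_loc] else [legacy_loc],
    type text)
```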
Validation Queries
Alongside the main queries, I created validation outputs:
These fed a Data Quality dashboard that stakeholders reviewed weekly during the migration.
Results
The integration reduced reporting discrepancies by 20% and gave the business a single unified asset view. But the hardest part wasn't the transformation logic — it was building confidence that the merged data was correct. Power Query's step-by-step visibility let stakeholders trace any number back to its source. That transparency was worth more than any fancy algorithm.
Lessons Learned
Real enterprise data integration is messy. Power Query's strength is making that mess visible and manageable — step-by-step transformations, validation queries that surface issues, and outputs that stakeholders can trace back to sources.
I use all three regularly. Here's my decision framework.
| Factor | Power Query | SQL | Python (Pandas) |
|---|---|---|---|
| Learning Curve | Low — visual interface | Medium — query language | High — programming required |
| Performance (large data) | Limited locally | Excellent (database-side) | Good with optimization |
| Reproducibility | Good — saved queries | Good — stored procedures | Excellent — version control |
| Flexibility | Medium | Database operations only | Unlimited |
| Integration | Excel, Power BI | Database tools | Notebooks, scripts, apps |
| Collaboration | File-based sharing | Database-based | Git-based |
When I Choose Power Query
- Multiple source types: Combining database + Excel + CSV in one workflow
- Power BI destination: Direct integration, scheduled refresh
- Non-technical stakeholders: Need to understand and validate logic
- Quick iteration: Prototyping transformations visually
- No database write access: Read-only data preparation
When I Choose SQL
- Large datasets: Millions of rows where performance matters
- Single database source: All data in one place
- Shared transformation logic: Multiple reports need same cleaned data
- Complex joins and aggregations: SQL syntax is more expressive
- Database write operations: Creating tables, updating records
When I Choose Python
- Statistical analysis: Beyond simple aggregations
- Machine learning prep: Feature engineering, model input
- Custom algorithms: Logic that doesn't fit visual tools
- Version control needed: Git-based collaboration
- Automation: Scheduled scripts, CI/CD pipelines
Combining Approaches
Often, I use multiple tools together:
At UMBC, I used:
- PostgreSQL (SQL): Store incident data, run complex queries
- Power Query: Prepare data for Power BI dashboards
- Python (Pandas): Exploratory analysis and pattern detection
The tools complement each other. Choose based on the task, not tribal loyalty.
When Power Query performance is slow, check if you can push logic to SQL. Add a database view or stored procedure that does heavy transformations, then connect Power Query to the already-transformed output.
Power Query, SQL, and Python each have strengths. Choose Power Query for multi-source integration and Power BI workflows, SQL for large-scale database operations, Python for statistical analysis and custom algorithms. Best results come from combining them strategically.
After years of Power Query work, these are the mistakes I see most often.
- Loading full tables when you only need a subset — kills performance
- Breaking query folding with complex M functions early in the query
- Not handling null values explicitly — they propagate and cause errors
- Hardcoding file paths instead of using parameters
- Creating transformation spaghetti instead of clean, staged queries
- Skipping data validation — assuming source data is clean
- Using Remove Duplicates without understanding which row is kept
- Not testing with realistic data volumes
Mistake: Over-Fetching Data
Don't load the full table and then filter locally; push the filter into a WHERE clause in your SQL query:
// Instead of filtering after loading
Source = Sql.Database("server", "database"),
HugeTable = Source{[Schema="dbo",Item="HugeTable"]}[Data],
Filtered = Table.SelectRows(HugeTable, each [Year] = 2025)
// Do this — filter in the SQL
Source = Sql.Database("server", "database",
[Query="SELECT * FROM HugeTable WHERE Year = 2025"])
Mistake: Breaking Query Folding
What breaks folding:
- Custom functions with complex logic
- Table.Buffer (forces a full local load)
- Some text transformations
What keeps folding working:
- Database filters and simple transformations first (these fold)
- Complex M logic after (runs locally on a smaller dataset)
- Check "View Native Query" to verify folding
Mistake: Transformation Spaghetti
- Raw_SourceA: Connect and do minimal cleaning
- Clean_SourceA: Apply business transformations
- Merged_Data: Combine clean sources
- Final_Output: Add calculated columns, final formatting
Reference earlier queries in later ones. If something breaks, you know exactly where.
Mistake: Ignoring Nulls
- Math on null returns null (quietly)
- String functions on null often error
- Joins on null don't match (null ≠ null in joins)
- Replace with defaults before operations
- Use null-safe expressions: if [Field] = null then 0 else [Field]
- Filter out nulls if they're invalid for your use case
Mistake: Hardcoded File Paths
A query pointed at C:\Users\Pooja\Downloads\data.xlsx works on your machine and breaks everywhere else. Instead:
- Create a parameter: DataFolderPath
- Reference it in the query: = Excel.Workbook(File.Contents(DataFolderPath & "data.xlsx"))
- Update the parameter when deploying to another machine or server
Most Power Query performance and reliability issues trace back to a few common mistakes: over-fetching data, breaking query folding, messy query structure, unhandled nulls, and hardcoded paths. Avoid these, and your queries will be faster and more maintainable.
- Power Query is Microsoft's ETL engine built into Excel and Power BI — visual, reproducible, and no-code accessible
- Connect to 100+ source types and combine them in single workflows — databases, files, APIs, cloud
- Query folding translates visual transformations to database queries — keep it active for performance
- Build validation into every workflow — null checks, duplicate detection, referential integrity
- Advanced features (pivot/unpivot, merge, parameters, functions) handle complex scenarios visually
- Choose Power Query for multi-source integration and Power BI; SQL for large database ops; Python for statistics
- Avoid common mistakes: over-fetching, breaking folding, transformation spaghetti, ignoring nulls, hardcoding paths
- The real power is reproducibility — anyone can trace a number back through documented transformation steps
Is Power Query free?
Yes. Power Query is included free in Excel (2016 and later, Microsoft 365) and Power BI Desktop. You don't need additional licenses. For enterprise deployments with scheduled refresh and collaboration, you'll need Power BI Pro or Premium licenses, but the Power Query functionality itself is included.
What is the M language?
M (officially: Power Query Formula Language) is the functional programming language that Power Query uses behind the scenes. Every visual transformation generates M code. You can edit M directly in the Advanced Editor for scenarios the visual interface doesn't support. Learning basic M unlocks custom functions, dynamic parameters, and advanced transformations.
Power Query vs Power Pivot — what's the difference?
Power Query is for data preparation (ETL): connecting to sources, cleaning, transforming, shaping data. Power Pivot is for data modeling: creating relationships between tables, defining calculated columns and measures with DAX, building analytical models. They work together: Power Query prepares the data, Power Pivot models it, Power BI/Excel visualizes it.
Can Power Query handle large datasets?
Yes, with caveats. For database sources with query folding, Power Query can handle very large datasets because transformations run on the database server. For local transformations (Excel files, CSV), performance depends on your machine's memory. For millions of rows locally, consider filtering early or using SQL to pre-aggregate.
How do I refresh Power Query data automatically?
In Excel: Data → Queries & Connections → right-click query → Properties → set refresh interval. In Power BI Service: Dataset settings → Scheduled refresh → configure time and frequency. Note: scheduled refresh requires Power BI Pro/Premium and a gateway for on-premises sources.
Can Power Query connect to REST APIs?
Yes. Use Get Data → Web and enter the API URL. Power Query handles JSON and XML responses natively. For authenticated APIs, you may need to add headers or use Web.Contents with options. Pagination requires custom M code to loop through pages.