Beyond the Hype: Real Gains with AI Code Review (~75% Faster Merges)
Our data-driven look at dramatically improving the PR lifecycle
Let's get straight to it. Pull Requests (PRs).
Essential? Yes.
A potential place where code goes to wait? Also, yes.
We were definitely feeling the drag. PRs taking too long, slowing down releases, and generally gumming up the works.
You know the feeling: you open a PR and then... you wait, and wait. It's frustrating, and it impacts velocity.
While I was already deep into analyzing our PR stats to identify bottlenecks, our Director of Engineering and I started a separate initiative specifically focused on the potential of new AI tools. He suggested exploring their capabilities, and I volunteered to run an experiment on one of the projects I lead. The goal was clear: could we use these tools, combined with process improvements, to cut PR times, catch bugs earlier, and potentially reduce the overall bug count?
Where We Started (The "Before" Picture)
Our existing process was actually quite solid. When a developer raised a PR, a suite of automated checks kicked in immediately: unit tests, build tests, linting, changelog tracking, and even SonarQube PR scans for code quality issues. We'd optimized these checks to complete within 5 minutes, so they weren't the delay. We also had a robust PR template with clear guidance to help reviewers understand the changes.
The real bottleneck? Despite encouraging self-reviews, the number of required human reviews stayed high. Combine that with the usual challenge of senior engineer availability (you know how it is!) and PRs would often sit waiting for that crucial final approval. This dragged down our turnaround times, leading to the baseline metrics below.
Before we introduced any changes, we looked at our baseline metrics over a typical period (a short sketch of how we compute these follows the list below). Honestly, the numbers weren't pretty:
Average Time To First Review (TTFR): 10.8 hours - Almost a day and a half of working time just to get initial eyes on it!
Average Time From First Review to Approval (FRToApproval): 22.3 hours - More waiting...
Average Time To Approval (TTA): 32.8 hours - We're talking several business days here.
Average Time To Merge (TTM): 33.8 hours - From opening the PR to getting it into the develop branch. Oof.
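To keep those definitions concrete, here's a minimal sketch of how metrics like these can be computed from PR timestamps. The record layout, field names, and sample times are illustrative assumptions for this post, not our actual tooling; in reality the opened/first-review/approved/merged timestamps come from our Git host's PR data.

```python
from datetime import datetime

# Hypothetical PR records: field names and timestamps are made up for illustration.
prs = [
    {
        "opened_at": datetime(2025, 3, 3, 9, 0),
        "first_review_at": datetime(2025, 3, 3, 19, 48),  # first review activity
        "approved_at": datetime(2025, 3, 4, 17, 48),       # final approval
        "merged_at": datetime(2025, 3, 4, 18, 48),         # merged into develop
    },
    # ... more PRs from the period under analysis
]

def hours(delta):
    """Convert a timedelta to fractional hours."""
    return delta.total_seconds() / 3600

def pr_metrics(pr):
    """Compute the four durations for a single PR."""
    return {
        "TTFR": hours(pr["first_review_at"] - pr["opened_at"]),          # open -> first review
        "FRToApproval": hours(pr["approved_at"] - pr["first_review_at"]),  # first review -> approval
        "TTA": hours(pr["approved_at"] - pr["opened_at"]),                # open -> approval
        "TTM": hours(pr["merged_at"] - pr["opened_at"]),                  # open -> merge
    }

def averages(prs):
    """Average each metric across all PRs, rounded to one decimal place."""
    per_pr = [pr_metrics(pr) for pr in prs]
    return {k: round(sum(m[k] for m in per_pr) / len(per_pr), 1) for k in per_pr[0]}

# With only the single sample PR above, this prints:
# {'TTFR': 10.8, 'FRToApproval': 22.0, 'TTA': 32.8, 'TTM': 33.8}
print(averages(prs))
```

The definitions are the important part: TTFR runs from opening the PR to the first review, FRToApproval from that first review to approval, TTA from opening to approval, and TTM from opening to merge.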
Clearly, there was room for improvement.
The Experiment: AI Assist + Process Tweaks
I decided on a multi-layer approach over two sprints (4 weeks). One major part was introducing an AI code review assistant, CodeRabbit, to act as an initial reviewer. The other critical part was a conscious effort by the team to improve our overall PR hygiene.
It's Not Just the Bot!
While CodeRabbit played a role, achieving significant improvements rarely comes down to a single tool. We also focused on engineering best practices like:
Smaller, Focused PRs: Encouraging atomic changes that are easier and faster to review.
Meaningful Self-Reviews: Pushing developers to critically examine their own code before submitting, not just ticking a box. Be honest: how often do we actually do this properly?
Our New Workflow Looked Like This:
Assignee + CodeRabbit: The developer opening the PR does a self-review, and CodeRabbit automatically scans for initial feedback.
Peer Review: Human reviewers dive deep into logic, architecture, and edge cases, knowing some basics have been checked.
Final Approval & Merge: A final human sign-off before merging, never the Bot!
The Results (The "After" Picture)
After four weeks of this combined approach (analyzing 40+ PRs), we looked at the metrics again. The difference was stark:
Average TTFR: 1.7 hours (Down from 10.8!)
Average FRToApproval: 5.4 hours (Down from 22.3!)
Average TTA: 7.2 hours (Down from 32.8!)
Average Time To Merge (TTM): 8.5 hours (Down from 33.8!)
That's roughly a 75% reduction in the total time it took to merge a PR! Getting that first review in under 2 hours on average instead of nearly 11 was a game-changer for momentum.
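For anyone who wants to sanity-check the headline figure, the improvement is just (before - after) / before applied to the averages above. A quick, throwaway snippet with those numbers plugged in:

```python
# Baseline vs. post-experiment averages, in hours (the numbers quoted in this post).
before = {"TTFR": 10.8, "FRToApproval": 22.3, "TTA": 32.8, "TTM": 33.8}
after = {"TTFR": 1.7, "FRToApproval": 5.4, "TTA": 7.2, "TTM": 8.5}

for metric in before:
    reduction = (before[metric] - after[metric]) / before[metric] * 100
    print(f"{metric}: {reduction:.0f}% faster")

# Output:
# TTFR: 84% faster
# FRToApproval: 76% faster
# TTA: 78% faster
# TTM: 75% faster
```

The TTM line is where the ~75% figure comes from; time to first review improved even more sharply.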
How Did AI Help Speed Things Up?
CodeRabbit seemed to contribute significantly to this speed-up in a few ways:
Faster Initial Feedback: It typically provided its first pass review within an hour, often much faster. This immediate feedback loop allowed developers to address simpler issues quickly.
Conversational Learning: The ability to discuss suggestions with CodeRabbit directly in the PR comments helped clarify issues and sometimes even taught the bot about our specific standards, reducing repetitive comments later.
Focusing Senior Review: By handling many of the initial, smaller checks (style, potential minor bugs), CodeRabbit freed up senior engineers. When they reviewed the PR, they could concentrate on the more complex aspects (architecture, logic, edge cases) rather than getting sidetracked by minor fixes. This reduced the need for multiple review cycles just to address small issues.
Business Impact: More Than Just Speed (Early Signs)
Okay, faster merges directly impact velocity, which is great for business. But what about the quality of what we're shipping faster? This is where it gets really interesting. While we're still analyzing the long-term data, the first two sprints showed a promising trend: roughly a 30% reduction in the overall bug count reaching later stages.
Fewer bugs isn't just a quality win; it translates directly to business value:
Reduced Rework: Less time spent by developers fixing bugs means more time building new features and delivering value.
Increased Stability & User Trust: Shipping more stable code leads to happier users, potentially better retention, and a stronger product reputation.
Faster Value Delivery: Catching issues earlier means features get into users' hands faster and more reliably, without being derailed by last-minute bug hunts.
We believe this improvement stems from:
Empowered Senior Reviewers: CodeRabbit's summaries and analysis helped seniors focus their expertise on high-impact areas, catching critical issues more effectively.
Targeted Scrutiny: Using specific path-based instructions allowed for deeper checks in sensitive code areas.
Consistent Automated Checks: The bot tirelessly catches common errors that might occasionally slip past human eyes.
Again, it's early days for the bug count analysis, and we need more time to confirm the long-term trend and quantify the full business impact. I'm not focusing heavily on this specific number yet, but the initial signs are positive. I plan to share more concrete findings on this in the coming months, so stay subscribed if you want to see the final results!
The Glitches (Because Nothing's Perfect)
CodeRabbit wasn't without its quirks:
Code Ghosts: It sometimes commented on code that was already changed or removed later in the PR's commit history. Annoying!
AI Amnesia: Occasionally it would flag things as being against our standards; we'd explain in the comments, and it would usually learn... but sometimes it forgot and flagged the same thing later.
Quick FAQs (Things You Might Be Wondering):
I know, you may have questions. Here are some quick ones:
Do humans still review the code? Yep, absolutely. Every PR still gets a thorough human review. The AI just helps clean up the diff first.
Does the bot merge anything on its own? Oh heck no. Nothing gets merged automatically based on bot approval. Humans make the final call.
Are you still tracking the results? Yes, we're continuing to monitor the impact and metrics (especially bug counts).
Did the bug count actually drop? As mentioned, early data hints at ~30% fewer bugs, but we are still evaluating the long-term impact. More details to come!
This experiment was a powerful reminder that significant improvements often come from combining smart tooling with better processes and genuine team commitment. A huge thank you to the team involved; their willingness to adapt and focus on building a better product, not just ticking boxes, was incredible.
And props to our Director of Engineering for encouraging this exploration! While CodeRabbit provided a useful assist, the dedication to better PR practices across the board truly made the difference.
That ~75% faster merge time feels pretty darn good as a result!