We are training our model to help optimize interest rate decisions, and a key component of that is to audit previous decisions. Here’s the scenario I was working on this weekend before Claude upended everything. This is one that you should read to the end.
October 2023 through today. $50mm IO loan. 5 year hold. Fixed vs floating.
I used Pensford’s historical floating resets and Treasury yields (https://www.pensford.com/resources/forward-curve). I loaded those into Claude and set the instructions.
Then I asked it to compare fixed vs floating over the last two years.
Claude response
Those numbers are pretty close, but didn’t match mine precisely. Close enough for back of the envelope stuff, but not if I needed precision.
I closed the conversation and started a new one. I asked the exact same question. I should get identical responses, right? They weren’t - they now differed by more than $1mm from the conversation I just had. Remember, I’m looking for numbers around $7mm.
Even worse, the interest rates themselves would easily pass most sanity checks. I do rates for a living, and would I know that a floating weighted average interest rate (WAIR) of 6.71% is wrong when the right answer is 6.87%? No.
I would then assume the actual interest was accurate because that’s basic math. “Of course AI will get that right.”
Instead of being off by $100k like I was in the first run, I would be off by more than $1mm.
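For reference, the "basic math" here is deterministic: the same resets in, the same dollars out, every time. Below is a minimal sketch of how the floating interest and the weighted average rate could be computed. The Actual/360 day count, the function name, and the three sample resets are my assumptions for illustration only. They are placeholder rates, not the Pensford data and not the numbers from my runs.

```python
from datetime import date

def floating_interest(principal, resets, end_date):
    """Interest on an interest-only loan from a list of rate resets.

    resets: list of (reset_date, annual_rate) sorted by date.
    Assumes an Actual/360 day count (an assumption, check your loan docs).
    Returns (total_interest, day_weighted_average_rate).
    """
    total_interest = 0.0
    weighted_rate = 0.0
    total_days = 0
    for i, (start, rate) in enumerate(resets):
        # Each rate applies until the next reset, or until end_date for the last one.
        stop = resets[i + 1][0] if i + 1 < len(resets) else end_date
        days = (stop - start).days
        total_interest += principal * rate * days / 360
        weighted_rate += rate * days
        total_days += days
    return total_interest, weighted_rate / total_days

# Hypothetical placeholder resets, NOT actual historical data.
sample_resets = [
    (date(2023, 10, 1), 0.0700),
    (date(2023, 11, 1), 0.0690),
    (date(2023, 12, 1), 0.0680),
]

interest, wair = floating_interest(50_000_000, sample_resets, date(2024, 1, 1))
print(f"Interest: ${interest:,.2f}  WAIR: {wair:.4%}")
```

Run it twice and the answer does not move. That is the bar.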
When I pushed back, it simply apologized and said my numbers looked right. I said that wasn’t good enough - tell me how you arrived at your answer!
Tech startups raised a lot of money in 2021 and 2022. They still have money in the bank even though their business model failed, so they pivoted and added “ai” to the end of their company name. JamesFranklinai…Sarkisianai…KevinPatulloai…
These shops are just leveraging an existing AI model (Claude, ChatGPT, Gemini, Grok, etc.) and then plugging it into one of their UI screens so it feels like it’s their software. They convince enough business owners that this is real AI, get MRR up, and sell before anyone catches on.
If you are considering using one of these companies, I have a great way to vet them. Run a scenario like I just described before the demo. Use math that would matter to you and have the answers before the call. Once the demo starts, ask them to run that same scenario. Twice. How do their answers compare to what Excel told you?
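If you want to keep score during the demo, the check can be as simple as the sketch below. The function name and the 0.1% tolerance are my own assumptions, and the dollar figures are round illustrative values in the spirit of my two runs (a benchmark near $7mm, one answer off by about $100k, one off by about $1mm), not the exact outputs.

```python
def vet_runs(benchmark, runs, tolerance=0.001):
    """Compare repeated answers to a known benchmark (e.g. your Excel result).

    tolerance is a relative error threshold (0.001 = 0.1%), an arbitrary
    choice here. Pick whatever precision matters for your deal.
    """
    errors = [abs(r - benchmark) / abs(benchmark) for r in runs]
    return {
        "errors": errors,                                # relative error of each run
        "spread": max(runs) - min(runs),                 # disagreement between runs
        "passed": all(e <= tolerance for e in errors),   # every run within tolerance
    }

# Illustrative values only: benchmark ~$7mm, two demo answers.
result = vet_runs(7_000_000, [6_900_000, 8_000_000])
for i, err in enumerate(result["errors"], start=1):
    print(f"Run {i}: off by {err:.2%}")
print(f"Spread between runs: ${result['spread']:,.0f}")
print("PASS" if result["passed"] else "FAIL")
```

The spread matters as much as the error. A tool that gives two different answers to the same question fails even if one of them happens to be close.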
Or save whatever money you are about to pay them and just do it yourself. You’ll have the same number of incorrect answers. Inconsistencies. Hallucinations. But at least you saved the SaaS fee.
There is also a weird PS to this story. In that first run where the numbers were close, I had a follow-up question: “assume the floating scenario required a 5.5% 2yr cap purchase at closing.” But I forgot to provide the cap cost like I had the historical resets and Treasurys. As I was chiding myself, knowing I would have to ask it to tweak its answer while I watched it type, I was stunned to see that it estimated a cap cost from two years ago.
It estimated the cost of that cap at $250k. Here’s an image of the Bloomberg cap pricer for this structure…$251k.
Source: Bloomberg Finance, LP
That potential is why everyone is so excited about AI. The inaccuracy from earlier is why everyone is still a little skeptical.
When I pressed for how it came up with this number, it said derivatives textbooks and pattern matching. I asked for its source for pattern matching (“please say Pensford please say Pensford”) but it didn’t have an answer.
When it’s wrong, it can’t explain it. When it’s right, it also can’t explain it. It might be close enough to make you think it’s right. And it will convey total confidence.
There is no bigger fanboy of AI than me, but consistency and accuracy matter.
Look at the successful tech companies in commercial real estate. The OGs. Excel. Yardi. MRI. Argus. An entire industry has been built on their consistency. Their predictability. So much so that most of us probably don’t even think of them as tech companies. Do you care about their UI screens? Or their rock solid math?
The new CRE tech shops selling AI dreams should remember this when they promise the moon.