Modular • 8mo ago
bunny

software cost estimating

That article does not understand software economics. COCOMO is used as a very rough estimate for early development, when most of the work is cookie-cutter code. It is also generally disputed and not viewed as accurate for projecting real effort.

For that matter, LOC is not a particularly good metric for assessing a code base: not for "maintainability," "cost," "defect rate," or any other measure of code health. It is often used as a starting point (i.e., everything else being identical, a code base 2x as large will cost roughly 2x as much), but the other factors are often considered more important. Those factors include things like modularity, code complexity (which has many methods of measurement), token count (similar to line count, but counting tokens instead of lines; some lines have a LOT of tokens, while others have only one), and more.

I used to work for a company that had many patents around concepts of code complexity. We were one of many companies with a "cost estimator" tool, too. And we all knew (and openly admitted) that our estimator, just like all the others, was very, very, very rough. I.e., actual cost could easily be 50% of our estimate or 2x our estimate -- the statistical deviation was wide. Our statistical relevance was so-so at best. Same with other models. Finance people still loved the models because you have to assess cost projections somehow, even when you know the estimates are wildly flawed.

Side note: this is one of the many "holy grail" targets. If you can make an accurate (good luck) cost projection tool to help organizations identify "cost to complete a software project," then you have a product you can sell for $$. Just don't step into the ring assuming that nobody has tried. There are literally dozens of companies shilling various models, and they all sorta suck. Very wide margins of error (statistical deviations).

As of a couple years ago, there was not a product in existence that was substantially more accurate than quality engineering managers giving a "guesstimate" based on decades of experience. Some companies sell massive reports that combine all of the current estimation methods (we sold our estimates to a couple prominent report-generation companies, like Black Duck), but there is no single "this number is pretty good" solution. At least not as of ~2 years ago, the last time I was working in that field.
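To make the line-count versus token-count distinction concrete, here is a minimal sketch using Python's standard `tokenize` module. The function name `size_metrics` and the two sample snippets are made up for illustration; this is not the tool described in the post, just one plausible way to count "significant" tokens.

```python
import io
import tokenize

# Token kinds that carry layout or commentary rather than code content.
_SKIP = {
    tokenize.COMMENT,
    tokenize.NL,
    tokenize.NEWLINE,
    tokenize.INDENT,
    tokenize.DEDENT,
    tokenize.ENDMARKER,
}


def size_metrics(source: str) -> dict:
    """Return the non-blank line count and significant token count of Python source."""
    lines = sum(1 for line in source.splitlines() if line.strip())
    tokens = sum(
        1
        for tok in tokenize.generate_tokens(io.StringIO(source).readline)
        if tok.type not in _SKIP
    )
    return {"lines": lines, "tokens": tokens}


# Two hypothetical one-line snippets: identical LOC, very different density.
sparse = "x = 1\n"
dense = "total = sum(w * v for w, v in zip(weights, values) if v > 0)\n"

print(size_metrics(sparse))
print(size_metrics(dense))
```

Both snippets count as one line, but the second carries several times as many tokens, which is the gap a pure LOC metric hides.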
60 Replies
curiosity.fan • 8mo ago
thank you
bunny (OP) • 8mo ago
I am well aware that people will come in here saying:
Well my org uses ABC-XYZ Method, and it's been great for us.
Super cool. I am still confident that ABC-XYZ Method, when used on tens of thousands of code bases, will come up as "meh" at best. I say that because we ran all of the methods with open math against 16,000 code bases, and saw widely varying results. And we had the real cost results (from clients) for ~1k code bases. None of the models were consistently accurate.
curiosity.fan • 8mo ago
So would you have a guesstimate for how much LLVM cost?
bunny (OP) • 8mo ago
I guess one closing comment: Black Duck's "just get all the methods into one report" was probably the best. It just gives Mergers & Acquisitions (M&A) teams a TON of data to consider. The M&A teams can decide what they feel are the best cost estimates for the software they're acquiring. But really: get some senior managers who've been through stuff and can give "A Grizzled Vet's Estimate."
oh hell no 😂 Way out of my expertise. I was part of the team that was building the estimation software, but like we were implementing the math from "white paper" into "code." And scraping for code bases. And running tests. Doesn't mean I know the stuff; I couldn't fully invent or even explain the best actual software estimation models, let alone try to assess cost on a project. 😄
But Chris might have perspective on real costs. Maybe the team kept time & resource logs -- both human and infrastructure. I don't know.
I guess given the wide deviations we saw in our lab experiments, and that COCOMO estimate of $500m, I'd hazard a wide-ranging guess of:
Anywhere from $250m (1/2) to $1b (2x).
😂 my cheap cop-out answer
oh, another HUGE complicating factor:
How bullet-proof must this software be?
I.e., building a 100k-LOC video game is much cheaper per LOC than building a 100k-LOC military aircraft controller -- lives are at stake, the software is classified, testing is rigorous, government regulations must be followed, etc, etc, etc, etc.
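For a rough feel of how a COCOMO-style number and the "half to double" band fit together, here is a small sketch of Basic COCOMO (Boehm, 1981) with its three published project modes. The thread does not say which COCOMO variant the article used, so treat this as an illustration only; the `monthly_rate` figure and the `cost_band` helper are hypothetical, and the 0.5x/2x multipliers simply mirror the range quoted above.

```python
# Basic COCOMO: effort in person-months = a * KLOC ** b.
# (a, b) pairs are the published Basic COCOMO coefficients per project mode.
COEFFICIENTS = {
    "organic": (2.4, 1.05),        # small teams, familiar problem
    "semi-detached": (3.0, 1.12),  # mixed experience, moderate constraints
    "embedded": (3.6, 1.20),       # tight hardware/regulatory constraints
}


def basic_cocomo_effort(kloc: float, mode: str) -> float:
    """Estimated effort in person-months for `kloc` thousand lines of code."""
    a, b = COEFFICIENTS[mode]
    return a * kloc ** b


def cost_band(kloc, mode, monthly_rate, low=0.5, high=2.0):
    """Return (low, point, high) cost; the band factors are assumptions."""
    point = basic_cocomo_effort(kloc, mode) * monthly_rate
    return low * point, point, high * point


# Hypothetical inputs: 100 KLOC at an assumed 15,000 per person-month.
for mode in ("organic", "embedded"):
    low, point, high = cost_band(100, mode, monthly_rate=15_000)
    print(f"{mode:>9}: {low:,.0f} .. {point:,.0f} .. {high:,.0f}")
```

At the same 100 KLOC, the embedded mode comes out at about three times the effort of the organic mode, which is the model's crude way of capturing the "how bullet-proof must this be?" factor. The band around each point estimate is still very wide.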
Darkmatter • 8mo ago
The GHC estimate is forgetting almost all of those devs are PhDs who are doing the development on grant funding.
curiosity.fan • 8mo ago
This makes me realize how valuable LLVM was, and at the same time how cheap it is given that it runs almost all silicon on earth.
bunny (OP) • 8mo ago
I think the article was playing the "what if Microsoft had made it internally" type of analysis game.
Darkmatter • 8mo ago
Have you seen some of the stuff in GHC?
bunny (OP) • 8mo ago
That was almost an internet meme in the 90s, like with Linux and the rise of FOSS.
Darkmatter • 8mo ago
There are Haskell features from the 90s which were considered new and confusing when Scala tried to introduce them to a general audience decades later.
bunny (OP) • 8mo ago
I don't recall the acronym "GHC" off-hand.
Darkmatter • 8mo ago
Haskell
bunny (OP) • 8mo ago
oh! The Haskell thing. Yeah. I saw that one.
Darkmatter • 8mo ago
It's the Haskell compiler.
bunny (OP) • 8mo ago
I don't know much about it, but I know it has some cost and complexity estimators included, right?
Darkmatter • 8mo ago
Haskell is where all of the programming language nerds with Math or CS PhDs do development.
bunny (OP) • 8mo ago
But it's also really 100% Haskell-World, so most of our business model just wasn't in that space.
Darkmatter • 8mo ago
Having anyone else try to replicate that compiler would be massively expensive.
bunny (OP) • 8mo ago
Most of our target was "stuff that needs total rebuild" -- Cobol, Java, Perl/Py/Rb/JS/TS, and others. So we'd try to give companies an estimate on "what will it really cost you to rebuild your stuff in a different language."
Darkmatter • 8mo ago
Really, Java? Running from memory costs?
bunny (OP) • 8mo ago
But our main product wasn't so much about $$ cost (tough to estimate; we gave ours as a freebie add-on) -- our main product was more about refactoring code along the way. I.e., so long as you're shifting from Cobol to X-Language (whatever one), then how do you assess code maintainability and make future migration easier?
and slow. Java is sooooo slow (for a compiled language).
Darkmatter • 8mo ago
Once the JIT gets going it can get pretty close to C perf if you're not horribly abusing it.
bunny (OP) • 8mo ago
my Java buddies always tell me "Java is super fast. I took this Python script and rewrote it...." -- Lemme stop you there.
Darkmatter • 8mo ago
I'm a C/C++/Zig/Rust person. Java is the level directly up from that due to the giant amount of money in that JIT optimizer. Now, you can't write "Clean Code" (tm), since that is actively harmful to performance.
bunny (OP) • 8mo ago
I was C/C++, but gladly moved closer to pure data work -- lot of Python. And I've played with Rust (I admire that language a ton). I use Go a lot too, but it seems to be stalling out a bit (dunno, just a personal feeling). All I know is we had a significant chunk of customers seeking estimates on Java rebuilds.
Darkmatter • 8mo ago
Go for me came down to feeling like the language was for people who didn't want to learn new things. See the current "iterators are too functional" debate.
bunny (OP) • 8mo ago
That's why I liked it. 😂 Like sometimes I just wanted "fast Python" without making custom C/C++ py modules. And now we know my obsession with Mojo, even if I've been distracted from it for the last month. 🙂
hope you realize I'm mostly joking with the "that's why I liked it" quip -- I love learning new stuff, but clients don't always love it
Darkmatter • 8mo ago
Clients don't like new things, I agree. I've spent the last several weeks convincing a client I'm not a snake oil salesman because I can get over 500k messages per second into a server. "Your phone company uses this library" wasn't helpful.
bunny (OP) • 8mo ago
I cannot begin to express how excited I am to change my business model to "I help peeps convert from Python to Mojo" -- those are consulting engagements I think would be really fun. To include the coaching/handoff where you ensure the org's tech-team can handle the new code and you don't become a maintenance engineer by proxy.
Darkmatter • 8mo ago
I'm torn on whether Python people will be willing to use Mojo. There still seems to be a debate about whether types are good (they are), and they will add compile times.
bunny (OP) • 8mo ago
yeah, introducing new libs can be scary. And I do get that the client just wants to control their tech stack -- any additions (including libs) introduce risk, cost, etc. But it's difficult to get them to shift.
Darkmatter • 8mo ago
I'm concerned the notebook version of Mojo will slow down over time and become unusable for the things I usually use Python for (data analysis).
bunny (OP) • 8mo ago
oh, facts. But if we can get Mojo to truly be a super-set of Python, then I can convert people to The Mojo Way in a dark, sinister, and passive-aggressive way:
cp script.py script.mojo
mojo script.mojo
# cheer as it runs flawlessly *and* faster
Darkmatter • 8mo ago
I'm able to say "These companies are paying money to support the project, and it's a Linux foundation project, and it's already in your distro's repos" and still have issues:
[image attachment, no description]
bunny (OP) • 8mo ago
then, you start sprinkling in some types -- get a bit faster. Coach & educate about the side-benefits of types ( {we all sing in unison: safety, compile-time checks, ...} 😂 ). Next, make a few def-to-fn changes.....
Darkmatter • 8mo ago
Then it might work, hopefully.
bunny (OP) • 8mo ago
could take a few years though -- superset of Python, that is
I mean, realistically, it could be a while till it's just s/py/mojo/
Darkmatter • 8mo ago
Considering C++ can't maintain being a superset of C, longer than that.
bunny (OP) • 8mo ago
but I'm patient. I'll play the long game.
Darkmatter • 8mo ago
restrict, #embed, etc. How restrict isn't in yet baffles me.
bunny (OP) • 8mo ago
if we can even get "superset-ish" then I'm beyond happy 🙂
Darkmatter • 8mo ago
Honestly, just making a sane package ecosystem would be great.
bunny (OP) • 8mo ago
I decided during my last month away from Mojo that I want to help nibble around the edges of implementing all the pythonic stuff.
Darkmatter • 8mo ago
If compiling from source is more reasonable then most of Python's issues go away. I'm waiting for C interop, then I'll go port liburing and try to push that as the default way of doing IO.
bunny (OP) • 8mo ago
I saw Jack chatting about pixi. I'd be pretty happy if Modular worked with prefix.dev to get pixi to be Mojo's package manager. It's highly inspired by Cargo, which I think is pretty much the gold standard of package managers that I've dealt with.
Darkmatter • 8mo ago
Cargo with the ability to specify more hardware targets would be nice.
bunny (OP) • 8mo ago
re this: Over the last month, I've still talked about Mojo with clients and friends. They all have the same basic "but I couldn't do list comprehensions / other python features" as a reason they won't pursue it more. And I think winning over more Py peeps will really gain momentum for the language. So, gonna work it a bit. 😄
Darkmatter • 8mo ago
For instance, specifying CPU and GPU targets
bunny (OP) • 8mo ago
100% agree. So much agreement. All the agreement.
Darkmatter • 8mo ago
I think I can help the most with providing networking that blows everything else away. I can port that RPC library that does 500k RPS (single core) to Mojo.
bunny (OP) • 8mo ago
I haven't done much GPU stuff lately (other than basic s/pandas/cudf/ stuff), but I've done some WASM stuff. So the same can be said across a wide range of compile targets.
oh! I lost track of time. Alarm just went off -- I have to jet for a call in 1 min. 😂 👋
Darkmatter • 8mo ago
Bye
bunny (OP) • 8mo ago
quick addition: It really leaves me excited to see so many smart peeps contributing cool stuff to Mojo. I've done a few cool things in my past, but in very specific domains. And very not-applicable-to-general-world. I feel like my contribution surface is rather limited. But that doesn't mean I can't get all pumped up when I see folk like you doing such cool stuff. ❤️
Darkmatter • 8mo ago
| not-applicable-to-general-world
Think about how most servers react to being fed 500k rps. The response from most engineers to that level of load is to declare a security incident. I'll do my best, but I'll probably need to throttle it down a bit.
bunny (OP) • 8mo ago
feature flags:
from Darkmatters-Package import RPC_Server, LudicrousSpeed
my_rpc_server = RPC_Server(speed=LudicrousSpeed)
😂
Darkmatter • 8mo ago
and a quarter gigabyte of libraries...
bunny (OP) • 8mo ago
oof
Darkmatter • 8mo ago
userspace drivers are fun like that.
Moosems / Three chickens
Bunny lore has been divulged 😉
I also noticed that it only used the line count at the most recent point and failed to take into account any and all refactoring over the years and additions/deletions in the past
Maxim • 8mo ago
Reminded me of my game dev days. We had multiple city builder games developed at the company. My team built a game with just ~12K LOC of production code and ~36K LOC of test code in one year. The other team had ~100K LOC of production code and no tests in 2-3 years. Just looking at the LOC is like buying software by the byte 😄. 2GB = $2M. 😜
