In the mid nineties I worked in a research institute. There was a large shared Novell drive which was always on the verge of full. Almost every day we were asked to clean up our files as much as possible. There were no disk quotas for some reason.
One day I was working with my colleague and when the fileserver was full he went to a project folder and removed a file called balloon.txt which immediately freed up a few percent of disk space.
Turned out that we had a number of people who, as soon as the disk had some free space, created large files in order to reserve that free space for themselves. About half the capacity of the fileserver was taken up by balloon.txt files.
Perfect example of the tragedy of the commons. If individuals don’t create these balloon files then they won’t be able to use the file server when they need it, yet by creating these balloon files the collective action depletes the shared resource of its main function.
This is similar to how some government agencies retain their budgets.
At the end of the budget period they’ve only spent 80% of their allocated budget, so they throw out a bunch of perfectly good equipment/furniture/etc. and order new stuff so that their budget doesn’t get cut the following year, rather than accepting that maybe they were over-budgeted to begin with.
Rinse, repeat, thus continuing the cycle of wasting X% of the budget every year.
I think the problem is that you do not need 100% of your budget every year, but getting it back when you do need it is much harder than keeping it in the first place.
Definite case of misaligned incentives.
Yep! The problem happens when you divide the safety buffer up in the first place. Safety buffers demand to be shared: when one part does not use all of its safety margin you want to transfer that to another system.
Another surprising place where this happens is project scheduling. We budget time for each individual step of a project based on our guess of a 90% or 95% success rate, then our “old-timers’ experience” kicks in and we double or triple the time for all the steps together. Then our boss adds 50% before giving the estimate to their boss. That sounds gratuitous, but it is to protect you: their boss looks at how grotesquely long the estimate is and barks out a cut of 20%. The overall effect of those two is (3/2) × (4/5) = 6/5, so your boss still netted you a 20% buffer while making the skip-level feel very productive and important.
Say going from 50% confidence to 95% confidence gives you 30% more time as safety buffer, and you only double the estimate. The work that you missed in your initial assessment is not gonna be, say, half the project; maybe generously it’s a third of the project or so. So the project actually takes 1.5 times the initial estimate, measured properly, while together you have budgeted 1.3 × 2 × 1.2 = 3.12 times. The total project deadline is more than half composed of safety buffer. And we still consistently overrun!
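For anyone who wants to check the arithmetic, here is a minimal sketch. The variable names are mine, and the multipliers are just the illustrative ones from the comment above, not measured data.

```python
# Illustrative multipliers from the estimate described above (assumed, not measured).
confidence_pad = 1.3       # padding from 50% to 95% confidence
experience_pad = 2.0       # the "old-timer" doubling
boss_net_pad = 1.5 * 0.8   # boss adds 50%, skip-level cuts 20%

# Total budgeted time as a multiple of the initial estimate.
budgeted = confidence_pad * experience_pad * boss_net_pad

# If a third of the real work was missed, the initial estimate covered
# only two thirds of it, so the real duration is 1 / (1 - 1/3) = 1.5x.
actual = 1 / (1 - 1 / 3)

# Fraction of the deadline that is pure safety buffer.
buffer_share = 1 - actual / budgeted

print(f"budgeted {budgeted:.2f}x, actual {actual:.2f}x, buffer {buffer_share:.0%}")
```

With these numbers the buffer comes out at roughly 52% of the deadline, which is the "more than half" claimed above.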
But if Alice needs to work on some step after Bob, and Bob finishes early, when does Alice start on it? Usually not when Bob finishes. Alice has been told that Bob has until X deadline to complete, and has scheduled herself with other tasks until X. Bob says “I got done early!” and Alice says “that’s great, I’m still working on other things but I will pick my tasks up right on time.” Bob’s safety buffer gets wasted. This does not always cause any impact to the deadline, but it does for the important steps.
Of course, if you are a web developer you already know this intuitively because you work on servers, and you don’t run your servers (Alice, for example) at 100% load, because if you do then you can’t respond to new requests (Bob’s completion event) with low latency. It’s worth thinking about, in an efficient workplace, how much are you not working so that you have excess capacity to operate efficiently?
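To put rough numbers on the server analogy, here is a sketch using the textbook single-server queue (M/M/1). That model is my assumption, not something claimed above, and real servers or people will not follow it exactly. In it, mean response time is the bare service time divided by (1 - utilization).

```python
def mean_response_time(service_time: float, utilization: float) -> float:
    """Mean time in system for an M/M/1 queue: S / (1 - rho)."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_time / (1 - utilization)


# Response time as a multiple of the bare service time at various loads.
for rho in (0.5, 0.8, 0.9, 0.99):
    multiple = mean_response_time(1.0, rho)
    print(f"{rho:.0%} busy -> about {multiple:.0f}x the bare service time")
```

Under this model a worker who is 50% busy responds in about 2x the bare task time, at 90% busy it is about 10x, and at 99% busy about 100x, which is why some deliberate slack keeps latency low.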
Have you ever thought of just accepting that you can’t predict how long a project will take to complete?
It’s a revelation. You get to have some hard conversations with other managers. But in the end everyone finds it easier to deal with “it’ll be ready when it’s ready” rather than endless missed deadlines and overruns.
In my experience people demand fantasies, and will fight tooth and nail any encroachment on them by the reality that things never happen as they are planned. Although when I say this, I am thinking of one year and above kind of estimates.
When people who are experts on the topic evaluate the work needed over a period of say 3 months, even in something as notoriously hard to plan as video game production, it can hold. This entails being willing to adjust scope and resources though, when planning, in order to ensure the objectives are likely to be met.
This never worked in the 15+ years I’ve been working. The people who did try that went out the door very soon, once management realised some other person could tell them a date and they could plan their business around that date, even though that date got missed half of the time anyhow.
Yeah it’s tough to get the point across. Worth it if you can, though. For everyone – no-one enjoys rescheduling everything because the deadline was missed again.
Did you notice how your budgeting / estimation guideline converges to “multiply the estimate by Pi” advice?
I’ve never understood this about budgeting. So you allocate. A budget. These are funds YOU ARE PLANNING TO SPEND! So, OK, you DON’T spend them this year. Why the fuck don’t you get to SAVE THAT MONEY!? No, instead you are punished for not spending it all and you cannot create a realistic budget for next year – why the hell not!?
Sorry, but this frustrates the hell out of me! What am I missing here? What arcane bit of finance lore leads us down this path? Am I just hopelessly naive? Is saving money such a bad thing!? I just don’t get it…
Your frustration may come from a good place, believing gov and big orgs want to be efficient. They don’t and they don’t have to. They are super efficient when it comes to taxes though. At least in my country, every department is trash except the revenue service which is so forward thinking and effective, they put private business to shame.
This definitely rings true.
But I’d say it’s not a feature of governments only; any sufficiently big organisation which centralises power ends up being like this. That’s why large corporates need nimble startups to innovate: startups either innovate or die.
The government is just the largest example of this phenomenon – and they don’t have any analogue for startups, they’re just doomed to grow larger and larger over centuries until they collapse.
Look at the USA, once the perfect minarchist experiment and now the largest employer in the world.
Not here in the US. The IRS has been crippled by previous Republican administrations to the point of uselessness, as another method of giving free unlimited tax breaks to the millionaires and billionaires that bought them into office.
But god help the poor people who file simple tax returns that can be easily audited.
If you don’t reduce your budget next year, what does “save that money” mean to a department in a company? If this year I have 105% of last year’s budget (because they always go up), what am I supposed to do with the 20% surplus from last year? Most companies wouldn’t even have a place in your cost center to track a surplus, it’s such a foreign concept.
Zero-based budgeting is one answer to the moral hazards of either over- or under-estimating your budget on purpose. If each year you start with a blank spreadsheet and then add (with justification) expenses for the year, it avoids some of the pitfalls. Not a panacea however.
Ahah, that would be nice. In most companies I know, they are shrinking every year, because you know, “cost reduction plans”.
Just brainstorming, but maybe the irony is that your scenario somehow has even worse incentives? For example, building up that rollover number would gamify thriftiness greatly exceeding the typical “oops didn’t spend it all,” and think about the consequences of when you DIY something you’re not the best at instead of hiring the pros.
For two reasons:
1) separation of duty: you might not be the best department to invest surplus
2) cost effectiveness: if you’re operating with a deficit, as is generally the case with governments these days, this money is not free, so it could effectively be cheaper to give it back and re-borrow it when you actually need it
> 1) separation of duty: you might not be the best department to invest surplus
But with this reasoning there is no surplus, because departments will spend their money at all cost.
> 2) cost effectiveness: if you’re operating with a deficit, as is generally the case with governments these days, this money is not free, so it could effectively be cheaper to give it back and re-borrow it when you actually need it
That’s totally fine, when GP said “Save the money” they didn’t mean “in their own bank account”. It just means: the top management owes them this money when they need it later.
Anecdote: I’m currently working on a project started as an emergency earlier this month, which must be done before the end of the month (because it’s the end of the accounting year at this company) for this exact reason. And this project is overpriced by a factor close to two, because this money really had to be spent!
Top management doesn’t “owe” them any money when they need it later. Say you budget $100 for dinner tonight and you go out and it costs $75. Do you owe the restaurant $25? While certainly some people might roll the $25 into the next day’s meals, some people might allocate that $25 to another cost center like buying a new car.
Budgets are meant to estimate costs and manage cash flow. From a greedy team perspective it’s best (and self interested) to try to game the system as much as possible so you get the largest share of the pie. From the organizational perspective it’s best to reallocate capital efficiently, especially if a team consistently over budgets.
> Say you budget $100 for dinner tonight and you go out and it costs $75. Do you owe the restaurant $25?
No, but if you accurately forecast that dinner will cost $100 on average, and this time it only happened to cost $75, you should put most of the savings aside for the other times when it will cost $125 and not reallocate it to be spent on something else.
Consistent over-budgeting is still an issue which would need to be addressed, of course, but a system where any annual cost underrun is treated as over-budgeting and punished by reallocating that part of the budget to other groups ignores the inevitable presence of risk in the budget forecast.
We’re arguing about different things it appears. This thread started with someone saying that a team coming in under budget is “owed” that money in the future by management. I said this isn’t so and that it’s a self-centered and myopic viewpoint. You are talking about punishment and reallocation, presumably by reducing the budget the next cycle. I’m not in favor of that unless it’s clear that the team is consistently over budgeting.
For example, if a team says they need $100 a year and comes in at $90 then I don’t think next year’s budget should be $110 while some people in this thread think it should be. That makes no sense. Neither do I think the budget should be cut to $90. Unless something has changed, the budget should stay the same.
Your point about average cost just means that you’re budgeting on the wrong timeframe. If you estimate your average dinner is $100 but you’re spending $75 most of the time except for one huge dinner every month then you should be budgeting $75 for dinner and then budget separately for one large dinner a month. Similarly, if a team says they need $10MM a year but half of that is them trying to amortize a $25MM cost over 5 years then they are budgeting incorrectly. Their budget should be $5MM with a $25MM side fund contributed to on a risk adjusted basis.
The worst case scenario is the team budgeting $10MM when they only need $5MM and losing control of their budget so that when the real charge comes due they’re fucked because they’ve been spending $10MM for the past 5 years without realizing the fixed charge is coming or, worse, realizing the fixed charge is coming but just ignoring it so they can buy new office furniture and exhaust their budget this year selfishly.
> For example, if a team says they need $100 a year and comes in at $90 then I don’t think next year’s budget should be $110 while some people in this thread think it should be.
IMHO it depends on why the expenses were less than the budget. If it’s a matter of probability or essential uncertainty then the savings should be set aside for other occasions where luck isn’t as favorable. If the department realized cost savings by improving business practices then most or all of the savings should stay with the department to be invested in future improvements (a one-time carry-over into the next budget period) and/or distributed as reward to those responsible for the improvements, as an incentive to continue making such improvements. If costs were lower because the department didn’t accomplish everything they set out to do then that might be a justification for reallocating their budget, and/or implementing more drastic changes to get them back on track.
> Your point about average cost just means that you’re budgeting on the wrong timeframe.
The timeframe for the budget would generally be predetermined (e.g. one fiscal year) and not set by the department itself.
> If you estimate your average dinner is $100 but you’re spending $75 most of the time except for one huge dinner every month then you should be budgeting $75 for dinner and then budget separately for one large dinner a month.
Sure, but I was referring to probabilistic variation due to uncertainty in the forecast, not a predictable mix of large and small expenses. And the “dinners” in this analogy would be once per budget period (i.e. annual for most organizations), not frequent enough to average out.
I think we agree in general and are just quibbling about the details of how to budget correctly (timeframes, line items, etc.). Most of the issues that come up with these stories of people getting their budgets slashed if they don’t spend enough or having to buy a bunch of bullshit at the end of the year are just a result of poor budgeting at some point which has been allowed to continue.
It is about opportunity costs. The budget you did not spend could have been spent elsewhere in the meantime and since it didn’t get invested elsewhere, it’s not a saving, it’s a net loss, because of course anything less than 100% utilisation of 100% of “resources”, 100% of the time is a loss... or some such.
It’s probably because of this kind of reasoning that people just throw money out the window, because “gotta get rid of it all”.
Seems it’s the yearly cycle not matching up to the longer cycle of certain needs that’s a problem.
Years ago I worked as a research assistant for a university. One day, my boss (a professor) pulled me aside for an impromptu meeting. “I have $5000 left in a research grant I need to spend this week or else it’s gone forever – do you have any ideas of what I should spend it on?”
Unfortunately I couldn’t think of much. I suggested maybe we buy some more computers with it but I’m sure he’d already thought of that himself. I don’t know what he ended up doing, but I’m sure he’d have decided to buy something with it rather than just losing it entirely.
This is usually handled much more elegantly by senior academic staff.
You contact a department whose services you use a lot, then you arrange to pre-pay for services. Ideally you negotiate a discount.
Then you use the service and state which grant to draw from.
This way you have grants paying for things that are completely unrelated to their intent, you have one nightmare of a billing system which no one understands and you get to use every cent.
This was a fairly recent occurrence in my research group. It’s often quite tedious because you don’t want to waste the money and it’s never clear if there’s going to be a period where we’re short on cash at some point in the future. Most of it is spent on boring but expensive things to be used down the line. Would be far better if funding wasn’t quite so cyclical!
Could you order conference tickets or something similar that allows free cancelations in the future? In my previous job, we did this to carry over training budgets.
Buy annual licenses. Then renew if you want to block the same amount next year OR don’t renew and use it for what you actually need.
I was in the army, they had a certain budget for bullets. On the last day of the year we shot the rest, because if not they would get less next year.
We shot so much we destroyed some of the rifles, apparently that was better than getting a smaller allocation next year.
Did you have a budget for replacement rifles too? Unlike bullets, there might be some incentive to get new and different ones.
> This is similar to how some government agencies retain their budgets.
The non-government sector isn’t immune to this.
If some action makes a non-government entity inefficient, then the entity won’t survive for long and will go out of business (or at least that’s the hope of a competitive free market economy).
> then the entity wont survive for long and will go out of business (or at least that’s the hope of a competitive free market economy)
Only if the inefficiency is large enough to overcome other forces.
Or to put it another way, picture if every single team at Google did this to the tune of 100k a year, per team, and assume among 135,000 employees there are 13,500 teams.
That’s 1.35 billion dollars. Well under 1% of their revenue.
No way is a competitor going to appear that is identical to Google in every way except they have better budget management. Google has too many moats around their business, they can be really inefficient in many many ways and still dominate in multiple markets.
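A quick sanity check of that back-of-the-envelope figure, as a sketch. The team count and per-team waste are the hypothetical numbers from the comment above. No actual revenue figure is assumed here; the code only computes the revenue level at which this waste would equal 1%.

```python
# Hypothetical inputs from the comment above.
teams = 13_500
waste_per_team = 100_000  # dollars per year

total_waste = teams * waste_per_team

# Annual revenue at which this waste would be exactly 1% of revenue.
# Any revenue above this keeps the waste under 1%.
one_percent_threshold = total_waste * 100

print(f"total waste: ${total_waste:,}")
print(f"under 1% of revenue once revenue exceeds ${one_percent_threshold:,}")
```

So the 1.35 billion figure holds, and the "under 1%" part is true for any company with more than 135 billion dollars in annual revenue.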
> competitive free market economy
It was not even believed by Adam Smith. He writes that it only works that way in a controlled environment. That’s why European countries usually rank higher in market freedom than the US, because we don’t have companies getting so cancerously big that they have very real effects on law making (how lobbying is legal is still beyond me)
You need to take another look at how the EU makes laws, and who gets to propose them, and who they talk to.
It’s less blatant, but just as pernicious.
The USA got rich because of unbridled capitalism. Then richness trickled down through generations of companies while regulations caught up and their government became a behemoth not dissimilar to the ones living in EU and that the USA was running from.
Nowadays the USA jurisdiction is comparable to the EU one, but they still have more $$$.
Did that richness really trickle down? Just look at the distribution of that money.
Of course. I agree with what I think you’re getting at, specifically that not nearly enough richness trickled down. However, the poorest Americans today are still significantly better off than they were in most any previous decade or century because a bit of that wealth did trickle down.
It’s not always the case.
I work at one of those US banks.
The amount of inefficiency in the form of red tape, confusing processes, and custom half-baked tools that crash half the time is just mind-boggling. I’ve spent more than a week now getting the firewall opened for one IP/port on one host just to test my prototype in the dev environment (a local machine or Docker is not an option due to lack of admin rights), and it’s still in the change approval stage. If we weren’t this giant too-big-to-fail bank we’d be out of business by now.
Hope is a really interesting way to frame something that has consistently failed to prove true after centuries of theory and decades of targeted policy changes.
It’s possible that once a company reaches a certain size, it’s inevitable. Corporations internally have the same top-down centralized organizational structure as a typical government. Market forces can’t eliminate that kind of inefficiency if it invariably affects all large enterprises, and the economies of scale enjoyed by such companies outweigh the perverse incentives of sub-organizations.
What strikes me as unique to government is the tendency for sufficiently powerful appendages to secure enough resources to start wagging the dog (e.g. the military industry in the US), although now that I think about it, it seems possible that it would happen within companies.
Also don’t forget how unevenly applied market forces are: if McDonald’s started charging 10% more for a hamburger they’d lose sales to Burger King a LOT faster than, say, Comcast or Oracle because the products are basically the same and most customers can switch almost effortlessly whereas you have to be especially mad to trench fiber out to your house or migrate every database in a large enterprise.
Any business with a natural monopoly, high migration costs, etc. can support a surprising amount of inefficiency even if most of their customers find the experience unsatisfying.
I think the common feature is just humans.
I think we imagine a lot of market forces that no doubt exist, but people aren’t logical in the face of them.
But large companies tend to have MBA types scurrying around rooting this stuff out as it pops up or shortly thereafter. Government has no such immune system to fix these problems on the go. It just gets sicker and sicker until the taxpayers vote for something drastic or revolt.
You see this in nonprofit entities too. They get big, abstract away from their mission and waste a lot of money until someone gets tasked with cleaning house or a more mission-driven one comes along and replaces them.
You will be surprised, but I saw the same behavior at a large tech company.
Each department either uses or loses its budget, so there was a push to make sure nothing was left.
I talked about that topic with my principal when I was in school.
He told me that the school had to prevent those automatic budget cuts. His reasoning was that it’s nearly impossible to get a higher budget if some big expenses had to be made. And suddenly needing a higher budget, after for example 3 years of low expenses, doesn’t make a good impression on higher-level administrators.
The Office always had such a weird version of what happens in an “office”. Having a secretary who people ask to take photocopies?
Sounds like the eighties!
I think that’s sort of the point. The branch was run by a guy (Michael Scott) that is actually pretty old school and utterly unaware of it.
This happens in private industry too. I can set my watch by the fiscal calendar of certain groups in public companies having to spend their budgets by the end of their year so it doesn’t get cut the next year.
It’s a decentralized implementation of a quota system.
By slowly releasing supply you prevent anyone having to self-regulate (which requires unreasonable deprivation, OR global knowledge) and everyone bases their decisions off of the only global signal, free space.
Tragedy of the Commons is the libertarian “private property is essential” interpretation. It’s a cynical take, assuming that human selfishness is the deepest of truths and that there is no use fighting it, that the best solution is to organize society around it.
The conventional Game Theory take is that this is a prisoner’s dilemma, and everyone creating balloon.txt files is defecting. They are making the most rational choice under the rules of the game (no communication, thus no reliable cooperation). It’s not globally optimal, but it is locally for each of them. This take also suffers from the same assumption: that rationality is centered on self-interest only.
If we are to evolve as a species, then we need to get beyond such limited thinking. We need to transcend our base natures. That is the whole point of culture: to transcend as a group what our genes otherwise program us as individuals to do.
Tragedy of the Commons effects are well-established to exist both in economics and outside it (in ecology, for instance). You seem to be attempting to shoehorn some misguided political take into the situation, even though Tragedy of the Commons is a decent characterization of this particular social pathology.
Their point was that tragedy of the commons need not be a given anywhere we see tragedies or commons. Last paragraph is lofty but I think the whole idea is we have the cognitive ability to deliberately prove it’s not a natural law.
Understanding sociology as ecology at human scale is core to libertarianism.
Yes. Thank you.
Though did you mean, “Understanding sociology as Darwinian ecology at human scale is core to libertarianism.”? Because the notion that ecology is characterized only by “the law of the jungle” is also strongly debated. Even “the selfish gene” is debatable simplistic reductionism. Individuals aren’t the only actors; there are higher order emergent entities, e.g. species and ecosystems, that also evolve to perpetuate themselves and flourish, much like our own bodies are cooperative and interdependent systems of cells (with native and foreign DNA, the latter existing primarily in our GI tract) that originally evolved as single-celled “selfish” organisms.
And as you point out, “we have the cognitive ability” that nature lacks. We can do at least as well.
As to “lofty”, I agree. But let’s consider other things that were once considered lofty if not insanity:
– in ancient Greece, that democracy should be extended beyond the aristocracy
– in Medieval Europe, that democracy should exist at all, that the divine right of kings should be seen as a scam
– in the 19th century United States, that democracy should include women and blacks
– in the 1970s United States, that lesbians, gays, bisexuals, transsexuals and queers should be treated with the same dignity as straights, should be able to marry, adopt children and serve in the military. And that we stop using “he/him” by default as you just did because that is an artifact of patriarchy as well as outmoded thinking about even binary gender.
– in India today, that when a woman is raped, she should be protected by law and the male rapist should be punished, not the other way around. The same proposition would have been considered lofty in America or Europe not all that long ago.
– I can make a really long list but you get it 🙂
This is self-contradictory. You take “human selfishness is the deepest of truths” as a mere assumption, then you say “we need to transcend our base natures”.
Human selfishness IS nature. It is not just about humans either, all evolution is guided by environment (resource availability).
For anything else you need ALL people to NOT be selfish, only some being altruistic does not cut it. Your only other option is to punish selfishness, but then you will ban progress.
If most people don’t create the balloon.txt file, BUT, there is no punishment for creating one, then if I believe I have a good idea and that I DESERVE more resources to pursue it, I’ll create a nice big balloon.txt file. Your only option is to punish me for doing so. I would not want to live in a world where people are punished for trying to gather resources to make things that most other people won’t. Some people have bright ideas, and they need resources to pursue them. Most people don’t have many ideas and they don’t want to do anything. If you prevent the means of passionate people to gather big resources to do big things, and want to live in a zero entropy world where everything is equal (made sure through the use of force / punishment, which will eventually be corrupt, because by definition punishers can’t be equals to others) and nothing moves because of it, keep dreaming. It is not even scary because that literally cannot happen.
The way to resolve this particular tragedy of the commons, like most other such cases, is to privatize the commons: make people pay for the disk space they use. If you want a nice big balloon.txt file to reserve space for the future, fine, but you’re paying for the space you reserved. How you use it is up to you. In return, the administrators get both the money and incentive they need to buy more storage capacity, ensuring that running out of available space will be less of a concern.
> we need to get beyond such limited thinking. We need to transcend our base natures
Refusing to accept the human nature as-is and always requiring some sort of “evolved new man” is one of the characteristics of the communist/socialist ideology.
Also a handy excuse when the system inevitably fails: it wasn’t the system, it was the selfish people who did not implement it correctly.
Ahhh the old “socialism/communism inevitably fails” meme.
Let’s assume one could even call those failures communism/socialism. How long have we experimented with and developed socialism/communism? 100 years.
How long have we been trying to get democracy right? 2,500 years. With many starts, fits and failures, devolving into dictatorships many, many times. The self-proclaimed “greatest democracy in history” is guilty of genocide and slavery. Even today how much it is a democracy as opposed to an oligarchy/kleptocracy/plutocracy is questionable.
How about capitalism? 500-800 years. And in that time it has exploited, enslaved and murdered people, pillaged entire nations and continents, raped the environment, and poisoned every culture that has adopted it with the notion that “selfishness is a virtue”.
The only reason capitalism hasn’t collapsed (yet) is because capitalists are smart enough to not do pure capitalism, knowing that it would lead quickly to revolution, and because the environment’s revolt is just getting started.
 “The west called [the Soviet Union] Socialism in order to defame Socialism by associating it with this miserable tyranny; the Soviet Union called it Socialism to benefit from the moral appeal that true Socialism had among large parts of the general world population.” ~ Chomsky
 The United States: “look how many people died in the Soviet Union’s industrialization program!”
Socialists: “how did the United States industrialize again?”
The United States: “look, you need to do a BIT of genocide and slavery to kick things off…” ~ Existential Comics
 One of the most beneficial things about immersing yourself in deep study of American history is that you get to a point where this country can no longer effectively lie to you about why it is the way it is. It disabuses you of the notion that the inequality we see is an accident. ~ Clint Smith
Capitalism is not exploiting anyone. Capitalism is purely about organising the economy around voluntary transactions.
Exploiting, enslaving and murdering is purely what socialist countries do – and they can get away with all of this, just because they can socialise the cost of all their evil deeds and force people into paying them money.
The only reason capitalism hasn’t collapsed is that it’s the only way to have a profitable economy. The crooks that you call government recognise that they can steal only so much from the economy before a country collapses.
I’d also argue that we’ve experimented with elements of socialism and elements of capitalism for the entire existence of civilisation.
Communism can’t work unless you have either perfect individuals or a tyrannical state which forces resource distribution. In the real world, you end up with socialism. Because people are not perfect, the government which will redistribute resources won’t do a perfect job in the best case and will just be completely corrupted in the worst case.
And still, communism was attempted. When did we ever attempt to have an entire capitalist society without a government to ruin it?
So other people acknowledge the problem, provide a solution, and your response is to say “that is selfish libertarian propaganda, the real solution is some magical evolution”?
It’s not just libertarians. I’m sure that even communists accept the premise of human self-interest, but instead of private property their solution is for one all-powerful government to own everything.
TIL: Kropotkin was not a communist. Nestor Makhno, Errico Malatesta, Jose Durruti? Not communists either. Who’d have thought?
Communism (as an end-goal) requires no governments. You are thinking of socialism.
If only there was some way to allocate resources based on their value to the user, like with prices or something.
That would be great if everyone were truly on a level playing field.
You could make that so in this shared computing scenario, but our broader world is systemically rigged in favor of some people and against others. Capitalism depends on the un-levelness of the playing field for cheap labor.
i.e. while it can be useful if prices are attached to commodities (with caveats around externalities etc), it is not a good thing that prices are attached to humans, making some people’s being and work less valued than others.
I worked at a large company during a migration from Lotus to Outlook. We were told we’d get our current Lotus email storage + 100MiB as a new email quota limit under Outlook.
I made a bunch of 100MiB files of `/dev/random` noise (so they don’t compress, compressed size was part of the quota) and emailed them to myself before the migration, to get a few GiB of quota buffer.
My co-workers were constantly having to delete old emails in Outlook to stay under quota, but not me. I’d just delete one of my jumbo attachment emails, as needed. 😉
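The trick above depends on random bytes being essentially incompressible. A minimal Python sketch of that point, using `os.urandom` as a stand-in for reading `/dev/random` and zlib as a stand-in for whatever compression the mail system applied (both are assumptions for illustration, not what the poster actually ran):

```python
import os
import zlib

def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size, using zlib's default level."""
    return len(zlib.compress(data)) / len(data)

ONE_MIB = 1024 * 1024
random_block = os.urandom(ONE_MIB)  # random bytes, like reading /dev/random
zero_block = bytes(ONE_MIB)         # the same amount of zero bytes

# Random data stays at roughly its original size; zeros shrink to almost nothing.
print(f"random: {compression_ratio(random_block):.3f}")
print(f"zeros:  {compression_ratio(zero_block):.3f}")
```

A file of zeros would have counted for almost nothing against a quota measured on compressed size, which is presumably why noise was used instead.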
Email quotas aren’t just a cost thing. They force deletion of files/communications that aren’t relevant anymore. The last thing the legal department wants is some executive’s laptop with 10 years of undeleted email to make its way to discovery.
Unfortunately, those goals are rarely communicated and accepted by the people they’re imposed on.
My first full-time job had an unexplained email expiry policy. After being frustrated several times at losing some explanation on how/why, I started forwarding all my emails to gmail. In retrospect, that’s probably a worse result to whoever imposed the expiration.
Fortunately, these days people are better about consolidating knowledge on wikis or some kind of shared docs instead of only email.
It’s a hush-hush kind of thing. If you advertise that it’s to avoid discovery, you are openly admitting to liability should someone find out while trying to pull your execs’ email during discovery.
The excuse of resource contention provides plausible deniability
Yeah, this is really common. Normally there’ll be one unrecorded/easily deleted means of communication, and people use that for discussing things that potentially could expose the company to legal liability.
But nobody ever talks about it (except on said un-recorded meetings. That reminds me, I should explain this to our junior today, so that he knows for the future).
Lotus to Exchange migrations were all likely in the era before Sarbanes-Oxley and the other email retention regulations
iirc at the time the only industries that required retention were health, legal and government
With SOX (PCI, FDIC, et al) retention laws we had another explosion of work rolling out all the compliance features of Exchange
Those were crazy times getting everybody either migrated with email or onto corporate email – there’s a similar explosion of work right now with migration to M365
Then why not just tell Exchange to delete any emails older than 5 years (or whatever your lawyers tell you to put)?
I knew a place where Exchange was configured to delete all mails after 6 months. Soon after I discovered that people started to form circles in which they would forward older mails from internal mailing lists to each other to retain them longer than that.
Fannie Mae did this. When you have targets on your back you minimize the collateral damage from possible blowback.
Imagine getting sued and having the entire paper trail in your email going back 3+ years. I expire all email after 1 year.
A previous company I worked for had a one month retention window in the email server. People just ended up storing email in their local machine’s Outlook folder so they can refer to old emails.
Or for the more technical folk with access to a Linux server, set up postfix/dovecot, connect Outlook to it and arrange for archived emails to go to the IMAP server.
The IT people get smart about looking for OST or PST files, but let’s see them catch that 🙂
If you have a Linux box (or a VM), install and configure it to route mail and provide IMAP support. Digital Ocean has the best tutorials for this:
Then configure a new mail account in Outlook and connect to the IMAP server. It’s optional, but convenient for replies, to configure the account to send via postfix if you have an internal SMTP server to connect to.
I gave up on email folders years ago, so at the end of the month would just create two new folders in the archive account (YYYYMM and YYYYMM_Sent) and drag all the mail from the Exchange account into the IMAP folders. Et voila! You now have your own local email archive.
I imagine it looks better at discovery time to say ‘oh sorry we lost these emails because we ran out of disk space’ rather than ‘we deleted them because we didn’t want you to read them’.
No, companies need to be able to point to an official retention policy that says in writing that emails older than x months or years get deleted. Most do (including my employer), and it’s because of legal discovery. But it feels like we’re lobotomizing ourselves, as often the reason some odd thing was done was based on a long-deleted email discussion.
Archiving is likely solving the wrong problem, for legal reasons they don’t want those old emails hanging around.
Sounds like the retention policy is also solving the wrong problem. If for legal reasons you want to destroy any potential evidence, maybe it’s a good idea to stop doing illegal actions.
It’s not necessarily illegal actions, just those that would look bad in discovery. Lawyers (as always) tend to err on the side of caution.
I remember Matt Levine talking about how regulators would often find emails along the line of “Let’s sell this crap to those idiots” and use that as leverage to force settlements rather than showing actual violation of regulations.
The reason being that it’s hard to show intent to defraud, and much easier to threaten bad press.
Thanks to patents, everyone in technology is doing “illegal actions” all the time, since you can’t do anything without infringing hundreds of patents. And if you can find an email somewhere indicating that someone knows that a competitor has feature X, or knows about the existence of a patent, voilà, evidence of knowing infringement! Triple damages under US law.
That’s why I did it. I’d always be trying to find an email from the prior year, that held a fix I needed to use again, but it had been deleted to stay in quota. Old email can be helpful.
Also prevents users from using email as a filing cabinet or shared drive.
Email hosts love 50/100gb/unlimited mailboxes because nobody wants to migrate a bunch of giant mailboxes
I am sure “legal” might want it, but is it not better for society in general if they were discoverable?
A bit like when investigating police/government misconduct and a lot of files turn out to have been destroyed – but of course our data gets kept forever
Sane companies just have retention policies instead of doing some obtuse hack like this.
Same thing happens with floating licenses, if they are too scarce, people open the program first thing in the morning ‘just in case’ and keep a license reserved all day.
The real game starts when people run infinite while/for loops that try to check out one as soon as it’s available. Or run useless operations within the licensed software just so that the license doesn’t expire and return to the pool. I’m guilty of both, sadly. In an academic environment, additional resources aren’t going to fall from the sky.
ouch I can’t stop thinking now about how much cost gets imposed on the economy by habits like this established in higher education – I built my original business with very few formally qualified people who included a large proportion of the most experienced and professionally qualified individuals including several with multinational boardroom careers in F500s. we didn’t have the culture to tolerate games like holding up a floating licence (of which licence a lot of critical software used) and we weren’t the generation raised with computers by a few distant, but hearing this both makes perfect sense that it might be prevalent and simultaneously is thoroughly unnerving me about how strongly I might react on encountering the same if my present venture gets going.
I guess that’s the reason Qlik (a business intelligence software provider) started using licensing by the minute – yes, like a phone call.
At the opposite end, I heard a story of actually full storage from the beginning of the century, when I worked at a “large medical and research institution in the Midwest”. They had expensive SMB shares (NetApp?) that kept getting full all the time. So they did the sane thing in the era of Napster: they started deleting MP3 files, with or without prior warning. Pretty soon, they got an angry call that music could not be played in the operating room. Oops. Surgeons, as you can guess, were treated like royalty and didn’t appreciate seeing their routines disrupted.
Okay, that is hilarious.
I use some scripts that monitor disk space, and monitor disk usages by “subsystem” (logs, mail, services, etc) using Nagios. And as DevOps Borat says, “Disk not full unless Nagios say ‘Disk is full’” :_) Although long before it is full it starts warning me.
It doesn’t go off very much, but it did when I had a bunch of attacks on my web server that started core dumping and that filled up disk reasonably quickly.
Back in the day we actually put different things in different partitions so that we could partition failures but that seems out of favor with a lot of the distros these days.
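The kind of threshold check described above can be sketched in a few lines of Python. This is only an illustration, not the poster’s scripts: the 80%/90% thresholds, the function names, and the paths checked are all assumptions, and only the state names (OK, WARNING, CRITICAL) are borrowed from Nagios.

```python
import shutil

def classify(used_fraction: float, warn: float = 0.80, crit: float = 0.90) -> str:
    """Map a used-space fraction to a Nagios-style state name."""
    if used_fraction >= crit:
        return "CRITICAL"
    if used_fraction >= warn:
        return "WARNING"
    return "OK"

def check_path(path: str) -> str:
    """Classify the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return classify(usage.used / usage.total)

# Hypothetical "subsystems"; with separate partitions each gets its own verdict.
for subsystem, path in {"root": "/", "current dir": "."}.items():
    print(subsystem, check_path(path))
```

With logs, mail, and services on separate partitions, each path would report independently, which is the failure isolation the last paragraph mentions.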
I thought it was standard to never write logs to the same machine, and to write them to WORM storage for that matter
This is a surprisingly common hoarding behavior among humans using scarce resources. In technology you see it everywhere, virtualization infrastructure, disk storage, etc.
This is actually kind of clever. How the tribal knowledge for how to “reserve space” was developed and disseminated would be pretty interesting to study.
Similarly Germans are infamous for reserving pool chairs by placing a towel on them long before they actually want to use them
At school we had a 800Mb quota for each class (around 90 people). Usually the first year everyone discovered the space problem when trying to get everything done for your first project. When you cannot compile code or generate pdf because there’s no space left the witch hunt starts: there’s always some people with left-over files from .pdf to .tex conversions.
To help some students had put in place a crawler making statistics about who was using the space for all classes. And usually once bitten you made your own space requisition script which would take any byte left when available until it hit some reasonable size.
That’s dire, ~8Mb per person. It’s an interesting problem though. When the resource is not scarce, allocating 800Mb per class is the correct way to do things. Someone who needs 9, 12 or 30Mb would be able to complete the allocation. But as soon as resource contention happens, students with the biggest allocation would need to relinquish a lot of data. 800Mb is nothing over a modern connection nowadays but playing this game with petabytes would be a nightmare.
I remember in Android one year the focus was on slimming down the memory usage. Of course we found an app that shall not be named that allocated a chunk of memory on startup just in case it was going to be needed later.
Partly that, partly the opposite.
It’s basically reserving part of the disk for very important things only, which scares off less important uses. Like making the commons seem more polluted than it actually is to get some action taken.
If those files weren’t there, the space would probably fill up, but now without any emergency relief valves.
It would be better if these files were a smaller fraction of space and had more oversight… but that’s just a quota system. This is something halfway in between real quotas and full-on tragedy of the commons.
I am far from an expert on game theory, but it seems that the cause of the tragedy of the commons is that people can use the shared resource for free. If there was a price to be paid, and the price was dynamically adjusted depending on conditions, then the overuse could be avoided.
Similarly for file storage and “reserving” it by creating huge but useless files. If everyone was charged a fee per gigabyte per day, then people would be less likely to create those placeholder files. You probably have to be careful about how you measure, otherwise you’ll get automated processes that delete the placeholder files at 11:59pm and create them at 12:01am.
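The measurement problem in the last sentence can be made concrete. Below is a rough Python sketch, with made-up sample data and a hypothetical `gb_days` helper, comparing a once-a-day snapshot against usage integrated over time (GB-days) for a 100 GB placeholder that is deleted just before midnight and recreated just after:

```python
MINUTES_PER_DAY = 24 * 60

def gb_days(samples):
    """Integrate usage over time.

    `samples` is a time-sorted list of (minute, gigabytes) pairs; each
    reading is assumed to hold until the next one.
    """
    gb_minutes = 0.0
    for (t0, gb), (t1, _) in zip(samples, samples[1:]):
        gb_minutes += gb * (t1 - t0)
    return gb_minutes / MINUTES_PER_DAY

# A 100 GB placeholder created at 00:01 and deleted at 23:59.
samples = [(0, 0), (1, 100), (1439, 0), (1440, 0)]

snapshot_at_midnight = samples[0][1]  # what a once-a-day measurement sees
integrated = gb_days(samples)         # what time-weighted billing sees

print(snapshot_at_midnight, round(integrated, 2))
```

The midnight snapshot bills nothing, while the time-weighted figure comes out just under 100 GB-days, so charging on integrated usage would remove the incentive to play the delete-and-recreate game.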
I was more on a sociological/existential plane but I take that information too. I wish I’d read this kind of economic books rather than supply/demand or finance
This happens at some restaurants – we’ll save a table while you get the food. Half the place is people not eating because it’s so busy.
I always leave some unallocated space in LVM in my machines. However, in a cloud environment it’s probably easier or only possible to delete that 8 GB file.
For everyone saying “This isn’t a real solution!” I’d like to explain why I think you’re wrong.
1) It’s not intended to be a Real Solution(tm). It’s intended to buy the admin some time to solve the Real Issue.
2) Having a failsafe on standby such as this will save an admin’s butt when it’s 2am and PagerDuty won’t shut up, and you’re just awake enough to apply a temp fix and work on it in the morning.
3) Because “FIX IT NOW OR ELSE” is a thing. Okay, sure. Null the file and then fill it with 7GB. Problem solved, for now. Everybody is happy and now I can work on the Real Problem: Bob won’t stop hoarding spam.
That is all.
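For reference, “null the file and then fill it” could look something like the Python sketch below. The function name, the chunk size, and the example path are assumptions for illustration; the original article may well have used a different tool.

```python
import os

def create_ballast(path: str, size_bytes: int, chunk_bytes: int = 1024 * 1024) -> None:
    """Write `size_bytes` of zeros to `path`, replacing any existing file.

    Real blocks are written rather than seeking past the end, so the file
    is not sparse. Filesystems that compress or deduplicate may still store
    it in less space than its nominal size.
    """
    chunk = bytes(chunk_bytes)
    remaining = size_bytes
    with open(path, "wb") as f:  # "wb" truncates first, i.e. "nulls" the file
        while remaining > 0:
            n = min(remaining, chunk_bytes)
            f.write(chunk[:n])
            remaining -= n
        f.flush()
        os.fsync(f.fileno())  # push the data to disk before returning

# Example with a hypothetical path and a small size:
# create_ballast("/var/tmp/ballast.bin", 64 * 1024 * 1024)
```

Calling it again with a smaller size shrinks the ballast, which matches the “delete 8GB, put back 7GB” move in point 3.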
This reminds me of the reserve tank toggle on some motorcycles. When you run out of gas, you switch the toggle and drive directly to a gas station.
Motorboat fuel tanks have a reserve as well. It’s just a raised area that splits the bottom of the tank into 2 separate concave areas. One of the concave areas contains the end of the fuel line, and the other doesn’t. When you run out of gas, you tip the tank up to dump the remaining gas from the other basin into the main one, and then you restart the engine (or keep it from stopping at all if you’re quick enough on the draw) and head for the docks.
Old SCUBA tanks didn’t have gauges, they had a reserve tank with enough air to get you to the surface. You’d realize you were running low (which I’m sure was terrifying) then hit the switch and slowly surface (you don’t want to surface quickly when diving).
Yeah, my dad had a tank like that. I dove with it exactly once – never again, yikes. It was coated inside and out so, despite being a steel tank, it was in excellent shape.
The bikes I’ve had that have had reserve tanks have also been old enough to raise the disconcerting follow-on question, which is: “is the reserve gas also full of sludgey crap that’s settled in the tank and hasn’t been disturbed really in a year, and am i about to run that through my poor carbs?”
My friend had a truck with a reserve tank, but it was the same size as the main tank, so he would just flip the switch at every fill up to make sure they both got used.
Had this in a 70s F150. A “Main – Aux” switch on the dash, right above the 8-track player. I used to let the main tank sputter out on fumes and then triumphantly shout “Rerouting auxiliary power to engine!” while sliding the switch. Letting them empty out alternately would have been a lot smarter.
My father drove a ’95 F-150 for years that had the dual tanks. Shortly after high school I got in an accident that ended up totaling my vehicle, and for a couple of months I was using his truck (he runs an auto repair shop from a garage behind the house so he almost always had something available to drive) and I ended up using it to go out on a date with someone I had met at work.
I noticed on the way to pick them up that the truck was running on empty in the main tank but I checked and the aux tank was full. Then I remembered the first time my dad let the tank run down and start sputtering down the road and decided to keep going on the empty tank.
Make it to pick them up and start heading down the highway (where we were it was a good 3-4 miles to the nearest gas station) and then the truck finally started to sputter. I proceed to play along with it pretending to panic for a good 20 seconds and then I turned and saw the look on their face and couldn’t help but start laughing. Switched to the aux tank and when the truck started running again I turned and the look I was getting indicated I was being mentally murdered. Then they punched the crap outta my arm and started laughing and calling me not so nice things.
Ended up being an awesome night out with someone I’d end up being friends with for a long time. It’s weird how this kind of random conversation in an unrelated internet post can drag you way back down memory lane.
This is typically used for agricultural/off-road fuel which is not priced with road taxes and as a result much cheaper. Off road fuel is dyed red in the US. If you get caught running dyed diesel on road you will be fined. Thus the switch on the dash, when you leave the highway to drive on your farm you flip over to dyed fuel to save $$.
Oh, fascinating! My first vehicle was the family’s 3/4-ton Diesel ’84 Chevy Pickup from the farm, and I’d forgotten it had an Aux fuel tank! This makes a lot of sense.
It’s not a separate tank (in any of my bikes at least) so it gets disturbed every time you refill the tank?
the two-tube design of the tank on my 1975 honda CB meant that there was about an inch and a half of tank that sat below the primary fuel port. Tank crud (steel tank, theoretically passivated, 40 years old) settles faster than I ran through a tank of gas, so the bottom layer had sediment in it fairly regularly.
I kept spare inline fuel filters in a tool roll just in case after a while.
always fun when you’re barreling down the highway and the engine starts to lean out, prompting you to hurriedly locate and switch the petcock over before the engine stalls completely.
suppose then that you go fill up and forget to set the petcock back to normal. 8ball says: “I see a long walk in your future.”
IME it doesn’t take too many hikes to learn that part of the procedure for turning off the engine is “turn the fuel switch off reserve”.
out of years of riding it’s only happened to me a couple times.
one time i was eastbound on the bay bridge when my bike started to sputter. i’d just reassembled the tank and had left the screw-style reserve fuel valve open, so there was no reserve fuel to be had. a very kind lady put her blinkers on behind me and followed as i coasted the last few hundred yards toward yerba buena island.
i pushed my bike up the ramp and looked in the tank to assess. it’s a dirtbike, so the tank has two distinct “lobes” to accommodate the top tube of the frame. I had a few ounces in the tank but they were not in the lobe with the fuel pickup, so i dumped the bike on its side to get the fuel to slosh over to where i wanted it.
i got back on the highway and, going quite slowly and gently, managed to get to the gas station at west oakland bart, the engine leaning out and sputtering right as i rolled into their lot.
I think that driving on those last few ounces of fuel is a completely different feeling.
Normally you take for granted that the engine works for hours at a time.
When you’ve come to a stop and found those last few ounces of fuel, it’s such a relief that the engine can run again, and you know it won’t run for very long, but every minute that it continues running saves you many minutes of walking or pushing. You appreciate every minute that the engine produces that amazing amount of power (compared to your own power when you’re pushing a 300+ pound bike)
It’s a crazy amount of energy. The 2.5 gallons of gas that that tank holds has more energy than all the food I eat in a month.
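A back-of-the-envelope check of that comparison, as a small Python sketch. The 33.7 kWh per gallon figure (the EPA’s gasoline gallon equivalent) and the 2,200 kcal/day diet are assumptions brought in for the estimate; only the 2.5 gallons comes from the comment.

```python
KWH_PER_GALLON = 33.7            # assumed: EPA gasoline gallon equivalent
KCAL_PER_KWH = 3_600_000 / 4184  # 1 kWh = 3.6 MJ, 1 kcal = 4184 J
TANK_GALLONS = 2.5               # from the comment above
KCAL_PER_DAY = 2200              # assumed daily food intake
DAYS = 30

tank_kcal = TANK_GALLONS * KWH_PER_GALLON * KCAL_PER_KWH
food_kcal = KCAL_PER_DAY * DAYS

print(f"tank: {tank_kcal:,.0f} kcal")
print(f"food: {food_kcal:,.0f} kcal over {DAYS} days")
print(f"ratio: {tank_kcal / food_kcal:.2f}")
```

Under these assumptions the tank holds somewhat more energy than a month of food; at 2,500 kcal/day the two come out roughly even, so the claim is about right rather than exact.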
I once put a new fuel pump in a Chevy pickup with two tanks on the side of the road because I was switched to the empty tank. Good times.
Surprised there isn’t a mechanism that mechanically switches the petcock over when you put a fuel nozzle up to the port
Typically there aren’t two separate tanks – In one tank there are two tubes at different heights. As the fuel level falls below the height of the “main” tube the engine sputters, then turning the petcock engages the lower down “reserve” tube which is still below the fuel level. It’s more of a warning than a true reserve, and most bikes with an actual fuel gauge don’t have a reserve.
On bikes like that, there’s a reserve-reserve trick sometimes. Sometimes, the tank is an inverted U shape so when the pickup runs dry there’s still a little more fuel on the other side of the U. If the bike is light enough, you can stop the bike and lean it way over to pour that last bit over to the pickup side. Might get you another couple miles.
Most motorcycles are surprisingly manual. This was originally a necessity (like in cars), but remains aesthetically preferable for many riders.
OTOH, Honda Goldwings have stereo systems. They might grow an automatic fuel reserve switcher-backer someday too. 🙂
Fuel injected motorcycles don’t have reserve (at least, none that I’ve seen.) instead they have low fuel lights or full fuel gauges. I’m guessing it’s because the fuel pumps are in the tank and the fuel injection system needs high pressure.
Fuel injectors require filtered gas because even small particles can clog them, and said filter is more likely to be clogged or even compromised by sucking up the last drops of fuel (and scale and debris) in the tank, so the low-fuel warning is required.
Carb jets can get clogged, too, but are wider since they’re not under as much pressure. Also, since they’re a wear item they’re a lot easier to clean and/or replace.
I think grandparent commenter had it right: it’s because the pump is in the tank. There’s just no good way to have an external petcock determine where a tank-internal pump gets its fuel from.
Many new bikes come with a lot of rider aids for safety (ABS, TCS) as well as all kinds of electronics (fuel maps), so this is changing. But of course manual transmission won’t go away until bikes are electric.
I am one of those who likes things old school. My bike still has a carburetor, has no fuel light or tachometer, and I have certainly had some practice reaching down to turn the fuel petcock to reserve while sputtering on the highway. If they didn’t intend for me to do that, why did they put it on the left side? 🙂
> But of course manual transmission won’t go away until bikes are electric.
See multiple Honda bikes with DCT (dual clutch transmission). This is what I’m planning to get as my first bike.
Goldwings also have a reverse gear. Even more remarkable: I used to have an Aprilia scooter that had a remote release button (on the key fob) for the under-seat storage area. I think I used it once just to see if it works.
Some newer bikes, like mine, don’t have a reserve petcock. They have a low fuel light. No forgetting about the petcock and an obvious warning light instead of sputtering.
Some older bikes, like my ’99 Ducati Monster, don’t have a petcock. It has a low fuel light that first failed in around 2002, and for which that part that fails (the in-tank float switch) stopped being available in about 2015 or so. No petcock _or_ warning light. (And that trip where the speedo cable failed so I couldn’t even use the trip meter to estimate fuel requirements was a fun one…)
Can you find someone who can adapt a float switch from a different bike? It seems like a very useful thing to have, even if it’s not the original factory part.
I’ve just gotten used to it. I’m fairly reliable about always resetting the trip meter when I fill it up (and always fill it to full). I know it’ll get 200km easy, maybe only 180 if I’m having _way_ too much fun. That’s always about time I want to stop and stretch my legs anyway. It doesn’t bother me enough to “solve the problem”.
Most motorcycles with a manual petcock are very manual in nature. Often this is to minimize the number of moving parts that could die on you if you take it into rural areas. An automatic petcock adds more complexity that could cause a malfunction.
It is a shame that motorcycles have moved away from this model. My last bike had a manual petcock with a reserve setting. It was problematic because I’d forget to turn it from off to on, take off on what’s left in the carburetor bowl, and the engine would start sputtering just down the road. But I also never got stranded.
New bike has a vacuum-actuated fuel valve, no reserve. It does have a fuel gauge but since the tank is not a nice simple rectangle and the angle makes a difference the gauge is basically untrustworthy. So I go by the mileage and hope I don’t get it wrong. How hard would it be for them to add a reserve setting so it could just be between On and Reserve so I could just flip between them as needed?
In the Honda CBF125 group on Facebook, a fellow Indian shared a photo of his bike. A British guy asked what’s the switch, he’s never seen one before. Same bike, same country of origin, but only certain markets get the switch and the recessed panel.
It is extremely thick plastic. I wouldn’t be surprised if it dislodged from the frame before it burst. In any event, in any collision violent enough to rupture the tank, the rider will have already been thrown a hundred feet away (and be dead…)
> 1) It’s not intended to be a Real Solution(tm). It’s intended to buy the admin some time to solve the Real Issue.
If you don’t have monitoring, will you even be aware that your disk is filling up?
If you do have monitoring, why are you artificially filling up your disk so that it will be at 100% more quickly instead of just setting your monitoring up to alert you when it’s at $whateverItWasSetToMinusEightGB?
One argument in favor of it is the 8GB file may cause a runaway process to crash, leaving you without it continuing to chew up space and able to recover.
A second argument is it’s not opened by any process. One problem I’ve had fixing disk full errors was figuring out which process still had a file open.
(For any POSIX noobs: the space occupied by a file belongs to its inode. Deleting a file “unlinks” the inode from the directory, but an open file handle also keeps the inode alive. Until every link is removed and every open handle is closed, the OS won’t release the space occupied by the file. Particularly with log files, you need to kill any processes that have it open to actually reclaim the disk space.)
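The behaviour described in that note is easy to see for yourself. Here is a small Python sketch for a POSIX system (it will not work the same way on Windows); the helper name is made up for the demo.

```python
import os
import tempfile

def unlink_while_open(n_bytes: int = 4096):
    """Delete a file that is still open and report what remains reachable."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"x" * n_bytes)
        os.unlink(path)                       # remove the only directory entry
        visible = os.path.exists(path)        # the name is gone
        size = os.fstat(fd).st_size           # the inode is still there
        os.lseek(fd, 0, os.SEEK_SET)
        readable = len(os.read(fd, n_bytes))  # and the data can still be read
    finally:
        os.close(fd)                          # only now can the space be freed
    return visible, size, readable

print(unlink_while_open())
```

The file name disappears immediately, but the data stays on disk until the descriptor is closed, which is why deleting a log file that a daemon still has open frees nothing.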
Except that you probably don’t realize that a process had it open until after you deleted it.
Because even if you have monitoring, some unforeseen issue rapidly eating disk space at 3:00 am may not give you the time to solve it without downtime or degraded performance unless you can immediately remove the bottleneck while you troubleshoot.
Then why not automate the removal of the 8 GB spacer file when the disk gets full? Or in other words, just sound your alarms when there is 8 GB of free disk space.
Because if it is a broken process then it will fill up the disk again before you wake up and look at it.
I think the idea is that once you are at the system you can try to find out the cause without removing the file, or worst case remove the file and act fast (you may be on a short timer at this point). So for example if you find out that process X broke and is writing a ton of logs you can disable that process, remove the file, then most of your system is operational while you can properly fix the root cause or at the very least decide how to handle the data that filled up the disk in the first place. (You can’t always just delete it without thought)
I think a more refined approach would be disk quotas that ensured that root (or a debugging user) always had a buffer to do the repairs. This file just serves as a system wide disk quota (but you need to remove it to take advantage of that reserved space).
I actually suggested exactly that in another comment, though to do it in stages: 4gb with an alarm, then more alarms and the other 4gb if not resolved.
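A staged release like that might be sketched as follows in Python. Everything here is hypothetical: the function name, the idea of passing in the current free space and a threshold, and the `alert` callback standing in for whatever alarm mechanism is actually in use.

```python
import os

def release_next_ballast(ballast_paths, free_bytes, min_free_bytes, alert=print):
    """Free one ballast file per call while free space is below the threshold.

    Returns the path that was removed, or None if nothing was removed.
    """
    if free_bytes >= min_free_bytes:
        return None
    for path in ballast_paths:
        if os.path.exists(path):
            os.remove(path)
            alert(f"low disk space: released {path}")
            return path
    alert("low disk space: no ballast left")
    return None

# Possible use from a periodic job (paths and threshold are made up):
# import shutil
# free = shutil.disk_usage("/").free
# release_next_ballast(["/var/tmp/ballast1", "/var/tmp/ballast2"], free, 1 << 30)
```

Each call releases at most one stage and always raises an alert when it does, so a runaway process gets two chances to be noticed before the disk is truly full.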
Besides runaway log files that aren’t being properly rotated, human error can cause it too. I managed to completely eat up the disk space of one of our staging servers a few weeks ago trying to tar up a directory so I could work on it locally. Didn’t realize the directory was 31GB and we only had 25GB of space. By the time the notification for 80% usage was triggered (no more than 2 minutes after we hit 80%), the entire disk was full. Luckily it was just a staging server and no real harm was done, but such a mistake could have just as easily been made on a production server. In this case, the obvious solution is to just delete the file you were creating but if you’re running a more complicated process that is generating logs and many files, it may not be so easy and this 8GB empty file might be useful after you cancel the process.
Monitors can fail, you can miss an email, etc etc etc
There’s always a big gap between what should never happen because you planned well and what does happen
An extra failsafe? You can do both. What if your cron/netdata are not forwarding emails for some reason (eg nullmailer gets errors from Mailgun)?
Right, but again, what good does the spacer file do if you’re not aware that you’re running low on disk space? That is: if your monitoring isn’t working, how do you know that you need to quickly make room?
And if your monitoring is working correctly, the spacer file really serves no purpose other than lowering the available disk space.
1. When your DBMS is no longer responding to queries, your boss and your customers replace your monitoring system (unlimited free phone calls 24/7 included ;). Case in point: HN is often a better place to check than Google Cloud status page, for example.
2. Maybe you didn’t get it, but “nullmailer not forwarding cron email due to mailgun problems” was a bit too specific to be an example I just made up, wasn’t it? Again, the premise “if your monitoring is working correctly” is not a good one to base your reasoning upon. Especially if you have 1 VM (VPS) and not a whole k8s cluster with a devops team with rotational on-call assignments.
The reason was, I thought, discussed in the article.
When you actually fill up your disc, many linux commands will simply fail to run, meaning getting out of that state is extremely difficult. Deleting the file means you have room to move files around / run emacs / whatever, to fix the problem.
Somebody will notify you. If the service is just for yourself, you don’t need monitoring at all.
Yes, yes, but they will notify you after your service is down (because that’s when they notice), in part thanks to a spacer file that eats up available disk space without being of any use. A monitoring service would notify you before your service is down, users grab pitchforks and start looking for torches.
I understand the benefit to be able to quickly delete some file to be able to run some command that would need space, though I find that highly theoretical. If it’s your shell that requires space to start, you won’t be able to run the command to remove the spacer, and once you’re in the shell, I’ve never found it hard to clean up space; path autocompletion is the only noticeable victim usually. And at this point, the services are down anyhow, and you likely don’t want to restart them before figuring out what the problem was, so I don’t see the point of quickly being able to make some room.
It feels like “having two flat tires at the same time is highly unlikely, so I always drive with a flat tire just to make sure I don’t get an unforeseen flat tire”. It’s cute, but I’d look for a new job if anyone in the company suggested that unironically.
This is an additional safety net. It’s like doing backups. Of course you should replace your hard drive before the other drive breaks down, but you want to have a backup in case your server burns down.
Because sometimes you run things that don’t really need monitoring.
I run a bunch of websites for pet projects, friends’ clubs, etc. They don’t need monitoring, and even if they go down for a couple of hours (or days) it doesn’t really matter.
I do monitor them, but mostly as an excuse to test various software that I don’t get to use during my day job (pretty sure that a bunch of static sites and low-use forums don’t need an Elastic cluster for log storage 🙂 )
And sometimes you simply don’t have the time to deal with this right now. So you do a quick hack, and fix it properly later.
This is one of those great solutions where they got 90% of the value of the Real Solution(tm) with 5 minutes of work.
I agree with this assessment. Of course it’s not a solution. It’s delaying the inevitable. But depending on the rate of “filling up the disk for unknown reason” it will buy you time.
So when you’re running out of space, you immediately delete the junk file. Suddenly there’s “No Problem” and you’ve reset the symptom back to hopefully well before it was an issue. Now you can run whatever you need to, do reports, do traces etc. Even add more storage if necessary.
More importantly, as soon as you delete that junk file now you have space for logs. You have space and time for investigation.
> 1) It’s not intended to be a Real Solution(tm). It’s intended to buy the admin some time to solve the Real Issue.
It doesn’t do that though. If you don’t have monitoring/alerting that can either a) give you sufficient notice that you’re trending out of disk space, b) take action on its own (e.g. defensively shutting down the machine), or c) both of the above, then having your server disks fill up is bad whether you have a ballast file or not.
If your database server goes to 100%, you can’t trust your database anymore whether you could ssh in and delete an 8GB file or not.
Real Solutions ™ are indeed nice, but hackers get shit done – this is an utterly shameless hack, and I do it myself.
It’s a tool, and should be celebrated as such. It gives you breathing room to actually solve the problem. It’s an early warning system. 🙂
I find that either a server needs more space, or has files that can be deleted. For the former you just increase the disk space, since most things are VMs these days and increasing space is easy. For the latter you can usually delete enough files to get the service back up before you start the proper cleanup.
If you really need some reserve space (physical server), I’d much rather store it in a vg (or zfs/btrfs subvolume). Will you remember the file exists at 2am? What about the other admins on your team?
> Will you remember the file exists at 2am?
As someone who has been woken up at 2am for this exact issue, emphatically yes. I would much rather be back in bed than trying to Google the command to find large files on disk.
> Will you remember the file exists at 2am? What about the other admins on your team?
Hopefully if you were doing something like this it would be part of your standard incident response runsheet/checklist.
Set up proper monitoring and never get to the Real Issue to begin with. These sysadmin hacks are not helpful.
“Proper monitoring” is extremely broad. And, I would say, an almost unreachable goal.
You have it mail you when it goes over 80% disk usage (and what if you are on holiday)? Does it mail all colleagues? Who picks it up (I thought Bob picked it up, but Bob thought Anne picked it up. So no one did)? Does it come and wake you in person when it reaches 92%?
Will this catch this async job that fails (but should never) in an endless loop but keeps creating 20MB json files as fast as the disk allows it to?
Is it an alerting that finds anomalies in trends? Will it be fast enough for you to come online before that job has filled the disk?
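To make the “fast enough” question concrete, here is a rough sketch of the simplest kind of trend check: a straight-line projection from two usage samples. The function names and the 15-minute response time are assumptions for illustration, not taken from any particular monitoring product.

```python
def seconds_until_full(used_then, used_now, interval_seconds, capacity):
    """Linear projection of when the disk fills, or None if usage is not growing.

    used_then and used_now are two samples (in bytes) taken interval_seconds apart.
    """
    growth_per_second = (used_now - used_then) / interval_seconds
    if growth_per_second <= 0:
        return None
    return (capacity - used_now) / growth_per_second

def should_page(eta_seconds, response_seconds=15 * 60):
    """Page only if the disk is projected to fill before a human can react."""
    return eta_seconds is not None and eta_seconds < response_seconds

if __name__ == "__main__":
    # Hypothetical numbers: usage grew by 6000 bytes in 60 s on a 106000-byte disk.
    eta = seconds_until_full(0, 6_000, 60, 106_000)
    print(eta, should_page(eta))
```

A runaway job can make the projected time shorter than the sampling interval itself, which is exactly the case a check like this cannot catch.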
I’ve been doing a lot of hosting management and such. And there is one constant: all unforeseen issues are unforeseen.
> I’ve been doing a lot of hosting management and such. And there is one constant: all unforeseen issues are unforeseen.
I work in hosting too, and have been for a long time. I feel ya.
Slack warning/ticket at 75%, page at 85% (to on-call, obviously). Don’t let user workloads crap into your root partition. I’ve been doing this for over 10 years, have managed many thousands of nodes, and literally don’t recall a full-disk problem unless it was in staging somewhere where monitoring was deliberately disabled.
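As a sketch of what those two thresholds look like in code (the 75% and 85% figures are from the comment above; the function names and the idea of returning a plain string are assumptions, and a real setup would hand the result to whatever alerting system is in use):

```python
import shutil

WARN_PERCENT = 75.0  # Slack warning / ticket
PAGE_PERCENT = 85.0  # page whoever is on call

def classify(percent_used):
    """Map a disk usage percentage to an alert level."""
    if percent_used >= PAGE_PERCENT:
        return "page"
    if percent_used >= WARN_PERCENT:
        return "warn"
    return "ok"

def percent_used(path):
    """Percentage of the filesystem containing path that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total

if __name__ == "__main__":
    pct = percent_used("/")
    print(f"{pct:.1f}% used -> {classify(pct)}")
```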
Your requirements for “proper monitoring” are not everyone’s requirements.
On a current gig, we host at Heroku. Our monitoring is all about 95th percentile response times, secondary services, backlogs, slow queries and whatnot. For another job, “disk space filling up” is important. Yet another job will need to monitor email delivery rates, and so on and so forth.
Keep in mind that sysadmins are essentially babysitting software that they do not develop. The hacks that we come up with are to work around the responsible party and help us get a good night’s sleep instead of a 2am wakeup call. I try to cut you guys some slack; usually this proliferates when management decides they are willing to accept some inefficiency in favor of getting new features out the door. I get it, really.
My org is in the middle of an SRE introduction and for some reason I’m getting a lot of pushback on the topic of ‘error budgets’ and what to do with alerts when they are exceeded. Can’t imagine why.
How does using the proposed solution prevent a 2am wake-up call? Your monitoring/alerting does; this just makes it easier to recover already broken software. And btw I’ve been carrying pagers for more than a decade, so I’m well aware of all the organizational dynamics here. The best way to prevent this is to have devs carry the pager too (Amazon’s “you build it, you run it”), and magically your nighttime on-call is much more pleasant 😉
THANK YOU. How are so many people in this thread content with saying “monitoring isn’t perfect, this solution is ingenious”? Ofc nothing is perfect and even when you do everything right things can still go wrong, but if you don’t have a ROBUST monitoring/alert system in place then you’re not even doing the bare minimum. They’re acting like it’s rocket science to set thresholds and have meaningful alerts and checks in place. Not to mention that if you wait until the disk is full you risk issues like block corruption, among others, and your 8GB of space doesn’t do anything. It’s why people in this industry are on call; it’s why they have monitoring on their monitoring systems. The bare minimum.
Yeah, it’s crazy. If someone does this on their homelab server it’s probably fine, but if they run it in production I really want to know, because I’m not buying jack from them.
Of course! But do you put all your trust in your monitoring, 100%? You’ve never had monitoring fail for any reason at all? You’ve never had a server fill up before you can respond to the alert?
This 8GB file idea isn’t to replace monitoring. It’s to offer a quick stopgap solution so you can do things in a hurry and give yourself a little extra “out” when things go awry. Because believe me, they WILL go awry. And if you’re not prepared for that eventuality, then I don’t know what else to say.
> But do you put all your trust in your monitoring, 100%?
Yes. If I didn’t feel that I can trust it, I would get another solution.
> You’ve never had a server fill up before you can respond to the alert?
I have. With the hack proposed in this article it would have filled up even sooner: sooner by however long the problem needs to write 8GB of data.
> Because believe me, they WILL go awry.
In my experience: not in any way that this would help. If your disk fills up, it’s either slow (and your monitoring alerts you days or at least hours before it’s a problem) or it’s really, really fast. In the latter case, it’s much faster than you can jump on your computer, ssh into the machine and delete your spacer file.
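The slow-versus-fast argument is easy to put numbers on. A back-of-the-envelope sketch, where both write rates are made-up figures for illustration rather than measurements:

```python
GIB = 1024 ** 3
MIB = 1024 ** 2

def seconds_bought(spacer_bytes, write_rate_bytes_per_s):
    """How long a spacer of the given size lasts against a steady writer."""
    return spacer_bytes / write_rate_bytes_per_s

if __name__ == "__main__":
    # Runaway process at an assumed 200 MiB/s: the 8 GiB is gone in well under a minute.
    print(seconds_bought(8 * GIB, 200 * MIB), "seconds")
    # Slow leak at an assumed 1 MiB/s: the same file is worth a bit over two hours.
    print(seconds_bought(8 * GIB, 1 * MIB) / 3600, "hours")
```

Whether the headroom matters therefore depends almost entirely on which of the two regimes the failure is in.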
Invest in better monitoring, that’s much, much, much, much better than adding spacer files to fill up your disk or changing the wall clock to give you more time.
Ah I see where you are coming from. You see the spacer as a way to prevent a problem that should be prevented by better monitoring. But that’s not what it is for. It’s for quickly providing a stopgap so that you have time to solve the root cause without enduring more downtime.
If you’ve had a disk go full on you, what’s the first thing you do? For me, I log in and start looking for a log file to truncate to buy me a few megs of space, at least. This spacer file is just a guaranteed way to find the space you need without having to hunt for it.
Also it doesn’t HAVE to be 8GB. On most systems I think a 500MB file would be every bit as effective.
> he had put aside those two megabytes of memory early in the development cycle. He knew from experience that it was always impossible to cut content down to memory budgets, and that many projects had come close to failing because of it. So now, as a regular practice, he always put aside a nice block of memory to free up when it’s really needed.
In my work it is very common to make the memory map a little smaller than it has to be. If you can’t ship an initial version in a reduced footprint you will have no hope of shipping future bugfixes.
Many years ago I spent a couple of weeks fixing a firmware bug. The firmware was only a few dozen bytes shy of filling the EEPROM. I just #ifdef’d out a bunch of features to focus on debugging what was broken, but to get the fix released I had to manually optimize several other parts of the code to get everything to fit in the 2MB or whatever it was.
Would’ve been nice if someone had reserved some space ahead of time. Maybe they did, but nobody was around who remembered that codebase.
My favorite part of that story is how the initial question about overflow should make it obvious that what they’re doing doesn’t work, but nobody noticed.
I’d read in ‘Apollo: Race To The Moon’, Murray and Cox, that the booster engineers had done something similar with their weight budget, something the spacecraft engineers wound up needing. Contingency funds of all sorts are a great thing.
Back in the late eighties a colleague of mine was making a game for the Atari ST and he purposely put in some time wasting code in the game loop so that he had to work against a smaller budget which gave him some contingency for later on when he needed some extra cycles.
If true, I hate that story. Think of the better art assets that were needlessly left behind. How is it that said block of memory had never been identified by any profiling?
> Think of the better art assets that were needlessly left behind.
Consider how long it takes to edit or recreate art assets to reduce their size. Depending on the asset, you might be basically starting over from scratch. Rewriting code to reduce its size is likely to be an even worse option, introducing new bugs and possibly running slower to boot. At least smaller, simpler art assets are likely to render faster.
This is also the kind of problem that’s more likely to occur later in the schedule, when time is even more scarce. Between these two factors (lack of time and amount of effort required to get art assets which are both decent looking and smaller), I think in practice you’re actually more likely to get better quality art assets by having an artificially reduced memory budget from the outset.
I see it as a “Choose your problem.” affair.
1. Deal with possibly multiple issues possibly involving multiple people with the politics that entails resulting in a lot of stress for all involved as any one issue could render it a complete failure.
2. Have extra space you can decide to optimise if you want. You could even have politics and arguments over what to optimise, but if nothing happens it all still works so there is a lot less stress.
I pick 2.
If it would be detected by profiling that does make the technique asymmetric in that it would only stick around if nobody profiled to find it.
Or if you didn’t have an understanding with the sort of people who would run the profiler…
Better PMs do this today by having buffer features they can cut when needed. It’ll handle the not-enough-memory issue as well as a meddlesome VP who thinks you’re over-subscribed and wants you to cut to meet your dates.
Also, don’t forget you’re hearing decades-later retellings of someone else’s story. I don’t doubt that they trickled this extra space out as changing requirements mandated it, but I expect they held off doing so until the team had actually reached a certain level of product maturity and reclaimed all of its own waste first.
Remember that the PM’s goal is to ship. Them blocking some assets but actually shipping is a success. Better 95% of the product than 0%.
There’s a difference between “The server is not responding right now. We’re losing customers.” and “Low resources during product development”. Actually, the latter may be a case of enforcing premature optimization. So no, it’s not the same idea.
I think we are thinking of a different baseline. You are thinking along the lines of “this should run, we can reduce server costs later”, I would suggest (if I may) “the app needs to run on any Android device with 2GB RAM”. And then you develop a game to run on a 1.5GB RAM phone, expecting that it will eventually fit into 2GB RAM budget.
A lot of tips in this thread are about how to better alert when you get low on disk space, how to recover, etc. but I’d like to highlight the statement: “The disk filled up, and that’s one thing you don’t want on a Linux server—or a Mac for that matter. When the disk is full nothing good happens.”
As developers, we need to be better at handling edge cases like out of disk space, out of memory, pegged bandwidth and pegged CPU. We typically see the bug in our triage queue and think in our minds “Oh! out of disk space: Edge case. P3. Punt it to the backlog forever.” This is how we get in this place where every tool in the toolbox simply stops working when there’s zero disk space.
Especially on today’s mobile devices, running out of disk space is common. I know people who install apps, use them, then uninstall them when they’re done, in order to save space, because their filesystem is choked with thousands of pictures and videos. It’s not an edge case anymore, and should not be treated as such.
A lot of measures are preventative, and kind of have to be.
Consider the hypothetical scenario of being totally out of memory. I mean completely: not a single page free, all buffers and caches flushed, everything else taken up by data that cannot be evicted. So in result, you cannot spawn a process. You cannot do any filesystem operations that would end up in allocations. You can’t even get new page tables.
Hence things like Linux’s OOM killer, which judiciously kills processes–not necessarily the ones you would like killed in such a situation. And again, a lot of preventative measures to not let it come that far.
Our Turing Machines still want infinite tapes, in a way.
I had this on my Ubuntu server… The NFS mount died for some reason and the downloading app wrote it all to the local filesystem, filling my SSD to the brink within minutes. By the time I ssh’d in the NFS had remounted, so it took ages to figure out where all that disk space actually was used since all dir scan tools would traverse into the NFS mount again.
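For the “scan tools wander into the NFS mount” part, here is a sketch of a size scan that stays on one filesystem by comparing device IDs, roughly what `du -x` does. The function names are made up. One caveat: files written while the mount was down end up hidden underneath the mount point once it comes back, so a scan like this will not see them either. It only keeps the scan off the remote share.

```python
import os
import tempfile

def _same_device(path, dev):
    """True if path lives on the device with ID dev."""
    try:
        return os.lstat(path).st_dev == dev
    except OSError:
        return False

def usage_on_one_filesystem(root):
    """Sum apparent file sizes under root without entering other mounted filesystems."""
    root_dev = os.lstat(root).st_dev
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune subdirectories on a different device (e.g. an NFS mount point).
        dirnames[:] = [
            d for d in dirnames
            if _same_device(os.path.join(dirpath, d), root_dev)
        ]
        for name in filenames:
            try:
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                continue  # file vanished or is unreadable; skip it
    return total

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        with open(os.path.join(d, "a.log"), "wb") as f:
            f.write(b"x" * 10)
        print(usage_on_one_filesystem(d))
```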
It felt like everything was falling apart. As soon as I deleted something another app filled it up in minutes. Even Bash Tab completion breaks… There really should be a 98% disk usage threshold in Linux so that you can at least use all system tools to try and fix it.
I know when our server’s /tmp directory is full because Bash’s tab autocompletion stops working.
/home still has space, though, so nothing truly breaks. Perhaps I should file a bug report about that.
Early Symbian apps are an excellent example how to write apps so that they don’t crash when storage or memory becomes full. They just show an error dialog and the user can still use the system to free storage or memory. Modern phone apps either crash or the entire phone crashes in similar situations.
It doesn’t help that the base model of many phones had ridiculously undersized storage for so many years.
“I have an unlimited data plan, I’ll just store everything in the cloud.” only to discover later that unlimited has an asterisk by it and a footnote that says “LOL it’s still limited”.
> As developers, we need to be better at handling edge cases like out of disk space, out of memory, pegged bandwidth and pegged CPU
In what situation though? Let’s consider disk space. This certainly does not apply to all developers or all programs. Making your program understand the fact that the system has no space left does not seem like something that would be very productive in the vast majority of cases. Like running out of memory, it is not something the program can recover from all by itself unless it knows it created temporary files somewhere that it could go and delete. If that scenario does in fact apply to your program, then it’s not even an edge case: the program should be deleting temporary files if it doesn’t need them anymore. If the P3 was created to add support for that exact function, then I agree that it should be acted upon. A P3 is fine as long as it’s reached. If you don’t reach your P3s ever, then there are different issues that need addressing. I’d even say for something littering users’ disks it should be higher than a P3, but the point is it’s a specific case where it makes sense to handle that error. In every other case, your best bet is a _generic_ exception handler for write operations that will catch any failure and inform the user (e.g. “[Errno 28] No space left on device”), but that’s something that should already be a habit.
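A minimal sketch of that kind of generic handler, with a made-up `save_bytes` helper. It treats “no space left” like any other write failure and simply passes the error name and text on to the caller:

```python
import errno
import os
import tempfile

def save_bytes(path, data):
    """Try to write data to path; return (ok, message) instead of raising."""
    try:
        with open(path, "wb") as f:
            f.write(data)
    except OSError as e:
        # Generic handler: covers ENOSPC, EACCES, EROFS, EIO and everything else.
        code = errno.errorcode.get(e.errno, "unknown")
        return False, f"could not save {path}: [{code}] {e.strerror}"
    return True, "saved"

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(save_bytes(os.path.join(d, "photo.jpg"), b"\xff\xd8"))
        # Writing into a directory that does not exist fails with ENOENT.
        print(save_bytes(os.path.join(d, "missing", "photo.jpg"), b"\xff\xd8"))
```

Because the `with` block sits inside the `try`, an error raised when the buffered data is flushed at close time is caught as well.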
There are cases when you want to try to avoid running out of disk space because your program might know that it needs to consume a lot of it (e.g. installers) so it will be checked preemptively. Even then you probably do want to try to handle running out of disk space (e.g. in the unfortunate event that something else consumed the rest of your disk _after_ you preemptively calculated how much was required) so you can attempt a rollback and inform the user to try again.
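The installer case might look roughly like this: check free space up front, but still handle the write failing anyway and clean up. `install_file` and its `free_bytes` override are hypothetical names; the override exists only so the check can be exercised without a full disk.

```python
import os
import shutil
import tempfile

def install_file(dest, data, free_bytes=None):
    """Check free space first, then write; remove the partial file if the write fails."""
    if free_bytes is None:
        target_dir = os.path.dirname(os.path.abspath(dest))
        free_bytes = shutil.disk_usage(target_dir).free
    if free_bytes < len(data):
        return False, "not enough free space; nothing written"
    try:
        with open(dest, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
    except OSError as e:
        # Space can vanish between the check and the write, so roll back.
        if os.path.exists(dest):
            os.remove(dest)
        return False, f"write failed, rolled back: {e.strerror}"
    return True, "installed"

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(install_file(os.path.join(d, "app.bin"), b"payload"))
        print(install_file(os.path.join(d, "big.bin"), b"payload", free_bytes=3))
```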
Other than that, when else is that _specific_ error more important than knowing that the data just couldn’t be written in general? Let’s say you have a camera app that tries to save an image. Surely you’d have a generic exception handler for not being able to save the image, rather than a specific handler for “out of space”, which seems oddly specific considering there are literally hundreds of specific errnos you could be encountering that would prohibit you from writing. I’m sure the user doesn’t want to see something like “Looks like you’re out of disk space. Do you want to try to save this image in lower quality instead?”
So my point in all of this is I agree that we should _consider_ the impact of disk space but it doesn’t need to be prioritized by developers unless it’s actually important like in the first few examples I gave.
It’s important that you can recover from this condition.
For example, I’m working on an NVR project. It has a SQLite database that should be placed on your SSD-based root filesystem and puts video frames on spinning disks. It’s essentially a specialized DBMS. You should never touch its data except through its interface.
If you misconfigure it, it will fill the spinning disks and stall. No surprise there. The logical thing for the admin to do is stop it, go into the config tool, reduce the retention, and restart. (Eventually I’d like to be able to reconfigure a running system but for now this is fine.)
But…in an earlier version, this wouldn’t work. It updates a small metadata file in each video dir on startup to help catch accidents like starting with an older version of the db than the dir or vice versa. It used to do this by writing a new metadata file and then renaming into place. This procedure would fail and you couldn’t delete anything. Ugh.
I fixed it through a(nother) variation of preallocation. Now the metadata files are a fixed 512 bytes. I just overwrite them directly, assuming the filesystem/block/hardware layers offer atomic writes this size. I’m not sure this assumption is entirely true (you really can’t find an authoritative list of filesystem guarantees, unfortunately), but it’s more true than assuming disks never fill.
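A rough illustration of the overwrite-in-place idea. This is a sketch under assumptions, not the project’s actual code: the names are invented, it uses `os.pwrite` (Unix only), and it inherits the same unverified assumption that a 512-byte write is atomic. On copy-on-write filesystems even an in-place overwrite may need new blocks, so it could still fail on a full disk there.

```python
import os
import tempfile

RECORD_SIZE = 512  # fixed size, allocated once while there is still space

def create_record(path):
    """Preallocate the metadata file as RECORD_SIZE zero bytes."""
    with open(path, "wb") as f:
        f.write(b"\0" * RECORD_SIZE)
        f.flush()
        os.fsync(f.fileno())

def overwrite_record(path, payload):
    """Overwrite the existing record in place: no new file, no rename."""
    if len(payload) > RECORD_SIZE:
        raise ValueError("payload does not fit in the fixed-size record")
    record = payload.ljust(RECORD_SIZE, b"\0")
    fd = os.open(path, os.O_WRONLY)  # no O_CREAT or O_TRUNC: the file must already exist
    try:
        written = os.pwrite(fd, record, 0)
        if written != RECORD_SIZE:
            raise OSError("short write while updating record")
        os.fsync(fd)
    finally:
        os.close(fd)

def read_record(path):
    """Return the payload with the zero padding stripped."""
    with open(path, "rb") as f:
        return f.read(RECORD_SIZE).rstrip(b"\0")

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        meta = os.path.join(d, "meta")
        create_record(meta)
        overwrite_record(meta, b"db_version=2")
        print(read_record(meta), os.path.getsize(meta))
```

The write-new-then-rename approach needs space for a second copy of the file, which is exactly what a full disk cannot provide; the in-place version trades that for the atomicity assumption.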
It might also not start if your root filesystem is full because it expects to be able to run SQLite transactions, which might grow the database or WAL. I’m not as concerned about this. The SQLite db is normally relatively small and you should have other options for freeing space on the root filesystem. Certainly you could keep a delete-me file around as the author does.