## Perpetual Motion Via Negative Matter?

One of the most important things we will ever learn about the universe is just how big it is, practically, for our purposes. In the last century we’ve learned that it is far larger than we knew, in a great many ways. At the moment we are pretty sure that it is about 13 billion years old, and that it seems much larger in spatial directions. We have decent estimates for both the total space-time volume we can ever see, and all that we can ever influence.

For each of these volumes, we also have decent estimates of the amount of ordinary matter they contain, how much entropy that now contains, and how much entropy it could create via nuclear reactions. We also have decent estimates of the amount of non-ordinary matter, and of the much larger amount of entropy that matter of all types could produce if collected into black holes.

In addition, we have plausible estimates of how (VERY) long it will take to actually use all that potential entropy. If you recall, matter and volume are what we need to make stuff, and potential entropy, beyond current actual entropy, (also known as “negentropy”) is the key resource needed to drive this stuff in desired directions. This includes both biological life and artificial machinery.

Probably the thing we most care about doing with all that stuff in the universe is creating and sustaining minds like ours. We know that this can be done via bodies and brains like ours, but it seems that far more minds could be supported via artificial computer hardware. However, we are pretty uncertain about how much computing power it takes (when done right) to support a mind like ours, and also about how much matter, volume, and entropy it takes (when done right) to produce any given amount of computing power.

For example, in computing theory we don’t even know if P=NP. We think this claim is false, but if true it seems that we can produce vastly more useful computation with any given amount of computing power, which probably means sustaining a lot more minds. Though I know of no concrete estimate of how many more.

It might seem that at least our physics estimates of available potential entropy are less uncertain than this, but I was recently reminded that we actually aren’t even sure that this amount is finite. That is, it might be that our universe has no upper limit to entropy. In which case, one could keep running physical processes (like computers) that increase entropy forever, creating proverbial “perpetual motion machines”. Some say that such machines are in conflict with thermodynamics, but that is only true if there’s a maximum entropy.

Yes, there’s a sense in which a spatially infinite universe has infinite entropy, but that’s not useful for running any one machine. Yes, if it were possible to perpetually create “baby universes”, then one might perpetually run a machine that can fit each time into the entrance from one universe into its descendant universe. But that may be a pretty severe machine size limit, and we don’t actually know that baby universes are possible. No, what I have in mind here is the possibility of negative mass, which might allow unbounded entropy even in a finite region of ordinary space-time.

Within the basic equations of Newtonian physics lies the potential for an exotic kind of matter: negative mass. Just let the mass of some particles be negative, and you’ll see that gravitationally the negative masses push away from each other, but are drawn toward the positive masses, which are drawn toward each other. Other forces can exist too, and in terms of dynamics, it’s all perfectly consistent.

Now today we formally attribute the Casimir effect to spatial regions filled with negative mass/energy, and we sometimes formally treat the absence of a material as another material (think of bubbles in water), and these often formally have negative mass. But other than these, we’ve so far not seen any material up close that acts locally like it has negative mass, and this has been a fine reason to ignore the possibility.

However, we’ve known for a while now that over 95% of the universe seems to be made of unknown stuff that we’ve never seen interact with any of the stuff around us, except via long distance gravity interactions. And most of that stuff seems to be a “dark energy” which can be thought of as having a negative mass/energy density. So negative mass particles seem a reasonable candidate to consider for this strange stuff. And the reason I thought about this possibility recently is that I came across this article by Jamie Farnes, and associated commentary. Farnes suggests negative mass particles may fill voids between galaxies, and crowd around galaxies compacting them, simultaneously explaining galaxy rotation curves and accelerating cosmic expansion.

Apparently, Einstein considered invoking negative mass particles to explain (what he thought was) the observed lack of cosmic expansion, before he switched to a more abstract explanation, which he dropped after cosmic expansion was observed. Some say that Farnes’s attempt to integrate negative mass into general relativity and quantum particle physics fails, and I have no opinion on that. Here I’ll just focus on simpler physics considerations, and presume that there must be some reasonable way to extend the concept of negative mass particles in those directions.

One of the first things one usually learns about negative mass is what happens in the simple scenario wherein two particles with exactly equal and opposite masses start off exactly at rest relative to one another, and have any force between them. In this scenario, these two particles accelerate together in the same direction, staying at the same relative distance, forevermore. This produces arbitrarily large velocities in simple Newtonian physics, and arbitrarily large absolute masses in relativistic physics. This seems a crazy result, and it probably put me off the negative mass idea when I first heard about it.
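The runaway scenario can be checked numerically. What follows is only a minimal sketch, assuming one spatial dimension, Newtonian gravity as the force between the pair, unit constants, and a simple fixed-step integrator; the function name `simulate_runaway_pair` and all parameter values are illustrative choices, not taken from Farnes or any other source.

```python
def simulate_runaway_pair(mass=1.0, separation=1.0, g_const=1.0, dt=0.01, steps=1000):
    """Integrate two 1D point particles of mass +mass and -mass under Newtonian gravity."""
    masses = [mass, -mass]
    positions = [0.0, separation]
    velocities = [0.0, 0.0]
    for _ in range(steps):
        gap = positions[1] - positions[0]
        # Newtonian gravitational force on particle 0 due to particle 1.
        # The mass product is negative, so this force points away from particle 1.
        force_on_0 = g_const * masses[0] * masses[1] * gap / abs(gap) ** 3
        force_on_1 = -force_on_0  # Newton's third law
        # Dividing by a negative mass flips the direction of the acceleration,
        # so both particles accelerate the same way.
        accelerations = [force_on_0 / masses[0], force_on_1 / masses[1]]
        for i in range(2):
            velocities[i] += accelerations[i] * dt
            positions[i] += velocities[i] * dt
    return masses, positions, velocities


masses, positions, velocities = simulate_runaway_pair()
gap = positions[1] - positions[0]
momentum = sum(m * v for m, v in zip(masses, velocities))
kinetic = sum(0.5 * m * v * v for m, v in zip(masses, velocities))
print("separation:", gap)
print("velocities:", velocities)
print("total momentum:", momentum, "total kinetic energy:", kinetic)
```

In this sketch the separation stays fixed while both velocities grow without limit, yet total momentum and total kinetic energy both remain at zero, because the negative mass contributes with the opposite sign. That is why the runaway does not violate the usual conservation laws.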

But this turns out to be an extremely unusual scenario for negative mass particles. Farnes did many computer simulations with thousands of gravitationally interacting negative and positive mass particles of exactly equal mass magnitudes. These simulations consistently “reach dynamic equilibrium” and “no runaway particles were detected”. So as a matter of practice, runaway seems quite rare, at least via gravity.

A related worry is that if there were a substantial coupling associated with making pairs of positive and negative mass particles that together satisfy relevant conservation laws, such pairs would be created often, leading to a rapid and apparently unending expansion in total particle number. But the whole idea of dark stuff is that it only couples very weakly to ordinary matter. So if we are to explain dark stuff via negative mass particles, we can and should postulate no strong couplings that allow easy creation of pairs of positive and negative mass particles.

However, even if the postulate of negative mass particles were consistent with all of our observations of a stable pretty-empty universe (and of course that’s still a big if), the runaway mass pair scenario does at least weakly suggest that entropy may have no upper bound when negative masses are included. The stability we observe only suggests that current equilibrium is “metastable” in the sense of not quickly changing.

Metastability is already known to hold for black holes; merging available matter into a few huge black holes could vastly increase entropy, but that only happens naturally at a very slow rate. By making it happen faster, our descendants might greatly increase their currently available potential entropy. Similarly, our descendants might gain even more potential entropy by inducing interactions between mass and negative mass that would naturally be very rare.

That is, we don’t even know if potential entropy is finite, even within a finite volume. Learning that will be very big news, for good or bad.

## Choose: Allies or Accuracy

Imagine that person A tells you something flattering or unflattering about person B. All else equal, this should move your opinion of B in the direction of A’s claim. But how far? If you care mainly about accuracy, you’ll want to take into account base rates on claimers A and targets B, as well as more specific signs on the accuracy of A regarding B.

But what if you care mainly about seeming loyal to your allies? Well if A is more of your ally than is B, as suggested by your listening now to A, then you’ll be more inclined to just believe A, no matter what. Perhaps if other allies give a different opinion, you’ll have to decide which of your allies to back. But if not, trying to be accurate on B mainly risks seeming disloyal to A and your other allies.

It seems that humans tend to just believe gossip like this, mostly ignoring signs of accuracy:

The trustworthiness of person-related information … can vary considerably, as in the case of gossip, rumors, lies, or “fake news.” …. Social–emotional information about the (im)moral behavior of previously unknown persons was verbally presented as trustworthy fact (e.g., “He bullied his apprentice”) or marked as untrustworthy gossip (by adding, e.g., allegedly), using verbal qualifiers that are frequently used in conversations, news, and social media to indicate the questionable trustworthiness of the information and as a precaution against wrong accusations. In Experiment 1, spontaneous likability, deliberate person judgments, and electrophysiological measures of emotional person evaluation were strongly influenced by negative information yet remarkably unaffected by the trustworthiness of the information. Experiment 2 replicated these findings and extended them to positive information. Our findings demonstrate a tendency for strong emotional evaluations and person judgments even when they are knowingly based on unclear evidence. (more; HT Rolf Degen)

I’ve toyed with the idea of independent juries to deal with Twitter mobs. Pay a random jury a modest amount to 1) read a fuller context and background on the participants, 2) talk a bit among themselves, and then 3) choose which side they declare as more reasonable. Sure sometimes the jury would hang, but often they could give a voice of reason that might otherwise be drowned out by loud participants. I’d have been willing to pay for this a few times. And once juries became a standard thing, we could lower costs via making prediction markets on jury verdicts if a case were randomly chosen for jury evaluation.

But alas, I’m skeptical that most would care much about what an independent jury is estimated to say, or even about what it actually says. For that, they’d have to care more about truth than about showing support for allies.

## The Aristillus Series

There’s a contradiction at the heart of science fiction. Science fiction tends to celebrate the engineers and other techies who are its main fans. But there are two conflicting ways to do this. One is to fill a story with credible technical details, details that matter to the plot, and celebrate characters who manage this detail well. The other approach is to present tech as the main cause of an impressive future world, and of big pivotal events in that world.

The conflict comes from it being hard to give credible technical details about an impressive future world, as we don’t know much about future tech. One can give lots of detail about current tech, but people aren’t very impressed with the world they live in (though they should be). Or one can make up detail about future tech, but that detail isn’t very credible.

A clever way to mitigate this conflict is to introduce one dramatic new tech, and then leave all other tech the same. (Vinge gave a classic example.) Here, readers can be impressed by how big a difference one new tech could make, and yet still revel in heroes who win in part by mastering familiar tech detail. Also, people like me who like to think about the social implications of tech can enjoy a relatively manageable task: guess how one big new tech would change an otherwise familiar world.

I recently enjoyed the science fiction book pair The Aristillus Series: Powers of the Earth, and Causes of Separation, by Travis J I Corcoran (@MorlockP), funded in part via Kickstarter, because it in part followed this strategy. Also, it depicts betting markets as playing a small part in spreading info about war details. In addition, while most novels push some sort of unrealistic moral theme, the theme here is at least relatively congenial to me: nice libertarians seek independence from a mean over-regulated Earth:

Earth in 2064 is politically corrupt and in economic decline. The Long Depression has dragged on for 56 years, and the Bureau of Sustainable Research is making sure that no new technologies disrupt the planned economy. Ten years ago a band of malcontents, dreamers, and libertarian radicals used a privately developed anti-gravity drive to equip obsolete and rusting sea-going cargo ships – and flew them to the moon. There, using real world tunnel-boring-machines and earth-moving equipment, they’ve built their own retreat.

The one big new tech here is anti-gravity, made cheaply from ordinary materials and constructible by ordinary people with common tools. One team figures it out, and for a long time no other team has any idea how to do it, or any remotely similar tech, and no one tries to improve it; it just is.

Attaching antigrav devices to simple refitted ocean-going ships, our heroes travel to the moon, set up a colony, and create a smuggling ring to transport people and stuff there. Aside from those magic antigravity devices, these books are chock full of technical mastery of familiar tech not much beyond our level, like tunnel diggers, guns, space suits, bikes, rovers, crypto signatures, and computer software. These are shown to have awkward gritty tradeoffs, like most real tech does.

Alas, Corcoran messes this up a bit by adding two more magic techs: one superintelligent AI, and a few dozen smarter-than-human dogs. Oh and the same small group is implausibly responsible for saving all three magic techs from destruction. As with antigravity, in each case one team figures it out, no other team has any remotely similar tech, and no one tries to improve them. But these don’t actually matter that much to the story, and I can hope they will be cut if/when this is made into a movie.

The story begins roughly a decade after the moon colony started, when it has one hundred thousand or a million residents. (I heard conflicting figures at different points.) Compared to Earth folk, colonists are shown as enjoying as much product variety, and a higher standard of living. This is attributed to their lower regulation.

While Earth powers dislike the colony, they are depicted at first as being only rarely able to find and stop smugglers. But a year later, when thousands of ships try to fly to the moon all at once from thousands of secret locations around the planet, Earth powers are depicted as being able to find and shoot down 90% of them. Even though this should be harder when thousands fly at once. This change is never explained.

Even given the advantage of a freer economy, I find it pretty implausible that a colony could be built this big and fast with this level of variety and wealth, all with no funding beyond what colonists can carry. The moon is a long way from Earth, and it is a much harsher environment. For example, while colonists are said to have their own chip industry to avoid regulation embedded in Earth chips, the real chip industry has huge economies of scale that make it quite hard to serve only one million customers.

After they acquire antigrav tech, Earth powers go to war with the moon. As the Earth’s economy is roughly ten thousand times larger than the moon’s, without a huge tech advantage it is a mystery why anyone thinks the moon has any chance whatsoever to win this war.

The biggest blunder, however, is that no one in the book imagines using antigrav tech on Earth. But if the cost to ship stuff to the moon using antigrav isn’t crazy high, then antigravity must make it far cheaper to ship stuff around on Earth. Antigrav could also make tall buildings cheaper, allowing much denser city centers. The profits to be gained from these applications seem far larger than from smuggling stuff to a small poor moon colony.

So even if we ignore the AI and smart dogs, this still isn’t a competent extrapolation of what happens if we add cheap antigravity to a world like ours. Which is too bad; that would be an interesting scenario to explore.

Added 5:30p: In the book, antigrav is only used to smuggle stuff to/from the moon, until it is used to send armies to the moon. But demand for smuggling should be far larger between places on Earth. In the book thousands of ordinary people are seen willing to make their own antigrav devices to migrate to the moon. But a larger number should be making such devices to smuggle stuff around on Earth.

## When OK to Discriminate?

Two days ago I asked 8 related questions via Twitter. Here is one:

The rest of the questions made one of two changes. One change was to swap the type of choice from work/life to “producer (P) of a good or service to choose its customers (or price), or for a consumer (C) to choose from whom it buys”. The other change was to swap the choice basis from “political views or ideology” to “age”, “sex/gender”, or “race/ethnicity”. Here is the table of answer percentages (and total votes):

(Column “W not L” means “P not C” for relevant rows. Matching tweets, by table row #: 1,2,3,4,5,6,7,8.)

While the people who answered my poll are not a random sample of my nation or planet, I still think we can draw some tentative conclusions:

1) People are consistently more forgiving of discrimination in living spaces relative to work, and by consumers relative to producers. Almost no one is willing to allow it for work/producers, and yet not for living/consumers.

2) Opinion varies a lot. Aside from the empty column just described, most other answers get substantial support. Though it seems few are against using age or sex/gender to choose who you live with.

3) Some kinds of bases are more accepted than others. Support was weakest for discrimination using race/ethnicity, and strongest for using age.

4) There seems to be more support for treating work and living mates differently than for treating producers and consumers differently.

## Do I Offend?

The last eight months have seen four episodes where many people on Twitter called me a bad offensive person, often via rude profanity, sometimes calling for me to be fired or arrested. These four episodes were: sex inequality and redistribution, chances of a delayed harassment complaint, morality-induced overconfidence on historical counterfactuals, and implicit harassment in A Star Is Born. While these topics have occupied only a small fraction of my thought over these months, and a much smaller fraction over my career, they may have disproportionate effects on my reputation. So I’ve tried to pay close attention to the reasons people give.

I think I see a consistent story. While in these cases I have not made moral, value, or political claims, when people read small parts of what I’ve claimed or asked, they say they can imagine someone writing those words for the purpose of promoting political views they dislike. And not just mild views that are just a bit on the other side of the political spectrum. No, they attribute to me the most extreme bad views imaginable, such as that I advocate rape, murder, slavery, and genocide. People say they are directly and emotionally traumatized by the offensive “creepy” feeling they get when they encounter someone with any prestige and audience seeming to publicly promote views with which they strongly disagree.

Some plausibly contributing factors here include my sometimes discussing sensitive topics, our increasing political polarization, the ease of making mobs and taking words out of context on Twitter, increasing ease of making new accusations similar to previous ones, and my terse and analytic writing style combined with my not adding disclaimers re my allegiance to “correct” views. There’s also my following the standard poll practice of not telling those who answer polls the motives for those polls. And I’m a non-poor older white male associated with economics in general and GMU econ in particular; many see all these as indicators of bad political views.

Digging a little deeper, trauma is plausibly increased by a poll format, which stokes fears that bad people will find out that they are not alone, and be encouraged to learn that many others share their views. I suspect this helps explain complaints that my poll population is not representative of my nation or planet.

I also suspect bad faith. Long ago when I had two young kids, they would sometimes pick fights, for example on long car trips. One might start singing, to which the other would complain. We might agree that singing is too much for such a small space. Then the first might start to quietly hum, which we might decide is okay. Then the first might hum more loudly and triumphantly, while the second might writhe, cover their ears, and make a dramatic display of suffering.

Similarly, I suspect bad faith when some a) claim to experience “harassment” level suffering due to encountering political views with which they disagree, and yet are fine with high levels of sex, violence, and profanity in TV & movies, b) infer indirectly from my neutral analytical text that I promote the most extreme views imaginable, and c) do not notice that such claims are both a priori implausible and inconsistent with my large corpus of public writing; they either haven’t read much of it or purposely mischaracterize it.

The idea of a large shared intellectual sphere wherein we can together analyze difficult topics holds a strong appeal to me. The main criteria for consideration in such a sphere should be the coherence and persuasiveness of specific relevant arguments. When evaluating each argument, there is usually little need to infer distantly related positions of those who offer arguments. Usually an argument either works or it doesn’t, regardless of who says it or why.

I try to live up to such ideals in how I write and talk. I hope that many who read and follow me share these ideals, and I appreciate their support. I’m thus not favorably inclined toward suggestions that I stop discussing sensitive topics, or that I adopt a much more elaborate disclaimer style, or that I stop asking my followers questions, to prevent others from being traumatized by hearing their answers, and/or to keep followers from finding out that others share their opinions.

## It’s All Data

Bayesian decision theory is often a useful approximation as a theory of decisions, evidence, and learning. And according to it, everything you experience or see or get as an input can be used as data. Some of it may be more informative or useful, but it’s all data; just update via Bayes rule and off you go.
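As a concrete illustration of "just update via Bayes rule", here is a minimal sketch for a finite set of hypotheses. The function name `bayes_update`, the hypothesis labels, and all the numbers are made up for illustration; they are not drawn from any real data.

```python
def bayes_update(prior, likelihood):
    """Return posterior probabilities given a prior and the likelihood of one observation.

    prior: dict mapping hypothesis -> P(hypothesis)
    likelihood: dict mapping hypothesis -> P(observation | hypothesis)
    """
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())  # P(observation), summed over all hypotheses
    return {h: weight / total for h, weight in unnormalized.items()}


# Hypothetical example: two rival hypotheses that start out equally likely,
# and one observation that is twice as probable under "A" as under "B".
prior = {"A": 0.5, "B": 0.5}
likelihood = {"A": 0.8, "B": 0.4}
posterior = bayes_update(prior, likelihood)
print(posterior)
```

Any observation whose likelihood differs across hypotheses moves the posterior, however informal its source. An observation that is equally likely under every hypothesis leaves the prior unchanged, which is the sense in which some data are less informative than others.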

So what then is “scientific” data? Well “science” treated as a social phenomenon is broken into many different disciplines and sub-fields, and each field tends to have its own standards for what kinds of data they will publish. These standards vary across fields, and have varied across time, and I can think of no universals that apply to all fields at all times.

For example, at some times in some fields one might be allowed to report on the content of one’s dreams, while in other fields at times that isn’t okay but it is okay to give statistics summarizing the contents of all the dreams of some set of patients at a hospital, while in other fields at other times they just don’t want to hear anything subjective about dreams.

Most fields’ restrictions probably make a fair bit of sense for them. Journal space is limited, so even if all data can tell you something, they may judge that certain kinds of data rarely say enough, compared to other available kinds. Which is fine. But the not-published kinds of data are not “unscientific”, though they may temporarily be “un-X” for field X. And you should remember that as most academic fields put a higher priority on being impressive than informative, they may thus neglect unimpressive data sources.

For example, chemists may insist that chemistry experiments know what are the chemicals being tested. But geology papers can give data on tests made on samples obtained from particular locations, without knowing the exact chemical composition of those samples. And they don’t need these samples to be uniformly sampled from the volume of the Earth or the universe; it is often enough to specify where samples came from.

Consider agricultural science field experiments, where they grow different types of crops in different kinds of soil and climate. They usually don’t insist on knowing the exact chemical composition of the soil, or the exact DNA of the crops. But they can at least tell you where they got the crops, where exactly is the test field, how they were watered, weeded, and fertilized, and some simple stats on the soils. It would be silly to insist that such experiments use a “representative” sample of crops, fields, or growing conditions. Should it be uniformly sampled from actual farming conditions used today, from all possible land on Earth’s surface, or from random mass or volume in the universe across its history?

Lab experiments in the human and social sciences today typically use convenience samples of subjects. They post invitations to their local school or community and then accept most everyone who signs up or shows up. They collect a few stats on subjects, but do not even attempt to create “representative” samples of subjects. Nationally, globally-now, or over-all-history representative samples of lab subjects would just be vastly more expensive. Medical experiments are done similarly. They may shoot for balance along a few particular measured dimensions, but on other parameters they take whoever they can get.

I mention all this because over the last few months I’ve had some fun doing Twitter polls. And I’ve consistently had critics tell me I shouldn’t do this, because Twitter polls are “meaningless” or “worthless” or “unscientific”. They tell me I should only collect the sort of data I could publish in a social science journal today, and if I show people any other kind of data I’m an intellectual fraud. As if they saw some kinds of data as “unscientific”.

Today I have ~24,700 followers, and I can typically get roughly a thousand people to answer each poll question. And as my book Elephant in the Brain suggests, I have many basic questions about human behavior that aren’t very specific to particular groups of people; we have many things to learn that apply to most people everywhere at all times. Whenever a question occurs to me, I can take a minute to post it, and within a few hours get some thought-provoking answers.

Yes, the subset of my Twitter followers who actually respond to my polls are not a representative sample of my nation, world, profession, university, or even of Twitter users. But why exactly is it so important to have a representative sample from such a group?

Well there is a big advantage to having many representative polls from the same group, no matter what that group. Then when comparing such polls you have to wonder less whether sample differences are driving the results. But the more questions I ask of my Twitter followers, the more I can usefully compare those different polls. For example, if I ask them at different times, I can see how their attitudes change over time. Or if I make slight changes in wording, I can see what difference wording changes make.
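One simple way to compare two such polls is a two-proportion z statistic. The sketch below is only illustrative: it assumes each poll is an independent draw from the same follower pool, which need not hold since the same people may answer both, and the function name `two_proportion_z` and the vote counts are hypothetical, not results from any actual poll.

```python
import math


def two_proportion_z(yes_a, total_a, yes_b, total_b):
    """Z statistic for the difference between two poll proportions, using a pooled estimate."""
    share_a = yes_a / total_a
    share_b = yes_b / total_b
    pooled = (yes_a + yes_b) / (total_a + total_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    return (share_a - share_b) / standard_error


# Hypothetical counts: the same question asked with two slightly different wordings.
z = two_proportion_z(yes_a=520, total_a=1000, yes_b=460, total_b=1000)
print(round(z, 2))
```

With roughly a thousand answers per poll, a gap of a few percentage points between wordings already stands out from sampling noise, while smaller gaps do not. A check like this says nothing about whether the follower pool resembles any wider population.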

Of course if I were collecting data to help a political candidate, I’d want data representative of potential voters in that candidate’s district. But if I’m just trying to understand the basics of human behavior, it’s not clear why I need any particular distribution over people polled. Yes, the answers to each thing I ask might vary greatly over people, and my sample might have few members of groups who act the most differently. But this can happen for any distribution over the people sampled.

Even though the people who do lab experiments on humans usually use convenience samples that are not representative of a larger world, what they do is still science. We just have to keep in mind that differing results might be explained by different sources of subjects. Similarly, the data I get from my Twitter polls can still be useful to a careful intellectual, even if it isn’t representative of some larger world.

If one suspects that some specific Twitter poll results of mine differ from other results due to my differing sample, or due to my differing wordings, the obvious checks are to ask the same questions of different samples, or using different wordings. Such as having other people on Twitter post a similar poll to their different pool of followers. Alas, people seem to be willing to spend lots of time complaining about my polls, but almost never are willing to take a few seconds to help check on them in this way.

## #MeToo In A Star Is Born

The Me Too movement (or #MeToo movement), with many local and international alternative names, is a movement against sexual harassment and sexual assault. #MeToo spread virally in October 2017 as a hashtag on social media in an attempt to demonstrate the widespread prevalence of sexual assault and harassment, especially in the workplace. It followed soon after the sexual abuse allegations against Harvey Weinstein. (more)

It is now a bit over a year since #MeToo started to push for more strongly censuring an expanded range of activities. Both the facts of change and expansion suggest that we are now less clear on what exactly counts as unacceptable sexual “harassment.” This increased ambiguity struck me when watching the new Star is Born movie, which now has a 64% chance to win the Best Picture Oscar, and when asking my twitter followers a few related questions.

The prototypical #MeToo villain was Harvey Weinstein, a powerful older man in the movie industry who offered to help young pretty much-less-powerful women with their acting careers, in trade for sex. He’d help by recommending them for jobs, or hurt them by recommending against them. Yes, Harvey was also accused of directly forcing himself on some women, but society already had a strong consensus against that. These sex for career help offers were the newer issue.

In the new movie A Star is Born, a popular older male singer hears a young amateur female singer. He then quickly expresses both sexual and professional interest in her, and many people around the two of them indicate that they see both of these interests expressed. He offers to fly her to his next show, she declines, but then changes her mind. He brings her on stage to sing a song, which greatly helps her career. She stays with him that night and they have sex. She continues to travel with him, and he continues to help and they continue to have sex. She isn’t an idiot, so we must presume she knows that if she stops having sex with him, there’s a good chance he will stop helping her career.

One could interpret this situation as him making an implicit offer to trade career help for sex, and her accepting this offer. Which seems to violate the #MeToo standard that Harvey Weinstein violated. Yet few complain of this, even in a politically sensitive industry during this extra sensitive time. And in fact, while most of my twitter followers seemed reluctant to take any position on this, those who did were about 3 to 1 against blaming this man. They instead said they would not defend this woman and “believe her” if, a month into their relationship, she had soured on it and publicly accused him of abusing his position of power:

Yet given an abstract description of this sort of situation, about half of my twitter followers say that his behavior is not okay, and that he is not saved by her liking the deal overall, his asking only once, or his offering an implicit deal that gives her (and him) plausible deniability:

These results seem to me to imply a lot of uncertainty, disagreement, and individual inconsistency. Whatever the actual causes of these opinions, we seem far from achieving a consensus on which behaviors to censure, and how much.

Added 7pm: Many on Twitter now say that my last poll above is aggressive, pro-harassment, and itself constitutes harassment, because I allow respondents the possibility of saying that the man’s offer could be okay. In particular, respected economist Betsey Stevenson says:

This kind of “innocent query” sums up why economics is a more hostile profession for women than many others. … it suggests the options you gave are potentially ok behavior. … [harassment] is not an assumption, its how I and many others experience it. … why don’t you change your behavior given the feedback if you don’t want to be harassing.

Note that she claims she can tell what is “harassment” via her direct experience, no inference required.

## Response To Hossenfelder

In my last post I said:

In her new book Lost in Math, theoretical physicist Sabine Hossenfelder describes just how bad things have become. … To fix these problems, Hossenfelder proposes that theoretical physicists learn about and prevent biases, promote criticism, have clearer rules, prefer longer job tenures, allow more specialization and changes of fields, and pay peer reviewers. Alas, as noted in a Science review, Hossenfelder’s proposed solutions, even if good ideas, don’t seem remotely up to the task of fixing the problems she identifies.

In the comments she took issue:

I am quite disappointed that you, too, repeat the clearly false assertion that I don’t have solutions to offer. … I originally meant to write a book about what’s going wrong with academia in general, but both my agent and my editor strongly advised me to stick with physics and avoid the sociology. That’s why I kept my elaborations about academia to an absolute minimum. You are right in complaining that it’s sketchy, but that was as much as I could reasonably fit in.

But I have on my blog discussed what I think should be done, eg here. Which is a project I have partly realized, see here. And in case that isn’t enough, I have a 15 page proposal here. On the proposal I should add that, due to space limitations, it does not contain an explanation for why I think that’s the right thing to do. But I guess you’ll figure it out yourself, as we spoke about the “prestige optimization” last week.

I hadn’t seen any of those 3 links, and your book did list some concrete proposals, so I incorrectly assumed that if you had more proposals then you’d mention them in your book. I’m happy to support your proposed research project. … I don’t see our two proposals as competing, since both could be adopted.

She agreed:

I don’t see them as competing either. Indeed, I think they fit well.

Then she wrote a whole blog post elaborating!:

And then there are those who, at some time in my life, handed me a piece of the puzzle I’ve since tried to assemble; people I am sorry I forgot about. … For example … Robin Hanson, with whom I had a run-in 10 years ago and later met at SciFoo. I spoke with Robin the other day. … The reason I had an argument with him is that Robin proposed – all the way back in 1990 – that “gambling” would save science. He wanted scientists to bet on the outcomes of their colleagues’ predictions and claimed this would fix the broken incentive structure of academia. I wasn’t fond of Robin’s idea back then. The major reason was that I couldn’t see scientists spend much time on a betting market. …

But what if scientists could make larger gains by betting smartly than they could make by promoting their own research? “Who would bet against their career?” I asked Robin when we spoke last week. “You did,” he pointed out. He got me there. … So, Robin is right. It’s not how I thought about it, but I made a bet. … In other words, yeah, maybe a betting market would be a good idea. Snort.

My thoughts have moved on since 2007, so have Robin’s. During our conversation, it became clear our views about what’s wrong with academia and what to do about it have converged over the years. To begin with, Robin seems to have recognized that scientists themselves are indeed unlikely candidates to do the betting. Instead, he now envisions that higher education institutions and funding agencies employ dedicated personnel to gather information and place bets. …

This arrangement makes a lot of sense to me. First and foremost, it’s structurally consistent. … Second, it makes financial sense. … Third, it is minimally intrusive yet maximally effective.  … So, I quite like Robin’s proposal. Though, I wish to complain, it’s too vague to be practical and needs more work. It’s very, erm, academic. …

That’s also why Robin’s proposal looks good to me. It looks better the more I think about it. Three days have passed, and now I think it’s brilliant. Funding agencies would make much better financial investments if they’d draw on information from such a prediction market. Unfortunately, without startup support it’s not going to happen. And who will pay for it?

This brings me back to my book. Seeing the utter lack of self-reflection in my community, I concluded scientists cannot solve the problem themselves. The only way to solve it is massive public pressure. The only way to solve the problem is that you speak up. Say it often and say it loudly, that you’re fed up watching research funds go to waste on citation games. Ask for proposals like Robin’s to be implemented.

As Hossenfelder has been kind enough to consider my proposal in some detail, let me dig a bit into her proposal:

What we really need is a practical solution. And of course I have one on offer: An open-source software that allows every researcher to customize their own measure for what they think is “good science” based on the available data. That would include the number of publications and their citations. But there is much more information in the data which currently isn’t used. … individualized measures wouldn’t only automatically update as people revise criteria, but they would also counteract the streamlining of global research and encourage local variety. (more)

We created this website so you can generate a keyword cloud from your publications. You can then download the image and add it to your website or your CV. You can also generate a keyword cloud for your institution so that, rather than listing the always-same five research groups, you can visually display your faculty’s activity. In addition, you can use our website to search for authors with interests close to a list of input keywords or author names. This, so we hope, will aid you in finding speakers for conferences or colloquia while avoiding the biases that creep in when we rely on memory or personal connections. (more)

The basic idea here seems to be that what scientists read today is distorted somehow by their reading systems. For example, instead of just reading the good stuff, current reading systems might induce scientists to read prestigious and popular stuff. If so, then by giving scientists better tools for finding good things to read, distortions would be fewer, good science would be read more, and thereby gain more prestige and popularity, relative to the current situation.

Okay, current systems for finding things to read do probably introduce some distortions. But today there are so many ways to find things to read, and so many ways to make new reading systems, that I really find it hard to see this as the limiting factor. Instead, I expect that incentives are mainly to blame, such as for example biasing readers toward prestigious and popular stuff. You and your papers are looked on more favorably when they are seen as building on other prestigious and popular papers.

Consider an analogy with citations. Someone who is honestly just trying to do good science will sometimes need to read other papers, and when they write papers they will sometimes make enough use of another paper that they should mention that fact as a citation. If we had a big population of people who read and cited papers solely for the purpose of doing science, and whose priorities were representative of science, then we could use stats on who this group read or cited as an indicator of scientific quality. You are higher quality if you write more things that are read or cited. And this quality metric could even be used to hand out publications, jobs, funding, etc.

However, as we all know, simple citation counts have problems when these assumptions don’t hold. When we count papers equally ignoring that some mattered more. Or when we count boring but prestigious and popular papers more. Or when referees lobby to get their works cited. Or when authors trade favors to cite allies. The same sort of incentives that distort who scientists cite can also distort who they read, especially when reading stats are made visible. Like when people today make bots to download their papers from servers to produce big download counts.

I say the main problem is bad incentives, not bad what-to-read tools. So better tools to pursue existing incentives will make only limited gains.

## Can Foundational Physics Be Saved?

Thirty-four years ago I left physics with a Master’s degree, to start a nine year stint doing AI/CS at Lockheed and NASA. I loved physics theory, and given how far physics had advanced over the previous two 34 year periods, I expected to be giving up many chances for glory. But though I didn’t entirely leave (I’ve since published two physics journal articles), I’ve felt like I dodged a bullet overall; physics theory has progressed far less in the last 34 years, mainly because data dried up:

One experiment after the other is returning null results: No new particles, no new dimensions, no new symmetries. Sure, there are some anomalies in the data here and there, and maybe one of them will turn out to be real news. But experimentalists are just poking in the dark. They have no clue where new physics may be to find. And their colleagues in theory development are of no help.

In her new book Lost in Math, theoretical physicist Sabine Hossenfelder describes just how bad things have become. Previously, physics foundations theorists were disciplined by a strong norm of respecting the theories that best fit the data. But with less data, theorists have turned to mainly judging proposed theories via various standards of “beauty” which advocates claim to have inferred from past patterns of success with data. Except that these standards (and their inferences) are mostly informal, change over time, differ greatly between individuals and schools of thought, and tend to label as “ugly” our actual best theories so far.

Yes, when data is truly scarce, theory must suggest where to look, and so we must choose somehow among as-yet-untested theories. The worry is that we may be choosing badly:

During experiments, the LHC creates about a billion proton-proton collisions per second. … The events are filtered in real time and discarded unless an algorithm marks them as interesting. From a billion events, this “trigger mechanism” keeps only one hundred to two hundred selected ones. … That CERN has spent the last ten years deleting data that hold the key to new fundamental physics is what I would call the nightmare scenario.

One bad sign is that physicists have consistently, confidently, and falsely told each other and the public that big basic progress was coming soon:

The second rule for inventing a new particle is that you need an argument for why it’s just about to be discovered, because otherwise nobody will care. This doesn’t have to be a good argument—everyone in the business wants to believe you anyway—but you have to give your audience an explanation they can repeat. …

Lies and exaggerations have become routine in proposal writing. …

This has resulted in decades of predictions for new effects that were always just about measurable with an upcoming experiment. And if that experiment didn’t find anything, the predictions were revised to fall within the scope of the next upcoming experiment.

Theorists don’t seem to have learned much from the data drought, as they tout the same sort of theories, and predict similar rates of progress, as they did before informative data stopped. In addition, theorists are subject to many known cognitive and social biases; see many related book quotes at the end of this post. Perhaps most disturbing, physicists seem to be in denial about these problems:

My colleagues only laugh when I tell them biases are a problem, and why they dismiss my “social arguments,” believing they are not relevant to scientific discourse. … Scientists trust in science. They’re not worried. “It’s the system,” they say with a shrug, and then they tell themselves and everybody willing to listen that it doesn’t matter, because they believe that science works, somehow, anyway. “Look,” they say, “it’s always worked.” And then they preach the gospel of innovation by serendipity. It doesn’t matter what we do, the gospel goes; you can’t foresee breakthroughs, anyway.

Of course physicists don’t really believe that “it doesn’t matter what we do”; they fight fiercely over funding, jobs, publications, etc.

Hossenfelder ends saying the public can’t now trust the conclusions of most all who study foundations of physics, as such people haven’t taken steps to address cognitive biases, offer balanced accounts of pros and cons, protect themselves from peer pressure, and find funding that doesn’t depend on their producing popular and expected results. But she doesn’t tell the public who to believe instead.

To fix these problems, Hossenfelder proposes that theoretical physicists learn about and prevent biases, promote criticism, have clearer rules, prefer longer job tenures, allow more specialization and changes of fields, and pay peer reviewers. Alas, as noted in a Science review, Hossenfelder’s proposed solutions, even if good ideas, don’t seem remotely up to the task of fixing the problems she identifies. Sermons preaching good intentions usually lose against bad incentives, and the incentive problems here are big.

It seems to me that it will take much larger incentive changes to substantially deal with this problem. Let me take the rest of this post to elaborate.

To me, “science” is efforts to find theories that accurately predict important data, and efforts to find data that distinguishes between likely theories. The phrases “to find” are essential here. It is not enough to merely claim that you aim for this sort of theory or data; these must actually be primary aims driving your behavior. Without such aims, you may know science, use science, or help science, but you are not doing science. Many different motives, such as prestige, money, or altruism, can cause you to have such aims; the issue here is more results than feelings.

A fast enough flow of new theories and data can naturally create sufficient incentives for these aims. When the mutual testing of theory and data happens in times much shorter than a career, then theorists can gain by explaining new data, and experimenters gain by distinguishing theories. But in data droughts, theorists can have stronger incentives to join beauty fashion cabals, and data collectors can be tempted to ignore promising but unpopular theories.

Hossenfelder has convinced me that in fundamental physics today, we have reason to doubt how much such aims actually drive behavior. Yes, science is the aim they claim, but it looks more like their main aim is to appease local beauty fashions, fashions that are more rationalized than driven by an ability to explain future data.

One way to tilt fundamental physics theory back toward science is stronger longer term incentives. That is, we might induce current researchers to care more about matches between theory and data that may not be seen for decades. One approach might be to have scientists live in near poverty today, in the hopes of huge rewards later for their children or grandchildren. Alas, while we might find some who could be motivated this way, they probably don’t make the best scientists.

Fortunately, in a market economy it is possible to give long term incentives to organizations that use short-term incentives to hire experts who help them. Yes, all else equal it costs more to create long term incentives, relative to short term ones. But when a cheap product doesn’t work, consider paying more for higher quality, even when that is more expensive.

In particular, I’ve proposed that science patrons subsidize long-term prediction markets on many particular questions in fundamental physics, and on the future-evaluated prestige of each paper, person, and institution. Eleven years ago, Hossenfelder didn’t think much of my earlier proposals:

The whole idea fails on a very obvious point: … the majority of scientists is not in academia because they want to make profit by investing smartly. Financial profit … is just not what drives most scientists. … No, the majority of experts wouldn’t pay any attention to that betting market.

To argue to the contrary, that my proposals are feasible, let me remind you all of what the stock market does today for firm incentives.

Today, the market prices of firms influence firm prestige, which influences how people associated with that firm are treated. When others look to invite speakers to conferences, or to quote experts in the media, or to hire new employees, they prefer to choose individuals from higher priced firms. Also, market traders take the prestige of people and activities associated with a firm into account when setting the market price of that firm. So if there were markets estimating future historian evaluations of the prestige of papers, scientists, and institutions, we should expect a two-way feedback between these market prices and other markers of scientific prestige, such as publications and jobs.

Today, thirty year bonds are traded often. Thus there are financial actors who in effect care about what happens in thirty years. Some of these actors are organizations. Thus there can be organizations who in effect care about possible winnings in prediction markets several decades later. So it is possible to induce organized effort today via sufficiently subsidized long term prediction markets.

Today, many organizations, such as hedge funds, gain most of their income from trading in financial markets. They do this via paying for efforts to collect info, and then using that info to make trades that are more likely to win than to lose. Yes, this requires that there exist fools out there who take the losing side of these trades, but such fools exist in most financial markets, and if they didn’t exist we could produce the same effect via subsidized financial markets (such as via automated market makers).
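To make the idea of a subsidized market run by an automated market maker a bit more concrete, here is a minimal sketch of one such mechanism, a logarithmic market scoring rule (LMSR) for a yes/no question. The choice of LMSR, the liquidity parameter `b`, and all class and method names are assumptions made for illustration; nothing in this post specifies a particular design.

```python
import math


class LMSRMarketMaker:
    """Illustrative market maker for a two-outcome (NO/YES) question.

    Hypothetical sketch: names and parameters are assumptions, not a spec.
    """

    def __init__(self, liquidity=100.0):
        self.b = liquidity          # larger b = deeper market, larger subsidy
        self.shares = [0.0, 0.0]    # outstanding shares for [NO, YES]

    def _cost(self, shares):
        # Cost function C(q) = b * log(sum_i exp(q_i / b)),
        # computed with the max factored out for numerical stability.
        m = max(s / self.b for s in shares)
        total = sum(math.exp(s / self.b - m) for s in shares)
        return self.b * (m + math.log(total))

    def price(self, outcome):
        # Current price of one outcome, readable as a probability estimate.
        m = max(s / self.b for s in self.shares)
        weights = [math.exp(s / self.b - m) for s in self.shares]
        return weights[outcome] / sum(weights)

    def buy(self, outcome, amount):
        # A trader pays the change in the cost function to acquire
        # `amount` shares, each paying 1 if `outcome` turns out true.
        new_shares = list(self.shares)
        new_shares[outcome] += amount
        charge = self._cost(new_shares) - self._cost(self.shares)
        self.shares = new_shares
        return charge

    def max_subsidy(self):
        # Worst-case loss for the patron when starting from zero shares.
        return self.b * math.log(len(self.shares))


if __name__ == "__main__":
    mm = LMSRMarketMaker(liquidity=100.0)
    print("initial YES price:", mm.price(1))
    print("cost of 50 YES shares:", mm.buy(1, 50.0))
    print("new YES price:", mm.price(1))
    print("patron's maximum subsidy:", mm.max_subsidy())
```

In this sketch the patron's worst-case loss is capped by `max_subsidy()`, and that bounded loss is what traders with better info can expect to win, even if no foolish traders ever show up.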

Hedge fund employees use many strategies to collect relevant info, including statistical analysis and complex computation. But many of them gain that info via gossip from a wide network of social contacts with whom they talk regularly, and whom they compensate in mostly informal ways. So the people who contribute relevant info to markets need not trade themselves, or be employees of firms that trade. The employees who collect info need not take all or even any of the risks of the trades that the info they collect induces. And employees of a firm focused on making long term gains need not personally care much about the long term.

Similarly, science could be funded via long-term prediction markets. Given sufficient subsidies, hedge funds would appear that specialize in trading in such markets, and while some of those would specialize in short-term trades, others would specialize in strategies that sometimes require holding assets over long terms. Such funds could gather info in many ways, including via gossip and via hiring both theorists and data collectors to do research and then let their patron trade first on resulting info. These hired scientists need not themselves trade, take financial risks, or care about the long term.

So in my imagined healthy future physics, there are still many “core” scientists who focus primarily on their theories or data collection efforts. But this group is not quite as autonomous a force, only accountable to other current prestigious core scientists. Instead, some part of science funding goes to pay the overhead to create hedge funds, and to give them long-term incentives to correct current market estimates, and to decide which core science efforts to fund and believe. Yes that might be more overhead than today, but overhead worth its price.

When you or anyone saw a current research effort that looked to you hopeless, you could expect to profit by selling it short. And when you or anyone saw something that looked much more promising than suggested by its current prestige markers, you could buy and expect to profit. Which is a lot more than you can usually do today.

Finally, here are those promised book quotes on biases in theoretical physics:

The criteria we use … are mostly social and aesthetic. And I doubt they are self-correcting.

There are lots of diseases in academia and one of them is that you keep doing what you’ve been doing. Somehow it turns out that everybody does the analytic continuation of what they’ve been doing for their PhD.

This pressure to please and publish discourages innovation: it is easier to get recognition for and publish research on already known topics than to pursue new and unusual ideas.

Being open about the shortcomings of one’s research program … means sabotaging one’s chances of future funding. We’re set up to produce more of the same.…

Even tenured researchers are now expected to constantly publish well-cited papers and win grants, both of which require ongoing peer approval. The more peers approve, the better. …

Probably the most prevalent brain bug in science is confirmation bias. …

We’ve always had cognitive and social biases, of course. … And science has progressed just fine, so why should we start paying attention now? (By the way, that’s called the status quo bias.) Larger groups are less effective at sharing relevant information. Moreover, the more specialized a group is, the more likely its members are to hear only what supports their point of view. …

Default assumption must be that theory assessment is both cognitively and socially biased unless steps are taken to address these issues. But no such steps are currently being taken.

## Overconfidence From Moral Signaling

Tyler Cowen in Stubborn Attachments:

The real issue is that we don’t know whether our actions today will in fact give rise to a better future, even when it appears that they will. If you ponder these time travel conundrums enough, you’ll realize that the effects of our current actions are very hard to predict,

While I think we often have good ways to guess which action is more likely to produce better outcomes, I agree with Tyler that we face great uncertainty. Once our actions get mixed up with a big complex world, it becomes quite likely that, no matter what we choose, in fact things would have turned out better had we made a different choice.

But for actions that take on a moral flavor, most people are reluctant to admit this:

If you knew enough history you’d see >10% as the only reasonable answer, for most any big historical counterfactual. But giving that answer to the above risks making you seem pro-South or pro-slavery. So most people express far more confidence. In fact, more than half give the max possible confidence!

I initially asked a similar question on if the world would have been better off overall if Nazis had won WWII, and for the first day I got very similar answers to the above. But I made the above survey on the South for one day, while I gave two days for the Nazi survey. And in its second day my Nazi survey was retweeted ~100 times, apparently attracting many actual pro-Nazis:

Yes, in principle the survey could have attracted wise historians, but the text replies to my tweet don’t support that theory. My tweet survey also attracted many people who denounced me in rude and crude ways as personally racist and pro-Nazi for even asking this question. And suggested I be fired. Sigh.