• 0 Posts
  • 15 Comments
Joined 1 year ago
Cake day: October 17th, 2023

  • exactly: It’s “open source” like Android. The core of Android is open source (the “Android Open Source Project”), in many cases because it has to be (e.g. the Linux kernel it uses), but that includes practically nothing that makes the actual system work for normal users. If you want the system to be practically usable you need a lot more, which is usually the “Google Mobile Services” (GMS), and those are proprietary. You are also generally required to install all items in the GMS, i.e. even if you only need the Play Store, you still have to install Google Chrome.

    Further, the Android name and logo are trademarked by Google, so even if you want to roll your own Android, you would not be allowed to call it Android. WearOS is essentially the same thing: The Android subsystem is open, the actual thing you call WearOS (plus trademarks, etc.) is not.



  • train one with all the Nintendo leaks

    This is fine

    generate some Zelda art and a new Mario title

    This is copyright infringement.

    The ruling in Japan (and, as I predict, also in other countries) is that the act of training a model (which is just a statistical estimator) is not copyrightable, so it cannot be copyright infringement. This is already standard practice for everything else: You cannot copyright a mathematical function, regardless of how much data you use to fit it (that is sensible: CERN has fit physics models to petabytes worth of data, but that doesn’t mean they hold a copyright on laws of nature, they just hold the copyright on the data itself). However, if you generate something that is copyrighted, that item is still copyrighted: It doesn’t matter whether you used an AI image generator, Photoshop, or a tattoo gun.


  • First, I don’t think that’s the right comparison. You need to compare them to taxis.

    It’s not just that, you generally have a significant distribution shift when comparing the self-drivers/driving assistants to normal humans. This is because people only use self-driving in situations where it has a chance of working, which is especially true with stuff like Tesla’s self-driving, where ultimately people are not even going to start the autopilot when it gets tricky (never mind intervening dynamically: they won’t start it in the first place!).

    For instance, one of the most common confounding factors is the ratio of highway driving vs non-highway driving: Highways are inherently less accident-prone since you don’t have to deal with intersections, oncoming traffic, people merging in from every random house, or children chasing a ball into the street. Self-drivers tend to report a lot more highway traffic than ordinary drivers, because the availability of the technology dictates where you end up measuring. You can correct for that by e.g. explicitly computing the likelihood p(accident|highway) and p(accident|non-highway) for each group and weighting both with a common p(highway) derived from the entire population of car traffic.
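
    A minimal sketch of that correction, with accident counts, mileages and the common p(highway) all made up purely for illustration (they are not real data); the function names are hypothetical as well. It compares the crude rate per mile with the rate obtained by weighting each group's p(accident|road type) with the same road-type mix:

```python
# All numbers below are invented for illustration; they are NOT real statistics.
# Each entry maps road type -> (accidents, miles driven).
self_driving = {"highway": (9, 9_000_000), "other": (5, 1_000_000)}
human = {"highway": (20, 10_000_000), "other": (160, 40_000_000)}

# Assumed common mix p(road type) for the whole population of car traffic.
population_mix = {"highway": 0.3, "other": 0.7}


def crude_rate(per_stratum):
    """Accidents per mile, ignoring where the miles were driven."""
    accidents = sum(a for a, _ in per_stratum.values())
    miles = sum(m for _, m in per_stratum.values())
    return accidents / miles


def standardized_rate(per_stratum, mix):
    """sum over road types of p(accident | road type) * p(road type)."""
    return sum((per_stratum[s][0] / per_stratum[s][1]) * mix[s] for s in mix)


for name, data in [("self-driving", self_driving), ("human", human)]:
    print(name, crude_rate(data), standardized_rate(data, population_mix))
```

    With these invented numbers the crude rate favours the self-driver (it mostly drives on highways), while the comparison flips once both groups are weighted with the same road-type mix.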





  • The problem is that the model is actually doing exactly what it’s supposed to, it’s just not what OpenAI wants it to do. The reason the prompt extraction method works is that the underlying statistical model gets shifted far outside the domain of “real” language. In that case the correct maximizing posterior becomes a sample from the prior (here that would be a sample from the dataset; this is combined with things like repetition penalties).

    This is the correct way a statistical estimator is supposed to work, but not the way you want it to work. That’s also why they can’t really fix this: there’s nothing broken to begin with (and “unbreaking” it would almost surely blow something else up).


  • You cannot run Signal without “Signal - the company” existing. All of their systems are designed to be attached to one specific backend, namely the signal-run backend, meaning without re-engineering the existing infrastructure you cannot simply swap over.

    As @kpw already mentioned, “Signal - the company” dying would involve a functional reset of everything: No contacts, no servers, no infrastructure. COULD you fork the thing and build your own system? Sure, but it would be functionally unusable since no one else would be using it, and everything relies on specifically the Signal servers to function. A post-Signal system could re-use some of their code (if it runs outside Signal corp - “works on my machine” could be present in this project as well), but would need to rebuild the actual network.

    This is in contrast to something like the matrix protocol: If a specific matrix instance goes kaput, you still have the overall network working. This means that even if an instance implodes, you would have an easy migration path since the matrix network itself persists.





  • Surely a company should be governed by the laws of the state in which they are based

    This is not true and wouldn’t make any sense: let’s say you are a delivery company and one of your drivers runs over a dog in Texas. The lawsuit can be filed in Texas, regardless of whether your company is in Texas, California, or even outside the United States. The place you are incorporated in doesn’t change the damages or laws you violated when running over the dog. Of course you can also move the venue to the state the company is based in.

    You cannot (generally) move it to another state, since that state doesn’t even have jurisdiction over any part of the incident.

    The internet is just special in the sense that something that happened on the internet really happened everywhere on earth at the same time, meaning any venue is a place where potential damages were accrued.


  • You are vastly overestimating the amount of storage you need, since you are looking at some download which itself has to choose the encoding (which is independent of whatever YouTube does: YouTube absolutely crushes the quality).
    Most estimates assume that YouTube has 1 exabyte of storage. Let’s say we buy this in bulk at retail (which we wouldn’t do: you wait as long as possible since storage prices are going down, and retail stores would give you the finger if you ordered an exabyte worth).
    Let’s take that number and run with it:
    Buying retail, you can get Seagate Exos X20 20TB drives for 280€. 1 exabyte is 1 Mio terabytes, meaning we have 1_000_000/20 * 280 = 14 Mio € (you’d need machines to put those into, but you also wouldn’t buy the entire thing upfront, nor would you pay retail prices).
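
    The same back-of-the-envelope estimate as a short script, using only the figures quoted above (20 TB per drive, 280 € per drive, 1 exabyte total). It assumes no redundancy or replication and ignores the machines needed to host the drives; the variable names are just for illustration:

```python
DRIVE_CAPACITY_TB = 20      # Seagate Exos X20, as quoted above
DRIVE_PRICE_EUR = 280       # retail price per drive, as quoted above
TOTAL_STORAGE_TB = 1_000_000  # 1 exabyte = 1 million terabytes

# Assumption: raw capacity only, no redundancy, no servers, no bulk discount.
drives_needed = TOTAL_STORAGE_TB // DRIVE_CAPACITY_TB
total_cost_eur = drives_needed * DRIVE_PRICE_EUR

print(f"{drives_needed} drives, {total_cost_eur} EUR")
```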

    Compute also isn’t that big of a deal if you do it correctly: the expensive part in video hosting is usually video encoding since to get small video sizes you need to spend compute beforehand to compress it.
    However, you can shift a significant part of this to the user by implementing the transcoding in WASM and running it client-side (see e.g. https://www.w3.org/2021/03/media-production-workshop/talks/qiang-fu-video-transcoding.html). In that case users would compress locally in the browser before uploading (this presumably wouldn’t even take longer than normal uploads for most people, since you trade off transcoding time against upload time).
    There are still other compute expenses but those are much more limited.
    These mechanisms don’t (at least to my knowledge) exist in PeerTube yet, but would be possible.

    The actually expensive part is always the actual networking: Networking is one of the few things that actually get more expensive at scale due to the complexity explosion, rather than cheaper (e.g. having dedicated transcoding hardware drops in price per user since you have higher utilization).
    Networking quickly runs into bottlenecks where you have to account for all the covariances between datasets in the network.
    Basically, to increase the amount of e.g. storage available, everything in the network needs to be scaled up (from the local machine’s connections, through the cables and switches, up to the routers and outgoing connections): because you increase the density at one point, you have to grow the network everywhere.
    That’s why networking dwarfs everything: you just get crushed by networking being the bottleneck between your increasingly dense devices.

    The trick behind PeerTube is that this effect is not as extreme, due to

    1. federation (certain connections just aren’t dense due to the overall network topology being distributed)
    2. torrents

    The latter is the important part: instead of network cost rising (super)linearly with the number of users, it rises linearly with the number of unique videos being watched simultaneously.
    This is a much smaller number, which means you do not need to compete in that space, which is the dominant cost factor. (If you have a method where one user can retain the video and share it without actively watching that same video, you can probably get real-world sublinear scaling.)
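
    A toy model of that scaling argument, heavily idealized: it assumes every stream has the same bitrate and that in the torrent case the origin only has to seed one copy of each simultaneously watched video while viewers relay the rest among themselves. The bitrate, the viewer and video counts, and the function names are all hypothetical:

```python
def client_server_egress_mbps(viewers, bitrate_mbps):
    """Origin serves every viewer directly: cost grows with the number of users."""
    return viewers * bitrate_mbps


def swarm_egress_mbps(unique_videos, bitrate_mbps):
    """Idealized torrent case: origin seeds each unique video once,
    peers redistribute it, so cost grows with unique videos instead."""
    return unique_videos * bitrate_mbps


BITRATE_MBPS = 5          # hypothetical stream bitrate
UNIQUE_VIDEOS = 2_000     # hypothetical number of videos watched at once

for viewers in (100_000, 1_000_000):
    print(
        viewers,
        client_server_egress_mbps(viewers, BITRATE_MBPS),
        swarm_egress_mbps(UNIQUE_VIDEOS, BITRATE_MBPS),
    )
```

    In this model, ten times more viewers on the same set of videos means ten times the origin traffic in the client-server case and no change in the swarm case. Real swarms sit somewhere in between, since peers do not relay perfectly.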

    Mind you, the costs involved here are still large, but not insurmountably large, especially considering there is not one unique organisation that would have to pay for the entire thing and it’s not an upfront expense. Fundamentally though, the system is built such that it won’t be crushed as users flood into the network.


  • There are certain things you are allowed to use cookies for even without asking for permission (i.e. they wouldn’t even need to tell you about them). These are effectively the kinds of things that are necessary for your website to work in the first place: for instance, if you have a dark and a light mode and you want people to be able to change this even without logging in. Another example is language settings (this is why sites like e.g. DuckDuckGo can have a “settings” tab despite the fact you are not logged into anything).

    The rule-of-thumb is that everything that is directly related to the functionality of your website is fair even without asking (they are “essential”).
    Of course the specifics are a little more tricky: For instance you could have a shop in which you can put things into your “shopping basket” without being logged in. This is fine since it’s core functionality. However, if you use that same cookie to also inform your recommendation algorithm, you could get into trouble. Another aspect is 3rd party cookies: These, while not theoretically always requiring permission, in practice do need explicit permission since you, as the website host, cannot guarantee what happens with these cookies (and 3rd party cookies are, in general, an easy way to track users, which isn’t core functionality for most websites).