Apple says it took a 'responsible' approach to training its Apple Intelligence models | TechCrunch (2024)

Apple has published a technical paper detailing the models that it developed to power Apple Intelligence, the range of generative AI features headed to iOS, macOS and iPadOS over the next few months.

In the paper, Apple pushes back against accusations that it took an ethically questionable approach to training some of its models, reiterating that it didn’t use private user data and drew on a combination of publicly available and licensed data for Apple Intelligence.

“[The] pre-training data set consists of … data we have licensed from publishers, curated publicly available or open-sourced datasets and publicly available information crawled by our web crawler, Applebot,” Apple writes in the paper. “Given our focus on protecting user privacy, we note that no private Apple user data is included in the data mixture.”

In July, Proof News reported that Apple used a data set called The Pile, which contains subtitles from hundreds of thousands of YouTube videos, to train a family of models designed for on-device processing. Many YouTube creators whose subtitles were swept up in The Pile weren’t aware of and didn’t consent to this; Apple later released a statement saying that it didn’t intend to use those models to power any AI features in its products.

The technical paper pulls back the curtain on the models Apple first revealed at WWDC 2024 in June, collectively called Apple Foundation Models (AFM). It emphasizes that the training data for the AFM models was sourced in a “responsible” way, or at least responsible by Apple’s definition.

The AFM models’ training data includes publicly available web data as well as licensed data from undisclosed publishers. According to The New York Times, Apple reached out to several publishers toward the end of 2023, including NBC, Condé Nast and IAC, about multi-year deals worth at least $50 million to train models on publishers’ news archives. Apple’s AFM models were also trained on open source code hosted on GitHub, specifically Swift, Python, C, Objective-C, C++, JavaScript, Java and Go code.

Training models on code without permission, even open code, is a point of contention among developers. Some developers argue that certain open source codebases aren’t licensed at all, or that their terms of use don’t allow AI training. But Apple says that it “license-filtered” the code to try to include only repositories with minimal usage restrictions, like those under an MIT, ISC or Apache license.
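The paper does not spell out how its license filter works. As a minimal sketch, assuming each repository has already been tagged with an SPDX license identifier, such a filter could reduce to an allowlist check. The `Repo` type, the `license_filter` function and the exact identifiers below are hypothetical, not Apple's.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Hypothetical allowlist of SPDX identifiers. The paper names MIT, ISC and
# Apache licenses; the exact identifiers and any further entries are assumptions.
PERMISSIVE_LICENSES = {"MIT", "ISC", "Apache-2.0"}


@dataclass(frozen=True)
class Repo:
    """A hypothetical record for one crawled repository."""
    name: str
    spdx_license: Optional[str]  # None when no license could be detected


def license_filter(repos: Iterable[Repo]) -> List[Repo]:
    """Keep only repositories whose detected license is on the allowlist.

    Repositories with no detected license (spdx_license is None) are dropped,
    because code without a license generally grants no explicit permission.
    """
    return [r for r in repos if r.spdx_license in PERMISSIVE_LICENSES]


if __name__ == "__main__":
    corpus = [
        Repo("swift-utils", "MIT"),
        Repo("gpl-tool", "GPL-3.0-only"),
        Repo("no-license", None),
        Repo("go-server", "Apache-2.0"),
    ]
    print([r.name for r in license_filter(corpus)])  # ['swift-utils', 'go-server']
```

Treating unlicensed repositories as excluded, rather than as permissive, reflects the developers' objection described above: the absence of a license is not the same as permission.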

To boost the AFM models’ mathematics skills, Apple specifically included in the training set math questions and answers from webpages, math forums, blogs, tutorials and seminars, according to the paper. The company also tapped “high-quality, publicly-available” data sets (which the paper doesn’t name) with “licenses that permit use for training … models,” filtered to remove sensitive information.

All told, the training data set for the AFM models weighs in at about 6.3 trillion tokens. (Tokens are bite-sized pieces of data that are generally easier for generative AI models to ingest.) For comparison, that’s less than half the number of tokens — 15 trillion — Meta used to train its flagship text-generating model, Llama 3.1 405B.
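The “less than half” comparison follows directly from the two token counts given above:

```latex
\frac{6.3 \times 10^{12}\ \text{tokens (AFM)}}{15 \times 10^{12}\ \text{tokens (Llama 3.1 405B)}} = 0.42 < 0.5
```

In other words, Apple’s reported data set is about 42% the size of Meta’s.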

Apple sourced additional data, including data from human feedback and synthetic data, to fine-tune the AFM models and attempt to mitigate any undesirable behaviors, like spouting toxicity.

“Our models have been created with the purpose of helping users do everyday activities across their Apple products, grounded in Apple’s core values, and rooted in our responsible AI principles at every stage,” the company says.

There’s no smoking gun or shocking insight in the paper — and that’s by careful design. Rarely are papers like these very revealing, owing to competitive pressures but also because disclosing too much could land companies in legal trouble.

Some companies training models by scraping public web data assert that their practice is protected by fair use doctrine. But it’s a matter that’s very much up for debate and the subject of a growing number of lawsuits.

Apple notes in the paper that it allows webmasters to block its crawler from scraping their data. But that leaves individual creators in the lurch. What’s an artist to do if, for example, their portfolio is hosted on a site that refuses to block Apple’s data scraping?
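Crawler blocking of this kind is conventionally done through a site’s robots.txt file, which is why the decision rests with whoever runs the site rather than with the individual creator. The sketch below uses Python’s standard `urllib.robotparser` to show that the same portfolio URL is allowed or refused depending only on the host’s robots.txt. It assumes the crawler identifies itself with the `Applebot` user-agent token; the robots.txt contents and the URL are hypothetical, and Apple’s crawler applies its own parser, not this one.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt files for a hosting site.
# This one blocks only the crawler identifying itself as "Applebot".
BLOCKS_APPLEBOT = """\
User-agent: Applebot
Disallow: /
"""

# This one allows every crawler everywhere (an empty Disallow means "allow all").
ALLOWS_ALL_CRAWLERS = """\
User-agent: *
Disallow:
"""


def crawler_may_fetch(robots_txt: str, url: str, user_agent: str = "Applebot") -> bool:
    """Return True if the given robots.txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    portfolio = "https://example.com/artists/jane/portfolio/"
    print(crawler_may_fetch(BLOCKS_APPLEBOT, portfolio))      # False
    print(crawler_may_fetch(ALLOWS_ALL_CRAWLERS, portfolio))  # True
```

Nothing in the artist’s own pages changes between the two calls; only the host’s file does, which is the gap the article points to.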

Courtroom battles will decide the fate of generative AI models and the way they’re trained. For now, though, Apple’s trying to position itself as an ethical player while avoiding unwanted legal scrutiny.

FAQs

Which devices can run Apple Intelligence?

For now, Apple Intelligence runs on only two current iPhones, the iPhone 15 Pro and iPhone 15 Pro Max, and it will come to all iPhone 16 models. It also works on iPads and Macs with M1 or later chips, through the iPadOS 18.1 and macOS 15.1 betas, respectively.

How do I request Apple Intelligence?

How to join the Apple Intelligence waitlist
  1. Open Settings.
  2. Choose Apple Intelligence & Siri.
  3. Select Join Waitlist.

Which iPhones will get Apple Intelligence?

To get access to the iOS 18.1 beta with Apple Intelligence, you'll need to be an Apple Developer Program member and also have an iPhone 15 Pro or iPhone 15 Pro Max.

What will Apple Intelligence work on?

Apple Intelligence is compatible with these devices:
  • iPhone 15 Pro Max (A17 Pro)
  • iPhone 15 Pro (A17 Pro)
  • iPad Pro (M1 and later)
  • iPad Air (M1 and later)
  • MacBook Air (M1 and later)
  • MacBook Pro (M1 and later)
  • iMac (M1 and later)
  • Mac mini (M1 and later)

Who leads AI at Apple?

Apple is behind on artificial intelligence. Now, the company is getting ready to unleash its first wave of user-facing AI products. And behind that push is John Giannandrea, a Silicon Valley veteran who is Apple's top executive in charge of AI strategy.

What can the Apple Intelligence beta do?

Apple Intelligence provides Siri with enhanced action capabilities, and developers can take advantage of predefined and pretrained App Intents across a range of domains to not only give Siri the ability to take actions in your app, but to make your app's actions more discoverable in places like Spotlight, the Shortcuts ...

Is Apple Intelligence available worldwide?

For the early preview of Apple Intelligence, Apple advises that before downloading the update, “Both device language and Siri language must be set to U.S. English, and the device region must be set to the United States.” Apple has also clarified that Apple Intelligence is not available in the EU or China.

How do I turn on Apple Intelligence?

How to enable Apple Intelligence on your iPhone
  1. Open the Settings app.
  2. Go to the Apple Intelligence & Siri menu.
  3. Tap on the "Join the Apple Intelligence waitlist" option.

Does Apple Intelligence use ChatGPT?

Apple Intelligence includes a ChatGPT integration, according to CNET senior editor Lisa Eadicicco, whose report is headlined "Apple Intelligence Brings AI to the iPhone With ChatGPT Integration and More." Apple's approach with AI is to understand personal context when delivering answers and carrying out tasks.
