New streaming multiprocessors: The Ampere GPUs feature newly designed Streaming Multiprocessors (the equivalent of AMD’s Compute Units), promising up to twice the performance of Turing. NVIDIA is using a new metric here, Shader-TFLOPS, with Ampere’s SMs together set to offer up to 30 Shader-TFLOPS of performance. Traditionally, NVIDIA tweaks the SM every generation, changing the core count, the SFUs and/or the arrangement of the shaders. With Turing we saw a major change to the SM, with the inclusion of separate Integer and Floating-Point cores for asynchronous execution as well as the addition of RT and Tensor cores. I don’t expect the same kind of overhaul this generation, although we should see a sizeable IPC gain per CUDA core (and SM), potentially due to a revamped warp scheduler and dispatcher.
Nvidia's existing range of RTX 20 cards uses an optimized 16nm Turing microarchitecture, but as Wccftech reports, analysts believe Nvidia is preparing the 7nm Ampere architecture for launch early next year.
RT Cores: New dedicated RT Cores deliver 2x the throughput of the previous generation, plus concurrent ray tracing, shading and compute, with 58 RT-TFLOPS of processing power.
Third-gen Tensor Cores: New dedicated Tensor Cores, with up to 2x the throughput of the previous generation, making it faster and more efficient to run AI-powered technologies like NVIDIA DLSS, and 238 Tensor-TFLOPS of processing power.
As expected, both the RT and Tensor cores are getting a major uplift, delivering twice the performance of their Turing brethren. Hopefully, ray tracing will become more mainstream with these improvements.
Google Is Going In VR. Google has fully embraced the virtual reality experience and it is dedicating a lot of resources to it. In fact, Google Cardboard was once considered to be a side project for the company before it became a hit. Some people say that Google Maps' street view, which launched in 2007, was an early example of virtual reality. In recent years, Google hired a lot of people specifically for virtual reality and they are researching all aspects of it.
NVIDIA RTX IO: Enables rapid GPU-based loading and game asset decompression, accelerating input/output performance by up to 100x compared with hard drives and traditional storage APIs. In conjunction with Microsoft’s new DirectStorage for Windows API, RTX IO offloads dozens of CPU cores’ worth of work to the RTX GPU, improving frame rates and enabling near-instantaneous game loading.
This is similar to how the Velocity Architecture on the Xbox Series X improves loading times and I/O throughput. Basically, DirectStorage, and thereby RTX IO, allows assets and textures to be loaded in the background without breaking the immersion with a loading screen.
Previous gen games had an asset streaming budget of ~50MB/s which even at smaller 64k block sizes (i.e. one texture tile) amounts to only hundreds of IO requests per second. With NVMe drives capable of multiple gigabytes a second, taking advantage of the full bandwidth quickly explodes this to tens of thousands of IO requests a second. Taking the Series X’s 2.4GB/s capable drive and the same 64k block sizes as an example, that amounts to >35,000 IO requests per second to saturate it.
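The request rates quoted above can be checked with a quick calculation. This is only a sketch: it assumes that "64k" means 64 KiB (65,536 bytes) and that MB/s and GB/s are decimal units, neither of which the quoted text spells out, and the function name is purely illustrative.

```python
# Back-of-the-envelope check of the IO request rates quoted above.
# Assumptions (not stated in the text): "64k" = 64 KiB, and MB/GB are decimal.

KIB = 1024
MB = 10**6
GB = 10**9

BLOCK_SIZE_BYTES = 64 * KIB  # one texture tile, per the quote


def io_requests_per_second(bandwidth_bytes_per_s: float, block_size_bytes: int) -> float:
    """Requests per second needed to sustain a bandwidth at a fixed block size."""
    return bandwidth_bytes_per_s / block_size_bytes


# Previous-gen streaming budget of ~50 MB/s
prev_gen = io_requests_per_second(50 * MB, BLOCK_SIZE_BYTES)

# Xbox Series X drive at 2.4 GB/s
series_x = io_requests_per_second(2.4 * GB, BLOCK_SIZE_BYTES)

print(f"Previous gen: ~{prev_gen:.0f} requests/s")
print(f"Series X:     ~{series_x:.0f} requests/s")
```

Under these assumptions the previous-gen budget works out to several hundred requests per second and the Series X drive to a little over 36,000, which is consistent with the "hundreds" and ">35,000" figures in the text.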
Existing APIs require the application to manage and handle each of these requests one at a time first by submitting the request, waiting for it to complete, and then handling its completion. The overhead of each request is not very large and wasn’t a choke point for older games running on slower hard drives, but multiplied tens of thousands of times per second, IO overhead can quickly become too expensive preventing games from being able to take advantage of the increased NVMe drive bandwidths.
The DirectStorage API is architected in a way that takes all this into account and maximizes performance throughout the entire pipeline from NVMe drive all the way to the GPU.
It does this in several ways: by reducing per-request NVMe overhead, enabling batched many-at-a-time parallel IO requests which can be efficiently fed to the GPU, and giving games finer grain control over when they get notified of IO request completion instead of having to react to every tiny IO completion. (Source: Microsoft)
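To illustrate the difference between handling requests one at a time and submitting them as a batch with a single completion notification, here is a conceptual sketch. It is not the DirectStorage API: `read_block`, `load_one_at_a_time` and `load_batched` are hypothetical names, the "read" is a stand-in that just fabricates bytes, and a thread pool is used only to model requests in flight.

```python
# Conceptual model only: per-request handling vs. batched submission.
# Nothing here is a real storage API; read_block is a hypothetical stand-in.
from concurrent.futures import ThreadPoolExecutor, wait


def read_block(block_id: int) -> bytes:
    """Hypothetical stand-in for reading one asset block from storage."""
    return bytes([block_id % 256]) * 16


def load_one_at_a_time(block_ids):
    """Submit a request, wait for it, handle its completion, then repeat."""
    results, notifications = [], 0
    with ThreadPoolExecutor(max_workers=4) as pool:
        for block_id in block_ids:
            future = pool.submit(read_block, block_id)
            results.append(future.result())  # block until this one request is done
            notifications += 1               # the app reacts to every completion
    return results, notifications


def load_batched(block_ids):
    """Submit every request up front and react once, when the batch is done."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(read_block, b) for b in block_ids]
        wait(futures)                        # a single wait for the whole batch
        notifications = 1
        results = [f.result() for f in futures]
    return results, notifications


blocks = list(range(8))
serial_results, serial_notifications = load_one_at_a_time(blocks)
batched_results, batched_notifications = load_batched(blocks)
print(serial_notifications, batched_notifications)
```

Both paths return the same data. The batched path keeps several requests in flight at once and involves the application only once per batch, which is the kind of per-request overhead reduction the quoted passage describes.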
This is done by offloading the I/O instructions from the CPU to the GPU (in the case of RTX IO), reducing the CPU overhead by as much as two-thirds. This becomes more and more important as 4K textures become more widespread and SSD speeds increase manifold. DirectStorage gives developers low-level access to the storage controller of SSDs, allowing for fine-grained optimizations.
GDDR6X Memory @ 19Gbps: NVIDIA has worked with Micron to create an improved variant of GDDR6 graphics memory for the RTX 30 Series, GDDR6X. It provides memory bandwidth of up to 760 GB/s for the RTX 3080 and nearly 1TB/s for the 3090, without the use of expensive HBM memory.
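The bandwidth figures follow from the per-pin data rate multiplied by the memory bus width. The sketch below assumes a 320-bit bus for the RTX 3080 and a 384-bit bus at 19.5Gbps for the RTX 3090; those values come from NVIDIA's published specifications rather than from this article, and the function name is illustrative.

```python
# Peak memory bandwidth = per-pin data rate (Gbit/s) * bus width (bits) / 8 bits per byte.
# Assumed inputs (not stated in the article): 320-bit bus for the RTX 3080,
# 384-bit bus and 19.5 Gbps for the RTX 3090.


def memory_bandwidth_gb_s(data_rate_gbps: float, bus_width_bits: int) -> float:
    """Theoretical peak memory bandwidth in GB/s."""
    return data_rate_gbps * bus_width_bits / 8


rtx_3080 = memory_bandwidth_gb_s(19.0, 320)
rtx_3090 = memory_bandwidth_gb_s(19.5, 384)

print(f"RTX 3080: {rtx_3080:.0f} GB/s")
print(f"RTX 3090: {rtx_3090:.0f} GB/s")
```

With these inputs the RTX 3080 comes out at 760 GB/s, matching the figure above, and the RTX 3090 at 936 GB/s, which is the "nearly 1TB/s" mentioned.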
The VR Today. Virtual Reality is currently growing in popularity, and while companies like Oculus are losing some of their customers because of unpopular marketing practices, other devices, including the HTC Vive, are taking the VR stage. Furthermore, with Google Cardboard creating the concept and other companies taking note, Smartphone Virtual Reality Goggles are letting consumers easily enjoy immersive virtual and augmented reality. With a huge consumer base, multiple platforms for development, and a shortage of VR games and experiences, small start-ups as well as huge companies are investing large amounts of money into the development of content for Virtual Reality, which might very well help VR finally achieve the world-wide recognition it failed to achieve on the market for years.
Samsung 8nm process: It appears that Twitter was right all along. NVIDIA’s consumer Ampere lineup will stick with Samsung’s 8nm LPP process. NVIDIA calls this a “custom” 8nm process but I don’t think the density and efficiency will be any different.
Meet the new GPU Heavyweights: RTX 3070, 3080 and 3090
GeForce RTX 3070: Starting at $499, the RTX 3070 is expected to be faster than the RTX 2080 Ti at less than half the price, and on average is 60 percent faster than the original RTX 2070. It is equipped with 8GB of GDDR6 memory, hitting the sweet spot of performance for games running at 4K and 1440p resolutions.
GeForce RTX 3080: Starting at $699, the RTX 3080 will be the next go-to graphics card for 4K enthusiasts and gamers. Featuring 10GB of GDDR6X memory running at 19Gbps, the RTX 3080 is supposed to be 80-90% faster than the RTX 2080 and around 30-40% faster than the RTX 2080 Ti.
GeForce RTX 3090: At the very top, we have the RTX 3090, priced at $1,499. Owing to that price tag (similar to the Titan), it’ll be more of a titular flagship, with most gamers flocking to the 3080. With as much as 24GB of GDDR6X memory, NVIDIA is aiming this card at workstations rather than gamers. It’s expected to be as much as 50% faster than the Turing-based Titan RTX.
The above shader/core counts include both the Floating-point and Integer cores. To get the actual figure, halve the listed count: 5,248 CUDA cores for the 3090, 4,352 for the 3080 and 2,944 for the 3070.
Nintendo’s Virtual Boy 3D Gaming Console. Similar to SEGA, Nintendo also had the vision of putting out a Virtual Reality headset for the gaming market. They even went as far as putting a VR headset on the market, but unfortunately it didn’t make it far. Released in the mid 1990s and known as the Virtual Boy, the device was a 3D gaming console that had a 3D viewing system rigged out to look like virtual reality. While it was way cheaper than the other options on the market at the time, the device also didn’t manage to truly spark the VR movement, simply because it lacked head-tracking and quality graphics and only offered stereoscopic 3D display.
Special Features of Founders’ Edition GPUs
Dual-Axial, Flow-Through Thermal Solution: Up to 2x more cooling performance, with a stunning unibody design. Gamers and creators will be able to enjoy more performance while their GPUs simultaneously run cooler and quieter than ever.
Exquisite Mechanical and Electrical Design: A stronger mechanical structure with a new low-profile leaf spring along with a new 12-pin power connector. This allows more space for components and cooling, and is compatible with 8-pin connectors in existing power supplies, with an included adapter.
HDMI 2.1: The increased bandwidth provided by HDMI 2.1 allows, for the first time, a single cable connection to 8K HDR TVs for ultra-high-resolution gaming.
AV1 Decode: First discrete GPUs with support for the new AV1 codec, enabling gamers to watch up to 8K HDR internet video using as much as 50 percent less bandwidth.