
Nouveau Summer Project : Pdaemon -> Host & Fermi Scripting Engine

I had been warned that once I started working on this, there would be no definitive end. I guess at some point you have to call a stop, just so that you can move on to the next phase. This post marks the end of the X.Org EVOC program and the beginning of my journey as a Nouveau contributor.
The second phase of the program has been a rather complex one, filled with unexpected hurdles.

Many changes had to be made to the command submission algorithm that we had originally thought was fit to be implemented. After testing, the new implementation proved to work almost completely bug-free.


Probably the most glaring difference is the omission of the 'memcpy' and 'wrap_around' functions.
The 'memcpy' function performed the simple task of copying a fixed length of data from a given location to another specified destination. It took three arguments, namely source, destination and length. This simple function, however, had a very basic problem: it did not account for wrapping around the ring buffer and hence was not compatible with it. To fix this problem, a totally new function named 'memcpy_ring' has been introduced. 'memcpy_ring' takes five arguments. Three of them are the same as for 'memcpy', namely destination, source and length. The two new arguments, ring_base and ring_size, account for wrapping around the ring buffer. ring_base specifies the starting memory location of the ring buffer and ring_size specifies the total number of memory locations on the ring.
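
To illustrate the idea, here is a minimal C sketch of a wrap-aware copy. It is not the pdaemon code: it assumes that the source lies inside the ring and the destination is a plain linear buffer, and the argument order and types are my own guesses for the sake of the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Copy `len` bytes out of a ring buffer into a linear destination.
 * Assumption (not taken from the pdaemon source): `src` points somewhere
 * inside [ring_base, ring_base + ring_size) and `dst` does not wrap.
 */
static void memcpy_ring(uint8_t *dst, const uint8_t *src, size_t len,
                        const uint8_t *ring_base, size_t ring_size)
{
    assert(ring_size > 0);
    assert(src >= ring_base && src < ring_base + ring_size);
    assert(len <= ring_size);

    /* Work with an offset into the ring so wrapping is a simple reset. */
    size_t offset = (size_t)(src - ring_base);

    for (size_t i = 0; i < len; i++) {
        dst[i] = ring_base[offset];
        offset++;
        if (offset == ring_size)
            offset = 0; /* reached the end of the ring: continue at its base */
    }
}
```

With an 8-byte ring, copying 4 bytes starting at offset 6 reads offsets 6, 7, 0 and 1, which is exactly the case a plain 'memcpy' gets wrong.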

The 'wrap_around' function performs the same task as before. The main difference is that it is now implemented as a macro rather than a function.
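
As a rough sketch of what such a macro could look like in C (the real one lives in the FµC code and may differ), assuming that its job is to fold a position that has run past the end of the ring back into it. The name WRAP_AROUND, its arguments and the helper ring_advance are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical macro: fold `pos` back into the ring
 * [ring_base, ring_base + ring_size). Assumes pos >= ring_base.
 */
#define WRAP_AROUND(pos, ring_base, ring_size) \
    ((ring_base) + (((pos) - (ring_base)) % (ring_size)))

/* Hypothetical helper: advance a ring position by `step` locations. */
static uint32_t ring_advance(uint32_t pos, uint32_t step,
                             uint32_t ring_base, uint32_t ring_size)
{
    assert(ring_size > 0);
    assert(pos >= ring_base);
    return WRAP_AROUND(pos + step, ring_base, ring_size);
}
```

A macro avoids the call overhead of a function, at the price of evaluating ring_base twice, so the arguments should be free of side effects.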

The next step in the process was to design a brand new ISA. Its primary purpose is to execute scripts on pdaemon. These scripts are meant to provide an easy way to achieve memory re-clocking.

As the first step, the existing HWSQ was studied carefully and an encode_decode implementation was done in C. This not only helped in reaching a similar implementation for FSE [Fermi Scripting Engine] at a faster pace, but also made HWSQ easier to understand.
To design the ISA [FSE], the following operations were targeted:

  1. Delay
  2. MMIO Write
  3. MMIO Mask
  4. MMIO Wait
  5. Send_msg / Pdaemon -> Host
Even though I tried to design the ISA myself and spent a lot of time on it, I was unable to. I believe I was not up to the challenge. Luckily, mupuf had foreseen this and already had a basic layout. He took some time out, and together we arrived at a complete version of the ISA. The ISA reads as follows:

https://github.com/Supreetpal/evoc-scratch/blob/master/FSE.txt
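
The actual encoding is defined in the file linked above. To give an idea of what the MMIO operations in the list are expected to do, here is a small C sketch that runs against a simulated register array. The semantics of mask and wait are assumed to follow Nouveau's usual nv_mask and nv_wait helpers; the function names, the register array and the poll limit are hypothetical. Delay and Send_msg are left out because they depend on the pdaemon timer and the host communication path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_REGS 16

/* Simulated MMIO space, indexed by register number (not a real GPU map). */
static uint32_t regs[NUM_REGS];

static uint32_t mmio_read(uint32_t reg)
{
    assert(reg < NUM_REGS);
    return regs[reg];
}

/* MMIO Write: store a value into a register. */
static void mmio_write(uint32_t reg, uint32_t val)
{
    assert(reg < NUM_REGS);
    regs[reg] = val;
}

/* MMIO Mask: replace only the bits selected by `mask`, return the old value. */
static uint32_t mmio_mask(uint32_t reg, uint32_t mask, uint32_t data)
{
    uint32_t old = mmio_read(reg);
    mmio_write(reg, (old & ~mask) | data);
    return old;
}

/* MMIO Wait: poll until (reg & mask) == data, giving up after max_polls. */
static bool mmio_wait(uint32_t reg, uint32_t mask, uint32_t data,
                      uint32_t max_polls)
{
    for (uint32_t i = 0; i < max_polls; i++) {
        if ((mmio_read(reg) & mask) == data)
            return true;
    }
    return false; /* timed out */
}
```

A re-clocking script would then boil down to a sequence of such writes, masks and waits, with delays in between where the hardware needs time to settle.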


The implementation was a three-step process, each step producing one file:

  1. FSE.h
  2. FSE_encode_decode.c
  3. FSE.fuc
This final FµC implementation would then be merged into pdaemon. The first two files were similar to the work done for HWSQ. They should be referred to in order to understand how FSE works, and can be found below. After the completion of FSE.h and FSE_encode_decode.c, the logical implementation was in place and working. This left porting the code to FµC as the next major step, which would finally be implemented as a part of pdaemon.

FSE.h - 

FSE_encode_decode.c - 
The FµC implementation seemed rather straightforward at first, but after the initial commit and testing I ran into errors. It turned out that I was trying to access unaligned memory locations. Since the existing load command 'ld' was not sufficient for this, a group of three ld_XX functions was implemented in FµC: ld_32, ld_16 and ld_08, for 32-bit, 16-bit and 8-bit loads respectively.
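
The idea behind these helpers can be shown with a C analogue. This is only a sketch and not the FµC code: it assumes little-endian byte order and builds the wider loads out of byte loads, so the address never has to be aligned. The `mem` and `addr` parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 8-bit load: a single byte is always "aligned". */
static uint8_t ld_08(const uint8_t *mem, size_t addr)
{
    assert(mem != NULL);
    return mem[addr];
}

/* 16-bit load from any address, assembled from two byte loads. */
static uint16_t ld_16(const uint8_t *mem, size_t addr)
{
    return (uint16_t)(ld_08(mem, addr) |
                      ((uint16_t)ld_08(mem, addr + 1) << 8));
}

/* 32-bit load from any address, assembled from two 16-bit loads. */
static uint32_t ld_32(const uint8_t *mem, size_t addr)
{
    return (uint32_t)ld_16(mem, addr) |
           ((uint32_t)ld_16(mem, addr + 2) << 16);
}
```

For the bytes 0x11 0x22 0x33 0x44 0x55 0x66, a 32-bit load at the odd address 1 yields 0x55443322. The cost is several narrow loads instead of one wide load.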


The current implementation of FSE in FµC looks as follows. This implementation is still under testing and has not yet been proven to work completely. I should be able to find enough time to submit a final patch to pdaemon with a working implementation while at XDC, or as soon as I return.




