Email address: Isamu.Shigemori@eng.sun.com

Use sysdef(1M) to display all system-defined parameters (like SHMNI).

    shmid = shmget(key, size, flags);   /* shmid indexes shmid_ds[SHMNI] */
    addr  = shmat(shmid, ...);

Shared memory shmid_ds[] entry:
    ptr to anon_map --> anon_hdr --> **array_chunk --> anon --> page offset or page_t
                                                         |--> swapfs vnode
A process that has this shared memory segment attached has the links:
    proc struct --> as --> seg --> segvn_data --> anon_map
---------------------------------------------------------------------
read(fd, buf, nbytes) --> ufs_read:
    base = segmap_getmapflt(vnode, offset, ...);
    uiomove(base, userbuffer, nbytes, ...);
    /* A page fault now happens for the kernel on the base
       address, so it reads the pages in from the vnode. */
    segmap_release(...);
---------------------------------------------------------------------
struct as kas;    /* kernel address space */

kas --> seg --> (segmap_ops, smap_data, i.e. list of smap structs)
          |--> page table

> kas::walk seg | $<seg

For each seg (say 256MB) there is smap_data for all pages of the 256MB.
Each smap struct holds (vnode, offset, page_t, ...).
---------------------------------------------------------------------
The anon struct: one per anonymous page of memory.  It records where
the page lives -- in memory, on swap, or both -- and maintains a refcnt.
---------------------------------------------------------------------
page_t, which gets you to the page: one per physical page, in an array
set up at boot.  It records how, and by whom, the page is used.
---------------------------------------------------------------------
The anon_map maintains a refcnt.  When it reaches 0, the anon pages are
freed, and so are the anon_hdr and the array chunks.
---------------------------------------------------------------------
A process's page tables are shared by all of its segments.  Page-table
entries are allocated from them and entered into the seg structures.
The page table is maintained directly by the HAT.  Note that you can
travel from a page-table entry to its page, and from a seg entry to its
page_t (i.e. its page-table entry).
---------------------------------------------------------------------
With vfork, the parent waits until the child exits or execs.
---------------------------------------------------------------------
Multiprocessor System Architectures, Ben Catanzaro, SunSoft 1995
-- good book.
---------------------------------------------------------------------
Level 1 cache, level 2 cache, etc.  sun4d uses the Sun/Xerox XDBus.
The paper by Talluri and Khalidi uses a new page-table algorithm for
64-bit address spaces.
---------------------------------------------------------------------
64-bit addressing:
    bits  0-12: page offset
    bits 13-16: index
    bits 17-63: virtual page number

virtual page number --> hash --> (vpn, tag) --> use the index to look
into the page-table pte group (a 16-entry table, since the index is
4 bits) --> get the page.

TLB --> TSB --> then map it.
---------------------------------------------------------------------
The level 1 cache is a virtual-address cache (write-through); level 2
is a memory cache (write-back).  A virtual-address mapping carries a
context, so the mappings of different running processes are kept apart.

    CPU --> Level 1 --> Level 2
      |--> TLB

Only if the physical address the TLB maps matches the level 1 result is
that result used (what is this????).  If the TLB misses, level 1 is not
used.

To list kernel symbols, use:
    # nm -x /dev/ksyms | grep variable_name

lotsfree is a kernel variable: when the number of free pages falls
below this threshold, the page scanner starts (at the slowscan rate).
It uses the clock algorithm.  Solaris 7 has optional priority paging,
disabled by default, with a new variable cachefree: it pages out pages
belonging to file data, not executables.
---------------------------------------------------------------------
Condition variables and mutexes:

    mutex_enter(&mp);     /* mutex lock */
    cv_wait(&cv, &mp);    /* release lock; when signaled, reacquire lock, return */
    mutex_exit(&mp);

    ---- other thread ----
    mutex_enter(&mp);
    cv_signal(&cv);       /* cv: condition variable */
    /* You need to hold the lock when you call cv_signal. */
    mutex_exit(&mp);
---------------------------------------------------------------------
cv_wait_sig --> the wait can be interrupted by a signal (e.g. kill);
cv_wait cannot.  See man condvar.
---------------------------------------------------------------------
Reader/writer lock: many threads can hold the read lock, but only one
write lock can be active.  See man rwlock (rwlock_t, rwlock_init,
etc.).  When many threads are waiting to acquire the lock, writers get
preference.
---------------------------------------------------------------------
Counting semaphores enqueue the waiting threads on the semaphore.
---------------------------------------------------------------------
The dispatch priorities.  dispadmin -c TS -g displays the priority
table:

    # Time Sharing Dispatcher Configuration
    RES=1000

    # ts_quantum  ts_tqexp  ts_slpret  ts_maxwait  ts_lwait  PRIORITY LEVEL
           200        0        50          0          50     #     0
           200        0        50          0          50     #     1
           200        0        50          0          50     #     2

Threads at lower priority get bigger time slices (they will be
preempted within the slice by higher-priority threads anyway).

    if (runnable without running for up to ts_maxwait seconds)
        new priority = ts_lwait;

This is how a compute-bound thread avoids starvation: it acquires a
higher priority after some time.

Scheduling algorithm: if a thread gets preempted, it remains on the
dispatch queue at the same priority with its remaining time left.  If
the preemption is due to using up all of its time, the priority is
reduced.

Among all runnable threads:
    1. Get the highest-priority RT thread on my dispq.  If none, go on.
    2. Get the highest-priority non-bound RT thread from any other dispq.
    3. Get the highest-priority thread on my dispq.
    4. Get the highest-priority non-bound thread on any other dispq.
    5. If none found, run the idle thread.

RT threads stay at the same priority even after using up their time
slice.
---------------------------------------------------------------------
Priority inversion -- the interesting case where a higher-priority
thread is blocked by a lower-priority one.  With priority inheritance:

    T1 (pri=0)               T2 (pri=100)    T3 (pri=10)   T4 (pri=11)
    lock                     ...             ...           ...
    (pri goes 0 -> 100)      lock, blocks    ...           ...
    unlock (eprio was 100)   runs
---------------------------------------------------------------------
Use more /usr/dict/words on one terminal.  Then:

    # ps -eo addr,comm | grep more
    30001659540 more

    > 30001659540$<proc2u

Look at nfiles=127.  Get the 4th file entry, then $<file:

    > 30001580c00$<file
    0x30001580c08:  flag     count    vnode
                    2001     1        30001c39350
    0x30001580c18:  offset   cred         audit_data
                    2000     3000065bc28  0

offset is the lseek offset.  count gets updated on dup and fork.

    > 30001c39350$<vnode
    0x30001c39358:  flag     refcnt   vfsmnt
                    0        2        0
    0x30001c39368:  op            vfsp      stream
                    ufs_vnodeops  1043f4f0  0
    0x30001c39380:  pages     type   rdev
                    1062a840  1      0
    0x30001c39398:  data         filocks   shrlocks
                    30001c392c0  0         0

The refcnt is > 1 because the DNLC also references it.  Use vmstat -s
to see the DNLC (directory name lookup cache) hit rate.  data points to
the inode.

    > 30001c392c0$<inode

The inode number is displayed as 14090.

    0x30001c392f8:  direct_blocks
                    1c8b8  1c8c0  1c8c8  1c8d0  1c8d8  1c8e0
                    1c8e8  1c8f0  1c8f8  1c900  1c908  1c910
    0x30001c39328:  indirect_blocks
                    37c30  0  0

    $ ls -li /usr/dict/words
    14090 -r--r--r--  1 root  bin  206662 Jan  5  2000 /usr/dict/words

Confirm that the inode displayed is the same.  The direct blocks listed
in the inode contents are real disk block numbers.  Use

    dd if=/dev/dsk/c0t0d0s0 bs=1k iseek=dec_blkno count=8

e.g.

    dd if=/dev/dsk/c0t0d0s0 bs=1k iseek=116920 count=8

(116920 is 0x1c8b8, the first direct block.)

mail to: max@axiomax.com    Max Bruning