Speed V/S timings it's a zen thing
Archer
04-13-2009, 11:30 AM
From real world benchmarks what can we really conclude?
http://img258.imageshack.us/img258/1913/charti.jpg
http://img20.imageshack.us/img20/3296/chart2.jpg
The calculated timings are relative to a 200 MHz FSB only. The timings themselves have not actually changed, but their effect at the set FSB is an increase in strobes relative to the base 200 MHz FSB.
EnJoY
04-13-2009, 11:42 AM
If you have clock speed, you can sacrifice in regards to latencies. If you have only low speed, you need to tighten latencies in order to make up bandwidth wherever you can. It's a balancing act, but less so with DDR3, given its huge clock speeds, than it was with DDR2, or especially with DDR, which was really all about latency.
Archer
04-13-2009, 12:03 PM
Thank you, I am hoping to get some data on DDR3 and do some direct comparisons. Tom's did an OK review, but it was lacking. I posted about it and was ignored (I guess if they do all the testing then they might offend the manufacturers that sent them the product for testing).
zanzabar
04-13-2009, 05:17 PM
i think that ddr3 is all about timings since the clocks are easy to max out a board on mhz
the balance is when u have a choice like with ddr2 when u can go 1100 or 1033
You know, it would depend on the CPU arch. being used... and then, the task at hand. The second important matter is, the difference in timings and speeds we're talking about comparing here.
For instance, DDR3-2000 9-9-9 from DDR3-1066 7-7-7 (speed) is a much greater difference than DDR3-1333 5-5-5 from DDR3-1333 7-7-7 (timings), even without tweaking one over another. With DDR3, the speed possibilities are large so primarily one would push for speed first and then tweak latencies the best you can. With DDR2, the general daily-runnable clock range is far lower and easily attainable by most so the tweakability is confined to latencies. Highlights the points made by the above guys.
I really can't settle on a generic answer for most parameters, but one thing I can settle on is that raw speed and some subtimings (i.e. MAL/tRD) are the most critical and are where people should focus most.
In the case of Agena/Deneb K10 arch., RAM clock speed counts most as Sami had confirmed last year. The catch is that to see this, you have to keep internal subtimings fixed or the data becomes skewed. 1250 5-5-5 T2 is lower latency and higher bandwidth than 1000 4-4-4 T1 with this arch. if you keep major subtimings fixed. Likewise, 1066 5-5-5 2T is higher bandwidth and lower latency than 800 3-3-3 1T with this arch.
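For anyone wanting to put rough numbers on these comparisons, here is a minimal sketch of the usual paper arithmetic: CAS latency in nanoseconds is the CAS cycle count divided by the memory clock (half the DDR data rate), and peak bandwidth is the data rate times the bus width. The function names, the 64-bit bus and the dual-channel setup are assumptions for illustration, not something taken from the posts above. These are theoretical figures only; the claims above are about measured latency on K10, which also depends on NB speed and subtimings that this sketch ignores.

```python
def cas_latency_ns(data_rate_mts, cas_cycles):
    """Paper CAS latency in ns; the memory clock is half the DDR data rate."""
    clock_mhz = data_rate_mts / 2.0
    return cas_cycles / clock_mhz * 1000.0


def peak_bandwidth_gbs(data_rate_mts, bus_bytes=8, channels=2):
    """Theoretical peak bandwidth in GB/s (assumes a 64-bit bus per channel)."""
    return data_rate_mts * bus_bytes * channels / 1000.0


# The two pairs compared in the post above: (data rate, CAS cycles).
pairs = [((1250, 5), (1000, 4)), ((1066, 5), (800, 3))]
for pair in pairs:
    for rate, cas in pair:
        print(f"DDR2-{rate} CL{cas}: "
              f"{cas_latency_ns(rate, cas):.2f} ns CAS, "
              f"{peak_bandwidth_gbs(rate):.1f} GB/s peak")
```

On paper, DDR2-1250 CL5 and DDR2-1000 CL4 come out to the same CAS time, so any measured latency difference between them has to come from somewhere other than CAS itself, which fits the point about keeping subtimings fixed.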
Archer
04-15-2009, 08:30 AM
Guys I understand this but why will I only get responses and no willingness to do some testing? I want to actually see if what is true in my situation is applicable in all/most/no other situations. I can only do so much testing on 3 modern systems (I don't play with money builds) and some ddr/sdr dinosaurs.
What I want to do are some zip, AV, AVI/DIVX, MP3, (sandra ARTH, MULTI, and all memory) tests. I was hoping for more input, as what I am doing is looking for a direct correlation between timings/performance V/S speed/performance. Forget the 800 4-4-4 = 1200 #-#-# because if you don't use the bandwidth then you are still stuck with higher timings and that hurts. Besides, all I have ever seen from this calculation is an argument and no real (not synthetic) numbers.
zanzabar
04-15-2009, 07:06 PM
there has been testing and it depends on what u are doing: everest and sandra like clocks, super pi likes lower ns latency
Neuromancer
04-15-2009, 10:56 PM
I voted timings, because for what I do that is what matters (90% of time is spent on the desktop)
What matters more to me than either speed or timings is capacity.
Will not use less than 4GB anymore, since "Desktop" to me means multiple browsers, IMs, multimedia apps, encoding videos etc etc.
And then when I decide to game, I do not close anything :p
Archer
04-17-2009, 11:17 AM
Boy I really love these responses? If we want to make this forum noticeable then we need to put it on the map. Gee-wiz Walley I think I will post a link to another forum because I don't want to take the time to do anything and help grow this forum? Is that how we want to be "hey lets start off 2nd rate and stay that way".
Neuromancer
04-23-2009, 11:03 AM
Interesting what Chew found on an AM3 platform...
http://www.techreaction.net/forums/showpost.php?p=4459&postcount=1
TehGh0st
04-23-2009, 11:55 AM
IMHO it's a delicate balance that can be slightly skewed depending on what you're doing.
G
Archer
04-23-2009, 01:02 PM
Interesting what Chew found on an AM3 platform...
http://www.techreaction.net/forums/showpost.php?p=4459&postcount=1
Saw it. I made a post asking if more testing could be done (sorta).
IMHO it's a delicate balance that can be slightly skewed depending on what you're doing.
G
Agreed. My main thing is I want to see some real benefit to this amazing bandwidth other than synthetics.
Interesting what Chew found on an AM3 platform...
http://www.techreaction.net/forums/showpost.php?p=4459&postcount=1
DDR3-1486.8 CAS 5 = 3.36ns latency
2.9GHz NB, MRL value unknown
To compare to higher speed, efficiency wise, he'd need the same NB speed, MRL value, CR and the same latency of 3.36ns. Something like CAS 7 DDR3-2083 has the same latency. Then you can compare accurately.
Which will be faster in those benches, number crunches, A/V encoding, rendering and so on?
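The 3.36 ns figure above can be reproduced by dividing the CAS count by the DDR data rate. Below is a minimal sketch under that assumption; the function names are made up for illustration. Note that dividing by the actual memory clock (half the data rate) instead gives double the value, so figures are only comparable when they are computed the same way.

```python
def latency_ns(data_rate_mts, cas_cycles):
    """CAS cycles divided by the DDR data rate, the convention that yields 3.36 ns."""
    return cas_cycles / data_rate_mts * 1000.0


def equivalent_rate(data_rate_mts, cas_old, cas_new):
    """Data rate at which cas_new gives the same latency as cas_old at data_rate_mts."""
    return data_rate_mts * cas_new / cas_old


print(f"DDR3-1486.8 CAS 5: {latency_ns(1486.8, 5):.2f} ns")
print(f"DDR3-2083   CAS 7: {latency_ns(2083, 7):.2f} ns")
print(f"CAS 7 matches CAS 5 at DDR3-{equivalent_rate(1486.8, 5, 7):.1f}")
```

The matching speed simply scales with the CAS ratio (7/5), which is why something close to DDR3-2083 is the natural comparison point for CAS 7.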
Archer
04-23-2009, 11:25 PM
Well I have installed my new memory and I can feel a slight boost in responsiveness:cool: I will commence testing later this week as I want to see some real numbers first hand:D
Tony is doing an AM3/DDR3 comparison, hopefully he'll have some real world benchmarks: http://www.ocztechnologyforum.com/forum/showthread.php?p=377706
He's confined himself only to DDR3-1600/DDR3-1333 though with quite loose timings for both. The RAM clocking overall is poor for DDR2 and DDR3 on AMD K10 systems so tight latencies will always be the key, simply because it's the only possible way.
Archer
04-24-2009, 08:01 AM
Thanks for the info:up: I will love to see those results.
These guys had extensively reviewed DDR2 vs DDR3 differences across the board a while back. In case you missed it -
AM3 955/945 with DDR2/DDR3: http://ht4u.net/reviews/2009/amd_phenom2_955/index10.php
AM3 810/720 with DDR2/DDR3: http://ht4u.net/reviews/2009/amd_phenom2_am3/index16.php
Timings/speeds mentioned in the second are used in both reviews.
Archer
04-25-2009, 12:27 AM
Mine Dutch is kaput, but it does show some trends as I thought. In some tests, though, the 1066 7-7-7-20 outpaced the 1333 with the same timings? That is weird:eek:
Then you'll just love this Russian: http://www.ixbt.com/mainboard/phenom-2-ddr3.shtml :D
iXbt is known for the best IT review content I've seen over the years. The only problem being that most of it is in Russian and only some parts get ported over to the English site, iXbt Labs, after quite some period from the initial articles. That's their latest on the K10 45nm DDR2 vs DDR3 front. Granted that they've only explored frequency influence for now. The only thing you need to really look at is the setup and the results tables.
First table after the RAM timings, the left heading column:
Av. MEM Read, ns, 1 core
Av. MEM Write, ns, 1 core
Max. MEM Read, ns, 1 core
Max. MEM Write, ns, 1 core
Av. MEM Read, ns, 4 core
Av. MEM Write, ns, 4 core
Max. MEM Read, ns, 4 core
Max. MEM Write, ns, 4 core
Min. MEM pseudo-random latency, ns
Min. MEM random latency, ns
The two tables after that are very clear. They usually have video/image/render tests in there which are missing this time around, but I'm expecting them on the English sister sites' version soon.
Archer
04-25-2009, 06:42 PM
KTE any ideas about the anomalies with the speeds?
Really, they don't give enough info to ascertain anything so we're left in the dark. tRFC, tRC, tRTP and MRL can all change MEM performance drastically such that one speed setting which is usually expected to win will lose over another. There is also slight variation in testing so that small differences count as no difference. Many times you see these reviewers label 64.10 FPS and 64.18 FPS as some decent win for one hardware while that's just hogwash and counts as zilch difference between them to a buyer.
You have a K10 65nm CPU... mess around with the MRL setting whilst keeping the NB/CPU/RAM speed fixed and see for yourself. ;)
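To make the 64.10 vs 64.18 FPS point concrete, here is a small sketch that treats any difference below a chosen run-to-run noise margin as a tie. The 1% margin and the function names are assumptions for illustration; the real margin should come from repeating the same run several times on your own system.

```python
def percent_diff(a, b):
    """Difference of b relative to a, in percent."""
    return (b - a) / a * 100.0


def is_meaningful(a, b, noise_pct=1.0):
    """True only if the gap exceeds the assumed run-to-run noise margin."""
    return abs(percent_diff(a, b)) > noise_pct


a, b = 64.10, 64.18
print(f"difference: {percent_diff(a, b):.2f}%")
print("meaningful" if is_meaningful(a, b) else "within noise, call it a tie")
```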
This is what I settled on last ->
http://www.hostthenpost.org/uploads/039cd68752416dd03c7e80dd09f6469a_thumb.png (http://www.hostthenpost.org/uploads/039cd68752416dd03c7e80dd09f6469a.png)
Archer
04-26-2009, 11:02 PM
I am in Win 7 64 to begin my initial testing. As of this time I can feel the snap. I like it:D
NB speed and MEM sub-timings you generally run?
RMMA/RMMT are among the best tools I encourage you to test MEM with.
Archer
04-27-2009, 11:59 AM
Well, all of us know the story: change timings in the BIOS and ???nothing??? Crap, got to pull the reset jumper %$^&*$ ^&%$#*& ^&## $%^&!!!!!!!!!!!!!! So I decided on a mod, and guess what, it is a good IDEA. Everyone who does any OCing will need one, so here is what I did.
First I analyzed the problem:
http://img7.imageshack.us/img7/7844/bios4.jpg
That jumper is a pain to get to, as most are, so I took a long look and boom, it hit me: on-off-on. Put a switch in, as long as I could find a comparable connector.
I got a volunteer:cool:
http://img7.imageshack.us/img7/92/bios3r.jpg
After testing fit surgery commenced:D
http://img12.imageshack.us/img12/7615/bios1a.jpg
Be sure to label which wire is which.
http://img7.imageshack.us/img7/5664/bios2r.jpg
I will post more pics as I am using a toggle internally until I can find a rocker to mount on the outside of the case:pint:
Archer
04-27-2009, 12:01 PM
NB speed and MEM sub-timings you generally run?
RMMA/RMMT are among the best tools I encourage you to test MEM with.
Will do as soon as I finish my case mod I am sick of opening my box:banghead:
Archer
04-27-2009, 12:13 PM
http://img17.imageshack.us/img17/1200/timingsnow.jpg
The 1T with 4 sticks? not matched? has me a little perplexed. I guess I am lucky:drool:
This is my starting point. I have had much luck so far, but I got happy and wrote nothing down before I lost my BIOS settings (:banghead:yes there is a backup option and yes I am stupid for not using it:banghead:).
After I get back from the shack and finish my BIOS reset switch mod, I am going to start by dropping the timings to 4-4-4-12 3-20-4-2. There are more timings than this, but this is a start. When I set 1T manually I can't boot, but I get 1T on auto. I am cool with that.
CR 1 is definitely possible, we were able to get it working by December '07 with Phenoms after many BIOS bugs. 4 DIMM CR1 was very unstable above DDR2-800 last year though and even that required 5-5-5-15 timings. Not sure of what advancements they made in this direction now but your timings are good with CR1. Cross-check in AOD/CPU-Tweaker as they're much better known for picking up the correct timings with K10 whereas Sandra is known for errors in this.
Now that I think back, I did a CR1 vs CR2 investigation back then with a 9500 and a 9600BE. Very lengthy one too. I wish I still had the data.
Just checking if I have any saved ss from back then...
Here's what a K10 system with a decent BIOS in late 2007 could run quite perfectly with 2.20v vDIMM:
http://www.hostthenpost.org/uploads/2c3386b184b7a4bb7ad20529f3ca7221_thumb.png (http://www.hostthenpost.org/uploads/2c3386b184b7a4bb7ad20529f3ca7221.png)
That was my daily settings for about 3 weeks (10 tRAS though), and I'm pretty sure my 9500 sample was the best around back then too (until it decided to go POW). I'm 100% sure I tested DDR2-667/DDR2-800 5-5-5-15, 4-4-4-12, 3-3-3-10 and DDR2-1000 5-5-5-15 and 4-4-4-12, CR1 and CR2, Ganged and Unganged back then. The investigating took me months and obviously was the only reason I had the hardware. The test limit for 3-3-3-9 CR1 was DDR2-848 in my case. We were discussing this back then when Oldguy said he found his system running faster at DDR2-667 and DDR2-533 than at DDR2-800 and DDR2-1066 in some of his hobbies (I think it was video encoding, not sure now). More guys posted their own little short comparisons which ended up with the results portraying some tasks favoring latency whereas others favored bandwidth (over the other).
I definitely share your pain of absolutely hating going into a fully built system. I always tend to leave it twice as bad as I entered-> :o
GeorgeStorm
04-28-2009, 04:26 AM
Well, I said freq.
But I think it's a balance, though I tend to favour speed more than timings (to try to increase bandwidth).
I would be happy to run some tests.
What 'real world' applications do you have in mind?
I could run the same CPU + FSB speed, with varying mem freq and timings.....
I will also do the Everest mem benchmark at various settings to show the bandwidth, as it's easy to compare.
Just give me some details and I'll start working on it (well, try to, got my AS's coming up :P)
Archer
04-28-2009, 04:40 AM
Something similar to this:
http://img11.imageshack.us/img11/1681/reallty.jpg
Change your speeds and timings.
Choose your own stuff of various sizes and use the same software to do the testing at each speed/timing
GeorgeStorm
04-28-2009, 07:09 AM
Ok,
I'll run things like SuperPi 1M and 32M, Everest bandwidth bench,
I'll also check to see if RAM really affects wPrime and PCMark.
Archer
04-28-2009, 07:51 AM
To really see, you will want to zip, un-zip, encode audio and video, and possibly run a game frame rate test. The synthetics (except Super Pi) only show potential, not so much the actual effects of timings.
SNiiPE_DoGG
04-28-2009, 09:40 AM
My best memory testers are definitely Everest bench(very informative on all parts of the system really ;)) and Spi 32m. 1m is useless in testing ram :p
Archer
04-28-2009, 10:16 AM
I will agree, but there is more to be seen from a more extensive and inclusive set of benches. I mean, in all honesty, those synthetics show the potential theoretical limits. Real-world tests would mean doing a virus scan, a huge zip, converting a CD to MP3 and then MP3 to WMA, converting a movie from one format to another, and maybe some timedemo frame rate tests as well as 3DMark:cool:
Archer
04-28-2009, 10:41 AM
Test I: RAM stability at ~3 GHz using HT only, through AOD. I am not going for a record; I am looking for DATA related to the subject of this thread. There will be no testing at the baseline settings, as these are only for reference.
http://img56.imageshack.us/img56/153/base1.jpg
http://img123.imageshack.us/img123/1130/base3.jpg
^^Initial settings^^
HT 222 ram 444 = 888 DDR, 2000 x 2 Hyper Transport as this was max Phenom 1
http://img524.imageshack.us/img524/7076/htclock222mon.jpg
http://img524.imageshack.us/img524/6575/htclock222.jpg
http://img48.imageshack.us/img48/2787/htclock222mem.jpg
Testing for these settings will be logged in a sheet and posted at the end of testing for each memory speed/timings tested.
Archer
04-28-2009, 10:42 AM
Testing layout:
Encode CD to WMA: Iron Maiden, A Real Live Dead One, disc 1
Transcode WMA to MP3: A Real Live Dead One, disc 1
Encode WMV: home video, ~520 MB
Transcode WMV to DivX: the same home video as the encode
Virus scan: 73.4 GB
ZIP: transcoded files, MP3 as a batch and DivX as an independent large file, then a mix of file sizes.
Synthetics to see if results are even close to the same:cool: I bet they are not:rofl:
Sandra CPU and memory tests, and I might mess with Everest
Super Pi 1, 8 and 32 Meg (this should scale with the median setup)
3DMark
As I roll through this, all tests will be done at 3 GHz. The first test uses an HT adjustment in AOD (you can see the rest of the specs in post 34); the rest will use BIOS settings for the multi, varying memory speed and timings.
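For the stopwatch side of a layout like this, a tiny timing harness takes the guesswork out of watching the clock. This is only a sketch: `time_task`, the in-memory zlib workload and the three-run median are assumptions standing in for the real encode/zip/scan jobs, which would be launched from the same wrapper.

```python
import statistics
import time
import zlib


def time_task(task, runs=3):
    """Run task() several times; return (median seconds, all durations)."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        task()
        durations.append(time.perf_counter() - start)
    return statistics.median(durations), durations


# Stand-in workload (an assumption): compress a 4 MiB in-memory buffer.
payload = bytes(range(256)) * (16 * 1024)


def compress_payload():
    zlib.compress(payload, 6)


median_s, all_runs = time_task(compress_payload, runs=3)
print(f"compress: median {median_s:.3f} s over {len(all_runs)} runs")
```

Taking the median of a few runs per memory setting keeps one odd run (a background task kicking in, for instance) from skewing the comparison between speeds and timings.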
Archer
04-28-2009, 02:43 PM
:claps:let the games begin:
http://img141.imageshack.us/img141/2965/testingday1.jpg
Use the latest AOD 3.0 Archer, it's much better with better detection and tweaking options. In that AOD, you don't have the MRL option available (which is a limitation of the board), but the new AOD should force it up either way. Two different MRL settings are never comparable, and, it is freshly calculated on every boot (can be different on different boots for the same setting).
Your RAM is set to draw high amps there for such low speeds. I didn't push that much through even with DDR2-1040 4-4-4-4. Be careful. ;)
Archer
04-28-2009, 05:25 PM
I will update the AOD, and for this testing I will be running the memory no faster than the current settings. Everything else will be slower, with timings pushed a little tighter.
I will probably take a minimum of ~3-5 days to do this but I want it to be as comprehensive as possible.
The timings are set manually I have not touched anything else ?could this be a default?
The timings are set manually I have not touched anything else ?could this be a default?
It's set by default by the board BIOS, yep. You might remember the hoo haa about Agena+vDIMM=dead CPU, well it was these amp settings that were causing the deaths on corrupt mainboards due to their overly aggressive current settings. The settings are entirely arbitrary and usually nothing to do with what the RAM is rated at to run at a speed (which you can check in your SPD tables using even Everest).
That also meant overheating RAM, higher power drawing RAM/CPU-NB and lower RAM/NB clocks with an absolute hatred for voltage on both since the current was being pumped much higher than should be required. I learnt this after killing two Phenoms back then.
Take your time. ;)
Archer
04-28-2009, 07:35 PM
Yeah I think I will look for some min settings thanks.
So that a reader will know, I'm referring to the various drive strengths of each memory channel. 2x is max, a combination of 1x and 1.25x should be all that's required to get sub-1070 speeds stable for the most part. Just like voltage, the lower, the better.
Archer
04-28-2009, 09:38 PM
I can only raise voltage in my BIOS. The rating for this memory is 1.8~1.9 V. As to the effect on the CPU, hell, I am running everything in spec, so????? All of it is under warranty:D The way I figure it, if I am in spec then at least I may get an RMA.
Neuromancer
04-29-2009, 07:22 AM
So that a reader will know, I'm referring to the various drive strengths of each memory channel. 2x is max, a combination of 1x and 1.25x should be all that's required to get sub-1070 speeds stable for the most part. Just like voltage, the lower, the better.
Well standard electrical theory is more voltage is better/safer than less voltage.
More voltage increases the likelihood of electromigration. But if TDP stays the same, then less voltage = more amperage needed to even out the equation. More amperage is NEVER a good thing with electronic components.
Might be good for moving trains... (requisite obscure 80s movie reference)
Archer
04-29-2009, 08:25 AM
Can we not get into the whole DC/AC Circuit thing it makes me sleepy:D
Neuro KTE I really don't think you can force draw. A component only draws what it needs lest all of our home outlets (ac) would be amp matched and you could never have a DC circuit 30A available for a motor that draws 15 amps. But neuro that is true that as V increases A draw decreases but I don't know if that applies to micro electronics as there is more than one component affecting the amperage. I wish I still had my DC/AC Circuit analysis book as I could refresh my mind and speak as a knowledgeable person but that was a long time ago and I have forgotten most everything. That being said, it may be that there is a set amperage and the V+ does not cause an A- hence more heat.
Neuromancer
04-29-2009, 08:33 AM
Yah definitely OT for this thread. I will look into it when I have the time :)
But I was not referring to forcing wattage; it is what it is. Amperage will change based on vcore. However, I expect that it must be different for CPUs, otherwise 125 watt CPUs running 1.25v would need 100 amps!
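The 100 amp figure is just P = V x I rearranged. A minimal sketch of that arithmetic follows, with a made-up function name; treating TDP as the actual electrical power draw is an approximation, and whether the resulting current is plausible for a CPU is a separate question from the arithmetic itself.

```python
def current_amps(power_watts, voltage_volts):
    """I = P / V, assuming all of the stated power is drawn at this voltage."""
    return power_watts / voltage_volts


for vcore in (1.25, 1.35, 1.50):
    print(f"125 W at {vcore:.2f} V -> {current_amps(125, vcore):.1f} A")
```

At a fixed power figure, raising the voltage lowers the current and vice versa, which is the trade-off being argued about above.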
Archer
04-29-2009, 08:41 AM
http://img407.imageshack.us/img407/7076/htclock222mon.jpg
http://img407.imageshack.us/img407/2787/htclock222mem.jpg
http://img407.imageshack.us/img407/6575/htclock222.jpg
Archer: your MRL is 52 clks there, which is quite loose (and it's changeable now). Generally you need 52 clks for 2.35G NB / 1066 RAM with Agena. You can't run it any lower?
I assume the 667 (etc) rows are set to the same timings as the ones you listed above them?
Neuromancer: not really sure to what you're replying in our context. Lower operating voltage for PC system devices is better, this is basic electrics and the assumption is at a fixed amperage which for the NB in our context is set to a predefined limit at factory. But this'll just lead to completely OT topics.
Archer
04-29-2009, 09:31 PM
I will finish this round of testing and add another run at 888 in the timings section with that change. I think -15% from the 52, about 45 for the MRL, to see the difference, and then continue with the rest of the testing as listed.
EDIT: BTW beyond giving a wealth of info what else can everest do for me as far as testing?
EDIT: found it Ultimate edition not corp ed.
I did some testing regarding timing/speed with 3Dmark Vantage cpu tests.
Vantage CPU tests, Vista 64bits SP1.
System: Smackover, i7 920, 3x1Gb Crucial Value.
i7 @ 4GHz (191 x 21)
Run each test 3 times, those are the best of 3 results:
1530 9-9-9-28 -> 24252
http://img524.imageshack.us/img524/1222/vant1530.th.jpg (http://img524.imageshack.us/my.php?image=vant1530.jpg)
1530 7-7-7-24 -> 24491
http://img513.imageshack.us/img513/1228/vant1530c7.th.jpg (http://img513.imageshack.us/my.php?image=vant1530c7.jpg)
1910 9-9-9-28 -> 24646
http://img164.imageshack.us/img164/2185/vant1910.th.jpg (http://img164.imageshack.us/my.php?image=vant1910.jpg)
1910 8-8-8-24 -> 24723
http://img164.imageshack.us/img164/3343/vant1910c8.th.jpg (http://img164.imageshack.us/my.php?image=vant1910c8.jpg)
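To see how far apart these four Vantage CPU scores really are, the sketch below expresses each one as a percentage gain over the slowest setting. The scores are the ones posted above; the variable and function names are just for illustration.

```python
# Best-of-3 Vantage CPU scores from the post above: (setting, score).
results = [
    ("1530 9-9-9-28", 24252),
    ("1530 7-7-7-24", 24491),
    ("1910 9-9-9-28", 24646),
    ("1910 8-8-8-24", 24723),
]


def gain_pct(base, score):
    """Gain of score over base, in percent."""
    return (score - base) / base * 100.0


baseline = results[0][1]
for setting, score in results:
    print(f"{setting}: {score} ({gain_pct(baseline, score):+.2f}%)")
```

The whole spread between the slowest and fastest setting is under 2%, so in this particular test neither the speed step nor the tighter timings moves the score much.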
Archer
04-30-2009, 08:33 AM
Yeah, the timings don't make much difference in 3DMark. Mainly what I look for when testing with 3DMark is a major drop in score, as I don't think timings can cause much of an increase. Thanks for the post.
I will finish this round of testing and add another run at 888 in the timings section with that change. I think -15% from the 52, about 45 for the MRL, to see the difference, and then continue with the rest of the testing as listed.
Good one. Hoping you can run that setting (45 clks is quite tight).
EDIT: BTW beyond giving a wealth of info what else can everest do for me as far as testing?
EDIT: found it Ultimate edition not corp ed.
In the debug info, you can get the cache fill rate at various line sizes. You can also get the measured latency, in clocks/ns, for most x86 instructions on the CPU you're running. 1500 altogether. On top of that, the memory tests in the debug info are the best and most extensive.
Archer
05-01-2009, 07:27 AM
I am hoping I can get a lot of testing done (all of it) this weekend, but alas my family must come first. The synthetics are easy, but the other testing is time consuming as you must pay attention to time. I will finish this. I have changed one of the tests to a mixed-type multitask test, to test aspects of memory which may not be noticed in a single-app test. I know that it has a lot to do with the memory controller, but IMHO it will also check the truest effects of timings/speed/efficiency. Perhaps transcode and zip, or whatever I think of.
http://img207.imageshack.us/img207/4420/results.jpg
Any other details about testing please ask I use Avast AV software as well. This is the end of round one.
Archer
05-01-2009, 09:48 PM
MRL jumped to 50 from 52. When I tried 45, a noisy freeze occurred, but since it lowered I will skip that test for now, go to 800 and work my way down, and hopefully revisit this 888 MRL adjustment later.
Archer
05-02-2009, 01:59 AM
http://img21.imageshack.us/img21/3839/timingschart.jpg
http://img8.imageshack.us/img8/3839/timingschart.jpg
I will finish the testing this weekend, but I am very disappointed: my BIOS and/or chipset seems to have a limit of 3 latency, which means I can't seem to collect the data that I wanted. Finishing out this testing as it is will be pointless, though I can run at normal 800 latencies for the hell of it. As it stands right now this testing seems to be inconclusive, other than the fact that synths don't mean a whole lot in the big scheme of things, as most of this data does not hold true in the real world. What I do see is that you need the tightest timings you can get with the fastest speed you can afford. Will post again with full data and form a conclusion. It's a ZEN thang: you need both, as most of us agree. Yin & Yang.
I had originally planned on using Win 7 64 but for some reason ended up with Vista 32? Well, I am going to have some fun: I am going to run all tests in 7 64 also, as this will show the effect (if any) of the extra memory and 64-bit, while allowing a comparison of the memory interface efficiency of the tweaked 64-bit OS.
I will also be adding some different speeds to the memory test with full specs as well:)
Interesting.
OCZ's Tony posted these DDR2vDDR3 benches recently: http://www.xtremesystems.org/Forums/showthread.php?t=223532
For whatever they're worth...
Some bits are long confirmed right. DDR2-4-4-4 1000 will beat lower clocked normal DDR2. We've done and known this since before he played with a Phenom. Many of us were running benches at 1000-1040 4-4-4-10 back then. However, most general sticks cannot run such settings, even though my sets from last year ran much better than his new DDR2 sticks are doing. He failed to realize the difference MRL makes, or to post it, so his tests don't mean much to me, especially when he was using Ganged for DDR2 but Unganged for DDR3. My reservation for any online test of anything is just this - a) I need every detail since the slightest can change the outcome, and b) the tester needs reliability and trust built up beforehand or it's invalid.
Archer
05-04-2009, 09:25 AM
Yeah, I have seen many confusing sets of numbers posted, and sometimes the results seem askew, but what else is new. There are actually two motives involved in starting this: 1. I was going to do this testing (as I always do on new components) anyway. 2. I wanted to get something similar to this on the board before I had to ban myself for popping at someone for telling a noob to loosen timings to do an FSB clock without mentioning the other option: drop memory speed and tighten timings, then open up the FSB/HT and ramp it up from there. Both options are legit but #2 usually gets forgotten about.
EDIT: I will post full system profiles soon, KTE, so you can have a full look at all of my memory timings.
Archer
05-05-2009, 10:30 PM
I have a question, as my head is hurting. Have there been any studies as to the effect of L3 cache and the timings of memory? KTE? I have collected some anomalous data: small files do not seem to scale with tightened timings. I will finish this study, but as I look at it now it seems that the best timings for this core are the 4 latency, as I see no improvement at 3 latencies. And if this is the case then it is possible (if someone had the hardware) to actually tailor systems for maximum efficiency and put it on a chart.
That being said, I have noticed a marked improvement using an HT clock increase, as evidently the timings are still an effect of the FSB (I am just using that term; I know it does not apply to the IMC, but the effect is still basically the same for memory, correct me if I am wrong). That being the case, although the CPU is only capable of X amount of work per cycle, when you increase the FSB you are in effect allowing for more strobes per unit time, i.e. at 200 CAS 5: 200,000,000/5 = 40,000,000, but at 220 CAS 5: 220,000,000/5 = 44,000,000, hence more work and fewer lost CPU cycles.
If this is wrong let me know and please give me the correct answer (with backup please)
What I am looking at is why tighter timings have no measurable effect. The only thing I can think of is that I am CPU limited, and I can only put that off on the L3. If that is the case then the L3 helps more than I thought, because this is a good thing, not a bad thing.
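The strobe arithmetic in this post can be sketched in a few lines of Python. This follows the poster's own simplified model (one column strobe every CAS cycles of the reference clock), which is an assumption made in this thread and not a description of how the IMC really schedules accesses. The function name is made up for illustration.

```python
def strobes_per_second(ref_clock_mhz: float, cas: int) -> float:
    """Poster's simplified model: one column strobe every `cas` reference-clock cycles."""
    return ref_clock_mhz * 1_000_000 / cas


base = strobes_per_second(200, 5)    # stock reference clock, CAS 5
raised = strobes_per_second(220, 5)  # raised reference clock, same CAS
gain_percent = (raised - base) / base * 100

print(f"200 MHz, CAS 5: {base:,.0f} strobes/s")
print(f"220 MHz, CAS 5: {raised:,.0f} strobes/s")
print(f"gain: {gain_percent:.1f}%")
```

Under this model the gain tracks the reference clock increase one for one, which is only meaningful for workloads that actually wait on memory.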
As you might be aware, since October now I've been completely away and castrated myself from computing related involvements. I don't read on 1/100th of what I did when into them, don't comment or talk in nearly all the deep tech as I did before, don't spend hours daily in professional research and such studies nor go anywhere near into them. I stay with only completely basic concepts and limit myself to basic replies that don't require much explaining/time/knowledge. Simple basics I know I can help and comment on. Mainly I do it to push someone in the right direction when I see others being very misled but that's about it. That's because I don't like ignorance, what it leads to nor people being misled, whether out of ego driven arrogance, blind following, laziness or carelessness. I can't start or say something requiring lengthy time/energy, due to what it requires in commitments or leads to subsequently, simply because I don't have the required time to spend on extras when all my free time and effort is going into my aims and focus in life where computer related things are at the bottom of the pile since October '08. I sit down at a laptop or PC now only when working on projects and when very tired. Even though I would love to answer when I can, it's beyond me currently to satisfy the typical online qs demands and the follow on discussions which typically occur with intelligent:ignorant/stupid/fanatic users all in the online mix with a 1:1000 ratio. I might not be able to get back to depth and read/answer for 3-7 weeks which ruins everything. My policy has been since day one that I don't answer on any such qs/topic without knowing fully of being factual, an estimation/prediction, or a discussion. Misleading someone is too major in my book and too big of a responsibility, even though most of the internet rides and thrives on such nonsense especially in this geek world.
I hope that gives you enough of a preface to understanding any of my posts since Oct period and onwards. A lot of the qs require essays full of technicality with a plethora of references to answer properly, if I be totally honest.
To your qs, I'll try and give pointers which will help either directly or indirectly.
As a primer recap, there are two basic ways to increase work per cycle done by the CPU on any application it is working on. One is to decrease latencies and the second is to increase throughput. However, not all the data is situated within the L1 cache of the CPU Core to work on directly but you have external components involved at times. Whenever you do have such external component access, they usually become the bottleneck to more performance. That means improving the performance of the external components can also improve the total end CPU time required for the task completion - usually by a good amount. The catch here is that the time required by the CPU to finish a task will only be influenced by external component performance IF the actual code the CPU requires to work on is situated on or communicates with the external components i.e. RAM. This is why you can easily have super-linear speedups with a parallelized code broken into sub-problems when you break it down so it fills upper cache levels more, and when you remove any disk access that it may require, to now reside completely in RAM. The latency reduction from main disk to main memory is many orders of magnitude and hence the CPU completes the same work far faster, even with more sub-problems and thus heavier code to solve.
When it comes to improving the work per cycle by the CPU with an item being fetched from main memory, you can increase the performance per second by the CPU by simply getting the data to the CPU quicker. Again, there are two ways to do this - lower latencies for the CPU in accessing that data, and/or, higher throughput of transferring that data to the CPU from main memory.
On a typical memory demand, the CPU will search within itself for the required datum all the way to the last L3 cache on K10 CPUs and when the datum is not found there, a cache miss, the CPU requests a memory lookup. The performance of the total lookup within the CPU is limited by the CPU speed, cache speeds, bus line widths/latencies, algorithms, component distance and so on. A hit in memory will require the datum within RAM to be fetched to the CPU for required work. That itself is limited by the NB/RAM bus bandwidths in the way, latencies, and so on. This is where NB/RAM read bandwidth and latency is critical because until the data is within the CPU, its work per cycle is completely stalled. On this specific instruction execution cycle, all the pipeline will be idling waiting for the datum arrival from main memory. On any given main memory access/fetch, the final delay is given by how long the delay was for the CPU to wait for such datum. The delay we call latency is given in nanoseconds, or, in how many CPU clocks were wasted whilst waiting for such data from main memory.
CPU clock speed does affect this memory latency slightly. I'm talking in picoseconds. That's because the MMU is situated within the Core of the CPU and not within the NB component, so it runs at Core clock speed. Higher Core clock speed will lower its latency slightly. You can see that effect here (http://techreport.com/articles.x/15967/3). However, all the rest of the components which affect a memory request/fetch by the CPU are latencies and throughputs of the various components in the NB and the RAM itself. I'll give you K10's actual formula for this delay as Michael of Lost Circuits had mentioned it in one of our recent discussions but the site has been down for a day so it'll have to wait. Mainly, you have MMU/CAS/CR/tRFC/MRL latencies combined and then NB latencies and throughputs. The biggest latency is the delay it takes for an item in main memory to reach the CPU Core.
In the way, the NB bus bandwidth link with the MCTs, DCTs and RAM will pose a limit as it can only transfer a specific throughput of data but it will hardly ever be saturated enough to - the NB latency will pose an even bigger limit since it will always apply in every access scenario. The RAM will have a final latency before it can release data on a clock tick after a column is found to contain the required data by the sense logic and hence a throughput limit which will pose the second biggest limit. In most cases the throughput again will not be saturated so will not limit the CPU performance per cycle but the latency of this task will in every case limit its performance per cycle.
Now when you tune main memory latency to lower overall for a read access, and/or the main memory throughput to higher for a read access, it will never affect the end performance of the CPU (per cycle) unless working set data is required from main memory for the given work, that is not already cached on the CPU. Modern CPUs all cache generously with locality in mind and code is tuned so that it makes use of data locality, thereby reducing access to memory after the initial instance. Does the work you are talking about fit into the caches or is the data pool too big and random to fit in caches and hence the CPU will always have to access RAM randomly for the data?
Whenever you have main memory access required by a given CPU task, it will always show to be sensitive to main memory latencies. As described above, it's logical with how the system functions.
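One way to picture why memory tuning only pays off when the working set misses the caches is the textbook average-memory-access-time formula. The sketch below is an illustration only: the hit time, miss rates and DRAM latencies are made-up numbers, not measurements of any K10 part, and the function name is hypothetical.

```python
def average_access_ns(cache_hit_ns: float, miss_rate: float, dram_ns: float) -> float:
    """Textbook AMAT: every access pays the cache lookup, misses also pay the DRAM trip."""
    return cache_hit_ns + miss_rate * dram_ns


# Hypothetical numbers, chosen only to show the trend.
CACHE_HIT_NS = 10.0
LOOSE_DRAM_NS = 60.0   # assumed latency with relaxed timings
TIGHT_DRAM_NS = 50.0   # assumed latency with tightened timings

for miss_rate in (0.001, 0.05, 0.50):
    loose = average_access_ns(CACHE_HIT_NS, miss_rate, LOOSE_DRAM_NS)
    tight = average_access_ns(CACHE_HIT_NS, miss_rate, TIGHT_DRAM_NS)
    saving = (loose - tight) / loose * 100
    print(f"miss rate {miss_rate:6.1%}: {loose:6.2f} ns -> {tight:6.2f} ns ({saving:4.1f}% less)")
```

With a tiny miss rate the tighter DRAM setting barely moves the average, while a cache-unfriendly workload sees most of the improvement.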
HT Reference clock itself is controlled by the Southbridge component based off multipliers. What we call PLLs. As you know, all of AMD's hardware clocks run off a base clock, the HT reference clock. The PLLs to derive the various clocks are situated in the AMD CPU though. Each individual component has one (if not more i.e. 4x Core Clock PLLs). HT reference itself doesn't affect any performance directly but is just the base clock. The only way it can affect any other component's performance per cycle is by manipulating the final clock speed that component runs on. A bus with a 1 IPC throughput, with 1E09 instructions to move on a 1GHz bus clock speed with no other latencies will move at 1 ins/cycle, in other words, it will require 1E09 clock cycles, 1 full second, to complete the requests regardless of how it derives that final clock speed, 100MHzx10 or 10MHzx100, or whatever. If I move the clock speed of the bus up, the same 1E09 bus cycles take less time, so the CPU spends fewer of its own clock cycles waiting on this task, which means less CPU time and hence, less time.
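The bus example above works out as follows in a small Python sketch. It assumes the idealised bus from the paragraph (a fixed number of items per cycle, no other latencies); the function and parameter names are invented for illustration.

```python
def transfer_seconds(items: float, base_clock_hz: float, multiplier: float,
                     items_per_cycle: float = 1.0) -> float:
    """Time to move `items` over a bus whose final clock is base_clock_hz * multiplier."""
    final_clock_hz = base_clock_hz * multiplier
    cycles_needed = items / items_per_cycle
    return cycles_needed / final_clock_hz


ITEMS = 1e9  # 1E09 instructions to move, as in the example

print(transfer_seconds(ITEMS, 100e6, 10))   # 100 MHz x 10
print(transfer_seconds(ITEMS, 10e6, 100))   # 10 MHz x 100, same final clock
print(transfer_seconds(ITEMS, 110e6, 10))   # raised base clock, same multiplier
```

The first two cases take the same time because only the final clock matters; the third finishes sooner because the final clock itself went up.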
If you test with RMMA and RMMT - one of the main reasons I mentioned it earlier - you will be able to see which latencies and throughputs are changing at each setting and thereby able to factually discern why a typical workload is speeding up and even explain which type of workload is not sensitive and hence, not influenced by such hardware speedups.
There must be professional studies done on what you say, whether publicly available or not they are always done, but to research them is going to require lengthy time and energy. Maybe I'll have such on this weekend but maybe not for 2-3 weeks. My work and research responsibilities are too lengthy and random to predict for months now, and then I have family too which comes first by far for me. :)
Archer
05-07-2009, 07:31 AM
Looks like I will be at the AMD website this weekend and through the rest of the week looking at white papers for generalizations so that I can possibly formulate a good search. I am thinking that if I can get details I can work up a VM (pencil and paper) on any CPU that will give a good generalization of what can happen with a base design at a given speed, memory, cache and the like. I know it really is not worth the effort and after I look at it long and hard I may decide that it is not worth the time.
Thanks again,
Archer
Np.
There are quite a few studies done on Phenom/Nehalem IMC/RAM dependencies although there are very few of Deneb based silicon. Additionally, they usually put out a lot of detail on the fastest products (Nehalem) whilst very little on products that aren't the fastest (Phenom).
For instance, did you see this comparing DDR2-533 vs DDR2-1066? http://ixbtlabs.com/articles3/cpu/amd-phenom-x4-9850-ddr2-533-p1.html
That will show which applications are mem bandwidth sensitive for K10 based silicon, less on latency because the latency there stays quite similar. Since you have Agena based silicon yourself, that will apply directly to your own CPU too.
Then you have very good Nehalem studies on such a topic, for instance:
http://www.bit-tech.net/hardware/cpus/2009/01/26/intel-core-i7-memory-performance-review/1
http://techreport.com/articles.x/15967
You'll see there that IMC speed and DDR3 speed along with it are the best performance gainers for such a tech. And they do improve performance in real world applications, whereas changing the timings alone at any fixed speed - without changing IMC speed - doesn't give much of a benefit, unless the drop is huge. If you check DDR3-1066, then going from 9-9-9 to 5-5-5 gives it a decent performance boost in many applications. As much as you can expect. DDR3-1333 was found to do slightly better, especially going from 9-9-9 to 5-5-5.
Personally, I will be going DDR3 with a dual-core AMD, simply upgrading from the K10 I have, but not till we get economic RAM that will do DDR3-1400 5-5-5 stable and an AMD CPU that will do 3.0GHz NB stable with max 1.4v. That's probably a year away currently.
If you checkout reviews where cache comparisons of CPUs are done, like here (http://www.anandtech.com/cpuchipsets/intel/showdoc.aspx?i=3559&p=2), then that will tell you which workload is likely to be mem sensitive and which not.
Overall that helps build a picture of what limits and affects performance in certain applications at certain setups.
Also, Tony updated his test results here (http://www.xtremesystems.org/Forums/showthread.php?t=223532) so you can see DDR2 with speed is actually better for the K10 CPU. The problem with K10 CPUs is that IMC speed and hence performance is too low for DDR speed to gain much benefit. You'll hit around DDR3-1600 6-6-6 and stop noticing any performance gains (real apps) because the internal data routes in the NB act as the bottleneck.
It's the same overall functioning with Nehalem but Intel did a much better implementation of an IMC and hence they won on many facets with it. They knew the IMC acts as the bottleneck, so they only allowed DDR3-1066 with minimum 2.13GHz IMC, DDR3-1333 with minimum 2.67GHz IMC and DDR3-1600 with a minimum 3.2GHz IMC. Hard-coded limits. Obviously, with an IMC of 3.2GHz, even K10 cache/mem performance would be very good, but on air, physically, it's quite impossible to date whereas Intel has no problem hitting such clocks.
Archer
05-07-2009, 09:18 PM
Thanks for the links, I found the results with the i7 quite interesting. The Phenom was not what I expected; scaling seems different from what I have experienced at times on my old memory but does seem to jibe with my new setup. I will continue my testing as soon as I solve a couple of problems with this system, as I hate an incomplete chart:banghead:
I'm waiting... :D
I'll be honest, I'm checking these days due to some free time now and then but I may not get back to checking here for weeks if not months later on, hence the heads up. :up:
Archer
05-12-2009, 08:44 AM
Hopefully this week or over the weekend at the latest. As I said, I will test at 5-5-5 and possibly 6 timings also, as I can't get to 2 timings with this chipset/mem combination. I would have done some last night but I had a meeting, then went to make deposits only to find that the bank was out of envelopes.
Long story short: finally made the deposit, restaurant closed, no dinner, home and in too bad a mood to test (I got no shrimp egg rolls:mad:).
Archer
05-13-2009, 11:48 PM
At this point I think I will finish filling in the chart, as I see no need to drag this out further than necessary. In a nutshell: timings over speed, within reason of course. When I finish the chart I will break down the timings issue with the 888. As to wondering about timings v/s clock speed, look at it this way: the timings and the clock speed have an indirect correlation to each other but a direct correlation to total system performance. Let us say you are running 667 MHz; a column strobe of 5 is not bad. The CPU sends a fetch every 5 cycles of the FSB, so 200/5 = 40 x 1 MHz = 40,000,000 per second; now we move to CAS 4 = 50,000,000. Realize that this is independent of the memory and is controlled by the CPU through the BIOS (limited by memory capability), and that you also have row and column to row. Now take the CPU's ability to fetch data that is not in cache and write that data back: that is where latency comes into play. As long as your throughput is >= mem speed you will always be faster; moreover, if mem speed and/or timings are too slow, total system performance is hurt.
As to the 888, the strobe is actually 55,500,000 per second, which is ~3.6 latency.
55,500,000 - 50,000,000 = 5,500,000 more address strobes per second, which means the CPU can get more work done with files it has to swap (I/O) from cache to memory.
If there is a problem with this conclusion let me know as I am only human and am going by memories of a dated brain. The rest of the testing will be done over the next couple of days.
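The "~3.6 latency" figure for the 888 setting can be reproduced with a short sketch. It assumes, as the charts in this thread do, that timings are expressed relative to a 200 MHz base clock and that the memory divider stays fixed so the memory clock scales with the reference clock. The helper name is invented for illustration.

```python
BASE_REF_MHZ = 200.0  # the base the charts are calculated against


def equivalent_timing_at_base(cas: float, ref_mhz: float) -> float:
    """Timing that would give the same absolute delay at the 200 MHz base clock."""
    return cas * BASE_REF_MHZ / ref_mhz


for label, cas, ref in [("800 CAS 5 @ 200", 5, 200),
                        ("800 CAS 4 @ 200", 4, 200),
                        ("888 CAS 4 @ 222", 4, 222)]:
    equivalent = equivalent_timing_at_base(cas, ref)
    print(f"{label}: ~{equivalent:.1f} relative to base 200")
```

The set timing itself has not changed at 222; only its length in real time has, which is what the calculated column expresses.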
Archer
05-14-2009, 09:51 AM
http://img11.imageshack.us/img11/1681/reallty.jpg
http://img258.imageshack.us/img258/1913/charti.jpg
I do notice a pattern favoring 4 timings with my CPU. I have more data on the chart but I am not in a position to post it at this time, but it seems that no gain is made from 3 timings, evidently due to the interface.
Edit: I will be editing the chart as I want to test 667 3 timings HT ref 222 if possible as I want to see the effect of the increase in cycles available to increase strobe.
Analyzing your data will take me some time... so I'm still reading before I post :)
Thanks for the work. :up:
Archer
05-14-2009, 11:38 PM
http://img258.imageshack.us/img258/1913/charti.jpg
http://img20.imageshack.us/img20/3296/chart2.jpg
New settings and chart additions. I will run the time-consuming tests as I get time. The timings that are calculated are relative to FSB 200 only and in actuality have not changed but the effect of these timings at the set FSB is an increase in strobes relative to base 200 fsb.
Archer
05-15-2009, 11:02 AM
I am starting to draw conclusions which still hold true to the tests I have been conducting for years.
Timings are much more relevant in the real world than in synths. I will hopefully finish this chart and testing soon and see what anomalous data occurs (I hate outliers) and look for the reason.
As to the sub timings: sub timings are critical, but first you must isolate where you want your base timings set; otherwise, complete tests with all timings would take months of T&E. It is not pointless to do this, but due to time constraints I have left them on auto (the sub timings can cause an occasional outlier).
Can you list your NB speeds for each setting?
I can already guess where the NB was faster. I'll wait till results are complete.
Sub-timings aren't really critical but MRL is more critical than anything including CAS.
Archer
05-15-2009, 05:27 PM
Yeah, the NB seems to be locked at 9 X HT ref.
[Attachments 66, 67]
And 1998 @ 222
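For reference, the derived clocks at each HT reference can be worked out as below. This is a rough sketch that assumes the NB multiplier really is locked at 9 as reported, and that the memory divider is fixed so the DDR2 rating scales with the reference clock; the helper names are made up.

```python
NB_MULTIPLIER = 9          # reported above as locked
STOCK_HT_REF_MHZ = 200.0


def nb_clock_mhz(ht_ref_mhz: float) -> float:
    """NB clock derived from the HT reference clock."""
    return NB_MULTIPLIER * ht_ref_mhz


def scaled_memory_rating(rated_at_stock: float, ht_ref_mhz: float) -> float:
    """Effective DDR2 rating when the divider is fixed and only HT ref changes."""
    return rated_at_stock * ht_ref_mhz / STOCK_HT_REF_MHZ


for ref in (200, 222):
    print(f"HT ref {ref}: NB {nb_clock_mhz(ref):.0f} MHz, "
          f"DDR2-800 divider runs at {scaled_memory_rating(800, ref):.0f}")
```

Since the NB moves with the reference clock, any comparison between a 200 row and a 222 row of the chart changes NB speed and memory speed at the same time.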
Here are two good comparisons of DDR2/DDR3, speed and timings:
http://www.madshrimps.be/?action=getarticle&number=6&artpage=4057&articID=926
http://www.xtremesystems.org/Forums/showthread.php?t=224502
Archer
05-16-2009, 03:15 PM
I appreciate the links, they were interesting reading and my skew was evidently due to the NB speed as I suspected. I could adjust the mem speed and timings, and run nothing anywhere close to spec but I believe that would make what I am doing hard to follow. All I am trying to do is give a good idea of why timings are important to performance. Most of us don't go for records but we do want maximum performance and although top CPU speed may be great if you sacrifice anywhere else to get there, you shoot yourself in the foot.
EDIT: I did not qualify my POV as to the skew. The 667 3 timings are clearly superior to the 800 5 200ref and some of the 800 4 200ref and following this and my explanation of the strobes the NB skew is not moot but less relevant than the timings.
BTW: ZEN-The path of enlightenment. As such, it de-emphasizes both theoretical (http://en.wikipedia.org/wiki/Theory) knowledge (http://en.wikipedia.org/wiki/Knowledge) and the study of religious texts (http://en.wikipedia.org/wiki/Religious_text) in favor of direct, experiential realization through meditation (http://en.wikipedia.org/wiki/Meditation) and dharma (http://en.wikipedia.org/wiki/Dharma) practice.
As it applies to Computers: Nothing is what it seems as long as there is a tweak no matter what the manufacturer or others say. The true Zenith (Zenith is also used for the highest point reached by a celestial body during its apparent orbit around a given point of observation.) can only be reached with devotion, dedication and the willingness to kill your system.
The article I was waiting on going live was just published. Something you'll be interested in: http://ixbtlabs.com/articles3/mainboard/phenom-2-ddr3-p1.html
I've been through all your data now, but have to re-read the thread yet.
Archer
05-18-2009, 03:27 AM
I will attempt to isolate the skew but it will take a couple of days (800 6-6-6 240 = 800 5-5-5 200) but I am having trouble getting stability with 6 timings at 240 ref clock.
New Chart:
http://img258.imageshack.us/img258/1913/charti.jpg
http://img20.imageshack.us/img20/3296/chart2.jpg
The timings that are calculated are relative to FSB 200 only and in actuality have not changed but the effect of these timings at the set FSB is an increase in strobes relative to base 200 fsb.
Great work and results there. I'll get back to you in a bit of detail. Meanwhile, I think it'll be good to have a read and gain a thorough understanding of these two:
http://www.thetechrepository.com/showthread.php?t=160
http://www.thetechrepository.com/showthread.php?t=205
As per OCZ Tony's two app tests, he basically is showing C5 DDR3-1333 to be equal to C6 DDR3-1600 performance: http://www.xtremesystems.org/forums/showthread.php?t=224502
Although his tests are too few to make conclusions, at least it'll work for the guys who like to run Super Pi. :)
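A quick conversion from CAS cycles to nanoseconds suggests why those two settings land so close together. The sketch uses the usual relation that the DDR I/O clock is half the data rate; it covers CAS only, so MRL, sub-timings and NB speed are not modelled, and the function name is made up for illustration.

```python
def cas_latency_ns(data_rate_mts: float, cas: int) -> float:
    """CAS cycles converted to nanoseconds; the I/O clock is half the DDR data rate."""
    clock_mhz = data_rate_mts / 2
    return cas / clock_mhz * 1000


for rate, cas in [(1333, 5), (1600, 6), (1066, 9), (1066, 5), (1333, 9)]:
    print(f"DDR3-{rate} CL{cas}: {cas_latency_ns(rate, cas):.2f} ns")
```

C5 at DDR3-1333 and C6 at DDR3-1600 both come out at about 7.5 ns of CAS delay, so a latency-bound test like Super Pi has little to separate them.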
Can you give more details as to the applications you are using for ZIP/Encode and if they are multi-threaded?
Archer
05-20-2009, 01:32 AM
http://www.zipgenius.com/ for zip. As far as whether or not it uses more than one thread, I really can't answer that as I can't find the answer; moreover, a dual core is recommended.
http://www.dbpoweramp.com/ for encode. I am running unregistered, and according to the documentation unregistered is not multithreaded. Way back when, it did encode 2 mp3/wma (one per core) at one time, but it no longer does that.
My mistake, sorry, I remember it mentioned earlier on. :o
I did a checkup on them: ZipGenius is indeed single-threaded but dbpoweramp uses a core per track, so if you load up a 4 track encode, it will attribute each encode to a core thus you'll have all the cores used (in your situation, both).
I've run out of net time for now (been reading/posting OCF after a few weeks). Detailed posting comes later for certain.
Meanwhile I'll leave you with another article you might enjoy (similar topic): http://www.madshrimps.be/?action=getarticle&number=1&artpage=3962&articID=909
Only major downside for us here is that his experimentation was aimed at OC apps primarily rather than real-world.
Archer
05-22-2009, 08:53 AM
Well, I did not like the testing that was done, but time considerations cause this and I don't blame madshrimps or any tester for doing things this way, as it gives a good generalization (system tuning is a personal experience anyway. I just like to share my little adventures.). The tests that were done were great, as applies to the HW that was tested, as they show the marked improvement that can come with working with several aspects of a system.
carpo93
07-15-2009, 08:30 AM
nice thread
i voted for high clock
Archer
07-15-2009, 08:50 AM
Thanks there were a lot of hours put into this.
Archer
07-24-2009, 12:14 PM
http://www.tomshardware.com/picturestory/511-memory-scaling-ddr3.html
I found this to be interesting as it does a decent job of showing the lack of differences and reaches conclusions similar to my own.
That above article is really for those who run systems stock, not enthusiasts. By far well discussed and known by now is the fact that there is no point in high-speed RAM if you don't overclock the CPU-NB to at least 2400MHz because the internal Family 10h CPU-NB is bandwidth limited by anything above DDR2-800. The DCT won't let more BW cross in. THG is long lost in many ways, I wouldn't waste time basing anything from what they put out unless it's repeatedly corroborated by independent, reliable, external sources.
They left the CPU-NB at 2000MHz there which makes the whole tests futile for fast RAM and timings were too lax to count on that note either. There will be hardly any performance change above variation with such a restricted CPU-NB. The timings needed to be tested at a set CPU-NB ≥2400MHz are DDR2-1066 4-5-5, DDR2-1200 5-5-5, DDR2-1250 5-5-5, DDR3-800 5-5-5, 6-6-6, DDR3-1066 5-5-5, 6-6-6, 7-7-7, DDR3-1333 5-5-5, 6-6-6, 7-7-7 and DDR3-1600 6-6-6, 7-7-7, 8-8-8. The timings enthusiasts will be using and aiming for. In all of them the MRL must be kept the same. If not, the tests are invalid as a comparison.
After all that is done and dusted, the performance impact of RAM will still be very little unless the code has a large number of L3 misses (i.e. Winrar, UT3, Photoworxx). If you poll K10 behavior, you won't find many apps spilling past the L3. Without a memory dependency by the code, speed of the memory subsystem will be useless by way of remaining unused. Akin to having FTTx network structure to your home while you're only wired to use the traditional POTS copper pair from your CO.
Archer
07-24-2009, 10:34 PM
I will not debate this, because of course I agree in principle. Most AMD OCers these days get the BE procs and are oblivious to the fact that a combo OC will yield more than a multi OC only. I also think that the tests that were done showed that the pipes can't be filled by current CPUs, and my case is: if you can't fill the river (bandwidth) then at least increase the flow (timings).
BTW, here's another AM3 DDR3 article exploring DRAM gains: http://www.legionhardware.com/document.php?id=845&p=0
AMD's IMC is saturated at 2.0GHz DDR3-1333, so you need higher IMC frequency to get any gain above it. They did well to keep timings, clocks, DCT mode and NB controlled although an encoding and rendering workload would have added the much needed additional value to that article.
Archer
10-12-2009, 10:28 AM
As it pertains to the OC, I think that opening up the FSB and then using the multi on BE CPUs is a real advantage to get that little extra kick out of your system.
http://img266.imageshack.us/img266/8137/fsbadvantage.jpg
Neuromancer
10-12-2009, 02:57 PM
Interesting. Would have thought that BUS clocking would be better across the board. But straight 2D benches (the everest CPU benches) show a decrease in performance, however slight.
Archer
10-12-2009, 06:06 PM
I would say that Everest was pretty flat IMHO. I think those tests are more CPU cache intensive whereas the Futuremark ones seem more practical.
Archer, you're confusing me and probably yourself too :p
FSB? AMD only has the HT bus, I'm sure you know.
Opening FSB? AMD runs the HT Link bus and in both your test cases the AMD CPU is running at 2000MHz internal bus. There isn't a difference in that CPU's bus speed.
So the differences in those - by nature - exaggerative synthetics is more an application fluctuation/discrepancy rather than anything else. 1-3% difference is marked as typical application variance even when all settings are fixed.
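A simple way to apply that 1-3% rule of thumb to repeated runs is sketched below. The scores are hypothetical placeholders, not numbers from the chart in this thread, the 3% band is taken from the post above, and the helper names are made up. It is a crude screen, not a proper statistical test.

```python
from statistics import mean


def percent_difference(baseline_runs, candidate_runs):
    """Difference of the run averages, as a percentage of the baseline average."""
    base = mean(baseline_runs)
    return (mean(candidate_runs) - base) / base * 100


def exceeds_variance(baseline_runs, candidate_runs, noise_percent=3.0):
    """True only when the averaged difference is larger than the assumed noise band."""
    return abs(percent_difference(baseline_runs, candidate_runs)) > noise_percent


# Hypothetical benchmark scores from three reruns of each setup.
multi_only = [1000, 1012, 995]
bus_and_multi = [1015, 1008, 1021]

diff = percent_difference(multi_only, bus_and_multi)
print(f"difference: {diff:+.2f}%")
print("outside normal variance" if exceeds_variance(multi_only, bus_and_multi)
      else "within normal variance")
```

Averaging several reruns per setting before comparing is what lets a small but consistent gain be told apart from run-to-run scatter.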
Archer
10-13-2009, 08:57 AM
How many reboots and reruns would it take to convince you that it is not random?
These tests were done on the Athlon II 620 which has no L3, when repeating the same tests with the same settings to the t on a PhII it makes no difference to a slight (less than .5%) negative impact. This seems to be saying that the metronome (base clock of 200) is tuned to the L3 and the only way to see the benefits are to allow for the rest of the CPU to ramp up with the Base clock, keeping the busses in tune with the caches.
[Attachments 100, 101, 102, 103]
Has the design changed since the K8 as far as how HT works, is it not separate from the memory bus, and only controlled by the same bus clock generator?
I see (saw) two separate buses, independent of each other, yet quite interdependent as to the CPU operation.
The timings are still relative to the base BUS are they not? And if so this would give the same effect as decreased memory timings relative to a CPU?
Found what I was looking for:
[Attachment 104]
As can be seen above the memory is not on the HT bus though the same clock generator is the metronome for the entire CPU.
Red and Aqua below separation of I/O (HT) and Memory (IMC) pulled from AMD.
Even AMD separates them Yellow
AMD Phenom™ II Key Architectural Features
The industry's first true Quad core x86 processor
True quad-core and triple-core designed from the ground up for better communication between cores.
Benefit: Cores can communicate on die rather than on package for better performance
AMD64 with Direct Connect Architecture
Helps improve system performance and efficiency by directly connecting the processors, the memory controller, and the I/O to the CPU.
Designed to enable simultaneous 32- and 64-bit computing
Integrated Memory Controller
Benefits: Increases application performance by dramatically reducing memory latency
Scales memory bandwidth and performance to match compute needs
HyperTransport™ Technology provides up to 16.0GB/s peak bandwidth per processor—reducing I/O bottlenecks
Up to 37GB/s total delivered processor-to-system bandwidth (HyperTransport bus + memory bus)
AMD Balanced Smart Cache
Shared L3 cache (either 6MB or 4MB)
512K L2 cache per core
Benefit: Shortened access times to the highly accessed data for better performance.
AMD Wide Floating Point Accelerator
128-bit floating point unit (FPU)
High performance (128bit internal data path) floating point unit per core.
Benefit: Larger data paths and quicker floating point calculations for better performance.
HyperTransport™ Technology
One 16-bit link at up to 4000MT/s
Up to 8.0GB/s HyperTransport™ I/O bandwidth; Up to 16GB/s in HyperTransport Generation 3.0 mode
Up to 37GB/s total delivered processor-to-system bandwidth (HyperTransport bus + memory bus)
Benefit: Quick access times to system I/O for better performance.
Integrated DRAM Controller with AMD Memory Optimizer Technology
A high-bandwidth, low-latency integrated memory controller
Supports PC2-8500 (DDR2-1066); PC2-6400 (DDR2-800), PC2-5300 (DDR2-667), PC2-4200 (DDR2-533) or PC2-3200 (DDR2-400) SDRAM unbuffered DIMMs – AM2+
Support for unregistered DIMMs up to PC2 8500(DDR2-1066MHz) and PC3 10600 (DDR3-1333MHz) – AM3
Up to 17.1GB/s memory bandwidth for DDR2 and up to 21GB/s memory bandwidth for DDR3
Benefit: Quick access to system memory for better performance.
AMD Virtualization™ (AMD-V™) Technology With Rapid Virtualization Indexing
Silicon feature-set enhancements designed to improve the performance, reliability, and security of existing and future virtualization environments by allowing virtualized applications with direct and rapid access to their allocated memory.
Benefit: Helps virtualization software to run more securely and efficiently enabling a better experience when dealing with virtual systems
AMD PowerNow!™ Technology (Cool’n’Quiet™ Technology)
Enhanced power management features which automatically and instantaneously adjusts performance states and features based on processor performance requirements
For quieter operation and reduced power requirements
Benefit: Enables cooler and quieter platform designs by providing extremely efficient performance and energy usage.
AMD CoolCore™ Technology
Reduces processor energy consumption by turning off unused parts of the processor. For example, the memory controller can turn off the write logic when reading from memory, helping reduce system power.
Works automatically without the need for drivers or BIOS enablement.
Power can be switched on or off within a single clock cycle, saving energy with no impact to performance.
Benefit: Helps users get more efficient performance by dynamically activating or turning off parts of the processor.
Dual Dynamic Power Management™
Enables more granular power management capabilities to reduce processor energy consumption.
Separate power planes for cores and memory controller, for optimum power consumption and performance, creating more opportunities for power savings within the cores and memory controller.
Benefit: Helps improve platform efficiency by providing on demand memory performance while still allowing for decreased system power consumption
I am not posting this as an argument; I just want clarification of my evident misuse of logic. The memory communication to the CPU is still in effect FSB (MemBus) and not HT.
Yep, that above is certainly right. HT bus is separate to the DRAM bus (different bytes per cycle, bus speeds, bus widths and purpose). The "DRAM" timings and settings we get options to choose in BIOS and in applications like AOD are a mixture of IMC/RAM timings. So you can tune both to improve performance. That's because there are 3 buses involved from CPU to IMC to DRAM.
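As a rough check on the bandwidth figures in the AMD list quoted above, peak bandwidth is just transfer rate times bus width. The sketch below is illustrative only: it assumes a dual-channel 64-bit DRAM interface, the exact JEDEC rates of 1066.67 and 1333.33 MT/s, and that the HT 3.0 figure counts both directions of the link. The function and variable names are made up for this example.

```python
def peak_bandwidth_gbs(mega_transfers_per_s, bus_width_bits, channels=1):
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return mega_transfers_per_s * 1e6 * bytes_per_transfer * channels / 1e9

# HyperTransport: one 16-bit link at 4000 MT/s
ht_one_way = peak_bandwidth_gbs(4000, 16)      # per direction
ht_both_ways = 2 * ht_one_way                  # assumption: HT 3.0 figure sums both directions

# DRAM: assumed dual-channel, 64 bits per channel
ddr2_1066 = peak_bandwidth_gbs(1066.67, 64, channels=2)
ddr3_1333 = peak_bandwidth_gbs(1333.33, 64, channels=2)

total = ht_both_ways + ddr3_1333               # "HyperTransport bus + memory bus"

print(f"HT link:   {ht_one_way:.1f} GB/s each way, {ht_both_ways:.1f} GB/s both ways")
print(f"DDR2-1066: {ddr2_1066:.1f} GB/s")
print(f"DDR3-1333: {ddr3_1333:.1f} GB/s")
print(f"HT + DDR3: {total:.1f} GB/s")
```

Under those assumptions the numbers land on the quoted 16GB/s, 17.1GB/s, roughly 21GB/s and roughly 37GB/s, which also shows how different the two buses are in width and rate.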
I'm selling a Sempron 140 (which opens to X2 Regor) and a cut-down Athlon II 5000 (which opens to a full Deneb) because I've picked up a Regor 235e 45W last weekend which is what I want to end with in our AIO. Just waiting on motherboard support because they don't yet support the CPU since it isn't officially released till later this month. Once that is in, I can run some real-world tests to compare bus speeds (as in HT Ref and HT Link) if you want. Synthetics can be added although I refuse to use Vantage... :)
I'll have to tune it for daily running in terms of HT/IMC/RAM so I will have to check these things anyway. The Regor is very similar to your Propus. Both of these CPUs will be DRAM heavy and sensitive because they have very little onboard caches.
The only one place I currently know HT Link speed does make a big impact - which macci confirmed last year - is when you use an IGP, because that uses the HT bus to communicate with the CPU.
EDIT:
Just a quick qs if you don't mind - what's the max efficiency in 1M SuperPi you've managed with the Propus?
Efficiency = Clock × Time
Archer
10-16-2009, 05:02 PM
89700 @ 3.9 Do understand that it is about 2 seconds faster on an AM3 setup with fast ram.
83901.6 @ 3.6
Thanks...
I'm not interested in the DDR3 platform so that will be fine.
Just a note - Regor 235e undervolts extremely well. Stock 2.7GHz 1.325v 45W TDP, max undervolt stable is 2.7GHz 1.088v (which is 30W TDP).
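The 45W to 30W figure is close to what the usual first-order rule gives, where dynamic power scales with frequency times voltage squared. This is only a sketch: it treats the whole TDP as dynamic power and ignores leakage, and `scaled_power` is a name invented for the example.

```python
def scaled_power(power_w, v_old, v_new, f_old=1.0, f_new=1.0):
    """First-order estimate: P scales with f * V^2 (leakage ignored)."""
    return power_w * (v_new / v_old) ** 2 * (f_new / f_old)

# Same 2.7GHz clock, 1.325v -> 1.088v, starting from the 45W TDP rating
estimate = scaled_power(45.0, 1.325, 1.088)
print(f"Estimated power at 1.088v: {estimate:.1f} W")
```

The estimate comes out a little over 30W, in line with the figure in the post.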
And they've been delayed from Oct launch to Nov, again. All the upcoming batch of CPUs.
89700 @ 3.9 Do understand that it is about 2 seconds faster on an AM3 setup with fast ram.
83901.6 @ 3.6
Compared to Regor, that is really bad. A casual stock Regor is doing 72.2k, so your 3.6G would equal Regor 3.1G time.
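For anyone following the arithmetic: the numbers posted fit Efficiency = Clock (MHz) × Time (s) for the 1M run, so lower is better. The sketch below backs out the run times and the "equivalent clock" comparison. It assumes efficiency stays constant as the clock changes, which is an idealization, and the helper names are made up for the example.

```python
def efficiency(clock_mhz, time_s):
    """SuperPi efficiency as used in this thread: clock x time, lower is better."""
    return clock_mhz * time_s

def time_at(eff, clock_mhz):
    """Run time implied by an efficiency figure at a given clock."""
    return eff / clock_mhz

def equivalent_clock(eff, time_s):
    """Clock a chip with efficiency `eff` would need to match `time_s`
    (assumes efficiency does not change with clock)."""
    return eff / time_s

propus_3900 = time_at(89700.0, 3900)    # seconds at 3.9GHz
propus_3600 = time_at(83901.6, 3600)    # seconds at 3.6GHz
regor_match = equivalent_clock(72200.0, propus_3600)

print(f"Propus 3.9GHz: {propus_3900:.3f} s")
print(f"Propus 3.6GHz: {propus_3600:.3f} s")
print(f"Regor clock needed to match the 3.6GHz run: {regor_match:.0f} MHz")
```

With a 72.2k Regor, the clock needed to match the 3.6GHz Propus time works out to about 3.1GHz, which is the comparison made above.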
Looks like my board is setting some awful timings... the second I fall below DDR2-800, MEM perf. drops to low levels much slower than K8s. The faster the MHz go, the higher the synthetic memory application perf.
DDR2-1066 5-5-5-15 in every working RAM benchmark is far ahead of DDR2-800 perf. no matter how tight I run it. Single and multi-threaded. Heck, even 2.25-2.36GHz NB ~DDR2-800 4-4-3-9 is ~2GB/s slower than DDR2-1066 5-5-5-15 2.0GHz NB and has higher access latency. STREAM, RMMT, RMMA, Everest, Sandra, you name it. That's the same as how Agena/Deneb worked, so tentatively it doesn't look like there is any change with these L3-less dies. I'll be running some application tests to check how these synthetic values carry over to the real world as soon as I can find the component limits, so there is no definitive conclusion yet.
Archer
10-21-2009, 12:58 PM
Regor has twice the L1, even comparing to the PhenomII @ lower clocks it is at a loss. I think it would be comparable to the brisbane at equal clocks.
Regor has twice the L1, even comparing to the PhenomII @ lower clocks it is at a loss. I think it would be comparable to the brisbane at equal clocks.
Regor has 512KB more L2 per Core than Propus/Deneb. Nothing else is different. It is the same everything else as Family 10h.
Per Core, it will be slightly faster than Propus due to 512KB extra cache but slower than Deneb due to missing a huge shared 5MB.
I am unclear about what you mean by comparison nor where there was a performance comparison. If we are talking about RAM speed/timings, then there is nothing incomparable between the same Core with slightly different cache amounts. Deneb 800/900 series are the only outliers because they have far more onboard cache. All of them have the same MEM performance at equal speeds, they have the same DRAM controller.
I will run your DDR2-667 settings as you gave above at 200HT Ref. and 250 HT Ref. Awaiting to see what the difference is and where it shows up.
Archer
10-21-2009, 01:49 PM
Regor has 512KB more L2 per Core than Propus/Deneb. Nothing else is different. It is the same everything else as Family 10h.
i AGREE AND IF i MADE IT SOUND DIFFERENT THAN THAT i AM SORRY.
Hit the caps lock and I aint taking the time to fix it sorry.
I am unclear about what you mean by comparison nor where there was a performance comparison.
:scratch head smile: WE NEED ONE
Oh, you have nothing to be sorry about :)
Caps doesn't matter either when not intentional, it's understandable.
I like your aim in the thread and your little testing. I'm curious so wanting to investigate a little into it. That's all.
Ended up killing the IMC by messing around. Will have to wait until I can pickup another chip. :o
Archer
10-24-2009, 05:28 AM
Ended up killing the IMC by messing around. Will have to wait until I can pickup another chip. :o
And for this there is master card:)
Ahh, I have a unique mastercard called work. :D
I'll just go in and pick-up whatever of 235e/240e they aren't using or haven't recorded. They'll have significant spares from orders like usual. Only limitation there is that I can't damage it as it'll be signed out for.
I knew something was wrong when the Everest latency (at your settings) jumped from 50ns to 110ns and DRAM B/W went down to 3.6GB/s... off for a night and it never started back up.
I've compared Deneb/Propus/Regor in reviews and you made a good point I had looked over before (so thanks). Deneb is only 100-200MHz faster than Regor but Propus/Rana takes a big hit in many applications due to the much smaller cache per Core. Your understanding was definitely right there. Just take a look here; Deneb, Heka, Callisto, Rana, Propus and Regor are compared: http://www.xbitlabs.com/articles/cpu/display/athlon-ii-x3-435_4.html
Kinda shows how too many of the benchmark applications are cache heavy.
Chew*
10-25-2009, 02:55 PM
Heh,
I tend to run tight timings and fast.........
that 800 4-4-4 stuff versus 1066 5-5-5 is for the birds.......
You really won't know till you start comparing 1150 4-4-4 to 1250+ 5-5-5
As for my old results..........
who needs 7-7-7 2000 ram to match my 5-5-5 14XXmhz latency.......
I think 6-6-5-16-22 90 NS @ 940 is fast enough and no KTE I have not backed off MRL.........that would be pointless.
Me personally I tune my ram quite a bit differently than most.....I rarely ever sacrifice timings...but I can also max the speed out to insane levels.
http://chew.ln2cooling.com/qdig-files/converted-images/XS%20low%20clock/lrg_3d%2005%20low%20clock%20challenge%203525mhz.jpg
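The speed versus timings trade-off above is easier to see when CAS latency is converted from clock cycles to nanoseconds. A minimal sketch, assuming DDR memory where the memory clock is half the data rate; this covers only the CAS portion, not the full access latency that Everest and similar tools report. `cas_latency_ns` is a name made up for the example.

```python
def cas_latency_ns(data_rate_mts, cas_cycles):
    """CAS latency in ns for DDR memory: cycles divided by the memory clock."""
    memory_clock_mhz = data_rate_mts / 2      # DDR: two transfers per clock
    return cas_cycles / memory_clock_mhz * 1000

# Speed/timing pairs mentioned in the post above
for rate, cl in [(800, 4), (1066, 5), (1150, 4), (1250, 5), (2000, 7)]:
    print(f"{rate} CL{cl}: {cas_latency_ns(rate, cl):.2f} ns")
```

By this measure 800 4-4-4 and 1066 5-5-5 are close (about 10ns against about 9.4ns), while 1150 4-4-4 against 1250 5-5-5 is a bigger gap (about 7ns against 8ns), and 2000 CL7 sits at 7ns.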
Archer
10-25-2009, 03:12 PM
Thanks for the input:) I agree for the most part.
Chew*s perspective is good coming from and for someone who spends a lot of time benchmarking and tweaking subzero synthetics... thus maximising efficiency in them. I haven't seen anyone bench/test more AMD K10s subzero than him so he should have sound knowledge, especially through being around the two AMD guys who understand AMD CPU clocking the most. Sami was doing it with first silicon 6-7 months earlier than anyone. ;)
And that's also why it doesn't apply to what I'm talking about. Real world apps and daily settings as the OP talked about... nobody posting online I've seen run +1100 4-4-4 or +1250 5-5-5 with AMD on a daily setup. Extremely rare.
Talking DDR2, all 2008 and most of 2009, AMD CPUs had massive problems doing >1200 5-5-5, let alone stable. If I wasn't involved in the bitching about it between MB BIOS devs, I probably wouldn't know. Most kits completely failed; kits that do 1350-1400 5-5-5 on Intel P35/P45 fine.
I wouldn't be making comments unless I had tested this extensively. I've already tested real apps at 1257 5-5-5-15, 1100 4-4-4-10, 1066 4-5-5-15, 1000 4-4-4 1T down to 800 3-3-3 1T. Max I could do on both my Tracers at 2.2v. All the CPUs failed POST with +2.25v, 1 kit died. 1257 5-5-5-15 always won, although the differences were not big. Hence my reply earlier. But all these lay timing tweaks were quite pointless if the bigger timings calculated at every POST were loose. From the Family 10h BIOS and Kernel Developer's Guide:
The MaxRdLatency value determines when the node's memory controller can receive incoming data from the DCTs. Calculating MaxRdLatency consists of summing all the synchronous and asynchronous delays in the path from the processor to the DRAM and back at a given MEMCLK frequency.
The kit Archer posted above can't do those timings/speeds you're talking about, Chew*. I'm sure he would gun for the same settings if his kit could. These are low-end daily kits on heavily used systems. My HTPC kit is the same. A 2.1v 800 kit, $20. Max it can do is 1066 5-5-5-15 and around 860 4-4-4-12. Sure, we're talking small differences but the only setting that is going to apply to me is whatever lies in this range. Hardware limit.
Chew*
10-26-2009, 04:30 PM
KTE I have 2 kits that should and can do decent speeds reliably 24/7.
You might want to look into them.
My findings have been more so that the actual ICs must first agree with AMD chips before you even think of achieving a damn nice ram clock. The other factor is it seems the higher end chips have the stronger IMC.
Case in point, elpida BBSE works horribly on AMD chips for whatever reason, on the other hand Elpida Hypers do extremely well.
There seems to be no rhyme or reason that I can put my finger on yet as the cause.
That said you might want to look into High bin Promos IC's for 5-5-5 1200+ 24/7 operation in 2x2g flavor.
As far as 4-4-4 I have yet to find a 2x2 variant capable but whatever micron d9 variant Cellshock was using in its 2x1g kit 1000 4-4-4 kit does extremely well on amd as well......it does rated under volted with no issues 24/7, at rated voltage 1150 4-4-4 is a breeze, even prime stable.
I have also found that timings versus speed is clock speed dependent.....for instance 5-5-5 1400+ is great until you start pushing really high. It is better suited for 4 gig cpu speeds....
Once you start running at 6 gig the tide changes and you must back off to c6 + 1700+.
Strong logical points. All I can say is that I agree with your points whole heartedly.
As far as 4-4-4 I have yet to find a 2x2 variant capable but whatever micron d9 variant Cellshock was using in its 2x1g kit 1000 4-4-4 kit does extremely well on amd as well......it does rated under volted with no issues 24/7, at rated voltage 1150 4-4-4 is a breeze, even prime stable.
I've tried quite a lot of DDR2 on Intel>AMD. More 2x1GB kits than 2x2GB but I didn't try any CS sticks since they have never been available here. The only sticks I've ran for a few months that did 1040 4-4-4-10 2T stable at 2.2v on AMD were my two '07 Micron D9 2x1GB Crucial kits. Long sold off now. Probably the best air kits I've seen. They were the ones I posted back then running WCG 1200 4-4-4 PL5 2.25v stable on low-end P35/P45. Max speed and my daily speed was 1350 5-5-5-15 2.25v. On air, they hated voltage. But even they failed above ~1257 5-5-5-15 2T and 1040 4-4-4-10 2T 2.2v on AMD, on +10 CPUs.
I didn't plan on running those MHz daily on AMD, quite honestly, because we killed and degraded so many CPUs, MBs and RAMs in testing that I gave up hope. Experimenting and testing things for MB MFGs. On a hiatus till my research work is over now. Usually a new arch is what interests me which sadly isn't till late 2011 for AMD.
I haven't found a 2x2GB doing such MHz on AMD either. Granted that as you said, the IMC is a major factor in what the RAM does but I've slacked in testing the 2x2GB kits I've used. Just ran easy settings on them.
Apart from running stock, on AMD, I haven't touched DDR3 clocking hence why I'm not speaking on it. DDR3 has much more potential for clocks and timings than DDR2.
Chew*
10-28-2009, 10:28 PM
Apart from running stock, on AMD, I haven't touched DDR3 clocking hence why I'm not speaking on it. DDR3 has much more potential for clocks and timings than DDR2.
DDR III certainly has potential but it can certainly be rather tricky as well. I've seen some rather odd anomalies where memtest is a breeze yet you go into windows and 32m locks up instantly.
Rather odd.
I have also noted that gigabyte for instance has a hardwall on air around 1680 for almost every set of ram i have tossed at it except for hypers.
As soon as you introduce a little cold to the cpu say 0C ( it doesn't need much ) all of a sudden the ram/board clock like a champ.
If I could figure this out and could have added a setting I would have had gigabyte do it already however after talking with quite a few "VIP" tech guys there seems to be no setting we can truly implement.
I notice a few other boards OC a tad better Ram wise but I have done comparisons and I would say if it looks like some "stuff" was backed off it probably was ;)
Be wary of magical bios's that all of a sudden allow high ram clocks.
I have also noticed a trend between week and certain steppings, for the most part AMD is making improvements slowly but surely.
Just a qs - have you tried the OCZ3P2000EB2GK 2x1GB 2000 9-8-8 1.8v kit on AMD, and if so how did it do?
I have a pair lying around and am thinking of moving the AMD setup to DDR3 with it. ICs on it are HFC0s. TBH, I'd want +1500 with tight timings if I moved the setup to DDR3.
Chew*
10-29-2009, 01:35 PM
Just a qs - have you tried the OCZ3P2000EB2GK 2x1GB 2000 9-8-8 1.8v kit on AMD, and if so how did it do?
I have a pair lying around and am thinking of moving the AMD setup to DDR3 with it. ICs on it are HFC0s. TBH, I'd want +1500 with tight timings if I moved the setup to DDR3.
I haven't tested those particular OCZ's however we did use samsung based HFC0's by kingston in 2x1g.
On an undisguised ES chip prior to the 955 launch we were doing 1800 7-7-7 at somewhat high volts when the cpu was cold.
We could not get them to do 1800 on a retail 955....we have not tested them in a while, they might do 1800 now on 965's but i'm guessing 1760 ish will be their limit.
IIRC they were not fond of tight timings at 1600, never dropped divider to test 1333.
I haven't tested those particular OCZ's however we did use samsung based HFC0's by kingston in 2x1g.
...
IIRC they were not fond of tight timings at 1600, never dropped divider to test 1333.
Thanks.
On my dead 235e, the IMC wasn't good, 1.25v 2.37GHz wasn't stable. But on this new OEM 235e, the IMC is pretty good so maybe the DDR3 kits will tune well. This is max at 1.25v BIOS boot: http://picasaweb.google.co.uk/lh/photo/fJboxOQ2hUtI92G1LfL1hw?feat=directlink
2.98GHz in OS crashed system, never bothered trying again. New BIOS I've just been mailed allows 1.325VID max on NB so might try it sometime this week.
That's an IGP HTPC system, 45W 2.7GHz CPU. All passive, no fans, mini case, $35 quick check. Clocks to max 3.84GHz/2.96GHz/296HT really easy at stock volts, BIOS boot [unstable under Linpack though].
Stable, I'm unsure. Highest I've tested stable is 1.250v 2.56GHz NB but that was on my K8 4850e board > http://picasaweb.google.co.uk/lh/photo/6OLyW5uH7_c-cZwv0aTWRw?feat=directlink
New CPU-Tweaker is coming with a fixed option I've asked for so tuning will be much easier soon.
Odd, but this is the first time I can say this; my 800 4-4-4 2.1v XMS2 kit that failed on many Intel CPUs/MBs to do more than 860 5-5-5 2.2v is easily able to do 1100 5-5-5-15 2.1v stable with that AMD CPU. Usually it's always been the opposite way round. :up:
Chew*
10-31-2009, 12:00 AM
This board / cpu / memory combo is very interesting. This is by far the highest my OCZ AMD kit has gone and with nothing set manually but the primaries in cpu-z mem tab.:eek:
http://chew.ln2cooling.com/qdig-files/converted-images/X4%20620/lrg_prime%20x4%20620%203.5g.JPG
http://chew.ln2cooling.com/qdig-files/converted-images/X4%20620/lrg_x4%20620%20ram%20clocking.JPG
That RAM setting looks exactly like what I'm after. :D
What vDIMM and vNB does that require (for the 2.7GHz NB shot)?
Chew*
10-31-2009, 12:35 AM
hmmm I think I was 1.35 V_NB but this board is a micro ATX and has wicked droop, might need less on other boards.
Ram I set to 1.68 or 1.69 to offset droop.
Chew*
11-01-2009, 12:28 AM
One more for you KTE, a little higher ram, a little less NB, seems the cpu is a bottleneck so the NB at 2700 is not helping.
http://chew.ln2cooling.com/qdig-files/converted-images/X4%20620/lrg_prime%203.6%20x4%20620.JPG
Seems in synthetics the higher nb lower ram cpu is not as fast.......except for everest.
These results were both done at the above prime stable shots.
http://chew.ln2cooling.com/qdig-files/converted-images/X4%20620/lrg_x4%20620%20asus%20micro%20atx%203.5g.JPG
http://chew.ln2cooling.com/qdig-files/converted-images/X4%20620/lrg_x4%20620%20asus%20micro%20atx%203.6g.JPG
Think i'm gonna drop my hypers in to do a little testing, I think with the L3 missing the tight timing high ram clocks might really benefit these chips.
Great data. Keep it coming...
Stuck with a little BIOS bug until Taiwan gets back to me. Probably not before you've posted some C3 results.
Think i'm gonna drop my hypers in to do a little testing, I think with the L3 missing the tight timing high ram clocks might really benefit these chips.
You picked my mind :)
That's exactly the case and exactly why I'm interested. There's no 6MB L3 with these chips, so all that 6MB of data is now directly accessing RAM.
Question in point: L3 has what, 5ns latency and 8GB/s r/w at stock. Deneb/Callisto are only faster than Propus/Regor because of the L3. On average 10% faster: http://www.xbitlabs.com/articles/cpu/display/athlon-ii-x4-630_4.html#sect0
Is it possible for us to close that performance gap between Deneb/Callisto and Propus/Regor by clocking and tuning NB/RAM?
Keeping in mind that Propus/Regor are about half price (if not less) of Deneb/Callisto too.
Above there, your settings are already 40ns DRAM latency instead of 55-65ns that's normal for Propus and your DRAM BW is higher than a stock Deneb L3 gets. The only thing different for an application accessing that DRAM and L3 is the access latency. We're still 8x slower for DRAM.
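The "8x slower" figure is simply the ratio of the two access latencies. A small sketch using the rough numbers from this post (5ns for L3, 40ns for the tuned DRAM, 55-65ns for a typical Propus); these are the poster's ballpark values, not measurements, and the names are made up for the example.

```python
L3_LATENCY_NS = 5.0   # ballpark L3 latency quoted in the post

def slowdown_vs_l3(dram_latency_ns, l3_latency_ns=L3_LATENCY_NS):
    """How many times slower a DRAM access is than an L3 hit."""
    return dram_latency_ns / l3_latency_ns

for label, ns in [("tuned DRAM", 40.0), ("typical low", 55.0), ("typical high", 65.0)]:
    print(f"{label}: {ns:.0f} ns = {slowdown_vs_l3(ns):.0f}x L3 latency")
```

So tuning moves DRAM from roughly 11-13x the L3 latency down to about 8x, which narrows the gap but does not close it.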
Archer, O Brother, where art thou?
Archer
11-01-2009, 12:04 PM
Going to do some testing to see. I suspect that the NB/IMC will have more of a profound effect on these than on the PhII.
Archer
11-01-2009, 08:28 PM
@ chew* can you do testing with a 400 HT ref with various NB speeds doing as direct a comparison as you can to base 200 with all other settings the same? I did some limited testing with this and I think with the proper combination it may prove a theory of mine.
From my tests comparing the PhII to the AthII the PhII was hampered by settings that had an overall 2% improvement on the Athlon.
I think with a 300+ HT ref and good timings we may see as much as a 4% increase in performance.
Chew*
11-02-2009, 08:25 PM
OK here's where your theory gets a little shaky.....
If you are only setting the primary 6 timings + CR and letting the board choose auto...... then a lower divider is going to AUTO pick tighter timings because it thinks you have lower speed ram...
If you go 100% full manual like I have tested on my asus boards it makes not one IOTA of difference ;)
NB performance is directly correlated to memory and cpu speed.....if your ram is not fast enough and cpu is not fast enough then you run into a wall where more NB has little if any gains....
Seems someone has been doing something far more interesting on the bot though, benching in single channel where it appears ram makes little to no impact......in 3 certain benchmarks.
Archer
11-02-2009, 08:38 PM
The only way to speed up the Athlon per cycle is to increase everything else (memory and NB). I have noticed more of an impact on the Athlon than the Phenom from memory and memory interface setting.
Chew*
11-02-2009, 09:01 PM
Ok KTE i did not have much time as I hosed one of my hard drives ( refuses to pass a full format now :( ) but i did a quick format on it and ran SP1.....this is a handicap in a big way.....sp3 is much faster but i'm pressed for time so here goes.
Wprime is useless to compare to in sp1.....its really slow in sp1.
This definitely needs further testing with the ram "dialed" in and on a fresh sp3 to be apples to apples but I think you can see the NB clock was unleashed by the ram speed.
http://chew.ln2cooling.com/qdig-files/converted-images/X4%20620/lrg_athlon%20ram%20compare.JPG
Archer
11-02-2009, 10:17 PM
Been playing with sub timings and have a 2 sec improvement in Pi :) with the NB at 2769. I will keep working with it and see what I can get out of this.
Those are some great numbers chew*
NB performance is directly correlated to memory and cpu speed.....if your ram is not fast enough and cpu is not fast enough then you run into a wall where more NB has little if any gains...
I found very similar data (sub 3.85GHz CPU).
I can't post all the data here but I'll upload it somewhere soon... more than 70 screenies that I haven't sorted out either.
Basically, I can clock the NB up to 2.96GHz BIOS boot at low RAM clocks, 1.30v NB, but any higher than 780 4-4-4-12 fails to POST. This is on RAM that can do 1150 5-5-5-15 2.1v on the same CPU. That trend continues until 2.66GHz NB.
At low RAM clocks (=high MRL), 2.66-2.96GHz NB is easy but at high RAM clocks/tight timings (=low MRL), anything above 2.66GHz NB won't even POST.
NB efficiency with low RAM clocks and/or loose timings is very low. The higher the MEM clocks and tighter the timings, the better the NB efficiency gets. My 235e runs out of steam at 2.65GHz NB. Any more MHz and I get huge performance drops because it requires more voltage/cooling (that I'm not giving it) for a high DRAM clock and tighter MRL to give it good continued performance scaling. Natural variance, every chip and combo will differ in where that point is. My Regor IMC happens to clock a lot better than most at similar voltage/temps.
How that makes sense with what the architecture engineers wrote is: tighter MEM timings and higher MEM clocks = tighter MRL. And that is difficult to do above 2.6GHz NB. In lay words, the MRL calc takes more than a page to write, is rerun for every CPU+NB+DRAM settings combo, and most know it as DRAM Training at POST.
So basically, NB clocks alone are very useless, just like RAM clocks alone are useless. The setting Chew* has posted with his Propus+DDR3 is the top-end of efficiency, meaning it will be very difficult to do on most setups with that cooling/voltage. Don't be fooled by his Unganged shots either... Everest is 700-1000MB/s slower than in Ganged so his DRAM perf. in his last screeny is near 13GB/s at 32ns (really tight). ;)
I'm glad someone is posting some well tuned settings without compromising another component. 1800 6-6-6 1T 2.7GHz NB, good job :beerchug:
Oh, and it's easy to catch out a low performing NB/DRAM combo. It's why NB or DRAM clock alone doesn't mean anything to me.
Coincidentally I've also been fighting a dead OS. It was my main XP install :( AOD has some weird problem crashing at first startup and then on every reboot. Brand new install. That's what ruined it. Hard to troubleshoot something that leaves no meaningful error log.
I've tested Archers hypothesis of 200HT being slower than 250HT when all the rest is kept constant too, just have to post on it. Same settings as his exactly.
Chew*
11-03-2009, 09:42 AM
Finally got my hard drive sorted, what a pita......quick format, install OS, then disk check and repaired bad sectors......
Now a full format works. I have some pressing testing that I need to have done by midnight after that I will do some more testing in sp3 and at speeds that are "benchable" etc not prime stable.
This chip will pull 3000NB and 4000+ core on water for certain benches so should be interesting.
Archer
11-03-2009, 02:30 PM
I've tested Archers hypothesis of 200HT being slower than 250HT when all the rest is kept constant too, just have to post on it. Same settings as his exactly.
This will be of interest :) I must know if it can be reproduced on all Athlon II systems or if it is system specific.
I know the same settings on the PhII have a neutral to negative effect.
I have yet to compare the results or tabulate them. Only finished running the individual tests so I don't know how they turned out. Maybe one of them is slower or faster than the other. I don't like to skew my own perception when I know the theory behind the arch. :)
I will chart something else that I stumbled upon in the process, hopefully.
Archer
11-04-2009, 08:45 AM
Cool. I think if we can gather enough data, put it together, make a comprehensive analysis and find positive/negative results, we might make up a good guide of what to do and what not to do :) Put that together with a memory timings and drive strength guide, then perhaps a blog on the front page.
Chew*
11-06-2009, 11:34 PM
Lets throw a third one into mix.....landing a decent combo with ddr II was a pita........
http://chew.ln2cooling.com/qdig-files/converted-images/X4%20620/lrg_untitled.JPG
Passing by, will check out all the posts later on in detail...
Exactly how MRL works and what it stands for even on AMD> http://anandtech.com/mb/showdoc.aspx?i=3671&p=6 ;)
Lets throw a third one into mix.....landing a decent combo with ddr II was a pita........
Looks as if moving between DDR3<->DDR2 confused you. You just did what we've been saying is the avoid route -> the high but loose NB = lower performance. :p
If you had looked at your previous tight 3510CPU+2700NB+900RAM DDR3 setup, you'd have been able to tell that route has higher efficiency. 3510CPU+2700NB+540RAM DDR2 would be about as fast as the 3625CPU+2900NB+484RAM 1M you just posted.
Look over these 7 pics when you have time, it shows how the NB/RAM scales. Tighter NB at lower NB clocks and higher DRAM clocks is faster than such combos: http://picasaweb.google.co.uk/KTEALIN/EfficiencyCPUNBMEMCombos# (http://picasaweb.google.co.uk/lh/photo/qMGeODigWdd86nd251oN3w?feat=directlink)
You might need to use the zoom button at 1 O'Clock for each image.
Chew*
11-11-2009, 09:33 AM
Nah it didn't confuse me, this is one of those particular boards that "loses" timing options as you drop divider.
That said my 1200 c5 sticks did not want to cooperate due to the forced lower timings, my c4 1000 sticks however did not mind.
I should have tossed a 2500 NB result in there but i think it was dropping me almost to 2400 which is why I opted for 2700NB on this run.
Actually I just noticed I was at 2900 NB, wtf was I thinking :poke:
Yeah, you can do better ;)
Looks as if moving between DDR3<->DDR2 confused you. You just did what we've been saying is the avoid route -> the high but loose NB = lower performance. :p
is this true for real world performance, or just for pi efficiency?
ive been able to try lower, but tighter nb (2471, 53MRL) and got lower read/write/copy and higher latency than my normal (2750, 58MRL) on this everest memory test. i haven't tested pi efficiency yet
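One possible reading of that result: if MRL is counted in NB clock cycles (an assumption here, not something stated in the thread), then the same MRL value means a different amount of real time at a different NB clock. The sketch below converts both settings to nanoseconds on that assumption; `mrl_ns` is a name made up for the example, and MRL is only one part of the total latency a benchmark measures.

```python
def mrl_ns(mrl_cycles, nb_clock_mhz):
    """MRL expressed in ns, ASSUMING it is counted in NB clock cycles."""
    return mrl_cycles / nb_clock_mhz * 1000

tight_low_nb = mrl_ns(53, 2471)    # lower NB clock, lower MRL value
loose_high_nb = mrl_ns(58, 2750)   # higher NB clock, higher MRL value

print(f"2471 NB, 53 MRL: {tight_low_nb:.2f} ns")
print(f"2750 NB, 58 MRL: {loose_high_nb:.2f} ns")
```

On that assumption the "tighter" 53 MRL at 2471 is slightly longer in absolute time than 58 MRL at 2750 (about 21.4ns against about 21.1ns), which would be consistent with the higher latency seen in the Everest run.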